auxmodels

Supplemental models that build on scalecast’s functionality.

vecm

This is a vector error correction model adapted from statsmodels. Because it exposes an sklearn-like API (fit() and predict() methods), it can be added to a multivariate forecasting application with MVForecaster.add_sklearn_estimator().

This framework also offers a basis for adding other non-scikit-learn forecast models to the scalecast interface. When calling manual_forecast() with this model, the lags argument must always be 0 or None; the model's lags are instead specified through its k_ar_diff argument.
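The wrapper pattern can be sketched generically: any class exposing sklearn-style fit() and predict() methods can be registered through MVForecaster.add_sklearn_estimator(). The naive last-value model below is a hypothetical illustration of that minimal interface, not part of scalecast:

```python
# A minimal sketch of the interface MVForecaster.add_sklearn_estimator()
# expects: any class with sklearn-style fit(X, y) and predict(X) methods.
# This naive last-value model is purely illustrative and not part of scalecast.
class naive_forecaster:
    def __init__(self, offset=0.0):
        # __init__ keywords become tunable hyperparameters in a grid
        self.offset = offset

    def fit(self, X, y):
        # remember the last observed target value
        self.last_ = y[-1]
        return self

    def predict(self, X):
        # repeat the last observed value (plus an offset) for each row of X
        return [self.last_ + self.offset for _ in X]

model = naive_forecaster()
model.fit(X=[[1.0], [2.0]], y=[10.0, 12.0])
preds = model.predict([[3.0], [4.0]])  # [12.0, 12.0]
```

Hyperparameters declared in __init__ (like offset above, or k_ar_diff in vecm) become the keys that can be tuned through a grid.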

class src.scalecast.auxmodels.vecm(k_ar_diff=1, coint_rank=1, deterministic='n', seasons=0, first_season=0, freq=None)
__init__(k_ar_diff=1, coint_rank=1, deterministic='n', seasons=0, first_season=0, freq=None)

Initializes a Vector Error Correction Model. Uses the statsmodels implementation: https://www.statsmodels.org/dev/generated/statsmodels.tsa.vector_ar.vecm.VECM.html. See it used with scalecast: https://scalecast-examples.readthedocs.io/en/latest/vecm/vecm.html. It only works when the lags argument is set to 0 or None and the normalizer argument is set to None.

Parameters:
  • k_ar_diff (int) – The number of lags from each series to use in the model.

  • coint_rank (int) – Cointegration rank.

  • deterministic (str) – One of {“n”, “co”, “ci”, “lo”, “li”}. Default “n”.
    “n” – no deterministic terms.
    “co” – constant outside the cointegration relation.
    “ci” – constant within the cointegration relation.
    “lo” – linear trend outside the cointegration relation.
    “li” – linear trend within the cointegration relation.
    Combinations are possible (e.g. “cili” or “colo” for a linear trend with intercept). When using a constant term, choose whether to restrict it to the cointegration relation (“ci”) or leave it unrestricted (“co”); do not use both “ci” and “co”. The same applies to “li” and “lo” when using a linear term.

  • seasons (int) – Default 0. Number of periods in a seasonal cycle. 0 means no seasons.

  • first_season (int) – Default 0. Season of the first observation.

  • freq (str) – Optional. The frequency of the time-series. A pandas offset or ‘B’, ‘D’, ‘W’, ‘M’, ‘A’, or ‘Q’.

from scalecast.Forecaster import Forecaster
from scalecast.MVForecaster import MVForecaster
from scalecast.auxmodels import vecm
import pandas_datareader as pdr
import matplotlib.pyplot as plt

df = pdr.get_data_fred(
  [
    'APU000074714', # (monthly) retail gas prices
    'WTISPLC',      # (monthly) crude oil prices
  ],
  start = '1975-01-01',
  end = '2022-08-01',
)

rgp = Forecaster(
  y = df['APU000074714'],
  current_dates = df.index,
  future_dates = 12,
)
cop = Forecaster(
  y = df['WTISPLC'],
  current_dates = df.index,
  future_dates = 12,
)

mvf = MVForecaster(rgp,cop,names=['retail gas prices','crude oil prices'])
mvf.set_test_length(12)

mvf.add_sklearn_estimator(vecm,called='vecm')

vecm_grid = {
  'lags':[0],  # lags are set through k_ar_diff, so this must be 0 or None
  'normalizer':[None], # data will not be scaled -- use SeriesTransformer for scaling if desired
  'k_ar_diff':range(1,13), # try 1-12 lags
  'deterministic':["n","co","lo","li","cili","colo"], # deterministic part
  'seasons':[0,12], # seasonal part
}

mvf.set_estimator('vecm')
mvf.ingest_grid(vecm_grid)
mvf.cross_validate()
mvf.auto_forecast()

# access results
mvf.export('lvl_fcsts')
mvf.export('model_summaries')

# plot
mvf.plot()
plt.show()

auto_arima()

src.scalecast.auxmodels.auto_arima(f, call_me='auto_arima', Xvars=None, train_only=False, **kwargs)

Adds a forecast to a Forecaster object using the auto_arima function from pmdarima, which attempts to find the optimal ARIMA order by minimizing an information criterion.

Parameters:
  • f (Forecaster) – The object to add the forecast to.

  • call_me (str) – Default ‘auto_arima’. The name of the resulting model.

  • Xvars (list-like, str, or None) – Default None. The regressors to predict with.

  • train_only (bool) – Default False. Whether to use the training set only when searching for the optimal order.

  • **kwargs – Passed to the auto_arima() function from pmdarima.

Returns:

None

>>> from scalecast.util import pdr_load
>>> from scalecast.auxmodels import auto_arima
>>> f = pdr_load('HOUSTNSA',start='1900-01-01',end='2021-06-01',future_dates=24)
>>> auto_arima(f,m=12) # saves a model called auto_arima
>>> print(f.auto_arima_params) # access the selected orders
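The selection logic can be illustrated without pmdarima. The pure-Python sketch below (not part of scalecast or pmdarima, and restricted to AR terms for brevity) fits candidate AR(p) models by least squares and keeps the order with the lowest AIC — the same minimize-an-information-criterion idea auto_arima applies across full (p, d, q) orders:

```python
import math
import random

def _solve(A, b):
    # Gaussian elimination with partial pivoting for the normal equations
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ar_aic(y, p):
    # Fit AR(p) with intercept by least squares; AIC = n*ln(RSS/n) + 2k
    rows = [[1.0] + y[t - p:t][::-1] for t in range(p, len(y))]
    targets = y[p:]
    k = p + 1
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    beta = _solve(AtA, Atb)
    rss = sum((t - sum(b * x for b, x in zip(beta, r))) ** 2
              for r, t in zip(rows, targets))
    n = len(targets)
    return n * math.log(rss / n + 1e-12) + 2 * k

# simulate an AR(2) process and let AIC pick the order
random.seed(0)
y = [0.0, 0.0]
for _ in range(300):
    y.append(0.6 * y[-1] - 0.3 * y[-2] + random.gauss(0.0, 1.0))
aics = {p: ar_aic(y, p) for p in range(1, 6)}
best_p = min(aics, key=aics.get)
```

pmdarima additionally searches differencing and MA terms, applies statistical tests for d, and uses stepwise search rather than an exhaustive grid, but the ranking criterion is the same.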

mlp_stack()

src.scalecast.auxmodels.mlp_stack(f, model_nicknames, max_samples=0.9, max_features=0.5, n_estimators=10, hidden_layer_sizes=(100, 100, 100), solver='lbfgs', passthrough=False, call_me='mlp_stack', **kwargs)

Applies a stacking model with a bagged MLP regressor as the final estimator and adds it to a Forecaster or MVForecaster object. See what it does: https://scalecast-examples.readthedocs.io/en/latest/sklearn/sklearn.html#StackingRegressor. Using at least four models in the stack is recommended.

Parameters:
  • f (Forecaster or MVForecaster) – The object to add the model to.

  • model_nicknames (list-like) – The names of models previously evaluated within the object.

  • max_samples (float or int) – Default 0.9. The number of samples to draw with replacement from the training set to train each base estimator. If int, draw max_samples samples. If float, draw that fraction of the samples.

  • max_features (float or int) – Default 0.5. The number of features to draw from the training set to train each base estimator. If int, draw max_features features. If float, draw that fraction of the features.

  • n_estimators (int) – Default 10. The number of base estimators in the ensemble.

  • hidden_layer_sizes (tuple) – Default (100,100,100). The layer/hidden layer sizes for the bagged mlp regressor that is the final estimator in the stacked model.

  • solver (str) – Default ‘lbfgs’. The mlp solver.

  • call_me (str) – Default ‘mlp_stack’. The name of the resulting model.

  • **kwargs – Passed to the manual_forecast() or proba_forecast() method.

>>> from scalecast.auxmodels import mlp_stack
>>> from scalecast import GridGenerator
>>> GridGenerator.get_example_grids()
>>> models = ('xgboost','lightgbm','knn','elasticnet')
>>> f.auto_Xvar_select()
>>> f.tune_test_forecast(models,cross_validate=True)
>>> mlp_stack(f,model_nicknames=models) # saves a model called mlp_stack
>>> f.export('model_summaries',models='mlp_stack')
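For reference, the kind of estimator mlp_stack() assembles can be sketched directly in scikit-learn. The base estimators, layer sizes, and data below are illustrative stand-ins (layers shrunk from the (100, 100, 100) default so the example runs quickly), not what scalecast uses internally:

```python
# Sketch of a stacking ensemble whose final estimator is a bagged MLP,
# mirroring mlp_stack()'s structure. Base models and data are illustrative.
from sklearn.ensemble import BaggingRegressor, StackingRegressor
from sklearn.linear_model import ElasticNet
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor

final = BaggingRegressor(
    MLPRegressor(hidden_layer_sizes=(10, 10), solver='lbfgs', max_iter=500),
    max_samples=0.9,   # rows drawn (with replacement) per bagged MLP
    max_features=0.5,  # fraction of base-model predictions each MLP sees
    n_estimators=10,
)
stack = StackingRegressor(
    estimators=[
        ('en', ElasticNet()),
        ('knn', KNeighborsRegressor()),
        ('tree', DecisionTreeRegressor(max_depth=3)),
    ],
    final_estimator=final,
    passthrough=False,  # final estimator sees base predictions only
)

# tiny synthetic demo
X = [[i, i % 7] for i in range(60)]
y = [0.5 * a + b for a, b in X]
stack.fit(X, y)
preds = stack.predict([[60, 4], [61, 5]])
```

In scalecast, the base estimators are the previously evaluated models named in model_nicknames, and their out-of-sample predictions feed the bagged MLP.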