Forecasting Different Model Types

Any time you set an estimator, different arguments become available to you when calling manual_forecast or tuning the model. This page lists all model types native to scalecast. See also the auxmodels module.

arima

See also auto_arima.

Forecaster._forecast_arima(Xvars=None, dynamic_testing=True, **kwargs)

Forecasts with ARIMA (or AR, ARMA, SARIMA, SARIMAX). See the example: https://scalecast-examples.readthedocs.io/en/latest/arima/arima.html.

Parameters:
  • Xvars (list-like, str, or None) – Default None. The regressors to predict with. None means no Xvars used (unlike sklearn models).

  • dynamic_testing (bool) – Default True. Always ignored in ARIMA - ARIMA in scalecast dynamically tests all models over the full forecast horizon using statsmodels.

  • **kwargs – Passed to the ARIMA() function from statsmodels. endog and exog are passed automatically. See https://www.statsmodels.org/devel/generated/statsmodels.tsa.arima.model.ARIMA.html.

>>> f.set_estimator('arima')
>>> f.manual_forecast() # above args are now available in this function
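
For example, an order and seasonal_order can be passed straight through to statsmodels via **kwargs; the values below are illustrative rather than tuned:

>>> f.manual_forecast(order=(1,1,1), seasonal_order=(1,1,1,12)) # an illustrative SARIMA specification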

combo

Forecaster._forecast_combo(how='simple', models='all', dynamic_testing=True, determine_best_by='ValidationMetricValue', rebalance_weights=0.1, weights=None, splice_points=None)

Combines previously evaluated forecasts to create a new model. One-model combinations are supported to facilitate auto-selecting models. This model is always applied to the previously evaluated models’ test sets and cannot be tuned. It will fail if the models in the combination used different test lengths.

The weighted-average model works as follows: it applies a weighted average of all selected models, using the same weights for the fitted values, test-set metrics, and predictions. A user can supply their own weights or let the algorithm determine optimal weights based on a passed error metric (such as “TestSetMAPE”). To avoid leakage, it is recommended to use the default value, “ValidationMetricValue”, to determine weights, although this is not possible if the selected models have not all been tuned on the validation set. The weighting uses a MaxMin scaler when an error metric is passed and a MinMax scaler when r-squared is selected as the metric to base weights on. After the scaler is applied, the resulting values are rebalanced to add to 1. Since the worst-performing model would otherwise always be weighted zero, the user can select a factor to add to all scaled values before the rebalancing is applied; by default, this is 0.1. The higher this factor, the closer the weighted average will be to a simple average, and vice versa.

See the example: https://scalecast-examples.readthedocs.io/en/latest/combo/combo.html.

Parameters:
  • how (str) – One of {‘simple’,’weighted’,’splice’}. Default ‘simple’. The type of combination. If ‘simple’, uses a simple average. If ‘weighted’, uses a weighted average. If ‘splice’, splices several forecasts together at specified splice points.

  • models (list-like or str) – Default ‘all’. Which models to combine. Can start with top (‘top_5’).

  • dynamic_testing (bool) – Default True. Always set to True for combo.

  • determine_best_by (str) – One of Forecaster.determine_best_by, default ‘ValidationMetricValue’. If models does not start with ‘top_’ and how is not ‘weighted’, this is ignored. If how is ‘weighted’ and manual weights are specified, this is ignored.

  • rebalance_weights (float) – Default 0.1. How to rebalance the weights when how = ‘weighted’. The higher, the closer the weights will be to each other for each model. If 0, the worst-performing model will be weighted with 0. Must be greater than or equal to 0.

  • weights (list-like) – Optional. Only applicable when how=’weighted’. Manually specifies weights. Must be the same size as models. If None and how=’weighted’, weights are set automatically. If manually passed weights do not add to 1, will rebalance them.

  • splice_points (list-like) – Optional. Only applicable when how=’splice’. Elements in the array must be parsable by pandas’ Timestamp function. Must be exactly one less in length than the number of models: the first model forecasts up to the first splice point (models[0] -> :splice_points[0]) and the last model forecasts from the last splice point onward (models[-1] -> splice_points[-1]:).

>>> f.set_estimator('combo')
>>> f.manual_forecast() # above args are now available in this function
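
For example, a weighted average of the three best models by validation metric, or a splice of two models, can be requested. The model names, rebalance factor, and splice date below are illustrative and assume those models were already evaluated:

>>> f.manual_forecast(how='weighted', models='top_3', rebalance_weights=0.2)
>>> f.manual_forecast(how='splice', models=['mlp','arima'], splice_points=['2023-01-01'])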

hwes

Forecaster._forecast_hwes(dynamic_testing=True, **kwargs)

Forecasts with Holt-Winters exponential smoothing. See the example: https://scalecast-examples.readthedocs.io/en/latest/hwes/hwes.html.

Parameters:
  • dynamic_testing (bool) – Default True. Always True for hwes. The model is dynamically tested over the full forecast horizon using statsmodels.

  • **kwargs – Passed to the ExponentialSmoothing() function from statsmodels. endog is passed automatically. See https://www.statsmodels.org/stable/generated/statsmodels.tsa.holtwinters.ExponentialSmoothing.html.

>>> f.set_estimator('hwes')
>>> f.manual_forecast() # above args are now available in this function
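
For instance, assuming **kwargs are forwarded to statsmodels’ ExponentialSmoothing, an additive trend and seasonality can be specified; the values below are illustrative:

>>> f.manual_forecast(trend='add', seasonal='add', seasonal_periods=12)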

lstm

Forecaster._forecast_lstm(dynamic_testing=True, lags=1, lstm_layer_sizes=(8,), dropout=(0.0,), loss='mean_absolute_error', activation='tanh', optimizer='Adam', learning_rate=0.001, random_seed=None, plot_loss=False, **kwargs)

Forecasts with a long short-term memory (LSTM) neural network from TensorFlow. The only available regressors are the series’ own history (specified in the lags argument). A minmax scaler is always applied to the inputs and outputs, and the resulting point forecasts are returned unscaled. The model is saved in the tf_model attribute, and a summary can be viewed by calling Forecaster.tf_model.summary(). See the example: https://scalecast-examples.readthedocs.io/en/latest/lstm/lstm.html. See also the rnn model: https://scalecast.readthedocs.io/en/latest/Forecaster/_forecast.html#rnn.

Parameters:
  • dynamic_testing (bool) – Default True. Always True for lstm. The model uses a direct forecast.

  • lags (int) – Must be greater than 0. Default 1. The number of lags to train the model with. However many lags are placed here will also be added to the Forecaster object as AR Xvars.

  • lstm_layer_sizes (list-like) – Default (8,). The size of each lstm layer to add. The first element is for the input layer. The size of this array minus 1 will equal the number of hidden layers in the resulting model.

  • dropout (list-like) – Default (0.0,). The dropout rate for each lstm layer. Must be the same size as lstm_layer_sizes.

  • loss (str) – Default ‘mean_absolute_error’. The loss function to minimize while training the model. See available options here: https://www.tensorflow.org/api_docs/python/tf/keras/losses. Be sure to choose one that is suitable for regression tasks.

  • activation (str) – Default “tanh”. The activation function to use in each LSTM layer. See available values here: https://www.tensorflow.org/api_docs/python/tf/keras/activations.

  • optimizer (str) – default “Adam”. The optimizer to use when compiling the model. See available values here: https://www.tensorflow.org/api_docs/python/tf/keras/optimizers.

  • learning_rate (float) – Default 0.001. The learning rate to use when compiling the model.

  • random_seed (int) – Optional. Set a seed for consistent results. With tensorflow networks, setting seeds does not guarantee consistent results.

  • plot_loss (bool) – Default False. Whether to plot the LSTM loss stored in history for each epoch. If validation_split is passed to kwargs, the validation loss will be plotted as well. The resulting plot looks better if epochs > 1 is passed to **kwargs.

  • **kwargs – Passed to fit() and can include epochs, verbose, callbacks, validation_split, and more.

>>> f.set_estimator('lstm')
>>> f.manual_forecast() # above args are now available in this function
>>> f.tf_model.summary() # view a summary of the model's parameters
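
A sketch of a deeper network trained for more epochs; the lag count, layer sizes, and fit() arguments below are illustrative:

>>> f.manual_forecast(
...     lags=24,
...     lstm_layer_sizes=(16,16),
...     dropout=(0.0,0.0),
...     epochs=15,
...     validation_split=0.2,
...     plot_loss=True,
... )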

multivariate

Although only scikit-learn estimators and the vecm model can be used with the MVForecaster object, any estimator supported by scalecast that accepts external regressors can be made multivariate. See, for example, LSTM multivariate modeling.

MVForecaster._forecast_sklearn(fcster, dynamic_testing=True, Xvars='all', normalizer='minmax', lags=1, **kwargs)

Runs the multivariate forecast start-to-finish. By default, all Xvars stored in the object are used. All sklearn estimators are supported. See example1: https://scalecast-examples.readthedocs.io/en/latest/multivariate/multivariate.html and example2: https://scalecast-examples.readthedocs.io/en/latest/multivariate-beyond/mv.html.

Parameters:
  • fcster (str) – One of MVForecaster.estimators. Scikit-learn estimators or APIs only. Reads the estimator set with the set_estimator() method.

  • Xvars (str or list-like) – Default ‘all’. The exogenous/seasonal variables to use when forecasting. If None is passed, no Xvars will be used.

  • dynamic_testing (bool or int) – Default True. Whether to dynamically/recursively test the forecast (meaning AR terms will be propagated with predicted values). If True, evaluates dynamically over the entire out-of-sample slice of data. If an int, evaluates dynamically over rolling windows of that many steps (2 for 2-step dynamic forecasting, 12 for 12-step, etc.). Setting this to False or 1 means faster performance but a weaker indication of how well the forecast will perform more than one period out. The model will skip testing if the test_length attribute is set to 0.

  • normalizer (str) – Default ‘minmax’. The scaling technique to apply to the input data and lags. One of MVForecaster.normalizer.

  • lags (int | list[int] | dict[str,(int | list[int])]) – Default 1. The lags from each series to forecast with. At least one lag is required for any sklearn model. Some models in the scalecast.auxmodels module require you to pass None or 0 to lags. If an int, that many lags will be added for all series. If a list, each element must be an int, and only those lags will be added for each series. If a dict, each key must be a series name and each value an int or list of ints.

  • **kwargs – Treated as model hyperparameters and passed to the applicable sklearn or other type of estimator.

>>> mvf.set_estimator('gbt')
>>> mvf.manual_forecast(lags=3) # adds three lags for each series
>>> mvf.manual_forecast(lags=[1,3]) # first and third lags added for each series
>>> mvf.manual_forecast(lags={'y1':2,'y2':3}) # 2 lags added for first series, 3 lags for second
>>> mvf.manual_forecast(lags={'series1':[1,3],'series2':3}) # first and third lag for first series, 3 lags for second
>>> mvf.set_estimator('xgboost')
>>> mvf.manual_forecast()
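
The integer form of dynamic_testing can be combined with lags to test a rolling multi-step forecast; the estimator, lag count, window length, and hyperparameter below are illustrative:

>>> mvf.set_estimator('gbt')
>>> mvf.manual_forecast(lags=12, dynamic_testing=12, max_depth=2) # 12 lags per series, tested in 12-step windows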

naive

Forecaster._forecast_naive(seasonal=False, m='auto', **kwargs)

Forecasts with a naive estimator: the last observed value is propagated forward for non-seasonal models, or the last m periods are propagated forward for seasonal models, where m is the length of the seasonal cycle.

Parameters:
  • seasonal (bool) – Default False. Whether to use a seasonal naive model.

  • m (int or str) – Default ‘auto’. The number of observations that counts as one seasonal step. Ignored when seasonal = False. When ‘auto’, uses the M4 competition values: Hourly: 24, Monthly: 12, Quarterly: 4. Everything else gets inferred if possible.

  • **kwargs – Not used, but accepted so the call does not fail if extra arguments are passed.

>>> f.set_estimator('naive')
>>> f.manual_forecast()
>>> f.manual_forecast(seasonal=True)
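
The seasonal cycle length can also be set explicitly instead of relying on ‘auto’; 12 below is illustrative (e.g. monthly data with a yearly cycle):

>>> f.manual_forecast(seasonal=True, m=12)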

prophet

Forecaster._forecast_prophet(Xvars=None, dynamic_testing=True, cap=None, floor=None, callback_func=None, **kwargs)

Forecasts with the Prophet model from the prophet library. See example: https://scalecast-examples.readthedocs.io/en/latest/prophet/prophet.html.

Parameters:
  • Xvars (list-like, str, or None) – Default None. The regressors to predict with. None means no Xvars used (unlike sklearn models). AR terms can be accepted in this function if they are further lagged than the forecast horizon (for example, if predicting 2 periods into the future, lags 2 and greater can be used).

  • dynamic_testing (bool) – Default True. Always set to True for Prophet, like all scalecast models that do not use lags.

  • cap (float) – Optional. Specific to Prophet when using logistic growth – the largest amount the model is allowed to evaluate to.

  • floor (float) – Optional. Specific to Prophet when using logistic growth – the smallest amount the model is allowed to evaluate to.

  • callback_func (callable) – Optional. A callback used to modify the model, such as to add seasonal terms with different fourier orders.

  • **kwargs – Passed to the Prophet() function from prophet. See https://facebook.github.io/prophet/docs/quick_start.html#python-api.

>>> f.set_estimator('prophet')
>>> f.manual_forecast() # above args are now available in this function
>>> # using callbacks
>>> def add_seasonregr(m):
...     m.add_seasonality(name='monthly', period=30.5, fourier_order=5)
>>> f.manual_forecast(callback_func=add_seasonregr) # add monthly seasonality with a custom fourier order
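
Prophet’s constructor arguments can also be passed through **kwargs; the values below are illustrative:

>>> f.manual_forecast(seasonality_mode='multiplicative', n_changepoints=20)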

rnn

Forecaster._forecast_rnn(dynamic_testing=True, Xvars=None, lags=None, layers_struct=[('SimpleRNN', {'units': 8, 'activation': 'tanh'})], loss='mean_absolute_error', optimizer='Adam', learning_rate=0.001, random_seed=None, plot_loss_test=False, plot_loss=False, scale_X=True, scale_y=True, **kwargs)

Forecasts with a recurrent neural network from TensorFlow, such as an LSTM or simple RNN. Not all features from tensorflow are available, but many of the most common ones for time series models are. This function accepts lags and external regressors as inputs. The model is saved in the tf_model attribute, and a summary can be viewed by calling Forecaster.tf_model.summary(). See the univariate example: https://scalecast-examples.readthedocs.io/en/latest/rnn/rnn.html and the multivariate example: https://scalecast-examples.readthedocs.io/en/latest/multivariate-beyond/mv.html#8.-LSTM-Modeling.

Parameters:
  • dynamic_testing (bool) – Default True. Always True for rnn. The model uses a direct forecast.

  • Xvars (list-like) – Default None. The Xvars to train the models with. By default, all regressors added to the Forecaster object are used.

  • lags (int) – Alternative to Xvars. If wanting to train with lags only, specify this argument. If specified, Xvars is ignored. However many lags are placed here will also be added to the Forecaster object as AR Xvars.

  • layers_struct (list[tuple[str,dict[str,Union[float,str]]]]) – Default [(‘SimpleRNN’,{‘units’:8,’activation’:’tanh’})]. Each element in the list is a tuple with two elements. First element of the list is the input layer (input_shape set automatically). First element of the tuple in the list is the type of layer (‘SimpleRNN’,’LSTM’, or ‘Dense’). Second element is a dict. In the dict, key is a str representing hyperparameter name: ‘units’,’activation’, etc. The value is the hyperparameter value. See here for options related to SimpleRNN: https://www.tensorflow.org/api_docs/python/tf/keras/layers/SimpleRNN. For LSTM: https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTM. For Dense: https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense.

  • loss (str or tf.keras.losses.Loss) – Default ‘mean_absolute_error’. The loss function to minimize. See available options here: https://www.tensorflow.org/api_docs/python/tf/keras/losses. Be sure to choose one that is suitable for regression tasks.

  • optimizer (str or tf Optimizer) – Default “Adam”. The optimizer to use when compiling the model. See available values here: https://www.tensorflow.org/api_docs/python/tf/keras/optimizers. If str, it will use the optimizer with default args. If type Optimizer, will use the optimizer exactly as specified.

  • learning_rate (float) – Default 0.001. The learning rate to use when compiling the model. Ignored if you pass your own optimizer with a learning rate.

  • random_seed (int) – Optional. Set a seed for consistent results. With tensorflow networks, setting seeds does not guarantee consistent results.

  • plot_loss_test (bool) – Default False. Whether to plot the loss trend stored in history for each epoch on the test set. If validation_split is passed to kwargs, the validation loss will be plotted as well. The resulting plot looks better if epochs > 1 is passed to **kwargs.

  • plot_loss (bool) – Default False. Whether to plot the loss trend stored in history for each epoch on the full model. If validation_split is passed to kwargs, the validation loss will be plotted as well. The resulting plot looks better if epochs > 1 is passed to **kwargs.

  • scale_X (bool) – Default True. Whether to scale the exogenous inputs with a minmax scaler.

  • scale_y (bool) – Default True. Whether to scale the endogenous inputs (lags), as well as the model output, with a minmax scaler. The results will automatically return unscaled.

  • **kwargs – Passed to fit() and can include epochs, verbose, callbacks, validation_split, and more.

>>> f.set_estimator('rnn')
>>> f.manual_forecast() # above args are now available in this function
>>> f.tf_model.summary() # view a summary of the model's parameters
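
A sketch of an LSTM-based specification trained on lags only; the lag count, layer sizes, and epochs below are illustrative:

>>> f.manual_forecast(
...     lags=24,
...     layers_struct=[
...         ('LSTM', {'units': 32, 'activation': 'tanh'}),
...         ('Dense', {'units': 8}),
...     ],
...     epochs=10,
...     plot_loss=True,
... )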

silverkite

Forecaster._forecast_silverkite(dynamic_testing=True, Xvars=None, cv_max_splits=0, **kwargs)

Forecasts with the silverkite model from LinkedIn greykite library. See the example: https://scalecast-examples.readthedocs.io/en/latest/silverkite/silverkite.html.

Parameters:
  • dynamic_testing (bool) – Default True. Always True for silverkite. It can use lags but they are always far enough in the past to allow a direct forecast.

  • Xvars (list-like, str, or None) – The regressors to predict with. None means no Xvars used (unlike sklearn models). AR terms can be accepted in this function if they are further lagged than the forecast horizon (for example, if predicting 2 periods into the future, lags 2 and greater can be used).

  • cv_max_splits (int) – Default 0. The number of cross-validation folds to use to optimize the model. This is separate from cross validation native to scalecast. It is native to the greykite library.

  • **kwargs – Passed to the ModelComponentsParam function from greykite.framework.templates.autogen.forecast_config.

>>> f.set_estimator('silverkite')
>>> f.manual_forecast() # above args are now available in this function
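
For instance, greykite’s own cross validation can be enabled by setting cv_max_splits; the number of splits below is illustrative:

>>> f.manual_forecast(cv_max_splits=3)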

sklearn

See also mlp_stack.

Forecaster._forecast_sklearn(fcster, dynamic_testing=True, Xvars=None, normalizer='minmax', **kwargs)

Runs an sklearn forecast start-to-finish. See the example: https://scalecast-examples.readthedocs.io/en/latest/sklearn/sklearn.html.

Parameters:
  • fcster (str) – One of Forecaster.sklearn_estimators. Reads the estimator set with the set_estimator() method.

  • dynamic_testing (bool or int) – Default True. Whether to dynamically/recursively test the forecast (meaning AR terms will be propagated with predicted values). If True, evaluates recursively over the entire out-of-sample slice of data. If an int, evaluates recursively over rolling windows of that many steps (2 for 2-step recursive testing, 12 for 12-step, etc.). Setting this to False or 1 gives a weaker indication of how well the forecast will perform more than one period out.

  • Xvars (list-like, str, or None) – Default None. The regressors to predict with. For sklearn estimators, None means all Xvars added to the Forecaster object will be used. Be sure to have added them to the Forecaster object first.

  • normalizer (str) – Default ‘minmax’. The scaling technique to apply to the data. One of Forecaster.normalizer.

  • **kwargs – Treated as model hyperparameters and passed to the applicable sklearn estimator.

>>> f.set_estimator('mlp')
>>> f.manual_forecast()
>>> f.regr # access the sklearn model properties
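
Hyperparameters are passed straight through to the underlying sklearn estimator; with ‘mlp’ set as above, the values below are illustrative only:

>>> f.manual_forecast(hidden_layer_sizes=(25,25), activation='relu', normalizer='minmax', dynamic_testing=12)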

tbats

Forecaster._forecast_tbats(dynamic_testing=True, random_seed=None, **kwargs)

Forecasts with TBATS.

Parameters:
  • dynamic_testing (bool) – Default True. Always True for tbats.

  • random_seed (int) – Optional. Set a seed for consistent results.

  • **kwargs – Passed to the underlying TBATS model.

>>> f.set_estimator('tbats')
>>> f.manual_forecast() # above args are now available in this function
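
Assuming **kwargs are forwarded to the underlying TBATS model, seasonal periods can be set explicitly; the value below is illustrative:

>>> f.manual_forecast(seasonal_periods=[12])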

theta

Forecaster._forecast_theta(dynamic_testing=True, **kwargs)

Forecasts with Four Theta from darts. See the example: https://scalecast-examples.readthedocs.io/en/latest/theta/theta.html.

Parameters:
  • dynamic_testing (bool) – Default True. Always True for theta.

  • **kwargs – Passed to the FourTheta() function from darts.

>>> f.set_estimator('theta')
>>> f.manual_forecast() # above args are now available in this function