MVForecaster

This object extends the univariate/exogenous regressor approach of the Forecaster class to forecast multiple series at once. Each series is predicted forward dynamically using each other’s lags, seasonality, and any other exogenous regressors. The object is initialized by combining several Forecaster objects. Any sklearn regressor can be used to make forecasts, and all models can be dynamically tuned and tested.

import pandas as pd
from scalecast.Forecaster import Forecaster
from scalecast.MVForecaster import MVForecaster
data = pd.read_csv('data.csv') # df with 3 cols - Date, Series1, Series2
f1 = Forecaster(
  y = data['Series1'],
  current_dates = data['Date'],
  future_dates = 24,
)
f2 = Forecaster(
  y = data['Series2'],
  current_dates = data['Date'],
  future_dates = 24,
)
# before feeding to the MVForecaster object, you may want to add seasonal and other regressors
# with the default merge_Xvars='union', a regressor added to either Forecaster object is used to forecast both series

# initiate the MVForecaster object
mvf = MVForecaster(
  f1,
  f2,
  # add more Forecaster objects here
  # defaults below
  not_same_len_action='trim',
  merge_Xvars='union',
  merge_future_dates='longest',
  test_length = 0,
  cis = False,
  metrics = ['rmse','mape','mae','r2'],
  # specify names if you want them
  names=['My First Series', 'My Second Series'],
)

class scalecast.MVForecaster.MVForecaster(*fs, names=None, not_same_len_action='trim', merge_Xvars='union', merge_future_dates='longest', test_length=0, optimize_on='mean', cis=False, metrics=['rmse', 'mape', 'mae', 'r2'], carry_fit_models=False, **kwargs)

__init__(*fs, names=None, not_same_len_action='trim', merge_Xvars='union', merge_future_dates='longest', test_length=0, optimize_on='mean', cis=False, metrics=['rmse', 'mape', 'mae', 'r2'], carry_fit_models=False, **kwargs)

Parameters:

*fs (Forecaster) – Forecaster objects
names (list-like) – Optional. An array with the same number of elements as *fs that assigns a name to each series. For example, if names == [‘UTUR’,’UNRATE’], you must refer to the series by those names. If names are not supplied, refer to the series as y1, y2, etc. The order in which the series are supplied is maintained.
not_same_len_action (str) – One of ‘trim’, ‘fail’. default ‘trim’. What to do with series that are different lengths. ‘trim’ will trim each series so that all dates line up.
merge_Xvars (str) – One of ‘union’, ‘u’, ‘intersection’, ‘i’. default ‘union’. How to combine Xvars in each object. ‘union’ or ‘u’ combines all regressors from each object. ‘intersection’ or ‘i’ combines only regressors that all objects have in common.
merge_future_dates (str) – One of ‘longest’, ‘shortest’. Default ‘longest’. Which future dates to use in the various series. This can be changed later.
test_length (int or float) – Default 0. The number of observations every model will use for out-of-sample testing. If float, it must be between 0 and 1 and is treated as a fractional split. By default, models are not tested.
optimize_on (str) – Default ‘mean’. How to aggregate the derived metrics across all series when optimizing models. This can be the name of an aggregate function (‘mean’, ‘min’, or ‘max’), the name of a custom function that takes a list of per-series metrics and returns a single aggregated value (such as a weighted average), or a series name. Custom functions such as weighted averages can be added later with mvf.add_optimizer_func() and selected with mvf.set_optimize_on().
cis (bool) – Default False. Whether to evaluate probabilistic confidence intervals for every model evaluated. If setting to True, ensure you also set a test_length of at least 20 observations for 95% confidence intervals. See eval_cis() and set_cilevel() methods and docstrings for more information.
metrics (list) – Default [‘rmse’,’mape’,’mae’,’r2’]. The metrics to evaluate when validating and testing models. Each element must either name a function in utils.metrics that takes exactly two arguments, a and f, or be a function that accepts two arguments and is later referenced by its name. See https://scalecast.readthedocs.io/en/latest/Forecaster/Util.html#metrics. The first element of this list is set as the default validation metric, but that can be changed. For each metric and tested model, the test-set and in-sample metrics are evaluated and can be exported.
carry_fit_models (bool) – Default False. Whether to store the regression model for each fitted model in history. Setting this to False can save memory.
**kwargs – Become attributes.
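As a rough illustration of the optimize_on options, the sketch below collapses per-series validation metrics into one score. The helper aggregate_metric, the metric values, and the series names are hypothetical stand-ins, not scalecast's internal implementation.

```python
from statistics import mean

# Hypothetical validation RMSE for one candidate model, keyed by series name.
series_rmse = {"My First Series": 4.0, "My Second Series": 10.0}


def aggregate_metric(per_series, how="mean"):
    """Collapse per-series metrics into one score (illustrative sketch only)."""
    values = list(per_series.values())
    if how == "mean":
        return mean(values)
    if how == "min":
        return min(values)
    if how == "max":
        return max(values)
    if callable(how):
        # A custom function receives the list of per-series values.
        return how(values)
    # Otherwise treat `how` as a series name and optimize on that series alone.
    return per_series[how]


print(aggregate_metric(series_rmse))                     # 7.0
print(aggregate_metric(series_rmse, "max"))              # 10.0
print(aggregate_metric(series_rmse, "My First Series"))  # 4.0
```

Optimizing on a single series name ignores the other series entirely, while ‘mean’, ‘min’, and ‘max’ trade them off in different ways.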

Methods:

`add_combo_regressors`(*args[, sep])	Combines all passed variables by multiplying their values together.
`add_covid19_regressor`([called, start, end])	Adds a dummy variable that is 1 during the time period that COVID19 effects are present for the series, 0 otherwise.
`add_cycle`(cycle_length[, fourier_order, called])	Adds a regressor that acts as a seasonal cycle.
`add_exp_terms`(*args, pwr[, sep, cutoff, drop])	Raises all passed variables (no AR terms) to exponential powers (ints or floats).
`add_lagged_terms`(*args[, lags, upto, sep, drop])	Lags all passed variables (no AR terms) 1 or more times.
`add_logged_terms`(*args[, base, sep, drop])	Logs all passed variables (no AR terms).
`add_metric`(func[, called])	Add a metric to be evaluated when validating and testing models.
`add_optimizer_func`(func[, called])	Add an optimizer function that can be used to determine the best-performing model.
`add_other_regressor`(called, start, end)	Adds a dummy variable that is 1 during the specified time period, 0 otherwise.
`add_poly_terms`(*args[, pwr, sep])	Raises all passed variables (no AR terms) to exponential powers (ints only).
`add_pt_terms`(*args[, method, sep, drop])	Applies a box-cox or yeo-johnson power transformation to all passed variables (no AR terms).
`add_seasonal_regressors`(*args[, raw, ...])	Adds seasonal regressors.
`add_series`(series, called[, first_date, ...])	Adds other series to the object as regressors.
`add_signals`(model_nicknames[, series, ...])	Adds the predictions from already-evaluated models as covariates that can be used for future evaluated models.
`add_sklearn_estimator`(imported_module, called)	Adds a new estimator from scikit-learn not built-in to the forecaster object that can be called using set_estimator().
`add_time_trend`([called])	Adds a time trend from 1 to length of the series + the forecast horizon as a current and future Xvar.
`auto_forecast`([call_me, dynamic_testing, ...])	Auto forecasts with the best parameters indicated from the tuning process.
`chop_from_front`(n[, fcst_length])	Cuts the amount of y observations in the object from the front counting backwards.
`copy`()	Creates an object copy.
`corr`([train_only, disp, df])	Displays pearson correlation between all stored series in object.
`corr_lags`([y, x, lags])	Displays pearson correlation between one series and another series' lags.
`cross_validate`([k, test_length, ...])	Tunes a model's hyperparameters using time-series cross validation.
`deepcopy`()	Creates an object deepcopy.
`drop_Xvars`(*args[, error])	Drops regressors.
`drop_all_Xvars`()	Drops all regressors.
`drop_regressors`(*args[, error])	Drops regressors.
`eval_cis`([mode, cilevel])	Call this function to change whether or not the Forecaster sets confidence intervals on all evaluated models.
`export`([dfs, models, series, cis, to_excel, ...])	Exports 1-all of 3 pandas dataframes.
`export_fitted_vals`([series, models])	Exports a dataframe of fitted values and actuals.
`export_validation_grid`(model)	Exports the validation grid from a model, converted to a pandas dataframe.
`generate_future_dates`(n)	Generates a certain amount of future dates in same frequency as current_dates.
`ingest_Xvars_df`(df[, date_col, drop_first, ...])	Ingests a dataframe of regressors and saves its Xvars to the object.
`ingest_grid`(grid)	Ingests a grid to tune the estimator.
`keep_smaller_history`(n)	Cuts y observations in the object by counting back from the beginning.
`limit_grid_size`(n[, min_grid_size, random_seed])	Makes a grid smaller randomly.
`manual_forecast`([call_me, dynamic_testing, ...])	Manually forecasts with the hyperparameters, Xvars, and normalizer selection passed as keywords.
`plot`([models, series, put_best_on_top, ci, ...])	Plots all forecasts with the actuals, or just actuals if no forecasts have been evaluated or are selected.
`plot_fitted`([models, series, ax, figsize])	Plots fitted values with the actuals.
`plot_test_set`([models, series, ...])	Plots all test-set predictions with the actuals.
`pop`(*args)	Deletes evaluated forecasts from the object's memory.
`set_best_model`([model, determine_best_by])	Sets the best model to be referenced as "best".
`set_cilevel`(n)	Sets the level for the resulting confidence intervals (95% default).
`set_estimator`(estimator)	Sets the estimator to forecast with.
`set_grids_file`([name])	Sets the name of the file where the object will look automatically for grids when calling tune(), cross_validate(), tune_test_forecast(), or similar function.
`set_last_future_date`(date)	Generates future dates in the same frequency as current_dates that ends on a specified date.
`set_metrics`(metrics)	Set or change the evaluated metrics for all model testing and validation.
`set_optimize_on`(how)	Choose how to determine best models by choosing which series should be optimized or the aggregate function to apply on the derived metrics across all series.
`set_test_length`([n])	Sets the length of the test set.
`set_validation_length`([n])	Sets the length of the validation set.
`set_validation_metric`(metric)	Sets the metric that will be used to tune all subsequent models.
`test`([dynamic_testing, call_me])	Tests the forecast estimator out-of-sample.
`transfer_cis`(transfer_from, model[, ...])	Transfers the confidence intervals from a model forecast in a passed Forecaster or MVForecaster object.
`transfer_predict`(transfer_from, model[, ...])	Makes predictions using an already-trained model over any given forecast horizon.
`tune`([dynamic_tuning, set_aside_test_set])	Tunes the specified estimator using an ingested grid (ingests a grid from Grids.py with same name as the estimator by default).
`tune_test_forecast`(models[, cross_validate, ...])	Iterates through a list of models, tunes them using grids in a grids file, and forecasts them.

add_combo_regressors(*args, sep='_')

Combines all passed variables by multiplying their values together.

Parameters:

*args (str) – Names of Xvars that already exist in the object.
sep (str) – Default ‘_’. The separator between each term in arg to create the final variable name.

Returns:

None

>>> f.add_combo_regressors('t','monthsin') # multiplies these two together (called 't_monthsin')
>>> f.add_combo_regressors('t','monthcos') # multiplies these two together (called 't_monthcos')

add_covid19_regressor(called='COVID19', start=datetime.datetime(2020, 3, 15, 0, 0), end=datetime.datetime(2021, 5, 13, 0, 0))

Adds a dummy variable that is 1 during the time period when COVID19 effects are present for the series, 0 otherwise. The default dates cover the period when the economy was most impacted by COVID.

Parameters:

called (str) – Default ‘COVID19’. What to call the resulting variable.
start (str, datetime.datetime, or pd.Timestamp) – Default datetime.datetime(2020,3,15). The start date (the default is the day Walt Disney World closed in the U.S.). Must be parsable by pandas’ Timestamp function.
end (str, datetime.datetime, or pd.Timestamp) – Default datetime.datetime(2021,5,13). The end date (the default is the day the U.S. CDC first dropped the mask mandate/recommendation for vaccinated people). Must be parsable by pandas’ Timestamp function.

Returns:

None

add_cycle(cycle_length, fourier_order=2.0, called=None)

Adds a regressor that acts as a seasonal cycle. Use this function to capture seasonality that does not follow a standard calendar period.

Parameters:

cycle_length (int) – How many time steps make one complete cycle.
fourier_order (float) – Default 2.0. The fourier order to apply. This number is the number of complete cycles in that given seasonal period. 2 captures the fundamental frequency and its first harmonic. Higher orders will capture more complex seasonality, but may lead to overfitting.
called (str) – Optional. What to call the resulting variable. Two variables are created, one for a sin transformation and the other for cos, so the resulting variable names end in “sin” or “cos”. For example, called = ‘cycle5’ becomes ‘cycle5sin’ and ‘cycle5cos’. If left unspecified, ‘cycle{cycle_length}’ is used as the name.

Returns:

None

>>> f.add_cycle(13) # adds a seasonal effect that cycles every 13 observations, called 'cycle13sin' and 'cycle13cos'
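The sketch below shows what a sin/cos cycle regressor looks like numerically. It assumes a time index starting at 1 and terms of the form sin(π·t / (cycle_length / fourier_order)), so that the default fourier_order of 2.0 repeats exactly every cycle_length steps. That form is an assumption for illustration, and cycle_terms is a hypothetical helper, not part of scalecast's API.

```python
import math


def cycle_terms(n_obs, cycle_length, fourier_order=2.0, called=None):
    """Return {name: values} for a sin/cos cycle regressor (illustrative sketch)."""
    called = f"cycle{cycle_length}" if called is None else called
    # Assumed form: the period is 2 * cycle_length / fourier_order time steps.
    period_scale = cycle_length / fourier_order
    t = range(1, n_obs + 1)
    return {
        called + "sin": [math.sin(math.pi * i / period_scale) for i in t],
        called + "cos": [math.cos(math.pi * i / period_scale) for i in t],
    }


terms = cycle_terms(n_obs=39, cycle_length=13)
print(sorted(terms))  # ['cycle13cos', 'cycle13sin']
```

Using both a sine and a cosine lets a linear model place the peak of the cycle at any phase.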

add_exp_terms(*args, pwr, sep='^', cutoff=2, drop=False)

Raises all passed variables (no AR terms) to exponential powers (ints or floats).

Parameters:

*args (str) – Names of Xvars that already exist in the object.
pwr (float) – The power to raise each term to in args. Can use values like 0.5 to perform square roots, etc.
sep (str) – default ‘^’. The separator between each term in arg to create the final variable name.
cutoff (int) – Default 2. The number of decimal places to which pwr is rounded in the resulting variable name. For instance, if pwr = 0.33333333333 and ‘t’ is passed to *args, the resulting name will be t^0.33 by default.
drop (bool) – Default False. Whether to drop the regressors passed to *args.

Returns:

None

>>> f.add_exp_terms('t',pwr=.5) # adds square root t called 't^0.5'
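The naming rule described for cutoff can be sketched as follows. The helper exp_term is hypothetical and assumes the name is built as name + sep + round(pwr, cutoff); the values are simply raised to pwr.

```python
def exp_term(name, values, pwr, sep="^", cutoff=2):
    """Raise values to pwr and build a name with pwr rounded to `cutoff` decimals (sketch)."""
    new_name = f"{name}{sep}{round(pwr, cutoff)}"
    return new_name, [v ** pwr for v in values]


name, vals = exp_term("t", [1, 4, 9], pwr=0.5)
print(name, vals)                        # t^0.5 [1.0, 2.0, 3.0]
print(exp_term("t", [8], pwr=1 / 3)[0])  # t^0.33
```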

add_lagged_terms(*args, lags=1, upto=True, sep='_', drop=False)

Lags all passed variables (no AR terms) 1 or more times.

Parameters:

*args (str) – Names of Xvars that already exist in the object.
lags (int) – Greater than 0, default 1. The number of times to lag each passed variable.
upto (bool) – Default True. Whether to add all lags up to the number passed to lags. If you pass 6 to lags and upto is True, lags 1, 2, 3, 4, 5, 6 will all be added. If you pass 6 to lags and upto is False, lag 6 only will be added.
sep (str) – Default ‘_’. The separator between each term in arg to create the final variable name. Resulting variable names will be like “tlag_1” or “tlag_2” by default.
drop (bool) – Default False. Whether to drop the regressors passed to *args.

Returns:

None

>>> f.add_lagged_terms('t',lags=3) # adds first, second, and third lag of t called 'tlag_1' - 'tlag_3'
>>> f.add_lagged_terms('t',lags=6,upto=False) # adds 6th lag of t only called 'tlag_6'
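A minimal sketch of the upto behavior and the resulting names. The helper lagged_terms is hypothetical; it marks the first lag slots, which have no history, with None, whereas how scalecast fills those early values is not specified here.

```python
def lagged_terms(name, values, lags=1, upto=True, sep="_"):
    """Return {name: lagged values}; the first `lag` slots have no history (None)."""
    chosen = range(1, lags + 1) if upto else [lags]
    out = {}
    for lag in chosen:
        # Shift forward by `lag` steps, padding the start with None.
        out[f"{name}lag{sep}{lag}"] = [None] * lag + values[:-lag]
    return out


t = [1, 2, 3, 4, 5, 6, 7]
print(list(lagged_terms("t", t, lags=3)))        # ['tlag_1', 'tlag_2', 'tlag_3']
print(lagged_terms("t", t, lags=6, upto=False))  # {'tlag_6': [None, None, None, None, None, None, 1]}
```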

add_logged_terms(*args, base=2.718281828459045, sep='', drop=False)

Logs all passed variables (no AR terms).

Parameters:

*args (str) – Names of Xvars that already exist in the object.
base (float) – Default math.e (natural log). The log base. Must be math.e or int greater than 1.
sep (str) – Default ‘’. The separator between each term in arg to create the final variable name. Resulting variable names will be like “log2t” or “lnt” by default.
drop (bool) – Default False. Whether to drop the regressors passed to *args.

Returns:

None

>>> f.add_logged_terms('t') # adds natural log t called 'lnt'

add_metric(func, called=None)

Add a metric to be evaluated when validating and testing models. The function should accept two arguments where the first argument is an array of actual values and the second is an array of predicted values. The function returns a float.

Parameters:

func (function) – The function used to calculate the metric.
called (str) – Optional. The name that can be used to reference the metric function within the object. If not specified, will use the function’s name.

>>> from scalecast.util import metrics
>>> def rmse_mae(a,f):
>>>     # average of rmse and mae
>>>     return (metrics.rmse(a,f) + metrics.mae(a,f)) / 2
>>> f.add_metric(rmse_mae)
>>> f.set_validation_metric('rmse_mae') # optimize models using this metric
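Any callable with the (a, f) -> float contract can be passed to add_metric(). So that it runs without scalecast installed, the sketch below re-implements rmse and mae in plain Python as stand-ins for scalecast.util.metrics.

```python
import math


def rmse(a, f):
    """Root mean squared error between actuals `a` and forecasts `f`."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, f)) / len(a))


def mae(a, f):
    """Mean absolute error between actuals `a` and forecasts `f`."""
    return sum(abs(x - y) for x, y in zip(a, f)) / len(a)


def rmse_mae(a, f):
    # Same (actuals, forecasts) -> float contract that add_metric() expects.
    return (rmse(a, f) + mae(a, f)) / 2


print(round(rmse_mae([1, 2, 3], [1, 2, 5]), 4))  # 0.9107
```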

add_optimizer_func(func, called=None)

Add an optimizer function that can be used to determine the best-performing model. This is in addition to the ‘mean’, ‘min’, and ‘max’ functions that are available by default.

Parameters:

func (Function) – The function to add.
called (str) – Optional. How to refer to the function when calling set_optimize_on(). If left unspecified, will use the name of the function.

Returns:

None

>>> def weighted(x):
>>>     # weighted average of first two series in the object
>>>     return x[0]*.25 + x[1]*.75
>>> mvf.add_optimizer_func(weighted)
>>> mvf.set_optimize_on('weighted') # optimize on that function
>>> mvf.set_estimator('mlr')
>>> mvf.tune() # best model now chosen based on the weighted average function you added; series2 gets 3x the weight of series 1
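To see why the aggregation choice matters, this hypothetical sketch compares two candidate models using made-up per-series RMSE values. It assumes lower RMSE is better and picks the winner under a plain mean and under the same weighted function; the selection logic is illustrative, not scalecast's own.

```python
def weighted(x):
    # Weighted average of the first two series' metrics (series 2 gets 3x the weight).
    return x[0] * 0.25 + x[1] * 0.75


# Hypothetical validation RMSE per series for two candidate models.
candidates = {
    "model_a": [2.0, 10.0],
    "model_b": [6.0, 7.0],
}

mean_scores = {m: sum(v) / len(v) for m, v in candidates.items()}
weighted_scores = {m: weighted(v) for m, v in candidates.items()}

print(min(mean_scores, key=mean_scores.get))          # model_a
print(min(weighted_scores, key=weighted_scores.get))  # model_b
```

model_a wins on the plain mean (6.0 vs. 6.5), but model_b wins once the second series is weighted more heavily (6.75 vs. 8.0).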

add_other_regressor(called, start, end)

Adds a dummy variable that is 1 during the specified time period, 0 otherwise.

Parameters:

called (str) – What to call the resulting variable.
start (str, datetime.datetime, or pd.Timestamp) – Start date. Must be parsable by pandas’ Timestamp function.
end (str, datetime.datetime, or pd.Timestamp) – End date. Must be parsable by pandas’ Timestamp function.

Returns:

None

>>> f.add_other_regressor('january_2021','2021-01-01','2021-01-31')
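The dummy logic behind add_other_regressor() (and add_covid19_regressor(), which uses fixed default dates) can be sketched as below. The helper period_dummy is hypothetical and assumes both the start and end dates are inclusive.

```python
from datetime import date


def period_dummy(dates, start, end):
    """1 for dates in [start, end] (both ends assumed inclusive), else 0."""
    return [1 if start <= d <= end else 0 for d in dates]


dates = [date(2020, 12, 15), date(2021, 1, 1), date(2021, 1, 31), date(2021, 2, 1)]
print(period_dummy(dates, date(2021, 1, 1), date(2021, 1, 31)))  # [0, 1, 1, 0]
```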

add_poly_terms(*args, pwr=2, sep='^')

Raises all passed variables (no AR terms) to exponential powers (ints only).

Parameters:

*args (str) – Names of Xvars that already exist in the object.
pwr (int) – Default 2. The max power to add to each term in args (2 to this number will be added).
sep (str) – default ‘^’. The separator between each term in arg to create the final variable name.

Returns:

None

>>> f.add_poly_terms('t','year',pwr=3) # raises t and year to 2nd and 3rd powers (called 't^2', 't^3', 'year^2', 'year^3')

add_pt_terms(*args, method='box-cox', sep='_', drop=False)

Applies a box-cox or yeo-johnson power transformation to all passed variables (no AR terms).

Parameters:

*args (str) – Names of Xvars that already exist in the object.
method (str) – One of {‘box-cox’,’yeo-johnson’}, default ‘box-cox’. The type of transformation. box-cox works for positive values only. yeo-johnson is like a box-cox but can be used with 0s or negatives. https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PowerTransformer.html.
sep (str) – Default ‘_’. The separator between each term in arg to create the final variable name. Resulting variable names will be like “box-cox_t” or “yeo-johnson_t” by default.
drop (bool) – Default False. Whether to drop the regressors passed to *args.

Returns:

None

>>> f.add_pt_terms('t') # adds box cox of t called 'box-cox_t'
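For intuition, here is the Box-Cox formula with a fixed lambda: (x^λ − 1)/λ, or ln x when λ = 0. scikit-learn’s PowerTransformer instead estimates λ by maximum likelihood and standardizes the output by default, so this hand-written box_cox helper is only a sketch of the underlying transformation.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform with a fixed lambda; requires strictly positive inputs."""
    if any(v <= 0 for v in values):
        raise ValueError("box-cox requires strictly positive values; use yeo-johnson")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


print(box_cox([1, 4, 9], lam=0.5))  # [0.0, 2.0, 4.0]
```

The positivity check mirrors the note above: box-cox is undefined for 0s or negatives, which is where yeo-johnson is used instead.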

add_seasonal_regressors(*args, raw=True, sincos=False, dummy=False, drop_first=False, cycle_lens=None, fourier_order=2.0)

Adds seasonal regressors. Can be in the form of Fourier transformed, dummy, or integer values.

Parameters:

*args (str) – Values that return a series of int type from pandas.dt or pandas.dt.isocalendar(). See https://pandas.pydata.org/docs/reference/api/pandas.Series.dt.year.html.
raw (bool) – Default True. Whether to use the raw integer values.
sincos (bool) – Default False. Whether to use a Fourier transformation of the raw integer values. The length of the cycle is derived from the max observed value unless cycle_lens is specified.
dummy (bool) – Default False. Whether to use dummy variables from the raw int values.
drop_first (bool) – Default False. Whether to drop the first observed dummy level. Not relevant when dummy = False.
cycle_lens (dict) – Optional. A dictionary that specifies a cycle length for each selected seasonality. If this is not specified or a selected seasonality is not added to the dictionary as a key, the cycle length will be selected automatically as the maximum value observed for the given seasonality. Not relevant when sincos = False.
fourier_order (float) – Default 2.0. The fourier order to apply to terms that are added using sincos = True. This number is the number of complete cycles in that given seasonal period. 2 captures the fundamental frequency and its first harmonic. Higher orders will capture more complex seasonality, but may lead to overfitting.

Returns:

None

>>> f.add_seasonal_regressors('year')
>>> f.add_seasonal_regressors(
>>>     'dayofyear',
>>>     'month',
>>>     'week',
>>>     'quarter',
>>>     raw=False,
>>>     sincos=True,
>>>     cycle_lens={'dayofyear':365.25},
>>> )
>>> f.add_seasonal_regressors('dayofweek',raw=False,dummy=True,drop_first=True)
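A sketch of the dummy=True, drop_first=True option. The helper seasonal_dummies and its column names are hypothetical, and it assumes the “first” level is the smallest observed value. Dropping one level avoids perfect collinearity with an intercept.

```python
def seasonal_dummies(values, name, drop_first=False):
    """One 0/1 column per observed level of a seasonal integer (sketch)."""
    levels = sorted(set(values))
    if drop_first:
        # Drop the first (here: smallest) level to avoid perfect collinearity.
        levels = levels[1:]
    return {f"{name}_{lvl}": [int(v == lvl) for v in values] for lvl in levels}


dayofweek = [0, 1, 2, 3, 4, 5, 6, 0]
dummies = seasonal_dummies(dayofweek, "dayofweek", drop_first=True)
print(len(dummies))              # 6
print(dummies["dayofweek_1"])    # [0, 1, 0, 0, 0, 0, 0, 0]
```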

add_series(series, called, first_date=None, forward_pad=True, back_pad=True)

Adds other series to the object as regressors. If the added series is shorter than len(Forecaster.y) + len(Forecaster.future_dates), it will be padded with 0s by default.

Parameters:

series (list-like) – The series to add as a regressor to the object.
called (str) – Required. What to call the resulting regressor in the Forecaster object.
first_date (Datetime) – Optional. The first date that corresponds with the added series. If left unspecified, will assume its first date is the same as the first date in the Forecaster object. Must be datetime or otherwise able to be parsed by the pandas.Timestamp() function.
forward_pad (bool) – Default True. Whether to put 0s after the series if the series is too short.
back_pad (bool) – Default True. Whether to put 0s before the series if the series is too short.

>>> x = [1,2,3,4,5,6]
>>> f.add_series(series = x,called='x') # assumes first date is same as what is in f.current_dates
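The padding behavior can be sketched as aligning the series inside a window of len(y) + len(future_dates) steps. The helper pad_series and its offset argument (standing in for the position implied by first_date) are hypothetical.

```python
def pad_series(series, total_len, offset=0, forward_pad=True, back_pad=True):
    """Align `series` in a window of `total_len` steps, starting `offset` steps in (sketch)."""
    front = [0] * offset if back_pad else []
    back = [0] * (total_len - offset - len(series)) if forward_pad else []
    return front + list(series) + back


# e.g. len(y) == 5 and 3 future dates -> 8 steps in total.
print(pad_series([1, 2, 3, 4, 5, 6], total_len=8))   # [1, 2, 3, 4, 5, 6, 0, 0]
print(pad_series([1, 2, 3], total_len=8, offset=2))  # [0, 0, 1, 2, 3, 0, 0, 0]
```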

add_signals(model_nicknames, series='all', fill_strategy='actuals', train_only=False)

Adds the predictions from already-evaluated models as covariates that can be used for future evaluated models. The names of the added variables will all begin with “signal_” and end with the given model nickname followed by the series name.

Parameters:

model_nicknames (list) – The names of already-evaluated models with information stored in the history attribute.
fill_strategy (str or None) – The strategy to fill NA values that are present at the beginning of a given model’s fitted values. Available options are: ‘actuals’ (default) which will replace nulls with actuals; ‘bfill’ which will backfill null values; or None which will leave null values alone, which can cause errors in future evaluated models.
train_only (bool) – Default False. Whether to add fitted values from the training set only. The test-set predictions will be out-of-sample if this is True. The future unknown values are always out-of-sample. Even when this is True, the future unknown values are taken from a model trained on the full set of known observations.

>>> mvf.set_estimator('xgboost')
>>> mvf.manual_forecast()
>>> mvf.add_signals(model_nicknames = ['xgboost']) # adds regressors called 'signal_xgboost_{series1name}', ..., 'signal_xgboost_{seriesNname}'
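The fill_strategy options can be sketched on a short list of fitted values whose first entries are missing. The helper fill_signal is hypothetical and uses None for missing values.

```python
def fill_signal(fitted, actuals, fill_strategy="actuals"):
    """Fill leading None values in fitted values (illustrative sketch)."""
    if fill_strategy is None:
        return list(fitted)
    if fill_strategy == "actuals":
        return [a if f is None else f for f, a in zip(fitted, actuals)]
    if fill_strategy == "bfill":
        out = list(fitted)
        # Walk backwards so each None takes the next available value.
        for i in range(len(out) - 2, -1, -1):
            if out[i] is None:
                out[i] = out[i + 1]
        return out
    raise ValueError(f"unknown fill_strategy: {fill_strategy}")


fitted = [None, None, 3.1, 3.9]
actuals = [1.0, 2.0, 3.0, 4.0]
print(fill_signal(fitted, actuals))           # [1.0, 2.0, 3.1, 3.9]
print(fill_signal(fitted, actuals, "bfill"))  # [3.1, 3.1, 3.1, 3.9]
```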

add_sklearn_estimator(imported_module, called)

Adds a new estimator from scikit-learn that is not built into the forecaster object and can be called using set_estimator(). Only regression models are accepted.

Parameters:

imported_module (scikit-learn regression model) – The model from scikit-learn to add. Must have already been imported locally. Supports models from sklearn and sklearn APIs.
called (str) – The name of the estimator that can be called using set_estimator().

Returns:

None

>>> from sklearn.ensemble import StackingRegressor
>>> f.add_sklearn_estimator(StackingRegressor,called='stacking')
>>> f.set_estimator('stacking')
>>> f.manual_forecast(...)

add_time_trend(called='t')

Adds a time trend from 1 to length of the series + the forecast horizon as a current and future Xvar.

Parameters:

called (str) – Default ‘t’. What to call the resulting variable.

Returns:

None

>>> f.add_time_trend() # adds time trend called 't'

auto_forecast(call_me=None, dynamic_testing=True, test_again=True)

Auto forecasts with the best parameters indicated from the tuning process.

Parameters:

call_me (str) – Optional. What to call the model when storing it in the object’s history dictionary. If not specified, the model’s nickname will be assigned the estimator value (‘mlp’ will be ‘mlp’, etc.). Duplicated names will be overwritten with the most recently called model.
dynamic_testing (bool or int) – Default True. Whether to dynamically/recursively test the forecast (meaning AR terms will be propagated with predicted values). If True, evaluates dynamically over the entire out-of-sample slice of data. If int, evaluates over windows of that many steps (2 for 2-step dynamic forecasting, 12 for 12-step, etc.). Setting this to False or 1 gives faster performance, but gives a weaker indication of how well the forecast will perform more than one period out. Testing is skipped if the test_length attribute is set to 0.
test_again (bool) – Default True. Whether to test the model before forecasting to a future horizon. If test_length is 0, this is ignored. Set this to False if you tested the model manually by calling f.test() and don’t want to waste resources testing it again.

>>> f.set_estimator('xgboost')
>>> f.tune()
>>> f.auto_forecast()

chop_from_front(n, fcst_length=None)

Removes the n most recent y observations from the object (the “front” of the series, counting backward from the last date). By default, the current length of the forecast horizon is maintained, and all future regressors are rewritten to the appropriate attributes.

Parameters:

n (int) – The number of observations to cut from the front.
fcst_length (int) – Optional. The new length of the forecast horizon. By default, maintains the forecast length currently in the object.

>>> mvf.chop_from_front(10) # keeps all observations before the last 10

copy(): Creates an object copy.

corr(train_only=False, disp='matrix', df=None, **kwargs)

Displays pearson correlation between all stored series in object.

Parameters:

train_only (bool) – Default False. Whether to only include the training set (to avoid leakage).
disp (str) – One of {‘matrix’,’heatmap’}. Default ‘matrix’. How to display results.
df (DataFrame) – Optional. A dataframe to display correlation for. If not specified, a dataframe is created using all series with no lags. This argument exists to make the corr_lags() method work, and using it manually is not recommended.
**kwargs – Passed to seaborn.heatmap() function and are ignored if disp == ‘matrix’.

Returns:

The created dataframe if disp == ‘matrix’ else the heatmap fig.

Return type:

(DataFrame or Figure)

corr_lags(y=None, x=None, lags=1, **kwargs)

Displays pearson correlation between one series and another series’ lags.

Parameters:

y (str) – The series to display as the dependent series. Default will take the first loaded series in the object.
x (str) – The series to display as the independent series. Default will take the second loaded series in the object.
lags (int) – Default 1. The number of lags to display in the independent series.
**kwargs – Passed to the MVForecaster.corr() method. Will not pass the df arg.

Returns:

The created dataframe if disp == ‘matrix’ else the heatmap fig.

Return type:

(DataFrame or Figure)
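What corr_lags() measures can be sketched as the Pearson correlation between y and each lag of x. This plain-Python version uses hypothetical helpers, pearson and corr_with_lags, and drops the first lag rows instead of filling them.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def corr_with_lags(y, x, lags=1):
    """Correlate y with x and with each lag of x, dropping rows lost to lagging."""
    return {lag: pearson(y[lag:], x[: len(x) - lag]) for lag in range(lags + 1)}


x = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
y = [0.0, 1.0, 3.0, 2.0, 5.0, 4.0]  # y is x shifted forward one step
print(round(corr_with_lags(y, x, lags=1)[1], 6))  # 1.0
```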

cross_validate(k=5, test_length=None, train_length=None, space_between_sets=None, rolling=False, dynamic_tuning=False, set_aside_test_set=True, verbose=False)

Tunes a model’s hyperparameters using time-series cross validation. Monitors the metric specified in the validation_metric attribute. Set an estimator before calling. Reads a grid for the estimator from a grids file unless a grid is ingested manually. The chosen parameters are stored in the best_params attribute. All metrics from each iteration are stored in grid_evaluated. The rows in this matrix correspond to the element index in f.grid (a hyperparameter combo), and the columns are the derived metrics across the k folds. Any hyperparameter combo that ever fails to evaluate returns N/A and is not considered. The best parameter combo is determined by the best average derived metric across all folds. The temporal order of the series is always maintained in this process. If a test_length is specified in the object, it is set aside by default. (Default) Normal cv diagram: https://scalecast-examples.readthedocs.io/en/latest/misc/validation/validation.html#5-Fold-Time-Series-Cross-Validation. (Default) Rolling cv diagram: https://scalecast-examples.readthedocs.io/en/latest/misc/validation/validation.html#5-Fold-Rolling-Time-Series-Cross-Validation.

Parameters:

k (int) – Default 5. The number of folds. If 1, behaves as if the model were being tuned on a single held out set.
test_length (int) – Optional. The size of each held-out sample. By default, determined such that the last test set and train set are the same size.
train_length (int) – Optional. The size of each training set. By default, all available observations before each test set are used.
space_between_sets (int) – Optional. The space between each training set. By default, uses the test_length.
rolling (bool) – Default False. Whether to use a rolling method, meaning every train and test size is the same. This is ignored when either of train_length or test_length is specified.
dynamic_tuning (bool or int) – Default False. Whether to dynamically/recursively test the forecast during the tuning process (meaning AR terms will be propagated with predicted values). If True, evaluates recursively over the entire out-of-sample slice of data. If int, evaluates over windows of that many steps (2 for 2-step recursive testing, 12 for 12-step, etc.). Setting this to False or 1 gives faster performance, but gives a weaker indication of how well the forecast will perform more than one period out.
set_aside_test_set (bool) – Default True. Whether to separate the test set specified in f.test_length during this process.
verbose (bool) – Default False. Whether to print out information about the test size, train size, and date ranges for each fold.

Returns:

None

>>> f.set_estimator('xgboost')
>>> f.cross_validate() # tunes hyperparam values
>>> f.auto_forecast() # forecasts with the best params

deepcopy(): Creates an object deepcopy.

drop_Xvars(*args, error='raise')

Drops regressors.

Parameters:

*args (str) – The names of regressors to drop.
error (str) – One of ‘ignore’,’raise’. Default ‘raise’. What to do with the error if the Xvar is not found in the object.

Returns:

None

>>> f.add_time_trend()
>>> f.add_exp_terms('t',pwr=.5)
>>> f.drop_Xvars('t','t^0.5')

drop_all_Xvars(): drops all regressors.

drop_regressors(*args, error='raise')

Drops regressors.

Parameters:

*args (str) – The names of regressors to drop.
error (str) – One of ‘ignore’,’raise’. Default ‘raise’. What to do with the error if the Xvar is not found in the object.

Returns:

None

>>> f.add_time_trend()
>>> f.add_exp_terms('t',pwr=.5)
>>> f.drop_regressors('t','t^0.5')

eval_cis(mode=True, cilevel=0.95)

Call this function to change whether or not the Forecaster sets confidence intervals on all evaluated models. Beginning with version 0.17.0, only conformal confidence intervals are supported. Conformal intervals need a test set to be configured soundly. Confidence intervals cannot be evaluated when the test set has fewer than 1/(1-cilevel) observations.

Parameters:

mode (bool) – Default True. Whether to set confidence intervals on or off for models.
cilevel (float) – Default .95. Must be greater than 0, less than 1. The confidence level to use to set intervals.
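The minimum test-set size implied by the 1/(1-cilevel) rule can be computed directly. The helper min_test_obs is hypothetical. It rounds before taking the ceiling so floating-point noise (1/(1-0.95) evaluates to 19.999999999999982) does not shift the result.

```python
import math


def min_test_obs(cilevel=0.95):
    """Smallest test-set size satisfying n >= 1 / (1 - cilevel)."""
    if not 0 < cilevel < 1:
        raise ValueError("cilevel must be between 0 and 1")
    # Round first so floating-point noise doesn't skew ceil().
    return math.ceil(round(1 / (1 - cilevel), 9))


for level in (0.8, 0.9, 0.95, 0.99):
    print(level, min_test_obs(level))
# 0.8 5
# 0.9 10
# 0.95 20
# 0.99 100
```

This matches the guidance for the cis parameter: 95% intervals need a test_length of at least 20.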

export(dfs=['model_summaries', 'lvl_test_set_predictions', 'lvl_fcsts'], models='all', series='all', cis=False, to_excel=False, out_path='./', excel_name='results.xlsx')

Exports 1-all of 3 pandas dataframes. Can write to excel with each dataframe on a separate sheet. Will return either a dictionary with dataframes as values (df str arguments as keys) or a single dataframe if only one df is specified.

Parameters:

dfs (list-like or str) – Default [‘model_summaries’, ‘lvl_test_set_predictions’, ‘lvl_fcsts’]. A list or name of the specific dataframe(s) you want returned and/or written to excel. Must be one of or multiple of the elements in default. Exporting test set predictions only works if all exported models were tested using the same test length.
models (list-like or str) – Default ‘all’. The forecasted models to export. Name of the model, ‘all’, or list-like of model names. ‘top_’ and None not supported.
series (list-like or str) – Default ‘all’. The series to export. Name of the series (‘y1…’, ‘series1…’ or user-selected name), ‘all’, or list-like of series names.
to_excel (bool) – Default False. Whether to save to excel.
cis (bool) – Default False. Whether to export confidence intervals for models in “all_fcsts”, “test_set_predictions”, “lvl_test_set_predictions”, “lvl_fcsts” dataframes.
out_path (str) – Default ‘./’. The path to save the excel file to (ignored when to_excel=False).
excel_name (str) – Default ‘results.xlsx’. The name to call the excel file (ignored when to_excel=False).

Returns:

Either a single pandas dataframe if one element passed to dfs or a dictionary where the keys match what was passed to dfs and the values are dataframes.

Return type:

(DataFrame or Dict[str,DataFrame])

>>> results = mvf.export(dfs=['model_summaries','lvl_fcsts'],to_excel=True) # returns a dict
>>> model_summaries = results['model_summaries'] # returns a dataframe
>>> lvl_fcsts = results['lvl_fcsts'] # returns a dataframe
>>> ts_preds = mvf.export('lvl_test_set_predictions') # returns a dataframe

export_fitted_vals(series='all', models='all')

Exports a dataframe of fitted values and actuals.

Parameters:

models (list-like or str) – Default ‘all’. Name of the model, ‘all’, or list-like of model names.
series (list-like or str) – Default ‘all’. The series to export. Name of the series (‘y1…’, ‘series1…’ or user-selected name), ‘all’, or list-like of series names.

Returns:

The fitted values for all selected series and models.

Return type:

(DataFrame)

export_validation_grid(model) → DataFrame

Exports the validation grid from a model, converted to a pandas dataframe. Raises an error if the model was not tuned.

Parameters:: model (str) – The name of the model to export for. Matches what was passed to call_me when evaluating the model.
Returns:: The resulting validation grid of the evaluated model passed to model arg.
Return type:: (DataFrame)

generate_future_dates(n)

Generates a certain number of future dates in the same frequency as current_dates.

Parameters:: n (int) – Greater than 0. Number of future dates to produce. This will also be the forecast length.
Returns:: None

>>> f.generate_future_dates(12) # 12 future dates to forecast out to
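Generating future dates in the frequency of the observed dates can be approximated with pandas alone. The sketch below assumes pd.infer_freq can detect a regular frequency from at least three observed dates; generate_future_dates here is a hypothetical standalone helper, not the method itself.

```python
import pandas as pd


def generate_future_dates(current_dates, n):
    """Hypothetical helper: n dates after the last observed date, in the same frequency."""
    if n <= 0:
        raise ValueError("n must be greater than 0")
    dates = pd.DatetimeIndex(pd.to_datetime(current_dates))
    freq = pd.infer_freq(dates)  # e.g. 'MS' for month-start data
    if freq is None:
        raise ValueError("could not infer a frequency from current_dates")
    # the range starts on the last observed date, so drop that first element
    return pd.date_range(start=dates[-1], periods=n + 1, freq=freq)[1:]
```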

ingest_Xvars_df(df, date_col='Date', drop_first=False, use_future_dates=False, pad=False)

Ingests a dataframe of regressors and saves its Xvars to the object. The user must specify a date column name in the dataframe being ingested. All non-numeric values are dummied. The dataframe should cover the entire future horizon stored within the Forecaster object, but can be padded with 0s if testing only is desired. Any columns in the dataframe that begin with “AR” will be confused with autoregressive terms and could cause errors.

Parameters:

df (DataFrame) – The dataframe that is at least the length of the y array stored in the object plus the forecast horizon.
date_col (str) – Default ‘Date’. The name of the date column in the dataframe. This column must have the same frequency as the dates stored in the Forecaster object.
drop_first (bool) – Default False. Whether to drop the first observation of any dummied variables. Irrelevant if passing all numeric values.
use_future_dates (bool) – Default False. Whether to use the future dates in the dataframe as the resulting future_dates attribute in the Forecaster object.
pad (bool) – Default False. Whether to pad any missing values with 0s.

Returns:

None
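The dummying, padding, and naming rules above can be mirrored with pandas to check a regressor dataframe before ingesting it. This is a hypothetical pre-processing helper (prepare_xvars), not scalecast code; the check for column names beginning with "AR" simply enforces the warning in the description.

```python
import pandas as pd


def prepare_xvars(df, date_col="Date", drop_first=False, pad=False):
    """Hypothetical helper: date-index a regressor frame and dummy non-numeric columns."""
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col])
    out = out.set_index(date_col)
    # columns starting with "AR" could be confused with autoregressive terms
    ar_like = [c for c in out.columns if str(c).startswith("AR")]
    if ar_like:
        raise ValueError(f"rename columns that begin with 'AR': {ar_like}")
    non_numeric = out.select_dtypes(exclude="number").columns.tolist()
    if non_numeric:
        out = pd.get_dummies(out, columns=non_numeric, drop_first=drop_first, dtype=float)
    if pad:
        out = out.fillna(0)  # pad missing values with 0s
    return out
```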

ingest_grid(grid)

Ingests a grid to tune the estimator.

Parameters:: grid (dict or str) – If dict, must be a user-created grid. If str, must match the name of a dict grid stored in a grids file.
Returns:: None

>>> f.set_estimator('mlr')
>>> f.ingest_grid({'normalizer':['scale','minmax']})

keep_smaller_history(n)

Cuts y observations in the object by counting back from the end, so only the most recent observations are kept.

Parameters:: n (int, str, or datetime.datetime) – If int, the number of observations to keep. Otherwise, the earliest observation to keep. Must be parsable by pandas’ Timestamp function.
Returns:: None

>>> f.keep_smaller_history(500) # keeps last 500 observations
>>> f.keep_smaller_history('2020-01-01') # keeps only observations on or later than 1/1/2020
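For a date-indexed pandas Series, the two ways of calling keep_smaller_history() correspond roughly to the slices below. This standalone function is an illustrative assumption about the behavior described above, not the library's implementation.

```python
import pandas as pd


def keep_smaller_history(y: pd.Series, n):
    """Hypothetical helper: trim a date-indexed series as described above."""
    if isinstance(n, int):
        if n <= 0:
            raise ValueError("n must be a positive number of observations")
        return y.iloc[-n:]  # keep the last n observations
    # otherwise n is the earliest date to keep
    return y.loc[y.index >= pd.Timestamp(n)]
```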

limit_grid_size(n, min_grid_size=1, random_seed=None)

Makes a grid smaller randomly.

Parameters:

n (int or float) – If int, randomly selects that many parameter combinations. If float, must be less than 1 and greater than 0; randomly selects that proportion of parameter combinations.
min_grid_size (int) – Default 1. The minimum number of parameter combinations to keep from the original grid if a float is passed to n.
random_seed (int) – Optional. Set a seed to make results consistent.

Returns:

None

>>> from scalecast import GridGenerator
>>> GridGenerator.get_example_grids()
>>> f.set_estimator('mlp')
>>> f.ingest_grid('mlp')
>>> f.limit_grid_size(10,random_seed=20) # limits grid to 10 iterations
>>> f.limit_grid_size(.5,random_seed=20) # limits grid to half its original size
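Random grid reduction can be pictured as sampling from the Cartesian product of a grid's parameter lists. In this sketch, truncating a float n with int() and applying min_grid_size only to the float case are assumptions based on the parameter descriptions above; expand_grid and this limit_grid_size function are hypothetical, as are the example grid keys.

```python
import itertools
import random


def expand_grid(grid):
    """List every parameter combination in a dict-of-lists grid."""
    keys = list(grid)
    return [dict(zip(keys, combo)) for combo in itertools.product(*grid.values())]


def limit_grid_size(grid, n, min_grid_size=1, random_seed=None):
    """Hypothetical standalone version: randomly keep a subset of combinations."""
    combos = expand_grid(grid)
    if isinstance(n, float):
        if not 0 < n < 1:
            raise ValueError("a float n must be greater than 0 and less than 1")
        size = max(int(len(combos) * n), min_grid_size)  # assumed truncation
    else:
        size = n
    size = min(size, len(combos))
    rng = random.Random(random_seed)  # a fixed seed makes the subset repeatable
    return rng.sample(combos, size)
```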

manual_forecast(call_me=None, dynamic_testing=True, test_again=True, bank_history=True, **kwargs)

Manually forecasts with the hyperparameters, Xvars, and normalizer selection passed as keywords. See https://scalecast.readthedocs.io/en/latest/Forecaster/_forecast.html.

Parameters:

call_me (str) – Optional. What to call the model when storing it in the object’s history. If not specified, the model’s nickname will be assigned the estimator value (‘mlp’ will be ‘mlp’, etc.). Duplicated names will be overwritten with the most recently called model.
dynamic_testing (bool or int) – Default True. Whether to dynamically/recursively test the forecast (meaning AR terms will be propagated with predicted values). If True, evaluates dynamically over the entire out-of-sample slice of data. If int, evaluates dynamically in windows of that many steps (2 for 2-step dynamic forecasting, 12 for 12-step, etc.). Setting this to False or 1 means faster performance, but gives a weaker indication of how well the forecast will perform more than one period out. The model will skip testing if the test_length attribute is set to 0.
test_again (bool) – Default True. Whether to test the model before forecasting to a future horizon. If test_length is 0, this is ignored. Set this to False if you tested the model manually by calling f.test() and don’t want to waste resources testing it again.
**kwargs – passed to the _forecast_{estimator}() method and can include such parameters as Xvars, normalizer, cap, and floor, in addition to any given model’s specific hyperparameters. See https://scalecast.readthedocs.io/en/latest/Forecaster/_forecast.html.

>>> f.set_estimator('lasso')
>>> f.manual_forecast(alpha=.5)
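The effect of dynamic_testing is easiest to see with a small multivariate example in which each series is predicted from the previous values of both series, as in MVForecaster. The sketch below fits a least-squares VAR(1)-style model with NumPy (an assumption standing in for any sklearn regressor) and shows how True, False/1, and an int window change which lags are actual values and which are propagated predictions. The function names and demo data are hypothetical.

```python
import numpy as np


def fit_var1(Y):
    """Least-squares fit of Y[t] ≈ c + Y[t-1] @ B, where Y has shape (T, k)."""
    X = np.column_stack([np.ones(len(Y) - 1), Y[:-1]])
    coef, *_ = np.linalg.lstsq(X, Y[1:], rcond=None)
    return coef  # row 0 is c, rows 1..k form B


def predict_step(coef, prev):
    """Predict every series one step ahead from the previous values of all series."""
    return coef[0] + prev @ coef[1:]


def evaluate_test_set(Y, test_length, dynamic_testing=True):
    """Predict the last test_length rows of Y under a dynamic_testing setting."""
    train, test = Y[:-test_length], Y[-test_length:]
    coef = fit_var1(train)
    if dynamic_testing is True:
        window = test_length  # fully recursive over the whole test set
    elif dynamic_testing is False:
        window = 1  # one-step-ahead: always use actual lags
    else:
        window = int(dynamic_testing)
    preds, prev = [], train[-1]
    for i in range(test_length):
        if i > 0 and i % window == 0:
            prev = test[i - 1]  # start a new window from the actual values
        pred = predict_step(coef, prev)
        preds.append(pred)
        prev = pred  # propagate the prediction as the next lag
    return np.array(preds)


# demo data: two series that feed into each other, plus noise
rng = np.random.default_rng(0)
B = np.array([[0.5, 0.2], [0.1, 0.3]])
Y_demo = np.zeros((60, 2))
for t in range(1, 60):
    Y_demo[t] = np.array([1.0, 0.5]) + Y_demo[t - 1] @ B + rng.normal(scale=0.1, size=2)
```

On real data, comparing the test-set errors of the True and False settings shows how much accuracy degrades once predictions are fed back in as lags.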

plot(models='all', series='all', put_best_on_top=False, ci=False, ax=None, figsize=(12, 6))

Plots all forecasts with the actuals, or just actuals if no forecasts have been evaluated or are selected.

Parameters:

models (list-like or str) – Default ‘all’. The forecasted models to plot. Name of the model, ‘all’, or list-like of model names. ‘top_’ and None not supported.
series (list-like or str) – Default ‘all’. The series to plot. Name of the series, ‘all’, or list-like of series names.
put_best_on_top (bool) – Default False. Only set to True if you have previously called set_best_model(). If False, ignored.
ci (bool) – Default False. Whether to display the confidence intervals.
ax (Axis) – Optional. The existing axis to write the resulting figure to.
figsize (tuple) – Default (12,6). Size of the resulting figure. Ignored when ax is not None.

Returns:

The figure’s axis.

Return type:

(Axis)

>>> mvf.plot() # plots all forecasts and all series
>>> plt.show()

plot_fitted(models='all', series='all', ax=None, figsize=(12, 6))

Plots fitted values with the actuals.

Parameters:

models (list-like or str) – Default ‘all’. The forecasted models to plot. Name of the model, ‘all’, or list-like of model names. ‘top_’ and None not supported.
series (list-like or str) – Default ‘all’. The series to plot. Name of the series (‘y1…’, ‘series1…’ or user-selected name), ‘all’, or list-like of series names.
ax (Axis) – Optional. The existing axis to write the resulting figure to.
figsize (tuple) – Default (12,6). Size of the resulting figure. Ignored when ax is not None.

Returns:

The figure’s axis.

Return type:

(Axis)

>>> mvf.plot_fitted() # plots all fitted values on all series
>>> plt.show()

plot_test_set(models='all', series='all', put_best_on_top=False, include_train=True, ci=False, ax=None, figsize=(12, 6))

Plots all test-set predictions with the actuals.

Parameters:

models (list-like or str) – Default ‘all’. The forecasted models to plot. Name of the model, ‘all’, or list-like of model names. ‘top_’ and None not supported.
series (list-like or str) – Default ‘all’. The series to plot. Name of the series (‘y1…’, ‘series1…’ or user-selected name), ‘all’, or list-like of series names.
put_best_on_top (bool) – Default False. Only set to True if you have previously called set_best_model(). If False, ignored.
include_train (bool or int) – Default True. Use to zoom into training results. If True, plots the test results with the entire history in y. If False, matches y history to test results and only plots this. If int, plots that length of y to match to test results.
ci (bool) – Default False. Whether to display the confidence intervals.
ax (Axis) – Optional. The existing axis to write the resulting figure to.
figsize (tuple) – Default (12,6). Size of the resulting figure. Ignored when ax is not None.

Returns:

The figure’s axis.

Return type:

(Axis)

>>> mvf.plot_test_set() # plots all test set predictions on all series
>>> plt.show()

pop(*args)

Deletes evaluated forecasts from the object’s memory.

Parameters:: *args (str) – Names of models matching what was passed to call_me when model was evaluated.

>>> models = ('mlr','mlp','lightgbm')
>>> mvf.tune_test_forecast(models,dynamic_testing=False)
>>> mvf.pop('mlr')

set_best_model(model=None, determine_best_by=None)

Sets the best model to be referenced as “best”. One of model or determine_best_by parameters must be specified.

Parameters:

model (str) – The model to set as the best. Must match the estimator name or call_me if that was used when evaluating the model.
determine_best_by (str) – One of MVForecaster.determine_best_by. If model is specified, this will be ignored.

Returns:

None

set_cilevel(n)

Sets the level for the resulting confidence intervals (95% default).

Parameters:: n (float) – Greater than 0 and less than 1.
Returns:: None

>>> f.set_cilevel(.80) # next forecast will get 80% confidence intervals

set_estimator(estimator)

Sets the estimator to forecast with.

Parameters:: estimator (str) – One of Forecaster.estimators.
Returns:: None

>>> f.set_estimator('lasso')
>>> f.manual_forecast(alpha = .5)

set_grids_file(name='Grids')

Sets the name of the file where the object will look automatically for grids when calling tune(), cross_validate(), tune_test_forecast(), or a similar method. If the grids file does not exist in the working directory, the error will only be raised once tuning is called.

Parameters:: name (str) – Default ‘Grids’. The name of the file to look for. This file must exist in the working directory. The default will look for a file called “Grids.py”.

>>> f.set_grids_file('ModGrids') # expects to find a file called ModGrids.py in working directory.
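A grids file is a Python module whose dictionaries are looked up by name, so by default a dictionary named after the estimator (for example lasso) is used when that estimator is tuned. A minimal hypothetical ModGrids.py might look like the following; the parameter values are arbitrary examples, and alpha is assumed to be passed through to the underlying sklearn Lasso model.

```python
# ModGrids.py (hypothetical): each dict is a grid named after the estimator it tunes.
# Every combination of the listed values is one candidate in the validation grid.

mlr = {
    "normalizer": ["scale", "minmax"],
}

lasso = {
    "alpha": [0.1, 0.5, 1.0],  # assumed to be passed through to sklearn's Lasso
    "normalizer": ["scale", "minmax"],
}
```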

set_last_future_date(date)

Generates future dates in the same frequency as current_dates that end on a specified date.

Parameters:: date (datetime.datetime, pd.Timestamp, or str) – The date to end on. Must be parsable by pandas’ Timestamp() function.
Returns:: None

>>> f.set_last_future_date('2021-06-01') # creates future dates up to this one in the expected frequency

set_metrics(metrics)

Set or change the evaluated metrics for all model testing and validation.

Parameters:: metrics (list) – The metrics to evaluate when validating and testing models. Each element must exist in utils.metrics and take only two arguments: a and f. See https://scalecast.readthedocs.io/en/latest/Forecaster/Util.html#metrics. For each metric and model that is tested, the test-set and in-sample metrics will be evaluated and can be exported. Level test-set and in-sample metrics are also currently available, but will be removed in a future version.
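Each metric is a function of two arguments: a (actuals) and f (forecasts). The NumPy versions below are a sketch of what the four default metrics compute under common textbook definitions; scalecast's own implementations in utils.metrics may differ in details (for example, whether MAPE is reported as a fraction, as assumed here).

```python
import numpy as np


def _arrays(a, f):
    return np.asarray(a, dtype=float), np.asarray(f, dtype=float)


def rmse(a, f):
    a, f = _arrays(a, f)
    return float(np.sqrt(np.mean((a - f) ** 2)))


def mae(a, f):
    a, f = _arrays(a, f)
    return float(np.mean(np.abs(a - f)))


def mape(a, f):
    a, f = _arrays(a, f)
    return float(np.mean(np.abs((a - f) / a)))  # undefined when any actual is 0


def r2(a, f):
    a, f = _arrays(a, f)
    ss_res = np.sum((a - f) ** 2)
    ss_tot = np.sum((a - a.mean()) ** 2)
    return float(1 - ss_res / ss_tot)
```

Lower values are better for RMSE, MAE, and MAPE, while higher values are better for R².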

set_optimize_on(how)

Chooses how to determine the best models: either by optimizing on a single series or by applying an aggregate function to the metrics derived across all series. This choice is used when optimizing model hyperparameters.

Parameters:: how (str) – One of MVForecaster.optimizer_funcs, a series name, or a function. Only one series name will be in mvf.optimizer_funcs at a given time. mvf.optimize_on is set to ‘mean’ when the object is initiated.
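The choice between a single series and an aggregate can be sketched with a small table of per-series validation metrics. The numbers below are made up for illustration, the aggregate names ‘mean’, ‘min’, and ‘max’ are assumed options rather than a confirmed list of MVForecaster.optimizer_funcs, and score and best_model are hypothetical helpers.

```python
import statistics

# made-up validation RMSEs for two models on two series (illustration only)
validation_rmse = {
    "mlr": {"series1": 10.0, "series2": 4.0},
    "lasso": {"series1": 8.0, "series2": 7.0},
}

AGGREGATES = {"mean": statistics.mean, "min": min, "max": max}  # assumed options


def score(per_series, how="mean"):
    """Collapse per-series metrics into one number according to how."""
    if callable(how):
        return how(list(per_series.values()))
    if how in per_series:  # optimize on a single named series
        return per_series[how]
    return AGGREGATES[how](list(per_series.values()))


def best_model(table, how="mean", lower_is_better=True):
    """Pick the model whose collapsed score is best."""
    scores = {model: score(per_series, how) for model, per_series in table.items()}
    pick = min if lower_is_better else max
    return pick(scores, key=scores.get)
```

With these numbers, mlr wins on the mean across series, while lasso wins when optimizing on series1 alone, so the setting should be chosen before tuning.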

set_test_length(n=1)

Sets the length of the test set. As of version 0.16.0, 0-length test sets are supported.

Parameters:: n (int or float) – Default 1. The length of the resulting test set. Pass 0 to skip testing models. Fractional splits are supported by passing a float less than 1 and greater than 0.
Returns:: None

>>> f.set_test_length(12) # test set of 12
>>> f.set_test_length(.2) # 20% test split
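The split itself can be sketched as below. Truncating a fractional n with int() is an assumption (the exact rounding is not documented here), and resolve_test_length and temporal_split are hypothetical helpers; the point is that the test set is always the most recent block of observations and a length of 0 leaves everything in the training set.

```python
def resolve_test_length(n, n_obs):
    """Turn an int or fractional test length into a number of observations."""
    if isinstance(n, float):
        if not 0 < n < 1:
            raise ValueError("a float test length must be between 0 and 1")
        return int(n_obs * n)  # assumed truncation
    if n < 0:
        raise ValueError("test length cannot be negative")
    return n


def temporal_split(y, test_length):
    """Split a sequence without shuffling: the test set is the last test_length items."""
    if test_length == 0:
        return list(y), []  # y[:-0] would be empty, so handle 0 explicitly
    return list(y[:-test_length]), list(y[-test_length:])
```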

set_validation_length(n=1)

Sets the length of the validation set. This will never matter for models that are not tuned.

Parameters:: n (int) – Default 1. The length of the resulting validation set.
Returns:: None

>>> f.set_validation_length(6) # validation length of 6

set_validation_metric(metric)

Sets the metric that will be used to tune all subsequent models.

Parameters:: metric – One of Forecaster.metrics. The metric to optimize the models with using the validation set. Although model testing will evaluate all metrics in Forecaster.metrics, model optimization with tuning and cross validation only uses one of these.
Returns:: None

>>> f.set_validation_metric('mae')

test(dynamic_testing=True, call_me=None, **kwargs)

Tests the forecast estimator out-of-sample. Uses the test_length attribute to determine how many observations to test on. All test-set splits maintain temporal order.

Parameters:

dynamic_testing (bool or int) – Default True. Whether to dynamically/recursively test the forecast (meaning AR terms will be propagated with predicted values). If True, evaluates dynamically over the entire out-of-sample slice of data. If int, evaluates dynamically in windows of that many steps (2 for 2-step dynamic forecasting, 12 for 12-step, etc.). Setting this to False or 1 means faster performance, but gives a weaker indication of how well the forecast will perform more than one period out. This will fail if the test_length attribute is 0.
call_me (str) – Optional. What to call the model when storing it in the object’s history. If not specified, the model’s nickname will be assigned the estimator value (‘mlp’ will be ‘mlp’, etc.). Duplicated names will be overwritten with the most recently called model.
**kwargs – passed to the _forecast_{estimator}() method and can include such parameters as Xvars, normalizer, cap, and floor, in addition to any given model’s specific hyperparameters. See https://scalecast.readthedocs.io/en/latest/Forecaster/_forecast.html.

>>> f.set_estimator('lasso')
>>> f.test(alpha=.5)

transfer_cis(transfer_from, model, transfer_to_model=None, transfer_test_set_cis='infer')

Transfers the confidence intervals from a model forecast in a passed Forecaster or MVForecaster object.

Parameters:

transfer_from (Forecaster or MVForecaster) – The object that contains the model from which intervals should be transferred.
model (str) – The model nickname of the already-evaluated model stored in transfer_from.
transfer_to_model (str) – Optional. The nickname of the model to which the intervals should be transferred. If not specified, inherits the name passed to model.
transfer_test_set_cis (bool or str) – Default ‘infer’. Whether to pass intervals for test-set predictions. If ‘infer’, the decision is made based on whether the inheriting MVForecaster object has test-set predictions evaluated.

Returns:

None.

>>> f.manual_forecast(call_me='mlr')
>>> f_new.transfer_predict(transfer_from=f,model='mlr')
>>> f_new.transfer_cis(transfer_from=f,model='mlr')

transfer_predict(transfer_from, model, model_type='sklearn', return_df=False, series=None, dates=[], save_to_history=True, call_me=None, regr=None)

Makes predictions using an already-trained model over any given forecast horizon. Uses the already-trained model from a passed MVForecaster object to create a new model in the MVForecaster object from which the method is called. Alternatively, the predictions can be returned in a pandas DataFrame without saving a new model. Confidence intervals cannot be transferred with this method but can be with the transfer_cis() method.

Parameters:

transfer_from (MVForecaster) – The MVForecaster object that contains the already-fitted model.
model (str) – The model nickname of the already-evaluated model stored in the MVForecaster object passed to transfer_from.
model_type (str) – Default ‘sklearn’. The type of model that needs to be predicted. Right now, only ‘sklearn’ is supported. The scalecast VECM model is also supported but the model_type argument will still be ‘sklearn’ in those cases.
return_df (bool) – Default False. Whether to return a pandas DataFrame with the date as an index. If the dates argument is not specified, this will include all dates in the MVForecaster instance that the method is called from.
series (list) – Optional. The series in the MVForecaster object to return predictions for. By default, all series will have forecasts generated.
dates (collection) – Optional. The dates to limit the predictions for. Ignored if return_df is False. If the passed dates are not in the same frequency as the dates stored in the MVForecaster object, an IndexError is raised.
save_to_history (bool) – Default True. Whether to save the transferred predictions as if they were a model being run using a _forecast() method.
call_me (str) – Optional. What to call the resulting model. If save_to_history is False, this is ignored. If not specified, inherits the name passed to model.
regr (dict or scalecast.auxmodels.vecm class) – Optional. The model(s) to make predictions with. If not supplied, the model will be searched for in the MVForecaster passed to transfer_from. The keys in the dictionary should match the names of the series in the MVForecaster object.

Returns:

The date-indexed DataFrame if return_df is True.

Return type:

(Pandas DataFrame or None)

>>> mvf.manual_forecast(call_me='mlr')
>>> mvf_new.transfer_predict(transfer_from=mvf,model='mlr')

tune(dynamic_tuning=False, set_aside_test_set=True)

Tunes the specified estimator using an ingested grid (ingests a grid from Grids.py with the same name as the estimator by default). This is akin to cross-validation with one fold and a test_length equal to f.validation_length. Any parameters that can be passed as arguments to manual_forecast() can be tuned with this process. The chosen parameters are stored in the best_params attribute. The evaluated validation grid can be exported to a dataframe using f.export_validation_grid().

Parameters:

dynamic_tuning (bool or int) – Default False. Whether to dynamically/recursively test the forecast during the tuning process (meaning AR terms will be propagated with predicted values). If True, evaluates recursively over the entire out-of-sample slice of data. If int, evaluates recursively in windows of that many steps (2 for 2-step recursive testing, 12 for 12-step, etc.). Setting this to False or 1 means faster performance, but gives a weaker indication of how well the forecast will perform more than one period out.
set_aside_test_set (bool) – Default True. Whether to separate the test set specified in f.test_length during this process.

Returns:

None

>>> f.set_estimator('xgboost')
>>> f.tune()
>>> f.auto_forecast()
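Tuning with one validation fold can be sketched as a loop over grid candidates: set the test set aside, fit on everything before the validation slice, score one-step-ahead predictions on that slice (the dynamic_tuning=False case), and keep the best candidate. The autoregressive least-squares model and its lags grid are stand-ins chosen for illustration; none of the function names below are part of scalecast.

```python
import numpy as np


def fit_ar(y, lags):
    """Least-squares AR(lags) with an intercept; returns [c, b1, ..., b_lags]."""
    columns = [np.ones(len(y) - lags)]
    columns += [y[lags - i - 1 : len(y) - i - 1] for i in range(lags)]  # lag i+1
    coef, *_ = np.linalg.lstsq(np.column_stack(columns), y[lags:], rcond=None)
    return coef


def one_step_preds(y, coef, lags, start):
    """Predict y[start:] from actual lagged values (the dynamic_tuning=False case)."""
    return np.array([coef[0] + coef[1:] @ y[t - lags : t][::-1] for t in range(start, len(y))])


def tune(y, grid, validation_length, test_length=0):
    """Score every candidate in grid["lags"] on one validation slice; return the best."""
    y = np.asarray(y, dtype=float)
    history = y[:-test_length] if test_length else y  # set the test set aside
    train_end = len(history) - validation_length
    validation_grid = []
    for lags in grid["lags"]:
        coef = fit_ar(history[:train_end], lags)
        preds = one_step_preds(history, coef, lags, start=train_end)
        err = history[train_end:] - preds
        validation_grid.append({"lags": lags, "rmse": float(np.sqrt(np.mean(err**2)))})
    best = min(validation_grid, key=lambda row: row["rmse"])
    return best, validation_grid


# demo: a pure sine wave follows an exact AR(2) recursion, so 1 lag should underfit
y_demo = np.sin(np.arange(120) / 3)
best, validation_grid = tune(y_demo, {"lags": [1, 2, 3]}, validation_length=12, test_length=12)
```

Here the returned list plays roughly the role of the validation grid that export_validation_grid() exposes as a dataframe, and best plays roughly the role of the best_params attribute.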

tune_test_forecast(models, cross_validate=False, dynamic_tuning=False, dynamic_testing=True, limit_grid_size=None, min_grid_size=1, suffix=None, error='raise', **cvkwargs)

Iterates through a list of models, tunes them using grids in a grids file, and forecasts them.

Parameters:

models (list-like) – The models to iterate through.
cross_validate (bool) – Default False. Whether to tune the model with cross validation. If False, uses the validation slice of data to tune.
dynamic_tuning (bool or int) – Default False. Whether to dynamically tune the model or, if int, how many forecast steps to dynamically tune it.
dynamic_testing (bool or int) – Default True. Whether to dynamically/recursively test the forecast (meaning AR terms will be propagated with predicted values). If True, evaluates dynamically over the entire out-of-sample slice of data. If int, evaluates dynamically in windows of that many steps (2 for 2-step dynamic forecasting, 12 for 12-step, etc.). Setting this to False or 1 means faster performance, but gives a weaker indication of how well the forecast will perform more than one period out. The model will skip testing if the test_length attribute is set to 0.
limit_grid_size (int or float) – Optional. Pass an argument here to limit each of the grids being read. See https://scalecast.readthedocs.io/en/latest/Forecaster/MVForecaster.html#src.scalecast.MVForecaster.MVForecaster.limit_grid_size.
min_grid_size (int) – Default 1. The smallest grid size to keep. Ignored if limit_grid_size is None.
suffix (str) – Optional. A suffix to add to each model as it is evaluated to differentiate them when called later. If unspecified, each model can be called by its estimator name.
error (str) – One of ‘ignore’,’raise’,’warn’. Default ‘raise’. What to do with the error if a given model fails. ‘warn’ logs a warning that the model could not be evaluated.
**cvkwargs – Passed to the cross_validate() method.

Returns:

None

>>> models = ('mlr','mlp','lightgbm')
>>> mvf.tune_test_forecast(models,dynamic_testing=False)