Forecaster

This is the main object that is utilized for making predictions on the test set, making forecasts, evaluating models, data differencing, adding regressors, and saving, visualizing, and exporting results.

 from scalecast.Forecaster import Forecaster
 array_of_dates = ['2021-01-01','2021-01-02','2021-01-03']
 array_of_values = [1,2,3]
 f = Forecaster(
   y=array_of_values,
   current_dates=array_of_dates,
   # defaults below
   future_dates=None,
   test_length = 0,
   cis = False,
   metrics = ['rmse','mape','mae','r2'],
 )

class scalecast.Forecaster.Forecaster(y: Sequence[float | int], current_dates: Sequence[date | datetime | Timestamp | datetime64 | str], future_dates: Annotated[int, 'must be >= 0'] | None = None, test_length: Annotated[int, 'must be >= 0'] = 0, validation_length: Annotated[int, 'must be >= 0'] = 1, metrics: list[str] | None = None, cis: bool = False, carry_fit_models: bool = True)

Forecaster object is the main univariate forecasting object in the scalecast library. It contains all the observed data, future dates, regressors, and methods to manipulate these and evaluate forecasts.

Parameters:

y (collection) – An array of all observed values. Must match the order of elements passed to current_dates.
current_dates (collection) – An array of all observed dates. Must be the same length as y and in the same sequence. A numerical index can be passed if dates are unknown; in that case, the object treats the index as dates with nanosecond frequency.
future_dates (int) – Optional. The future dates to add to the model upon initialization. If not added when object is initialized, can be added later.
test_length (int or float) – Default 0. The number of observations held out as a test set for evaluating every model out of sample. If float, must be between 0 and 1 and is treated as a fractional split. By default, models are not tested.
validation_length (int) – The size of the validation set. Default sets it at 1.
metrics (list[str]) – Optional. List of metrics to evaluate every model.
cis (bool) – Default False. Whether to evaluate naive conformal confidence intervals for every model evaluated. If setting to True, ensure you also set a test_length of at least 20 observations for 95% confidence intervals. See eval_cis() and set_cilevel() methods and docstrings for more information.
carry_fit_models (bool) – Default True. Whether to store the regression model for each fitted model in history. Setting this to False can save memory.

__init__(y: Sequence[float | int], current_dates: Sequence[date | datetime | Timestamp | datetime64 | str], future_dates: Annotated[int, 'must be >= 0'] | None = None, test_length: Annotated[int, 'must be >= 0'] = 0, validation_length: Annotated[int, 'must be >= 0'] = 1, metrics: list[str] | None = None, cis: bool = False, carry_fit_models: bool = True)

Methods:

`STL`([diffy, train_only])	Returns a Season-Trend decomposition using LOESS of the y values.
`add_AR_terms`(N)	Adds seasonal auto-regressive terms.
`add_ar_terms`(n)	Adds auto-regressive terms.
`add_combo_regressors`(*args[, sep])	Combines all passed variables by multiplying their values together.
`add_covid19_regressor`([called, start, end])	Adds a dummy variable that is 1 during the time period that COVID19 effects are present for the series, 0 otherwise.
`add_cycle`(cycle_length[, fourier_order, called])	Adds a regressor that acts as a seasonal cycle.
`add_exp_terms`(*args, pwr[, sep, cutoff, drop])	Raises all passed variables (no AR terms) to exponential powers (ints or floats).
`add_lagged_terms`(*args[, lags, upto, sep, drop])	Lags all passed variables (no AR terms) 1 or more times.
`add_logged_terms`(*args[, base, sep, drop])	Logs all passed variables (no AR terms).
`add_normalizer`(called, imported_normalizer)	Add a normalizer to be available for forecasting.
`add_other_regressor`(called, start, end)	Adds a dummy variable that is 1 during the specified time period, 0 otherwise.
`add_poly_terms`(*args[, pwr, sep])	Raises all passed variables (no AR terms) to exponential powers (ints only).
`add_pt_terms`(*args[, method, sep, drop])	Applies a box-cox or yeo-johnson power transformation to all passed variables (no AR terms).
`add_seasonal_regressors`(*args[, raw, ...])	Adds seasonal regressors.
`add_series`(series, called[, first_date, pad])	Adds other series to the object as regressors.
`add_signals`(model_nicknames[, ...])	Adds the predictions from already-evaluated models as covariates that can be used for future evaluated models.
`add_sklearn_estimator`(imported_module, called)	Adds a new estimator from scikit-learn not built-in to the forecaster object that can be called using set_estimator().
`add_time_trend`([called])	Adds a time trend from 1 to length of the series + the forecast horizon as a current and future Xvar.
`adf_test`([critical_pval, full_res, ...])	Tests the stationarity of the y series using augmented dickey fuller.
`all_feature_info_to_excel`([out_path, excel_name])	Saves all feature importance to excel.
`all_validation_grids_to_excel`([out_path, ...])	Saves all validation grids to excel.
`auto_Xvar_select`([estimator, try_trend, ...])	Attempts to find the ideal trend, seasonality, and look-back representations for the stored series by systematically adding regressors to the object and monitoring a passed metric value.
`auto_forecast`([call_me, test_model, ...])	Auto forecasts with the best parameters indicated from the tuning process.
`chop_from_back`(n)	Cuts y observations in the object from the back by counting forward from the beginning.
`chop_from_front`(n[, fcst_length])	Cuts the amount of y observations in the object from the front counting backwards.
`copy`()	Creates an object copy.
`cross_validate`([k, test_length, ...])	Tunes a model's hyperparameters using time-series cross validation.
`determine_best_series_length`([estimator, ...])	Attempts to find the optimal length for the series to produce accurate forecasts by systematically shortening the series, running estimations, and monitoring a passed metric value.
`determine_if_MVForecaster`()	Determines if the object is of Forecaster or MVForecaster type by checking if the y attribute is a dictionary (MVForecaster) or a Series (Forecaster).
`drop_Xvars`(*args[, raise_error])	Drops regressors.
`drop_all_Xvars`()	Drops all regressors.
`drop_regressors`(*args[, raise_error])	Drops regressors.
`eval_cis`([mode, cilevel])	Call this function to change whether or not the Forecaster sets confidence intervals on all evaluated models.
`export`([dfs, models, best_model, ...])	Exports 1-all of 3 pandas DataFrames.
`export_Xvars_df`([dropna])	Gets all utilized regressors and values.
`export_feature_importance`(model)	Exports the feature importance from a model.
`export_fitted_vals`(model)	Exports a single dataframe with dates, fitted values, actuals, and residuals for one model.
`export_validation_grid`(model)	Exports the validation grid from a model, converted to a pandas dataframe.
`fit`(**fit_params)	Fits the model assigned to self.call_estimator.
`generate_future_dates`(n)	Generates a certain amount of future dates in same frequency as current_dates.
`get_freq`()	Gets the pandas inferred date frequency.
`get_max_lag_order`()	Returns the highest lag order variable stored in the object.
`get_regressor_names`()	Gets the regressor names stored in the object.
`infer_freq`()	Uses the pandas library to infer the frequency of the loaded dates.
`ingest_Xvars_df`(df[, date_col, drop_first, ...])	Ingests a dataframe of regressors and saves its Xvars to the object.
`ingest_grid`(grid)	Ingests a grid to tune the estimator.
`init_estimator`([dynamic_testing])	Initiates the estimator to be used for forecasting by creating an instance of the model's interpreted_model class and assigning it to self.call_estimator.
`keep_smaller_history`(n)	Cuts y observations in the object by counting back from the beginning.
`limit_grid_size`(n[, min_grid_size, random_seed])	Makes a grid smaller randomly.
`list_stored_ar_terms`()	Returns a list of all stored autoregressive (AR) terms.
`lookup_normalizer`([normalizer])	Returns the normalizing object (i.e. StandardScaler) with fit/transform methods.
`manual_forecast`([call_me, test_model, ...])	Manually forecasts with the hyperparameters, Xvars, and normalizer selection passed as keywords.
`n_actuals`()	Returns the number of actual observations in the object.
`normality_test`([train_only])	Runs D'Agostino and Pearson's test for normality ported from scipy: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.normaltest.html.
`order_fcsts`([models, determine_best_by])	Gets estimated forecasts ordered from best-to-worst.
`parse_determine_best_by`(determine_best_by)	Returns the metric to determine the best model by based on the DetermineBestBy object created in set_metrics().
`parse_labeled_metrics`(labeled_metrics)	Parses a dictionary of EvaluatedMetric objects and returns a dictionary of model nicknames and their corresponding scores ordered from best to worst based on the store attribute of the EvaluatedMetric objects.
`plot`([models, exclude, order_by, ci, ax, ...])	Plots all forecasts with the actuals, or just actuals if no forecasts have been evaluated or are selected.
`plot_acf`([diffy, train_only])	Plots an autocorrelation function of the y values.
`plot_fitted`([models, exclude, order_by, ax, ...])	Plots all fitted values with the actuals.
`plot_pacf`([diffy, train_only])	Plots a partial autocorrelation function of the y values.
`plot_periodogram`([diffy, train_only])	Plots a periodogram of the y values (comes from scipy.signal).
`plot_test_set`([models, exclude, order_by, ...])	Plots all test-set predictions with the actuals.
`pop`(*args)	Deletes evaluated forecasts from the object's memory.
`predict`(**predict_params)	Predicts with the model assigned to self.call_estimator.
`predict_fitted_vals`(**predict_params)	Returns the fitted values for the training data with the model assigned to self.call_estimator.
`reduce_Xvars`([method, estimator, ...])	Requires the optional shap library.
`restore_series_length`()	Restores the original y values and current dates in the object from before keep_smaller_history() or determine_best_series_length() were called.
`round`([decimals])	Rounds the values saved to Forecaster.y.
`save_feature_importance`([method, on_error, ...])	Requires shap.
`seasonal_decompose`([diffy, train_only])	Returns a signal/seasonal decomposition of the y values.
`set_cilevel`(n)	Sets the level for the resulting confidence intervals (95% default).
`set_estimator`(estimator)	Sets the estimator to forecast with.
`set_grids_file`([name])	Sets the name of the file where the object will look automatically for grids when calling tune(), cross_validate(), tune_test_forecast(), or similar function.
`set_last_future_date`(date)	Generates future dates in the same frequency as current_dates that ends on a specified date.
`set_metrics`(metrics[, keep_existing])	Set or change the evaluated metrics for all model testing and validation.
`set_test_length`([n])	Sets the length of the test set.
`set_validation_length`([n])	Sets the length of the validation set.
`set_validation_metric`(metric)	Sets the metric that will be used to tune all subsequent models.
`synthesize_models`([models, ...])	Creates a model that is an average of other models with confidence intervals determined by forming normal distributions around each point prediction.
`test`([dynamic_testing, call_me])	Tests the forecast estimator out-of-sample.
`transfer_cis`(transfer_from, model[, ...])	Transfers the confidence intervals from a model forecast in a passed Forecaster or MVForecaster object.
`transfer_predict`(transfer_from, model[, ...])	Makes predictions using an already-trained model over any given forecast horizon.
`tune`([dynamic_tuning, set_aside_test_set])	Tunes the specified estimator using an ingested grid (ingests a grid from Grids.py with same name as the estimator by default).
`tune_test_forecast`(models[, cross_validate, ...])	Iterates through a list of models, tunes them using grids in a grids file, forecasts them, and can save feature information.
`validate_regressor_names`()	Validates that all regressor names exist in both current_xregs and future_xregs.

STL(diffy: bool = False, train_only: bool = False, **kwargs: Any) → DecomposeResult

Returns a Season-Trend decomposition using LOESS of the y values.

Parameters:

diffy (bool) – Default False. Whether to difference the data before passing the values to the function. If False or 0, does not difference. If True or 1, differences 1 time.
train_only (bool) – Default False. If True, will exclude the test set from the decomposition (a measure added to avoid leakage).
**kwargs – Passed to STL() function from statsmodels. See https://www.statsmodels.org/dev/generated/statsmodels.tsa.seasonal.STL.html.

Returns:

An object with seasonal, trend, and resid attributes.

Return type:

(DecomposeResult)

>>> import matplotlib.pyplot as plt
>>> f.STL(train_only=True).plot()
>>> plt.show()

add_AR_terms(N: tuple[int, int]) → Self

Adds seasonal auto-regressive terms.

Parameters: N (tuple) – First element is the number of lags to add and the second element is the space between lags.
Returns: Self

>>> f.add_AR_terms((2,12)) # adds 12th and 24th lags called 'AR12', 'AR24'
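
As a rough illustration of how the tuple maps to lag orders, the sketch below expands (number of lags, spacing) into the lags and names mentioned in the example above. The helper name seasonal_lag_orders is hypothetical and is not part of scalecast.

```python
def seasonal_lag_orders(N):
    """Hypothetical helper: expand (number_of_lags, spacing) into lag orders."""
    n_lags, spacing = N
    return [spacing * i for i in range(1, n_lags + 1)]


orders = seasonal_lag_orders((2, 12))
names = [f"AR{lag}" for lag in orders]
print(orders, names)
```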

add_ar_terms(n: Annotated[int, 'must be > 0'] | list[Annotated[int, 'must be > 0']]) → Self

Adds auto-regressive terms.

Parameters: n (int or list[int]) – If int, the number of lags to add to the object (1 to this number will be added by default). If list, will add the lags specified in the collection ([2,4] will add lags 2 and 4). To add only lag 10, pass [10]. To add 10 lags, pass 10.
Returns: Self

>>> f.add_ar_terms(4) # adds four lags of y called 'AR1' - 'AR4' to predict with
>>> f.add_ar_terms([4]) # adds the fourth lag called 'AR4' to predict with
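
The difference between passing an int and a list can be sketched in plain Python. This is only an illustration of the described behavior, not the library's code; resolve_lags and lag_series are hypothetical names, and using None for missing history is an assumption.

```python
def resolve_lags(n):
    """Hypothetical helper: an int means lags 1..n, a list means exactly those lags."""
    if isinstance(n, int):
        return list(range(1, n + 1))
    return list(n)


def lag_series(y, lag):
    """Shift y back by `lag` steps; positions without history are None."""
    return ([None] * lag + list(y))[: len(y)]


y = [10, 20, 30, 40]
print(resolve_lags(4))    # lags requested by add_ar_terms(4)
print(resolve_lags([4]))  # lags requested by add_ar_terms([4])
print(lag_series(y, 1))
```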

add_combo_regressors(*args: Annotated[str, "must exist as a key in the object's current_xreg attribute"], sep: str = '_') → Self

Combines all passed variables by multiplying their values together.

Parameters:

*args (str) – Names of Xvars that already exist in the object.
sep (str) – Default ‘_’. The separator between each term in arg to create the final variable name.

Returns:

Self

>>> f.add_combo_regressors('t','monthsin') # multiplies these two together (called 't_monthsin')
>>> f.add_combo_regressors('t','monthcos') # multiplies these two together (called 't_monthcos')
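
A minimal sketch of the element-wise multiplication and the naming rule, assuming regressors are held in a plain dict of lists (the xvars dict and the combine helper are hypothetical stand-ins, not scalecast objects):

```python
from math import prod


def combine(xvars, *names, sep="_"):
    """Multiply the named regressors element-wise and join their names with sep."""
    values = [prod(row) for row in zip(*(xvars[n] for n in names))]
    return sep.join(names), values


xvars = {"t": [1, 2, 3], "monthsin": [0.5, 1.0, 0.5]}
name, values = combine(xvars, "t", "monthsin")
print(name, values)
```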

add_covid19_regressor([called, start, end])

Adds a dummy variable that is 1 during the time period that COVID19 effects are present for the series, 0 otherwise. The default dates are chosen to cover the time span where the economy was most impacted by COVID.

Parameters:

called (str) – Default ‘COVID19’. What to call the resulting variable.
start (str, datetime.datetime, or pd.Timestamp) – Default datetime.datetime(2020,3,15). The start date (default is day Walt Disney World closed in the U.S.). Must be parsable by pandas’ Timestamp function.
end (str, datetime.datetime, or pd.Timestamp) – Default datetime.datetime(2021,5,13). The end date (default is day the U.S. CDC first dropped the mask mandate/recommendation for vaccinated people). Must be parsable by pandas’ Timestamp function.

Returns:

None

add_cycle(cycle_length: Annotated[int, 'must be > 0'], fourier_order: float = 2.0, called: str | None = None) → Self

Adds a regressor that acts as a seasonal cycle. Use this function to capture non-normal seasonality.

Parameters:

cycle_length (int) – How many time steps make one complete cycle.
fourier_order (float) – Default 2.0. The fourier order to apply. This number is the number of complete cycles in that given seasonal period. 2 captures the fundamental frequency and its first harmonic. Higher orders will capture more complex seasonality, but may lead to overfitting.
called (str) – Optional. What to call the resulting variable. Two variables will be created, one for a sin transformation and the other for cos; the resulting variable names will have “sin” or “cos” at the end. For example, called = ‘cycle5’ will become ‘cycle5sin’, ‘cycle5cos’. If left unspecified, ‘cycle{cycle_length}’ will be used as the name.

Returns:

Self

>>> f.add_cycle(13) # adds a seasonal effect that cycles every 13 observations, stored as 'cycle13sin' and 'cycle13cos'
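
To show what a sin/cos pair for a cycle looks like, here is a sketch using the textbook form sin(2πt/L) and cos(2πt/L). It deliberately ignores fourier_order, and the exact formula scalecast applies may differ; cycle_terms is a hypothetical helper.

```python
import math


def cycle_terms(n_obs, cycle_length):
    """Build a sin/cos pair that repeats every `cycle_length` time steps."""
    steps = range(1, n_obs + 1)
    return {
        f"cycle{cycle_length}sin": [math.sin(2 * math.pi * t / cycle_length) for t in steps],
        f"cycle{cycle_length}cos": [math.cos(2 * math.pi * t / cycle_length) for t in steps],
    }


terms = cycle_terms(n_obs=26, cycle_length=13)
print(list(terms))
```

Values 13 steps apart coincide, which is what lets a model learn a pattern that repeats on that cycle.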

add_exp_terms(*args: Annotated[str, "must exist as a key in the object's current_xreg attribute"], pwr: float, sep: str = '^', cutoff: Annotated[int, 'must be >= 0'] = 2, drop: bool = False) → Self

Raises all passed variables (no AR terms) to exponential powers (ints or floats).

Parameters:

*args (str) – Names of Xvars that already exist in the object.
pwr (float) – The power to raise each term to in args. Can use values like 0.5 to perform square roots, etc.
sep (str) – Default ‘^’. The separator between each term in arg to create the final variable name.
cutoff (int) – Default 2. The power shown in the resulting variable name is rounded to this many decimal places. For instance, if pwr = 0.33333333333 and ‘t’ is passed as an arg to *args, the resulting name will be t^0.33 by default.
drop (bool) – Default False. Whether to drop the regressors passed to *args.

Returns:

Self

>>> f.add_exp_terms('t',pwr=.5) # adds square root t called 't^0.5'
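
The naming rule with cutoff can be sketched as follows. This is an illustration under the assumption that the name is built from the rounded power; exp_term is a hypothetical helper, not the library's implementation.

```python
def exp_term(name, values, pwr, sep="^", cutoff=2):
    """Raise values to pwr and name the result using the rounded power."""
    new_name = f"{name}{sep}{round(pwr, cutoff)}"
    return new_name, [v ** pwr for v in values]


sqrt_name, sqrt_vals = exp_term("t", [1, 4, 9], pwr=0.5)
cube_root_name, _ = exp_term("t", [1, 8, 27], pwr=0.33333333333)
print(sqrt_name, sqrt_vals, cube_root_name)
```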

add_lagged_terms(*args: Annotated[str, "must exist as a key in the object's current_xreg attribute"], lags: Annotated[int, 'must be > 0'] = 1, upto: bool = True, sep: str = '_', drop: bool = False) → Self

Lags all passed variables (no AR terms) 1 or more times.

Parameters:

*args (str) – Names of Xvars that already exist in the object.
lags (int) – Greater than 0, default 1. The number of times to lag each passed variable.
upto (bool) – Default True. Whether to add all lags up to the number passed to lags. If you pass 6 to lags and upto is True, lags 1, 2, 3, 4, 5, 6 will all be added. If you pass 6 to lags and upto is False, lag 6 only will be added.
sep (str) – Default ‘_’. The separator between each term in arg to create the final variable name. Resulting variable names will be like “tlag_1” or “tlag_2” by default.
drop (bool) – Default False. Whether to drop the regressors passed to *args.

Returns:

Self

>>> f.add_lagged_terms('t',lags=3) # adds first, second, and third lag of t called 'tlag_1' - 'tlag_3'
>>> f.add_lagged_terms('t',lags=6,upto=False) # adds 6th lag of t only called 'tlag_6'
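
The effect of upto on which variables get created can be sketched with a small, hypothetical helper (lagged_names is not part of scalecast; it only mirrors the naming described above):

```python
def lagged_names(name, lags=1, upto=True, sep="_"):
    """Return the variable names that would result for the given lags/upto choice."""
    orders = range(1, lags + 1) if upto else [lags]
    return [f"{name}lag{sep}{i}" for i in orders]


print(lagged_names("t", lags=3))
print(lagged_names("t", lags=6, upto=False))
```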

add_logged_terms(*args: Annotated[str, "must exist as a key in the object's current_xreg attribute"], base: float = 2.718281828459045, sep: str = '', drop: bool = False) → Self

Logs all passed variables (no AR terms).

Parameters:

*args (str) – Names of Xvars that already exist in the object.
base (float) – Default math.e (natural log). The log base. Must be math.e or int greater than 1.
sep (str) – Default ‘’. The separator between each term in arg to create the final variable name. Resulting variable names will be like “log2t” or “lnt” by default.
drop (bool) – Default False. Whether to drop the regressors passed to *args.

Returns:

Self

>>> f.add_logged_terms('t') # adds natural log t called 'lnt'
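
A sketch of the log transformation and the “lnt” / “log2t” naming pattern described above. The helper logged_term is hypothetical, and the way the prefix is derived from the base is an assumption made for illustration.

```python
import math


def logged_term(name, values, base=math.e, sep=""):
    """Log-transform values and build a name like 'lnt' or 'log2t'."""
    prefix = "ln" if base == math.e else f"log{base}"
    return f"{prefix}{sep}{name}", [math.log(v, base) for v in values]


ln_name, ln_vals = logged_term("t", [1, 2, 3])
log2_name, log2_vals = logged_term("t", [1, 2, 8], base=2)
print(ln_name, log2_name)
```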

add_normalizer(called: str, imported_normalizer: NormalizerLike) → Self

Add a normalizer to be available for forecasting.

Parameters:

called (str) – The name of the normalizer that can be referenced when looking up normalizers.
imported_normalizer (NormalizerLike) – The object that can be used for normalizing/scaling.

Returns:

Self

add_other_regressor(called, start, end)

Adds a dummy variable that is 1 during the specified time period, 0 otherwise.

Parameters:

called (str) – What to call the resulting variable.
start (str, datetime.datetime, or pd.Timestamp) – Start date. Must be parsable by pandas’ Timestamp function.
end (str, datetime.datetime, or pd.Timestamp) – End date. Must be parsable by pandas’ Timestamp function.

Returns:

Self

>>> f.add_other_regressor('january_2021','2021-01-01','2021-01-31')
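
A dummy variable of this kind can be sketched with the standard library. The helper dummy_for_period is hypothetical, and treating both endpoints as inclusive is an assumption, not something the text above confirms.

```python
from datetime import date, timedelta


def dummy_for_period(dates, start, end):
    """Return 1 for dates inside [start, end] (assumed inclusive), 0 otherwise."""
    return [1 if start <= d <= end else 0 for d in dates]


# Five consecutive days starting 2020-12-30
dates = [date(2020, 12, 30) + timedelta(days=i) for i in range(5)]
january_2021 = dummy_for_period(dates, date(2021, 1, 1), date(2021, 1, 31))
print(january_2021)
```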

add_poly_terms(*args: Annotated[str, "must exist as a key in the object's current_xreg attribute"], pwr: Annotated[int, 'must be >= 0'] = 2, sep: str = '^') → Self

Raises all passed variables (no AR terms) to exponential powers (ints only).

Parameters:

*args (str) – Names of Xvars that already exist in the object.
pwr (int) – Default 2. The max power to add to each term in args (2 to this number will be added).
sep (str) – Default ‘^’. The separator between each term in arg to create the final variable name.

Returns:

Self

>>> f.add_poly_terms('t','year',pwr=3) # raises t and year to 2nd and 3rd powers (called 't^2', 't^3', 'year^2', 'year^3')
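
The "2 to this number" rule can be sketched as below; poly_terms is a hypothetical helper that only mirrors the described naming and arithmetic.

```python
def poly_terms(name, values, pwr=2, sep="^"):
    """Create one term per integer power from 2 up to pwr, inclusive."""
    return {f"{name}{sep}{p}": [v ** p for v in values] for p in range(2, pwr + 1)}


terms = poly_terms("t", [1, 2, 3], pwr=3)
print(terms)
```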

add_pt_terms(*args: Annotated[str, "must exist as a key in the object's current_xreg attribute"], method: Literal['box-cox', 'yeo-johnson'] = 'box-cox', sep: str = '_', drop: bool = False) → Self

Applies a box-cox or yeo-johnson power transformation to all passed variables (no AR terms).

Parameters:

*args (str) – Names of Xvars that already exist in the object.
method (str) – One of {‘box-cox’,’yeo-johnson’}, default ‘box-cox’. The type of transformation. box-cox works for positive values only. yeo-johnson is like a box-cox but can be used with 0s or negatives. https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PowerTransformer.html.
sep (str) – Default ‘_’. The separator between each term in arg to create the final variable name. Resulting variable names will be like “box-cox_t” or “yeo-johnson_t” by default.
drop (bool) – Default False. Whether to drop the regressors passed to *args.

Returns:

Self

>>> f.add_pt_terms('t') # adds box cox of t called 'box-cox_t'

add_seasonal_regressors(*args: str, raw: bool = True, sincos: bool = False, dummy: bool = False, drop_first: bool = False, cycle_lens: dict[str, int] = None, fourier_order: float = 2.0) → Self

Adds seasonal regressors. Can be in the form of Fourier transformed, dummy, or integer values.

Parameters:

*args (str) – Values that return a series of int type from pandas.dt or pandas.dt.isocalendar(). See https://pandas.pydata.org/docs/reference/api/pandas.Series.dt.year.html.
raw (bool) – Default True. Whether to use the raw integer values.
sincos (bool) – Default False. Whether to use a Fourier transformation of the raw integer values. The length of the cycle is derived from the max observed value unless cycle_lens is specified.
dummy (bool) – Default False. Whether to use dummy variables from the raw int values.
drop_first (bool) – Default False. Whether to drop the first observed dummy level. Not relevant when dummy = False.
cycle_lens (dict) – Optional. A dictionary that specifies a cycle length for each selected seasonality. Each key should match a value passed to *args. If this is not specified or a selected seasonality is not added to the dictionary as a key, the cycle length will be selected automatically as the maximum value observed for the given seasonality. Not relevant when sincos = False.
fourier_order (float) – Default 2.0. The fourier order to apply to terms that are added using sincos = True. This number is the number of complete cycles in that given seasonal period. 2 captures the fundamental frequency and its first harmonic. Higher orders will capture more complex seasonality, but may lead to overfitting.

Returns:

Self

>>> f.add_seasonal_regressors('year')
>>> f.add_seasonal_regressors(
...     'dayofyear',
...     'month',
...     'week',
...     'quarter',
...     raw=False,
...     sincos=True,
...     cycle_lens={'dayofyear':365.25},
... )
>>> f.add_seasonal_regressors('dayofweek',raw=False,dummy=True,drop_first=True)
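
The sincos and dummy representations can be sketched in plain Python. Both helpers are hypothetical; the sin/cos formula is the textbook single-frequency form and ignores fourier_order, and the dummy column names are an assumption rather than the names scalecast generates.

```python
import math


def sincos_terms(name, values, cycle_len=None):
    """Fourier pair for raw seasonal integers; cycle length defaults to the max observed value."""
    cycle_len = cycle_len if cycle_len is not None else max(values)
    return {
        f"{name}sin": [math.sin(2 * math.pi * v / cycle_len) for v in values],
        f"{name}cos": [math.cos(2 * math.pi * v / cycle_len) for v in values],
    }


def dummy_terms(name, values, drop_first=False):
    """One 0/1 column per observed level, optionally dropping the first level."""
    levels = sorted(set(values))
    if drop_first:
        levels = levels[1:]
    return {f"{name}_{lvl}": [int(v == lvl) for v in values] for lvl in levels}


fourier = sincos_terms("month", [1, 4, 7, 10, 12])
dow = dummy_terms("dayofweek", [0, 1, 2, 0], drop_first=True)
print(list(fourier), dow)
```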

add_series(series, called[, first_date, pad])

Adds other series to the object as regressors. If the added series is shorter than the length of Forecaster.y + len(Forecaster.future_dates), it will be padded with 0s by default.

Parameters:

series (list-like) – The series to add as a regressor to the object.
called (str) – Required. What to call the resulting regressor in the Forecaster object.
first_date (Datetime) – Optional. The first date that corresponds with the added series. If left unspecified, will assume its first date is the same as the first date in the Forecaster object. Must be datetime or otherwise able to be parsed by the pandas.Timestamp() function.
pad (bool) – Default True. Whether to put 0s before and/or after the series if the series is too short.

>>> x = [1,2,3,4,5,6]
>>> f.add_series(series = x,called='x') # assumes first date is same as what is in f.current_dates
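
The padding behavior can be sketched as follows. pad_series is a hypothetical helper, and expressing the first_date alignment as an integer offset is a simplification made for this illustration.

```python
def pad_series(series, total_length, offset=0):
    """Place series `offset` steps in, then fill with 0s before and after up to total_length."""
    padded = [0] * offset + list(series)
    padded += [0] * (total_length - len(padded))
    return padded[:total_length]


x = [1, 2, 3, 4, 5, 6]
print(pad_series(x, total_length=10))
print(pad_series(x, total_length=10, offset=2))
```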

add_signals(model_nicknames: Collection[Annotated[str, "must exist as a key in object's history attribute"]], fill_strategy: Literal['actuals', 'bfill'] | None = 'actuals', train_only: bool = False)

Adds the predictions from already-evaluated models as covariates that can be used for future evaluated models. The names of the added variables will all begin with “signal_” and end with the given model nickname.

Parameters:

model_nicknames (list) – The names of already-evaluated models with information stored in the history attribute.
fill_strategy (str or None) – The strategy to fill NA values that are present at the beginning of a given model’s fitted values. Available options are: ‘actuals’ (default) which will replace nulls with actuals; ‘bfill’ which will backfill null values; or None which will leave null values alone, which can cause errors in future evaluated models.
train_only (bool) – Default False. Whether to add fitted values from the training set only. The test-set predictions will be out-of-sample if this is True. The future unknown values are always out-of-sample. Even when this is True, the future unknown values are taken from a model trained on the full set of known observations.

>>> f.set_estimator('lstm')
>>> f.manual_forecast(call_me='lstm')
>>> f.add_signals(model_nicknames = ['lstm']) # adds a regressor called 'signal_lstm'
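
The three fill_strategy options can be sketched on plain lists, with None standing in for the missing fitted values at the start of a series. fill_signal is a hypothetical helper and not the library's implementation.

```python
def fill_signal(fitted, actuals, fill_strategy="actuals"):
    """Fill leading missing fitted values according to the chosen strategy."""
    if fill_strategy is None:
        return list(fitted)  # leave missing values in place
    if fill_strategy == "actuals":
        return [a if f is None else f for f, a in zip(fitted, actuals)]
    if fill_strategy == "bfill":
        out, next_value = list(fitted), None
        for i in range(len(out) - 1, -1, -1):  # walk backwards, carrying the next known value
            if out[i] is None:
                out[i] = next_value
            else:
                next_value = out[i]
        return out
    raise ValueError(f"unknown fill_strategy: {fill_strategy!r}")


fitted = [None, None, 3.1, 4.2]
actuals = [1, 2, 3, 4]
print(fill_signal(fitted, actuals))
print(fill_signal(fitted, actuals, "bfill"))
```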

add_sklearn_estimator(imported_module: ScikitLike, called: str) → Self

Adds a new estimator from scikit-learn not built-in to the forecaster object that can be called using set_estimator(). Only regression models are accepted.

Parameters:

imported_module (scikit-learn regression model) – The model from scikit-learn to add. Must have already been imported locally. Supports models from sklearn and sklearn APIs.
called (str) – The name of the estimator that can be called using set_estimator().
mv (bool) – Whether the add is for Multivariate forecasting.

Returns:

Self

>>> from sklearn.ensemble import StackingRegressor
>>> f.add_sklearn_estimator(StackingRegressor,called='stacking')
>>> f.set_estimator('stacking')
>>> f.manual_forecast(...)

add_time_trend(called: str = 't') → Self

Adds a time trend from 1 to length of the series + the forecast horizon as a current and future Xvar.

Parameters: called (str) – Default ‘t’. What to call the resulting variable.
Returns: Self

>>> f.add_time_trend() # adds time trend called 't'

adf_test(critical_pval: Annotated[float, 'must be > 0 and < 1'] = 0.05, full_res: bool = True, train_only: bool = False, diffy: bool = False, **kwargs: Any) → bool | tuple[float, float, int, int, dict, float]

Tests the stationarity of the y series using augmented dickey fuller. Ports from statsmodels: https://www.statsmodels.org/dev/generated/statsmodels.tsa.stattools.adfuller.html.

Parameters:

critical_pval (float) – Default 0.05. The p-value threshold in the statistical test to accept the alternative hypothesis.
full_res (bool) – Default True. If True, returns a tuple with the evaluated statistic, p-value, and other statistical information (returns what the adfuller() function from statsmodels does). If False, returns a bool that matches whether the test indicates stationarity.
train_only (bool) – Default False. If True, will exclude the test set from the test (to avoid leakage).
diffy (bool or int) – One of {True,False,0,1}. Default False. Whether to difference the data before passing the values to the function. If False or 0, does not difference. If True or 1, differences 1 time.
**kwargs – Passed to the adfuller() function from statsmodels. See https://www.statsmodels.org/dev/generated/statsmodels.tsa.stattools.adfuller.html.

Returns:

If full_res = False, returns a bool indicating whether the test suggests stationarity. Otherwise, returns the full results (stat, pval, etc.) of the test.

Return type:

(bool or tuple)

>>> stat, pval, _, _, _, _ = f.adf_test(full_res=True)

all_feature_info_to_excel(out_path: PathLike = './', excel_name: str = 'feature_info.xlsx')

Saves all feature importance to excel. Each model for which such info is available gets its own tab. Be sure to have called save_feature_importance() before using this function.

Parameters:

out_path (PathLike) – Default ‘./’. The path to export to.
excel_name (str) – Default ‘feature_info.xlsx’. The name of the resulting excel file.

Returns:

None

all_validation_grids_to_excel(out_path: PathLike = './', excel_name: str = 'validation_grids.xlsx')

Saves all validation grids to excel. Each model for which such info is available gets its own tab. Be sure to have tuned at least one model before calling this.

Parameters:

out_path (PathLike) – Default uses current working directory. The path to export to.
excel_name (str) – Default ‘validation_grids.xlsx’. The name of the resulting excel file.

Returns:

None

auto_Xvar_select(estimator: Annotated[str, 'must map to a base model that follows the scikit-learn API'] = 'mlr', try_trend: bool = True, trend_estimator: Annotated[str, 'must map to a base model that follows the scikit-learn API'] = 'mlr', trend_estimator_kwargs: dict[str, Any] = {}, decomp_trend: bool = True, decomp_method: Literal['additive', 'multiplicative'] = 'additive', try_ln_trend: bool = True, max_trend_poly_order: Annotated[int, 'must be > 0'] = 2, try_seasonalities: bool = True, seasonality_repr: list[str] | dict[list[str]] = ['sincos'], exclude_seasonalities: list[str] = [], irr_cycles: list[Annotated[int, 'must be > 0']] | None = None, max_ar: Literal['auto'] | Annotated[int, 'must be >= 0'] = 'auto', test_already_added: bool = True, must_keep: list[str] = [], monitor: Annotated[str, "must exist as a value in object's determine_best_by attribute"] = 'ValidationMetricValue', cross_validate: bool = False, dynamic_tuning: bool = False, cvkwargs: dict[str, Any] = {}, **kwargs: Any)

Attempts to find the ideal trend, seasonality, and look-back representations for the stored series by systematically adding regressors to the object and monitoring a passed metric value. Searches for trend first, then seasonalities, then optimal lag order, then the best combination of all of the above, along with irregular cycles (if specified) and any regressors already added to the object. The function offers flexibility around setting Xvars it must add to the object by letting the user add these regressors before calling the function, telling the function not to re-search for them, and telling the function not to drop them when considering the optimal combination of regressors. The final optimal combination of regressors is determined by grouping all extracted regressors into trends, seasonalities, irregular cycles, ar terms, and regressors already added, and trying all combinations of all these groups. See the example: https://scalecast-examples.readthedocs.io/en/latest/misc/auto_Xvar/auto_Xvar.html.

Parameters:

estimator (str) – Default ‘mlr’. The estimator to use to determine the best seasonal and lag regressors.
try_trend (bool) – Default True. Whether to search for trend representations of the series.
trend_estimator (str) – One of Forecaster.sklearn_estimators. Default ‘mlr’. Ignored if try_trend is False. The estimator to use to determine the best trend representation.
trend_estimator_kwargs (dict) – Default {}. The model parameters to pass to the trend_estimator model.
decomp_trend (bool) – Default True. Whether to decompose the series to estimate the trend. Ignored if try_trend is False. The idea is there can be many seasonalities represented by scalecast, but only one trend, so using a decomposition method for trend could lead to finding a better trend representation.
decomp_method (str) – One of ‘additive’,’multiplicative’. Default ‘additive’. The decomp method used to represent the trend. Ignored if try_trend is False. Ignored if decomp_trend is False.
try_ln_trend (bool) – Default True. Ignored if try_trend is False. Whether to search logged trend representations using a natural log.
max_trend_poly_order (int) – Default 2. The highest order trend representation that will be searched.
try_seasonalities (bool) – Default True. Whether to search for seasonal representations. This function uses a hierarchical approach from secondly –> quarterly representations. Secondly will search all seasonal representations up to quarterly to find the best hierarchy of seasonalities. Anything lower than second and higher than quarter will not receive a seasonality with this method. Day seasonality and lower will try ‘day’ (of month), ‘dayofweek’, and ‘dayofyear’ seasonalities. Everything else will try cycles that reset yearly, so to search for intermittent seasonal fluctuations, use the irr_cycles argument.
seasonality_repr (list or dict[str,list]) – Default [‘sincos’]. How to represent the extracted seasonalities. The default will use fourier representations only. Ignored if try_seasonalities is False. Other elements to add to the list: ‘dummy’,’raw’,’drop_first’. Can add multiple or one of these. If dict, the key needs to be the seasonal representation (‘quarter’ for quarterly, ‘month’ for monthly) and the value a list. If a seasonal representation is not found in this dictionary, it will default to [‘sincos’], i.e. a fourier representation. ‘drop_first’ is ignored when ‘dummy’ is not present.
exclude_seasonalities (list) – Default []. Ignored if try_seasonalities is False. Add in this list any seasonal representations to skip searching. If you have day frequency and you only want to search dayofweek, you should specify this as: [‘dayofweek’,’week’,’month’,’quarter’].
irr_cycles (list[int]) – Optional. Add any irregular cycle lengths to a list as integers to search for using this method.
max_ar ('auto' or int) – The highest lag order to search for. If ‘auto’, will use the greater of the forecast length or the test-set length as the lag order. If a larger number than available observations is placed here, the AR search will stop early. Set to 0 to skip searching for lag terms.
test_already_added (bool) – Default True. Whether regressors already added to the object can be dropped when looking for the optimal combination of regressors. If True, they may be dropped. Set this to False to always keep them in the object.
must_keep (list-like) – Default []. The names of any regressors that must be kept in the object. All regressors here must already be added to the Forecaster object before calling the function. This is ignored if test_already_added is False since it becomes redundant.
monitor (str) – One of Forecaster.determine_best_by. Default ‘ValidationMetricValue’. The metric to be monitored when making reduction decisions.
cross_validate (bool) – Default False. Whether to tune the model with cross validation. If False, uses the validation slice of data to tune. If not monitoring ValidationMetricValue, you will want to leave this False.
dynamic_tuning (bool or int) – Default False. Whether to dynamically tune the model or, if int, how many forecast steps to dynamically tune it.
cvkwargs (dict) – Default {}. Passed to the cross_validate() method.
**kwargs – Passed to manual_forecast() method and can include arguments related to a given model’s hyperparameters or dynamic_testing. Do not pass Xvars.

Returns:

A dictionary where each key is a tuple of variable combinations and the value is the derived metric (based on value passed to monitor argument).

Return type:

(dict[tuple[str],float])

>>> f.add_covid19_regressor()
>>> f.auto_Xvar_select(cross_validate=True)
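
The final step described above, trying every combination of the regressor groups, can be pictured with a small standalone sketch. This is not scalecast's implementation: the group names and the regressor names inside them are hypothetical, and the real search also scores each combination with the monitored metric.

```python
from itertools import combinations

# Hypothetical groups of regressor names (assumed for illustration only).
groups = {
    "trend": ["t"],
    "seasonality": ["monthsin", "monthcos"],
    "irr_cycles": ["cycle26sin", "cycle26cos"],
    "ar": ["AR1", "AR2"],
    "already_added": ["COVID19"],
}

def all_group_combinations(groups):
    """Return every non-empty combination of groups as a tuple of regressor names."""
    names = list(groups)
    combos = []
    for size in range(1, len(names) + 1):
        for chosen in combinations(names, size):
            # Flatten the chosen groups into one tuple of regressor names.
            combos.append(tuple(x for g in chosen for x in groups[g]))
    return combos

combos = all_group_combinations(groups)
print(len(combos))  # five groups give 2**5 - 1 = 31 combinations
```

Each tuple plays the role of one key in the returned dictionary; the value would be the metric obtained with that set of regressors.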

auto_forecast(call_me: str | None = None, test_model: bool = True, predict_fitted: bool = True, dynamic_testing: bool | Annotated[int, 'must be > 0'] = True) → list[float]

Auto forecasts with the best parameters indicated from the tuning process.

Parameters:

call_me (str) – Optional. What to call the model when storing it in the object’s history dictionary. If not specified, the model’s nickname will be assigned the estimator value (‘mlp’ will be ‘mlp’, etc.). Duplicated names will be overwritten with the most recently called model.
test_model (bool) – Default True. Whether to test the model before forecasting to a future horizon. If test_length is 0, this is ignored. Set this to False if you tested the model manually by calling f.test() and don’t want to waste resources testing it again.
dynamic_testing (bool or int) – Default True. Whether to dynamically/recursively test the forecast (meaning AR terms will be propagated with predicted values). If True, evaluates dynamically over the entire out-of-sample slice of data. If int, window evaluates over that many steps (2 for 2-step dynamic forecasting, 12 for 12-step, etc.). Setting this to False or 1 means faster performance, but gives a less-good indication of how well the forecast will perform more than one period out. The model will skip testing if the test_length attribute is set to 0.
predict_fitted (bool) – Whether to predict fitted values.

Returns:

The final point estimates.

Return type:

(list[float])

>>> f.set_estimator('xgboost')
>>> f.tune()
>>> f.auto_forecast()
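
To make the dynamic_testing options concrete, here is a minimal sketch using a toy AR(1) rule. The function name, the fixed coefficient, and the way windows are aligned to the start of the test set are all assumptions made for illustration; scalecast's own testing logic may differ in detail.

```python
def recursive_test(last_train_value, test_actuals, coef, dynamic_testing=True):
    """Predict each test step with a toy AR(1) rule: y_hat = coef * lag."""
    horizon = len(test_actuals)
    if dynamic_testing is True:
        window = horizon          # one window over the whole test set
    elif dynamic_testing is False:
        window = 1                # every step uses an observed lag
    else:
        window = int(dynamic_testing)
    preds = []
    for step in range(horizon):
        if step % window == 0:
            # Window start: the lag is an observed value.
            lag = last_train_value if step == 0 else test_actuals[step - 1]
        else:
            # Inside a window: the lag is the previous prediction.
            lag = preds[-1]
        preds.append(coef * lag)
    return preds

actuals = [8, 8, 8, 8]
print(recursive_test(10, actuals, 0.5, True))   # [5.0, 2.5, 1.25, 0.625]
print(recursive_test(10, actuals, 0.5, False))  # [5.0, 4.0, 4.0, 4.0]
print(recursive_test(10, actuals, 0.5, 2))      # [5.0, 2.5, 4.0, 2.0]
```

With False or 1, errors never compound because each prediction starts from an observed value, which is why that setting is faster but says less about multi-step accuracy.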

chop_from_back(n: Annotated[int, 'must be > 0']) → Self

Cuts y observations in the object from the back by counting forward from the beginning.

Parameters:: n (int) – The number of observations to cut from the back.
Returns:: Self

>>> f.chop_from_back(10) # chops 10 observations off the back

chop_from_front(n: Annotated[int, 'must be > 0'], fcst_length: Annotated[int, 'must be > 0'] | None = None) → Self

Cuts the amount of y observations in the object from the front counting backwards. The current length of the forecast horizon will be maintained and all future regressors will be rewritten to the appropriate attributes.

Parameters:

n (int) – The number of observations to cut from the front.
fcst_length (int) – Optional. The new length of the forecast length. By default, maintains the same forecast length currently in the object.

Returns:

Self

>>> f.chop_from_front(10) # keeps all observations before the last 10
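
Going by the example comments above, "back" refers to the oldest observations and "front" to the most recent ones. The sketch below shows that reading on a plain list; the helper names are hypothetical, and the real methods also adjust dates, regressors, and the forecast horizon.

```python
def drop_oldest(y, n):
    """Analogue of chopping from the back: remove the first n observations."""
    return y[n:]

def drop_newest(y, n):
    """Analogue of chopping from the front: remove the last n observations."""
    return y[:-n]

y = list(range(1, 11))        # observations 1..10, oldest first
print(drop_oldest(y, 3))      # [4, 5, 6, 7, 8, 9, 10]
print(drop_newest(y, 3))      # [1, 2, 3, 4, 5, 6, 7]
```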

copy()

Creates an object copy.

Returns:: A copy of the object.
Return type:: Self

cross_validate(k: Annotated[int, 'must be > 0'] = 5, test_length: int | None = None, train_length: int | None = None, space_between_sets: int | None = None, rolling: bool = False, dynamic_tuning: bool | Annotated[int, 'must be > 0'] = False, set_aside_test_set: bool = True, verbose: bool = False) → Self

Tunes a model’s hyperparameters using time-series cross validation. Monitors the metric specified in the validation_metric attribute. Set an estimator before calling. Reads a grid for the estimator from a grids file unless a grid is ingested manually. The chosen parameters are stored in the best_params attribute. All metrics from each iteration are stored in grid_evaluated. The rows in this matrix correspond to the element index in f.grid (a hyperparameter combo) and the columns are the derived metrics across the k folds. Any hyperparameters that ever failed to evaluate will return N/A and are not considered. The best parameter combo is determined by the best average derived metric across all folds. The temporal order of the series is always maintained in this process. If a test_length is specified in the object, it will be set aside by default. (Default) Normal cv diagram: https://scalecast-examples.readthedocs.io/en/latest/misc/validation/validation.html#5-Fold-Time-Series-Cross-Validation. Rolling cv diagram: https://scalecast-examples.readthedocs.io/en/latest/misc/validation/validation.html#5-Fold-Rolling-Time-Series-Cross-Validation.

Parameters:

k (int) – Default 5. The number of folds. If 1, behaves as if the model were being tuned on a single held out set.
test_length (int) – Optional. The size of each held-out sample. By default, determined such that the last test set and train set are the same size.
train_length (int) – Optional. The size of each training set. By default, all available observations before each test set are used.
space_between_sets (int) – Optional. The space between each training set. By default, uses the test_length.
rolling (bool) – Default False. Whether to use a rolling method, meaning every train and test size is the same. This is ignored when either of train_length or test_length is specified.
dynamic_tuning (bool or int) – Default False. Whether to dynamically/recursively test the forecast during the tuning process (meaning AR terms will be propagated with predicted values). If True, evaluates recursively over the entire out-of-sample slice of data. If int, window evaluates over that many steps (2 for 2-step recursive testing, 12 for 12-step, etc.). Setting this to False or 1 means faster performance, but gives a less-good indication of how well the forecast will perform more than one period out.
set_aside_test_set (bool) – Default True. Whether to separate the test set specified in f.test_length during this process.
verbose (bool) – Default False. Whether to print out information about the test size, train size, and date ranges for each fold.

Returns:

Self

>>> f.set_estimator('xgboost')
>>> f.cross_validate() # tunes hyperparam values
>>> f.auto_forecast() # forecasts with the best params
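
The default (non-rolling) fold layout can be sketched in plain Python. This is one reading of the parameter descriptions above, not scalecast's code: the function name is hypothetical, and the default test length of n_obs // (k + 1) is an assumption derived from the statement that the last train and test sets are the same size.

```python
def expanding_folds(n_obs, k=5, test_length=None, space_between_sets=None):
    """Return k (train_range, test_range) pairs, most recent fold first."""
    if test_length is None:
        # Assumed default: the earliest fold has equally sized train and test sets.
        test_length = n_obs // (k + 1)
    if space_between_sets is None:
        space_between_sets = test_length
    folds = []
    for i in range(k):
        test_end = n_obs - i * space_between_sets
        test_start = test_end - test_length
        if test_start <= 0:
            raise ValueError("not enough observations for the requested folds")
        # Training data is everything before the test set, so order is preserved.
        folds.append((range(0, test_start), range(test_start, test_end)))
    return folds

folds = expanding_folds(120, k=5)
for train, test in folds:
    print(len(train), len(test))
# 100 20, 80 20, 60 20, 40 20, 20 20
```

Every training range ends where its test range begins, which reflects the statement that the temporal order of the series is always maintained.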

determine_best_series_length(estimator: Annotated[str, "must exist in the name attribute of the object's estimators attribute"] = 'mlr', min_obs: Annotated[int, 'must be > 0'] = 100, max_obs: Annotated[int, 'must be > 0'] | None = None, step: Annotated[int, 'must be > 0'] = 25, monitor: Annotated[str, "must exist as a value in object's determine_best_by attribute"] = 'ValidationMetricValue', cross_validate: bool = False, dynamic_tuning: bool | Annotated[int, 'must be > 0'] = False, cvkwargs: dict[str, Any] = {}, chop: bool = True, **kwargs: Any) → dict[int, float]

Attempts to find the optimal length for the series to produce accurate forecasts by systematically shortening the series, running estimations, and monitoring a passed metric value. This should be run after Xvars have already been added to the object and all Xvars will be used in the iterative estimations.

Parameters:

estimator (str) – One of Forecaster.estimators. Default ‘mlr’. The estimator to use to determine the best series length.
min_obs (int) – Default 100. The shortest representation of the series to search.
max_obs (int) – Optional. The longest representation of the series to search. By default, the last estimation will be run on all available observations.
step (int) – Default 25. How big a step to take between iterations.
monitor (str) – One of Forecaster.determine_best_by. Default ‘ValidationMetricValue’. The metric to be monitored when making reduction decisions.
cross_validate (bool) – Default False. Whether to tune the model with cross validation. If False, uses the validation slice of data to tune. If not monitoring ValidationMetricValue, you will want to leave this False.
dynamic_tuning (bool or int) – Default False. Whether to dynamically tune the model or, if int, how many forecast steps to dynamically tune it.
cvkwargs (dict) – Default {}. Passed to the cross_validate() method.
chop (bool) – Default True. Whether to shorten the series if a shorter length is found to be best.
**kwargs – Passed to manual_forecast() method and can include arguments related to a given model’s hyperparameters, dynamic_testing, or Xvars.

Returns:

A dictionary where each key is a series length and the value is the derived metric (based on what was passed to the monitor argument).

Return type:

(dict[int,float])

>>> f.auto_Xvar_select()
>>> f.determine_best_series_length()
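
As a rough picture of the search, the sketch below builds candidate lengths from min_obs, max_obs, and step, scores the most recent n observations for each, and keeps the best. The helper names and the toy scoring function are hypothetical, and the exact lengths scalecast visits are an assumption; only the general idea of shortening the series and monitoring a metric comes from the description above.

```python
def candidate_lengths(n_obs, min_obs=100, max_obs=None, step=25):
    """Lengths to try, ending with the longest allowed length."""
    max_obs = n_obs if max_obs is None else min(max_obs, n_obs)
    lengths = list(range(min_obs, max_obs, step))
    lengths.append(max_obs)
    return lengths

def best_length(y, score, **kwargs):
    """Score the most recent n observations for each n; lower scores are better."""
    results = {n: score(y[-n:]) for n in candidate_lengths(len(y), **kwargs)}
    return min(results, key=results.get), results

y = list(range(180))
toy_score = lambda series: abs(len(series) - 125)  # stand-in for a real metric
best, results = best_length(y, toy_score)
print(list(results))  # [100, 125, 150, 175, 180]
print(best)           # 125
```

The results dictionary has the same shape as the documented return value: series lengths as keys and metric values as values.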

determine_if_MVForecaster()

Determines if the object is of Forecaster or MVForecaster type by checking if the y attribute is a dictionary (MVForecaster) or a Series (Forecaster).

Returns:: True if the object is an MVForecaster, False if it is a Forecaster.
Return type:: bool

drop_Xvars(*args: Annotated[str, "must exist as a key in the object's current_xreg attribute"], raise_error: bool = True) → Self

Drops regressors.

Parameters:

*args (str) – The names of regressors to drop.
raise_error (bool) – Whether to raise an error if regressors not found. Default raises. False ignores.

Returns:

Self

>>> f.add_time_trend()
>>> f.add_exp_terms('t',pwr=.5)
>>> f.drop_Xvars('t','t^0.5')

drop_all_Xvars() → Self

Drops all regressors.

Returns:: Self

drop_regressors(*args: Annotated[str, "must exist as a key in the object's current_xreg attribute"], raise_error: bool = True)

Drops regressors.

Parameters:

*args (str) – The names of regressors to drop.
raise_error (bool) – Whether to raise an error if regressors not found. Default raises. False ignores.

Returns:

Self

>>> f.add_time_trend()
>>> f.add_exp_terms('t',pwr=.5)
>>> f.drop_regressors('t','t^0.5')

eval_cis(mode: bool = True, cilevel: Annotated[float, 'must be > 0 and < 1'] = 0.95) → Self

Call this function to change whether or not the Forecaster sets confidence intervals on all evaluated models. Beginning with version 0.17.0, only conformal confidence intervals are supported. Conformal intervals need a test set to be configured soundly. Confidence intervals cannot be evaluated when there aren’t at least 1/(1-cilevel) observations in the test set.

Parameters:

mode (bool) – Default True. Whether to set confidence intervals on or off for models.
cilevel (float) – Default .95. Must be greater than 0, less than 1. The confidence level to use to set intervals.
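
The 1/(1-cilevel) rule above translates into a minimum test-set size. The helper below is a hypothetical convenience, not part of scalecast; it uses exact fractions so that floating-point error does not push the result up by one.

```python
import math
from fractions import Fraction

def min_test_length_for_cis(cilevel=0.95):
    """Smallest integer test length satisfying n >= 1 / (1 - cilevel)."""
    level = Fraction(str(cilevel))  # exact decimal value, avoids float rounding
    if not 0 < level < 1:
        raise ValueError("cilevel must be greater than 0 and less than 1")
    return math.ceil(1 / (1 - level))

print(min_test_length_for_cis(0.95))  # 20
print(min_test_length_for_cis(0.9))   # 10
print(min_test_length_for_cis(0.99))  # 100
```

In other words, with the default cilevel of 0.95 the object would need a test_length of at least 20 before intervals can be evaluated.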

export(dfs: Literal['model_summaries', 'lvl_test_set_predictions', 'lvl_fcsts'] | list[Literal['model_summaries', 'lvl_test_set_predictions', 'lvl_fcsts']] = ['model_summaries', 'lvl_test_set_predictions', 'lvl_fcsts'], models: Annotated[str, "must exist as a key in object's history attribute"] | list[Annotated[str, "must exist as a key in object's history attribute"]] | Annotated[str, 'must begin with top_ followed by a positive integer'] | Literal['all'] = 'all', best_model: Literal['auto'] | Annotated[str, "must exist as a key in object's history attribute"] = 'auto', determine_best_by: Annotated[str, "must exist as a value in object's determine_best_by attribute"] | None = None, cis: bool = False, to_excel: bool = False, out_path: PathLike = './', excel_name: str = 'results.xlsx') → DataFrame | dict[str, DataFrame]

Exports one, two, or all three of the available pandas DataFrames. Can write to Excel with each DataFrame on a separate sheet. Returns a single dataframe if only one df is specified; otherwise returns a dictionary with the df str arguments as keys and dataframes as values.

Parameters:

dfs (list-like or str) – Default [‘model_summaries’, ‘lvl_test_set_predictions’, ‘lvl_fcsts’]. A list or name of the specific dataframe(s) you want returned and/or written to excel. Must be one of or multiple of the elements in default. Exporting test set predictions only works if all exported models were tested using the same test length.
models (list-like or str) – Default ‘all’. The models to write information for. Can start with “top_” and the metric specified in determine_best_by will be used to order the models appropriately.
best_model (str) – Default ‘auto’. The name of the best model, if “auto”, will determine this by the metric in determine_best_by. If not “auto”, must match a model nickname of an already-evaluated model.
determine_best_by (str) – One of Forecaster.determine_best_by or None. Default None. If None and best_model is ‘auto’, the best model will be designated as the first-evaluated model.
to_excel (bool) – Default False. Whether to save to excel.
out_path (PathLike) – Default ‘./’. The path to save the excel file to (ignored when to_excel=False).
cis (bool) – Default False. Whether to export confidence intervals for models in “lvl_test_set_predictions”, “lvl_fcsts” dataframes.
excel_name (str) – Default ‘results.xlsx’. The name to call the excel file (ignored when to_excel=False).

Returns:

Either a single pandas dataframe if one element is passed to dfs, or a dictionary where the keys match what was passed to dfs and the values are dataframes.

Return type:

(DataFrame or Dict[str,DataFrame])

>>> results = f.export(dfs=['model_summaries','lvl_fcsts'],to_excel=True) # returns a dict
>>> model_summaries = results['model_summaries'] # returns a dataframe
>>> lvl_fcsts = results['lvl_fcsts'] # returns a dataframe
>>> ts_preds = f.export('lvl_test_set_predictions') # returns a dataframe

export_Xvars_df(dropna=False)

Gets all utilized regressors and values.

Parameters:: dropna (bool) – Default False. Whether to drop null values from the resulting dataframe
Returns:: A dataframe of Xvars and names/values stored in the object.
Return type:: (DataFrame)

export_feature_importance(model: Annotated[str, "must exist as a key in object's history attribute"]) → DataFrame

Exports the feature importance from a model. Raises an error if you never saved the model’s feature importance.

Parameters:: model (str) – The name of the model to export for. Matches what was passed to call_me when evaluating the model.
Returns:: The resulting feature importances of the evaluated model passed to model.
Return type:: (DataFrame)

>>> fi = f.export_feature_importance('mlr')

export_fitted_vals(model: Annotated[str, "must exist as a key in object's history attribute"]) → DataFrame

Exports a single dataframe with dates, fitted values, actuals, and residuals for one model.

Parameters:: model (str) – The model nickname.
Returns:: A dataframe with dates, fitted values, actuals, and residuals.
Return type:: (DataFrame)

export_validation_grid(model: Annotated[str, "must exist as a key in object's history attribute"]) → DataFrame

Exports the validation grid from a model, converted to a pandas dataframe. Raises an error if the model was not tuned.

Parameters:: model (str) – The name of the model to export for. Matches what was passed to call_me when evaluating the model.
Returns:: The resulting validation grid of the evaluated model passed to model arg.
Return type:: (DataFrame)

fit(**fit_params: Any) → Self

Fits the model assigned to self.call_estimator. Called in auto_forecast()/manual_forecast() after init_estimator() creates the instance to fit.

Parameters:: **fit_params – Any parameters to pass to the fit method of the model instance assigned to self.call_estimator. This can include parameters such as sample_weight, eval_set, early_stopping_rounds, etc. depending on the model being fit.
Returns:: Self

generate_future_dates(n: Annotated[int, 'must be > 0']) → Self

Generates a certain amount of future dates in same frequency as current_dates.

Parameters:: n (int) – Greater than 0. Number of future dates to produce. This will also be the forecast length.
Returns:: Self

>>> f.generate_future_dates(12) # 12 future dates to forecast out to

get_freq()

Gets the pandas inferred date frequency.

Returns:: The inferred frequency of the current_dates array.
Return type:: (str)

>>> f.get_freq()

get_max_lag_order()

Returns the highest lag order variable stored in the object. Returns 0 if none were found.

Returns:: The max order found.
Return type:: int

get_regressor_names()

Gets the regressor names stored in the object.

Returns:: Regressor names that have been added to the object.
Return type:: (list)

>>> f.add_time_trend()
>>> f.get_regressor_names()

infer_freq() → Self

Uses the pandas library to infer the frequency of the loaded dates.

Returns:: Self

ingest_Xvars_df(df: DataFrame, date_col: str = 'Date', drop_first: bool = False, use_future_dates: bool = False, pad: bool = False) → Self

Ingests a dataframe of regressors and saves its Xvars to the object. The user must specify a date column name in the dataframe being ingested. All non-numeric values are dummied. The dataframe should cover the entire future horizon stored within the Forecaster object, but can be padded with 0s if testing only is desired. Any columns in the dataframe that begin with “AR” will be confused with autoregressive terms and could cause errors.

Parameters:

df (DataFrame) – The dataframe that is at least the length of the y array stored in the object plus the forecast horizon.
date_col (str) – Default ‘Date’. The name of the date column in the dataframe. This column must have the same frequency as the dates stored in the Forecaster object.
drop_first (bool) – Default False. Whether to drop the first observation of any dummied variables. Irrelevant if passing all numeric values.
use_future_dates (bool) – Default False. Whether to use the future dates in the dataframe as the resulting future_dates attribute in the Forecaster object.
pad (bool) – Default False. Whether to pad any missing values with 0s.

Returns:

Self

ingest_grid(grid: str | dict[str, Any]) → Self

Ingests a grid to tune the estimator.

Parameters:: grid (dict or str) – If dict, must be a user-created grid. If str, must match the name of a dict grid stored in a grids file.
Returns:: Self

>>> f.set_estimator('mlr')
>>> f.ingest_grid({'normalizer':['scale','minmax']})

init_estimator(dynamic_testing: bool | Annotated[int, 'must be > 0'] | None = None, **kwargs: Any) → Self

Initiates the estimator to be used for forecasting by creating an instance of the model’s interpreted_model class and assigning it to self.call_estimator. This is called in auto_forecast()/manual_forecast() and can be called separately if you want to fit the model manually by calling f.fit() and f.predict().

Parameters:

dynamic_testing (bool or int) – Optional. Whether to dynamically/recursively test the forecast (meaning AR terms will be propagated with predicted values). If True, evaluates dynamically over the entire out-of-sample slice of data. If int, window evaluates over that many steps (2 for 2-step dynamic forecasting, 12 for 12-step, etc.). Setting this to False or 1 means faster performance, but gives a less-good indication of how well the forecast will perform more than one period out. The model will skip testing if the test_length attribute is set to 0.
**kwargs – Passed to the relevant model class and can include such parameters as Xvars, normalizer, cap, and floor, in addition to any given model’s specific hyperparameters.

Returns:

Self

keep_smaller_history(n)

Cuts y observations in the object, keeping only the most recent ones.

Parameters:: n (int, str, or datetime.datetime) – If int, the number of most recent observations to keep. Otherwise, the earliest date to keep; must be parsable by pandas’ Timestamp function.
Returns:: Self

>>> f.keep_smaller_history(500) # keeps last 500 observations
>>> f.keep_smaller_history('2020-01-01') # keeps only observations on or later than 1/1/2020
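
The two ways of calling keep_smaller_history (a count or a date) can be mimicked on plain lists. This sketch is illustrative only: the function name is hypothetical, dates are assumed to be sorted datetime.date objects, and date.fromisoformat stands in for the pandas Timestamp parsing the library relies on.

```python
from datetime import date, timedelta

def keep_recent(dates, values, n):
    """Keep the last n observations, or everything on or after a cutoff date."""
    if isinstance(n, int):
        start = max(len(values) - n, 0)
    else:
        cutoff = date.fromisoformat(n) if isinstance(n, str) else n
        # Index of the first date on or after the cutoff (dates sorted ascending).
        start = next((i for i, d in enumerate(dates) if d >= cutoff), len(dates))
    return dates[start:], values[start:]

dates = [date(2020, 1, 1) + timedelta(days=i) for i in range(5)]
values = [10, 11, 12, 13, 14]
print(keep_recent(dates, values, 2)[1])             # [13, 14]
print(keep_recent(dates, values, "2020-01-03")[1])  # [12, 13, 14]
```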

limit_grid_size(n: Annotated[int, 'must be > 0'] | Annotated[float, 'must be > 0 and < 1'], min_grid_size: Annotated[int, 'must be > 0'] = 1, random_seed: int | None = None) → Self

Makes a grid smaller randomly.

Parameters:

n (int or float) – If int, randomly selects that many parameter combinations. If float, must be greater than 0 and less than 1, and randomly selects that proportion of the parameter combinations.
min_grid_size (int) – Default 1. The min number of hyperparameters to keep from the original grid if a float is passed to n.
random_seed (int) – Optional. Set a seed to make results consistent.

Returns:

Self

>>> from scalecast import GridGenerator
>>> GridGenerator.get_example_grids()
>>> f.set_estimator('mlp')
>>> f.ingest_grid('mlp')
>>> f.limit_grid_size(10,random_seed=20) # limits grid to 10 iterations
>>> f.limit_grid_size(.5,random_seed=20) # limits grid to half its original size
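
Conceptually, a grid is a dictionary of candidate values that expands into every parameter combination, and limiting it means sampling from those combinations. The sketch below shows the idea with the standard library; the helper names are hypothetical, the rounding rule for a float n is an assumption, and a given seed will not reproduce the selections scalecast itself makes.

```python
import random
from itertools import product

def expand_grid(grid):
    """Turn {'param': [values, ...]} into a list of parameter-combination dicts."""
    keys = list(grid)
    return [dict(zip(keys, combo)) for combo in product(*(grid[k] for k in keys))]

def limit_grid(combos, n, min_grid_size=1, random_seed=None):
    """Randomly keep n combinations (int) or a fraction n of them (float)."""
    if isinstance(n, float):
        if not 0 < n < 1:
            raise ValueError("a float n must be greater than 0 and less than 1")
        size = max(int(len(combos) * n), min_grid_size)
    else:
        size = n
    size = min(size, len(combos))
    return random.Random(random_seed).sample(combos, size)

combos = expand_grid({"alpha": [0.1, 0.5, 1.0], "normalizer": ["scale", "minmax"]})
print(len(combos))                                  # 6
print(len(limit_grid(combos, 4, random_seed=20)))   # 4
print(len(limit_grid(combos, 0.5, random_seed=20))) # 3
```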

list_stored_ar_terms()

Returns a list of all stored autoregressive (AR) terms.

Returns:: All stored AR terms.
Return type:: list

lookup_normalizer(normalizer: Annotated[str, "must exist as a key in the object's normalizer attribute"] = None) → NormalizerLike

Returns the normalizing object (e.g. StandardScaler) with fit/transform methods.

Parameters:: normalizer (str) – Optional. The name of the normalizer in the object’s normalizer attribute. Default returns a function that does nothing.
Returns:: An object with the fit/transform methods.
Return type:: NormalizerLike

manual_forecast(call_me: str | None = None, test_model: bool = True, dynamic_testing: bool | Annotated[int, 'must be > 0'] = True, bank_history: bool = True, predict_fitted: bool = True, **kwargs: Any) → list[float]

Manually forecasts with the hyperparameters, Xvars, and normalizer selection passed as keywords. See https://scalecast.readthedocs.io/en/latest/Forecaster/_forecast.html.

Parameters:

call_me (str) – Optional. What to call the model when storing it in the object’s history. If not specified, the model’s nickname will be assigned the estimator value (‘mlp’ will be ‘mlp’, etc.). Duplicated names will be overwritten with the most recently called model.
test_model (bool) – Default True. Whether to test the model before forecasting to a future horizon. If test_length is 0, this is ignored. Set this to False if you tested the model manually by calling f.test() and don’t want to waste resources testing it again.
dynamic_testing (bool or int) – Default True. Whether to dynamically/recursively test the forecast (meaning AR terms will be propagated with predicted values). If True, evaluates dynamically over the entire out-of-sample slice of data. If int, window evaluates over that many steps (2 for 2-step dynamic forecasting, 12 for 12-step, etc.). Setting this to False or 1 means faster performance, but gives a less-good indication of how well the forecast will perform more than one period out. The model will skip testing if the test_length attribute is set to 0.
predict_fitted (bool) – Whether to predict fitted values.
**kwargs – Passed to the _forecast_{estimator}() method and can include such parameters as Xvars, normalizer, cap, and floor, in addition to any given model’s specific hyperparameters. See https://scalecast.readthedocs.io/en/latest/Forecaster/_forecast.html.

Returns:

The forecasted predictions.

Return type:

List[float]

>>> f.set_estimator('lasso')
>>> f.manual_forecast(alpha=.5)

n_actuals(): Returns the number of actual observations in the object.

normality_test(train_only=False)

Runs D’Agostino and Pearson’s test for normality ported from scipy: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.normaltest.html. Holds the null hypothesis that the series is normally distributed.

Parameters:: train_only (bool) – Default False. If True, will exclude the test set from the test (to avoid leakage).
Returns:: The derived statistic and pvalue.
Return type:: (float, float)

order_fcsts(models: Annotated[str, "must exist as a key in object's history attribute"] | None = None, determine_best_by: DetermineBestBy = 'TestSetRMSE') → list[str]

Gets estimated forecasts ordered from best-to-worst.

Parameters:

models (list-like) – Optional. A list of models to consider in the order. Default considers all evaluated models. If not ‘all’, each element must match an evaluated model’s nickname. ‘all’ will only consider models that have a non-null determine_best_by value in history.
determine_best_by (str) – Default ‘TestSetRMSE’. One of Forecaster.determine_best_by.

Returns:

The ordered models.

Return type:

(list)

>>> models = ('mlr','mlp','lightgbm')
>>> f.tune_test_forecast(models,dynamic_testing=False,feature_importance=True)
>>> ordered_models = f.order_fcsts(models,"TestSetRMSE")

parse_determine_best_by(determine_best_by: DetermineBestBy) → MetricStore

Returns the metric to determine the best model by based on the DetermineBestBy object created in set_metrics().

Parameters:: determine_best_by (DetermineBestBy) – The DetermineBestBy object created in set_metrics().
Returns:: The metric to determine the best model by based on the DetermineBestBy object created in set_metrics().
Return type:: MetricStore

parse_labeled_metrics(labeled_metrics: dict[str, EvaluatedMetric]) → dict[str, float]

Parses a dictionary of EvaluatedMetric objects and returns a dictionary of model nicknames and their corresponding scores ordered from best to worst based on the store attribute of the EvaluatedMetric objects. If the metric is one where lower is better, the dictionary is ordered in ascending order. If the metric is one where higher is better, the dictionary is ordered in descending order.

Parameters:: labeled_metrics (dict) – A dictionary where the keys are model nicknames and the values are EvaluatedMetric objects.
Returns:: A dictionary where the keys are model nicknames and the values are the corresponding scores ordered from best to worst.
Return type:: dict
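
The ordering rule can be shown with plain floats in place of EvaluatedMetric objects. The function name and the lower_is_better flag are hypothetical stand-ins for whatever the metric store records; only the ascending/descending rule comes from the description above.

```python
def order_scores(scores, lower_is_better=True):
    """Return model nicknames mapped to scores, ordered from best to worst."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=not lower_is_better)
    return dict(ranked)  # dicts preserve insertion order

rmse = {"mlr": 12.4, "mlp": 9.8, "lightgbm": 10.1}
r2 = {"mlr": 0.71, "mlp": 0.83, "lightgbm": 0.80}

print(list(order_scores(rmse, lower_is_better=True)))   # ['mlp', 'lightgbm', 'mlr']
print(list(order_scores(r2, lower_is_better=False)))    # ['mlp', 'lightgbm', 'mlr']
```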

plot(models: Annotated[str, "must exist as a key in object's history attribute"] | list[Annotated[str, "must exist as a key in object's history attribute"]] | Annotated[str, 'must begin with top_ followed by a positive integer'] | Literal['all'] = 'all', exclude: list[Annotated[str, "must exist as a key in object's history attribute"]] = [], order_by: Annotated[str, "must exist as a value in object's determine_best_by attribute"] | None = None, ci: bool = False, ax: Axes | None = None, figsize: tuple[int, int] = (12, 6), colors: list[str] | None = ['#FFA500', '#DC143C', '#00FF7F', '#808000', '#BC8F8F', '#A9A9A9', '#8B008B', '#FF1493', '#FFDAB9', '#20B2AA', '#7FFFD4', '#A52A2A', '#BDB76B', '#DEB887']) → Axes

Plots all forecasts with the actuals, or just actuals if no forecasts have been evaluated or are selected.

Parameters:

models (list-like, str, or None) – Default ‘all’. The forecasted models to plot. Can start with “top_” and the metric specified in order_by will be used to order the models appropriately. If None or models/order_by combo invalid, will plot only actual values.
exclude (collection) – Default []. Pass any models here that you don’t want displayed. Good to use in conjunction with models = ‘top_{n}’.
order_by (str) – Optional. One of Forecaster.determine_best_by. How to order the display of forecasts on the plots (from best-to-worst according to the selected metric). Default doesn’t order.
ci (bool) – Default False. Whether to display the confidence intervals.
ax (Axis) – Optional. The existing axis to write the resulting figure to.
figsize (tuple) – Default (12,6). The size of the resulting figure. Ignored when ax is not None.
colors (list[str]) – Optional. The colors to use when making the plot.

Returns:

The figure’s axis.

Return type:

(Axis)

>>> import matplotlib.pyplot as plt
>>> models = ('mlr','mlp','lightgbm')
>>> f.tune_test_forecast(models,dynamic_testing=False,feature_importance=True)
>>> f.plot(order_by='TestSetRMSE') # plots all forecasts
>>> plt.show()

plot_acf(diffy: bool = False, train_only: bool = False, **kwargs: Any) → Figure

Plots an autocorrelation function of the y values.

Parameters:

diffy (bool or int) – One of {True,False,0,1}. Default False. Whether to difference the data before passing the values to the function. If False or 0, does not difference. If True or 1, differences 1 time.
train_only (bool) – Default False. If True, will exclude the test set from the test (a measure added to avoid leakage).
**kwargs – Passed to plot_acf() function from statsmodels.

Returns:

If ax is None, the created figure. Otherwise the figure to which ax is connected.

Return type:

(Figure)

>>> import matplotlib.pyplot as plt
>>> f.plot_acf(train_only=True)
>>> plt.show()

plot_fitted(models: Annotated[str, "must exist as a key in object's history attribute"] | list[Annotated[str, "must exist as a key in object's history attribute"]] | Annotated[str, 'must begin with top_ followed by a positive integer'] | Literal['all'] = 'all', exclude: list[Annotated[str, "must exist as a key in object's history attribute"]] = [], order_by: Annotated[str, "must exist as a value in object's determine_best_by attribute"] | None = None, ax: Axes | None = None, figsize: tuple[int, int] = (12, 6), colors: list[str] | None = ['#FFA500', '#DC143C', '#00FF7F', '#808000', '#BC8F8F', '#A9A9A9', '#8B008B', '#FF1493', '#FFDAB9', '#20B2AA', '#7FFFD4', '#A52A2A', '#BDB76B', '#DEB887'])

Plots all fitted values with the actuals. Does not support level fitted values (for now).

Parameters:

models (list-like or str) – Default ‘all’. The forecasted models to plot. Can start with “top_” and the metric specified in order_by will be used to order the models appropriately.
exclude (collection) – Default []. Pass any models here that you don’t want displayed. Good to use in conjunction with models = ‘top_{n}’.
order_by (str) – Optional. One of Forecaster.determine_best_by. How to order the display of forecasts on the plots (from best-to-worst according to the selected metric).
ax (Axis) – Optional. The existing axis to write the resulting figure to.
figsize (tuple) – Default (12,6). Size of the resulting figure. Ignored when ax is not None.
colors (list[str]) – Optional. The colors to use when making the plot.

Returns:

The figure’s axis.

Return type:

(Axis)

>>> models = ('mlr','mlp','lightgbm')
>>> f.tune_test_forecast(models,dynamic_testing=False,feature_importance=True)
>>> f.plot_fitted(order_by='TestSetRMSE') # plots all fitted values
>>> plt.show()

plot_pacf(diffy: bool = False, train_only: bool = False, **kwargs: Any) → Figure

Plots a partial autocorrelation function of the y values.

Parameters:

diffy (bool or int) – One of {True,False,0,1}. Default False. Whether to difference the data before passing the values to the function. If False or 0, does not difference. If True or 1, differences 1 time.
train_only (bool) – Default False. If True, excludes the test set from the calculation (a measure added to avoid leakage).
**kwargs – Passed to plot_pacf() function from statsmodels.

Returns:

If ax is None, the created figure. Otherwise the figure to which ax is connected.

Return type:

(Figure)

>>> import matplotlib.pyplot as plt
>>> f.plot_pacf(train_only=True)
>>> plt.show()

plot_periodogram(diffy: bool = False, train_only: bool = False) → tuple[ndarray, ndarray]

Plots a periodogram of the y values (comes from scipy.signal).

Parameters:

diffy (bool or int) – One of {True,False,0,1}. Default False. Whether to difference the data before passing the values to the function. If False or 0, does not difference. If True or 1, differences 1 time.
train_only (bool) – Default False. If True, excludes the test set from the calculation (a measure added to avoid leakage).

Returns:

Element 1: Array of sample frequencies. Element 2: Power spectral density or power spectrum of x.

Return type:

(ndarray,ndarray)

>>> import matplotlib.pyplot as plt
>>> a, b = f.plot_periodogram(diffy=True,train_only=True)
>>> plt.semilogy(a, b)
>>> plt.show()
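
To make the two returned arrays more concrete, here is a minimal standard-library sketch of a periodogram computed with a direct discrete Fourier transform. It is an illustration only: the function below is hypothetical, and its scaling is not meant to match scipy.signal exactly. The point is that the frequency with the largest power identifies the dominant cycle length.

```python
import cmath
import math


def simple_periodogram(values):
    """Return (frequencies, power) for frequencies 0..0.5 cycles per observation."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]  # remove the constant level
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coef = sum(
            c * cmath.exp(-2j * math.pi * k * t / n)
            for t, c in enumerate(centered)
        )
        freqs.append(k / n)
        power.append(abs(coef) ** 2 / n)
    return freqs, power


# Toy series with a 12-observation cycle (for example, monthly seasonality).
y = [math.sin(2 * math.pi * t / 12) for t in range(48)]
freqs, power = simple_periodogram(y)
peak = max(range(len(power)), key=power.__getitem__)
dominant_period = 1 / freqs[peak]
```

Here the peak falls at a frequency of 1/12, so dominant_period is approximately 12 observations.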

plot_test_set(models: Annotated[str, "must exist as a key in object's history attribute"] | list[Annotated[str, "must exist as a key in object's history attribute"]] | Annotated[str, 'must begin with top_ followed by a positive integer'] | Literal['all'] = 'all', exclude: list[Annotated[str, "must exist as a key in object's history attribute"]] = [], order_by: Annotated[str, "must exist as a value in object's determine_best_by attribute"] | None = None, include_train: bool | Annotated[int, 'must be >= 0'] = True, ci: bool = False, ax: Axes | None = None, figsize: tuple[int, int] = (12, 6), colors: list[str] | None = ['#FFA500', '#DC143C', '#00FF7F', '#808000', '#BC8F8F', '#A9A9A9', '#8B008B', '#FF1493', '#FFDAB9', '#20B2AA', '#7FFFD4', '#A52A2A', '#BDB76B', '#DEB887'])

Plots all test-set predictions with the actuals.

Parameters:

models (list-like or str) – Default ‘all’. The forecasted models to plot. Can start with “top_” and the metric specified in order_by will be used to order the models appropriately.
exclude (collection) – Default []. Pass any models here that you don’t want displayed. Good to use in conjunction with models = ‘top_{n}’.
order_by (str) – Optional. One of Forecaster.determine_best_by. How to order the display of forecasts on the plots (from best-to-worst according to the selected metric).
include_train (bool or int) – Default True. Use to zoom into testing results. If True, plots the test results with the entire history in y. If False, matches y history to test results and only plots this. If int, plots that length of y to match to test results.
ci (bool) – Default False. Whether to display the confidence intervals. Default is 100 bootstrapped samples and a 95% confidence interval.
ax (Axis) – Optional. The existing axis to write the resulting figure to.
figsize (tuple) – Default (12,6). Size of the resulting figure. Ignored when ax is not None.
colors (list[str]) – Optional. The colors to use when making the plot.

Returns:

The figure’s axis.

Return type:

(Axis)

>>> models = ('mlr','mlp','lightgbm')
>>> f.tune_test_forecast(models,dynamic_testing=False,feature_importance=True)
>>> f.plot_test_set(order_by='TestSetRMSE') # plots all test-set results
>>> plt.show()

pop(*args: Annotated[str, "must exist as a key in object's history attribute"]) → Self

Deletes evaluated forecasts from the object’s memory.

Parameters:: *args (str) – Names of models matching what was passed to call_me when model was evaluated.
Returns:: Self

>>> models = ('mlr','mlp','lightgbm')
>>> f.tune_test_forecast(models,dynamic_testing=False,feature_importance=True)
>>> f.pop('mlr')

predict(**predict_params: Any) → list[float]

Predicts with the model assigned to self.call_estimator. Called in auto_forecast()/manual_forecast() after fit() fits the model.

Parameters:: **predict_params – Any parameters to pass to the predict method of the model instance assigned to self.call_estimator. This can include parameters such as num_iteration for xgboost, etc. depending on the model being fit.
Returns:: The forecasted values.
Return type:: list[float]

predict_fitted_vals(**predict_params: Any)

Returns the fitted values for the training data with the model assigned to self.call_estimator.

Parameters:: **predict_params – Any parameters to pass to the predict method of the model instance assigned to self.call_estimator. This can include parameters such as num_iteration for xgboost, etc. depending on the model being fit.
Returns:: The fitted values for the training data.
Return type:: list[float]

reduce_Xvars(method: FIMethod = 'PermutationExplainer', estimator: SKLearnModel = 'lasso', keep_at_least: PositiveInt = 1, keep_this_many: PositiveInt | Literal['auto', 'sqrt'] = 'auto', grid_search: bool = True, use_loaded_grid: bool = False, dynamic_tuning: DynamicTesting = False, monitor: DetermineBestBy = 'ValidationMetricValue', overwrite: bool = True, cross_validate: bool = False, masker: 'shap.maskers.Masker' | None = None, cvkwargs: dict[str, Any] = {}, **kwargs: Any) → Self

Requires the optional shap library. Reduces the regressor variables stored in the object. Any feature importance type available with f.save_feature_importance() can be used to rank features in this process. Features are dropped one at a time, lowest-ranked first. After each drop, the model is re-run and feature importance is re-evaluated. By default, the validation-set error is monitored to avoid leakage, and the variable set that produced the lowest error is selected. Two list attributes are created and stored in the Forecaster object: pfi_dropped_vars, the variables in the order they were dropped, and pfi_error_values, the error recorded at each step. pfi_error_values is one element longer than pfi_dropped_vars because its first entry is the initial error before any variables were dropped. See the example: https://scalecast-examples.readthedocs.io/en/latest/misc/feature-selection/feature_selection.html.

Parameters:

method (Literal) – See scalecast.types.FIMethod. The method for scoring features.
estimator (str) – One of Forecaster.sklearn_estimators. Default ‘lasso’. The estimator to use to determine the best set of variables.
keep_at_least (str or int) – Default 1. The fewest number of Xvars to keep. ‘sqrt’ keeps at least the square root of the number of Xvars, rounded down. This exists so that the keep_this_many keyword can use ‘auto’ as an argument.
keep_this_many (str or int) – Default ‘auto’. The number of Xvars to keep if method == ‘pfi’ or ‘shap’. “auto” keeps the number of xvars that returned the best error using the metric passed to monitor, but it is the most computationally expensive. “sqrt” keeps the square root of the total number of observations rounded down.
grid_search (bool) – Default True. Whether to run a grid search for optimal hyperparams on the validation set. If use_loaded_grid is False, uses a grids file currently available in the working directory or creates a new grids file called Grids.py with default values if none available to determine the grid to use. The grid search is only run once and then those hyperparameters are used for all subsequent pfi runs when method == ‘pfi’. In any utilized grid, do not include ‘Xvars’ as a key. If you want to access the chosen hyperparams after the fact, they are stored in the reduction_hyperparams attribute.
use_loaded_grid (bool) – Default False. Whether to use the currently loaded grid in the object instead of using a grid from a file. In any utilized grid, do not include ‘Xvars’ as a key.
dynamic_tuning (bool or int) – Default False. Whether to dynamically tune the model or, if int, how many forecast steps to dynamically tune it.
monitor (str) – One of Forecaster.determine_best_by. Default ‘ValidationMetricValue’. The metric to be monitored when making reduction decisions.
overwrite (bool) – Default True. If False, the list of selected Xvars is stored in an attribute called reduced_Xvars. If True, this list of regressors overwrites the current Xvars in the object.
cross_validate (bool) – Default False. Whether to tune the model with cross validation. If False, uses the validation slice of data to tune. If not monitoring ValidationMetricValue, you will want to leave this False.
masker (shap.maskers) – Optional. Pass your own masker to this function if desired. Default will use shap.maskers.Independent with default arguments.
cvkwargs (dict) – Default {}. Passed to the cross_validate() method.
**kwargs – Passed to the manual_forecast() method and can include arguments related to a given model’s hyperparameters or dynamic_testing. Do not pass hyperparameters if grid_search is True. Do not pass Xvars.

Returns:

Self

>>> f.add_ar_terms(24)
>>> f.add_seasonal_regressors('month',raw=False,sincos=True,dummy=True)
>>> f.add_seasonal_regressors('year')
>>> f.add_time_trend()
>>> f.set_validation_length(12)
>>> f.reduce_Xvars(overwrite=False) # reduce with lasso (but don't overwrite Xvars)
>>> print(f.reduced_Xvars) # view results
>>> f.reduce_Xvars(
>>>     method='TreeExplainer',
>>>     estimator='gbt',
>>>     keep_at_least=10,
>>>     keep_this_many='auto',
>>>     dynamic_testing=False,
>>>     dynamic_tuning=True,
>>>     cross_validate=True,
>>>     cvkwargs={'rolling':True},
>>> ) # reduce with gradient boosted tree estimator and overwrite with result
>>> print(f.reduced_Xvars) # view results
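
The elimination loop described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the library's implementation: importance_fn and error_fn are hypothetical stand-ins for re-fitting the model, scoring features, and measuring validation error, and the toy numbers are invented so that two noise variables hurt the error slightly while losing a real signal hurts it a lot.

```python
def backward_eliminate(features, importance_fn, error_fn, keep_at_least=1):
    """Drop the lowest-ranked feature one at a time and track the error."""
    current = list(features)
    dropped_vars = []
    error_values = [error_fn(current)]  # initial error, before any drop
    best_set, best_error = list(current), error_values[0]
    while len(current) > keep_at_least:
        scores = importance_fn(current)  # re-rank after every drop
        weakest = min(current, key=lambda name: scores[name])
        current.remove(weakest)
        dropped_vars.append(weakest)
        error = error_fn(current)
        error_values.append(error)
        if error < best_error:
            best_set, best_error = list(current), error
    return best_set, dropped_vars, error_values


# Toy stand-ins (assumptions for illustration only).
SIGNAL = {"AR1": 1.0, "t": 0.8}
NOISE = {"noise_a": 0.05, "noise_b": 0.01}


def importance_fn(current):
    return {name: {**SIGNAL, **NOISE}[name] for name in current}


def error_fn(current):
    missing_signal = sum(1 for name in SIGNAL if name not in current)
    noise_kept = sum(1 for name in NOISE if name in current)
    return 1.0 * missing_signal + 0.1 * noise_kept


best_set, dropped_vars, error_values = backward_eliminate(
    ["AR1", "t", "noise_a", "noise_b"], importance_fn, error_fn
)
```

In this toy run, error_values has four entries for three dropped variables, mirroring the relationship between pfi_error_values and pfi_dropped_vars.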

restore_series_length(): Restores the original y values and current dates in the object from before keep_smaller_history() or determine_best_series_length() were called. If those methods were never called, this function does nothing. Restoring a series’ length automatically drops all stored regressors in the object.

round(decimals: int = 0) → Self

Rounds the values saved to Forecaster.y.

Parameters:: decimals (int) – The number of digits to round off to. Passed to np.round(decimals).
Returns:: Self

save_feature_importance(method: Literal['shap'] = 'shap', on_error: Literal['warn', 'ignore', 'raise'] = 'warn', try_order: Sequence[FIMethod] = ['PermutationExplainer', 'TreeExplainer', 'LinearExplainer', 'KernelExplainer', 'SamplingExplainer'], masker: 'shap.maskers.Masker' | None = None, verbose: bool = False)

Requires shap. Saves feature info for models that offer it (sklearn models). Call after evaluating the model you want it for and before changing the estimator. This method saves a dataframe listing the feature as the index and its score. This dataframe can be recalled using the export_feature_importance() method. The importance scores are determined as the average shap score applied to each feature in each observation.

Parameters:

method (str) – Default ‘shap’. As of scalecast 0.19.4, shap is the only method available, as pfi is deprecated.
on_error (str) – One of {‘warn’,’raise’,’ignore’}. Default ‘warn’. If the last model called doesn’t support feature importance, ‘warn’ will log a warning. ‘raise’ will raise an error.
try_order (list) – The order of explainers to try. If one fails, will try setting with the next one. This should be able to set feature importance on any sklearn model. What each Explainer does can be found in the shap documentation: https://shap-lrjball.readthedocs.io/en/latest/index.html
masker (shap.maskers) – Optional. Pass your own masker if desired and you are using the PermutationExplainer or LinearExplainer. Default will use shap.maskers.Independent masker with default arguments.
verbose (bool) – Default False. Whether to print out information about which explainers were tried/chosen. The chosen explainer is saved in Forecaster.history[estimator][‘feature_importance_explainer’].

>>> f.set_estimator('xgboost')
>>> f.manual_forecast()
>>> f.save_feature_importance()
>>> fi = f.export_feature_importance('xgboost') # returns a dataframe
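
The fallback behavior of try_order and on_error can be pictured with a short sketch. The callables below are hypothetical stand-ins, not shap classes, and the function is an assumption about the general pattern rather than scalecast's code: each candidate is tried in order, and the on_error policy applies only if every candidate fails.

```python
import warnings


def first_working_explainer(try_order, explainers, on_error="warn"):
    """Return (name, scores) for the first explainer that does not raise."""
    failures = {}
    for name in try_order:
        try:
            return name, explainers[name]()
        except Exception as exc:  # any failure moves on to the next explainer
            failures[name] = exc
    message = f"no explainer succeeded; tried: {list(failures)}"
    if on_error == "raise":
        raise RuntimeError(message)
    if on_error == "warn":
        warnings.warn(message)
    return None, None  # 'ignore' and 'warn' both continue without scores


def tree_stub():
    # Hypothetical: pretend the fitted model is not tree-based.
    raise TypeError("model is not tree-based")


def linear_stub():
    # Hypothetical per-feature scores.
    return {"AR1": 0.7, "t": 0.3}


chosen, scores = first_working_explainer(
    ["TreeExplainer", "LinearExplainer"],
    {"TreeExplainer": tree_stub, "LinearExplainer": linear_stub},
)
```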

seasonal_decompose(diffy: bool = False, train_only: bool = False, **kwargs: Any) → DecomposeResult

Returns a signal/seasonal decomposition of the y values.

Parameters:

diffy (bool) – Default False. Whether to difference the data before passing the values to the function. If False or 0, does not difference. If True or 1, differences 1 time.
train_only (bool) – Default False. If True, excludes the test set from the decomposition (a measure added to avoid leakage).
**kwargs – Passed to seasonal_decompose() function from statsmodels. See https://www.statsmodels.org/dev/generated/statsmodels.tsa.seasonal.seasonal_decompose.html.

Returns:

An object with seasonal, trend, and resid attributes.

Return type:

(DecomposeResult)

>>> import matplotlib.pyplot as plt
>>> f.seasonal_decompose(train_only=True).plot()
>>> plt.show()

set_cilevel(n: Annotated[float, 'must be > 0 and < 1']) → Self

Sets the level for the resulting confidence intervals (95% default).

Parameters:: n (float) – Greater than 0 and less than 1.
Returns:: Self

>>> f.set_cilevel(.80) # next forecast will get 80% confidence intervals

set_estimator(estimator: Annotated[str, "must exist in the name attribute of the object's estimators attribute"]) → Self

Sets the estimator to forecast with.

Parameters:: estimator (str) – One of Forecaster.estimators.
Returns:: Self

>>> f.set_estimator('lasso')
>>> f.manual_forecast(alpha = .5)

set_grids_file(name: str = 'Grids') → Self

Sets the name of the file where the object will look automatically for grids when calling tune(), cross_validate(), tune_test_forecast(), or a similar method. If the grids file does not exist in the working directory, no error is raised until tuning is called.

Parameters:: name (str) – Default ‘Grids’. The name of the file to look for. This file must exist in the working directory. The default will look for a file called “Grids.py”.

>>> f.set_grids_file('ModGrids') # expects to find a file called ModGrids.py in working directory.

set_last_future_date(date: date | datetime | Timestamp | datetime64 | str) → Self

Generates future dates in the same frequency as current_dates, ending on a specified date.

Parameters:: date (datetime-like) – The date to end on. Must be parsable by pandas’ Timestamp() function.
Returns:: Self

>>> f.set_last_future_date('2021-06-01') # creates future dates up to this one in the expected frequency
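
The idea of extending the observed dates at the same frequency up to an end date can be sketched with the standard library. This is a simplified illustration that assumes a fixed-width step (daily, weekly, and so on) inferred from the last two observed dates; the helper name is hypothetical, and calendar frequencies such as month-start are not handled here.

```python
from datetime import date, timedelta


def future_dates_until(current_dates, last_date):
    """Extend ISO-formatted dates at a fixed step up to and including last_date."""
    parsed = [date.fromisoformat(d) for d in current_dates]
    step = parsed[-1] - parsed[-2]  # assumption: constant spacing
    if step <= timedelta(0):
        raise ValueError("current_dates must be in increasing order")
    end = date.fromisoformat(last_date)
    future = []
    nxt = parsed[-1] + step
    while nxt <= end:
        future.append(nxt)
        nxt += step
    return future


future = future_dates_until(
    ["2021-01-01", "2021-01-02", "2021-01-03"], "2021-01-06"
)
```

With daily data ending on 2021-01-03, this produces three future dates: 2021-01-04 through 2021-01-06.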

set_metrics(metrics: list[MetricStore | Annotated[str, 'must be the name of a static method in the scalecast.Metrics class that only accepts two arguments (a and f)']], keep_existing: bool = False) → Self

Set or change the evaluated metrics for all model testing and validation.

Parameters:

metrics (list[MetricStore|str]) – The metrics to evaluate when validating and testing models. If str, each element must exist as a name in scalecast.Metrics.Metrics and can only accept two arguments: a and f. Otherwise use the MetricStore class from scalecast.Classes to specify a custom metric. For each metric and model that is tested, the test-set and in-sample metrics will be evaluated and can be exported. Level test-set and in-sample metrics are also currently available, but will be removed in a future version.
keep_existing (bool) – Default False. Whether to keep evaluating all existing metrics already in the object.

Returns:

Self
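
The two-argument contract for metrics (a for actuals, f for forecasts) is easy to picture with plain functions. The sketch below is illustrative only: the functions and the evaluate helper are hypothetical, and it does not show how MetricStore wraps a custom metric.

```python
from math import sqrt


def rmse(a, f):
    """Root mean squared error between actuals a and forecasts f."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, f)) / len(a))


def mae(a, f):
    """Mean absolute error between actuals a and forecasts f."""
    return sum(abs(x - y) for x, y in zip(a, f)) / len(a)


def mean_bias(a, f):
    """Hypothetical custom metric: average signed error (forecast minus actual)."""
    return sum(y - x for x, y in zip(a, f)) / len(a)


def evaluate(metrics, a, f):
    """Apply every metric function to the same actuals and forecasts."""
    return {name: fn(a, f) for name, fn in metrics.items()}


actuals = [10.0, 12.0, 14.0]
forecasts = [11.0, 12.0, 12.0]
scores = evaluate({"rmse": rmse, "mae": mae, "bias": mean_bias}, actuals, forecasts)
```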

set_test_length(n: Annotated[int, 'must be >= 0'] | Annotated[float, 'must be > 0 and < 1'] = 1) → Self

Sets the length of the test set. As of version 0.16.0, 0-length test sets are supported.

Parameters:: n (int or float) – Default 1. The length of the resulting test set. Pass 0 to skip testing models. Fractional splits are supported by passing a float less than 1 and greater than 0.
Returns:: Self

>>> f.set_test_length(12) # test set of 12
>>> f.set_test_length(.2) # 20% test split
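
A minimal sketch of how an integer or fractional test length could translate into a temporally ordered split. The helper names are hypothetical, and rounding the fractional case down is an assumption made for this illustration, not a documented detail of the library.

```python
def resolve_test_length(n, n_obs):
    """Convert an int or fractional test length into a number of observations."""
    if isinstance(n, float):
        if not 0 < n < 1:
            raise ValueError("a float test length must be > 0 and < 1")
        return int(n_obs * n)  # assumption: fractional splits round down
    if n < 0:
        raise ValueError("an int test length must be >= 0")
    return n


def train_test_split(y, n):
    """Hold out the last observations as the test set, preserving order."""
    cut = len(y) - resolve_test_length(n, len(y))
    return list(y[:cut]), list(y[cut:])


y = list(range(20))
train_int, test_int = train_test_split(y, 4)       # last 4 observations
train_frac, test_frac = train_test_split(y, 0.25)  # last 25% of observations
train_none, test_none = train_test_split(y, 0)     # no testing
```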

set_validation_length(n: Annotated[int, 'must be > 0'] = 1) → Self

Sets the length of the validation set. This will never matter for models that are not tuned.

Parameters:: n (int) – Default 1. The length of the resulting validation set.
Returns:: Self

>>> f.set_validation_length(6) # validation length of 6

set_validation_metric(metric: str) → Self

Sets the metric that will be used to tune all subsequent models.

Parameters:: metric (str) – One of the names in Forecaster.metrics. The metric to optimize the models with using the validation set. Although model testing will evaluate all metrics in Forecaster.metrics, model optimization with tuning and cross validation only uses one of these.
Returns:: Self

>>> f.set_validation_metric('mae')

synthesize_models(models: Annotated[str, "must exist as a key in object's history attribute"] | list[Annotated[str, "must exist as a key in object's history attribute"]] | Annotated[str, 'must begin with top_ followed by a positive integer'] | Literal['all'] = 'all', determine_best_by: Annotated[str, "must exist as a value in object's determine_best_by attribute"] | None = None, call_me: str = 'synthesized', cilevel: Annotated[float, 'must be > 0 and < 1'] = 0.95, verbose: bool = False) → Self

Creates a model that is an average of other models with confidence intervals determined by forming normal distributions around each point prediction.

Parameters:

models (list-like or str) – Default ‘all’. Which models to consider. Can start with ‘top_’ (for example, ‘top_5’).
determine_best_by (str) – Optional. One of Forecaster.determine_best_by. The metric used to rank models when models = ‘top_{n}’.
call_me (str) – The name of the resulting model. Default ‘synthesized’.
cilevel (float) – The confidence level for the resulting confidence interval. Default .95.
verbose (bool) – Whether to print successful completion of the function. Default False.
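
The averaging step and the normal-distribution interval can be sketched as follows. This is an assumption-laden illustration, not the library's calculation: the sigma argument is a hypothetical input standing in for whatever spread estimate is used, and the same sigma is applied to every forecast step.

```python
from statistics import NormalDist, fmean


def synthesize(forecasts, sigma, cilevel=0.95):
    """Average several forecasts point by point and add a normal interval."""
    names = list(forecasts)
    horizon = len(forecasts[names[0]])
    average = [fmean(forecasts[m][i] for m in names) for i in range(horizon)]
    # Two-sided interval: cilevel=0.95 gives z of roughly 1.96.
    z = NormalDist().inv_cdf(0.5 + cilevel / 2)
    lower = [p - z * sigma for p in average]
    upper = [p + z * sigma for p in average]
    return average, lower, upper


average, lower, upper = synthesize(
    {"mlr": [10, 12], "lasso": [14, 16]},
    sigma=2.0,  # hypothetical spread estimate
    cilevel=0.95,
)
```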

test(dynamic_testing: bool | Annotated[int, 'must be > 0'] = True, call_me: str | None = None, **kwargs: Any) → Self

Tests the forecast estimator out-of-sample. The test_length attribute determines how many observations are used. All test-set splits maintain temporal order.

Parameters:

dynamic_testing (bool or int) – Default True. Whether to dynamically/recursively test the forecast (meaning AR terms will be propagated with predicted values). If True, evaluates dynamically over the entire out-of-sample slice of data. If int, window evaluates over that many steps (2 for 2-step dynamic forecasting, 12 for 12-step, etc.). Setting this to False or 1 means faster performance, but gives a weaker indication of how well the forecast will perform more than one period out. This will fail if the test_length attribute is 0.
call_me (str) – Optional. What to call the model when storing it in the object’s history. If not specified, the model’s nickname will be assigned the estimator value (‘mlp’ will be ‘mlp’, etc.). Duplicated names will be overwritten with the most recently called model.
**kwargs – Passed to the _forecast_{estimator}() method and can include such parameters as Xvars, normalizer, cap, and floor, in addition to any given model’s specific hyperparameters. See https://scalecast.readthedocs.io/en/latest/Forecaster/_forecast.html.

>>> f.set_estimator('lasso')
>>> f.test(alpha=.5)
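
The difference between dynamic and non-dynamic testing is easiest to see with a single autoregressive term. The sketch below uses a hypothetical AR(1) rule (prediction = phi times the previous value) and is not scalecast code. With dynamic testing, the previous value is the model's own prediction; with one-step testing, it is always the observed actual; with an int, the lag is re-anchored on actuals every that many steps.

```python
def ar1_test_predictions(train, test, phi, dynamic_testing=True):
    """Predict the test set with y_hat[t] = phi * lag, varying where lag comes from."""
    if dynamic_testing is True:
        steps = len(test)            # propagate predictions over the whole test set
    elif dynamic_testing is False:
        steps = 1                    # always use the observed lag
    else:
        steps = int(dynamic_testing)
        if steps <= 0:
            raise ValueError("an int dynamic_testing must be > 0")
    history = list(train)
    preds = []
    for i, actual in enumerate(test):
        if i % steps == 0:
            lag = history[-1]        # re-anchor on the last observed value
        else:
            lag = preds[-1]          # propagate the previous prediction
        preds.append(phi * lag)
        history.append(actual)
    return preds


train, test = [1, 2, 4], [10, 10, 10]
one_step = ar1_test_predictions(train, test, phi=0.5, dynamic_testing=False)
fully_dynamic = ar1_test_predictions(train, test, phi=0.5, dynamic_testing=True)
two_step = ar1_test_predictions(train, test, phi=0.5, dynamic_testing=2)
```

The one-step predictions recover quickly because they see each actual, while the fully dynamic predictions compound their own errors, which is why dynamic testing gives a more honest picture of multi-period accuracy.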

transfer_cis(transfer_from: _Forecaster_parent, model: str, transfer_to_model: str = None, transfer_test_set_cis: bool | None = None) → Self

Transfers the confidence intervals from a model forecast in a passed Forecaster or MVForecaster object.

Parameters:

transfer_from (Forecaster or MVForecaster) – The object that contains the model from which intervals should be transferred.
model (str) – The model nickname of the already-evaluated model stored in transfer_from.
transfer_to_model (str) – Optional. The nickname of the model to which the intervals should be transferred. If not specified, inherits the name passed to model.
transfer_test_set_cis (bool) – Optional. Whether to pass intervals for test-set predictions. If left unspecified, the decision is made based on whether the inheriting object has test-set predictions evaluated.

Returns:

Self

>>> f.manual_forecast(call_me='mlr')
>>> f_new.transfer_predict(transfer_from=f,model='mlr')
>>> f_new.transfer_cis(transfer_from=f,model='mlr')

transfer_predict(transfer_from: Forecaster_parent, model: str, return_series: bool = False, save_to_history: bool = True, call_me: str | None = None, regr=None) → Self | list[float]

Makes predictions using an already-trained model over any given forecast horizon. Uses the already-trained model from a passed Forecaster object to create a new model in the Forecaster or MVForecaster object from which the method is called. Alternatively, the predictions can be returned in a pandas Series object without saving a new model. Confidence intervals cannot be transferred with this method but can be with the transfer_cis() method.

Parameters:

transfer_from (Forecaster) – The Forecaster object that contains the already-fitted model.
model (str) – The model nickname of the already-evaluated model stored in the Forecaster object passed to transfer_from.
return_series (bool) – Default False. Whether to return a pandas Series with the date as an index of the values. If the dates argument is not specified, this will include all dates in the Forecaster instance that the method is called from.
save_to_history (bool) – Default True. Whether to save the transferred predictions as if they were a model being run using a _forecast() method.
call_me (str) – Optional. What to call the resulting model. If save_to_history is False, this is ignored. If not specified, inherits the name passed to model.
regr – Optional. The model to make predictions with. If not supplied, the model will be searched for in the Forecaster passed to transfer_from.

Returns:

The date-indexed series if return_series is True. Otherwise returns self.

Return type:

(Pandas Series or Self)

>>> f.manual_forecast(call_me='mlr')
>>> f_new.transfer_predict(transfer_from=f,model='mlr')

tune(dynamic_tuning: bool | Annotated[int, 'must be > 0'] = False, set_aside_test_set: bool = True) → Self

Tunes the specified estimator using an ingested grid (ingests a grid from Grids.py with same name as the estimator by default). This is akin to cross-validation with one fold and a test_length equal to f.validation_length. Any parameters that can be passed as arguments to manual_forecast() can be tuned with this process. The chosen parameters are stored in the best_params attribute. The evaluated validation grid can be exported to a dataframe using f.export_validation_grid().

Parameters:

dynamic_tuning (bool or int) – Default False. Whether to dynamically/recursively test the forecast during the tuning process (meaning AR terms will be propagated with predicted values). If True, evaluates recursively over the entire out-of-sample slice of data. If int, window evaluates over that many steps (2 for 2-step recursive testing, 12 for 12-step, etc.). Setting this to False or 1 means faster performance, but gives a weaker indication of how well the forecast will perform more than one period out.
set_aside_test_set (bool) – Default True. Whether to separate the test set specified in f.test_length during this process.

Returns:

Self

>>> f.set_estimator('xgboost')
>>> f.tune()
>>> f.auto_forecast()
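
The tuning process described above (one validation fold taken from the end of the training data, with the test set optionally set aside first) can be sketched with a toy model. Everything here is a hypothetical illustration: the moving-average forecaster stands in for a real estimator, and the grid is a plain dict rather than an entry in Grids.py.

```python
from itertools import product
from math import sqrt


def moving_average_forecast(history, horizon, window):
    """Toy estimator: repeat the mean of the last `window` observations."""
    mean = sum(history[-window:]) / window
    return [mean] * horizon


def rmse(a, f):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, f)) / len(a))


def tune(y, grid, validation_length, test_length=0, set_aside_test_set=True):
    """Score every grid combination on the validation slice and keep the best."""
    usable = list(y)
    if set_aside_test_set and test_length > 0:
        usable = usable[:len(usable) - test_length]  # never tune on the test set
    cut = len(usable) - validation_length
    fit_slice, val_slice = usable[:cut], usable[cut:]
    keys = list(grid)
    results = []
    for combo in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, combo))
        preds = moving_average_forecast(fit_slice, len(val_slice), **params)
        results.append((rmse(val_slice, preds), params))
    best_score, best_params = min(results, key=lambda r: r[0])
    return best_params, results


y = [1, 1, 1, 1, 1, 1, 5, 5, 5, 5, 99, 99]  # the last 2 values are the test set
best_params, results = tune(
    y, {"window": [8, 4, 2]}, validation_length=2, test_length=2
)
```

Because the two test observations are removed before the validation slice is cut, the extreme values at the end of the series have no influence on which window is chosen.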

tune_test_forecast(models: Annotated[str, "must exist in the name attribute of the object's estimators attribute"], cross_validate: bool = False, dynamic_tuning: bool = False, dynamic_testing: bool = True, feature_importance: bool = False, fi_try_order: Sequence[Literal['PermutationExplainer', 'TreeExplainer', 'LinearExplainer', 'KernelExplainer', 'SamplingExplainer']] | None = None, limit_grid_size: Annotated[int, 'must be > 0'] | Annotated[float, 'must be > 0 and < 1'] | None = None, min_grid_size: Annotated[int, 'must be > 0'] = 1, suffix: str | None = None, error: Literal['ignore', 'raise', 'warn'] = 'raise', **cvkwargs: dict[str, Any]) → Self

Iterates through a list of models, tunes them using grids in a grids file, forecasts them, and can save feature information.

Parameters:

models (list-like) – Each element must be a name in Forecaster.estimators.
cross_validate (bool) – Default False. Whether to tune the model with cross validation. If False, uses the validation slice of data to tune.
dynamic_tuning (bool or int) – Default False. Whether to dynamically tune the forecast (meaning AR terms will be propagated with predicted values). If True, evaluates dynamically over the entire out-of-sample slice of data. If int, window evaluates over that many steps (2 for 2-step dynamic forecasting, 12 for 12-step, etc.). Setting this to False or 1 means faster performance, but gives a weaker indication of how well the forecast will perform more than one period out.
dynamic_testing (bool or int) – Default True. Whether to dynamically test the forecast (meaning AR terms will be propagated with predicted values). If True, evaluates dynamically over the entire out-of-sample slice of data. If int, window evaluates over that many steps (2 for 2-step dynamic forecasting, 12 for 12-step, etc.). Setting this to False or 1 means faster performance, but gives a weaker indication of how well the forecast will perform more than one period out.
feature_importance (bool) – Default False. Whether to save feature importance information for the models that offer it.
fi_try_order (list) – Optional. If the feature_importance argument is True, which feature importance methods to try. If using a combination of tree-based and linear models, for example, it might be good to pass [‘TreeExplainer’,’LinearExplainer’]. The default will use whatever is specified by default in Forecaster.save_feature_importance(), which usually ends up being the PermutationExplainer.
limit_grid_size (int or float) – Optional. Pass an argument here to limit each of the grids being read. See https://scalecast.readthedocs.io/en/latest/Forecaster/Forecaster.html#src.scalecast.Forecaster.Forecaster.limit_grid_size.
min_grid_size (int) – Default 1. The smallest grid size to keep. Ignored if limit_grid_size is None.
suffix (str) – Optional. A suffix to add to each model as it is evaluated to differentiate them when called later. If unspecified, each model can be called by its estimator name.
error (str) – One of ‘ignore’,’raise’,’warn’; default ‘raise’. What to do with the error if a given model fails. ‘warn’ prints a warning that the model could not be evaluated.
**cvkwargs – Passed to the cross_validate() method.

Returns:

Self

>>> models = ('mlr','mlp','lightgbm')
>>> f.tune_test_forecast(models,dynamic_testing=False,feature_importance=True)

validate_regressor_names(): Validates that all regressor names exist in both current_xregs and future_xregs. Raises an error if this is not the case.