MVForecaster
This object can be used to extend the univariate/exogenous regressor approach from the Forecaster class to make forecasts with multiple series that are all predicted forward dynamically using each other’s lags, seasonality, and any other exogenous regressors. This object is initiated by combining several Forecaster objects together. This approach can utilize any sklearn regressor model to make forecasts. All models can be dynamically tuned and tested.
from scalecast.Forecaster import Forecaster
from scalecast.MVForecaster import MVForecaster
from scalecast.SeriesTransformer import SeriesTransformer
import pandas as pd
import pandas_datareader as pdr # pip install pandas-datareader
data = pd.read_csv('data.csv') # df with 3 cols - Date, Series1, Series2
f1 = Forecaster(
y = data['Series1'],
current_dates = data['Date'],
future_dates = 24,
)
f2 = Forecaster(
y = data['Series2'],
current_dates = data['Date'],
future_dates = 24,
)
# before feeding the Forecaster objects to the MVForecaster object, you may want to add seasonal and other regressors
# regressors added to one Forecaster object are carried into the MVForecaster object and used to forecast both series
# initiate the MVForecaster object
mvf = MVForecaster(
f1,
f2,
# add more Forecaster objects here
# defaults below
not_same_len_action='trim',
merge_Xvars='union',
merge_future_dates='longest',
test_length = 0,
cis = False,
metrics = ['rmse','mape','mae','r2'],
# specify names if you want them
names=['My First Series', 'My Second Series'],
)
- class scalecast.MVForecaster.MVForecaster(*fs: Forecaster, names: str | None = None, not_same_len_action: Literal['trim', 'fail'] = 'trim', merge_Xvars: Literal['u', 'union', 'intersection', 'i'] = 'union', merge_future_dates: Literal['longest', 'shortest'] = 'longest', test_length: NonNegativeInt = 0, validation_length: NonNegativeInt = 1, metrics: list[str] | None = None, optimize_on: Literal['mean', 'min', 'max'] | callable | SeriesName = 'mean', cis: bool = False, carry_fit_models: bool = False)
MVForecaster is a class for forecasting multiple series at once with the same models and hyperparameters.
- Parameters:
*fs (Forecaster) – Forecaster objects
names (list-like) – Optional. An array with the same number of elements as *fs that can be used to map to each series. Ex. if names == [‘UTUR’, ‘UNRATE’], the user must refer to the series with those names. If names are not supplied, refer to the series as y1, y2, etc. The order in which the series are supplied is maintained.
not_same_len_action (str) – One of ‘trim’, ‘fail’. Default ‘trim’. What to do with series that are different lengths. ‘trim’ will trim each series so that all dates line up.
merge_Xvars (str) – One of ‘union’, ‘u’, ‘intersection’, ‘i’. Default ‘union’. How to combine Xvars in each object. ‘union’ or ‘u’ combines all regressors from each object. ‘intersection’ or ‘i’ combines only regressors that all objects have in common.
merge_future_dates (str) – One of ‘longest’, ‘shortest’. Default ‘longest’. Which future dates to use in the various series. This can be changed later.
test_length (int or float) – Default 0. The length of the test set used to evaluate all models out of sample. If float, must be between 0 and 1 and will be treated as a fractional split. By default, models will not be tested.
validation_length (int) – The size of the validation set. Default sets to 1.
metrics (list[str]) – Optional. List of metrics to evaluate every model.
optimize_on (str) – The way to aggregate the derived metrics when optimizing models across all series. This can be one of the built-in functions ‘mean’, ‘min’, or ‘max’; a custom function that takes a list of metric values and returns a single aggregated value (such as a weighted average); or a series name. Custom functions and weighted averages can also be added later by calling mvf.set_optimize_on().
cis (bool) – Default False. Whether to evaluate probabilistic confidence intervals for every model evaluated. If setting to True, ensure you also set a test_length of at least 20 observations for 95% confidence intervals. See eval_cis() and set_cilevel() methods and docstrings for more information.
carry_fit_models (bool) – Default False. Whether to store the regression model for each fitted model in history. Setting this to False can save memory.
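The two merge_Xvars options described above can be pictured as set operations on regressor names. The following is only an illustrative sketch, not the library's code: the function `merge_xvar_names` and the regressor names are hypothetical.

```python
# Hypothetical regressor names held by two Forecaster objects.
xvars_f1 = {"t", "monthsin", "monthcos"}
xvars_f2 = {"t", "quarter"}


def merge_xvar_names(xvar_sets, how="union"):
    """Combine regressor names following the documented merge_Xvars options."""
    if how in ("union", "u"):
        merged = set().union(*xvar_sets)
    elif how in ("intersection", "i"):
        merged = set.intersection(*xvar_sets)
    else:
        raise ValueError(f"unknown merge option: {how!r}")
    return sorted(merged)


union_names = merge_xvar_names([xvars_f1, xvars_f2], how="union")
shared_names = merge_xvar_names([xvars_f1, xvars_f2], how="i")
print(union_names)   # every regressor from either object
print(shared_names)  # only regressors both objects share
```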
- __init__(*fs: Forecaster, names: str | None = None, not_same_len_action: Literal['trim', 'fail'] = 'trim', merge_Xvars: Literal['u', 'union', 'intersection', 'i'] = 'union', merge_future_dates: Literal['longest', 'shortest'] = 'longest', test_length: NonNegativeInt = 0, validation_length: NonNegativeInt = 1, metrics: list[str] | None = None, optimize_on: Literal['mean', 'min', 'max'] | callable | SeriesName = 'mean', cis: bool = False, carry_fit_models: bool = False)
Methods:
add_combo_regressors(*args[, sep]) – Combines all passed variables by multiplying their values together.
add_covid19_regressor([called, start, end]) – Adds a dummy variable that is 1 during the time period that COVID19 effects are present for the series, 0 otherwise.
add_cycle(cycle_length[, fourier_order, called]) – Adds a regressor that acts as a seasonal cycle.
add_exp_terms(*args, pwr[, sep, cutoff, drop]) – Raises all passed variables (no AR terms) to exponential powers (ints or floats).
add_lagged_terms(*args[, lags, upto, sep, drop]) – Lags all passed variables (no AR terms) 1 or more times.
add_logged_terms(*args[, base, sep, drop]) – Logs all passed variables (no AR terms).
add_normalizer(called, imported_normalizer) – Add a normalizer to be available for forecasting.
add_optimizer_func(func[, called]) – Add an optimizer function that can be used to determine the best-performing model.
add_other_regressor(called, start, end) – Adds a dummy variable that is 1 during the specified time period, 0 otherwise.
add_poly_terms(*args[, pwr, sep]) – Raises all passed variables (no AR terms) to exponential powers (ints only).
add_pt_terms(*args[, method, sep, drop]) – Applies a box-cox or yeo-johnson power transformation to all passed variables (no AR terms).
add_seasonal_regressors(*args[, raw, ...]) – Adds seasonal regressors.
add_series(series, called[, first_date, pad]) – Adds other series to the object as regressors.
add_signals(model_nicknames[, series, ...]) – Adds the predictions from already-evaluated models as covariates that can be used for future evaluated models.
add_sklearn_estimator(imported_module, called) – Adds a new estimator from scikit-learn not built-in to the forecaster object that can be called using set_estimator().
add_time_trend([called]) – Adds a time trend from 1 to length of the series + the forecast horizon as a current and future Xvar.
auto_forecast([call_me, test_model, ...]) – Auto forecasts with the best parameters indicated from the tuning process.
chop_from_front(n[, fcst_length]) – Cuts the amount of y observations in the object from the front counting backwards.
copy() – Creates an object copy.
corr([train_only, disp, df]) – Displays Pearson correlation between all stored series in object.
corr_lags([y, x, lags]) – Displays Pearson correlation between one series and another series' lags.
cross_validate([k, test_length, ...]) – Tunes a model's hyperparameters using time-series cross validation.
determine_if_MVForecaster() – Determines if the object is a Forecaster or MVForecaster type by checking if the y attribute is a dictionary (MVForecaster) or a Series (Forecaster).
drop_Xvars(*args[, raise_error]) – Drops regressors.
drop_all_Xvars() – Drops all regressors.
drop_regressors(*args[, raise_error]) – Drops regressors.
eval_cis([mode, cilevel]) – Call this function to change whether or not the Forecaster sets confidence intervals on all evaluated models.
export([dfs, models, series, cis, to_excel, ...]) – Exports 1-all of 3 pandas dataframes.
export_fitted_vals([series, models]) – Exports a dataframe of fitted values and actuals.
export_validation_grid(model) – Exports the validation grid from a model, converted to a pandas dataframe.
fit(**fit_params) – Fits the model assigned to self.call_estimator.
Generates a certain amount of future dates in same frequency as current_dates.
Returns the highest lag order variable stored in the object.
ingest_Xvars_df(df[, date_col, drop_first, ...]) – Ingests a dataframe of regressors and saves its Xvars to the object.
ingest_grid(grid) – Ingests a grid to tune the estimator.
init_estimator([dynamic_testing]) – Initiates the estimator to be used for forecasting by creating an instance of the model's interpreted_model class and assigning it to self.call_estimator.
Cuts y observations in the object by counting back from the beginning.
limit_grid_size(n[, min_grid_size, random_seed]) – Makes a grid smaller randomly.
Returns a list of all stored autoregressive (AR) terms.
lookup_normalizer([normalizer]) – Returns the normalizing object (i.e. StandardScaler) with fit/transform methods.
manual_forecast([call_me, test_model, ...]) – Manually forecasts with the hyperparameters, Xvars, and normalizer selection passed as keywords.
Returns the number of actual observations in the object.
order_fcsts([models, determine_best_by]) – Gets estimated forecasts ordered from best-to-worst.
parse_determine_best_by(determine_best_by) – Returns the metric to determine the best model by based on the DetermineBestBy object created in set_metrics().
parse_labeled_metrics(labeled_metrics) – Parses a dictionary of EvaluatedMetric objects and returns a dictionary of model nicknames and their corresponding scores ordered from best to worst based on the store attribute of the EvaluatedMetric objects.
plot([models, series, put_best_on_top, ci, ...]) – Plots all forecasts with the actuals, or just actuals if no forecasts have been evaluated or are selected.
plot_fitted([models, series, ax, figsize, ...]) – Plots fitted values with the actuals.
plot_test_set([models, series, ...]) – Plots all test-set predictions with the actuals.
pop(*args) – Deletes evaluated forecasts from the object's memory.
predict(**predict_params) – Predicts with the model assigned to self.call_estimator.
predict_fitted_vals(**predict_params) – Returns the fitted values for the training data with the model assigned to self.call_estimator.
set_best_model([model, determine_best_by]) – Sets the best model to be referenced as "best".
set_cilevel(n) – Sets the level for the resulting confidence intervals (95% default).
set_estimator(estimator) – Sets the estimator to forecast with.
set_grids_file([name]) – Sets the name of the file where the object will look automatically for grids when calling tune(), cross_validate(), tune_test_forecast(), or similar function.
set_last_future_date(date) – Generates future dates in the same frequency as current_dates that ends on a specified date.
set_metrics(metrics[, keep_existing]) – Set or change the evaluated metrics for all model testing and validation.
set_optimize_on(how) – Choose how to determine best models by choosing which series should be optimized or the aggregate function to apply on the derived metrics across all series.
set_test_length([n]) – Sets the length of the test set.
Sets the length of the validation set.
set_validation_metric(metric) – Sets the metric that will be used to tune all subsequent models.
test([dynamic_testing, call_me]) – Tests the forecast estimator out-of-sample.
transfer_cis(transfer_from, model[, ...]) – Transfers the confidence intervals from a model forecast in a passed Forecaster or MVForecaster object.
transfer_predict(transfer_from, model[, ...]) – Makes predictions using an already-trained model over any given forecast horizon.
tune([dynamic_tuning, set_aside_test_set]) – Tunes the specified estimator using an ingested grid (ingests a grid from Grids.py with same name as the estimator by default).
tune_test_forecast(models[, cross_validate, ...]) – Iterates through a list of models, tunes them using grids in a grids file, and forecasts them.
- add_combo_regressors(*args: Annotated[str, "must exist as a key in the object's current_xreg attribute"], sep: str = '_') Self
Combines all passed variables by multiplying their values together.
- Parameters:
*args (str) – Names of Xvars that already exist in the object.
sep (str) – Default ‘_’. The separator between each term in args to create the final variable name.
- Returns:
Self
>>> f.add_combo_regressors('t','monthsin') # multiplies these two together (called 't_monthsin')
>>> f.add_combo_regressors('t','monthcos') # multiplies these two together (called 't_monthcos')
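As a rough picture of what "multiplying their values together" means, the sketch below builds a combined regressor from plain lists. It is an assumption-laden illustration: `combine_regressors` and the sample values are hypothetical, and only the naming pattern ('t_monthsin') comes from the example above.

```python
from math import prod


def combine_regressors(regressors, sep="_"):
    """Multiply same-length regressors element-wise and join their names with sep."""
    name = sep.join(regressors)
    values = [prod(row) for row in zip(*regressors.values())]
    return name, values


# Hypothetical values for a time trend and a seasonal sine term.
name, values = combine_regressors(
    {"t": [1, 2, 3, 4], "monthsin": [0.5, 0.0, -0.5, 1.0]}
)
print(name, values)
```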
- add_covid19_regressor(called: str = 'COVID19', start: date | datetime | Timestamp | datetime64 | str = datetime.datetime(2020, 3, 15, 0, 0), end: date | datetime | Timestamp | datetime64 | str = datetime.datetime(2021, 5, 13, 0, 0)) Self
Adds a dummy variable that is 1 during the time period that COVID19 effects are present for the series, 0 otherwise. The default dates are chosen to cover the time span when the economy was most impacted by COVID.
- Parameters:
called (str) – Default ‘COVID19’. What to call the resulting variable.
start (str, datetime.datetime, or pd.Timestamp) – Default datetime.datetime(2020,3,15). The start date (default is day Walt Disney World closed in the U.S.). Must be parsable by pandas’ Timestamp function.
end (str, datetime.datetime, or pd.Timestamp) – Default datetime.datetime(2021,5,13). The end date (default is day the U.S. CDC first dropped the mask mandate/recommendation for vaccinated people). Must be parsable by pandas’ Timestamp function.
- Returns:
Self
- add_cycle(cycle_length: Annotated[int, 'must be > 0'], fourier_order: float = 2.0, called: str | None = None) Self
Adds a regressor that acts as a seasonal cycle. Use this function to capture non-normal seasonality.
- Parameters:
cycle_length (int) – How many time steps make one complete cycle.
fourier_order (float) – Default 2.0. The fourier order to apply. This number is the number of complete cycles in that given seasonal period. 2 captures the fundamental frequency and its first harmonic. Higher orders will capture more complex seasonality, but may lead to overfitting.
called (str) – Optional. What to call the resulting variable. Two variables will be created, one for a sin transformation and the other for cos. The resulting variable names will have “sin” or “cos” at the end. For example, called = ‘cycle5’ will become ‘cycle5sin’, ‘cycle5cos’. If left unspecified, ‘cycle{cycle_length}’ will be used as the name.
- Returns:
Self
>>> f.add_cycle(13) # adds a seasonal effect that cycles every 13 observations (variables called 'cycle13sin' and 'cycle13cos')
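To make the sin/cos pair concrete, here is a minimal sketch that generates one sine and one cosine term repeating every cycle_length steps. The exact formula the library applies (including how fourier_order enters) is not stated here, so this assumes the plain fundamental-frequency form and leaves fourier_order out; `cycle_terms` is a hypothetical helper.

```python
import math


def cycle_terms(n_obs, cycle_length, called=None):
    """Build sin/cos regressors that complete one cycle every cycle_length steps."""
    called = called or f"cycle{cycle_length}"
    steps = range(1, n_obs + 1)
    angle = 2 * math.pi / cycle_length  # assumption: fundamental frequency only
    return {
        f"{called}sin": [math.sin(angle * t) for t in steps],
        f"{called}cos": [math.cos(angle * t) for t in steps],
    }


terms = cycle_terms(n_obs=26, cycle_length=13)
print(sorted(terms))
```

Using both a sine and a cosine term lets a linear model fit a cycle with any phase, which a single term could not do.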
- add_exp_terms(*args: Annotated[str, "must exist as a key in the object's current_xreg attribute"], pwr: float, sep: str = '^', cutoff: Annotated[int, 'must be >= 0'] = 2, drop: bool = False) Self
Raises all passed variables (no AR terms) to exponential powers (ints or floats).
- Parameters:
*args (str) – Names of Xvars that already exist in the object.
pwr (float) – The power to raise each term to in args. Can use values like 0.5 to perform square roots, etc.
sep (str) – Default ‘^’. The separator between each term in args to create the final variable name.
cutoff (int) – Default 2. The number of decimal places to which the passed pwr is rounded in the resulting variable name. For instance, if pwr = 0.33333333333 and ‘t’ is passed as an arg to *args, the resulting name will be t^0.33 by default.
drop (bool) – Default False. Whether to drop the regressors passed to *args.
- Returns:
Self
>>> f.add_exp_terms('t',pwr=.5) # adds square root t called 't^0.5'
- add_lagged_terms(*args: Annotated[str, "must exist as a key in the object's current_xreg attribute"], lags: Annotated[int, 'must be > 0'] = 1, upto: bool = True, sep: str = '_', drop: bool = False) Self
Lags all passed variables (no AR terms) 1 or more times.
- Parameters:
*args (str) – Names of Xvars that already exist in the object.
lags (int) – Greater than 0, default 1. The number of times to lag each passed variable.
upto (bool) – Default True. Whether to add all lags up to the number passed to lags. If you pass 6 to lags and upto is True, lags 1, 2, 3, 4, 5, 6 will all be added. If you pass 6 to lags and upto is False, lag 6 only will be added.
sep (str) – Default ‘_’. The separator between each term in args to create the final variable name. Resulting variable names will be like “tlag_1” or “tlag_2” by default.
drop (bool) – Default False. Whether to drop the regressors passed to *args.
- Returns:
Self
>>> f.add_lagged_terms('t',lags=3) # adds first, second, and third lag of t called 'tlag_1' - 'tlag_3'
>>> f.add_lagged_terms('t',lags=6,upto=False) # adds 6th lag of t only called 'tlag_6'
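The interaction between lags and upto can be sketched with plain lists. This is an illustration only: `lagged_terms` is a hypothetical helper, and filling the first k positions with None is an assumption, since how the library treats the missing leading values is not described here. The variable names follow the documented “tlag_1” pattern.

```python
def lagged_terms(name, values, lags=1, upto=True, sep="_"):
    """Return lagged copies of values keyed by names like 'tlag_1'."""
    orders = range(1, lags + 1) if upto else [lags]
    return {
        # Shift right by k; the first k positions have no lagged value.
        f"{name}lag{sep}{k}": ([None] * k + list(values))[: len(values)]
        for k in orders
    }


t = [1, 2, 3, 4, 5, 6, 7, 8]
all_three = lagged_terms("t", t, lags=3)             # lags 1, 2, and 3
only_sixth = lagged_terms("t", t, lags=6, upto=False)  # lag 6 only
print(list(all_three), list(only_sixth))
```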
- add_logged_terms(*args: Annotated[str, "must exist as a key in the object's current_xreg attribute"], base: float = 2.718281828459045, sep: str = '', drop: bool = False) Self
Logs all passed variables (no AR terms).
- Parameters:
*args (str) – Names of Xvars that already exist in the object.
base (float) – Default math.e (natural log). The log base. Must be math.e or int greater than 1.
sep (str) – Default ‘’. The separator between each term in args to create the final variable name. Resulting variable names will be like “log2t” or “lnt” by default.
drop (bool) – Default False. Whether to drop the regressors passed to *args.
- Returns:
Self
>>> f.add_logged_terms('t') # adds natural log t called 'lnt'
- add_normalizer(called: str, imported_normalizer: NormalizerLike) Self
Add a normalizer to be available for forecasting.
- Parameters:
called (str) – The name of the normalizer that can be referenced when looking up normalizers.
imported_normalizer (NormalizerLike) – The object that can be used for normalizing/scaling.
- Returns:
Self
- add_optimizer_func(func: callable, called: str | None = None) Self
Add an optimizer function that can be used to determine the best-performing model. This is in addition to the ‘mean’, ‘min’, and ‘max’ functions that are available by default.
- Parameters:
func (Function) – The function to add.
called (str) – Optional. How to refer to the function when calling set_optimize_on(). If left unspecified, will use the name of the function.
- Returns:
Self
>>> def weighted(x):
>>>     # weighted average of first two series in the object
>>>     return x[0]*.25 + x[1]*.75
>>> mvf.add_optimizer_func(weighted)
>>> mvf.set_optimize_on('weighted') # optimize on that function
>>> mvf.set_estimator('mlr')
>>> mvf.tune() # best model now chosen based on the weighted average function you added; series2 gets 3x the weight of series 1
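The effect of the aggregation choice can be seen with a small stand-alone sketch. The scores below are made-up validation errors for two series under three hyperparameter combinations, `best_combo` is a hypothetical helper, and the selection rule assumes a lower metric is better (as with an error metric such as RMSE).

```python
from statistics import mean


def weighted(x):
    # Weighted average of the first two series: series 2 gets 3x the weight.
    return x[0] * 0.25 + x[1] * 0.75


aggregators = {"mean": mean, "min": min, "max": max, "weighted": weighted}

# Hypothetical per-series validation errors: [series 1, series 2].
scores = {
    "combo_a": [1.0, 4.0],
    "combo_b": [2.0, 2.0],
    "combo_c": [3.0, 1.5],
}


def best_combo(scores, how):
    """Pick the combo with the lowest aggregated error."""
    aggregate = aggregators[how]
    return min(scores, key=lambda combo: aggregate(scores[combo]))


for how in aggregators:
    print(how, best_combo(scores, how))
```

Each aggregator can select a different winner from the same scores, which is why the choice matters when the series are of unequal importance.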
- add_other_regressor(called: str, start: date | datetime | Timestamp | datetime64 | str, end: date | datetime | Timestamp | datetime64 | str) Self
Adds a dummy variable that is 1 during the specified time period, 0 otherwise.
- Parameters:
called (str) – What to call the resulting variable.
start (str, datetime.datetime, or pd.Timestamp) – Start date. Must be parsable by pandas’ Timestamp function.
end (str, datetime.datetime, or pd.Timestamp) – End date. Must be parsable by pandas’ Timestamp function.
- Returns:
Self
>>> f.add_other_regressor('january_2021','2021-01-01','2021-01-31')
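A dummy variable of this kind is simply a 0/1 flag per date. The sketch below is illustrative only: `dummy_regressor` is a hypothetical helper, and treating both start and end as inclusive is an assumption.

```python
from datetime import date, timedelta


def dummy_regressor(dates, start, end):
    """Return 1 for dates inside [start, end] (assumed inclusive), else 0."""
    return [1 if start <= d <= end else 0 for d in dates]


# Five daily dates spanning the turn of the year: Dec 30 through Jan 3.
dates = [date(2020, 12, 30) + timedelta(days=i) for i in range(5)]
flags = dummy_regressor(dates, date(2021, 1, 1), date(2021, 1, 31))
print(flags)
```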
- add_poly_terms(*args: Annotated[str, "must exist as a key in the object's current_xreg attribute"], pwr: Annotated[int, 'must be >= 0'] = 2, sep: str = '^') Self
Raises all passed variables (no AR terms) to exponential powers (ints only).
- Parameters:
*args (str) – Names of Xvars that already exist in the object.
pwr (int) – Default 2. The max power to add to each term in args (2 to this number will be added).
sep (str) – Default ‘^’. The separator between each term in args to create the final variable name.
- Returns:
Self
>>> f.add_poly_terms('t','year',pwr=3) # raises t and year to 2nd and 3rd powers (called 't^2', 't^3', 'year^2', 'year^3')
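The "2 to this number" rule for pwr can be sketched as follows. `poly_terms` is a hypothetical helper operating on a plain list; only the resulting names ('t^2', 't^3') are taken from the example above.

```python
def poly_terms(name, values, pwr=2, sep="^"):
    """Create one term per integer power from 2 up to pwr."""
    return {
        f"{name}{sep}{p}": [v ** p for v in values]
        for p in range(2, pwr + 1)
    }


terms = poly_terms("t", [1, 2, 3], pwr=3)
print(terms)
```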
- add_pt_terms(*args: Annotated[str, "must exist as a key in the object's current_xreg attribute"], method: Literal['box-cox', 'yeo-johnson'] = 'box-cox', sep: str = '_', drop: bool = False) Self
Applies a box-cox or yeo-johnson power transformation to all passed variables (no AR terms).
- Parameters:
*args (str) – Names of Xvars that already exist in the object.
method (str) – One of {‘box-cox’, ‘yeo-johnson’}, default ‘box-cox’. The type of transformation. box-cox works for positive values only. yeo-johnson is like a box-cox but can be used with 0s or negatives. https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PowerTransformer.html.
sep (str) – Default ‘_’. The separator between each term in args to create the final variable name. Resulting variable names will be like “box-cox_t” or “yeo-johnson_t” by default.
drop (bool) – Default False. Whether to drop the regressors passed to *args.
- Returns:
Self
>>> f.add_pt_terms('t') # adds box cox of t called 'box-cox_t'
- add_seasonal_regressors(*args: str, raw: bool = True, sincos: bool = False, dummy: bool = False, drop_first: bool = False, cycle_lens: dict[str, int] = None, fourier_order: float = 2.0) Self
Adds seasonal regressors. Can be in the form of Fourier transformed, dummy, or integer values.
- Parameters:
*args (str) – Values that return a series of int type from pandas.dt or pandas.dt.isocalendar(). See https://pandas.pydata.org/docs/reference/api/pandas.Series.dt.year.html.
raw (bool) – Default True. Whether to use the raw integer values.
sincos (bool) – Default False. Whether to use a Fourier transformation of the raw integer values. The length of the cycle is derived from the max observed value unless cycle_lens is specified.
dummy (bool) – Default False. Whether to use dummy variables from the raw int values.
drop_first (bool) – Default False. Whether to drop the first observed dummy level. Not relevant when dummy = False.
cycle_lens (dict) – Optional. A dictionary that specifies a cycle length for each selected seasonality. Each key should match a value passed to *args. If this is not specified or a selected seasonality is not added to the dictionary as a key, the cycle length will be selected automatically as the maximum value observed for the given seasonality. Not relevant when sincos = False.
fourier_order (float) – Default 2.0. The fourier order to apply to terms that are added using sincos = True. This number is the number of complete cycles in that given seasonal period. 2 captures the fundamental frequency and its first harmonic. Higher orders will capture more complex seasonality, but may lead to overfitting.
- Returns:
Self
>>> f.add_seasonal_regressors('year')
>>> f.add_seasonal_regressors(
>>>     'dayofyear',
>>>     'month',
>>>     'week',
>>>     'quarter',
>>>     raw=False,
>>>     sincos=True,
>>>     cycle_lens={'dayofyear':365.25},
>>> )
>>> f.add_seasonal_regressors('dayofweek',raw=False,dummy=True,drop_first=True)
- add_series(series: Sequence[float | int], called: str, first_date: date | datetime | Timestamp | datetime64 | str | None = None, pad: bool = True) Self
Adds other series to the object as regressors. If the added series is shorter than len(Forecaster.y) + len(Forecaster.future_dates), it will be padded with 0s by default.
- Parameters:
series (list-like) – The series to add as a regressor to the object.
called (str) – Required. What to call the resulting regressor in the Forecaster object.
first_date (Datetime) – Optional. The first date that corresponds with the added series. If left unspecified, will assume its first date is the same as the first date in the Forecaster object. Must be datetime or otherwise able to be parsed by the pandas.Timestamp() function.
pad (bool) – Default True. Whether to put 0s before and/or after the series if the series is too short.
>>> x = [1,2,3,4,5,6]
>>> f.add_series(series = x,called='x') # assumes first date is same as what is in f.current_dates
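The padding behavior can be pictured with a short sketch. It is an assumption-based illustration: `pad_series` is a hypothetical helper, and `offset` stands in for the number of periods between the object's first date and first_date, since the library's date alignment is not shown here.

```python
def pad_series(series, total_len, offset=0):
    """Place series at position offset and fill the rest of total_len with 0s."""
    padded = ([0] * offset + list(series))[:total_len]
    return padded + [0] * (total_len - len(padded))


x = [1, 2, 3, 4, 5, 6]
# total_len plays the role of len(y) + len(future_dates).
same_start = pad_series(x, total_len=10)
late_start = pad_series(x, total_len=10, offset=2)
print(same_start)
print(late_start)
```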
- add_signals(model_nicknames: list[Annotated[str, "must exist as a key in object's history attribute"]], series: Annotated[str, "must exist in object's names attribute"] | Literal['all'] = 'all', fill_strategy: Literal['actuals', 'bfill'] | None = 'actuals', train_only: bool = False) Self
Adds the predictions from already-evaluated models as covariates that can be used for future evaluated models. The names of the added variables will all begin with “signal_” and end with the given model nickname followed by the series name.
- Parameters:
model_nicknames (list) – The names of already-evaluated models with information stored in the history attribute.
fill_strategy (str or None) – The strategy to fill NA values that are present at the beginning of a given model’s fitted values. Available options are: ‘actuals’ (default) which will replace nulls with actuals; ‘bfill’ which will backfill null values; or None which will leave null values alone, which can cause errors in future evaluated models.
train_only (bool) – Default False. Whether to add fitted values from the training set only. The test-set predictions will be out-of-sample if this is True. The future unknown values are always out-of-sample. Even when this is True, the future unknown values are taken from a model trained on the full set of known observations.
- Returns:
Self
>>> mvf.set_estimator('xgboost')
>>> mvf.manual_forecast()
>>> mvf.add_signals(model_nicknames = ['xgboost']) # adds regressors called 'signal_xgboost_{series1name}', ..., 'signal_xgboost_{seriesNname}'
- add_sklearn_estimator(imported_module: ScikitLike, called: str) Self
Adds a new estimator from scikit-learn not built-in to the forecaster object that can be called using set_estimator(). Only regression models are accepted.
- Parameters:
imported_module (scikit-learn regression model) – The model from scikit-learn to add. Must have already been imported locally. Supports models from sklearn and sklearn APIs.
called (str) – The name of the estimator that can be called using set_estimator().
mv (bool) – Whether the add is for Multivariate forecasting.
- Returns:
Self
>>> from sklearn.ensemble import StackingRegressor
>>> f.add_sklearn_estimator(StackingRegressor,called='stacking')
>>> f.set_estimator('stacking')
>>> f.manual_forecast(...)
- add_time_trend(called: str = 't') Self
Adds a time trend from 1 to length of the series + the forecast horizon as a current and future Xvar.
- Parameters:
called (str) – Default ‘t’. What to call the resulting variable.
- Returns:
Self
>>> f.add_time_trend() # adds time trend called 't'
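Because the trend runs through the forecast horizon, it splits naturally into a current part and a future part. A minimal sketch, with `time_trend` as a hypothetical helper:

```python
def time_trend(n_obs, horizon):
    """Count from 1 to n_obs + horizon and split into current and future values."""
    t = list(range(1, n_obs + horizon + 1))
    return t[:n_obs], t[n_obs:]


current_t, future_t = time_trend(n_obs=5, horizon=3)
print(current_t, future_t)
```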
- auto_forecast(call_me: str | None = None, test_model: bool = True, predict_fitted: bool = True, dynamic_testing: bool | Annotated[int, 'must be > 0'] = True) list[float]
Auto forecasts with the best parameters indicated from the tuning process.
- Parameters:
call_me (str) – Optional. What to call the model when storing it in the object’s history dictionary. If not specified, the model’s nickname will be assigned the estimator value (‘mlp’ will be ‘mlp’, etc.). Duplicated names will be overwritten with the most recently called model.
test_model (bool) – Default True. Whether to test the model before forecasting to a future horizon. If test_length is 0, this is ignored. Set this to False if you tested the model manually by calling f.test() and don’t want to waste resources testing it again.
dynamic_testing (bool or int) – Default True. Whether to dynamically/recursively test the forecast (meaning AR terms will be propagated with predicted values). If True, evaluates dynamically over the entire out-of-sample slice of data. If int, window evaluates over that many steps (2 for 2-step dynamic forecasting, 12 for 12-step, etc.). Setting this to False or 1 means faster performance, but gives a weaker indication of how well the forecast will perform more than one period out. The model will skip testing if the test_length attribute is set to 0.
predict_fitted (bool) – Whether to predict fitted values.
- Returns:
The final point estimates.
- Return type:
(list[float])
>>> f.set_estimator('xgboost')
>>> f.tune()
>>> f.auto_forecast()
- chop_from_front(n: Annotated[int, 'must be >= 0'], fcst_length: Annotated[int, 'must be >= 0'] | None = None) Self
Cuts the amount of y observations in the object from the front counting backwards. The current length of the forecast horizon will be maintained and all future regressors will be rewritten to the appropriate attributes.
- Parameters:
n (int) – The number of observations to cut from the front.
fcst_length (int) – Optional. The new length of the forecast horizon. By default, maintains the same forecast length currently in the object.
- Returns:
Self
>>> mvf.chop_from_front(10) # keeps all observations before the last 10
- copy()
Creates an object copy.
- Returns:
A copy of the object.
- Return type:
Self
- corr(train_only: bool = False, disp: Literal['matrix', 'heatmap'] = 'matrix', df: DataFrame | None = None, **kwargs: Any) DataFrame | Figure
Displays Pearson correlation between all stored series in object.
- Parameters:
train_only (bool) – Default False. Whether to only include the training set (to avoid leakage).
disp (str) – One of {‘matrix’,’heatmap’}. Default ‘matrix’. How to display results.
df (DataFrame) – Optional. A dataframe to display correlation for. If not specified, a dataframe will be created using all series with no lags. This argument exists to make the corr_lags() method work and it is not recommended to use it manually.
**kwargs – Passed to seaborn.heatmap() function and are ignored if disp == ‘matrix’.
- Returns:
The created dataframe if disp == ‘matrix’ else the heatmap fig.
- Return type:
(DataFrame or Figure)
- corr_lags(y: Annotated[str, "must exist in object's names attribute"] | None = None, x: Annotated[str, "must exist in object's names attribute"] | None = None, lags: Annotated[int, 'must be > 0'] = 1, **kwargs: Any) DataFrame
Displays Pearson correlation between one series and another series’ lags.
- Parameters:
y (str) – The series to display as the dependent series. Default will take the first loaded series in the object.
x (str) – The series to display as the independent series. Default will take the second loaded series in the object.
lags (int) – Default 1. The number of lags to display in the independent series.
**kwargs – Passed to the MVForecaster.corr() method. Will not pass the df arg.
- Returns:
The created dataframe if disp == ‘matrix’ else the heatmap fig.
- Return type:
(DataFrame or Figure)
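What a lagged correlation measures can be shown without the library. The sketch below computes Pearson correlation by hand and pairs a dependent series with lags of an independent one; `pearson`, `lag_correlations`, the key names, and the data are all hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def lag_correlations(y, x, lags=1):
    """Correlate y at time t with x at time t - k for k = 1..lags."""
    return {f"lag_{k}": pearson(y[k:], x[:-k]) for k in range(1, lags + 1)}


x = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
y = [0.0] + [2 * v for v in x[:-1]]  # y follows x by exactly one period
result = lag_correlations(y, x, lags=2)
print(result)
```

A strong correlation at a given lag suggests that lag of the other series may be a useful predictor.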
- cross_validate(k: Annotated[int, 'must be > 0'] = 5, test_length: int | None = None, train_length: int | None = None, space_between_sets: int | None = None, rolling: bool = False, dynamic_tuning: bool | Annotated[int, 'must be > 0'] = False, set_aside_test_set: bool = True, verbose: bool = False) Self
Tunes a model’s hyperparameters using time-series cross validation. Monitors the metric specified in the validation_metric attribute. Set an estimator before calling. Reads a grid for the estimator from a grids file unless a grid is ingested manually. The chosen parameters are stored in the best_params attribute. All metrics from each iteration are stored in grid_evaluated. The rows in this matrix correspond to the element index in f.grid (a hyperparameter combo) and the columns are the derived metrics across the k folds. Any hyperparameters that ever failed to evaluate will return N/A and are not considered. The best parameter combo is determined by the best average derived metric across all folds. The temporal order of the series is always maintained in this process. If a test_length is specified in the object, it will be set aside by default. (Default) Normal cv diagram: https://scalecast-examples.readthedocs.io/en/latest/misc/validation/validation.html#5-Fold-Time-Series-Cross-Validation. Rolling cv diagram: https://scalecast-examples.readthedocs.io/en/latest/misc/validation/validation.html#5-Fold-Rolling-Time-Series-Cross-Validation.
- Parameters:
k (int) – Default 5. The number of folds. If 1, behaves as if the model were being tuned on a single held out set.
test_length (int) – Optional. The size of each held-out sample. By default, determined such that the last test set and train set are the same size.
train_length (int) – Optional. The size of each training set. By default, all available observations before each test set are used.
space_between_sets (int) – Optional. The space between each training set. By default, uses the test_length.
rolling (bool) – Default False. Whether to use a rolling method, meaning every train and test size is the same. This is ignored when either of train_length or test_length is specified.
dynamic_tuning (bool or int) – Default False. Whether to dynamically/recursively test the forecast during the tuning process (meaning AR terms will be propagated with predicted values). If True, evaluates recursively over the entire out-of-sample slice of data. If int, window evaluates over that many steps (2 for 2-step recursive testing, 12 for 12-step, etc.). Setting this to False or 1 means faster performance, but gives a less-good indication of how well the forecast will perform more than one period out.
set_aside_test_set (bool) – Default True. Whether to separate the test set specified in f.test_length during this process.
verbose (bool) – Default False. Whether to print out information about the test size, train size, and date ranges for each fold.
- Returns:
Self
>>> f.set_estimator('xgboost')
>>> f.cross_validate() # tunes hyperparam values
>>> f.auto_forecast() # forecasts with the best params
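To make the fold layout concrete, the following is a minimal sketch in plain Python of how time-ordered folds could be laid out. It is not scalecast's internal code: the helper name `time_series_folds` and the default `test_length = n_obs // (k + 1)` are assumptions for illustration, chosen so that the earliest fold's train and test sets end up the same size, as described for test_length above. Passing a `train_length` here mimics the rolling layout, in which every train set has the same size.

```python
def time_series_folds(n_obs, k=5, test_length=None, train_length=None,
                      space_between_sets=None):
    """Return k (train_range, test_range) pairs, most recent fold first.

    Hypothetical helper: index ranges only, temporal order always preserved.
    """
    if k < 1:
        raise ValueError("k must be > 0")
    if test_length is None:
        # assumption: size folds so the earliest train and test sets match
        test_length = n_obs // (k + 1)
    if space_between_sets is None:
        space_between_sets = test_length
    folds = []
    for i in range(k):
        test_end = n_obs - i * space_between_sets
        test_start = test_end - test_length
        # expanding window by default; fixed-size window if train_length given
        train_start = 0 if train_length is None else test_start - train_length
        if test_start <= 0 or train_start < 0:
            raise ValueError("not enough observations for this configuration")
        folds.append((range(train_start, test_start), range(test_start, test_end)))
    return folds


# expanding-window layout on 60 observations
for train_idx, test_idx in time_series_folds(60, k=5):
    print(f"train {train_idx.start}-{train_idx.stop - 1}, "
          f"test {test_idx.start}-{test_idx.stop - 1}")

# fixed-size (rolling-style) layout: every train set has 10 observations
rolling_folds = time_series_folds(60, k=5, train_length=10)
```

In every fold the training indices come strictly before the test indices, which is the property that distinguishes time-series cross validation from shuffled k-fold.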
- determine_if_MVForecaster()
Determines if the object is a Forecaster or MVForecaster type by checking if the y attribute is a dictionary (MVForecaster) or a Series (Forecaster).
- Returns:
True if the object is an MVForecaster, False if it is a Forecaster.
- Return type:
bool
- drop_Xvars(*args: Annotated[str, "must exist as a key in the object's current_xreg attribute"], raise_error: bool = True) Self
Drops regressors.
- Parameters:
*args (str) – The names of regressors to drop.
raise_error (bool) – Whether to raise an error if regressors not found. Default raises. False ignores.
- Returns:
Self
>>> f.add_time_trend()
>>> f.add_exp_terms('t',pwr=.5)
>>> f.drop_Xvars('t','t^0.5')
- drop_all_Xvars() Self
Drops all regressors.
- Returns:
Self
- drop_regressors(*args: Annotated[str, "must exist as a key in the object's current_xreg attribute"], raise_error: bool = True)
Drops regressors.
- Parameters:
*args (str) – The names of regressors to drop.
raise_error (bool) – Whether to raise an error if regressors not found. Default raises. False ignores.
- Returns:
Self
>>> f.add_time_trend()
>>> f.add_exp_terms('t',pwr=.5)
>>> f.drop_regressors('t','t^0.5')
- eval_cis(mode: bool = True, cilevel: Annotated[float, 'must be > 0 and < 1'] = 0.95) Self
Call this function to change whether or not the Forecaster sets confidence intervals on all evaluated models. Beginning with version 0.17.0, only conformal confidence intervals are supported. Conformal intervals need a test set to be configured soundly. Confidence intervals cannot be evaluated when there aren’t at least 1/(1-cilevel) observations in the test set.
- Parameters:
mode (bool) – Default True. Whether to set confidence intervals on or off for models.
cilevel (float) – Default .95. Must be greater than 0, less than 1. The confidence level to use to set intervals.
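The 1/(1-cilevel) requirement can be checked before turning intervals on. The sketch below is an illustration only; `min_test_obs_for_cis` and `can_evaluate_cis` are hypothetical helper names, not part of scalecast. It uses `Fraction` so that levels such as .8 do not pick up floating-point error before rounding up.

```python
import math
from fractions import Fraction


def min_test_obs_for_cis(cilevel):
    """Smallest test-set size that satisfies n >= 1 / (1 - cilevel)."""
    if not 0 < cilevel < 1:
        raise ValueError("cilevel must be > 0 and < 1")
    level = Fraction(str(cilevel))  # exact decimal value, e.g. 19/20 for 0.95
    return math.ceil(1 / (1 - level))


def can_evaluate_cis(test_length, cilevel=0.95):
    """True if a test set of this length is large enough for the given level."""
    return test_length >= min_test_obs_for_cis(cilevel)


print(min_test_obs_for_cis(0.95))  # minimum test length for 95% intervals
print(can_evaluate_cis(12, 0.95))  # whether 12 test observations are enough
```

Under this rule, a higher confidence level requires a longer test set, so it can be worth calling set_test_length() with the target cilevel in mind.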
- export(dfs: Literal['model_summaries', 'lvl_test_set_predictions', 'lvl_fcsts'] | list[Literal['model_summaries', 'lvl_test_set_predictions', 'lvl_fcsts']] = ['model_summaries', 'lvl_test_set_predictions', 'lvl_fcsts'], models: Annotated[str, "must exist as a key in object's history attribute"] | list[Annotated[str, "must exist as a key in object's history attribute"]] | Annotated[str, 'must begin with top_ followed by a positive integer'] | Literal['all'] = 'all', series: list[Annotated[str, "must exist in object's names attribute"]] | Annotated[str, "must exist in object's names attribute"] | Literal['all'] = 'all', cis: bool = False, to_excel: bool = False, out_path: PathLike = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/scalecast/checkouts/latest/docs'), excel_name: str = 'results.xlsx') DataFrame | dict[str, DataFrame]
Exports one, several, or all of three pandas dataframes. Can write to excel with each dataframe on a separate sheet. Returns either a dictionary with dataframes as values (df str arguments as keys) or a single dataframe if only one df is specified.
- Parameters:
dfs (list-like or str) – Default [‘model_summaries’, ‘lvl_test_set_predictions’, ‘lvl_fcsts’]. A list or name of the specific dataframe(s) you want returned and/or written to excel. Must be one of or multiple of the elements in default. Exporting test set predictions only works if all exported models were tested using the same test length.
models (list-like or str) – Default ‘all’. The forecasted models to export. Name of the model, ‘all’, or list-like of model names. ‘top_’ and None not supported.
series (list-like or str) – Default ‘all’. The series to export. Name of the series (‘y1…’, ‘series1…’ or user-selected name), ‘all’, or list-like of series names.
to_excel (bool) – Default False. Whether to save to excel.
cis (bool) – Default False. Whether to export confidence intervals for models in “all_fcsts”, “test_set_predictions”, “lvl_test_set_predictions”, “lvl_fcsts” dataframes.
out_path (str) – Default ‘./’. The path to save the excel file to (ignored when to_excel=False).
excel_name (str) – Default ‘results.xlsx’. The name to call the excel file (ignored when to_excel=False).
- Returns:
Either a single pandas dataframe if one element passed to dfs or a dictionary where the keys match what was passed to dfs and the values are dataframes.
- Return type:
(DataFrame or Dict[str,DataFrame])
>>> results = mvf.export(dfs=['model_summaries','lvl_fcsts'],to_excel=True) # returns a dict
>>> model_summaries = results['model_summaries'] # returns a dataframe
>>> lvl_fcsts = results['lvl_fcsts'] # returns a dataframe
>>> ts_preds = mvf.export('lvl_test_set_predictions') # returns a dataframe
- export_fitted_vals(series: list[Annotated[str, "must exist in object's names attribute"]] | Annotated[str, "must exist in object's names attribute"] | Literal['all'] = 'all', models: Annotated[str, "must exist as a key in object's history attribute"] | list[Annotated[str, "must exist as a key in object's history attribute"]] | Annotated[str, 'must begin with top_ followed by a positive integer'] | Literal['all'] = 'all') DataFrame
Exports a dataframe of fitted values and actuals.
- Parameters:
models (list-like or str) – Default ‘all’. Name of the model, ‘all’, or list-like of model names.
series (list-like or str) – Default ‘all’. The series to export. Name of the series (‘y1…’, ‘series1…’ or user-selected name), ‘all’, or list-like of series names.
- Returns:
The fitted values for all selected series and models.
- Return type:
(DataFrame)
- export_validation_grid(model: Annotated[str, "must exist as a key in object's history attribute"]) DataFrame
Exports the validation grid from a model, converted to a pandas dataframe. Raises an error if the model was not tuned.
- Parameters:
model (str) – The name of the model to export for. Matches what was passed to call_me when evaluating the model.
- Returns:
The resulting validation grid of the evaluated model passed to model arg.
- Return type:
(DataFrame)
- fit(**fit_params: Any) Self
Fits the model assigned to self.call_estimator. Called in auto_forecast()/manual_forecast() after init_estimator() creates the instance to fit.
- Parameters:
**fit_params – Any parameters to pass to the fit method of the model instance assigned to self.call_estimator. This can include parameters such as sample_weight, eval_set, early_stopping_rounds, etc. depending on the model being fit.
- Returns:
Self
- generate_future_dates(n: Annotated[int, 'must be > 0']) Self
Generates a certain amount of future dates in same frequency as current_dates.
- Parameters:
n (int) – Greater than 0. Number of future dates to produce. This will also be the forecast length.
- Returns:
Self
>>> f.generate_future_dates(12) # 12 future dates to forecast out to
- get_max_lag_order()
Returns the highest lag order variable stored in the object. Returns 0 if none were found.
- Returns:
The max order found.
- Return type:
int
- ingest_Xvars_df(df: DataFrame, date_col: str = 'Date', drop_first: bool = False, use_future_dates: bool = False, pad: bool = False) Self
Ingests a dataframe of regressors and saves its Xvars to the object. The user must specify a date column name in the dataframe being ingested. All non-numeric values are dummied. The dataframe should cover the entire future horizon stored within the Forecaster object, but can be padded with 0s if testing only is desired. Any columns in the dataframe that begin with “AR” will be confused with autoregressive terms and could cause errors.
- Parameters:
df (DataFrame) – The dataframe that is at least the length of the y array stored in the object plus the forecast horizon.
date_col (str) – Default ‘Date’. The name of the date column in the dataframe. This column must have the same frequency as the dates stored in the Forecaster object.
drop_first (bool) – Default False. Whether to drop the first observation of any dummied variables. Irrelevant if passing all numeric values.
use_future_dates (bool) – Default False. Whether to use the future dates in the dataframe as the resulting future_dates attribute in the Forecaster object.
pad (bool) – Default False. Whether to pad any missing values with 0s.
- Returns:
Self
- ingest_grid(grid: str | dict[str, Any]) Self
Ingests a grid to tune the estimator.
- Parameters:
grid (dict or str) – If dict, must be a user-created grid. If str, must match the name of a dict grid stored in a grids file.
- Returns:
Self
>>> f.set_estimator('mlr')
>>> f.ingest_grid({'normalizer':['scale','minmax']})
- init_estimator(dynamic_testing: bool | Annotated[int, 'must be > 0'] | None = None, **kwargs: Any) Self
Initiates the estimator to be used for forecasting by creating an instance of the model’s interpreted_model class and assigning it to self.call_estimator. This is called in auto_forecast()/manual_forecast() and can be called separately if you want to fit the model manually by calling f.fit() and f.predict().
- Parameters:
dynamic_testing (bool or int) – Optional. Whether to dynamically/recursively test the forecast (meaning AR terms will be propagated with predicted values). If True, evaluates dynamically over the entire out-of-sample slice of data. If int, window evaluates over that many steps (2 for 2-step dynamic forecasting, 12 for 12-step, etc.). Setting this to False or 1 means faster performance, but gives a less-good indication of how well the forecast will perform more than one period out. The model will skip testing if the test_length attribute is set to 0.
**kwargs – Passed to the relevant model class and can include such parameters as Xvars, normalizer, cap, and floor, in addition to any given model’s specific hyperparameters.
- Returns:
Self
- keep_smaller_history(n: Annotated[int, 'must be >= 0']) Self
Cuts y observations in the object, keeping only the most recent ones (counting back from the end of the series).
- Parameters:
n (int, str, or datetime.datetime) – If int, the number of observations to keep. Otherwise, the last observation to keep. Must be parsable by pandas’ Timestamp function.
- Returns:
Self
>>> f.keep_smaller_history(500) # keeps last 500 observations
>>> f.keep_smaller_history('2020-01-01') # keeps only observations on or later than 1/1/2020
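The two ways of specifying n (a count or a cutoff date) can be illustrated with a small standalone sketch. The function `trim_history` below is hypothetical and works on plain lists with `datetime.date` values; it only mirrors the behavior described above and is not how scalecast stores or trims its data.

```python
from datetime import date, timedelta


def trim_history(dates, y, n):
    """Keep the last n observations (int) or everything on/after a cutoff date."""
    if isinstance(n, int):
        keep = min(n, len(y))
    else:
        # assumption: strings are ISO-formatted dates such as '2020-01-08'
        cutoff = date.fromisoformat(n) if isinstance(n, str) else n
        keep = sum(1 for d in dates if d >= cutoff)
    start = len(y) - keep
    return dates[start:], y[start:]


dates = [date(2020, 1, 1) + timedelta(days=i) for i in range(10)]
y = list(range(10))

by_count = trim_history(dates, y, 3)            # last 3 observations
by_date = trim_history(dates, y, "2020-01-08")  # on or after Jan 8, 2020
```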
- limit_grid_size(n: Annotated[int, 'must be > 0'] | Annotated[float, 'must be > 0 and < 1'], min_grid_size: Annotated[int, 'must be > 0'] = 1, random_seed: int | None = None) Self
Makes a grid smaller randomly.
- Parameters:
n (int or float) – If int, randomly selects that many parameter combinations. If float, must be less than 1 and greater than 0, randomly selects that fraction of parameter combinations.
min_grid_size (int) – Default 1. The min number of hyperparameters to keep from the original grid if a float is passed to n.
random_seed (int) – Optional. Set a seed to make results consistent.
- Returns:
Self
>>> from scalecast import GridGenerator
>>> GridGenerator.get_example_grids()
>>> f.set_estimator('mlp')
>>> f.ingest_grid('mlp')
>>> f.limit_grid_size(10,random_seed=20) # limits grid to 10 iterations
>>> f.limit_grid_size(.5,random_seed=20) # limits grid to half its original size
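As a rough picture of what shrinking a grid means, the sketch below expands a dictionary grid into parameter combinations and samples from them. The helpers `expand_grid` and `limit_grid` are hypothetical, and rounding a fractional n down with `int()` is an assumption; the library may round differently.

```python
import itertools
import random


def expand_grid(grid):
    """Turn {'param': [values...]} into a list of parameter-combination dicts."""
    keys = list(grid)
    return [dict(zip(keys, combo)) for combo in itertools.product(*grid.values())]


def limit_grid(combos, n, min_grid_size=1, random_seed=None):
    """Randomly keep n combos (int) or a fraction n of them (float in (0, 1))."""
    rng = random.Random(random_seed)  # local generator keeps results repeatable
    if isinstance(n, float):
        if not 0 < n < 1:
            raise ValueError("a float n must be > 0 and < 1")
        size = max(min_grid_size, int(len(combos) * n))
    else:
        size = n
    size = min(size, len(combos))  # never ask for more than exists
    return rng.sample(combos, size)


grid = {
    "alpha": [0.1, 0.5, 0.9],
    "l1_ratio": [0, 0.5, 1],
    "normalizer": ["scale", "minmax"],
}
combos = expand_grid(grid)                       # 3 * 3 * 2 combinations
ten = limit_grid(combos, 10, random_seed=20)     # fixed count
half = limit_grid(combos, 0.5, random_seed=20)   # fraction of the grid
```

Passing the same `random_seed` yields the same subset on every run, which is what makes tuning results comparable between sessions.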
- list_stored_ar_terms()
Returns a list of all stored autoregressive (AR) terms.
- Returns:
All stored AR terms.
- Return type:
list
- lookup_normalizer(normalizer: Annotated[str, "must exist as a key in the object's normalizer attribute"] = None) NormalizerLike
Returns the normalizing object (e.g. StandardScaler) with fit/transform methods.
- Parameters:
normalizer (str) – Optional. The name of the normalizer in the object’s normalizer attribute. Default returns a function that does nothing.
- Returns:
An object with the fit/transform methods.
- Return type:
NormalizerLike
- manual_forecast(call_me: str | None = None, test_model: bool = True, dynamic_testing: bool | Annotated[int, 'must be > 0'] = True, bank_history: bool = True, predict_fitted: bool = True, **kwargs: Any) list[float]
Manually forecasts with the hyperparameters, Xvars, and normalizer selection passed as keywords. See https://scalecast.readthedocs.io/en/latest/Forecaster/_forecast.html.
- Parameters:
call_me (str) – Optional. What to call the model when storing it in the object’s history. If not specified, the model’s nickname will be assigned the estimator value (‘mlp’ will be ‘mlp’, etc.). Duplicated names will be overwritten with the most recently called model.
test_model (bool) – Default True. Whether to test the model before forecasting to a future horizon. If test_length is 0, this is ignored. Set this to False if you tested the model manually by calling f.test() and don’t want to waste resources testing it again.
dynamic_testing (bool or int) – Default True. Whether to dynamically/recursively test the forecast (meaning AR terms will be propagated with predicted values). If True, evaluates dynamically over the entire out-of-sample slice of data. If int, window evaluates over that many steps (2 for 2-step dynamic forecasting, 12 for 12-step, etc.). Setting this to False or 1 means faster performance, but gives a less-good indication of how well the forecast will perform more than one period out. The model will skip testing if the test_length attribute is set to 0.
predict_fitted (bool) – Whether to predict fitted values.
**kwargs – passed to the _forecast_{estimator}() method and can include such parameters as Xvars, normalizer, cap, and floor, in addition to any given model’s specific hyperparameters. See https://scalecast.readthedocs.io/en/latest/Forecaster/_forecast.html.
- Returns:
The forecasted predictions.
- Return type:
List[float]
>>> f.set_estimator('lasso')
>>> f.manual_forecast(alpha=.5)
- n_actuals()
Returns the number of actual observations in the object.
- order_fcsts(models: Annotated[str, "must exist as a key in object's history attribute"] | None = None, determine_best_by: DetermineBestBy = 'TestSetRMSE') list[str]
Gets estimated forecasts ordered from best-to-worst.
- Parameters:
models (list-like) – Optional. A list of models to consider in the order. Default considers all evaluated models. If not ‘all’, each element must match an evaluated model’s nickname. ‘all’ will only consider models that have a non-null determine_best_by value in history.
determine_best_by (str) – Default ‘TestSetRMSE’. One of Forecaster.determine_best_by.
- Returns:
The ordered models.
- Return type:
(list)
>>> models = ('mlr','mlp','lightgbm')
>>> f.tune_test_forecast(models,dynamic_testing=False,feature_importance=True)
>>> ordered_models = f.order_fcsts(models,"TestSetRMSE")
- parse_determine_best_by(determine_best_by: DetermineBestBy) MetricStore
Returns the metric to determine the best model by based on the DetermineBestBy object created in set_metrics().
- Parameters:
determine_best_by (DetermineBestBy) – The DetermineBestBy object created in set_metrics().
- Returns:
The metric to determine the best model by based on the DetermineBestBy object created in set_metrics().
- Return type:
MetricStore
- parse_labeled_metrics(labeled_metrics: dict[str, EvaluatedMetric]) dict[str, float]
Parses a dictionary of EvaluatedMetric objects and returns a dictionary of model nicknames and their corresponding scores ordered from best to worst based on the store attribute of the EvaluatedMetric objects. If the metric is one where lower is better, the dictionary is ordered in ascending order. If the metric is one where higher is better, the dictionary is ordered in descending order.
- Parameters:
labeled_metrics (dict) – A dictionary where the keys are model nicknames and the values are EvaluatedMetric objects.
- Returns:
A dictionary where the keys are model nicknames and the values are the corresponding scores ordered from best to worst.
- Return type:
dict
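The best-to-worst ordering rule can be sketched with plain floats. This example does not model EvaluatedMetric or MetricStore objects; `order_scores` and its `lower_is_better` flag are hypothetical stand-ins for the information those objects carry.

```python
def order_scores(scores, lower_is_better=True):
    """Return a dict of {model nickname: score} ordered from best to worst."""
    return dict(
        sorted(scores.items(), key=lambda item: item[1], reverse=not lower_is_better)
    )


rmse = {"mlr": 3.2, "lasso": 2.1, "xgboost": 2.7}
r2 = {"mlr": 0.71, "lasso": 0.88, "xgboost": 0.80}

best_rmse_first = order_scores(rmse)                      # ascending: lower is better
best_r2_first = order_scores(r2, lower_is_better=False)   # descending: higher is better
```

Because Python dictionaries preserve insertion order, the first key of the returned dictionary is the best model under the chosen metric.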
- plot(models: Annotated[str, "must exist as a key in object's history attribute"] | list[Annotated[str, "must exist as a key in object's history attribute"]] | Annotated[str, 'must begin with top_ followed by a positive integer'] | Literal['all'] = 'all', series: list[Annotated[str, "must exist in object's names attribute"]] | Annotated[str, "must exist in object's names attribute"] | Literal['all'] = 'all', put_best_on_top: bool = False, ci: bool = False, ax: Axes = None, figsize: tuple[int, int] = (12, 6), colors: list[str] | None = ['#FFA500', '#DC143C', '#00FF7F', '#808000', '#BC8F8F', '#A9A9A9', '#8B008B', '#FF1493', '#FFDAB9', '#20B2AA', '#7FFFD4', '#A52A2A', '#BDB76B', '#DEB887'], series_colors: list[str] | None = ['#0000FF', '#00FFFF', '#7393B3', '#088F8F', '#0096FF', '#F0FFFF', '#00FFFF', '#5D3FD3', '#191970', '#9FE2BF']) Axes
Plots all forecasts with the actuals, or just actuals if no forecasts have been evaluated or are selected.
- Parameters:
models (list-like or str) – Default ‘all’. The forecasted models to plot. Name of the model, ‘all’, or list-like of model names. ‘top_’ and None not supported.
series (list-like or str) – Default ‘all’. The series to plot. Name of the series, ‘all’, or list-like of series names.
put_best_on_top (bool) – Only set to True if you have previously called set_best_model(). If False, ignored.
ci (bool) – Default False. Whether to display the confidence intervals.
ax (Axis) – Optional. The existing axis to write the resulting figure to.
figsize (tuple) – Default (12,6). Size of the resulting figure. Ignored when ax is not None.
colors (list[str]) – Optional. The colors to use when drawing the forecasts.
series_colors (list[str]) – Optional. The colors to use when drawing the actual series.
- Returns:
The figure’s axis.
- Return type:
(Axis)
>>> mvf.plot() # plots all forecasts and all series
>>> plt.show()
- plot_fitted(models: Annotated[str, "must exist as a key in object's history attribute"] | list[Annotated[str, "must exist as a key in object's history attribute"]] | Annotated[str, 'must begin with top_ followed by a positive integer'] | Literal['all'] = 'all', series: list[Annotated[str, "must exist in object's names attribute"]] | Annotated[str, "must exist in object's names attribute"] | Literal['all'] = 'all', ax: Axes = None, figsize: tuple[int, int] = (12, 6), colors: list[str] | None = ['#FFA500', '#DC143C', '#00FF7F', '#808000', '#BC8F8F', '#A9A9A9', '#8B008B', '#FF1493', '#FFDAB9', '#20B2AA', '#7FFFD4', '#A52A2A', '#BDB76B', '#DEB887'], series_colors: list[str] | None = ['#0000FF', '#00FFFF', '#7393B3', '#088F8F', '#0096FF', '#F0FFFF', '#00FFFF', '#5D3FD3', '#191970', '#9FE2BF']) Axes
Plots fitted values with the actuals.
- Parameters:
models (list-like or str) – Default ‘all’. The forecasted models to plot. Name of the model, ‘all’, or list-like of model names. ‘top_’ and None not supported.
series (list-like or str) – Default ‘all’. The series to plot. Name of the series (‘y1…’, ‘series1…’ or user-selected name), ‘all’, or list-like of series names.
ax (Axis) – Optional. The existing axis to write the resulting figure to.
figsize (tuple) – Default (12,6). Size of the resulting figure. Ignored when ax is not None.
colors (list[str]) – Optional. The colors to use when drawing the forecasts.
series_colors (list[str]) – Optional. The colors to use when drawing the actual series.
- Returns:
The figure’s axis.
- Return type:
(Axis)
>>> mvf.plot_fitted() # plots all fitted values on all series
>>> plt.show()
- plot_test_set(models: Annotated[str, "must exist as a key in object's history attribute"] | list[Annotated[str, "must exist as a key in object's history attribute"]] | Annotated[str, 'must begin with top_ followed by a positive integer'] | Literal['all'] = 'all', series: list[Annotated[str, "must exist in object's names attribute"]] | Annotated[str, "must exist in object's names attribute"] | Literal['all'] = 'all', put_best_on_top: bool = False, include_train: bool | Annotated[int, 'must be > 0'] = True, ci: bool = False, ax: Axes = None, figsize: tuple[int, int] = (12, 6), colors: list[str] | None = ['#FFA500', '#DC143C', '#00FF7F', '#808000', '#BC8F8F', '#A9A9A9', '#8B008B', '#FF1493', '#FFDAB9', '#20B2AA', '#7FFFD4', '#A52A2A', '#BDB76B', '#DEB887'], series_colors: list[str] | None = ['#0000FF', '#00FFFF', '#7393B3', '#088F8F', '#0096FF', '#F0FFFF', '#00FFFF', '#5D3FD3', '#191970', '#9FE2BF']) Axes
Plots all test-set predictions with the actuals.
- Parameters:
models (list-like or str) – Default ‘all’. The forecasted models to plot. Name of the model, ‘all’, or list-like of model names. ‘top_’ and None not supported.
series (list-like or str) – Default ‘all’. The series to plot. Name of the series (‘y1…’, ‘series1…’ or user-selected name), ‘all’, or list-like of series names.
put_best_on_top (bool) – Only set to True if you have previously called set_best_model(). If False, ignored.
include_train (bool or int) – Default True. Use to zoom into training results. If True, plots the test results with the entire history in y. If False, matches y history to test results and only plots this. If int, plots that length of y to match to test results.
ci (bool) – Default False. Whether to display the confidence intervals.
ax (Axis) – Optional. The existing axis to write the resulting figure to.
figsize (tuple) – Default (12,6). Size of the resulting figure. Ignored when ax is not None.
colors (list[str]) – Optional. The colors to use when drawing the forecasts.
series_colors (list[str]) – Optional. The colors to use when drawing the actual series.
- Returns:
The figure’s axis.
- Return type:
(Axis)
>>> mvf.plot_test_set() # plots all test set predictions on all series
>>> plt.show()
- pop(*args: Annotated[str, "must exist as a key in object's history attribute"]) Self
Deletes evaluated forecasts from the object’s memory.
- Parameters:
*args (str) – Names of models matching what was passed to call_me when model was evaluated.
- Returns:
Self
>>> models = ('mlr','mlp','lightgbm')
>>> f.tune_test_forecast(models,dynamic_testing=False,feature_importance=True)
>>> f.pop('mlr')
- predict(**predict_params: Any) list[float]
Predicts with the model assigned to self.call_estimator. Called in auto_forecast()/manual_forecast() after fit() fits the model.
- Parameters:
**predict_params – Any parameters to pass to the predict method of the model instance assigned to self.call_estimator. This can include parameters such as num_iteration for xgboost, etc. depending on the model being fit.
- Returns:
The forecasted values.
- Return type:
list[float]
- predict_fitted_vals(**predict_params: Any)
Returns the fitted values for the training data with the model assigned to self.call_estimator.
- Parameters:
**predict_params – Any parameters to pass to the predict method of the model instance assigned to self.call_estimator. This can include parameters such as num_iteration for xgboost, etc. depending on the model being fit.
- Returns:
The fitted values for the training data.
- Return type:
list[float]
- set_best_model(model: Annotated[str, "must exist as a key in object's history attribute"] | None = None, determine_best_by: Annotated[str, "must exist as a value in object's determine_best_by attribute"] | None = None) Self
Sets the best model to be referenced as “best”. One of model or determine_best_by parameters must be specified.
- Parameters:
model (str) – The model to set as the best. Must match the estimator name or call_me if that was used when evaluating the model.
determine_best_by (str) – One of MVForecaster.determine_best_by. If model is specified, this will be ignored.
- Returns:
Self
- set_cilevel(n: Annotated[float, 'must be > 0 and < 1']) Self
Sets the level for the resulting confidence intervals (95% default).
- Parameters:
n (float) – Greater than 0 and less than 1.
- Returns:
Self
>>> f.set_cilevel(.80) # next forecast will get 80% confidence intervals
- set_estimator(estimator: Annotated[str, "must exist in the name attribute of the object's estimators attribute"]) Self
Sets the estimator to forecast with.
- Parameters:
estimator (str) – One of Forecaster.estimators.
- Returns:
Self
>>> f.set_estimator('lasso')
>>> f.manual_forecast(alpha = .5)
- set_grids_file(name: str = 'Grids') Self
Sets the name of the file where the object will look automatically for grids when calling tune(), cross_validate(), tune_test_forecast(), or a similar function. If the grids file does not exist in the working directory, the error will only be raised once tuning is called.
- Parameters:
name (str) – Default ‘Grids’. The name of the file to look for. This file must exist in the working directory. The default will look for a file called “Grids.py”.
>>> f.set_grids_file('ModGrids') # expects to find a file called ModGrids.py in working directory.
- set_last_future_date(date: date | datetime | Timestamp | datetime64 | str) Self
Generates future dates in the same frequency as current_dates that ends on a specified date.
- Parameters:
date (datetime-like) – The date to end on. Must be parsable by pandas’ Timestamp() function.
- Returns:
Self
>>> f.set_last_future_date('2021-06-01') # creates future dates up to this one in the expected frequency
- set_metrics(metrics: list[MetricStore | Annotated[str, 'must be the name of a static method in the scalecast.Metrics class that only accepts two arguments (a and f)']], keep_existing: bool = False) Self
Set or change the evaluated metrics for all model testing and validation.
- Parameters:
metrics (list[MetricStore|str]) – The metrics to evaluate when validating and testing models. If str, each element must exist as a name in scalecast.Metrics.Metrics and can only accept two arguments: a and f. Otherwise use the MetricStore class from scalecast.Classes to specify a custom metric. For each metric and model that is tested, the test-set and in-sample metrics will be evaluated and can be exported. Level test-set and in-sample metrics are also currently available, but will be removed in a future version.
keep_existing (bool) – Default False. Whether to keep evaluating all existing metrics already in the object.
- Returns:
Self
- set_optimize_on(how: str) Self
Choose how to determine best models by choosing which series should be optimized or the aggregate function to apply on the derived metrics across all series. This is the decision that will be used for optimizing model hyperparameters.
- Parameters:
how (str) – One of MVForecaster.optimizer_funcs, a series name, or a function. Only one series name will be in mvf.optimizer_funcs at a given time. mvf.optimize_on is set to ‘mean’ when the object is initiated.
- Returns:
Self
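The choice between an aggregate function, a single series, and a custom function can be pictured as follows. This is a sketch under assumptions: `score_for_optimization` and `AGG_FUNCS` are hypothetical names, and the per-series metric values are invented purely to show the arithmetic.

```python
from statistics import mean

# the three built-in aggregation choices named in the class signature
AGG_FUNCS = {"mean": mean, "min": min, "max": max}


def score_for_optimization(metric_by_series, how="mean"):
    """Collapse one metric value per series into the single number to optimize."""
    values = list(metric_by_series.values())
    if callable(how):
        return how(values)                 # user-supplied aggregate function
    if how in AGG_FUNCS:
        return AGG_FUNCS[how](values)      # 'mean', 'min', or 'max'
    if how in metric_by_series:
        return metric_by_series[how]       # optimize on one named series
    raise ValueError(f"unknown optimize_on value: {how!r}")


rmse_by_series = {"y1": 2.0, "y2": 4.0}    # illustrative values only
print(score_for_optimization(rmse_by_series, "mean"))
print(score_for_optimization(rmse_by_series, "y2"))
print(score_for_optimization(rmse_by_series, lambda v: 0.25 * v[0] + 0.75 * v[1]))
```

A weighted function like the last call is one way to favor a series that matters more without ignoring the others entirely.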
- set_test_length(n: Annotated[int, 'must be >= 0'] | Annotated[float, 'must be > 0 and < 1'] = 1) Self
Sets the length of the test set. As of version 0.16.0, 0-length test sets are supported.
- Parameters:
n (int or float) – Default 1. The length of the resulting test set. Pass 0 to skip testing models. Fractional splits are supported by passing a float less than 1 and greater than 0.
- Returns:
Self
>>> f.set_test_length(12) # test set of 12
>>> f.set_test_length(.2) # 20% test split
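How an int or a fractional n might translate into an actual split can be sketched as below. The helper names are hypothetical, and truncating the fractional case with `int()` is an assumption about rounding rather than documented behavior.

```python
def resolve_test_length(n_obs, n):
    """Convert an int count or a fraction in (0, 1) into a test-set length."""
    if isinstance(n, float):
        if not 0 < n < 1:
            raise ValueError("a float n must be > 0 and < 1")
        return int(n_obs * n)  # assumption: fractional splits round down
    if n < 0:
        raise ValueError("an int n must be >= 0")
    return n


def train_test_split_ordered(y, n):
    """Split a series so the test set is always the most recent observations."""
    test_length = resolve_test_length(len(y), n)
    cut = len(y) - test_length
    return y[:cut], y[cut:]


y = list(range(100))
train, test = train_test_split_ordered(y, 0.2)   # last 20% held out
train_all, no_test = train_test_split_ordered(y, 0)  # 0 skips testing
```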
- set_validation_length(n: Annotated[int, 'must be > 0'] = 1) Self
Sets the length of the validation set. This will never matter for models that are not tuned.
- Parameters:
n (int) – Default 1. The length of the resulting validation set.
- Returns:
Self
>>> f.set_validation_length(6) # validation length of 6
- set_validation_metric(metric: str) Self
Sets the metric that will be used to tune all subsequent models.
- Parameters:
metric (str) – One of the names in Forecaster.metrics. The metric to optimize the models with using the validation set. Although model testing will evaluate all metrics in Forecaster.metrics, model optimization with tuning and cross validation only uses one of these.
- Returns:
Self
>>> f.set_validation_metric('mae')
- test(dynamic_testing: bool | Annotated[int, 'must be > 0'] = True, call_me: str | None = None, **kwargs: Any) Self
Tests the forecast estimator out-of-sample. Uses the test_length attribute to determine how many observations to test on. All test-set splits maintain temporal order.
- Parameters:
dynamic_testing (bool or int) – Default True. Whether to dynamically/recursively test the forecast (meaning AR terms will be propagated with predicted values). If True, evaluates dynamically over the entire out-of-sample slice of data. If int, window evaluates over that many steps (2 for 2-step dynamic forecasting, 12 for 12-step, etc.). Setting this to False or 1 means faster performance, but gives a less-good indication of how well the forecast will perform more than one period out. This will fail if the test_length attribute is 0.
call_me (str) – Optional. What to call the model when storing it in the object’s history. If not specified, the model’s nickname will be assigned the estimator value (‘mlp’ will be ‘mlp’, etc.). Duplicated names will be overwritten with the most recently called model.
**kwargs – passed to the _forecast_{estimator}() method and can include such parameters as Xvars, normalizer, cap, and floor, in addition to any given model’s specific hyperparameters. See https://scalecast.readthedocs.io/en/latest/Forecaster/_forecast.html.
>>> f.set_estimator('lasso')
>>> f.test(alpha=.5)
- transfer_cis(transfer_from: _Forecaster_parent, model: str, transfer_to_model: str = None, transfer_test_set_cis: bool | None = None) Self
Transfers the confidence intervals from a model forecast in a passed Forecaster or MVForecaster object.
- Parameters:
transfer_from (Forecaster or MVForecaster) – The object that contains the model from which intervals should be transferred.
model (str) – The model nickname of the already-evaluated model stored in transfer_from.
transfer_to_model (str) – Optional. The nickname of the model to which the intervals should be transferred. If not specified, inherits the name passed to model.
transfer_test_set_cis (bool) – Optional. Whether to pass intervals for test-set predictions. If left unspecified, the decision is made based on whether the inheriting object has test-set predictions evaluated.
- Returns:
Self
>>> f.manual_forecast(call_me='mlr')
>>> f_new.transfer_predict(transfer_from=f,model='mlr')
>>> f_new.transfer_cis(transfer_from=f,model='mlr')
- transfer_predict(transfer_from: Forecaster_parent, model: str, return_series: bool = False, save_to_history: bool = True, call_me: str | None = None, regr=None) Self | list[float]
Makes predictions using an already-trained model over any given forecast horizon. Uses the already-trained model from a passed Forecaster object to create a new model in the Forecaster or MVForecaster object from which the method is called. Alternatively, the predictions can be returned in a pandas Series object without saving a new model. Confidence intervals cannot be transferred with this method but can be with the transfer_cis() method.
- Parameters:
transfer_from (Forecaster) – The Forecaster object that contains the already-fitted model.
model (str) – The model nickname of the already-evaluated model stored in the Forecaster object passed to transfer_from.
return_series (bool) – Default False. Whether to return a pandas Series with the date as an index of the values. If the dates argument is not specified, this will include all dates in the Forecaster instance that the method is called from.
save_to_history (bool) – Default True. Whether to save the transferred predictions as if they were a model being run using a _forecast() method.
call_me (str) – Optional. What to call the resulting model. If save_to_history is False, this is ignored. If not specified, inherits the name passed to model.
regr – Optional. The model to make predictions with. If not supplied, the model will be searched for in the Forecaster passed to transfer_from.
- Returns:
The date-indexed series if return_series is True. Otherwise returns self.
- Return type:
(Pandas Series or Self)
>>> f.manual_forecast(call_me='mlr')
>>> f_new.transfer_predict(transfer_from=f,model='mlr')
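The idea behind transferring predictions can be illustrated with a minimal, library-independent sketch. It is not scalecast's implementation: the helper names fit_ar1 and apply_fitted_model are hypothetical, and a one-lag linear model stands in for whatever estimator was actually trained. The parameters are fit once on an old series and then reused, without refitting, to forecast recursively from the end of a new series.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = a + b * y[t-1]; returns (a, b).

    Hypothetical stand-in for an already-trained model.
    """
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def apply_fitted_model(params, new_series, horizon):
    """Reuse fitted parameters on a different series, without refitting.

    Each prediction becomes the lag for the next step (recursive forecasting).
    """
    a, b = params
    last = new_series[-1]
    preds = []
    for _ in range(horizon):
        last = a + b * last
        preds.append(last)
    return preds


old_series = [1.0, 2.0, 4.0, 8.0, 16.0]  # follows y[t] = 2 * y[t-1]
params = fit_ar1(old_series)              # trained once on the old data

new_series = [3.0, 6.0]                   # new observations, same process
forecast = apply_fitted_model(params, new_series, horizon=3)
```

Because the model is never refit, the new series only supplies the starting lag values; the learned relationship comes entirely from the old data.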
- tune(dynamic_tuning: bool | Annotated[int, 'must be > 0'] = False, set_aside_test_set: bool = True) Self
Tunes the specified estimator using an ingested grid (ingests a grid from Grids.py with same name as the estimator by default). This is akin to cross-validation with one fold and a test_length equal to f.validation_length. Any parameters that can be passed as arguments to manual_forecast() can be tuned with this process. The chosen parameters are stored in the best_params attribute. The evaluated validation grid can be exported to a dataframe using f.export_validation_grid().
- Parameters:
dynamic_tuning (bool or int) – Default False. Whether to dynamically/recursively test the forecast during the tuning process (meaning AR terms will be propagated with predicted values). If True, evaluates recursively over the entire out-of-sample slice of data. If int, evaluates over windows of that many steps (2 for 2-step recursive testing, 12 for 12-step, etc.). Setting this to False or 1 runs faster but gives a weaker indication of how well the forecast will perform more than one period out.
set_aside_test_set (bool) – Default True. Whether to separate the test set specified in f.test_length during this process.
- Returns:
Self
>>> f.set_estimator('xgboost')
>>> f.tune()
>>> f.auto_forecast()
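The "one fold" tuning process described above can be sketched in plain Python. This is an illustration under stated assumptions, not scalecast's code: the function names are hypothetical, simple exponential smoothing stands in for the estimator, and RMSE stands in for the chosen metric. The test set is set aside first, the last validation_length remaining observations form the validation slice, and every combination in the grid is scored on that slice.

```python
from itertools import product


def ses_forecast(train, horizon, alpha):
    """Simple exponential smoothing; returns a flat forecast (stand-in estimator)."""
    level = train[0]
    for y in train[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon


def rmse(actual, pred):
    return (sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual)) ** 0.5


def tune_on_validation(series, grid, validation_length, test_length=0):
    """Score every grid combination on one validation slice; return the best."""
    usable = series[: len(series) - test_length]  # set the test set aside
    train = usable[:-validation_length]
    val = usable[-validation_length:]
    keys = list(grid)
    results = []
    for combo in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, combo))
        preds = ses_forecast(train, validation_length, **params)
        results.append((rmse(val, preds), params))
    best_score, best_params = min(results, key=lambda r: r[0])
    return best_params, results


series = [10, 10, 10, 10, 20, 20, 20, 20, 20, 20]
grid = {"alpha": [0.1, 0.5, 0.9]}
best_params, validation_grid = tune_on_validation(
    series, grid, validation_length=2, test_length=2
)
```

Here best_params plays the role of the best_params attribute, and validation_grid corresponds loosely to the evaluated grid that can be exported for inspection.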
- tune_test_forecast(models: list[Annotated[str, "must exist in the name attribute of the object's estimators attribute"]], cross_validate: bool = False, dynamic_tuning: bool = False, dynamic_testing: bool = True, limit_grid_size: Annotated[int, 'must be >= 0'] | Annotated[float, 'must be > 0 and < 1'] | None = None, min_grid_size: Annotated[int, 'must be >= 0'] = 1, suffix: str | None = None, error: Literal['ignore', 'raise', 'warn'] = 'raise', **cvkwargs: Any) Self
Iterates through a list of models, tunes them using grids in a grids file, and forecasts them.
- Parameters:
models (list-like) – The models to iterate through.
cross_validate (bool) – Default False. Whether to tune the model with cross validation. If False, uses the validation slice of data to tune.
dynamic_tuning (bool or int) – Default False. Whether to dynamically tune the model or, if int, how many forecast steps to dynamically tune it.
dynamic_testing (bool or int) – Default True. Whether to dynamically/recursively test the forecast (meaning AR terms will be propagated with predicted values). If True, evaluates dynamically over the entire out-of-sample slice of data. If int, evaluates over windows of that many steps (2 for 2-step dynamic forecasting, 12 for 12-step, etc.). Setting this to False or 1 runs faster but gives a weaker indication of how well the forecast will perform more than one period out. The model will skip testing if the test_length attribute is set to 0.
limit_grid_size (int or float) – Optional. Pass an argument here to limit each of the grids being read. See https://scalecast.readthedocs.io/en/latest/Forecaster/MVForecaster.html#src.scalecast.MVForecaster.MVForecaster.limit_grid_size.
min_grid_size (int) – Default 1. The smallest grid size to keep. Ignored if limit_grid_size is None.
suffix (str) – Optional. A suffix to add to each model as it is evaluated to differentiate them when called later. If unspecified, each model can be called by its estimator name.
error (str) – One of ‘ignore’, ‘raise’, ‘warn’. Default ‘raise’. What to do with the error if a given model fails. ‘warn’ logs a warning that the model could not be evaluated.
**cvkwargs – Passed to the cross_validate() method.
- Returns:
Self
>>> models = ('mlr','mlp','lightgbm')
>>> mvf.tune_test_forecast(models,dynamic_testing=False)
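The difference between dynamic and non-dynamic testing can be made concrete with a small sketch. It reflects one reading of the dynamic_testing options rather than scalecast's internals: the function evaluate_test_set and the one-lag model are hypothetical, and the assumption is that an integer value refreshes the lag with the actual observation at the start of each window of that many steps.

```python
def evaluate_test_set(model, history, test, dynamic=True):
    """Predict each test observation with a one-lag model.

    dynamic=True  -> predictions feed the next step across the whole test set
    dynamic=False -> every step uses the actual previous value (1-step testing)
    dynamic=n     -> predictions propagate for n steps, then the lag is reset
                     to the actual value (assumed interpretation)
    """
    if dynamic is True:
        window = len(test)
    elif dynamic is False:
        window = 1
    else:
        window = int(dynamic)

    preds = []
    last = history[-1]
    for i in range(len(test)):
        if i > 0 and i % window == 0:
            last = test[i - 1]  # start of a new window: use the actual value
        pred = model(last)
        preds.append(pred)
        last = pred             # within a window: propagate the prediction
    return preds


def halve(prev):
    """Hypothetical fitted model: the next value is half the previous one."""
    return 0.5 * prev


history = [16.0]
test = [10.0, 6.0, 4.0, 1.0]

fully_dynamic = evaluate_test_set(halve, history, test, dynamic=True)
one_step = evaluate_test_set(halve, history, test, dynamic=False)
two_step = evaluate_test_set(halve, history, test, dynamic=2)
```

In the fully dynamic case, errors compound because the model never sees the actual test values, which is why it gives a more honest picture of multi-period performance than one-step testing.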