Introductory Example

This demonstrates an example using scalecast.

pip install --upgrade scalecast

If you spot a typo or you have suggestions to improve functionality or documentation, open an issue.

Link to dataset used in this example: https://www.kaggle.com/datasets/neuromusic/avocado-prices.

Link to other examples: https://github.com/mikekeith52/scalecast-examples

[1]:

import pandas as pd
import matplotlib.pyplot as plt

[2]:

# read in data
data = pd.read_csv('avocado.csv',parse_dates = ['Date'])
data = data.sort_values(['region','type','Date'])

Univariate Forecasting
Multivariate Forecasting
Transformations
Pipelines
Fully Automated Pipelines
Scaled Automated Forecasting
Exporting Results

Univariate Forecasting

Load the `Forecaster` Object

This is an object that can store data, run forecasts, store results, and plot. It’s a UI, procedure, and set of models all-in-one.
Forecasts in scalecast are run with a dynamic recursive approach by default, as opposed to a direct or other approach.
Forecaster Object Documentation.

[3]:

from scalecast.Forecaster import Forecaster

[4]:

volume = data.groupby('Date')['Total Volume'].sum()

[5]:

f = Forecaster(
    y = volume,
    current_dates = volume.index,
    future_dates = 13,
)

f

[5]:

Forecaster(
    DateStartActuals=2015-01-04T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=169
    ForecastLength=13
    Xvars=[]
    TestLength=0
    ValidationMetric=rmse
    ForecastsEvaluated=[]
    CILevel=None
    CurrentEstimator=mlr
    GridsFile=Grids
)

Exploratory Data Analysis

[6]:

f.plot()
plt.show()

../_images/examples_Introduction2_9_0.png

[7]:

plt.rc("figure",figsize=(10,6))
f.seasonal_decompose().plot()
plt.show()

../_images/examples_Introduction2_10_0.png

[8]:

figs, axs = plt.subplots(2, 1,figsize=(9,9))
f.plot_acf(ax=axs[0],title='ACF',lags=26,color='black')
f.plot_pacf(ax=axs[1],title='PACF',lags=26,color='#B2C248',method='ywm')
plt.show()

../_images/examples_Introduction2_11_0.png

Parameterize the `Forecaster` Object

Set Test Length

You can skip model testing by setting a test length of 0 (which is the default any time you initiate the object).
In this example, all models will be tested on the last 15% of the observed values in the dataset.

[9]:

f.set_test_length(.15)

[9]:

Forecaster(
    DateStartActuals=2015-01-04T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=169
    ForecastLength=13
    Xvars=[]
    TestLength=25
    ValidationMetric=rmse
    ForecastsEvaluated=[]
    CILevel=None
    CurrentEstimator=mlr
    GridsFile=Grids
)

Tell the Object to Evaluate Confidence Intervals

This only works if there is a test set specified and it is of a sufficient size.
See the documentation.
See the example.

[10]:

# default args below
f.eval_cis(
    mode = True, # tell the object to evaluate intervals
    cilevel = .95, # 95% confidence level
)

[10]:

Forecaster(
    DateStartActuals=2015-01-04T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=169
    ForecastLength=13
    Xvars=[]
    TestLength=25
    ValidationMetric=rmse
    ForecastsEvaluated=[]
    CILevel=0.95
    CurrentEstimator=mlr
    GridsFile=Grids
)

Specify Model Inputs

Trend

[11]:

f.add_time_trend()

[11]:

Forecaster(
    DateStartActuals=2015-01-04T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=169
    ForecastLength=13
    Xvars=['t']
    TestLength=25
    ValidationMetric=rmse
    ForecastsEvaluated=[]
    CILevel=0.95
    CurrentEstimator=mlr
    GridsFile=Grids
)

Seasonality

[12]:

f.add_seasonal_regressors('week',raw=False,sincos=True)

Autoregressive Terms / Series Lags

[13]:

f.add_ar_terms(13)

[13]:

Forecaster(
    DateStartActuals=2015-01-04T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=169
    ForecastLength=13
    Xvars=['t', 'weeksin', 'weekcos', AR(lag_order=1), AR(lag_order=2), AR(lag_order=3), AR(lag_order=4), AR(lag_order=5), AR(lag_order=6), AR(lag_order=7), AR(lag_order=8), AR(lag_order=9), AR(lag_order=10), AR(lag_order=11), AR(lag_order=12), AR(lag_order=13)]
    TestLength=25
    ValidationMetric=rmse
    ForecastsEvaluated=[]
    CILevel=0.95
    CurrentEstimator=mlr
    GridsFile=Grids
)

Run Models

See the available models.
See the blog post.
The dynamic_testing argument for all of these will be 13 – test-set results will then be in terms of rolling averages of 13-step forecasts, which is also our forecast length.
The resulting forecasts from this process are not well-fit. Better forecasts are obtained once more optimization is performed in later sections.

Linear Scikit-Learn Models

[14]:

f.set_estimator('mlr')
f.manual_forecast(dynamic_testing=13)

[14]:

[np.float64(111925329.05919904),
 np.float64(121868567.62128673),
 np.float64(99242683.29670584),
 np.float64(105678887.13113497),
 np.float64(112308203.89056407),
 np.float64(120678154.74020167),
 np.float64(116651150.57302916),
 np.float64(109280118.5937806),
 np.float64(112349081.0792562),
 np.float64(109439809.17344561),
 np.float64(110071156.63943005),
 np.float64(101942075.08718517),
 np.float64(105855231.06833495)]

[15]:

f.set_estimator('lasso')
f.manual_forecast(alpha=0.2,dynamic_testing=13)

[15]:

[np.float64(111925345.03515087),
 np.float64(121868543.75811642),
 np.float64(99242690.97795567),
 np.float64(105678895.70507629),
 np.float64(112308214.10968),
 np.float64(120678152.0012235),
 np.float64(116651130.64591947),
 np.float64(109280120.05952656),
 np.float64(112349068.95905635),
 np.float64(109439806.5289072),
 np.float64(110071134.33767673),
 np.float64(101942075.85864875),
 np.float64(105855227.63866332)]

[16]:

f.set_estimator('ridge')
f.manual_forecast(alpha=0.2,dynamic_testing=13)

[16]:

[np.float64(112274660.61929211),
 np.float64(119418913.83741271),
 np.float64(101042044.25944781),
 np.float64(105727082.80765453),
 np.float64(112062324.78367476),
 np.float64(119544283.79214948),
 np.float64(114434064.88122702),
 np.float64(109083035.43914787),
 np.float64(111016311.82109724),
 np.float64(108835849.31780478),
 np.float64(108575648.39289793),
 np.float64(102768196.78693624),
 np.float64(105049882.09154822)]

[17]:

f.set_estimator('elasticnet')
f.manual_forecast(alpha=0.2,l1_ratio=0.5,dynamic_testing=13)

[17]:

[np.float64(105685629.29640055),
 np.float64(105281381.8744145),
 np.float64(103215061.54035863),
 np.float64(102649894.2349436),
 np.float64(102879280.27984047),
 np.float64(102902501.47034767),
 np.float64(101898337.98197824),
 np.float64(101102145.44408616),
 np.float64(100388037.21838164),
 np.float64(99432772.90679798),
 np.float64(98765481.90172084),
 np.float64(98088863.2782056),
 np.float64(97329749.89139286)]

[18]:

f.set_estimator('sgd')
f.manual_forecast(alpha=0.2,l1_ratio=0.5,dynamic_testing=13)

[18]:

[np.float64(103005110.44890657),
 np.float64(102528203.37330228),
 np.float64(101188761.77107164),
 np.float64(100611465.43448582),
 np.float64(100455483.80855274),
 np.float64(100195411.17846547),
 np.float64(99413182.09937711),
 np.float64(98773882.05096462),
 np.float64(98164749.1028991),
 np.float64(97420240.520294),
 np.float64(96873112.59986287),
 np.float64(96330077.83240241),
 np.float64(95718401.74348791)]

[19]:

f.plot_test_set(ci=True,models=['mlr','lasso','ridge','elasticnet','sgd'],order_by='TestSetRMSE')
plt.show()

../_images/examples_Introduction2_31_0.png

[20]:

f.plot(ci=True,models=['mlr','lasso','ridge','elasticnet','sgd'],order_by='TestSetRMSE')
plt.show()

../_images/examples_Introduction2_32_0.png

Non-linear Scikit-Learn Models

[21]:

f.set_estimator('rf')
f.manual_forecast(max_depth=2,dynamic_testing=13)

[21]:

[np.float64(105519042.51181273),
 np.float64(105368518.4357506),
 np.float64(105368518.4357506),
 np.float64(105368518.4357506),
 np.float64(105368518.4357506),
 np.float64(106747493.59269297),
 np.float64(105368518.4357506),
 np.float64(105368518.4357506),
 np.float64(105368518.4357506),
 np.float64(105368518.4357506),
 np.float64(105368518.4357506),
 np.float64(105368518.4357506),
 np.float64(105368518.4357506)]

[22]:

f.set_estimator('gbt')
f.manual_forecast(max_depth=2,dynamic_testing=13)

[22]:

[np.float64(115938308.61096564),
 np.float64(114842333.95051594),
 np.float64(116570947.86813788),
 np.float64(118027393.92171198),
 np.float64(121927583.9638932),
 np.float64(122059827.9244863),
 np.float64(110544046.04649158),
 np.float64(106702436.29151869),
 np.float64(112453200.90496303),
 np.float64(111024626.7337773),
 np.float64(111690629.80145997),
 np.float64(112840537.34805126),
 np.float64(117367062.54025795)]

[23]:

f.set_estimator('xgboost')
f.manual_forecast(gamma=1,dynamic_testing=13)

[23]:

[np.float32(1.1698018e+08),
 np.float32(1.1564128e+08),
 np.float32(1.1667483e+08),
 np.float32(1.1958582e+08),
 np.float32(1.170815e+08),
 np.float32(1.2778861e+08),
 np.float32(1.173033e+08),
 np.float32(1.0321523e+08),
 np.float32(1.0819822e+08),
 np.float32(1.0809726e+08),
 np.float32(1.06283944e+08),
 np.float32(1.0922606e+08),
 np.float32(1.16995064e+08)]

[24]:

f.set_estimator('catboost')
f.manual_forecast(depth=4,verbose=False,dynamic_testing=13)

[24]:

[np.float64(116846314.39195865),
 np.float64(114782872.13859823),
 np.float64(115526880.62077451),
 np.float64(112943011.8195494),
 np.float64(118833386.8812325),
 np.float64(118462895.75596091),
 np.float64(110523264.47436357),
 np.float64(111988737.59071368),
 np.float64(112226803.30083351),
 np.float64(111621609.22791208),
 np.float64(111429705.42153662),
 np.float64(110663502.60068737),
 np.float64(109363815.35791725)]

[25]:

f.set_estimator('knn')
f.manual_forecast(n_neighbors=5,dynamic_testing=13)

[25]:

[np.float64(104376279.50799999),
 np.float64(106889207.4),
 np.float64(100185057.636),
 np.float64(98687859.784),
 np.float64(100107069.238),
 np.float64(109220441.306),
 np.float64(102384289.098),
 np.float64(102155557.60800001),
 np.float64(100432864.052),
 np.float64(97987550.39),
 np.float64(94980742.072),
 np.float64(95231419.28400001),
 np.float64(93919469.74599999)]

[26]:

f.set_estimator('mlp')
f.manual_forecast(hidden_layer_sizes=(50,50),solver='lbfgs',dynamic_testing=13)

[26]:

[np.float64(109164589.4082709),
 np.float64(114384836.39800653),
 np.float64(111344510.3697601),
 np.float64(113340626.03508061),
 np.float64(114950093.75819898),
 np.float64(129457409.91330838),
 np.float64(122420201.4697397),
 np.float64(116571080.21675634),
 np.float64(117800884.2601351),
 np.float64(115517193.02928248),
 np.float64(115238510.36223798),
 np.float64(114534741.32529166),
 np.float64(113846676.41878459)]

[27]:

f.plot_test_set(
    ci=True,
    models=['rf','gbt','xgboost','lightgbm','catboost','knn','mlp'],
    order_by='TestSetRMSE'
)
plt.show()

../_images/examples_Introduction2_40_0.png

[28]:

f.plot(ci=True,models=['rf','gbt','xgboost','knn','mlp'],order_by='TestSetRMSE')
plt.show()

../_images/examples_Introduction2_41_0.png

Stacking Models

Sklearn Stacking Model

[29]:

from sklearn.ensemble import StackingRegressor

[30]:

f.add_sklearn_estimator(StackingRegressor,'stacking')

[30]:

Forecaster(
    DateStartActuals=2015-01-04T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=169
    ForecastLength=13
    Xvars=['t', 'weeksin', 'weekcos', AR(lag_order=1), AR(lag_order=2), AR(lag_order=3), AR(lag_order=4), AR(lag_order=5), AR(lag_order=6), AR(lag_order=7), AR(lag_order=8), AR(lag_order=9), AR(lag_order=10), AR(lag_order=11), AR(lag_order=12), AR(lag_order=13)]
    TestLength=25
    ValidationMetric=rmse
    ForecastsEvaluated=['mlr', 'lasso', 'ridge', 'elasticnet', 'sgd', 'rf', 'gbt', 'xgboost', 'catboost', 'knn', 'mlp']
    CILevel=0.95
    CurrentEstimator=mlp
    GridsFile=Grids
)

[31]:

estimators = [
    ('elasticnet',f.estimators.lookup_item('elasticnet').imported_model(alpha=0.2)),
    ('xgboost',f.estimators.lookup_item('xgboost').imported_model(gamma=1)),
    ('gbt',f.estimators.lookup_item('gbt').imported_model(max_depth=2)),
]

final_estimator = f.estimators.lookup_item('catboost').imported_model(verbose=False)

f.add_sklearn_estimator(StackingRegressor,'stacking')
f.set_estimator('stacking')
f.manual_forecast(
    estimators=estimators,
    final_estimator=final_estimator,
    dynamic_testing=13,
    call_me='sklearn_stack',
)

[31]:

[np.float64(107599182.74294516),
 np.float64(109954074.74807498),
 np.float64(109253667.79037124),
 np.float64(107324377.66522634),
 np.float64(109667251.60962425),
 np.float64(140510135.26105216),
 np.float64(107599182.74294516),
 np.float64(114776432.91538765),
 np.float64(110559300.11845502),
 np.float64(111993840.23790823),
 np.float64(113790448.97065803),
 np.float64(113528926.8568792),
 np.float64(129154865.04544067)]

Scalecast Stacking Model

[32]:

f.add_signals(['elasticnet','xgboost','knn'],train_only=True)
f.set_estimator('catboost')
f.manual_forecast(call_me = 'scalecast_stack',verbose=False)

[32]:

[np.float64(114370918.11640924),
 np.float64(114016247.62424554),
 np.float64(115033489.37223953),
 np.float64(113970030.66329396),
 np.float64(114042622.2036964),
 np.float64(115123567.68239625),
 np.float64(112929094.00877601),
 np.float64(112220435.15175214),
 np.float64(113633748.55193451),
 np.float64(113245862.48019421),
 np.float64(111596901.5275508),
 np.float64(112700546.59594658),
 np.float64(110843933.54867446)]

[33]:

f.plot_test_set(models=['sklearn_stack','scalecast_stack'],ci=True,order_by='TestSetRMSE')
plt.show()

../_images/examples_Introduction2_49_0.png

[34]:

f.plot(models=['sklearn_stack','scalecast_stack'],ci=True,order_by='TestSetRMSE')
plt.show()

../_images/examples_Introduction2_50_0.png

ARIMA

auxmodels.auto_arima

[35]:

from scalecast.auxmodels import auto_arima

[36]:

auto_arima(f,m=52)

[36]:

Forecaster(
    DateStartActuals=2015-01-04T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=169
    ForecastLength=13
    Xvars=['t', 'weeksin', 'weekcos', AR(lag_order=1), AR(lag_order=2), AR(lag_order=3), AR(lag_order=4), AR(lag_order=5), AR(lag_order=6), AR(lag_order=7), AR(lag_order=8), AR(lag_order=9), AR(lag_order=10), AR(lag_order=11), AR(lag_order=12), AR(lag_order=13), 'signal_elasticnet', 'signal_xgboost', 'signal_knn']
    TestLength=25
    ValidationMetric=rmse
    ForecastsEvaluated=['mlr', 'lasso', 'ridge', 'elasticnet', 'sgd', 'rf', 'gbt', 'xgboost', 'catboost', 'knn', 'mlp', 'sklearn_stack', 'scalecast_stack', 'auto_arima']
    CILevel=0.95
    CurrentEstimator=arima
    GridsFile=Grids
)

[37]:

f.plot_test_set(models='auto_arima',ci=True)
plt.show()

../_images/examples_Introduction2_54_0.png

[38]:

f.plot(models='auto_arima',ci=True)
plt.show()

../_images/examples_Introduction2_55_0.png

Multivariate Forecasting

Load the `MVForecaster` Object

This object extends the univariate approach to several series, with many of the same plotting and reporting features available.
Documentation

[39]:

from scalecast.MVForecaster import MVForecaster

[40]:

price = data.groupby('Date')['AveragePrice'].mean()

fvol = Forecaster(y=volume,current_dates=volume.index,future_dates=13)
fprice = Forecaster(y=price,current_dates=price.index,future_dates=13)

fvol.add_time_trend()
fvol.add_seasonal_regressors('week',raw=False,sincos=True)

mvf = MVForecaster(
    fvol,
    fprice,
    merge_Xvars='union',
    names=['volume','price'],
)

mvf

[40]:

MVForecaster(
    DateStartActuals=2015-01-04T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=169
    N_series=2
    SeriesNames=['volume', 'price']
    ForecastLength=13
    Xvars=['t', 'weeksin', 'weekcos']
    TestLength=0
    ValidationLength=1
    ValidationMetric=MetricStore(name='rmse', eval_func=<function Metrics.rmse at 0x121d5dbc0>, lower_is_better=True, min_obs_required=1)
    ForecastsEvaluated=[]
    CILevel=None
    CurrentEstimator=mlr
    OptimizeOn=mean
    GridsFile=MVGrids
)

Exploratory Data Analysis

[41]:

mvf.plot(series='volume')
plt.show()

../_images/examples_Introduction2_60_0.png

[42]:

mvf.plot(series='price')
plt.show()

../_images/examples_Introduction2_61_0.png

[43]:

mvf.corr(disp='heatmap',cmap='Spectral',annot=True,vmin=-1,vmax=1)
plt.show()

../_images/examples_Introduction2_62_0.png

[44]:

mvf.corr_lags(y='price',x='volume',disp='heatmap',cmap='Spectral',annot=True,vmin=-1,vmax=1,lags=13)
plt.show()

../_images/examples_Introduction2_63_0.png

[45]:

mvf.corr_lags(y='volume',x='price',disp='heatmap',cmap='Spectral',annot=True,vmin=-1,vmax=1,lags=13)
plt.show()

../_images/examples_Introduction2_64_0.png

Parameterize the `MVForecaster` Object

Starting in scalecast version 0.16.0, you can skip model testing by setting a test length of 0.
In this example, all models will be tested on the last 15% of the observed values in the dataset.
We will also have model optimization select hyperparemeters based on what predicts the volume series, rather than the price series, or an average of the two (which is the default), best.
Custom optimization functions are available.

[46]:

mvf.set_test_length(.15)
mvf.set_optimize_on('volume') # we care more about predicting volume and price is just used to make those predictions more accurate
# by default, the optimizer uses an average scoring of all series in the MVForecaster object
mvf.eval_cis() # tell object to evaluate cis

mvf

[46]:

MVForecaster(
    DateStartActuals=2015-01-04T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=169
    N_series=2
    SeriesNames=['volume', 'price']
    ForecastLength=13
    Xvars=['t', 'weeksin', 'weekcos']
    TestLength=25
    ValidationLength=1
    ValidationMetric=MetricStore(name='rmse', eval_func=<function Metrics.rmse at 0x121d5dbc0>, lower_is_better=True, min_obs_required=1)
    ForecastsEvaluated=[]
    CILevel=0.95
    CurrentEstimator=mlr
    OptimizeOn=volume
    GridsFile=MVGrids
)

Run Models

Uses scikit-learn models and APIs only.
See the adapted VECM model for this object.

ElasticNet

[47]:

mvf.set_estimator('elasticnet')
mvf.manual_forecast(alpha=0.2,dynamic_testing=13,lags=13)

[47]:

{'volume': [np.float64(106335852.49281684),
  np.float64(105996760.49615355),
  np.float64(103764831.76312366),
  np.float64(103219085.1545862),
  np.float64(103556174.50935248),
  np.float64(103631712.71106488),
  np.float64(102440651.65475723),
  np.float64(101711752.26293223),
  np.float64(101107339.59279619),
  np.float64(100363808.85579455),
  np.float64(100029026.11427599),
  np.float64(99652566.48028274),
  np.float64(99185655.81929667)],
 'price': [np.float64(1.4104745120749793),
  np.float64(1.4104745120749793),
  np.float64(1.4104745120749793),
  np.float64(1.4104745120749793),
  np.float64(1.4104745120749793),
  np.float64(1.4104745120749793),
  np.float64(1.4104745120749793),
  np.float64(1.4104745120749793),
  np.float64(1.4104745120749793),
  np.float64(1.4104745120749793),
  np.float64(1.4104745120749793),
  np.float64(1.4104745120749793),
  np.float64(1.4104745120749793)]}

XGBoost

[48]:

mvf.set_estimator('xgboost')
mvf.manual_forecast(gamma=1,dynamic_testing=13,lags=13)

[48]:

{'volume': [np.float32(1.1481013e+08),
  np.float32(1.1214653e+08),
  np.float32(1.1213585e+08),
  np.float32(1.2023188e+08),
  np.float32(1.1611754e+08),
  np.float32(1.2740009e+08),
  np.float32(1.1520513e+08),
  np.float32(1.0009431e+08),
  np.float32(1.0179305e+08),
  np.float32(1.06290056e+08),
  np.float32(1.0324345e+08),
  np.float32(1.0475008e+08),
  np.float32(1.1199661e+08)],
 'price': [np.float32(1.3645366),
  np.float32(1.3645366),
  np.float32(1.3645366),
  np.float32(1.3645366),
  np.float32(1.3645366),
  np.float32(1.3645366),
  np.float32(1.3645366),
  np.float32(1.3645366),
  np.float32(1.3645366),
  np.float32(1.3645366),
  np.float32(1.3645366),
  np.float32(1.3645366),
  np.float32(1.3645366)]}

MLP Stack

auxmodels.mlp_stack

[49]:

from scalecast.auxmodels import mlp_stack

[50]:

mvf.export('model_summaries')

[50]:

Series	ModelNickname	Estimator	Xvars	HyperParams	Lags	Observations	DynamicallyTested	TestSetLength	ValidationMetric	...	MetricOptimized	best_model	TestSetRMSE	TestSetR2	TestSetMAE	TestSetMAPE	InSampleRMSE	InSampleR2	InSampleMAE	InSampleMAPE
volume	elasticnet	elasticnet	[t, weeksin, weekcos]	{'alpha': 0.2}	13	169	13	25	<NA>	...	<NA>	<NA>	1.869490e+07	0.248765	1.282436e+07	0.118172	1.197603e+07	0.486109	8.206334e+06	8.891664e-02
volume	xgboost	xgboost	[t, weeksin, weekcos]	{'gamma': 1}	13	169	13	25	<NA>	...	<NA>	<NA>	1.992583e+07	0.146580	1.381269e+07	0.135479	4.290375e+01	1.000000	3.229199e+01	3.583936e-07
price	elasticnet	elasticnet	[t, weeksin, weekcos]	{'alpha': 0.2}	13	169	13	25	<NA>	...	<NA>	<NA>	1.522410e-01	-0.048569	1.088633e-01	0.071282	1.560728e-01	0.000000	1.199107e-01	8.408015e-02
price	xgboost	xgboost	[t, weeksin, weekcos]	{'gamma': 1}	13	169	13	25	<NA>	...	<NA>	<NA>	1.288486e-01	0.248908	8.764766e-02	0.057013	1.128880e-01	0.476832	8.104546e-02	5.703486e-02

4 rows × 22 columns

[51]:

mlp_stack(mvf,model_nicknames=['elasticnet','xgboost'],lags=13)

[51]:

MVForecaster(
    DateStartActuals=2015-01-04T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=169
    N_series=2
    SeriesNames=['volume', 'price']
    ForecastLength=13
    Xvars=['t', 'weeksin', 'weekcos']
    TestLength=25
    ValidationLength=1
    ValidationMetric=MetricStore(name='rmse', eval_func=<function Metrics.rmse at 0x121d5dbc0>, lower_is_better=True, min_obs_required=1)
    ForecastsEvaluated=['elasticnet', 'xgboost', 'mlp_stack']
    CILevel=0.95
    CurrentEstimator=stacking
    OptimizeOn=volume
    GridsFile=MVGrids
)

[52]:

mvf.set_best_model(determine_best_by='TestSetRMSE')

[52]:

MVForecaster(
    DateStartActuals=2015-01-04T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=169
    N_series=2
    SeriesNames=['volume', 'price']
    ForecastLength=13
    Xvars=['t', 'weeksin', 'weekcos']
    TestLength=25
    ValidationLength=1
    ValidationMetric=MetricStore(name='rmse', eval_func=<function Metrics.rmse at 0x121d5dbc0>, lower_is_better=True, min_obs_required=1)
    ForecastsEvaluated=['elasticnet', 'xgboost', 'mlp_stack']
    CILevel=0.95
    CurrentEstimator=stacking
    OptimizeOn=volume
    GridsFile=MVGrids
)

[53]:

mvf.plot_test_set(ci=True,series='volume',put_best_on_top=True)
plt.show()

../_images/examples_Introduction2_77_0.png

[54]:

mvf.plot(ci=True,series='volume',put_best_on_top=True)
plt.show()

../_images/examples_Introduction2_78_0.png

Probabalistic forecasting for creating confidence intervals is currently being worked on in the MVForecaster object, but until that is done, the backtested interval also works well:

[55]:

mvf.plot(ci=True,series='volume',put_best_on_top=True)
plt.show()

../_images/examples_Introduction2_80_0.png

Break Back into Forecaster Objects

You can then add univariate models to these objects to compare with the models run multivariate.

[56]:

from scalecast.util import break_mv_forecaster

[57]:

fvol, fprice = break_mv_forecaster(mvf)

[58]:

fvol

[58]:

Forecaster(
    DateStartActuals=2015-01-04T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=169
    ForecastLength=13
    Xvars=[]
    TestLength=25
    ValidationMetric=rmse
    ForecastsEvaluated=['elasticnet', 'xgboost', 'mlp_stack']
    CILevel=0.95
    CurrentEstimator=mlr
    GridsFile=Grids
)

[59]:

fprice

[59]:

Forecaster(
    DateStartActuals=2015-01-04T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=169
    ForecastLength=13
    Xvars=[]
    TestLength=25
    ValidationMetric=rmse
    ForecastsEvaluated=['elasticnet', 'xgboost', 'mlp_stack']
    CILevel=0.95
    CurrentEstimator=mlr
    GridsFile=Grids
)

Transformations

One of the most effective way to boost forecasting power is with transformations.
Transformations include:
- Log
- Scaling
  - Standard
  - MinMax
- Differencing
- Detrending
- Custom Functions
All transformations have a corresponding revert function.
See the blog post.

[60]:

from scalecast.SeriesTransformer import SeriesTransformer

[61]:

f_trans = Forecaster(y=volume,current_dates=volume.index,future_dates=13)

[62]:

f_trans.set_test_length(.15)
f_trans.set_validation_length(13)

[62]:

Forecaster(
    DateStartActuals=2015-01-04T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=169
    ForecastLength=13
    Xvars=[]
    TestLength=25
    ValidationMetric=rmse
    ForecastsEvaluated=[]
    CILevel=None
    CurrentEstimator=mlr
    GridsFile=Grids
)

[63]:

transformer = SeriesTransformer(f_trans)

[64]:

# these will all be reverted later after forecasts have been called
f_trans = transformer.DiffTransform(1)
f_trans = transformer.DiffTransform(52)
f_trans = transformer.DetrendTransform()

[65]:

f_trans.plot()
plt.show()

../_images/examples_Introduction2_92_0.png

[66]:

f_trans.add_time_trend()
f_trans.add_seasonal_regressors('week',sincos=True,raw=False)
f_trans.add_ar_terms(13)

[66]:

Forecaster(
    DateStartActuals=2016-01-10T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=116
    ForecastLength=13
    Xvars=['t', 'weeksin', 'weekcos', AR(lag_order=1), AR(lag_order=2), AR(lag_order=3), AR(lag_order=4), AR(lag_order=5), AR(lag_order=6), AR(lag_order=7), AR(lag_order=8), AR(lag_order=9), AR(lag_order=10), AR(lag_order=11), AR(lag_order=12), AR(lag_order=13)]
    TestLength=25
    ValidationMetric=rmse
    ForecastsEvaluated=[]
    CILevel=None
    CurrentEstimator=mlr
    GridsFile=Grids
)

[67]:

f_trans.set_estimator('xgboost')
f_trans.manual_forecast(gamma=1,dynamic_testing=13)

[67]:

[np.float32(1.00715175e+06),
 np.float32(-2.8431745e+06),
 np.float32(2.4069338e+06),
 np.float32(2.0318998e+06),
 np.float32(-7.8323395e+06),
 np.float32(1.5664614e+06),
 np.float32(-2.0622385e+06),
 np.float32(-5.4871235e+06),
 np.float32(-5.6764815e+06),
 np.float32(2.5870602e+06),
 np.float32(1.9597401e+06),
 np.float32(1.4060644e+06),
 np.float32(-249488.03)]

[68]:

f_trans.plot_test_set()
plt.show()

../_images/examples_Introduction2_95_0.png

[69]:

f_trans.plot()
plt.show()

../_images/examples_Introduction2_96_0.png

[70]:

# call revert functions in the opposite order as how they were called when transforming
f_trans = transformer.DetrendRevert()
f_trans = transformer.DiffRevert(52)
f_trans = transformer.DiffRevert(1)

[71]:

f_trans.plot_test_set()
plt.show()

../_images/examples_Introduction2_98_0.png

[72]:

f_trans.plot()
plt.show()

../_images/examples_Introduction2_99_0.png

Pipelines

These are objects similar to scikit-learn pipelines that offer readable and streamlined code for transforming, forecasting, and reverting.
See the Pipeline object documentation.

[73]:

from scalecast.Pipeline import Transformer, Reverter, Pipeline, MVPipeline

[74]:

f_pipe = Forecaster(y=volume,current_dates=volume.index,future_dates=13)
f_pipe.set_test_length(.15)

[74]:

Forecaster(
    DateStartActuals=2015-01-04T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=169
    ForecastLength=13
    Xvars=[]
    TestLength=25
    ValidationMetric=rmse
    ForecastsEvaluated=[]
    CILevel=None
    CurrentEstimator=mlr
    GridsFile=Grids
)

[75]:

def forecaster(f):
    f.add_time_trend()
    f.add_seasonal_regressors('week',raw=False,sincos=True)
    f.add_ar_terms(13)
    f.set_estimator('catboost')
    f.manual_forecast(max_depth=2, verbose=False, dynamic_testing=13)

[76]:

transformer = Transformer(
    transformers = [
        ('DiffTransform',1),
        ('DiffTransform',52),
        ('DetrendTransform',)
    ]
)

reverter = Reverter(
    reverters = [
        ('DetrendRevert',),
        ('DiffRevert',52),
        ('DiffRevert',1)
    ],
    base_transformer = transformer,
)

[77]:

reverter

[77]:

Reverter(
  reverters = [
    ('DetrendRevert',),
    ('DiffRevert', 52),
    ('DiffRevert', 1)
  ],
  base_transformer = Transformer(
  transformers = [
    ('DiffTransform', 1),
    ('DiffTransform', 52),
    ('DetrendTransform',)
  ]
)
)

[78]:

pipeline = Pipeline(
    steps = [
        ('Transform',transformer),
        ('Forecast',forecaster),
        ('Revert',reverter),
    ]
)

f_pipe = pipeline.fit_predict(f_pipe)

[79]:

f_pipe.plot_test_set()
plt.show()

../_images/examples_Introduction2_107_0.png

[80]:

f_pipe.plot()
plt.show()

../_images/examples_Introduction2_108_0.png

Fully Automated Pipelines

We can automate the construction of pipelines, the selection of input variables, and tuning of models with cross validation on a grid search for each model using files in the working directory called Grids.py for univariate forecasting and MVGrids.py for multivariate. Default grids can be downloaded from scalecast.

Automated Univariate Pipelines

Documentation

[81]:

from scalecast import GridGenerator
from scalecast.util import find_optimal_transformation

[82]:

GridGenerator.get_example_grids(overwrite=True)

[83]:

f_pipe_aut = Forecaster(y=volume,current_dates=volume.index,future_dates=13)
f_pipe_aut.set_test_length(.15)

[83]:

Forecaster(
    DateStartActuals=2015-01-04T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=169
    ForecastLength=13
    Xvars=[]
    TestLength=25
    ValidationMetric=rmse
    ForecastsEvaluated=[]
    CILevel=None
    CurrentEstimator=mlr
    GridsFile=Grids
)

[84]:

def forecaster_aut(f,models):
    f.auto_Xvar_select(
        estimator='elasticnet',
        monitor='TestSetMAE',
        alpha=0.2,
        irr_cycles = [26],
    )
    f.tune_test_forecast(
        models,
        cross_validate=True,
        k=3,
        # dynamic tuning = 13 means we will hopefully find a model that is optimized to predict 13 steps
        dynamic_tuning=13,
        dynamic_testing=13,
    )
    f.set_estimator('combo')
    f.manual_forecast()

util.find_optimal_transformation

In this function, the following transformations are searched for:

Detrending
Box-Cox
First Differencing
Seasonal Differencing
Scaling

The optimal set of transformations are returned based on best estimated out-of-sample performance on the test set. Therefore, running this function introduces leakage into the test set, but it can still be a good addition to an automated pipeline, depending on the application. Which and the order of transfomations to search through are configurable. How performance is measured, the parameters specific to a given transformation, and several other paramters are also configurable. See the documentation.

[85]:

transformer_aut, reverter_aut = find_optimal_transformation(
    f_pipe_aut,
    lags = 13,
    m = 52,
    monitor = 'mae',
    estimator = 'elasticnet',
    alpha = 0.2,
    test_length = 13,
    num_test_sets = 3,
    space_between_sets = 4,
    verbose = True,
) # returns a Transformer and Reverter object that can be plugged into a larger pipeline

Using elasticnet model to find the best transformation set on 3 test sets, each 13 in length.
All transformation tries will be evaluated with 13 lags.
Last transformer tried:
[]
Score (mae): 17933061.20374991
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'loess': True})]
Score (mae): 23964157.165726677
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1})]
Score (mae): 17174376.36667073
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 2})]
Score (mae): 24467364.037870836
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'})]
Score (mae): 11573053.425807392
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1)]
Score (mae): 9478522.651025778
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1), ('DiffTransform', 52)]
Score (mae): 11116081.856823217
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1), ('ScaleTransform',)]
Score (mae): 9583504.942193016
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1), ('MinMaxTransform',)]
Score (mae): 9583504.942193046
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1), ('RobustScaleTransform',)]
Score (mae): 9583504.942193015
--------------------------------------------------
Final Selection:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1)]

[86]:

pipeline_aut = Pipeline(
    steps = [
        ('Transform',transformer_aut),
        ('Forecast',forecaster_aut),
        ('Revert',reverter_aut),
    ]
)

f_pipe_aut = pipeline_aut.fit_predict(
    f_pipe_aut,
    models=[
        'mlr',
        'elasticnet',
        'xgboost',
        'knn',
    ],
)

[87]:

f_pipe_aut

[87]:

Forecaster(
    DateStartActuals=2015-01-04T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=169
    ForecastLength=13
    Xvars=[AR(lag_order=1), AR(lag_order=2), AR(lag_order=3), AR(lag_order=4), AR(lag_order=5)]
    TestLength=25
    ValidationMetric=rmse
    ForecastsEvaluated=['mlr', 'elasticnet', 'xgboost', 'knn', 'combo']
    CILevel=None
    CurrentEstimator=combo
    GridsFile=Grids
)

[88]:

f_pipe_aut.plot_test_set(order_by='TestSetRMSE')
plt.show()

../_images/examples_Introduction2_119_0.png

[89]:

f_pipe_aut.plot(order_by='TestSetRMSE')
plt.show()

../_images/examples_Introduction2_120_0.png

Backtest Univariate Pipeline

You may be interested to know beyond a single test-set metric, how well your pipeline performs out-of-sample. Backtesting can help answer that by iterating through the entire pipeline several times and testing the procedure each time. It can also help make expanding confidence intervals. See the documentation.

[90]:

from scalecast.util import backtest_metrics

[91]:

uv_backtest_results = pipeline_aut.backtest(
    f_pipe_aut,
    n_iter = 3,
    jump_back = 13,
    cis = False,
    models=[
        'mlr',
        'elasticnet',
        'xgboost',
        'knn',
    ],
)

After obtaining the results from the backtest, we can see the average performance over each iteration using the util.backtest_metrics function. See the documentation.

[92]:

pd.options.display.float_format = '{:,.4f}'.format
backtest_metrics(
    uv_backtest_results,
    mets=['smape','rmse','bias'],
)

[92]:

		Iter0	Iter1	Iter2	Average
Model	Metric
mlr	smape	0.1130	0.2317	0.1142	0.1529
	rmse	15,488,744.6687	19,538,227.8480	11,910,709.6284	15,645,894.0484
	bias	-165,115,026.5519	-207,268,257.9295	103,757,419.3759	-89,541,955.0352
elasticnet	smape	0.1139	0.1962	0.1428	0.1510
	rmse	15,637,578.9398	16,625,313.2954	14,338,412.2989	15,533,768.1780
	bias	-166,295,878.6389	-169,974,287.0231	145,808,728.0805	-63,487,145.8605
xgboost	smape	0.1559	0.1151	0.1475	0.1395
	rmse	19,082,290.8290	11,550,757.4375	15,191,141.2472	15,274,729.8379
	bias	-219,935,082.4832	-103,119,653.1940	147,885,363.8979	-58,389,790.5931
knn	smape	0.1568	0.1896	0.1439	0.1635
	rmse	19,855,895.0761	16,441,684.0561	14,418,665.8740	16,905,415.0021
	bias	-220,650,230.0243	-174,089,027.5994	145,204,593.3703	-83,178,221.4178
combo	smape	0.1344	0.1805	0.1366	0.1505
	rmse	17,371,835.8070	15,768,729.5255	13,916,077.0706	15,685,547.4677
	bias	-192,999,054.0779	-163,612,807.9017	135,664,027.3303	-73,649,278.2164

Automated Multivariate Pipelines

See the MVPipeline object documentation.

[93]:

GridGenerator.get_mv_grids(overwrite=True)

[94]:

fvol_aut = Forecaster(
    y=volume,
    current_dates=volume.index,
    future_dates=13,
    test_length = .15,
)
fprice_aut = Forecaster(
    y=price,
    current_dates=price.index,
    future_dates=13,
    test_length = .15,
)

[95]:

def add_vars(f,**kwargs):
    f.add_seasonal_regressors(
        'month',
        'quarter',
        'week',
        raw=False,
        sincos=True
    )

def mvforecaster(mvf,models):
    mvf.set_optimize_on('volume')
    mvf.tune_test_forecast(
        models,
        cross_validate=True,
        k=2,
        rolling=True,
        dynamic_tuning=13,
        dynamic_testing=13,
        limit_grid_size = .2,
        error = 'warn',
    )

[96]:

transformer_vol, reverter_vol = find_optimal_transformation(
    fvol_aut,
    lags = 13,
    m = 52,
    monitor = 'mae',
    estimator = 'elasticnet',
    alpha = 0.2,
    test_length = 13,
    num_test_sets = 3,
    space_between_sets = 4,
    verbose = True,
)

Using elasticnet model to find the best transformation set on 3 test sets, each 13 in length.
All transformation tries will be evaluated with 13 lags.
Last transformer tried:
[]
Score (mae): 17933061.20374991
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'loess': True})]
Score (mae): 23964157.165726677
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1})]
Score (mae): 17174376.36667073
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 2})]
Score (mae): 24467364.037870836
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'})]
Score (mae): 11573053.425807392
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1)]
Score (mae): 9478522.651025778
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1), ('DiffTransform', 52)]
Score (mae): 11116081.856823217
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1), ('ScaleTransform',)]
Score (mae): 9583504.942193016
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1), ('MinMaxTransform',)]
Score (mae): 9583504.942193046
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1), ('RobustScaleTransform',)]
Score (mae): 9583504.942193015
--------------------------------------------------
Final Selection:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1)]

[97]:

transformer_price, reverter_price = find_optimal_transformation(
    fprice_aut,
    lags = 13,
    m = 52,
    monitor = 'mae',
    estimator = 'elasticnet',
    alpha = 0.2,
    test_length = 13,
    num_test_sets = 3,
    space_between_sets = 4,
    verbose = True,
)

Using elasticnet model to find the best transformation set on 3 test sets, each 13 in length.
All transformation tries will be evaluated with 13 lags.
Last transformer tried:
[]
Score (mae): 0.06804292152560591
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'loess': True})]
Score (mae): 0.3531129223321485
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1})]
Score (mae): 0.18654572266988675
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 2})]
Score (mae): 0.4079071208348029
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 52, 'model': 'add'})]
Score (mae): 0.04226554848615104
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 52, 'model': 'add'}), ('Transform', BoxcoxTransform, {'lmbda': -0.5})]
Score (mae): 0.04754421078794407
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 52, 'model': 'add'}), ('Transform', BoxcoxTransform, {'lmbda': 0})]
Score (mae): 0.04573807786929985
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 52, 'model': 'add'}), ('Transform', BoxcoxTransform, {'lmbda': 0.5})]
Score (mae): 0.04392809054109528
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1)]
Score (mae): 0.08609820028223668
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 52)]
Score (mae): 0.05863628346364504
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 52, 'model': 'add'}), ('ScaleTransform',)]
Score (mae): 0.03860371560228571
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 52, 'model': 'add'}), ('MinMaxTransform',)]
Score (mae): 0.04226554848615097
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 52, 'model': 'add'}), ('RobustScaleTransform',)]
Score (mae): 0.039814357454047544
--------------------------------------------------
Final Selection:
[('DeseasonTransform', {'m': 52, 'model': 'add'}), ('ScaleTransform',)]

[98]:

mvpipeline = MVPipeline(
    steps = [
        ('Transform',[transformer_vol,transformer_price]),
        ('Add Xvars',[add_vars]*2),
        ('Forecast',mvforecaster),
        ('Revert',[reverter_vol,reverter_price]),
    ],
    test_length = 20,
    cis = True,
    names = ['volume','price'],
)

fvol_aut, fprice_aut = mvpipeline.fit_predict(
    fvol_aut,
    fprice_aut,
    models=[
        'mlr',
        'elasticnet',
        'xgboost',
    ],
) # returns a tuple of Forecaster objects

[99]:

fvol_aut.plot_test_set(order_by='TestSetRMSE',ci=True)
plt.show()

../_images/examples_Introduction2_133_0.png

[100]:

fvol_aut.plot(order_by='TestSetRMSE',ci=True)
plt.show()

../_images/examples_Introduction2_134_0.png

Backtest Multivariate Pipeline

Like univariate pipelines, multivariate pipelines can also be backtested. Info about each model and series becomes possible to compare. See the documentation.

[101]:

# recreate Forecaster objects to bring dates back lost from taking seasonal differences
fvol_aut = Forecaster(
    y=volume,
    current_dates=volume.index,
    future_dates=13,
)
fprice_aut = Forecaster(
    y=price,
    current_dates=price.index,
    future_dates=13,
)

[102]:

mv_backtest_results = mvpipeline.backtest(
    fvol_aut,
    fprice_aut,
    n_iter = 3,
    jump_back = 13,
    test_length = 0,
    cis = False,
    models=[
        'mlr',
        'elasticnet',
        'xgboost',
    ],
)

[103]:

backtest_metrics(
    mv_backtest_results,
    mets=['smape','rmse','bias'],
    names = ['Volume','Price'],
)

[103]:

			Iter0	Iter1	Iter2	Average
Series	Model	Metric
Volume	mlr	smape	0.0633	0.1365	0.1656	0.1218
		rmse	10,189,713.7944	12,579,221.2052	16,886,578.8084	13,218,504.6026
		bias	-81,340,397.6578	-118,858,462.0219	167,738,817.2818	-10,820,014.1326
	elasticnet	smape	0.1156	0.1814	0.1616	0.1529
		rmse	15,693,483.0781	15,391,716.1042	16,157,240.6763	15,747,479.9529
		bias	-164,494,984.5647	-151,411,465.5474	171,433,487.9525	-48,157,654.0532
	xgboost	smape	0.1038	0.1632	0.1181	0.1283
		rmse	14,651,691.0077	14,850,108.8682	12,279,905.4688	13,927,235.1149
		bias	-138,886,144.2332	-132,099,657.9440	114,021,698.2729	-52,321,367.9681
Price	mlr	smape	0.0566	0.0849	0.0728	0.0714
		rmse	0.0867	0.1452	0.1614	0.1311
		bias	1.0108	1.2946	-1.3093	0.3320
	elasticnet	smape	0.0275	0.0632	0.1485	0.0798
		rmse	0.0444	0.1364	0.2719	0.1509
		bias	-0.2007	-1.3118	-3.0910	-1.5345
	xgboost	smape	0.0466	0.0446	0.0663	0.0525
		rmse	0.0751	0.0984	0.1485	0.1073
		bias	0.7649	-0.5261	-1.2205	-0.3272

Through backtesting, we can see that the multivariate approach out-performed the univariate approach. Very cool!

Scaled Automated Forecasting

We can scale the fully automated approach to many series where we can then access all results through plotting with Jupyter widgets and export functions.
We produce a separate forecast for avocado sales in each region in our dataset.
This is done with a univariate approach, but cleverly using the code in this notebook, it could be transformed into a multivariate process where volume and price are forecasted together.

[104]:

from scalecast.notebook import results_vis
from tqdm.notebook import tqdm

[105]:

def forecaster_scaled(f,models):
    f.auto_Xvar_select(
        estimator='elasticnet',
        monitor='TestSetMAE',
        alpha=0.2,
        irr_cycles = [26],
    )
    f.tune_test_forecast(
        models,
        dynamic_testing=13,
    )
    f.set_estimator('combo')
    f.manual_forecast()

[106]:

results_dict = {}
for region in tqdm(data.region.unique()):
    series = data.loc[data['region'] == region].groupby('Date')['Total Volume'].sum()
    f_i = Forecaster(
        y = series,
        current_dates = series.index,
        future_dates = 13,
        test_length = .15,
        validation_length = 13,
        cis = True,
    )
    transformer_i, reverter_i = find_optimal_transformation(
        f_i,
        lags = 13,
        m = 52,
        monitor = 'mae',
        estimator = 'elasticnet',
        alpha = 0.2,
        test_length = 13,
    )
    pipeline_i = Pipeline(
        steps = [
            ('Transform',transformer_i),
            ('Forecast',forecaster_scaled),
            ('Revert',reverter_i),
        ]
    )
    f_i = pipeline_i.fit_predict(
        f_i,
        models=[
            'mlr',
            'elasticnet',
            'xgboost',
            'knn',
        ],
    )
    results_dict[region] = f_i

Run the next two functions locally to see the full functionality of these widgets.

[107]:

results_vis(results_dict,'test')

[108]:

results_vis(results_dict)

Exporting Results

[109]:

from scalecast.multiseries import export_model_summaries

Exporting Results from a Single `Forecaster` Object

[110]:

results = f.export(cis=True,models=['mlr','lasso','ridge'])
results.keys()

[110]:

dict_keys(['model_summaries', 'lvl_fcsts', 'lvl_test_set_predictions'])

[111]:

for k, df in results.items():
    print(f'{k} has these columns:',*df.columns,'-'*25,sep='\n')

model_summaries has these columns:
ModelNickname
Estimator
Xvars
HyperParams
Observations
DynamicallyTested
TestSetLength
CILevel
ValidationMetric
models
weights
best_model
ValidationMetricValue
TestSetRMSE
TestSetR2
TestSetMAE
TestSetMAPE
InSampleRMSE
InSampleR2
InSampleMAE
InSampleMAPE
-------------------------
lvl_fcsts has these columns:
DATE
mlr
mlr_upperci
mlr_lowerci
lasso
lasso_upperci
lasso_lowerci
ridge
ridge_upperci
ridge_lowerci
-------------------------
lvl_test_set_predictions has these columns:
DATE
actual
mlr
mlr_upperci
mlr_lowerci
lasso
lasso_upperci
lasso_lowerci
ridge
ridge_upperci
ridge_lowerci
-------------------------

[112]:

results['model_summaries'][['ModelNickname','HyperParams','TestSetRMSE','InSampleRMSE']]

[112]:

ModelNickname	HyperParams	TestSetRMSE	InSampleRMSE
mlr	{}	16,818,489.5277	10,231,495.9303
lasso	{'alpha': 0.2}	16,818,489.9828	10,231,495.9303
ridge	{'alpha': 0.2}	16,827,165.4875	10,265,352.0145

Other export functions:
Forecaster.export_Xvars_df
Forecaster.export_feature_importance
Forecaster.export_fitted_vals
Forecaster.export_summary_stats
Forecaster.export_validation_grid

Other plotting functions:
Forecaster.plot_fitted
Forecaster.plot_periodogram

Exporting Results from a Single `MVForecaster` Object

[113]:

mvresults = mvf.export(cis=True,models=['elasticnet','xgboost'])
mvresults.keys()

[113]:

dict_keys(['model_summaries', 'lvl_fcsts', 'lvl_test_set_predictions'])

[114]:

for k, df in mvresults.items():
    print(f'{k} has these columns:',*df.columns,'-'*25,sep='\n')

model_summaries has these columns:
Series
ModelNickname
Estimator
Xvars
HyperParams
Lags
Observations
DynamicallyTested
TestSetLength
ValidationMetric
ValidationMetricValue
OptimizedOn
best_model
MetricOptimized
TestSetRMSE
TestSetR2
TestSetMAE
TestSetMAPE
InSampleRMSE
InSampleR2
InSampleMAE
InSampleMAPE
-------------------------
lvl_fcsts has these columns:
DATE
volume_elasticnet_lvl_fcst
volume_elasticnet_lvl_fcst_upper
volume_elasticnet_lvl_fcst_lower
volume_xgboost_lvl_fcst
volume_xgboost_lvl_fcst_upper
volume_xgboost_lvl_fcst_lower
price_elasticnet_lvl_fcst
price_elasticnet_lvl_fcst_upper
price_elasticnet_lvl_fcst_lower
price_xgboost_lvl_fcst
price_xgboost_lvl_fcst_upper
price_xgboost_lvl_fcst_lower
-------------------------
lvl_test_set_predictions has these columns:
DATE
volume_actuals
volume_elasticnet_lvl_ts
volume_elasticnet_lvl_ts_upper
volume_elasticnet_lvl_ts_lower
volume_xgboost_lvl_ts
volume_xgboost_lvl_ts_upper
volume_xgboost_lvl_ts_lower
price_actuals
price_elasticnet_lvl_ts
price_elasticnet_lvl_ts_upper
price_elasticnet_lvl_ts_lower
price_xgboost_lvl_ts
price_xgboost_lvl_ts_upper
price_xgboost_lvl_ts_lower
-------------------------

[115]:

mvresults['model_summaries'][['Series','ModelNickname','HyperParams','Lags','TestSetRMSE','InSampleRMSE']]

[115]:

Series	ModelNickname	HyperParams	Lags	TestSetRMSE	InSampleRMSE
volume	elasticnet	{'alpha': 0.2}	13	18,694,896.5159	11,976,031.0554
volume	xgboost	{'gamma': 1}	13	19,925,832.7824	42.9037
price	elasticnet	{'alpha': 0.2}	13	0.1522	0.1561
price	xgboost	{'gamma': 1}	13	0.1288	0.1129

Other export functions:
Forecaster.export_fitted_vals
Forecaster.export_validation_grid

Other plotting functions:

MVForecaster.plot_fitted

Exporting Results from a Dictionary of `Forecaster` Objects

[116]:

all_results = export_model_summaries(results_dict)
all_results[['ModelNickname','Series','Xvars','HyperParams','TestSetRMSE','InSampleRMSE']].sample(10)

[116]:

	ModelNickname	Series	Xvars	HyperParams	TestSetRMSE	InSampleRMSE
123	knn	MiamiFtLauderdale	[AR(lag_order=1)]	{'n_neighbors': 6}	210,226.7387	212,046.9685
94	combo	Houston	None	{}	278,085.7441	202,264.4191
37	xgboost	Charlotte	[weeksin, weekcos, monthsin, monthcos, quarter...	{'n_estimators': 150, 'scale_pos_weight': 5, '...	69,206.1080	24,644.7896
152	xgboost	NorthernNewEngland	[weeksin, weekcos, monthsin, monthcos, AR(lag_...	{'n_estimators': 250, 'scale_pos_weight': 5, '...	113,026.2050	33,877.4027
97	xgboost	Indianapolis	[t, AR(lag_order=1), AR(lag_order=2), AR(lag_o...	{'n_estimators': 200, 'scale_pos_weight': 5, '...	49,505.8778	19,566.0374
194	combo	RichmondNorfolk	None	{}	75,493.7775	55,380.6707
209	combo	SanDiego	None	{}	118,202.8681	125,687.3310
44	combo	Chicago	None	{}	130,219.1839	230,083.5853
144	combo	NewYork	None	{}	391,217.1665	1,362,145.0224
215	mlr	Seattle	[AR(lag_order=1), AR(lag_order=2), AR(lag_orde...	{'normalizer': 'scale'}	117,065.8371	112,476.3328

[ ]:

Introductory Example

Univariate Forecasting

Load the Forecaster Object

Exploratory Data Analysis

Parameterize the Forecaster Object

Set Test Length

Tell the Object to Evaluate Confidence Intervals

Specify Model Inputs

Trend

Seasonality

Autoregressive Terms / Series Lags

Run Models

Linear Scikit-Learn Models

Non-linear Scikit-Learn Models

Stacking Models

Sklearn Stacking Model

Scalecast Stacking Model

ARIMA

Multivariate Forecasting

Load the MVForecaster Object

Exploratory Data Analysis

Parameterize the MVForecaster Object

Run Models

ElasticNet

XGBoost

MLP Stack

Break Back into Forecaster Objects

Transformations

Pipelines

Fully Automated Pipelines

Automated Univariate Pipelines

util.find_optimal_transformation

Backtest Univariate Pipeline

Automated Multivariate Pipelines

Backtest Multivariate Pipeline

Scaled Automated Forecasting

Exporting Results

Exporting Results from a Single Forecaster Object

Exporting Results from a Single MVForecaster Object

Exporting Results from a Dictionary of Forecaster Objects

Load the `Forecaster` Object

Parameterize the `Forecaster` Object

Load the `MVForecaster` Object

Parameterize the `MVForecaster` Object

Exporting Results from a Single `Forecaster` Object

Exporting Results from a Single `MVForecaster` Object

Exporting Results from a Dictionary of `Forecaster` Objects