Introductory Example

This demonstrates an example using scalecast.

pip install --upgrade scalecast

If you spot a typo or you have suggestions to improve functionality or documentation, open an issue.

Link to dataset used in this example: https://www.kaggle.com/datasets/neuromusic/avocado-prices.

Link to other examples: https://github.com/mikekeith52/scalecast-examples

[1]:
import pandas as pd
import matplotlib.pyplot as plt
[2]:
# read in data
data = pd.read_csv('avocado.csv',parse_dates = ['Date'])
data = data.sort_values(['region','type','Date'])

Univariate Forecasting

Load the Forecaster Object

  • This is an object that can store data, run forecasts, store results, and plot. It’s a UI, procedure, and set of models all-in-one.

  • Forecasts in scalecast are run with a dynamic recursive approach by default, as opposed to a direct or other approach.

  • Forecaster Object Documentation.

[3]:
from scalecast.Forecaster import Forecaster
[4]:
volume = data.groupby('Date')['Total Volume'].sum()
[5]:
f = Forecaster(
    y = volume,
    current_dates = volume.index,
    future_dates = 13,
)

f
[5]:
Forecaster(
    DateStartActuals=2015-01-04T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=169
    ForecastLength=13
    Xvars=[]
    TestLength=0
    ValidationMetric=rmse
    ForecastsEvaluated=[]
    CILevel=None
    CurrentEstimator=mlr
    GridsFile=Grids
)

Exploratory Data Analysis

[6]:
f.plot()
plt.show()
../_images/examples_Introduction2_9_0.png
[7]:
plt.rc("figure",figsize=(10,6))
f.seasonal_decompose().plot()
plt.show()
../_images/examples_Introduction2_10_0.png
[8]:
figs, axs = plt.subplots(2, 1,figsize=(9,9))
f.plot_acf(ax=axs[0],title='ACF',lags=26,color='black')
f.plot_pacf(ax=axs[1],title='PACF',lags=26,color='#B2C248',method='ywm')
plt.show()
../_images/examples_Introduction2_11_0.png

Parameterize the Forecaster Object

Set Test Length

  • You can skip model testing by setting a test length of 0 (which is the default any time you initiate the object).

  • In this example, all models will be tested on the last 15% of the observed values in the dataset.

[9]:
f.set_test_length(.15)
[9]:
Forecaster(
    DateStartActuals=2015-01-04T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=169
    ForecastLength=13
    Xvars=[]
    TestLength=25
    ValidationMetric=rmse
    ForecastsEvaluated=[]
    CILevel=None
    CurrentEstimator=mlr
    GridsFile=Grids
)

Tell the Object to Evaluate Confidence Intervals

  • This only works if there is a test set specified and it is of a sufficient size.

  • See the documentation.

  • See the example.

[10]:
# default args below
f.eval_cis(
    mode = True, # tell the object to evaluate intervals
    cilevel = .95, # 95% confidence level
)
[10]:
Forecaster(
    DateStartActuals=2015-01-04T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=169
    ForecastLength=13
    Xvars=[]
    TestLength=25
    ValidationMetric=rmse
    ForecastsEvaluated=[]
    CILevel=0.95
    CurrentEstimator=mlr
    GridsFile=Grids
)

Specify Model Inputs

Trend
[11]:
f.add_time_trend()
[11]:
Forecaster(
    DateStartActuals=2015-01-04T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=169
    ForecastLength=13
    Xvars=['t']
    TestLength=25
    ValidationMetric=rmse
    ForecastsEvaluated=[]
    CILevel=0.95
    CurrentEstimator=mlr
    GridsFile=Grids
)
Seasonality
[12]:
f.add_seasonal_regressors('week',raw=False,sincos=True)
Autoregressive Terms / Series Lags
[13]:
f.add_ar_terms(13)
[13]:
Forecaster(
    DateStartActuals=2015-01-04T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=169
    ForecastLength=13
    Xvars=['t', 'weeksin', 'weekcos', AR(lag_order=1), AR(lag_order=2), AR(lag_order=3), AR(lag_order=4), AR(lag_order=5), AR(lag_order=6), AR(lag_order=7), AR(lag_order=8), AR(lag_order=9), AR(lag_order=10), AR(lag_order=11), AR(lag_order=12), AR(lag_order=13)]
    TestLength=25
    ValidationMetric=rmse
    ForecastsEvaluated=[]
    CILevel=0.95
    CurrentEstimator=mlr
    GridsFile=Grids
)

Run Models

  • See the available models.

  • See the blog post.

  • The dynamic_testing argument for all of these will be 13 – test-set results will then be in terms of rolling averages of 13-step forecasts, which is also our forecast length.

  • The resulting forecasts from this process are not well-fit. Better forecasts are obtained once more optimization is performed in later sections.

Linear Scikit-Learn Models

[14]:
f.set_estimator('mlr')
f.manual_forecast(dynamic_testing=13)
[14]:
[np.float64(111925329.05919904),
 np.float64(121868567.62128673),
 np.float64(99242683.29670584),
 np.float64(105678887.13113497),
 np.float64(112308203.89056407),
 np.float64(120678154.74020167),
 np.float64(116651150.57302916),
 np.float64(109280118.5937806),
 np.float64(112349081.0792562),
 np.float64(109439809.17344561),
 np.float64(110071156.63943005),
 np.float64(101942075.08718517),
 np.float64(105855231.06833495)]
[15]:
f.set_estimator('lasso')
f.manual_forecast(alpha=0.2,dynamic_testing=13)
[15]:
[np.float64(111925345.03515087),
 np.float64(121868543.75811642),
 np.float64(99242690.97795567),
 np.float64(105678895.70507629),
 np.float64(112308214.10968),
 np.float64(120678152.0012235),
 np.float64(116651130.64591947),
 np.float64(109280120.05952656),
 np.float64(112349068.95905635),
 np.float64(109439806.5289072),
 np.float64(110071134.33767673),
 np.float64(101942075.85864875),
 np.float64(105855227.63866332)]
[16]:
f.set_estimator('ridge')
f.manual_forecast(alpha=0.2,dynamic_testing=13)
[16]:
[np.float64(112274660.61929211),
 np.float64(119418913.83741271),
 np.float64(101042044.25944781),
 np.float64(105727082.80765453),
 np.float64(112062324.78367476),
 np.float64(119544283.79214948),
 np.float64(114434064.88122702),
 np.float64(109083035.43914787),
 np.float64(111016311.82109724),
 np.float64(108835849.31780478),
 np.float64(108575648.39289793),
 np.float64(102768196.78693624),
 np.float64(105049882.09154822)]
[17]:
f.set_estimator('elasticnet')
f.manual_forecast(alpha=0.2,l1_ratio=0.5,dynamic_testing=13)
[17]:
[np.float64(105685629.29640055),
 np.float64(105281381.8744145),
 np.float64(103215061.54035863),
 np.float64(102649894.2349436),
 np.float64(102879280.27984047),
 np.float64(102902501.47034767),
 np.float64(101898337.98197824),
 np.float64(101102145.44408616),
 np.float64(100388037.21838164),
 np.float64(99432772.90679798),
 np.float64(98765481.90172084),
 np.float64(98088863.2782056),
 np.float64(97329749.89139286)]
[18]:
f.set_estimator('sgd')
f.manual_forecast(alpha=0.2,l1_ratio=0.5,dynamic_testing=13)
[18]:
[np.float64(103005110.44890657),
 np.float64(102528203.37330228),
 np.float64(101188761.77107164),
 np.float64(100611465.43448582),
 np.float64(100455483.80855274),
 np.float64(100195411.17846547),
 np.float64(99413182.09937711),
 np.float64(98773882.05096462),
 np.float64(98164749.1028991),
 np.float64(97420240.520294),
 np.float64(96873112.59986287),
 np.float64(96330077.83240241),
 np.float64(95718401.74348791)]
[19]:
f.plot_test_set(ci=True,models=['mlr','lasso','ridge','elasticnet','sgd'],order_by='TestSetRMSE')
plt.show()
../_images/examples_Introduction2_31_0.png
[20]:
f.plot(ci=True,models=['mlr','lasso','ridge','elasticnet','sgd'],order_by='TestSetRMSE')
plt.show()
../_images/examples_Introduction2_32_0.png

Non-linear Scikit-Learn Models

[21]:
f.set_estimator('rf')
f.manual_forecast(max_depth=2,dynamic_testing=13)
[21]:
[np.float64(105519042.51181273),
 np.float64(105368518.4357506),
 np.float64(105368518.4357506),
 np.float64(105368518.4357506),
 np.float64(105368518.4357506),
 np.float64(106747493.59269297),
 np.float64(105368518.4357506),
 np.float64(105368518.4357506),
 np.float64(105368518.4357506),
 np.float64(105368518.4357506),
 np.float64(105368518.4357506),
 np.float64(105368518.4357506),
 np.float64(105368518.4357506)]
[22]:
f.set_estimator('gbt')
f.manual_forecast(max_depth=2,dynamic_testing=13)
[22]:
[np.float64(115938308.61096564),
 np.float64(114842333.95051594),
 np.float64(116570947.86813788),
 np.float64(118027393.92171198),
 np.float64(121927583.9638932),
 np.float64(122059827.9244863),
 np.float64(110544046.04649158),
 np.float64(106702436.29151869),
 np.float64(112453200.90496303),
 np.float64(111024626.7337773),
 np.float64(111690629.80145997),
 np.float64(112840537.34805126),
 np.float64(117367062.54025795)]
[23]:
f.set_estimator('xgboost')
f.manual_forecast(gamma=1,dynamic_testing=13)
[23]:
[np.float32(1.1698018e+08),
 np.float32(1.1564128e+08),
 np.float32(1.1667483e+08),
 np.float32(1.1958582e+08),
 np.float32(1.170815e+08),
 np.float32(1.2778861e+08),
 np.float32(1.173033e+08),
 np.float32(1.0321523e+08),
 np.float32(1.0819822e+08),
 np.float32(1.0809726e+08),
 np.float32(1.06283944e+08),
 np.float32(1.0922606e+08),
 np.float32(1.16995064e+08)]
[24]:
f.set_estimator('catboost')
f.manual_forecast(depth=4,verbose=False,dynamic_testing=13)
[24]:
[np.float64(116846314.39195865),
 np.float64(114782872.13859823),
 np.float64(115526880.62077451),
 np.float64(112943011.8195494),
 np.float64(118833386.8812325),
 np.float64(118462895.75596091),
 np.float64(110523264.47436357),
 np.float64(111988737.59071368),
 np.float64(112226803.30083351),
 np.float64(111621609.22791208),
 np.float64(111429705.42153662),
 np.float64(110663502.60068737),
 np.float64(109363815.35791725)]
[25]:
f.set_estimator('knn')
f.manual_forecast(n_neighbors=5,dynamic_testing=13)
[25]:
[np.float64(104376279.50799999),
 np.float64(106889207.4),
 np.float64(100185057.636),
 np.float64(98687859.784),
 np.float64(100107069.238),
 np.float64(109220441.306),
 np.float64(102384289.098),
 np.float64(102155557.60800001),
 np.float64(100432864.052),
 np.float64(97987550.39),
 np.float64(94980742.072),
 np.float64(95231419.28400001),
 np.float64(93919469.74599999)]
[26]:
f.set_estimator('mlp')
f.manual_forecast(hidden_layer_sizes=(50,50),solver='lbfgs',dynamic_testing=13)
[26]:
[np.float64(109164589.4082709),
 np.float64(114384836.39800653),
 np.float64(111344510.3697601),
 np.float64(113340626.03508061),
 np.float64(114950093.75819898),
 np.float64(129457409.91330838),
 np.float64(122420201.4697397),
 np.float64(116571080.21675634),
 np.float64(117800884.2601351),
 np.float64(115517193.02928248),
 np.float64(115238510.36223798),
 np.float64(114534741.32529166),
 np.float64(113846676.41878459)]
[27]:
f.plot_test_set(
    ci=True,
    models=['rf','gbt','xgboost','lightgbm','catboost','knn','mlp'],
    order_by='TestSetRMSE'
)
plt.show()
../_images/examples_Introduction2_40_0.png
[28]:
f.plot(ci=True,models=['rf','gbt','xgboost','knn','mlp'],order_by='TestSetRMSE')
plt.show()
../_images/examples_Introduction2_41_0.png

Stacking Models

Sklearn Stacking Model
[29]:
from sklearn.ensemble import StackingRegressor
[30]:
f.add_sklearn_estimator(StackingRegressor,'stacking')
[30]:
Forecaster(
    DateStartActuals=2015-01-04T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=169
    ForecastLength=13
    Xvars=['t', 'weeksin', 'weekcos', AR(lag_order=1), AR(lag_order=2), AR(lag_order=3), AR(lag_order=4), AR(lag_order=5), AR(lag_order=6), AR(lag_order=7), AR(lag_order=8), AR(lag_order=9), AR(lag_order=10), AR(lag_order=11), AR(lag_order=12), AR(lag_order=13)]
    TestLength=25
    ValidationMetric=rmse
    ForecastsEvaluated=['mlr', 'lasso', 'ridge', 'elasticnet', 'sgd', 'rf', 'gbt', 'xgboost', 'catboost', 'knn', 'mlp']
    CILevel=0.95
    CurrentEstimator=mlp
    GridsFile=Grids
)
[31]:
estimators = [
    ('elasticnet',f.estimators.lookup_item('elasticnet').imported_model(alpha=0.2)),
    ('xgboost',f.estimators.lookup_item('xgboost').imported_model(gamma=1)),
    ('gbt',f.estimators.lookup_item('gbt').imported_model(max_depth=2)),
]

final_estimator = f.estimators.lookup_item('catboost').imported_model(verbose=False)

f.add_sklearn_estimator(StackingRegressor,'stacking')
f.set_estimator('stacking')
f.manual_forecast(
    estimators=estimators,
    final_estimator=final_estimator,
    dynamic_testing=13,
    call_me='sklearn_stack',
)
[31]:
[np.float64(107599182.74294516),
 np.float64(109954074.74807498),
 np.float64(109253667.79037124),
 np.float64(107324377.66522634),
 np.float64(109667251.60962425),
 np.float64(140510135.26105216),
 np.float64(107599182.74294516),
 np.float64(114776432.91538765),
 np.float64(110559300.11845502),
 np.float64(111993840.23790823),
 np.float64(113790448.97065803),
 np.float64(113528926.8568792),
 np.float64(129154865.04544067)]
Scalecast Stacking Model
[32]:
f.add_signals(['elasticnet','xgboost','knn'],train_only=True)
f.set_estimator('catboost')
f.manual_forecast(call_me = 'scalecast_stack',verbose=False)
[32]:
[np.float64(114370918.11640924),
 np.float64(114016247.62424554),
 np.float64(115033489.37223953),
 np.float64(113970030.66329396),
 np.float64(114042622.2036964),
 np.float64(115123567.68239625),
 np.float64(112929094.00877601),
 np.float64(112220435.15175214),
 np.float64(113633748.55193451),
 np.float64(113245862.48019421),
 np.float64(111596901.5275508),
 np.float64(112700546.59594658),
 np.float64(110843933.54867446)]
[33]:
f.plot_test_set(models=['sklearn_stack','scalecast_stack'],ci=True,order_by='TestSetRMSE')
plt.show()
../_images/examples_Introduction2_49_0.png
[34]:
f.plot(models=['sklearn_stack','scalecast_stack'],ci=True,order_by='TestSetRMSE')
plt.show()
../_images/examples_Introduction2_50_0.png

ARIMA

[35]:
from scalecast.auxmodels import auto_arima
[36]:
auto_arima(f,m=52)
[36]:
Forecaster(
    DateStartActuals=2015-01-04T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=169
    ForecastLength=13
    Xvars=['t', 'weeksin', 'weekcos', AR(lag_order=1), AR(lag_order=2), AR(lag_order=3), AR(lag_order=4), AR(lag_order=5), AR(lag_order=6), AR(lag_order=7), AR(lag_order=8), AR(lag_order=9), AR(lag_order=10), AR(lag_order=11), AR(lag_order=12), AR(lag_order=13), 'signal_elasticnet', 'signal_xgboost', 'signal_knn']
    TestLength=25
    ValidationMetric=rmse
    ForecastsEvaluated=['mlr', 'lasso', 'ridge', 'elasticnet', 'sgd', 'rf', 'gbt', 'xgboost', 'catboost', 'knn', 'mlp', 'sklearn_stack', 'scalecast_stack', 'auto_arima']
    CILevel=0.95
    CurrentEstimator=arima
    GridsFile=Grids
)
[37]:
f.plot_test_set(models='auto_arima',ci=True)
plt.show()
../_images/examples_Introduction2_54_0.png
[38]:
f.plot(models='auto_arima',ci=True)
plt.show()
../_images/examples_Introduction2_55_0.png

Multivariate Forecasting

Load the MVForecaster Object

  • This object extends the univariate approach to several series, with many of the same plotting and reporting features available.

  • Documentation

[39]:
from scalecast.MVForecaster import MVForecaster
[40]:
price = data.groupby('Date')['AveragePrice'].mean()

fvol = Forecaster(y=volume,current_dates=volume.index,future_dates=13)
fprice = Forecaster(y=price,current_dates=price.index,future_dates=13)

fvol.add_time_trend()
fvol.add_seasonal_regressors('week',raw=False,sincos=True)

mvf = MVForecaster(
    fvol,
    fprice,
    merge_Xvars='union',
    names=['volume','price'],
)

mvf
[40]:
MVForecaster(
    DateStartActuals=2015-01-04T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=169
    N_series=2
    SeriesNames=['volume', 'price']
    ForecastLength=13
    Xvars=['t', 'weeksin', 'weekcos']
    TestLength=0
    ValidationLength=1
    ValidationMetric=MetricStore(name='rmse', eval_func=<function Metrics.rmse at 0x121d5dbc0>, lower_is_better=True, min_obs_required=1)
    ForecastsEvaluated=[]
    CILevel=None
    CurrentEstimator=mlr
    OptimizeOn=mean
    GridsFile=MVGrids
)

Exploratory Data Analysis

[41]:
mvf.plot(series='volume')
plt.show()
../_images/examples_Introduction2_60_0.png
[42]:
mvf.plot(series='price')
plt.show()
../_images/examples_Introduction2_61_0.png
[43]:
mvf.corr(disp='heatmap',cmap='Spectral',annot=True,vmin=-1,vmax=1)
plt.show()
../_images/examples_Introduction2_62_0.png
[44]:
mvf.corr_lags(y='price',x='volume',disp='heatmap',cmap='Spectral',annot=True,vmin=-1,vmax=1,lags=13)
plt.show()
../_images/examples_Introduction2_63_0.png
[45]:
mvf.corr_lags(y='volume',x='price',disp='heatmap',cmap='Spectral',annot=True,vmin=-1,vmax=1,lags=13)
plt.show()
../_images/examples_Introduction2_64_0.png

Parameterize the MVForecaster Object

  • Starting in scalecast version 0.16.0, you can skip model testing by setting a test length of 0.

  • In this example, all models will be tested on the last 15% of the observed values in the dataset.

  • We will also have model optimization select hyperparemeters based on what predicts the volume series, rather than the price series, or an average of the two (which is the default), best.

  • Custom optimization functions are available.

[46]:
mvf.set_test_length(.15)
mvf.set_optimize_on('volume') # we care more about predicting volume and price is just used to make those predictions more accurate
# by default, the optimizer uses an average scoring of all series in the MVForecaster object
mvf.eval_cis() # tell object to evaluate cis

mvf
[46]:
MVForecaster(
    DateStartActuals=2015-01-04T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=169
    N_series=2
    SeriesNames=['volume', 'price']
    ForecastLength=13
    Xvars=['t', 'weeksin', 'weekcos']
    TestLength=25
    ValidationLength=1
    ValidationMetric=MetricStore(name='rmse', eval_func=<function Metrics.rmse at 0x121d5dbc0>, lower_is_better=True, min_obs_required=1)
    ForecastsEvaluated=[]
    CILevel=0.95
    CurrentEstimator=mlr
    OptimizeOn=volume
    GridsFile=MVGrids
)

Run Models

  • Uses scikit-learn models and APIs only.

  • See the adapted VECM model for this object.

ElasticNet

[47]:
mvf.set_estimator('elasticnet')
mvf.manual_forecast(alpha=0.2,dynamic_testing=13,lags=13)
[47]:
{'volume': [np.float64(106335852.49281684),
  np.float64(105996760.49615355),
  np.float64(103764831.76312366),
  np.float64(103219085.1545862),
  np.float64(103556174.50935248),
  np.float64(103631712.71106488),
  np.float64(102440651.65475723),
  np.float64(101711752.26293223),
  np.float64(101107339.59279619),
  np.float64(100363808.85579455),
  np.float64(100029026.11427599),
  np.float64(99652566.48028274),
  np.float64(99185655.81929667)],
 'price': [np.float64(1.4104745120749793),
  np.float64(1.4104745120749793),
  np.float64(1.4104745120749793),
  np.float64(1.4104745120749793),
  np.float64(1.4104745120749793),
  np.float64(1.4104745120749793),
  np.float64(1.4104745120749793),
  np.float64(1.4104745120749793),
  np.float64(1.4104745120749793),
  np.float64(1.4104745120749793),
  np.float64(1.4104745120749793),
  np.float64(1.4104745120749793),
  np.float64(1.4104745120749793)]}

XGBoost

[48]:
mvf.set_estimator('xgboost')
mvf.manual_forecast(gamma=1,dynamic_testing=13,lags=13)
[48]:
{'volume': [np.float32(1.1481013e+08),
  np.float32(1.1214653e+08),
  np.float32(1.1213585e+08),
  np.float32(1.2023188e+08),
  np.float32(1.1611754e+08),
  np.float32(1.2740009e+08),
  np.float32(1.1520513e+08),
  np.float32(1.0009431e+08),
  np.float32(1.0179305e+08),
  np.float32(1.06290056e+08),
  np.float32(1.0324345e+08),
  np.float32(1.0475008e+08),
  np.float32(1.1199661e+08)],
 'price': [np.float32(1.3645366),
  np.float32(1.3645366),
  np.float32(1.3645366),
  np.float32(1.3645366),
  np.float32(1.3645366),
  np.float32(1.3645366),
  np.float32(1.3645366),
  np.float32(1.3645366),
  np.float32(1.3645366),
  np.float32(1.3645366),
  np.float32(1.3645366),
  np.float32(1.3645366),
  np.float32(1.3645366)]}

MLP Stack

[49]:
from scalecast.auxmodels import mlp_stack
[50]:
mvf.export('model_summaries')
[50]:
Series ModelNickname Estimator Xvars HyperParams Lags Observations DynamicallyTested TestSetLength ValidationMetric ... MetricOptimized best_model TestSetRMSE TestSetR2 TestSetMAE TestSetMAPE InSampleRMSE InSampleR2 InSampleMAE InSampleMAPE
0 volume elasticnet elasticnet [t, weeksin, weekcos] {'alpha': 0.2} 13 169 13 25 <NA> ... <NA> <NA> 1.869490e+07 0.248765 1.282436e+07 0.118172 1.197603e+07 0.486109 8.206334e+06 8.891664e-02
0 volume xgboost xgboost [t, weeksin, weekcos] {'gamma': 1} 13 169 13 25 <NA> ... <NA> <NA> 1.992583e+07 0.146580 1.381269e+07 0.135479 4.290375e+01 1.000000 3.229199e+01 3.583936e-07
0 price elasticnet elasticnet [t, weeksin, weekcos] {'alpha': 0.2} 13 169 13 25 <NA> ... <NA> <NA> 1.522410e-01 -0.048569 1.088633e-01 0.071282 1.560728e-01 0.000000 1.199107e-01 8.408015e-02
0 price xgboost xgboost [t, weeksin, weekcos] {'gamma': 1} 13 169 13 25 <NA> ... <NA> <NA> 1.288486e-01 0.248908 8.764766e-02 0.057013 1.128880e-01 0.476832 8.104546e-02 5.703486e-02

4 rows × 22 columns

[51]:
mlp_stack(mvf,model_nicknames=['elasticnet','xgboost'],lags=13)
[51]:
MVForecaster(
    DateStartActuals=2015-01-04T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=169
    N_series=2
    SeriesNames=['volume', 'price']
    ForecastLength=13
    Xvars=['t', 'weeksin', 'weekcos']
    TestLength=25
    ValidationLength=1
    ValidationMetric=MetricStore(name='rmse', eval_func=<function Metrics.rmse at 0x121d5dbc0>, lower_is_better=True, min_obs_required=1)
    ForecastsEvaluated=['elasticnet', 'xgboost', 'mlp_stack']
    CILevel=0.95
    CurrentEstimator=stacking
    OptimizeOn=volume
    GridsFile=MVGrids
)
[52]:
mvf.set_best_model(determine_best_by='TestSetRMSE')
[52]:
MVForecaster(
    DateStartActuals=2015-01-04T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=169
    N_series=2
    SeriesNames=['volume', 'price']
    ForecastLength=13
    Xvars=['t', 'weeksin', 'weekcos']
    TestLength=25
    ValidationLength=1
    ValidationMetric=MetricStore(name='rmse', eval_func=<function Metrics.rmse at 0x121d5dbc0>, lower_is_better=True, min_obs_required=1)
    ForecastsEvaluated=['elasticnet', 'xgboost', 'mlp_stack']
    CILevel=0.95
    CurrentEstimator=stacking
    OptimizeOn=volume
    GridsFile=MVGrids
)
[53]:
mvf.plot_test_set(ci=True,series='volume',put_best_on_top=True)
plt.show()
../_images/examples_Introduction2_77_0.png
[54]:
mvf.plot(ci=True,series='volume',put_best_on_top=True)
plt.show()
../_images/examples_Introduction2_78_0.png

Probabalistic forecasting for creating confidence intervals is currently being worked on in the MVForecaster object, but until that is done, the backtested interval also works well:

[55]:
mvf.plot(ci=True,series='volume',put_best_on_top=True)
plt.show()
../_images/examples_Introduction2_80_0.png

Break Back into Forecaster Objects

  • You can then add univariate models to these objects to compare with the models run multivariate.

[56]:
from scalecast.util import break_mv_forecaster
[57]:
fvol, fprice = break_mv_forecaster(mvf)
[58]:
fvol
[58]:
Forecaster(
    DateStartActuals=2015-01-04T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=169
    ForecastLength=13
    Xvars=[]
    TestLength=25
    ValidationMetric=rmse
    ForecastsEvaluated=['elasticnet', 'xgboost', 'mlp_stack']
    CILevel=0.95
    CurrentEstimator=mlr
    GridsFile=Grids
)
[59]:
fprice
[59]:
Forecaster(
    DateStartActuals=2015-01-04T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=169
    ForecastLength=13
    Xvars=[]
    TestLength=25
    ValidationMetric=rmse
    ForecastsEvaluated=['elasticnet', 'xgboost', 'mlp_stack']
    CILevel=0.95
    CurrentEstimator=mlr
    GridsFile=Grids
)

Transformations

[60]:
from scalecast.SeriesTransformer import SeriesTransformer
[61]:
f_trans = Forecaster(y=volume,current_dates=volume.index,future_dates=13)
[62]:
f_trans.set_test_length(.15)
f_trans.set_validation_length(13)
[62]:
Forecaster(
    DateStartActuals=2015-01-04T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=169
    ForecastLength=13
    Xvars=[]
    TestLength=25
    ValidationMetric=rmse
    ForecastsEvaluated=[]
    CILevel=None
    CurrentEstimator=mlr
    GridsFile=Grids
)
[63]:
transformer = SeriesTransformer(f_trans)
[64]:
# these will all be reverted later after forecasts have been called
f_trans = transformer.DiffTransform(1)
f_trans = transformer.DiffTransform(52)
f_trans = transformer.DetrendTransform()
[65]:
f_trans.plot()
plt.show()
../_images/examples_Introduction2_92_0.png
[66]:
f_trans.add_time_trend()
f_trans.add_seasonal_regressors('week',sincos=True,raw=False)
f_trans.add_ar_terms(13)
[66]:
Forecaster(
    DateStartActuals=2016-01-10T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=116
    ForecastLength=13
    Xvars=['t', 'weeksin', 'weekcos', AR(lag_order=1), AR(lag_order=2), AR(lag_order=3), AR(lag_order=4), AR(lag_order=5), AR(lag_order=6), AR(lag_order=7), AR(lag_order=8), AR(lag_order=9), AR(lag_order=10), AR(lag_order=11), AR(lag_order=12), AR(lag_order=13)]
    TestLength=25
    ValidationMetric=rmse
    ForecastsEvaluated=[]
    CILevel=None
    CurrentEstimator=mlr
    GridsFile=Grids
)
[67]:
f_trans.set_estimator('xgboost')
f_trans.manual_forecast(gamma=1,dynamic_testing=13)
[67]:
[np.float32(1.00715175e+06),
 np.float32(-2.8431745e+06),
 np.float32(2.4069338e+06),
 np.float32(2.0318998e+06),
 np.float32(-7.8323395e+06),
 np.float32(1.5664614e+06),
 np.float32(-2.0622385e+06),
 np.float32(-5.4871235e+06),
 np.float32(-5.6764815e+06),
 np.float32(2.5870602e+06),
 np.float32(1.9597401e+06),
 np.float32(1.4060644e+06),
 np.float32(-249488.03)]
[68]:
f_trans.plot_test_set()
plt.show()
../_images/examples_Introduction2_95_0.png
[69]:
f_trans.plot()
plt.show()
../_images/examples_Introduction2_96_0.png
[70]:
# call revert functions in the opposite order as how they were called when transforming
f_trans = transformer.DetrendRevert()
f_trans = transformer.DiffRevert(52)
f_trans = transformer.DiffRevert(1)
[71]:
f_trans.plot_test_set()
plt.show()
../_images/examples_Introduction2_98_0.png
[72]:
f_trans.plot()
plt.show()
../_images/examples_Introduction2_99_0.png

Pipelines

  • These are objects similar to scikit-learn pipelines that offer readable and streamlined code for transforming, forecasting, and reverting.

  • See the Pipeline object documentation.

[73]:
from scalecast.Pipeline import Transformer, Reverter, Pipeline, MVPipeline
[74]:
f_pipe = Forecaster(y=volume,current_dates=volume.index,future_dates=13)
f_pipe.set_test_length(.15)
[74]:
Forecaster(
    DateStartActuals=2015-01-04T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=169
    ForecastLength=13
    Xvars=[]
    TestLength=25
    ValidationMetric=rmse
    ForecastsEvaluated=[]
    CILevel=None
    CurrentEstimator=mlr
    GridsFile=Grids
)
[75]:
def forecaster(f):
    f.add_time_trend()
    f.add_seasonal_regressors('week',raw=False,sincos=True)
    f.add_ar_terms(13)
    f.set_estimator('catboost')
    f.manual_forecast(max_depth=2, verbose=False, dynamic_testing=13)
[76]:
transformer = Transformer(
    transformers = [
        ('DiffTransform',1),
        ('DiffTransform',52),
        ('DetrendTransform',)
    ]
)

reverter = Reverter(
    reverters = [
        ('DetrendRevert',),
        ('DiffRevert',52),
        ('DiffRevert',1)
    ],
    base_transformer = transformer,
)
[77]:
reverter
[77]:
Reverter(
  reverters = [
    ('DetrendRevert',),
    ('DiffRevert', 52),
    ('DiffRevert', 1)
  ],
  base_transformer = Transformer(
  transformers = [
    ('DiffTransform', 1),
    ('DiffTransform', 52),
    ('DetrendTransform',)
  ]
)
)
[78]:
pipeline = Pipeline(
    steps = [
        ('Transform',transformer),
        ('Forecast',forecaster),
        ('Revert',reverter),
    ]
)

f_pipe = pipeline.fit_predict(f_pipe)
[79]:
f_pipe.plot_test_set()
plt.show()
../_images/examples_Introduction2_107_0.png
[80]:
f_pipe.plot()
plt.show()
../_images/examples_Introduction2_108_0.png

Fully Automated Pipelines

  • We can automate the construction of pipelines, the selection of input variables, and tuning of models with cross validation on a grid search for each model using files in the working directory called Grids.py for univariate forecasting and MVGrids.py for multivariate. Default grids can be downloaded from scalecast.

Automated Univariate Pipelines

[81]:
from scalecast import GridGenerator
from scalecast.util import find_optimal_transformation
[82]:
GridGenerator.get_example_grids(overwrite=True)
[83]:
f_pipe_aut = Forecaster(y=volume,current_dates=volume.index,future_dates=13)
f_pipe_aut.set_test_length(.15)
[83]:
Forecaster(
    DateStartActuals=2015-01-04T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=169
    ForecastLength=13
    Xvars=[]
    TestLength=25
    ValidationMetric=rmse
    ForecastsEvaluated=[]
    CILevel=None
    CurrentEstimator=mlr
    GridsFile=Grids
)
[84]:
def forecaster_aut(f,models):
    f.auto_Xvar_select(
        estimator='elasticnet',
        monitor='TestSetMAE',
        alpha=0.2,
        irr_cycles = [26],
    )
    f.tune_test_forecast(
        models,
        cross_validate=True,
        k=3,
        # dynamic tuning = 13 means we will hopefully find a model that is optimized to predict 13 steps
        dynamic_tuning=13,
        dynamic_testing=13,
    )
    f.set_estimator('combo')
    f.manual_forecast()

util.find_optimal_transformation

In this function, the following transformations are searched for:

  • Detrending

  • Box-Cox

  • First Differencing

  • Seasonal Differencing

  • Scaling

The optimal set of transformations are returned based on best estimated out-of-sample performance on the test set. Therefore, running this function introduces leakage into the test set, but it can still be a good addition to an automated pipeline, depending on the application. Which and the order of transfomations to search through are configurable. How performance is measured, the parameters specific to a given transformation, and several other paramters are also configurable. See the documentation.

[85]:
transformer_aut, reverter_aut = find_optimal_transformation(
    f_pipe_aut,
    lags = 13,
    m = 52,
    monitor = 'mae',
    estimator = 'elasticnet',
    alpha = 0.2,
    test_length = 13,
    num_test_sets = 3,
    space_between_sets = 4,
    verbose = True,
) # returns a Transformer and Reverter object that can be plugged into a larger pipeline
Using elasticnet model to find the best transformation set on 3 test sets, each 13 in length.
All transformation tries will be evaluated with 13 lags.
Last transformer tried:
[]
Score (mae): 17933061.20374991
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'loess': True})]
Score (mae): 23964157.165726677
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1})]
Score (mae): 17174376.36667073
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 2})]
Score (mae): 24467364.037870836
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'})]
Score (mae): 11573053.425807392
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1)]
Score (mae): 9478522.651025778
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1), ('DiffTransform', 52)]
Score (mae): 11116081.856823217
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1), ('ScaleTransform',)]
Score (mae): 9583504.942193016
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1), ('MinMaxTransform',)]
Score (mae): 9583504.942193046
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1), ('RobustScaleTransform',)]
Score (mae): 9583504.942193015
--------------------------------------------------
Final Selection:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1)]
[86]:
pipeline_aut = Pipeline(
    steps = [
        ('Transform',transformer_aut),
        ('Forecast',forecaster_aut),
        ('Revert',reverter_aut),
    ]
)

f_pipe_aut = pipeline_aut.fit_predict(
    f_pipe_aut,
    models=[
        'mlr',
        'elasticnet',
        'xgboost',
        'knn',
    ],
)
[87]:
f_pipe_aut
[87]:
Forecaster(
    DateStartActuals=2015-01-04T00:00:00.000000
    DateEndActuals=2018-03-25T00:00:00.000000
    Freq=W-SUN
    N_actuals=169
    ForecastLength=13
    Xvars=[AR(lag_order=1), AR(lag_order=2), AR(lag_order=3), AR(lag_order=4), AR(lag_order=5)]
    TestLength=25
    ValidationMetric=rmse
    ForecastsEvaluated=['mlr', 'elasticnet', 'xgboost', 'knn', 'combo']
    CILevel=None
    CurrentEstimator=combo
    GridsFile=Grids
)
[88]:
f_pipe_aut.plot_test_set(order_by='TestSetRMSE')
plt.show()
../_images/examples_Introduction2_119_0.png
[89]:
f_pipe_aut.plot(order_by='TestSetRMSE')
plt.show()
../_images/examples_Introduction2_120_0.png

Backtest Univariate Pipeline

You may be interested to know beyond a single test-set metric, how well your pipeline performs out-of-sample. Backtesting can help answer that by iterating through the entire pipeline several times and testing the procedure each time. It can also help make expanding confidence intervals. See the documentation.

[90]:
from scalecast.util import backtest_metrics
[91]:
uv_backtest_results = pipeline_aut.backtest(
    f_pipe_aut,
    n_iter = 3,
    jump_back = 13,
    cis = False,
    models=[
        'mlr',
        'elasticnet',
        'xgboost',
        'knn',
    ],
)

After obtaining the results from the backtest, we can see the average performance over each iteration using the util.backtest_metrics function. See the documentation.

[92]:
pd.options.display.float_format = '{:,.4f}'.format
backtest_metrics(
    uv_backtest_results,
    mets=['smape','rmse','bias'],
)
[92]:
Iter0 Iter1 Iter2 Average
Model Metric
mlr smape 0.1130 0.2317 0.1142 0.1529
rmse 15,488,744.6687 19,538,227.8480 11,910,709.6284 15,645,894.0484
bias -165,115,026.5519 -207,268,257.9295 103,757,419.3759 -89,541,955.0352
elasticnet smape 0.1139 0.1962 0.1428 0.1510
rmse 15,637,578.9398 16,625,313.2954 14,338,412.2989 15,533,768.1780
bias -166,295,878.6389 -169,974,287.0231 145,808,728.0805 -63,487,145.8605
xgboost smape 0.1559 0.1151 0.1475 0.1395
rmse 19,082,290.8290 11,550,757.4375 15,191,141.2472 15,274,729.8379
bias -219,935,082.4832 -103,119,653.1940 147,885,363.8979 -58,389,790.5931
knn smape 0.1568 0.1896 0.1439 0.1635
rmse 19,855,895.0761 16,441,684.0561 14,418,665.8740 16,905,415.0021
bias -220,650,230.0243 -174,089,027.5994 145,204,593.3703 -83,178,221.4178
combo smape 0.1344 0.1805 0.1366 0.1505
rmse 17,371,835.8070 15,768,729.5255 13,916,077.0706 15,685,547.4677
bias -192,999,054.0779 -163,612,807.9017 135,664,027.3303 -73,649,278.2164

Automated Multivariate Pipelines

[93]:
GridGenerator.get_mv_grids(overwrite=True)
[94]:
fvol_aut = Forecaster(
    y=volume,
    current_dates=volume.index,
    future_dates=13,
    test_length = .15,
)
fprice_aut = Forecaster(
    y=price,
    current_dates=price.index,
    future_dates=13,
    test_length = .15,
)
[95]:
def add_vars(f,**kwargs):
    f.add_seasonal_regressors(
        'month',
        'quarter',
        'week',
        raw=False,
        sincos=True
    )

def mvforecaster(mvf,models):
    mvf.set_optimize_on('volume')
    mvf.tune_test_forecast(
        models,
        cross_validate=True,
        k=2,
        rolling=True,
        dynamic_tuning=13,
        dynamic_testing=13,
        limit_grid_size = .2,
        error = 'warn',
    )
[96]:
transformer_vol, reverter_vol = find_optimal_transformation(
    fvol_aut,
    lags = 13,
    m = 52,
    monitor = 'mae',
    estimator = 'elasticnet',
    alpha = 0.2,
    test_length = 13,
    num_test_sets = 3,
    space_between_sets = 4,
    verbose = True,
)
Using elasticnet model to find the best transformation set on 3 test sets, each 13 in length.
All transformation tries will be evaluated with 13 lags.
Last transformer tried:
[]
Score (mae): 17933061.20374991
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'loess': True})]
Score (mae): 23964157.165726677
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1})]
Score (mae): 17174376.36667073
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 2})]
Score (mae): 24467364.037870836
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'})]
Score (mae): 11573053.425807392
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1)]
Score (mae): 9478522.651025778
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1), ('DiffTransform', 52)]
Score (mae): 11116081.856823217
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1), ('ScaleTransform',)]
Score (mae): 9583504.942193016
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1), ('MinMaxTransform',)]
Score (mae): 9583504.942193046
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1), ('RobustScaleTransform',)]
Score (mae): 9583504.942193015
--------------------------------------------------
Final Selection:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1)]
[97]:
transformer_price, reverter_price = find_optimal_transformation(
    fprice_aut,
    lags = 13,
    m = 52,
    monitor = 'mae',
    estimator = 'elasticnet',
    alpha = 0.2,
    test_length = 13,
    num_test_sets = 3,
    space_between_sets = 4,
    verbose = True,
)
Using elasticnet model to find the best transformation set on 3 test sets, each 13 in length.
All transformation tries will be evaluated with 13 lags.
Last transformer tried:
[]
Score (mae): 0.06804292152560591
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'loess': True})]
Score (mae): 0.3531129223321485
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1})]
Score (mae): 0.18654572266988675
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 2})]
Score (mae): 0.4079071208348029
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 52, 'model': 'add'})]
Score (mae): 0.04226554848615104
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 52, 'model': 'add'}), ('Transform', BoxcoxTransform, {'lmbda': -0.5})]
Score (mae): 0.04754421078794407
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 52, 'model': 'add'}), ('Transform', BoxcoxTransform, {'lmbda': 0})]
Score (mae): 0.04573807786929985
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 52, 'model': 'add'}), ('Transform', BoxcoxTransform, {'lmbda': 0.5})]
Score (mae): 0.04392809054109528
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1)]
Score (mae): 0.08609820028223668
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 52)]
Score (mae): 0.05863628346364504
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 52, 'model': 'add'}), ('ScaleTransform',)]
Score (mae): 0.03860371560228571
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 52, 'model': 'add'}), ('MinMaxTransform',)]
Score (mae): 0.04226554848615097
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 52, 'model': 'add'}), ('RobustScaleTransform',)]
Score (mae): 0.039814357454047544
--------------------------------------------------
Final Selection:
[('DeseasonTransform', {'m': 52, 'model': 'add'}), ('ScaleTransform',)]
[98]:
mvpipeline = MVPipeline(
    steps = [
        ('Transform',[transformer_vol,transformer_price]),
        ('Add Xvars',[add_vars]*2),
        ('Forecast',mvforecaster),
        ('Revert',[reverter_vol,reverter_price]),
    ],
    test_length = 20,
    cis = True,
    names = ['volume','price'],
)

fvol_aut, fprice_aut = mvpipeline.fit_predict(
    fvol_aut,
    fprice_aut,
    models=[
        'mlr',
        'elasticnet',
        'xgboost',
    ],
) # returns a tuple of Forecaster objects
[99]:
fvol_aut.plot_test_set(order_by='TestSetRMSE',ci=True)
plt.show()
../_images/examples_Introduction2_133_0.png
[100]:
fvol_aut.plot(order_by='TestSetRMSE',ci=True)
plt.show()
../_images/examples_Introduction2_134_0.png

Backtest Multivariate Pipeline

Like univariate pipelines, multivariate pipelines can also be backtested. Info about each model and series becomes possible to compare. See the documentation.

[101]:
# recreate Forecaster objects to bring dates back lost from taking seasonal differences
fvol_aut = Forecaster(
    y=volume,
    current_dates=volume.index,
    future_dates=13,
)
fprice_aut = Forecaster(
    y=price,
    current_dates=price.index,
    future_dates=13,
)
[102]:
mv_backtest_results = mvpipeline.backtest(
    fvol_aut,
    fprice_aut,
    n_iter = 3,
    jump_back = 13,
    test_length = 0,
    cis = False,
    models=[
        'mlr',
        'elasticnet',
        'xgboost',
    ],
)
[103]:
backtest_metrics(
    mv_backtest_results,
    mets=['smape','rmse','bias'],
    names = ['Volume','Price'],
)
[103]:
Iter0 Iter1 Iter2 Average
Series Model Metric
Volume mlr smape 0.0633 0.1365 0.1656 0.1218
rmse 10,189,713.7944 12,579,221.2052 16,886,578.8084 13,218,504.6026
bias -81,340,397.6578 -118,858,462.0219 167,738,817.2818 -10,820,014.1326
elasticnet smape 0.1156 0.1814 0.1616 0.1529
rmse 15,693,483.0781 15,391,716.1042 16,157,240.6763 15,747,479.9529
bias -164,494,984.5647 -151,411,465.5474 171,433,487.9525 -48,157,654.0532
xgboost smape 0.1038 0.1632 0.1181 0.1283
rmse 14,651,691.0077 14,850,108.8682 12,279,905.4688 13,927,235.1149
bias -138,886,144.2332 -132,099,657.9440 114,021,698.2729 -52,321,367.9681
Price mlr smape 0.0566 0.0849 0.0728 0.0714
rmse 0.0867 0.1452 0.1614 0.1311
bias 1.0108 1.2946 -1.3093 0.3320
elasticnet smape 0.0275 0.0632 0.1485 0.0798
rmse 0.0444 0.1364 0.2719 0.1509
bias -0.2007 -1.3118 -3.0910 -1.5345
xgboost smape 0.0466 0.0446 0.0663 0.0525
rmse 0.0751 0.0984 0.1485 0.1073
bias 0.7649 -0.5261 -1.2205 -0.3272

Through backtesting, we can see that the multivariate approach out-performed the univariate approach. Very cool!

Scaled Automated Forecasting

  • We can scale the fully automated approach to many series where we can then access all results through plotting with Jupyter widgets and export functions.

  • We produce a separate forecast for avocado sales in each region in our dataset.

  • This is done with a univariate approach, but cleverly using the code in this notebook, it could be transformed into a multivariate process where volume and price are forecasted together.

[104]:
from scalecast.notebook import results_vis
from tqdm.notebook import tqdm
[105]:
def forecaster_scaled(f,models):
    f.auto_Xvar_select(
        estimator='elasticnet',
        monitor='TestSetMAE',
        alpha=0.2,
        irr_cycles = [26],
    )
    f.tune_test_forecast(
        models,
        dynamic_testing=13,
    )
    f.set_estimator('combo')
    f.manual_forecast()
[106]:
results_dict = {}
for region in tqdm(data.region.unique()):
    series = data.loc[data['region'] == region].groupby('Date')['Total Volume'].sum()
    f_i = Forecaster(
        y = series,
        current_dates = series.index,
        future_dates = 13,
        test_length = .15,
        validation_length = 13,
        cis = True,
    )
    transformer_i, reverter_i = find_optimal_transformation(
        f_i,
        lags = 13,
        m = 52,
        monitor = 'mae',
        estimator = 'elasticnet',
        alpha = 0.2,
        test_length = 13,
    )
    pipeline_i = Pipeline(
        steps = [
            ('Transform',transformer_i),
            ('Forecast',forecaster_scaled),
            ('Revert',reverter_i),
        ]
    )
    f_i = pipeline_i.fit_predict(
        f_i,
        models=[
            'mlr',
            'elasticnet',
            'xgboost',
            'knn',
        ],
    )
    results_dict[region] = f_i

Run the next two functions locally to see the full functionality of these widgets.

[107]:
results_vis(results_dict,'test')
[108]:
results_vis(results_dict)

Exporting Results

[109]:
from scalecast.multiseries import export_model_summaries

Exporting Results from a Single Forecaster Object

[110]:
results = f.export(cis=True,models=['mlr','lasso','ridge'])
results.keys()
[110]:
dict_keys(['model_summaries', 'lvl_fcsts', 'lvl_test_set_predictions'])
[111]:
for k, df in results.items():
    print(f'{k} has these columns:',*df.columns,'-'*25,sep='\n')
model_summaries has these columns:
ModelNickname
Estimator
Xvars
HyperParams
Observations
DynamicallyTested
TestSetLength
CILevel
ValidationMetric
models
weights
best_model
ValidationMetricValue
TestSetRMSE
TestSetR2
TestSetMAE
TestSetMAPE
InSampleRMSE
InSampleR2
InSampleMAE
InSampleMAPE
-------------------------
lvl_fcsts has these columns:
DATE
mlr
mlr_upperci
mlr_lowerci
lasso
lasso_upperci
lasso_lowerci
ridge
ridge_upperci
ridge_lowerci
-------------------------
lvl_test_set_predictions has these columns:
DATE
actual
mlr
mlr_upperci
mlr_lowerci
lasso
lasso_upperci
lasso_lowerci
ridge
ridge_upperci
ridge_lowerci
-------------------------
[112]:
results['model_summaries'][['ModelNickname','HyperParams','TestSetRMSE','InSampleRMSE']]
[112]:
ModelNickname HyperParams TestSetRMSE InSampleRMSE
0 mlr {} 16,818,489.5277 10,231,495.9303
0 lasso {'alpha': 0.2} 16,818,489.9828 10,231,495.9303
0 ridge {'alpha': 0.2} 16,827,165.4875 10,265,352.0145

Exporting Results from a Single MVForecaster Object

[113]:
mvresults = mvf.export(cis=True,models=['elasticnet','xgboost'])
mvresults.keys()
[113]:
dict_keys(['model_summaries', 'lvl_fcsts', 'lvl_test_set_predictions'])
[114]:
for k, df in mvresults.items():
    print(f'{k} has these columns:',*df.columns,'-'*25,sep='\n')
model_summaries has these columns:
Series
ModelNickname
Estimator
Xvars
HyperParams
Lags
Observations
DynamicallyTested
TestSetLength
ValidationMetric
ValidationMetricValue
OptimizedOn
best_model
MetricOptimized
TestSetRMSE
TestSetR2
TestSetMAE
TestSetMAPE
InSampleRMSE
InSampleR2
InSampleMAE
InSampleMAPE
-------------------------
lvl_fcsts has these columns:
DATE
volume_elasticnet_lvl_fcst
volume_elasticnet_lvl_fcst_upper
volume_elasticnet_lvl_fcst_lower
volume_xgboost_lvl_fcst
volume_xgboost_lvl_fcst_upper
volume_xgboost_lvl_fcst_lower
price_elasticnet_lvl_fcst
price_elasticnet_lvl_fcst_upper
price_elasticnet_lvl_fcst_lower
price_xgboost_lvl_fcst
price_xgboost_lvl_fcst_upper
price_xgboost_lvl_fcst_lower
-------------------------
lvl_test_set_predictions has these columns:
DATE
volume_actuals
volume_elasticnet_lvl_ts
volume_elasticnet_lvl_ts_upper
volume_elasticnet_lvl_ts_lower
volume_xgboost_lvl_ts
volume_xgboost_lvl_ts_upper
volume_xgboost_lvl_ts_lower
price_actuals
price_elasticnet_lvl_ts
price_elasticnet_lvl_ts_upper
price_elasticnet_lvl_ts_lower
price_xgboost_lvl_ts
price_xgboost_lvl_ts_upper
price_xgboost_lvl_ts_lower
-------------------------
[115]:
mvresults['model_summaries'][['Series','ModelNickname','HyperParams','Lags','TestSetRMSE','InSampleRMSE']]
[115]:
Series ModelNickname HyperParams Lags TestSetRMSE InSampleRMSE
0 volume elasticnet {'alpha': 0.2} 13 18,694,896.5159 11,976,031.0554
0 volume xgboost {'gamma': 1} 13 19,925,832.7824 42.9037
0 price elasticnet {'alpha': 0.2} 13 0.1522 0.1561
0 price xgboost {'gamma': 1} 13 0.1288 0.1129
Other plotting functions:

Exporting Results from a Dictionary of Forecaster Objects

[116]:
all_results = export_model_summaries(results_dict)
all_results[['ModelNickname','Series','Xvars','HyperParams','TestSetRMSE','InSampleRMSE']].sample(10)
[116]:
ModelNickname Series Xvars HyperParams TestSetRMSE InSampleRMSE
123 knn MiamiFtLauderdale [AR(lag_order=1)] {'n_neighbors': 6} 210,226.7387 212,046.9685
94 combo Houston None {} 278,085.7441 202,264.4191
37 xgboost Charlotte [weeksin, weekcos, monthsin, monthcos, quarter... {'n_estimators': 150, 'scale_pos_weight': 5, '... 69,206.1080 24,644.7896
152 xgboost NorthernNewEngland [weeksin, weekcos, monthsin, monthcos, AR(lag_... {'n_estimators': 250, 'scale_pos_weight': 5, '... 113,026.2050 33,877.4027
97 xgboost Indianapolis [t, AR(lag_order=1), AR(lag_order=2), AR(lag_o... {'n_estimators': 200, 'scale_pos_weight': 5, '... 49,505.8778 19,566.0374
194 combo RichmondNorfolk None {} 75,493.7775 55,380.6707
209 combo SanDiego None {} 118,202.8681 125,687.3310
44 combo Chicago None {} 130,219.1839 230,083.5853
144 combo NewYork None {} 391,217.1665 1,362,145.0224
215 mlr Seattle [AR(lag_order=1), AR(lag_order=2), AR(lag_orde... {'normalizer': 'scale'} 117,065.8371 112,476.3328
[ ]: