Introductory Example
This demonstrates an example using scalecast.
pip install --upgrade scalecast
If you spot a typo or you have suggestions to improve functionality or documentation, open an issue.
Link to dataset used in this example: https://www.kaggle.com/datasets/neuromusic/avocado-prices.
Link to other examples: https://github.com/mikekeith52/scalecast-examples
[1]:
import pandas as pd
import matplotlib.pyplot as plt
[2]:
# read in data
data = pd.read_csv('avocado.csv',parse_dates = ['Date'])
data = data.sort_values(['region','type','Date'])
Univariate Forecasting
Load the Forecaster Object
This is an object that can store data, run forecasts, store results, and plot. It’s a UI, procedure, and set of models all-in-one.
Forecasts in scalecast are run with a dynamic recursive approach by default, as opposed to a direct or other approach.
[3]:
from scalecast.Forecaster import Forecaster
[4]:
volume = data.groupby('Date')['Total Volume'].sum()
[5]:
f = Forecaster(
y = volume,
current_dates = volume.index,
future_dates = 13,
)
f
[5]:
Forecaster(
DateStartActuals=2015-01-04T00:00:00.000000
DateEndActuals=2018-03-25T00:00:00.000000
Freq=W-SUN
N_actuals=169
ForecastLength=13
Xvars=[]
TestLength=0
ValidationMetric=rmse
ForecastsEvaluated=[]
CILevel=None
CurrentEstimator=mlr
GridsFile=Grids
)
Exploratory Data Analysis
[6]:
f.plot()
plt.show()
[7]:
plt.rc("figure",figsize=(10,6))
f.seasonal_decompose().plot()
plt.show()
[8]:
figs, axs = plt.subplots(2, 1,figsize=(9,9))
f.plot_acf(ax=axs[0],title='ACF',lags=26,color='black')
f.plot_pacf(ax=axs[1],title='PACF',lags=26,color='#B2C248',method='ywm')
plt.show()
Parameterize the Forecaster Object
Set Test Length
You can skip model testing by setting a test length of 0 (which is the default any time you initiate the object).
In this example, all models will be tested on the last 15% of the observed values in the dataset.
[9]:
f.set_test_length(.15)
[9]:
Forecaster(
DateStartActuals=2015-01-04T00:00:00.000000
DateEndActuals=2018-03-25T00:00:00.000000
Freq=W-SUN
N_actuals=169
ForecastLength=13
Xvars=[]
TestLength=25
ValidationMetric=rmse
ForecastsEvaluated=[]
CILevel=None
CurrentEstimator=mlr
GridsFile=Grids
)
Tell the Object to Evaluate Confidence Intervals
This only works if there is a test set specified and it is of a sufficient size.
See the documentation.
See the example.
[10]:
# default args below
f.eval_cis(
mode = True, # tell the object to evaluate intervals
cilevel = .95, # 95% confidence level
)
[10]:
Forecaster(
DateStartActuals=2015-01-04T00:00:00.000000
DateEndActuals=2018-03-25T00:00:00.000000
Freq=W-SUN
N_actuals=169
ForecastLength=13
Xvars=[]
TestLength=25
ValidationMetric=rmse
ForecastsEvaluated=[]
CILevel=0.95
CurrentEstimator=mlr
GridsFile=Grids
)
Specify Model Inputs
Trend
[11]:
f.add_time_trend()
[11]:
Forecaster(
DateStartActuals=2015-01-04T00:00:00.000000
DateEndActuals=2018-03-25T00:00:00.000000
Freq=W-SUN
N_actuals=169
ForecastLength=13
Xvars=['t']
TestLength=25
ValidationMetric=rmse
ForecastsEvaluated=[]
CILevel=0.95
CurrentEstimator=mlr
GridsFile=Grids
)
Seasonality
[12]:
f.add_seasonal_regressors('week',raw=False,sincos=True)
Autoregressive Terms / Series Lags
[13]:
f.add_ar_terms(13)
[13]:
Forecaster(
DateStartActuals=2015-01-04T00:00:00.000000
DateEndActuals=2018-03-25T00:00:00.000000
Freq=W-SUN
N_actuals=169
ForecastLength=13
Xvars=['t', 'weeksin', 'weekcos', AR(lag_order=1), AR(lag_order=2), AR(lag_order=3), AR(lag_order=4), AR(lag_order=5), AR(lag_order=6), AR(lag_order=7), AR(lag_order=8), AR(lag_order=9), AR(lag_order=10), AR(lag_order=11), AR(lag_order=12), AR(lag_order=13)]
TestLength=25
ValidationMetric=rmse
ForecastsEvaluated=[]
CILevel=0.95
CurrentEstimator=mlr
GridsFile=Grids
)
Run Models
See the available models.
See the blog post.
The
dynamic_testingargument for all of these will be 13 – test-set results will then be in terms of rolling averages of 13-step forecasts, which is also our forecast length.The resulting forecasts from this process are not well-fit. Better forecasts are obtained once more optimization is performed in later sections.
Linear Scikit-Learn Models
[14]:
f.set_estimator('mlr')
f.manual_forecast(dynamic_testing=13)
[14]:
[np.float64(111925329.05919904),
np.float64(121868567.62128673),
np.float64(99242683.29670584),
np.float64(105678887.13113497),
np.float64(112308203.89056407),
np.float64(120678154.74020167),
np.float64(116651150.57302916),
np.float64(109280118.5937806),
np.float64(112349081.0792562),
np.float64(109439809.17344561),
np.float64(110071156.63943005),
np.float64(101942075.08718517),
np.float64(105855231.06833495)]
[15]:
f.set_estimator('lasso')
f.manual_forecast(alpha=0.2,dynamic_testing=13)
[15]:
[np.float64(111925345.03515087),
np.float64(121868543.75811642),
np.float64(99242690.97795567),
np.float64(105678895.70507629),
np.float64(112308214.10968),
np.float64(120678152.0012235),
np.float64(116651130.64591947),
np.float64(109280120.05952656),
np.float64(112349068.95905635),
np.float64(109439806.5289072),
np.float64(110071134.33767673),
np.float64(101942075.85864875),
np.float64(105855227.63866332)]
[16]:
f.set_estimator('ridge')
f.manual_forecast(alpha=0.2,dynamic_testing=13)
[16]:
[np.float64(112274660.61929211),
np.float64(119418913.83741271),
np.float64(101042044.25944781),
np.float64(105727082.80765453),
np.float64(112062324.78367476),
np.float64(119544283.79214948),
np.float64(114434064.88122702),
np.float64(109083035.43914787),
np.float64(111016311.82109724),
np.float64(108835849.31780478),
np.float64(108575648.39289793),
np.float64(102768196.78693624),
np.float64(105049882.09154822)]
[17]:
f.set_estimator('elasticnet')
f.manual_forecast(alpha=0.2,l1_ratio=0.5,dynamic_testing=13)
[17]:
[np.float64(105685629.29640055),
np.float64(105281381.8744145),
np.float64(103215061.54035863),
np.float64(102649894.2349436),
np.float64(102879280.27984047),
np.float64(102902501.47034767),
np.float64(101898337.98197824),
np.float64(101102145.44408616),
np.float64(100388037.21838164),
np.float64(99432772.90679798),
np.float64(98765481.90172084),
np.float64(98088863.2782056),
np.float64(97329749.89139286)]
[18]:
f.set_estimator('sgd')
f.manual_forecast(alpha=0.2,l1_ratio=0.5,dynamic_testing=13)
[18]:
[np.float64(103005110.44890657),
np.float64(102528203.37330228),
np.float64(101188761.77107164),
np.float64(100611465.43448582),
np.float64(100455483.80855274),
np.float64(100195411.17846547),
np.float64(99413182.09937711),
np.float64(98773882.05096462),
np.float64(98164749.1028991),
np.float64(97420240.520294),
np.float64(96873112.59986287),
np.float64(96330077.83240241),
np.float64(95718401.74348791)]
[19]:
f.plot_test_set(ci=True,models=['mlr','lasso','ridge','elasticnet','sgd'],order_by='TestSetRMSE')
plt.show()
[20]:
f.plot(ci=True,models=['mlr','lasso','ridge','elasticnet','sgd'],order_by='TestSetRMSE')
plt.show()
Non-linear Scikit-Learn Models
[21]:
f.set_estimator('rf')
f.manual_forecast(max_depth=2,dynamic_testing=13)
[21]:
[np.float64(105519042.51181273),
np.float64(105368518.4357506),
np.float64(105368518.4357506),
np.float64(105368518.4357506),
np.float64(105368518.4357506),
np.float64(106747493.59269297),
np.float64(105368518.4357506),
np.float64(105368518.4357506),
np.float64(105368518.4357506),
np.float64(105368518.4357506),
np.float64(105368518.4357506),
np.float64(105368518.4357506),
np.float64(105368518.4357506)]
[22]:
f.set_estimator('gbt')
f.manual_forecast(max_depth=2,dynamic_testing=13)
[22]:
[np.float64(115938308.61096564),
np.float64(114842333.95051594),
np.float64(116570947.86813788),
np.float64(118027393.92171198),
np.float64(121927583.9638932),
np.float64(122059827.9244863),
np.float64(110544046.04649158),
np.float64(106702436.29151869),
np.float64(112453200.90496303),
np.float64(111024626.7337773),
np.float64(111690629.80145997),
np.float64(112840537.34805126),
np.float64(117367062.54025795)]
[23]:
f.set_estimator('xgboost')
f.manual_forecast(gamma=1,dynamic_testing=13)
[23]:
[np.float32(1.1698018e+08),
np.float32(1.1564128e+08),
np.float32(1.1667483e+08),
np.float32(1.1958582e+08),
np.float32(1.170815e+08),
np.float32(1.2778861e+08),
np.float32(1.173033e+08),
np.float32(1.0321523e+08),
np.float32(1.0819822e+08),
np.float32(1.0809726e+08),
np.float32(1.06283944e+08),
np.float32(1.0922606e+08),
np.float32(1.16995064e+08)]
[24]:
f.set_estimator('catboost')
f.manual_forecast(depth=4,verbose=False,dynamic_testing=13)
[24]:
[np.float64(116846314.39195865),
np.float64(114782872.13859823),
np.float64(115526880.62077451),
np.float64(112943011.8195494),
np.float64(118833386.8812325),
np.float64(118462895.75596091),
np.float64(110523264.47436357),
np.float64(111988737.59071368),
np.float64(112226803.30083351),
np.float64(111621609.22791208),
np.float64(111429705.42153662),
np.float64(110663502.60068737),
np.float64(109363815.35791725)]
[25]:
f.set_estimator('knn')
f.manual_forecast(n_neighbors=5,dynamic_testing=13)
[25]:
[np.float64(104376279.50799999),
np.float64(106889207.4),
np.float64(100185057.636),
np.float64(98687859.784),
np.float64(100107069.238),
np.float64(109220441.306),
np.float64(102384289.098),
np.float64(102155557.60800001),
np.float64(100432864.052),
np.float64(97987550.39),
np.float64(94980742.072),
np.float64(95231419.28400001),
np.float64(93919469.74599999)]
[26]:
f.set_estimator('mlp')
f.manual_forecast(hidden_layer_sizes=(50,50),solver='lbfgs',dynamic_testing=13)
[26]:
[np.float64(109164589.4082709),
np.float64(114384836.39800653),
np.float64(111344510.3697601),
np.float64(113340626.03508061),
np.float64(114950093.75819898),
np.float64(129457409.91330838),
np.float64(122420201.4697397),
np.float64(116571080.21675634),
np.float64(117800884.2601351),
np.float64(115517193.02928248),
np.float64(115238510.36223798),
np.float64(114534741.32529166),
np.float64(113846676.41878459)]
[27]:
f.plot_test_set(
ci=True,
models=['rf','gbt','xgboost','lightgbm','catboost','knn','mlp'],
order_by='TestSetRMSE'
)
plt.show()
[28]:
f.plot(ci=True,models=['rf','gbt','xgboost','knn','mlp'],order_by='TestSetRMSE')
plt.show()
Stacking Models
Sklearn Stacking Model
[29]:
from sklearn.ensemble import StackingRegressor
[30]:
f.add_sklearn_estimator(StackingRegressor,'stacking')
[30]:
Forecaster(
DateStartActuals=2015-01-04T00:00:00.000000
DateEndActuals=2018-03-25T00:00:00.000000
Freq=W-SUN
N_actuals=169
ForecastLength=13
Xvars=['t', 'weeksin', 'weekcos', AR(lag_order=1), AR(lag_order=2), AR(lag_order=3), AR(lag_order=4), AR(lag_order=5), AR(lag_order=6), AR(lag_order=7), AR(lag_order=8), AR(lag_order=9), AR(lag_order=10), AR(lag_order=11), AR(lag_order=12), AR(lag_order=13)]
TestLength=25
ValidationMetric=rmse
ForecastsEvaluated=['mlr', 'lasso', 'ridge', 'elasticnet', 'sgd', 'rf', 'gbt', 'xgboost', 'catboost', 'knn', 'mlp']
CILevel=0.95
CurrentEstimator=mlp
GridsFile=Grids
)
[31]:
estimators = [
('elasticnet',f.estimators.lookup_item('elasticnet').imported_model(alpha=0.2)),
('xgboost',f.estimators.lookup_item('xgboost').imported_model(gamma=1)),
('gbt',f.estimators.lookup_item('gbt').imported_model(max_depth=2)),
]
final_estimator = f.estimators.lookup_item('catboost').imported_model(verbose=False)
f.add_sklearn_estimator(StackingRegressor,'stacking')
f.set_estimator('stacking')
f.manual_forecast(
estimators=estimators,
final_estimator=final_estimator,
dynamic_testing=13,
call_me='sklearn_stack',
)
[31]:
[np.float64(107599182.74294516),
np.float64(109954074.74807498),
np.float64(109253667.79037124),
np.float64(107324377.66522634),
np.float64(109667251.60962425),
np.float64(140510135.26105216),
np.float64(107599182.74294516),
np.float64(114776432.91538765),
np.float64(110559300.11845502),
np.float64(111993840.23790823),
np.float64(113790448.97065803),
np.float64(113528926.8568792),
np.float64(129154865.04544067)]
Scalecast Stacking Model
[32]:
f.add_signals(['elasticnet','xgboost','knn'],train_only=True)
f.set_estimator('catboost')
f.manual_forecast(call_me = 'scalecast_stack',verbose=False)
[32]:
[np.float64(114370918.11640924),
np.float64(114016247.62424554),
np.float64(115033489.37223953),
np.float64(113970030.66329396),
np.float64(114042622.2036964),
np.float64(115123567.68239625),
np.float64(112929094.00877601),
np.float64(112220435.15175214),
np.float64(113633748.55193451),
np.float64(113245862.48019421),
np.float64(111596901.5275508),
np.float64(112700546.59594658),
np.float64(110843933.54867446)]
[33]:
f.plot_test_set(models=['sklearn_stack','scalecast_stack'],ci=True,order_by='TestSetRMSE')
plt.show()
[34]:
f.plot(models=['sklearn_stack','scalecast_stack'],ci=True,order_by='TestSetRMSE')
plt.show()
ARIMA
[35]:
from scalecast.auxmodels import auto_arima
[36]:
auto_arima(f,m=52)
[36]:
Forecaster(
DateStartActuals=2015-01-04T00:00:00.000000
DateEndActuals=2018-03-25T00:00:00.000000
Freq=W-SUN
N_actuals=169
ForecastLength=13
Xvars=['t', 'weeksin', 'weekcos', AR(lag_order=1), AR(lag_order=2), AR(lag_order=3), AR(lag_order=4), AR(lag_order=5), AR(lag_order=6), AR(lag_order=7), AR(lag_order=8), AR(lag_order=9), AR(lag_order=10), AR(lag_order=11), AR(lag_order=12), AR(lag_order=13), 'signal_elasticnet', 'signal_xgboost', 'signal_knn']
TestLength=25
ValidationMetric=rmse
ForecastsEvaluated=['mlr', 'lasso', 'ridge', 'elasticnet', 'sgd', 'rf', 'gbt', 'xgboost', 'catboost', 'knn', 'mlp', 'sklearn_stack', 'scalecast_stack', 'auto_arima']
CILevel=0.95
CurrentEstimator=arima
GridsFile=Grids
)
[37]:
f.plot_test_set(models='auto_arima',ci=True)
plt.show()
[38]:
f.plot(models='auto_arima',ci=True)
plt.show()
Multivariate Forecasting
Load the MVForecaster Object
This object extends the univariate approach to several series, with many of the same plotting and reporting features available.
[39]:
from scalecast.MVForecaster import MVForecaster
[40]:
price = data.groupby('Date')['AveragePrice'].mean()
fvol = Forecaster(y=volume,current_dates=volume.index,future_dates=13)
fprice = Forecaster(y=price,current_dates=price.index,future_dates=13)
fvol.add_time_trend()
fvol.add_seasonal_regressors('week',raw=False,sincos=True)
mvf = MVForecaster(
fvol,
fprice,
merge_Xvars='union',
names=['volume','price'],
)
mvf
[40]:
MVForecaster(
DateStartActuals=2015-01-04T00:00:00.000000
DateEndActuals=2018-03-25T00:00:00.000000
Freq=W-SUN
N_actuals=169
N_series=2
SeriesNames=['volume', 'price']
ForecastLength=13
Xvars=['t', 'weeksin', 'weekcos']
TestLength=0
ValidationLength=1
ValidationMetric=MetricStore(name='rmse', eval_func=<function Metrics.rmse at 0x121d5dbc0>, lower_is_better=True, min_obs_required=1)
ForecastsEvaluated=[]
CILevel=None
CurrentEstimator=mlr
OptimizeOn=mean
GridsFile=MVGrids
)
Exploratory Data Analysis
[41]:
mvf.plot(series='volume')
plt.show()
[42]:
mvf.plot(series='price')
plt.show()
[43]:
mvf.corr(disp='heatmap',cmap='Spectral',annot=True,vmin=-1,vmax=1)
plt.show()
[44]:
mvf.corr_lags(y='price',x='volume',disp='heatmap',cmap='Spectral',annot=True,vmin=-1,vmax=1,lags=13)
plt.show()
[45]:
mvf.corr_lags(y='volume',x='price',disp='heatmap',cmap='Spectral',annot=True,vmin=-1,vmax=1,lags=13)
plt.show()
Parameterize the MVForecaster Object
Starting in scalecast version 0.16.0, you can skip model testing by setting a test length of 0.
In this example, all models will be tested on the last 15% of the observed values in the dataset.
We will also have model optimization select hyperparemeters based on what predicts the volume series, rather than the price series, or an average of the two (which is the default), best.
Custom optimization functions are available.
[46]:
mvf.set_test_length(.15)
mvf.set_optimize_on('volume') # we care more about predicting volume and price is just used to make those predictions more accurate
# by default, the optimizer uses an average scoring of all series in the MVForecaster object
mvf.eval_cis() # tell object to evaluate cis
mvf
[46]:
MVForecaster(
DateStartActuals=2015-01-04T00:00:00.000000
DateEndActuals=2018-03-25T00:00:00.000000
Freq=W-SUN
N_actuals=169
N_series=2
SeriesNames=['volume', 'price']
ForecastLength=13
Xvars=['t', 'weeksin', 'weekcos']
TestLength=25
ValidationLength=1
ValidationMetric=MetricStore(name='rmse', eval_func=<function Metrics.rmse at 0x121d5dbc0>, lower_is_better=True, min_obs_required=1)
ForecastsEvaluated=[]
CILevel=0.95
CurrentEstimator=mlr
OptimizeOn=volume
GridsFile=MVGrids
)
Run Models
Uses scikit-learn models and APIs only.
See the adapted VECM model for this object.
ElasticNet
[47]:
mvf.set_estimator('elasticnet')
mvf.manual_forecast(alpha=0.2,dynamic_testing=13,lags=13)
[47]:
{'volume': [np.float64(106335852.49281684),
np.float64(105996760.49615355),
np.float64(103764831.76312366),
np.float64(103219085.1545862),
np.float64(103556174.50935248),
np.float64(103631712.71106488),
np.float64(102440651.65475723),
np.float64(101711752.26293223),
np.float64(101107339.59279619),
np.float64(100363808.85579455),
np.float64(100029026.11427599),
np.float64(99652566.48028274),
np.float64(99185655.81929667)],
'price': [np.float64(1.4104745120749793),
np.float64(1.4104745120749793),
np.float64(1.4104745120749793),
np.float64(1.4104745120749793),
np.float64(1.4104745120749793),
np.float64(1.4104745120749793),
np.float64(1.4104745120749793),
np.float64(1.4104745120749793),
np.float64(1.4104745120749793),
np.float64(1.4104745120749793),
np.float64(1.4104745120749793),
np.float64(1.4104745120749793),
np.float64(1.4104745120749793)]}
XGBoost
[48]:
mvf.set_estimator('xgboost')
mvf.manual_forecast(gamma=1,dynamic_testing=13,lags=13)
[48]:
{'volume': [np.float32(1.1481013e+08),
np.float32(1.1214653e+08),
np.float32(1.1213585e+08),
np.float32(1.2023188e+08),
np.float32(1.1611754e+08),
np.float32(1.2740009e+08),
np.float32(1.1520513e+08),
np.float32(1.0009431e+08),
np.float32(1.0179305e+08),
np.float32(1.06290056e+08),
np.float32(1.0324345e+08),
np.float32(1.0475008e+08),
np.float32(1.1199661e+08)],
'price': [np.float32(1.3645366),
np.float32(1.3645366),
np.float32(1.3645366),
np.float32(1.3645366),
np.float32(1.3645366),
np.float32(1.3645366),
np.float32(1.3645366),
np.float32(1.3645366),
np.float32(1.3645366),
np.float32(1.3645366),
np.float32(1.3645366),
np.float32(1.3645366),
np.float32(1.3645366)]}
MLP Stack
[49]:
from scalecast.auxmodels import mlp_stack
[50]:
mvf.export('model_summaries')
[50]:
| Series | ModelNickname | Estimator | Xvars | HyperParams | Lags | Observations | DynamicallyTested | TestSetLength | ValidationMetric | ... | MetricOptimized | best_model | TestSetRMSE | TestSetR2 | TestSetMAE | TestSetMAPE | InSampleRMSE | InSampleR2 | InSampleMAE | InSampleMAPE | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | volume | elasticnet | elasticnet | [t, weeksin, weekcos] | {'alpha': 0.2} | 13 | 169 | 13 | 25 | <NA> | ... | <NA> | <NA> | 1.869490e+07 | 0.248765 | 1.282436e+07 | 0.118172 | 1.197603e+07 | 0.486109 | 8.206334e+06 | 8.891664e-02 |
| 0 | volume | xgboost | xgboost | [t, weeksin, weekcos] | {'gamma': 1} | 13 | 169 | 13 | 25 | <NA> | ... | <NA> | <NA> | 1.992583e+07 | 0.146580 | 1.381269e+07 | 0.135479 | 4.290375e+01 | 1.000000 | 3.229199e+01 | 3.583936e-07 |
| 0 | price | elasticnet | elasticnet | [t, weeksin, weekcos] | {'alpha': 0.2} | 13 | 169 | 13 | 25 | <NA> | ... | <NA> | <NA> | 1.522410e-01 | -0.048569 | 1.088633e-01 | 0.071282 | 1.560728e-01 | 0.000000 | 1.199107e-01 | 8.408015e-02 |
| 0 | price | xgboost | xgboost | [t, weeksin, weekcos] | {'gamma': 1} | 13 | 169 | 13 | 25 | <NA> | ... | <NA> | <NA> | 1.288486e-01 | 0.248908 | 8.764766e-02 | 0.057013 | 1.128880e-01 | 0.476832 | 8.104546e-02 | 5.703486e-02 |
4 rows × 22 columns
[51]:
mlp_stack(mvf,model_nicknames=['elasticnet','xgboost'],lags=13)
[51]:
MVForecaster(
DateStartActuals=2015-01-04T00:00:00.000000
DateEndActuals=2018-03-25T00:00:00.000000
Freq=W-SUN
N_actuals=169
N_series=2
SeriesNames=['volume', 'price']
ForecastLength=13
Xvars=['t', 'weeksin', 'weekcos']
TestLength=25
ValidationLength=1
ValidationMetric=MetricStore(name='rmse', eval_func=<function Metrics.rmse at 0x121d5dbc0>, lower_is_better=True, min_obs_required=1)
ForecastsEvaluated=['elasticnet', 'xgboost', 'mlp_stack']
CILevel=0.95
CurrentEstimator=stacking
OptimizeOn=volume
GridsFile=MVGrids
)
[52]:
mvf.set_best_model(determine_best_by='TestSetRMSE')
[52]:
MVForecaster(
DateStartActuals=2015-01-04T00:00:00.000000
DateEndActuals=2018-03-25T00:00:00.000000
Freq=W-SUN
N_actuals=169
N_series=2
SeriesNames=['volume', 'price']
ForecastLength=13
Xvars=['t', 'weeksin', 'weekcos']
TestLength=25
ValidationLength=1
ValidationMetric=MetricStore(name='rmse', eval_func=<function Metrics.rmse at 0x121d5dbc0>, lower_is_better=True, min_obs_required=1)
ForecastsEvaluated=['elasticnet', 'xgboost', 'mlp_stack']
CILevel=0.95
CurrentEstimator=stacking
OptimizeOn=volume
GridsFile=MVGrids
)
[53]:
mvf.plot_test_set(ci=True,series='volume',put_best_on_top=True)
plt.show()
[54]:
mvf.plot(ci=True,series='volume',put_best_on_top=True)
plt.show()
Probabalistic forecasting for creating confidence intervals is currently being worked on in the MVForecaster object, but until that is done, the backtested interval also works well:
[55]:
mvf.plot(ci=True,series='volume',put_best_on_top=True)
plt.show()
Break Back into Forecaster Objects
You can then add univariate models to these objects to compare with the models run multivariate.
[56]:
from scalecast.util import break_mv_forecaster
[57]:
fvol, fprice = break_mv_forecaster(mvf)
[58]:
fvol
[58]:
Forecaster(
DateStartActuals=2015-01-04T00:00:00.000000
DateEndActuals=2018-03-25T00:00:00.000000
Freq=W-SUN
N_actuals=169
ForecastLength=13
Xvars=[]
TestLength=25
ValidationMetric=rmse
ForecastsEvaluated=['elasticnet', 'xgboost', 'mlp_stack']
CILevel=0.95
CurrentEstimator=mlr
GridsFile=Grids
)
[59]:
fprice
[59]:
Forecaster(
DateStartActuals=2015-01-04T00:00:00.000000
DateEndActuals=2018-03-25T00:00:00.000000
Freq=W-SUN
N_actuals=169
ForecastLength=13
Xvars=[]
TestLength=25
ValidationMetric=rmse
ForecastsEvaluated=['elasticnet', 'xgboost', 'mlp_stack']
CILevel=0.95
CurrentEstimator=mlr
GridsFile=Grids
)
Transformations
One of the most effective way to boost forecasting power is with transformations.
Transformations include:
All transformations have a corresponding revert function.
See the blog post.
[60]:
from scalecast.SeriesTransformer import SeriesTransformer
[61]:
f_trans = Forecaster(y=volume,current_dates=volume.index,future_dates=13)
[62]:
f_trans.set_test_length(.15)
f_trans.set_validation_length(13)
[62]:
Forecaster(
DateStartActuals=2015-01-04T00:00:00.000000
DateEndActuals=2018-03-25T00:00:00.000000
Freq=W-SUN
N_actuals=169
ForecastLength=13
Xvars=[]
TestLength=25
ValidationMetric=rmse
ForecastsEvaluated=[]
CILevel=None
CurrentEstimator=mlr
GridsFile=Grids
)
[63]:
transformer = SeriesTransformer(f_trans)
[64]:
# these will all be reverted later after forecasts have been called
f_trans = transformer.DiffTransform(1)
f_trans = transformer.DiffTransform(52)
f_trans = transformer.DetrendTransform()
[65]:
f_trans.plot()
plt.show()
[66]:
f_trans.add_time_trend()
f_trans.add_seasonal_regressors('week',sincos=True,raw=False)
f_trans.add_ar_terms(13)
[66]:
Forecaster(
DateStartActuals=2016-01-10T00:00:00.000000
DateEndActuals=2018-03-25T00:00:00.000000
Freq=W-SUN
N_actuals=116
ForecastLength=13
Xvars=['t', 'weeksin', 'weekcos', AR(lag_order=1), AR(lag_order=2), AR(lag_order=3), AR(lag_order=4), AR(lag_order=5), AR(lag_order=6), AR(lag_order=7), AR(lag_order=8), AR(lag_order=9), AR(lag_order=10), AR(lag_order=11), AR(lag_order=12), AR(lag_order=13)]
TestLength=25
ValidationMetric=rmse
ForecastsEvaluated=[]
CILevel=None
CurrentEstimator=mlr
GridsFile=Grids
)
[67]:
f_trans.set_estimator('xgboost')
f_trans.manual_forecast(gamma=1,dynamic_testing=13)
[67]:
[np.float32(1.00715175e+06),
np.float32(-2.8431745e+06),
np.float32(2.4069338e+06),
np.float32(2.0318998e+06),
np.float32(-7.8323395e+06),
np.float32(1.5664614e+06),
np.float32(-2.0622385e+06),
np.float32(-5.4871235e+06),
np.float32(-5.6764815e+06),
np.float32(2.5870602e+06),
np.float32(1.9597401e+06),
np.float32(1.4060644e+06),
np.float32(-249488.03)]
[68]:
f_trans.plot_test_set()
plt.show()
[69]:
f_trans.plot()
plt.show()
[70]:
# call revert functions in the opposite order as how they were called when transforming
f_trans = transformer.DetrendRevert()
f_trans = transformer.DiffRevert(52)
f_trans = transformer.DiffRevert(1)
[71]:
f_trans.plot_test_set()
plt.show()
[72]:
f_trans.plot()
plt.show()
Pipelines
These are objects similar to scikit-learn pipelines that offer readable and streamlined code for transforming, forecasting, and reverting.
See the Pipeline object documentation.
[73]:
from scalecast.Pipeline import Transformer, Reverter, Pipeline, MVPipeline
[74]:
f_pipe = Forecaster(y=volume,current_dates=volume.index,future_dates=13)
f_pipe.set_test_length(.15)
[74]:
Forecaster(
DateStartActuals=2015-01-04T00:00:00.000000
DateEndActuals=2018-03-25T00:00:00.000000
Freq=W-SUN
N_actuals=169
ForecastLength=13
Xvars=[]
TestLength=25
ValidationMetric=rmse
ForecastsEvaluated=[]
CILevel=None
CurrentEstimator=mlr
GridsFile=Grids
)
[75]:
def forecaster(f):
f.add_time_trend()
f.add_seasonal_regressors('week',raw=False,sincos=True)
f.add_ar_terms(13)
f.set_estimator('catboost')
f.manual_forecast(max_depth=2, verbose=False, dynamic_testing=13)
[76]:
transformer = Transformer(
transformers = [
('DiffTransform',1),
('DiffTransform',52),
('DetrendTransform',)
]
)
reverter = Reverter(
reverters = [
('DetrendRevert',),
('DiffRevert',52),
('DiffRevert',1)
],
base_transformer = transformer,
)
[77]:
reverter
[77]:
Reverter(
reverters = [
('DetrendRevert',),
('DiffRevert', 52),
('DiffRevert', 1)
],
base_transformer = Transformer(
transformers = [
('DiffTransform', 1),
('DiffTransform', 52),
('DetrendTransform',)
]
)
)
[78]:
pipeline = Pipeline(
steps = [
('Transform',transformer),
('Forecast',forecaster),
('Revert',reverter),
]
)
f_pipe = pipeline.fit_predict(f_pipe)
[79]:
f_pipe.plot_test_set()
plt.show()
[80]:
f_pipe.plot()
plt.show()
Fully Automated Pipelines
We can automate the construction of pipelines, the selection of input variables, and tuning of models with cross validation on a grid search for each model using files in the working directory called
Grids.pyfor univariate forecasting andMVGrids.pyfor multivariate. Default grids can be downloaded from scalecast.
Automated Univariate Pipelines
[81]:
from scalecast import GridGenerator
from scalecast.util import find_optimal_transformation
[82]:
GridGenerator.get_example_grids(overwrite=True)
[83]:
f_pipe_aut = Forecaster(y=volume,current_dates=volume.index,future_dates=13)
f_pipe_aut.set_test_length(.15)
[83]:
Forecaster(
DateStartActuals=2015-01-04T00:00:00.000000
DateEndActuals=2018-03-25T00:00:00.000000
Freq=W-SUN
N_actuals=169
ForecastLength=13
Xvars=[]
TestLength=25
ValidationMetric=rmse
ForecastsEvaluated=[]
CILevel=None
CurrentEstimator=mlr
GridsFile=Grids
)
[84]:
def forecaster_aut(f,models):
f.auto_Xvar_select(
estimator='elasticnet',
monitor='TestSetMAE',
alpha=0.2,
irr_cycles = [26],
)
f.tune_test_forecast(
models,
cross_validate=True,
k=3,
# dynamic tuning = 13 means we will hopefully find a model that is optimized to predict 13 steps
dynamic_tuning=13,
dynamic_testing=13,
)
f.set_estimator('combo')
f.manual_forecast()
util.find_optimal_transformation
In this function, the following transformations are searched for:
Detrending
Box-Cox
First Differencing
Seasonal Differencing
Scaling
The optimal set of transformations are returned based on best estimated out-of-sample performance on the test set. Therefore, running this function introduces leakage into the test set, but it can still be a good addition to an automated pipeline, depending on the application. Which and the order of transfomations to search through are configurable. How performance is measured, the parameters specific to a given transformation, and several other paramters are also configurable. See the documentation.
[85]:
transformer_aut, reverter_aut = find_optimal_transformation(
f_pipe_aut,
lags = 13,
m = 52,
monitor = 'mae',
estimator = 'elasticnet',
alpha = 0.2,
test_length = 13,
num_test_sets = 3,
space_between_sets = 4,
verbose = True,
) # returns a Transformer and Reverter object that can be plugged into a larger pipeline
Using elasticnet model to find the best transformation set on 3 test sets, each 13 in length.
All transformation tries will be evaluated with 13 lags.
Last transformer tried:
[]
Score (mae): 17933061.20374991
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'loess': True})]
Score (mae): 23964157.165726677
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1})]
Score (mae): 17174376.36667073
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 2})]
Score (mae): 24467364.037870836
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'})]
Score (mae): 11573053.425807392
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1)]
Score (mae): 9478522.651025778
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1), ('DiffTransform', 52)]
Score (mae): 11116081.856823217
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1), ('ScaleTransform',)]
Score (mae): 9583504.942193016
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1), ('MinMaxTransform',)]
Score (mae): 9583504.942193046
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1), ('RobustScaleTransform',)]
Score (mae): 9583504.942193015
--------------------------------------------------
Final Selection:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1)]
[86]:
pipeline_aut = Pipeline(
steps = [
('Transform',transformer_aut),
('Forecast',forecaster_aut),
('Revert',reverter_aut),
]
)
f_pipe_aut = pipeline_aut.fit_predict(
f_pipe_aut,
models=[
'mlr',
'elasticnet',
'xgboost',
'knn',
],
)
[87]:
f_pipe_aut
[87]:
Forecaster(
DateStartActuals=2015-01-04T00:00:00.000000
DateEndActuals=2018-03-25T00:00:00.000000
Freq=W-SUN
N_actuals=169
ForecastLength=13
Xvars=[AR(lag_order=1), AR(lag_order=2), AR(lag_order=3), AR(lag_order=4), AR(lag_order=5)]
TestLength=25
ValidationMetric=rmse
ForecastsEvaluated=['mlr', 'elasticnet', 'xgboost', 'knn', 'combo']
CILevel=None
CurrentEstimator=combo
GridsFile=Grids
)
[88]:
f_pipe_aut.plot_test_set(order_by='TestSetRMSE')
plt.show()
[89]:
f_pipe_aut.plot(order_by='TestSetRMSE')
plt.show()
Backtest Univariate Pipeline
You may be interested to know beyond a single test-set metric, how well your pipeline performs out-of-sample. Backtesting can help answer that by iterating through the entire pipeline several times and testing the procedure each time. It can also help make expanding confidence intervals. See the documentation.
[90]:
from scalecast.util import backtest_metrics
[91]:
uv_backtest_results = pipeline_aut.backtest(
f_pipe_aut,
n_iter = 3,
jump_back = 13,
cis = False,
models=[
'mlr',
'elasticnet',
'xgboost',
'knn',
],
)
After obtaining the results from the backtest, we can see the average performance over each iteration using the util.backtest_metrics function. See the documentation.
[92]:
pd.options.display.float_format = '{:,.4f}'.format
backtest_metrics(
uv_backtest_results,
mets=['smape','rmse','bias'],
)
[92]:
| Iter0 | Iter1 | Iter2 | Average | ||
|---|---|---|---|---|---|
| Model | Metric | ||||
| mlr | smape | 0.1130 | 0.2317 | 0.1142 | 0.1529 |
| rmse | 15,488,744.6687 | 19,538,227.8480 | 11,910,709.6284 | 15,645,894.0484 | |
| bias | -165,115,026.5519 | -207,268,257.9295 | 103,757,419.3759 | -89,541,955.0352 | |
| elasticnet | smape | 0.1139 | 0.1962 | 0.1428 | 0.1510 |
| rmse | 15,637,578.9398 | 16,625,313.2954 | 14,338,412.2989 | 15,533,768.1780 | |
| bias | -166,295,878.6389 | -169,974,287.0231 | 145,808,728.0805 | -63,487,145.8605 | |
| xgboost | smape | 0.1559 | 0.1151 | 0.1475 | 0.1395 |
| rmse | 19,082,290.8290 | 11,550,757.4375 | 15,191,141.2472 | 15,274,729.8379 | |
| bias | -219,935,082.4832 | -103,119,653.1940 | 147,885,363.8979 | -58,389,790.5931 | |
| knn | smape | 0.1568 | 0.1896 | 0.1439 | 0.1635 |
| rmse | 19,855,895.0761 | 16,441,684.0561 | 14,418,665.8740 | 16,905,415.0021 | |
| bias | -220,650,230.0243 | -174,089,027.5994 | 145,204,593.3703 | -83,178,221.4178 | |
| combo | smape | 0.1344 | 0.1805 | 0.1366 | 0.1505 |
| rmse | 17,371,835.8070 | 15,768,729.5255 | 13,916,077.0706 | 15,685,547.4677 | |
| bias | -192,999,054.0779 | -163,612,807.9017 | 135,664,027.3303 | -73,649,278.2164 |
Automated Multivariate Pipelines
See the MVPipeline object documentation.
[93]:
GridGenerator.get_mv_grids(overwrite=True)
[94]:
fvol_aut = Forecaster(
y=volume,
current_dates=volume.index,
future_dates=13,
test_length = .15,
)
fprice_aut = Forecaster(
y=price,
current_dates=price.index,
future_dates=13,
test_length = .15,
)
[95]:
def add_vars(f,**kwargs):
f.add_seasonal_regressors(
'month',
'quarter',
'week',
raw=False,
sincos=True
)
def mvforecaster(mvf,models):
mvf.set_optimize_on('volume')
mvf.tune_test_forecast(
models,
cross_validate=True,
k=2,
rolling=True,
dynamic_tuning=13,
dynamic_testing=13,
limit_grid_size = .2,
error = 'warn',
)
[96]:
transformer_vol, reverter_vol = find_optimal_transformation(
fvol_aut,
lags = 13,
m = 52,
monitor = 'mae',
estimator = 'elasticnet',
alpha = 0.2,
test_length = 13,
num_test_sets = 3,
space_between_sets = 4,
verbose = True,
)
Using elasticnet model to find the best transformation set on 3 test sets, each 13 in length.
All transformation tries will be evaluated with 13 lags.
Last transformer tried:
[]
Score (mae): 17933061.20374991
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'loess': True})]
Score (mae): 23964157.165726677
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1})]
Score (mae): 17174376.36667073
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 2})]
Score (mae): 24467364.037870836
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'})]
Score (mae): 11573053.425807392
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1)]
Score (mae): 9478522.651025778
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1), ('DiffTransform', 52)]
Score (mae): 11116081.856823217
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1), ('ScaleTransform',)]
Score (mae): 9583504.942193016
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1), ('MinMaxTransform',)]
Score (mae): 9583504.942193046
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1), ('RobustScaleTransform',)]
Score (mae): 9583504.942193015
--------------------------------------------------
Final Selection:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1)]
[97]:
transformer_price, reverter_price = find_optimal_transformation(
fprice_aut,
lags = 13,
m = 52,
monitor = 'mae',
estimator = 'elasticnet',
alpha = 0.2,
test_length = 13,
num_test_sets = 3,
space_between_sets = 4,
verbose = True,
)
Using elasticnet model to find the best transformation set on 3 test sets, each 13 in length.
All transformation tries will be evaluated with 13 lags.
Last transformer tried:
[]
Score (mae): 0.06804292152560591
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'loess': True})]
Score (mae): 0.3531129223321485
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1})]
Score (mae): 0.18654572266988675
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 2})]
Score (mae): 0.4079071208348029
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 52, 'model': 'add'})]
Score (mae): 0.04226554848615104
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 52, 'model': 'add'}), ('Transform', BoxcoxTransform, {'lmbda': -0.5})]
Score (mae): 0.04754421078794407
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 52, 'model': 'add'}), ('Transform', BoxcoxTransform, {'lmbda': 0})]
Score (mae): 0.04573807786929985
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 52, 'model': 'add'}), ('Transform', BoxcoxTransform, {'lmbda': 0.5})]
Score (mae): 0.04392809054109528
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1)]
Score (mae): 0.08609820028223668
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 52)]
Score (mae): 0.05863628346364504
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 52, 'model': 'add'}), ('ScaleTransform',)]
Score (mae): 0.03860371560228571
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 52, 'model': 'add'}), ('MinMaxTransform',)]
Score (mae): 0.04226554848615097
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 52, 'model': 'add'}), ('RobustScaleTransform',)]
Score (mae): 0.039814357454047544
--------------------------------------------------
Final Selection:
[('DeseasonTransform', {'m': 52, 'model': 'add'}), ('ScaleTransform',)]
[98]:
mvpipeline = MVPipeline(
steps = [
('Transform',[transformer_vol,transformer_price]),
('Add Xvars',[add_vars]*2),
('Forecast',mvforecaster),
('Revert',[reverter_vol,reverter_price]),
],
test_length = 20,
cis = True,
names = ['volume','price'],
)
fvol_aut, fprice_aut = mvpipeline.fit_predict(
fvol_aut,
fprice_aut,
models=[
'mlr',
'elasticnet',
'xgboost',
],
) # returns a tuple of Forecaster objects
[99]:
fvol_aut.plot_test_set(order_by='TestSetRMSE',ci=True)
plt.show()
[100]:
fvol_aut.plot(order_by='TestSetRMSE',ci=True)
plt.show()
Backtest Multivariate Pipeline
Like univariate pipelines, multivariate pipelines can also be backtested. Info about each model and series becomes possible to compare. See the documentation.
[101]:
# recreate Forecaster objects to bring dates back lost from taking seasonal differences
fvol_aut = Forecaster(
y=volume,
current_dates=volume.index,
future_dates=13,
)
fprice_aut = Forecaster(
y=price,
current_dates=price.index,
future_dates=13,
)
[102]:
mv_backtest_results = mvpipeline.backtest(
fvol_aut,
fprice_aut,
n_iter = 3,
jump_back = 13,
test_length = 0,
cis = False,
models=[
'mlr',
'elasticnet',
'xgboost',
],
)
[103]:
backtest_metrics(
mv_backtest_results,
mets=['smape','rmse','bias'],
names = ['Volume','Price'],
)
[103]:
| Iter0 | Iter1 | Iter2 | Average | |||
|---|---|---|---|---|---|---|
| Series | Model | Metric | ||||
| Volume | mlr | smape | 0.0633 | 0.1365 | 0.1656 | 0.1218 |
| rmse | 10,189,713.7944 | 12,579,221.2052 | 16,886,578.8084 | 13,218,504.6026 | ||
| bias | -81,340,397.6578 | -118,858,462.0219 | 167,738,817.2818 | -10,820,014.1326 | ||
| elasticnet | smape | 0.1156 | 0.1814 | 0.1616 | 0.1529 | |
| rmse | 15,693,483.0781 | 15,391,716.1042 | 16,157,240.6763 | 15,747,479.9529 | ||
| bias | -164,494,984.5647 | -151,411,465.5474 | 171,433,487.9525 | -48,157,654.0532 | ||
| xgboost | smape | 0.1038 | 0.1632 | 0.1181 | 0.1283 | |
| rmse | 14,651,691.0077 | 14,850,108.8682 | 12,279,905.4688 | 13,927,235.1149 | ||
| bias | -138,886,144.2332 | -132,099,657.9440 | 114,021,698.2729 | -52,321,367.9681 | ||
| Price | mlr | smape | 0.0566 | 0.0849 | 0.0728 | 0.0714 |
| rmse | 0.0867 | 0.1452 | 0.1614 | 0.1311 | ||
| bias | 1.0108 | 1.2946 | -1.3093 | 0.3320 | ||
| elasticnet | smape | 0.0275 | 0.0632 | 0.1485 | 0.0798 | |
| rmse | 0.0444 | 0.1364 | 0.2719 | 0.1509 | ||
| bias | -0.2007 | -1.3118 | -3.0910 | -1.5345 | ||
| xgboost | smape | 0.0466 | 0.0446 | 0.0663 | 0.0525 | |
| rmse | 0.0751 | 0.0984 | 0.1485 | 0.1073 | ||
| bias | 0.7649 | -0.5261 | -1.2205 | -0.3272 |
Through backtesting, we can see that the multivariate approach out-performed the univariate approach. Very cool!
Scaled Automated Forecasting
We can scale the fully automated approach to many series where we can then access all results through plotting with Jupyter widgets and export functions.
We produce a separate forecast for avocado sales in each region in our dataset.
This is done with a univariate approach, but cleverly using the code in this notebook, it could be transformed into a multivariate process where volume and price are forecasted together.
[104]:
from scalecast.notebook import results_vis
from tqdm.notebook import tqdm
[105]:
def forecaster_scaled(f,models):
f.auto_Xvar_select(
estimator='elasticnet',
monitor='TestSetMAE',
alpha=0.2,
irr_cycles = [26],
)
f.tune_test_forecast(
models,
dynamic_testing=13,
)
f.set_estimator('combo')
f.manual_forecast()
[106]:
results_dict = {}
for region in tqdm(data.region.unique()):
series = data.loc[data['region'] == region].groupby('Date')['Total Volume'].sum()
f_i = Forecaster(
y = series,
current_dates = series.index,
future_dates = 13,
test_length = .15,
validation_length = 13,
cis = True,
)
transformer_i, reverter_i = find_optimal_transformation(
f_i,
lags = 13,
m = 52,
monitor = 'mae',
estimator = 'elasticnet',
alpha = 0.2,
test_length = 13,
)
pipeline_i = Pipeline(
steps = [
('Transform',transformer_i),
('Forecast',forecaster_scaled),
('Revert',reverter_i),
]
)
f_i = pipeline_i.fit_predict(
f_i,
models=[
'mlr',
'elasticnet',
'xgboost',
'knn',
],
)
results_dict[region] = f_i
Run the next two functions locally to see the full functionality of these widgets.
[107]:
results_vis(results_dict,'test')
[108]:
results_vis(results_dict)
Exporting Results
[109]:
from scalecast.multiseries import export_model_summaries
Exporting Results from a Single Forecaster Object
[110]:
results = f.export(cis=True,models=['mlr','lasso','ridge'])
results.keys()
[110]:
dict_keys(['model_summaries', 'lvl_fcsts', 'lvl_test_set_predictions'])
[111]:
for k, df in results.items():
print(f'{k} has these columns:',*df.columns,'-'*25,sep='\n')
model_summaries has these columns:
ModelNickname
Estimator
Xvars
HyperParams
Observations
DynamicallyTested
TestSetLength
CILevel
ValidationMetric
models
weights
best_model
ValidationMetricValue
TestSetRMSE
TestSetR2
TestSetMAE
TestSetMAPE
InSampleRMSE
InSampleR2
InSampleMAE
InSampleMAPE
-------------------------
lvl_fcsts has these columns:
DATE
mlr
mlr_upperci
mlr_lowerci
lasso
lasso_upperci
lasso_lowerci
ridge
ridge_upperci
ridge_lowerci
-------------------------
lvl_test_set_predictions has these columns:
DATE
actual
mlr
mlr_upperci
mlr_lowerci
lasso
lasso_upperci
lasso_lowerci
ridge
ridge_upperci
ridge_lowerci
-------------------------
[112]:
results['model_summaries'][['ModelNickname','HyperParams','TestSetRMSE','InSampleRMSE']]
[112]:
| ModelNickname | HyperParams | TestSetRMSE | InSampleRMSE | |
|---|---|---|---|---|
| 0 | mlr | {} | 16,818,489.5277 | 10,231,495.9303 |
| 0 | lasso | {'alpha': 0.2} | 16,818,489.9828 | 10,231,495.9303 |
| 0 | ridge | {'alpha': 0.2} | 16,827,165.4875 | 10,265,352.0145 |
Exporting Results from a Single MVForecaster Object
[113]:
mvresults = mvf.export(cis=True,models=['elasticnet','xgboost'])
mvresults.keys()
[113]:
dict_keys(['model_summaries', 'lvl_fcsts', 'lvl_test_set_predictions'])
[114]:
for k, df in mvresults.items():
print(f'{k} has these columns:',*df.columns,'-'*25,sep='\n')
model_summaries has these columns:
Series
ModelNickname
Estimator
Xvars
HyperParams
Lags
Observations
DynamicallyTested
TestSetLength
ValidationMetric
ValidationMetricValue
OptimizedOn
best_model
MetricOptimized
TestSetRMSE
TestSetR2
TestSetMAE
TestSetMAPE
InSampleRMSE
InSampleR2
InSampleMAE
InSampleMAPE
-------------------------
lvl_fcsts has these columns:
DATE
volume_elasticnet_lvl_fcst
volume_elasticnet_lvl_fcst_upper
volume_elasticnet_lvl_fcst_lower
volume_xgboost_lvl_fcst
volume_xgboost_lvl_fcst_upper
volume_xgboost_lvl_fcst_lower
price_elasticnet_lvl_fcst
price_elasticnet_lvl_fcst_upper
price_elasticnet_lvl_fcst_lower
price_xgboost_lvl_fcst
price_xgboost_lvl_fcst_upper
price_xgboost_lvl_fcst_lower
-------------------------
lvl_test_set_predictions has these columns:
DATE
volume_actuals
volume_elasticnet_lvl_ts
volume_elasticnet_lvl_ts_upper
volume_elasticnet_lvl_ts_lower
volume_xgboost_lvl_ts
volume_xgboost_lvl_ts_upper
volume_xgboost_lvl_ts_lower
price_actuals
price_elasticnet_lvl_ts
price_elasticnet_lvl_ts_upper
price_elasticnet_lvl_ts_lower
price_xgboost_lvl_ts
price_xgboost_lvl_ts_upper
price_xgboost_lvl_ts_lower
-------------------------
[115]:
mvresults['model_summaries'][['Series','ModelNickname','HyperParams','Lags','TestSetRMSE','InSampleRMSE']]
[115]:
| Series | ModelNickname | HyperParams | Lags | TestSetRMSE | InSampleRMSE | |
|---|---|---|---|---|---|---|
| 0 | volume | elasticnet | {'alpha': 0.2} | 13 | 18,694,896.5159 | 11,976,031.0554 |
| 0 | volume | xgboost | {'gamma': 1} | 13 | 19,925,832.7824 | 42.9037 |
| 0 | price | elasticnet | {'alpha': 0.2} | 13 | 0.1522 | 0.1561 |
| 0 | price | xgboost | {'gamma': 1} | 13 | 0.1288 | 0.1129 |
Exporting Results from a Dictionary of Forecaster Objects
[116]:
all_results = export_model_summaries(results_dict)
all_results[['ModelNickname','Series','Xvars','HyperParams','TestSetRMSE','InSampleRMSE']].sample(10)
[116]:
| ModelNickname | Series | Xvars | HyperParams | TestSetRMSE | InSampleRMSE | |
|---|---|---|---|---|---|---|
| 123 | knn | MiamiFtLauderdale | [AR(lag_order=1)] | {'n_neighbors': 6} | 210,226.7387 | 212,046.9685 |
| 94 | combo | Houston | None | {} | 278,085.7441 | 202,264.4191 |
| 37 | xgboost | Charlotte | [weeksin, weekcos, monthsin, monthcos, quarter... | {'n_estimators': 150, 'scale_pos_weight': 5, '... | 69,206.1080 | 24,644.7896 |
| 152 | xgboost | NorthernNewEngland | [weeksin, weekcos, monthsin, monthcos, AR(lag_... | {'n_estimators': 250, 'scale_pos_weight': 5, '... | 113,026.2050 | 33,877.4027 |
| 97 | xgboost | Indianapolis | [t, AR(lag_order=1), AR(lag_order=2), AR(lag_o... | {'n_estimators': 200, 'scale_pos_weight': 5, '... | 49,505.8778 | 19,566.0374 |
| 194 | combo | RichmondNorfolk | None | {} | 75,493.7775 | 55,380.6707 |
| 209 | combo | SanDiego | None | {} | 118,202.8681 | 125,687.3310 |
| 44 | combo | Chicago | None | {} | 130,219.1839 | 230,083.5853 |
| 144 | combo | NewYork | None | {} | 391,217.1665 | 1,362,145.0224 |
| 215 | mlr | Seattle | [AR(lag_order=1), AR(lag_order=2), AR(lag_orde... | {'normalizer': 'scale'} | 117,065.8371 | 112,476.3328 |
[ ]: