Pipeline

These objects can be placed in a list and executed sequentially, similar to a pipeline from sklearn. But because this is scalecast, fitting, testing, and producing forecasts are all done in one step. So instead of separate fit(), transform(), and predict() methods, there are only fit_transform() and fit_predict(). The end result is a streamlined, low-code application that stays easy to read.

from scalecast.Forecaster import Forecaster
from scalecast.Pipeline import Pipeline, Transformer, Reverter
import pandas_datareader as pdr
import matplotlib.pyplot as plt

# get and load data into a Forecaster object
df = pdr.get_data_fred(
    'HOUSTNSA',
    start='1959-01-01',
    end='2022-08-01'
)
f = Forecaster(
    y=df['HOUSTNSA'],
    current_dates=df.index,
    future_dates=24,
)
# pipeline applications for forecasting should be written into a function(s)
def forecaster(f):
    f.set_test_length(0.2)
    f.set_validation_length(24)
    f.add_covid19_regressor()
    f.auto_Xvar_select(cross_validate=True)
    f.set_estimator('mlr')
    f.manual_forecast()
# transformer piece to make the series stationary and boost results
transformer = Transformer(
    transformers = [
        ('LogTransform',),
        ('DiffTransform',1),
        ('DiffTransform',12),
    ],
)
# reverter piece for interpretation
reverter = Reverter(
    reverters = [
        ('DiffRevert',12),
        ('DiffRevert',1),
        ('LogRevert',),
    ],
    base_transformer = transformer,
)
# full pipeline
pipeline = Pipeline(
    steps = [
        ('Transform',transformer),
        ('Forecast',forecaster),
        ('Revert',reverter),
    ],
)
f = pipeline.fit_predict(f)

# plot results
f.plot(ci=True,order_by='LevelTestSetMAPE')
plt.show()

# extract results
results_dfs = f.export(
  ['model_summaries','lvl_fcsts']
)

class src.scalecast.Pipeline.Transformer(transformers: List[Tuple])

__init__(transformers: List[Tuple])

Initiates the transformer pipeline.

Parameters: transformers (list[tuple]) – A list of transformations to apply to the time series stored in a Forecaster object. The tuple’s first element should match the name of a transform function from the SeriesTransformer object: https://scalecast.readthedocs.io/en/latest/Forecaster/SeriesTransformer.html. Positional and keyword arguments can be passed to these functions. If a given tuple is more than 1 in length, the fit_transform() method will parse elements after index 0 as positional arguments. Keywords are passed as a dictionary in the last position of tuples greater than 1 in length. Therefore, if the last argument in the tuple is a dict type, it is assumed to contain the keyword arguments. If the last positional argument you wish to pass happens to be dict type, you can either pass it as a keyword argument or place an additional (empty) dictionary at the end of the tuple.

>>> from scalecast.Pipeline import Transformer
>>> transformer = Transformer(
>>>     transformers = [
>>>         ('LogTransform',),
>>>         ('DiffTransform',1),
>>>         ('DiffTransform',12),
>>>     ],
>>> )
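The tuple-parsing rule described above can be sketched in plain Python. This is only an illustration of the documented behavior, not scalecast's actual code: `split_step`, `'SomeTransform'`, and `some_option` are hypothetical names introduced for the example.

```python
from typing import Any, Dict, Tuple


def split_step(step: Tuple) -> Tuple[str, tuple, Dict[str, Any]]:
    """Split a step tuple into (name, positional args, keyword args).

    Hypothetical helper mirroring the documented rule; not part of scalecast.
    """
    name, rest = step[0], step[1:]
    # A trailing dict is always read as the keyword arguments.
    if rest and isinstance(rest[-1], dict):
        return name, tuple(rest[:-1]), dict(rest[-1])
    return name, tuple(rest), {}


# name only: no arguments
print(split_step(('LogTransform',)))
# one positional argument
print(split_step(('DiffTransform', 12)))
# positional argument plus keywords ('SomeTransform' is a made-up name)
print(split_step(('SomeTransform', 1, {'some_option': True})))
# a dict meant as a positional argument needs an extra empty dict after it
print(split_step(('SomeTransform', {'a': 1}, {})))
```

The last call shows why the extra empty dictionary matters: without it, `{'a': 1}` would be consumed as keyword arguments.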

Methods:

fit_transform(f)

Applies the transformation to the series stored in the Forecaster object.

fit_transform(f: Forecaster) → Forecaster

Applies the transformation to the series stored in the Forecaster object.

Parameters: f (Forecaster) – The Forecaster object that stores the series that will be transformed.
Returns: A Forecaster object with the transformed series.
Return type: (Forecaster)

>>> from scalecast.Pipeline import Transformer
>>> transformer = Transformer(
>>>     transformers = [
>>>         ('LogTransform',),
>>>         ('DiffTransform',1),
>>>         ('DiffTransform',12),
>>>     ],
>>> )
>>> f = transformer.fit_transform(f)

class src.scalecast.Pipeline.Reverter(reverters: List[Tuple], base_transformer: Transformer | SeriesTransformer)

__init__(reverters: List[Tuple], base_transformer: Transformer | SeriesTransformer)

Initiates the reverter pipeline.

Parameters:

reverters (list[tuple]) – A list of revert funcs to apply to the time series stored in a Forecaster object. The tuple’s first element should match the name of a revert function from the SeriesTransformer object: https://scalecast.readthedocs.io/en/latest/Forecaster/SeriesTransformer.html. Positional and keyword arguments can be passed to these functions. If a given tuple is more than 1 in length, the fit_transform() method will parse elements after index 0 as positional arguments. Keywords are passed as a dictionary in the last position of tuples greater than 1 in length. Therefore, if the last argument in the tuple is a dict type, it is assumed to contain the keyword arguments. If the last positional argument you wish to pass happens to be dict type, you can either pass it as a keyword argument or place an additional (empty) dictionary at the end of the tuple.
base_transformer (Transformer|SeriesTransformer) – The object that was used to make the original transformations. These objects contain the key information to undifference and unscale the stored data and therefore this argument is required.

>>> from scalecast.Pipeline import Reverter
>>> reverter = Reverter(
>>>     reverters = [
>>>         ('DiffRevert',12),
>>>         ('DiffRevert',1),
>>>         ('LogRevert',),
>>>     ],
>>>     base_transformer = transformer,
>>> )
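The reverters in the example are listed in the opposite order of the transformers, and the base transformer is needed because differencing discards the first values of the series. A minimal standard-library sketch of that idea follows. It assumes a log followed by a first difference; `diff` and `undiff` are hypothetical helpers, not scalecast functions.

```python
import math


def diff(values, m):
    """Lag-m difference; the first m values are lost."""
    return [values[i] - values[i - m] for i in range(m, len(values))]


def undiff(diffed, seed, m):
    """Rebuild the level series from its first m original values (the seed)."""
    out = list(seed)
    for d in diffed:
        out.append(d + out[-m])
    return out


y = [100.0, 120.0, 90.0, 130.0, 150.0, 110.0]

# Transform: log first, then a first difference.
logged = [math.log(v) for v in y]
transformed = diff(logged, 1)

# Revert in reverse order: undo the difference first, then the log.
# The seed plays the role of the information kept by the base transformer.
restored_log = undiff(transformed, seed=logged[:1], m=1)
restored = [math.exp(v) for v in restored_log]

print(all(math.isclose(a, b) for a, b in zip(y, restored)))
```

Applying the exponential before undoing the difference would not recover the original series, which is why the revert list mirrors the transform list.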

Methods:

fit_transform(f[, exclude_models])

Applies the revert function to the series stored in the Forecaster object.

fit_transform(f: Forecaster, exclude_models=[]) → Forecaster

Applies the revert function to the series stored in the Forecaster object.

Parameters:

f (Forecaster) – The Forecaster object that stores the series that will be reverted.
exclude_models (list-like) – Optional. Models to not revert.

Returns:

A Forecaster object with the reverted series.

Return type:

(Forecaster)

>>> from scalecast.Pipeline import Reverter
>>> reverter = Reverter(
>>>     reverters = [
>>>         ('DiffRevert',12),
>>>         ('DiffRevert',1),
>>>         ('LogRevert',),
>>>     ],
>>>     base_transformer = transformer,
>>> )
>>> f = reverter.fit_transform(f)

class src.scalecast.Pipeline.Pipeline(steps: List[Tuple[str, Transformer | Reverter | function]])

__init__(steps: List[Tuple[str, Transformer | Reverter | function]])

Initiates the full pipeline.

Parameters: steps (list[tuple]) – A list of transform, forecast, and revert funcs to apply to a Forecaster object. The first element of each tuple names the step. The second element should be a Transformer object, a Reverter object, or a function. If it is a function, its first argument should accept a Forecaster object. Functions are identified as objects that do not have a fit_transform() method, so other kinds of elements may work in the Pipeline as long as they have a fit_transform() method.

>>> from scalecast.Forecaster import Forecaster
>>> from scalecast.Pipeline import Transformer, Reverter, Pipeline
>>> import pandas_datareader as pdr
>>>
>>> models = ['mlr','elasticnet']
>>> def forecaster(f,models):
>>>     f.add_covid19_regressor()
>>>     f.auto_Xvar_select(cross_validate=True)
>>>     f.tune_test_forecast(models)
>>>
>>> df = pdr.get_data_fred(
>>>     'HOUSTNSA',
>>>     start='1959-01-01',
>>>     end='2022-08-01'
>>> )
>>> f = Forecaster(
>>>     y=df['HOUSTNSA'],
>>>     current_dates=df.index,
>>>     future_dates=24,
>>> )
>>> f.set_test_length(0.2)
>>> f.set_validation_length(24)
>>> transformer = Transformer(
>>>     transformers = [
>>>         ('LogTransform',),
>>>         ('DiffTransform',1),
>>>         ('DiffTransform',12),
>>>     ],
>>> )
>>> reverter = Reverter(
>>>     reverters = [
>>>         ('DiffRevert',12),
>>>         ('DiffRevert',1),
>>>         ('LogRevert',),
>>>     ],
>>>     base_transformer = transformer,
>>> )
>>> pipeline = Pipeline(
>>>     steps = [
>>>         ('Transform',transformer),
>>>         ('Forecast',forecaster),
>>>         ('Revert',reverter),
>>>     ],
>>> )
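How a step is treated depends on whether it has a fit_transform() method. The sketch below illustrates that dispatch with stand-in objects. It assumes that function steps modify their argument in place and that their return value is ignored, which matches the `forecaster` functions in the examples but is not confirmed by the source. `AddOne`, `forecast_step`, and `run_steps` are hypothetical names.

```python
class AddOne:
    """Stand-in for a Transformer/Reverter: anything with fit_transform()."""

    def fit_transform(self, x):
        return x + [1]


def forecast_step(x, tag="fcst"):
    """Stand-in for a forecasting function; mutates its argument in place."""
    x.append(tag)


def run_steps(steps, obj, **kwargs):
    """Run each (name, step) pair in order on obj."""
    for name, step in steps:
        if hasattr(step, "fit_transform"):
            # Transformer-like objects return the object to carry forward.
            obj = step.fit_transform(obj)
        else:
            # Anything else is treated as a function and receives the kwargs.
            step(obj, **kwargs)
    return obj


steps = [
    ("Transform", AddOne()),
    ("Forecast", forecast_step),
    ("Revert", AddOne()),
]
result = run_steps(steps, [], tag="mlr")
print(result)  # [1, 'mlr', 1]
```

Under this reading, any object exposing fit_transform() could slot into the step list, which is the extension point the parameter description hints at.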

Methods:

`backtest`(*fs[, n_iter, jump_back, ...])	Runs an out-of-sample backtest of the pipeline over a certain amount of iterations.
`fit_predict`(f, **kwargs)	Applies the transform, forecast, and revert functions to the series stored in the Forecaster object.

backtest(*fs, n_iter=5, jump_back=1, series_length=None, fcst_length=None, test_length=None, cis=None, cilevel=None, verbose=False, **kwargs) → List[Dict[str, DataFrame]]

Runs an out-of-sample backtest of the pipeline over a certain amount of iterations.

Parameters:

*fs (Forecaster) – Send one if univariate forecasting with the Pipeline class, more than one if multivariate forecasting with the MVPipeline class.
n_iter (int) – Default 5. How many backtest iterations to perform.
jump_back (int) – Default 1. The space between consecutive training sets.
series_length (int) – Optional. The total length of each training set. Leave unspecified if you want to use every available training observation for each iteration.
fcst_length (int) – Optional. The forecast horizon length to forecast over for each iteration. Leave unspecified if you want to use the forecast horizon already programmed into the Forecaster object.
test_length (int) – Optional. The test set to hold out for each model evaluation. Leave unspecified if you want to use the test length already programmed into the Forecaster object.
cis (bool) – Optional. Whether to backtest confidence intervals. Leave unspecified if you want to use whatever is already programmed into the Forecaster object.
cilevel (float) – Optional. What level to evaluate confidence intervals at. Leave unspecified if you want to use whatever is already programmed into the Forecaster object.
**kwargs – Passed to the fit_predict() method from Pipeline or MVPipeline.

Returns:

The results from each model and backtest iteration. Each dict in the resulting list corresponds to one Forecaster object, in the order they were passed (the list will be length 1 if univariate forecasting). Each key of each dict is either ‘Actuals’, ‘Obs’, or the name of a model that got backtested. Each value is a DataFrame with the iteration values. The ‘Actuals’ frame holds the date information and the actual values over each forecast horizon. The ‘Obs’ frame holds the historical observations used to make each forecast, back-padded with NA values so that every array has the same length.

Return type:

(List[Dict[str,pd.DataFrame]])

>>> # univariate forecasting
>>> pipeline = Pipeline(
>>>     steps = [
>>>         ('Transform',transformer),
>>>         ('Forecast',forecaster),
>>>         ('Revert',reverter),
>>>     ],
>>> )
>>> backtest_results = pipeline.backtest(f,models=models)
>>>
>>> # multivariate forecasting
>>> pipeline = MVPipeline(
>>>    steps = [
>>>        ('Transform',[transformer1,transformer2,transformer3]),
>>>        ('Select Xvars',[auto_Xvar_select]*3),
>>>        ('Forecast',forecaster,),
>>>        ('Revert',[reverter1,reverter2,reverter3]),
>>>    ],
>>>    names = ['UTUR','UTPHCI','UNRATE'], # used to combine to the mvf object
>>>    merge_Xvars = 'i', # used to combine to the mvf object
>>> )
>>> backtest_results = pipeline.backtest(f1,f2,f3)
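To make n_iter, jump_back, series_length, and fcst_length more concrete, here is one plausible way the backtest windows could be laid out. This is an assumption made for illustration: the exact slicing scalecast performs may differ, and `backtest_windows` is a hypothetical helper.

```python
def backtest_windows(n_obs, fcst_length, n_iter=5, jump_back=1, series_length=None):
    """Return (train_start, train_end, test_end) index triples, newest first.

    Training data is [train_start, train_end); the held-out horizon is
    [train_end, test_end). Illustrative only, not scalecast's implementation.
    """
    windows = []
    for i in range(n_iter):
        # Each iteration steps jump_back observations further into the past.
        test_end = n_obs - i * jump_back
        train_end = test_end - fcst_length
        # Without series_length, use every observation before the horizon.
        if series_length is None:
            train_start = 0
        else:
            train_start = max(0, train_end - series_length)
        if train_end <= train_start:
            raise ValueError("not enough observations for this many iterations")
        windows.append((train_start, train_end, test_end))
    return windows


for window in backtest_windows(100, fcst_length=12, n_iter=3, jump_back=6, series_length=48):
    print(window)
# (40, 88, 100)
# (34, 82, 94)
# (28, 76, 88)
```

With series_length left as None, every window would start at index 0, so older iterations train on fewer observations than newer ones.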

fit_predict(f: Forecaster, **kwargs) → Forecaster

Applies the transform, forecast, and revert functions to the series stored in the Forecaster object.

Parameters:

f (Forecaster) – The Forecaster object that stores the series that will be sent through the pipeline.
**kwargs – Passed to any ‘function’ types passed in the pipeline.

Returns:

A Forecaster object with the stored results from the pipeline run.

Return type:

(Forecaster)

>>> pipeline = Pipeline(
>>>     steps = [
>>>         ('Transform',transformer),
>>>         ('Forecast',forecaster),
>>>         ('Revert',reverter),
>>>     ],
>>> )
>>> f = pipeline.fit_predict(f,models=models)

class src.scalecast.Pipeline.MVPipeline(steps: List[Tuple[str, List[Transformer] | List[Reverter] | function]], **kwargs)

__init__(steps: List[Tuple[str, List[Transformer] | List[Reverter] | function]], **kwargs)

Initiates the full pipeline for multivariate forecasting applications.

Parameters:

steps (list[tuple]) – A list of transform, forecast, and revert funcs to apply to multiple Forecaster objects. The first element of each tuple names the step. The second element should be a list of Transformer objects, a list of Reverter objects, a list of functions, or a single function. If it is a function or list of functions, the first argument of each function should accept a Forecaster or MVForecaster object. If it is a list of functions, Transformer objects, or Reverter objects, each element will be called on the Forecaster objects in the order they are passed to the fit_predict() method. Functions are identified as objects that do not have a fit_transform() method, so other kinds of elements may work in the Pipeline as long as they have a fit_transform() method.
**kwargs – Passed to MVForecaster(). See https://scalecast.readthedocs.io/en/latest/Forecaster/MVForecaster.html#src.scalecast.MVForecaster.MVForecaster.__init__.

>>> from scalecast.Forecaster import Forecaster
>>> from scalecast.Pipeline import MVPipeline
>>> from scalecast.util import pdr_load, find_optimal_transformation
>>>
>>> def auto_Xvar_select(f):
>>>    f.auto_Xvar_select(max_ar=0)
>>> def forecaster(mvf):
>>>     mvf.set_test_length(24)
>>>     mvf.set_estimator('elasticnet')
>>>     mvf.manual_forecast(alpha=.2,lags=12)
>>>
>>> f1 = pdr_load('UTUR',future_dates=24,start='1970-01-01',end='2022-07-01')
>>> f2 = pdr_load('UTPHCI',future_dates=24,start='1970-01-01',end='2022-07-01')
>>> f3 = pdr_load('UNRATE',future_dates=24,start='1970-01-01',end='2022-07-01')
>>> # doing this helps the `DetrendTransform()` function
>>> fs = [f1,f2,f3]
>>> for f in fs:
>>>     f.set_test_length(24)
>>>
>>> transformer1, reverter1 = find_optimal_transformation(f1)
>>> transformer2, reverter2 = find_optimal_transformation(f2)
>>> transformer3, reverter3 = find_optimal_transformation(f3)
>>>
>>> pipeline = MVPipeline(
>>>     steps = [
>>>         ('Transform',[transformer1,transformer2,transformer3]),
>>>         ('Select Xvars',[auto_Xvar_select]*3), # finds xvars for each object
>>>         ('Forecast',forecaster,), # combines to an mvf object
>>>         ('Revert',[reverter1,reverter2,reverter3]), # breaks back to f objects
>>>     ],
>>>     names = ['UTUR','UTPHCI','UNRATE'],
>>>     merge_Xvars = 'i',
>>> )
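List-valued steps are matched to the Forecaster objects by position, so the first list element goes with the first object passed, and so on. The stand-in sketch below shows only that pairing. `Scale` and `apply_list_step` are hypothetical names, and the length check is an assumption rather than documented behavior.

```python
class Scale:
    """Stand-in for a Transformer: multiplies every value by a factor."""

    def __init__(self, factor):
        self.factor = factor

    def fit_transform(self, series):
        return [v * self.factor for v in series]


def apply_list_step(step_list, objs):
    """Apply the i-th element of step_list to the i-th object."""
    if len(step_list) != len(objs):
        raise ValueError("need exactly one step element per object")
    return [s.fit_transform(o) for s, o in zip(step_list, objs)]


series_a, series_b, series_c = [1, 2], [3, 4], [5, 6]
out = apply_list_step(
    [Scale(1), Scale(10), Scale(100)],
    [series_a, series_b, series_c],
)
print(out)  # [[1, 2], [30, 40], [500, 600]]
```

Passing the objects in a different order would pair each one with a different transformer, so the lists in `steps` and the arguments to fit_predict() need to stay aligned.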

Methods:

`backtest`(*fs[, n_iter, jump_back, ...])	Runs an out-of-sample backtest of the pipeline over a certain amount of iterations.
`fit_predict`(*fs, **kwargs)	Applies the transform, forecast, and revert functions to the series stored in the Forecaster objects. The order of the Forecaster objects passed to *fs is the order in which all functions in lists will be applied.

backtest(*fs, n_iter=5, jump_back=1, series_length=None, fcst_length=None, test_length=None, cis=None, cilevel=None, verbose=False, **kwargs) → List[Dict[str, DataFrame]]

Runs an out-of-sample backtest of the pipeline over a certain amount of iterations.

Parameters:

*fs (Forecaster) – Send one if univariate forecasting with the Pipeline class, more than one if multivariate forecasting with the MVPipeline class.
n_iter (int) – Default 5. How many backtest iterations to perform.
jump_back (int) – Default 1. The space between consecutive training sets.
series_length (int) – Optional. The total length of each training set. Leave unspecified if you want to use every available training observation for each iteration.
fcst_length (int) – Optional. The forecast horizon length to forecast over for each iteration. Leave unspecified if you want to use the forecast horizon already programmed into the Forecaster object.
test_length (int) – Optional. The test set to hold out for each model evaluation. Leave unspecified if you want to use the test length already programmed into the Forecaster object.
cis (bool) – Optional. Whether to backtest confidence intervals. Leave unspecified if you want to use whatever is already programmed into the Forecaster object.
cilevel (float) – Optional. What level to evaluate confidence intervals at. Leave unspecified if you want to use whatever is already programmed into the Forecaster object.
**kwargs – Passed to the fit_predict() method from Pipeline or MVPipeline.

Returns:

The results from each model and backtest iteration. Each dict in the resulting list corresponds to one Forecaster object, in the order they were passed (the list will be length 1 if univariate forecasting). Each key of each dict is either ‘Actuals’, ‘Obs’, or the name of a model that got backtested. Each value is a DataFrame with the iteration values. The ‘Actuals’ frame holds the date information and the actual values over each forecast horizon. The ‘Obs’ frame holds the historical observations used to make each forecast, back-padded with NA values so that every array has the same length.

Return type:

(List[Dict[str,pd.DataFrame]])

>>> # univariate forecasting
>>> pipeline = Pipeline(
>>>     steps = [
>>>         ('Transform',transformer),
>>>         ('Forecast',forecaster),
>>>         ('Revert',reverter),
>>>     ],
>>> )
>>> backtest_results = pipeline.backtest(f,models=models)
>>>
>>> # multivariate forecasting
>>> pipeline = MVPipeline(
>>>    steps = [
>>>        ('Transform',[transformer1,transformer2,transformer3]),
>>>        ('Select Xvars',[auto_Xvar_select]*3),
>>>        ('Forecast',forecaster,),
>>>        ('Revert',[reverter1,reverter2,reverter3]),
>>>    ],
>>>    names = ['UTUR','UTPHCI','UNRATE'], # used to combine to the mvf object
>>>    merge_Xvars = 'i', # used to combine to the mvf object
>>> )
>>> backtest_results = pipeline.backtest(f1,f2,f3)

fit_predict(*fs: Forecaster, **kwargs)

Applies the transform, forecast, and revert functions to the series stored in the Forecaster objects. The order of the Forecaster objects passed to *fs is the order in which all functions in lists will be applied.

Parameters:

*fs (Forecaster) – The Forecaster objects that store the series that will be sent through the pipeline.
**kwargs – Passed to any ‘function’ types passed in the pipeline.

Returns:

If the last element in the pipeline is a list of Reverter objects, this function returns the individual Forecaster objects. If not, an MVForecaster object is returned.

Return type:

(Tuple[Forecaster] | MVForecaster)

>>> pipeline = MVPipeline(
>>>    steps = [
>>>        ('Transform',[transformer1,transformer2,transformer3]),
>>>        ('Select Xvars',[auto_Xvar_select]*3), # applied to Forecaster objects
>>>        ('Forecast',forecaster,), # combines to an mvf object and calls the function
>>>        ('Revert',[reverter1,reverter2,reverter3]), # breaks back to f objects
>>>    ],
>>>    names = ['UTUR','UTPHCI','UNRATE'], # used to combine to the mvf object
>>>    merge_Xvars = 'i', # used to combine to the mvf object
>>> )
>>> f1, f2, f3 = pipeline.fit_predict(f1,f2,f3)