AnomalyDetector

This object can be used to detect anomalies in a time series using any of three methods. See the example notebook: https://scalecast-examples.readthedocs.io/en/latest/misc/anomalies/anomalies.html

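import pandas as pd
import pandas_datareader as pdr
import matplotlib.pyplot as plt
from scalecast.Forecaster import Forecaster
from scalecast.AnomalyDetector import AnomalyDetector

# monthly U.S. housing starts (not seasonally adjusted) from FRED
df = pdr.get_data_fred('HOUSTNSA',start='1900-01-01',end='2021-06-01')
f = Forecaster(y=df['HOUSTNSA'],current_dates=df.index)
f.set_test_length(12)

detector = AnomalyDetector(f)

# naive: decompose into trend/seasonality/residual and flag extreme residuals
detector.NaiveDetect(extrapolate_trend='freq',cilevel=.99,train_only=True)

# Monte Carlo: simulate the 2010-2020 span from the distribution of earlier observations
detector.MonteCarloDetect('2010-01-01','2020-12-01',cilevel=.99)

# estimator-based: flag values outside the bootstrapped intervals of an LSTM's fitted values
detector.EstimatorDetect(
    estimator='lstm',
    cilevel=.99,
    test_only=False,
    lags=24,
    epochs=25,
    validation_split=.2,
    shuffle=True,
    lstm_layer_sizes=(16,16,16),
    dropout=(0,0,0),
)

# plot the anomalies identified by the most recent detector run
detector.plot_anom()
plt.show()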

class src.scalecast.AnomalyDetector.AnomalyDetector(f)

Methods:

`EstimatorDetect`(estimator[, future_dates, ...])	Detects anomalies with one of a Forecaster object's estimators.
`MonteCarloDetect`(start_at, stop_at[, sims, ...])	Detects anomalies by running a series of Monte Carlo simulations over a span of the series, using the observations before the span start to determine the initial assumed distribution.
`MonteCarloDetect_sliding`(historical_window, ...)	Detects anomalies by running a series of Monte Carlo simulations rolling over a span of the series.
`NaiveDetect`([cilevel])	Detects anomalies by breaking a series into its fundamental components: trend, seasonality, and residual.
`WriteAnomtoXvars`([f, future_dates])	Writes the Xvars from the previously called anomaly detector to Xvars in a Forecaster object.
`adjust_anom`([f, method, q])	Changes the values of identified anomalies and returns a Forecaster object.
`plot_anom`([label, strftime_fmt])	Plots the series used to detect anomalies and red dashes around points that were identified as anomalies from the last algorithm run.
`plot_mc_results`([ax, figsize])	Plots the results from a monte-carlo detector: the series' original values and the simulated lines.

EstimatorDetect(estimator, future_dates=None, cilevel=0.99, samples=100, return_fitted_vals=False, random_seed=None, **kwargs)

Detects anomalies with one of a Forecaster object’s estimators. An anomaly in this instance is defined as any value that falls outside the fitted values’ bootstrapped confidence intervals, whose width is determined by the value passed to cilevel. This can be a good method to detect anomalies if you want to break a series down into trends, seasonalities, and autoregressive parts in a more complex manner than NaiveDetect allows. It also gives access to RNN estimators, which can be effective anomaly detectors for time series. Results are saved to the labeled_anom attribute.
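The bootstrapped-interval idea can be illustrated outside scalecast. The following is a minimal sketch, assuming the interval is formed by bootstrapping the in-sample residuals (actual minus fitted) and adding the averaged residual quantiles to each fitted value; the helper name `bootstrap_anomalies` is hypothetical and this is not necessarily how EstimatorDetect computes its intervals internally.

```python
import numpy as np

def bootstrap_anomalies(actuals, fitted, cilevel=0.99, samples=100, random_seed=None):
    """Hypothetical helper: flag actuals outside a bootstrapped interval around fitted values."""
    actuals = np.asarray(actuals, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    rng = np.random.default_rng(random_seed)
    resid = actuals - fitted
    alpha = 1 - cilevel
    # Draw `samples` bootstrap resamples of the residuals (with replacement).
    boot = rng.choice(resid, size=(samples, resid.size), replace=True)
    # Average the lower/upper residual quantiles across the resamples.
    lower_q = np.quantile(boot, alpha / 2, axis=1).mean()
    upper_q = np.quantile(boot, 1 - alpha / 2, axis=1).mean()
    lower, upper = fitted + lower_q, fitted + upper_q
    # 1 marks an anomaly, 0 a normal point.
    return ((actuals < lower) | (actuals > upper)).astype(int)

# Usage: a noisy linear fit with one injected spike at position 50.
rng = np.random.default_rng(0)
fitted = np.linspace(0, 10, 1000)
actuals = fitted + rng.normal(0, 0.5, 1000)
actuals[50] += 10
labels = bootstrap_anomalies(actuals, fitted, cilevel=0.99, random_seed=0)
```

In scalecast itself, the fitted values come from the chosen estimator, so a better-specified model yields tighter intervals and fewer false positives.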

Parameters:

estimator (str) – One of Forecaster.estimators. The estimator to track anomalies with.
future_dates (int) – Optional. If an integer is passed, the estimator will use that number of forecast steps. To search the entire series for anomalies, not just the training set, future dates should be created either before initiating the AnomalyDetector object or by passing an int to this argument. Future dates are what signal to the object that the entire dataset should be used for training.
cilevel (float) – Default 0.99. The confidence level to use when bootstrapping confidence intervals.
samples (int) – Default 100. How many bootstrap samples to draw when finding confidence intervals.
return_fitted_vals (bool) – Default False. Whether to return a DataFrame of the fitted values and confidence intervals from the fitting process.
random_seed (int) – Optional. Set a seed for consistent results.
**kwargs – Passed to the Forecaster.manual_forecast() method.

Returns:

A DataFrame of fitted values if return_fitted_vals is True.

Return type:

(DataFrame or None)

>>> from scalecast.AnomalyDetector import AnomalyDetector
>>> from scalecast.SeriesTransformer import SeriesTransformer
>>> from scalecast.Forecaster import Forecaster
>>> import pandas_datareader as pdr
>>> df = pdr.get_data_fred('HOUSTNSA',start='1900-01-01',end='2021-06-01')
>>> f = Forecaster(y=df['HOUSTNSA'],current_dates=df.index)
>>> transformer = SeriesTransformer(f)
>>> f = transformer.LogTransform()
>>> f = transformer.DiffTransform(1)
>>> f = transformer.DiffTransform(12)
>>> detector = AnomalyDetector(f)
>>> detector.EstimatorDetect(
...     estimator='lstm',
...     cilevel=.99,
...     lags=24,
...     epochs=25,
...     validation_split=.2,
...     shuffle=True,
...     lstm_layer_sizes=(16,16,16),
...     dropout=(0,0,0),
... )

MonteCarloDetect(start_at, stop_at, sims=100, cilevel=0.99)

Detects anomalies by running a series of Monte Carlo simulations over a span of the series, using the observations before the span start to determine the initial assumed distribution. Results are saved to the raw_anom, labeled_anom, and mc_results attributes. It is a good idea to transform the series before running so that it is stationary and not seasonal; in other words, the series distribution should be as close to normal as possible.
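A minimal sketch of the Monte Carlo idea, assuming each simulation draws independent normal values using the mean and standard deviation of the pre-span observations, and that an actual value is flagged when it lies above (or below) at least a cilevel share of the simulated values at that step. The helper `monte_carlo_anomalies` is hypothetical, accepts integer positions only (start_at inclusive, stop_at exclusive, also an assumption), and may differ from how MonteCarloDetect builds its simulated lines.

```python
import numpy as np
import pandas as pd

def monte_carlo_anomalies(y, start_at, stop_at, sims=100, cilevel=0.99, random_seed=None):
    """Hypothetical helper: positional start_at (inclusive) and stop_at (exclusive)."""
    y = pd.Series(y, dtype=float)
    history = y.iloc[:start_at]
    span = y.iloc[start_at:stop_at]
    rng = np.random.default_rng(random_seed)
    # Initial distribution: mean/std of every observation before the span.
    mu, sigma = history.mean(), history.std()
    # One row per simulation, one column per period in the span.
    simulated = rng.normal(mu, sigma, size=(sims, len(span)))
    # Share of simulated values lying below each actual observation.
    share_below = (simulated < span.to_numpy()).mean(axis=0)
    is_anom = (share_below >= cilevel) | (share_below <= 1 - cilevel)
    return pd.Series(is_anom.astype(int), index=span.index)

# Usage: stationary noise with one large shock at position 250.
rng = np.random.default_rng(1)
y = pd.Series(rng.normal(0, 1, 300))
y.iloc[250] = 8.0
labels = monte_carlo_anomalies(y, 200, 300, sims=500, cilevel=0.99, random_seed=1)
```

The sketch also shows why a stationary, roughly normal series matters: the simulations assume one fixed distribution, so a trend or seasonal pattern in the span would be flagged as anomalous.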

Parameters:

start_at (int, str, datetime.datetime, or pandas.Timestamp) – If int, the simulation starts at that observation number in the series. Anything else should be a date-like object that can be parsed by the pandas.Timestamp() function, representing the starting point of the simulation. All observations before this point will be used to determine the mean/std of the initial distribution.
stop_at (int, str, datetime.datetime, or pandas.Timestamp) – If int, the simulation stops at that observation number in the series. Anything else should be a date-like object that can be parsed by the pandas.Timestamp() function, representing the stopping point of the simulation.
sims (int) – Default 100. The number of simulations.
cilevel (float) – Default 0.99. The share of simulated values that a given actual observation must fall outside of to be considered an anomaly.

Returns:

None

>>> from scalecast.AnomalyDetector import AnomalyDetector
>>> from scalecast.SeriesTransformer import SeriesTransformer
>>> from scalecast.Forecaster import Forecaster
>>> import pandas_datareader as pdr
>>> df = pdr.get_data_fred('HOUSTNSA',start='1900-01-01',end='2021-06-01')
>>> f = Forecaster(y=df['HOUSTNSA'],current_dates=df.index)
>>> transformer = SeriesTransformer(f)
>>> f = transformer.LogTransform()
>>> f = transformer.DiffTransform(1)
>>> f = transformer.DiffTransform(12)
>>> detector = AnomalyDetector(f)
>>> detector.MonteCarloDetect('2010-01-01','2020-12-01',cilevel=.99)

MonteCarloDetect_sliding(historical_window, step, **kwargs)

Detects anomalies by running a series of Monte Carlo simulations rolling over a span of the series. It is a good idea to transform the series before running so that it is stationary and not seasonal; in other words, the series distribution should be as close to normal as possible.
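One way to picture the rolling scan is as a sequence of (start_at, stop_at) pairs, each of which could be handed to MonteCarloDetect. This is a minimal sketch, assuming the first scan begins after historical_window observations, each scan covers step observations, and stop_at is inclusive; the helper `sliding_windows` is hypothetical and the library's exact window boundaries may differ.

```python
def sliding_windows(n_obs, historical_window, step):
    """Hypothetical helper: positional (start_at, stop_at) pairs, stop_at inclusive."""
    windows = []
    start = historical_window
    while start < n_obs:
        # Clip the last window to the end of the series.
        stop = min(start + step, n_obs) - 1
        windows.append((start, stop))
        start += step
    return windows

# Usage: a 130-observation series, 60 periods of history, stepping 30 at a time.
windows = sliding_windows(130, 60, 30)
```

Because every scan only uses observations before its own start as history, later windows draw on a longer history than earlier ones under these assumptions.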

Parameters:

historical_window (int) – The number of periods of history to use before the initial scan begins.
step (int) – How far to step forward after each scan.
**kwargs – Passed to the MonteCarloDetect() method. start_at and stop_at are passed automatically, based on the values passed to the other arguments of this function.

Returns:

None

>>> from scalecast.AnomalyDetector import AnomalyDetector
>>> from scalecast.SeriesTransformer import SeriesTransformer
>>> from scalecast.Forecaster import Forecaster
>>> import pandas_datareader as pdr
>>> df = pdr.get_data_fred('HOUSTNSA',start='1900-01-01',end='2021-06-01')
>>> f = Forecaster(y=df['HOUSTNSA'],current_dates=df.index)
>>> transformer = SeriesTransformer(f)
>>> f = transformer.LogTransform()
>>> f = transformer.DiffTransform(1)
>>> f = transformer.DiffTransform(12)
>>> detector = AnomalyDetector(f)
>>> detector.MonteCarloDetect_sliding(60,30)

NaiveDetect(cilevel=0.99, **kwargs)

Detects anomalies by breaking a series into its fundamental components: trend, seasonality, and residual. Anomalies are defined as standardized residuals that lie more than a given number of standard deviations from the mean, with that number determined by the value passed to cilevel. This is a simple, computationally cheap anomaly detector. Results are saved to the raw_anom and labeled_anom attributes.
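The residual step can be sketched on its own, assuming the decomposition has already produced a residual array and that cilevel maps to a two-sided standard normal cutoff. The helper `naive_anomalies` is hypothetical; in scalecast the residuals would come from Forecaster.seasonal_decompose().

```python
from statistics import NormalDist
import numpy as np

def naive_anomalies(residuals, cilevel=0.99):
    """Hypothetical helper: flag residuals whose z-score exceeds the two-sided cutoff."""
    resid = np.asarray(residuals, dtype=float)
    # Standardize the residuals to mean 0, std 1.
    z = (resid - resid.mean()) / resid.std()
    # Two-sided critical value: about 2.576 for cilevel=0.99, 1.960 for 0.95.
    threshold = NormalDist().inv_cdf(1 - (1 - cilevel) / 2)
    return (np.abs(z) > threshold).astype(int)

# Usage: on pure Gaussian noise, cilevel=0.95 still flags roughly 5% of points.
rng = np.random.default_rng(2)
noise = rng.normal(0, 1, 100_000)
share_flagged = naive_anomalies(noise, cilevel=0.95).mean()
```

This is why a high cilevel such as the 0.99 default is usually preferred: the expected share of false positives on well-behaved residuals is 1 - cilevel.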

Parameters:

cilevel (float) – Default 0.99. The confidence level used to determine how far a given residual must be from the mean to be considered an anomaly. Even in a normally distributed series that this process decomposes effectively, a cilevel of 0.95 would still be expected to label about 5% of its points as anomalies.
**kwargs – Passed to the Forecaster.seasonal_decompose() method. If extrapolate_trend is left unspecified, this will fail to produce results. See https://scalecast.readthedocs.io/en/latest/Forecaster/Forecaster.html#src.scalecast.Forecaster.Forecaster.seasonal_decompose.

Returns:

None

>>> from scalecast.AnomalyDetector import AnomalyDetector
>>> from scalecast.Forecaster import Forecaster
>>> import pandas_datareader as pdr
>>> df = pdr.get_data_fred('HOUSTNSA',start='1900-01-01',end='2021-06-01')
>>> f = Forecaster(y=df['HOUSTNSA'],current_dates=df.index)
>>> detector = AnomalyDetector(f)
>>> detector.NaiveDetect(extrapolate_trend='freq',train_only=True)

WriteAnomtoXvars(f=None, future_dates=None, **kwargs)

Writes the Xvars from the previously called anomaly detector to Xvars in a Forecaster object. Each anomaly is its own dummy variable on the date it is found. A future release could also detect level shifts.
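A minimal pandas sketch of the dummy-variable layout described above, assuming a 0/1 labeled series indexed by date. The helper `anomalies_to_dummies` and its `anom_YYYY-MM-DD` column names are hypothetical and not necessarily the names scalecast writes; future dates are padded with 0s, mirroring the future_dates behavior.

```python
import pandas as pd

def anomalies_to_dummies(labeled_anom, future_dates=0, freq=None):
    """Hypothetical helper: one 0/1 column per anomaly date, zero-padded into the future."""
    labeled_anom = pd.Series(labeled_anom)
    anom_dates = labeled_anom[labeled_anom == 1].index
    index = labeled_anom.index
    if future_dates:
        # Extend the index by `future_dates` periods at the series' frequency.
        freq = freq or pd.infer_freq(index)
        future = pd.date_range(index[-1], periods=future_dates + 1, freq=freq)[1:]
        index = index.append(future)
    cols = [f"anom_{d:%Y-%m-%d}" for d in anom_dates]
    dummies = pd.DataFrame(0, index=index, columns=cols)
    # Each dummy is 1 only on the date its anomaly was found.
    for d, col in zip(anom_dates, cols):
        dummies.loc[d, col] = 1
    return dummies

# Usage: six monthly observations with anomalies in February and May, plus 3 future months.
idx = pd.date_range("2020-01-01", periods=6, freq="MS")
labels = pd.Series([0, 1, 0, 0, 1, 0], index=idx)
dummies = anomalies_to_dummies(labels, future_dates=3)
```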

Parameters:

f (Forecaster) – Optional. If you pass an object here, that object will receive the Xvars. Otherwise, they will be written to the copy of the object stored in the AnomalyDetector object when it was initialized. This Forecaster object is stored in the f attribute.
future_dates (int) – Optional. If you pass a future dates length here, that many dates will be written to the Forecaster object, and future anomaly variables will be passed as arrays of 0s so that any algorithm you train will be able to use them into the future horizon.
**kwargs – Passed to the Forecaster.ingest_Xvars_df() function. See https://scalecast.readthedocs.io/en/latest/Forecaster/Forecaster.html#src.scalecast.Forecaster.Forecaster.ingest_Xvars_df

Returns:

(Forecaster) an object with the Xvars written.

>>> from scalecast.AnomalyDetector import AnomalyDetector
>>> from scalecast.SeriesTransformer import SeriesTransformer
>>> from scalecast.Forecaster import Forecaster
>>> import pandas_datareader as pdr
>>> df = pdr.get_data_fred('HOUSTNSA',start='1900-01-01',end='2021-06-01')
>>> f = Forecaster(y=df['HOUSTNSA'],current_dates=df.index)
>>> transformer = SeriesTransformer(f)
>>> f = transformer.LogTransform()
>>> f = transformer.DiffTransform(1)
>>> f = transformer.DiffTransform(12)
>>> detector = AnomalyDetector(f)
>>> detector.MonteCarloDetect('2010-01-01','2020-12-01',cilevel=.99)
>>> f = detector.WriteAnomtoXvars(drop_first=True)

adjust_anom(f=None, method='q', q=10)

Changes the values of identified anomalies and returns a Forecaster object.

Parameters:

f (Forecaster) – Optional. If you pass an object here, that object will have its y values altered. Otherwise, the changes apply to the copy of the object stored in the AnomalyDetector object when it was initialized. This Forecaster object is stored in the f attribute.
method (str) – The following methods are supported: “q” and “interpolate”. “q” uses q-cutting from pandas and fills values with the second-to-last q value in either direction. For example, if q == 10, high anomaly values will be replaced with the 90th percentile of the rest of the series data, and low anomaly values will be replaced with the 10th percentile of the rest of the series. This is a good method when your data is stationary. For data with a trend, “interpolate” is better, as it fills in values linearly based on the values before and after consecutive anomaly values. Be careful when using “q” with differenced data: when undifferencing, the series is reverted back to its original values.
q (int) – Default 10. The q-value to use when method == ‘q’. Ignored when method != ‘q’.

Returns:

(Forecaster) an object with the anomalous values changed.
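Both adjustment methods can be approximated with plain pandas. This is a minimal sketch, assuming a 0/1 label array aligned with the series and treating an anomaly as "high" when it sits above the median of the non-anomalous values (an assumption the documentation does not specify). The helper `adjust_anomalies` is hypothetical; it uses Series.quantile rather than pandas.qcut to get the same 1/q and 1 - 1/q cut points.

```python
import numpy as np
import pandas as pd

def adjust_anomalies(y, labels, method="q", q=10):
    """Hypothetical helper: replace flagged values in y (labels: 1 = anomaly)."""
    y = pd.Series(y, dtype=float).copy()
    is_anom = pd.Series(np.asarray(labels).astype(bool), index=y.index)
    if method == "q":
        rest = y[~is_anom]
        low, high = rest.quantile(1 / q), rest.quantile(1 - 1 / q)
        # Assumption: an anomaly counts as "high" if it sits above the median of the rest.
        is_high = is_anom & (y > rest.median())
        y.loc[is_high] = high
        y.loc[is_anom & ~is_high] = low
    elif method == "interpolate":
        # Blank out anomalies, then fill linearly from the surrounding values.
        y.loc[is_anom] = np.nan
        y = y.interpolate(method="linear", limit_direction="both")
    else:
        raise ValueError(f"unsupported method: {method!r}")
    return y

# Usage: a trending series with one high and one low anomaly.
y = pd.Series(np.arange(1, 101, dtype=float))
y.loc[10] = 1000.0
y.loc[20] = -1000.0
labels = np.zeros(100, dtype=int)
labels[[10, 20]] = 1
q_fixed = adjust_anomalies(y, labels, method="q", q=10)
interp_fixed = adjust_anomalies([1.0, 2.0, 100.0, 4.0, 5.0], [0, 0, 1, 0, 0], method="interpolate")
```

On this trending example, “q” pulls both anomalies toward the 10th/90th percentiles of the whole series, while “interpolate” keeps each replacement on the local trend, which is why the latter suits trending data.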

>>> from scalecast.AnomalyDetector import AnomalyDetector
>>> from scalecast.Forecaster import Forecaster
>>> import pandas_datareader as pdr
>>> df = pdr.get_data_fred('HOUSTNSA',start='1900-01-01',end='2021-06-01')
>>> f = Forecaster(y=df['HOUSTNSA'],current_dates=df.index)
>>> detector = AnomalyDetector(f)
>>> detector.EstimatorDetect(
...     estimator='arima',
...     order=(1,1,1),
...     seasonal_order=(1,1,1,12),
... )
>>> f = detector.adjust_anom(method='interpolate')

plot_anom(label=True, strftime_fmt='%Y-%m-%d')

Plots the series used to detect anomalies and red dashes around points that were identified as anomalies from the last algorithm run.
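A minimal matplotlib sketch of this kind of plot, assuming "red dashes" means dashed red vertical lines at each anomaly date and that label=True annotates each line with its date. The helper `plot_anomalies` is hypothetical; plot_anom's actual styling may differ.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def plot_anomalies(y, labels, label=True, strftime_fmt="%Y-%m-%d"):
    """Hypothetical helper: plot y and mark anomalies with dashed red lines."""
    y = pd.Series(y)
    anom_dates = y.index[np.asarray(labels).astype(bool)]
    fig, ax = plt.subplots(figsize=(12, 6))
    ax.plot(y.index, y.to_numpy())
    for d in anom_dates:
        ax.axvline(d, color="red", linestyle="--", alpha=0.6)
        if label:
            # Annotate each anomaly with its date, formatted via strftime_fmt.
            ax.annotate(d.strftime(strftime_fmt), xy=(d, y[d]), rotation=90, fontsize=8)
    return ax

# Usage: ten daily points with anomalies on the 4th and 8th.
idx = pd.date_range("2020-01-01", periods=10, freq="D")
series = pd.Series(np.arange(10, dtype=float), index=idx)
flags = [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
ax = plot_anomalies(series, flags)
```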

Parameters:

label (bool) – Default True. Whether to add the date label to each plotted point.
strftime_fmt (str) – Default ‘%Y-%m-%d’. The string format to convert dates to when label is True. When label is False, this is ignored.

Returns:

The figure’s axis.

Return type:

(Axis)

>>> from scalecast.AnomalyDetector import AnomalyDetector
>>> from scalecast.SeriesTransformer import SeriesTransformer
>>> from scalecast.Forecaster import Forecaster
>>> import pandas_datareader as pdr
>>> import matplotlib.pyplot as plt
>>> df = pdr.get_data_fred('HOUSTNSA',start='1900-01-01',end='2021-06-01')
>>> f = Forecaster(y=df['HOUSTNSA'],current_dates=df.index)
>>> transformer = SeriesTransformer(f)
>>> f = transformer.LogTransform()
>>> f = transformer.DiffTransform(1)
>>> f = transformer.DiffTransform(12)
>>> detector = AnomalyDetector(f)
>>> detector.MonteCarloDetect('2010-01-01','2020-12-01',cilevel=.99)
>>> ax = detector.plot_anom()
>>> plt.show()

plot_mc_results(ax=None, figsize=(12, 6))

Plots the results from a monte-carlo detector: the series’ original values and the simulated lines.

Parameters:

ax (Axis) – Optional. An existing axis to display the figure on.
figsize (tuple) – Default (12,6). The size of the resulting figure. Ignored if ax is not None.

Returns:

The figure’s axis.

Return type:

(Axis)

>>> from scalecast.AnomalyDetector import AnomalyDetector
>>> from scalecast.SeriesTransformer import SeriesTransformer
>>> from scalecast.Forecaster import Forecaster
>>> import pandas_datareader as pdr
>>> import matplotlib.pyplot as plt
>>> df = pdr.get_data_fred('HOUSTNSA',start='1900-01-01',end='2021-06-01')
>>> f = Forecaster(y=df['HOUSTNSA'],current_dates=df.index)
>>> transformer = SeriesTransformer(f)
>>> f = transformer.LogTransform()
>>> f = transformer.DiffTransform(1)
>>> f = transformer.DiffTransform(12)
>>> detector = AnomalyDetector(f)
>>> detector.MonteCarloDetect('2010-01-01','2020-12-01',cilevel=.99)
>>> ax = detector.plot_mc_results()
>>> plt.show()