How fast is fitting long AR models using NeuralProphet? In this quick test we fit AR-based models with different lag lengths and measure how long each takes to fit.
To start, import some relevant libraries:
from time import process_time, time
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
plt.style.use("seaborn-whitegrid")
import logging
logging.basicConfig(level=logging.WARNING)
import sys
from pathlib import Path
import statsmodels.tsa.arima.model
from fbprophet import Prophet
from neuralprophet import NeuralProphet
sys.path.insert(0, str(Path.cwd().parent / "utilities"))  # make the local helper modules importable
import data
import utilities
The data used for this experiment is weather data - daily temperatures over a few years with no other covariates.
We format the data appropriately for the Facebook Prophet API, with a datetime column ds and an output column y.
We split the data into the usual train and test sets.
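The data and utilities modules are small local helpers. Their exact contents aren't shown here, but a minimal sketch of what they might look like (hypothetical implementations, assuming a local weather.csv file; the actual helpers may differ) is:

import pandas as pd

def get_weather_data() -> pd.DataFrame:
    # Load daily temperatures in the Prophet-style format:
    # a datetime column 'ds' and a value column 'y', indexed by 'ds'.
    df = pd.read_csv("weather.csv", parse_dates=["ds"])
    return df[["ds", "y"]].set_index("ds")

def split_ts(df: pd.DataFrame, test_fraction: float = 0.2):
    # Chronological split: hold out the last test_fraction of rows as the test set.
    split_at = int(len(df) * (1 - test_fraction))
    return df.iloc[:split_at], df.iloc[split_at:]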
df = data.get_weather_data()
df_train, df_test = utilities.split_ts(df)
ax = df_train["y"].plot(figsize=(10, 6), label="train")
df_test["y"].plot(ax=ax, label="test")
ax.legend()
ax.set_xlabel("Date")
ax.set_ylabel("Temp (°C)")
df_train = df_train.reset_index()
df_test = df_test.reset_index()
df_train.head(10)
| | ds | y |
|---|---|---|
| 0 | 2009-01-02 | -4.54 |
| 1 | 2009-01-03 | -4.71 |
| 2 | 2009-01-04 | -1.90 |
| 3 | 2009-01-05 | -1.47 |
| 4 | 2009-01-06 | -12.63 |
| 5 | 2009-01-07 | -21.09 |
| 6 | 2009-01-08 | -10.78 |
| 7 | 2009-01-09 | -13.91 |
| 8 | 2009-01-10 | -13.24 |
| 9 | 2009-01-11 | -10.75 |
Now let’s fit two sets of models: the first is an ARIMA model from statsmodels; the second is an approximate AR model using NeuralProphet. On each iteration we increase the number of lags used in the model and time how long each takes to fit. The NeuralProphet model has its seasonality and changepoint components disabled so that it is closer to a raw AR model. We also need to change the loss function to standard MSE.
fit_time_ar = []
fit_time_np = []
mae_np = []
lag_range = range(1, 25)
logging.getLogger("nprophet").setLevel(logging.WARNING)
for lag in lag_range:
    # fit the statsmodels ARIMA(lag, 0, 0) model
    t1 = process_time()
    model_arima = statsmodels.tsa.arima.model.ARIMA(
        endog=df_train.set_index('ds'), order=(lag, 0, 0), freq='1D'
    ).fit()
    fit_time_ar.append(process_time() - t1)
    # fit the NeuralProphet model with trend and seasonality disabled
    t1 = process_time()
    model_nprophet_ar = NeuralProphet(
        growth="off",
        n_changepoints=0,
        n_forecasts=1,
        n_lags=lag,
        daily_seasonality=False,
        weekly_seasonality=False,
        yearly_seasonality=False,
        loss_func='MSE',
        normalize='off',
    )
    # fit() returns a DataFrame of per-epoch training metrics
    mae_np.append(model_nprophet_ar.fit(df_train, freq="D"))
    fit_time_np.append(process_time() - t1)
88%|████████▊ | 88/100 [00:00<00:00, 587.23it/s]
...
86%|████████▌ | 86/100 [00:00<00:00, 729.68it/s]
Plotting these results, we can see that the fitting time for NeuralProphet is fairly flat, whereas the ARIMA fitting time, as expected, increases rapidly with the number of lags. This means NeuralProphet allows us to fit AR-based models with far longer lags than would otherwise be practical.
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(lag_range[:-1], fit_time_ar[:-1], '-x', label='ARIMA')
ax.plot(lag_range[:-1], fit_time_np[:-1], '-x', label='NeuralProphet')
fig.legend()
ax.set_xlabel('AR lag')
ax.set_ylabel('Fitting time (s)')
plt.show()
Prediction results
To ensure these results are meaningful we need to check that the models being fitted are comparable. Let’s train a smaller NeuralProphet model so we can check it for similarity with an actual AR model.
lag = 5
# fit the statsmodels ARIMA(5, 0, 0) model
model_arima = statsmodels.tsa.arima.model.ARIMA(
    endog=df_train.set_index('ds'), order=(lag, 0, 0), freq='1D'
).fit()
# fit the equivalent NeuralProphet AR model
model_nprophet_ar = NeuralProphet(
    growth="off",
    n_changepoints=0,
    n_forecasts=1,
    n_lags=lag,
    daily_seasonality=False,
    weekly_seasonality=False,
    yearly_seasonality=False,
    epochs=100,
    loss_func='MSE',
    normalize='off',
)
model_nprophet_ar.fit(df_train, freq="D")
86%|████████▌ | 86/100 [00:00<00:00, 630.39it/s]
| | MSELoss | MAE | RegLoss |
|---|---|---|---|
| 0 | 19.424664 | 3.443962 | 0.0 |
| 1 | 11.272208 | 2.658695 | 0.0 |
| 2 | 9.073363 | 2.393834 | 0.0 |
| 3 | 8.814939 | 2.356173 | 0.0 |
| 4 | 9.030052 | 2.372665 | 0.0 |
| ... | ... | ... | ... |
| 95 | 8.739055 | 2.340827 | 0.0 |
| 96 | 8.709106 | 2.333304 | 0.0 |
| 97 | 8.679108 | 2.335190 | 0.0 |
| 98 | 8.694123 | 2.333388 | 0.0 |
| 99 | 8.666990 | 2.332289 | 0.0 |

100 rows × 3 columns
fig, ax = plt.subplots(figsize=(10, 6), ncols=2)
# statsmodels stores params as [const, ar.L1..ar.Llag, sigma2]; keep only the AR terms
ax[0].plot(model_arima.params.iloc[1:-1].to_numpy())
ax[0].set_title('ARIMA')
ax[0].set_xlabel('AR Lag')
ax[0].set_ylabel('Coefficient')
# flip the NeuralProphet weights so they are ordered from lag 1 upwards
ax[1].plot(np.flip(model_nprophet_ar.model.ar_weights.detach().numpy()).flatten())
ax[1].set_title('np')
ax[1].set_xlabel('AR Lag')
plt.show()
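We can also quantify the agreement directly. Here is a quick sanity check reusing the same coefficient extraction as the plot above (the flip assumes NeuralProphet stores its AR weights with the most recent lag first, as the plotting code does):

# Numerically compare the two sets of AR coefficients.
ar_arima = model_arima.params.iloc[1:-1].to_numpy()  # drop const and sigma2
ar_np = np.flip(model_nprophet_ar.model.ar_weights.detach().numpy()).flatten()
print(f"Max absolute difference: {np.abs(ar_arima - ar_np).max():.4f}")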
Their coefficients are nearly identical. As such, the predictions from each model are nearly the same:
# forecast 100 days ahead with ARIMA in one shot
pred_arima = model_arima.predict(
    start=df_train['ds'].iloc[-1], end=df_train['ds'].iloc[-1] + pd.Timedelta('100D')
)
# NeuralProphet was built with n_forecasts=1, so forecast recursively:
# predict one step, append it to the history, and repeat
pred_nprophet = df_train.copy()
for idx in range(100):
    future_nprophet = model_nprophet_ar.make_future_dataframe(df=pred_nprophet)
    temp = model_nprophet_ar.predict(future_nprophet)
    temp['y'] = temp[['y', 'yhat1']].fillna(0).sum(axis=1)
    temp = temp[['ds', 'y']]
    # note: DataFrame.append is removed in pandas 2.x; use pd.concat there
    pred_nprophet = pred_nprophet.append(temp.iloc[-1])
pred_nprophet = pred_nprophet.iloc[-101:].reset_index(drop=True)
fig, ax = plt.subplots(figsize=(10, 6))
pred_arima.plot(ax=ax, label='ARIMA')
pred_nprophet.set_index('ds')['y'].plot(ax=ax, label='np')
df_train.set_index('ds')['y'].iloc[-200:].plot(ax=ax, label='actual')
ax.set_ylabel("Temp (°C)")
fig.legend()
plt.show()
Long lags
Due to the significantly faster fitting, we can train models with much longer lags. This may not be necessary most of the time, but if we wanted to, we could do it! For example, in the following we train an AR model with 500 lags.
lag = 500
t1 = process_time()
model_nprophet_ar = NeuralProphet(
    growth="off",
    n_changepoints=0,
    n_forecasts=1,
    n_lags=lag,
    daily_seasonality=False,
    weekly_seasonality=False,
    yearly_seasonality=False,
    epochs=100,
    loss_func='MSE',
    normalize='off',
)
# fit() returns the per-epoch training metrics
loss_epoch = model_nprophet_ar.fit(df_train, freq="D")
# blank lines to separate the output from the progress bar
print("\n")
print("\n")
print(f"Time taken: {process_time() - t1} s")
69%|██████▉ | 69/100 [00:00<00:00, 779.16it/s]
Time taken: 7.613330999999988 s
NeuralProphet is based on an underlying PyTorch model and trains using gradient descent, so the fitting time scales with the number of epochs. The number of epochs is chosen automatically by the package, and the learning rate is by default chosen with PyTorch Lightning’s learning rate finder. The automatic learning rate can be a bit noisy and stop training early, so it may be necessary to plot the loss against the epoch. As a single example, the automatically chosen number of epochs would be 27 in the following graph, where we train for 100.
fig, ax = plt.subplots(figsize=(10,6))
ax.plot(loss_epoch['MSELoss'])
ax.set_xlabel('Epoch')
ax.set_ylabel('Loss')
plt.show()
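If we want a number rather than eyeballing the plot, one rough heuristic is to find the first epoch after which the relative improvement in the loss stays small. A minimal sketch using the metrics DataFrame returned by fit() (this is only an illustration, not NeuralProphet's actual stopping rule, and the tolerance is arbitrary):

# First epoch after which relative MSE improvement drops below a tolerance.
mse = loss_epoch['MSELoss'].to_numpy()
rel_improvement = -np.diff(mse) / mse[:-1]
plateau = np.where(rel_improvement < 1e-3)[0]
plateau_epoch = int(plateau[0]) + 1 if len(plateau) else len(mse)
print(f"Loss roughly plateaus around epoch {plateau_epoch}")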
Even so, we can train for many more epochs and still successfully fit a very long AR model.
Seasonality via AR models?
With a very long AR model we could potentially capture seasonalities that would have been impossible before. This daily weather data has a yearly seasonality, which a normal AR model cannot capture: it would need on the order of 365 lags. By using NeuralProphet, we actually can…
pred_arima = model_arima.predict(
    start=df_train['ds'].iloc[-1], end=df_train['ds'].iloc[-1] + pd.Timedelta('200D')
)
# recursive one-step-ahead forecast, as before, but now 200 days ahead
pred_nprophet = df_train.copy()
for idx in range(200):
    future_nprophet = model_nprophet_ar.make_future_dataframe(df=pred_nprophet)
    temp = model_nprophet_ar.predict(future_nprophet)
    temp['y'] = temp[['y', 'yhat1']].fillna(0).sum(axis=1)
    temp = temp[['ds', 'y']]
    pred_nprophet = pred_nprophet.append(temp.iloc[-1])
pred_nprophet = pred_nprophet.iloc[-201:].reset_index(drop=True)
fig, ax = plt.subplots(figsize=(10, 6))
pred_arima.plot(ax=ax, label='ARIMA')
pred_nprophet.set_index('ds')['y'].plot(ax=ax, label='np')
df_train.set_index('ds')['y'].iloc[-1500:].plot(ax=ax, label='actual')
ax.set_ylabel("Temp (°C)")
fig.legend()
plt.show()
The prediction shows we are able to capture some of the yearly seasonality!
The prediction is quite noisy. The PyTorch-based approach also allows us to build regularisation into the AR coefficients. This should reduce the model's ability to fit the noise and so hopefully produce smoother predictions. It also means we don't have to worry as much about choosing the correct number of AR lags when specifying the model. Here we set ar_sparsity to a very low value, which pushes most of the AR coefficients close to 0.
model_nprophet_ar_sparse = NeuralProphet(
    growth="off",
    n_changepoints=0,
    n_forecasts=1,
    n_lags=lag,
    daily_seasonality=False,
    weekly_seasonality=False,
    yearly_seasonality=False,
    epochs=100,
    loss_func='MSE',
    normalize='off',
    ar_sparsity=0.0001,  # regularise the AR coefficients towards zero
)
model_nprophet_ar_sparse.fit(df_train, freq="D")
65%|██████▌ | 65/100 [00:00<00:00, 798.22it/s]
| | MSELoss | MAE | RegLoss |
|---|---|---|---|
| 0 | 48.809326 | 5.547075 | 0.000000 |
| 1 | 47.614197 | 5.474120 | 1.891101 |
| 2 | 46.352211 | 5.397998 | 3.777936 |
| 3 | 44.840831 | 5.304262 | 5.659053 |
| 4 | 43.259937 | 5.211620 | 7.524754 |
| ... | ... | ... | ... |
| 95 | 8.483594 | 2.326365 | 27.624358 |
| 96 | 8.484436 | 2.326009 | 26.968793 |
| 97 | 8.483948 | 2.325840 | 26.380927 |
| 98 | 8.484425 | 2.325925 | 25.855129 |
| 99 | 8.484677 | 2.325938 | 25.359007 |

100 rows × 3 columns
Comparing the coefficients shows the sparse model is more conservative.
fig, ax = plt.subplots(figsize=(10, 6), ncols=2)
ax[0].plot(np.flip(model_nprophet_ar.model.ar_weights.detach().numpy()).flatten())
ax[0].set_title('np')
ax[0].set_xlabel('AR Lag')
ax[0].set_ylabel('Coefficient')
ax[1].plot(np.flip(model_nprophet_ar_sparse.model.ar_weights.detach().numpy()).flatten())
ax[1].set_title('np_sparse')
ax[1].set_xlabel('AR Lag')
plt.show()
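To put a number on how conservative the sparse model is, we can count the coefficients that the penalty has pushed towards zero (the 0.01 threshold here is arbitrary, chosen just for illustration):

# Fraction of AR coefficients with magnitude below a small threshold.
for name, model in [("np", model_nprophet_ar), ("np_sparse", model_nprophet_ar_sparse)]:
    w = model.model.ar_weights.detach().numpy().flatten()
    frac = (np.abs(w) < 0.01).mean()
    print(f"{name}: {frac:.0%} of {w.size} coefficients below 0.01 in magnitude")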
# recursive forecast with the sparse model, as before
pred_nprophet_sparse = df_train.copy()
for idx in range(200):
    future_nprophet = model_nprophet_ar_sparse.make_future_dataframe(df=pred_nprophet_sparse)
    temp = model_nprophet_ar_sparse.predict(future_nprophet)
    temp['y'] = temp[['y', 'yhat1']].fillna(0).sum(axis=1)
    temp = temp[['ds', 'y']]
    pred_nprophet_sparse = pred_nprophet_sparse.append(temp.iloc[-1])
pred_nprophet_sparse = pred_nprophet_sparse.iloc[-201:].reset_index(drop=True)
fig, ax = plt.subplots(figsize=(10, 6))
pred_nprophet.set_index('ds')['y'].plot(ax=ax, label='np')
pred_nprophet_sparse.set_index('ds')['y'].plot(ax=ax, label='np_sparse')
df_train.set_index('ds')['y'].iloc[-1500:].plot(ax=ax, label='actual')
ax.set_ylabel("Temp (°C)")
fig.legend()
plt.show()