In mathematics, a time series is a series of data points indexed in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. (Wikipedia)
Time series analysis is a specific way of analyzing a sequence of data points collected over an interval of time. In time series analysis, analysts record data points at consistent intervals over a set period of time rather than just recording the data points intermittently or randomly. (Tableau)
A time series is a sequence of data points collected at different timestamps, essentially successive measurements taken from the same data source at a fixed time interval. These chronologically gathered readings can be used to monitor trends and changes over time. Time-series models can be univariate or multivariate. Univariate models are used when the dependent variable is a single time series, such as room temperature measured by a single sensor. A multivariate model is used when there are multiple dependent variables, i.e., the output depends on more than one series. An example of a multivariate time-series model is modelling GDP, inflation, and unemployment together, since these variables are linked to each other.
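To make the distinction concrete, here is a minimal, purely illustrative sketch in pandas (the values and column names are made up) of how a univariate and a multivariate series might be organized:
import pandas as pd

# Univariate: a single dependent series indexed by time (e.g. one temperature sensor).
dates = pd.date_range('2022-01-01', periods=5, freq='D')
room_temp = pd.Series([21.3, 21.8, 22.1, 21.9, 22.4], index=dates, name='room_temp')

# Multivariate: several linked series observed at the same timestamps.
quarters = pd.date_range('2021-03-31', periods=4, freq='Q')
macro = pd.DataFrame({'gdp_growth': [0.4, 0.5, 0.3, 0.2],
                      'inflation': [2.1, 2.2, 2.4, 2.5],
                      'unemployment': [3.9, 3.8, 3.8, 4.0]}, index=quarters)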
- A time series is ordered by time; the index can be years, months, weeks, days, hours, minutes, or seconds.
- A time series is a set of observations recorded at successive, discrete time intervals.
- A time series can be visualized as a run chart.
- The time variable/feature is the independent variable and supports the target variable in making predictions.
- Time Series Analysis (TSA) is used in many fields for time-based predictions, such as weather forecasting, finance, signal processing, and engineering (control systems, communication systems).
- Because TSA works with observations ordered in time, it is distinct from spatial and other cross-sectional analyses.
- Classical models such as AR, MA, ARMA, and ARIMA can be used to forecast future values, as sketched below.
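As a brief illustration of that last point, the sketch below fits an ARIMA model with statsmodels on a synthetic series; the (1, 1, 1) order and the 7-step horizon are arbitrary choices for demonstration, not a recommendation:
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Synthetic daily series (a random walk with drift), for illustration only.
idx = pd.date_range('2022-01-01', periods=200, freq='D')
y = pd.Series(np.cumsum(np.random.normal(0.1, 1.0, 200)), index=idx)

# Fit ARIMA(p=1, d=1, q=1) and forecast the next 7 days.
model = ARIMA(y, order=(1, 1, 1)).fit()
print(model.forecast(steps=7))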
Long short-term memory (LSTM) is an artificial neural network used in the fields of artificial intelligence and deep learning. Unlike standard feedforward neural networks, LSTM has feedback connections. Such a recurrent neural network (RNN) can process not only single data points (such as images), but also entire sequences of data (such as speech or video). For example, LSTM is applicable to tasks such as unsegmented, connected handwriting recognition, speech recognition, machine translation, robot control, video games, and healthcare. LSTM has become the most cited neural network of the 20th century.
Humans don’t start their thinking from scratch every second. As you read this essay, you understand each word based on your understanding of previous words. You don’t throw everything away and start thinking from scratch again. Your thoughts have persistence.
Traditional neural networks can’t do this, and it seems like a major shortcoming. For example, imagine you want to classify what kind of event is happening at every point in a movie. It’s unclear how a traditional neural network could use its reasoning about previous events in the film to inform later ones.
Recurrent neural networks address this issue. They are networks with loops in them, allowing information to persist.

The name of LSTM refers to the analogy that a standard RNN has both “long-term memory” and “short-term memory”. The connection weights and biases in the network change once per episode of training, analogous to how physiological changes in synaptic strengths store long-term memories; the activation patterns in the network change once per time-step, analogous to how the moment-to-moment change in electric firing patterns in the brain store short-term memories. The LSTM architecture aims to provide a short-term memory for RNN that can last thousands of timesteps, thus “long short-term memory”.
A common LSTM unit is composed of a cell, an input gate, an output gate and a forget gate. The cell remembers values over arbitrary time intervals and the three gates regulate the flow of information into and out of the cell.
LSTM networks are well-suited to classifying, processing and making predictions based on time series data, since there can be lags of unknown duration between important events in a time series. LSTMs were developed to deal with the vanishing gradient problem that can be encountered when training traditional RNNs. Relative insensitivity to gap length is an advantage of LSTM over RNNs, hidden Markov models and other sequence learning methods in numerous applications.

Neural Network
The picture above depicts four neural network layers in yellow boxes, point-wise operators in green circles, inputs in yellow circles and the cell state in blue circles. An LSTM module has a cell state and three gates, which give it the ability to selectively learn, unlearn or retain information from each of the units. The cell state in an LSTM helps information flow through the units without being altered, by allowing only a few linear interactions. Each unit has an input, output and a forget gate, which can add information to or remove it from the cell state. The forget gate decides which information from the previous cell state should be forgotten, using a sigmoid function. The input gate controls the information flowing into the current cell state by combining the outputs of a sigmoid layer and a tanh layer through point-wise multiplication. Finally, the output gate decides which information should be passed on to the next hidden state.
The key to LSTMs is the cell state, the horizontal line running through the top of the diagram.
The cell state is kind of like a conveyor belt. It runs straight down the entire chain, with only some minor linear interactions. It’s very easy for information to just flow along it unchanged.

The LSTM does have the ability to remove or add information to the cell state, carefully regulated by structures called gates.
Gates are a way to optionally let information through. They are composed out of a sigmoid neural net layer and a pointwise multiplication operation.

The sigmoid layer outputs numbers between zero and one, describing how much of each component should be let through. A value of zero means “let nothing through,” while a value of one means “let everything through!”
An LSTM has three of these gates, to protect and control the cell state.
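To make the gating concrete, here is a schematic NumPy sketch of a single LSTM time-step following the standard equations; the parameter layout (dictionaries W, U, b keyed by gate) is an arbitrary convention chosen for readability, not any particular library's API:
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # W, U, b hold the weights/biases of the four internal layers,
    # keyed 'f' (forget), 'i' (input), 'g' (candidate), 'o' (output).
    f = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])  # how much of the old cell state to keep
    i = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])  # how much new information to write
    g = np.tanh(W['g'] @ x_t + U['g'] @ h_prev + b['g'])  # candidate values to write
    o = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])  # how much of the cell state to expose
    c_t = f * c_prev + i * g   # cell state: the "conveyor belt", updated only through gated linear operations
    h_t = o * np.tanh(c_t)     # hidden state passed to the next time-step
    return h_t, c_t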
Consider the task of modifying certain information in a calendar. To do this, an RNN completely rewrites the existing data by applying a function, whereas an LSTM makes small modifications through simple additions and multiplications that flow through the cell states. This is how an LSTM forgets and remembers things selectively, which makes it an improvement over plain RNNs.
Now suppose you want to process data with periodic patterns in it, such as predicting the sales of coloured powder, which peak around Holi in India. A good strategy is to look back at the sales records of the previous year, so you need to know which data should be forgotten and which should be stored for later reference; otherwise, you would need a really good memory. Recurrent neural networks seem, in theory, capable of this, but in practice they suffer from two problems, exploding gradients and vanishing gradients, which make them hard to train on long sequences.
LSTM addresses this problem by introducing memory units, called cell states; these cells can be seen as differentiable memory.
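Before moving to scalecast, a minimal Keras sketch of an LSTM regressor over a window of lagged values may help connect the theory to the code below; the window length and layer size are arbitrary, and this is only conceptually similar to what scalecast's lstm estimator builds internally:
import tensorflow as tf

lags = 30  # length of the input window; arbitrary for illustration

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(lags, 1)),  # one value per time-step
    tf.keras.layers.LSTM(16),                # the memory cells described above
    tf.keras.layers.Dense(1),                # map the final hidden state to a one-step forecast
])
model.compile(optimizer='adam', loss='mse')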
Importing Modules
import seaborn as sns
import matplotlib.pyplot as plt
from scalecast.Forecaster import Forecaster
sns.set(rc={'figure.figsize': (25, 8)})
# train is assumed to be the Kaggle sales training DataFrame loaded earlier
df = train[(train['country'] == 'Belgium') & (train['product'] == 'Kaggle Advanced Techniques') & (train['store'] == 'KaggleMart')]
Forecaster Class
f = Forecaster(y=df.num_sold, current_dates=df.date)
Partial Autocorrelation
f.plot_pacf(lags=7)
plt.show()

Seasonal Decompose
f.seasonal_decompose(model='additive', extrapolate_trend='freq', period=1).plot()
plt.show()

Stationarity Test
stat, p, _, _, _, _ = f.adf_test(full_res=True)
print(stat, p)
Output:
-2.717785335903165 0.0710108697786503
The p-value of about 0.07 is above the usual 0.05 threshold, so the Augmented Dickey-Fuller test cannot reject the presence of a unit root; the series should be treated as possibly non-stationary, which is why it is differenced before fitting the MLR model later on.
To model anything in scalecast, we need to complete the following three basic steps:
1. Specify a test length — all models are tested in scalecast with the same slice of data, and at least one data point must be set aside to do so. There is no getting around this. The test length is a discrete number of observations at the end of the full time series. You can pass either a percentage or a discrete number to the set_test_length function.
2. Generate future dates — all models in scalecast produce a forecast in the same scale as the observed data. There is no getting around this. The number of dates you generate in this step will determine how long all models will be forecast out.
3. Choose an estimator — we will be using the “lstm” estimator, but there are a handful of others available.
f.set_test_length(30)
f.generate_future_dates(90)
f.set_estimator('lstm')
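If you would rather hold out a percentage of the data, recent versions of scalecast also accept a fraction in set_test_length; the 10% below is only an example:
f.set_test_length(.1)  # hold out the last 10% of observations as the test set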
Default Forecasting
f.manual_forecast(call_me='lstm_default')
f.plot_test_set(ci=True)
Output:
44/44 [==============================] - 4s 5ms/step - loss: 0.2229
43/43 [==============================] - 1s 3ms/step - loss: 0.2223

General Forecasting with 30 Lags
f.manual_forecast(call_me='lstm_30lags', lags=30)
f.plot_test_set(ci=True)
Output:
43/43 [==============================] - 1s 4ms/step - loss: 0.2194
42/42 [==============================] - 1s 3ms/step - loss: 0.2051

General Forecasting with 24 Lags, 5 Epochs
f.manual_forecast(call_me='lstm_24lags_5epochs',
                  lags=24,
                  epochs=5,
                  validation_split=.2,
                  shuffle=True)
f.plot_test_set(ci=True)
Train:
Epoch 1/5
35/35 [==============================] - 2s 15ms/step - loss: 0.2260 - val_loss: 0.1772
Epoch 2/5
35/35 [==============================] - 0s 4ms/step - loss: 0.1580 - val_loss: 0.1148
Epoch 3/5
35/35 [==============================] - 0s 4ms/step - loss: 0.1008 - val_loss: 0.0753
Epoch 4/5
35/35 [==============================] - 0s 4ms/step - loss: 0.0761 - val_loss: 0.0630
Epoch 5/5
35/35 [==============================] - 0s 4ms/step - loss: 0.0712 - val_loss: 0.0615
Epoch 1/5
34/34 [==============================] - 2s 14ms/step - loss: 0.2268 - val_loss: 0.1681
Epoch 2/5
34/34 [==============================] - 0s 4ms/step - loss: 0.1491 - val_loss: 0.1072
Epoch 3/5
34/34 [==============================] - 0s 4ms/step - loss: 0.1037 - val_loss: 0.0890
Epoch 4/5
34/34 [==============================] - 0s 4ms/step - loss: 0.0938 - val_loss: 0.0824
Epoch 5/5
34/34 [==============================] - 0s 4ms/step - loss: 0.0909 - val_loss: 0.0800

General Forecasting with Early Stopping
from tensorflow.keras.callbacks import EarlyStopping
f.manual_forecast(call_me='lstm_30lags_earlystop_8layers',
                  lags=30,
                  epochs=50,
                  validation_split=.2,
                  shuffle=True,
                  callbacks=EarlyStopping(monitor='val_loss',
                                          patience=8),
                  lstm_layer_sizes=(16,16,16),
                  dropout=(0,0,0))
f.plot_test_set(ci=True)
Train:
Epoch 1/50
35/35 [==============================] - 4s 38ms/step - loss: 0.1754 - val_loss: 0.1035
Epoch 2/50
35/35 [==============================] - 0s 8ms/step - loss: 0.0893 - val_loss: 0.0659
------
------
Epoch 28/50
34/34 [==============================] - 0s 8ms/step - loss: 0.0805 - val_loss: 0.0719
Epoch 29/50
34/34 [==============================] - 0s 8ms/step - loss: 0.0802 - val_loss: 0.0707

Manual Forecasting
f.manual_forecast(call_me='lstm_best',
                  lags=36,
                  batch_size=32,
                  epochs=15,
                  validation_split=.2,
                  shuffle=True,
                  activation='tanh',
                  optimizer='Adam',
                  learning_rate=0.001,
                  lstm_layer_sizes=(72,)*4,
                  dropout=(0,)*4,
                  plot_loss=True)
f.plot_test_set(order_by='LevelTestSetMAPE',models='top_2',ci=True)
Train:
Epoch 1/15
35/35 [==============================] - 6s 42ms/step - loss: 0.1246 - val_loss: 0.0759
Epoch 2/15
35/35 [==============================] - 0s 11ms/step - loss: 0.0747 - val_loss: 0.0643
-----------
-----------
Epoch 14/15
34/34 [==============================] - 0s 11ms/step - loss: 0.0818 - val_loss: 0.0685
Epoch 15/15
34/34 [==============================] - 0s 11ms/step - loss: 0.0813 - val_loss: 0.0693


f.set_estimator('mlr') # 1. choose the mlr estimator
f.add_ar_terms(7) # 2. add regressors (7 lagged terms)
f.add_seasonal_regressors('month','quarter',dummy=True) # 2.
f.add_seasonal_regressors('year') # 2.
f.add_time_trend() # 2.
f.diff() # 3. difference non-stationary data
f.manual_forecast()
f.plot_test_set(order_by='LevelTestSetMAPE',models='top_2')

f.plot(models=['mlr','lstm_best'],
       order_by='LevelTestSetMAPE',
       level=True)
