Stock price data is by its very nature sequential in time: each day's prices follow on from the one before. To predict a future stock price we can use a recurrent neural network (RNN), a type of neural network whose nodes are structured into a temporal sequence.
To start applying this, we first need access to time series stock data. In this example the Ameritrade APIs are used to generate a data file, but other sources could equally be used.
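A query against the TD Ameritrade price history endpoint might be sketched as follows. The endpoint path and parameter names reflect the public `pricehistory` API, but the API key is a placeholder and the exact parameters you need may differ for your account:

```python
import requests

API_KEY = "YOUR_CONSUMER_KEY"  # placeholder -- substitute your own key


def build_price_history_request(symbol, api_key):
    """Build the URL and query parameters for a 10-year daily price history."""
    url = f"https://api.tdameritrade.com/v1/marketdata/{symbol}/pricehistory"
    params = {
        "apikey": api_key,
        "periodType": "year",
        "period": 10,
        "frequencyType": "daily",
        "frequency": 1,
    }
    return url, params


def fetch_candles(symbol, api_key):
    """Fetch the candles for a symbol; each candle is a dict with
    open/high/low/close/volume/datetime keys."""
    url, params = build_price_history_request(symbol, api_key)
    resp = requests.get(url, params=params)
    resp.raise_for_status()
    return resp.json()["candles"]


# candles = fetch_candles("AAL", API_KEY)  # American Airlines, as in this example
```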
The above code queries the Ameritrade APIs and returns 10 years of stock price data, in this example for American Airlines. Once plotted, the data should look something like this:
The candle data is then converted to a Pandas dataframe and a new column is added for the previous day's closing price. This is then saved to a CSV file.
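This step might look something like the sketch below. The three inline candles stand in for the list returned by the API call, and the column name `prev_close` and filename `AAL.csv` are illustrative choices:

```python
import pandas as pd

# Stand-in for the candle list returned by the API (open/high/low/close/volume/datetime).
candles = [
    {"datetime": 1609718400000, "open": 15.9, "high": 16.2, "low": 15.7, "close": 16.0, "volume": 41000000},
    {"datetime": 1609804800000, "open": 16.0, "high": 16.5, "low": 15.8, "close": 16.3, "volume": 38000000},
    {"datetime": 1609891200000, "open": 16.3, "high": 16.6, "low": 16.1, "close": 16.4, "volume": 35000000},
]

df = pd.DataFrame(candles)

# Shift the close column down one row so each row also carries the previous day's close.
df["prev_close"] = df["close"].shift(1)
df = df.dropna()  # the first row has no previous close

df.to_csv("AAL.csv", index=False)
```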
Separately, the file can then be loaded to begin processing a prediction.
With the data loaded back into a dataframe, the timestamp needs to be set as the index and then it must be scaled into a 0 to 1 range to make it easier to process.
For this, we scale just the closing price, volume and previous day's closing price columns.
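A minimal sketch of this preparation step, using scikit-learn's `MinMaxScaler`; the column names (`datetime`, `close`, `volume`, `prev_close`) are assumptions carried over from the CSV-writing step:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def prepare(df):
    """Index the frame by timestamp and scale the model's input columns into [0, 1].

    Returns the prepared frame plus the fitted scaler, which is kept so
    predictions can be un-scaled back into real prices later.
    """
    df = df.copy()
    df["datetime"] = pd.to_datetime(df["datetime"], unit="ms")
    df = df.set_index("datetime")

    cols = ["close", "volume", "prev_close"]
    scaler = MinMaxScaler(feature_range=(0, 1))
    df[cols] = scaler.fit_transform(df[cols])
    return df, scaler


# df, scaler = prepare(pd.read_csv("AAL.csv"))
```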
Now we will split the data into training and testing datasets.
Importantly, the data is split at a predetermined point along the time series. In this case, we split it at the 80% mark meaning that the last 20% of the time series will be used for testing the predictions.
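Because the split point is positional rather than random, it can be expressed as a simple slice; a sketch:

```python
import pandas as pd


def time_split(df, train_frac=0.8):
    """Split a time-ordered frame at a fixed point -- no shuffling, so every
    row in the test set is strictly later in time than the training set."""
    split = int(len(df) * train_frac)
    return df.iloc[:split], df.iloc[split:]
```

A conventional random train/test split would leak future information into training, which is why the order-preserving slice matters here.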
Next we must window the data. Doing this creates snippets of data along the time series, each of which becomes an independent training sample.
In the above example the data is windowed into groups of 5 samples. The number of samples in each group can be varied and the results will differ greatly depending on what value is used.
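The windowing can be sketched as a sliding slice over the series, where each group of 5 consecutive samples becomes one input and the value immediately after it becomes the target:

```python
import numpy as np


def window(values, size=5):
    """Slide a fixed-size window over the series. Each window of `size`
    consecutive samples is one input; the next value is its target."""
    X, y = [], []
    for i in range(len(values) - size):
        X.append(values[i:i + size])
        y.append(values[i + size])
    return np.array(X), np.array(y)
```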
The data is now ready, so we will move on to building the RNN model itself. This begins with creating a callback function to stop the training early if performance is not improving.
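In Keras this pairing of an early-stopping callback with a small recurrent model might look like the sketch below. The layer sizes, the patience value of 3, and the input shape of 5 timesteps by 3 features (close, volume, previous close) are illustrative assumptions, not values taken from the original code:

```python
import tensorflow as tf

# Stop training when validation loss stops improving for 3 consecutive epochs,
# and roll back to the best weights seen.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True)

# A small LSTM-based RNN: windows of 5 timesteps, 3 features per step.
inputs = tf.keras.Input(shape=(5, 3))
x = tf.keras.layers.LSTM(32)(inputs)
outputs = tf.keras.layers.Dense(1)(x)  # predict the next (scaled) closing price
model = tf.keras.Model(inputs, outputs)
```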
With the model built, we can now fit it to the data. Initially we will use a learning rate of 0.0001 over 50 epochs with a batch size of 20.
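Compiling and fitting with those hyperparameters could be sketched as follows; random arrays stand in for the real windowed training data, and the 20% validation split is an assumption added so the early-stopping callback has a validation loss to monitor:

```python
import numpy as np
import tensorflow as tf

# Illustrative model matching the windowed data: 5 timesteps, 3 features.
inputs = tf.keras.Input(shape=(5, 3))
x = tf.keras.layers.LSTM(32)(inputs)
outputs = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
              loss="mean_squared_error")

early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3)

# X_train/y_train would come from the windowing step; random data stands in here.
X_train = np.random.rand(100, 5, 3)
y_train = np.random.rand(100, 1)

history = model.fit(X_train, y_train,
                    epochs=50, batch_size=20,
                    validation_split=0.2,
                    callbacks=[early_stop],
                    verbose=0)
```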
As the epochs run, the model loss function should decrease over time. In this case an optimal result was achieved after around 10 epochs. The model is now trained and ready for predictions.
This prediction is the actual stock price during the training period against the predicted price. As we are predicting against the same data we trained on, generally a very good result can be achieved as shown below:
If we now generate a prediction against the reserved test data (the 20% we separated at the start), we get a more realistic measure of performance.
In this case we see some separation between the values but the predicted values do generally follow the actual ones.
The real test of the model is how well it generalises against another stock price entirely. To do this, the price history of another stock was captured to a CSV file and then loaded into the model to generate a prediction. The below example shows a prediction of the Microsoft stock price:
Overall, the model performs well. There is some degree of noise in the predictions but it follows the general trend well.
The last thing to do is to set the solution up to predict tomorrow's stock price. To do this we append another window of zero values to the data sequence.
We will then run the prediction as before but in this case, the last window will contain a future prediction. To extract the actual prediction, the scaling applied at the start must be reversed.
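Both the zero-valued window and the reversal of the scaling might be sketched as below. The scaler is refit on toy values here so the sketch is self-contained; in the real pipeline you would reuse the scaler fitted during preparation. Because that scaler was fitted on three columns, a predicted close must be padded to the same width before `inverse_transform` can be applied:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Append one all-zero window so the model emits one extra prediction: "tomorrow".
X_test = np.random.rand(4, 5, 3)  # stand-in for the real windowed test data
X_future = np.concatenate([X_test, np.zeros((1, 5, 3))])

# Toy stand-in for the scaler fitted earlier on [close, volume, prev_close].
scaler = MinMaxScaler()
scaler.fit(np.array([[10.0, 100.0,  9.0],
                     [20.0, 300.0, 10.0],
                     [15.0, 200.0, 20.0]]))


def unscale_close(pred_scaled, scaler, close_col=0, n_features=3):
    """Undo the 0-1 scaling for the close column only. The scaler was fitted
    on three columns, so predictions are padded to that width first."""
    padded = np.zeros((len(pred_scaled), n_features))
    padded[:, close_col] = np.ravel(pred_scaled)
    return scaler.inverse_transform(padded)[:, close_col]


# The model's (scaled) output for the final, future window is un-scaled like so:
tomorrow_price = unscale_close(np.array([[0.5]]), scaler)
```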
That completes the solution!