Predicting Stock Prices using ARIMA Model in R
With so many investors entering the stock and cryptocurrency markets, it would be useful to have a program that can forecast market prices. Such a program could help investors decide whether or not it is the right time to invest, so that they can improve their returns.
Introduction
The ARIMA (AutoRegressive Integrated Moving Average) model is one of the most widely used time series models for forecasting future trends. In this tutorial, we will use it to predict stock prices using the R programming language.
Table of contents
- Prerequisites
- Importing Yahoo Finance data in R
- Stock charting
- Analyze the correlation of data
- Differencing data to be stationary
- Do stationary testing using unit root testing
- Building ARIMA model
- Fitting the ARIMA model and forecasting
Prerequisites
To follow along with this tutorial, you will need:
- A basic knowledge of time series and how various time series models work.
- RStudio installed on your PC.
- A basic knowledge of how to analyze and interpret charts.
Importing Yahoo Finance data in R
In this tutorial, we will demonstrate stock price forecasting using the Amazon stock price. We will use its NASDAQ symbol, AMZN, and import the data from Yahoo Finance using the quantmod package in R.
The data is of the OHLC (Open, High, Low, and Close) type, but for simplicity's sake we will use only the close price, which makes our model a univariate time series.
We need to install the quantmod package using:
install.packages("quantmod")
After installing the quantmod package, load it using library(quantmod).
Then, to import the Amazon data, use the getSymbols() function.
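As a minimal sketch of this import step, the helper below wraps getSymbols() and keeps only the close price. The helper name `load_close_prices` and the default date range (taken from the training period mentioned later in this tutorial) are assumptions for illustration, and calling it needs an internet connection to reach Yahoo Finance.

```r
# Hypothetical helper: download daily OHLC data from Yahoo Finance
# and return only the closing prices as a univariate xts series.
load_close_prices <- function(symbol = "AMZN",
                              from = "2013-01-03",
                              to = "2022-03-12") {
  ohlc <- quantmod::getSymbols(symbol, src = "yahoo",
                               from = from, to = to,
                               auto.assign = FALSE)  # return the data instead of assigning it
  quantmod::Cl(ohlc)  # Cl() extracts the Close column
}

# Usage (requires network access and the quantmod package):
# AMAZON <- load_close_prices()
# head(AMAZON)
```

Setting auto.assign = FALSE makes getSymbols() return the data rather than create an object named AMZN in the workspace, which keeps the helper free of side effects.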
Stock charting
To start analyzing the stock, we chart it with several technical indicators before forecasting: a moving average, Bollinger Bands (20, sd=1), a 14-day Relative Strength Index, and Moving Average Convergence Divergence (12, 25).
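One way to draw such a chart is with quantmod's chartSeries() and its add* indicator functions. The sketch below is only an illustration: the helper name `chart_with_indicators` is hypothetical, it assumes quantmod is already attached with library(quantmod), and the moving average uses the package default period because the tutorial does not state one.

```r
# Hypothetical helper: chart an OHLC xts object with the indicators
# listed above. Assumes library(quantmod) has been called.
chart_with_indicators <- function(ohlc) {
  chartSeries(
    ohlc,
    theme = chartTheme("white"),
    TA = paste(
      "addSMA()",                       # simple moving average, default period
      "addBBands(n = 20, sd = 1)",      # Bollinger Bands (20, sd = 1)
      "addRSI(n = 14)",                 # 14-day relative strength index
      "addMACD(fast = 12, slow = 25)",  # MACD with the periods given above
      sep = ";"
    )
  )
}

# Usage, after getSymbols("AMZN") has created the AMZN object:
# chart_with_indicators(AMZN)
```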
To make our analysis easier, we apply a log transformation to the data. This puts price changes on a relative (growth-rate) scale and compresses the range of values.
We then plot the log-transformed data.
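The transformation and plot can be sketched in base R as follows. To keep the example self-contained, it uses a simulated random-walk price series as a stand-in for the Amazon close prices; the variable names `prices` and `AMAZON_log` are assumptions.

```r
set.seed(42)

# Stand-in data: a simulated price series, NOT real Amazon prices.
prices <- 100 * exp(cumsum(rnorm(500, mean = 0.0005, sd = 0.02)))

# Log transformation: equal vertical distances now mean equal
# percentage changes rather than equal dollar changes.
AMAZON_log <- log(prices)

plot(AMAZON_log, type = "l",
     xlab = "Trading day", ylab = "log(close price)",
     main = "Log-transformed close price")
```

With the real data, the same log() call would be applied to the close-price series extracted from the AMZN object.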
The plot shows an overall upward trend with some downward stretches and high volatility. A series with a trend like this is non-stationary, as is most financial price data.
We can therefore assume that it follows a random walk: the current price equals the price at time (t-1) plus white noise. To fit the data with an ARIMA model, we must first difference it at a particular lag.
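In symbols, writing P_t for the (log) price at time t and ε_t for the white-noise term (notation introduced here for illustration), the random-walk assumption and its first difference are:

```latex
P_t = P_{t-1} + \varepsilon_t
\quad\Longrightarrow\quad
\Delta P_t = P_t - P_{t-1} = \varepsilon_t
```

Under this assumption the differenced series is just the white-noise term, which is stationary. That is why differencing at lag 1 is the natural first step.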
Analyze the correlation of data
We examine the autocorrelation function (ACF) and partial autocorrelation function (PACF) to see whether today's price is correlated with yesterday's price.
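A minimal sketch of this check with the base R acf() and pacf() functions is shown below. It again uses a simulated random-walk series as a stand-in for the log prices, so the variable names are assumptions.

```r
set.seed(42)

# Stand-in data: simulated log prices, NOT real Amazon data.
prices <- 100 * exp(cumsum(rnorm(500, mean = 0.0005, sd = 0.02)))
AMAZON_log <- log(prices)

# Correlograms of the undifferenced series.
acf_log  <- acf(AMAZON_log,  lag.max = 30, main = "ACF of log price")
pacf_log <- pacf(AMAZON_log, lag.max = 30, main = "PACF of log price")

# acf() stores lag 0 first, so the second element is the lag-1
# autocorrelation (today versus yesterday).
acf_log$acf[2]
```

For a trending, random-walk-like series, the ACF typically decays very slowly and the lag-1 value is close to 1, which is another sign of non-stationarity.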
Differencing data to be stationary
Now that we know that our data is not stationary, we need to make it stationary by differencing it at a certain lag so that it fits our ARIMA model.
Stationarity matters because it lets us assume that the statistical properties observed in the past will remain the same in the future. Here, the log-transformed data will be differenced at lag 1 to make it stationary.
Differencing can leave a missing value at the start of the series, so we fill each missing value with the next available observation.
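The differencing and backfilling steps can be sketched in base R as below. The data is simulated, and the helper `fill_backward` is a hypothetical stand-in written for this example. For xts or zoo objects, na.locf() from the zoo package with fromLast = TRUE performs the same kind of backward fill.

```r
set.seed(42)

# Stand-in data: simulated log prices, NOT real Amazon data.
AMAZON_log <- log(100 * exp(cumsum(rnorm(500, mean = 0.0005, sd = 0.02))))

# Lag-1 difference, padded with a leading NA so the length matches the
# input (this mimics what diff() returns for an xts object).
AMAZON_diff <- c(NA, diff(AMAZON_log, lag = 1))

# Hypothetical helper: replace each NA with the next non-missing value,
# walking from the end of the series towards the start.
fill_backward <- function(x) {
  for (i in rev(seq_along(x))[-1]) {
    if (is.na(x[i])) x[i] <- x[i + 1]
  }
  x
}

AMAZON_diff <- fill_backward(AMAZON_diff)
head(AMAZON_diff)
```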
Do stationary testing using unit root testing
After differencing the data at lag 1, we'll test whether it is now stationary using unit root testing.
We'll use the Augmented Dickey-Fuller (ADF) test. Its null hypothesis is that the series has a unit root, that is, that it is non-stationary.
If the resulting p-value is below 0.05, we will reject the null hypothesis and conclude that the data is stationary. We are going to test our differenced data.
To do this, load the tseries package using library(tseries), then perform the ADF test using:
adf <- adf.test(AMAZON_diff, alternative = "stationary", k = 0)
adf
The reported p-value is 0.01, which is below 0.05, so we reject the null hypothesis of a unit root. Our data is now stationary, making it appropriate for our ARIMA model.
To identify candidate orders for the autoregressive (AR) and moving average (MA) parts of the model, we generate ACF and PACF correlograms of the differenced data using the acf() and pacf() functions.
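As a sketch, the correlograms for the differenced series could be produced as follows. The data is simulated white noise standing in for the differenced log prices, and the variable names are assumptions.

```r
set.seed(42)

# Stand-in for the differenced log prices, NOT real Amazon data.
AMAZON_diff <- rnorm(500, mean = 0.0005, sd = 0.02)

diff_acf  <- acf(AMAZON_diff,  lag.max = 30, main = "ACF of differenced series")
diff_pacf <- pacf(AMAZON_diff, lag.max = 30, main = "PACF of differenced series")

# Approximate 95% significance bound, the dashed lines on the plots.
bound <- qnorm(0.975) / sqrt(length(AMAZON_diff))

# Lags outside the bound hint at MA terms (from the ACF, skipping lag 0)
# and AR terms (from the PACF).
which(abs(diff_acf$acf[-1]) > bound)
which(abs(diff_pacf$acf) > bound)
```

With 30 lags examined, one or two spikes just outside the bound can occur by chance, so these lists are a starting point rather than a final choice of order.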
To train our model, we take a subset of the data running from the first period to the 3355th period, which is from 3rd January 2013 to 12th March 2022.
Next, install the caTools library using:
install.packages("caTools")
Load the caTools library:
library(caTools)
To select our training data, we will use:
train_data <- AMAZON_diff[1:3355]
Building ARIMA model
The ARIMA functions we need are in the forecast package, which we first install and then load as follows:
install.packages("forecast")
library(forecast)
The auto.arima() function is used to select and fit the ARIMA model.
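A call along the following lines would do this. It is a sketch under assumptions: the training data is simulated, the object name `arima_model` is chosen for illustration, and the argument values shown are one reasonable configuration rather than the settings used to produce the results in this tutorial.

```r
library(forecast)

set.seed(42)
# Stand-in for the differenced training data, NOT real Amazon data.
train_data <- rnorm(500, mean = 0.0005, sd = 0.02)

# stationary = TRUE restricts the search to models without further
# differencing, ic selects the information criterion used for ranking,
# and trace = TRUE prints each candidate model as it is tried.
arima_model <- auto.arima(train_data,
                          stationary = TRUE,
                          ic = "aicc",
                          trace = TRUE)
arima_model
```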
To check the summary of our best-fit ARIMA model, we use:
summary(arima_model)
We then check the residuals of our ARIMA model. The Ljung-Box test gives a p-value above 0.05 (not significant), so we cannot reject the hypothesis that the model's residuals are independent, that is, not autocorrelated. On this basis, we do not go on to volatility modeling with models such as GARCH, which are commonly used on financial data with heteroscedasticity problems.
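The Ljung-Box test is available in base R as Box.test(). The sketch below fits an MA(2) model to simulated data (an assumption standing in for the real training set) and tests its residuals. The choice of 10 lags is illustrative.

```r
set.seed(42)
# Stand-in for the differenced training data, NOT real Amazon data.
train_data <- rnorm(500, mean = 0.0005, sd = 0.02)

fit <- arima(train_data, order = c(0, 0, 2))

# Ljung-Box test on the residuals. fitdf = 2 adjusts the degrees of
# freedom for the two estimated MA coefficients.
lb <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 2)
lb$p.value

# An informal check for volatility clustering: run the same test on
# the squared residuals.
lb_sq <- Box.test(residuals(fit)^2, lag = 10, type = "Ljung-Box")
lb_sq$p.value
```

A p-value above 0.05 in the first test means there is no evidence of leftover autocorrelation. A small p-value in the second test would suggest changing variance, which is the situation GARCH-type models are designed for.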
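Assuming that ARIMA(0,0,2) is our model, the forecast for the days ahead is close to a straight line. This is expected for a stationary series, because the forecasts of an MA(2) model settle at the series mean after two steps. The residuals of a well-fitting model should also be approximately normally distributed.
The residual plot and the 100-day forecast from our ARIMA model are produced in the next section.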
Fitting the ARIMA model and forecasting
Now, to fit the model to the training data set, we use:
arima <- arima(train_data, order = c(0, 0, 2))
summary(arima)
Now, we can forecast the next 100 days using the forecast() function from the forecast package with h=100.
We can then plot the result by passing the object returned by forecast() to plot().
Finally, we check the residuals of our model using checkresiduals(arima).
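Put together, the three steps might look like the sketch below. It uses simulated training data, and the names `fit` and `fc` are assumptions. It also swaps in Arima(), the forecast package's wrapper around arima(), because that wrapper keeps a copy of the series inside the fitted object.

```r
library(forecast)

set.seed(42)
# Stand-in for the differenced training data, NOT real Amazon data.
train_data <- rnorm(500, mean = 0.0005, sd = 0.02)

# Fit the MA(2) model, i.e. ARIMA(0,0,2).
fit <- Arima(train_data, order = c(0, 0, 2))

# h is the number of periods to forecast: 100 trading days here.
fc <- forecast(fit, h = 100)

plot(fc)              # point forecasts with prediction intervals
checkresiduals(fit)   # residual plots plus a Ljung-Box test
```

Because the model is fitted to differenced log prices, the forecasts are on that scale as well. Getting back to price levels would require cumulatively summing the forecasts from the last observed log price and then exponentiating.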
Our forecast will be:
The forecast suggests that over the next 100 days, Amazon's stock price will rise, with a slight downward movement in the first few days followed by an almost steady rise.
Now that the investor knows the expected trend for the next 100 days in Amazon stock, they can make better-informed decisions about buying and selling. You can forecast over a different horizon by setting h to the desired number of periods.
Conclusion
We learned how to predict Amazon stock prices using the R programming language, perform financial modeling, and use time series models in forecasting. Automated functions such as auto.arima() can select and fit a model for us, and they tend to give better results when fed with enough data.
You can go ahead and learn more about how to perform forecasting in R and Python using the resources below.
Happy coding!
Further reading
- Time series forecasting in R
- Pluralsight time series forecasting in R
- A guide to forecasting in R
- Forecasting
Peer Review Contributions by: Jethro Magaji