Sales Forecasting with Prophet in Python
Prophet is a library developed by Facebook that is ideal for performing time series forecasting. It is used to forecast anything that has a time series trend, such as the weather and sales. <!--more--> This tutorial will leverage this library to estimate sales trends accurately. We will use the Python programming language for this build.
Prerequisite
To follow along, you need to be familiar with:
Outline
- Installing and importing required dependencies
- Loading data into our notebook
- Data preprocessing
- Training the time series model
- Making predictions and evaluating performance
- Wrapping up
- Further reading
Installing and importing required dependencies
Let's begin by installing the model.
!pip install prophet
After installing it, we need to import it into our notebook.
import pandas as pd
from prophet import Prophet
pandas
allows us to bring in tabular data.prophet
allows us to import the Prophet library into our Google Colab.
Let's bring our data into the notebook. We will use store sales transaction data from Kaggle.
The dataset includes dates, store and product information, and sales numbers. It contains four years' worth of sales data sold at Favorita stores located in Ecuador. You'll need to download the data and upload it into your Colab.
Loading data into our notebook
We will use the pandas
library to read in our csv
file.
dataframe = pd.read_csv('transactions.csv')
We load in our data and save it inside a variable called dataframe
. We can check the first five rows of data using the pandas head()
method.
You can use the tail()
method to check the last five rows.
dataframe.head()
Output:
date store_nbr transactions
0 2013-01-01 25 770
1 2013-01-02 1 2111
2 2013-01-02 2 2358
3 2013-01-02 3 3487
4 2013-01-02 4 1922
Let's take a look at the data types for these columns.
dataframe.dtypes
Output:
date object
store_nbr int64
transactions int64
From these results, we can see that the date column is a string. The model cannot accept it as it is. It needs to be converted into a date-time format for it to work with the model.
Let's perform some preprocessing.
It's important whenever you're working with time-series data that you have a date or timestamp column. It is a requirement by the Prophet model to forecast trends.
Data preprocessing
Using the Pandas to_datetime()
function, we will convert the date column from a string to a date-time format.
dataframe ['date'] = pd.to_datetime(dataframe ['date'])
dataframe.dtypes
Output:
date datetime64[ns]
store_nbr int64
transactions int64
We have converted our date column into a date-time format.
We need to drop the store_nbr
column. Besides, for this data to work with the Prophet model, we only need two columns, i.e, a ds
and y
column. We need to rename our date column to ds
and the transactions column to y
.
dataframe.drop('store_nbr', axis=1, inplace=True)
We are dropping only the store_nbr
column. The axis=1
argument tells Pandas library to drop the columns and not the rows.
Output:
date transactions
0 2013-01-01 770
1 2013-01-02 2111
2 2013-01-02 2358
3 2013-01-02 3487
4 2013-01-02 1922
We now have only two columns. As mentioned above, we need to rename the data column to ds
and transactions columns to y
.
dataframe.columns = ['ds', 'y']
dataframe.head()
Output:
ds y
0 2013-01-01 770
1 2013-01-02 2111
2 2013-01-02 2358
3 2013-01-02 3487
4 2013-01-02 1922
Using the command above, we have successfully renamed our columns. That's the last of the preprocessing step. We can now go ahead and create the time series model.
Training the time series model
We begin by creating an instance p
of the Prophet class.
p = Prophet(interval_width=0.92, daily_seasonality=True)
We use the interval_width
argument to estimate the uncertainty interval from the number of samples used. We've set ours to 0.92
.
The argument daily_seasonality=True
will fit daily seasonality for a sub-daily time series. It will default to weekly and yearly seasonalities if you don't set this parameter.
You can play around with these values to check how it affects the results obtained after training.
We can now train our model.
model = p.fit(dataframe)
After running the command above, the model will be trained on the data.
Making predictions and evaluating performance
Let's go ahead and make predictions.
future = p.make_future_dataframe(periods=200, freq='D')
future.tail()
Output:
ds
1877 2018-02-27
1878 2018-02-28
1879 2018-03-01
1880 2018-03-02
1881 2018-03-03
From the results, we can see that the model has made future predictions, 200
days away from the last data value using a daily frequency. If you want to train for longer periods, you can change the value in the periods=200
argument.
To predict, we use the predict()
method and pass in the future dataframe as shown:
forecast_prediction = p.predict(future)
forecast_prediction.tail()
From the results generated, the model has generated a lot of sales information in addition to the predicted ds
and yhat
column. The most important column is the yhat
column, as it is what represents your sales forecast.
We can visualize these predictions.
plot1 = p.plot(forecast_prediction)
If you take a keen look at the plot, you'll notice that the predicted sales trend mimics the actual data's trend. We could take this plotting even a step further and plot the individual components that make up the above plot.
plot2 = p.plot_components(forecast_prediction)
This plot could give you a lot more information about the sales data. For instance, more sales are made between Friday and Monday. Also, they seem to make a lot of sales between November and February. During the rest of the year, sales are average.
You can find the complete code for this tutorial here.
Wrapping up
That's sales forecasting using the Prophet model in a nutshell.
This tutorial introduces you to time series forecasting using Prophet. It should only introduce you to how to use the model in a project, and is in no way to be used for production purposes.
To use the model for production, you'll need to do more research on it. You can also read about the Neural Prophet library. It is an extension of Prophet that adds neural networks to the mix.
Further reading
Peer Review Contributions by: Wilkister Mumbi