Time Series Analysis and Forecasting using Auto Time Series

A time series is a sequence of data points recorded in successive order over time. It captures how the variables in a dataset change over time. Time series analysis extracts meaningful patterns and attributes from historical data, which lets a model learn the trends in the dataset.

Time series forecasting builds a model that predicts future values based on historical data. Such a model can forecast foreign exchange rates, stock prices, weather, and COVID-19 caseloads. In stock prediction, for example, a time series model tracks the movement of a stock price, such as Apple's. Accurate predictions can help investors make a profit.

In this tutorial, we will build an electricity consumption prediction model. We will use the Auto Time Series library (Auto-TS) to train the model.

Prerequisites

To easily understand this article, a reader should:

Getting started with Auto Time Series library

Auto Time Series (Auto-TS) is an open-source Python library that automates time series analysis and forecasting. It runs multiple time series models on the training dataset and then selects the best one, which lets us train an accurate model in a short time.

There are different types of time series models. The most common models that Auto Time Series runs are as follows:

  • Facebook Prophet

  • PyFlux

  • Non-Seasonal ARIMA

  • Seasonal SARIMAX

  • VAR

  • Machine Learning models

All the listed models above support time series analysis and forecasting. Auto-TS chooses the best model based on its accuracy score and predictions made. We will then plot a line graph to show the forecast values.

Benefits of using Auto Time Series library

  • It performs automated dataset preprocessing. It will automatically transform the input dataset into a format the model can use. It removes noise and unnecessary information in the dataset.

  • It can handle missing values and outliers. Auto-TS handles the missing values to ensure we have a complete dataset. It also removes outliers that are not within the dataset range.

  • It trains high-accuracy models. Auto-TS produces reliable and accurate models.

  • It selects the optimal time series model. Auto-TS automatically runs multiple time series models listed above. It then automatically selects the optimal model. This model will give the most accurate results.

  • Automatic hyperparameter tuning and configurations. Auto-TS automatically fine-tunes the model parameters. It ensures the model gives the best accuracy score.

Installing Auto Time Series library

To install the Auto Time Series library, run this command:

!pip install auto_ts

We import the library using this code:

import auto_ts as AT

Let's now start working with our dataset.

Working with the dataset

We will use the electricity consumption dataset to train the model. The dataset shows the monthly electricity consumption of an individual household from 2016-01-01 to 2020-05-01. You can download the electricity consumption dataset here.

The dataset output:

Electricity consumption dataset

From the image above, the dataset has six columns:

  • Bill_Date: It shows the date on which the billing period ends.

  • On_peak: It is the electricity consumption during the peak season.

  • Off_peak: It is the electricity consumption during the off-peak season.

  • Usage_charge: It is the total cost of electricity consumption without the tax.

  • Billed_amount: It is the total cost of electricity consumption and the tax.

  • Billing_days: It shows the number of days within the billing period.

We need to convert the Bill_Date column to the DateTime format. The DateTime format is the format Auto Time Series understands. It also enables us to perform time-series operations on this column.

We will use Python's datetime module.

from datetime import datetime

Let's create a Python function to convert the Bill_Date column to the DateTime format.

def parse(x):
    return datetime.strptime(x, '%m/%d/%Y')

We will call the function when loading the dataset.
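
As a quick check of the format string, the sketch below calls the same parse function on a sample date string. The sample value is an assumption about how dates look in the CSV file (month/day/year); it is not a row copied from the dataset.

```python
from datetime import datetime

def parse(x):
    # Convert a month/day/year string into a datetime object
    return datetime.strptime(x, '%m/%d/%Y')

# Hypothetical sample value in the month/day/year layout
sample = parse('01/01/2016')
print(sample)        # 2016-01-01 00:00:00
print(sample.month)  # 1
```

If the dates in the file use a different layout, strptime raises a ValueError, which is a useful early signal that the format string needs adjusting.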

Loading the dataset

We will load the dataset using Pandas.

import pandas as pd

To load the dataset and also convert the Bill_Date column to the DateTime format, use this code:

df = pd.read_csv('/content/electricity_consumption.csv', parse_dates = ['Bill_Date'], date_parser=parse)
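
In recent pandas releases, the date_parser argument is deprecated. An alternative sketch, assuming the same month/day/year layout, is to read the column as text and convert it afterwards with pd.to_datetime. The two-row CSV text and the name df_alt below are made up to keep the example self-contained.

```python
import io
import pandas as pd

# Made-up rows standing in for electricity_consumption.csv
csv_text = "Bill_Date,Billed_amount\n01/01/2016,247.73\n02/01/2016,234.11\n"

df_alt = pd.read_csv(io.StringIO(csv_text))

# Convert the text column to datetime with an explicit format
df_alt['Bill_Date'] = pd.to_datetime(df_alt['Bill_Date'], format='%m/%d/%Y')

print(df_alt.dtypes)
```

Either approach should leave Bill_Date with a datetime dtype, which is what the later steps rely on.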

To see the loaded dataset, use this code:

df

The output of the dataset:

Dataset output

To check the dataset information, use this code:

df.info()

The output:

Dataset information

From the output, the dataset has 53 entries. Also, there are no missing values.
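
Besides reading the non-null counts from df.info(), missing values can be counted directly. The sketch below uses a small made-up frame (not the tutorial's data) with one missing value to show what the check reports.

```python
import pandas as pd

# Made-up frame with one missing value, for illustration only
demo = pd.DataFrame({
    'On_peak': [365.0, 292.0, None],
    'Billed_amount': [247.73, 234.11, 123.85],
})

# Count missing values per column; all zeros means the data is complete
missing_per_column = demo.isna().sum()
print(missing_per_column)
```

On a complete dataset, every count is zero.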

Let's make the Bill_Date the index column.

ec_df = df.set_index('Bill_Date')

To see the dataset with Bill_Date as the index column, use this code:

ec_df.head()

The dataset output:

Dataset

Selecting the dependent variable

The dependent variable is the variable that the model will predict. This variable changes with time. The dependent variable is the Billed_amount.

ec_data = ec_df['Billed_amount']

Plotting the line graph

We will plot the line graph that shows the data points using Matplotlib. Let's import Matplotlib.

import matplotlib.pyplot as plt

To plot the line graph, use this code:

ec_data.plot(grid=True)

The line graph output:

Dataset

The graph plots Billed_amount against Bill_Date from 2016 to 2020.

Let's plot a line graph to show electricity consumption for 2019.

Line graph for 2019

To plot the line graph, use this code:

ec_df_2019=ec_df.loc['2019']
ec_data_2019=ec_df_2019['Billed_amount']
ec_data_2019.plot(grid=True)

The output:

Line graph for 2019

From the image above, the highest billed amount in 2019 was in September. We can also plot a bar graph to show electricity consumption for 2019.

Bar graph for 2019

To plot the bar graph, use this code:

ec_df_2019=ec_df.loc['2019']
ec_data_2019=ec_df_2019['Billed_amount']
ec_data_2019.plot.bar()

The output:

Bar graph for 2019

The bar graph also shows that the highest billed amount in 2019 was in September.
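
The peak month can also be read off programmatically instead of from the graph. The sketch below uses made-up monthly values (not the tutorial's data) with a date index, selects one year with .loc['2019'] as in the code above, and uses idxmax to find the date of the largest value.

```python
import pandas as pd

# Made-up monthly bills for 2018-2019; not the tutorial's real values
dates = pd.date_range('2018-01-01', periods=24, freq='MS')
amounts = [150 if d.month == 9 else 100 + d.month for d in dates]
bills = pd.Series(amounts, index=dates, name='Billed_amount')

# Partial string indexing keeps only the rows whose date falls in 2019
bills_2019 = bills.loc['2019']

# idxmax returns the index label (a Timestamp) of the largest value
peak_date = bills_2019.idxmax()
print(peak_date.strftime('%B'))  # September
```

The same two calls should work on ec_df['Billed_amount'], because Bill_Date is its index.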

Creating a copy of the dataset

We will use this copy of the dataset to train the model.

final_df = df.copy()
final_df=final_df[['Bill_Date','On_peak','Off_peak','Billed_amount','Billing_days']]

Splitting the dataset

We will split the dataset into two sets. One set for model training and the other for model testing.

train = final_df[:50]
test = final_df[50:]

The first 50 entries/data points will train the model. The remaining entries will test the model.

Let's print the shape of the train and test datasets.

print(train.shape, test.shape)

The output:

(50, 5) (3, 5)
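
The split is positional rather than random because the order of observations matters in a time series: the model trains on the past and is tested on the most recent months. A minimal sketch of the same idea with a plain list follows; the helper name chronological_split is hypothetical.

```python
def chronological_split(rows, n_train):
    """Split an ordered sequence into (train, test) without shuffling."""
    return rows[:n_train], rows[n_train:]

# 53 placeholder observations standing in for the monthly rows
observations = list(range(53))
train_rows, test_rows = chronological_split(observations, 50)

print(len(train_rows), len(test_rows))   # 50 3
# Every test observation comes after every training observation
print(max(train_rows) < min(test_rows))  # True
```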

Selecting the timestamp and the target columns

The Auto Time Series model expects an input dataset with timestamp and target columns. The timestamp column contains the DateTime of the time series. The target column has the time series values (data points). The model will learn from these columns.

ts_column = 'Bill_Date'
sep = ','
target = 'Billed_amount'

The Bill_Date is the timestamp column, and the Billed_amount is the target column. Also, our dataset is comma-separated.

Initializing the Auto Time Series model

We initialize the Auto Time Series model using the following code:

ml_dict = AT.Auto_Timeseries(train, ts_column,
                            target, sep,  score_type='rmse', forecast_period=6,
                            time_interval='Months', non_seasonal_pdq=None, seasonality=True,
                            seasonal_period=12,seasonal_PDQ=None, model_type='best',
                            verbose=2)

The Auto Time Series model has the following parameters:

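  • train: It contains the training set. These are the first 50 entries/data points that train the model.

  • ts_column: It is the name of the column that contains the DateTime of the time series.

  • target: It is the name of the column that the model will predict. In our case, it is Billed_amount.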

  • sep: It specifies the dataset format. Our dataset is comma-separated values (CSV).

  • score_type: It is the scoring metrics for the model. We use the Root Mean Square Error (RMSE). RMSE calculates the error of a model when making predictions. It indicates the absolute fit of the model to the data – how close the observed data points are to the predicted values.

  • forecast_period: It shows the number of months the model will predict. The model will make predictions for the next six months.

  • time_interval: It shows the time interval of the time series. It can be in minutes, hourly, daily, monthly, or yearly. Our dataset has monthly intervals.

  • non_seasonal_pdq: It contains the (p, d, q) parameters of the Non-Seasonal ARIMA model. We set it to None instead of choosing the values ourselves.

  • seasonality: It handles the periodic changes in the time series that occur within a given time. Seasonality shows a regular pattern within the dataset.

Seasonality can be daily, weekly, or yearly. Our dataset is recorded monthly and shows a yearly pattern: the highest billed amount occurs in September, and this repeats in the same month every year. It is because of the seasonality effect.

  • seasonal_period=12: It is the number of observations in one seasonal cycle. With monthly data, 12 observations make up one yearly cycle.

  • seasonal_PDQ=None: It contains the (P, D, Q) parameters of the Seasonal SARIMAX model. We set it to None instead of choosing the values ourselves.

  • model_type='best': It shows the types of models that Auto Time Series will use for training. We set the value to best so that Auto Time Series will run multiple time series models and select the best one.
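
To make the score_type='rmse' setting concrete, here is a minimal sketch of how RMSE is computed. The function below is written for illustration and is not Auto-TS's own implementation; the numbers are made up.

```python
import math

def rmse(actual, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up values, only to show the arithmetic
actual = [100.0, 120.0, 90.0]
predicted = [110.0, 115.0, 95.0]
print(round(rmse(actual, predicted), 2))  # 7.07
```

Because the errors are squared before averaging, large misses weigh more than small ones, and the result is in the same units as the target column.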

When you execute the code above, Auto Time Series will run multiple time series models and produce the following outputs:

Running Facebook Prophet Model

Running PyFlux Model

Running Non-Seasonal ARIMA Model

Running Seasonal SARIMAX Model

Running VAR Model

Running Machine Learning Models

Showing time series components

It shows the overall trend of the time series data and the seasonality in the dataset.

Original time series

Histogram of original time series

After Auto Time Series has run all the models, it selects the best one.

Selecting the best model

Auto Time Series selects the model with the lowest RMSE score. This is the model with the lowest error when making predictions.
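
The selection rule itself is simple to sketch: given one RMSE score per model, pick the smallest. The model names and scores below are hypothetical placeholders, not the values produced by this run, and this is not Auto-TS's internal code.

```python
# Hypothetical RMSE scores per model; not the values from this run
rmse_by_model = {
    'model_a': 40.0,
    'model_b': 55.0,
    'model_c': 48.5,
}

# The best model is the key with the smallest score
best_model = min(rmse_by_model, key=rmse_by_model.get)
print(best_model, rmse_by_model[best_model])  # model_a 40.0
```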

The best model is:

Best model

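From the image above, the best model is Facebook Prophet. The output also shows an array of actual and forecast values. The model has an RMSE score of 39.91, the lowest among the models that ran. RMSE is expressed in the same units as Billed_amount, so how accurate this is depends on the size of a typical bill.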

Finally, Auto-TS will plot a line graph to show the actual and the forecast values.

Actual vs Forecast values

The line graph output:

Actuals vs Forecast Values

From the image above, the red line shows the actual values. The green line shows the forecast values. The model has made predictions for the next six months.
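
To see which months a six-period forecast covers, the sketch below steps forward month by month from the last training date. It assumes the bills are dated on the first of each month, so that the 50th training row is 2020-02-01; that assumption comes from the dataset's stated range, not from inspecting the file. The helper add_months is hypothetical.

```python
from datetime import date

def add_months(d, n):
    """Return the first day of the month n months after d."""
    total = d.year * 12 + (d.month - 1) + n
    return date(total // 12, total % 12 + 1, 1)

last_train_date = date(2020, 2, 1)  # assumed date of the 50th row
forecast_period = 6

forecast_dates = [add_months(last_train_date, i)
                  for i in range(1, forecast_period + 1)]
print([d.strftime('%Y-%m') for d in forecast_dates])
# ['2020-03', '2020-04', '2020-05', '2020-06', '2020-07', '2020-08']
```

Under that assumption, the first three forecast months overlap the three held-out test rows, which is where forecast and actual values can be compared.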

Conclusion

We have learned how to perform time series analysis and forecasting using the Auto Time Series library. The tutorial covered the models that Auto Time Series runs, the benefits of the library, and how to install it. We then used Auto Time Series to build an electricity consumption model. It selected Facebook Prophet as the best model because it had the lowest RMSE score, and it made predictions for six months.

To get the Python code in Google Colab, use this link.

Peer Review Contributions by: Wilkister Mumbi

Published on: Mar 21, 2022
Updated on: Jul 15, 2024