Motivation
Time series models are useful when working with serially correlated data. Most businesses rely on time series data to analyze next year's sales numbers, website traffic, competitive position, and much more.
Some other important application spheres include:
- Economics: monthly data for unemployment, hospital admissions, etc.
- Finance: daily exchange rates, share prices, etc.
- Environment: daily rainfall, air quality readings, etc.
- Medicine: e.g., ECG brain wave activity recorded every 2-8 seconds.
We will consider the AirPassengers dataset in detail while discussing time series. You can find the dataset here:
https://www.kaggle.com/rakannimer/air-passengers/data
It is a simple dataset with just two columns, and we will be working in Python throughout. As evident from the figure below, the data gives the number of passengers travelling each month.
The head of the data looks like this:
The trends look like:
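Both the preview and the plot can be produced with a few lines of pandas. Below is a minimal sketch, assuming the Kaggle CSV has been saved locally as AirPassengers.csv; its two columns are typically named Month and #Passengers, though mirrors may differ:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Parse the Month column as dates and use it as the index;
# the result is a Series of monthly passenger counts.
data = pd.read_csv("AirPassengers.csv", parse_dates=["Month"], index_col="Month")
ts = data["#Passengers"]
print(ts.head())

ts.plot(title="Monthly airline passengers")
plt.show()
```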
> We see a definite rise in the overall passenger traffic, as well as a clear seasonal pattern
A Few Important Terms
Trend:
Long-term movement of the mean.
Seasonality:
Cyclic fluctuations tied to the calendar. For example, monthly data show seasonality when high values always tend to occur in some particular months and low values always tend to occur in other particular months. In this case, S = 12 (months per year) is the span of the periodic seasonal behavior; for quarterly data, S = 4 time periods per year.
Stochastic Process:
A stochastic or random process is a mathematical object usually defined as a collection of random variables.
Stationary Process:
- Strictly Stationary: The sequence {Xₜ} is strictly stationary if, for every choice of times t₁, …, tₖ and every shift h, the joint distribution of (X(t₁), …, X(tₖ)) is the same as that of (X(t₁ + h), …, X(tₖ + h)).
- Weakly Stationary: A process is weakly stationary or second-order stationary if:

E[Xₜ] = μ for all t, and cov(Xₜ, Xₜ₊ₖ) = γ(k) depends only on the lag k

where E is the expectation function.
Basically, the mean and variance should not be functions of time (the constant-variance property is also called homoscedasticity). In the first figure, the mean of the series is a function of time, while in the second series the variance changes over time, visible as the changing width between peaks and troughs.
White Noise:
A sequence of independent random variables with zero mean and variance σ² is called white noise. It is a weakly stationary series with γ(0) = σ² and γ(k) = 0 for k ≠ 0.
Covariance:
A statistical measure of the variation or association of one variable X with another, Y:

cov(X, Y) = E[(X − E[X])(Y − E[Y])]

where E[X] is the expectation of X, also known as the mean of the random variable X. cov(X, Y) is also denoted by σ(X, Y). The above equation can be simplified using the linearity property of the expectation function:

cov(X, Y) = E[XY] − E[X]E[Y]

Also, cov(X, X) = Var(X).
Autocovariance:
Autocovariance is the covariance of a stochastic process with itself at different times:

γ(k) = cov(Xₜ, Xₜ₊ₖ)

Autocorrelation:
The autocorrelation function is defined as the ratio of γ(k) and γ(0):

ρ(k) = γ(k) / γ(0)

Autocorrelation and Autocovariance Function:
The autocovariance function of a weakly stationary process captures how the autocovariance varies with the lag k; it can be expressed as f(k) = γ(k). For example, consider the stochastic process

Xₜ = uₜ + θuₜ₋₁

where uₜ is white noise (WN(0, σ²)). The autocovariance function for the above process is given by:

γ(0) = (1 + θ²)σ², γ(1) = θσ², γ(k) = 0 for k > 1

- Note the autocovariance cuts off after lag 1
- The autocorrelation function behaves similarly, with ρ(k) used instead of γ(k)
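As a quick illustration, here is a minimal sketch of the sample versions of these functions in plain numpy (the function names are ours, not a library API):

```python
import numpy as np

def autocovariance(x, k):
    """Sample autocovariance gamma(k): average of (x_t - mean)(x_{t+k} - mean)."""
    x = np.asarray(x, dtype=float)
    n, mean = len(x), x.mean()
    return np.sum((x[: n - k] - mean) * (x[k:] - mean)) / n

def autocorrelation(x, k):
    """Sample autocorrelation rho(k) = gamma(k) / gamma(0)."""
    return autocovariance(x, k) / autocovariance(x, 0)

# Sanity check on white noise: rho(0) = 1, rho(k) near zero for k != 0.
rng = np.random.default_rng(0)
noise = rng.standard_normal(1000)
print(autocorrelation(noise, 0), autocorrelation(noise, 1))
```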
Autocovariance Matrix:
For a weakly stationary process, the autocovariances up to lag n − 1 can be arranged into a symmetric matrix with γ(0) along the diagonal, γ(1) along the first off-diagonals, and in general γ(|i − j|) in position (i, j).
Autocorrelation Matrix:
Dividing every entry by γ(0) gives the corresponding autocorrelation matrix, with ρ(|i − j|) in position (i, j) and ones on the diagonal.
Partial Autocorrelation Function (PACF):
- Partial Autocorrelation is the partial correlation of a time series with its own lagged values, controlling for the values of the time series at all shorter lags.
Partial correlation is a measure of association between two random variables with the effect of a set of controlling variables removed. For example, suppose we have data on the consumption, income, and wealth of various individuals and we wish to see whether there is a relationship between consumption and income. If we fail to control for wealth, a correlation coefficient computed between consumption and income would be misleading: income might be numerically related to wealth, which in turn might be numerically related to consumption, so the measured correlation between consumption and income could be contaminated by these other correlations. Using a partial correlation avoids this problem.
The partial autocorrelation function (PACF) plays an important role in identifying the lag order of an autoregressive process. Its use was introduced as part of the Box-Jenkins approach to time series modelling.
Going into the mathematical details of the PACF is beyond the scope of this text; for our purposes, the PACF plot is what matters for identifying the order of AR(p) and ARMA(p,q) processes.
General Linear Process:
Ѱ weights
It is a representation of a stochastic process as the output of a linear filter whose input is white noise aₜ:

zₜ = aₜ + ψ₁aₜ₋₁ + ψ₂aₜ₋₂ + … = Ѱ(B)aₜ

where:

Ѱ(B) = 1 + ψ₁B + ψ₂B² + … and B is the backshift operator (Bzₜ = zₜ₋₁)

It allows us to represent the process as a weighted sum of present and past values of the white noise process aₜ. The following properties of the white noise should be noted: E[aₜ] = 0, Var(aₜ) = σ², and E[aₜaₛ] = 0 for t ≠ s.
Π weights
An alternative way of modelling a linear process is to represent the current deviation zₜ as a weighted sum of the previous deviations zₜ₋₁, zₜ₋₂, zₜ₋₃, … plus the current shock:

zₜ = π₁zₜ₋₁ + π₂zₜ₋₂ + … + aₜ, i.e. Π(B)zₜ = aₜ with Π(B) = 1 − π₁B − π₂B² − …

The relation between the Ѱ and Π weights is given by:

Ѱ(B) = Π(B)⁻¹

where B is the backshift operator.
AR Models: The AR(p) Process
An autoregressive process of order p is represented as:

zₜ = 𝝓₁zₜ₋₁ + 𝝓₂zₜ₋₂ + … + 𝝓ₚzₜ₋ₚ + aₜ, i.e. 𝝓(B)zₜ = aₜ with 𝝓(B) = 1 − 𝝓₁B − 𝝓₂B² − … − 𝝓ₚBᵖ

where B is the backshift operator and 𝝓₁, 𝝓₂, …, 𝝓ₚ are adjustable parameters.
Mathematically, we compare an AR(p) process with the general linear process to check stationarity, by imposing the condition that the implied Ѱ weights form a convergent series (equivalently, the roots of 𝝓(B) = 0 lie outside the unit circle). For an AR(1) process the maths works like:

(1 − 𝝓₁B)zₜ = aₜ

which can be written as:

zₜ = (1 − 𝝓₁B)⁻¹aₜ = aₜ + 𝝓₁aₜ₋₁ + 𝝓₁²aₜ₋₂ + …

so the weights converge, and the process is stationary, only when |𝝓₁| < 1.
- The autocorrelation function of an AR(p) process dies down gradually
- The PACF of an AR(p) process cuts off after lag p
Autocorrelation Plot:
Partial Autocorrelation Plot:
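Plots like the two above can be reproduced by simulating an AR process. Here is a minimal sketch for an AR(1) with 𝝓₁ = 0.8, a value we chose purely for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Simulate z_t = 0.8 * z_{t-1} + a_t with standard normal white noise.
rng = np.random.default_rng(42)
z = np.zeros(500)
for t in range(1, 500):
    z[t] = 0.8 * z[t - 1] + rng.standard_normal()

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(z, lags=20, ax=axes[0])    # dies down gradually
plot_pacf(z, lags=20, ax=axes[1])   # cuts off after lag 1
plt.show()
```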
The partial autocorrelation function is represented by 𝝓ₖₖ, k = 1, 2, 3, …. For an AR(1) process, solving for k = 1, 2, 3, … we obtain:

𝝓₁₁ = ρ(1) = 𝝓₁ and 𝝓ₖₖ = 0 for k > 1

- For an AR(p) process, 𝝓ₖₖ = 0 for k > p and non-zero for k ≤ p
- Basically, the PACF cuts off after lag p

The above figure is a plot of an estimated PACF with two-standard-error limits, assuming the model is AR(1). Since the estimate of 𝝓₂₂ is also significant, there is a possibility that the process is AR(2); we can make further investigations to clear our doubts.
MA(q) Models: Moving Average Models
An MA(q) process looks like:

zₜ = aₜ + θ₁aₜ₋₁ + θ₂aₜ₋₂ + … + θq aₜ₋q, i.e. zₜ = θ(B)aₜ

Since θ(B) is a finite polynomial, comparing an MA process with the general linear process shows that the summation in Ѱ(B) = θ(B) always converges; hence an MA process is always stationary. We do, however, have to do some work to establish the invertibility of an MA process (the roots of θ(B) = 0 must lie outside the unit circle).
- The autocorrelation function of an MA(q) process cuts off after lag q
- The PACF of an MA(q) process dies down gradually
Autocorrelation Function:
Consider an MA(1) process with θ₁ = 0.7. Its theoretical autocorrelations are:

ρ(1) = θ₁ / (1 + θ₁²) ≈ 0.47 and ρ(k) = 0 for k > 1

The theoretical and sample ACF plots may look somewhat dissimilar, but the overall pattern is the same: the ACF cuts off, mirroring the behaviour of the PACF for an AR process.
For an MA(2) process, ρ(1) and ρ(2) are non-zero and ρ(k) = 0 for k > 2, so the ACF cuts off after lag 2.
Partial Autocorrelation Function:
The PACF of an MA(q) process gradually tapers towards zero rather than cutting off.
The PACF plot looks like this:
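Under the same assumptions as the earlier sketch, the MA(1) plots above (θ₁ = 0.7) can be reproduced like this:

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Simulate z_t = a_t + 0.7 * a_{t-1} from standard normal white noise.
rng = np.random.default_rng(42)
a = rng.standard_normal(501)
z = a[1:] + 0.7 * a[:-1]

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(z, lags=20, ax=axes[0])    # cuts off after lag 1
plot_pacf(z, lags=20, ax=axes[1])   # tapers off gradually
plt.show()
```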
ARMA(p,q) Process: AutoRegressive Moving Average
It is simply a combination of the two processes studied above.
General Equation:

zₜ = 𝝓₁zₜ₋₁ + … + 𝝓ₚzₜ₋ₚ + aₜ + θ₁aₜ₋₁ + … + θq aₜ₋q, i.e. 𝝓(B)zₜ = θ(B)aₜ
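statsmodels can also generate ARMA samples directly. A minimal sketch with illustrative coefficients of our own choosing (𝝓₁ = 0.6, θ₁ = 0.4):

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

# Lag-polynomial convention: leading 1, and AR coefficients negated,
# so ar = [1, -0.6] encodes phi_1 = 0.6 and ma = [1, 0.4] encodes theta_1 = 0.4.
ar = np.array([1.0, -0.6])
ma = np.array([1.0, 0.4])
sample = ArmaProcess(ar, ma).generate_sample(nsample=500)
```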
ARIMA(p,d,q) Models: Autoregressive Integrated Moving Average Models
This is a generalization of the ARMA models. ARIMA models are applied in cases where the data show evidence of non-stationarity; an initial differencing step (corresponding to the "integrated" part of the model) can be applied one or more times to eliminate the non-stationarity.
Non- seasonal ARIMA Models:
If we combine differencing with autoregression and a moving average model, we obtain a non-seasonal ARIMA model. ARIMA is an acronym for AutoRegressive Integrated Moving Average ("integration" in this context is the reverse of differencing). The full model can be written as:

y′ₜ = c + 𝝓₁y′ₜ₋₁ + … + 𝝓ₚy′ₜ₋ₚ + θ₁eₜ₋₁ + … + θq eₜ₋q + eₜ

where y′ₜ is the differenced series (it may have been differenced more than once). The "predictors" on the right-hand side include both lagged values of y′ₜ and lagged errors. We call this an ARIMA(p, d, q) model, where:

- p = order of the autoregressive part
- d = degree of first differencing involved
- q = order of the moving average part
Implementation and Model Identification:
Remember: the shapes of the ACF and PACF plots play a key role in choosing a model
Implementation on our Case Study:
1) Visualizing and deriving Inferences
The above is a visualization of the AirPassengers dataset, plotting monthly passenger counts against time. The plot clearly demonstrates seasonality and non-stationarity, since the variance (the amplitude of the seasonal swings) keeps increasing with time.
The data seem to have both a trend and seasonality. We will try to break the series down into four components (observed, trend, seasonal, and residual). To make our lives easier, Python's statsmodels library can perform this decomposition, under either an additive model or a multiplicative model.
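As a quick preview, here is a minimal sketch of that decomposition, assuming ts is the series loaded earlier; the choice between "additive" and "multiplicative" is discussed next:

```python
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

# Decompose into observed, trend, seasonal and residual components;
# period=12 because the data are monthly with a yearly cycle.
# model can be "additive" or "multiplicative".
decomposition = seasonal_decompose(ts, model="multiplicative", period=12)
decomposition.plot()
plt.show()
```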
Additive Model:
It works on the principle that the model looks like:

y(t) = Level + Trend + Seasonality + Noise
- An additive model is linear where changes over time are consistently made by the same amount
- A linear trend is a straight line
- A linear seasonality has the same frequency (width of cycles) and amplitude (height of cycles)
Multiplicative Model:
It works on the principle that the model looks like:

y(t) = Level × Trend × Seasonality × Noise
- A multiplicative model is nonlinear, such as quadratic or exponential. Changes increase or decrease over time.
- A nonlinear trend is a curved line.
- A non-linear seasonality has an increasing or decreasing frequency and/or amplitude over time
From the above plot we can see that our series clearly calls for a multiplicative model: the changes are non-linear (roughly exponential growth), with the seasonal amplitude growing along with the level of the series.
Real-world problems are messy and noisy. There may be additive and multiplicative components. There may be an increasing trend followed by a decreasing trend. There may be non-repeating cycles mixed in with the repeating seasonality components.
Nevertheless, these abstract models provide a simple framework that you can use to analyze your data and explore ways to think about and forecast your problem.
1. A clear inference is that the seasonality has a span of at least 12 months.
2. The year-on-year trend shows that the number of passengers increases without fail.
3. July and August have significantly higher seasonal passenger traffic than the other months.
2) Stationarizing the series
When modeling, we assume that the time series we are dealing with is stationary. In reality, this assumption is easily violated by the presence of a trend, seasonality, and other time-dependent structure.
Once we know the patterns, trends, cycles, and seasonality, we can check whether the series is stationary. The Dickey-Fuller test is a popular test for this. You can read up on the maths behind the Dickey-Fuller test; for now we will cover its implementation and the inferences drawn from it.
Augmented Dickey Fuller Test:
Statistical tests make strong assumptions about your data. They can only be used to inform the degree to which a null hypothesis can be accepted or rejected, and the result must be interpreted in the context of a given problem to be meaningful. Nevertheless, they provide a quick check and confirmatory evidence that your time series is stationary or non-stationary.
The Augmented Dickey-Fuller test is a type of statistical test called a unit root test. The intuition behind a unit root test is that it determines how strongly a time series is defined by a trend. There are a number of unit root tests, and the Augmented Dickey-Fuller is one of the more widely used. It uses an autoregressive model and optimizes an information criterion across multiple different lag values.
The null hypothesis of the test is that the time series can be represented by a unit root, i.e. that it is not stationary (it has some time-dependent structure). The alternate hypothesis (rejecting the null hypothesis) is that the time series is stationary.
● Null Hypothesis (H0): If accepted, it suggests that the time series has a unit root, meaning it is non-stationary (it has some time-dependent structure).
● Alternate Hypothesis (H1): The null hypothesis is rejected; it suggests the time series does not have a unit root, meaning it is stationary (it has no time-dependent structure).
We interpret this result using the p-value from the test. A p-value below a threshold (such as 5% or 1%) suggests we reject the null hypothesis (stationary); a p-value above the threshold suggests we accept the null hypothesis (non-stationary).
● p-value > 0.05: Accept the null hypothesis (H0); the data has a unit root and is non-stationary.
● p-value <= 0.05: Reject the null hypothesis (H0); the data does not have a unit root and is stationary.
Below is an example of running the Augmented Dickey-Fuller test on our dataset.
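A minimal sketch using statsmodels' adfuller, assuming ts is the series loaded earlier:

```python
from statsmodels.tsa.stattools import adfuller

# adfuller returns (test statistic, p-value, lags used, n obs,
# critical values, best information criterion).
result = adfuller(ts, autolag="AIC")
print(f"ADF statistic: {result[0]:.4f}")
print(f"p-value:       {result[1]:.4f}")
for level, value in result[4].items():
    print(f"Critical value ({level}): {value:.4f}")
```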
Since the p-value > 0.05, we accept the null hypothesis; hence the series is non-stationary.
It doesn't end here! Now that we know the series is non-stationary, we have to use some techniques to make it stationary.
There are three commonly used techniques to make a time series stationary:
1. Detrending: Here, we simply remove the trend component from the time series. For instance, suppose the equation of the time series is:

x(t) = (mean + trend × t) + error

We simply remove the part in the parentheses and build a model for the rest.
2. Differencing: This is the most commonly used technique to remove non-stationarity. Here we model the differences between consecutive terms rather than the terms themselves. For instance:

x(t) − x(t − 1) = ARMA(p, q)

This differencing is the "integrated" part of AR(I)MA, and it gives us the third parameter, d, alongside p and q.
3. Seasonality: Seasonality can easily be incorporated into the ARIMA model directly. More on this is discussed in the applications part below.
Before implementing any of the above processes on our data, we should address the issue of unequal variance. We deal with this by applying a log transform to the series; other mathematical operations such as square root or cube root can serve the same purpose. We then resort to differencing, as shown below, to deal with the trend and seasonality.
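A minimal sketch of both steps, assuming ts from earlier:

```python
import numpy as np
import matplotlib.pyplot as plt

# Log transform stabilises the growing variance; first differencing
# then removes most of the trend in the mean.
ts_log = np.log(ts)
ts_log_diff = ts_log.diff().dropna()

ts_log_diff.plot(title="Log-transformed, first-differenced series")
plt.show()
```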
Finding Optimal Parameters:
The parameters p, d, and q can be found using the ACF and PACF plots. In addition, if both the ACF and PACF decrease only gradually, it indicates that the series still needs to be made stationary, so we increase the value of d.
The ACF and PACF plots clearly don't follow the standard pure-AR or pure-MA patterns, so the best-fitting model here will most probably be an ARIMA model. The next question is to determine the orders of the AR and MA parts involved.
1. p: the lag value where the PACF chart crosses the upper confidence interval for the first time. If you look closely, in this case p = 2.
2. q: the lag value where the ACF chart crosses the upper confidence interval for the first time. If you look closely, in this case q = 2.
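The plots themselves can be generated with statsmodels. A minimal sketch, assuming ts_log_diff is the stationarised series built above:

```python
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# The shaded band in each plot is the confidence interval; the first
# lag at which the plot crosses it gives q (ACF) and p (PACF).
fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(ts_log_diff, lags=20, ax=axes[0])
plot_pacf(ts_log_diff, lags=20, ax=axes[1])
plt.show()
```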
Building the Model
We will compare the RSS (residual sum of squares) of the three possibilities, i.e.:
- ARIMA(2,1,0)
- ARIMA(0,1,2)
- ARIMA(2,1,2)
ARIMA(2,1,0)
ARIMA(0, 1, 2)
ARIMA(2, 1, 2)
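The three fits can be reproduced with statsmodels' modern ARIMA interface. A minimal sketch, assuming ts_log from earlier; the exact RSS values printed will depend on the statsmodels version:

```python
from statsmodels.tsa.arima.model import ARIMA

# Fit each candidate on the log series; d=1 tells ARIMA to difference
# internally, so we pass ts_log rather than ts_log_diff.
for order in [(2, 1, 0), (0, 1, 2), (2, 1, 2)]:
    results = ARIMA(ts_log, order=order).fit()
    # Skip the first in-sample residual, an artefact of differencing.
    rss = (results.resid.iloc[1:] ** 2).sum()
    print(f"ARIMA{order}: RSS = {rss:.4f}")
```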
Evidently, among the three models ARIMA(2,1,2) gives the minimum RSS and is hence the most suitable.
Making Predictions:
We need to make sure that we undo all the operations performed on the series to remove trend and seasonality. The RSS on our transformed series and on the original will differ, but we can't help that!
Finally, we can check the RSS on our original data. Go ahead and see how well your model fits!
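A closing sketch, assuming results holds the fitted ARIMA(2,1,2) model from above. With this interface the differencing is handled internally, so the fitted values are already on the log scale and a single exponentiation undoes our transform:

```python
import numpy as np
import matplotlib.pyplot as plt

# Back-transform the in-sample fit and compare with the original data.
predictions = np.exp(results.fittedvalues)

# Skip the first point, which is an artefact of internal differencing.
rss = ((predictions.iloc[1:] - ts.iloc[1:]) ** 2).sum()

plt.plot(ts, label="observed")
plt.plot(predictions, label="ARIMA(2,1,2) in-sample fit")
plt.legend()
plt.title(f"RSS on the original scale: {rss:.0f}")
plt.show()
```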