How to Check Time Series Stationarity in Python

You can use visual inspection, global vs. local analysis, and statistical tests to analyze stationarity. The Augmented Dickey-Fuller (ADF) test is the most commonly used parametric test, and the Zivot-Andrews test is better than the ADF at detecting stationarity in series with structural breaks.

This post assumes you understand what stationarity is and why it’s important.

Note: I use matplotlib plots for analysis and Plotly charts for presentation in this post. The complete code for this post is on the Analyzing Alpha GitHub Repository.

Visual Inspection

It’s often easy to see whether a process is creating a stationary time series. If we see the mean or the distribution changing over time, it’s non-stationary.

Let’s first create a random walk model using Python, and then we’ll plot it in Plotly.

# Grab imports
import numpy as np
import pandas as pd

# Create an hourly datetime index
periods = 2000
vol = .002
dti = pd.date_range("2021-01-01",
                    periods=periods,
                    freq="H")

# Generate normally distributed returns (constant mean and variance)
random_walk = pd.DataFrame(index=dti,
                           data=np.random.normal(size=periods) * vol)

# Plot it using a custom function
make_plot(random_walk,
          'Stationary "Random Walk" Time Series').show()

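The make_plot function is a custom helper defined in the Analyzing Alpha GitHub Repository. As a minimal sketch, a Plotly version might look like the following (the repository version adds more styling):

import plotly.graph_objects as go

def make_plot(df, title):
    # Minimal stand-in for the repository's plotting helper
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=df.index, y=df[0], mode="lines"))
    fig.update_layout(title=title)
    return fig
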
We can identify a stochastic process by analyzing a time series plot and gain an understanding of whether there’s a shift in the mean or distribution through time. This is called global vs. local analysis, which we’ll discuss further below.

However, in practice, it’s often easier to use more advanced plot functions to identify a stationary process.

Decomposition

Time series decomposition automatically separates a signal into its trend, seasonal, and error (residual) components. Let’s decompose Apple’s revenue. We can see the clear seasonality and trend in this non-stationary data.

from statsmodels.tsa.seasonal import seasonal_decompose

# apple_revenue_history is a pandas Series with a DatetimeIndex,
# loaded in the repository notebook
sd = seasonal_decompose(apple_revenue_history)
sd.plot()
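
If you want a self-contained example to experiment with, here’s a hedged sketch that decomposes a synthetic quarterly series with a built-in trend and seasonal pattern (the data and names here are illustrative, not from the original analysis):

import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Synthetic quarterly series: linear trend + repeating seasonal pattern + noise
idx = pd.date_range("2010-03-31", periods=40, freq="Q")
trend = np.linspace(10, 100, 40)
season = np.tile([0., 5., -3., 8.], 10)
noise = np.random.normal(scale=2, size=40)
demo = pd.Series(trend + season + noise, index=idx)

# The quarterly frequency lets seasonal_decompose infer the period
seasonal_decompose(demo, model="additive").plot()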

Autocorrelation Function

The autocorrelation (ACF) and partial autocorrelation (PACF) functions measure the statistical significance of the correlation between a series and its lagged values.

A non-stationary time series will show significant correlation between itself and its lagged values, and that significance will decay to zero slowly, as in the first plot.

The second plot, of a stationary time series, will quickly drop to zero. The red lines are confidence bands; values above or below them are statistically significant at roughly two standard deviations.

from statsmodels.graphics.tsaplots import plot_acf

# df2 is a non-stationary series loaded in the repository notebook;
# first-differencing it produces a stationary version
plot_acf(df2, lags=40, alpha=0.05)
plot_acf(df2.diff().dropna(), lags=40, alpha=0.05)

Sometimes it’s evident after visual inspection that we have stationary or non-stationary data. But even in these circumstances, we want to test our hypothesis using statistical tests to understand our series better.

Global vs. Local Analysis

We can break our time series into multiple segments and compare the summary statistics of each segment against the full series or another partition to see if the data is changing through time.

def get_quads(df):
    # Split the series into four equal segments and compare summary stats
    quadlen = int(len(df) * 0.25)
    ss = df[:quadlen].describe()
    ss[1] = df[quadlen:quadlen*2].describe()[0]
    ss[2] = df[quadlen*2:quadlen*3].describe()[0]
    ss[3] = df[quadlen*3:].describe()[0]
    return ss

Random Walk

Notice how the random walk time series has a constant mean and variance, and how the distribution between the min, percentiles, and max is relatively consistent (no seasonality)?

random_walk = pd.DataFrame(index=dti,
                           data=np.random.normal(size=periods) * vol)
get_quads(random_walk)
                0           1           2           3
count  500.000000  500.000000  500.000000  500.000000
mean     0.000136   -0.000054    0.000070   -0.000216
std      0.002067    0.001909    0.002008    0.002020
min     -0.005851   -0.005614   -0.006003   -0.006325
25%     -0.001263   -0.001344   -0.001262   -0.001492
50%      0.000095   -0.000094    0.000045   -0.000229
75%      0.001503    0.001225    0.001331    0.001073
max      0.007002    0.006702    0.006338    0.005969

Trend Stationary

You can see the mean is increasing in the trending time series.

trending = pd.DataFrame(index=dti,
                        data=np.random.random(size=periods) * vol).cumsum()
get_quads(trending)
                0           1           2           3
count  500.000000  500.000000  500.000000  500.000000
mean     0.253949    0.754164    1.245338    1.752112
std      0.148433    0.138083    0.153263    0.144376
min      0.001630    0.509963    0.989912    1.509835
25%      0.122746    0.637347    1.111457    1.629752
50%      0.261204    0.749095    1.243160    1.748031
75%      0.376252    0.878295    1.376870    1.879781
max      0.508350    0.989398    1.509197    2.000878

Volatile Time Series

The standard deviation is changing in the volatile time series.

varying = pd.DataFrame(index=dti,
                       data=np.random.normal(size=periods) * vol
                            * np.logspace(1, 5, num=periods, dtype=int))
get_quads(varying)
                0           1           2           3
count  500.000000  500.000000  500.000000  500.000000
mean     0.000510    0.082075   -0.206946   -1.990338
std      0.094116    0.935861   10.263379   99.674838
min     -0.664262   -3.887168  -46.277549 -472.434713
25%     -0.039151   -0.280147   -4.025722  -45.672561
50%     -0.001081    0.067228   -0.068825   -4.059768
75%      0.037203    0.476481    3.295115   35.828933
max      0.424776    5.000719   61.889090  455.791059

Seasonal Time Series

You can identify seasonality by analyzing the distribution through the min, max, and the percentiles in between. Notice how the 50% row (the median) changes across segments.

def simulate_seasonal_term(periodicity, total_cycles, noise_std=1.,
                           harmonics=None):
    duration = periodicity * total_cycles
    assert duration == int(duration)
    duration = int(duration)
    harmonics = harmonics if harmonics else int(np.floor(periodicity / 2))

    lambda_p = 2 * np.pi / float(periodicity)

    gamma_jt = noise_std * np.random.randn(harmonics)
    gamma_star_jt = noise_std * np.random.randn(harmonics)

    total_timesteps = 100 * duration # Pad for burn in
    series = np.zeros(total_timesteps)
    for t in range(total_timesteps):
        gamma_jtp1 = np.zeros_like(gamma_jt)
        gamma_star_jtp1 = np.zeros_like(gamma_star_jt)
        for j in range(1, harmonics + 1):
            cos_j = np.cos(lambda_p * j)
            sin_j = np.sin(lambda_p * j)
            gamma_jtp1[j - 1] = (gamma_jt[j - 1] * cos_j
                                 + gamma_star_jt[j - 1] * sin_j
                                 + noise_std * np.random.randn())
            gamma_star_jtp1[j - 1] = (- gamma_jt[j - 1] * sin_j
                                      + gamma_star_jt[j - 1] * cos_j
                                      + noise_std * np.random.randn())
        series[t] = np.sum(gamma_jtp1)
        gamma_jt = gamma_jtp1
        gamma_star_jt = gamma_star_jtp1
    wanted_series = series[-duration:] # Discard burn in

    return wanted_series

duration = 100 * 3
periodicities = [10, 100]
num_harmonics = [3, 2]
std = np.array([2, 3])
np.random.seed(8678309)

terms = []
for ix, _ in enumerate(periodicities):
    s = simulate_seasonal_term(
        periodicities[ix],
        duration / periodicities[ix],
        harmonics=num_harmonics[ix],
        noise_std=std[ix])
    terms.append(s)
terms.append(np.ones_like(terms[0]) * 10.)
seasonal = pd.DataFrame(index=dti[:duration], data=np.sum(terms, axis=0))
get_quads(seasonal)
                0            1            2            3
count    75.000000    75.000000    75.000000    75.000000
mean    328.055459  -280.450561  -149.840302   120.579444
std     733.650363   750.299351   958.572453  1007.596395
min    -863.464617 -1694.424423 -1815.786373 -1733.291362
25%    -239.600008  -770.199418  -828.906000  -694.550888
50%      72.425299  -346.127307  -405.201792    59.358360
75%     923.255524    59.875525   485.926284   945.035545
max    1813.015082  1642.090923  1916.794045  1832.084148

Parametric Tests

Statistical tests identify specific types of stationarity. Most parametric tests analyze whether the time series has a unit root, meaning the series depends on a lagged version of itself in a way that lets random shocks persist instead of dying out. The name comes from the math of the underlying autoregressive process: when its characteristic root equals one (unity), the process is non-stationary.

The null hypothesis, which is a fancy way of saying the default assumption, is that the time series has a unit root and is therefore not stationary. We want to reject the null hypothesis with a level of certainty to state that the time series is stationary, or more accurately, that the process generating the time series is stationary.
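
To build intuition, here’s a minimal sketch (not from the original post) contrasting a unit-root process, an AR(1) with a coefficient of exactly one, against a stationary AR(1) whose shocks decay:

import numpy as np

np.random.seed(42)
n = 1000
eps = np.random.normal(size=n)

# Unit root: y_t = 1.0 * y_{t-1} + e_t, so shocks accumulate forever
unit_root = np.cumsum(eps)

# Stationary AR(1): y_t = 0.5 * y_{t-1} + e_t, so shocks decay geometrically
stationary = np.zeros(n)
for t in range(1, n):
    stationary[t] = 0.5 * stationary[t - 1] + eps[t]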

Let’s look at the most common unit root tests.

Dickey-Fuller

The Dickey-Fuller test was the first statistical test for a unit root in an autoregressive model of a time series.

It runs into issues when the series is serially correlated, which is why there’s an Augmented Dickey-Fuller test.
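
statsmodels doesn’t ship a separate plain Dickey-Fuller function, but to my knowledge you can recover it by running adfuller with maxlag=0, which drops the lagged difference terms from the regression:

from statsmodels.tsa.stattools import adfuller

# With no lagged differences, this reduces to the original Dickey-Fuller test
t_stat, p_value, *_ = adfuller(random_walk[0].values, maxlag=0, autolag=None)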

Augmented Dickey-Fuller (ADF)

The Augmented Dickey-Fuller test checks for a unit root in a univariate process in the presence of serial correlation. It handles more complex models than the plain Dickey-Fuller test and is the typical go-to for most analysts.

Non-Stationary Process

Let’s use the ADF test on Apple’s revenue, which is non-stationary.

from statsmodels.tsa.stattools import adfuller

# df6['observed'] holds the Apple revenue series (loaded in the
# repository notebook)
t_stat, p_value, _, _, critical_values, _ = adfuller(df6['observed'].values, autolag='AIC')
print(f'ADF Statistic: {t_stat:.2f}')
print('Critical Values:')
for key, value in critical_values.items():
    print(f'   {key}, {value:.2f}')
ADF Statistic: -2.12
Critical Values:
   1%, -3.69
   5%, -2.97
   10%, -2.63

You can compare the ADF test statistic of -2.12 against the critical values. The test statistic is above (less negative than) all of the critical values, so we cannot reject the null hypothesis. In other words, as we already knew, Apple’s revenue is non-stationary; we see trends, and its mean and variance are changing.

Stationary Process

Now let’s do the same for the random walk data, which exhibits stationarity.

from statsmodels.tsa.stattools import adfuller
t_stat, p_value, _, _, critical_values, _ = adfuller(random_walk[0].values, autolag='AIC')
print(f'ADF Statistic: {t_stat:.2f}')
print('Critical Values:')
for key, value in critical_values.items():
    print(f'   {key}, {value:.2f}')

print(f'\np-value: {p_value:.2f}')
print("Non-Stationary") if p_value > 0.05 else print("Stationary")
ADF Statistic: -43.83
Critical Values:
   1%, -3.43
   5%, -2.86
   10%, -2.57

p-value: 0.00
Stationary

Notice that this time I’ve included the p-value. When a p-value is greater than 5%, we fail to reject the null hypothesis. We also see an extreme ADF test statistic from this random walk model.

I prefer the critical value approach to the p-value, as it shows to what degree I can reject the null hypothesis that the time series is non-stationary.
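
If you want that comparison in code, here’s an illustrative snippet (not from the original post) that checks the statistic against each critical value:

for key, value in critical_values.items():
    reject = t_stat < value
    print(f'{key}: {t_stat:.2f} {"<" if reject else ">"} {value:.2f} '
          f'-> {"reject" if reject else "cannot reject"} the null')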

Kwiatkowski-Phillips-Schmidt-Shin (KPSS)

The Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test takes stationarity as its null hypothesis: the process is assumed to be level- or trend-stationary. In other words, the KPSS null hypothesis is the opposite of the ADF test’s.

Exhibits Non-Stationarity

from statsmodels.tsa.stattools import kpss
t_stat, p_value, _, critical_values = kpss(df6['observed'].values, nlags='auto')

print(f'KPSS Statistic: {t_stat:.2f}')
print('Critical Values:')
for key, value in critical_values.items():
    print(f'   {key}, {value:.2f}')

print(f'\np-value: {p_value:.2f}')
print("Stationary") if p_value > 0.05 else print("Non-Stationary")
KPSS Statistic: 0.91
Critical Values:
   10%, 0.35
   5%, 0.46
   2.5%, 0.57
   1%, 0.74

p-value: 0.01
Non-Stationary

Exhibits Stationarity

from statsmodels.tsa.stattools import kpss
t_stat, p_value, _, critical_values = kpss(random_walk[0].values, nlags='auto')

print(f'KPSS Statistic: {t_stat:.2f}')
print('Critical Values:')
for key, value in critical_values.items():
    print(f'   {key}, {value:.2f}')

print(f'\np-value: {p_value:.2f}')
print("Stationary") if p_value > 0.05 else print("Non-Stationary")

Zivot and Andrews

Zivot-Andrews tests for stationarity like the ADF and KPSS tests do, but it allows for the presence of a structural break in the series.

Let’s add a break to our random walk model. Notice how the variance and cyclicality properties are relatively constant, but there’s a significant shift in the mean.
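
The code that builds the broken series lives in the repository; here’s a minimal sketch of the idea, assuming a simple level shift halfway through the series (the shift size is illustrative):

# Copy the stationary series and shift the mean of the second half upward
stationary_with_break = random_walk.copy()
stationary_with_break.iloc[periods // 2:] += 0.01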

Now let’s run both an ADF test and a Zivot-Andrews to test for non-stationarity.

t_stat, p_value, _, _, critical_values, _ = adfuller(stationary_with_break[0].values, autolag='AIC')
print(f'ADF Statistic: {t_stat:.2f}')
print('Critical Values:')
for key, value in critical_values.items():
    print(f'   {key}, {value:.2f}')

print(f'\np-value: {p_value:.2f}')
print("Non-Stationary") if p_value > 0.05 else print("Stationary")
ADF Statistic: -1.05
Critical Values:
   1%, -3.43
   5%, -2.86
   10%, -2.57

p-value: 0.73
Non-Stationary

from statsmodels.tsa.stattools import zivot_andrews
t_stat, p_value, critical_values, _, _ = zivot_andrews(stationary_with_break[0].values)
print(f'Zivot-Andrews Statistic: {t_stat:.2f}')
print('Critical Values:')
for key, value in critical_values.items():
    print(f'   {key}, {value:.2f}')

print(f'\np-value: {p_value:.2f}')
print("Non-Stationary") if p_value > 0.05 else print("Stationary")
Zivot-Andrews Statistic: -60.35
Critical Values:
   1%, -5.28
   5%, -4.81
   10%, -4.57

p-value: 0.00
Stationary

We see different results between the two stationarity tests. The ADF test fails to account for the structural break and labels the series non-stationary, while the Zivot-Andrews test identifies the series as stationary.

The Bottom Line

The process for testing for stationarity in time series data is relatively straightforward. You analyze the information visually, perform a decomposition, review the summary statistics, and then select a parametric test to gain confidence in your assumptions.

The Augmented Dickey-Fuller or ADF test is the most commonly used test for stationarity; however, it’s not always the best choice. To analyze stationarity in time series with structural breaks, you’ll want to use the Zivot-Andrews test. As always, it’s essential to have an intuitive understanding of the problem and a deep knowledge of the tools at your disposal.
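
To wrap up, here’s a hypothetical convenience wrapper (my own sketch, not from the original post) that runs the ADF and KPSS tests together and reports whether they agree:

from statsmodels.tsa.stattools import adfuller, kpss

def check_stationarity(series, alpha=0.05):
    # Run both tests: ADF's null is a unit root, KPSS's null is stationarity
    adf_p = adfuller(series, autolag='AIC')[1]
    kpss_p = kpss(series, nlags='auto')[1]
    adf_says_stationary = adf_p <= alpha    # ADF rejects the unit root
    kpss_says_stationary = kpss_p > alpha   # KPSS fails to reject stationarity
    return {'adf_p': adf_p,
            'kpss_p': kpss_p,
            'agree': adf_says_stationary == kpss_says_stationary,
            'stationary': adf_says_stationary and kpss_says_stationary}

When both tests agree, you can be more confident in the result; when they disagree, the series may be trend- or difference-stationary and worth a closer look.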
