How to Interpret ARIMA Results

To analyze ARIMA results, determine whether the model meets its assumptions using the Ljung-Box chi-square statistic and the autocorrelation of the residuals, check whether each term is significant using p-values, and assess how well the model fits using mean-squared error.

Understanding ARIMA Results

After creating an autoregressive model, check the results to see whether the model makes sense and how well it performs. statsmodels, like most other libraries, prints a summary table for the fitted model.

The best way to understand is by example. We’ll review the results of a simple AR model trying to predict Bitcoin’s future results using these steps:

  1. Review general information
  2. Determine term significance
  3. Analyze model assumptions
  4. Compare models and improve the fit

We’ll review every line item within each step so you’ll walk away having a crystal clear understanding of your results.

1. Review General Information

The first thing you want to do is review the general information.

Note that statsmodels uses the same module for all autoregressive models, so the header displays SARIMAX Results even when your model is only a vanilla autoregression.

SARIMAX stands for Seasonal AutoRegressive Integrated Moving Average with eXogenous regressors.

The basic information is pretty self-explanatory:

  • Dep. Variable – What we’re trying to predict.
  • Model – The type of model we’re using, such as AR, MA, or ARIMA.
  • Date – The date we ran the model
  • Time – The time the model finished
  • Sample – The range of the data
  • No. Observations – The number of observations

The dependent variable is the close, which we’re trying to predict. The constant is the beta term, and our lag variables are ar.L1, ar.L2, and ar.L3. The error term is epsilon in the AR equation, and sigma2 is its estimated variance.
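Written out, a generic AR(3) model with a constant takes the form below. This is a textbook form matched to the parameter names in the summary; the symbol choices (β for the constant, φ₁ to φ₃ for ar.L1 to ar.L3, σ² for sigma2) are assumptions made for illustration.

```latex
y_t = \beta + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \phi_3 y_{t-3} + \varepsilon_t,
\qquad \varepsilon_t \sim \text{white noise with variance } \sigma^2
```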

After reviewing that we didn’t make any basic mistakes with our model, we can move on to the next step and analyze the term significance.

2. Determine Term Significance

We want to make sure each term in our model is statistically significant. The null hypothesis for each term is that its coefficient equals zero, meaning the term contributes nothing to the model. Therefore, we want each term to have a p-value of less than 0.05, so we can reject the null hypothesis and treat the term as statistically significant.

In our example, ar.L1 and ar.L2 are not statistically significant, as their p-values are above the 0.05 threshold.
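The P>|z| column can be reproduced from a coefficient and its standard error. The following is a minimal sketch, assuming a two-sided z-test under a normal approximation; the function name and the coefficient and standard-error values are hypothetical and are not taken from the Bitcoin model.

```python
import math

def two_sided_p_value(coef, std_err):
    """Return (z, p) for the null hypothesis that the coefficient is zero."""
    z = coef / std_err
    # P(|Z| > |z|) for a standard normal equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Hypothetical (coefficient, standard error) pairs, for illustration only.
terms = {"ar.L1": (0.02, 0.03), "ar.L3": (0.09, 0.03)}

for name, (coef, std_err) in terms.items():
    z, p = two_sided_p_value(coef, std_err)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name}: z={z:.2f}, p={p:.3f} -> {verdict}")
```

A coefficient that is small relative to its standard error gives a z-score near zero and a large p-value, which is why such a term fails the 0.05 threshold.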

3. Review Assumptions

Next, we want to make sure our model meets the assumption that the residuals are independent, known as white noise.

If the residuals are not independent, we can extract the non-randomness to make a better model.


The Ljung-Box test, pronounced “Young” and sometimes called the modified Box-Pierce test, tests the null hypothesis that the errors are white noise.

Ljung-Box (L1) (Q) is the LBQ test statistic at lag 1, which is 0.01 here, and Prob(Q) is its p-value, 0.94. Since the p-value is above 0.05, we can’t reject the null that the errors are white noise.
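To see where these two numbers come from, here is a minimal sketch of the lag-1 Ljung-Box calculation using only the standard library. It assumes one lag and a chi-square reference distribution with one degree of freedom; the function name and the simulated series are hypothetical stand-ins for real model residuals.

```python
import itertools
import math
import random

def ljung_box_lag1(resid):
    """Return (Q, p-value) of the Ljung-Box test at lag 1."""
    n = len(resid)
    mean = sum(resid) / n
    dev = [r - mean for r in resid]
    # Lag-1 sample autocorrelation of the residuals.
    rho1 = sum(dev[t] * dev[t - 1] for t in range(1, n)) / sum(d * d for d in dev)
    # Ljung-Box statistic with a single lag (h = 1).
    q = n * (n + 2) * rho1 ** 2 / (n - 1)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p = math.erfc(math.sqrt(q / 2))
    return q, p

random.seed(0)
noise = [random.gauss(0, 1) for _ in range(500)]  # stand-in for white-noise residuals
walk = list(itertools.accumulate(noise))          # strongly autocorrelated series

print("noise:", ljung_box_lag1(noise))
print("walk: ", ljung_box_lag1(walk))
```

The random walk has a lag-1 autocorrelation close to 1, so its Q statistic is very large and its p-value is effectively zero, which is the pattern you would see if the model left structure in the residuals.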

If you’re interested in seeing all of the Ljung-Box test statistics and p-values for the lags, you can use a Ljung-Box diagnostic function.

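from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.tsa.arima.model import ARIMA

# Fit the model, then run the Ljung-Box test on its residuals
mod = ARIMA(endog=train, order=order)
res = mod.fit()
ljung = acorr_ljungbox(res.resid)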


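The heteroscedasticity test checks whether the residuals are homoscedastic, meaning they have the same variance throughout the sample. The statsmodels summary reports a break-variance test, which is similar to a Goldfeld-Quandt test rather than White’s test. Our summary statistics show a test statistic of 1.64 and a p-value of 0.00, which means we reject the null hypothesis of constant variance: the variance of our residuals changes over the sample.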
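The idea behind the H statistic can be sketched in a few lines. This assumes the break-variance approach, which compares the sum of squared residuals in the last third of the sample with the first third. It is a simplification: statsmodels works on standardized residuals and also derives a p-value, both of which are left out here. The function name and simulated data are hypothetical.

```python
import random

def break_variance_stat(resid):
    """Ratio of late-sample to early-sample sum of squared residuals."""
    h = round(len(resid) / 3)  # size of each comparison window
    early = sum(r * r for r in resid[:h])
    late = sum(r * r for r in resid[-h:])
    return late / early

random.seed(1)
calm = [random.gauss(0, 1) for _ in range(150)]      # low-variance period
volatile = [random.gauss(0, 3) for _ in range(150)]  # high-variance period

# A ratio well above 1 means the residual variance grew over the sample.
print(break_variance_stat(calm + volatile))
```

A ratio near 1 is what constant variance looks like; a value such as the 1.64 in the summary says the later residuals are noticeably more spread out than the earlier ones.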

Non-constant variance poses forecasting problems, and if you don’t know why, this video is an excellent refresher on the topic.


Jarque-Bera tests for the normality of errors. It tests the null that the residuals are normally distributed against an alternative of another distribution. We see a test statistic of 4535.14 with a probability of 0, which means we reject the null hypothesis, and the residuals are not normally distributed. Also, as part of the Jarque-Bera test, we see the distribution has a slight negative skew and a large kurtosis.
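The Jarque-Bera statistic combines the skew and kurtosis into one number, which the sketch below computes by hand. It assumes the standard formula JB = n/6 · (S² + (K − 3)²/4) with a chi-square reference distribution on two degrees of freedom; the function name and the sample values are hypothetical.

```python
import math

def jarque_bera(resid):
    """Return (JB statistic, p-value, skew, kurtosis) for a sample."""
    n = len(resid)
    mean = sum(resid) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((r - mean) ** 2 for r in resid) / n
    m3 = sum((r - mean) ** 3 for r in resid) / n
    m4 = sum((r - mean) ** 4 for r in resid) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2  # equals 3 for a normal distribution
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    # Survival function of a chi-square variable with 2 degrees of freedom.
    p = math.exp(-jb / 2)
    return jb, p, skew, kurt

# Hypothetical residuals with one large negative outlier (a heavy left tail).
sample = [0.1, -0.2, 0.05, -0.1, 0.15, -0.05, 0.2, -0.15, 0.0, -4.0]
print(jarque_bera(sample))
```

Because the statistic scales with the sample size, even moderate excess kurtosis produces a very large JB value on a long price series, which is consistent with the size of the statistic reported above.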

4. Fit Analysis

The Log-Likelihood, AIC, BIC, and HQIC help compare one model with another.


The log-likelihood measures how well the fitted model explains the sampled data, with higher values indicating a better fit. While it’s useful, it never gets worse when you add terms, so AIC and BIC punish the model for complexity, which helps make our ARIMA model parsimonious.

Akaike’s Information Criterion

Akaike’s Information Criterion (AIC) helps compare the relative quality of candidate models fit to the same data, with a lower value indicating the preferred model. The AIC penalizes a model for adding parameters, since adding more parameters never decreases the maximum likelihood value.

Bayesian Information Criterion

Bayesian Information Criterion (BIC), like the AIC, also punishes a model for complexity, but it also incorporates the number of observations in the data, so its penalty grows with the sample size.

Hannan-Quinn Information Criterion

Hannan-Quinn Information Criterion (HQIC), like AIC and BIC, is another criterion for model selection; however, it’s not used as often in practice.
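All three criteria are simple functions of the log-likelihood, the number of estimated parameters k, and the number of observations n. The sketch below uses the standard textbook formulas; the function name and the two candidate models' numbers are hypothetical, and how a given library counts k (for example, whether the error variance is included) is an assumption to check against its documentation.

```python
import math

def information_criteria(llf, k, n):
    """Compute AIC, BIC and HQIC from a log-likelihood.

    llf: maximized log-likelihood
    k:   number of estimated parameters
    n:   number of observations
    """
    return {
        "AIC": 2 * k - 2 * llf,
        "BIC": k * math.log(n) - 2 * llf,
        "HQIC": 2 * k * math.log(math.log(n)) - 2 * llf,
    }

# Hypothetical candidates fit to the same 1,000 observations.
candidates = {
    "AR(1)": information_criteria(llf=-1502.0, k=3, n=1000),
    "AR(3)": information_criteria(llf=-1500.5, k=5, n=1000),
}

for name, scores in candidates.items():
    print(name, {key: round(value, 2) for key, value in scores.items()})

# Lower is better: pick the candidate with the smallest AIC.
best = min(candidates, key=lambda name: candidates[name]["AIC"])
print("Preferred by AIC:", best)
```

In this made-up comparison the two extra lags raise the log-likelihood by only 1.5, which is not enough to offset the penalty for two more parameters, so the smaller model wins on all three criteria.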

The Bottom Line

It’s essential to understand how to analyze ARIMA results. In this post, you learned to examine the general information first, review the coefficients for significance, determine whether your results meet the model assumptions, and then compare various models.
