ARIMA stands for AutoRegressive Integrated Moving Average. The ARIMA modeling and forecasting approach is also known as the Box-Jenkins approach.
I will not discuss here the "I" in ARIMA. This is done in the guided tour on ARIMA estimation and forecasting.
In this guided tour I will explain how to select the orders p and q of an ARIMA(p,d,q) process, given the value of d. Recall that an ARIMA(p,0,q) process is the same as an ARMA(p,q) process.
As is well known (if not to you, stop here and don't use module ARIMAMODSEL!), the general form of an ARMA(p,q) process y(t) is:
where the e(t)'s are independently distributed with zero expectation and variance σ², and m is a constant. Thus, the p in "ARMA(p,q)" is the maximum lag of the AR part, and q is the maximum lag of the MA part.
This model can be written more compactly in terms of lag polynomials and lag operators. Define the lag operator L as:
etcetera. Then we can write:
say, where
and similarly
say, where
Thus, the ARMA(p,q) model involved can now be written as:
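For reference, the formulas of this passage can be sketched as follows. This is one common way of writing them, not necessarily the exact form used by EasyReg: the minus signs on the MA coefficients, the placement of the constant m, and the coefficient names a_{1,j} and a_{2,j} are assumptions made for illustration.

```latex
% ARMA(p,q) in difference-equation form (assumed sign convention)
y(t) = m + \sum_{j=1}^{p} a_{1,j}\, y(t-j) + e(t) - \sum_{j=1}^{q} a_{2,j}\, e(t-j)

% Lag operator
L\,y(t) = y(t-1), \qquad L^{2} y(t) = y(t-2), \qquad \ldots

% AR and MA lag polynomials
a_1(L) = 1 - a_{1,1} L - \cdots - a_{1,p} L^{p}, \qquad
a_2(L) = 1 - a_{2,1} L - \cdots - a_{2,q} L^{q}

% Compact form of the same model
a_1(L)\, y(t) = m + a_2(L)\, e(t)
```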
In the case of seasonal time series, for example quarterly time series, there may be seasonal effects in the AR and/or MA lag polynomials. The model then becomes, for example,
where
is the seasonal AR lag polynomial, and
is the seasonal MA lag polynomial.
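As an illustration for quarterly data, a multiplicative seasonal specification could look like the sketch below. The multiplicative form, the seasonal lag 4, and the coefficient names c_{1,j} and c_{2,j} are assumptions; s_1 and s_2 denote the seasonal AR and MA orders.

```latex
% Assumed multiplicative seasonal ARMA model for quarterly data
a_1(L)\, c_1(L^{4})\, y(t) = m + a_2(L)\, c_2(L^{4})\, e(t)

% Seasonal AR and MA lag polynomials of orders s_1 and s_2
c_1(L^{4}) = 1 - c_{1,1} L^{4} - \cdots - c_{1,s_1} L^{4 s_1}, \qquad
c_2(L^{4}) = 1 - c_{2,1} L^{4} - \cdots - c_{2,s_2} L^{4 s_2}
```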
As explained here and in my lecture notes on forecasting, the AR and MA orders p and q, respectively, and in the case of seasonal time series the seasonal AR and MA orders s1 and s2 as well, can be estimated consistently on the basis of the Hannan-Quinn and Schwarz information criteria. However, this is not a foolproof method, in particular in the presence of seasonal AR and MA polynomials, because it only works well for long time series.
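To make the three criteria concrete, here is a minimal Python sketch of their textbook per-observation forms for a Gaussian model. The normalization actually used by EasyReg is not stated here, so the function name and the exact scaling are assumptions. In each criterion, sigma2 is the estimated error variance, k the number of estimated parameters, and n the sample size.

```python
import math


def information_criteria(sigma2: float, k: int, n: int) -> dict:
    """Textbook per-observation criteria: ln(sigma2) plus a penalty on k.

    Assumed forms (not taken from EasyReg itself):
      Akaike:       ln(sigma2) + 2k/n
      Hannan-Quinn: ln(sigma2) + 2k ln(ln(n))/n
      Schwarz:      ln(sigma2) + k ln(n)/n
    """
    base = math.log(sigma2)
    return {
        "AIC": base + 2.0 * k / n,
        "HQ": base + 2.0 * k * math.log(math.log(n)) / n,
        "SC": base + k * math.log(n) / n,
    }


# Example: an ARMA(1,1) with intercept has k = 3 parameters.
crit = information_criteria(sigma2=1.0, k=3, n=500)
print(crit)
```

For n = 500 the Schwarz penalty per parameter is the largest and the Akaike penalty the smallest, which is why Akaike tends to pick larger models than the other two.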
To demonstrate how this approach works, I have generated 500 observations on the stationary ARMA(1,1) process
where the errors e(t) are i.i.d. standard normally distributed. The data is available in (US style) CSV format, as ARIMAMODSELDATA.CSV, and should be interpreted as an "annual" time series, starting from "year" 1.
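A comparable series can be generated in Python along the following lines. This is only a sketch: the coefficient values 0.7 and 0.5, the minus sign on the MA term, the burn-in length, and the seed are all hypothetical choices, and do not reproduce the contents of ARIMAMODSELDATA.CSV.

```python
import numpy as np


def simulate_arma11(n, ar, ma, const=0.0, burn_in=200, seed=0):
    """Simulate y(t) = const + ar*y(t-1) + e(t) - ma*e(t-1), e(t) i.i.d. N(0,1).

    The first `burn_in` observations are discarded so that the arbitrary
    start-up value y(0) = 0 has a negligible effect on the returned sample.
    """
    rng = np.random.default_rng(seed)
    total = n + burn_in
    e = rng.standard_normal(total)
    y = np.zeros(total)
    for t in range(1, total):
        y[t] = const + ar * y[t - 1] + e[t] - ma * e[t - 1]
    return y[burn_in:]


# Hypothetical parameter values; |ar| < 1 keeps the process stationary.
y = simulate_arma11(n=500, ar=0.7, ma=0.5)
print(y.shape, y.mean(), y.var())
```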
Once you have imported this data file, open Menu > Single equation models > ARIMA model selection via information criteria. Then EasyReg opens with the following window:
Click "Continue":
We are not going to select a subsample. Thus, click "No", "Confirm" and "Continue":
Since the time series is stationary, there is no need for differencing. Of course, you need to verify this by conducting unit root tests first.
Click "Continue":
You do not know in advance whether an intercept is needed or not. Therefore, leave the intercept, and click "Continue":
Now you have to specify the maximum values of p and q. I will choose max p = a(1,3) = 3 and max q = a(2,3) = 3.
Click "Specification O.K.":
Click "Continue":
Click "Start". Then the Akaike, Hannan-Quinn and Schwarz information criteria will be computed for all combinations of p = 0,1,2,3 and q = 0,1,2,3:
All three criteria select the true ARMA(1,1) model:
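The grid search over p and q can be imitated outside EasyReg. The sketch below is not EasyReg's procedure: it swaps in the Hannan-Rissanen two-step regression (a long autoregression to proxy the unobserved errors, followed by OLS on lagged y's and lagged residuals), and it uses simulated data with hypothetical coefficients. All function and variable names are illustrative.

```python
import math
import numpy as np


def lagmat(x, lags, start):
    """Columns x[t-1], ..., x[t-lags] for t = start, ..., len(x)-1."""
    n = len(x)
    if lags == 0:
        return np.empty((n - start, 0))
    return np.column_stack([x[start - j: n - j] for j in range(1, lags + 1)])


def criteria_table(y, max_p=3, max_q=3, long_ar=10):
    """Information criteria for all ARMA(p,q), p <= max_p, q <= max_q."""
    n = len(y)
    # Step 1: long AR(long_ar) by OLS; its residuals proxy the errors e(t).
    X = np.column_stack([np.ones(n - long_ar), lagmat(y, long_ar, long_ar)])
    beta, *_ = np.linalg.lstsq(X, y[long_ar:], rcond=None)
    e_hat = np.zeros(n)
    e_hat[long_ar:] = y[long_ar:] - X @ beta
    # Step 2: OLS for each (p,q) on one common sample, so that the
    # criteria are comparable across models.
    start = long_ar + max(max_p, max_q)
    target = y[start:]
    n_eff = len(target)
    rows = []
    for p in range(max_p + 1):
        for q in range(max_q + 1):
            Z = np.column_stack(
                [np.ones(n_eff), lagmat(y, p, start), lagmat(e_hat, q, start)]
            )
            coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
            resid = target - Z @ coef
            sigma2 = resid @ resid / n_eff
            k = p + q + 1  # AR terms + MA terms + intercept
            rows.append({
                "p": p, "q": q, "k": k,
                "AIC": math.log(sigma2) + 2.0 * k / n_eff,
                "HQ": math.log(sigma2) + 2.0 * k * math.log(math.log(n_eff)) / n_eff,
                "SC": math.log(sigma2) + k * math.log(n_eff) / n_eff,
            })
    return rows


# Simulated ARMA(1,1) with hypothetical coefficients 0.7 and 0.5.
rng = np.random.default_rng(0)
e = rng.standard_normal(700)
y = np.zeros(700)
for t in range(1, 700):
    y[t] = 0.7 * y[t - 1] + e[t] - 0.5 * e[t - 1]
y = y[200:]  # drop burn-in, keep 500 observations

table = criteria_table(y)
best = {name: min(table, key=lambda r: r[name]) for name in ("AIC", "HQ", "SC")}
for name, row in best.items():
    print(f"{name} selects ARMA({row['p']},{row['q']})")
```

Because the three criteria differ only in how heavily they penalize the number of parameters, the model selected by Schwarz never has more parameters than the one selected by Hannan-Quinn, which in turn never has more than the one selected by Akaike.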
To demonstrate that this approach is not foolproof, let us consider the data used in the guided tour on ARIMA estimation and forecasting. Recall that this data was generated as
where e(t) is i.i.d. standard normally distributed, and t = 1,2,...,200. This is a seasonal ARMA process in Δy(t), with p = q = 1, s1 = 0 and s2 = 1. The maximum values of these orders have been chosen as follows: p ≤ a(1,2) = 2, q ≤ a(2,2) = 2, s1 ≤ c(1,1) = 1, s2 ≤ c(2,1) = 1:
Now follow the same steps as before:
Model 8 is the optimal model indicated by the Hannan-Quinn and Schwarz information criteria, which is an ARMA(2,1) process in Δy(t), without seasonal effects. Although these two criteria generate consistent estimates of the true orders, the sample size of 200 is too small for the consistency to kick in.
Model 26 is the optimal model indicated by the Akaike information criterion, which corresponds to a seasonal ARMA model for Δy(t), with p = 2, q = 1, s1 = 0 and s2 = 1. This is close to the truth: p = q = 1, s1 = 0 and s2 = 1. This result is what we should expect, because the Akaike information criterion is known to overshoot the true orders.
This example shows that if the time series is rather short, one should not rely too much on the consistency of the Hannan-Quinn and Schwarz information criteria.