Prediction Intervals for ARIMA Models

Ralph D. Snyder, J. Keith Ord, and Anne B. Koehler

Journal of Business & Economic Statistics

KEY WORDS: Bayesian; Forecasting; Holt-Winters; Simulation; State space.

As indicated by Chatfield (1993) in his comprehensive state-of-the-art review, the construction of valid prediction intervals (PIs) for time series continues to present considerable difficulties. In particular, Chatfield noted a number of reasons why PIs may be too narrow; these include the following:

1. Model parameters may have to be estimated.
2. Innovations may not be normally distributed.
3. There may be outliers in the data.
4. The wrong model may be identified.
5. The underlying model may change, either during the period of fit or in the future.

In this article, we focus on the first of these issues. If the uncertainty relating to parameter estimation is not allowed for explicitly, the resulting PIs will be too narrow. Further, the nonlinear nature of the parameter estimates in time series makes an exact analytic solution intractable, so we develop various approximate solutions, which are then explored in a simulation study. Only when we are confident of our ability to produce reliable PIs in this basic case can we address the remaining issues. Thus, in this article, we examine the construction of PIs when the parameters are unknown and the errors are assumed to be normal, leaving the other issues to be addressed in further research.

We identify four approaches to the construction of PIs and report on an extensive simulation study of these alternatives. The particular model used in our simulations is the additive Holt-Winters (HW) scheme; see Example 1.2. Yar and Chatfield (1990) provided PIs for this scheme based on its autoregressive integrated moving average (ARIMA) representation, setting the parameter values equal to their estimates (the "plug-in" approach). These authors found the method to be superior to previous, albeit heuristic, approaches, and the plug-in PI is one of the options considered in our study. However, rather than use an ARIMA framework,
we have opted for a state-space scheme; details are given in Section 1.

We are not the first to consider the construction of PIs for autoregressive moving average (ARMA) models by the use of a state-space scheme; see Ansley and Kohn (1986). They proposed a linear approximation method to account for the estimation error of the autoregressive (AR) and moving average (MA) parameters in computing PIs. However, we consider ARIMA models and the equivalent state-space models with a single source of error instead of ARMA models and state-space models with multiple sources of error. The linear approximation that is part of our two proposed methods, the linear approximation (LA) method and the Bayesian simulation (BS) method, is applied to the time series itself rather than to the variance of the state variables. In the second of our proposed methods, we consider the impact of estimating the error variance as well as the AR and MA parameters. Moreover, we conduct a simulation study to investigate the improvement in PI coverage from using methods that account for the error in the estimation of the parameters.

The second of our two proposed methods is a BS scheme. When Ansley and Kohn (1986) showed how to obtain the conditional mean squared error (MSE) for a time series in the state-space framework, they pointed out that the correction to the MSE has a Bayesian interpretation. Under appropriate conditions we can apply the asymptotic sampling distribution developed by Ansley and Kohn (1986) to generate the predictive distribution, using simulation. De Jong and Whiteman (1994) followed this approach in developing PIs for AR(p) schemes; the resulting simulated distribution is closely related to that generated by Markov chain Monte Carlo (MCMC) methods; see Barnett, Kohn, and Sheather (1996, 1997) for the development of MCMC estimation procedures for ARMA models. Our method uses an analytic approximation to the posterior distribution of the parameters, which we then feed into the computation of the predictive distribution. Thus, our scheme may be viewed as a "partial" MCMC technique, which should be less demanding computationally, an important consideration when a large number of series is to be analyzed. Another approach would be to consider the nonparametric bootstrap; see, for example, Thombs and Schucany (1990), Kabaila (1993), and McCullough (1994). However, since our current focus is on getting the correct coverage with a known underlying error process, we have not pursued that line of inquiry at this time.

In this article, we address three main issues: (1) the extension of the Bayesian simulation approach to state-space schemes, (2) the use of approximations to simplify the computational task, and (3) an extensive simulation study to determine whether the two suggested approaches to computing PIs (the LA and BS methods) provide PIs with improved coverage.

The structure of the remainder of this article is as follows. In Section 1, we compare single-source and multiple-source state-space schemes and justify our use of a single-source model. In Section 2, we describe the various approaches to be considered for the construction of PIs. Section 3 describes the simulation study and summarizes the conclusions from that study.
An application to real data is presented in Section 4. A summary and an outline of future directions appear in Section 5.

1. STATE-SPACE REPRESENTATION

For the PIs in this article, we use models in their state-space form. In particular, we employ state-space models with a single source of error (SSOE); see Snyder (1985, 1988). The SSOE representation for a time series {y_t} is

    y_t = h'x_{t-1} + \varepsilon_t,   where   x_t = Fx_{t-1} + a\varepsilon_t   and   \varepsilon_t ~ IID(0, \sigma^2).   (1.1)

The k-vector x_{t-1} represents the state of the underlying process at the beginning of period t, \varepsilon_t is from an IID(0, \sigma^2) series of disturbances, h is a fixed k-vector, F is a fixed k × k matrix, and a is a fixed k-vector of parameters. It is assumed that \varepsilon_t is independent of {y_{t-i}, x_{t-i}: i ≥ 1}.

Akaike (1974) first showed that any ARMA(p, q) scheme has a Markovian state-space representation with a single source of variation. Moreover, any ARIMA(p, d, q)(P, D, Q)_m model can be represented by an SSOE model, and vice versa. However, it should be noted that Akaike's model does not include an error term in the observation equation, a small but critical difference. In Appendix A, we demonstrate the equivalence of the ARIMA and SSOE schemes. For state-space models with multiple sources of error (MSOE), see Harvey (1990, chap. 2) or West and Harrison (1997, chap. 2). Although ARIMA schemes may be represented by an MSOE scheme, the parameter space is often restricted. For example, the MA(1) scheme with parameter \theta has -1 < \theta < 1, but the MSOE representation requires 0 ≤ \theta ≤ 1.

An empirical study by Garcia-Ferrer and del Hoyo (1992) contrasted the multiple-source scheme (Harvey's basic structural model, or BSM; Harvey 1990) with ARIMA modeling for a number of series. Garcia-Ferrer and del Hoyo concluded that the ARIMA formulation generally produces better predictions than the BSM, a result they attributed to the lack of orthogonality among the components of the state vector, but it may be due to restrictions on the parameter space. Given the equivalence of the ARIMA and SSOE representations, their conclusions imply that the SSOE form of the BSM is superior to its traditional multiple-source counterpart, provided a suitable model-selection procedure is applied.
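To make the SSOE recursion (1.1) concrete, the following Python sketch simulates a series from an SSOE model. It is an illustration only: the local-linear-trend choices of h, F, and a below are assumptions made for this example, not the Example 1.2 specification.

import numpy as np

def simulate_ssoe(h, F, a, x0, n, sigma, rng):
    # SSOE model (1.1): y_t = h'x_{t-1} + e_t, x_t = F x_{t-1} + a e_t.
    x = x0.copy()
    y = np.empty(n)
    for t in range(n):
        e = rng.normal(0.0, sigma)   # single source of error
        y[t] = h @ x + e             # observation equation
        x = F @ x + a * e            # transition driven by the same disturbance
    return y

# Illustrative (assumed) local-linear-trend configuration:
h = np.array([1.0, 1.0])                 # y_t = level + growth + e_t
F = np.array([[1.0, 1.0], [0.0, 1.0]])   # level and growth updates
a = np.array([0.5, 0.1])                 # smoothing-style parameters
y = simulate_ssoe(h, F, a, np.array([10.0, 0.5]), 100, 1.0,
                  np.random.default_rng(0))

Because the observation and transition equations share the same disturbance, a single simulated e drives both lines; this is the "single source of error" that distinguishes the SSOE form from MSOE schemes.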
2. MODEL ESTIMATION AND PREDICTION ERROR

2.1 Exponential Smoothing Versus the Kalman Filter

ML estimates of the parameters a and \sigma^2 may be obtained using a procedure that incorporates the Kalman filter to expedite the evaluation of the likelihood function (Schweppe 1965). The Kalman filter for the SSOE scheme (Snyder 1985) includes the equations

    e_t = y_t - h'x_{t-1|t-1}   (2.1)

and

    x_{t|t} = Fx_{t-1|t-1} + a_t e_t,   (2.2)

where x_{t|t} denotes the estimator for x_t based on the sample Y_t = (y_1, ..., y_t), and a_t is the Kalman gain.

An inherently simpler strategy is to bypass the Kalman filter and use exponential smoothing from the outset. Conditioning on a fixed trial value for x_0 and assuming that the sample Y_n = (y_1, ..., y_n) is known, the SSOE scheme implies that successive values of the state vector x_t can be computed recursively with the error-correction form of the exponential smoothing equation

    x_t = Fx_{t-1} + a e_t,   where   e_t = y_t - h'x_{t-1}.   (2.3)

Strictly speaking, the x_t should be read as x_t | Y_{t-1}, \delta, x_0 in (2.3), where \delta denotes the vector of unknown parameters contained in (h, F, a, \sigma). The SSOE scheme then implies that y_t | Y_{t-1}, \delta, x_0 ~ IID(h'x_{t-1}, \sigma^2), from which it follows that the likelihood function has the form L(\delta, x_0 | Y_n) = \prod_{t=1}^{n} p(y_t | Y_{t-1}, \delta, x_0), where p(.) is the pdf for \varepsilon_t. Since x_0 is treated as a fixed vector of unknown parameters, we have here a conditional, rather than a marginal, likelihood function.

When the likelihood is based on a normal distribution, the ML estimate of x_0, denoted by \hat{x}_0 for given \delta, is a linear least squares estimate. Estimates of successive state vectors, given a sample of size n, are then obtained with the recurrence relationship based on (2.3):

    \hat{x}_t = F\hat{x}_{t-1} + a e_t.   (2.4)

This is similar to the updating Equation (2.2) of the Kalman filter, there being two differences: (1) the Kalman gain is replaced by the vector of smoothing parameters, and (2) filtered values of the state vectors are replaced by corresponding smoothed values.

The Kalman filter can only be used as part of the ML procedure when the disturbances in the SSOE scheme are normally distributed. The exponential smoothing method outlined in this section, however, can be applied for any disturbance distribution. The seed vector estimates then no longer correspond to linear least squares estimates, and the links with the Kalman filter disappear. This is of little consequence, however, because the method continues to yield ML estimates. For the rest of the article, the ML estimates of \delta and x_0 will be denoted by \hat{\delta} and \hat{x}_0.
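As a sketch of this estimation strategy, the code below runs the error-correction recursion (2.3) for trial values of the smoothing vector a and seed x_0 and returns the Gaussian log-likelihood built from the one-step errors; wrapping a numerical optimizer such as scipy.optimize.minimize around it then yields the ML estimates. This is our own illustration, reusing the illustrative h and F above, not the authors' code.

import numpy as np

def ssoe_gaussian_loglik(y, h, F, a, x0, sigma):
    # Conditional log-likelihood L(delta, x0 | Y_n) from the recursion (2.3):
    # e_t = y_t - h'x_{t-1};  x_t = F x_{t-1} + a e_t.
    x = x0.copy()
    loglik = 0.0
    for yt in y:
        e = yt - h @ x
        loglik += -0.5 * np.log(2.0 * np.pi * sigma ** 2) - 0.5 * (e / sigma) ** 2
        x = F @ x + a * e
    return loglik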
2.2 Prediction Intervals

The point predictions for t = n + 1, ..., n + r are the conditional expectations from Model (1.1): \hat{y}_t = h'\hat{x}_{t-1} and \hat{x}_t = F\hat{x}_{t-1}. Four principal approaches for the PIs will be considered: a heuristic (HE) method, the "plug-in" (PL) method, a linear approximation (LA) method, and a Bayesian simulation (BS) scheme. To simplify the notation, we will use y_p = (y_1, ..., y_n)' for the sample of past values and y_f = (y_{n+1}, ..., y_{n+r})' for future values.

2.2.1 Heuristic Method (HE). This method was developed by Bowerman and O'Connell (1993, chap. 8). The computation of the PI does not rely on a model that includes the changing nature of the level, trend, and seasonal components. Instead, it relies on the adjustment of the MAD (mean absolute deviation), which is computed from deseasonalized data. The adjustment is based on work by R. G. Brown and takes into account the smoothing constants and the length of the forecasting horizon.

2.2.2 Plug-in Method (PL). For the construction of the PI by the PL method, the density function p(y_f | \delta, x_0, \sigma, y_p) of the future time series is approximated by the Gaussian density \phi(y_f | \hat{\delta}, \hat{x}_0, \hat{\sigma}, y_p), with the parameters set equal to their ML estimates.
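Iterating (1.1) forward from x_n shows that the j-step-ahead forecast error is \varepsilon_{n+j} + \sum_{k=1}^{j-1} (h'F^{k-1}a) \varepsilon_{n+j-k}, so the plug-in variance is \hat{\sigma}^2 [1 + \sum_{k=1}^{j-1} (h'F^{k-1}a)^2]. The sketch below computes plug-in PIs on this basis; it reuses the illustrative matrices above and assumes x_n and sigma have been replaced by their ML estimates.

import numpy as np
from scipy.stats import norm

def plugin_intervals(h, F, a, x_n, sigma, r, level=0.95):
    # Returns (forecast, lower, upper) tuples for horizons 1..r.
    z = norm.ppf(0.5 + level / 2.0)
    x = x_n.copy()
    psi, var, out = [], sigma ** 2, []
    for j in range(1, r + 1):
        yhat = h @ x                      # point forecast h'x_{n+j-1}
        se = var ** 0.5
        out.append((yhat, yhat - z * se, yhat + z * se))
        psi.append(h @ np.linalg.matrix_power(F, j - 1) @ a)
        var = sigma ** 2 * (1.0 + np.sum(np.square(psi)))  # variance at horizon j+1
        x = F @ x                         # state forecast F x_{n+j-1}
    return out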
2.2.3 Linear Approximation Method (LA). Let y denote the combined (n + r) × 1 vector of past and future values, and let \varepsilon be the corresponding (n + r) × 1 vector of error terms; the vector of zeros corresponds to the predicted values for \varepsilon_f. For an invertible process the dependence of y_f on x_0 will be slight; see the appendix, Equation (A.8). Thus, we assume that y is approximately a linear function of \delta and \varepsilon only and write

    y* = Z\delta + L\varepsilon,

where y* denotes y minus the constant term from the Taylor series expansion. The matrices Z and L contain the derivatives of y with respect to \delta and \varepsilon, evaluated at (\hat{\delta}, \hat{\varepsilon}). Note that L is a unit lower triangular matrix because the typical y_t cannot depend on future values of the disturbances. This linear approximation can be expressed as the following equations for the past (p) and future (f) values:

    y*_p = Z_p \delta + L_{pp} \varepsilon_p   (2.5)

and

    y*_f = Z_f \delta + L_{fp} \varepsilon_p + L_{ff} \varepsilon_f.   (2.6)

Assuming a diffuse prior, we approximate the posterior distribution of \delta by a Gaussian distribution with mean \hat{\delta} and variance

    \hat{\sigma}^2 (\tilde{Z}_p' \tilde{Z}_p)^{-1},   (2.7)

where L_{pp} \tilde{Z}_p = Z_p. To construct PIs we need the variance of the forecast error for future time periods. In the development of this variance, first solve for \varepsilon_p in (2.5) to get

    \varepsilon_p = L_{pp}^{-1} (y*_p - Z_p \delta).   (2.8)

Then substitute (2.8) into (2.6) to find y*_f = (Z_f - L_{fp} \tilde{Z}_p) \delta + L_{ff} \varepsilon_f + L_{fp} L_{pp}^{-1} y*_p. This equation can be rewritten in the following form:

    y*_f = \tilde{Z}_f \delta + L_{ff} \varepsilon_f + L_{fp} L_{pp}^{-1} y*_p,   (2.9)

where \tilde{Z}_f = Z_f - L_{fp} \tilde{Z}_p. Because \delta and \varepsilon_f are independent, (2.9) implies that the forecast-error variance matrix is approximately \hat{\sigma}^2 [\tilde{Z}_f (\tilde{Z}_p' \tilde{Z}_p)^{-1} \tilde{Z}_f' + L_{ff} L_{ff}']. As a result, the distribution p(y_f | \delta, x_0, \sigma, y_p) is approximated by a Gaussian density with the point predictions as its mean and this matrix as its variance.
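In practice, Z and L can be obtained by numerical differentiation of the smoothing recursion, after which the quantities in (2.7)-(2.9) reduce to triangular solves. A minimal sketch, assuming the partitioned blocks Z_p, Z_f, L_pp, L_fp, and L_ff are already available as NumPy arrays:

import numpy as np
from scipy.linalg import solve_triangular

def la_variances(Z_p, Z_f, L_pp, L_fp, L_ff, sigma2):
    # Z~_p solves L_pp Z~_p = Z_p; the posterior variance (2.7) is
    # sigma^2 (Z~_p'Z~_p)^{-1}; (2.9) then gives the forecast-error variance.
    Zt_p = solve_triangular(L_pp, Z_p, lower=True)
    post_var = sigma2 * np.linalg.inv(Zt_p.T @ Zt_p)      # Eq. (2.7)
    Zt_f = Z_f - L_fp @ Zt_p                              # Z~_f in (2.9)
    forecast_var = Zt_f @ post_var @ Zt_f.T + sigma2 * (L_ff @ L_ff.T)
    return post_var, forecast_var

The diagonal of forecast_var supplies the per-horizon variances from which Gaussian PIs follow in the usual way.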
2.2.4 Bayesian Simulation Method (BS). For an invertible process, the dependence of the predictive distribution on x_0 is slight; moreover, this relative insensitivity led to some numerical instabilities, particularly when the seasonal cycle has length m = 12. For both reasons, we decided to focus attention on the variations in \delta and \sigma only. Thus, we now reformulate (2.10) for the current problem so that x_0 is held fixed at \hat{x}_0; that is, we perform a simulation to arrive at the predictive distribution, which has the form

    p(y_f | y_p) = \int \int p(y_f | \delta, \hat{x}_0, \sigma, y_p) p(\delta, \sigma | y_p) d\delta d\sigma.

A Monte Carlo integration method is employed to evaluate p(y_f | y_p) as follows; the steps are entirely the same as those described by Ord, Koehler, and Snyder (1997):

1. p(\sigma | \hat{\delta}, y_p) is approximated by an inverted gamma distribution. A value of \sigma^2 is randomly generated from the approximating distribution.

2. p(\delta | \sigma, \hat{\delta}, y_p) is approximated by a Gaussian distribution with mean \hat{\delta} and variance matrix (2.7). A value of \delta is randomly generated from this approximating distribution, and any components that violate the constraints on the parameters are adjusted. For example, for the additive structural model in Example 1.2, we saw that the smoothing parameters a must be nonnegative. Negative values are truncated to 0. (We ignore the other restrictions on the smoothing parameters for the HW method because we rarely found them to be binding in practice.)

3. The distribution p(y_f | \delta, \hat{x}_0, \sigma, y_p) is approximated by a synthetic sample of M values of the vector y_f generated from the model in Example 1.2. The future values of \varepsilon_t required for the calculation of each instance of y_f are themselves generated from an N(0, \sigma^2) distribution. Thus, for each y_f we estimate the pdf by

    \hat{p}(y_f | y_p) = M^{-1} \sum_{i=1}^{M} p(y_f | \delta_i, \hat{x}_0, \sigma_i, y_p),

where the M sets of values of \delta_i and \sigma_i are generated in accordance with the preceding steps 1-3.

4. PIs are constructed directly from the sample of the y_f for a specified confidence level P by deleting those M(1 - P) sample values that are farthest from the associated point prediction for each period t. The smallest and largest values that remain in the culled sample are used as the lower and upper boundaries of the prediction intervals.
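A compact sketch of steps 1-4 follows. It is schematic rather than a definitive implementation: only the smoothing vector a is treated as the unknown part of \delta, the inverted gamma parameters (ig_shape, ig_scale) and the posterior variance post_var from (2.7) are assumed to be given, and step 4 is implemented with symmetric tail quantiles rather than the culling rule described above.

import numpy as np

def bayesian_sim_pi(h, F, a_hat, x_n, post_var, ig_shape, ig_scale,
                    r, M=5000, P=0.95, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    paths = np.empty((M, r))
    for i in range(M):
        sigma2 = 1.0 / rng.gamma(ig_shape, 1.0 / ig_scale)  # step 1: inverted gamma draw
        a_i = rng.multivariate_normal(a_hat, post_var)      # step 2: Gaussian draw ...
        a_i = np.maximum(a_i, 0.0)                          # ... truncate negatives to 0
        x = x_n.copy()
        for j in range(r):                                  # step 3: simulate y_f
            e = rng.normal(0.0, np.sqrt(sigma2))
            paths[i, j] = h @ x + e
            x = F @ x + a_i * e
    lower = np.quantile(paths, (1.0 - P) / 2.0, axis=0)     # step 4 (quantile variant)
    upper = np.quantile(paths, (1.0 + P) / 2.0, axis=0)
    return lower, upper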
The method proposed is similar to that of Thompson and Miller (1986) for stationary AR(p) processes. Thompson (personal communication to Koehler) indicated that they did not extend their method to ARIMA schemes. Our method has two key differences: (1) we impose constraints on the parameter estimates to ensure stationarity after differencing, and (2) we use ML estimators. We have not compared the performance of our procedures with those of Thompson and Miller, since our primary interest lay in the construction of PIs for complex models like HW with a strong MA component. Since least squares methods are highly efficient for AR schemes except near the boundaries of the parameter