Econometrica, Vol. 36, No. 2 (April, 1968)

POOLING OF TIME SERIES AND CROSS SECTION DATA

BY V. K. CHETTY

Estimates of parameters from cross section data are often introduced into the time series regression as known with certainty, which leads to conditional estimates of the time series regression. This paper develops a method of pooling cross section and time series data from the Bayesian point of view, to estimate all the parameters simultaneously. It is shown that the parameters which are common to both regressions will, on the average, have sharper posterior distributions. It is also demonstrated that the traditional method often leads to underestimates of the standard errors of the time series estimates. The method is applied to estimate a statistical demand function for the U.S. based on cross section and time series data given in Tobin [18].

1. INTRODUCTION

IN THIS PAPER we consider the problem of pooling time series and cross section data.
Several interesting attempts have been made in the past to combine time series and cross section data to estimate the parameters of a model; see Klein [6], Marschak [8], Solow [12], Staehle [13], Stone [14], Tobin [18], and Wold and Jureen [19]. In all these studies, cross section data are used to obtain estimates of some of the parameters of the model. These estimates are then introduced into the time series regression as known with certainty to estimate the other parameters of the model. Thus the estimates obtained from the time series data are conditional upon the estimates obtained from the cross section data, and hence the time series regression yields only conditional estimates of the parameters. This fact is not often mentioned in the literature. Tobin [18], however, notes that a refinement of this method is certainly necessary to avoid introducing the estimates of the parameters as known with certainty. Kuh [7] has also pointed out that cross section estimates may often be biased and can therefore contaminate the combined estimates if they are introduced as point estimates. From the sampling theory point of view, Durbin [3], Theil and Goldberger [16], and Theil [15] have suggested methods of introducing extraneous information about some coefficients in a regression model. It would undoubtedly be better to introduce information from cross section data into the time series regression employing these methods rather than introducing it with certainty. But it should be noted that these estimation methods have only an asymptotic justification.

In this paper, a method for combining time series and cross section data is proposed from the Bayesian point of view.¹ With this method, the coefficients of the cross section and time series regressions are estimated simultaneously, and exact finite sample results are obtained. It is also shown that the posterior distributions of the parameters common to both the time series and cross section regressions will on the average be sharper than those obtained using the traditional method.

The order of presentation is as follows. In Section 2, we specify the model along the lines suggested by Kuh [7] and analyze it from the Bayesian point of view. Section 3 contains an application of the estimation procedure developed in this paper.

¹ The research work relating to this paper was done while the author was at the University of Wisconsin, Milwaukee. He would like to express his appreciation for the research support made available by the Graduate School of the University of Wisconsin, Milwaukee. He would also like to acknowledge Professors A. Zellner and Jacques Drèze for their valuable comments and Mr. David A. Vick for his assistance in computing.
2. SPECIFICATION AND ANALYSIS OF THE MODEL

Specification of the Model

We assume that we have one cross section sample relating to N economic units and a time sequence of aggregates for T periods.
These units may be firms, consumers, regions, or any other statistical entity. The cross section regression model is

(2.1)    y = X₁β₁ + X₂β₂ + u,

where y is an N × 1 vector of observations on the dependent variable relating to different economic units at a point in time, and X₁ and X₂ are matrices of observations on the independent variables. The orders of X₁ and X₂ are assumed to be N × M and N × (K − M). The vectors β₁ and β₂ are vectors of coefficients of sizes M × 1 and (K − M) × 1 respectively. The vector u is an N × 1 vector of disturbances. The time series model is

(2.2)    Y = Z₁β₁ + Z₂γ₂ + w,

where Y is a T × 1 vector, the elements of which are the time series aggregates of observations on the micro units' dependent variable; Z₁ and Z₂ are matrices of orders T × M and T × (L − M), the elements of which are the time series aggregates of observations on the variables that vary through time; β₁ and γ₂ are the vectors of coefficients of sizes M × 1 and (L − M) × 1 respectively. The vector β₁ is assumed to be the same as that of regression (2.1). The vector w is a T × 1 vector of random disturbances.

Regarding the properties of the disturbance terms, we assume, following Kuh [7], that each residual associated with a unit may be decomposed into three parts: the individual effect, the pure time effect, and a remainder. Thus, the error term relating to the ith unit at time t, uᵢₜ, is decomposed as uᵢₜ = vᵢ + rₜ + eᵢₜ. Further, we assume that vᵢ, rₜ, and eᵢₜ have zero means and that vᵢ and rₜ are independent. The eᵢₜ's are assumed to be independent and to have common variance σₑ². The jth element of w in the time series regression is given by

    wⱼ = Σᵢ₌₁ᴺ uᵢⱼ.

It can be seen that if there is no pure time effect or individual effect, then wⱼ = Σᵢ₌₁ᴺ eᵢⱼ. Hence the w's have the common variance Nσₑ². In other cases, the variance of the time series disturbance term will not be a scalar multiple of the cross section disturbance variance. In what follows, we assume that the u's and w's are normally distributed with zero means and variances σ² and σ₁² respectively, and we analyze the model under two different assumptions about their variances: (i) σ = kσ₁, where k is a known constant; (ii) σ and σ₁ are not directly related.
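A quick numerical check of this variance claim, with hypothetical values for N and for the variance of the remainder terms, also shows why dividing the aggregates by √N gives the k = 1 case treated next:

```python
import numpy as np

# Illustrative check with hypothetical sizes: with no individual or time effects,
# w_j = sum_i e_ij has variance N * sigma_e^2, so aggregates divided by sqrt(N)
# have the same variance as the micro disturbances (the k = 1 case).
rng = np.random.default_rng(0)
N, T, sigma_e = 50, 100_000, 1.5          # many periods only so the variances are estimated well

e = rng.normal(0.0, sigma_e, size=(N, T)) # remainder terms e_ij
w = e.sum(axis=0)                         # one time series disturbance per period

print(np.var(w), N * sigma_e**2)                  # close to N * sigma_e^2
print(np.var(w / np.sqrt(N)), sigma_e**2)         # close to sigma_e^2
```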
Analysis of the Model when σ = kσ₁

Without loss of generality we assume that k = 1. This will be the case when Y and Z represent time series aggregates of the observations divided by √N. The joint likelihood function of the coefficients in our model (2.1) and (2.2) is given by

(2.3)    l(β, γ, σ | data) ∝ σ^{−(N+T)} exp{ −[(y − Xβ)′(y − Xβ) + (Y − Zγ)′(Y − Zγ)] / 2σ² },

where X = (X₁ X₂), Z = (Z₁ Z₂), β′ = (β₁′ β₂′), and γ′ = (β₁′ γ₂′). For simplicity in notation we shall use the symbol Q(β, β̂, A) throughout this paper to denote a quadratic form in the variables β centered at β̂ and with matrix A, namely Q(β, β̂, A) = (β − β̂)′A(β − β̂). In this notation, the likelihood function (2.3) can be written as

(2.4)    l(β, γ, σ | data) ∝ σ^{−(N+T)} exp{ −[Ns² + Q(β, β̂, P) + Ts₁² + Q(γ, γ̂, R)] / 2σ² },

where P = X′X, β̂ = P⁻¹X′y, Ns² = (y − Xβ̂)′(y − Xβ̂), R = Z′Z, γ̂ = R⁻¹Z′Y, and Ts₁² = (Y − Zγ̂)′(Y − Zγ̂).
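As a computational illustration, the sample quantities just defined can be formed directly from the two samples; the arrays below are hypothetical stand-ins of the stated dimensions, not the paper's data.

```python
import numpy as np

# Illustrative computation of the sample quantities entering (2.4).
# All data here are hypothetical placeholders.
rng = np.random.default_rng(1)
N, T, K, L, M = 200, 40, 3, 3, 2            # hypothetical sizes; the first M coefficients are shared

X = rng.normal(size=(N, K))                 # cross section regressors (X1 : X2)
Z = rng.normal(size=(T, L))                 # time series regressors  (Z1 : Z2)
y = X @ np.array([0.8, -0.5, 2.0]) + rng.normal(size=N)    # coefficients (beta1', beta2')'
Y = Z @ np.array([0.8, -0.5, 1.2]) + rng.normal(size=T)    # coefficients (beta1', gamma2')'

P = X.T @ X                                 # P = X'X
beta_hat = np.linalg.solve(P, X.T @ y)      # beta_hat = P^{-1} X'y
Ns2 = (y - X @ beta_hat) @ (y - X @ beta_hat)       # N s^2

R = Z.T @ Z                                 # R = Z'Z
gamma_hat = np.linalg.solve(R, Z.T @ Y)     # gamma_hat = R^{-1} Z'Y
Ts12 = (Y - Z @ gamma_hat) @ (Y - Z @ gamma_hat)    # T s_1^2
```

Only P, β̂, Ns², R, γ̂, and Ts₁² enter (2.4), so the posterior analysis below works entirely from these summaries.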
As regards the prior distribution for the parameters of the model, we assume that we know little, a priori, about them and follow Jeffreys [5], Savage [11], and Box and Tiao [1] by assuming that log σ and the elements of β₁, β₂, and γ₂ are locally, uniformly, and independently distributed. That is, our prior distribution is

(2.5)    p(β₁, β₂, γ₂, σ) ∝ 1/σ.

Using Bayes' Theorem, we combine the likelihood function in (2.4) and the prior distribution in (2.5) to obtain the following joint posterior distribution for the parameters:

(2.6)    p(β, γ, σ | data) ∝ σ^{−(N+T+1)} exp{ −[Ns² + Q(β, β̂, P) + Ts₁² + Q(γ, γ̂, R)] / 2σ² }.

The marginal posterior distribution of β₁, the vector of coefficients that is common to both the cross section and time series regressions, can be obtained as follows. The matrices P and R are partitioned as

    P = X′X = ( P₁₁  P₁₂ ; P₂₁  P₂₂ )    and    R = Z′Z = ( R₁₁  R₁₂ ; R₂₁  R₂₂ ),

where Pᵢⱼ = Xᵢ′Xⱼ and Rᵢⱼ = Zᵢ′Zⱼ. The quadratic form Q(β, β̂, P) in (2.6) can be rewritten as

    (β − β̂)′P(β − β̂) = Q(β₂, β̂₂ − P₂₂⁻¹P₂₁(β₁ − β̂₁), P₂₂) + Q(β₁, β̂₁, P₁₁ − P₁₂P₂₂⁻¹P₂₁).

Here the quadratic form Q(β, β̂, P) is split into two quadratic forms, one containing β₁ only and the other containing β₂ and β₁. Recognizing this, it is easy to integrate out β₂. Similarly, the quadratic form Q(γ, γ̂, R) can be split into two quadratic forms, one containing β₁ only and the other containing β₁ and γ₂. Now β₂ and γ₂ can be integrated out using the properties of the multivariate normal distribution; hence we have the marginal distribution of β₁ and σ as

(2.7)    p(β₁, σ | data) ∝ σ^{−(T+N−K−L+2M+1)} exp{ −[Ns² + Q(β₁, β̂₁, V₁) + Ts₁² + Q(β₁, γ̂₁, V₂)] / 2σ² },

where V₁ = P₁₁ − P₁₂P₂₂⁻¹P₂₁, V₂ = R₁₁ − R₁₂R₂₂⁻¹R₂₁, and β̂₁ and γ̂₁ denote the first M elements of β̂ and γ̂ respectively. Equation (2.7) can be rewritten as

    p(β₁, σ | data) ∝ σ^{−(T+N−K−L+2M+1)} exp{ −[Ns² + Ts₁² + Q(β₁, β̄₁, V)] / 2σ² },

where V = V₁ + V₂ and β̄₁ = V⁻¹(V₁β̂₁ + V₂γ̂₁). Integrating out σ, we have

(2.8)    p(β₁ | data) ∝ [Ns² + Ts₁² + Q(β₁, β̄₁, V)]^{−(T+N−K−L+2M)/2}.

This is in the form of a multivariate t distribution. Hence each of the elements of β₁ will have a univariate t distribution. In a similar manner, it can be shown that β₂ and γ₂ have multivariate t distributions. Thus exact marginal posterior distributions of the coefficients of the cross section and time series regressions are obtained simultaneously.
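In computational terms, the centre β̄₁ of the pooled posterior (2.8) is a matrix-weighted average of the cross section and time series estimates of β₁, with weights V₁ and V₂; the following sketch uses hypothetical data.

```python
import numpy as np

# Sketch of the pooling step with hypothetical data: V1 and V2 are the Schur
# complements of the partitioned moment matrices P and R, and the pooled
# posterior (2.8) is centred at beta1_bar = V^{-1}(V1 beta1_hat + V2 gamma1_hat).
rng = np.random.default_rng(2)
N, T, K, L, M = 200, 40, 3, 3, 2

X = rng.normal(size=(N, K))
Z = rng.normal(size=(T, L))
y = X @ np.array([0.8, -0.5, 2.0]) + rng.normal(size=N)
Y = Z @ np.array([0.8, -0.5, 1.2]) + rng.normal(size=T)

P, R = X.T @ X, Z.T @ Z
beta_hat = np.linalg.solve(P, X.T @ y)      # cross section estimate (first M entries estimate beta1)
gamma_hat = np.linalg.solve(R, Z.T @ Y)     # time series estimate   (first M entries estimate beta1)

V1 = P[:M, :M] - P[:M, M:] @ np.linalg.solve(P[M:, M:], P[M:, :M])   # P11 - P12 P22^{-1} P21
V2 = R[:M, :M] - R[:M, M:] @ np.linalg.solve(R[M:, M:], R[M:, :M])   # R11 - R12 R22^{-1} R21
V = V1 + V2

beta1_bar = np.linalg.solve(V, V1 @ beta_hat[:M] + V2 @ gamma_hat[:M])
print(beta1_bar)    # matrix-weighted average of the two estimates of beta1
```

When one sample is much more informative about β₁ than the other, its weight matrix dominates and β̄₁ stays close to that sample's estimate.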
In this procedure, the information contained in both the cross section and time series data is used to derive the marginal posterior distribution of any element of β₁. Hence the posterior distribution of any element of β₁ obtained this way is likely to have a smaller variance than that obtained from the two-stage procedure mentioned before.

This can be established analytically as follows. Before the samples are drawn, s² and s₁² should be regarded as random variables. Hence the expected values of the variances of the elements of β₁ are given by the leading diagonal elements of σ²V₁⁻¹ when β₁ is estimated from the cross section data alone. On the other hand, if β₁ is estimated from both time series and cross section data, the corresponding variances are the leading diagonal elements of σ²V⁻¹. In order to show that the variances in the second case are smaller or the same, it is enough to show that (V₁⁻¹ − V⁻¹) is positive semidefinite. This follows easily from a well known theorem (see Goldberger [4]) which states that if A = B + C, where B is positive definite and C is nonnegative definite, then B⁻¹ − A⁻¹ is nonnegative definite. In our case V = V₁ + V₂, and V₁ and V₂ are positive definite. Of course this is not surprising, since in the second case we are using more observations to estimate β₁.
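The matrix result invoked here is easily checked numerically; the positive definite matrices below stand in for V₁ and V₂ and are otherwise arbitrary.

```python
import numpy as np

# Numerical illustration of the inequality used above: with V = V1 + V2 and both
# terms positive definite, V1^{-1} - V^{-1} is nonnegative definite, so pooling
# cannot increase the expected posterior variances of the elements of beta1.
rng = np.random.default_rng(3)
M = 4
A1 = rng.normal(size=(M, M))
A2 = rng.normal(size=(M, M))
V1 = A1 @ A1.T + M * np.eye(M)       # positive definite stand-in for V1
V2 = A2 @ A2.T + M * np.eye(M)       # positive definite stand-in for V2
V = V1 + V2

diff = np.linalg.inv(V1) - np.linalg.inv(V)
print(np.linalg.eigvalsh(diff))      # all eigenvalues nonnegative (up to rounding)
```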
It may also be noted that in the traditional method of combining cross section and time series data, the conditional distribution of γ₂, namely p(γ₂ | β₁ = β̂₁), is used for making inferences about γ₂, while the marginal distribution of γ₂ should be used whenever β₁ is not known with certainty. As many writers have pointed out before, the marginal distribution p(γ₂) can be regarded as a suitably weighted average of the conditional distributions p(γ₂ | β₁), with p(β₁ | data) serving as the weight function. Unless the conditional distribution p(γ₂ | β₁) is insensitive to changes in the values of β₁, it is clear that assuming β₁ equals some fixed value (say β̂₁, the estimate obtained from the cross section regression) could lead to a posterior distribution of γ₂ far different from that given in (2.8). Thus, introducing the estimates of some of the parameters from the cross section regression into the time series regression as known with certainty can vitally affect our inference about the remaining parameters. Also, since the variance of a conditional distribution of any parameter will be smaller than or at most equal to that of its marginal distribution, the traditional method will lead to underestimates of the variances of the posterior distributions of the elements of γ₂. It can easily be seen that our procedure is equivalent to a Bayesian estimation of the time series regression using, as the prior distribution for β₁ and σ, the posterior distribution obtained from the cross section data.
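The effect of averaging over p(β₁ | data) rather than conditioning on β₁ = β̂₁ can be seen in a small Monte Carlo sketch; the data below are hypothetical, and the cross section posterior of β₁ is replaced by a normal approximation purely for simplicity.

```python
import numpy as np

# Monte Carlo sketch with hypothetical data: drawing beta1 from its cross-section
# posterior, instead of fixing beta1 = beta1_hat, spreads out the implied location
# of gamma2; the traditional conditional analysis ignores exactly this component.
rng = np.random.default_rng(4)
N, T = 200, 40

x1 = rng.normal(size=N)                          # cross section regressor
y = 0.8 * x1 + rng.normal(size=N)                # cross section data
Z1 = rng.normal(size=T)                          # time series regressor sharing beta1
Z2 = np.ones(T)                                  # time-series-only regressor (a constant here)
Y = 0.8 * Z1 + 1.2 * Z2 + rng.normal(size=T)     # time series data

beta1_hat = (x1 @ y) / (x1 @ x1)                 # cross section estimate of beta1
s2 = np.sum((y - beta1_hat * x1) ** 2) / (N - 1) # residual variance estimate
sd_beta1 = np.sqrt(s2 / (x1 @ x1))               # posterior standard deviation (normal approximation)

def gamma2_given(b):
    """Conditional estimate of gamma2: regress Y - Z1*b on Z2."""
    return (Z2 @ (Y - Z1 * b)) / (Z2 @ Z2)

draws = rng.normal(beta1_hat, sd_beta1, size=5000)
print(gamma2_given(beta1_hat))                       # the traditional plug-in value
print(np.std([gamma2_given(b) for b in draws]))      # spread ignored by plugging in beta1_hat
```

The second printed number is the spread in the location of γ₂ that comes solely from the uncertainty about β₁, precisely the component discarded when β̂₁ is treated as known.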
Analysis of the Model when σ and σ₁ Are Not Directly Related

As pointed out before, the error variances will not be directly related if either the individual effect or the time effect is present in the disturbance term uᵢₜ. In such cases we must analyze the model assuming that σ and σ₁ are different. The posterior distribution of the parameters of the model, for prior distributions similar to the one used before, is

(2.9)    p(β, γ, σ, σ₁ | data) ∝ σ^{−(N+1)} σ₁^{−(T+1)} exp{ −[Ns² + Q(β, β̂, P)]/2σ² − [Ts₁² + Q(γ, γ̂, R)]/2σ₁² }.

Integrating with respect to σ and σ₁, we have the posterior distribution of β and γ as

(2.10)    p(β, γ | data) ∝ [Ns² + Q(β, β̂, P)]^{−N/2} [Ts₁² + Q(γ, γ̂, R)]^{−T/2}.

This is in the form of a product of two multivariate t distributions. In order to get the normalizing constant in (2.10), an integral of dimension (K + L − M) has to be evaluated. This may lead to some practical difficulties in the numerical evaluation of the posterior distribution, particularly when (K + L − M) is large. But the posterior distributions can be approximated numerically using methods similar to those suggested by Tiao and Zellner [17] and Chetty [2], as follows.

If we are interested in the marginal distribution of β₁, we can integrate β₂ and γ₂ out of (2.10) using the properties of the multivariate t distribution. We now have

(2.11)    p(β₁ | data) ∝ [Ns² + Q(β₁, β̂₁, V₁)]^{−(N−K+M)/2} [Ts₁² + Q(β₁, γ̂₁, V₂)]^{−(T−L+M)/2}.

Since s² and s₁² are known quantities, they can be suppressed by setting D = νV₁/(Ns²) and G = ν₁V₂/(Ts₁²), where ν = N − K and ν₁ = T − L. We can rewrite (2.11) as

(2.12)    p(β₁ | data) ∝ [1 + Q(β₁, β̂₁, D)/ν]^{−(ν+M)/2} [1 + Q(β₁, γ̂₁, G)/ν₁]^{−(ν₁+M)/2}.
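For a single common coefficient (M = 1), the product form (2.12) can still be handled by brute-force quadrature; the scalar values of β̂₁, γ̂₁, D, G, ν, and ν₁ below are hypothetical.

```python
import numpy as np

# Sketch for M = 1 with assumed scalar inputs: the marginal posterior (2.12) is a
# product of two Student-t kernels in beta1, normalized here by direct quadrature.
beta1_hat, gamma1_hat = 0.75, 0.90   # cross section and time series estimates of beta1 (hypothetical)
D, G = 40.0, 25.0                    # D = nu V1/(N s^2), G = nu1 V2/(T s1^2), scalars in this sketch
nu, nu1, M = 180, 30, 1              # nu = N - K, nu1 = T - L

grid = np.linspace(0.4, 1.3, 2001)
dx = grid[1] - grid[0]
kernel = ((1 + D * (grid - beta1_hat) ** 2 / nu) ** (-(nu + M) / 2)
          * (1 + G * (grid - gamma1_hat) ** 2 / nu1) ** (-(nu1 + M) / 2))

density = kernel / (kernel.sum() * dx)         # numerically normalized posterior of beta1
print((grid * density).sum() * dx)             # posterior mean lies between the two estimates
```

In higher dimensions such grids quickly become impractical, which is what makes the series approximation developed next useful.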
The first factor in (2.12) can be written as

(2.13)    [1 + Q(β₁, β̂₁, D)/ν]^{−(ν+M)/2} = exp{ −Q(β₁, β̂₁, D)/2 } exp{ Q(β₁, β̂₁, D)/2 − [(ν+M)/2] log[1 + Q(β₁, β̂₁, D)/ν] }.

Expanding the second factor of (2.13) in powers of ν⁻¹, we obtain

(2.14)    [1 + Q(β₁, β̂₁, D)/ν]^{−(ν+M)/2} = exp{ −Q(β₁, β̂₁, D)/2 } [1 + p₁/ν + p₂/ν² + ⋯],

where the p's are polynomials in Q(β₁, β̂₁, D). Similarly the second factor in (2.12) can be written as

(2.15)    [1 + Q(β₁, γ̂₁, G)/ν₁]^{−(ν₁+M)/2} = exp{ −Q(β₁, γ̂₁, G)/2 } [1 + g₁/ν₁ + g₂/ν₁² + ⋯],

where the g's are similar to the p's. Hence we can express the posterior distribution in (2.12) as

(2.16)    p(β₁ | data) ∝ exp{ −Q(β₁, β̃₁, B)/2 } [1 + p₁/ν + p₂/ν² + ⋯][1 + g₁/ν₁ + g₂/ν₁² + ⋯],

where B = D + G and β̃₁ = B⁻¹(Dβ̂₁ + Gγ̂₁). In order to get the normalizing constant, the expression in (2.16) can be integrated term by term. This involves evaluating mixed moments of the quadratic forms Q(β₁, β̂₁, D) and Q(β₁, γ̂₁, G).