Topic 2: Classical Multivariate Regression Models
Advanced Econometrics (I)
Dong Chen
School of Economics, Peking University

1 Model Specification

Our interest lies in the following model:

    y = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_K x_K + \varepsilon.    (1.1)

Model (1.1) expresses a relationship for the population. To estimate the parameters in the model, we obtain a sample of size $n$. For the $i$-th observation, we can write the model as

    y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_K x_{iK} + \varepsilon_i    (1.2)

by adding the observation index. Model (1.2) can be written more concisely using matrix notation,

    y_i = x_i'\beta + \varepsilon_i,    (1.3)

where $x_i = (x_{i1}, x_{i2}, \ldots, x_{iK})'$ and $\beta = (\beta_1, \beta_2, \ldots, \beta_K)'$. For the whole sample of $n$ observations, define

    X_{(n \times K)} = \begin{pmatrix} x_1' \\ x_2' \\ \vdots \\ x_n' \end{pmatrix}
                     = \begin{pmatrix} x_{11} & \cdots & x_{1K} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{nK} \end{pmatrix},
    \qquad
    y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}.

Then we can write the model for the sample as

    y = X\beta + \varepsilon.    (1.4)

Expressing the model in matrix notation makes it convenient to state the assumptions underlying the model and to derive the formula for the OLS estimator. We will switch between the matrix notation and the scalar notation as in (1.1) from time to time as appropriate. Note that the matrix $X$ usually contains a column of 1s, so the model will have a constant term.
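To fix ideas, here is a minimal numpy sketch of the matrix form (1.4): the data matrix $X$ stacks a column of 1s (the constant term) with the regressors, and $y$ stacks the $n$ observations of the dependent variable. The variable names and the simulated numbers are illustrative only, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

n, K = 100, 3                      # sample size and number of regressors (incl. constant)
x2 = rng.normal(size=n)            # hypothetical regressors
x3 = rng.normal(size=n)

# Data matrix X (n x K): a column of 1s plus the regressors, as in (1.4)
X = np.column_stack([np.ones(n), x2, x3])

beta = np.array([1.0, 0.5, -2.0])  # illustrative population parameters
eps = rng.normal(size=n)           # disturbances
y = X @ beta + eps                 # y = X beta + epsilon

print(X.shape, y.shape)            # (100, 3) (100,)
```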

2 Assumptions

Assumption 1: Linearity. The relationship has a linear functional form, $y = X\beta + \varepsilon$.

Note that linearity here refers to linearity in parameters, not in variables. For example, among the following two models, (2.1) is linear while (2.2) is non-linear:

    y_i = \beta_1 + \beta_2 x_i + \beta_3 x_i^2 + \beta_4 x_i^3 + \varepsilon_i,    (2.1)
    y_i = \beta_1 + \beta_2 x_i + \beta_3 x_i^2 + \beta_3^2 x_i^3 + \varepsilon_i.    (2.2)

Assumption 2: Full rank. $X$ is an $n \times K$ matrix with rank $K$, i.e., $\mathrm{rank}(X) = K \le n$. That is, there is no exact linear relationship among the independent variables and there are at least $K$ observations. This assumption implies that $X'X$ also has full rank $K$.

If $X$ does not have full rank, then we have the perfect collinearity problem.

Example 1: Suppose

    y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + \varepsilon_i,    (2.3)

and that

    x_{i3} = c_1 + c_2 x_{i2}.    (2.4)

So $\mathrm{rank}(X) = 2$. Substituting (2.4) into (2.3), we have

    y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 (c_1 + c_2 x_{i2}) + \varepsilon_i
        = (\beta_1 + \beta_3 c_1) + (\beta_2 + \beta_3 c_2) x_{i2} + \varepsilon_i
        = \alpha_1 + \alpha_2 x_{i2} + \varepsilon_i.    (2.5)

We can determine $c_1$ and $c_2$ exactly from (2.4) and we can estimate $\alpha_1$ and $\alpha_2$ from (2.5), but we cannot identify $\beta_1$, $\beta_2$, and $\beta_3$ separately.
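A quick numerical illustration of Example 1 (a sketch with made-up constants $c_1 = 1$, $c_2 = 2$): when one regressor is an exact linear function of another, $X$ loses rank and $X'X$ becomes singular, so the normal equations have no unique solution.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50

x2 = rng.normal(size=n)
x3 = 1.0 + 2.0 * x2                        # x_{i3} = c1 + c2 * x_{i2}: perfect collinearity
X = np.column_stack([np.ones(n), x2, x3])

print(np.linalg.matrix_rank(X))            # 2, not 3
print(np.linalg.matrix_rank(X.T @ X))      # 2 as well: X'X is singular
# np.linalg.inv(X.T @ X) would fail or be numerically meaningless here
```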

Assumption 3: Strict exogeneity.

    E(\varepsilon \mid X) = \big( E(\varepsilon_1 \mid X), E(\varepsilon_2 \mid X), \ldots, E(\varepsilon_n \mid X) \big)' = 0.

The expression $E(\varepsilon \mid X) = 0$ is shorthand for

    E(\varepsilon_i \mid x_1, \ldots, x_n) = 0, \quad i = 1, 2, \ldots, n.    (2.6)

In general, $E(\varepsilon_i \mid x_1, \ldots, x_n)$ is a (possibly nonlinear) function of $x_1, \ldots, x_n$, that is, of $nK$ random variables altogether. Assumption 3 states that this function has the constant value 0. Assumption 3 has the following implications.

- The unconditional mean of $\varepsilon_i$ is also zero:

    E(\varepsilon_i) = E[E(\varepsilon_i \mid X)] = E(0) = 0.    (2.7)

- $x_j$ and $\varepsilon_i$, $\forall i, j = 1, 2, \ldots, n$, are orthogonal:

    E(x_j \varepsilon_i) = \big( E(x_{j1}\varepsilon_i), E(x_{j2}\varepsilon_i), \ldots, E(x_{jK}\varepsilon_i) \big)' = 0.    (2.8)

Or, more compactly,

    E(X'\varepsilon) = 0.    (2.9)

To see this, by the law of iterated expectations,

    E(\varepsilon_i \mid x_{jk}) = E[E(\varepsilon_i \mid X) \mid x_{jk}] = 0.

Using the property again and by the linearity of conditional expectations, we have

    E(x_{jk}\varepsilon_i) = E[E(x_{jk}\varepsilon_i \mid x_{jk})] = E[x_{jk} E(\varepsilon_i \mid x_{jk})] = 0.

- $\varepsilon_i$ and $X$ are uncorrelated:

    Cov(\varepsilon_i, x_{jk}) = E(x_{jk}\varepsilon_i) - E(x_{jk})E(\varepsilon_i) = E(x_{jk}\varepsilon_i) = 0.    (2.10)

- Regression of $y$:

    E(y \mid X) = X\beta.    (2.11)

The conditional mean of $y$ given $X$ is called the regression of the model.

As we will show later, some properties of the OLS estimator depend crucially on the assumption of strict exogeneity. However, this assumption may not be satisfied in many situations. For instance, consider a first-order autoregressive model in a time-series context,

    y_t = \beta y_{t-1} + \varepsilon_t.    (2.12)

Since $\varepsilon_t$ is a component of $y_t$, it is necessarily correlated with $y_t$. But $y_t$ will be the regressor in the next period, $t+1$. Hence, $\varepsilon_t$ is correlated with the regressor of the next period, which violates strict exogeneity.
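A small simulation sketch of the point about (2.12) (the parameter value and sample length are made up): in an AR(1), the period-$t$ disturbance is built into $y_t$, which becomes the regressor at $t+1$, so the sample correlation between the two is far from zero.

```python
import numpy as np

rng = np.random.default_rng(2)
T, beta = 10_000, 0.8

eps = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = beta * y[t - 1] + eps[t]          # y_t = beta * y_{t-1} + eps_t

# eps_t is a component of y_t, which is the regressor in period t+1,
# so strict exogeneity E(eps | all regressors) = 0 cannot hold.
print(np.corrcoef(eps[1:], y[1:])[0, 1])     # clearly positive, roughly 0.6 here
```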

Also note that, under random sampling, all observations are independent of each other. It then follows that the conditional density $f(\varepsilon_i \mid x_i, X_{-i})$, where $X_{-i}$ denotes all the other $x$'s except $x_i$, is equal to $f(\varepsilon_i \mid x_i)$. (This result follows from the definition of conditional density, in which the marginal density functions that involve only $X_{-i}$ cancel out in the numerator and the denominator of the formula.) This further implies that

    E(\varepsilon_i \mid X) = E(\varepsilon_i \mid x_i).    (2.13)

Hence, under random sampling, the strict exogeneity assumption can be reduced to

    E(\varepsilon_i \mid x_i) = 0, \quad \forall i.    (2.14)

Condition (2.14) is also called contemporaneous exogeneity.

Assumption 4: Spherical disturbances. $E(\varepsilon\varepsilon' \mid X) = \sigma^2 I_n$ (the "variance-covariance matrix" of $\varepsilon$):

    E(\varepsilon\varepsilon' \mid X)
    = E\left[ \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}
              \begin{pmatrix} \varepsilon_1 & \varepsilon_2 & \cdots & \varepsilon_n \end{pmatrix} \,\Big|\, X \right]    (2.15)

    = \begin{pmatrix} E(\varepsilon_1^2 \mid X) & \cdots & E(\varepsilon_1\varepsilon_n \mid X) \\ \vdots & & \vdots \\ E(\varepsilon_1\varepsilon_n \mid X) & \cdots & E(\varepsilon_n^2 \mid X) \end{pmatrix}    (2.16)

    = \begin{pmatrix} Var(\varepsilon_1 \mid X) & \cdots & Cov(\varepsilon_1, \varepsilon_n \mid X) \\ \vdots & & \vdots \\ Cov(\varepsilon_1, \varepsilon_n \mid X) & \cdots & Var(\varepsilon_n \mid X) \end{pmatrix}    (2.17)

    = \sigma^2 I_n \quad \text{(by assumption)}    (2.18)

    = \begin{pmatrix} \sigma^2 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma^2 \end{pmatrix}_{(n \times n)}.    (2.19)

Assumption 4 implies that

- Homoskedasticity: $E(\varepsilon_i^2 \mid X) = \sigma^2$, $\forall i$.
- Non-autocorrelation: $Cov(\varepsilon_i, \varepsilon_j \mid X) = E(\varepsilon_i\varepsilon_j \mid X) = 0$, $\forall i \ne j$.

Assumption 5: Normality. $\varepsilon \mid X \sim N(0, \sigma^2 I_n)$. That is, the vector $\varepsilon$ follows a multivariate normal distribution.

3 Least Squares Estimator for β

3.1 Least Squares (LS) Principle

Consider the following empirical model:

    y = X\beta + \varepsilon.    (3.1)

Let $b_0$ be an arbitrary $K \times 1$ vector, and define

    e_0 = y - X b_0,    (3.2)

the residual vector associated with $b_0$. Choose $b_0$ so that it minimizes

    s(b_0) = e_0'e_0 = (y - X b_0)'(y - X b_0)    (3.3)
           = y'y - b_0'X'y - y'X b_0 + b_0'X'X b_0    (3.4)
           = y'y - 2 b_0'X'y + b_0'X'X b_0.    (3.5)

Remark 1: If $a$ and $x$ are two $n \times 1$ column vectors and $A$ is an $n \times n$ matrix, then

    \frac{\partial (a'x)}{\partial x} = \frac{\partial (x'a)}{\partial x} = a,    (3.6)

and

    \frac{\partial (x'Ax)}{\partial x} = (A + A')x    (3.7)
                                       = 2Ax \quad \text{if } A \text{ is symmetric}.    (3.8)

Using the above results, we have the first-order condition

    \frac{\partial s(b_0)}{\partial b_0} = \frac{\partial (e_0'e_0)}{\partial b_0}    (3.9)
    = -2X'y + 2X'X b_0 = 0.    (3.10)

Let $b$ be the solution to (3.10). By construction, $b$ satisfies the least squares normal equations

    (X'X) b = X'y.    (3.11)

A unique solution to (3.11) exists if $X'X$ is non-singular, which holds under the full-rank assumption. Therefore,

    b = (X'X)^{-1} X'y.    (3.12)

Note that for $b$ to be the solution to the minimization problem, we also need to check the second-order condition. Specifically, we need

    \frac{\partial^2 s(b)}{\partial b \, \partial b'} = 2X'X    (3.13)

to be a positive definite matrix. This is satisfied if $X$ has full rank. Hence, $b$ minimizes the sum of squared errors and is the ordinary least squares (OLS) estimator for $\beta$.

Alternatively, we may write $b$ in terms of sample means. To see this, note that

    b = \left( \frac{X'X}{n} \right)^{-1} \frac{X'y}{n}
      = \left( \frac{1}{n}\sum_{i=1}^{n} x_i x_i' \right)^{-1} \left( \frac{1}{n}\sum_{i=1}^{n} x_i y_i \right),

where $\frac{1}{n}\sum_{i=1}^{n} x_i x_i'$ is the sample mean of $x_i x_i'$ and $\frac{1}{n}\sum_{i=1}^{n} x_i y_i$ is the sample mean of $x_i y_i$. Expressing $b$ in this form will facilitate the discussion of the large-sample properties of $b$.
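A numpy sketch of (3.11)-(3.12) on simulated data (the true $\beta$ is made up for illustration): solving the normal equations reproduces the OLS estimator, and np.linalg.lstsq gives the same $b$ without forming $(X'X)^{-1}$ explicitly.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500

X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta = np.array([1.0, 0.5, -2.0])                 # illustrative true parameters
y = X @ beta + rng.normal(size=n)

# Solve the normal equations (X'X) b = X'y, as in (3.11)-(3.12).
b = np.linalg.solve(X.T @ X, X.T @ y)

# Equivalent, numerically preferable route: least squares directly on (X, y).
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.round(b, 3))                              # close to (1.0, 0.5, -2.0)
print(np.allclose(b, b_lstsq))                     # True
```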

3.2 OLS Residuals

The OLS residual vector is

    e = y - Xb.    (3.14)

So we have

    X'e = X'(y - Xb)    (3.15)
        = X'y - X'X(X'X)^{-1}X'y    (3.16)
        = 0.    (3.17)

This means that for every column $x_k$ of $X$ we have $x_k'e = 0$, or $\sum_{i=1}^{n} x_{ik} e_i = 0$, $\forall k$.

If the first column of $X$ is a column of 1s, then there are three implications.

1. The OLS residuals sum to zero, i.e., $\sum_{i=1}^{n} e_i = 0$.

2. The regression hyperplane passes through the point of means of the data. That is,

    \bar{y} = \bar{x}'b,    (3.18)

where $\bar{x} = (\bar{x}_1, \ldots, \bar{x}_K)'$ and $\bar{x}_k = \frac{1}{n}\sum_{i=1}^{n} x_{ik}$, $k = 1, 2, \ldots, K$.

3. Denote the fitted value of the regression as $\hat{y}$, where $\hat{y} = Xb$. The mean of the fitted values from the regression equals the mean of the actual values (which follows from point 1). That is, $\frac{1}{n}\sum_{i=1}^{n} \hat{y}_i = \bar{y}$.

It is useful to write

    e = y - Xb = y - X(X'X)^{-1}X'y = \left[ I - X(X'X)^{-1}X' \right] y = My.    (3.19)

$M$ is an $n \times n$ matrix that is fundamental in regression analysis.

- $M$ is symmetric, i.e., $M = M'$.
- $M$ is idempotent, i.e., $MM = M'M = M$.
- $MX = 0$.

Interpretation: $M$ produces the OLS residual vector in the regression of $y$ on $X$ when it is postmultiplied by any vector $y$. When $M$ is postmultiplied by $X$, it is as if $X$ were regressed on $X$, which produces a perfect fit and thus $e = 0$.

    y = \underbrace{Xb}_{\text{least squares fitted value}} + \underbrace{e}_{\text{OLS residual}}.    (3.20)

We can also define the projection matrix $P = I - M$, which is also symmetric and idempotent:

    \hat{y} = y - e = y - My = (I - M)y = Py.    (3.21)

$P$ is the matrix constructed from $X$ such that when a vector $y$ is premultiplied by $P$, the result is the fitted value in the OLS regression of $y$ on $X$.

Exercise: Show that $PX = X$ and $PM = MP = 0$.
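The following sketch checks the properties of $M$ and $P$ numerically on simulated data (and, incidentally, the exercise $PX = X$, $PM = 0$); the data-generating process is illustrative, and all comparisons use numpy's default tolerances.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50

X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=n)

P = X @ np.linalg.solve(X.T @ X, X.T)      # projection matrix X(X'X)^{-1}X'
M = np.eye(n) - P                          # residual maker, as in (3.19)

print(np.allclose(M, M.T))                 # M is symmetric
print(np.allclose(M @ M, M))               # M is idempotent
print(np.allclose(M @ X, 0))               # MX = 0
print(np.allclose(P @ X, X))               # exercise: PX = X
print(np.allclose(P @ M, 0))               # exercise: PM = 0
print(np.allclose(M @ y, y - X @ np.linalg.lstsq(X, y, rcond=None)[0]))  # My = OLS residuals
```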

4 Goodness of Fit

Generally, regression analyses serve two purposes: (i) estimating coefficients and testing hypotheses; (ii) forecasting. So we would like to know how well the model predicts the variation in the dependent variable, or, put simply, how "good" the model is. One measure of goodness of fit is $R^2$. The basic idea of $R^2$ is that it indicates whether the variation in $X$ is a good predictor of the variation in $y$. Depending on how the variation of the variables is measured, there are two ways to define $R^2$.

Uncentered $R^2$. Suppose we define the variation of the dependent variable by its sum of squares, namely $\sum_{i=1}^{n} y_i^2 = y'y$. It follows that

    y'y = (\hat{y} + e)'(\hat{y} + e) = \hat{y}'\hat{y} + 2\hat{y}'e + e'e = \hat{y}'\hat{y} + 2b'X'e + e'e = \hat{y}'\hat{y} + e'e.    (4.1)

Hence, we can define the uncentered $R^2$ as

    R^2_{uc} \equiv 1 - \frac{e'e}{y'y} = \frac{\hat{y}'\hat{y}}{y'y}.    (4.2)

Note that by construction, $0 \le R^2_{uc} \le 1$. And since $\hat{y}$ is a function of the explanatory variables ($X$), the interpretation of $R^2_{uc}$ is that it measures the extent to which the variation of the dependent variable can be explained by the variation of the explanatory variables.
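A short sketch of (4.1)-(4.2) on simulated data (the data-generating process is again made up): the two expressions for the uncentered $R^2$ coincide because $X'e = 0$.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200

X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ b
e = y - y_hat

r2_uc_1 = 1 - (e @ e) / (y @ y)        # 1 - e'e / y'y
r2_uc_2 = (y_hat @ y_hat) / (y @ y)    # y_hat'y_hat / y'y
print(np.isclose(r2_uc_1, r2_uc_2))    # True, since X'e = 0
```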

Centered $R^2$. The variation of the dependent variable can also be defined in terms of deviations from its mean, $(y_i - \bar{y})$, where $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$. The total variation in $y$ is the sum of the squared deviations:

    SST = \sum_{i=1}^{n} (y_i - \bar{y})^2.    (4.3)

Recall that

    y = Xb + e = \hat{y} + e.    (4.4)

For an individual observation, we have

    y_i = \hat{y}_i + e_i = x_i'b + e_i,    (4.5)

where $x_i = (x_{i1}, x_{i2}, \ldots, x_{iK})'$. Subtracting $\bar{y}$ from (4.5), we have

    y_i - \bar{y} = \hat{y}_i - \bar{y} + e_i    (4.6)
                  = x_i'b - \bar{y} + e_i.    (4.7)

[Fig. 4.1: Decomposition of the variation of y]

Recall from our previous results that if the regression contains a constant term, then the residuals sum to zero ($\sum_{i=1}^{n} e_i = 0$) and the regression hyperplane passes through the point of means of the data ($\bar{y} = \bar{x}'b$). Therefore, we may write (4.7) as

    y_i - \bar{y} = x_i'b - \bar{x}'b + e_i    (4.8)
                  = (x_i - \bar{x})'b + e_i.    (4.9)

Equation (4.9) is illustrated in Figure 4.1. Intuitively, the regression fits well if the deviations of $y$ from its mean are accounted for more by the deviations of $x$ from its mean than by the residuals. Since both terms in this decomposition sum to zero, to quantify the fit of the regression we use the sums of squares instead.

Define the matrix $M^0 = I - \frac{1}{n} ii'$, where $i$ is a column of 1s. $M^0$ is an $n \times n$ idempotent matrix that transforms observations into deviations from the sample mean. To see this, note that for any vector $x$ and the average of its elements, $\bar{x}$, we have

    \begin{pmatrix} \bar{x} \\ \bar{x} \\ \vdots \\ \bar{x} \end{pmatrix} = i\bar{x} = i \frac{1}{n} i'x = \frac{1}{n} ii'x.    (4.10)

Therefore,

    M^0 x = \left[ I - \frac{1}{n} ii' \right] x = x - i\bar{x} = \begin{pmatrix} x_1 - \bar{x} \\ x_2 - \bar{x} \\ \vdots \\ x_n - \bar{x} \end{pmatrix}.    (4.11)

The same result holds for matrices. So,

    M^0 X = \left[ I - \frac{1}{n} ii' \right] X = X - i\bar{x}'
          = \begin{pmatrix} x_{11} - \bar{x}_1 & x_{12} - \bar{x}_2 & \cdots & x_{1K} - \bar{x}_K \\
                            x_{21} - \bar{x}_1 & x_{22} - \bar{x}_2 & \cdots & x_{2K} - \bar{x}_K \\
                            \vdots & \vdots & & \vdots \\
                            x_{n1} - \bar{x}_1 & x_{n2} - \bar{x}_2 & \cdots & x_{nK} - \bar{x}_K \end{pmatrix}.    (4.12)
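A tiny sketch of $M^0 = I - \frac{1}{n}ii'$ as a demeaning operator (the vector is arbitrary): applying it subtracts the sample mean from every element, as in (4.11).

```python
import numpy as np

n = 4
i = np.ones((n, 1))
M0 = np.eye(n) - (i @ i.T) / n            # M0 = I - ii'/n

x = np.array([2.0, 4.0, 6.0, 8.0])        # arbitrary vector with mean 5
print(M0 @ x)                              # [-3. -1.  1.  3.] = x - x_bar
print(np.allclose(M0 @ M0, M0))            # M0 is idempotent
```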

Using the matrix $M^0$, we can write the full set of observations of (4.9) as

    M^0 y = M^0 X b + M^0 e.    (4.13)

Again, if the regression contains a constant term, then the residuals have zero mean ($\bar{e} = \frac{1}{n}\sum_{i=1}^{n} e_i = 0$), which implies $M^0 e = e$. So we have

    M^0 y = M^0 X b + e.    (4.14)

Taking the sum of squares of both sides of (4.14) yields

    (M^0 y)'(M^0 y) = (M^0 X b + e)'(M^0 X b + e),    (4.15)

which can be simplified as

    y'M^0 y = b'X'M^0 X b + e'e + 2e'M^0 X b.    (4.16)

(Note that in the above expression we have used the fact that $M^0$ is an idempotent matrix, i.e., $M^0 M^0 = M^0$.) Recall that we have shown that $M^0 e = e$, which implies $e'M^0 = e'$ ($M^0$ is symmetric, so $M^0{}' = M^0$). Therefore, the term $2e'M^0 X b = 2e'X b = 0$. Hence, we have

    \underbrace{y'M^0 y}_{\text{Total Sum of Squares (SST)}}
    = \underbrace{b'X'M^0 X b}_{\text{Regression Sum of Squares (SSR)}}
    + \underbrace{e'e}_{\text{Error Sum of Squares (SSE)}}.    (4.17)

Define

    R^2 = \frac{SSR}{SST}    (4.18)
        = 1 - \frac{SSE}{SST}    (4.19)
        = 1 - \frac{\sum_{i=1}^{n} e_i^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}.    (4.20)
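A sketch verifying the decomposition (4.17) and the definition (4.18)-(4.20) numerically (simulated data, illustrative parameters): SST = SSR + SSE, and the two expressions for $R^2$ agree.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200

X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b

sst = np.sum((y - y.mean()) ** 2)            # y'M0y
ssr = np.sum((X @ b - y.mean()) ** 2)        # b'X'M0Xb (fitted values have mean y_bar)
sse = np.sum(e ** 2)                         # e'e

print(np.isclose(sst, ssr + sse))            # True: the decomposition (4.17)
print(np.isclose(ssr / sst, 1 - sse / sst))  # True: (4.18) equals (4.19)
```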

Note that in the derivation of $R^2$ we have used the fact that $\sum_{i=1}^{n} e_i = 0$, which is true only if the data contain a column of 1s. Otherwise, there will be additional terms remaining in (4.16), and we cannot decompose $y'M^0 y$ exactly into the two parts as desired. In that case, the result is unpredictable when we calculate $R^2$. Also, $R^2$ may not fall in the interval $[0, 1]$ for models other than the linear regression with the OLS estimator of $\beta$.

It is useful to think of $R^2$ as the square of the simple correlation coefficient between $y_i$ and $\hat{y}_i$. That is,

    R^2 = \frac{\left[ \sum_{i=1}^{n} (y_i - \bar{y})(\hat{y}_i - \bar{y}) \right]^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2 \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}.    (4.21)

In this sense, $R^2$ is a measure of the linear correlation between the $y_i$'s and the $\hat{y}_i$'s. When $X$ contains a constant term, $0 \le R^2 \le 1$. $R^2 = 0$ if the regression is a horizontal line, i.e., all elements of $b$ except the constant term are zero; in this case $\hat{y}_i = \bar{y}$, $\forall i$, and $X$ has no explanatory power. $R^2 = 1$ if $y_i = \hat{y}_i$, $\forall i$.

Limitations of $R^2$. One should be cautious when using $R^2$ as the basis for the choice of models. First, $R^2$ can only be compared across models with the same dependent variable and the same sample size. Second, it is difficult to compare $R^2$ between models with different numbers of regressors, because $R^2$ is non-decreasing in $K$, no matter how relevant (or irrelevant) the additional regressor is. (Greene, Theorem 3.6.)

To address this problem, consider

    \bar{R}^2 = 1 - \frac{e'e/(n-K)}{y'M^0 y/(n-1)}    (4.22)
              = 1 - \frac{n-1}{n-K} \cdot \frac{e'e}{y'M^0 y}    (4.23)
              = 1 - \frac{n-1}{n-K} \left( 1 - R^2 \right).    (4.24)

$\bar{R}^2$ is called the "adjusted $R^2$", which is $R^2$ adjusted for the degrees of freedom of $e'e$ and $y'M^0 y$: $e'e$ has $(n-K)$ degrees of freedom because we have estimated $K$ parameters, and $y'M^0 y$ has $(n-1)$ degrees of freedom because $\bar{y}$ has to be calculated first before we can obtain this quantity (so it loses one degree of freedom).
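A closing sketch of (4.22)-(4.24) on made-up data: adding an irrelevant regressor never lowers $R^2$, but it can lower the adjusted $\bar{R}^2$.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 60

x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)

def r2_and_adj(X, y):
    """Return (R^2, adjusted R^2) for an OLS fit of y on X (X includes the constant)."""
    n, K = X.shape
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    r2 = 1 - (e @ e) / np.sum((y - y.mean()) ** 2)
    adj = 1 - (n - 1) / (n - K) * (1 - r2)           # formula (4.24)
    return r2, adj

X1 = np.column_stack([np.ones(n), x])                 # correct model
X2 = np.column_stack([X1, rng.normal(size=n)])        # plus an irrelevant regressor

print(r2_and_adj(X1, y))
print(r2_and_adj(X2, y))   # R^2 rises (weakly); adjusted R^2 typically falls
```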
