Statistics 191: Introduction to Applied Statistics
Multiple Linear Regression
Jonathan Taylor
Department of Statistics, Stanford University
February 2, 2009

Outline
- Specifying the model.
- Fitting the model: least squares.
- Interpretation of the coefficients.
- More on F-statistics.
- Matrix approach to linear regression.
- T-statistics revisited.
- More F-statistics.
- Tests involving more than one \beta.
Job supervisor data

Description
Variable   Description
Y          Overall supervisor job rating
X1         How well do they handle complaints
X2         Do they allow special privileges
X3         Give opportunity to learn new things
X4         Raises based on performance
X5         Too critical of poor performance
X6         Good rate of advancement

Job supervisor data
(R code)
Specifying the model

Multiple linear regression model
- Rather than one predictor, we have p = 6 predictors:
  Y_i = \beta_0 + \beta_1 X_{i1} + \dots + \beta_p X_{ip} + \varepsilon_i.
- Errors \varepsilon are assumed independent N(0, \sigma^2), as in simple linear regression.
- Coefficients are called (partial) regression coefficients because they "allow" for the effect of other variables.

Geometry of Least Squares
[figure]

Fitting the model

Least squares
- Just as in simple linear regression, the model is fit by minimizing
  SSE(\beta_0, \dots, \beta_p) = \sum_{i=1}^{n} \Big( Y_i - \big( \beta_0 + \sum_{j=1}^{p} \beta_j X_{ij} \big) \Big)^2 = \| Y - \hat{Y}(\beta) \|^2.
- The minimizers \hat{\beta} = (\hat{\beta}_0, \dots, \hat{\beta}_p) are the least squares estimates: they are also normally distributed, as in simple linear regression.
Error component

Estimating \sigma^2
- As in simple regression,
  \hat{\sigma}^2 = \frac{SSE}{n - p - 1} \sim \sigma^2 \cdot \frac{\chi^2_{n-p-1}}{n - p - 1},
  independent of \hat{\beta}.
- Why \chi^2_{n-p-1}? Typically, the degrees of freedom in the estimate of \sigma^2 is
  n - (number of parameters in the regression function).
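The course's own demonstrations use R; the least squares fit and the \hat{\sigma}^2 estimate above can be sketched in numpy on synthetic data (the variable names and data below are illustrative stand-ins, not the supervisor dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 6

# Synthetic design matrix (intercept column + p predictors) and response
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta_true = np.array([10.0, 1.0, 0.5, 0.0, 2.0, -1.0, 0.3])
Y = X @ beta_true + rng.normal(scale=2.0, size=n)

# Least squares: minimize ||Y - X beta||^2 over beta
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_hat = X @ beta_hat
SSE = np.sum((Y - Y_hat) ** 2)

# Estimate of sigma^2 uses n - p - 1 degrees of freedom:
# n observations minus the p + 1 parameters in the regression function
sigma2_hat = SSE / (n - p - 1)
```

Because \hat{\beta} minimizes the sum of squares, the SSE at \hat{\beta} can never exceed the SSE at the true coefficients.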
Interpretation of \beta_j's

Supervisor example
- Take \beta_1 for example. This is the amount the average job rating increases for one "unit" of "handles complaints", keeping everything else constant.
- Units of "handles complaints" are individual favorable responses, so on average, for every extra person who rated the supervisor as good at handling complaints (other things being fixed), the average job rating increases by \beta_1.

Interpretation of \beta_j's

Why are they partial regression coefficients?
- The term "partial" refers to the fact that the coefficient \beta_j represents the partial effect of X_j on Y, i.e. after the effect of all other variables has been removed.
- Specifically,
  Y_i - \sum_{l=1, l \neq j}^{p} X_{il} \beta_l = \beta_0 + \beta_j X_{ij} + \varepsilon_i.
- Let e_{i,(j)} be the residuals from regressing Y onto all X's except X_j, and let X_{i,(j)} be the residuals from regressing X_j onto all X's except X_j.
- If we regress e_{i,(j)} against X_{i,(j)}, the coefficient is exactly the same as in the original model (see R code).
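This residual-on-residual fact (the Frisch–Waugh–Lovell theorem) can be checked numerically; a minimal numpy sketch with synthetic data, standing in for the R code the slides reference:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=n)

def lstsq_fit(A, b):
    """Least squares coefficients from regressing b on the columns of A."""
    return np.linalg.lstsq(A, b, rcond=None)[0]

j = 2  # look at the coefficient of one predictor (column 2 of the design)
others = [c for c in range(X.shape[1]) if c != j]
X_others = X[:, others]

# Coefficient of X_j in the full multiple regression
beta_full = lstsq_fit(X, Y)[j]

# Residuals of Y and of X_j after regressing each on all the other columns
e_Y  = Y - X_others @ lstsq_fit(X_others, Y)
e_Xj = X[:, j] - X_others @ lstsq_fit(X_others, X[:, j])

# Simple regression of residuals on residuals (no intercept needed:
# both residual vectors are orthogonal to the intercept column)
beta_partial = np.dot(e_Xj, e_Y) / np.dot(e_Xj, e_Xj)
```

The two coefficients agree exactly (up to floating point), which is what makes \beta_j a *partial* regression coefficient.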
Goodness of fit for multiple regression

Sums of squares
  SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2
  SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2
  SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2
  R^2 = \frac{SSR}{SST}
- R^2 is called the multiple correlation coefficient of the model, or the coefficient of multiple determination.

Adjusted R^2

Compensating for more variables
- As we add more and more variables to the model, even random ones, R^2 will increase to 1.
- Adjusted R^2 tries to take this into account by replacing sums of squares by mean squares:
  R^2_a = 1 - \frac{SSE / (n - p - 1)}{SST / (n - 1)} = 1 - \frac{MSE}{MST}.
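A short numpy sketch (synthetic data, illustrative only) computing these sums of squares and both versions of R^2; note that with an intercept in the model the decomposition SST = SSR + SSE holds exactly:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 40, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = X @ np.array([3.0, 1.0, 0.0, -2.0, 0.5]) + rng.normal(size=n)

beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
Y_hat = X @ beta_hat
Y_bar = Y.mean()

SSE = np.sum((Y - Y_hat) ** 2)
SSR = np.sum((Y_hat - Y_bar) ** 2)
SST = np.sum((Y - Y_bar) ** 2)

R2 = SSR / SST
# Adjusted R^2 replaces sums of squares with mean squares
R2_adj = 1 - (SSE / (n - p - 1)) / (SST / (n - 1))
```

Since (n - 1)/(n - p - 1) \geq 1, adjusted R^2 is always at most R^2.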
Goodness of fit test

Another F-test
- As in simple linear regression, we measure the goodness of fit of the regression model by
  F = \frac{MSR}{MSE} = \frac{\| \hat{Y} - \bar{Y} \cdot 1 \|^2 / p}{\| Y - \hat{Y} \|^2 / (n - p - 1)}.
- Under H_0 : \beta_1 = \dots = \beta_p = 0,
  F \sim F_{p, n-p-1},
  so reject H_0 at level \alpha if F > F_{p, n-p-1, 1-\alpha}.

Geometry of Least Squares
[figure]
Reasoning behind the F test

Measuring lengths
- The F statistic is a ratio of lengths of orthogonal vectors (divided by degrees of freedom).
- We can prove that our model implies
  E(MSR) = \sigma^2 + \underbrace{\| \mu - \bar{\mu} \cdot 1 \|^2 / p}_{(*)}
  E(MSE) = \sigma^2
  where \mu_i = E(Y_i) = \beta_0 + \beta_1 X_{i1} + \dots + \beta_p X_{ip},
  so F should not be too far from 1 if H_0 is true, i.e. if (*) = 0.
- If F is large, it is evidence that (*) \neq 0, i.e. H_0 is false.

F-test revisited

Example in more detail
- Full (bigger) model:
  Y_i = \beta_0 + \beta_1 X_{i1} + \dots + \beta_p X_{ip} + \varepsilon_i
- Reduced (smaller) model:
  Y_i = \beta_0 + \varepsilon_i
- The F-statistic has the form
  F = \frac{ (SSE(R) - SSE(F)) / (df_R - df_F) }{ SSE(F) / df_F }.

Geometry of Least Squares
[figure]
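The full-versus-reduced form of the F-statistic can be sketched in numpy (synthetic data, illustrative only). Taking the reduced model to be intercept-only makes SSE(R) = SST, so this general formula reproduces the overall MSR/MSE goodness-of-fit statistic:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 40, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = X @ np.array([3.0, 1.5, 0.0, -2.0, 0.5]) + rng.normal(size=n)

def sse(A, b):
    """Sum of squared residuals from regressing b on the columns of A."""
    coef = np.linalg.lstsq(A, b, rcond=None)[0]
    return np.sum((b - A @ coef) ** 2)

# Full model: intercept + all p predictors; reduced model: intercept only
SSE_F, df_F = sse(X, Y), n - p - 1
SSE_R, df_R = sse(X[:, :1], Y), n - 1

F = ((SSE_R - SSE_F) / (df_R - df_F)) / (SSE_F / df_F)
```

The reduced model, being a subspace of the full model, can never fit better, so SSE(R) \geq SSE(F) and F \geq 0.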
Matrix formulation

Equivalent formulation
  Y_{n \times 1} = X_{n \times (p+1)} \, \beta_{(p+1) \times 1} + \varepsilon_{n \times 1}
- X is called the design matrix of the model.
- \varepsilon \sim N(0, \sigma^2 I_{n \times n}) is multivariate normal.

SSE in matrix form
  SSE(\beta) = (Y - X\beta)^T (Y - X\beta)

Matrix formulation

Design matrix
The design matrix is the n \times (p + 1) matrix with entries
  X = \begin{pmatrix} 1 & X_{11} & X_{12} & \dots & X_{1,p} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{n1} & X_{n2} & \dots & X_{n,p} \end{pmatrix}

Least squares solution

Solving for \hat{\beta}
- Normal equations:
  \frac{\partial SSE}{\partial \beta_j} \Big|_{\hat{\beta}} = -2 (Y - X\hat{\beta})^T X_j = 0, \quad 0 \leq j \leq p.
- Equivalent to
  (Y - X\hat{\beta})^T X = 0
  \hat{\beta} = (X^T X)^{-1} X^T Y.
- Properties:
  \hat{\beta} \sim N(\beta, \sigma^2 (X^T X)^{-1}), independent of \hat{\sigma}^2.
- (R code)
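A numpy sketch of the normal equations (synthetic data, illustrative only): solving (X^T X)\hat{\beta} = X^T Y agrees with a least squares solver, and the residuals are orthogonal to every column of the design matrix:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=n)

# Normal-equations solution (fine for illustration; numerically,
# a QR-based solver like lstsq is preferred over forming X^T X)
beta_normal = np.linalg.solve(X.T @ X, X.T @ Y)
beta_lstsq = np.linalg.lstsq(X, Y, rcond=None)[0]

# (Y - X beta_hat)^T X = 0: residuals orthogonal to the design columns
resid = Y - X @ beta_normal
orth = X.T @ resid
```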
Inference for multiple regression

Regression function at one point
- One thing one might want to learn about the regression function in the supervisor example is what can be said about the regression function at some fixed values of X_1, \dots, X_6, i.e.
  \beta_0 + 65 \beta_1 + 50 \beta_2 + 55 \beta_3 + 64 \beta_4 + 75 \beta_5 + 40 \beta_6,   (*)
  roughly the regression function at "typical" values of the predictors.
- The expression (*) is equivalent to
  \sum_{j=0}^{6} a_j \beta_j, \quad a = (1, 65, 50, 55, 64, 75, 40).

Inference for \sum_{j=0}^{p} a_j \beta_j

Confidence interval for \sum_{j=0}^{p} a_j \beta_j
- Suppose we want a (1 - \alpha) \cdot 100% CI for \sum_{j=0}^{p} a_j \beta_j.
- Just as in simple linear regression:
  \sum_{j=0}^{p} a_j \hat{\beta}_j \pm t_{1-\alpha/2, n-p-1} \cdot SE\Big( \sum_{j=0}^{p} a_j \hat{\beta}_j \Big).

Inference for \sum_{j=0}^{p} a_j \beta_j

T-statistics revisited
- Suppose we want to test
  H_0 : \sum_{j=0}^{p} a_j \beta_j = h.
- As in simple linear regression, the test is based on
  T = \frac{ \sum_{j=0}^{p} a_j \hat{\beta}_j - h }{ SE( \sum_{j=0}^{p} a_j \hat{\beta}_j ) }.
- If H_0 is true, then T \sim t_{n-p-1}, so we reject H_0 at level \alpha if
  |T| > t_{1-\alpha/2, n-p-1}, OR
  p-value = 2 (1 - pt(|T|, n-p-1)) < \alpha.

Inference for \sum_{j=0}^{p} a_j \beta_j

One-sided tests
- Suppose, instead, we wanted to test the one-sided hypothesis
  H_0 : \sum_{j=0}^{p} a_j \beta_j \leq h, \quad vs. \quad H_a : \sum_{j=0}^{p} a_j \beta_j > h.
- If H_0 is true, then T is no longer exactly t_{n-p-1}, but
  P(T > t_{1-\alpha, n-p-1}) \leq \alpha,
  so we reject H_0 at level \alpha if
  T > t_{1-\alpha, n-p-1}, OR
  p-value = 1 - pt(T, n-p-1) < \alpha.
28、for Ppj=0 aj jStandard error of Ppj=0 ajb jBased on matrix approach to regressionSE0pXj=0ajb j1A =qb 2a(XT X ) 1aT:Dont worry too much about implementation R will dothis for you in general, R code25/38Statistics 191:Introductionto AppliedStatisticsJonathanTaylorDepartment ofStatisticsStanfordUnivers
29、ityInference for Ppj=0 aj jPrediction intervalIdentical“ to simple linear regression.Prediction interval at X1;new;:;Xp;new:b 0 +pXj=1Xj;newb j t1 =2;n p 1vuuutb 2 + SE0b 0 +pXj=1Xj;newb j1A2:26/38Statistics 191:Introductionto AppliedStatisticsJonathanTaylorDepartment ofStatisticsStanfordUniversityI
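The standard error formula and the resulting T-statistic can be sketched directly with numpy (synthetic data; the vector `a` below is a hypothetical point of interest, not the supervisor example's values):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=n)

beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
sigma2_hat = np.sum((Y - X @ beta_hat) ** 2) / (n - p - 1)

# Hypothetical linear combination a^T beta: (1, x1, x2, x3)
a = np.array([1.0, 0.5, -1.0, 2.0])
XtX_inv = np.linalg.inv(X.T @ X)

estimate = a @ beta_hat                        # sum_j a_j * beta_hat_j
se = np.sqrt(sigma2_hat * (a @ XtX_inv @ a))   # SE of the combination

# T statistic for H0: a^T beta = h, with h = 0 here;
# in R one would compare against qt(1 - alpha/2, n - p - 1)
T = (estimate - 0.0) / se
```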
Inference for multiple regression

Questions about many (combinations of) \beta_j's
- In multiple regression we can ask more complicated questions than in simple regression.
- For instance, we could ask whether
  X2: Do they allow special privileges
  X3: Give opportunity to learn new things
  explain little of the variability in the data, and might be dropped from the regression model.
- These questions can be answered by F-statistics.

Inference for more than one \beta

Dropping one or more variables
- Suppose we wanted to test whether the way the supervisor handles special privileges, or allows employees to try new things, explains a significant amount of the variability in the overall job rating. Formally, this is:
  H_0 : \beta_2 = \beta_3 = 0, \quad vs. \quad H_a : \text{one of } \beta_2, \beta_3 \neq 0.
- This test is again an F-test based on two models:
  R : Y_i = \beta_0 + \beta_1 X_{i1} + \beta_4 X_{i4} + \beta_5 X_{i5} + \beta_6 X_{i6} + \varepsilon_i
  F : Y_i = \beta_0 + \sum_{j=1}^{6} \beta_j X_{ij} + \varepsilon_i
- Note: The reduced model R must be a special case of the full model F to use the F-test.

Geometry of Least Squares
[figure]
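A numpy sketch of this partial F-test (synthetic stand-ins for X1..X6, not the supervisor data): fit both models and form the F-statistic for H_0 : \beta_2 = \beta_3 = 0.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 40

# Synthetic predictors standing in for X1..X6
Xs = rng.normal(size=(n, 6))
X_full = np.column_stack([np.ones(n), Xs])
Y = X_full @ np.array([3.0, 1.0, 0.0, 0.0, 2.0, -1.0, 0.5]) + rng.normal(size=n)

def sse(A, b):
    coef = np.linalg.lstsq(A, b, rcond=None)[0]
    return np.sum((b - A @ coef) ** 2)

# Reduced model R drops X2 and X3 (design columns 2 and 3);
# R is a special case of the full model F, as the F-test requires
X_red = np.delete(X_full, [2, 3], axis=1)

SSE_F, df_F = sse(X_full, Y), n - 7   # n - (p + 1) with p = 6
SSE_R, df_R = sse(X_red, Y), n - 5

# Partial F statistic; df_R - df_F = 2, the number of dropped coefficients
F = ((SSE_R - SSE_F) / (df_R - df_F)) / (SSE_F / df_F)
```

In R the same comparison is a one-liner, `anova(reduced, full)`.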
Inference for more than one \beta

SSE of a model
- In the graphic, a "model" M is a subspace of R^n = column space of X.
- The least squares fit is the projection of Y onto the subspace M, yielding predicted values \hat{Y}_M.
- Error sum of squares:
  SSE(M) = \| Y - \hat{Y}_M \|^2.
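The projection picture can be made concrete: the fitted values are H Y where H = X (X^T X)^{-1} X^T is the projection ("hat") matrix onto the column space of X. A minimal numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(8)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = rng.normal(size=n)

# Projection matrix onto the model subspace M = column space of X
H = X @ np.linalg.inv(X.T @ X) @ X.T
Y_hat = H @ Y

# SSE(M) is the squared distance from Y to its projection
SSE = np.sum((Y - Y_hat) ** 2)
```

As a projection, H is symmetric and idempotent (H^2 = H), and by orthogonality SSE(M) = \|Y\|^2 - \|\hat{Y}_M\|^2.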