Chapter 10 Linear Regression and Correlation

Uploaded by: dzzj200808   Document ID: 4429319   Upload date: 2018-12-28   Format: DOC   Pages: 50   Size: 68 KB

This chapter introduces linear regression and linear correlation, statistical methods for bivariate data that are used to study the quantitative relationship between two variables. Both statistical description and statistical inference are covered.

Section 1 Linear regression

I. Linear regression equation

Statistical studies involve two kinds of variables. A selected variable, denoted X, takes values chosen by the investigator. A random variable, denoted Y, takes values that vary randomly. In medicine and biology it is common for Y to follow a normal distribution at each value of X. For example, if the selected variable X is age, taken in whole years, and the random variable Y is the height within each age group, then Y is normally distributed at each value of X.

Both variables may also be random variables, again denoted X and Y. Commonly (X, Y) follows a bivariate normal distribution: for any value of X, Y is normally distributed, and for any value of Y, X is normally distributed. For example, if the height and weight of a particular population are measured and denoted X and Y, then (X, Y) follows a bivariate normal distribution.

Statistical methods for bivariate data study the quantitative relationship between a random variable and a selected variable, or between two random variables. Such a study concerns a group, and the relationship it describes is a statistical (non-deterministic) one. At a given value of X, an individual's Y value is uncertain, but it varies randomly around its mean. For example, at a given age an individual's height is uncertain, but it varies randomly around the mean height for that age. This differs from the deterministic functional relationship between two variables for each individual that is studied in general mathematics.

Depending on the purpose of the study, the relationship between two variables is one of dependence or of interdependence. Under dependence, one variable is the independent variable, usually denoted X, and the other is the dependent variable, usually denoted Y. Regression analysis is used to study the effect of X on Y, that is, the dependence of Y on X. Under interdependence the two variables are simply called X and Y, and correlation analysis is used to study their mutual relationship or mutual influence. In general, correlation analysis applies only to two random variables.

In mathematical form, the relationship between two variables may be linear or curvilinear (non-linear). The linear relationship is the simplest and most basic, and it is described by linear regression and linear correlation. This section introduces linear regression.

Suppose that at each value of the selected variable X the random variable Y is normally distributed with equal variance, and that the population means $\mu_{Y.X}$ of Y at the different X values lie on a straight line. That line is called the population regression line of $\mu_{Y.X}$ on X. Under a bivariate normal distribution, if the population correlation coefficient of X and Y (see below) is not zero, there are two regression lines: $\mu_{Y.X}$ on X and $\mu_{X.Y}$ on Y (where $\mu_{X.Y}$ is the population mean of X at each value of Y). In practice usually only one regression line is studied, with X as the independent variable and Y as the dependent variable.

Let the sample consist of n pairs of values: $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$. Plot the n pairs as n points in Cartesian coordinates. If the resulting scatter diagram shows a linear trend, that is, Y tends to increase or decrease as X increases (not all points need lie exactly on one line), then a sample linear regression equation of Y on X can be fitted as an estimate of the population linear regression equation. The equation is fitted from the n pairs by the least squares method, which minimizes the sum of squared vertical distances from the scatter points to the regression line.

The sample linear regression equation and its calculation formulas are

$$\hat{Y} = a + bX \tag{10.1}$$

$$b = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sum (X-\bar{X})^2} = \frac{\sum XY - (\sum X)(\sum Y)/n}{\sum X^2 - (\sum X)^2/n} \tag{10.2}$$

$$a = \bar{Y} - b\bar{X} = \sum Y/n - b\left(\sum X/n\right) \tag{10.3}$$
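
As a brief sketch of where formulas (10.2) and (10.3) come from, the least squares criterion can be written as a function of a and b and minimized with standard calculus. The symbol Q for the sum of squared vertical distances is introduced here for illustration only and does not appear in the original text.

```latex
\begin{aligned}
Q(a, b) &= \sum (Y - a - bX)^2 \\
\frac{\partial Q}{\partial a} = 0 &\;\Rightarrow\; \sum Y = na + b \sum X
  \;\Rightarrow\; a = \bar{Y} - b\bar{X} \\
\frac{\partial Q}{\partial b} = 0 &\;\Rightarrow\; \sum XY = a \sum X + b \sum X^2 \\
&\;\Rightarrow\; b = \frac{\sum XY - (\sum X)(\sum Y)/n}{\sum X^2 - (\sum X)^2/n}
\end{aligned}
```

The deviation form and the computational form of the slope agree because $\sum (X-\bar{X})(Y-\bar{Y}) = \sum XY - (\sum X)(\sum Y)/n$ and $\sum (X-\bar{X})^2 = \sum X^2 - (\sum X)^2/n$.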

$\hat{Y}$ is the estimate of the population mean $\mu_{Y.X}$ at a given value of X. The constant term a is the intercept of the regression line on the Y axis. b is called the linear regression coefficient, or regression coefficient for short, and is the slope of the line. It describes the direction and size of the linear change in Y that depends on X.

For visual analysis, the regression line can be drawn from the linear regression equation. Within the measured range of X, take two X values that are far apart and easy to read, substitute them into the equation to obtain two $\hat{Y}$ values, mark the two points on the Cartesian plot, and connect them with a straight line.

The main uses of the linear regression equation and the corresponding regression line are:

(1) To describe how the dependent variable changes with the independent variable, for example how children's height depends on age.
(2) To estimate a variable that is hard to measure from one that is easy to measure, for example estimating body surface area from a patient's weight.
(3) To predict a future value from a present value, for example predicting a son's adult height from his father's height.
(4) To determine more precisely the normal range of the dependent variable at different values of the independent variable, because introducing the independent variable reduces the variation of the dependent variable. For example, if age is ignored, children's weight varies widely and the normal range is wide; once age is introduced as the independent variable, the variation in weight at each age is smaller and the normal range narrows.

Example 10.1 The weight and vital capacity of 10 female middle school students in a certain place are given in columns (1), (2), and (3) of Table 10-1. With weight X (kg) and vital capacity Y (L), draw a scatter plot to see whether there is a linear trend. If there is, fit the linear regression equation of Y on X.

As shown in Figure 10-1, the 10 pairs of (X, Y) values in Table 10-1 are plotted as 10 points in Cartesian coordinates. The scatter plot shows a linear trend, so the linear regression equation of Y on X is fitted. The calculations are laid out in Table 10-1; the calculation of the correlation coefficient is given in the next section.

$$b = \frac{946.55 - 405 \times 23.15/10}{16501 - 405^2/10} = \frac{8.975}{98.5} = 0.0911$$

$$a = 23.15/10 - 0.0911 \times 405/10 = -1.3746$$

The linear regression equation of vital capacity Y (L) on weight X (kg) for the female middle school students in this place is therefore

$$\hat{Y} = -1.3746 + 0.0911X$$

To draw the regression line from the fitted equation:

Take $X_1 = 35$: $\hat{Y}_1 = -1.3746 + 0.0911 \times 35 = 1.81$
Take $X_2 = 45$: $\hat{Y}_2 = -1.3746 + 0.0911 \times 45 = 2.72$

As shown in Figure 10-1, plot the two points (35, 1.81) and (45, 2.72) and connect them with a straight line. The regression line should stay within the measured range of X or the range of practical application and should not be extended arbitrarily.

Table 10-1 Weight X (kg) and vital capacity Y (L) of 10 female middle school students in a certain place: calculations for the linear regression equation and the correlation coefficient

No.     X      Y       X²      Y²        XY
(1)     (2)    (3)     (4)     (5)       (6)
1       35     1.60    1225    2.5600    56.00
2       37     1.60    1369    2.5600    59.20
3       37     2.40    1369    5.7600    88.80
4       40     2.10    1600    4.4100    84.00
5       40     2.60    1600    6.7600    104.00
6       42     2.50    1764    6.2500    105.00
7       42     2.65    1764    7.0225    111.30
8       43     2.75    1849    7.5625    118.25
9       44     2.75    1936    7.5625    121.00
10      45     2.20    2025    4.8400    99.00
Total   405    23.15   16501   55.2875   946.55

Figure 10-1 Scatter plot of weight and vital capacity of 10 female middle school students, with the fitted regression line of vital capacity on weight
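
The fit in Example 10.1 can be reproduced with a short Python sketch. It uses only the ten (X, Y) pairs from Table 10-1 and the computational forms of formulas (10.2) and (10.3); the function name `fit_line` and the variable names are illustrative choices, not part of the original text.

```python
# Data from Table 10-1: weight X (kg) and vital capacity Y (L).
weights = [35, 37, 37, 40, 40, 42, 42, 43, 44, 45]
vital_capacities = [1.60, 1.60, 2.40, 2.10, 2.60, 2.50, 2.65, 2.75, 2.75, 2.20]


def fit_line(xs, ys):
    """Return (a, b) for Y-hat = a + b*X via the computational forms of (10.2) and (10.3)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sxy = sum_xy - sum_x * sum_y / n  # numerator of (10.2)
    sxx = sum_xx - sum_x ** 2 / n     # denominator of (10.2)
    b = sxy / sxx
    a = sum_y / n - b * sum_x / n     # (10.3)
    return a, b


a, b = fit_line(weights, vital_capacities)
print(f"b = {b:.4f}, a = {a:.4f}")

# Two points inside the measured range of X, used to draw the regression line.
for x in (35, 45):
    print(f"X = {x}: Y-hat = {a + b * x:.2f}")
```

The slope agrees with the worked value 0.0911. The intercept comes out near -1.3752 rather than -1.3746 because the worked example substitutes the rounded slope 0.0911 into formula (10.3), while the sketch keeps full precision; the plotted points can differ in the last digit for the same reason.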

II. Hypothesis test of the population regression coefficient

The regression coefficient of the population linear regression equation of $\mu_{Y.X}$ on X is denoted $\beta$. If $\beta = 0$, then $\mu_{Y.X}$ is the same at every value of X and the variation of Y does not depend on X; in other words, no linear regression equation or regression line of $\mu_{Y.X}$ on X exists in the population. Only when $\beta \neq 0$ does the population regression line of $\mu_{Y.X}$ on X exist. Inferring whether the population has such a regression line therefore amounts to inferring whether the population regression coefficient $\beta$ equals zero. The sample regression coefficient b is the point estimate of $\beta$, and the sample linear regression equation is the estimate of the population linear regression equation. Clearly, fitting a regression equation to the sample and drawing the sample regression line are meaningful only when $\beta \neq 0$.

For the hypothesis test of the population regression coefficient, the test hypothesis (null hypothesis) $H_0$ is $\beta = 0$, and the alternative hypothesis $H_1$ is usually the two-sided $\beta \neq 0$. If $H_0$ holds, the difference between b and 0 is due entirely to sampling error. The sample test statistic is t, and the test is called the t test comparing the sample regression coefficient b with the population regression coefficient 0. The t value is calculated as

$$t = \frac{|b - 0|}{s_b} = \frac{|b|}{s_b}, \qquad \nu = n - 2 \tag{10.4}$$

where $s_b$ is the standard error of the regression coefficient:

$$s_b = \frac{s_{Y.X}}{\sqrt{\sum (X-\bar{X})^2}} \tag{10.5}$$

Here $s_{Y.X}$ is the residual standard deviation of Y, a measure of the variation in Y after the influence of X has been removed:

$$s_{Y.X} = \sqrt{\frac{\sum (Y-\hat{Y})^2}{n-2}} \tag{10.6}$$

$\sum (Y-\hat{Y})^2$ is the residual sum of squares, that is, the sum of squared vertical distances from the scatter points to the regression line. It is calculated as

$$\sum (Y-\hat{Y})^2 = \sum (Y-\bar{Y})^2 - \frac{\left[\sum (X-\bar{X})(Y-\bar{Y})\right]^2}{\sum (X-\bar{X})^2} \tag{10.7}$$

where $\sum (Y-\bar{Y})^2 = \sum Y^2 - (\sum Y)^2/n$, and the other terms are the numerator and denominator of formula (10.2) for b.

Example 10.2 Using the weight and vital capacity data for the 10 female middle school students in Example 10.1, infer whether a regression line of vital capacity on weight exists in the population of female middle school students in this place. Is the linear regression equation fitted to the sample in Example 10.1 meaningful?

Let $\beta$ be the population regression coefficient of vital capacity on weight for the female middle school students in this place. The hypotheses are:

$H_0$: $\beta = 0$
$H_1$: $\beta \neq 0$
$\alpha = 0.05$

From Example 10.1, $b = 0.0911$, $\sum (X-\bar{X})(Y-\bar{Y}) = 8.975$, and $\sum (X-\bar{X})^2 = 98.5$. From the data in Table 10-1,

$$\sum (Y-\bar{Y})^2 = 55.2875 - 23.15^2/10 = 1.6953$$

Then

$$\sum (Y-\hat{Y})^2 = 1.6953 - 8.975^2/98.5 = 0.8775$$

$$s_{Y.X} = \sqrt{\frac{0.8775}{10-2}} = 0.3312, \qquad s_b = \frac{0.3312}{\sqrt{98.5}} = 0.0334$$

By formula (10.4),

$$t = \frac{0.0911}{0.0334} = 2.728, \qquad \nu = 10 - 2 = 8$$
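
The arithmetic of Example 10.2 can be checked with a minimal Python sketch that chains formulas (10.7), (10.6), (10.5), and (10.4). The summary quantities are the ones quoted in the worked examples; the variable names and the constant `T_CRIT_8_DF` are illustrative assumptions, with the critical value 2.306 taken from a standard two-sided 5% t table for 8 degrees of freedom rather than from the original text.

```python
import math

# Summary quantities quoted in Examples 10.1 and 10.2 (from Table 10-1).
n = 10
sxy = 8.975                      # sum of (X - X-bar)(Y - Y-bar)
sxx = 98.5                       # sum of (X - X-bar)^2
syy = 55.2875 - 23.15 ** 2 / n   # sum of (Y - Y-bar)^2

b = sxy / sxx                             # slope, formula (10.2)
residual_ss = syy - sxy ** 2 / sxx        # residual sum of squares, (10.7)
s_yx = math.sqrt(residual_ss / (n - 2))   # residual standard deviation, (10.6)
s_b = s_yx / math.sqrt(sxx)               # standard error of b, (10.5)
t = abs(b) / s_b                          # test statistic, (10.4)
df = n - 2

# Assumed two-sided 5% critical value of t for 8 degrees of freedom.
T_CRIT_8_DF = 2.306

print(f"residual SS = {residual_ss:.4f}, s_yx = {s_yx:.4f}, s_b = {s_b:.4f}")
print(f"t = {t:.3f}, df = {df}, reject H0 at alpha = 0.05: {t > T_CRIT_8_DF}")
```

Because the sketch does not round intermediate results, its t value differs from the hand calculation in the third decimal place; the decision about $H_0$ is unaffected.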

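From the table of critical t values, P < 0.05. At the $\alpha = 0.05$ level, $H_0$ is rejected and $H_1$ accepted: a regression line of vital capacity on weight is considered to exist for the female middle school students in this place, so the linear regression equation fitted to the sample in Example 10.1 is meaningful.

Section 2 Linear correlation

I. Correlation coefficient

Linear correlation applies to two random variables that follow a bivariate normal distribution. It studies the interdependence of the two variables X and Y, that is, the quantitative relationship between them.

The linear correlation coefficient, or correlation coefficient for short, describes the direction and closeness of the correlation between two variables. The population correlation coefficient is denoted $\rho$ and the sample correlation coefficient r. From n sample pairs $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$, r is calculated as

$$r = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sqrt{\sum (X-\bar{X})^2 \sum (Y-\bar{Y})^2}} = \frac{\sum XY - (\sum X)(\sum Y)/n}{\sqrt{\left[\sum X^2 - (\sum X)^2/n\right]\left[\sum Y^2 - (\sum Y)^2/n\right]}} \tag{10.8}$$

The correlation coefficient has no unit, and its range is $-1 \le r$ (or $\rho$) $\le 1$. The meaning of r is illustrated in Figure 10-2. If the scatter plot has an elliptical shape and X and Y tend to increase or decrease together, then $0 < r < 1$, which is called positive correlation. If one of X and Y tends to increase while the other decreases, then $-1 < r < 0$, which is called negative correlation. $r = 1$ is perfect positive correlation and $r = -1$ is perfect negative correlation. Under perfect correlation all scatter points lie on one straight line, that is, X and Y have a deterministic functional relationship; two random variables cannot be perfectly correlated. $r = 0$ is called zero correlation: X and Y have no quantitative relationship of changing together. Figure 10-2 shows three common cases of zero correlation. Zero correlation can therefore be regarded in practice as no correlation. Note that r refers to a bivariate sample; for the population, $\rho$ should be used.

Figure 10-2 Schematic illustration of the meaning of the correlation coefficient

Example 10.3 Find the correlation coefficient of weight and vital capacity for the 10 female middle school students in Example 10.1.

From Table 10-1,

$$r = \frac{946.55 - 405 \times 23.15/10}{\sqrt{(16501 - 405^2/10)(55.2875 - 23.15^2/10)}} = 0.6945$$

The correlation coefficient of weight and vital capacity for the 10 female middle school students is 0.6945.

II. Hypothesis test of the population correlation coefficient

Let $\rho$ be the correlation coefficient of X and Y in a bivariate normal population. If $\rho = 0$, X and Y are uncorrelated; only when $\rho \neq 0$ do X and Y have a linear correlation, with $\rho > 0$ positive and $\rho < 0$ negative. Inferring whether X and Y are linearly correlated in the population therefore amounts to inferring whether $\rho$ equals zero. The sample correlation coefficient r reflects the direction and closeness of the linear correlation among the n sample pairs and is the point estimate of $\rho$. Clearly, the sample correlation coefficient is meaningful only when $\rho \neq 0$. For the hypothesis test of the population correlation coefficient, the test hypothesis (null hypothesis) $H_0$ is $\rho = 0$; the alternative hypothesis …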
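
The sample correlation coefficient for the weight and vital capacity data in Table 10-1 can be checked with a minimal Python sketch of the computational form of formula (10.8). The function name `pearson_r` and the variable names are illustrative and not part of the original text.

```python
import math

# Data from Table 10-1: weight X (kg) and vital capacity Y (L).
weights = [35, 37, 37, 40, 40, 42, 42, 43, 44, 45]
vital_capacities = [1.60, 1.60, 2.40, 2.10, 2.60, 2.50, 2.65, 2.75, 2.75, 2.20]


def pearson_r(xs, ys):
    """Sample correlation coefficient via the computational form of (10.8)."""
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    syy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(weights, vital_capacities)
print(f"r = {r:.4f}")
```

Rounded to four decimals, the result matches the value 0.6945 obtained by hand for these data.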
