Chapter 10 (P 227) Correlation and Regression Analysis

Relationship among Variables
Functional relationship: Y is completely determined by X.
Statistical relationship (correlation): Y depends on X, but is not determined by X alone.
Examples: price and sales; daily high temperature and the demand for air-conditioning.
Regression: based on observed data, establish a regression equation and use it for statistical inference (prediction).

What Does Regression Do?
Regression addresses the following problems:
1. Determine whether there is a statistical relationship among the variables and, if there is, give the regression equation.
2. Forecast the value of one variable (the dependent variable) from the values of one or more other variables (the independent variables).

Example: X = price, Y = sales of a product. After collecting the data, we proceed as follows:
1. Scatter plot
2. Regression equation (least squares estimation)
3. Correlation coefficient (testing the regression model)
4. Forecasting (point and interval forecasting)

Simple Linear Regression

Linear Regression Model
The relationship between the variables is described by a linear function:
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
where $Y$ is the dependent (response) variable, $X$ is the independent (explanatory) variable, $\beta_0$ is the Y-intercept, $\beta_1$ is the slope, and $\varepsilon_i$ is the random error.

Sample Linear Regression Model
$$Y_i = b_0 + b_1 X_i + e_i, \qquad \hat{Y}_i = b_0 + b_1 X_i$$
where $Y_i$ is the sampled observed value, $\hat{Y}_i$ is the predicted value on the sample regression line, and $e_i$ is the random error (residual).
The least squares method provides the estimated regression equation that minimizes the sum of squared deviations between the observed values of the dependent variable $Y_i$ and the estimated values $\hat{Y}_i$.

Least Squares Estimation
[Figure: residuals $e_1, e_2, e_3, e_4$ drawn as vertical distances between the observed points and the fitted line $\hat{Y}_i = b_0 + b_1 X_i$.]
OLS chooses $b_0$ and $b_1$ to minimize
$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2$$
For the four points in the figure, $\sum e_i^2 = e_1^2 + e_2^2 + e_3^2 + e_4^2$.

Coefficient & Equation
Sample regression equation: $\hat{Y}_i = b_0 + b_1 X_i$
Slope for the estimated regression equation (P 238, Eq. 10.17):
$$b_1 = \frac{\sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y}}{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}$$
Intercept for the estimated regression equation:
$$b_0 = \bar{Y} - b_1 \bar{X}$$

Evaluating the Model
1. Significance test
2. Coefficient of determination and standard error of the estimate
3. Residual analysis

Measures of Variation in Regression
SST = SSR + SSE
1. Total Sum of Squares (SST) (P 239, Eq. 10.20): measures the variation of the observed values $Y_i$ around the mean $\bar{Y}$.
2. Sum of Squares due to Regression (SSR): variation explained by the relationship between X and Y.
3. Sum of Squares due to Error (SSE): variation caused by other factors.

Variation Measures
[Figure: for one observation $X_i$, the total deviation $Y_i - \bar{Y}$ split into the part explained by the fitted line $\hat{Y}_i = b_0 + b_1 X_i$ and the residual.]
$$SST = \sum_{i=1}^{n} \left(Y_i - \bar{Y}\right)^2, \qquad SSR = \sum_{i=1}^{n} \left(\hat{Y}_i - \bar{Y}\right)^2, \qquad SSE = \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2$$
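The slope and intercept formulas and the decomposition SST = SSR + SSE can be checked numerically with a minimal sketch in plain Python (no statistics library assumed). The data below are hypothetical, since the slides give no numbers for the price/sales example, and the helper names `least_squares` and `variation` are illustrative only.

```python
def least_squares(xs, ys):
    """Return (b0, b1) from the slope and intercept formulas for simple regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of b1: sum(XY) - n*Xbar*Ybar and sum(X^2) - n*Xbar^2
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    sxx = sum(x * x for x in xs) - n * x_bar ** 2
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def variation(xs, ys, b0, b1):
    """Return (SST, SSR, SSE) for the fitted line y_hat = b0 + b1 * x."""
    y_bar = sum(ys) / len(ys)
    y_hat = [b0 + b1 * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)                 # total variation
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained by the line
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))    # left in the residuals
    return sst, ssr, sse


# Hypothetical sample (not from the slides), e.g. price X and sales Y.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = least_squares(xs, ys)
sst, ssr, sse = variation(xs, ys, b0, b1)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")                       # b0 = 2.20, b1 = 0.60
print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")  # SST = 6.00, SSR = 3.60, SSE = 2.40
```

Because least squares makes the residuals uncorrelated with the fitted values, the cross term vanishes and SSR + SSE reproduces SST exactly (up to floating-point rounding).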
Coefficient of Determination
$$r^2 = \frac{SSR}{SST} = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{b_0 \sum_{i=1}^{n} Y_i + b_1 \sum_{i=1}^{n} X_i Y_i - n\bar{Y}^2}{\sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2}, \qquad 0 \le r^2 \le 1$$
A measure of the goodness of fit of the estimated regression equation. It can be interpreted as the proportion of the variation in the dependent variable $Y$ that is explained by the estimated regression equation.

Correlation Coefficient
A numerical measure of linear association between two variables that takes values between $-1$ and $+1$. Values near $+1$ indicate a strong positive linear relationship, values near $-1$ indicate a strong negative linear relationship, and values near zero indicate the lack of a linear relationship.

Coefficients of Determination ($r^2$) and Correlation ($r$)
In simple linear regression, $r = \pm\sqrt{r^2}$, where the sign of $r$ is the same as the sign of the slope $b_1$.

Test of Slope Coefficient for Significance
1. Tests for a linear relationship between X and Y.
2. Hypotheses:
$H_0: \beta_1 = 0$ (no linear relationship)
$H_1: \beta_1 \ne 0$ (linear relationship)
3. Test statistic: $t = b_1 / S_{b_1}$ with $n - 2$ degrees of freedom, where $S_{b_1}$ is the standard error of $b_1$.

Example: Test of Slope Coefficient
$H_0: \beta_1 = 0$, $H_1: \beta_1 \ne 0$, $\alpha = 0.05$, $df = 5 - 2 = 3$
Critical values: $\pm t_{0.025,\,3} = \pm 3.182$
Decision: reject $H_0$ at $\alpha = 0.05$.
Conclusion: there is evidence of a linear relationship.

Multiple Regression Model
There is a linear relationship between a dependent variable and two or more independent variables:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_P X_{Pi} + \varepsilon_i$$
where $Y$ is the dependent variable, $X_1, \dots, X_P$ are the independent variables, $\beta_0$ is the population Y-intercept, $\beta_1, \dots, \beta_P$ are the population slopes, and $\varepsilon_i$ is the random error.

Example: New York Times
You work in the advertising department of the New York Times (NYT). You want to find out to what extent ad size (square inches) and publishing volume (thousands) influence the response to ads (hundreds).
You have collected the following data:

Response (hundreds)   Size (sq. in.)   Volume (thousands)
1                     1                2
4                     8                8
1                     3                1
3                     5                7
2                     6                4
4                     10               6

Example (NYT): Computer Output
In the output, ADSIZE is ad size and CIRC is publishing volume (circulation).

Parameter Estimates
Variable   DF   Parameter Estimate   Standard Error   T for H0: Param=0   Prob > |T|
INTERCEP   1    0.0640               0.2599           0.246               0.8214
ADSIZE     1    0.2049               0.0588           3.484               0.0399
CIRC       1    0.2805               0.0686           4.089               0.0264

Interpretation of Coefficients
1. Slope ($b_1$): if the publishing volume remains unchanged, each one-square-inch increase in ad size is expected to increase the response by 0.2049 hundred (about 20 responses).
2. Slope ($b_2$): if ad size remains unchanged, each increase of one thousand in publishing volume is expected to increase the response by 0.2805 hundred (about 28 responses).

Evaluating the Model
1. How well does the model describe the relationship among the variables?
2. Goodness of fit
3. Whether the assumptions are met
4. Significance of the estimates
5. Correlation among the variables
6. Outliers (unusual observations)

Testing Overall Significance
1. Tests whether there is a linear relationship between Y and all the independent variables taken together.
2. Uses the F statistic.
3. Hypotheses:
$H_0: \beta_1 = \beta_2 = \cdots = \beta_P = 0$: there is no linear relationship between Y and the independent variables.
$H_1$: at least one coefficient is not equal to 0, i.e., at least one independent variable influences Y.

Testing Overall Significance: Computer Output

Analysis of Variance
Source    DF                Sum of Squares   Mean Square   F Value   Prob > F
Model     2 ($P$)           9.2497           4.6249        55.440    0.0043
Error     3 ($n - P - 1$)   0.2503           0.0834
C Total   5 ($n - 1$)       9.5000

F Value = MSR / MSE; Prob > F is the p-value.

Transformations in Regression Models
Some nonlinear models can be transformed into linear models, which makes them convenient to estimate by OLS.
Data transformation
Multiplicative model example

Square-Root Transformation

Logarithmic Transformation

Exponential Transformation
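The NYT coefficient estimates and the ANOVA F value can be checked against the six observations with a short sketch in plain Python (no statistics library assumed). The function names `fit_two_predictors` and `overall_f_test` are hypothetical. For the two-predictor case the sketch solves the centred normal equations directly by Cramer's rule. This is a shortcut chosen for brevity, not necessarily how the statistical package that produced the output computes it.

```python
# NYT data from the slides: response (hundreds), ad size (square inches),
# publishing volume (thousands).
response = [1, 4, 1, 3, 2, 4]
size = [1, 8, 3, 5, 6, 10]
volume = [2, 8, 1, 7, 4, 6]


def fit_two_predictors(y, x1, x2):
    """OLS fit of y = b0 + b1*x1 + b2*x2 via the centred normal equations."""
    n = len(y)
    y_bar, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    # Centred sums of squares and cross-products
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - y_bar) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - y_bar) for b, c in zip(x2, y))
    # Solve [[s11, s12], [s12, s22]] @ [b1, b2] = [s1y, s2y] by Cramer's rule
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = y_bar - b1 * m1 - b2 * m2
    return b0, b1, b2


def overall_f_test(y, fitted, p):
    """Return (SSR, SSE, F) with F = MSR / MSE on (p, n - p - 1) degrees of freedom."""
    n = len(y)
    y_bar = sum(y) / n
    ssr = sum((f - y_bar) ** 2 for f in fitted)
    sse = sum((a - f) ** 2 for a, f in zip(y, fitted))
    msr = ssr / p
    mse = sse / (n - p - 1)
    return ssr, sse, msr / mse


b0, b1, b2 = fit_two_predictors(response, size, volume)
fitted = [b0 + b1 * a + b2 * b for a, b in zip(size, volume)]
ssr, sse, f_stat = overall_f_test(response, fitted, p=2)

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, b2 = {b2:.4f}")        # b0 = 0.0640, b1 = 0.2049, b2 = 0.2805
print(f"SSR = {ssr:.4f}, SSE = {sse:.4f}, F = {f_stat:.3f}")  # SSR = 9.2497, SSE = 0.2503, F = 55.440
```

With more than two predictors, the same normal equations are normally written in matrix form, $(X^\top X)\,b = X^\top Y$, and solved with a linear-algebra routine instead of Cramer's rule.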