Chapter 6 Sampling & Parameter Estimation

1. Sampling & Sampling Distributions
2. Parameter Estimation
3. Interval Estimation for Population Mean & Population Proportion
4. Interval Estimation for the Difference Between Two Population Means

1. About Sampling

(3) Cluster sampling: The population is divided into N groups of elements called clusters, such that each element in the population belongs to one and only one cluster.

(4) Systematic sampling: It is often used as an alternative to simple random sampling. Each element of the sample is selected at a fixed interval from the previous one.
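The two sampling schemes above can be sketched in a few lines of Python. This is only an illustration: the population of 100 numbered units, the interval k = 10 and the 10 clusters are hypothetical. The sketch also assumes that whole clusters are drawn at random once the population has been divided, a step the slide does not spell out.

```python
import random

def systematic_sample(population, k, rng):
    """Pick a random start among the first k units, then take every k-th unit."""
    start = rng.randrange(k)
    return population[start::k]

def cluster_sample(clusters, m, rng):
    """Pick m whole clusters at random and keep every element in them."""
    chosen = rng.sample(sorted(clusters), m)
    return [x for name in chosen for x in clusters[name]]

rng = random.Random(0)              # fixed seed so the run is repeatable
population = list(range(1, 101))    # hypothetical population: units 1..100

sys_sample = systematic_sample(population, k=10, rng=rng)

# Hypothetical clusters: 10 groups of 10; each unit belongs to exactly one group.
clusters = {g: population[g * 10:(g + 1) * 10] for g in range(10)}
clu_sample = cluster_sample(clusters, m=2, rng=rng)

print(sys_sample)
print(clu_sample)
```

In the systematic sample, consecutive elements are always k units apart. In the cluster sample, a unit appears only if its whole cluster was chosen.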
1.3 The Sample Statistic

A statistic is a function of the sample (for example, $\bar{x}$, $S^2$, etc.). A statistic also has a twofold nature. The distribution that a statistic follows is called its sampling distribution. This concept is explained further below.

2. Sampling Distribution

A sample of 30 employees of a company is surveyed for income and for the proportion who have received management training: $\bar{x} = 2000$ yuan, $s = 200$ yuan, proportion $p = 0.70$.

Now select another 30 employees (a simple random sample): $\bar{x} = 2010$ yuan, $s = 202$ yuan, proportion $p = 0.72$.

Repeating this procedure, we obtain a large number of estimates of $\bar{x}$, $s$ and $p$.
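The repeated-sampling idea can be tried out by simulation. The sketch below is not based on the company's data: it assumes a hypothetical population of 5,000 employees whose incomes are normal with mean 2000 and standard deviation 200, with about 70% trained, and draws 2,000 simple random samples of size 30.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population (not data from the slides):
# income ~ Normal(2000, 200); about 70% have had management training.
population = [(rng.gauss(2000, 200), rng.random() < 0.70) for _ in range(5000)]

def one_sample(n=30):
    """Draw a simple random sample and return (x_bar, s, p)."""
    sample = rng.sample(population, n)
    incomes = [income for income, _ in sample]
    trained = [flag for _, flag in sample]
    return statistics.mean(incomes), statistics.stdev(incomes), sum(trained) / n

results = [one_sample() for _ in range(2000)]
x_bars = [r[0] for r in results]
props = [r[2] for r in results]

print("mean of x_bar:", round(statistics.mean(x_bars), 1))
print("sd of x_bar:  ", round(statistics.stdev(x_bars), 1))
print("mean of p:    ", round(statistics.mean(props), 3))
```

Each run of `one_sample` gives a different triple, just as the two samples in the text do. The collected values of `x_bars` and `props` are draws from the corresponding sampling distributions.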
We can sketch histograms from these outcomes. $\bar{x}$, $s$ and $p$ can be regarded as random variables. From these outcomes we can also estimate their mean, variance and probability distribution, i.e. the sampling distribution.

A sampling distribution is generally complicated, but it is simple for a normal population.

2.1 If $X_1, \dots, X_n$ are iid $N(\mu, \sigma^2)$, then $\bar{X} \sim N(\mu, \sigma^2/n)$. (For a large sample, $n \ge 30$, the normality assumption on the population is not needed.) See p. 125.

The Central Limit Theorem (see p. 126)

In selecting simple random samples of size n from a population, the sampling distribution of the sample mean $\bar{X}$ can be approximated by a normal probability distribution as the sample size becomes large ($n \ge 30$). A figure illustrating the theorem for three different populations is given on p. 127, example 6.2.

2.2 The Chi-square Distribution (see p. 127) (asymmetric distribution)

If $X_1, \dots, X_n$ are iid $N(0, 1)$, then $\chi^2 = \sum_{i=1}^{n} X_i^2 \sim \chi^2(n)$, where n is the degrees of freedom (df).
We generally compute the critical value $\chi^2_{\alpha}(n)$ defined by $P(\chi^2(n) > \chi^2_{\alpha}(n)) = \alpha$; see the distribution table on p. 370.
Example: $\alpha = 0.05$, $n = 10$, then $\chi^2_{0.05}(10) = 18.307$. See p. 128 for the density function of $\chi^2(n)$.

2.3 Student's t Distribution (symmetric distribution)

When $n \ge 30$, it is approximately the normal distribution. It also has a degrees-of-freedom parameter (p. 130). We often compute:
Two-tailed critical value $t_{\alpha/2}$: $P(|T| > t_{\alpha/2}) = \alpha$
One-tailed critical value $t_{\alpha}$: $P(T > t_{\alpha}) = \alpha$
See the table on p. 368. For example, with $\alpha = 0.1$: $t_{\alpha/2}(21) = 1.721$, $t_{\alpha}(21) = 1.323$.

The t Distribution (p. 130)

Student's t distribution table (p. 368), tail area on the right side:

df    .25     .10     .05
1     1.000   3.078   6.314
2     0.817   1.886   2.920
3     0.765   1.638   2.353

Suppose n = 3, so df = n - 1 = 2. With $\alpha = .10$, $\alpha/2 = .05$, and the critical value of t is 2.920.
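The Central Limit Theorem stated above can be checked by simulation. The sketch below assumes a strongly skewed population (exponential with mean 1 and standard deviation 1; this choice is an illustration, not one of the three populations in the textbook figure). It draws 5,000 samples of size n = 30 and compares the sample means with the normal approximation $N(\mu, \sigma^2/n)$.

```python
import math
import random
import statistics

rng = random.Random(2)
n, reps = 30, 5000

# Exponential population with rate 1: mean 1, standard deviation 1, skewed.
sample_means = [
    statistics.mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

center = statistics.mean(sample_means)    # should be close to mu = 1
spread = statistics.stdev(sample_means)   # should be close to sigma / sqrt(n)
se = 1 / math.sqrt(n)

# Share of sample means within 1.96 standard errors of the population mean;
# under the normal approximation this share is about 0.95.
coverage = sum(abs(m - 1.0) <= 1.96 * se for m in sample_means) / reps

print(round(center, 3), round(spread, 3), round(se, 3), round(coverage, 3))
```

Although no single observation is normal, the share of sample means inside the ±1.96 standard error band comes out near 0.95, which is what the theorem predicts for large n.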
F Distribution

The $F(n_1 - 1, n_2 - 1)$ distribution has first degrees of freedom $n_1 - 1$ and second degrees of freedom $n_2 - 1$. See the density of the F statistic on p. 129 and the table of critical values $F_p$ on p. 373.

3. Parameter Estimation

3.1 Point Estimator

Moment estimation: substitute sample moments for population moments (population moments include the mean, variance, etc.). Substitute relative frequency for probability.

Notice: when using this method, the relationship between the population characteristics and the parameters must be known.

Sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Sample variance: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$

Example: $X \sim N(\mu, \sigma^2)$. Since $E(X) = \mu$, we set $\hat{\mu} = \bar{X}$. This indicates that $\bar{X}$ is the point estimator of the parameter $\mu$; other situations can be explained in the same way.

Example: $X \sim F(x) = 1 - e^{-\lambda x}$, with mean $1/\lambda$. Then $\hat{\lambda} = 1/\bar{X}$.

Example: X follows the uniform distribution on $[a, b]$. Setting $\bar{X} = \frac{a+b}{2}$ and $S^2 = \frac{(b-a)^2}{12}$, solve these two equations for $\hat{a}$ and $\hat{b}$.

2. Criteria for point estimators

(1) Unbiasedness: $E(\hat{\theta}) = \theta$ (see p. 135).
Example: $\bar{X}$ and the median are both unbiased estimators of the mean parameter $\mu$ (for a symmetric population such as the normal). $S^2$ is an unbiased estimator of the variance parameter $\sigma^2$.

(2) Effectiveness: if two unbiased estimators $\hat{\theta}_1$, $\hat{\theta}_2$ of the parameter $\theta$ satisfy $E(\hat{\theta}_1 - \theta)^2 < E(\hat{\theta}_2 - \theta)^2$, we say that $\hat{\theta}_1$ is more effective than $\hat{\theta}_2$. For example, $\bar{X}$ is more effective than the median. $\bar{X}$ is the minimum variance unbiased estimator among all estimators of the mean.

(3) Consistency (see p. 135): if $\lim_{n \to \infty} P(|\hat{\theta}_1 - \theta| < \varepsilon) = 1$ for every $\varepsilon > 0$, then $\hat{\theta}_1$ is a consistent estimator of the parameter $\theta$.

[Figure: Unbiasedness]
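The two moment-estimation examples (exponential and uniform) can be tried on simulated data. In the sketch below the true values λ = 0.5, a = 2 and b = 10 are arbitrary choices made only so that the estimates can be compared with something known. The closed form $\hat{a}, \hat{b} = \bar{X} \mp \sqrt{3}\,S$ is what solving the two uniform-distribution equations gives when the sample variance $S^2$ is matched to $(b-a)^2/12$.

```python
import math
import random
import statistics

rng = random.Random(3)

# Simulated data with known (arbitrary) parameters, so estimates can be checked.
true_lambda, true_a, true_b = 0.5, 2.0, 10.0
exp_data = [rng.expovariate(true_lambda) for _ in range(5000)]
uni_data = [rng.uniform(true_a, true_b) for _ in range(5000)]

# Exponential: mean = 1 / lambda, so lambda_hat = 1 / x_bar.
lambda_hat = 1 / statistics.mean(exp_data)

# Uniform on [a, b]: mean = (a + b) / 2 and variance = (b - a)^2 / 12.
# Solving the two equations gives a, b = x_bar -/+ sqrt(3) * s.
x_bar = statistics.mean(uni_data)
s = statistics.stdev(uni_data)
a_hat = x_bar - math.sqrt(3) * s
b_hat = x_bar + math.sqrt(3) * s

print(round(lambda_hat, 3), round(a_hat, 2), round(b_hat, 2))
```

With 5,000 observations the estimates land close to the true values, which is the consistency property described above at work.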
[Figure: Effectiveness. Sampling distribution of the mean and sampling distribution of the median, curves labelled A and B]

[Figure: Consistency]

3. Interval Estimation

The idea of interval estimation: determine an interval for a parameter that contains the parameter with a large probability. (Generally, this interval should include the point estimate of the parameter.)

Example: Suppose $x_1, x_2, \dots, x_n$ is a sample from the population $N(\mu, 3^2)$. Determine an interval that contains the parameter $\mu$ with probability 0.95.

Solution: It is known that $Z = \frac{\bar{X} - \mu}{3/\sqrt{n}}$ follows $N(0, 1)$. Because $P(|Z| \le z_{\alpha/2}) = 1 - \alpha = 0.95$, we look up the normal distribution table and get $z_{0.025} = 1.96$. Therefore we get

$P\left(\bar{X} - 1.96 \cdot \frac{3}{\sqrt{n}} \le \mu \le \bar{X} + 1.96 \cdot \frac{3}{\sqrt{n}}\right) = 0.95$

$1 - \alpha$ is called the degree of confidence. The interval containing $\mu$ is called the confidence interval.

The interval estimation of the population mean $\mu$
Suppose $X_1, \dots, X_n$ are iid $N(\mu, \sigma^2)$ and $\bar{X}$ is the point estimator of $\mu$. Compute the confidence interval for $\mu$ with degree of confidence $1 - \alpha$.

When $\sigma$ is known: because $z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ follows $N(0, 1)$, the confidence interval is $\bar{X} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$.

Note: for a non-normal population, when $n \ge 30$ the z statistic can still be used for interval estimation, and $\sigma$ can be replaced by S.

When $\sigma$ is unknown (small sample): it is known that $t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ follows $t(n-1)$, so the confidence interval is $\bar{X} \pm t_{\alpha/2}(n-1) \frac{S}{\sqrt{n}}$. Generally, the symmetric interval has the shortest length.

In point estimation, unbiasedness and effectiveness are used as the criteria for judging the quality of an estimator. In interval estimation, the degree of confidence and the width of the interval are used to assess the quality of the interval.

Notice the following relations:
When n remains unchanged, if $1 - \alpha$ increases, then $z_{\alpha/2}$ increases accordingly, so the interval becomes wider and precision decreases. An increase in confidence level therefore comes at the expense of precision.
When $1 - \alpha$ remains unchanged, if n increases, then the width decreases and the precision increases. However, if n is too large, resources are wasted and sampling loses its purpose. The precision should therefore be chosen carefully.

Determining the sample size

$\Delta = z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$ is called the permissible error, and $\frac{\sigma}{\sqrt{n}}$ is called the standard error. Solving for n gives $n = \left(\frac{z_{\alpha/2}\,\sigma}{\Delta}\right)^2$.

The confidence interval can be obtained with Excel (see an example on p. 154).
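The known-σ interval and the sample-size formula can be put together in a short sketch. The value σ = 3 comes from the $N(\mu, 3^2)$ example, while n = 36, $\bar{x} = 20$ and the permissible error 0.5 are made-up values used only for illustration. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence=0.95):
    """Two-sided confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)      # about 1.96 for 95%
    half_width = z * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

def required_sample_size(sigma, permissible_error, confidence=0.95):
    """Smallest n with z * sigma / sqrt(n) <= permissible_error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / permissible_error) ** 2)

# sigma = 3 is from the example; n = 36 and x_bar = 20 are hypothetical.
low, high = z_interval(x_bar=20.0, sigma=3.0, n=36)
wider = z_interval(x_bar=20.0, sigma=3.0, n=36, confidence=0.99)
n_needed = required_sample_size(sigma=3.0, permissible_error=0.5)

print(round(low, 2), round(high, 2))
print(round(wider[0], 2), round(wider[1], 2))
print(n_needed)
```

Raising the confidence from 0.95 to 0.99 with n fixed widens the interval, which is the trade-off between confidence and precision described in the relations above.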
Interval estimation for population proportion (p. 141)

If $np \ge 5$ and $nq \ge 5$ (where $q = 1 - p$), then the sample proportion $\hat{p}$ approximately follows the normal distribution $N(p, p(1-p)/n)$. The confidence interval estimate of a population proportion is

$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ (6.30), p. 142

Chapter 7 Hypothesis Tests (p. 156)

Hypothesis testing is another problem of statistical inference, focused on reaching a conclusion of "Yes" or "No".
Background: After a technology improvement, has the average product size changed significantly? After a technology improvement, is production stable? Is the qualified rate up to standard? Does the life of the product follow the normal distribution? Etc.

When considering the above questions, we assume that the hypothesis holds and then use the sample to judge whether it is right or not. If it is right, accept the hypothesis; if not, reject it. This is what hypothesis testing covers. Generally, the hypothesis to be tested is called the null hypothesis ($H_0$), and the opposite hypothesis is called the alternative hypothesis ($H_1$).

Hypothesis Tests Process

[Figure: Population. Suppose the average age of the population is 50 ($H_0$). A sample is drawn and the sample mean is 20. Reject the hypothesis.]
The Idea of Hypothesis Tests

The theoretical basis is the principle of small probability: in a single experiment, an event with small probability hardly ever happens.

Example: $H_0: \mu = \mu_0 = 200$ mm, $H_1: \mu \ne \mu_0 = 200$ mm. It is known that the population X follows $N(\mu, \sigma^2)$. If $H_0$ holds, then

$P\left(\left|\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}\right| \le z_{\alpha/2}\right) = 1 - \alpha$

i.e. this event occurs with a large probability, and the opposite event occurs with a small probability. After sampling, compute $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$.

If $|z| \le z_{\alpha/2}$, there is no contradiction. If the opposite occurs, a small-probability event has happened in a single experiment. This contradicts the principle of small probability, so $H_0$ is judged to be wrong.

Steps for Hypothesis Tests

If the statistic is larger than the critical value, reject $H_0$. If the statistic is smaller than the critical value, accept $H_0$. If the statistic equals the critical value, enlarge the sample size and test again.
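The decision rule for the 200 mm example can be sketched as follows. Only $\mu_0 = 200$ mm comes from the example. The values σ = 2 mm, n = 25 and the two sample means are hypothetical, and the function name is made up for this illustration.

```python
import math
from statistics import NormalDist

def two_tailed_z_test(x_bar, mu0, sigma, n, alpha=0.05):
    """Return (z, critical value, reject?) for H0: mu = mu0 with sigma known."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Strict inequality: the borderline case |z| == z_crit is left as "do not
    # reject" here; the slides suggest enlarging the sample and retesting.
    return z, z_crit, abs(z) > z_crit

# mu0 = 200 mm is from the example; sigma, n and the sample means are hypothetical.
z1, crit, reject1 = two_tailed_z_test(x_bar=201.2, mu0=200, sigma=2.0, n=25)
z2, _, reject2 = two_tailed_z_test(x_bar=200.5, mu0=200, sigma=2.0, n=25)

print(round(z1, 2), round(crit, 2), reject1)
print(round(z2, 2), round(crit, 2), reject2)
```

The first sample mean lies three standard errors from 200, an event of small probability under $H_0$, so $H_0$ is rejected. The second lies well inside the critical values, so there is no contradiction.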
Content of the Hypothesis Tests

Testing hypotheses concerning the population mean: when $\sigma$ is known, use the Z statistic; when $\sigma$ is unknown, use the t statistic. For a large sample, whatever the distribution, the Z statistic can be used as an approximation.
Testing a hypothesized value of the population proportion.
Testing the difference between two means.
Testing a hypothesis about the population variance.

An Example About Hypothesis Tests

Is there a significant difference between the 1993 and 1994 net asset income rates of commercial corporations listed on the Shanghai Stock Exchange?
Solution: Suppose the income rates of the two years are $X_1$, $X_2$, following the normal distributions $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ respectively.
Method 1: Let $\mu = \mu_1 - \mu_2$ and test whether $\mu$ is zero.
Method 2: Use the formula on p. 170.

Two Types of Errors

When $H_0$ is true, $H_0$ may still be rejected (because of random factors); this is called the error of rejecting a truth. From the earlier formula, its probability is $\alpha$; it is also called Type I error or supplier's risk. When $H_0$ is false, $H_0$ may still be accepted (because of random factors); this is called the error of accepting a falsehood. Its probability is $\beta$; it is also called Type II error or user's risk. In most cases we control $\alpha$; this lecture does not cover the computation of $\beta$. (Unless the sample size is enlarged, the two types of risk cannot be reduced simultaneously.)

[Figure: $\alpha$ & $\beta$ have an inverse relationship. The two types of risk cannot be reduced simultaneously!]
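Although the lecture does not cover computing β, a small numerical sketch makes the inverse relationship concrete. It assumes a one-tailed test of $H_0: \mu \le 50$ against $H_1: \mu > 50$ with σ known. The values σ = 6, n = 25 and the "true" mean 52 are hypothetical, and the helper name is made up.

```python
import math
from statistics import NormalDist

def beta_one_tailed(mu0, mu1, sigma, n, alpha):
    """P(Type II error) for H0: mu <= mu0 vs H1: mu > mu0 when the true mean is mu1."""
    se = sigma / math.sqrt(n)
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se   # reject H0 if x_bar > cutoff
    return NormalDist(mu1, se).cdf(cutoff)                # P(do not reject | mu = mu1)

# Hypothetical setting: mu0 = 50, true mean 52, sigma = 6.
alphas = (0.10, 0.05, 0.01)
betas_n25 = [beta_one_tailed(50, 52, 6, 25, a) for a in alphas]
betas_n100 = [beta_one_tailed(50, 52, 6, 100, a) for a in alphas]

for a, b25, b100 in zip(alphas, betas_n25, betas_n100):
    print(a, round(b25, 3), round(b100, 3))
```

With n fixed, lowering α raises β. Only the larger sample lowers β at every α, which matches the remark that both risks cannot be reduced together unless the sample size is enlarged.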
One-tail Hypothesis Tests: Basic Idea

[Figure: Sampling distribution of the sample mean under $H_0: \mu = 50$. If the population mean really were 50, a sample mean of 20 would be practically impossible. Therefore, reject the hypothesis $H_0: \mu = 50$.]

Level of Significance

1. Definition: it defines the values of the sample statistic that are very unlikely if the null hypothesis holds. This set of values is called the rejection region of the sampling distribution.
2. Denoted by $\alpha$. Typical values are 0.01, 0.05, 0.10.
3. Chosen by the researcher before the test starts.

Z-Test Statistic ($\sigma$ Known)

Transform the sample statistic (e.g. $\bar{X}$) into the standard normal variable $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ and compare it with the critical value of Z. If the value of the test statistic is within the critical region, reject $H_0$; if not, accept $H_0$.

p Value Test

The p-value, the observed level of significance, is a measure of the likelihood of the sample results when the null hypothesis is assumed to be true. The smaller the p-value, the less likely it is that the sample results came from a situation where the null hypothesis is true.
If $p \ge \alpha$, do not reject $H_0$. If $p < \alpha$, reject $H_0$.

One-Tail Z Test about Mean ($\sigma$ Known)

1. Assume the population follows a normal distribution; when $n \ge 30$, a non-normal population can be approximated by a normal distribution.
2. The null hypothesis uses only $\le$ or $\ge$.
3. Test statistic: $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$

Rejection Region

$H_0: \mu \ge \mu_0$, $H_1: \mu < \mu_0$: the rejection region is in the left tail. $H_0$ is rejected only when the statistic is significantly less than $\mu_0$.
$H_0: \mu \le \mu_0$, $H_1: \mu > \mu_0$: the rejection region is in the right tail. Smaller values do not contradict $H_0$, so $H_0$ is not rejected.

One-Tail Z Test: Finding Critical Z Values

When $\alpha = 0.025$, what is Z? With $\sigma_Z = 1$, the table area is .500 - .025 = .475, which gives Z = 1.96.

Table of Standard Normal Distribution (part)

Z     .05     .06     .07
1.6   .4505   .4515   .4525
1.7   .4599   .4608   .4616
1.8   .4678   .4686   .4693
1.9   .4744   .4750   .4756

Test about p-value (one-tailed)

The Z value of the sample statistic is 1.50. In the Z table, 1.50 gives .4332, so P(Z ≥ 1.50) = .5000 - .4332 = .0668 and the p-value is .0668. Determine the direction of the test by using the alternative hypothesis.
Since (p = 0.0668) ≥ ($\alpha$ = 0.05), do not reject $H_0$. The test statistic is not within the rejection region.

Two-Tailed Z Test for Mean ($\sigma$ Known)

1. Assume the population follows the normal distribution; when $n \ge 30$, a non-normal population can be approximated by a normal distribution.
2. The null hypothesis is an equality.
3. Test statistic: $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$

Rejection Regions

[Figure: Sampling distribution under $H_0$ with two critical values. Each tail is a rejection region of area $\alpha/2$; the non-rejection region in the middle has area $1 - \alpha$ (the confidence level). The sample statistic is compared with the critical values.]

Two-Tailed Test: Finding Critical Z Values

When $\alpha = 0.05$, what is Z? Each tail has area $\alpha/2 = .025$, so the table area is .500 - .025 = .475 and the critical values are Z = -1.96 and Z = 1.96 (same table excerpt as for the one-tail case).

Test about p-value (two-tailed)

P(Z ≤ -1.50 or Z ≥ 1.50) = 0.1336: find P(Z ≥ 1.50) = .5000 - .4332 = .0668 and multiply by 2.
Since (p = .1336) ≥ ($\alpha$ = .05), do not reject the null hypothesis.

Exercises for Hypothesis Tests

A hotel manager claims that the mean weekend bill is less than or equal to 400 yuan. However, the hotel's accountant finds that total income has been increasing over the recent month. The accountant will use a sample of recent weekend bills to test the manager's claim.
a. Which hypothesis should be used? $H_0: \mu \ge 400$, $H_0: \mu \le 400$, or $H_0: \mu = 400$
b. In this example, what does rejecting $H_0$ mean?

A new weight-loss method claims that participants lose at least 8 kg on average in the first week. In a random sample of 40 participants, the sample mean weight loss is 7 kg and the standard deviation is 3.2 kg. Compute:
a. When $\alpha = 0.05$, what is the rejection criterion?
b. What is your conclusion about this weight-loss method?
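The table lookups on the slides can be reproduced with the standard normal distribution in Python's standard library, and the same calls give one possible working of the weight-loss exercise. The exercise part is a sketch under stated assumptions: $H_0: \mu \ge 8$ against $H_1: \mu < 8$, with the sample standard deviation used in place of σ because n = 40 is a large sample.

```python
import math
from statistics import NormalDist

std = NormalDist()   # standard normal, mean 0 and sigma 1

# Values from the slides: z = 1.50, alpha = 0.05 (two-tailed).
p_one_tail = 1 - std.cdf(1.50)            # upper-tail area beyond 1.50
p_two_tail = 2 * p_one_tail               # both tails
z_crit_two_tail = std.inv_cdf(1 - 0.05 / 2)

print(round(p_one_tail, 4), round(p_two_tail, 4), round(z_crit_two_tail, 2))

# Weight-loss exercise, assuming H0: mu >= 8 vs H1: mu < 8 (lower-tail test).
n, x_bar, s, mu0, alpha = 40, 7.0, 3.2, 8.0, 0.05
z = (x_bar - mu0) / (s / math.sqrt(n))    # large sample: s stands in for sigma
z_crit = std.inv_cdf(alpha)               # lower-tail critical value, about -1.645
p_value = std.cdf(z)                      # lower-tail p-value
reject = z < z_crit

print(round(z, 3), round(z_crit, 3), round(p_value, 4), reject)
```

Under these assumptions the statistic falls below the critical value, so the sample does not support the claim of an average loss of at least 8 kg at the 0.05 level.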