Post-stratification and calibration

Thomas Lumley
UW Biostatistics
WNAR, 2008-6-22

What are they?

Post-stratification and calibration are ways to use auxiliary information on the population (or the phase-one sample) to improve precision.

They are closely related to the Augmented Inverse-Probability Weighted estimators of Jamie Robins and coworkers, but are easier to understand.

Estimating a total

Population size $N$, sample size $n$, sampling probabilities $\pi_i$, sampling indicators $R_i$.

Goal: estimate

$$T = \sum_{i=1}^{N} y_i$$

Horvitz-Thompson estimator:

$$\hat{T} = \sum_{R_i=1} \frac{1}{\pi_i} y_i$$

To estimate parameters $\theta$, replace $y_i$ by the loglikelihood $\ell_i(\theta)$ or estimating functions $U_i(\theta)$.

Auxiliary information

The HT estimator is inefficient when some additional population data are available.

Suppose $x_i$ is known for all $i$.

Fit $y \sim x\beta$ by (probability-weighted) least squares to get $\hat\beta$. Let $r^2$ be the proportion of variation explained.

$$\hat{T}_{reg} = \sum_{R_i=1} \frac{1}{\pi_i}\left(y_i - x_i\hat\beta\right) + \sum_{i=1}^{N} x_i\hat\beta$$

ie, the HT estimator for the sum of residuals, plus the population sum of fitted values.

Let $\beta^*$ be the true value of $\beta$ (ie, the least-squares fit to the whole population).

Regression estimator:

$$\hat{T}_{reg} = \sum_{R_i=1} \frac{1}{\pi_i}\left(y_i - x_i\beta^*\right) + \left(\sum_{i=1}^{N} x_i\right)\beta^* + \sum_{i=1}^{N}\left(1 - \frac{R_i}{\pi_i}\right) x_i\left(\hat\beta - \beta^*\right)$$

Compare to the HT estimator:

$$\hat{T} = \sum_{R_i=1} \frac{1}{\pi_i}\left(y_i - x_i\beta^*\right) + \left(\sum_{R_i=1} \frac{1}{\pi_i} x_i\right)\beta^*$$

The second term uses the known rather than the observed total of $x$; the third term is the estimation error for $\beta$, of smaller order.

For large $n$, $N$ and under conditions on moments and sampling schemes

$$\mathrm{var}\left[\hat{T}_{reg}\right] = (1 - r^2)\,\mathrm{var}\left[\hat{T}\right] + O(N/\sqrt{n}) = \left(1 - r^2 + O(n^{-1/2})\right)\mathrm{var}\left[\hat{T}\right]$$

and the relative bias is $O(1/n)$.

The lack of bias does not require any assumptions about $Y|X$: $\hat\beta$ is consistent for the population least squares slope $\beta^*$, for which the mean residual is zero by construction.

Reweighting

Since $\hat\beta$ is linear in $y$, we can write $x\hat\beta$ as a linear function of $y$, and so $\hat{T}_{reg}$ is also a linear function of $Y$:

$$\hat{T}_{reg} = \sum_{R_i=1} w_i y_i = \sum_{R_i=1} \frac{g_i}{\pi_i} y_i$$

for some (ugly) $w_i$ or $g_i$ that depend only on the $x$s.

For these weights

$$\sum_{i=1}^{N} x_i = \sum_{R_i=1} \frac{g_i}{\pi_i} x_i$$

$\hat{T}_{reg}$ is an IPW estimator using weights that are calibrated or "tuned" (French: calage) so that the known population totals are estimated correctly.

Calibration

The general calibration problem: given a distance function $d(\cdot,\cdot)$, find calibration weights $g_i$ minimizing

$$\sum_{R_i=1} d(g_i, 1)$$

subject to the calibration constraints

$$\sum_{i=1}^{N} x_i = \sum_{R_i=1} \frac{g_i}{\pi_i} x_i$$

A Lagrange multiplier argument shows that $g_i = \eta(x_i\lambda)$ for some $\eta()$, $\lambda$; and $\lambda$ can be computed by iteratively reweighted least squares.

For example, can choose $d(\cdot,\cdot)$ so that the $g_i$ are bounded below (and above).

Deville et al JASA 1993; JNK Rao et al, Sankhya 2002

When the calibration model in $x$ is saturated, the choice of $d(\cdot,\cdot)$ does not matter: calibration equates estimated and known category counts.

In this case calibration is also the same as estimating sampling probabilities with logistic regression, which also equates estimated and known counts.

Calibration to a saturated model gives the same analysis as pretending the sampling was stratified on these categories: post-stratification.

Post-stratification is a much older method, and is computationally simpler, but calibration can make more use of auxiliary data.

Standard errors

Standard errors come from the regression formulation

$$\hat{T}_{reg} = \sum_{R_i=1} \frac{1}{\pi_i}\left(y_i - x_i\hat\beta\right) + \sum_{i=1}^{N} x_i\hat\beta$$

The variance of the second term is of smaller order and is ignored.

The variance of the first term is the usual Horvitz-Thompson variance estimator, applied to residuals from projecting $y$ on the calibration variables.

Computing

R provides calibrate() for calibration (and postStratify() for post-stratification).

Three basic types of calibration:

- Linear (or regression) calibration: identical to the regression estimator
- Raking: multiplicative model for weights, popular in US, guarantees $g_i > 0$
- Logit calibration: logit link for weights, popular in Europe, provides upper and lower bounds for $g_i$

Upper and lower bounds for $g_i$ can also be specified for linear and raking calibration (these may not be achievable, but we try). The user can specify other calibration loss functions (eg Hellinger distance).

The calibrate() function takes three main arguments

- a survey design object
- a model formula describing the design matrix of auxiliary variables
- a vector giving the column sums of this design matrix in the population

and additional arguments describing the type of calibration.

    > data(api)
    > dclus1 <- [...]
    > pop.totals <- [...]
    > (dclus1g <- [...]
    > svymean(~api00, dclus1g)
            mean     SE
    api00 642.31 23.921
    > svymean(~api00, dclus1)
            mean     SE
    api00 644.17 23.542

    > svytotal(~enroll, dclus1g)
             total     SE
    enroll 3680893 406293
    > svytotal(~enroll, dclus1)
             total     SE
    enroll 3404940 932235
    > svytotal(~stype, dclus1g)
           total        SE
    stypeE  4421 1.118e-12
    stypeH   755 4.992e-13
    stypeM  1018 1.193e-13

    > (dclus1g3 <- [...]
    > svymean(~api00, dclus1g3)
            mean     SE
    api00 665.31 3.4418
    > svytotal(~enroll, dclus1g3)
             total     SE
    enroll 3638487 385524
    > svytotal(~stype, dclus1g3)
           total        SE
    stypeE  4421 1.179e-12
    stypeH   755 4.504e-13
    stypeM  1018 9.998e-14

    > range(weights(dclus1g3)/weights(dclus1))
    [1] 0.4185925 1.8332949
    > (dclus1g3b <- [...]
    > range(weights(dclus1g3b)/weights(dclus1))
    [1] 0.6 1.6

    > svymean(~api00, dclus1g3b)
            mean     SE
    api00 665.48 3.4184
    > svytotal(~enroll, dclus1g3b)
             total     SE
    enroll 3662213 378691
    > svytotal(~stype, dclus1g3b)
           total        SE
    stypeE  4421 1.346e-12
    stypeH   755 4.139e-13
    stypeM  1018 8.238e-14

    > (dclus1g3c <- [...]
    > range(weights(dclus1g3c)/weights(dclus1))
    [1] 0.5342314 1.9947612
    > svymean(~api00, dclus1g3c)
            mean     SE
    api00 665.39 3.4378

    > (dclus1g3d <- [...]
    > range(weights(dclus1g3d)/weights(dclus1))
    [1] 0.5943692 1.9358791
    > svymean(~api00, dclus1g3d)
            mean     SE
    api00 665.43 3.4325

Types of calibration

Post-stratification allows much more flexibility in weights; in small samples this can result in very influential points and loss of efficiency.

Calibration allows for less flexibility (cf stratification vs regression for confounding).

Different calibration methods make less difference.

Example from Kalton & Flores-Cervantes (J. Off. Stat, 2003): a 3 x 4 table of values.

[Figure: calibration weight (g), on a scale from 0.5 to 4.0, plotted against table cell index (1.1, 2.1, 3.1, 4.1, 1.2, ..., 4.3) for four methods: post-stratified, linear, bounded linear, raking.]

Two-phase studies

Sample a cohort of $N$ people from the population and measure some variables, then subsample $n$ of them and measure more variables: genes, biomarkers, coding of open-text data, copies of original medical records.

Includes nested case-control and case-cohort designs.

Better use of auxiliary information by either stratifying the sampling or calibrating to full cohort data after sampling.

Calibration of the second phase is just like calibration of a single-phase design.

RRZ estimators

Robins, Rotnitzky & Zhao defined augmented IPW estimators for two-phase designs

$$\sum_{i=1}^{N} \frac{R_i}{\pi_i} U_i(\theta) + \sum_{i=1}^{N}\left(1 - \frac{R_i}{\pi_i}\right) A_i(\theta) = 0$$

where $A_i()$ can be any function of phase-1 data. Equivalent to the calibration estimator $\hat{T}_{reg}$ using $A_i$ as calibration variable:

$$\sum_{i=1}^{N} \frac{R_i}{\pi_i}\left(U_i(\theta) - A_i(\theta)\right) + \sum_{i=1}^{N} A_i(\theta) = 0$$

Includes the efficient estimator in the non-parametric phase-1 model (efficient design-based estimator): the most efficient estimator that is consistent for the same limit as if we had complete data.

Typically not fully efficient if outcome-model assumptions are imposed at phase 1.

Example: the Cox model assumes infinitely many constraints at phase 1, and the efficient two-phase estimator is known (Nan 2004, Can J Stat) and is more efficient than the calibration estimator.

Estimated weights

RRZ also note that estimating $\pi$ from phase-1 data gives better precision than using the true known $\pi$. Widely regarded as a paradox.

Estimated weights (eg logistic regression) solve

$$\sum_{i=1}^{N} x_i R_i = \sum_{i=1}^{N} x_i p_i$$

ie, equate observed and estimated population moments. For discrete $x$ this is exactly calibration, for continuous $x$ it is effectively equivalent.

Gain of precision in calibration is not paradoxical: it comes from replacing the variance of $Y$ with the variance of residuals, for a reduction by $(1 - r^2)$, and has nothing to do with estimation.

Exactly the same issue as gain of precision when adjusting a randomized trial for baseline: can write the randomized trial estimator as calibration with counterfactuals.

Estimation error in weights does increase uncertainty, but this is second order: for $p$ predictors it is $O(1 + p/n)$.

Calibration provides increased precision only when $r^2$ is large enough (compared to $p/n$).

Judkins et al, Stat Med 26:1022-33

Computing

calibrate() also works on two-phase design objects.

Since the phase-one data are already stored in the object, there is no need to specify population totals when calibrating.

It is necessary to specify phase=2.

This morning we had a two-phase case-control design

    dccs2 <- twophase(id=list(~seqno,~seqno),
                      strata=list(NULL,~interaction(rel,instit)),
                      data=nwtco, subset=~incc2)

Calibrating it to 16 strata of relapse x stage x institutional histology:

    gccs8 <- calibrate(dccs2, phase=2,
                       formula=~interaction(rel,stage,instit))

Logistic regression

As all the phase-one data are available we can also estimate sampling weights by logistic regression, as suggested by Robins, Rotnitzky & Zhao (JASA, 1994).

Either use calibrate with calfun="rrz" or estWeights.

estWeights takes a data frame with missing values as input and produces a corresponding two-phase design with weights estimated by logistic regression.

Choice of auxiliaries

The other heuristic gain from the calibration viewpoint is in choosing predictors for estimating $\pi$.

The regression formulation shows that the predictors should have strong linear relationships with $U_i(\theta)$.

If $U_i(\theta)$ is of a form such as

$$z_i w_i\left(y_i - \mu_i(\theta)\right)$$

then $z_i$ is approximately uncorrelated with $U_i$.

So, don't use a variable correlated with a phase-2 predictor as a calibration variable; use a variable correlated with the phase-2 score function.

estWeights() can take a phase-one model as an argument and use the estimating functions from that model as calibration variables.

More detail from Norm.
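
A minimal numerical sketch of the Reweighting identity, in base R on simulated data. The population model, the sample sizes and every variable name below are illustrative assumptions, not taken from the talk. Under simple random sampling it checks that the regression estimator equals a weighted sum with linear calibration weights $w_i = g_i/\pi_i$, and that those weights reproduce the known population totals of $x$.

```r
# Simulated population; all numbers here are illustrative assumptions.
set.seed(2008)
N <- 2000
n <- 200
x <- rnorm(N, mean = 10, sd = 2)
y <- 3 + 2 * x + rnorm(N)

# Simple random sample without replacement: pi_i = n / N for every unit.
s <- sample(N, n)
d <- rep(N / n, n)            # design weights 1 / pi_i
X <- cbind(1, x[s])           # auxiliary design matrix (intercept and x)
ys <- y[s]
pop_tot <- c(N, sum(x))       # known population totals of the columns of X

# Horvitz-Thompson estimator of the total of y.
T_ht <- sum(d * ys)

# Probability-weighted least squares, then the regression estimator:
# HT total of the residuals plus population total of the fitted values.
XtDX <- crossprod(X, d * X)
beta_hat <- solve(XtDX, crossprod(X, d * ys))
T_reg <- sum(d * (ys - X %*% beta_hat)) + sum(pop_tot * beta_hat)

# Linear calibration: g_i = 1 + x_i' lambda, with lambda chosen so that
# the calibrated weights w_i = g_i * d_i hit the population totals.
lambda <- solve(XtDX, pop_tot - colSums(d * X))
g <- as.vector(1 + X %*% lambda)
w <- g * d

# Side-by-side comparison of the three estimates and the true total.
c(HT = T_ht, regression = T_reg, calibrated = sum(w * ys), truth = sum(y))
```

The regression and calibrated totals agree up to rounding error, since both are the same linear function of the sampled $y$ values. When $y$ is strongly linear in $x$, as in this simulated population, both are typically much closer to the true total than the HT estimate.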

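
A hedged sketch of a calibrate() call showing its three main arguments, assuming the survey package is installed. The design specification (a one-stage cluster sample from apiclus1 with id = ~dnum, weights = ~pw, fpc = ~fpc) follows the pattern used in the survey package's own documentation and is an assumption here, not a recovery of the calls used in the talk. The school-type totals 4421, 755 and 1018 are the ones shown in the svytotal(~stype, ...) output; the intercept total is taken to be their sum.

```r
library(survey)
data(api)   # provides apiclus1, a cluster sample of California schools

# Assumed design: one-stage cluster sample of school districts.
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Population column sums of the design matrix for ~stype:
# intercept (all schools), high schools, middle schools.
pop.totals <- c(`(Intercept)` = 4421 + 755 + 1018,
                stypeH = 755,
                stypeM = 1018)

# 1. design object, 2. model formula, 3. population totals.
dclus1g <- calibrate(dclus1, formula = ~stype, population = pop.totals)

# The model in stype is saturated, so raking should give the same weights
# as linear calibration (the choice of distance does not matter).
dclus1r <- calibrate(dclus1, formula = ~stype, population = pop.totals,
                     calfun = "raking")

svytotal(~stype, dclus1g)                    # calibrated category totals
range(weights(dclus1g) / weights(dclus1))    # range of the g-weights
```

Because the calibration variables are known exactly in the population, the estimated school-type totals match pop.totals with standard errors that are zero up to rounding.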