Econometric Analysis of Panel Data
William Greene, Department of Economics, Stern School of Business

24. Bayesian Econometric Models for Panel Data

Sources
- Lancaster, T.: An Introduction to Modern Bayesian Econometrics, Blackwell, 2004
- Koop, G.: Bayesian Econometrics, Wiley, 2003
- "Bayesian Methods," "Bayesian Data Analysis" (many books in statistics)
- Papers in marketing: Allenby, Ginter, Lenk, Kamakura
- Papers in statistics: Sid Chib
- Books and papers in econometrics: Arnold Zellner, Gary Koop, Mark Steel, Dale Poirier

Software
- Stata, Limdep, SAS, etc.
- S, R, Matlab, Gauss
- WinBUGS: Bayesian inference Using Gibbs Sampling (on random number generation), http://www.mrc-bsu.cam.ac.uk/bugs/welcome.shtml

A Philosophical Underpinning
- A method of using new information to update existing beliefs about probabilities of events
- Bayes Theorem for events. (Conceived for updating beliefs about games of chance.)
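Concretely, Bayes' theorem for two events A and B (with P(B) > 0) is the updating rule referred to here:

P(A|B) = P(B|A) P(A) / P(B)

The updated (posterior) belief about A after observing B equals the probability of B under A, times the prior belief about A, normalized by the overall probability of B.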
On Objectivity and Subjectivity
- Objectivity and "frequentist" methods in econometrics: the data speak
- Subjectivity and beliefs: priors, evidence, posteriors
- Science and the scientific method

Paradigms
- Classical: Formulate the theory. Gather evidence. Evidence consistent with the theory? The theory stands and waits for more evidence to be gathered. Evidence conflicts with the theory? The theory falls.
- Bayesian: Formulate the theory. Assemble existing evidence on the theory. Form beliefs based on the existing evidence. Gather new evidence. Combine beliefs with the new evidence. Revise beliefs regarding the theory.

Applications of the Paradigm
- Classical econometricians doggedly cling to their theories even when the evidence conflicts with them; that is what specification searches are all about.
- Bayesian econometricians NEVER incorporate prior evidence in their estimators; priors are always studiously noninformative. (Informative priors taint the analysis.)
- As practiced, Bayesian analysis is not Bayesian.

Likelihoods
- (Frequentist) The likelihood is the density of the observed data, conditioned on the parameters. Inference based on the likelihood is usually "maximum likelihood."
- (Bayesian) A function of the parameters and the data that forms the basis for inference; not a probability distribution. The likelihood embodies the current information about the parameters and the data.

The Likelihood Principle
- The likelihood embodies ALL the current information about the parameters and the data.
- Proportional likelihoods should lead to the same inferences.
Application
- (1) 20 Bernoulli trials, 7 successes (binomial)
- (2) N Bernoulli trials until the 7th success (negative binomial)
- (The two likelihoods are worked out below.)
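Worked out, assuming for the negative binomial experiment that the 7th success arrives on trial N = 20, so that both experiments describe the same data:

Binomial: L(θ) = C(20,7) θ^7 (1-θ)^13
Negative binomial: L(θ) = C(19,6) θ^7 (1-θ)^13

The two likelihoods are proportional; they differ only in constants that do not involve θ. By the likelihood principle, inference about θ should therefore be identical under the two sampling schemes, even though some classical procedures (for example, exact p-values) would treat them differently.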
Inference

The Bayesian Estimator
- The posterior distribution embodies all that is "believed" about the model: Posterior = f(model|data) = Likelihood(θ, data) * Prior(θ) / P(data).
- "Estimation" amounts to examining the characteristics of the posterior distribution(s): the mean and variance, the distribution itself, and intervals containing specified probabilities.
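In symbols, with θ the full parameter vector:

f(θ|data) = L(θ|data) p(θ) / P(data), where P(data) = ∫ L(θ|data) p(θ) dθ

Since P(data) does not involve θ, f(θ|data) ∝ L(θ|data) p(θ): the posterior is proportional to likelihood times prior. Avoiding the computation of the normalizing constant P(data) is the chief motivation for the simulation methods that appear later in these notes.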
Priors and Posteriors
- The Achilles heel of Bayesian econometrics
- Noninformative and informative priors for estimation of parameters
- Noninformative (diffuse) priors: how to incorporate the total lack of prior belief in the Bayesian estimator. The estimator becomes solely a function of the likelihood.
- Informative prior: some prior information enters the estimator. The estimator mixes the information in the likelihood with the prior information.
- Improper and proper priors: P(θ) is uniform over the allowable range of θ; it cannot integrate to 1.0 if the range is infinite. Salvation: improper but noninformative priors will fall out of the posterior.

Diffuse (Flat) Priors

Conjugate Prior
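The formulas for these two slides were lost in extraction. As a standard illustration of conjugacy (not reconstructed from the slides): a prior is conjugate when combining it with the likelihood yields a posterior in the same family. For the Bernoulli likelihood of the earlier application, the Beta prior is conjugate:

Prior: p(θ) ∝ θ^(a-1) (1-θ)^(b-1), i.e., Beta(a,b); a = b = 1 is the flat prior
Likelihood (7 successes, 13 failures): L(θ) ∝ θ^7 (1-θ)^13
Posterior: p(θ|data) ∝ θ^(a+7-1) (1-θ)^(b+13-1), i.e., Beta(a+7, b+13)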
THE Question
- Where does the prior come from?

Large Sample Properties of Posteriors
- Under a uniform prior, the posterior is proportional to the likelihood function.
- The Bayesian estimator is the mean of the posterior; the MLE equals the mode of the likelihood.
- In large samples, the likelihood becomes approximately normal; the mean equals the mode.
- Thus, in large samples, the posterior mean will be approximately equal to the MLE.

Reconciliation: A Theorem (Bernstein-von Mises)
- The posterior distribution converges to a normal distribution with covariance matrix equal to 1/N times the inverse of the information matrix (the same asymptotic variance as the classical MLE). (The distribution that is converging is the posterior, not the sampling distribution of the estimator of the posterior mean.)
- The posterior mean (empirical) converges to the mode of the likelihood function, the same as the MLE. A proper prior disappears asymptotically.
- The asymptotic sampling distribution of the posterior mean is the same as that of the MLE.
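Compactly, in one standard formulation (not reconstructed from the slide): with θ0 the true parameter, θ_ML the maximum likelihood estimator, and I(θ0) the per observation information matrix, under regularity conditions and any proper prior,

the posterior of θ converges to N( θ_ML , (1/N) I(θ0)^(-1) ) as N → ∞

so Bayesian posterior intervals and classical asymptotic confidence intervals coincide in large samples.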
Mixed Model Estimation
- MLwiN: multilevel modeling for Windows, http://multilevel.ioe.ac.uk/index.html
- Uses mostly Bayesian, MCMC methods
- "Markov Chain Monte Carlo (MCMC) methods allow Bayesian models to be fitted, where prior distributions for the model parameters are specified. By default MLwiN sets diffuse priors which can be used to approximate maximum likelihood estimation." (From their website.)

Bayesian Estimators
- First generation: do the integration (math).
- Contemporary, simulation based: (1) deduce the posterior; (2) draw random samples from the posterior and compute the sample means and variances of the draws. (Relies on the law of large numbers.)

The Linear Regression Model

Marginal Posterior for β
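The formulas on these two slides were also lost in extraction. The textbook result the titles refer to, sketched for the diffuse prior p(β, σ²) ∝ 1/σ² in the model y = Xβ + ε, ε ~ N(0, σ²I):

β | y, X ~ multivariate t with mean b = (X'X)^(-1) X'y, scale matrix s²(X'X)^(-1), and N-K degrees of freedom, where s² = e'e/(N-K).

Under the flat prior, the posterior mean of β is exactly the least squares estimator, which previews the large sample equivalence discussed above.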
Nonlinear Models and Simulation
- Bayesian inference over the parameters in a nonlinear model:
  1. Parameterize the model.
  2. Form the likelihood conditioned on the parameters.
  3. Develop the priors: a joint prior for all model parameters.
  4. The posterior is proportional to likelihood times prior. (Usually requires conjugate priors to be tractable.)
  5. Draw observations from the posterior to study its characteristics.

Simulation Based Inference

A Practical Problem

A Solution to the Sampling Problem
The Gibbs Sampler
- Target: sample from the marginals of f(x1, x2), the joint distribution.
- The joint distribution is unknown, or it is not possible to sample from the joint distribution.
- Assumed: f(x1|x2) and f(x2|x1) are both known, and samples can be drawn from both.
- Gibbs sampling: obtain draws from (x1, x2) by many cycles between x1|x2 and x2|x1. Start x1,0 anywhere in the right range. Draw x2,0 from x2|x1,0. Return to x1,1 from x1|x2,0, and so on.
- Several thousand cycles produce the draws. Discard the first several thousand to avoid dependence on the initial conditions ("burn in"). Average the retained draws to estimate the marginal means.

Bivariate Normal Sampling
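The content of this slide was lost in extraction. Below is a sketch in Python (the course examples use LIMDEP; this translation is mine) of the two conditional cycle for a standard bivariate normal with correlation rho, where x1|x2 ~ N(rho*x2, 1 - rho^2) and symmetrically for x2|x1:

import numpy as np

def gibbs_bivariate_normal(rho, n_draws=10000, burn_in=2000, seed=12345):
    # Gibbs sampler for a standard bivariate normal with correlation rho,
    # using the exact conditionals x1|x2 ~ N(rho*x2, 1-rho**2) and vice versa.
    rng = np.random.default_rng(seed)
    x1, x2 = 0.0, 0.0                    # start anywhere in the right range
    sd = np.sqrt(1.0 - rho**2)           # conditional standard deviation
    draws = np.empty((n_draws, 2))
    for r in range(burn_in + n_draws):
        x1 = rho * x2 + sd * rng.standard_normal()   # draw x1 | x2
        x2 = rho * x1 + sd * rng.standard_normal()   # draw x2 | x1
        if r >= burn_in:                 # discard the burn-in cycles
            draws[r - burn_in] = (x1, x2)
    return draws

draws = gibbs_bivariate_normal(rho=0.5)
print(draws.mean(axis=0))    # both marginal means should be near 0
print(np.corrcoef(draws.T))  # off-diagonal element should be near 0.5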
Gibbs Sampling for the Linear Regression Model

Application: the Probit Model

Gibbs Sampling for the Probit Model

Generating Random Draws from f(X)

Example: Simulated Probit (LIMDEP commands)

? Generate raw data
Sample ; 1 - 1000 $
Create ; x1 = rnn(0,1) ; x2 = rnn(0,1) $
Create ; ys = .2 + .5*x1 - .5*x2 + rnn(0,1) ; y = ys > 0 $
Namelist ; x = one,x1,x2 $
Matrix ; xx = x'x ; xxi = <xx> $
Calc ; Rep = 200 ; Ri = 1/Rep $
Probit ; lhs = y ; rhs = x $
? Gibbs sampler
Matrix ; beta = [0/0/0] ; bbar = init(3,1,0) ; bv = init(3,3,0) $
Proc = gibbs $
Do for ; simulate ; r = 1,Rep $
? Draw the latent ysg from the truncated normal by the inverse CDF:
? inp = inverse of the standard normal CDF, phi = standard normal CDF.
Create ; mui = x'beta ; f = rnu(0,1)
       ; if(y=1) ysg = mui + inp(1-(1-f)*phi(mui)) ;
         (else)  ysg = mui + inp(f*phi(-mui)) $
? Draw beta from its conditional normal posterior given ysg.
Matrix ; mb = xxi*x'ysg ; beta = rndm(mb,xxi)
       ; bbar = bbar + beta ; bv = bv + beta*beta' $
Enddo ; simulate $
Endproc $
Execute ; Proc = Gibbs $   (Note: the burn-in was not discarded.)
Matrix ; bbar = ri*bbar ; bv = ri*bv - bbar*bbar' $
Matrix ; Stat(bbar,bv) ; Stat(b,varb) $
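For readers without LIMDEP, here is the same data augmentation sampler sketched in Python (my translation, not from the original slides): given β, draw the latent ys from the normal truncated to the correct side of zero by the inverse CDF method; given ys, draw β from its exact N((X'X)^(-1)X'ys, (X'X)^(-1)) posterior (flat prior, error variance fixed at 1). Unlike the LIMDEP example, this version discards a burn-in.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Generate raw data, mirroring the LIMDEP example above
n = 1000
x = np.column_stack([np.ones(n), rng.standard_normal(n), rng.standard_normal(n)])
ys = x @ np.array([0.2, 0.5, -0.5]) + rng.standard_normal(n)
y = (ys > 0).astype(float)

xxi = np.linalg.inv(x.T @ x)      # (X'X)^{-1}
chol = np.linalg.cholesky(xxi)    # for correlated normal draws of beta
beta = np.zeros(3)
draws = []
for r in range(2200):             # 200 burn-in cycles + 2000 retained draws
    mui = x @ beta
    f = rng.uniform(size=n)
    # Truncated normal draws for the latent variable by the inverse CDF method
    u = np.where(y == 1,
                 1 - (1 - f) * norm.cdf(mui),   # truncated to ys > 0
                 f * norm.cdf(-mui))            # truncated to ys <= 0
    u = np.clip(u, 1e-12, 1 - 1e-12)            # guard against ppf(0) or ppf(1)
    ysg = mui + norm.ppf(u)
    # beta | ysg ~ N( (X'X)^{-1} X'ysg , (X'X)^{-1} ) with flat prior, sigma = 1
    mb = xxi @ (x.T @ ysg)
    beta = mb + chol @ rng.standard_normal(3)
    if r >= 200:
        draws.append(beta.copy())

draws = np.array(draws)
print(draws.mean(axis=0), draws.std(axis=0))  # compare with the probit MLE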
Example: Probit MLE vs. Gibbs

Matrix ; Stat(bbar,bv) ; Stat(b,varb) $
Number of observations in current sample = 1000
Number of parameters computed here       =    3
Number of degrees of freedom             =  997

Gibbs posterior means (bbar):
Variable    Coefficient    Standard Error    b/St.Er.    P[|Z|>z]
BBAR_1       .21483281       .05076663         4.232      .0000
BBAR_2       .40815611       .04779292         8.540      .0000
BBAR_3      -.49692480       .04508507       -11.022      .0000

Probit maximum likelihood (b):
Variable    Coefficient    Standard Error    b/St.Er.    P[|Z|>z]
B_1          .22696546       .04276520         5.307      .0000
B_2          .40038880       .04671773         8.570      .0000
B_3         -.50012787       .04705345       -10.629      .0000
A Random Parameters Approach to Modeling Heterogeneity
- Allenby and Rossi, "Marketing Models of Consumer Heterogeneity," Journal of Econometrics, 89, 1999
- Discrete choice model: brand choice
- "Hierarchical Bayes"
- Multinomial probit
- Panel data: purchases of 4 brands of ketchup

Structure

Bayesian Priors

Bayesian Estimator
- The joint posterior mean is an integral that does not exist in closed form.
- Estimate it by random samples from the joint posterior.
- The full joint posterior is not known, so it is not possible to sample from the joint posterior directly.

Gibbs Cycles for the MNP Model
- Samples from the marginal posteriors
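The equations for the Structure and Bayesian Priors slides were lost in extraction. The canonical hierarchical setup they describe is, in generic notation (a sketch; the slides' exact notation is not recoverable):

Utilities (multinomial probit): y*ij = xij'βi + εij for consumer i and brand j, with ε multivariate normal
First level (consumer heterogeneity): βi ~ N(b, V)
Second level (priors): b ~ N(b0, B0) and V^(-1) ~ Wishart(v0, S0)

"Hierarchical Bayes" refers to this layering: the consumer level parameters βi have a population distribution whose own parameters b and V carry priors.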
Bayesian Fixed Effects
- Application: Koop et al., "Hospital Cost Efficiency," Journal of Econometrics, 1997, 76, pp. 77-106
- Treat the individual constants as first level parameters: Model = f(α1, ..., αN, β, data)
- Formal Bayesian treatment of all K+N+1 parameters in the model
- Stochastic frontier, as in the latent variable application
- Bayesian counterparts to fixed effects and random effects models?
- Incidental parameters? (Almost surely, or something like it.) How do you deal with it? Irrelevant: there are no asymptotic properties. Must be relevant: the estimates are numerically unstable.

Comparison of Maximum Simulated Likelihood and Hierarchical Bayes
- Ken Train: "A Comparison of Hierarchical Bayes and Maximum Simulated Likelihood for Mixed Logit"
- Mixed logit

Stochastic Structure: Conditional Likelihood
- Note the individual specific parameter vector, βi (the structure is sketched below).
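The likelihood formulas on these slides were lost in extraction; the standard mixed logit structure being referenced is, in generic notation (a sketch, not the slides' own algebra):

Conditional on βi, the probability of person i choosing alternative j in choice situation t is
P(j | Xit, βi) = exp(xitj'βi) / Σk exp(xitk'βi)
The conditional likelihood of person i's T observed choices is Li(βi) = Πt P(jit | Xit, βi).
With heterogeneity βi ~ N(b, Ω), the unconditional (classical) likelihood integrates βi out:
Li(b, Ω) = ∫ [ Πt P(jit | Xit, βi) ] φ(βi; b, Ω) dβi

The classical approach maximizes a simulated version of this integral; the Bayesian approach never computes it, instead drawing b, Ω, and each βi from their conditional posteriors.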
Classical Approach

Bayesian Approach: Gibbs Sampling and Metropolis-Hastings

Gibbs Sampling from Posteriors: b

Gibbs Sampling from Posteriors: Ω

Gibbs Sampling from Posteriors: βi

Metropolis-Hastings Method

Metropolis-Hastings: A Draw of βi
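The conditional posterior of each βi is proportional to Li(βi) * φ(βi; b, Ω), which is not a distribution that can be sampled directly; that is why a Metropolis-Hastings step is embedded in the Gibbs cycle. A minimal sketch in Python (assuming a random walk proposal; the tuning and proposal in Train's implementation may differ):

import numpy as np
from scipy.stats import multivariate_normal

def mh_draw_beta_i(beta_i, loglik_i, b, Omega, rng, step=0.3):
    # One Metropolis-Hastings update of person i's parameter vector.
    # Target: p(beta_i | data_i, b, Omega), proportional to
    # L_i(beta_i) * N(beta_i | b, Omega).
    # loglik_i(beta) must return person i's conditional log likelihood.
    proposal = beta_i + step * rng.standard_normal(beta_i.shape)
    log_new = loglik_i(proposal) + multivariate_normal.logpdf(proposal, mean=b, cov=Omega)
    log_old = loglik_i(beta_i) + multivariate_normal.logpdf(beta_i, mean=b, cov=Omega)
    # Symmetric proposal: accept with probability min(1, posterior ratio).
    if np.log(rng.uniform()) < log_new - log_old:
        return proposal        # accepted
    return beta_i              # rejected: keep the current draw

One such update is made for every person i within each Gibbs cycle, between the conditional draws of b and Ω.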
Application: Energy Suppliers
- N = 361 individuals, 2 to 12 hypothetical suppliers. (A stated choice experiment.)
- X = (1) fixed rates, (2) contract length, (3) local (0,1), (4) well known company (0,1), (5) offers TOD rates (0,1), (6) offers seasonal rates

Estimates: Mean of Individual βi

Conclusions
- Bayesian vs. classical estimation: in principle, some differences in interpretation; as practiced, just two different algorithms. The religious debate is a red herring.
- The Gibbs sampler: a major technological advance, and a useful tool for both classical and Bayesian econometrics. New Bayesian applications appear daily.

Standard Criticisms

Of the classical approach:
- Computationally difficult (ML vs. MCMC).
- No attention is paid to household level parameters; there is no natural estimator of individual or household level parameters.
- Responses: none of these are true. See, e.g., Train (2003, ch. 10).

Of classical inference in this setting:
- Asymptotics are "only approximate" and rely on "imaginary samples." Bayesian procedures are "exact."
- Response: the inexactness results from acknowledging that we try to extend these results outside the sample. The Bayesian results are "exact" but have no generality and are useless except for this sample, these data, and this prior. (Or are they? Trying to extend them outside the sample is a distinctly classical exercise.)

Standard Criticisms (continued)

Of the Bayesian approach:
- Computationally difficult. Response: not really, with MCMC and Metropolis-Hastings.
- The prior (conjugate or not) is a canard; it has nothing to do with "prior knowledge" or the uncertainty of the investigator. Response: in fact, the prior usually has little influence on the results. (Bernstein-von Mises theorem.)

Of Bayesian inference:
- It is not statistical inference. How do we discern any uncertainty in the results? This is precisely the underpinning of the Bayesian method: there is no uncertainty; it is exact.