Variance Reduction in Gibbs Sampler Using Quasi Random Numbers

J. G. LIAO

A sequence of s-dimensional quasi random numbers fills the unit cube evenly at a much faster rate than a sequence of pseudo uniform deviates does. It has been successfully used in many Monte Carlo problems to speed up the convergence. Direct use of a sequence of quasi random numbers, however, does not work in Gibbs samplers because the successive draws are now dependent. We develop a quasi random Gibbs algorithm in which a randomly permuted quasi random sequence is used in place of a sequence of pseudo deviates. One layer of unnecessary variation in the Gibbs sample is thereby eliminated. A simulation study with three examples shows that the proposed quasi random algorithm provides much tighter estimates of the quantiles of the stationary distribution and is about 4-25 times as efficient as the pseudo random algorithm. No rigorous theoretical justification for the quasi random algorithm, however, is available at this point.

Key Words: Algorithm; Bayes inference; Markov chain; Monte Carlo.

1. INTRODUCTION

Monte Carlo Markov chains have recently received a lot of attention as a powerful computational tool in Bayes statistical
inference. The conceptual simplicity, ease of implementation, and applicability to a wide range of problems are among their attractive features. The basic idea is to formulate a Markov chain such that its stationary distribution is the desired posterior one. The Monte Carlo method is then used to sample the Markov chain. Characteristics of the posterior distribution can be obtained to any degree of accuracy if the number of iterations is large enough. See Tanner and Wong (1987), Gelfand and Smith (1990), Smith and Roberts (1993), Besag and Green (1993), Tanner (1993), and Tierney (1994) for more details.

J. G. Liao is Assistant Professor, Department of Epidemiology and Biostatistics, University of South Florida, 13201 Bruce B. Downs Boulevard, Tampa, FL 33612 (E-mail: jliao@coml.med.usf.edu).
©1998 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America
Journal of Computational and Graphical Statistics, Volume 7, Number 3, Pages 253-266

Like other Monte Carlo methods, the computational burden of Monte Carlo Markov chains can be heavy, especially if high accuracy is needed. Several authors have studied ways to modify the Markov chain scheme to speed up convergence. Liu, Wong, and Kong (1994) compared different Gibbs sampling plans. Hills and Smith (1992) discussed reparameterization of Gibbs samplers. Besag and Green (1993) considered the use of auxiliary variables. They showed that reparameterization and auxiliary variables can sometimes speed computation dramatically. Standard variance reduction techniques such as importance sampling, conditioning, and control variables can also be used (Tierney 1994). This article presents a quasi random Gibbs algorithm that uses quasi random numbers in
a Gibbs sampler for variance reduction. A sequence of s-dimensional quasi random numbers fills the unit cube [0, 1]^s evenly at a much faster rate than a sequence of pseudo uniform deviates does, and it has been used in many Monte Carlo problems with independent draws to speed the convergence (Niederreiter 1992; Fang and Wang 1994). Direct use of a quasi random sequence, however, does not work in problems with dependent draws because the successive numbers in a quasi random sequence can have very particular dependence structures. We develop a quasi random Gibbs algorithm in which a randomly permuted quasi random sequence is used in place of a sequence of pseudo random deviates. This eliminates one layer of unnecessary variation in the Gibbs sample. The proposed quasi random algorithm is applied to three Gibbs examples with minimal increase in implementation cost and compared with the standard pseudo random algorithm through simulation. In all three examples, the quasi random algorithm provides much tighter estimates of the quantiles of the stationary distribution and is 4-25 times as efficient as the pseudo random algorithm. In spite of this empirical support, however, no rigorous theoretical justification for the quasi random algorithm is available at this point. The dependence structure of the successive draws in the quasi random algorithm proves more complicated, and further work is needed to fully understand it. We hope this article will stimulate more research on the application of quasi random numbers in problems with dependent draws.

The article is organized as follows. Section 2 reviews the basic ideas of quasi random numbers. Section 3 presents the proposed quasi random Gibbs algorithm. Section 4 provides some implementation details and a simulation study of three
examples. Section 5 contains a discussion.

2. QUASI RANDOM NUMBERS

Quasi random numbers and their applications are discussed in detail in Niederreiter (1992) and Fang and Wang (1994). The following are some basic ideas. Let x_1, x_2, \ldots be a sequence of independent random deviates from the uniform distribution on [0, 1]^s, where s \ge 1. The uniform cumulative distribution on [0, 1]^s will be denoted as F(\cdot). The empirical cumulative distribution of the first n deviates is defined as

F_n^x(x) = n^{-1} \sum_{i=1}^{n} \mathrm{ind}(x_i \le x),

where \mathrm{ind}(\cdot) is the indicator function and x_i \le x holds coordinate-wise. The pseudo random F_n^x(\cdot) approaches F(\cdot) at the usual n^{-1/2} rate. A sequence of s-dimensional quasi random numbers y_1, y_2, \ldots is constructed so that its empirical cumulative distribution F_n^y(\cdot) approaches F(\cdot) at a near n^{-1} rate, with \sup_x |F_n^y(x) - F(x)| = O(n^{-1+\epsilon}) for any \epsilon > 0. There is no probability statement here, as a quasi random sequence is conceptually a deterministic one.

A sequence of quasi random numbers can readily substitute for a sequence of pseudo deviates in the Monte Carlo evaluation of \int g(x)\,dF(x). The usual expression \int g(x)\,dF_n^x(x) is replaced by \int g(y)\,dF_n^y(y). As F_n^y(\cdot) approaches F(\cdot) at a faster rate, the quasi random integral converges to \int g(x)\,dF(x) more rapidly for a continuous g(x). The use of a sequence of quasi random numbers in place of a sequence of pseudo deviates can substantially speed up the convergence of a Monte Carlo problem if the solution depends on the pseudo deviates only through their empirical distribution and if an empirical distribution closer to that of the uniform distribution leads to a more accurate solution. Areas with successful applications include numerical integration, optimization, experimental design, and the solution of differential equations. The successive numbers in a quasi random sequence, however, can have
very particular dependence structures. For this reason we cannot simply substitute a sequence of quasi random numbers for a sequence of pseudo random deviates in problems with dependent draws such as a Gibbs sampler.

There are two considerations in choosing a quasi random sequence for use in the following quasi random Gibbs algorithm. First, the sequence must have good properties for a possibly very high dimension s. Second, the quasi random numbers in the sequence need to be generated efficiently in any desired order. A good point set sequence (Fang and Wang 1994, pp. 26-27) meets the two requirements. We especially recommend, based on our experiments, the good point set given in (1.3.8) of Fang and Wang with some modification. Let p be a prime number and q = p^{1/(s+1)}. Define \gamma_j = \{q^j\}, where \{q^j\} denotes the fractional part of q^j. The kth number y_k in this sequence is defined as

y_k = (\{\gamma_1 k\}, \{\gamma_2 k\}, \ldots, \{\gamma_s k\}).   (2.1)

We shall call \{\gamma_j k\} the jth coordinate of y_k. It would take a large n for the ith and jth coordinates of y_1, y_2, \ldots, y_n to fill [0, 1]^2 evenly if |\gamma_i - \gamma_j| is small. In other words, the ith and jth coordinates mix slowly. We recommend choosing p = 2, as this makes \gamma_1, \gamma_2, \ldots, \gamma_s well spread out between 0 and 1. We also recommend rearranging the order of \gamma_1, \gamma_2, \ldots, \gamma_s by defining

\gamma_j = \{q^{(j+1)/2}\} for j = 1, 3, \ldots, and \gamma_j = \{q^{(s+1)/2 + j/2}\} for j = 2, 4, \ldots   (2.2)

when s is odd. Replace (s+1)/2 in (2.2) by s/2 when s is even. With this rearrangement, any two neighboring \gamma_j and \gamma_{j+1}
are not very close and their corresponding coordinates do not mix very slowly. This is helpful when neighboring coordinates are used together, such as in the rejection method of generating random deviates.

3. QUASI GIBBS ALGORITHM

Let \theta = (\theta_1, \theta_2, \ldots, \theta_m) be a random vector with distribution \pi(\cdot). Each iteration of the Gibbs sampler cycles through the subvectors of \theta, drawing each subvector conditional on the values of all the others. Let \theta^{(i)} = (\theta_1^{(i)}, \theta_2^{(i)}, \ldots, \theta_m^{(i)}) be the ith iterate. Each \theta_j^{(i)} is drawn from the conditional distribution of \theta_j given \theta_1^{(i)}, \ldots, \theta_{j-1}^{(i)}, \theta_{j+1}^{(i-1)}, \ldots, \theta_m^{(i-1)}. Under quite general conditions, the distribution of \theta^{(i)} approaches \pi(\cdot) as i \to \infty (Tierney 1994). The Gibbs sample \theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(n)} can thus be used to estimate the characteristics of \pi(\cdot) after an initial proportion of the iterations is discarded as a burn-in period.

An extensive collection of methods has been invented for drawing from various kinds of distributions (Devroye 1986). They are usually ways to convert one or more independent uniform deviates on [0, 1] to a deviate of a desired distribution. The transformation method and the rejection method are most widely used. For convenience, a uniform deviate on [0, 1] will be called a standard deviate. The number of standard deviates needed in the transformation method is fixed, while the number of standard deviates in the rejection method is random with a certain stopping rule. We will denote the sequence of independent standard deviates used in drawing all the components of \theta^{(i)} as u^{(i)}. The u^{(1)}, u^{(2)}, \ldots, u^{(n)} are independent of each other.

Our quasi Gibbs algorithm can be motivated by a two-step plan for generating u^{(1)}, u^{(2)}, \ldots, u^{(n)}. Consider first the case where the number of uniform deviates needed in drawing from every conditional distribution is fixed. This happens, for example, when only the transformation method is used. Let s be the total number of standard deviates needed in drawing from the m conditional distributions. The sequence u^{(1)}, u^{(2)}, \ldots, u^{(n)} can be generated in the following way:

Step 1: Draw x_1, x_2, \ldots, x_n independently from the uniform distribution on [0, 1]^s.
Step 2: Randomly permute x_1, x_2, \ldots, x_n such that each of the n! permutations has equal probability of being selected. The permuted sequence is denoted as x_1', x_2', \ldots, x_n'. Let u^{(i)} = x_i', i = 1, 2, \ldots, n.

Let the conditional distribution of the Gibbs sample \theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(n)}, given x_1, x_2, \ldots, x_n in Step 1, be denoted as \theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(n)} | x_1, x_2, \ldots, x_n. The variation comes from the random permutation of x_1, x_2, \ldots, x_n in Step 2. Note that each of the n! permutations has the same probability of being selected. Therefore, \theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(n)} | x_1, x_2, \ldots, x_n depends only on the empirical distribution F_n^x(\cdot) and can be written as \theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(n)} | F_n^x(\cdot). The quasi random algorithm is simply to replace the pseudo random sequence x_1, x_2, \ldots, x_n in Step 1 by the quasi random sequence y_1, y_2, \ldots, y_n and deliver the Gibbs sample from \theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(n)} | F_n^y(\cdot). This is based on the intuition that the fluctuation of F_n^x(\cdot) around F(\cdot) serves no useful purpose. The quasi random F_n^y(\cdot) is much closer to F(\cdot) and could be used instead.

There is some slight complication in the general situation where the number of standard deviates needed for
one or more components is random. This happens if the rejection method is used. Let the jth component \theta_j be such an example. The standard deviates needed can be treated as an infinite sequence, with some finite foremost elements actually used depending on the stopping rule. The quasi random algorithm replaces the l_j foremost elements by the coordinates of the quasi numbers, where l_j is a pre-chosen positive integer. A larger l_j reduces the variability of the Gibbs sample but leads to a higher dimension s for the quasi number y_i, which in turn requires a larger n for y_1, y_2, \ldots, y_n to fill [0, 1]^s evenly. An n that is not sufficiently large may lead to some bias in the Gibbs sample. This point will be discussed further in Section 5.

The quasi random algorithm, in a form suitable for implementation, is:

Step A: Decide on the positive integers l_j, j = 1, 2, \ldots, m. The l_j should be the number of deviates needed in generating \theta_j if this number is fixed. If the number is random, then l_j can be chosen such that the actual number of standard deviates needed does not exceed l_j with, say, 90% probability. Let s = l_1 + l_2 + \cdots + l_m.
Step B: Randomly permute 1, 2, \ldots, n with the resultant sequence as a_1, a_2, \ldots, a_n.
Step C: Use successively the first l_1 coordinates of the a_i-th quasi number on [0, 1]^s to generate \theta_1^{(i)}. If all the l_1 coordinates are used before \theta_1^{(i)} is generated, supplement with independent standard deviates. If, however, \theta_1^{(i)} is generated before all the l_1 coordinates are used, the unused coordinates shall be skipped. Next, use the following l_2 coordinates of the a_i-th quasi number to generate \theta_2^{(i)} in the same way. This process continues until \theta_m^{(i)} is generated with the last l_m coordinates of the a_i-th quasi number.

A note is due here. The quasi random algorithm eliminates the fluctuation of F_n^x(\cdot) in the pseudo random algorithm by replacing it with F_n^y(\cdot). The variation in the quasi random Gibbs sample comes from the random permutation of the quasi random sequence y_1, y_2, \ldots, y_n. As this is a pseudo random permutation, one should expect the usual n^{-1/2} rate to apply to the variation
of the quasi random Gibbs sample. The simulation study in the next section demonstrates this. A standard application of the quasi random numbers as described in Section 2 does not need this random permutation, and the near n^{-1} rate is achieved there.

4. IMPLEMENTATION AND SIMULATION STUDY

We now illustrate the implementation of the quasi random algorithm using a hierarchical Poisson model from Gelfand and Smith (1990). Failures in ten pumps at a nuclear plant are assumed to occur according to independent Poisson distributions with rates \lambda_1, \lambda_2, \ldots, \lambda_{10}, respectively. The period of observation for the ith pump was t_i and the number of failures s_i. The data for t_i and s_i can be found in Gelfand and Smith (1990). The Bayes model assumes that, conditional on \beta, \lambda_i \sim G(\alpha, \beta) independently, with \alpha = 1.802. The hyperparameter \beta has a Gamma distribution G(\gamma, \delta) with \gamma = .01 and \delta = 1. Note that we denote by G(\alpha, \beta) the distribution with density proportional to x^{\alpha-1} e^{-\beta x}. Each Gibbs iteration consists of drawing \lambda_1, \lambda_2, \ldots, \lambda_{10} from G(\alpha + s_i, t_i + \beta) and drawing \beta from G(\gamma + 10\alpha, \sum \lambda_i + \delta).

The pseudo random implementation of the Gibbs model is straightforward, as each iteration consists of 11 Gamma draws. For the quasi random algorithm, we need to prepare two additional routines and modify the Gamma generator. The first additional routine returns a random permutation of 1, 2, \ldots, n. The swapping method (Devroye 1986, p. 646) can be coded in several lines with a computing cost of n - 1 standard uniform deviates and n - 1 swapping operations. The second routine returns the jth coordinate of the ith quasi random number for any given i and j. The Gamma generator is modified to use quasi coordinates in its first trials, as in the following Fortran fragment:

do ii = 1, huge(1)
   if (ii <= 2) then   ! uses quasi coordinates in first two trials
      call gp_qui(i, , u)   ! gets the jth coord
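To make Steps A-C concrete, the following is a minimal Python sketch of the whole algorithm. It is our own illustration, not the paper's Fortran implementation: the function names are invented, the target is a toy bivariate normal (each conditional drawn by the inverse-CDF transformation method, so l_1 = l_2 = 1 and s = 2) rather than the pump model, and the good point set follows (2.1) with p = 2, omitting the rearrangement (2.2) since s is small.

```python
import math
import random
from statistics import NormalDist

def good_point_set(n, s, p=2.0):
    """First n points of the good point set (2.1): with q = p**(1/(s+1))
    and gamma_j the fractional part of q**j, the kth point is
    ({gamma_1 * k}, ..., {gamma_s * k})."""
    q = p ** (1.0 / (s + 1))
    gammas = [math.modf(q ** j)[0] for j in range(1, s + 1)]
    return [[math.modf(g * k)[0] for g in gammas] for k in range(1, n + 1)]

def quasi_gibbs_bivariate_normal(n, rho, seed=0):
    """n iterations of a Gibbs sampler for a standard bivariate normal
    with correlation rho.  Each conditional draw converts one quasi
    coordinate through the inverse normal CDF, so s = 2 coordinates are
    consumed per iteration (Step A); the shuffle is Step B and the main
    loop is Step C."""
    rng = random.Random(seed)
    pts = good_point_set(n, s=2)
    rng.shuffle(pts)                      # Step B: random permutation
    sd = math.sqrt(1.0 - rho * rho)       # conditional standard deviation
    th1, th2 = 0.0, 0.0
    sample = []
    for u1, u2 in pts:                    # Step C: one point per iteration
        # keep the uniforms strictly inside (0, 1) for inv_cdf
        u1 = min(max(u1, 1e-12), 1.0 - 1e-12)
        u2 = min(max(u2, 1e-12), 1.0 - 1e-12)
        th1 = NormalDist(rho * th2, sd).inv_cdf(u1)
        th2 = NormalDist(rho * th1, sd).inv_cdf(u2)
        sample.append((th1, th2))
    return sample

sample = quasi_gibbs_bivariate_normal(4096, rho=0.5)
burn = len(sample) // 10                  # discard a burn-in proportion
mean1 = sum(t[0] for t in sample[burn:]) / (len(sample) - burn)
print(f"post-burn-in mean of theta1: {mean1:.3f}")
```

For the pump model the same skeleton applies, with eleven Gamma conditionals per iteration and each l_j chosen, as in Step A, to cover the Gamma generator's rejection trials with high probability.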