Algorithms booklet

November 2, 2013

Copyright © 2012 by Simon Prince. The latest version of this document can be downloaded from http:/.

This document accompanies the book Computer vision: models, learning, and inference by Simon J.D. Prince. It contains concise descriptions of almost all of the models and algorithms in the book. The goal is to provide sufficient information to implement a naive version of each method. This information was published separately from the main book because (i) it would have impeded the clarity of the main text and (ii) on-line publishing means that I can update the text periodically and eliminate any mistakes.

In the main, this document uses the same notation as the main book (see Appendix A for a summary). In addition, we also use the following conventions (illustrated in the short code sketch after this list):

- When two matrices are concatenated horizontally, we write C = [A, B]. When two matrices are concatenated vertically, we write C = [A; B].
- The function argmin_x f[x] returns the value of the argument x that minimizes f[x]. If x is discrete then this should be done by exhaustive search. If x is continuous, then it should be done by gradient descent, and I usually supply the gradient and Hessian of the function to help with this.
- The function \delta[x] for discrete x returns 1 when the argument x is 0 and returns 0 otherwise.
- The function diag[A] returns a column vector containing the elements on the diagonal of matrix A.
- The function zeros[I, J] creates an I \times J matrix that is full of zeros.
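For concreteness, here is a minimal NumPy rendering of these conventions; the helper names delta and diag_vec are my own choices, not notation used in the booklet.

```python
import numpy as np

# Horizontal concatenation C = [A, B] and vertical concatenation C = [A; B]
A = np.ones((2, 2))
B = np.zeros((2, 2))
C_horizontal = np.hstack([A, B])   # 2 x 4 matrix
C_vertical = np.vstack([A, B])     # 4 x 2 matrix

# argmin_x f[x] over a discrete set, by exhaustive search
def f(x):
    return (x - 3) ** 2
x_star = min(range(10), key=f)

def delta(x):
    """delta[x]: 1 when the discrete argument x is 0, and 0 otherwise."""
    return 1 if x == 0 else 0

def diag_vec(A):
    """diag[A]: column vector holding the diagonal elements of matrix A."""
    return np.diag(A).reshape(-1, 1)

# zeros[I, J]: an I x J matrix full of zeros
Z = np.zeros((3, 4))
```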

As a final note, I should point out that this document has not yet been checked very carefully. I'm looking for volunteers to help me with this. There are two main ways you can help. First, please mail me at s.prince@cs.ucl.ac.uk if you manage to successfully implement one of these methods. That way I can be sure that the description is sufficient. Secondly, please also mail me if you have problems getting any of these methods to work. It's possible that I can help, and it will help me to identify ambiguities and errors in the descriptions.

Simon Prince

List of Algorithms

4.1 Maximum likelihood learning for normal distribution
4.2 MAP learning for normal distribution with conjugate prior
4.3 Bayesian approach to normal distribution
4.4 Maximum likelihood learning for categorical distribution
4.5 MAP learning for categorical distribution with conjugate prior
4.6 Bayesian approach to categorical distribution
6.1 Basic generative classifier
7.1 Maximum likelihood learning for mixtures of Gaussians
7.2 Maximum likelihood learning for t-distribution
7.3 Maximum likelihood learning for factor analyzer
8.1 Maximum likelihood learning for linear regression
8.2 Bayesian formulation of linear regression
8.3 Gaussian process regression
8.4 Sparse linear regression
8.5 Dual formulation of linear regression
8.6 Dual Gaussian process regression
8.7 Relevance vector regression
9.1 Cost and derivatives for MAP logistic regression
9.2 Bayesian logistic regression
9.3 Cost and derivatives for MAP dual logistic regression
9.4 Dual Bayesian logistic regression
9.5 Relevance vector classification
9.6 Incremental logistic regression
9.7 Logitboost
9.8 Cost function, derivative and Hessian for multi-class logistic regression
9.9 Multiclass classification tree
10.1 Gibbs sampling from undirected model
10.2 Contrastive divergence learning of undirected model
11.1 Dynamic programming in chain
11.2 Dynamic programming in tree
11.3 Forward backward algorithm
11.4 Sum product: distribute
11.4b Sum product: collate and compute marginal distributions
12.1 Binary graph cuts
12.2 Reparameterization for binary graph cut
12.3 Multilabel graph cuts
12.4 Alpha expansion algorithm (main loop)
12.4b Alpha expansion (expand)
13.1 Principal components analysis (dual)
13.2 K-means algorithm
14.1 ML learning of extrinsic parameters
14.2 ML learning of intrinsic parameters
14.3 Inferring 3D world position
15.1 Maximum likelihood learning of Euclidean transformation
15.2 Maximum likelihood learning of similarity transformation
15.3 Maximum likelihood learning of affine transformation
15.4 Maximum likelihood learning of projective transformation
15.5 Maximum likelihood inference for transformation models
15.6 ML learning of extrinsic parameters (planar scene)
15.7 ML learning of intrinsic parameters (planar scene)
15.8 Robust ML learning of homography
15.9 Robust sequential learning of homographies
15.10 PEaRL learning of homographies
16.1 Extracting relative camera position from point matches
16.2 Eight point algorithm for fundamental matrix
16.3 Robust ML fitting of fundamental matrix
16.4 Planar rectification
17.1 Generalized Procrustes analysis
17.2 ML learning of PPCA model
18.1 Maximum likelihood learning for identity subspace model
18.2 Maximum likelihood learning for PLDA model
18.3 Maximum likelihood learning for asymmetric bilinear model
18.4 Style translation with asymmetric bilinear model
19.1 The Kalman filter
19.2 Fixed interval Kalman smoother
19.3 The extended Kalman filter
19.4 The iterated extended Kalman filter
19.5 The unscented Kalman filter
19.6 The condensation algorithm
20.1 Learn bag of words model
20.2 Learn latent Dirichlet allocation model
20.2b MCMC Sampling for LDA

Fitting probability distributions

Algorithm 4.1: Maximum likelihood learning of normal distribution

The univariate normal distribution is a probability density model suitable for describing continuous data x in one dimension. It has pdf

Pr(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-0.5(x-\mu)^2/\sigma^2\right],

where the parameter \mu denotes the mean and \sigma^2 denotes the variance.

Algorithm 4.1: Maximum likelihood learning for normal distribution
Input : Training data \{x_i\}_{i=1}^I
Output: Maximum likelihood estimates of parameters \theta = \{\mu, \sigma^2\}
begin
  // Set mean parameter
  \mu = \sum_{i=1}^I x_i / I
  // Set variance
  \sigma^2 = \sum_{i=1}^I (x_i - \mu)^2 / I
end
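As a quick illustration, a minimal NumPy sketch of Algorithm 4.1; the function name fit_normal_ml and the example data are my own choices rather than anything from the booklet.

```python
import numpy as np

def fit_normal_ml(x):
    """Maximum likelihood fit of a univariate normal (Algorithm 4.1)."""
    x = np.asarray(x, dtype=float)
    I = x.size
    mu = x.sum() / I                       # mean parameter
    sigma_sq = ((x - mu) ** 2).sum() / I   # ML variance (divides by I, not I - 1)
    return mu, sigma_sq

# Example usage with made-up data
mu, sigma_sq = fit_normal_ml([1.2, 0.7, 1.9, 1.1])
```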

Algorithm 4.2: MAP learning of univariate normal parameters

The conjugate prior to the normal distribution is the normal-scaled inverse gamma distribution, which has pdf

Pr(\mu, \sigma^2) = \frac{\sqrt{\gamma}}{\sigma\sqrt{2\pi}} \frac{\beta^\alpha}{\Gamma[\alpha]} \left(\frac{1}{\sigma^2}\right)^{\alpha+1} \exp\left[-\frac{2\beta + \gamma(\delta-\mu)^2}{2\sigma^2}\right],

with hyperparameters \alpha, \beta, \gamma > 0 and \delta \in [-\infty, \infty].

Algorithm 4.2: MAP learning for normal distribution with conjugate prior
Input : Training data \{x_i\}_{i=1}^I, Hyperparameters \alpha, \beta, \gamma, \delta
Output: MAP estimates of parameters \theta = \{\mu, \sigma^2\}
begin
  // Set mean parameter
  \mu = \left(\sum_{i=1}^I x_i + \gamma\delta\right) / (I + \gamma)
  // Set variance
  \sigma^2 = \left(\sum_{i=1}^I (x_i - \mu)^2 + 2\beta + \gamma(\delta - \mu)^2\right) / (I + 3 + 2\alpha)
end
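A minimal NumPy sketch of Algorithm 4.2 under the same conventions as above (the function name is my own):

```python
import numpy as np

def fit_normal_map(x, alpha, beta, gamma, delta):
    """MAP fit of a univariate normal with a normal-scaled inverse gamma
    prior (Algorithm 4.2)."""
    x = np.asarray(x, dtype=float)
    I = x.size
    mu = (x.sum() + gamma * delta) / (I + gamma)
    sigma_sq = (((x - mu) ** 2).sum() + 2 * beta
                + gamma * (delta - mu) ** 2) / (I + 3 + 2 * alpha)
    return mu, sigma_sq
```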

Algorithm 4.3: Bayesian approach to univariate normal distribution

In the Bayesian approach to fitting the univariate normal distribution we again use a normal-scaled inverse gamma prior. In the learning stage we compute a normal inverse gamma distribution over the mean and variance parameters. The predictive distribution for a new datum is computed by integrating the predictions for a given set of parameters weighted by the probability of those parameters being present.

Algorithm 4.3: Bayesian approach to normal distribution
Input : Training data \{x_i\}_{i=1}^I, Hyperparameters \alpha, \beta, \gamma, \delta, Test data x^*
Output: Posterior parameters \{\tilde{\alpha}, \tilde{\beta}, \tilde{\gamma}, \tilde{\delta}\}, predictive distribution Pr(x^*|x_{1:I})
begin
  // Compute normal inverse gamma posterior over normal parameters
  \tilde{\alpha} = \alpha + I/2
  \tilde{\beta} = \sum_i x_i^2/2 + \beta + \gamma\delta^2/2 - (\gamma\delta + \sum_i x_i)^2 / (2\gamma + 2I)
  \tilde{\gamma} = \gamma + I
  \tilde{\delta} = (\gamma\delta + \sum_i x_i) / (\gamma + I)
  // Compute intermediate parameters
  \breve{\alpha} = \tilde{\alpha} + 1/2
  \breve{\beta} = x^{*2}/2 + \tilde{\beta} + \tilde{\gamma}\tilde{\delta}^2/2 - (\tilde{\gamma}\tilde{\delta} + x^*)^2 / (2\tilde{\gamma} + 2)
  \breve{\gamma} = \tilde{\gamma} + 1
  // Evaluate new datapoint under predictive distribution
  Pr(x^*|x_{1:I}) = \frac{\sqrt{\tilde{\gamma}}\, \tilde{\beta}^{\tilde{\alpha}}\, \Gamma[\breve{\alpha}]}{\sqrt{2\pi}\, \sqrt{\breve{\gamma}}\, \breve{\beta}^{\breve{\alpha}}\, \Gamma[\tilde{\alpha}]}
end
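The sketch below renders Algorithm 4.3 in NumPy/SciPy; the closing line follows the ratio-of-normalizing-constants expression above, all function and variable names are my own, and evaluating in log space is my own choice for numerical stability.

```python
import numpy as np
from scipy.special import gammaln

def bayes_normal(x, alpha, beta, gamma, delta, x_star):
    """Bayesian fit of a univariate normal (Algorithm 4.3): returns the
    normal inverse gamma posterior parameters and the predictive density
    evaluated at the test point x_star."""
    x = np.asarray(x, dtype=float)
    I = x.size
    # Normal inverse gamma posterior over the normal parameters
    alpha_t = alpha + I / 2
    beta_t = ((x ** 2).sum() / 2 + beta + gamma * delta ** 2 / 2
              - (gamma * delta + x.sum()) ** 2 / (2 * gamma + 2 * I))
    gamma_t = gamma + I
    delta_t = (gamma * delta + x.sum()) / (gamma + I)
    # Intermediate parameters that incorporate the test point
    alpha_b = alpha_t + 0.5
    beta_b = (x_star ** 2 / 2 + beta_t + gamma_t * delta_t ** 2 / 2
              - (gamma_t * delta_t + x_star) ** 2 / (2 * gamma_t + 2))
    gamma_b = gamma_t + 1
    # Predictive density, computed in log space
    log_pred = (0.5 * np.log(gamma_t) + alpha_t * np.log(beta_t) + gammaln(alpha_b)
                - 0.5 * np.log(2 * np.pi) - 0.5 * np.log(gamma_b)
                - alpha_b * np.log(beta_b) - gammaln(alpha_t))
    return (alpha_t, beta_t, gamma_t, delta_t), np.exp(log_pred)
```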

Algorithm 4.4: ML learning of categorical parameters

The categorical distribution is a probability density model suitable for describing discrete multi-valued data x \in \{1, 2, \dots, K\}. It has pdf

Pr(x = k) = \lambda_k,

where the parameter \lambda_k denotes the probability of observing category k.

Algorithm 4.4: Maximum likelihood learning for categorical distribution
Input : Multi-valued training data \{x_i\}_{i=1}^I
Output: ML estimate of categorical parameters \theta = \{\lambda_1 \dots \lambda_K\}
begin
  for k = 1 to K do
    \lambda_k = \sum_{i=1}^I \delta[x_i - k] / I
  end
end
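A one-function NumPy sketch of Algorithm 4.4 (names are my own; labels are assumed to be integers in 1..K):

```python
import numpy as np

def fit_categorical_ml(x, K):
    """Maximum likelihood fit of a categorical distribution (Algorithm 4.4)."""
    x = np.asarray(x, dtype=int)
    # N_k counts how often each category k = 1..K was observed
    counts = np.array([(x == k).sum() for k in range(1, K + 1)], dtype=float)
    return counts / x.size   # lambda_k = N_k / I
```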

Algorithm 4.5: MAP learning of categorical parameters

For MAP learning of the categorical parameters, we need to define a prior and to this end, we choose the Dirichlet distribution:

Pr(\lambda_{1:K}) = \frac{\Gamma\left[\sum_{k=1}^K \alpha_k\right]}{\prod_{k=1}^K \Gamma[\alpha_k]} \prod_{k=1}^K \lambda_k^{\alpha_k - 1},

where \Gamma is the Gamma function and \{\alpha_k\}_{k=1}^K are hyperparameters.

Algorithm 4.5: MAP learning for categorical distribution with conjugate prior
Input : Categorical training data \{x_i\}_{i=1}^I, Hyperparameters \{\alpha_k\}_{k=1}^K
Output: MAP estimates of parameters \theta = \{\lambda_k\}_{k=1}^K
begin
  for k = 1 to K do
    N_k = \sum_{i=1}^I \delta[x_i - k]
    \lambda_k = (N_k - 1 + \alpha_k) / (I - K + \sum_{k=1}^K \alpha_k)
  end
end
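A NumPy sketch of Algorithm 4.5 (names are my own; alpha is assumed to be a length-K array of Dirichlet hyperparameters):

```python
import numpy as np

def fit_categorical_map(x, alpha):
    """MAP fit of a categorical distribution with a Dirichlet prior
    (Algorithm 4.5)."""
    x = np.asarray(x, dtype=int)
    alpha = np.asarray(alpha, dtype=float)
    K, I = alpha.size, x.size
    N = np.array([(x == k).sum() for k in range(1, K + 1)], dtype=float)
    return (N - 1 + alpha) / (I - K + alpha.sum())
```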

Algorithm 4.6: Bayesian approach to categorical distribution

In the Bayesian approach to fitting the categorical distribution we again use a Dirichlet prior. In the learning stage we compute a probability distribution over the K categorical parameters, which is also a Dirichlet distribution. The predictive distribution for a new datum is based on a weighted sum of the predictions for all possible parameter values, where the weights used are based on the Dirichlet distribution computed in the learning stage.

Algorithm 4.6: Bayesian approach to categorical distribution
Input : Categorical training data \{x_i\}_{i=1}^I, Hyperparameters \{\alpha_k\}_{k=1}^K
Output: Posterior parameters \{\tilde{\alpha}_k\}_{k=1}^K, predictive distribution Pr(x^*|x_{1:I})
begin
  // Compute categorical posterior over \lambda
  for k = 1 to K do
    \tilde{\alpha}_k = \alpha_k + \sum_{i=1}^I \delta[x_i - k]
  end
  // Evaluate new datapoint under predictive distribution
  for k = 1 to K do
    Pr(x^* = k|x_{1:I}) = \tilde{\alpha}_k / \left(\sum_{m=1}^K \tilde{\alpha}_m\right)
  end
end
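Both stages of Algorithm 4.6 fit in a few NumPy lines (the function name is my own):

```python
import numpy as np

def bayes_categorical(x, alpha):
    """Bayesian fit of a categorical distribution (Algorithm 4.6): returns
    the Dirichlet posterior parameters and the predictive Pr(x* = k | x_1..I)."""
    x = np.asarray(x, dtype=int)
    alpha = np.asarray(alpha, dtype=float)
    counts = np.array([(x == k).sum() for k in range(1, alpha.size + 1)])
    alpha_post = alpha + counts
    return alpha_post, alpha_post / alpha_post.sum()
```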

Learning and inference in vision

Algorithm 6.1: Basic generative classifier

Consider the situation where we wish to assign a label w \in \{1, 2, \dots, K\} based on an observed multivariate measurement vector x_i. We model the class conditional density functions as normal distributions so that

Pr(x_i | w_i = k) = \text{Norm}_{x_i}[\mu_k, \Sigma_k],

with prior probabilities over the world state defined by

Pr(w_i) = \text{Cat}_{w_i}[\lambda].

In the learning phase, we fit the parameters \mu_k and \Sigma_k of the kth class conditional density function Pr(x_i|w_i = k) from just the subset of data S_k = \{x_i : w_i = k\} where the kth state was observed. We learn the prior parameter \lambda from the training world states \{w_i\}_{i=1}^I. Here we have used the maximum likelihood approach in both cases.

The inference algorithm takes a new datum x^* and returns the posterior Pr(w^*|x^*, \theta) over the world state w^* using Bayes' rule,

Pr(w^*|x^*) = \frac{Pr(x^*|w^*)\,Pr(w^*)}{\sum_{w^*=1}^K Pr(x^*|w^*)\,Pr(w^*)}.

Algorithm 6.1: Basic generative classifier
Input : Training data \{x_i, w_i\}_{i=1}^I, new data example x^*
Output: ML parameters \theta = \{\mu_{1:K}, \Sigma_{1:K}, \lambda_{1:K}\}, posterior probability Pr(w^*|x^*)
begin
  // For each training class
  for k = 1 to K do
    // Set mean
    \mu_k = \left(\sum_{i=1}^I x_i \delta[w_i - k]\right) / \left(\sum_{i=1}^I \delta[w_i - k]\right)
    // Set variance
    \Sigma_k = \left(\sum_{i=1}^I (x_i - \mu_k)(x_i - \mu_k)^T \delta[w_i - k]\right) / \left(\sum_{i=1}^I \delta[w_i - k]\right)
    // Set prior
    \lambda_k = \sum_{i=1}^I \delta[w_i - k] / I
  end
  // Compute likelihoods for each class for a new datapoint
  for k = 1 to K do
    l_k = \text{Norm}_{x^*}[\mu_k, \Sigma_k]
  end
  // Classify new datapoint using Bayes' rule
  for k = 1 to K do
    Pr(w^* = k|x^*) = l_k \lambda_k / \left(\sum_{m=1}^K l_m \lambda_m\right)
  end
end
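A compact sketch of Algorithm 6.1 using NumPy and SciPy's multivariate normal density; the function and argument names are my own, and the ML covariance estimate may need regularization for very small classes.

```python
import numpy as np
from scipy.stats import multivariate_normal

def generative_classifier(X, w, K, x_star):
    """Basic generative classifier (Algorithm 6.1). X is an I x D data
    matrix, w holds labels in 1..K, x_star is a new D-dimensional point.
    Returns the ML parameters and the posterior Pr(w* = k | x*)."""
    X, w = np.asarray(X, dtype=float), np.asarray(w, dtype=int)
    I = X.shape[0]
    mu, Sigma, lam = [], [], []
    for k in range(1, K + 1):
        X_k = X[w == k]
        mu_k = X_k.mean(axis=0)
        diff = X_k - mu_k
        mu.append(mu_k)
        Sigma.append(diff.T @ diff / X_k.shape[0])  # ML covariance
        lam.append(X_k.shape[0] / I)                # prior lambda_k
    # Likelihood of the new point under each class, then Bayes' rule
    lik = np.array([multivariate_normal.pdf(x_star, mean=mu[k], cov=Sigma[k])
                    for k in range(K)])
    posterior = lik * np.array(lam)
    return (mu, Sigma, lam), posterior / posterior.sum()
```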

Modelling complex densities

Algorithm 7.1: Fitting mixture of Gaussians

The mixture of Gaussians (MoG) is a probability density model suitable for data x in D dimensions. The data is described as a weighted sum of K normal distributions

Pr(x|\theta) = \sum_{k=1}^K \lambda_k \text{Norm}_x[\mu_k, \Sigma_k],

where \mu_{1:K} and \Sigma_{1:K} are the me