Vol. 35, No. 12   ACTA AUTOMATICA SINICA   December, 2009

Subspace Semi-supervised Fisher Discriminant Analysis

YANG Wu-Yi 1,2,3   LIANG Wei 1   XIN Le 4   ZHANG Shu-Wu 1

Abstract   Fisher discriminant analysis (FDA) is a popular method for supervised dimensionality reduction. FDA seeks for an embedding transformation such that the ratio of the between-class scatter to the within-class scatter is maximized. Labeled data, however, often consume much time and are expensive to obtain, as they require the efforts of human annotators. In order to cope with the problem of effectively combining unlabeled data with labeled data to find the embedding transformation, we propose a novel method, called subspace semi-supervised Fisher discriminant analysis (SSFDA), for semi-supervised dimensionality reduction. SSFDA aims to find an embedding transformation that respects the discriminant structure inferred from the labeled data and the intrinsic geometrical structure inferred from both the labeled and unlabeled data. We also show that SSFDA can be extended to nonlinear dimensionality reduction scenarios by applying the kernel trick. The experimental results on face recognition demonstrate the effectiveness of our proposed algorithm.

Key words   Fisher discriminant analysis (FDA), semi-supervised learning, manifold regularization, dimensionality reduction

Received March 18, 2008; in revised form June 8, 2009. Supported by the National Science and Technology Supporting Program of China (2008BAH26B02-3, 2008BAH21B03-04, 2008BAH26B03).
1. Hi-tech Innovation Center, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, P. R. China
2. Key Laboratory of Underwater Acoustic Communication and Marine Information Technology of the Ministry of Education, Xiamen University, Xiamen 361005, P. R. China
3. College of Oceanography and Environmental Science, Xiamen University, Xiamen 361005, P. R. China
4. School of Electronics Information and Control Engineering, Beijing University of Technology, Beijing 100124, P. R. China
DOI: 10.3724/SP.J.1004.2009.01513

In many machine learning and data mining tasks, such as image retrieval and face recognition, we increasingly confront collections of high-dimensional data. This leads us to consider methods of dimensionality reduction that allow us to represent the data in a lower-dimensional space. Techniques for dimensionality reduction have attracted much attention in computer vision and pattern recognition. The most popular dimensionality reduction algorithms include principal component analysis (PCA)[1-2] and Fisher discriminant analysis (FDA)[3].

PCA is an unsupervised method. It projects the original $m$-dimensional data into a $d\ (d \ll m)$-dimensional subspace in which the data variance is maximized. It computes the eigenvectors of the data covariance matrix and approximates the original data by a linear combination of the leading eigenvectors. If the data are embedded in a linear subspace, PCA is guaranteed to discover the dimensionality of the subspace and produce a compact representation.

Unlike PCA, FDA is a supervised method. In the context of pattern classification, FDA seeks the best projection subspace such that the ratio of the between-class scatter to the within-class scatter is maximized. For classification tasks, FDA can achieve significantly better performance than PCA.

Labeled data, however, often consume much time and are expensive to obtain, as they require the efforts of human annotators[4]. In contrast, it is often far easier to obtain large numbers of unlabeled data. The problem of effectively combining unlabeled data with labeled data is therefore of central importance in machine learning[4]. Learning from labeled and unlabeled data has attracted an increasing amount of attention recently, and several novel approaches have been proposed. Graph-based semi-supervised learning algorithms[4-13] have received considerable attention in recent years. These algorithms consider the graph over all the data as a priori knowledge to guide the decision making. The regularization-based technique of Cai[8] is closest in spirit to the intuitions of our paper; the techniques of Belkin[5] and Cai[8] are based on regularization.

In this paper, we aim at dimensionality reduction in the semi-supervised case. To cope with the problem of effectively combining unlabeled data with labeled data, we propose a novel semi-supervised dimensionality reduction algorithm called subspace semi-supervised Fisher discriminant analysis (SSFDA). SSFDA exploits the geometric structure of the labeled and unlabeled data and incorporates it as an additional regularization term. SSFDA intends to find an embedding transformation that respects the discriminant structure inferred from the labeled data and the intrinsic geometrical structure inferred from both labeled and unlabeled data.

Semi-supervised discriminant analysis (SDA)[8] is the algorithm most relevant to ours. In the following, we list the similarities and the major difference between SDA and our algorithm:

1) Both SDA and our algorithm are graph-based approaches. Both use a p-nearest neighbor graph to model the relationship between nearby data points and incorporate the geometric structure of the labeled and unlabeled data as an additional regularization term.

2) There is one major difference between SDA and our algorithm. In the SDA algorithm, the weight matrix of the p-nearest neighbor graph is constructed according to the relationship between nearby points in the original data space, without considering the labels of the labeled data. In our algorithm, using the labeled data, we first find a projection subspace by applying the FDA algorithm and embed the labeled and unlabeled data into this subspace. Then, the weight matrix of the p-nearest neighbor graph is constructed according to the relationship between nearby data points in this subspace, as well as the labels of the labeled data.

The rest of this paper is organized as follows. In Section 1, we provide a brief review of FDA. The proposed SSFDA algorithm for dimensionality reduction is introduced in Section 2. The experimental results are presented in Section 3. Finally, we conclude the paper in Section 4.

1 Fisher discriminant analysis

In this section, we first formulate the problem of linear dimensionality reduction. Then, FDA is reviewed. Last, the graph perspective of FDA is introduced.

1.1 Formulation

Suppose that we have a set of $l$ samples $x_1, \ldots, x_l \in \mathbf{R}^m$ that belong to $c$ classes; then $l = \sum_{k=1}^{c} l_k$, where $l_k$ is the number of samples in the $k$-th class. For linear dimensionality reduction, we focus on finding a transformation matrix $A = (a_1, \ldots, a_d)$ that maps these $l$ points to a set of points $y_1, \ldots, y_l$ in $\mathbf{R}^d$ $(d \ll m)$. The embedded sample $y_i$ is given by $y_i = A^{\rm T} x_i$.

1.2 Fisher discriminant analysis for dimensionality reduction

FDA[3] is one of the most popular dimensionality reduction techniques. FDA seeks directions on which the data points of different classes are far from each other while data points of the same class are close to each other[3]. Here, we briefly describe the definition of FDA.

Let $S_w$, $S_b$, and $S_t$ be the within-class scatter matrix, the between-class scatter matrix, and the total scatter matrix, respectively:

$$S_w = \sum_{k=1}^{c} \sum_{i=1}^{l_k} \left(x_i^{(k)} - \mu^{(k)}\right)\left(x_i^{(k)} - \mu^{(k)}\right)^{\rm T} \quad (1)$$

$$S_b = \sum_{k=1}^{c} l_k \left(\mu^{(k)} - \mu\right)\left(\mu^{(k)} - \mu\right)^{\rm T} \quad (2)$$

$$S_t = \sum_{k=1}^{c} \sum_{i=1}^{l_k} \left(x_i^{(k)} - \mu\right)\left(x_i^{(k)} - \mu\right)^{\rm T} = S_w + S_b \quad (3)$$

where $l_k$ is the number of data in the $k$-th class, $x_i^{(k)}$ is the $i$-th datum in the $k$-th class, $\mu^{(k)}$ is the mean of the data in the $k$-th class, and $\mu$ is the mean of all data:

$$\mu^{(k)} = \frac{1}{l_k} \sum_{i=1}^{l_k} x_i^{(k)} \quad (4)$$

$$\mu = \frac{1}{l} \sum_{i=1}^{l} x_i \quad (5)$$

The objective function of FDA is as follows:

$$a_{\rm opt} = \arg\max_{a} \frac{a^{\rm T} S_b a}{a^{\rm T} S_w a} = \arg\max_{a} \frac{a^{\rm T} S_b a}{a^{\rm T} S_t a} \quad (6)$$

This formulation of the FDA objective function was first introduced in [14]. The projection vector $a$ that maximizes (6) is given by the maximum eigenvalue solution to the generalized eigenvalue problem:

$$S_b a = \lambda S_t a \quad (7)$$

Let the column vectors $a_1, \ldots, a_d$ be the solutions of (7), ordered according to their eigenvalues, $\lambda_1 \ge \cdots \ge \lambda_d$. Thus, the embedding is as follows:

$$x_i \rightarrow y_i = A^{\rm T} x_i$$

where $y_i$ is a $d$-dimensional representation of the high-dimensional data point $x_i$ and $A = (a_1, \ldots, a_d)$ is the transformation matrix. The between-class scatter matrix $S_b$ has rank at most $c - 1$. This implies that the multiplicity of $\lambda = 0$ is at least $m - c + 1$. Therefore, FDA can find at most $c - 1$ meaningful directions.

1.3 Graph perspective of Fisher discriminant analysis

We have

$$S_b = \sum_{k=1}^{c} l_k \left(\mu^{(k)} - \mu\right)\left(\mu^{(k)} - \mu\right)^{\rm T} = \sum_{k=1}^{c} l_k \left(\frac{1}{l_k} \sum_{i=1}^{l_k} x_i^{(k)} - \mu\right)\left(\frac{1}{l_k} \sum_{i=1}^{l_k} x_i^{(k)} - \mu\right)^{\rm T} = \sum_{k=1}^{c} X^{(k)} W^{(k)} \left(X^{(k)}\right)^{\rm T} - l \mu \mu^{\rm T} \quad (8)$$

$$S_t = \sum_{k=1}^{c} \sum_{i=1}^{l_k} x_i^{(k)} \left(x_i^{(k)}\right)^{\rm T} - l \mu \mu^{\rm T} \quad (9)$$

where $W^{(k)}$ is an $l_k \times l_k$ matrix with all elements equal to $1/l_k$ and $X^{(k)} = \left[x_1^{(k)}, \ldots, x_{l_k}^{(k)}\right]$.

Let $X_l = \left[X^{(1)}, \ldots, X^{(c)}\right]$ and define an $l \times l$ matrix $W_{ll}$ as

$$W_{ll} = \begin{bmatrix} W^{(1)} & 0 & \cdots & 0 \\ 0 & W^{(2)} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & W^{(c)} \end{bmatrix} \quad (10)$$

Then, we have

$$S_b = X_l B X_l^{\rm T} \quad (11)$$

$$S_t = X_l C X_l^{\rm T} \quad (12)$$

where

$$B = W_{ll} - \frac{1}{l} \mathbf{1}\mathbf{1}^{\rm T}, \qquad C = I - \frac{1}{l} \mathbf{1}\mathbf{1}^{\rm T}$$

$\mathbf{1} = [1, \ldots, 1]^{\rm T}$ is an $l$-dimensional vector, and $I$ is an $l \times l$ identity matrix. Thus, the objective function of FDA in (6) can be rewritten as

$$a_{\rm opt} = \arg\max_{a} \frac{a^{\rm T} S_b a}{a^{\rm T} S_t a} = \arg\max_{a} \frac{a^{\rm T} X_l B X_l^{\rm T} a}{a^{\rm T} X_l C X_l^{\rm T} a} \quad (13)$$

The generalized eigenvalue problem in (7) can be rewritten as

$$X_l B X_l^{\rm T} a = \lambda X_l C X_l^{\rm T} a \quad (14)$$
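
The graph formulation (11)-(14) is straightforward to implement, because $B$ and $C$ depend only on the class sizes. The following Python sketch illustrates this formulation; it is not the authors' code, and the function name, the use of `scipy`, and the small ridge added to keep $S_t$ invertible are our assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def fda_graph(Xl, labels, d):
    """Fisher discriminant analysis via the graph formulation (11)-(14).

    Xl     : m x l matrix, one labeled sample per column
    labels : length-l array of class indices 0..c-1
    d      : number of projection directions (at most c-1 are meaningful)
    """
    m, l = Xl.shape
    labels = np.asarray(labels)

    # W_ll is block diagonal with 1/l_k inside each class block, Eq. (10).
    Wll = np.zeros((l, l))
    for k in np.unique(labels):
        idx = np.where(labels == k)[0]
        Wll[np.ix_(idx, idx)] = 1.0 / len(idx)

    ones = np.ones((l, l)) / l
    B = Wll - ones                 # S_b = Xl B Xl^T, Eq. (11)
    C = np.eye(l) - ones           # S_t = Xl C Xl^T, Eq. (12)

    Sb = Xl @ B @ Xl.T
    St = Xl @ C @ Xl.T + 1e-6 * np.eye(m)   # small ridge keeps S_t nonsingular (assumption)

    # Generalized eigenproblem (14): Sb a = lambda St a, largest eigenvalues first.
    vals, vecs = eigh(Sb, St)
    order = np.argsort(vals)[::-1]
    return vecs[:, order[:d]]      # transformation matrix A = (a_1, ..., a_d)
```

Projecting a sample then amounts to computing `A.T @ x`, matching $y_i = A^{\rm T} x_i$.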

2 Subspace semi-supervised Fisher discriminant analysis

We introduce our SSFDA algorithm, which respects both the discriminant and geometrical structures in the data. We begin with a description of the semi-supervised learning problem.

2.1 Problem formulation

Given a sample set $x_1, \ldots, x_l, x_{l+1}, \ldots, x_n \in \mathbf{R}^m$ and a label set $L = \{1, \ldots, c\}$, the first $l$ points $x_i\ (i \le l)$ are labeled as $t_i \in L$ and the remaining points $x_u\ (l+1 \le u \le n)$ are unlabeled. Find a transformation matrix $A = (a_1, \ldots, a_d)$ that maps these $n$ points to a set of points $y_1, \ldots, y_n$ in $\mathbf{R}^d$ $(d \ll m)$. The embedded sample $y_i$ is given by $y_i = A^{\rm T} x_i$. For any unlabeled sample $x_u\ (l+1 \le u \le n)$, its label is predicted as $t_i$ provided that $y_i = A^{\rm T} x_i$ minimizes $\|y_i - y_u\|$, $i = 1, \ldots, l$. The performance of an algorithm is measured by the recognition error rate on the unlabeled samples.
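
As a concrete reading of this evaluation protocol, the sketch below (an illustration under assumed variable names such as `X_lab` and `t_lab`, not part of the paper) assigns each unlabeled sample the label of its nearest labeled neighbor in the embedded space and reports the recognition error rate.

```python
import numpy as np

def predict_and_error(A, X_lab, t_lab, X_unlab, t_true=None):
    """Nearest-neighbor prediction in the embedded space y = A^T x.

    A        : m x d transformation matrix
    X_lab    : m x l labeled samples (columns), with labels t_lab
    X_unlab  : m x (n - l) unlabeled samples (columns)
    t_true   : optional ground-truth labels of the unlabeled samples
    """
    Y_lab = A.T @ X_lab              # embedded labeled samples
    Y_unlab = A.T @ X_unlab          # embedded unlabeled samples

    # For each unlabeled y_u, take the label of the labeled y_i minimizing ||y_i - y_u||.
    dists = np.linalg.norm(Y_lab[:, :, None] - Y_unlab[:, None, :], axis=0)
    pred = np.asarray(t_lab)[np.argmin(dists, axis=0)]

    if t_true is None:
        return pred
    error_rate = np.mean(pred != np.asarray(t_true))
    return pred, error_rate
```

Here `A` would be the transformation learned by SSFDA or, for comparison, by another dimensionality reduction method.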

2.2 The objective function

FDA is a supervised method. It seeks an embedding transformation such that the ratio of the between-class scatter to the within-class scatter is maximized. When there are not sufficient training (labeled) samples, learning from both labeled and unlabeled samples (semi-supervised learning) is of central importance in improving the recognition performance.

SSFDA is performed in the subspace $R(X_k)$, where $R(X_k) = \mathrm{Span}\{x_1, \ldots, x_k\}$ is the subspace spanned by the columns of $X_k = [x_1, \ldots, x_k]$ and $k = l$ or $n$. Suppose that the rank of $X_k$ is $r$, i.e., $\mathrm{rank}(X_k) = \dim(R(X_k)) = r$. Perform the singular value decomposition of $X_k$ as $X_k = U \Sigma V^{\rm T}$, where $U = [u_1, \ldots, u_r, u_{r+1}, \ldots, u_m]$ is the left orthonormal matrix, $\Sigma$ is the singular value matrix, and $V$ is the right orthonormal matrix. The subspace $R(X_k)$ can be spanned by the columns of $P = [u_1, \ldots, u_r]$. When $\mathrm{rank}(X_k) = m$, we can simply select the $m \times m$ identity matrix $I$ as $P$, i.e., $P = I$. We project both the labeled and unlabeled data into the subspace $R(X_k)$. Thus, the embedding is as follows:

$$x_i \rightarrow z_i = P^{\rm T} x_i, \quad i = 1, \ldots, l, l+1, \ldots, n$$

Let $X = [x_1, x_2, \ldots, x_n]$, $X_l = [x_1, x_2, \ldots, x_l]$, $Z^{(k)} = \left[z_1^{(k)}, \ldots, z_{l_k}^{(k)}\right]$, $Z_l = \left[Z^{(1)}, \ldots, Z^{(c)}\right]$, and $Z = [z_1, z_2, \ldots, z_n]$. Then, $Z_l = P^{\rm T} X_l$ and $Z = P^{\rm T} X$.

In the subspace, the between-class scatter matrix $S_b$ and total scatter matrix $S_t$ are as follows:

$$S_b = Z_l B Z_l^{\rm T} = P^{\rm T} X_l B X_l^{\rm T} P \quad (15)$$

$$S_t = Z_l C Z_l^{\rm T} = P^{\rm T} X_l C X_l^{\rm T} P \quad (16)$$

Then, the objective function of FDA is as follows:

$$a_{\rm opt} = \arg\max_{a} \frac{a^{\rm T} P^{\rm T} X_l B X_l^{\rm T} P a}{a^{\rm T} P^{\rm T} X_l C X_l^{\rm T} P a}$$

The optimal $a$ is the eigenvector corresponding to the maximum eigenvalue of the eigen-problem:

$$Z_l B Z_l^{\rm T} a = \lambda Z_l C Z_l^{\rm T} a \quad (17)$$

A typical way to prevent overfitting, which may happen when there are not sufficient training samples, is to impose a regularizer[15]. The optimization problem of the regularized version of FDA can be written as follows:

$$a_{\rm opt} = \arg\max_{a} \frac{a^{\rm T} S_b a}{\alpha\, a^{\rm T} S_t a + (1 - \alpha) R(a)} \quad (18)$$

where $\alpha \in [0, 1]$ trades off the two terms in the denominator. The a priori knowledge of the data can be incorporated in the regularization term $R(a)$, which provides us the flexibility to incorporate the geometrical structure of the data manifold[8]. The key to semi-supervised learning problems is the a priori assumption of consistency[7], which means that 1) nearby points are likely to have the same label, and 2) points on the same structure (typically referred to as a cluster or a manifold) are likely to have the same label. A principled approach to formalizing the assumption of consistency is to design a mapping function that is sufficiently smooth with respect to the intrinsic structure revealed by the known labeled and unlabeled points[7]. In order to discover both the geometrical and discriminant structures of the data manifold, we incorporate the manifold structure of both labeled and unlabeled data as the regularization term in the objective function (18).

When $k = n$, $S_t$ in (16) may be singular. We use regularization to ensure the nonsingularity of $S_t$: $\tilde{S}_t = S_t + \delta I_r$, where $\delta\ (\delta \ge 0)$ is the regularization parameter and $I_r$ is the $r \times r$ identity matrix. Let the column vectors $b_1, \ldots, b_{c-1}$ be the solutions of (17), ordered according to their eigenvalues, $\lambda_1 \ge \cdots \ge \lambda_{c-1}$. Then, we form a transformation matrix $B = [b_1, \ldots, b_{c-1}]$. The $r \times (c-1)$ transformation matrix $B$ maps all the $n$ samples to a set of points $\{q_i \,|\, q_i = B^{\rm T} z_i,\ i = 1, \ldots, n\}$ in $\mathbf{R}^{c-1}$. Let $l(x_i)$ be the class label of $x_i$, and $N_p(q_i) = \{q_i^1, q_i^2, \ldots, q_i^p\}$ be the set of $q_i$'s $p$-nearest neighbors. We construct a $p$-nearest neighbor graph $G$ to model the relationship between nearby data points. Thus, the weight matrix $W$ of $G$ can be defined as follows:

$$W_{ij} = \begin{cases} \gamma, & \text{if } x_i \text{ and } x_j \text{ are labeled, and they share the same label;}\\ \gamma, & \text{if } x_i \text{ is labeled, } x_j \text{ is unlabeled, } q_i \in N_p(q_j), \text{ and } l(x_s) = l(x_i) \text{ for each labeled } x_s \in N_p(q_j);\\ \gamma, & \text{if } x_j \text{ is labeled, } x_i \text{ is unlabeled, } q_j \in N_p(q_i), \text{ and } l(x_s) = l(x_j) \text{ for each labeled } x_s \in N_p(q_i);\\ 1, & \text{if } x_i \text{ and } x_j \text{ are unlabeled, and } q_i \in N_p(q_j) \text{ or } q_j \in N_p(q_i);\\ 0, & \text{otherwise} \end{cases} \quad (19)$$
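
One possible reading of (19) in code is sketched below: given the subspace embeddings $q_i = B^{\rm T} z_i$, it finds the $p$-nearest neighbors and assigns edge weights according to the four non-zero cases. This is an illustration only; the value `gamma` used for the labeled cases and the helper names are assumptions, not taken from the paper.

```python
import numpy as np

def build_weight_matrix(Q, labels, p, gamma=1.0):
    """Weight matrix W of the p-nearest neighbor graph G, following Eq. (19).

    Q      : (c-1) x n matrix of subspace embeddings q_i = B^T z_i (columns)
    labels : length-n array, class index for labeled points, -1 for unlabeled
    p      : number of nearest neighbors
    gamma  : weight for the labeled cases (assumed value, not from the paper)
    """
    n = Q.shape[1]
    labels = np.asarray(labels)
    dists = np.linalg.norm(Q[:, :, None] - Q[:, None, :], axis=0)
    np.fill_diagonal(dists, np.inf)
    # N[i] holds the indices of q_i's p nearest neighbors, N_p(q_i).
    N = [set(np.argsort(dists[i])[:p].tolist()) for i in range(n)]

    def labeled_neighbors_match(j, lab):
        # True if every labeled x_s with q_s in N_p(q_j) satisfies l(x_s) == lab.
        return all(labels[s] == lab for s in N[j] if labels[s] != -1)

    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            li, lj = labels[i], labels[j]
            if li != -1 and lj != -1 and li == lj:
                W[i, j] = gamma          # both labeled, same class
            elif li != -1 and lj == -1 and i in N[j] and labeled_neighbors_match(j, li):
                W[i, j] = gamma          # labeled-unlabeled, consistent neighborhood
            elif lj != -1 and li == -1 and j in N[i] and labeled_neighbors_match(i, lj):
                W[i, j] = gamma
            elif li == -1 and lj == -1 and (i in N[j] or j in N[i]):
                W[i, j] = 1.0            # both unlabeled, p-NN neighbors
    return W
```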

In general, if two data points are linked by an edge, they are likely to be in the same class. Thus, a natural regularizer can be defined as follows:

$$R(a) = \frac{1}{2} \sum_{i,j} \left(a^{\rm T} z_i - a^{\rm T} z_j\right)^2 W_{ij} = \sum_{i} a^{\rm T} z_i D_{ii} z_i^{\rm T} a - \sum_{i,j} a^{\rm T} z_i W_{ij} z_j^{\rm T} a = a^{\rm T} Z (D - W) Z^{\rm T} a = a^{\rm T} P^{\rm T} X L X^{\rm T} P a \quad (20)$$

where $D$ is a diagonal matrix whose entries are the column sums of $W$, $D_{ii} = \sum_{j} W_{ij}$, and $L = D - W$ is the Laplacian matrix. We get the objective function of SSFDA:

$$a_{\rm opt} = \arg\max_{a} \frac{a^{\rm T} S_b a}{a^{\rm T} \left(\alpha S_t + (1 - \alpha) P^{\rm T} X L X^{\rm T} P\right) a} \quad (21)$$

The projection vector $a$ that maximizes (21) is given by the maximum eigenvalue solution to the generalized eigenvalue problem

$$S_b a = \lambda \left(\alpha S_t + (1 - \alpha) P^{\rm T} X L X^{\rm T} P\right) a \quad (22)$$

that is,

$$P^{\rm T} X_l B X_l^{\rm T} P a = \lambda P^{\rm T} \left(\alpha X_l C X_l^{\rm T} + (1 - \alpha) X L X^{\rm T}\right) P a \quad (23)$$

Let the column vectors $a_1, \ldots, a_d$ be the solutions of (23), ordered according to their eigenvalues, $\lambda_1 \ge \cdots \ge \lambda_d$. Thus, the embedding is as follows:

$$x_i \rightarrow y_i = A^{\rm T} z_i = A^{\rm T} P^{\rm T} x_i = (P A)^{\rm T} x_i$$

where $y_i$ is a $d$-dimensional representation of the high-dimensional data point $x_i$.
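
Assembling the pieces, SSFDA reduces to a single generalized eigenvalue problem. The sketch below is an illustrative composition of (15), (16), and (20)-(23) under assumed parameter names (`alpha` for the trade-off and `delta` for the $S_t$ regularizer); it reuses the matrices $B$ and $C$ from Section 1.3 and the weight matrix $W$ from (19).

```python
import numpy as np
from scipy.linalg import eigh

def ssfda(X, Xl, B, C, W, P, alpha, d, delta=1e-6):
    """Solve the SSFDA eigenproblem (23) and return the overall transformation P A.

    X, Xl  : all samples / labeled samples (one column per sample)
    B, C   : l x l matrices from Eqs. (11)-(12)
    W      : n x n graph weight matrix from Eq. (19)
    P      : m x r subspace basis, alpha in [0, 1], d output dimensions
    """
    Z, Zl = P.T @ X, P.T @ Xl
    D = np.diag(W.sum(axis=0))
    L = D - W                                          # graph Laplacian L = D - W
    Sb = Zl @ B @ Zl.T                                 # Eq. (15)
    St = Zl @ C @ Zl.T + delta * np.eye(P.shape[1])    # regularized S_t
    M = alpha * St + (1 - alpha) * (Z @ L @ Z.T)       # denominator of Eq. (21)

    vals, vecs = eigh(Sb, M)                           # Sb a = lambda M a, Eq. (22)
    A = vecs[:, np.argsort(vals)[::-1][:d]]            # top-d eigenvectors
    return P @ A                                       # y_i = (P A)^T x_i
```

The returned matrix corresponds to $PA$ in the embedding $y_i = (PA)^{\rm T} x_i$ above.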
