Sparse Representation and Image/Video Annotation
lihaojieyilei
November 4, 2011

Sparsity
- A signal is sparse if most of its coefficients are (approximately) zero.
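As a small illustration of this definition (not from the original slides), the sketch below builds a signal from only two atoms of a random dictionary, so its coefficient vector is sparse; the dictionary, its size, and the chosen nonzero entries are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# A dictionary A with 50 atoms in R^20 (sizes chosen arbitrarily).
A = rng.standard_normal((20, 50))

# A sparse coefficient vector: only 2 of its 50 entries are nonzero.
x = np.zeros(50)
x[[7, 31]] = [1.5, -0.8]

# The observed signal is a combination of just those two atoms.
y = A @ x
print("nonzero coefficients:", np.count_nonzero(x), "out of", x.size)
```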
Papers
1. Nonparametric Label-to-Region by Search
   Xiaobai Liu, Shuicheng Yan, Jiebo Luo, Jinhui Tang, Zhongyang Huang and Hai Jin, CVPR 2010
2. Sparse Ensemble Learning for Concept Detection
   Sheng Tang, Yan-Tao Zheng, Yu Wang, Tat-Seng Chua, IEEE Trans. on Multimedia, 2012

Nonparametric Label-to-Region by Search
- Label-to-Region (L2R): propagate the annotated labels of a given single image from the image level to their corresponding semantic regions.

L2R
- In computer vision, the task is known as simultaneous object recognition and image segmentation.
- Unsupervised learning methods: object localization, image segmentation along with object classification, and multi-label image segmentation and classification. These methods can only handle images with a single major object, or with a clean background and no occlusions between objects.
- Supervised learning methods, i.e., classifier-based methods, usually first learn image classifiers that characterize concepts (or keywords) from the training images, and then identify the images belonging to a specific category (e.g., CMRM).
L2R by Search - Overview
- Each label of the image is used as a query for online image search engines to obtain a set of semantically related and visually similar images.
- Both the input image and the online images returned by the search engine are segmented into local atomic image patches to obtain the so-called bag-of-patches (BOP) representation.
- A label-specific feature mining procedure is employed for each label to discover distinctive and descriptive features from the proposed Interpolation SIFT (iSIFT) feature pool. These features are used to build the patch-level, label-specific representations.
- Candidate regions are constructed with a sparse coding formulation; a continuity-biased sparsity prior is introduced to select a small number of patches from the online images, with a preference for larger patches.
- The candidate regions are further ranked by reconstruction error, and the top-ranked ones are used to derive the label confidence vector for each atomic patch of the input image.
- A patch clustering procedure is performed on the input image as a post-processing step to obtain the final L2R assignments.
L2R by Search - Advantages
1. The sparsity and continuity-biased priors are used to ensure the reliability of the label assignment.
2. It does not require exact image parsing, which remains an open problem for real-world images.
3. No generative or discriminative model needs to be learned for each label, so the approach scales to large image sets as well as large semantic ontologies.
L2R by Search - Techniques
- Image representation
- Label-specific feature mining by search
- Sparse region coding
- Sparse region coding with continuity prior
- Label assignment via sparse representation
- Patch clustering

Image Representation
1. Bag-of-Patches (BOP)
- Graph-based segmentation (Felzenszwalb 2004); a minimal segmentation sketch follows below.
- Resize all images to a roughly equal resolution and initialize each pixel as one atomic patch.
- Describe the appearance of each initial image patch with color features and apply the graph algorithm to merge smaller patches into larger ones.
- This step iterates until all image patches are merged into one single patch, namely the original image.
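A minimal sketch of how the atomic patches could be obtained with Felzenszwalb graph-based segmentation, here through scikit-image rather than whichever implementation the authors used; the example image and the scale/sigma/min_size values are illustrative assumptions.

```python
import numpy as np
from skimage import data
from skimage.segmentation import felzenszwalb
from skimage.transform import resize

# Load an example image and bring it to a roughly fixed resolution.
img = resize(data.astronaut(), (256, 256), anti_aliasing=True)

# Felzenszwalb graph-based segmentation: pixels start as their own components
# and are merged into larger patches according to color similarity.
labels = felzenszwalb(img, scale=100, sigma=0.8, min_size=50)

# Each distinct segment id is one atomic patch of the bag-of-patches (BOP).
print("number of atomic patches:", len(np.unique(labels)))
```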
Image Representation
2. Interpolation SIFT (iSIFT) features
- SIFT is robust to image noise and scale changes, but the detected interest points are sparse.
- New interest points are interpolated between the sparse interest points detected by the standard SIFT detector, to enhance the descriptive capability of the representation.
- The interpolation is built on a 2D Delaunay triangulation of the SIFT interest points (see the sketch below).
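The slides do not spell out the interpolation rule, so the sketch below assumes one simple variant: build a 2D Delaunay triangulation over the detected SIFT keypoints and add the midpoint of every triangle edge as a new interest point. The file name example.jpg and the fixed keypoint size of 8 pixels are placeholders.

```python
import cv2
import numpy as np
from scipy.spatial import Delaunay

# Detect the standard (sparse) SIFT interest points.
gray = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
keypoints = sift.detect(gray, None)
pts = np.array([kp.pt for kp in keypoints])

# Triangulate the points and take the midpoint of every triangle edge
# as an interpolated interest point.
tri = Delaunay(pts)
new_pts = set()
for a, b, c in tri.simplices:
    for i, j in ((a, b), (b, c), (a, c)):
        new_pts.add(tuple(((pts[i] + pts[j]) / 2.0).round(1)))

# Compute SIFT descriptors at both the original and the interpolated locations.
all_kps = list(keypoints) + [cv2.KeyPoint(float(x), float(y), 8.0) for x, y in new_pts]
all_kps, descriptors = sift.compute(gray, all_kps)
print("keypoints before:", len(keypoints), "after interpolation:", len(all_kps))
```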
Label-Specific Feature Mining by Search
- Only a part of the visual vocabulary is descriptive or informative for the corresponding label.
- Observations: to capture objects or scenes, the visual representations should have the following properties: i) the visual words should appear on the input image; ii) the visual words that are informative for a specific label should appear more frequently than other words in the images containing the label, or be less frequent in the images not containing the label; iii) the descriptive visual words should be located on the objects or scenes.

Label-Specific Feature Mining by Search
- Method: a two-stage procedure (a toy sketch follows below).
  1. Remove the words that do not appear in the input image.
  2. Mine the remaining words with a probabilistic inference framework over the vocabulary W = {W1, W2, ..., W_NW}: for each label c, rank words by 1) the frequency of each visual word and 2) the co-occurrence of each word with the other words.
- Select the top 20% of the ranked words as the label-specific representation.
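The probabilistic inference framework itself is not reproduced on the slides, so this sketch substitutes a simple frequency-contrast score for stage 2; the function name, the score, and the synthetic histograms are assumptions made only to illustrate the two-stage selection and the top-20% cut.

```python
import numpy as np

def mine_label_specific_words(input_hist, pos_hists, neg_hists, keep_frac=0.2):
    """Toy stand-in for label-specific feature mining over a visual vocabulary.

    input_hist: word histogram of the input image, shape (V,)
    pos_hists:  histograms of images containing the label, shape (P, V)
    neg_hists:  histograms of images not containing the label, shape (N, V)
    """
    pos = np.asarray(pos_hists, dtype=float).mean(axis=0)
    neg = np.asarray(neg_hists, dtype=float).mean(axis=0)

    # Stage 1: keep only visual words that appear on the input image.
    present = np.asarray(input_hist) > 0

    # Stage 2 (simplified): words frequent in images with the label and rare
    # in images without it get high scores.
    score = np.where(present, pos / (neg + 1e-8), -np.inf)

    # Keep the top 20% (by default) of the ranked words.
    k = max(1, int(keep_frac * present.sum()))
    return np.argsort(score)[::-1][:k]

# Tiny synthetic example with a 10-word vocabulary.
rng = np.random.default_rng(0)
print("selected words:", mine_label_specific_words(rng.integers(0, 3, 10),
                                                   rng.integers(0, 5, (6, 10)),
                                                   rng.integers(0, 5, (6, 10))))
```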
Sparse Region Coding
- The cross-image region/patch correspondence is discovered via sparse coding: a candidate region is reconstructed as y = A x + e, where y is the feature descriptor of the candidate region, A is the basis matrix built from the online image patches, x is the coefficient vector whose entries are expected to be zero except for those samples containing the same label as y, and e is a noise vector that explicitly accounts for possible sparse noise.
- Since both x and e are expected to be sparse, the coefficients are obtained by l1-norm minimization (the sparsity prior); a rough sketch follows below.
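A rough sketch of this coding step, using scikit-learn's Lasso as a stand-in for the paper's exact l1 formulation; stacking an identity block next to A so that the sparse noise e is estimated jointly with x is one common trick and is only an assumption here, not necessarily the authors' solver.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Basis matrix A: descriptors of online image patches as (normalized) columns.
A = rng.standard_normal((64, 200))
A /= np.linalg.norm(A, axis=0)

# A candidate region y built from 3 online patches plus a small sparse noise.
x_true = np.zeros(200)
x_true[[5, 17, 120]] = [0.9, 0.6, 0.4]
y = A @ x_true
y[10] += 0.5  # sparse corruption of one feature dimension

# Solve y ~ [A, I][x; e] with an l1 penalty on both x and the noise e.
B = np.hstack([A, np.eye(64)])
coef = Lasso(alpha=0.01, fit_intercept=False, max_iter=10000).fit(B, y).coef_
x_hat, e_hat = coef[:200], coef[200:]

print("nonzeros in x:", np.count_nonzero(np.abs(x_hat) > 1e-3))
print("nonzeros in e:", np.count_nonzero(np.abs(e_hat) > 1e-3))
```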
Sparse Region Coding with Continuity-Prior
- The reconstruction of candidate regions uses the sparsity prior, which means we prefer to select as few patches as possible.
- Since the goal is to discover the cross-image correspondence, it is natural to additionally enforce that the matched image patches are perceptually and spatially coherent. This motivates a preference for image patches with larger size, namely the continuity-biased prior.
- The derived coefficient vector is therefore both sparse and continuity-biased.

[Figure: the top 5 selected image patches, ranked according to the reconstruction coefficients obtained with the different priors.]
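The exact form of the continuity-biased prior is not given on the slides, so this sketch reads it as a weighted l1 penalty whose weights shrink as the online patch gets larger, making large patches cheaper to select; this interpretation, the weighting formula, and the toy data are assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def continuity_biased_coding(y, A, patch_sizes, alpha=0.01):
    """Weighted-l1 sparse coding: larger online patches receive smaller weights,
    so selecting them is penalized less (an illustrative reading of the
    continuity-biased prior, not the paper's exact formulation)."""
    sizes = np.asarray(patch_sizes, dtype=float)
    w = 1.0 / (sizes / sizes.max())  # weight shrinks as the patch grows

    # A weighted penalty sum_i w_i*|x_i| equals a plain lasso on the rescaled
    # dictionary A / w, followed by undoing the rescaling of the coefficients.
    z = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000).fit(A / w, y).coef_
    return z / w

# Toy example: 100 online patches with random sizes and 32-d descriptors.
rng = np.random.default_rng(1)
A = rng.standard_normal((32, 100))
sizes = rng.integers(50, 5000, 100)
y = A[:, 3] + 0.01 * rng.standard_normal(32)
x = continuity_biased_coding(y, A, sizes)
print("selected patches:", np.flatnonzero(np.abs(x) > 1e-3))
```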
L2R Assignment via Sparse Representation
- Given a candidate region y of the input image and the feature basis matrix A, first compute its sparse representation by solving (5).
- Then classify y based on how well the coefficients associated with the image patches of each label reproduce y, i.e., by comparing the per-label reconstruction errors (a minimal sketch follows below).
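A minimal sketch of this classification step: compute one sparse code for y, then keep only the coefficients belonging to each label's patches and pick the label whose patches alone reconstruct y with the smallest residual; using Lasso for (5) and assuming a single label per online patch are simplifications.

```python
import numpy as np
from sklearn.linear_model import Lasso

def classify_region(y, A, patch_labels, alpha=0.01):
    """Label a region y by the per-label reconstruction error of its sparse code."""
    x = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000).fit(A, y).coef_
    patch_labels = np.asarray(patch_labels)

    residuals = {}
    for label in np.unique(patch_labels):
        # Keep only the coefficients of this label's patches and measure
        # how well they alone reproduce y.
        x_label = np.where(patch_labels == label, x, 0.0)
        residuals[label] = np.linalg.norm(y - A @ x_label)
    return min(residuals, key=residuals.get), residuals

# Toy example: 60 "grass" patches and 60 "sky" patches as basis columns.
rng = np.random.default_rng(2)
A = rng.standard_normal((32, 120))
labels = np.array(["grass"] * 60 + ["sky"] * 60)
y = A[:, 70] + 0.05 * rng.standard_normal(32)  # built from a "sky" patch
print("predicted label:", classify_region(y, A, labels)[0])
```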
Experiments
- Goal: evaluate the effectiveness of the iSIFT feature pool, the feature mining procedure, and the continuity-biased sparse coding formulation for the Label-to-Region assignment task.
- Datasets:
  - MSRC [9]: 500 images, 23 categories/labels, region-level ground truth.
  - COREL: 4,000 images, 8 labels, region-level annotations.
  - Dataset collected by Stephen [20]: 715 images, 7 labels, region-level annotations.
- Baselines: SVM, KNN.

Experiments
- Dense SIFT: one SIFT descriptor per 10x10-pixel lattice cell.
- SVM-I: SVM + dense SIFT
- SVM-II: SVM + iSIFT
- KNN-I: KNN + dense SIFT
- KNN-II: KNN + iSIFT
- LAS-A-I: LAS + dense SIFT + sparsity prior
- LAS-A-II: LAS + iSIFT + sparsity prior
- LAS-B-II: LAS + iSIFT + sparsity prior + continuity-biased prior
- Online images retrieved from Bing and Google.

Discussions
- Images with noisy labels
- Video tag localization

Thanks & QA