6.869 Advances in Computer Vision
http://people.csail.mit.edu/torralba/courses/6.869/6.869.computervision.htm
Lecture 1: Introduction
Spring 2010
Mondays/Wednesdays 1:00-2:30 pm, Room 2-139
Instructor: Antonio Torralba
Email: torralba@csail.mit.edu
TA: Joseph Lim

Readings

Course requirements
Two take-home exams (given out Monday, due back Wednesday)
Five problem sets with lab exercises in Matlab (one to two weeks per problem set)
Final project (no final exam)

Grading
Problem sets are graded check, check-plus, or check-minus. (Outstanding solutions get extra credit.)
Final grade:
5 problem sets: 1/3
2 take-home exams: 1/3
Final project: 1/3

Collaboration policy
Problem sets may be discussed, but all written work and coding must be done individually. Please note on your problem sets who you discussed the homework problems with.

Take-home exams
Take-home exams may not be discussed. Individuals found submitting duplicate or substantially similar materials due to inappropriate collaboration may get an F in this class and other sanctions.

Final project
The final project may be:
An original implementation of a new or published idea
A detailed empirical evaluation of an existing implementation of one or more methods
A paper comparing three or more papers not covered in class, or surveying recent literature in a particular area
Something related to your research
A project proposal not longer than two pages must be submitted by April 1st. I can provide ideas or suggestions for projects.

Prerequisites
Familiarity with linear algebra
Familiarity with probability
Covers topics complementary to 6.801/6.866; these subjects may be taken in sequence.
Prerequisites: 6.041 or 6.042; 18.06

Other classes
6.801/6.866 Machine Vision
6.869 Advances in Computer Vision
6.870 Advanced Topics in Computer Vision
6.815/6.865 Digital and Computational Photography, Fredo & Bill
MAS 132/532 Camera Culture: Future of Imaging, Ramesh
6.344 Digital Image Processing, J. Lim
6.342 Wavelets, Approximation, and Compression, V. K. Goyal

What is vision?
What does it mean, to see? “To know what is where by looking.”
How to discover from images what is present in the world, where things are, what actions are taking place.
(from Marr, 1982)

The importance of images
100 million dollars
“Dora Maar au Chat”, Pablo Picasso, 1941
Some images are more important than others

Why is vision hard?

The structure of ambient light

The Plenoptic Function
The intensity P can be parameterized as:
$P(\theta, \phi, t, \lambda, X, Y, Z)$
(viewing direction $\theta, \phi$; time $t$; wavelength $\lambda$; viewing position $X, Y, Z$)
“The complete set of all convergence points constitutes the permanent possibilities of vision.” Gibson
Adelson &
Bergen, 91

Measuring the Plenoptic function
“The significance of the plenoptic function is this: The world is made of 3D objects, but these objects do not communicate their properties directly to an observer. Rather, the objects fill the space around them with the pattern of light rays that constitutes the plenoptic function, and the observer takes samples from this function.” Adelson & Bergen, 91
Why is there no picture appearing on the paper?
Light rays from many different parts of the scene strike the same point on the paper.
Forsyth & Ponce

The camera obscura / The pinhole camera
The pinhole camera only allows rays from one point in the scene to strike each point of the paper.
Forsyth & Ponce

Problem Set 1

Effect of pinhole size
Wandell, Foundations of Vision, Sinauer, 1995

Animal Eyes
Animal Eyes. Land & Nilsson. Oxford Univ. Press

Measuring distance
Object size decreases with distance to the pinhole. Therefore, given a single projection, if we know the size of the object we
can know how far it is. But for objects of unknown size, the 3D information seems to be lost.

Playing with pinholes

Two pinholes
What is the minimal distance between the two projected images?

Anaglyph pinhole camera

Synthesis of new views

Anaglyph

Problem set 1
Build the device
Take some pictures and put them in the report
Take anaglyph images
Work out the geometry
Recover depth for some points in the image

Why is vision hard?
Some things have strong variations in appearance
Some things know that you have eyes
Brady, M. J., & Kersten,
D. (2003). Bootstrapped learning of novel objects. J Vis, 3(6), 413-422

Why is vision hard?

Measuring light vs. measuring scene properties
We perceive two squares, one on top of the other.
Roger Shepard, “Turning the Tables”
Depth processing is automatic, and we cannot shut it down.
(c) 2006 Walt Anthony

Assumptions can be wrong
Ames room
By Aude Oliva

Vision has to solve an ill-posed problem
Sinha & Adelson, 93

Generic view assumption
[Figure: Image]
Generic view assumption: the observer should not assume that he has a special position in the world.
The most generic interpretation is to see a vertical line as a vertical line in 3D.
Freeman, 93

A simple idea to recover 3D shapes from line drawings
Marill, AI-Memo-1136, 1989
Task: Given the set of image coordinates of the vertices $(x_i, y_i)$, recover the world coordinates $(X_i, Y_i, Z_i)$. We will assume orthographic projection, so that $X_i = x_i$ and $Y_i = y_i$. Then the problem is: recover the missing depth $Z_i$.
Heuristic: find the $Z_i$ that minimize the standard deviation of the angles in the 3D object.
[Figure: $Z_i = 0$ for all $i$ (angle labels 70, 20, 110); Possible solution 1; Possible solution 2; All 90 degree angles]
[Figure: INPUT; Reconstruction -10 degrees; Reconstruction +10 degrees]

Where is Computer Vision now?

Application of a statistical image model and variational Bayesian inference: removing motion blur
[Figure: Original; Variational Bayes]
[Figure, close-up: Original; Naïve Sharpening; Variational Bayes]
Fergus et al., 2006

Texture synthesis
Input
Output: new instances of the same “kind” of texture.
Portilla & Simoncelli, 1999

2D frontal face detection
Amazing how far they have gotten with so little

Face detection
Haar filters and integral image
Viola and Jones, ICCV 2001
The average intensity in the block is computed from four integral-image values, independently of the block size.

Google street view

Assisted driving
Lane detection
Pedestrian and car detection
Collision warning systems with adaptive cruise control, lane departure warning systems, rear object detection
systems

Iris recognition
John Daugman, 93
http://www.cl.cam.ac.uk/jgd1000/iriscollage.jpg
Genetically identical eyes also have different irises

Iris recognition immigration system
http://www.ind.homeoffice.gov.uk/managingborders/technology/iris/

SIFT
D. Lowe, 2004

SIFT vector formation
Thresholded image gradients are sampled over a 16x16 array of locations in scale space.
Create an array of orientation histograms.
8 orientations x 4x4 histogram array = 128 dimensions

Image stitching
Brown, Lowe, 2007

Photo tourism
PhotoSynth
Snavely et al. 2006
(Goesele et al. 2007)

Video Google
Sivic, J. and Zisserman, A. Video Google: A Text Retrieval Approach to Object Matching in Videos. Proceedings of the International Conference on Computer Vision (2003)

Visually defined search
Given an object specified by its image, retrieve all shots containing the object:
must handle viewpoint change etc.
must be efficient at run time
[Figure: people; objects; places]
Slide by Josef Sivic

Object search in video: why is it hard?
An object's imaged appearance varies:
scale changes
lighting changes
viewpoint changes
partial occlusion
Sheer amount of data:
a feature-length movie has 100,000-150,000 frames

[Figure: Image, visual nouns; Visual description, visual words]

Visual vocabulary unaffected by scale and viewpoint
The same visual word

Image representation using visual words
Use efficient Google-like search on visual words

Efficient search: In a classical file structure all words are stored in the document they appear in. An inverted file structure has an entry (hit list) for each word where all occurrences of the word in all documents are stored. In our case the inverted file has an entry for each visual word, which stores all the
matches, i.e. occurrences of the same word in all frames. The document vector is very sparse, and use of an inverted file makes the retrieval very fast. Querying a database of 4k frames takes about 0.1 second with a Matlab implementation on a 2 GHz Pentium.

Video Google
[Figure: Query; Retrieved frames; retrieved shots]
Example: Groundhog Day
Video Google, Sivic & Zisserman, ICCV 2003
Slide by Josef Sivic

Example: Casablanca
[Figure: retrieved shots]

Improving online search
Query: STREET

Organizing photo collections
[Figure: Views; Labels; Web users]

Problem set 1
Build the device
Take some pictures and put them in the report
Take anaglyph images
Work out the geometry
Recover depth for some points in the image
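
The geometry behind Problem Set 1 can be sketched in a few lines of Python. This is a minimal illustration under an idealized setup that the slides do not specify: two pinholes at x = -baseline/2 and x = +baseline/2 in the plane z = 0, flat paper at z = -f, and a scene point at depth Z > 0. All function and parameter names are hypothetical, and a real device would also need calibration and a way to match points between the two images.

```python
import math


def project_through_pinhole(point, pinhole_x, f):
    """Paper coordinates (x, y) of a scene point seen through one pinhole.

    Assumed setup: the pinhole is at (pinhole_x, 0, 0) and the paper is the
    plane z = -f, so the image is inverted and shrinks as Z grows.
    """
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("the point must be in front of the pinhole (Z > 0)")
    x = pinhole_x - f * (X - pinhole_x) / Z
    y = -f * Y / Z
    return x, y


def depth_from_known_size(object_size, image_size, f):
    """Single pinhole: image_size = f * object_size / Z, solved for Z."""
    if image_size <= 0:
        raise ValueError("image_size must be positive")
    return f * object_size / image_size


def depth_from_two_pinholes(x_left, x_right, baseline, f):
    """Two pinholes: recover Z from the x-coordinates of the two projections.

    On the paper the two images of a point are separated by
    baseline * (1 + f / Z); the part beyond the baseline is f * baseline / Z.
    """
    excess = (x_right - x_left) - baseline
    if excess <= 0:
        raise ValueError("separation must exceed the baseline for finite depth")
    return f * baseline / excess


if __name__ == "__main__":
    f, baseline = 0.10, 0.06          # metres; illustrative values only
    point = (0.20, 0.10, 2.00)        # scene point 2 m in front of the pinholes
    x_left, _ = project_through_pinhole(point, -baseline / 2, f)
    x_right, _ = project_through_pinhole(point, +baseline / 2, f)
    z = depth_from_two_pinholes(x_left, x_right, baseline, f)
    print(f"recovered depth: {z:.3f} m")
```

With a single pinhole, depth is recoverable only when the object size is known; with two pinholes, the separation between the two projections carries the depth. Under these assumptions that separation shrinks toward the baseline as the point moves farther away.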