INTERNATIONAL ORGANISATION FOR STANDARDISATION
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC1/SC29/WG11
CODING OF MOVING PICTURES AND AUDIO

ISO/IEC JTC1/SC29/WG11 N6834
Palma de Mallorca, Spain, October 2004

Source: Requirements
Title: Requirements on Multi-view Video Coding v.2
Status: Approved

1 Introduction
2 Applications
2.1 Free Viewpoint Video (FVV) / Free Viewpoint Television (FTV)
2.2 3DTV
2.3 Immersive teleconference
3 Requirements for Multi-view Video Coding
3.1 Compression related requirements
3.1.1 Compression efficiency
3.1.2 Scalability
3.1.3 Performance efficiency
3.1.4 Low delay
3.1.5 Robustness
3.1.6 Resolution, Color space and depth
3.1.7 Quality consistency among views
3.1.8 View random access, partial decoding and rendering
3.1.9 Temporal random access
3.1.10 Camera motion
3.1.11 Resource management
3.2 System support related requirements
3.2.1 Synchronization
3.2.2 View generation
3.2.3 Non-planar imaging and display systems
3.2.4 Camera parameters
4 References

1 Introduction

There have been many input documents brought to MPEG in the last 2 years on free viewpoint video in which multiple-view video coding techniques show improved coding
efficiency over existing MPEG compression tools. During this time, it has been recognized that Multi-view Video Coding (MVC) is a key technology that serves a wide variety of applications, including FTV (free-viewpoint television), 3DTV (3D television) and surveillance. In response to a “Call for Comments on 3DAV” [3], a large number of companies have expressed their need for standards that enable FTV and 3DTV. MVC is an encoding framework for multiple video streams and associated camera parameters. This document first presents some cases where MVC is applicable, and then the requirements for MVC.

A list of 3DAV applications and the reasons why they require multiple-view video coding is given in the Application and Requirements document [1]. Details on the technology itself are described in the “Report on 3DAV Exploration” [2].

2 Applications

2.1 Free Viewpoint Video (FVV) / Free Viewpoint Television (FTV)

In this application scenario, the viewpoint and view direction can be changed interactively and may differ from any of the input ones, i.e., those at which the original videos were shot. During such viewing, the viewers experience
free viewpoint navigation within the range covered by the shooting cameras. Such a scenario can appear in the following applications:

1. Entertainment: concert, sport, multi-user game, movie
2. Education: cultural archives, manual with real video, instruction of sports playing, medical surgery
3. Sightseeing: zoo, aquarium, botanical garden, museum
4. Surveillance: traffic intersection, underground parking, bank
5. Archive: space archive, living national treasures, traditional entertainment
6. Art/Content: creation of new types of media art and digital content

The basic components of an example FTV system
are depicted in Figure 1. The output images from the MVC decoder are used for FTV view generation; this view generation procedure may interpolate images from different views. To achieve high-quality view generation results, a correction process (i.e., rectification of misalignment and normalization of colors) is necessary in most cases. In the example FTV system shown in Figure 1, the correction is applied prior to encoding.

Figure 1: Basic components of an example FTV system
[Figure 1 block labels: Video Capture, Correction, MVC Encoder, MVC Decoder, View Generation, Display]

A more detailed architecture of an example FTV decoder is depicted in Figure 2. Input streams to the FTV decoder include multi-view video elementary stream information, video resource management information, timing information, and camera parameters information. In this architecture, the MVC decoder provides reconstructed video data, which is then used in the view generation process. Note that camera parameters may also be used during the MVC decoding process. Video resource management information may be used for managing the picture memory in an efficient way and for generating predictive images for the MVC decoder. Finally, view generation is performed according to the video data information and the associated camera parameters information.

Figure 2: Example architecture of an FTV decoder
[Figure 2 block labels: Video Resource Management Info, MVC Video Elementary Stream Info, Timing Info, Camera Parameters Info, MVC Decoder, View Generation, Shared Memory]

2.2 3DTV

This can be thought of as an extension of the current stereoscopic movie. In a stereoscopic movie, all viewers share the same viewpoint. In 3DTV, multiple cameras are used to capture the light field of the scene. When such a light field is displayed, multiple viewers can see different stereoscopic views consistent with their relative locations. The potential applications are similar to those above.

2.3 Immersive teleconference

In both scenarios above, there is interactivity between the viewers and the video content, but not between the viewers themselves. In immersive teleconference, participants at different geographical
sites meet virtually and see one another in either free viewpoint or 3DTV style. The immersiveness provides a more natural way of communication.

1. corporate teleconference
2. remote training

Refer to [1] for more information on relevant applications.

3 Requirements for Multi-view Video Coding

Note that
in the sequel, we use “shall” if a certain requirement is very important, and “should” if a certain requirement is desirable.

3.1 Compression related requirements

3.1.1 Compression efficiency

MVC shall provide high compression efficiency. Some overhead may be necessary to ease view interpolation (i.e., trading coding efficiency for functionality). However, the overhead data should be limited, in order to increase acceptance of the new services.

3.1.2 Scalability

Various types of scalability should be supported, including SNR scalability, spatial scalability, temporal scalability, complexity scalability, and view scalability. This enables the same content to be displayed on a multitude of terminals and over network conditions that exhibit a variety of capabilities.

3.1.3 Performance efficiency

MVC should be efficient in terms of computational complexity and resource consumption.

3.1.4 Low delay

MVC should support modes that have low encoding and decoding delay, view change delay, and end-to-end delay.

3.1.5 Robustness

Robustness to errors (also known as error resilience) should be supported. This enables the delivery of 3D content on error-prone networks, such as wireless networks.

3.1.6 Resolution, Color space and depth

MVC should support a variety of resolutions (e.g. QCIF, CIF, SD, or HD), color spaces (e.g. YCrCb with 4:4:4, 4:2:2 and 4:2:0 sampling, or RGB), and color depths up to 16 bits per pixel component.

3.1.7 Quality consistency among views

MVC should provide perceptually similar visual quality across the different views presented at the same time instant.

3.1.8 View random access, partial decoding and rendering

MVC should support random view access (e.g. view switching) and partial decoding of a certain subset of views.

3.1.9 Temporal random access

MVC should support random access to a given point in time.

3.1.10 Camera motion

MVC should support encoding of sequences subject to camera motion.

3.1.11 Resource management

MVC should support efficient management of decoder resources.

3.2 System support related requirements

3.2.1 Synchronization

MVC shall support accurate temporal synchronization among the multiple views.

3.2.2 View generation

MVC should support robust generation of a virtual view or interpolated view.

3.2.3 Non-planar imaging and display systems

MVC should support efficient representation and coding methods for 3D display, including IP (integral photography) and non-planar image (e.g. dome) display systems.

3.2.4 Camera parameters

MVC should support transmission of camera parameters.

4 References

[1] N5877, “Application and Requirements for 3DAV”
[2] N5878, “Report on 3DAV Exploration”
[3] N6051, “Call for Comments on 3DAV”