Wide Area Camera Calibration Using Virtual Calibration Objects

Xing Chen, James Davis, Philipp Slusallek*
Computer Graphics Lab, Stanford University, Stanford, CA 94305
{xcchen, jedavis, slusallek}@graphics.stanford.edu

* Current address: University of Saarbruecken, P.O. Box 15 11 50, 66041 Saarbruecken, Germany.

IEEE Conference on Computer Vision and Pattern Recognition 2000 (CVPR). Copyright IEEE 2000.

Abstract

This paper introduces a method to calibrate a wide area system of unsynchronized cameras with respect to a single global coordinate system. The method is simple and does not require the physical construction of a large calibration object. The user need only wave an identifiable point in front of all cameras. The method generates a rough estimate of camera pose by first performing pair-wise structure-from-motion on observed points, and then combining the pair-wise registrations into a single coordinate frame. Using the initial camera pose, the moving point can be tracked in world space. The path of the point defines a “virtual calibration object” which can be used to improve the initial estimates of camera pose. Iterating the above process yields a more precise estimate of both camera pose and the point path. Experimental results show that it performs as well as calibration from a physical target, in cases where all cameras share some common working volume. We then demonstrate its effectiveness in wide area settings by calibrating a system of cameras in a configuration where traditional methods cannot be applied directly.

1 Introduction

Many applications of tracking and observation require operation over a wide area, such as monitoring the traffic
flow of vehicles in a parking structure or people in a building. In such cases, a single camera is unlikely to be sufficient. Rather, a network of interconnected cameras is required, each of which functions over only a small subset of the total area. In order to build such a system, a number of issues need to be addressed. The cameras must be calibrated in some global coordinate system; distributed components may need to communicate with each other; and some estimate of the system state which integrates all available sources of information must be computed. In this paper we address the issue of calibrating cameras in a wide area sensing environment.

Wide area system calibration is much more challenging than calibrating a single camera. In single camera calibration, the usual method involves placing a carefully instrumented calibration target in the field of view. Based on correspondences between known 3D features on the target and their 2D locations in the image, calibration can be obtained. If multiple cameras are active in the same working volume, then each can be calibrated individually using an identical process. The case of wide area calibration introduces a number of difficulties. Cameras each cover only a small subset of the total working volume. A calibration target can be moved so that each camera is calibrated separately. However, a global calibration requires knowledge of relative target motion. This is difficult to obtain without expensive instrumentation. Simultaneous activation of cameras poses an additional problem. In large systems with many possibly heterogeneous cameras it becomes difficult to ensure that all cameras record observations at exactly the same moment. Many algorithms rely on simultaneity as a fundamental constraint.

In this paper we introduce a method that brings a system of unsynchronized cameras into calibration in a single global coordinate system. A rough estimate of each camera's pose (i.e. location and orientation) is obtained using standard structure-from-motion techniques. The rough camera calibration can be used to track the path of a point moving through the entire working volume. This path defines a virtual calibration object, which can be used to improve the estimate of camera pose in the global coordinate space. Iterating the above process results in convergence to both the point path defining the virtual calibration object and a precise estimate of camera pose. We evaluate our method by comparing it to traditional calibration techniques. Furthermore, we demonstrate its effectiveness in wide area settings by calibrating a multi-camera indoor tracking system where cameras cover disjoint viewing regions, a more challenging situation where traditional methods cannot be easily applied.

The rest of the paper is organized as follows. Section 2 introduces the example application to which our calibration technique is applied. Section 3 discusses previous work. Section 4 describes our proposed method and Section 5 gives experimental results. We conclude in Section 6.

2 Application

The wide area calibration technique
presented in this paper can be generalized to a wide variety of applications, sensors, and environments. However, an understanding of the specific application in our lab may prove illustrative. Our experimental system was designed to track people in front of a large-screen, multi-projector display [1]. In one application, the head position of the user is acquired to provide the correct viewpoint for images rendered on the display. The tracking system has ten ceiling-mounted cameras oriented to observe a 4.0 x 4.5 meter area. Coverage extends from approximately a half meter to 2 meters from the floor. The wide-angle cameras in the corners cover the volume. In addition, since our application requires higher tracking resolution right in front of the display, a few more narrowly focused cameras are installed to increase the resolution in that area. Individual cameras observe only a portion of the volume. In aggregate, however, they cover the space. Figure 1 shows a photograph of the tracking space and a plan view of camera placement. Note that in order to ensure correct estimates of observed object position, it is required that at least two cameras observe any given region in space. However, there is no single point that is observed by all cameras.

Each camera is connected to a digitizing board and a CPU. Interlaced fields are captured and processed at 60 Hz. At capture time, fields are time stamped by the local CPU, and all CPUs use a standard network time daemon to ensure consistent time within 3 ms. However, the cameras themselves are not synchronized for simultaneous capture. After digitization the local CPU extracts features and sends these over the network to a central estimator that uses an extended Kalman filter [2] to integrate data from all cameras into a single estimate of object position and orientation.
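
To make this data flow concrete, the sketch below shows one plausible form for the time-stamped 2D feature record that each camera host might send to the central estimator. The type and field names are hypothetical; the paper does not specify the actual message format.

```python
from dataclasses import dataclass

@dataclass
class FeatureObservation:
    """One extracted 2D feature, as a local CPU might report it (hypothetical format)."""
    camera_id: int    # which camera produced the measurement
    timestamp: float  # local capture time in seconds, NTP-consistent to about 3 ms
    u: float          # image coordinates of the feature (pixels)
    v: float

# Example: camera 3 saw the tracked point at pixel (412.7, 233.1) at t = 17.236 s.
obs = FeatureObservation(camera_id=3, timestamp=17.236, u=412.7, v=233.1)
```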

3 Previous work

There has been a great deal of research in the area of accurate camera calibration. Most previous methods use a known calibration pattern that is imaged by the camera. Features are extracted from the image, and the best fit of intrinsic camera parameters and extrinsic camera pose is obtained. Tsai proposed a widely used model, but other more robust models are used as well [3, 4].

Azarbayejani and Pentland propose a method for calibrating the relative position of cameras [5]. An identifiable object is waved in front of a synchronized stereo pair of cameras, and the per-camera image location of the object at each time step is recorded. A standard structure from motion system is used to derive the relative pose of the two cameras. Their focus is not wide area tracking, and synchronized cameras with a common viewing volume are required.

Stein proposes a system of cameras to track vehicles in an outdoor environment [6]. By observing the motion of objects in video sequences from multiple cameras, an approximate camera pose and time offset can be recovered from several unsynchronized cameras. Image features are used to refine the calibration estimate. This system requires a flat ground plane in all the images and solves the homography relating objects on this 2D plane.

Figure 1: (a) Ceiling-mounted cameras are used to track users around a wide area environment in front of the large display. (b) Cameras are arranged so that observation of the entire space is possible, although no single camera observes the entire working volume.

Rander and Kanade have a system of approximately 50 cameras arranged in a dome to observe a room-sized space. In order to calibrate these cameras, a large calibration
object is built and then moved precisely to several locations, in effect building a virtual calibration object that covers the room [7]. While this works well, it can be quite costly to ensure the precise movement of a calibration object, and it is not easily adaptable when the shape or size of the working volume changes.

Gottschalk and Hughes propose a framework for auto-calibration in wide area spaces [8]. Head-mounted sensors observe precisely synchronized beacons mounted on the ceiling of their UNC lab. Data gathered from the sensors can be used to estimate both the moving head location and orientation, and to refine the initially available position estimates of the beacons. Like the system in this paper, they also employ the principle of iterative calibration. Welch later proposed a refined estimation method [9]. However, their tracking architecture is quite different from the multi-camera environments we consider.

This paper's contribution is a wide area calibration method that addresses several previously ignored difficulties. A large number of unsynchronized cameras can be calibrated in a single consistent coordinate system. This can be achieved even when some cameras are arranged with non-overlapping working volumes and when no initial estimate of camera pose is available.
In addition, the method requires no complex instrumentation, and is easily adaptable to working volumes of variable size and shape.

4 Proposed Method

An outline of our method is shown in Figure 2. After the separate calibration of intrinsic camera parameters, our method begins by obtaining 2D image correspondences. The pairwise relative pose between cameras can be found using structure from motion. Then, a unification process brings these pairwise relationships into a single global space. The rough estimate of global pose calculated by the preceding steps can be used to initialize the following iterative procedure. A 3D trace of an object moving through space is estimated using an extended Kalman filter (EKF). This trace can be used as a virtual calibration object by correlating it with camera observations. Using traditional camera calibration, a new set of camera pose estimates is obtained. Iteration produces a globally consistent camera calibration.
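
To make the structure of this iteration explicit, the following Python sketch outlines the loop. The helper callables (track_point_3d, calibrate_extrinsics, reprojection_error) are hypothetical placeholders for the EKF-based tracker and a standard 3D-2D extrinsic calibration routine, and the stopping rule based on reprojection error is an illustrative choice rather than a detail given in the paper.

```python
# A minimal sketch of the iterative refinement, assuming the helper callables
# are supplied by the caller (EKF tracker, per-camera extrinsic calibrator,
# and an overall reprojection-error metric).
def refine_calibration(initial_poses, observations,
                       track_point_3d, calibrate_extrinsics, reprojection_error,
                       max_iters=10, tol=1e-4):
    """initial_poses: dict camera_id -> rough pose; observations: dict
    camera_id -> list of time-stamped 2D points of the waved target."""
    poses = dict(initial_poses)
    prev_error = float("inf")
    for _ in range(max_iters):
        # 1. Track the moving point through the working volume with the
        #    current pose estimates; the result is a time-stamped 3D path.
        point_path = track_point_3d(poses, observations)

        # 2. The path acts as a "virtual calibration object": each camera's
        #    2D observations are matched to path points by timestamp, giving
        #    3D-2D correspondences for traditional extrinsic calibration.
        poses = {cam: calibrate_extrinsics(point_path, obs)
                 for cam, obs in observations.items()}

        # 3. Stop once the overall reprojection error stops improving.
        error = reprojection_error(poses, point_path, observations)
        if prev_error - error < tol:
            break
        prev_error = error
    return poses, point_path
```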

4.1 Intrinsic calibration

Camera calibration is typically divided into two parts: intrinsic and extrinsic parameter calibration. The intrinsic parameters usually consist of lens distortion, image center, and focal length. For a short-baseline stereo pair, the relative pose between the cameras in the pair could also be included. Extrinsic parameters define how the local camera coordinate system relates to a global coordinate system, i.e. the six parameters defining position and orientation. We propose that intrinsic calibration is best performed on cameras individually since it is not dependent on the global coordinate system. As mentioned previously, many calibration methods exist that are appropriate for a single camera. We use the intrinsic models proposed by Heikkila [4].
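
As one concrete possibility, the sketch below calibrates a single camera's intrinsics from checkerboard images using OpenCV, whose camera model is closely related to that of [4]. It is an illustrative stand-in rather than the exact procedure used in the paper; the board dimensions, square size, and image directory are assumptions.

```python
import glob
import cv2
import numpy as np

pattern = (9, 6)    # inner corners per checkerboard row and column (assumed)
square = 0.025      # checkerboard square size in meters (assumed)

# 3D corner coordinates in the board's own coordinate frame (z = 0 plane).
board = np.zeros((pattern[0] * pattern[1], 3), np.float32)
board[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_points, img_points, image_size = [], [], None
for fname in sorted(glob.glob("camera0/*.png")):   # hypothetical image directory
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    image_size = gray.shape[::-1]
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(board)
        img_points.append(corners)

# Recover focal length, image center, and lens distortion for this camera.
rms, K, dist, _, _ = cv2.calibrateCamera(obj_points, img_points, image_size, None, None)
print("reprojection RMS (pixels):", rms)
```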

Finding the extrinsic parameters of each camera in a way that is globally consistent is the focus of the remaining portion of this paper.

Figure 2: The main stages in our calibration method. Boxes represent computational stages (Structure from Motion, Global Unification, EKF-based Physical Point Tracking, Virtual Object Creation, Traditional Camera Calibration); italic text shows data flow (2D-2D image correspondences, pair-wise camera relations, rough global camera pose, 3D object trace, 3D-2D correspondences, improved global camera pose).

4.2 Initial extrinsic calibration

Pairwise calibration using structure from motion. To obtain a globally consistent extrinsic calibration of cameras, we start by searching for pairwise registration between nearby cameras in our system. By finding
corresponding 2D image points in a pair of camera views we can employ any structure from motion system to recover both the 3D location of these corresponding points and, more importantly for our application, the relative pose of the camera pair. We use a publicly available structure from motion implementation from Zhang [10]. An easily identifiable object is moved so that over time it covers the working volume of our system. We use an LED or flashlight in a darkened room. Since each camera sees only a subset of the working area, not all cameras observe the object at any given location. At this stage, however, only pairwise registration is required. The corresponding 2D observations for all relevant pairs of cameras are recorded. It should be noted that since the cameras are not synchronized for simultaneous input, no pair of cameras will actually observe the point at exactly the same location. At this stage we make an approximation that will be refined in a later part of our algorithm. Since the object is known to move continuously, we discretize time into small intervals. We use 36 ms, since this is approximately the time required for two NTSC video fields to be processed by our 60 Hz cameras. Observations occurring during the same time interval are approximated as collocated both temporally and spatially. Given this approximation and the resulting set of pairwise image correspondences, we can employ structure from motion to obtain a set of pairwise camera registrations.
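
The sketch below illustrates this temporal approximation and the pairwise pose recovery under stated assumptions: observations from two cameras are matched when they fall in the same 36 ms bin, and OpenCV's essential-matrix routines stand in for the structure-from-motion implementation of [10]. The observation format and the intrinsic matrices K_a and K_b are assumed inputs.

```python
import numpy as np
import cv2

BIN = 0.036  # seconds; roughly two NTSC video fields at 60 Hz

def bin_correspondences(obs_a, obs_b, bin_size=BIN):
    """obs_a, obs_b: lists of (timestamp, (u, v)) from two unsynchronized
    cameras. Observations in the same time bin are treated as views of the
    same 3D point (the approximation refined later by the iteration)."""
    latest_b = {int(t / bin_size): uv for t, uv in obs_b}  # one point per bin
    pts_a, pts_b = [], []
    for t, uv in obs_a:
        key = int(t / bin_size)
        if key in latest_b:
            pts_a.append(uv)
            pts_b.append(latest_b[key])
    return np.float32(pts_a), np.float32(pts_b)

def pairwise_relative_pose(pts_a, pts_b, K_a, K_b):
    """Relative rotation R and unit-scale translation t of camera B w.r.t. A."""
    # Normalize pixel coordinates with each camera's intrinsics so a single
    # essential matrix (with identity camera matrix) relates the two views.
    na = cv2.undistortPoints(pts_a.reshape(-1, 1, 2), K_a, None)
    nb = cv2.undistortPoints(pts_b.reshape(-1, 1, 2), K_b, None)
    E, inliers = cv2.findEssentialMat(na, nb, np.eye(3),
                                      method=cv2.RANSAC, threshold=1e-3)
    _, R, t, _ = cv2.recoverPose(E, na, nb, np.eye(3), mask=inliers)
    return R, t  # translation direction only; its scale is fixed during unification
```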

Global unification. The pairwise camera registration that has been obtained provides only the relative rotation, and the relative translation up to an unknown scale factor. The desired global calibration will place all cameras in a single global coordinate system. Starting with an arbitrary pair of cameras, we define a global coordinate system. New cameras can be added incrementally until all available cameras have been included in the global framework. Figure 3 shows an example of a new camera being added to the global framework. In this example the translation and orientation of cameras A and B are known in the global coordinate system. In addition, we have pairwise relationships giving the translation and orientation of camera C relative to those of A and B. Note that the translation of C given by its pairwise relationships is a vector with unknown scale. However, intersection of the rays AC and BC is sufficient to recover the two unknown scale factors. Due to errors, the rays may not intersect exactly, so we use the point with minimum distance to both rays as the approximate intersection point. Once a scale factor is computed, the global location and orientation of camera C can be determined.
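
The closest-approach computation used in this step can be written down directly. The sketch below is a minimal version assuming p_a, p_b are the known global camera centers of A and B, d_a, d_b are the ray directions toward C (from the pairwise registrations, expressed in the global frame), and the returned ray parameters play the role of the two unknown scale factors.

```python
import numpy as np

def intersect_rays(p_a, d_a, p_b, d_b):
    """Closest-approach 'intersection' of rays p_a + alpha*d_a and p_b + beta*d_b."""
    d_a = d_a / np.linalg.norm(d_a)
    d_b = d_b / np.linalg.norm(d_b)
    # Solve for alpha, beta minimizing |(p_a + alpha*d_a) - (p_b + beta*d_b)|^2.
    # (The 2x2 system is singular only if the rays are parallel.)
    A = np.array([[d_a @ d_a, -d_a @ d_b],
                  [d_a @ d_b, -d_b @ d_b]])
    rhs = np.array([(p_b - p_a) @ d_a,
                    (p_b - p_a) @ d_b])
    alpha, beta = np.linalg.solve(A, rhs)
    closest_a = p_a + alpha * d_a
    closest_b = p_b + beta * d_b
    midpoint = 0.5 * (closest_a + closest_b)      # approximate position of camera C
    gap = np.linalg.norm(closest_a - closest_b)   # residual distance between the rays
    return alpha, beta, midpoint, gap
```

The residual gap can also serve as a simple sanity check on the pairwise registrations before a new camera is committed to the global frame.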