
Sampling 3d-space with Coherence-based Stereo

The fusion range of any stereo system is limited. In the human visual system, for example, the region of space around the current fixation point which can be fused, called Panum's area, extends over about 15 arcmin (the exact value depends somewhat on the stimulus used to measure it).

As the following diagram shows, this region is a rather small area of three-dimensional space.

Panum's area

The near (green line) and far (red line) borders of the fusion area around the fixation point (blue line).

Outside this region, some qualitative depth can still be perceived, but stimuli cannot be fused.

There is of course a simple solution to this limited working range of the fusional system - humans and many other animals sample the surrounding space by constantly changing the fixation point of their eyes.

Selecting a new fixation point requires two choices: deciding on a viewing direction, and selecting a corresponding viewing distance along this direction. Now, the first part is simple, but the second contains a nice circular problem: we started the business of stereo vision in order to calculate distances, but have now arrived at the conclusion that we must know the distance of the fixation point before we can even start to use our stereo system, with its limited fusion range.

The solution lies in using low-resolution copies of the images to calculate approximate depths for a vergence system. Reducing the original image sizes by a certain factor reduces the disparities in the stereo pair by the same factor. With appropriate reductions, the range of disparities can be brought into the fusion range of a small vergence network. The following figure shows an example: disparity maps calculated by a network which had only a small fusion range (4 pixels wide). The disparity range of the original stereo pair clearly exceeded this fusion range (left), but a smaller version could be fused without problems (right):

Full Resolution Medium Resolution Low Resolution

From left to right, image resolution is decreasing, finally bringing the whole image into the fusion range of the network.
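The scaling effect can be sketched in a few lines of code. The helpers below are illustrative stand-ins, not the actual network: a toy stereo pair with an 8-pixel disparity is reduced by a factor of 4, which shrinks the disparity to 2 pixels - well inside the 4-pixel fusion range of the example above.

```python
import numpy as np

def downsample(img, factor):
    """Reduce resolution by block-averaging (a simple stand-in for
    whatever low-pass/subsampling scheme a real system would use)."""
    h, w = img.shape
    h2, w2 = h // factor * factor, w // factor * factor
    img = img[:h2, :w2]
    return img.reshape(h2 // factor, factor, w2 // factor, factor).mean(axis=(1, 3))

def best_shift(a, b, max_shift):
    """Brute-force search for the horizontal shift that best aligns b to a."""
    errors = [np.mean((a - np.roll(b, -s, axis=1)) ** 2)
              for s in range(max_shift + 1)]
    return int(np.argmin(errors))

# Toy stereo pair: the right image is the left one shifted by 8 pixels.
rng = np.random.default_rng(0)
left = rng.random((64, 64))
right = np.roll(left, 8, axis=1)

small_left = downsample(left, 4)
small_right = downsample(right, 4)

print(best_shift(left, right, 16))            # 8 at full resolution
print(best_shift(small_left, small_right, 4)) # 2 after the 4x reduction
```

Since disparity scales linearly with image size, an estimate made at low resolution can simply be multiplied back up to steer the vergence movement at full resolution.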

This technique works, of course, only if the disparity estimates are precise enough for vergence; but since coherence-based stereo calculates disparities with subpixel precision, this is a feasible approach. In principle, one could now use the low-resolution vergence maps to step through a sequence of fixation points and sample 3d-space. A vergence algorithm based on this idea works as a pre-processing stage in the online-implementation of coherence-based stereo. It uses only one fixation point in the center of the stereo images.

Interplay of Vergence and Fusion

However: even if a vergence movement was successful and correctly moved the two eyes onto the chosen fixation point, it is only guaranteed that this single point was transferred into the fusion range of the network (actually: to zero disparity). Thus, data from different fixation points usually need to be combined, and combined carefully, since there is no way of knowing beforehand the size or the shape of the area around the fixation point which will also be fused correctly. Wrong estimates should be singled out before data from different fixation points are merged.

Combination would be simple if a verification measure could be obtained for the disparity estimates. In coherence-based stereo, such a verification measure is intrinsically available in the algorithm: the amount of coherence within a disparity stack can be used directly as a verification value (more about this on the pages dealing with difficult data or the definition of coherence).
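The precise definition of coherence is given on the linked pages; purely as an illustration, assuming the disparity stack holds, per pixel, the responses of a set of disparity-tuned units, a disparity map plus a crude verification map could be read out like this:

```python
import numpy as np

def estimate_with_verification(stack, disparities):
    """stack: responses of disparity-tuned units, shape (D, H, W).
    Returns a disparity map plus a crude verification map: the share of
    the total response carried by the winning unit (1.0 = all response
    concentrated at one disparity, ~1/D = spread out, no coherent peak).
    This is an illustrative proxy, not the actual coherence measure."""
    idx = np.argmax(stack, axis=0)          # winning unit per pixel
    disparity_map = disparities[idx]
    total = stack.sum(axis=0)
    peak = stack.max(axis=0)
    verification = np.where(total > 0, peak / np.maximum(total, 1e-12), 0.0)
    return disparity_map, verification
```

A pixel where the responses agree on a single disparity gets a verification value near 1; a pixel with responses smeared over many disparities gets a low value and can be rejected before the data is merged.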

Time for an example! Below, on the left, you see the two stereo images superposed after alignment through vergence movements to the tip of the dragon's nose.

Fixation Point Disparity Map Coherence Map

The shape of the area which can be fused around a fixation point depends on the 3d-structure of the scene, and is not known beforehand. But it is visible in the coherence map calculated by coherence-based stereo (right).

The disparity map calculated by the fusion system shows that, in addition, disparities at the back wall could be estimated successfully with this choice of fixation point (middle). Accordingly, these areas also have high values in the corresponding coherence map (right).

Updating Accumulators

For combining the data from the various fixation points, accumulators can be used. The combination rule is simple: new data is inserted only if its verification value exceeds the one already stored in the accumulator.

Updating Process

Current data is combined with the new estimates by selecting, pixel by pixel, from the map with the higher verification value; in this way, a complete map builds up in the accumulators (top images: disparity data, bottom images: verification data).

The figure above continues the example. Old disparity values already stored in the accumulator (left) are overridden by new estimates (middle) only where the new estimates have a higher coherence value. In the example, the updated accumulator (right) has filled-in disparity values at the dragon's nose and on top of the wall; areas with bad estimates (like the left part of the wall) did not have enough coherence to make it into the accumulator.
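The update rule itself is a one-liner per pixel. A minimal sketch (the array names are hypothetical):

```python
import numpy as np

def update_accumulator(acc_disp, acc_verif, new_disp, new_verif):
    """Per pixel, keep whichever estimate carries the higher
    verification (coherence) value."""
    take_new = new_verif > acc_verif
    return (np.where(take_new, new_disp, acc_disp),
            np.where(take_new, new_verif, acc_verif))

# Hypothetical 1x2 maps: the first pixel keeps its old, well-verified
# value; the second is overridden by the better-verified new estimate.
disp, verif = update_accumulator(np.array([1.0, 2.0]), np.array([0.9, 0.1]),
                                 np.array([5.0, 6.0]), np.array([0.5, 0.8]))
print(disp)   # [1. 6.]
print(verif)  # [0.9 0.8]
```

Note that the accumulator never discards good data: a poorly verified new fixation can only fill gaps, never overwrite estimates that already had high coherence.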

Putting it all together

It's a simple thing to add an automatic scanning algorithm to complete the whole system; one basically chooses fixation points in image areas where the accumulated verification values are still low. The algorithm can be stopped after a preset number of fixation points, or when the verification is high enough everywhere.
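Such a scanning loop could be sketched as follows, assuming a function measure_at(y, x) that stands in for one vergence movement plus one run of the fusion network at that fixation point:

```python
import numpy as np

def scan(measure_at, shape, max_fixations=10, threshold=0.8):
    """measure_at(y, x) is assumed to verge on pixel (y, x) and return
    (disparity_map, verification_map) for that fixation. Fixation
    points are placed where accumulated verification is still lowest."""
    acc_disp = np.zeros(shape)
    acc_verif = np.zeros(shape)
    for _ in range(max_fixations):
        # Fixate the pixel with the worst verification so far.
        y, x = np.unravel_index(np.argmin(acc_verif), shape)
        new_disp, new_verif = measure_at(y, x)
        # Accumulator rule: keep the better-verified estimate per pixel.
        take_new = new_verif > acc_verif
        acc_disp = np.where(take_new, new_disp, acc_disp)
        acc_verif = np.where(take_new, new_verif, acc_verif)
        if acc_verif.min() >= threshold:
            break  # everywhere verified well enough
    return acc_disp, acc_verif
```

With a toy measure_at that verifies a small neighbourhood around each fixation point, a few fixations suffice to cover the whole map - mirroring how the real system walks its fixation points across poorly verified image regions.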

Below are some sample runs of the system, obtained with uncalibrated stereo images. Note that even though the original stereo pairs are badly aligned (in fact: not at all ...), the algorithm is able to calculate the disparity map and the co-registered cyclopean view!

Sum of two stereo images Disparity
The initial stereo images for the dragon sequence, overlaid. These images were obtained with a handheld camera which was simply shifted a few centimeters between the two exposures. The final cyclopean view of the scene: the vergence-fusion system simply locks onto the 3d-structure present in the stereo data and unscrambles it. Co-registered with this cyclopean image is an appropriate disparity map.

Movie 1 2 3 4 5 6 7 8 9 10
Movie 1 2 3 4 5 6 7 8 9 -
Movie 1 2 3 4 5 6 7 8 9 -

Click on the image to see an animated .gif-file which shows the accumulating disparity data (careful: these files are large!), on the "Movie"-entry to watch a .mpg-file which additionally displays a lot of intermediate results, or on any of the single numbers to view one time step. Clicking on the disparity map on the right gets you the final result obtained by the system.

For many tasks, like grasping an object or navigating through narrow passages, the co-registered disparity map output by the vergence-fusion system provides sufficient data, even though it only provides relative depth values. Converting these relative depth values into absolute depth values is simple once the separation between the two camera centers is known. In the human visual system this separation is of course a fixed value (well, only to a first approximation - the separation changes slightly with eye-rotation!).
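For the standard rectified (parallel-camera) geometry, this conversion is the classic triangulation formula, depth = focal length x baseline / disparity:

```python
def disparity_to_depth(disparity_px, baseline_m, focal_px):
    """Triangulation for a rectified (parallel-camera) setup:
    depth = focal_length * baseline / disparity.
    Disparity and focal length are in pixels, the baseline in metres."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px

# E.g. a 6.5 cm baseline (roughly the human interocular distance),
# an 800-pixel focal length and a 10-pixel disparity give a depth
# of about 5.2 metres:
print(disparity_to_depth(10, 0.065, 800))
```

The numbers here are illustrative; for verging cameras the exact geometry is more involved, but the principle stays the same: the baseline fixes the absolute scale.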

In summary, the combined vergence- and fusion-system locks onto the 3d-structure present in the scene, recovers this structure from the two stereo views and (re-)constructs in this way the three-dimensional world.

The vergence algorithm described here is also used in the online image-processing stereo algorithm. Try it with your own images!

© 1994-2003 - all rights reserved.