
Transparency and Rivalry

The visual processes leading to human perception are complex; fusion of disparate images into a single cyclopean view is only one option. Many other percepts are possible, and a good theory of stereovision should be able to handle at least the following:
  • Transparency: There are many natural stimulus conditions which cause the perception of transparency. Most notable are specular highlights on shiny surfaces, which are perceived at a different depth than the object surface creating them. Another simple and common example is the thin leaves of a tree with the background shining through them.

    Most stereo algorithms cannot handle such stimuli, which the human visual system handles gracefully. This stereogram (arranged for crossed-eye viewing),


    creates the perception of transparency.

  • Binocular Rivalry: If the two eyes see very disparate images, like in this stereogram,
    humans (and probably some other animals) experience binocular rivalry: after a short delay during which both images are seen superimposed, perception alternates randomly between the left and right view of the scene.

    This stimulus condition is actually quite common in natural viewing conditions as only a small part of 3d-space around the current fixation point (namely Panum's area) can be fused by the human visual system. In all other image areas, where the images are too disparate to be fused, binocular rivalry can happen.

    The perceptual switching in binocular rivalry shares many features with other multistable percepts, and might be the outcome of a general neuronal process.

Coherence-based Stereo

Transparency and binocular rivalry, and even their combination, can be handled by a network which is essentially an extension of coherence-based stereo to multiple orientation channels.

Let's look at the following stereogram

[Figure: stereogram]
which creates (for crossed-eye viewing) the following percepts:
  • A - Rivalry: pure binocular rivalry
  • B - Fusion: a checkered disk, floating above the stony background.
  • C - Transparency: two disks with stripes, one floating below, one above the stone-background.
  • D - Transparency and binocular Rivalry: one disk with stripes, floating below the stone-background, plus a rivalling striped disk with no depth, fading in and out of perception.

In a coherence-network utilizing distinct orientation channels, three different interactions are possible for coherence-clusters in and between the channels:

  1. disparity estimates in different orientation channels differ by an amount smaller than the coherence-threshold
      -> data of these orientation channels are fused into a common percept.

  2. disparity estimates differ by more than the coherence-threshold
      -> transparent surfaces at the different depths are seen.

  3. no coherence is detected at all within a binocular orientation channel
      -> binocular rivalry starts between left and right monocular image streams.
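
The three interactions can be sketched as a small decision rule for one image location and two orientation channels. This is only an illustration, not the network itself: the function name, the use of `None` for "no coherence detected", and the numeric values are assumptions made for this example. The labels follow the legend used for the perception chart further down (R, F, T, F+R).

```python
from typing import Optional


def classify_percept(d1: Optional[float], d2: Optional[float],
                     threshold: float) -> str:
    """Classify the percept at one image location.

    d1, d2    -- disparity estimate of each orientation channel, or None
                 if no coherence was detected in that channel (assumed
                 encoding, not taken from the original network)
    threshold -- coherence threshold in the same units as the disparities
    """
    if d1 is None and d2 is None:
        return "R"      # case 3 in both channels: pure rivalry
    if d1 is None or d2 is None:
        return "F+R"    # one channel recovers a surface, the other rivals
    if abs(d1 - d2) < threshold:
        return "F"      # case 1: estimates agree, channels are fused
    # Case 2: estimates disagree. A difference exactly equal to the
    # threshold is treated as case 2 here, which is an arbitrary choice.
    return "T"


# Made-up disparities for the four areas of the test stimulus.
areas = {
    "A": (None, None),
    "B": (2.0, 2.1),
    "C": (2.0, -2.0),
    "D": (None, -2.0),
}
chart = {name: classify_percept(d1, d2, threshold=1.0)
         for name, (d1, d2) in areas.items()}
```

With these invented values, `chart` assigns one label to each of the areas A to D.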

Using this paradigm, simulations with the above stimulus lead to the following results (using just two orientation channels):

  • Depth estimates in orientation channel 1:
    Depth is recovered in most areas; two "holes" lack coherence in this channel. It is in these areas that binocular rivalry might develop.

  • Depth estimates in orientation channel 2:
    • In the top-left corner, again, no coherence is detected. This is case 3 from above. Therefore, rivalry has to be expected at this location.
    • In the top-right corner, the same depth-estimate as in orientation channel 1 is obtained. As assumed in case 1 above, data from these two channels are fused (see next display).
    • At the bottom-left, orientation channel 2 has a different depth-estimate from its counterpart. This is case 2 from above, which results in transparency perception.
    • At the bottom-right, the surface floating below the stone-background is recovered. Yet no coherence was detected in the other orientation channel, so rivalry will be present. We have a mixture of cases 2 and 3.

  • Combined depth-estimates over all orientation channels:
    This combined channel shows all image areas where case 1 holds: the depth-estimates in different orientation channels are closer to each other than the coherence-threshold and are therefore combined into a united percept.

    Clearly, this display shows only those image areas where a single solid surface can be perceived. Neither the areas where rivalry occurs (top-left/bottom-right) nor the areas where transparency was perceived (bottom-left) are present here.
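
A minimal sketch of how such a combined map could be computed from per-channel disparity maps is given here. It is not the simulation code used for the displays: the grid layout, the `None` marker for missing coherence, and the choice to average agreeing estimates are all assumptions made for illustration.

```python
def combine_channels(channels, threshold):
    """Combine per-channel disparity maps into one map.

    channels  -- list of equally sized 2-D grids (lists of lists); each
                 cell holds a disparity estimate or None if the channel
                 found no coherence there (assumed encoding)
    threshold -- coherence threshold

    A cell of the result is filled only where every channel has an
    estimate and all estimates lie within the threshold (case 1).
    Averaging the agreeing estimates is an assumption of this sketch.
    """
    rows, cols = len(channels[0]), len(channels[0][0])
    combined = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            values = [channel[r][c] for channel in channels]
            if any(v is None for v in values):
                continue  # rivalry is possible here, no single surface
            if max(values) - min(values) < threshold:
                combined[r][c] = sum(values) / len(values)
            # otherwise: transparency, again no single surface
    return combined


# Toy 2x2 maps, one cell per quadrant of the stimulus (made-up values).
channel_1 = [[None, 2.0],
             [2.0, None]]
channel_2 = [[None, 2.0],
             [-2.0, -2.0]]
combined = combine_channels([channel_1, channel_2], threshold=1.0)
```

In this toy example only the top-right cell survives in `combined`, mirroring the description above.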

Final Result

Combining all evidence, the following perception chart emerges for the above test stimulus (with R=rivalry, F=Fusion, T=transparency, and F+R=simultaneous fusion and rivalry):
[Figure: chart of assignment]
This corresponds nicely with human perception. Results are also available for a larger network utilizing a total of four orientation channels.

Some further thoughts ...

Most models proposed for binocular rivalry assume recurrent inhibition to block the monocular input stream which is currently not perceived (Sugie '82, Matsuoka '84, Lehky '88) or utilize some auxiliary variables to model a fatigue process (Grossberg '87, Blake '89, Dayan '98) governing this suppression.

In essence, all these models assume that during the dominant phase of, say, the left input stream, the right input stream receives an inhibitory signal blocking this stream. Over time, this inhibition (or the variable governing the strength of this inhibition) decays until the roles are switched: the right input stream becomes dominant, and the left stream is suppressed.

However, there are some experimental facts which are hard to explain with such models:

  • Binocular rivalry is a random process: the time an input channel stays dominant follows approximately a gamma distribution. Simply introducing a random process into the bistable switching system proposed to govern perception won't help much, because the time constants needed in this system would average out most random fluctuations.

  • There is fast switching between the left and right percepts; never (except for about 200ms during the onset of binocular rivalry) is a mixture state perceived in a specific view-direction. Fast switching would require the inhibitory signal to be either "on" or "off", but not gradually weakening over time.

  • Detection thresholds stay more or less constant during the suppression phase (Fox & Check, 1972). Again, this is hard to explain with a time-decaying inhibition mechanism.

  • The suppression of an input channel is nonspecific - it is as if simply all information from a suppressed channel is turned off (Blake '89, but see O'Shea & Crassini '81 for a dissenting view).

  • Increasing the contrast of a monocular stimulus decreases the average time this stimulus is suppressed (and vice versa); this has no strong effect on the average duration of the other monocular percept (Levelt '65).

In order to explain these points, the model proposed here involves competition over access to post-processing stages rather than inhibition between input streams.

It is assumed that this competition depends on the amount of coherence in the different channels. Coherence in monocular channels might be defined similarly to that in the binocular channels of the stereo subsystem, except that coherence in monocular channels is calculated in texture space, not in disparity space (cf. the feature maps of the segmentation network and the similarity of disparity- and texture-estimation).

If the different coherence pools are marked with different temporal codes (which would be the case if coherence detection is carried out by a network of spiking neurons), it will be possible for the post-processing neuronal circuitry to lock onto the signals from these pools. Of course, locking will be possible to only one temporal code signal at a time. If this assumption is correct, it would explain the suppression of all the other inputs (i.e. the binary behaviour of the input selection): at any time, only a single coherence pool can get access to the post-processing layer.

If one further assumes that the probability of switching to a certain channel is proportional to the coherence strength in this channel, all experimental facts noted above are explained by the model proposed here.
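
One way to read this switching assumption is sketched below as a toy simulation. It is not the model's implementation: the discrete time steps, the `gain` constant, and the function name are assumptions introduced here. In each time step, the post-processing stage switches to a currently suppressed channel with a probability proportional to that channel's coherence strength.

```python
import random


def simulate_rivalry(coherence, steps=200_000, gain=0.01, seed=0):
    """Return the mean dominance duration (in time steps) per channel.

    coherence -- dict mapping channel name to coherence strength
    gain      -- hypothetical constant turning coherence strength into a
                 per-step switching probability (gain * strength <= 1)
    """
    rng = random.Random(seed)
    names = list(coherence)
    current = names[0]          # channel currently locked onto
    run = 0                     # length of the current dominance phase
    durations = {name: [] for name in names}
    for _ in range(steps):
        run += 1
        for other in names:
            # Switching is all-or-none: either the lock jumps to the
            # other channel in this step or it stays where it is.
            if other != current and rng.random() < gain * coherence[other]:
                durations[current].append(run)
                current, run = other, 0
                break
    return {name: sum(d) / len(d) for name, d in durations.items() if d}


baseline = simulate_rivalry({"left": 1.0, "right": 1.0})
boosted = simulate_rivalry({"left": 2.0, "right": 1.0})
```

Under these assumptions, doubling the coherence of the left channel roughly halves the mean time it is suppressed (the dominance duration of the right channel), while its own mean dominance duration stays about the same, and the selection is binary at every instant. The sketch produces geometrically distributed durations, so it does not by itself address the shape of the observed duration distribution.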

It seems that binocular rivalry can also change the visual direction in which a target appears. For the fusional channel, the cyclopean view geometry applies, so that an object appears as seen by a central cyclopean eye. The monocular channels, of course, have visual directions corresponding to the perspective of the respective eye. Perceptual switching during rivalry should therefore also shift the visual direction in which a target appears. This is what the following stereogram (arranged for cross-fusion) attempts to show:

[Figure: rivalry stereogram]

From top to bottom, you might see:

  • fusion and binocular rivalry; the perceived horizontal position of the three lines should change, depending on what channel is currently active.
  • normal fusion; the three lines should stay right in the middle of the display, appearing somewhat in depth behind the outline box.
  • pure binocular rivalry; just for comparison.

These effects would be present in a fusional network utilizing coherence detection.
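
The expected shift in horizontal position can be illustrated with a few lines of code. This is a simplification for illustration only: it assumes that the cyclopean position is the mean of the two monocular image positions, and the function and channel names are invented for this example.

```python
def perceived_position(x_left, x_right, active_channel):
    """Horizontal position at which a target appears.

    x_left, x_right -- horizontal image position of the target in the
                       left and right eye
    active_channel  -- "left", "right" or "fused" (hypothetical labels)
    """
    if active_channel == "left":
        return x_left
    if active_channel == "right":
        return x_right
    if active_channel == "fused":
        # Assumption: the cyclopean eye sees the target midway between
        # the two monocular positions.
        return 0.5 * (x_left + x_right)
    raise ValueError(f"unknown channel: {active_channel!r}")


# Made-up positions: the target jumps as the active channel changes.
positions = [perceived_position(-1.0, 3.0, ch)
             for ch in ("left", "fused", "right")]
```

With these invented positions, the target appears at three different horizontal locations, depending on which channel currently has access to the post-processing stage.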

For the references noted here and many more references on binocular rivalry, check the Binocular Rivalry Bibliography Page of Robert P. O'Shea.

© 1994-2003 - all rights reserved.