Transparency and Rivalry
The visual processes leading to human perception are complex; fusion of
disparate images into a single cyclopean view is
only one option. Many other percepts are possible, and a good theory of
stereovision should be able to handle at least the following:
- Transparency:
There are many natural stimulus conditions which cause the perception of
transparency. Most notable are specular highlights on shiny surfaces, which
are perceived at a different depth than the object surface creating them.
Another simple and common example is the thin leaves of a tree with the
background shining through them.
Most stereo algorithms cannot handle such stimulus situations, which
the human visual system handles gracefully. This stereogram (arranged for
crossed-eye viewing)
creates the perception of transparency.
- Binocular Rivalry:
If the two eyes see very disparate images, as in this stereogram,
humans (and probably some other animals) experience binocular rivalry: after
a short time delay during which one sees both images superimposed,
perception alternates in a random fashion between the left and right
view of the scene.
This stimulus condition is actually quite common in natural viewing,
as only a small part of 3D space around the current fixation
point (namely Panum's area) can be fused by the human visual system. In all
other image areas, where the images are too disparate to be fused, binocular
rivalry can occur.
The perceptual switching in binocular rivalry shares many features with
other multistable percepts, and might be the outcome of a general neuronal
process.
Coherence-based Stereo
Transparency and binocular Rivalry, and even their combination, can be
handled by a network which is basically an extension of coherence-based
stereo to multiple orientation channels.
Let's look at the following stereogram
which creates (for crossed-eye viewing) the following percepts:
- A - Rivalry: pure binocular rivalry
- B - Fusion: a checkered disk, floating above the
stony background.
- C - Transparency: two disks with stripes, one floating
below, one above the stone-background.
- D - Transparency and binocular Rivalry: one disk with stripes, floating below
the stone-background, plus a rivalling striped disk with no depth, fading in and out
of perception.
In a coherence-network utilizing distinct orientation channels, three
different interactions are possible for coherence-clusters in and
between the channels:
- disparity estimates in different orientation channels differ by an
amount smaller than the coherence-threshold:
data of these orientation channels are fused into a common
percept.
- disparity estimates differ by more than the coherence-threshold:
transparent surfaces at the different depths are seen.
- no coherence is detected at all within a binocular orientation channel:
binocular rivalry starts between left and right monocular image streams.
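The three cases above can be sketched as a small decision rule. This is only an illustration, not the network itself: the function name classify_percept, the use of None to mark a channel without detected coherence, and the treatment of a single coherent channel as fusion are assumptions made here. The labels follow the F, T and R abbreviations used for the perception chart.

```python
from typing import Optional, Sequence


def classify_percept(disparities: Sequence[Optional[float]], threshold: float) -> str:
    """Label one image location from per-channel disparity estimates.

    disparities: one entry per orientation channel; None means that no
                 coherence was detected in that binocular channel
                 (an assumed encoding, chosen for this sketch).
    threshold:   the coherence-threshold, in the same units as the disparities.
    """
    coherent = [d for d in disparities if d is not None]
    labels = []
    if coherent:
        # Channels whose estimates agree to within the threshold are fused;
        # otherwise surfaces at different depths are seen (transparency).
        spread = max(coherent) - min(coherent)
        labels.append("F" if spread < threshold else "T")
    if len(coherent) < len(disparities):
        # At least one channel found no coherence: rivalry starts there.
        labels.append("R")
    return "+".join(labels)


if __name__ == "__main__":
    print(classify_percept([0.5, 0.6], threshold=0.3))    # both channels agree
    print(classify_percept([-2.0, 2.0], threshold=0.3))   # two separate depths
    print(classify_percept([None, None], threshold=0.3))  # no coherence at all
    print(classify_percept([-2.0, None], threshold=0.3))  # one channel coherent
```

With more than two orientation channels, comparing only the largest and smallest estimate is a crude simplification; a fuller treatment would group the estimates into clusters first.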
Using this paradigm, simulations with the above stimulus lead to
the following results (using just two orientation channels):
Final Result
Combining all evidence, the following perception chart emerges for the above test stimulus (with R=rivalry, F=Fusion,
T=transparency, and F+R=simultaneous fusion and
rivalry):
This corresponds nicely with human perception; if you want to see some results
with a larger network utilizing a total of four orientation channels, please click
here.
Some further thoughts ...
Most models proposed for binocular rivalry assume recurrent inhibition to
block the monocular input stream which is currently not perceived (Sugie '82, Matsuoka '84, Lehky '88) or utilize some auxiliary
variables to model a fatigue process (Grossberg '87, Blake
'89, Dayan '98) governing this suppression.
In essence, all these models are assuming that during the dominant phase
of, say, the left input stream, the right input stream receives an inhibitory
signal blocking this stream. During the course of time, this inhibition (or
the variable governing the strength of this inhibition) decays until the
roles are switched: the right input stream becomes dominant, and the left
stream is suppressed.
However, there are some experimental facts which are hard to explain
with such models:
- Binocular rivalry is a random process: the time an input channel stays
dominant follows approximately a gamma-distribution. Simply introducing a
random process in the bistable switching system proposed to govern
perception won't help much because of the time constants needed in this
system. They would average out most random fluctuations.
- There is fast switching between the left and right percepts; never
(except for about 200ms during the onset of binocular rivalry) is a mixture
state perceived in a specific view-direction. Fast switching would require
the inhibitory signal to be either "on" or "off", but not gradually
weakening over time.
- Detection thresholds stay more or less constant during the suppression
phase (Fox & Check, 1972). Again, this is hard to
explain with a time-decaying inhibition mechanism.
- The suppression of an input channel is nonspecific - it is as if simply
all information from a suppressed channel is turned off
(Blake '89, but see O'Shea & Crassini '81 for a
dissenting view).
- Increasing the contrast of a monocular stimulus decreases the average time
this stimulus is suppressed (and vice versa); this has no strong effect on the
average duration of the other monocular percept (Levelt
'65).
In order to explain these points, the model proposed here involves
competition over access to post-processing stages rather
than inhibition between input streams.
It is assumed that this competition depends on the amount of coherence in
the different channels. Coherence in monocular channels might be defined
similarly to the binocular channels of the stereo subsystem, only that
coherence in monocular channels is calculated in texture space, not in
disparity space (cf. the feature maps of the
segmentation network and the similarity of
disparity- and texture-estimation).
If the different coherence pools are marked with different temporal
codes (which would be the case if coherence
detection is carried out by a network of spiking neurons), it will be
possible for the post-processing neuronal circuitry to lock onto the signals
from these pools. But of course, locking will be possible only to one
temporal code signal at a time. If this assumption is correct, it would
explain the suppression of all the other inputs (i.e. the binary behaviour
of the input selection): at any time, only a single coherence pool can get
access to the post-processing layer.
If one further assumes that the probability of switching to a certain
channel is proportional to the coherence
strength in this channel, all experimental facts noted above are
explained by the model proposed here.
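The switching assumption can be tried out in a toy simulation. The sketch below is not the proposed network: the discrete time steps, the rate constant, the channel names and the function simulate_rivalry are all assumptions introduced for illustration. At every step, each channel that is currently not selected may take over with a probability proportional to its coherence strength.

```python
import random
from typing import Dict, List


def simulate_rivalry(coherence: Dict[str, float], steps: int,
                     rate: float = 0.05, seed: int = 0) -> Dict[str, List[int]]:
    """Return the dominance durations (in time steps) of each channel.

    coherence: assumed coherence strength per channel; rate * strength must
               not exceed 1, since it is used as a per-step probability.
    rate:      hypothetical proportionality constant.
    """
    rng = random.Random(seed)
    names = list(coherence)
    current = names[0]          # channel that currently has access
    run = 0                     # length of the current dominance phase
    durations: Dict[str, List[int]] = {name: [] for name in names}
    for _ in range(steps):
        run += 1
        for name in names:
            # Access is all-or-none: the switch is instantaneous, and its
            # probability depends only on the coherence of the competitor.
            if name != current and rng.random() < rate * coherence[name]:
                durations[current].append(run)
                current, run = name, 0
                break
    return durations


def mean(values: List[int]) -> float:
    return sum(values) / len(values)


if __name__ == "__main__":
    base = simulate_rivalry({"left": 1.0, "right": 1.0}, steps=200_000)
    strong = simulate_rivalry({"left": 2.0, "right": 1.0}, steps=200_000)
    for label, result in (("equal", base), ("left stronger", strong)):
        print(label, {name: round(mean(d), 1) for name, d in result.items()})
```

In this two-channel toy, raising the strength of the left channel shortens the phases in which it is suppressed, while the length of its own dominance phases is set by the strength of the right channel only. The sketch produces geometrically distributed durations and makes no attempt to reproduce the gamma-shaped distribution; with more than two channels, the fixed polling order would also introduce a bias.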
It seems that binocular rivalry can also change the visual
direction in which a target appears. For the fusional channel, the cyclopean view geometry applies,
leading to an object appearance as seen from a central cyclopean eye.
The monocular channels have of course visual directions
corresponding to the perspective of the appropriate eye. So perceptual
switching during rivalry should also move the visual directions at which a target
appears. This is what the following stereogram (arranged for cross-fusion)
attempts to show:
From top to bottom, you might see:
- fusion and binocular rivalry; the perceived horizontal position of the
three lines should change, depending on what channel is currently active.
- normal fusion; the three lines should stay right in the middle of the display,
appearing somewhat in depth behind the outline box.
- pure binocular rivalry; just for comparison.
These effects would be present in a fusional network utilizing coherence
detection.
For the references noted here and many more references on
binocular rivalry, check the Binocular
Rivalry Bibliography Page of Robert P. O'Shea.
© 1994-2003 - all rights reserved.