

In the first few layers of the stereo network, standard rate-coded neurons are used for computations. Starting with the disparity stacks, network operations are modeled in detail with leaky integrate-and-fire neurons (technical details can be found in the Appendix). Real image data are supplied to the network, and pre-processing in the retina is simulated by a logarithmic nonlinearity followed by convolution with a Mexican-hat filter.
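The retinal pre-processing stage described above can be sketched as follows. This is a minimal illustration, not the implementation used for the simulations: the kernel size, the width `sigma`, the use of `log(1 + I)` to avoid the logarithm of zero, and the function names are all assumptions made for this example.

```python
import numpy as np

def mexican_hat_kernel(size=9, sigma=1.5):
    """Zero-mean Mexican-hat kernel (size and sigma are assumed values)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = (x**2 + y**2) / (2.0 * sigma**2)
    kernel = (1.0 - r2) * np.exp(-r2)
    # Remove the residual DC component so that flat regions map to zero.
    return kernel - kernel.mean()

def retina_preprocess(image, size=9, sigma=1.5):
    """Logarithmic nonlinearity followed by Mexican-hat convolution."""
    log_img = np.log1p(np.asarray(image, dtype=float))  # log(1 + I)
    kernel = mexican_hat_kernel(size, sigma)
    half = size // 2
    padded = np.pad(log_img, half, mode="edge")
    out = np.zeros_like(log_img)
    rows, cols = log_img.shape
    # Direct 2-D convolution; the kernel is symmetric, so no flip is needed.
    for dy in range(size):
        for dx in range(size):
            out += kernel[dy, dx] * padded[dy:dy + rows, dx:dx + cols]
    return out
```

Because the kernel has zero mean, image regions of constant intensity produce no output, so only local contrast is passed on to the later stages.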

Figure 9: The performance of the coherence network with a real stereo image as input (the right image of the stereo pair is shown in A). Brighter image areas in the disparity map (B) are estimated as nearer to the viewer. The modulation depth of the output current of the coherence layer, here called the coherence map (D), indicates how confident the network is in its estimate. Network operation depends on the strength of the interlayer synaptic links; in the range \( w_{\mathcal{CC}}\approx 0.5-2.0 \), a usable disparity map is calculated (C).

Fig. 9A shows one of the stereo images used for testing, and Fig. 9B shows the disparity map estimated by the network. Distance is coded in the disparity map as brightness: brighter image areas are estimated as nearer to the viewer, darker areas as farther away. For nearly all view directions, disparity is calculated correctly from the raw stereo pair.

During network operations, all disparity stacks operate independently of each other and in parallel. The lower traces (B, C) in Fig. 2 show examples of the dynamics of different disparity stacks during such a simulation.

The coupling within each disparity stack leads quickly to the development and synchronization of the coherence cluster, which in turn shows up as a prominent modulation of the total output current (see Fig. 3). This modulation of the output current coming from the coherence detection layer can be picked up by read-out neurons with appropriately adjusted weights. It is the firing rate of these read-out neurons which is displayed in Fig. 9B as the disparity estimate.

Of course, more elaborate read-out schemes are conceivable, such as phase-locked loops that lock exclusively onto the oscillatory component of the output current coming from the coherence layer. Also, the inclusion of additional synchronizing links between the independently operating disparity stacks would further enhance network performance.

Switching off the synaptic links responsible for the coherence detection process, i.e., the links internal to the disparity stacks, results in an unusable disparity map. This can be seen in Fig. 9C, which gives an overview of network operation as the coupling constant \( w_{\mathcal{CC}} \) in the disparity stacks changes. This coupling constant regulates the coherence detection process; in Fig. 9C, \( w_{\mathcal{CC}} \) is varied along the horizontal image dimension (note the logarithmic scale).

At low values of \( w_{\mathcal{CC}} \), no synchronization is possible between the neurons, and the read-out layer computes an average of the signals coming from the disparity stacks. Due to the large amount of noise within the stacks, no reliable estimates are obtained. In the range \( w_{\mathcal{CC}}\approx 0.5-2.0 \), synchronization becomes possible, and the coherence detection delivers correct disparity estimates, with fidelity decreasing somewhat as \( w_{\mathcal{CC}} \) increases. For values larger than \( w_{\mathcal{CC}}\approx 2.0 \), all the units in a single disparity stack synchronize, again resulting in unusable estimates.

The coherence detection process also gives a hint about the validity of the estimate. The modulation depth of the total output current of the coherence layer reflects the number of neural units participating in an estimate, so this number can be used as a validation measure for the estimate.
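The relation between modulation depth and the number of participating units can be illustrated with a toy model. The sketch below does not use integrate-and-fire neurons: each unit is assumed to contribute a current of the form 1 + cos(phase), the synchronized units share one phase, and the remaining units are assumed to be evenly spread in phase so that their oscillations cancel. The function names and the definition of modulation depth as peak-to-peak amplitude divided by the mean are assumptions made for this example.

```python
import numpy as np

def modulation_depth(current):
    """Peak-to-peak amplitude of a current trace relative to its mean."""
    current = np.asarray(current, dtype=float)
    mean = current.mean()
    if mean == 0.0:
        return 0.0
    return (current.max() - current.min()) / mean

def toy_output_current(n_sync, n_total, n_samples=200):
    """Toy total current over one cycle (hypothetical model, not the LIF network).

    n_sync units oscillate in phase; the other units are evenly spread in phase.
    """
    theta = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)
    total = np.zeros(n_samples)
    for _ in range(n_sync):
        total += 1.0 + np.cos(theta)
    n_rest = n_total - n_sync
    for k in range(n_rest):
        total += 1.0 + np.cos(theta + 2.0 * np.pi * k / n_rest)
    return total

# A larger coherence cluster gives a deeper modulation of the summed current.
depth_small = modulation_depth(toy_output_current(n_sync=2, n_total=10))
depth_large = modulation_depth(toy_output_current(n_sync=6, n_total=10))
```

In this toy model the modulation depth grows in proportion to the fraction of synchronized units, which is the property that makes it usable as a confidence measure.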

In the example of Fig. 9, image areas where the disparity estimate might be in doubt can be found at the borders around the dragon head. No distance can be calculated here, due to occlusion effects: image areas visible in the right image are not visible in the left image and vice versa. Looking at the map that records the modulation depth of the output current (Fig. 9D), one sees that the critical areas are indeed marked by a low coherence measure.

Figure 10: Another example of disparity estimation by the coherence network. The stereo pair (top images) is arranged for crossed-eye viewing. The estimates of the disparity map can be validated by the corresponding coherence map.

The reliability of the coherence map for validating the disparity estimates can also be inferred from a second example (Fig. 10), where all prominent object borders in the scene show up as dark lines of low confidence in the network coherence map. The structureless sky area and some additional image areas where no stable disparity estimate could be obtained are also marked by a low coherence count.

Coherence detection in combination with several parallel processing streams has the interesting property of opportunistic selection of data sources. The neural coherence detection process recruits every piece of information which is usable, but discards the rest. This can lead to the effect of ``filling-in'' in areas of low or no texture of a stereogram.
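The selection principle can be sketched with a static stand-in for the dynamic synchronization process: among the estimates delivered by several units, keep the largest group that agrees within a tolerance and discard everything else. The function name, the `tolerance` parameter, and the use of NaN to mark units without usable data are assumptions made for this example; the network itself forms the cluster through synchronization rather than by sorting.

```python
import numpy as np

def coherent_estimate(estimates, tolerance=0.5):
    """Return (disparity, count) for the largest group of agreeing estimates.

    NaN entries mark units with no usable data and are discarded.
    """
    values = np.asarray(estimates, dtype=float)
    values = np.sort(values[~np.isnan(values)])
    if values.size == 0:
        return float("nan"), 0
    best_start, best_count = 0, 0
    for i in range(values.size):
        # Index just past the last sorted value within `tolerance` of values[i].
        j = np.searchsorted(values, values[i] + tolerance, side="right")
        if j - i > best_count:
            best_start, best_count = i, j - i
    cluster = values[best_start:best_start + best_count]
    return float(cluster.mean()), int(best_count)

# Four units agree near 2.0; the outliers and the missing value are ignored.
disparity, count = coherent_estimate([2.0, 2.1, 1.9, 7.5, np.nan, -3.0, 2.2])
```

The returned count plays the role of the coherence measure: a small count signals that few units supported the estimate.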

Fig. 11 displays results obtained with a sparse random-dot stereogram, where only 3% of the image pixels are set to black. The rest of each image is a constant background, so in most image areas there are no data available to estimate depth.
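A sparse random-dot stereogram of this kind can be generated along the following lines. This is a sketch under stated assumptions, not the procedure used to produce Fig. 11: the image size, the single shifted central square, the disparity of two pixels, and the function name `sparse_rds` are all hypothetical choices; only the 3% dot density is taken from the text.

```python
import numpy as np

def sparse_rds(shape=(64, 64), density=0.03, disparity=2, seed=0):
    """Return (left, right, true_disparity) for a sparse random-dot stereogram.

    A central square is shifted horizontally by `disparity` pixels in the
    right image (assumes 0 <= disparity <= width // 4).
    """
    rng = np.random.default_rng(seed)
    h, w = shape
    left = (rng.random(shape) < density).astype(np.uint8)  # 1 marks a black dot
    right = left.copy()
    true_disp = np.zeros(shape, dtype=int)
    r0, r1, c0, c1 = h // 4, 3 * h // 4, w // 4, 3 * w // 4
    true_disp[r0:r1, c0:c1] = disparity
    # Clear the patch region, then paste the patch shifted to the left.
    right[r0:r1, c0:c1] = 0
    right[r0:r1, c0 - disparity:c1 - disparity] = left[r0:r1, c0:c1]
    # Fill the strip uncovered by the shift with fresh random dots.
    right[r0:r1, c1 - disparity:c1] = rng.random((r1 - r0, disparity)) < density
    return left, right, true_disp

left, right, true_disp = sparse_rds()
```

At this density most pixels carry no dot at all, so fine-scale matching has nothing to work with over large parts of the image.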

Figure 11: The coherence-based stereo network ``fills-in'' missing data; a sparse stereogram, where only 3% of the pixels are set, still leads to the perception of planar surfaces (from left: stereopair, estimated disparity, true disparity).

Despite this sparse information, humans perceive these stereograms as an arrangement of several flat surfaces, and this is also what happens in the stereo network. The effect arises essentially because the network operates at three different spatial scales. For image areas where no data are available in the fine-resolution channels, the coarser scales can still provide a guess about the disparity present in these areas. The coherence-detection process locks onto this partial information and discards the invalid or absent estimates of the finer-resolution channels. The same kind of opportunistic signal selection might also be responsible for the perception of transparency, where several depth planes are seen in a single view direction.
