Next: Coherence-Detection Up: A Worked-Out Example: Depth-Perception Previous: Stereovision, Optical Flow and

Input Layers of the Network

Textures seem to be analyzed by humans mainly along the dimensions ``direction'' and ``granularity'' [43]. Granularity refers here to textures with no prominent direction, but similar spatial texture variation. In the case of stereo vision, we are only interested in texture directions, indicating local image shifts, i.e., distances of objects.

Figure 6: Cutting space-time slices out of a movie strip (top line) creates characteristic flow patterns (timeslice), which show up as spikes in Fourier space. Local measurements of the Fourier energy can be used to detect the presence of these spikes. A simple way to measure local energies is to filter the timeslice with matched filters (right displays). Using Gabor filter patches in quadrature (small insets) is equivalent to masking Fourier energy with blob-like Gaussian envelopes in Fourier space. Note that the filtered signals also depend on the local image contrast.
\resizebox* {0.8\columnwidth}{!}{\includegraphics{figs/gaborfilter/gaborfilter.eps}}

Texture directions can be analyzed quite simply with neural hardware. When a texture with one or several prominent texture directions is transformed into Fourier space, the texture directions show up as distinct spikes in the energy spectrum of the signal (Fig. 6).

Spikes in the spectrum can easily be detected by local measurements in Fourier space. For example, one might sample the local energy available in blobs placed around a circle. The blob with the largest energy content then indicates the main texture direction (Fig. 6, bottom left).
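The blob-sampling idea can be sketched in a few lines. The following is only an illustration under stated assumptions, not the procedure used in the paper: the function names, the number of blobs, the blob width `sigma`, and the naive DFT are all hypothetical choices made to keep the example self-contained.

```python
import cmath
import math

def grating(n, angle, cycles):
    """n x n sinusoidal test texture whose wave vector points along `angle`,
    with `cycles` periods across the image (hypothetical test stimulus)."""
    kx = cycles * math.cos(angle) / n
    ky = cycles * math.sin(angle) / n
    return [[math.cos(2 * math.pi * (kx * x + ky * y)) for x in range(n)]
            for y in range(n)]

def dft(v):
    """Naive 1-D discrete Fourier transform (adequate for tiny inputs)."""
    n = len(v)
    return [sum(v[k] * cmath.exp(-2j * math.pi * u * k / n) for k in range(n))
            for u in range(n)]

def power_spectrum(img):
    """2-D power spectrum; power[v][u] is the energy at frequency bin (u, v)."""
    n = len(img)
    rows = [dft(row) for row in img]                                  # along x
    cols = [dft([rows[y][u] for y in range(n)]) for u in range(n)]    # along y
    return [[abs(cols[u][v]) ** 2 for u in range(n)] for v in range(n)]

def blob_energies(power, radius, n_blobs=8, sigma=1.0):
    """Gaussian-weighted energy in blobs centred on a half circle of `radius`
    frequency bins. Half a circle suffices because the spectrum of a real
    image is point-symmetric."""
    n = len(power)
    energies = []
    for k in range(n_blobs):
        theta = math.pi * k / n_blobs
        cu, cv = radius * math.cos(theta), radius * math.sin(theta)
        total = 0.0
        for v in range(n):
            sv = v - n if v > n // 2 else v        # signed frequency index
            for u in range(n):
                su = u - n if u > n // 2 else u
                dist2 = (su - cu) ** 2 + (sv - cv) ** 2
                total += math.exp(-dist2 / (2 * sigma ** 2)) * power[v][u]
        energies.append(total)
    return energies

def dominant_direction(img, radius, n_blobs=8):
    """Angle (radians) of the blob that collects the most spectral energy."""
    energies = blob_energies(power_spectrum(img), radius, n_blobs)
    best = max(range(n_blobs), key=lambda k: energies[k])
    return math.pi * best / n_blobs
```

For a grating generated with `grating(16, math.pi / 2, 4)`, calling `dominant_direction(texture, 4)` selects the blob at angle \( \pi /2 \). The angular resolution is limited by the number of blobs, and the blob width trades off selectivity against robustness to spectral leakage.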

Local texture energy can also be measured directly in the original signal space. Since the Fourier transform of a Gaussian centered away from the origin is a Gabor function (a Gaussian-windowed sinusoid), and vice versa, masking signal energy in Fourier space with a Gaussian blob is equivalent to convolving the original signal with two Gabor filters related to each other by a phase shift of \( \pi /2 \). Filter functions of this type are called quadrature filters (or Hilbert transform pairs). Squaring and adding the resulting filter amplitudes gives a local measure of signal energy (filtered signals in Fig. 6).
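As a rough one-dimensional illustration of the quadrature-pair energy measure, consider the sketch below. The kernel parameters (`sigma`, `freq`, `half_width`) and all function names are assumptions chosen for the example; they are not taken from the paper.

```python
import math

def gabor_pair(sigma, freq, half_width):
    """Even (cosine) and odd (sine) Gabor kernels sharing one Gaussian
    envelope, i.e. two filters that differ by a phase shift of pi/2."""
    xs = range(-half_width, half_width + 1)
    even = [math.exp(-x * x / (2 * sigma ** 2)) * math.cos(2 * math.pi * freq * x)
            for x in xs]
    odd = [math.exp(-x * x / (2 * sigma ** 2)) * math.sin(2 * math.pi * freq * x)
           for x in xs]
    return even, odd

def respond(signal, kernel, pos):
    """Kernel response centred at `pos`. This is a correlation; the sign
    difference to a convolution vanishes once the outputs are squared."""
    half = len(kernel) // 2
    return sum(w * signal[pos + i - half] for i, w in enumerate(kernel))

def local_energy(signal, pos, sigma=4.0, freq=0.125, half_width=16):
    """Square and add the two quadrature filter outputs."""
    even, odd = gabor_pair(sigma, freq, half_width)
    return respond(signal, even, pos) ** 2 + respond(signal, odd, pos) ** 2

def sinusoid(length, freq, phase, contrast=1.0):
    """Hypothetical test signal at the filters' preferred frequency."""
    return [contrast * math.cos(2 * math.pi * freq * x + phase)
            for x in range(length)]
```

For a sinusoid at the preferred frequency, `local_energy` is nearly independent of the signal phase, which is the point of using a quadrature pair. Doubling the `contrast` quadruples the energy, so the measure still depends on local image contrast.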

The important point is that these filter kernels and nonlinear point operations easily map to receptive field profiles and transfer functions of simple and complex cells in the visual cortex. In disparity space only two slices out of the full space-time texture of the moving camera are available (compare Fig. 5), so the two-dimensional filter kernels used in Fig. 6 are reduced to two simple one-dimensional filter profiles, convolving data either from the left or from the right eye. To compute the local energy, the signals coming from the quadrature-paired filters have to be squared and summed. This results in a circuit identical in structure to the one sketched in Fig. 7: units \( S\) with Gabor-like receptive fields sample data from the left and right retinae, and the squared outputs of these units are summed by an energy unit \( C\), which finally yields the local texture energy (only this time in disparity space). Interestingly, a circuit structurally equivalent to the one in Fig. 7 was proposed in [42] to account for experimental data measured from complex cell recordings in the visual cortex, with units \( S\) representing simple cells, and unit \( C\) representing a complex cell.
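A minimal sketch of such a binocular energy unit is given below. It assumes, purely for illustration, that a unit's preferred disparity is set by shifting its right-eye receptive field by a whole number of pixels; the function names, filter parameters, and the grating stimulus are likewise hypothetical and not taken from [42] or from the simulation described here.

```python
import math

def gabor_pair(sigma, freq, half_width):
    """Even and odd 1-D Gabor profiles (a quadrature pair)."""
    xs = range(-half_width, half_width + 1)
    even = [math.exp(-x * x / (2 * sigma ** 2)) * math.cos(2 * math.pi * freq * x)
            for x in xs]
    odd = [math.exp(-x * x / (2 * sigma ** 2)) * math.sin(2 * math.pi * freq * x)
           for x in xs]
    return even, odd

def respond(signal, kernel, pos):
    """Receptive-field response centred at `pos`."""
    half = len(kernel) // 2
    return sum(w * signal[pos + i - half] for i, w in enumerate(kernel))

def binocular_energy(left, right, pos, disparity,
                     sigma=4.0, freq=0.125, half_width=16):
    """Complex-cell output C: two binocular simple cells S in quadrature,
    each summing a left-eye and a right-eye filter output, then squared
    and added. ASSUMPTION: the right-eye receptive field is displaced by
    `disparity` pixels to set the preferred disparity."""
    even, odd = gabor_pair(sigma, freq, half_width)
    s_even = respond(left, even, pos) + respond(right, even, pos + disparity)
    s_odd = respond(left, odd, pos) + respond(right, odd, pos + disparity)
    return s_even ** 2 + s_odd ** 2

def stereo_pair(length, freq, disparity, phase=0.3):
    """Hypothetical stimulus: the right image is the left one shifted by
    `disparity` pixels."""
    left = [math.cos(2 * math.pi * freq * x + phase) for x in range(length)]
    right = [math.cos(2 * math.pi * freq * (x - disparity) + phase)
             for x in range(length)]
    return left, right
```

For this single grating, the unit whose preferred disparity matches the stimulus disparity gives the largest response among units tuned to disparities between -3 and 3 pixels. Because the filters are narrow-band, the tuning repeats with the period of the preferred frequency (8 pixels here), so the candidate range has to stay within one period.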

Figure 7: Spikes in Fourier space can be detected by local filter operations, which are closely related to circuits proposed for the first stages of the visual cortex. Simple cells \( S\) sample data from the left and right retinae, with filter profiles that are essentially slices out of a two-dimensional Gabor function (white traces). Two simple cells in quadrature configuration feed their squared signals into the complex cell \( C\), which calculates local Fourier energy (compare [42]).
\resizebox* {0.8\columnwidth}{!}{\includegraphics{figs/first_layers/first_layers.eps}}

Note that the raw local energies calculated by any neural circuit similar to the one in Fig. 7 cannot be used directly for texture analysis, since the estimated energies also depend on local image contrast (compare Fig. 6). To deduce texture direction, one has to perform either a maximum detection around a circle in Fourier space, as already discussed, or some kind of contrast normalization. Adapting these two possibilities to the task of disparity estimation, we recover either the approach of Qian (disparity estimation via maximum detection, in [44]) or that of Adelson & Bergen (normalization, formulated in the context of optical flow estimation, in [41]).

In the network simulation presented here, disparity estimators derived from the original optical flow estimator of Adelson & Bergen [41] are used. This means that the difference between the outputs of two complex cells (estimating left and right disparity energies) is normalized by the output of a corresponding complex cell (measuring local contrast). For details, see the Appendix.
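The normalization step can be sketched as follows. The exact estimator is given in the Appendix and is not reproduced here; this example assumes, for illustration only, that the two opponent complex cells are tuned to disparities of `+delta` and `-delta` pixels and that the contrast cell sums the monocular energies of both eyes. All names and parameter values are hypothetical.

```python
import math

def gabor_pair(sigma=4.0, freq=0.125, half_width=16):
    """Even and odd 1-D Gabor profiles (a quadrature pair)."""
    xs = range(-half_width, half_width + 1)
    even = [math.exp(-x * x / (2 * sigma ** 2)) * math.cos(2 * math.pi * freq * x)
            for x in xs]
    odd = [math.exp(-x * x / (2 * sigma ** 2)) * math.sin(2 * math.pi * freq * x)
           for x in xs]
    return even, odd

def respond(signal, kernel, pos):
    """Receptive-field response centred at `pos`."""
    half = len(kernel) // 2
    return sum(w * signal[pos + i - half] for i, w in enumerate(kernel))

def binocular_energy(left, right, pos, disparity):
    """Complex cell tuned to `disparity` (right receptive field shifted)."""
    even, odd = gabor_pair()
    s_even = respond(left, even, pos) + respond(right, even, pos + disparity)
    s_odd = respond(left, odd, pos) + respond(right, odd, pos + disparity)
    return s_even ** 2 + s_odd ** 2

def monocular_energy(image, pos):
    """Quadrature energy of one eye's image; depends on contrast only."""
    even, odd = gabor_pair()
    return respond(image, even, pos) ** 2 + respond(image, odd, pos) ** 2

def disparity_estimate(left, right, pos, delta=2, eps=1e-9):
    """Opponent energy difference divided by a contrast measure.
    ASSUMPTION: contrast cell = sum of the two monocular energies."""
    opponent = (binocular_energy(left, right, pos, +delta)
                - binocular_energy(left, right, pos, -delta))
    contrast = monocular_energy(left, pos) + monocular_energy(right, pos)
    return opponent / (contrast + eps)

def stereo_pair(length, freq, disparity, contrast=1.0, phase=0.3):
    """Hypothetical stimulus: right image = left image shifted by `disparity`."""
    left = [contrast * math.cos(2 * math.pi * freq * x + phase)
            for x in range(length)]
    right = [contrast * math.cos(2 * math.pi * freq * (x - disparity) + phase)
             for x in range(length)]
    return left, right
```

With these choices the estimate is positive for a grating with a disparity of +1 pixel, negative for -1 pixel, and essentially unchanged when the stimulus contrast is tripled. The small constant `eps` only guards against division by zero in regions without texture.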
