Mind, Body, World: Foundations of Cognitive Science, 2013

8.5 Vision, Cognition, and Visual Cognition

It was argued earlier that the classical approach to underdetermination, unconscious inference, suffered from the fact that it did not include any causal links between the world and internal representations. The natural computation approach does not suffer from this problem, because its theories treat vision as a data-driven or bottom-up process. That is, visual information from the world comes into contact with visual modules—special purpose machines—that automatically apply natural constraints and deliver uniquely determined representations. How complex are the representations that can be delivered by data-driven processing? To what extent could a pure bottom-up theory of perception succeed?

On the one hand, the bottom-up theories are capable of delivering a variety of rich representations of the visual world (Marr, 1982). These include the primal sketch, which represents the proximal stimulus as an array of visual primitives, such as oriented bars, edges, and terminators (Marr, 1976). Another is the 2½-D sketch, which makes explicit the properties of visible surfaces in viewer-centred coordinates, including their depth, colour, texture, and orientation (Marr & Nishihara, 1978). The information made explicit in the 2½-D sketch is available because data-driven processes can solve a number of problems of underdetermination, often called “shape from” problems, by using natural constraints to determine three-dimensional shapes and distances of visible elements. These include structure from motion (Hildreth, 1983; Horn & Schunck, 1981; Ullman, 1979; Vidal & Hartley, 2008), shape from shading (Horn & Brooks, 1989), depth from binocular disparity (Marr, Palm, & Poggio, 1978; Marr & Poggio, 1979), and shape from texture (Lobay & Forsyth, 2006; Witkin, 1981).
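
To make the flavour of these solutions concrete, here is a minimal sketch, not drawn from the text, of how one such cue, depth from binocular disparity, becomes uniquely determined once a standard natural constraint is assumed: pinhole imaging geometry with two horizontally displaced viewpoints, under which depth Z = fB/d. The function name and the numerical values below are illustrative assumptions rather than Marr's notation.

```python
# A sketch (an illustration, not the text's own algorithm) of depth from
# binocular disparity under an assumed pinhole stereo constraint.

def depth_from_disparity(disparity_px: float,
                         focal_length_px: float,
                         baseline_m: float) -> float:
    """Recover distance Z from a measured horizontal disparity.

    Under the pinhole stereo constraint, corresponding points in the two
    images satisfy disparity = focal_length * baseline / depth, so one
    measured disparity determines depth uniquely: Z = f * B / d.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Illustrative values: a 700-pixel focal length and a 6.5 cm interocular
    # baseline; a 10-pixel disparity then places the surface patch ~4.6 m away.
    print(depth_from_disparity(disparity_px=10, focal_length_px=700,
                               baseline_m=0.065))
```

The point of the constraint is that a single disparity measurement, which by itself underdetermines the scene, picks out exactly one depth once the imaging geometry is fixed.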

It would not be a great exaggeration to say that early vision—part of visual processing that is prior to access to general knowledge—computes just about everything that might be called a ‘visual appearance’ of the world except the identities and names of the objects. (Pylyshyn, 2003b, p. 51)

On the other hand, despite impressive attempts (Biederman, 1987), it is generally acknowledged that the processes proposed by natural computationalists cannot deliver representations rich enough to make full contact with semantic knowledge of the world. This is because object recognition—assigning visual information to semantic categories—requires identifying object parts and determining spatial relationships amongst these parts (Hoffman & Singh, 1997; Singh & Hoffman, 1997). However, this in turn requires directing attention to specific entities in visual representations (i.e., individuating the critical parts) and using serial processes to determine spatial relations amongst the individuated entities (Pylyshyn, 1999, 2001, 2003c, 2007; Ullman, 1984). The data-driven, parallel computations that characterize natural computation theories of vision are poor candidates for computing such spatial relations.
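
To make the contrast concrete, the sketch below, a toy illustration rather than anything proposed by Pylyshyn or Ullman, places a uniform local operation that runs indifferently over the whole image next to a relational test that can only be evaluated after two specific parts have been individuated and attended in turn. All names and types are assumptions made for the example.

```python
# Parallel, data-driven processing versus a serial, part-based relational test.

from typing import List, Tuple

Image = List[List[int]]        # a toy binary image
Location = Tuple[int, int]     # (row, column) of an individuated part


def detect_edges_everywhere(image: Image) -> Image:
    """Data-driven stage: the same local operator applied uniformly at every
    pixel (conceptually in parallel); no entity is singled out."""
    rows, cols = len(image), len(image[0])
    edges = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols - 1):
            # mark a vertical edge wherever neighbouring intensities differ
            edges[r][c] = 1 if image[r][c] != image[r][c + 1] else 0
    return edges


def is_above(part_a: Location, part_b: Location) -> bool:
    """Serial stage: the relation only makes sense once two specific parts
    have been individuated and can be compared one pair at a time."""
    return part_a[0] < part_b[0]   # smaller row index = higher in the image


if __name__ == "__main__":
    toy = [[0, 0, 1, 1],
           [0, 0, 1, 1],
           [0, 0, 0, 0]]
    print(detect_edges_everywhere(toy))   # uniform, unselective pass
    print(is_above((0, 2), (2, 1)))       # True: the first indexed part is higher
```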

