Computational Vision U. Minn. Psy 5036 Daniel Kersten Lecture 6
http://vision.psych.umn.edu/www/kersten-lab/courses/Psy5036/SyllabusF2000.html
‡ Initialize standard library files:
6.ProbabilityPatternIIb.nb 3<br />
[Figure: graphical model for the face example -- an Illumination node (h) and a Face node (s_i) jointly generate the image data.]
One example is shape from shading. An image provides the "test measurements" that can be used to estimate an object's shape and/or the direction of illumination. Accurate object identification often depends crucially on an object's shape, while the illumination is a confounding (secondary) variable. This suggests that visual recognition should assign a high cost to errors in shape perception, and lower costs to errors in illumination direction estimation. So the process of perceptual inference depends on the task. The effect of marginalization in the fruit example illustrated task-dependence. Now we show how marginalization can be generalized through decision theory to model task-dependent goals other than error minimization (MAP).
Bayesian decision theory provides the means to model visual performance as a function of utility.
Some terminology. We've used the terms switch state, hypothesis, and signal state as essentially the same--to represent the random variable indicating the state of the world--the "state space". So far, we've assumed that the decision, d, of the observer maps directly to state space, d->s. We now clearly distinguish the decision space from the state or hypothesis space, and introduce the idea of a loss L(d,s), which is the cost of making the decision d when the actual state is s. Often we can't directly measure s, and can only infer it from observations. Thus, given an observation (image measurement) x, we define a risk function that represents the average loss over signal states s:
R(d; x) = ∑_s L(d, s) p(s | x)    (1)
This suggests a decision rule: a(x) = argmin_d R(d; x). But not all x are equally likely. This decision rule minimizes the risk averaged over all observations:

R(a) = ∑_x R(a(x); x) p(x)    (2)

We won't show them all here, but with suitable choices of likelihood, prior, and loss functions, we can derive standard estimation procedures (maximum likelihood, MAP, estimation of the mean) as special cases.
For the MAP estimator,

R(d; x) = ∑_s L(d, s) p(s | x) = ∑_s (1 - δ_{d,s}) p(s | x) = 1 - p(d | x)    (3)

where δ_{d,s} is the discrete analog of the Dirac delta function (the Kronecker delta)--it is zero if d≠s, and one if d=s.
Thus minimizing risk with the loss function L = (1 - δ_{d,s}) is equivalent to maximizing the posterior, p(d|x).
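As a quick numeric check of equation (3), here is a minimal sketch in Python (the notebook's own code is Mathematica; the posterior values are invented for illustration) showing that minimizing risk under the 0-1 loss picks the same decision as MAP:

```python
import numpy as np

# Hypothetical posterior p(s|x) over three states (made-up numbers).
posterior = np.array([0.2, 0.5, 0.3])

loss = 1.0 - np.eye(3)                     # L(d,s) = 1 - delta_{d,s}
risk = loss @ posterior                    # R(d;x) = sum_s L(d,s) p(s|x)

assert np.allclose(risk, 1.0 - posterior)  # equation (3): R(d;x) = 1 - p(d|x)
assert risk.argmin() == posterior.argmax() # min-risk decision = MAP decision
```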
What about marginalization? You can see from the definition of the risk function that this corresponds to a uniform loss, L = -1:

R(s1; x) = ∑_{s2} L(d2, s2) p(s1, s2 | x) = -∑_{s2} p(s1, s2 | x) = -p(s1 | x)    (4)

so minimizing this risk over s1 is the same as maximizing the marginal posterior p(s1 | x).
So for our face recognition example, a really huge error in illumination direction has the same cost as getting it right. For the fruit example, optimal classification of fruit identity required marginalizing over fruit color--i.e., effectively treating fruit color identification errors as equally costly--even though doing MAP after marginalization means we are not explicitly identifying color.
Graphical models for Hypothesis Inference: Three types<br />
Three types of nodes in a graphical model: known, unknown to be estimated, unknown<br />
to be integrated out (marginalized)<br />
We have three basic states for nodes in a graphical model: known, unknown to be estimated, and unknown to be integrated out (marginalized). We have a causal state of the world S that gets mapped to some image data I through some intermediate parameters M: S->M->I. So face identity S determines facial shape M, which in turn determines the image I itself. Consider three very broad types of task:

Image synthesis (forward, generative model): We want to model I through p(I|S). In our example, we want to specify "Bill", and then p(I|S="Bill") can be implemented as an algorithm to spit out images of Bill. M gets integrated out.

Hypothesis inference: We want to model samples for S: p(S|I). Given an image, we want to spit out likely object identities, so that we can minimize risk, or do MAP classification for accurate object identification. M gets integrated out again.

Learning: We want to model M: p(M|I,S), to learn how the intermediate variables are distributed. Given lots of samples of objects and their images, we want to learn the mapping parameters between them. (Alternatively, do a mental switch and consider a neural network in which an input S gets mapped to an output I through intermediate variables M. We can think of M as representing synaptic weights to be learned.)
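To make the first two tasks concrete, here is a tiny discrete sketch in Python (standing in for the notebook's Mathematica; all the probability tables are invented for illustration) of the chain S -> M -> I, with M integrated out for both synthesis and inference:

```python
import numpy as np

# Toy discrete model (made-up numbers): 2 identities S, 2 shapes M, 2 images I.
pS = np.array([0.5, 0.5])            # prior over identities S
pM_S = np.array([[0.8, 0.2],         # p(M|S): rows condition on S
                 [0.3, 0.7]])
pI_M = np.array([[0.9, 0.1],         # p(I|M): rows condition on M
                 [0.2, 0.8]])

# Image synthesis: p(I|S) with M integrated out
pI_S = pM_S @ pI_M                   # p(I|S) = sum_M p(M|S) p(I|M)

# Hypothesis inference: p(S|I) by Bayes' rule, again marginalizing over M
joint_SI = pS[:, None] * pI_S        # p(S, I)
pS_I = joint_SI / joint_SI.sum(axis=0)

assert np.allclose(pI_S.sum(axis=1), 1.0)   # each p(I|S) is a distribution
assert np.allclose(pS_I.sum(axis=0), 1.0)   # each p(S|I) is a distribution
```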
Regression: estimating parameters that provide a good fit to data, e.g. slope and intercept for a straight line through points {xi, yi}.

Density estimation: regression on a probability density function, with the added condition that the area under the fitted curve must be one.
Hypothesis inference: Three types<br />
‡ Detection<br />
Let the decision variable d be represented by ŝ.
‡ Classification<br />
MAP rule: argmax_i {p(S_i | x)}
‡ Continuous estimation

argmax_S {p(S | x)}

Suppose the observation x is such that P(S = s_b | x) > P(S = s_d | x). This is exactly like the problem of guessing "heads" or "tails" for a biased coin, say with a probability of heads P(S = s_b | x). Imagine the light discrimination experiment repeated many times, and you have to decide whether the switch was set to bright or not--but only on those trials for which you measured exactly x. The optimal strategy is to always say "bright".
Let's see why. First note that:<br />
p(error|x) = p(say "bright", actually dim | x) + p(say "dim", actually bright | x)
= p(ŝ_1, s_2 | x) + p(s_1, ŝ_2 | x)
Given x, the response is independent of the actual signal state (see graphical model for detection above--"response is<br />
conditionally independent of signal state, given observation x"), so the joint probabilities factor:<br />
p(error|x) = p(say "bright" |x)p(actually dim |x) + p(say "dim" |x)p(actually bright |x)<br />
Let t= p(say "bright" |x), then<br />
p(error|t,x) = t*p(actually dim |x) + (1-t)*p(actually bright |x).
p(error|t), as a function of t, defines a straight line with slope p(actually dim | x) - p(actually bright | x). (Just take the partial derivative with respect to t.) We've assumed P(S = s_b | x) > P(S = s_d | x), so p(error|t) has a negative slope, and is smallest at the largest allowed value of t, which is one. So error is minimized when t = p(say "bright" | x) = 1. I.e., always say "bright".
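A one-line numeric illustration in Python (the posterior probabilities are invented; the notebook's own code is Mathematica) of why the optimal t is 1:

```python
import numpy as np

# Invented numbers: suppose p(bright|x) = 0.7, p(dim|x) = 0.3, so the
# assumption P(S=s_b|x) > P(S=s_d|x) holds.
p_bright, p_dim = 0.7, 0.3
t = np.linspace(0.0, 1.0, 101)               # t = p(say "bright" | x)
p_error = t * p_dim + (1.0 - t) * p_bright   # linear in t, slope p_dim - p_bright

assert np.isclose(p_error[0], p_bright)      # t = 0: always say "dim"
assert np.isclose(p_error[-1], p_dim)        # t = 1: always say "bright"
assert p_error.argmin() == len(t) - 1        # error is minimized at t = 1
```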
[Figure: plot of p(error) as a function of t, a straight line with negative slope over 0 ≤ t ≤ 1.]
Always saying "bright" results in a probability of error P(error | x) = P(S = s_d | x). That's the best that can be done on average. On the other hand, if the observation is in a region for which P(S = s_d | x) > P(S = s_b | x), the minimum error strategy is to always pick "dim", with a resulting P(error | x) = P(S = s_b | x). Of course, x isn't fixed from trial to trial, so we calculate the total probability of error, which is determined by the specific values where signal states and decisions don't agree:
p(error) = ∑_{i≠j} p(ŝ_i, s_j)
= ∑_{i≠j} ∫ p(ŝ_i, s_j | x) p(x) dx
= ∑_{i≠j} ∫ p(ŝ_i | x) p(s_j | x) p(x) dx

Because the MAP rule ensures that ∑_{i≠j} p(ŝ_i, s_j | x) is the minimum for each x, the integral over all x minimizes the total probability of error.
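This claim can be checked by simulation. A Monte Carlo sketch in Python (toy means and priors, not from the text): sweep the criterion for a two-state bright/dim discrimination and confirm the empirical error is smallest near the MAP criterion:

```python
import numpy as np

# Toy setup (invented numbers): dim trials ~ N(0,1), bright trials ~ N(1,1),
# equal priors. With equal priors, MAP says "bright" when x > 0.5.
rng = np.random.default_rng(0)
n = 200_000
is_bright = rng.random(n) < 0.5
x = np.where(is_bright,
             rng.normal(1.0, 1.0, n),      # bright trials
             rng.normal(0.0, 1.0, n))      # dim trials

def total_error(criterion):
    say_bright = x > criterion
    return np.mean(say_bright != is_bright)

crits = np.linspace(-1.0, 2.0, 61)
errors = np.array([total_error(c) for c in crits])
best = crits[errors.argmin()]
assert abs(best - 0.5) < 0.25              # minimum falls near the MAP criterion
```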
Ideal pattern detector for a signal which is exactly known (SKE)<br />
In this notebook we will study an ideal detector called the signal-known-exactly (SKE) ideal. This detector has a built-in template that matches the signal it is looking for. The signal is embedded (by adding pixel intensity values) in "white gaussian noise". "White" means the pixels are not correlated with each other--intuitively, this means that you can't reliably predict one pixel's value from any of the others. Assignment 1 simulates the behavior of this ideal. In the absence of any internal noise, this ideal detector behaves as one would expect a linear neuron to behave when a target signal pattern exactly matches its synaptic weight pattern. There are neurons in the primary visual cortex called "simple cells". These cells can be modeled as ideal detectors for the patterns that match their receptive fields. In actual practice, neurons are not noise-free, and not perfectly linear.
Calculating the Pattern Ideal's d' based on signal-to-noise ratio<br />
‡ Overview<br />
The generative model is a straightforward extension of the single-variable gaussian model to patterns. s[i] represents the signal pattern values at pixel locations i = 1..M:

H = signal pattern: x[i] = s[i] + n[i], where n[i] ~ N(0, σ), i.i.d.
H = no signal pattern: x[i] = 0 + n[i], where n[i] ~ N(0, σ), i.i.d.

(i.i.d. is probability jargon meaning that the random variables are "independently drawn and identically distributed". N(m, σ) means the "normal distribution", or gaussian, with mean m and standard deviation σ.)
We are going to do two things:

1. Prove that a simple decision variable for detecting a known, fixed pattern in white gaussian noise is the dot product of the observation image with the known signal image: r = s.x

2. Show that d' is given by:

d' = (μ2 - μ1)/σ_r = √(s.s)/σ
‡ 1. Proof that cross-correlation produces an ideal decision variable:<br />
Let r be the decision variable. By definition:

d' = (μ2 - μ1)/σ_r    (5)

where μ2 is the mean of the decision variable r under the signal hypothesis, μ1 is the mean under the noise-only hypothesis, and σ_r is the standard deviation of r, assumed to be the same under both hypotheses (we'll show this to be true below). But what is the decision variable? Starting from the maximum a posteriori rule, we saw that basing decisions on the likelihood ratio is ideal; in particular, we could minimize the probability of error with the appropriate decision criterion. So the likelihood ratio is a decision variable. But it isn't the only one, because any monotonic function of it is still optimal. So our goal is to pick a decision variable which is simple, intuitive, and easy to compute.
First, we need an expression for the likelihood ratio:

p(x | signal plus noise) / p(x | noise only)    (6)

where x is the vector representing the image measurements actually observed. For pixel i,

x[i] = s[i] + n[i], under the signal-plus-gaussian-noise condition
x[i] = n[i], under the gaussian-noise-only condition    (7)
Consider just one pixel of intensity x. Under the signal plus noise condition, the values of x fluctuate about the average signal intensity s with a Gaussian distribution (gp[]) with mean s and standard deviation σ.

(Note that s[i] is the average value of the pixel intensity at location i. But μ1 and μ2 are the averages of the decision variable r--our goal will be to calculate these in the following subsection on d'.)

So under the signal plus noise condition, the likelihood p[x|signal] is gp[x - s; σ]:

gp[x_, s_, σ_] := (1/(σ*Sqrt[2 Pi])) Exp[-(x - s)^2/(2 σ^2)]
gp[x, s, σ]

E^(-(x - s)^2/(2 σ^2)) / (Sqrt[2 Pi] σ)
Again, consider just one pixel of intensity x. Under the noise only condition, the values of x fluctuate about the average intensity corresponding to the mean of the noise, which we assume is zero.

So under the noise only condition, the likelihood p[x|noise] is:

gp[x, 0, σ]

E^(-x^2/(2 σ^2)) / (Sqrt[2 Pi] σ)
But we actually have a whole pattern of values of x, which make up an image vector x. So consider a pattern of image intensities represented now by a vector x = {x[1], x[2], ..., x[M]}. Let the mean values of each pixel under the signal plus noise condition be given by the vector s = {s[1], s[2], ..., s[M]}. The joint probability of an image observation x, under the signal hypothesis, is p(x|s):

Product[gp[x[i], s[i], σ], {i, 1, M}]

∏_{i=1}^{M} E^(-(x[i] - s[i])^2/(2 σ^2)) / (Sqrt[2 Pi] σ)
This is because we are assuming independence (whether we are right or not depends on the problem). But independence<br />
between pixels means we can multiply the individual probabilities to get the global joint image probability.<br />
The joint probability of an image observation x, under the noise only hypothesis, is:

Product[gp[x[i], 0, σ], {i, 1, M}]

∏_{i=1}^{M} E^(-x[i]^2/(2 σ^2)) / (Sqrt[2 Pi] σ)
Now we have what we need for the likelihood ratio:

Product[gp[x[i], s[i], σ], {i, 1, M}] / Product[gp[x[i], 0, σ], {i, 1, M}]

For a two-pixel image (M = 2), taking the log of this ratio and dropping the terms that don't depend on x leaves:

s[1] x[1] + s[2] x[2]    (8)

But this is just the dot product of the image x with the signal s.
So in general, to recap using full summation notation, the log likelihood ratio is:

log ( ∏_{i=1}^{M} e^(-[(x(i) - s(i))^2 - x(i)^2]/(2 σ^2)) )    (9)

which is monotonic with:

Log[ ∏_{i=1}^{M} e^(2 x(i) s(i)/(2 σ^2)) ] = (1/σ^2) ∑_{i=1}^{M} x(i) s(i)    (10)

which is monotonic with:

r = ∑_{i=1}^{M} x(i) s(i) = Sum[x[i] s[i], {i, 1, M}]
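Both results can be verified by simulation. A Python sketch (standing in for the notebook's Mathematica; the signal pattern and σ are arbitrary choices): use r = s.x as the decision variable and compare the empirical d' with √(s.s)/σ:

```python
import numpy as np

# Monte Carlo check of the SKE ideal (invented signal pattern and sigma).
rng = np.random.default_rng(1)
s = np.array([1.0, -0.5, 0.25, 2.0])            # known signal pattern
sigma, n = 1.0, 100_000

x_sig = s + rng.normal(0.0, sigma, (n, len(s))) # signal-plus-noise images
x_noi = rng.normal(0.0, sigma, (n, len(s)))     # noise-only images
r_sig, r_noi = x_sig @ s, x_noi @ s             # decision variable r = s.x

d_emp = (r_sig.mean() - r_noi.mean()) / r_noi.std()
d_theory = np.sqrt(s @ s) / sigma               # d' = sqrt(s.s)/sigma
assert abs(d_emp - d_theory) < 0.1
```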
Psychophysical tasks & techniques
The Receiver Operating Characteristic (ROC)<br />
Although we can't directly measure the internal distributions of a human observer's decision variable, we've seen that we<br />
can measure hit and false alarm rates, and thus d'. But one can do more, and actually test to see if an observer's decisions are<br />
consistent with Gaussian distributions with equal variance. If the criterion is varied, we can obtain a set of n data points:<br />
{(hit rate 1, false alarm rate 1), (hit rate 2, false alarm rate 2), ..., (hit rate n, false alarm rate n)}<br />
all from one experimental condition (i.e. from one signal-to-noise ratio, call it d'_ideal). This is because as the hit rate varies, so does the false alarm rate. One could compute the d' for each pair, and they should all be equal for the ideal observer. Of course, we would have to make a large number of measurements for each one--but on average, they should all be equal.
Getting meaningful and equal d's for each pair of hit and false alarm rates assumes that the underlying relative separation of the signal and noise distributions remains unchanged, and that the distributions are Gaussian with equal standard deviation. We might know this is true (or true to a good approximation) for the ideal, but we have no guarantee for the human observer. Is there a way to check? Suppose the signal and noise distributions look like:
[Figure: overlapping noise (S1) and signal (S2) distributions with standard deviations σ_n and σ_s along the decision axis X, and a possible criterion placement X_c.]
If we plot the hit rate vs. false alarm rate data on a graph, usually we get something that looks like:<br />
[Figure: ROC curve, hit rate (0 to 1) plotted against false alarm rate (0 to 1).]
One can show that the area under this ROC curve is equal to the proportion correct in a two-alternative forced-choice<br />
experiment (Green and Swets). (Sometimes, sensitivity is operationally defined as this area. This provides a single summary<br />
number, even if the standard definition of d' is inappropriate, for example because the variances are not equal.)<br />
We return to our basic question: is there a way to spot whether our gaussian equal-variance assumptions are correct for<br />
human observers?<br />
If we take the same data and plot it in terms of Z-scores we get something like:<br />
[Figure: Z(hit rate) plotted against Z(false alarm rate): a straight line with slope σ_n/σ_s and intercept determined by (μ_s - μ_n)/σ_s.]
In fact, if the underlying distributions are Gaussian, the data should lie on a straight line. If they both have equal variance, the slope of the line should be equal to one. This is because (writing Z(·) for the upper-tail z-score):

Z(hit rate) = (X_c - μ_s)/σ_s
Z(false alarm rate) = (X_c - μ_n)/σ_n

And if we solve for the criterion X_c, we obtain:

Z(hit rate) = (σ_n/σ_s) Z(false alarm rate) - (μ_s - μ_n)/σ_s

(I've switched notation here: s2 has mean μ_s, and s1 has mean μ_n.) The main point of this plot is to see if the data tend to fall on a straight line with slope of one. A straight line supports the Gaussian assumption; a slope of one further supports the assumption of equal-variance Gaussian distributions.
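The straight-line prediction can be checked numerically. A Python sketch (with invented, deliberately unequal-variance parameters; Z(p) is the upper-tail z-score, recovered here by bisection rather than Mathematica's InverseErf):

```python
import numpy as np
from math import erf, sqrt

def Phi(z):                                  # standard normal CDF
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def z_upper(p):                              # solve P(Z > z) = p by bisection
    lo, hi = -10.0, 10.0
    for _ in range(80):                      # the upper tail decreases in z
        mid = (lo + hi) / 2.0
        if 1.0 - Phi(mid) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# invented parameters; sigma_s deliberately != sigma_n
mu_n, mu_s, s_n, s_s = 0.0, 1.5, 1.0, 1.25

for Xc in np.linspace(-1.0, 3.0, 9):         # sweep the criterion
    hit = 1.0 - Phi((Xc - mu_s) / s_s)       # P(x > Xc | signal)
    fa = 1.0 - Phi((Xc - mu_n) / s_n)        # P(x > Xc | noise)
    predicted = (s_n / s_s) * z_upper(fa) - (mu_s - mu_n) / s_s
    assert abs(z_upper(hit) - predicted) < 1e-6
```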
In practice, there are several ways of obtaining an ROC curve in human psychophysical experiments. One can vary the criterion that an observer adopts by varying the proportion of times the signal is presented. As observers get used to the signal being presented, for example, 80% of the time, they become biased to assume the signal is present. One needs to run trials in blocks of, say, 400 trials per block, where the signal and noise priors are fixed for a given block.
One can also use a rating scale method in which the observer is asked to say how confident she/he was (e.g. 5 definitely, 4 quite probable, 3 don't know for sure, 2 unlikely, 1 definitely not). Then we can bin the proportion of "5's" when the signal vs. noise was present to calculate hit and false alarm rates for that rating, do the same for the "4's", and so forth. The assumption is that an observer can maintain not just one stable criterion, but four--the observer has in effect
divided up the decision variable domain into 5 regions. An advantage of the rating scale method is efficiency--relatively few<br />
trials are required to get an ROC curve. Further, in some experiments, ratings seem psychologically natural to make. But if<br />
there is any "noise" in the decision criterion itself, e.g. due to memory drift, or whatever, this will act to decrease the<br />
estimate of d' in both yes/no and rating methods.<br />
Usually rather than manipulating the criterion, we would rather do the experiment in such a way that it does not<br />
change. Is there a way to reduce the problem of a fluctuating criterion?<br />
The 2AFC (two-alternative forced-choice) method<br />
‡ Relating performance (proportion correct) to signal-to-noise ratio, d'.<br />
In psychophysics, the most common way to minimize the problem of a varying criterion is to use a two-alternative forced-choice procedure (2AFC). In a 2AFC task, the observer is presented on each trial with a pair of stimuli. One stimulus has the signal (e.g. bright flash), and the other the noise (e.g. dim flash). The order, however, is randomized. So if they are presented
temporally, the signal or the noise might come first, but the observer doesn't know which from trial to trial. In the<br />
spatial version, the signal could be on the left of the computer screen with the noise on the right, or vice versa. One can<br />
show that for 2AFC:<br />
d' = -√2 z(proportion correct)    (16)
If you want to prove this for yourself, here are a couple of hints--actually, a lot of hints. Let us imagine we are giving the<br />
light discrimination task to the ideal observer. We have two possibilities for signal presentation: Either the signal is on the<br />
left and the noise on the right, or the signal is on the right and the noise on the left. There are two ways of being right. The<br />
observer could say "on the left" when the signal is on the left, or "on the right" when the signal is on the right. For example,<br />
for the light detection experiment, a reasonable guess is that all the ideal observer would have to do is to count the number<br />
of photons on the left side of the screen and count the number on the right too. If the number on the left is bigger than the<br />
number on the right, the observer should say that the signal was on the left. Thus, a 2AFC decision variable would be the<br />
difference between the left and right decision variables, where each of these is what we calculated for the yes/no<br />
experiment.<br />
r = rL - rR<br />
(17)<br />
For example, rL and rR for the SKE observer would be the dot products of the signal pattern template with observation<br />
image vectors on the left and right sides.<br />
So, the probability of being correct is:

pc = p(r>0 | signal on left) p(signal on left) + p(r<0 | signal on right) p(signal on right)
‡ Z-score. You used the inverse of a standard mathematical function called Erf[] to get Z from a measured P.<br />
z[p_] := Sqrt[2] InverseErf[1 - 2 p];<br />
where Z(*) is the z-score for P c , the proportion correct.<br />
dprime[x_] := N[-Sqrt[2] z[x]]<br />
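For convenience, here is a Python translation of these two definitions (InverseErf replaced by a simple bisection on erf; the sanity checks assume the standard 2AFC relation Pc = Φ(d'/√2)):

```python
from math import erf, sqrt

def inverse_erf(y):                     # invert erf by bisection (erf is increasing)
    lo, hi = -6.0, 6.0
    for _ in range(80):
        mid = (lo + hi) / 2.0
        if erf(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def z(p):                               # z[p_] := Sqrt[2] InverseErf[1 - 2p]
    return sqrt(2.0) * inverse_erf(1.0 - 2.0 * p)

def dprime(pc):                         # dprime[x_] := -Sqrt[2] z[x]
    return -sqrt(2.0) * z(pc)

assert abs(dprime(0.5)) < 1e-6          # chance performance -> d' = 0
assert abs(dprime(0.7602) - 1.0) < 1e-2 # Pc ≈ 0.76 corresponds to d' ≈ 1
```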
‡ Adaptive procedures for finding thresholds using 2AFC or yes/no.<br />
There have been a number of advances in the art of efficiently finding values of a signal which produce a certain percent<br />
correct in a psychophysical task such as 2AFC. For more on this, see the QUEST procedure of Watson and Pelli (1983) and<br />
the analyses of Treutwein (1993).<br />
Appendices<br />
Figure code<br />
Exercises<br />
References<br />
Applebaum, D. (1996). Probability and Information. Cambridge, UK: Cambridge University Press.
Burgess, A. E., Wagner, R. F., Jennings, R. J., & Barlow, H. B. (1981). Efficiency of human visual signal discrimination. Science, 214, 93-94.
Duda, R. O., & Hart, P. E. (1973). Pattern Classification and Scene Analysis. New York: John Wiley & Sons.
Golden, R. (1988). A unified framework for connectionist systems. Biological Cybernetics, 59, 109-120.
Green, D. M., & Swets, J. A. (1974). Signal Detection Theory and Psychophysics. Huntington, New York: Robert E. Krieger Publishing Company.
Kersten, D., & Schrater, P. W. (2000). Pattern inference theory: A probabilistic approach to vision. In R. Mausfeld & D. Heyer (Eds.), Perception and the Physical World. Chichester: John Wiley & Sons, Ltd.
Kersten, D. (1984). Spatial summation in visual noise. Vision Research, 24, 1977-1990.
Kersten, D. (1999). High-level vision as statistical inference. In M. S. Gazzaniga (Ed.), The New Cognitive Neurosciences, 2nd Edition (pp. 353-363). Cambridge, MA: MIT Press.
Knill, D. C., & Richards, W. (1996). Perception as Bayesian Inference. Cambridge: Cambridge University Press.
Knill, D. C., Kersten, D., & Yuille, A. (1996). A Bayesian formulation of visual perception. In D. C. Knill & W. Richards (Eds.), Perception as Bayesian Inference. Cambridge: Cambridge University Press.
Ripley, B. D. (1996). Pattern Recognition and Neural Networks. Cambridge, UK: Cambridge University Press.
Treutwein, B. (1993). Adaptive psychophysical procedures: A review. Submitted to Vision Research. (email: bernhard@tango.imp.med.uni-muenchen.de)
Van Trees, H. L. (1968). Detection, Estimation and Modulation Theory. New York: John Wiley and Sons.
Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.
Watson, A. B., Barlow, H. B., & Robson, J. G. (1983). What does the eye see best? Nature, 302, 419-422.
Yuille, A. L., & Bülthoff, H. H. (1996). Bayesian decision theory and psychophysics. In D. C. Knill & W. Richards (Eds.), Perception as Bayesian Inference. Cambridge, UK: Cambridge University Press.
© 2000 Daniel Kersten, Computational Vision Lab, Department of Psychology, University of Minnesota.
(http://vision.psych.umn.edu/www/kersten-lab/kersten-lab.html)