Modelling the limits of peripheral vision

My latest publication about the limits of peripheral vision is now fully referenced and available. The journal has given me permission to host a PDF version for folks to download for a period of time, so get it while it’s hot! (Be sure also to download the Supplemental Info!) The aim of this blog post is to describe the main points of the paper.

Preamble

Although we experience a highly detailed visual world, only a small portion of our vision – central vision – is high resolution. In the visual periphery, objects appear a bit blurry, and worse, they are prone to visual crowding. Visual crowding refers to the inability to recognise an object when other objects surround it. I’ve written about crowding before, here and here, but for simplicity, check out the picture below. When fixating the central spot, the letter A on the left should be easily visible, whereas the same letter on the right is almost impossible to identify. (This demo was inspired by a similar one in a great review by Pelli and Tillman1.)

crowding demonstration

The obvious difference between the two sides of the above figure is that there are extra distracting lines on the right. These lines interfere with your ability to recognise the A. This is crowding. It happens with all kinds of objects, and happens throughout the entire visual field. At any given moment, there are probably many objects in your vision that are crowded beyond recognition. Right now, the speed at which you read this text is determined by your level of crowding2. We tend not to be aware of such limited recognition, however, because we can effortlessly make eye movements to use our high-resolution central vision to break crowding.

Our contribution

In my paper, we discuss why crowding occurs. We describe a new method for measuring crowding, and we provide a computational model that simulates specific neural processes that may cause crowding.

Our new method involves a different way of reporting the appearance of a stimulus in peripheral vision. Studies of crowding typically involve showing an observer a target, such as a letter, and then asking the observer to report the target identity. In these experiments, the observer’s response can only be correct or incorrect. However, we know, even from introspection when viewing demonstrations like the figure above, that when an object is crowded we can still see something. In fact, being able to see something but not being able to correctly identify it is a defining feature of crowding3. It may be informative, therefore, to quantify crowded perception in more ways than as simply correct or incorrect. We thus used a measure that allowed us to describe crowded perception as a loss of perception along a continuum.

The task

We used a target stimulus like the one on the left below, known as a Landolt C. An observer saw this target in peripheral vision. Importantly, from trial to trial it was rotated randomly, so that the gap could point in any direction. We then presented the observer with a second Landolt C, this time in their central vision, which they could rotate by pressing buttons. They had to rotate the central C so that it matched the target orientation as closely as possible. On each trial, therefore, the observer’s response was not classed as right or wrong; instead, we measured their perceptual error – the difference in rotation between the actual target and the observer’s report.
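
As a concrete sketch of this analysis, the perceptual error on each trial is just the circular difference between the reported and actual gap orientations. This is my own illustration, not code from the paper:

```python
def perceptual_error(target_deg, report_deg):
    """Signed angular error of a report, in degrees, wrapped to (-180, 180].

    The Landolt C gap orientation is fully circular (0-360 deg), so errors
    must wrap: reporting 350 deg for a 10 deg target is a -20 deg error,
    not a +340 deg one.
    """
    diff = (report_deg - target_deg) % 360.0
    return diff - 360.0 if diff > 180.0 else diff

print(perceptual_error(10, 350))  # small counter-clockwise error: -20.0
print(perceptual_error(90, 100))  # small clockwise error: 10.0
```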

Landolt floating gaps

This method can be particularly useful for modelling, because over dozens of trials the observer’s errors conform very nicely to known mathematical functions. People with typical vision are generally pretty good at this – their errors cluster tightly around the actual target orientation. A recent study related these sorts of perceptual errors to the noisy encoding of the stimulus in primary visual cortex4, the first cortical site of visual processing (though in that study stimuli were presented at fixation, not in the periphery).
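
To give a flavour of how such error distributions are summarised, circular statistics provide a mean error and a concentration measure: the mean resultant length R approaches 1 when reports cluster tightly around the target and 0 when they are random. This is a minimal sketch of the idea with made-up simulated data, not the paper’s actual fitting procedure (which fit circular normal functions to the error distributions):

```python
import math
import random

def circular_stats(errors_deg):
    """Circular mean (degrees) and mean resultant length R of angular errors."""
    rads = [math.radians(e) for e in errors_deg]
    s = sum(math.sin(r) for r in rads) / len(rads)
    c = sum(math.cos(r) for r in rads) / len(rads)
    return math.degrees(math.atan2(s, c)), math.hypot(s, c)

random.seed(1)
# Simulate a precise, uncrowded observer: errors cluster near 0 deg.
mean, R = circular_stats([random.gauss(0, 15) for _ in range(200)])
print(round(mean, 1), round(R, 2))  # mean near 0, R close to 1
```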

We can quantify crowding by seeing how these patterns of data – and their corresponding mathematical functions – are affected in the case of a crowded target, like that shown on the right of the figure above. In this example, the target is surrounded by a larger ring with a gap in it. When fixating on the blue spot, you may be able to discern that there is a gap in each ring, but the exact position of the gaps may appear “fuzzy”. You may even experience the impression that gaps are sort of joined in some strange way, and that their orientations are very difficult to identify precisely. At the very least, it’s unlikely the target gap will appear as clearly as it does on the left of the figure, despite the physical properties of the targets being the same.

Our human data

When we examined the errors that observers made, we found patterns in our data that corresponded to separate, previously competing explanations of crowding. First, when the orientations of the target and distractor were similar, observers tended to report an orientation close to the average of the two. Such feature averaging has been reported widely5, and strongly supports the idea that the visual system simplifies perception in the periphery by finding higher-level statistical associations between objects6. Second, when the orientations of the target and distractor were dissimilar, observers reported either the target or the distractor, and their reports in these cases were quite precise. These sorts of errors are known as confusions or substitutions7: the observer can apparently see both gap elements of the stimulus, but is unsure about which gap belongs to which ring. You may be able to experience both of these effects during different periods of viewing the target stimuli above. Note that these effects disappear in our data when the distractor ring is large and very far from the target, showing that they are linked to processes that depend on the spatial positions of items. (There is nothing particularly new about these data – they merely replicate effects that have already been widely reported across studies.)

The model

We next generated a computational model that simulates a basic visual process – the coding of orientation information. Because the human data in the uncrowded condition of our experiment conformed well to a mathematical function (a circular normal distribution), we simply asserted that there is some neural process whose output follows the same probabilities as that function. That is, for any given uncrowded target, our model makes the same sorts of errors humans make. We then made the model spatial by asserting that it responds most strongly to objects centred right on the target; the strength of the model response decreases gradually with the distance of an object from the centre of the target. In the figure below, I’ve tried to give an intuition for how the model works. On the left are three example stimuli; the target is coloured magenta and the distractors are coloured green and blue. The yellow lines can be thought of as our model trying to detect whether there is a gap section at any of those orientations.

model example

On the right side of the above figure are illustrative model responses to the stimuli presented on the left. For any given stimulus, the model indicates the probability that a gap is present at each possible orientation. The colours of each of the distributions correspond to the stimuli on the left. The magenta target distribution is centred on 0° (the target orientation in polar coordinates), and is strongest because the target is closest to the centre of the model. You can see that the peaks of the distractor distributions are shifted away from zero, and are weaker than the target’s.
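
In code, this intuition amounts to a circular normal orientation-tuning profile multiplied by a spatial weight that falls off with distance from the target. The Gaussian spatial profile and the parameter values below are illustrative choices of mine, not the paper’s fitted values:

```python
import math

def model_response(probe_deg, gap_deg, distance, kappa=4.0, spatial_sd=1.0):
    """Unnormalised response signalling a gap at probe_deg for a ring whose
    true gap is at gap_deg, centred `distance` units from the target."""
    # Circular normal (von Mises shaped) orientation tuning.
    tuning = math.exp(kappa * math.cos(math.radians(probe_deg - gap_deg)))
    # The response weakens for rings centred away from the target.
    weight = math.exp(-distance ** 2 / (2 * spatial_sd ** 2))
    return tuning * weight

# A centred target evokes a stronger peak than an off-centre distractor,
# mirroring the magenta vs green/blue distributions in the figure.
print(model_response(0, 0, 0.0) > model_response(45, 45, 1.0))  # True
```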

The model works…

In our study, we simply summed the model’s response to a target with its response to a distractor, and tested how well those summed probabilities predicted observers’ perceptual reports. We essentially had our model perform the exact same trials our observers performed. This worked very well: the model predicted the average errors made by observers over a range of distractor conditions. More importantly, it predicted the mixed pattern of data described above, in which on some trials the observer reported the average of the target and distractor orientations, and on others they confused which gap corresponded to which object.
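
Summing two such responses reproduces both patterns qualitatively: with nearby orientations the summed profile peaks between target and distractor (averaging), while with distant orientations it peaks on one element (substitution). Again, the weights and tuning width here are illustrative values of mine, not the paper’s fits:

```python
import math

def summed_response(probe_deg, target_deg, distractor_deg,
                    kappa=4.0, w_target=1.0, w_distractor=0.6):
    """Summed target + distractor responses; the distractor is weighted
    less because it sits further from the model's centre."""
    def tuning(mu):
        return math.exp(kappa * math.cos(math.radians(probe_deg - mu)))
    return w_target * tuning(target_deg) + w_distractor * tuning(distractor_deg)

def predicted_report(target_deg, distractor_deg):
    """The modal report: the orientation with the largest summed response."""
    return max(range(360),
               key=lambda p: summed_response(p, target_deg, distractor_deg))

print(predicted_report(0, 30))   # between 0 and 30: feature averaging
print(predicted_report(0, 150))  # on the (stronger) target: no averaging
```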

The model works… sort of.

In a second experiment, we used a novel behavioural paradigm to quantify the level of detail an observer could identify. This experiment requires much more explanation, so I’ll save it for another post. However, it’s important to note that, although we found our model preferable to models proposed by others, it had to be altered slightly to account for our observations in this second experiment. This issue – how well our model can describe other datasets – is important and remains to be tested. So, although our model describes the data from our specific task, how well it represents what the brain is doing in crowded scenes requires much further testing. Based on another great model that is conceptually similar11, I’m optimistic these sorts of models can go far.

And finally…

As with all of science, our work builds upon a lot of great work that came before. While the content of the paper is new, similar methods8-10 and models11,12 have already been published. I’ll save a discussion of the differences between our work and this other work for another post.

My paper’s citation:

Harrison, W. J., & Bex, P. J. (2015). A Unifying Model of Orientation Crowding in Peripheral Vision. Current Biology, 25(24), 3213–3219. http://doi.org/10.1016/j.cub.2015.10.052

References

  1. Pelli, D. G. & Tillman, K. A. The uncrowded window of object recognition. Nature Neuroscience 11, 1129–1135 (2008).
  2. Kwon, M., Legge, G. E. & Dubbels, B. R. Developmental changes in the visual span for reading. Vision Research 47, 2889–2900 (2007).
  3. Pelli, D. G., Palomares, M. & Majaj, N. J. Crowding is unlike ordinary masking: distinguishing feature integration from detection. Journal of Vision 4, 1136–1169 (2004).
  4. van Bergen, R. S., Ji Ma, W., Pratte, M. S. & Jehee, J. F. M. Sensory uncertainty decoded from visual cortex predicts behavior. Nature Neuroscience (2015). doi:10.1038/nn.4150
  5. Greenwood, J. A., Bex, P. J. & Dakin, S. C. Positional averaging explains crowding with letter-like stimuli. Proceedings of the National Academy of Sciences 106, 13130–13135 (2009).
  6. Freeman, J. & Simoncelli, E. P. Metamers of the ventral stream. Nature Neuroscience 14, 1195–1201 (2011).
  7. Strasburger, H. & Malania, M. Source confusion is a major cause of crowding. Journal of Vision 13, (2013).
  8. Ester, E. F., Klee, D. & Awh, E. Visual crowding cannot be wholly explained by feature pooling. Journal of Experimental Psychology: Human Perception and Performance 40, 1022–1033 (2014).
  9. Ester, E. F., Zilber, E. & Serences, J. T. Substitution and pooling in visual crowding induced by similar and dissimilar distractors. Journal of Vision 15, 1–12 (2015).
  10. Tamber-Rosenau, B. J., Fintzi, A. R. & Marois, R. Crowding in Visual Working Memory Reveals Its Spatial Resolution and the Nature of Its Representations. Psychological Science (2015). doi:10.1177/0956797615592394
  11. van den Berg, R., Roerdink, J. B. T. M. & Cornelissen, F. W. A neurophysiologically plausible population code model for feature integration explains visual crowding. PLoS Computational Biology 6, e1000646 (2010).
  12. Dakin, S. C., Cass, J., Greenwood, J. A. & Bex, P. J. Probabilistic, positional averaging predicts object-level crowding effects with letter-like stimuli. Journal of Vision 10, 14 (2010).

 

Why is crowding released at the saccade goal? A commentary by van Koningsbruggen and Buonocore

mechanisms behind perisaccadic increase of perception

At the start of 2013, I published a paper in The Journal of Neuroscience showing that, just prior to a saccade, the deleterious effects of crowding are released at the saccade goal. “Crowding” refers to the phenomenon where an object in peripheral vision becomes difficult to recognise when it is closely surrounded by other objects. There’s a demo and a longer explanation of crowding and my previous paper here.

In this week’s issue of the same journal, Martijn van Koningsbruggen and Antimo Buonocore published a Journal Club paper examining the potential cause of a pre-saccadic release of crowding. The paper is behind a paywall here — email me for a copy if you don’t have access. It’s quite interesting to read how others interpret my work, and I think these sorts of commentaries represent an important stage of peer review that occurs post-publication. Our formal response to this commentary has been published alongside the Journal Club paper, but here I’ll add some more thoughts about van Koningsbruggen and Buonocore’s explanation of our data.

In a nutshell, van Koningsbruggen and Buonocore’s main suggestion is that crowding may be released prior to a saccade because visual attention shifts to the saccade goal before the eyes move. The known relationship between eye movements and visual attention (e.g. Remington, 1980; Deubel et al., 1996) was part of my motivation for running the experiments in the first place. However, I don’t think pre-saccadic shifts of visual attention are adequate to explain our results, especially without a clear definition of “visual attention”. Our formal reply includes a summary of the specific reasons why “visual attention”, as it is used in the cognitive psychology literature, can’t fully account for our data. Depending on the authors’ definition of attention, there might be some explanatory power in their suggestion, but it seems to me that their use of “attention” actually describes an effect, such as a change in identification accuracy, rather than a mechanism, such as a change in the response gain of visual neurons.

To play devil’s advocate, I could, for example, argue that previous demonstrations of improved performance at the saccade goal were also the result of pre-saccadic changes in crowding, but have been called “attention” effects. Supporting this hypothetical argument, previous studies on eye movements and attention used stimulus configurations that would have been prone to crowding (e.g. Deubel et al., 1996; Kowler et al., 1995). My intention here is not to argue that this is in fact the case, but to demonstrate that simply attributing changes in performance to “attention” does not necessarily provide a meaningful explanation of the underlying neural mechanisms driving those changes. Britt Anderson has written a great article about distinguishing “attention” as an effect versus a cause in an open access article here.

Van Koningsbruggen and Buonocore bring up a few other interesting points about our study and its limitations which I mostly agree with, so it’s certainly worthwhile to read their article in full.

I’m very interested to carry on these discussions with other researchers, so please feel free to drop me a line or leave a comment here to share your thoughts. [Note that this was originally published on my old Harvard site, and there were a few contributions from people that may be worth reading: http://scholar.harvard.edu/willjharrison/news/why-crowding-released-saccade-goal-commentary-van-koningsbruggen-and-buonocore# ]

References

Our original article showing a release from crowding at the saccade goal:

Harrison, W. J., Mattingley, J. B., & Remington, R. W. (2013). Eye movement targets are released from visual crowding. Journal of Neuroscience, 33(7), 2927–2933. doi:10.1523/JNEUROSCI.4172-12.2013

Van Koningsbruggen and Buonocore’s response:

van Koningsbruggen, M. G., & Buonocore, A. (2013). Mechanisms behind Perisaccadic Increase of Perception. Journal of Neuroscience, 33(13), 11327–11328. doi:10.1523/JNEUROSCI.1567-13.2013

Other references:

Anderson, B. (2011). There is no such thing as attention. Frontiers in Psychology, 2, 1–8. doi:10.3389/fpsyg.2011.00246

Deubel, H., & Schneider, W. X. (1996). Saccade target selection and object recognition: evidence for a common attentional mechanism. Vision Research, 36(12), 1827–1837.

Kowler, E., Anderson, E., Dosher, B., & Blaser, E. (1995). The role of attention in the programming of saccades. Vision Research, 35(13), 1897–1916.

Remington, R. W. (1980). Attention and saccadic eye movements. Journal of Experimental Psychology: Human Perception and Performance, 6(4), 726–744.

For a great introduction to crowding, as well as a heap of crowding demos, check out:

Pelli, D. G., & Tillman, K. A. (2008). The uncrowded window of object recognition. Nature Neuroscience, 11(10), 1129–1135.

Peer commentary: what does remapped crowding mean?


Along with James Retell, Roger Remington, and Jason Mattingley, I recently published a paper in Current Biology in which we describe “remapped crowding”. You can read about that article and effect here and here.

Denis Pelli and Patrick Cavanagh have just published a Dispatch in Current Biology discussing my article (link below), and what it means for trans-saccadic object recognition. I spent the better part of my PhD reading articles published by Pelli and (separately) by Cavanagh, so reading something they’ve written together about my work is novel.

In the figure of their paper, Pelli and Cavanagh highlight two ways in which remapped crowding may come about. Under both hypotheses, just prior to a saccade the location attributed to a target object is erroneously assigned to two different positions: the object’s actual position, and the predicted, post-saccadic location of the object. The basic effect we described in our paper was that this object becomes difficult to identify when distractors are placed at its predicted location. One possible explanation for such “remapped crowding” is that remapping may shift the representation of an object’s features prior to object recognition (Figure 1E of Pelli & Cavanagh). Target- and distractor-object elements appear jumbled and imperceptible because all features are necessarily mixed in a common (early?) processing area. Alternatively (Figure 1F), relatively accurately processed featural information may be drawn from two spatially separate locations simultaneously during remapping. Because the visual system is trying to make sense of multiple conflicting inputs, distinguishing target and distractor becomes more difficult.

In our paper, in the second to last paragraph of the Discussion, we tended to favour the latter of the two suggestions because it seems more parsimonious based on previous work. However, the precise answer to the question will only come from further experimentation across a broad range of disciplines, and from independent lab groups. Some of my current work in psychophysics focusses on answering this question, but, for whichever hypothesis the behavioural research favours, we also need a plausible neural mechanism from the neurophysiology folk.

Here’s a link to Pelli and Cavanagh’s full article (email me if you don’t have access): http://www.sciencedirect.com/science/article/pii/S0960982213004302

Pelli, D. G., & Cavanagh, P. (2013). Object Recognition: Visual Crowding from a Distance. Current Biology, 23(11), R478–R479. doi:10.1016/j.cub.2013.04.022

(commentary on: Harrison, W. J., Retell, J. D., Remington, R. W., & Mattingley, J. B. (2013). Visual Crowding at a Distance during Predictive Remapping. Current Biology, 23(9), 793–798. doi:10.1016/j.cub.2013.03.050 )