What you see is what IT gets: Responses in primate visual cortex during natural viewing
Date Posted:
October 20, 2021
Date Recorded:
October 19, 2021
CBMM Speaker(s):
Will Xiao
Description:
Abstract: How does the brain support our ability to see? Studies of primate vision have typically focused on controlled viewing conditions exemplified by the rapid serial visual presentation (RSVP) task, where the subject must hold fixation while images are flashed briefly in randomized order. In contrast, during natural viewing, eyes move frequently, guided by subject-initiated saccades, resulting in a sequence of related sensory input. Thus, natural viewing departs from traditional assumptions of independent and unpredictable visual inputs, leaving it an open question how visual neurons respond in real life.
We recorded responses of inferior temporal (IT) cortex neurons in macaque monkeys freely viewing natural images. We first examined responses of face-selective neurons and found that face neurons responded according to whether individual fixations were near a face, meticulously distinguishing single fixations. Second, we considered repeated fixations on very close-by locations, termed 'return fixations.' Responses were more similar during return fixations, and again distinguished individual fixations. Third, computational models could partially explain neuronal responses from an image crop centered on each fixation.
These results shed light on how the IT cortex does (and does not) contribute to our daily visual percept: a stable world despite frequent saccades.
WILL XIAO: Today, I want to share some progress on my project studying visual neuron responses during free, natural viewing. Some of you have heard about this project in a flash presentation exactly two months ago, and today I'll share more results and go into more detail. I'll be talking about neurons in the macaque IT cortex, or inferior temporal cortex. Most of you probably already know IT is a high-level visual area in the object-selective ventral visual pathway.
So I want to start by explaining why I think it's important to study natural vision. One reason is an intriguing phenomenon of our visual perception: we are constantly making eye movements. This isn't news to most of you, but for the sake of being on the same page, please bear with me. When we look at a picture like this, our eyes scan all over the picture, illustrated by this trace in the middle and the x location over time on the right.
From the right plot you can appreciate that in each second, our eyes are stationary most of the time, but they stay at any particular place only for a fraction of a second. In between these stationary periods, which I'll call fixations, there are rapid eye movements known as saccades which move the eye from one location to another. This raises a problem.
We perceive a stable visual world. The world does not look like it's moving despite our eye movements. In fact, we cannot see our own eye movements in a mirror. Why does it matter? Well, if the world is actually moving with our eye movements, it would look like a video recorded on a shaky camera. Instead of something smooth like this video on the left, we'll see the world jittering like this video on the right.
This is just an analogy, but if you need more convincing, you can try to, very carefully and gently, press on one eye through the eyelid, and you should see the world jitter with the finger presses, which you wouldn't see if you were moving your eye normally. So how does the brain stitch together a stable representation from this unstable input? And where in the brain is this stitching happening?
This question of visual stability has been asked since at least the 1600s. By now, there's a good understanding of why we don't see the world moving with our eye movements, but there's virtually no account of where in the brain our stable visual perception is represented, or indeed what type of information we expect to find in the brain. Particularly for my project, there is precious little information about whether object-selective visual areas in the ventral stream contain any stable representation.
There are, to be sure, a couple of very notable studies, which I will discuss shortly. But for now, I use the problem of visual stability as an example of why we should study natural vision: mainly because we want to understand phenomena in our daily visual experience, phenomena such as invariance to eye movements, stable perception, and so on.
And this leads me to the second reason. We must study natural vision in order to test our understanding of vision as derived from more controlled experiments. The standard task for studying object selectivity, at least, is the RSVP task, or rapid serial visual presentation. Apart from how unnatural the stimuli are, usually isolated images flashed in random order, this task also contains no eye movements. The subjects are required to hold fixation on a dot like this red dot, so the task will not reflect any effects of eye movements.
And of course, all these unnatural aspects are totally intentional, because they make the responses easier and cleaner to interpret. But at the same time, these unnatural manipulations mean that at some point we must test generalization: generalization of understanding derived from these simplifying assumptions to the complex reality of natural viewing. I don't know if any of you think that studying these controlled paradigms is meaningful for its own sake, but if you do, I'd like to have that philosophical discussion offline.
For now, I will assume we ultimately care about natural behavior, and so we must take any conclusions, theories, models, and so on derived from controlled experiments and test them in natural vision. So I've just argued that it's important to study natural vision. In fact, it has been studied, even in the inferior temporal cortex that I'm going to talk about, in two remarkable studies, which I'll briefly summarize before explaining why I think there's still something left to do.
In the first study, from our own Jim DiCarlo and John Maunsell, the results show that whether during controlled viewing or during free viewing, the same stimulus generates the same responses. This is shown by the red and black lines, which represent peristimulus time histograms aligned to stimulus or fixation onset, respectively. Not only is the response magnitude the same in the two conditions, the dynamics are also practically identical.
The selectivity is also the same, here illustrated by the differential responses to two stimuli, the best target and the worst target. In the second study, by Sheinberg and Logothetis, they also showed that IT neurons had the same response magnitude and selectivity during free and controlled viewing, here shown by the difference across stimuli. So that's it. Case closed. The two studies agree, and my talk is done. Thank you.
[LAUGHTER]
Well, yeah, of course I'm kidding, but seeing what's left to be done requires looking at these studies in a bit more detail. And I want to preface by saying that I'm not criticizing the studies. They're really technically admirable and fun to read. I encourage you to check them out, but I want to explain why I'm still studying the same problem and what potential new findings I'm looking for.
In the case of the first study, they used these monochrome simple shapes which were put into a linear array for free viewing. I know this is really hard to see because of how zoomed out this is compared to what the monkey would see, but you can just barely make out there are, I think, 11 simple shapes here in a linear array. Still, the stimuli are not very memorable or natural.
Another way to say that is, if I look at two different arrays of these, I'm not sure I would see one overall scene as different from another. There's no so-called scene gist. In other words, I'm not sure I expect any stable representation that is specific in the sense of being different among different trials. To go into finer detail, in this study, as well as in the second study, free viewing actually means visual search.
In this study in particular, the visual scene did not stay constant but actually changed during each saccade, as represented by these breaks in the, I guess, space-time worms of the individual shapes. So the stimuli are basically turning on and off during the saccades. Moreover, the stimulus that's represented in this plot by this dark gray bar didn't turn on until during the final saccade.
So not only is there no stable visual scene, because the display is changing, it also rules out any predictive responses before the saccade, because the target simply wasn't there yet. And what do I mean by predictive responses? Well, this will be clearer in the second study. Here, they used more natural stimuli: a target object, this parrot, for example, that was semi-transparently superimposed on a natural image background.
The location of the target is indicated by this red circle, and there's a blow-up over here. This is also a visual search task, so the monkey is required to find the target and do a binary classification based on some arbitrary label that was learned. I'm actually lying in this title, because this study did report an unexpected result: responses during free viewing differed from what you would expect from controlled viewing. What that means is that neurons responded to the search target slightly before foveating the target. That's shown here in this dark black line.
For comparison, this lighter gray line shows the response aligned to the stimulus during controlled viewing. Obviously here, the response can't happen before the stimulus turns on. And making this effect seem more real, it actually depended on the distance from the target before the fixation, as shown in this plot. But a predictive response was present even at 12 degrees away, compared to the control condition.
And by the way, some of you might be wondering why the neuron wouldn't be responding the whole time, since the target is always on the screen, even before the monkey found it. I have not mentioned this, but even in IT, neurons have rather circumscribed receptive fields, much smaller than the size of the whole image. So most of the time, when the monkey is looking at some place on the screen, the target will usually be too far away for the neuron to respond.
And to convince you of this, here are the same responses on the left plotted in color, but also a comparison condition where the response is taken during the trial in the same-sized bins, but when the saccade actually ended up somewhere else rather than on the target; and there, there was no predictive response. So at face value, this result seems to contradict the previous study. I should say they are not strictly incompatible, for example because the visual scene was not constant in the first study, but there are further details that I won't go into unless you ask.
I'm introducing the two studies here in order to preface the question: what is left for us to learn about free viewing? I kind of primed the answers, so it should not surprise you that, in my opinion, it would be interesting to look at free viewing without any particular task, while using naturalistic, complex images. And it will be interesting to look at the whole response, the entire time, over all fixations.
These plots that I selected from the two studies only considered the last fixation in the trial, when the monkey found the search target. And I think only the second study analyzed the rest of the fixations, but not with a high level of detail. So these three points are intended for the experiment to approach natural behavior as much as possible. Perhaps more interestingly, I then want to specifically look at properties that could be unique to free viewing, such as predictive responses to the future, like those reported in the Sheinberg and Logothetis study; retrospective responses to the past, or history dependence; and potentially whether there's any evidence of a stable visual representation.
So finally, I'll get to exactly what I did. Here are my experiments and current results. First, we trained the monkeys to do a simple task. It wasn't really a task: the monkeys didn't have to do anything except look at the images while we tracked eye movements. This timeline illustrates one example experiment. We showed a sequence of images, each one lasting for 1.5 seconds or so, depending on the session. I will call the presentation period a trial, during which time the monkey can freely view the image.
And I'll hedge a bit here and say that, although it's not very natural to show each image relatively briefly and in a random sequence, these choices are not essential, because I've been able to reproduce the conclusions with each image lasting as long as the monkey cares to look at it. But since the majority of the experiments were run this way, I'm introducing it this way. The plot in the middle illustrates the scan path on one example image that was repeated in several trials. Each color represents the scan path in a different trial.
And here are some basic statistics of the looking behavior. As a reminder, in my analysis, fixations always alternated with saccades by definition. Fixations lasted about 200 milliseconds on average, and saccades about 50 milliseconds. That means the monkey made about four fixations, or saccades, per second. The average saccade was four degrees. This can be compared to the image size, which was set at 16 degrees, about the size of a laptop screen at normal viewing distance.
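To make this bookkeeping concrete, here is a minimal sketch of how an eye trace could be segmented into alternating fixations and saccades. The velocity-threshold approach and all parameter values here are illustrative assumptions; the talk does not specify the detection method actually used.

```python
import numpy as np

def segment_fixations(gaze_xy, fs=1000.0, vel_thresh=30.0):
    """Split an eye trace into alternating fixations and saccades.

    gaze_xy: (T, 2) gaze position in degrees of visual angle
    fs: sampling rate in Hz (assumed value)
    vel_thresh: speed cutoff in deg/s separating saccades from
        fixations (assumed value)
    """
    speed = np.linalg.norm(np.diff(gaze_xy, axis=0), axis=1) * fs  # deg/s
    moving = speed > vel_thresh
    # Segment boundaries: samples where the eye switches between
    # moving (saccade) and stationary (fixation)
    changes = np.flatnonzero(np.diff(moving.astype(int))) + 1
    bounds = np.concatenate(([0], changes, [len(moving)]))
    fixations = []
    for start, stop in zip(bounds[:-1], bounds[1:]):
        if not moving[start]:  # stationary segment = fixation
            fixations.append((start / fs, stop / fs,
                              gaze_xy[start:stop].mean(axis=0)))
    return fixations  # (onset_s, offset_s, mean position) per fixation
```

On traces like the ones shown, the stationary segments should come out around 200 ms and the moving segments around 50 ms, matching the statistics above.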
I'll play a movie of an example trial. An image will be shown briefly, and this red dot, which represents the eye location, will start moving. The movie will be in real-time speed if it plays. Here it goes. Yeah. So I think there is some stuttering of the eye trace. It should move smoothly, so trust me that it does. And, I guess just for fun, I aligned this video, instead of in real-world coordinates, with the fixation center, so now you'll see the image jittering, just to give a sense of what an unstable visual representation would look like.
OK. It's easy to imagine, but I guess it's fun to see. So while the monkey was doing that, we recorded neural responses in the inferior temporal cortex, or IT. I don't need to belabor this. IT is a high-level visual area, but it also sits only one or two synapses away from the hippocampus, where memories are formed. So it's not really a crazy place to look for whether visual stability has been achieved.
And for starters, here's the average firing rate of the neurons in one experimental session, showing a time window containing a full image presentation. So you can see the neurons responded more during presentation, as expected for visual neurons. But now the interesting question is, did fixations and saccades during presentation modulate the response? If I simply realign the same responses to fixation onset, it looks like there's not much modulation. That's shown here by the blue trace.
The black is the same as on the left, just with a more zoomed-in x-axis. So did the neurons not care about fixations? Well, that wasn't the case, but we can only see it after we take selectivity into account, as I will now show. And to really test specificity to individual fixations, I'll start with an easy-to-understand analysis using face neurons.
Face neurons respond more strongly to faces than to non-face images, and that's useful because it's really easy to draw regions of interest for faces. So that's what I did here. And I labeled the fixations, color-coded here, red for face, white for non-face, depending on whether the fixation fell on a face. And we selected face neurons from our recordings based on what I call the zeroth fixation, which is the period right after image onset but before the first eye movement, a period that's essentially the same as RSVP.
Because there was no gaze control at image onset, the eye could happen to be on any part of the image. So it could be either inside or outside a face, and we have trials for each of the two conditions. And this plot just confirms that we are looking at face-selective neurons. It's also worth emphasizing that this selectivity implies a spatial specificity, because the image containing a face was present on the screen the whole time, yet the response depended on whether or not the gaze was on the face.
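As a sketch of the labeling step just described, something like the following could assign each fixation to the face or non-face condition. The rectangular face boxes are a simplifying assumption for illustration; the actual regions of interest were drawn by hand.

```python
import numpy as np

def label_fixations(fix_xy, face_boxes):
    """Mark each fixation as on-face (True) or off-face (False).

    fix_xy: (N, 2) fixation positions in image coordinates (degrees)
    face_boxes: list of (x_min, y_min, x_max, y_max) face regions for
        this image (rectangles here for simplicity; the real ROIs were
        hand-drawn)
    """
    on_face = np.zeros(len(fix_xy), dtype=bool)
    for x0, y0, x1, y1 in face_boxes:
        inside = ((fix_xy[:, 0] >= x0) & (fix_xy[:, 0] <= x1) &
                  (fix_xy[:, 1] >= y0) & (fix_xy[:, 1] <= y1))
        on_face |= inside  # a fixation inside any face box counts as face
    return on_face  # True = red (face) fixation, False = white (non-face)
```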
And I guess also relevant here is that all the neurons I'm showing today have foveal receptive fields, so I'm using gaze location and receptive-field location more or less interchangeably. Now, the interesting question is, how did the same neurons respond after the eyes started moving? And here are the responses aligned to fixation onset, now grouped by two categories, either before or after each saccade.
So the dashed lines, the dashed blue and orange lines, are the critical conditions. They represent when the viewed category was different across a saccade. For example, in this dashed orange trace, the eye would have moved from a non-face part of the image to a face part of the image. And you can appreciate that the neural responses followed where the fixation was and roughly matched the category of the fixation at that time.
So I think that's a really nice demonstration of specificity to individual fixations. And just in case you're wondering why this doesn't show up in the previous plot, of course, that was because the blue trace there was an average over all of these conditions, and most of the selectivity-modulated responses averaged out. So this analysis already shows a break between IT neurons and perception, but one downside is that it's limited to face-selective neurons only.
So what about the rest of IT? Ideally, we could test fixation specificity for any neuron without having to draw the appropriate region each time, which may anyway not be as clearly defined as faces. To do so, I'm going to show another analysis. But before that, I just want to quickly relate back to one of the studies I introduced, the Sheinberg and Logothetis study. If you recall, I was pointing out this anticipatory response, which in this case, if we take the non-face-to-face saccades as an example, would mean that the response starts to rise before time 0.
And as you can already kind of see here, there's little difference between the dashed orange line and the solid blue line. Is this tiny difference real? We can look at it as a function of saccade size, because we would expect this rise to be larger for smaller saccades based on their results. And that split is done in the upper right plot here, for non-face-to-face saccades again, but in different saccade-size bins.
So there's no clear trend for the response before time 0, and if anything, I think the responses are a little higher for large saccades than for small saccades. So my results are more in line with no anticipatory responses, but I'm not entirely ruling them out, although I will not show the analyses that specifically look at this, because the results are a bit ambiguous. I'm still trying to figure it out.
But we will have another look at this issue of the switching time of selectivity in the next analysis, which is designed for general selectivity. This analysis is based on return fixations, which are fixations on the same image that landed near each other, either within the same trial or across trials when the same image was repeated. This figure illustrates scan paths as before, but now the points connected by thick lines indicate which fixations are paired as returns. And I define that with a threshold of 1 degree of visual angle.
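A minimal sketch of the pairing step, under the 1-degree criterion just described (the pooling across trials and any exclusion criteria are simplified here):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def return_fixation_pairs(fix_xy, threshold_deg=1.0):
    """Find pairs of fixations on the same image that landed near each
    other ('return fixations').

    fix_xy: (N, 2) fixation positions pooled over all trials that
        showed the same image
    threshold_deg: maximum separation in degrees of visual angle
    """
    dists = squareform(pdist(fix_xy))          # pairwise distances, deg
    i, j = np.triu_indices(len(fix_xy), k=1)   # each unordered pair once
    near = dists[i, j] < threshold_deg
    return list(zip(i[near], j[near]))         # index pairs of returns
```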
Return fixations are useful because, if neurons care about exactly where each fixation lands, then between a pair of return fixations, responses should be more similar than between random fixations. Furthermore, the similarity should be time-specific, which is to say anchored to the interval of the fixation being paired. To illustrate this analysis, this scatterplot shows the firing rate of one extreme example neuron across many return fixations. Each point represents one pair of return fixations, and the two subplots show two time bins relative to fixation onset.
Each time bin is 25 milliseconds in size and centered on the times labeled here. So the x- and y-axes show the firing rates in the two fixations of each return pair. And I hope you can see that, although the effect is subtle, the points gather more closely to the diagonal in the plot on the right, which shows that responses become more similar, but only after the paired fixation started. This subtle effect can be captured well with Pearson's correlation, and we can plot this correlation over time.
To make it even better, we can plot for comparison the case where the previous fixation, the one before time 0, was paired, rather than the current one, the one after time 0. From this you can appreciate that responses were more similar during the period of the fixation being paired. So this is for one example neuron, and here is an average across neurons. The lower histograms show the boundaries of the previous and current fixations, which gives a sense of how long you should expect this consistency to last, keeping in mind that, of course, fixations nearby in time are also self-similar because of the generally small saccade sizes.
So that's why the correlation doesn't drop to zero after the fixation ends. This qualitatively agrees with the face-specific analysis in showing that responses are tied to individual fixations. And the benefit of this analysis, of course, is that we can now repeat it across arrays with different selectivity profiles without having to define ROIs for each type of selectivity.
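Putting the last few steps together, the time-resolved consistency measure might be computed as in this sketch, where the pairing would come from something like `return_fixation_pairs` above, and the bin size follows the 25-ms bins described in the talk:

```python
import numpy as np

def paired_consistency(rates_a, rates_b):
    """Pearson correlation of firing rates across return-fixation pairs,
    computed separately at each time bin relative to fixation onset.

    rates_a, rates_b: (n_pairs, n_bins) binned firing rates for the
        first and second fixation of each return pair (e.g., 25-ms bins)
    """
    n_bins = rates_a.shape[1]
    return np.array([np.corrcoef(rates_a[:, t], rates_b[:, t])[0, 1]
                     for t in range(n_bins)])

# Aligning the same pairs on the *previous* fixation instead of the
# current one gives the comparison curve described in the talk.
```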
And why does this result matter? I just want to emphasize that our perception probably isn't sensitive to individual fixations, and it also probably doesn't change every few hundred milliseconds. But this result shows that IT responses do both: they faithfully track individual fixations and change with each new fixation. And since IT is already at such an advanced stage of visual processing, it's interesting that even here the brain has not solved the problem of visual stability. OK.
AUDIENCE: [INAUDIBLE]
WILL XIAO: Yes?
AUDIENCE: You're using the idea that if the brain had solved the problem of visual stability, there would be no change at all across a saccade. Is that the idea?
WILL XIAO: So that's a great question.
AUDIENCE: That's a very strong version of a solution to visual stability, right?
WILL XIAO: That is a great question. And I think I'm going to show an analysis later, specifically about visual stability, and I'd love to hear your predictions about what stable should look like compared to not stable. I don't necessarily think that there would be no change, because a stable representation could still be updating after each saccade.
But I guess, to preface it a little, I expect that if you look at the image for longer, later in time, the effect of each fixation should be smaller than when you first saw the image, like at the first fixation. But I'll show a sketch of what I think it should look like.
AUDIENCE: I'm going to ask you one other thing. So this is very nice. It seems to me it shows two things, or shows at least one thing. It shows that there's some dependence on fixation position, such that fixating on the same position leads to a correlated response across that scatterplot, but those correlations are kind of low. And I'm wondering whether the fact that they're so low is some kind of random noise, or if you think there really is an important difference in the response. I don't know how you tell.
WILL XIAO: Yeah.
AUDIENCE: There are a lot of cases where there's no obvious source of that difference, like when the eyes land in the same place at the very beginning of stimulus onset, or something like that, so you can pull apart effects of the previous fixation. You see what I mean?
WILL XIAO: Well, doesn't the green line kind of do that? Where the purple line would be the current fixation is paired and the previous [INAUDIBLE]. And for the green, the previous is paired, but the current isn't. So I guess the correlation is temporally specific, and depends on which fixation occurred.
AUDIENCE: Yeah, but I guess I'm asking, why are those correlations not [INAUDIBLE]? Why aren't they [INAUDIBLE] values of that? Oh, it's a biological system, [INAUDIBLE]. But what I'm wondering, is there anything beyond just some random [INAUDIBLE], is there-- anything that's actually meaningful-- I mean, [INAUDIBLE] has been claiming that you get different responses to the same [INAUDIBLE] stimulation if it's in different parts of the [? movie. ?]
And so I'm wondering whether you're seeing [INAUDIBLE]. The fact that these correlations are relatively low could be taken to mean that there's some kind of [INAUDIBLE]. We don't know due to what. Maybe where the eyes came from before that fixation. And I wonder whether you think that those relatively low correlations are telling us that not only is there dependence on where you look, but that where you look doesn't determine everything above [INAUDIBLE].
WILL XIAO: Yes, oh, I totally agree with that, because obviously the correlations aren't one. But is it something that we can control and measure, or is it something internal that I don't know? I'm emphasizing the part I can explain, but I totally agree with you. This is showing that the other 0.8 of the correlation is something else we don't know.
And I guess, sorry, just to convince you that this effect, at least the part that I can explain, is real: I don't have the slide right now, and I can't draw, but if you look at these return fixations, the threshold was 1 degree. I can adjust the threshold and look at the correlation as a function of how close the return fixations need to be.
And the correlation drops when I increase the threshold and increases when I decrease the threshold, down to a quarter of a degree, which is the limit of our eye tracking. That says that not only is the analysis really sensitive to location, but IT is really sensitive to fixation location. Which is kind of surprising, like [INAUDIBLE] degree specificity in IT. I think it's cool.
But, yeah, and I guess just another part of the response is--
AUDIENCE: I have another quick comment, I think it's also worth pointing out that even in an RSVP paradigm where [INAUDIBLE] exactly the same [INAUDIBLE], correlations are never one.
WILL XIAO: That's what I'm showing you here, [? Gabrielle. ?]
AUDIENCE: What kind of correlations would you get for this?
WILL XIAO: [? Gabrielle, ?] that's what I'm showing here. There are too many things going on, but on the left side, the dashed gray line is the single-trial, trial-to-trial self-consistency during RSVP.
AUDIENCE: With controlled fixations.
WILL XIAO: With controlled fixations.
AUDIENCE: [INAUDIBLE] OK, and that's what I'm asking. So they are much higher at 0.6.
WILL XIAO: No, the solid line is the trial average. So I don't remember how many, but maybe 10 trials, or 20 trials split into sets of 10. But also [INAUDIBLE], so it's kind of like 20. But if you look at single trials, it's much lower. And it's actually in line with what I saw.
AUDIENCE: That was my question.
WILL XIAO: Yeah, and I think this is actually in line with other labs' data, too. They just never publish single-trial consistency, because it's not meaningful when every analysis is done on averaged results anyway. OK, anyway, that was a lot of [INAUDIBLE]. But was there another question, [? Carlos? ?]
AUDIENCE: Actually, I wasn't going to ask you that exactly, but [INAUDIBLE]?
WILL XIAO: Yeah, yeah, so that's going to be relevant in the next analysis I'll show, too. And it's kind of-- I'm running really short on time, so I'll just zoom by it I think.
AUDIENCE: Hey, Will, can you just-- sorry, just clarify. Certainly the neurons are sensitive to the location, and it's not stability in the sense of the neuron; that's clear in the data. But are you saying that they're actually not reproducible when all the retinal conditions are aligned? You sort of implied that a minute ago, but then you showed something that-- [INAUDIBLE].
WILL XIAO: Well, I guess in this analysis the threshold was one degree. So then one degree is a lot of retinal mismatch.
AUDIENCE: Right, exactly. So the strength of the correlation, you can work that all out. In fact, I did that as a postdoc. And as far as we could tell, things were almost identical once you aligned conditions. So are you saying when you return the eyes to the exact same position on the image that you will get a different pattern of response out of IT than the previous response?
So two exact repeats in the context of free viewing of the exact same position-- now, I know you don't have that exactly, but you can infer it given experimental noise. Do you think there's a difference or not? Does that make sense?
WILL XIAO: Do I think that if I exactly matched the retinal stimulation, which I'm not doing here because I have a finite threshold, the correlation would go to one, or is there something internal that's not controlled? Right now I believe yes. But I think, really, I just don't know enough.
I think the rest of the brain could be doing a lot of things. But, yeah, I think I need to look at the derivations you mentioned you did, and maybe the data, to see whether that's [? true. ?] But I guess also, recording-wise, we use multi-electrode arrays, so it's hardly ever an exact single unit.
And I think in your own studies using multi-electrode arrays, you also used a cutoff for selecting neurons that's less than a correlation of one. And that's with trial-averaged responses, too. For example, I think the Science paper used 0.6. That's in line with our average responses as well. So I guess, yeah, I just don't know what would happen with perfect recording.
I guess I'm really going to zoom by the modeling results. They are exactly what you would predict, so there's no need to dwell on them. The point is to ask, is selectivity the same during free and controlled viewing, and can it be captured by the same kind of model, where the state-of-the-art model would be a CNN with a linear mapping.
The way I did that is to use a pre-trained CNN and a trained linear mapping to predict neural responses. But to adapt it to free viewing, I used crop windows centered on each fixation. And here is the plot. In addition, I took those crop windows and ran them in a separate RSVP experiment.
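Here is a minimal sketch of that pipeline. The specific network (ResNet-50), layer, crop size, and ridge regression are placeholder choices for illustration, not necessarily what was actually used:

```python
import numpy as np
import torch
from torchvision.models import resnet50, ResNet50_Weights
import torchvision.transforms.functional as TF
from sklearn.linear_model import RidgeCV

# Pre-trained CNN used as a fixed feature extractor (everything up to,
# but not including, the final classification layer).
cnn = resnet50(weights=ResNet50_Weights.DEFAULT).eval()
feature_extractor = torch.nn.Sequential(*list(cnn.children())[:-1])

def fixation_features(image, fix_xy_px, crop_px=224):
    """CNN features from a crop window centered on each fixation.

    image: (3, H, W) float tensor (input normalization omitted for
        brevity); fix_xy_px: (N, 2) fixation positions in pixels;
    crop_px: crop window size in pixels (an assumed value)
    """
    feats = []
    for x, y in fix_xy_px:
        top, left = int(y - crop_px / 2), int(x - crop_px / 2)
        crop = TF.crop(image, top, left, crop_px, crop_px)  # zero-pads at edges
        with torch.no_grad():
            feats.append(feature_extractor(crop.unsqueeze(0)).flatten().numpy())
    return np.stack(feats)

def fit_mapping(features, rates):
    """Linear mapping from CNN features to firing rates; fit on RSVP
    responses, then test generalization on free-viewing responses."""
    return RidgeCV(alphas=np.logspace(-2, 4, 7)).fit(features, rates)
```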
So now in this plot, the gray lines show neural self-consistency. On the left there are two conditions, the solid and the dashed, to illustrate the difference when you use trial-averaged versus single-trial responses. The blue lines show the model fit on RSVP data. And you see it captures a good portion, although far from all, of the responses.
Then I tested the generalization of this model, represented by the blue line, on free-viewing data. That's shown by the results on the right. And you can see it's doing basically as well as the self-consistency allows.
In the meantime, I also tried to fit a model directly on the free-viewing data; that's represented by this orange line. And it's not doing even as well as the generalizing model, which is kind of interesting, but that could be because the data are just more variable and limited in free viewing.
OK, so to summarize, I think these analyses all show that IT neurons track individual fixations with about the same selectivity in free viewing and RSVP, limited by the resolution of the modeling analysis, of course. And they can be captured by the same model that's good for RSVP.
But these analyses were designed specifically to look for fixation specificity. In other words, they're not trying to look for any special properties of free viewing, for example, stability. And it's possible that part of the neural responses is not fixation specific. As [? Nancy ?] pointed out, the other 0.8 of the correlation could contain some other information.
So here I'm going to be really speculative. I'll show one analysis and discuss what I expect visual stability would look like in this analysis. I'm not certain that my expectation is right, but I will also show data on how IT neurons actually behave in this analysis.
So I'll go back to my preferred metric of self-consistency, and let's look at one trial over time. Suppose it has [INAUDIBLE] 1 to 2 seconds of free fixations. What would happen if we compared responses over time for two trials that showed the same image?
Well, for fixation-invariant responses, I would expect the self-consistency to go up, just because at the beginning of the trial, the eyes could happen to be at different parts of the image. But by the end of the trial, if our intuition is any guide, we feel like we're seeing the same image even though we started at different places. And so the self-consistency of such a representation should rise after we make a few fixations on the image.
Contrast that with what I've been showing, the fixation-specific self-consistency. That would be pretty high at the beginning if I paired the same image and the same fixation locations. And it might start to rise a little, but eventually it would be bounded by this blue line, because at the end the neuron would care less about the updating information and more about the accumulation over time.
And I can also contrast that with deliberately restricting the analysis to fixations that are far away from each other. Then, again, at the beginning I would expect the responses to be dissimilar. But at the end, it should not matter that the fixations happened to be far apart, as long as the whole trial was on the same image and the monkey sufficiently explored the image.
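As a sketch, these expected curves differ only in how trial pairs are selected at each time point; something like the following could compute the near- and far-fixation versions (the thresholds here are illustrative, not values from the talk):

```python
import numpy as np

def stability_curves(resp, gaze, near_deg=1.0, far_deg=4.0):
    """Self-consistency over trial time for pairs of trials showing the
    same image, conditioned on how far apart their concurrent fixations
    were. Threshold values are illustrative assumptions.

    resp: (n_trials, n_bins) binned responses over the presentation
    gaze: (n_trials, n_bins, 2) gaze position at each time bin, degrees
    """
    n_trials, n_bins = resp.shape
    i, j = np.triu_indices(n_trials, k=1)  # all trial pairs
    curves = {"near": [], "far": []}
    for t in range(n_bins):
        d = np.linalg.norm(gaze[i, t] - gaze[j, t], axis=1)
        for name, mask in (("near", d < near_deg), ("far", d > far_deg)):
            if mask.sum() > 2:
                r = np.corrcoef(resp[i[mask], t], resp[j[mask], t])[0, 1]
            else:
                r = np.nan  # too few pairs to estimate a correlation
            curves[name].append(r)
    # A "near"/"far" gap that closes late in the trial would be the
    # signature of a fixation-invariant, stable representation.
    return {k: np.array(v) for k, v in curves.items()}
```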
This is what I expect, and here is what the IT data actually show. The self-consistency conditioned on the same image is not rising over time. In fact, all the self-consistencies are dropping. I don't know exactly why. It could be related to adaptation, but I just don't know.
But the spread between pairing fixations and deliberately keeping fixations far away from each other consistently resulted in higher and lower response similarity, respectively, over the whole trial. So I don't know if people are convinced, but to me this suggests that IT is not basically accumulating.
Obviously, a monkey that started at different places on the image in different trials could simply not have explored the whole image sufficiently. And, yeah, that's a possibility. But before we look really hard for a stable visual representation, I do want to recheck our assumptions and ask whether the stable visual world is just an illusion.
Of course, there are a bunch of qualifications, because obviously we can infer where things are without looking at them, in typical contexts. And if we specifically want to remember where some things are, we can. But when we're not specifically remembering any particular thing, there's evidence showing that visual search doesn't use memory.
Here I'm playing devil's advocate and saying there's no stable representation as an automatic process. So if people have references that argue against that, I would love to hear about them. I did not show an analysis of history dependence, but I'm going to conclude that, so far, there's no evidence of a stable representation.
And I think I'll stop here. I have more analyses to show if people are curious. But, well, OK, just quickly: if everything is predicted by what we already expect, why does it matter? I think, first, it's a good vindication not only of controlled viewing, but also of the previous studies on free viewing, even though they did not use fully free-viewing paradigms.
And I think the analysis methods developed here are directly applicable to natural behavior. There are already studies that simultaneously record head-mounted camera video, eye tracking, and neural signals, and I think these analyses can already be applied to those data to make inferences directly about natural behavior. So I think methodologically this is useful. With that, thank you for listening, and I'm happy to take questions.