Tutorial: Using Decoding to Understand Neural Algorithms (1:10:34)
Date Posted: August 13, 2018
Date Recorded: August 13, 2018
CBMM Speaker(s): Ethan Meyers
Description:
Ethan Meyers, Hampshire College/MIT
Introduction to neural decoding methods for studying the neural representations of sensory information in the brain that support recognition, their modulation by task-relevant information from top-down attention, and the existence of sparse, dynamic population codes.
Download the tutorial slides (PDF)
PRESENTER: I'm going to talk a little bit today about using neural decoding to understand processing beyond the ventral visual pathway. Or, I'll start by talking about the ventral visual pathway. Then we'll go beyond that. So let me get started.
So you all saw this morning, in Jim DiCarlo's talk, how he described the representations in the ventral visual pathway. And as you see, those representations, as you're moving up, become really useful for doing things like recognizing objects. And did he talk about what the output of the ventral visual pathway is and where that goes? Does anyone know?
AUDIENCE: The frontal areas.
PRESENTER: Yeah, so it goes to frontal cortex and medial temporal lobe for memory. But most importantly, it goes to the homunculus. So did anyone tell you about the homunculus? It's the structure that reads out from the ventral visual pathway. And if you recognize a car, it will tell you how to get in that car and drive to the store. It's a very powerful part of the brain, depicted here to look like Christof Koch.
But unfortunately, the homunculus doesn't really exist. Actually, there are other parts of the brain that need to somehow use the information in the ventral visual pathway to control behaviors. And so I'm going to talk a little bit about how, for example, the prefrontal cortex can go back and modify other representations there. Later in the summer school, Christof will tell you about consciousness, which is above my pay grade. That's something to look forward to.
So the way I view my goal, in terms of our research, is to try to understand neural algorithms. That means trying to understand the step-by-step processing by which light entering the eye produces more complicated behaviors. And this is obviously inspired by people such as these. Does everyone know who that is? Anyone know who that is? That's Tommy Poggio. And does anyone know who that is?
AUDIENCE: David Marr.
PRESENTER: Yeah. So that's David Marr. Are people familiar with David Marr's work? So if not, you should read his book Vision, or at least the first few chapters. He describes how the brain can be understood at different levels, from the implementation level, which is like neurons, to the algorithms and representations that the brain uses to solve problems, up to the computational goals. And so a lot of what I focus on is trying to understand the algorithms and representations that can be used to solve particular tasks.
So to try to understand those neural algorithms, what I do is I collaborate with experimentalists, usually, who do recordings in monkeys. And sometimes I do some studies on humans with EEG. Unfortunately, when I collaborate with experimentalists, they usually give me data that looks like this-- spiking rasters, which hopefully you're familiar with, or these messy EEG waves. They don't give me little pieces of algorithms. So the first thing I had to figure out was, how do I turn this stuff that's noisy and messy into useful information that I can use to build up algorithms to solve tasks?
And so in particular, I focus on two things. I try to understand neural content, which is what information is in a particular brain region at a particular point in time. And then I also try to understand neural coding, which is, how does the activity-- the actual patterns-- represent information? And there's a little bit of a chicken-and-egg problem between these two. In order to understand content, you have to know a little bit about how the information is coded. And to understand coding, you need to understand what the content is. But I start with some initial guesses and then try to refine, based on empirical results, what the best representations are.
So at the moment, this is the way I think about it. So I think information is contained in patterns of neural activity. I think this is similar to how Jim and other people think about it-- hopefully not too controversial. So for example, if you showed an image of a car and you recorded from a population of neurons, you're going to find that some neurons fire more than others. And it's this pattern across neurons that carries the information. Also, I sometimes represent it in the talks as these colored squares where darker reds mean higher firing rates and blues mean lower. But the notion is that what is important is this pattern.
And so if I now have a way to tell what information is there, the idea is to see how the information is traveling in different brain regions. So if I collaborate in a study where we're recording from IT, I can decode that there's cat information there at a particular point in time. And then maybe, if we looked into the prefrontal cortex, we might see some sort of statement like, if it's a cat, execute motor command Y. And then you could look at the motor cortex and see how command Y is executed. And so by piecing this together, we can see how tasks are solved.
That's the goal. A little ambitious. Obviously, not completely there yet. But I'll show you steps we're taking in that direction.
So to outline what I'm going to talk about, I'm going to first start with the basics of neural decoding. Are people at all familiar with that technique? I'll just go through it really quickly. I think Jim was talking a lot about it this morning, or used it, at least. So I'll describe a little bit about the methods. And then I'll talk about representations in the ventral visual pathway and how I think the goal of most sensory processing systems is to build up what I call abstract representations. These are representations that are useful for behavior. So they go from physical properties into a representation that you can act upon. And then I'll show how these representations can be modified by task demands. And this is often called attention.
And then, after the ventral visual pathway, we think information maybe goes to the prefrontal cortex. And I'll show, in those brain regions, information is really very selectively represented if it's task relevant. And otherwise, those representations aren't there, or that information is not there. And I'll talk about how information is coded there through, often, not a single pattern of activity, but often through sequences of activity, and often sparse subsets of neurons.
And then, if there's time, I'll talk about another study on pop-out attention where I'm trying to really trace the whole thing and try to get a little circuit of information flow. Does that seem reasonable?
So let's start with the neural decoding methods. What neural decoding is-- what you try to do is predict what stimulus was shown based on neural activity. So it's a function from neural activity back to the stimulus that elicited it. And this is a technique that's been around for a while, over 30 years. It's been used in the motor system in brain-computer interfaces. Here's an image of someone who's paralyzed, but they have electrodes in their motor cortex, and they can control a robotic arm using just their thoughts. So that's the same technique I'm using.
It's also been used in the hippocampus by people like Matt Wilson to tell where a rat is in a maze. There's been computational work developing algorithms to do this decoding. And it's also used in the fMRI community, where it's called multi-voxel pattern analysis. Are people familiar with that technique? So it's basically the same thing.
So very quickly, it just works like this-- in a typical experiment, we'll have a monkey looking at an image on a screen. And we record a bunch of neural activity. And we take that activity and we feed it into a machine-learning algorithm called a pattern classifier. And what that classifier does is it learns the association between neural activity and a particular image. And then we show another image, we get another pattern out, feed that into the algorithm. And it learns to associate that new image with the next pattern.
And so we do that for all images in our data set. And once those relationships have been learned, we take another set of data. And in that test set of data, we want to see whether those relationships are reliable. So again, what we're going to do is, we get a new trial where the monkey is looking at one of the objects that we had seen before in the training set. We record that and we feed it into the algorithm. But now we have the algorithm make a prediction.
And so here, it predicted that a kiwi was shown. And that was what was actually shown. So we marked that as being correct. And then on another trial, we might show a car, get a different pattern out. And it predicted a face. And so we marked that as incorrect. And so what we want to plot is, how often are we correct as a function of time? And that can be converted into a measure of information. Is that clear to everyone? So this is what I'm going to be using for the whole time.
Just a few more details-- typically, experiments are actually done offline. So I record all the data first. Here are a bunch of these population patterns for trials of, let's say, fish and cats. And I just split them into different subsets of data. And I train the classifier on two of the subsets and test on the last one. And then I just do that for all the different permutations. That's called cross-validation.
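In code, the cross-validation procedure described here might look like the following minimal Python sketch. It is not the speaker's toolbox: the data are synthetic, and scikit-learn's logistic regression is used as a stand-in linear classifier.

```python
# Minimal sketch of cross-validated decoding, assuming X is an
# (n_trials, n_neurons) matrix of firing rates and y labels the stimulus
# shown on each trial. All data here are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_neurons, n_stimuli = 210, 132, 7
y = np.repeat(np.arange(n_stimuli), n_trials // n_stimuli)
# Each stimulus gets its own mean population pattern, plus trial noise.
means = rng.normal(10, 3, size=(n_stimuli, n_neurons))
X = means[y] + rng.normal(0, 2, size=(n_trials, n_neurons))

# Split trials into 3 subsets, train on 2, test on the held-out one,
# and rotate through all the splits -- the cross-validation described above.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=3)
print("accuracy per split:", scores, "chance:", 1 / n_stimuli)
```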
And just one other methodological note-- a lot of times, I don't have data that's actually recorded simultaneously. So in many studies, on the first day, they recorded one neuron; on the second day, they recorded another neuron. So I don't have population activity. But what I can do is, I can create these pseudo-populations.
And what I do is, I randomly select, let's say, a trial where a kiwi was shown from the first neuron, from the first day, and then a different random trial where a kiwi was also shown. And I just paste those together to get a fake population that I pretend was recorded simultaneously. And I do a lot of permutations of this to see the average amount of information over different pseudo-populations. Obviously, the potential downside is that some information could be lost by the fact that these are not actual simultaneous recordings. I've looked at that question a little bit, and it doesn't seem like very much information is lost. But I would love to collaborate with other people who are collecting larger data sets to test this further. So if you're in that position, let me know.
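As a rough sketch of that pseudo-population construction (variable names and trial counts are invented for illustration):

```python
# Sketch of building one pseudo-population "trial" from neurons recorded on
# different days. responses[i] holds neuron i's single-trial firing rates to
# one stimulus (say, the kiwi); trial counts can differ across neurons.
import numpy as np

rng = np.random.default_rng(0)
responses = [rng.poisson(5, size=rng.integers(20, 40)) for _ in range(132)]

def pseudo_population_trial(responses, rng):
    # Randomly pick one trial from each independently recorded neuron and
    # paste them together, as if they had been recorded simultaneously.
    return np.array([rng.choice(r) for r in responses])

fake_trial = pseudo_population_trial(responses, rng)  # shape: (132,)
```

The full analysis then repeats this resampling many times and averages the decoding accuracy over the different pseudo-populations.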
So let me give you an example of what one of these classifiers looks like. This is one I use often. It's very, very simple. So to train this classifier, what I do is-- so here's an example of four training patterns for each class. These are, again, these population vectors. And what I'm going to do is, I'm just going to average the activity of each neuron together. So I go from four patterns for each category to one. And that's all that training involves here.
And then, to test the classifier, I have a new data point that wasn't in the training set. And I just compute the correlation with each of those prototype vectors. And whichever one has the highest correlation, that's the prediction. So here, the test point maybe looks most similar to this pattern for a kiwi, so the prediction would be kiwi. It can be as simple as that. Or you can use Bayesian classifiers or a whole bunch of other ones. I generally don't use anything too complex. I wouldn't really want to use a deep neural network on this, because I'm trying to assess what information is directly in the population, not whether I can build a brain on top of this other real brain and pull out that information.
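Here is roughly what that maximum correlation coefficient classifier might look like in code-- a sketch with made-up data, not the actual toolbox implementation:

```python
import numpy as np

def train_max_corr(X_train, y_train):
    # "Training" just averages the population vectors within each class
    # to get one prototype pattern per category.
    classes = np.unique(y_train)
    prototypes = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    return classes, prototypes

def predict_max_corr(classes, prototypes, X_test):
    # Predict the class whose prototype has the highest Pearson correlation
    # with the test pattern.
    preds = []
    for x in X_test:
        corrs = [np.corrcoef(x, p)[0, 1] for p in prototypes]
        preds.append(classes[int(np.argmax(corrs))])
    return np.array(preds)

# Tiny usage example: 4 synthetic training patterns per class.
rng = np.random.default_rng(0)
X_train = rng.normal(10, 2, size=(8, 132))
y_train = np.array([0] * 4 + [1] * 4)
classes, prototypes = train_max_corr(X_train, y_train)
print(predict_max_corr(classes, prototypes, X_train[:2]))
```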
And so basically, there's an analogy between simple linear classifiers and the information that's available to downstream neurons. So if this is incoming activity from a population and I'm trying to decode it, a downstream neuron would have a number of synaptic weights. And it's the input times the synaptic weights that generates the action potentials at the output, if it crosses a threshold. Hopefully that's clear. And those synaptic weights are very much analogous to the weights the classifier learns. And so this decoding is assessing what information is available to a population that's one step downstream from the one I'm measuring. Is that clear?
So let me show you just a very simple experiment to demonstrate how the technique works. And then I'll show you a couple slightly more interesting results. And then we'll go beyond the ventral visual pathway.
So here's a very simple experiment. What a monkey is doing is, it's fixating on a point for 500 milliseconds. And then, after that, up comes a single image. And what I'm going to try to decode is which of seven images was shown at that point there. And I'm going to do that based on firing rates of 132 neurons in IT. And these are, again, pseudo populations, not recorded simultaneously.
And so to do it, what I do is, I train the classifier at one point in time using the average firing rate in a bin. And then I test at that point in time. And then I slide over a little bit and train again and test again. So for example, I could use 100-millisecond bins and slide every 10 milliseconds. And what this is going to do is, it's going to give me information as a function of time.
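A sketch of that sliding-bin loop, assuming spike counts binned at 1 ms in a (trials, neurons, time) array (everything synthetic here, with logistic regression standing in for the simple classifier):

```python
# Train and test in a 100 ms window, slide by 10 ms, and collect accuracy
# as a function of time.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_neurons, n_time = 140, 50, 600
y = np.repeat(np.arange(7), n_trials // 7)
spikes = rng.poisson(0.02, size=(n_trials, n_neurons, n_time)).astype(float)
# Inject stimulus information after a "stimulus onset" at 200 ms.
spikes[:, :, 200:] += 0.02 * (y[:, None, None] + 1)

bin_width, step = 100, 10
accuracy = []
for start in range(0, n_time - bin_width + 1, step):
    X = spikes[:, :, start:start + bin_width].mean(axis=2)  # mean rate in bin
    accuracy.append(
        cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=3).mean())
# `accuracy` is now decoding accuracy as a function of time.
```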
And so let me orient you to the results. This is the same thing I just showed you. This is time 0. This horizontal line is chance decoding, which is 1 over 7, because there are seven items. And so what we'd expect is that before the image is shown, I should be around chance, unless the monkey is psychic. If the monkey is psychic, then you take it to Vegas and make money. But usually, it means something went wrong with my analysis. So I've yet to execute that strategy.
Anyway, let's look at the data. What we see is, the results are around chance in the baseline. That makes sense. And then hopefully, they're above chance afterwards. Otherwise, it didn't really work very well, either. But they can go pretty high. So this is what I'll be showing you most of the day.
And we can also, as we talked about yesterday, run statistics. And I use a lot of permutation tests to establish when the results are above chance. Yeah.
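One way such a permutation test might be sketched: shuffle the stimulus labels, rerun the whole decoding, and repeat to build a null distribution of "chance" accuracies (the function names here are hypothetical).

```python
# Sketch of a label-shuffle permutation test. `decode` can be any function
# that maps (X, y) to a cross-validated decoding accuracy.
import numpy as np

def permutation_p_value(decode, X, y, n_perms=200, seed=0):
    rng = np.random.default_rng(seed)
    observed = decode(X, y)
    null = np.array([decode(X, rng.permutation(y)) for _ in range(n_perms)])
    # Fraction of shuffled runs that do at least as well as the real labels.
    p = (np.sum(null >= observed) + 1) / (n_perms + 1)
    return observed, p
```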
AUDIENCE: So could this be completely driven by just a very small set of neurons, or just one neuron in the extreme case?
PRESENTER: Yes. So I will show you some results. In certain areas, we see things like that. So there are trade-offs with this method. [INAUDIBLE] things. I think this method is great. But I would also always advocate that you do a single-neuron analysis to answer questions like that-- like which neurons are contributing-- and look at firing rates, as well. So that's basically corroborating your results.
You can do things like look at confusion matrices. This is the pattern of mistakes. So if this is what was actually shown, and this is what the classifier predicted, you can see that maybe it often mistakes cars for faces or something. So there's more information you can get out beyond just the simple decoding accuracy. But I'm not going to go too much into that. Just to say, there's more there than I'm showing you.
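Computing a confusion matrix is straightforward once you have the predictions-- a toy sketch:

```python
# Rows are the stimuli actually shown; columns are the classifier's
# predictions. Off-diagonal counts reveal systematic mistakes,
# e.g. cars predicted as faces.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 1, 2, 2])  # toy labels for what was shown
y_pred = np.array([0, 1, 1, 1, 2, 0])  # toy classifier predictions
print(confusion_matrix(y_true, y_pred))
```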
Also, I've found empirically that the classifiers you use don't matter very much. Maybe you get a little boost if you use a slightly better one, but generally, the results look fairly similar. So here, this is a Poisson naive Bayes classifier. But it looks very similar-- which is good, because that means the results are driven by the data, not necessarily by the algorithms. But again, I'm not going to show you many results about this. You can try different algorithms, though, and that's a way to test out the coding scheme. So if a different decoder works better, that tells you something about how things are coded. Yeah.
AUDIENCE: [INAUDIBLE]
PRESENTER: So that's the maximum correlation coefficient classifier that I showed you, that very simple one. That's the support vector machine, and that's the Poisson naive Bayes. Yeah.
AUDIENCE: Is it common for a [INAUDIBLE] classifier to do as well as [INAUDIBLE] much simpler [INAUDIBLE].
PRESENTER: Yeah, it is. So when you're using a support vector machine, there are often hyperparameters. And then you have to tune those. It's just, I generally prefer not to use those. And so you can see, it's actually lower here, because maybe the regularization constant wasn't tuned perfectly. So in general, I've yet to see differences that are bigger than that-- or maybe sometimes, but it's pretty--
AUDIENCE: But it seems like people-- a lot of people seem to use SVM a lot more [INAUDIBLE] I actually haven't [INAUDIBLE].
PRESENTER: I see.
AUDIENCE: At least they did fMRI.
PRESENTER: Yeah, I think in fMRI. It could depend on the type of signal you're using, as well. So depending on the noise and the structure of your data, sometimes-- for example, before I started doing all this decoding stuff, I'd done some computer vision. And I was using these same kinds of algorithms to do computer vision. And there, support vector machines gave you a substantial bump. So it really is data dependent.
AUDIENCE: I see.
PRESENTER: But what I've found for neural data is that I don't see-- again, and maybe if I had huge data sets with many, many trials, I could fit a much more complicated model. And so I'd love to see larger data sets. But from what I've seen so far--
AUDIENCE: [INAUDIBLE] is that something that [INAUDIBLE]
PRESENTER: Yeah. Right. So my belief at the moment is that information is really coded by the angle of the population vector, not necessarily by the magnitude. I have one paper with Doris [INAUDIBLE] where I tried a bunch of different classifiers-- because in fMRI, what you're getting is largely the overall magnitude. And I was trying to understand how that codes information-- how much does that contain?
And if you look at face patches, they're defined by higher firing rates to faces versus other categories. So you would think that maybe the magnitude has a lot of information. But it turned out that when I used a classifier that ignored the overall activity-- the magnitude of the vector, as well as an offset-- it worked just as well. I tried a bunch of different ones. So at the moment, I'm thinking it's the angle.
So let's go on. I just showed, hopefully-- and will continue to show-- that neural decoding is a pretty powerful way to analyze your data. And now, let's look at how the sensory pathway creates what I call abstract representations. So the ability to form an abstract representation is really useful if you want to perform any kind of complex behavior.
So let me give you some examples of what I mean. You might speak a few languages. And suppose you're in France, and people start saying bonjour to you. You want to know that that maps onto the same notion as if someone said hello to you in this country, even though the acoustics are very different. And so creating some sort of abstract representation that goes from low-level features to the same behavioral consequence or meaning is really important.
Another example is, we might see a person from many different head angles, or maybe wearing different colored clothing, and we still want to recognize that this person is Hillary Clinton and not be put off by the low level details of her outfit or head angle. And so I think this is something that's happening a lot, where our brain is trying to build up these representations that lose early physical properties and create patterns that are useful for behavior.
So a very simple example of this is position invariance. We want to be able to recognize an object regardless of where it is in physical space or which retinal ganglion cells it actually activated. And so to test whether there are position-invariant representations, what I do is I train a classifier to discriminate between all the objects at, let's say, an upper location. So to back up-- in this study, the seven objects were shown at three different locations. And I'm going to train the classifier just at the upper location. And then I'm going to test it using data from the lower location. And so if a classifier built on the upper location can generalize to the lower location, it means similar patterns represent those objects at the different locations. And so it's abstracted over the location details.
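In code, this train-at-one-location, test-at-another analysis is just a change in how the data are split-- a synthetic sketch:

```python
# Train on trials from one retinal location and test on trials from another.
# Above-chance generalization means similar population patterns code the
# objects at both locations. All data here are fake.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_per, n_neurons, n_objects = 20, 132, 7
y = np.repeat(np.arange(n_objects), n_per)
object_means = rng.normal(10, 3, size=(n_objects, n_neurons))
# Each location adds its own offset, but the object pattern is shared.
X_upper = object_means[y] + rng.normal(0, 2, (len(y), n_neurons)) + 1.0
X_lower = object_means[y] + rng.normal(0, 2, (len(y), n_neurons)) - 1.0

clf = LogisticRegression(max_iter=1000).fit(X_upper, y)   # train: upper location
print("cross-location accuracy:", clf.score(X_lower, y))  # test: lower location
```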
So here are the results. This is, again, chance here. And what you can see is, I can decode quite a bit better than chance. So there's quite a bit of position-invariant information in IT. And maybe Jim talked about this earlier. It's not quite as good as if you train and test at the exact same location, so there is still some retinotopic information in IT. But in general, it is pretty highly invariant. And I can do other things. I can train at the middle location and test at these other locations. And in general, generalizing to other locations is way above chance.
So that's one very simple example of abstraction. And then another example is from the face patch system. Are people aware of this? I know some people are. So in macaque monkeys, and also in humans, if you do fMRI, you find certain regions that are more activated by images of faces than by other objects. And in monkeys, there seem to be these six little patches that have properties that code for faces. And if you go in and stick electrodes in these patches, the neurons fire at higher rates to faces than to other categories. And so [INAUDIBLE] is going to talk later, and he'll tell you a lot more about this, but this is just one little result to demonstrate abstraction in the ventral visual pathway.
So just to orient you, the eyes are out here. And this is the ventral visual pathway going down that way, posterior to anterior. And so what I wanted to do was to see whether there's a pose-invariant representation of individuals-- so regardless of the head angle, can I decode the individual? And so I was comparing the decoding accuracy in three different regions, going from posterior to anterior, to see, how invariant does the representation get?
And doing this is very similar to the object position-invariance case. I train a classifier here on the left profiles. And then I either test the classifier with images also at the left profile, from different trials, or I test the classifier on different head orientations. So the first case just gives you the baseline of non-invariant recognition. But if I can generalize to the other head orientations, it tells you it's more of an identity code, or some sort of global features.
So here are the results from the posterior region, where I'm training on the left profile and then testing on all the other orientations. So here, I'm testing on the left profile. And what you see is that I'm able to decode above chance there. So there is this physical, low-level information there about head orientation. But what you can also see is, I can't really generalize to the other head orientations. So if I'm looking at the images that are straight ahead, I'm at chance.
But if we go to the regions that are more anterior, further down the ventral visual pathway, what you find in this region, AL, is that now I can actually generalize to some of the other head orientations-- not nearly as well as training and testing at the same head orientation. But these lighter bars indicate statistically significant results above chance there. Yeah.
AUDIENCE: I was wondering, was there any reason [INAUDIBLE] profile? Because [INAUDIBLE] it's easier [INAUDIBLE]
PRESENTER: Right. So I do all the permutations. This is just an example of one of them. Like in the last one I showed you, where I trained at all positions, here I also train at all profiles. And we can look at that.
So this is the middle region. And then in the most anterior region, you see it can generalize even better to all the different orientations. And you can do exactly what you said. So here, the results are averaged over training and testing on all permutations. These are the results training and testing at the same pose, so the non-invariant case. And you can see there is maybe a little bit of a build-up of information going down the ventral visual pathway for the same number of neurons. But if you look at the abstract information-- so generalizing to other head orientations-- you see that builds up at a higher rate there, too.
So to summarize this part, basically, again, I think the ventral visual pathway and probably most sensory pathways are trying to build up these representations that are invariant or abstract.
So now let's look at, after these representations are built up-- which, again, Jim [INAUDIBLE] talked a lot about-- how are they modified by task demands? So attention is obviously an important phenomenon that we experience subjectively. But it's been studied a lot in monkeys. And what you find behaviorally is that the ability to recognize objects is really degraded by clutter. And what attention seems to do is restore the ability to recognize objects when there is a lot of clutter.
So let me show you just an example of that. If everyone wants to pay attention to the slide for a minute, I'm going to flash an image. And you'll have to let me know if there is Waldo in this picture. Did anyone see Waldo? So that was pretty hard. But if I tell you, fixate on that point, and now I flash that image again-- did everyone see Waldo? And if you remain fixating on that point and I flash another image-- everyone see Waldo, as well? So clearly, it's not just, let's say, retinal acuity, but it's the fact that the clutter is destroying your ability to recognize images or objects.
So we wanted to study the neural basis of this. And the basic idea here is that objects are represented by patterns of activity, as I talked about. So a car has one pattern and a tomato has a different pattern. And what happens is, clutter degrades these representations. So if you show two objects at the same time, you get an averaging of these patterns. And that averaging is less indicative of either of the objects by themselves. It still might be somewhat indicative of the two objects, but it starts being degraded, because you're combining.
And so if you have enough things being combined, you lose an indicative pattern of something you might want to know about. And so what attention does is, it restores the pattern to the object that you're attending. So if you attend to the car, the pattern for that car pops back into IT. Yep.
AUDIENCE: Question. So for the monkeys [INAUDIBLE] two objects [INAUDIBLE]
PRESENTER: That's right. Right. So I will show you the results now that actually show that, rather than me just saying this is a nice theory. So again, this was a study done in Bob [INAUDIBLE] lab. The data was collected by Ying Zhong. And the study went like this. We had a monkey fixate. Up came an image. There was a little cue that pointed to the image. And then, after a variable amount of time, the object changed color. And the monkey just needed to saccade to it.
And so there were trials of this type. These are single-object trials. But then we also had multi-object trials. And in those trials, the monkey, again, started by fixating. But now up came three objects. And now the cue actually meant something. The cue meant, pay attention to this object and ignore those other two objects.
And on some trials, one of the distractors would change color first, and the monkey had to still maintain fixation and wait until the cued object changed color, and then saccade to that object to get a reward. And so what I want to do is compare the information present in the isolated object to the information present in the attended and the non-attended objects. And here, I've just used 16 objects rather than those 7.
And so to do this, what I did was, for the single-object trials, to do the decoding, I trained the classifier using a big window of time when the object was first shown. And then I tested using sliding bins, as before. And then, for the clutter trials, I again trained on the single item. And then I tested with the three items. And so I'm trying to decode all three items here. And so I had to use a different measure of performance. It's called the area under the ROC curve, but it's similar to the 0-1 loss I told you about before-- higher is better. So if the y-axis is different, it's just for technical reasons.
So here are the results for the single item. When the fixation comes up, I'm around chance, which is good-- again, not psychic. And when the one object comes up, I'm above chance. And then when the cue comes up, the information drops a little bit, but this is pretty much what we saw before.
What about the three object trials? When the fixation comes up, again, around chance. Then the three objects come up. And I can decode all three objects above chance, but not as well as the single item. And then when the cue comes up, now one of the objects is the attended object, and the other two are the non-attended objects. And what you find is that the information about the attended object increases to around the level of the isolated object, and the non-attended accuracy drops a bit.
AUDIENCE: [INAUDIBLE] which curve is which? Which curve is which?
PRESENTER: So this is the single item, the blue. The red is the attended one in the clutter displays. And the green is the average of the two non-attended ones.
AUDIENCE: [INAUDIBLE]
PRESENTER: Yeah.
AUDIENCE: Is there any difference between the two because one of them changed [INAUDIBLE]
PRESENTER: Yeah. Good question. So I'll show you that in a minute. So it wasn't on every trial that one of the distractors changed. But that's a very good question-- what happens when the distractor changes? So the monkey is still performing very well-- I forget what it was, but some very high level of accuracy. But there could be two things happening.
So like I showed you, on some of the trials, the distractor changes first. And so what we want to know is, when that happens, does the monkey still see that change and just know, I don't make an eye movement now? Or is the monkey so focused on the attended object that the change doesn't even enter the processing stream? Obviously, behaviorally, we can't really measure it, because the monkey is still performing fine. And we obviously don't know the monkey's subjective experiences. But we can see, is there information about the distractor object when it changes color?
So here are the results aligned to the time when the distractor is about to change color. And so here is the attended object. We see that attentional enhancement effect. And the non-attended objects have dropped off to here. We can make some predictions. Does anyone think the color change of the distractor is noticed? Raise your hand. Anyone think it's filtered out and not noticed at all? No. You all think-- well, you guys are good.
So when the object changes color, what you find is that there is a brief blip where I can decode that non-attended object, that distractor, very well. And what happens to the attended object is, it drops off and then quickly recovers. So the monkey was, in a certain sense, being distracted by the color change. And I've done a little bit of behavioral analysis, and the monkey's reaction times are slower on those trials, as well. So behaviorally, it also caused distraction. And the other non-attended object just drops. Yeah.
AUDIENCE: If you train the classifier on the single objects and test it on the multiple objects, can you still record the [INAUDIBLE] the same?
PRESENTER: Yes. So for all this data, I'm actually-- I'm doing exactly that. I'm training on the single item and testing on the multiple items. I also tried, for the clutter case, to train on clutter and test on clutter. And in that case, it looks similar, as well. So there was no difference. So it's actually using the same code for the clutter and the isolated object.
AUDIENCE: Thank you. [INAUDIBLE]
PRESENTER: Yeah. I'm trying to think if I did that, too. I'm not sure if I did, but I imagine it would work, since it went the other way. Yeah.
AUDIENCE: So the y-axis [INAUDIBLE] classifier which the--
PRESENTER: That's right.
AUDIENCE: [INAUDIBLE]
PRESENTER: So what it is is an ROC curve. What I do is, I take the object class that is present and I look at its firing rates-- or rather, its decision value. A classifier often doesn't just return a 0-1 prediction; it will return an actual value. For the maximum correlation coefficient classifier, the decision value is the correlation coefficient. And then I take the maximum of that.
But if you actually looked at the correlation coefficients, you could create a distribution of those for the correct objects, and then for the non-correct objects. And from that, you can compute something that's called an ROC curve, which is basically how far apart the two distributions are. And so that's what I'm doing there, because if there are multiple objects present, I can do this for all those objects. And the negative class just becomes all the objects that are not present.
So if you want, we can go into more details. It's just technical stuff; it's not particularly important. The important point to note is just that higher is better-- decoding more accurately means the distributions are more separated. I'd be happy to tell you more about it later.
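For the curious, here is roughly what that area-under-ROC computation amounts to, with synthetic decision values standing in for the classifier's correlation coefficients:

```python
# Pool decision values for classes that were actually present versus classes
# that were not, and ask how separated the two distributions are.
# AUC = 0.5 is chance; higher is better.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
present_scores = rng.normal(0.6, 0.2, 100)  # decision values, correct classes
absent_scores = rng.normal(0.3, 0.2, 300)   # decision values, all other classes

labels = np.concatenate([np.ones(100), np.zeros(300)])
scores = np.concatenate([present_scores, absent_scores])
print("AUC:", roc_auc_score(labels, scores))
```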
So moving on-- basically, what I've shown so far is that the ventral visual pathway builds up these abstract representations, and attention can modify them toward what is important to complete the task. So now, let's look a little further, in the prefrontal cortex, and see how it's representing information. The motivation behind this work comes from the fact that when I first moved to Boston, I had some roommates, and they taught me how to play poker. Prior to moving to Boston, I knew how to play a bunch of other games, like Go Fish, but I did not know poker.
And so they were able to successfully teach me the rules of poker to the point where I could lose money-- but play technically correctly. But I also did not lose the ability to play Go Fish or any of the other card games I'd learned previously. So somehow, new information was added into my brain without destroying what was there. And so we wanted to understand that process-- what changes in your brain when you learn a new task?
So here, I collaborated with [INAUDIBLE] and [INAUDIBLE] lab at Wake Forest, who made the recordings from the dorsolateral prefrontal cortex-- also the ventrolateral. So, the lateral prefrontal cortex.
AUDIENCE: Why play poker?
PRESENTER: What's that?
AUDIENCE: Why are you playing poker?
PRESENTER: Why poker? So the monkey--
AUDIENCE: Why [INAUDIBLE]
PRESENTER: So that was just an illustrative example. The monkeys are not playing poker. I think only dogs can do that, as far as I understand. But no, I'll show you the actual study. So the monkeys were not playing poker.
What the monkeys were doing was-- there were two phases to the study. There was a phase where the monkeys were just doing passive fixation. And then, once they'd made recordings in that phase, they taught the monkey a task. And so they wanted to compare the passive fixation of stimuli to what's happening when a monkey needs to pay attention and do something with the stimuli.
So in this study, it went like this-- there was a fixation. Up came an image. There was a delay. Up came a second image. There was a second delay. And then the monkey was given a reward if the monkey just maintained fixation throughout this trial.
And so there were objects shown next to the fixation, but the monkey did not need to do anything with those or pay attention to those. And in some trials, the objects were the same, and they matched. You can see that they're both circles. Other trials, they didn't match. But again, the monkey did not care. It was not rewarded for that fact.
So they did this study, and they recorded over 750 neurons during this passive fixation. And after they got that data, they spent six months, I think, or so training the monkey to do a delayed match-to-sample task on these same images. So the delayed match-to-sample task had exactly the same structure. There was a fixation point. Up came one image. There was a delay. Up came a second image. There was a second delay. So everything here is exactly the same. But now, up came a choice array.
And so now, if the two stimuli matched, the monkey needed to saccade to the green stimulus. And if they didn't match, the monkey had to saccade to the blue stimulus. So again, up to here, from the behavioral performance of the monkey, you wouldn't be able to tell there was anything different. It's just fixating. But internally, the monkey now has to remember these stimuli and compare them in order to make the correct motor response.
And again, we do that thing where I train the classifier at one point in time and test, and just keep moving, to get information as a function of time. In this talk, I'm only going to tell you about results where I'm trying to decode whether the stimuli matched or did not match, which means I can only do that from the time the second stimulus was shown, because the monkeys weren't psychic here. Details-- I had a big population, over 750 neurons. And I used rather large bins.
So here are the results. To orient you, chance is 50%, because there are two categories. It was either a match or a non-match trial. This is the time the first stimulus comes up. This is the time the second stimulus comes up. So if we look at the decoding accuracy for the passive case, does anyone think there's going to be information there about whether the stimuli matched or not match? Maybe a little bit?
So for this study, it was a little noisy, but there didn't seem to be statistically significant information, as far as I could tell. What about after? Maybe a little? It'd be a pretty boring talk if it was the same. Here's the accuracy-- I could decode nearly perfectly. So there was this big change in how much information there was about whether the stimuli matched when that became task relevant. So that is another conclusion point here-- PFC seems to very selectively represent things that are task relevant.
Now, let's look at the coding of information. Yeah.
AUDIENCE: [INAUDIBLE] IT?
PRESENTER: What area was this? So this was the prefrontal cortex-- the lateral prefrontal cortex. So that's in the front of your brain. Yeah.
AUDIENCE: So just wondering, how do you know [INAUDIBLE] information of a task? Because they didn't have a task--
PRESENTER: You mean perform-- the task performance versus-- or--
AUDIENCE: Well, the first one was just the fixating. [INAUDIBLE]
PRESENTER: Yeah. So how do you know-- maybe there's no information, the monkey was totally spaced out, or had its eyes closed?
AUDIENCE: [INAUDIBLE]
PRESENTER: Yeah. So what I'm not showing you here is that I also decoded the stimulus that was shown. So there were actually eight different shapes. And I tried to decode that. And for shape information, there was information in the passive task. And it was pretty similar-- almost identical-- to when the monkey was performing the delayed match-to-sample task. So that basic low-level information did not change much at all. But what changed was just this task-related match/non-match information. So that's another interesting result. I had to pick out a few slides-- I had too much stuff. But good question. Yep.
AUDIENCE: [INAUDIBLE] observation [INAUDIBLE] what would you do [INAUDIBLE] in this case?
PRESENTER: So in this case, I'm decoding whether a trial-- whether the two stimuli were the same or whether they were different. So match, non-match. That's why chance was 50%. I could have decoded all eight stimuli, and I did that. So there's-- often, with the same data, you can decode multiple things. PFC in particular codes many things at the same time. And you can decode information about a lot of different variables.
AUDIENCE: So you would not expect that [INAUDIBLE]
PRESENTER: Yeah.
AUDIENCE: --because he would [INAUDIBLE]
PRESENTER: Right. So why was there--
AUDIENCE: [INAUDIBLE]
PRESENTER: So one question you might want to ask-- maybe I'll see if you are asking this question. In the passive task, you might ask, why was there even shape information there, if PFC only represents things that are task relevant? So one possibility is that it just represents basic low-level properties by default. Another possibility is, if you don't give the monkey a real task, it's still doing a task. The monkey might still think shape is important, or it might still be exploring. So it remains to be seen. Like, could I wipe out all visual information if it was doing a very demanding auditory task or something? That'd be interesting to look at. Was that getting at your question, or were you asking something else?
AUDIENCE: Yes, I think so. So but then, could you decode the first and the second stimulus?
PRESENTER: Yeah. So one problem in this experiment was that I think the two stimuli weren't completely uncorrelated. I have to think if that's right-- they might have been. I think they weren't. So it was hard for me to tell apart the two stimuli to do that, exactly. But I could decode the first stimulus. And for the second stimulus, I wasn't sure if I was decoding the previous stimulus or the second stimulus that was shown. There was a little confound in the design, I think. Yep.
AUDIENCE: [INAUDIBLE] group or in-- if you were to record [INAUDIBLE] the muscles [INAUDIBLE]
PRESENTER: Maybe. So the behavior is exactly the same. I can show you the results again. So up to this point here, the monkey is, in both cases, just fixating. Now, is the monkey more alert here? Is it jittering a little bit more while it has to perform a task? I don't know. Not all of the muscles were measured. It's possible that it had to be more alert, and so it was jittering. But even in that case, it would have to jitter more for match trials than for non-match trials for me to be able to decode match versus non-match. So it would have to jitter in a selective way.
AUDIENCE: Sure, yeah.
PRESENTER: It's tough to unconfound everything.
AUDIENCE: Maybe it's the same question as you mentioned earlier about the non [INAUDIBLE] stimulus [INAUDIBLE] but it-- could you also [INAUDIBLE] the monkey [INAUDIBLE]
PRESENTER: Yes. Exactly. Exactly. So actually, that's a very good interpretation, and one that I think might be happening. And I'll show you some results that might suggest that's the case, or it's one interpretation of some of the results I'm just about to show you now. So let me show you some results, and we can talk about that in a second.
So I just showed you, there's obviously this big difference when the monkey is doing this match-to-sample task versus passive fixation. So the question is, now that the population has this information, how has it changed? Have all the neurons taken up this new match/non-match information? Is there only a small subset that's done that? How have things changed?
So one thing I looked at was, on a single-neuron level, I measured selectivity using eta squared. And I plot, here, every neuron-- each dot's a neuron, plotted at the time where it had its maximal selectivity. And in the passive task, there was no match/non-match information. So this is kind of like a null distribution.
And then I looked at the delayed match-to-sample task, and what you notice is, right here, after the second stimulus comes on, a bunch of the neurons become selective. And so to quantify how many there were, or to get a sense of how strongly these neurons were coding information, what I did was I took my training set of data, and I found the neurons that were most selective, just using an ANOVA. And I marked those neurons, and I deleted all the other neurons. And then in the test data, I deleted the same neurons. And then I had a smaller population representation. And now I saw, how well can I do the decoding using just a small subset of the neurons? Yeah.
AUDIENCE: Why did you only include [INAUDIBLE] or is it just [INAUDIBLE] there's probably [INAUDIBLE]
PRESENTER: Oh, yeah. So this is selectivity. So the selectivity could be higher firing rates to matches, or it could be higher firing rates to non-matches. It's just--
AUDIENCE: So it's only [INAUDIBLE] firing rate. It's not both firing rates? So it's not an absolute change in firing rate from the past [INAUDIBLE]?
PRESENTER: Yeah. So there were actually different neural populations recorded in the passive versus the active task. They were recorded six months apart. And--
AUDIENCE: [INAUDIBLE]
PRESENTER: Yeah. So unfortunately, we couldn't track the neurons for the whole time. That would have been even more interesting. But technically, at the moment, that's hard to do. Yeah.
AUDIENCE: [INAUDIBLE] the decoding rate, and then infer [INAUDIBLE] how [INAUDIBLE] so the different areas [INAUDIBLE]
PRESENTER: Yeah. So--
AUDIENCE: [INAUDIBLE]
PRESENTER: Right. You probably wouldn't see a big change if just one was left out. But you could look at how much each one boosts the population or not. That's an interesting idea.
So let's take a look at this. What happens when I just train on a small subset of neurons? So here are the results I showed you before, decoding from all 750 neurons. And I'm going to compare that to the performance decoding with just eight neurons. And these are the eight most selective ones. And what we find is that it is almost as good as using all 750. So there's really a very small subset of neurons that carries almost all of the information that was taken up.
Of course, we want to ask, what about the rest of the neurons? Are they redundant? Do they also carry a lot of the information? And so what I did then was, I used the same kind of procedure, but I threw out the selective neurons and just used the remaining ones. I threw out the eight best neurons, but I also threw out another 120 of the next best neurons, and then saw, with the remaining 600 or so neurons, how well did they do?
And so again, here are the results using all 750 as a baseline. And then here are the results with the 128 best neurons excluded. And again, we're definitely above chance. But we're not as good as the eight best. So what this shows is that there's a very small subset of neurons that has all the information, but there's a long tail of other neurons that have also become selective, just not as strongly. So that's question one in neural coding.
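A sketch of that select-on-the-training-data analysis, using scikit-learn's ANOVA F-statistic in place of the speaker's exact selectivity measure:

```python
# Rank neurons by selectivity on the training data only, then decode with
# just the top k neurons (or with the top k excluded). Because the ranking
# never sees the test trials, the selection step can't leak label information.
import numpy as np
from sklearn.feature_selection import f_classif
from sklearn.linear_model import LogisticRegression

def decode_with_top_k(X_train, y_train, X_test, y_test, k, exclude=False):
    f_vals, _ = f_classif(X_train, y_train)       # one-way ANOVA per neuron
    ranked = np.argsort(f_vals)[::-1]             # most selective neurons first
    keep = ranked[k:] if exclude else ranked[:k]  # keep, or throw out, the top k
    clf = LogisticRegression(max_iter=1000).fit(X_train[:, keep], y_train)
    return clf.score(X_test[:, keep], y_test)
```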
The second question I wanted to ask about neural coding is, are there-- yeah. Question?
AUDIENCE: [INAUDIBLE] train with less fire if you [INAUDIBLE] after excluding the 128 neurons, [INAUDIBLE]
PRESENTER: Yeah. Yeah. So what I do is, I take the training set, I eliminate those neurons, and then I train and test. And so that's a different training procedure than when all the neurons were present. Yeah.
AUDIENCE: Kind of on this point. So one [INAUDIBLE] in general is that, similar to this idea-- if you have, say, one reliable neuron that's always on for this image, always on for this image, and always [INAUDIBLE] image, it'll find that one neuron and it'll get 100%, regardless of what everything else is doing.
PRESENTER: Yes.
AUDIENCE: And so in some of these competitions, they'll get the best decoding accuracy from the vasculature simply because it happens to vary with some feature that-- then there are the-- [INAUDIBLE] is that [INAUDIBLE] great in that case, we just found some correlate that just happens to vary with whatever.
PRESENTER: Right.
AUDIENCE: It would be just interesting to hear your thoughts on that.
PRESENTER: So what's interesting for me is thinking about decoding at the neuron level versus at the fMRI level, and the fact that it works on both levels. So I put up that picture of the analogy between a single neuron and a classifier. And I also had a salt shaker on the right, because you have to take it with a grain of salt, because it works for fMRI. And there, we don't think voxels are feeding into some sort of abstract mega-voxel neuron. But it still works there.
So it's very different. I think some of the inferences are a bit different when you're doing it at the single-neuron level versus the voxel level. At the voxel level, it might be more of a correlate down the road-- there was just some vasculature that responded, and the signal you're really interested in is upstream. But here, we're actually measuring neurons. So I think we can say a little bit more concretely that the information's coming from particular neurons. That's my take, at least. Keep the questions coming. I enjoy that more than me lecturing.
So let me show you one other finding about neural coding. So what I've shown you so far is that when you show an image, there's a certain pattern that is indicative of that image. But what I found in a paper back in 2008 and many papers subsequently is that it's not just one pattern that codes for an image, but in some regions, under some task conditions, you see, actually, over time that there's a sequence of patterns.
And this can be plotted in different ways. So if you plot neurons on different axes-- this is called a neural state space-- what this means is that there's a trajectory over time of how the information is being coded. So if I show an image of a dog, you might get a particular trajectory like this in time. If I show a cat, you might get a different one. But as long as these trajectories don't overlap at any point in time, you can tell the two images apart.
So I wanted to examine whether this phenomenon was happening in this study. So I used a method that's now called the temporal generalization method. Has anyone heard of that before? No. OK. So what I do for this is, I train the classifier at one point in time, and I test at that point in time, just as before. But then I also test at other points in time. And the idea is basically that if I train at a point in time that has a lot of information, and the code is the same across time, then when I test at other points in time, I should do well. But if the code is changing, then when I train at one point in time, I won't be able to generalize to other points in time.
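A sketch of the temporal generalization analysis on synthetic data-- a stationary code would give a block of high accuracy, a dynamic code a diagonal:

```python
# Train a classifier at each time bin and test it at every time bin,
# giving a (train time x test time) accuracy matrix.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
n_trials, n_neurons, n_bins = 100, 30, 20
y = np.repeat([0, 1], n_trials // 2)  # e.g. match vs. non-match
# Give class 1 a different random pattern in every bin -> a dynamic code.
signal = rng.normal(size=(1, n_neurons, n_bins))
X_time = rng.normal(size=(n_trials, n_neurons, n_bins)) \
    + 0.8 * y[:, None, None] * signal

acc = np.zeros((n_bins, n_bins))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in cv.split(X_time[:, :, 0], y):
    for t_train in range(n_bins):
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X_time[train_idx, :, t_train], y[train_idx])
        for t_test in range(n_bins):
            acc[t_train, t_test] += clf.score(X_time[test_idx, :, t_test],
                                              y[test_idx])
acc /= cv.get_n_splits()  # average accuracy over the cross-validation folds
```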
So let me orient you to the results here. This axis is the time I train the classifier. This axis is the time I test the classifier. So this is the first stimulus, and this is the second stimulus. And I'm trying to decode match/non-match information. So if it's the same code over all time, then as soon as that second stimulus comes on, I should see a big block where I have high decoding accuracy at all points in time. But if it's dynamic and the code is changing, I should only have good performance at the time I train and test at. So it would be a diagonal here.
These are the passive results. Obviously, there's no information there. But when you look at the delayed match-to-sample task, you see this strong diagonal band. So this means the patterns are changing in time-- it's not the same pattern that's coding that information. And so this gets a little bit at the point that you raised before.
So why are these dynamics happening? This has been a bit of a debate. So recently, like last week, there were two review papers out arguing different points-- like, these dynamics are what working memory is, or, no, it's persistent activity, and those dynamics don't really matter. And I actually have a review paper coming out saying they're both important, because I'm noncontroversial.
But in my review paper, I also talk about certain reasons why these dynamics would exist. So one reason is, maybe it's keeping track of time. But another reason is the point that was raised-- that maybe initially, this is coding match/non-match information, and then it's changing to something about a rule, like, when I see the green thing, saccade to it. And that's all confounded in this task, so the changing codes could really be picking out different types of information. Does that make sense? Yeah.
AUDIENCE: [INAUDIBLE] to make sure that you're not overfitting. Is that [INAUDIBLE]
PRESENTER: So that I'm not overfitting with my machine learning methods.
AUDIENCE: Yeah. When you do your training fitting, it's [INAUDIBLE]
PRESENTER: Well, I'm still using cross validation. So if I was overfitting, I wouldn't generalize-- I wouldn't do well on the test set.
AUDIENCE: But you cross validate. Your validation sets, are they taken from multiple test [INAUDIBLE]
PRESENTER: Yeah. Yeah, yeah. So I train at one point, and then it's new trials that I test on at different points in time. Yeah.
AUDIENCE: No, I mean like also for evaluations that that's true, that those could be true.
PRESENTER: Yeah. Yes. For my test set-- my test set is-- it's always different. I never have the same data in the training and the test set for any of the same trials. There's no validation set here. There's only training and test.
AUDIENCE: Oh, yeah. But if you don't have validation set, how do you know when your [INAUDIBLE] maybe you're not getting full test accuracy, full generalization from test [INAUDIBLE] is your fitting is maybe using some specific information that's only encoded at that same time. But there might be more general information that's [INAUDIBLE] as areas that should have been to generalize [INAUDIBLE]
PRESENTER: So I guess maybe one question you could ask-- you really only need a validation set if you're doing some sort of parameter tuning, which I'm not doing. These classifiers have no free parameters. They only have parameters that are fit from the training data itself. But one point you might raise is, how do you interpret this, exactly? And so one question is, why do I get these dynamics, if you look at the single-neuron level?
And if you look at single neurons, what you see is, some neurons have large windows of selectivity. So these are average firing rates here-- firing rates on non-match trials in red, and on match trials in blue. And so some neurons have big splits in time, where the match and non-match rates are quite different. And other ones have small windows, where there's only a selectivity difference for a short period of time. And it's neurons like this that seem to be dominating and leading to that diagonal performance. Yeah.
AUDIENCE: [INAUDIBLE]
PRESENTER: So for this neuron-- it also could be the fact that there's no persistent activity that's flat. Persistent activity depends on how you define it-- people define it in different ways. And this is partly why people are arguing different points about whether dynamics matter or not: they have different things in mind about what they mean by persistent activity. But anyway, if you look at this neuron, right here the red and blue don't have much difference in their firing rates. They go up and down, but they go up and down together. So there's no information to tell those two apart-- match and non-match.
But here, for a short period of time, there's a little window where, using just this neuron, there's a lot of information to tell those apart. And so the classifier can pick up and use that neuron just at that point in time to get good classification performance. But if it tries to rely on that neuron later, it's going to do poorly, because that neuron's no longer selective. Does that make sense? So that could be driving the dynamics, as well.
AUDIENCE: I have a question.
PRESENTER: Yeah.
AUDIENCE: This is a speculation, but I'm just wondering how do you think a downstream region listening to neuron 2 knows to listen to the third peak and not the second peak?
PRESENTER: Yes. Yes. So, again, going back to that neuron example where there are synaptic weights: since the decoder is changing in time, that means the synaptic weights would have to change in time-- which seems crazy biologically. So I had some results at Cosyne in 2009, where I analyzed this question. No one was interested back then. So I put them in the review paper now, because I think people are more interested now.
But what I did was, I trained a classifier using a really long window-- a window over the whole experimental time. And then I tried to decode these short windows. I wanted to see, if I had just a fixed set of weights learned from this long time window, could I still decode on the short timescale? And what I found was, I could get out about 75% of the information. So what that means is, it's still possible to extract a lot of it if a downstream neuron just had a fixed set of weights.
Now, I still think the dynamics are important and are doing something interesting-- which was one reason not to pursue that direction completely. But it is biologically possible to extract most of the information with fixed weights. Yeah.
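A minimal sketch of that fixed-weights test, reusing the hypothetical X, y, trainIdx, testIdx, and classes from the earlier sketch: learn class prototypes from whole-trial average firing rates, then decode each short time bin with those same fixed prototypes:

```matlab
% Prototypes learned once, from firing rates averaged over the whole trial
XtrLong = squeeze(mean(X(trainIdx, :, :), 3));      % [nTrain x nNeurons]
protosFixed = zeros(numel(classes), size(XtrLong, 2));
for c = 1:numel(classes)
    protosFixed(c, :) = mean(XtrLong(y(trainIdx) == classes(c), :), 1);
end
% Decode every short time bin with those same fixed prototypes
accFixed = zeros(1, size(X, 3));
for t = 1:size(X, 3)
    Xte = squeeze(X(testIdx, :, t));
    [~, pred] = max(corr(Xte', protosFixed'), [], 2);
    accFixed(t) = mean(classes(pred) == y(testIdx));
end
% Comparing accFixed against the diagonal of the earlier acc matrix shows
% how much information survives with weights that never change.
```

In the talk's terms, accFixed recovering roughly 75% of the bin-by-bin performance would mean a downstream neuron with static synapses could still read out most of the information.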
AUDIENCE: This is kind of the same comment I had about the-- so you say you have no [INAUDIBLE] you say there might be some fast and strong dynamics that's going on, but also some slow and underlying thing that's [INAUDIBLE] but if you have-- had a way to regularize your classifier when it's learning [INAUDIBLE] it might be able to [INAUDIBLE]
PRESENTER: Yeah. So I've compared a bunch of different classifiers, some with these kinds of properties. But there's a lot more room to do all that. And at the end-- I'm probably going to have to skip my last results, but I'll show you-- I have a toolbox that lets you try these things out yourself. My general philosophy is that we should always make data and code publicly available, because I don't have the time to try every cool idea. So if you want to give that a whirl, and you find something really cool, then we can all move forward as a field. It's an interesting idea-- there's only so much I can do in my life.
AUDIENCE: I just wanted to know [INAUDIBLE] cross-generalization job, what are the times that are actually cross generalizing? So is it the-- after the [INAUDIBLE]?
PRESENTER: Yeah. So this is aligned to the onset of the second stimulus, and there's no information before that. As we saw, we were at chance, but then we went to like 100%, which is the red here. And then for the whole second delay, and into the reward period, there seem to be dynamics.
AUDIENCE: Ah. So there's nothing in the first delay period [INAUDIBLE] no matches?
PRESENTER: Right, because there was no information there about match versus non-match.
AUDIENCE: Yeah, but-- not whether you can classify those two, but whether you see carry-over of the image-one information into that period.
PRESENTER: So if I was trying to decode what the image was?
AUDIENCE: Yeah.
PRESENTER: So I tried to look at that. That was almost one of my initial reasons. The results seemed a little diagonal. Again, a fair criticism is that we need to quantify the diagonal-ness better. But in some studies, I've seen just a big block, which looks not diagonal, and in other ones it's visually pretty clearly diagonal.
But in the case of the stimulus decoding, it was sort of diagonal-- somewhere in the middle. I didn't show those results here, but there were basically two peaks where there was a lot of information about the stimulus properties. Those were just here and here, and they were a little diagonal there. I think it's in the paper itself-- this is in PNAS 2012-- so I can find that.
So we only have a few minutes left, so I'll move quickly. I was going to tell you about a third thing-- oh, here are the results. So this is a nice picture here. In PFC, there was a sparse-- a small subset of neurons that had a lot of the information, and those changed in time like that.
So the last set of results I wanted to show was trying to trace information flow a little more precisely. I'm going to have to zip through it. Basically, we were looking at pop-out stimuli, and I was comparing a study where we showed a single pop-out item versus a pop-out item with distractors. All the trials were randomly intermixed, so the monkey never knew what the color of the pop-out item was. And when I tried to decode the location of the pop-out item, what I found was that in this case it was a bit slower, because there was a computation happening that needed to resolve the color of the pop-out item-- versus this case, which was something like 80 milliseconds quicker in PFC and LIP. The firing rates were the same in both cases. So in terms of firing rates, information was flowing feedforward for pop-out from LIP to PFC, but when you looked at the decoded information, there was this big delay for these kinds of images.
And then I could compare the time courses of these two brain regions, LIP and PFC. I could see one rising before the other for the single-item case, and then for these more complicated displays, with the distractors, it seemed to go in the other direction.
So all I want to say from this is, you can trace information at pretty fine time resolution, and that can give you some insight into how the information might be flowing between regions. So it can be powerful that way. That's something else that could be explored, particularly if you have a lot of data. And then you can create models of how the information is flowing through different brain regions, which is getting at the notion of algorithms.
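One simple way to quantify that kind of latency comparison-- a sketch only, where accLIP, accPFC, timeMs, and chance are all hypothetical inputs-- is to find the first time bin where each region's decoding accuracy clears a threshold above chance:

```matlab
% Hypothetical inputs: accLIP and accPFC are [1 x nTimeBins] decoding
% accuracies for each region, timeMs holds the bin centers in ms, and
% chance is the chance level (e.g., 1/nLocations).
thresh = chance + 0.10;                      % arbitrary margin above chance
latLIP = timeMs(find(accLIP > thresh, 1));   % first bin above threshold
latPFC = timeMs(find(accPFC > thresh, 1));
fprintf('LIP latency: %g ms, PFC latency: %g ms\n', latLIP, latPFC);
```

In practice you would want a statistical criterion-- say, accuracy above a permutation-based null for several consecutive bins-- rather than a fixed margin, but the latency comparison is the same idea.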
And I went on with some undergraduates, and we did some reaction-time tasks. People are quicker when there's a single item versus an item with distractors, even though it's a pop-out display. So reaction times are faster for the isolated item, and we see this in many people. I didn't tell them how many subjects to run, and it was like the "Sorcerer's Apprentice." So we had a lot of data.
And then what was cool was-- again, this is for a single item, and this is for the item with pop-out distractors. There was that delay in the monkey brains. But I also had a talented student who just ran an EEG study on a similar design, and we tried to decode the location of stimuli from human EEG. We found the same thing. So this slower reaction time in humans can be related very directly to what we're seeing in monkeys. Again, I think when we start to really understand things, we should see similar effects across species-- not that undergrads are that different from monkeys, maybe. It verified a long-held theory of mine that they are similar.
And some of the other things I'm interested in now is looking at anticipatory effects. So if you tell people the pop-out item is going to be green, do reaction times get quicker, and does the neural signal shift earlier in time? Other people have shown this, and we have done some pilot studies where, if we do a block design where it's always a green pop-out item, reaction times are faster-- similar to when a single item is shown. So we want to do some EEG and then, ultimately, studies with monkeys to ask: if you give a monkey an anticipatory signal, how does that change the flow of information? And we think, basically, that there can be feedback signals that prime early visual areas, and then information can travel on different, faster pathways. Yeah.
AUDIENCE: You could also do that [INAUDIBLE] I guess the color is one step up. And then if you want them to reduce the color, that's yet slower. So that's, maybe, why the block [INAUDIBLE] is still not quite as fast as the single [INAUDIBLE]
PRESENTER: Yeah. I don't know why it's not quite as fast. And I think there are even multiple mechanisms going on. For the block design, it actually might not be a feedback signal; it might be happening within PFC. But at least someone has shown behaviorally that on a single trial, you can tell a subject, this trial is going to be green, and they're quicker on that one trial. And you can switch that cue every trial. In that case, I think there's feedback. So I think there are actually two mechanisms. It's a lot of speculation at this point, but this is, again, where I'm hoping to go.
So that's a lot of results. The point here is, again, we can trace information flowing through the brain pretty accurately, with precise time courses. And I just wanted to say, there is this toolbox-- you can try this at home. It's the Neural Decoding Toolbox. If you use Matlab, you can do decoding using six lines of Matlab code. You just specify where the data's coming from, the classifier you want to use, and any preprocessing you want to do with the data, such as selecting the best features. And then you just run your decoding and save the results.
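Schematically, those six lines look something like the pattern below. This follows the toolbox's documented usage, but the data file name, label name, and number of cross-validation splits here are placeholders:

```matlab
% Sketch of a basic Neural Decoding Toolbox run (placeholder file and
% label names; see the tutorials at readout.info for worked examples).
ds  = basic_DS('binned_data.mat', 'stimulus_ID', 20); % data source, 20 CV splits
fps = {zscore_normalize_FP};                          % optional preprocessing
cl  = max_correlation_coefficient_CL;                 % classifier
cv  = standard_resample_CV(ds, cl, fps);              % cross-validator
DECODING_RESULTS = cv.run_cv_decoding;                % run the decoding
save('decoding_results.mat', 'DECODING_RESULTS');     % save the results
```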
So that's all available for download. It's open source; you just have to register on the website. So if you go to readout.info, you can download it and try it out. There are tutorials, and there's code from that first study I showed you, with Bob Desimone and the seven items, so you can play around and reproduce the results.
And then what I'm working on now, with some summer students, is creating a version of this in R, using a package called Shiny that allows you to create web interfaces. So you can essentially set all the decoding parameters using a GUI, and what it does is write that into a script for you. Then it executes the script in something like R Markdown, which is similar to a Jupyter notebook, so it has all the code and puts the results there, and you get a PDF-- a reproducible document of your results. Yeah.
AUDIENCE: Why not Python?
PRESENTER: Why not Python? So that's another story. I became a professor of statistics, and everyone in statistics uses R. A big advantage here is that the Shiny code is really nice-- it lets you create these web interfaces, and I'm not aware of anything in Python right now that's as easy. But on the back end, I'm planning to have this write both the R and the Matlab code. And if I implement decoding in Python, I could have it write the Python code, too, from the same GUI. So ultimately, I want to support as many languages as people are familiar with and make it as easy as possible.
Just a plug-- this is the version my summer students have been working on, which is prettier. I just wanted to say they have been doing good work. They tried to set up a demo-- they've been working on running this on the cloud, where you can pull up the GUI, and Google gives you free compute time. I was getting all these Slack messages this morning that the demo broke or something, so I don't know-- and there's not enough time anyway. But that's all I wanted to say. Check out readout.info. I wanted to thank the people who funded me and all my collaborators, who gave me all this great data and insights.
[APPLAUSE]