The Deep Learning Revolution (1:11:58)
Date Posted:
August 29, 2018
Date Recorded:
August 16, 2018
Speaker(s):
Terrence Sejnowski
Brains, Minds and Machines Summer Course 2018
Description:
Terrence Sejnowski, Salk Institute for Biological Studies
BMM Summer Course 2018
Reflection on the historical evolution of deep learning in Artificial Intelligence, from perceptrons to deep neural networks that play Go, detect thermal updrafts, control a social robot, and analyze complex neural data using methods that are revolutionizing Neuroscience.
Sejnowski, Terrence (2018). The Deep Learning Revolution. MIT Press: Cambridge, MA.
TERRENCE SEJNOWSKI: First of all, in thinking about the past, I realized that it was 40 years ago when I first came here to Woods Hole to take the neurobiology summer course, which was a 10-week, very intensive methods course. And it was my transition, really, from being a theoretical physicist to a neuroscientist.
And I fell in love with the place. In fact, I've been coming back every year since then, every summer, and first with Steve Kuffler, my postdoc advisor. We'd come here in the summer, in this building, when it was called Whitman, and do experiments. It was wonderful, because his friends would visit him continuously over the summer. And I got to know all the classical neurophysiologists.
And then I organized the very first computational neuroscience workshop that was held here in the early '80s. And a few years later, Christof Koch and Jim Bower started the first summer course, Methods in Computational Neuroscience, which continues today. So this is ground zero for computational neuroscience. And I'm really pleased to be back.
So this is the cover of a book that MIT Press will be publishing on October 9. And I did not design the cover. I had no control over it, except the little tag at the top, "artificial intelligence meets human intelligence." And really, that's the theme of this talk.
It's a story that goes back over 50 years. And it really covers a big part of the history of computer science and neuroscience. And I'm going to try to weave that together into a narrative. This book is really a narrative. It's really telling a story from my perspective of things that have happened. How did we get here today? What impact will it have on the future in terms of the new approach that has been developed over the last five years to artificial intelligence?
So the story begins in 1956. This is the year that a summer conference was held at Dartmouth that brought together 12 computer scientists who started the field of artificial intelligence. Now, if you think back to 1956, this is also just a few years after digital computers were born.
And so they were programming computers to solve mathematical theorems, play games like checkers. And they were convinced-- and I actually have seen the application-- that it would be only a matter of writing a computer program that would be able to perform as intelligently as a human, right? That was their goal.
Now, they didn't care about the brain. They were engineers. They just wanted to build something that behaved like the brain, didn't want to be confused by the facts. But we'll see that, in fact, this led to a very interesting-- sorry, Marvin's going to pop up again. He has a couple of very interesting cameo appearances.
So there's a story that went around when Marvin-- he founded the AI lab at MIT. And the story went around back when I was a grad student that their first grant proposal to DARPA, the Defense Advanced Research Projects Agency, was to build a robot that could play ping pong. Now, I don't think that there is such a robot yet today. But they were convinced they could build one.
It turns out, robots are even more difficult than other problems. But they got the grant. And they realized that they had forgotten to ask for money to write a vision program. And so they assigned it to a graduate student as a summer project.
I always thought that was an apocryphal story. But in 2006, I was at a meeting with Marvin. It was the 50th anniversary of that classic meeting at Dartmouth. And I asked him, is this a true story?
He shook his head, and he said, we have the facts wrong. We did not assign it to a graduate student. We assigned it to undergraduates.
[LAUGHTER]
And it turns out that the archives at MIT, in fact, back this up. This is Seymour Papert, who was a collaborator with Marvin: "The summer vision project is an attempt to use our summer workers effectively in the construction of a significant part of a visual system." And it was thought that vision would be really easy. And it was intuitive.
I mean, you wake up in the morning. And you look out. And you immediately recognize things and pick them up. What could be easier? That can't really be difficult.
Well, it turned out that the intuition was completely wrong. And this is a way to illustrate it. And in fact, computer vision spent many, many, many decades trying to solve the problem-- so a very nontrivial problem.
So here's the problem. Now, it should be obvious to everybody here, not just Michael [INAUDIBLE]-- is Michael here? Michael? He's not here. OK. OK, does anybody else recognize what the bird is?
AUDIENCE: Zebra finch.
TERRENCE SEJNOWSKI: Right, zebra finch, OK. But it's obvious to you that these are two zebra finches, right? They're not looking at you directly. But it's clear from the markings that they have to be the same species.
But it turns out that the fact that they're in different poses means that if you have a simple template matching system that tries to superimpose them, it will always match another bird with the same pose better than these two birds. And so the task is to figure out, for every object that you're trying to recognize, what are the distinctive features and the relationships between the features?
And that requires a lot of labor. And computer vision was making progress. There's a very big data set called ImageNet with 20,000 categories of images, 20 million images. And the progress over the last few decades had been a half percent per year reduction in the error rate.
And that was because creating a new category required a couple of man-years of work, because you have to not just recognize the object, but discriminate it from all the other 10,000 or 20,000 objects. And we do that without any effort. And so it's really a mystery. How is it that humans-- and not just humans, but all these other species-- can see?
Well, there's another approach that was started at about the same time. And this is a very interesting engineering story. And the story is that the only existence proof that the problem of vision can be solved is the fact that nature has solved it, right? And so an alternative approach is to actually look inside the brain and see how it's done, right?
Now, I talked to Allen Newell, who was another person who was at the Dartmouth meeting in '56. And I asked him, well, why didn't you think about the brain? And he said, well, in '56, not enough was known about the brain to help us. And you know, we felt that we just had to carry on.
And in fact, he did, in his own research, honor a couple of general principles that are relevant for the brain. The brain is very slow. These neurons that we have are really pokey. They're measured in milliseconds. Your cell phones go at gigahertz, right? I mean, it's a billion instructions per second. So they run rings around the brain.
However, there are a lot of neurons-- about 10 to the 11th, 100 billion in a human brain. But flies only have 100,000. And they see pretty well, too. So you know, it's really a question of how to get those neurons working together.
The other general principle is a high degree of connectivity. Each of the neurons in the cortex is connected to thousands of others, very densely connected. And in addition-- and this is turned out to be the secret sauce for how to get brains to do really sophisticated things-- plasticity. Brains are fantastically good at learning, especially humans.
We're champion learners. That's our special talent, special ability. We learn faster. We can remember more things. We have a bigger cortex than any other species.
OK, so the question is, let's just take the general principles and not worry yet about the details of how plasticity is organized or how the connectivity is arranged-- let's just look and see what can be built out of units like this-- simple processing units. Real neurons are actually much more sophisticated and complicated, 'cause they have to do all this on their own. They have to be autonomous. So there's an enormous amount of machinery in neurons.
Now, about the same time that AI was getting off the ground, Frank Rosenblatt, who was on the faculty at Cornell, a psychologist and engineer, was developing a new approach based on learning. And here's a news announcement from 1958-- the Navy revealed the embryo of an electronic computer today that it expects will be able to walk, talk, see.
[LAUGHTER]
Right? Now, here comes the hard part-- reproduce itself--
[LAUGHTER]
--and conscious of its existence. I mean, I can imagine that came from his grant proposal, right?
[LAUGHTER]
But the reality was that what he was proposing to do is something that he called the Perceptron, which is very-- it is the most simple embodiment of what a neuron can be. OK, so here was the idea-- that there's a bunch of inputs on the left here. And there's one output. So this is a model of a single neuron.
And there's two operations. The first operation is to take each of the inputs, multiply it by a weight-- so weighted sum of all the inputs. Sum them up. And if it's greater than some threshold, then the output should be 1. And if it's below threshold, it's 0.
Now, just to be concrete, suppose this is an image. And each of these is a pixel. And the pixel has some gray level. You multiply the gray level by some weight. And you sum everything up. And you put it through a threshold. And it's a discriminator.
Now, here's what made this simple little model very powerful. Frank came up with a learning algorithm for changing the weights through experience, through giving it examples-- for example, if you're trying to detect tanks in natural images, you give it an example of a tank. And then if it gets it wrong, his learning algorithm was a simple way of changing all the weights so that it's more likely to recognize a tank in the future.
If you give enough examples, it will converge. And it will be able to discriminate. And so this is a way of replacing all the labor that engineers would have to do in computer science to come up with the right features, because it would be able to learn the features from the raw data. So this was a real breakthrough. And I want to illustrate it just to make it concrete.
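To make the learning rule concrete, here is a minimal sketch in Python. The tank-detection framing comes from the talk, but the toy 16-pixel "images", the class statistics, and the learning rate are invented purely for illustration.

```python
# A minimal sketch of Rosenblatt's Perceptron and its learning rule.
import numpy as np

rng = np.random.default_rng(0)

def perceptron_output(weights, threshold, pixels):
    # Weighted sum of the gray-level inputs, passed through a hard threshold.
    return 1 if pixels @ weights > threshold else 0

def train_perceptron(images, labels, lr=0.1, epochs=100):
    weights = np.zeros(images.shape[1])
    threshold = 0.0
    for _ in range(epochs):
        for pixels, target in zip(images, labels):
            error = target - perceptron_output(weights, threshold, pixels)
            weights += lr * error * pixels   # nudge weights toward the correct answer
            threshold -= lr * error          # the threshold moves the opposite way
    return weights, threshold

# Two linearly separable classes of toy "images" (say, tank vs. no tank).
images = np.vstack([rng.normal(0.3, 0.1, (50, 16)), rng.normal(0.7, 0.1, (50, 16))])
labels = np.array([0] * 50 + [1] * 50)
weights, threshold = train_perceptron(images, labels)
```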
This is actually from an early study-- a simple problem of recognizing the sex of a face from which you've cropped all of the secondary sexual characteristics, OK? So here's an image. And I want you to tell me if you think this is an image of a male or a female. So if you think that it's a male, put up your hand. OK, just 1, 2-- you didn't--
[LAUGHTER]
OK, how many think it's a female? OK, you seem very confident. I mean, it's amazing. Can you tell me why? How did you come to this conclusion?
[LAUGHTER]
Well, this is a real-- it's really remarkable that we can do this without actually having any way of introspecting and finding out what it is. Now, here's a problem that we can ask the Perceptron to solve, right? So what we do is, we give this image, and a bunch of others of males and females, and we use the Perceptron learning algorithm, and we learn the weights.
Now, we've represented the weights here by little squares. And if it's white, it's a positive weight. And if it's black, it's negative. And the area of the square is proportional to the size of the weight. And so this is a mask or a template you can put over the face. And it tells you which regions carry information about maleness, where it's white, and femaleness, where it's black.
So it turns out there's information all over, because this is a very broad and distributed source of summation. And so you're weighting-- when you look at the face, it's like a gestalt. You're weighting all of this. And you come to a conclusion. And so does the Perceptron.
And in fact, it was about as good as people in my lab after it had been trained and then tested on new faces that it had never seen before. So this was actually a pretty good start in the sense that it's doing pattern recognition. And it's doing a difficult problem. And it's telling us something.
And you know, it tells us that actually, there's a lot of information about whether you're male or female in this region here between the nose and the upper lip-- it's called the philtrum. It's very characteristic. As you can see, the philtrum carries information that this is a male face.
OK, now, there's a fly in the ointment. And the fly is the fact that if you reduce this to two pixels and then plot the values of the positive examples and the negative examples of the category, the Perceptron can only learn to discriminate them if you can pass a line between the two sets of points. And unfortunately, most problems are not linearly separable.
This is the example. You can't put a line through it and be able to separate the pluses from the squares. And this is a theorem that was proven by Minsky and Papert back in 1969. And this put a stop to this whole line of research. It was impossible to get any money, any support, to get a job. And that basically meant that all the resources were put into writing computer programs-- rule-based, traditional, good old-fashioned AI.
Now, what they did at the very end of their book was say that if you had a generalization of the learning algorithm to a multilayered Perceptron-- one with layers of units instead of just an input and an output-- it would be able to solve these problems. But in their opinion, no such learning algorithm would ever be discovered. So you know, it was basically their intuition that was influential, because of course, they were wrong.
But I'm really-- my whole career wouldn't have happened if, in fact, that research had been done earlier. But a whole generation passed before a new set of fresh eyes looked at the problem. And Geoff Hinton here on the right and I were convinced that we could crack the problem.
This was actually taken here in Boston. This is my apartment in Brookline when I was a postdoc with Steve Kuffler at Harvard Neurobiology. And Geoff, at the time, was a postdoc at UCSD with David Rumelhart. And it was two years later that, as you heard from Zhao Xing, we solved the problem, or we came up with a counterexample.
This is called the Boltzmann machine. And the units are binary. And the probability here is a sigmoidal function of the inputs. The idea here is you sum up the inputs. And instead of making the output 0 or 1 with a sharp threshold, you make it 1 with a probability that follows this sigmoidal curve. The probability is near 0 if the sum is very negative, and near 1 if it's very positive.
So here's the learning algorithm. And this is actually very elegant, and in some ways, really, the paper that I really feel the most-- what's the right word? Proud. I'm proud of in the sense that--
[LAUGHTER]
Thank you-- in the sense that it solved the problem very elegantly. It just didn't solve it. It solved it very elegantly in a way that's biologically plausible. Why do I say that?
It turns out that there have been many, many, many learning algorithms that have come subsequently that can solve multilayer Perceptron learning. Most of them are not very biological, because they require you doing a lot of backward propagation of error or derivatives and so forth. But this one only uses Hebbian learning between pairs of neurons-- between the input and the hidden units, and the hidden units and the output units, and between the hidden units-- same algorithm.
But what really convinced us that we were on the right track was that you had to put the network to sleep in order for it to get calibrated. So here's how it would work. You give it an input. You calculate the average correlations-- how often pairs of units were on together at the same time when you have an input and output that you're trying to train it on.
And then you turn off the inputs and the outputs. And you just let it free run. And it comes to equilibrium. And you compute the average correlation for every pair of connected units again. You subtract the two, and you change each weight in proportion to that difference. And we demonstrated you could learn complex problems that the Perceptron couldn't solve.
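A minimal sketch of that two-phase rule, in Python. The tiny network, the sampling counts, and the exclusive-or training set are illustrative choices; the original Boltzmann machine also anneals a temperature parameter, which is omitted here for brevity.

```python
# A minimal Boltzmann machine learning sketch: clamped ("wake") vs. free-running
# ("sleep") correlations, with a Hebbian difference rule on symmetric weights.
import numpy as np

rng = np.random.default_rng(0)
n = 6                        # units 0-1: inputs, 2-4: hidden, 5: output (illustrative split)
W = np.zeros((n, n))         # symmetric weights, no self-connections

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(s, clamped):
    # Update each unclamped unit stochastically: P(s_i = 1) = sigmoid(sum_j W_ij * s_j)
    for i in range(n):
        if i not in clamped:
            s[i] = float(rng.random() < sigmoid(W[i] @ s))
    return s

def mean_correlations(clamped, n_burn=20, n_samples=100):
    # Estimate <s_i * s_j> with the listed units held ("clamped") at fixed values.
    s = rng.integers(0, 2, n).astype(float)
    for i, v in clamped.items():
        s[i] = v
    corr = np.zeros((n, n))
    for t in range(n_burn + n_samples):
        s = gibbs_step(s, clamped)
        if t >= n_burn:
            corr += np.outer(s, s)
    return corr / n_samples

# Exclusive-or: inputs on units 0 and 1, desired output on unit 5.
examples = [{0: 0., 1: 0., 5: 0.}, {0: 0., 1: 1., 5: 1.},
            {0: 1., 1: 0., 5: 1.}, {0: 1., 1: 1., 5: 0.}]
lr = 0.1
for epoch in range(30):
    for ex in examples:
        clamped_corr = mean_correlations(ex)   # "wake": inputs and output clamped
        free_corr = mean_correlations({})      # "sleep": nothing clamped, free running
        dW = lr * (clamped_corr - free_corr)   # Hebbian difference rule
        np.fill_diagonal(dW, 0)
        W += dW
```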
But they were toy problems-- problems like the exclusive-or, which was a very simple problem, but not something the Perceptron could do, and other problems that the Perceptron couldn't solve. This could solve them. The only problem was, it was very slow. And that was-- let's see, that was 1983.
So 30 years later, there's a very famous talk that was given at the NIPS meeting, Neural Information Processing Systems. It's been going on for 30 years. I'm the president of the foundation that runs it. And this one was actually held at Lake Tahoe. That's the gaming thing here. But also, probability theory is important in machine learning.
But there was a talk that was given by Geoff Hinton and two of his graduate students in which they demonstrated that on ImageNet, this big difficult problem, they reduced the error-- the one that had been going down a half percent a year-- by 20%. So it's as if 40 years of research happened at once. And almost overnight, almost all of computer vision now uses this technology.
Now, so what happened over those 30 years? What happened was, number one, computers got a million times faster. The learning algorithms got much, much better incrementally. People discovered how to regularize them. The problem is that with very large networks, there'll be hundreds of thousands of weights.
And the danger is over-fitting. If you over-fit, it means you're memorizing. And you can't generalize. The key is to be able to generalize to new images of objects that you've never seen before.
And so they solved those problems. So there was improvements. And then there were many, many more images, many more examples that you could give it. And it was a combination of those three things that reached a certain threshold where learning and computers got cheap enough so that you could replace the engineers. You didn't need an engineer there to try to figure out what the features are. You could learn them from examples.
And it turns out that what's beautiful about this approach, unlike the traditional writing of computer programs-- for regular AI, you had to write a different computer program for every problem. You had to be a domain expert. But here, if you have the data, you can solve very difficult problems that otherwise would not have been possible, with the very same algorithm, just varying the number of layers and some of the hyperparameters.
And I want to give an example here. This, now, at the top is actually the visual system in the monkey. And this is from a review paper by Jim DiCarlo. And on the bottom here is one of the most successful of the deep learning networks called the Convolutional Network.
And I want to, first of all, show you that there's a parallel between what's happening at each one of these levels. So in the case of the visual system, we know that the image gets processed through a bunch of filters in the retina, gets passed through the thalamus into primary visual cortex. And then there's a bunch of operations there, which are nonlinear. And that gets passed on to V2, V4, up to the temporal cortex.
And it's up here that you find neurons that are selective for very complex objects, like faces. Down here, they respond to bars, and edges, and things that are simple features in small regions. But here, you're recognizing the whole object.
Well, so here is how the convolutional network goes. The convolution basically is a linear filter. You pass it over all the locations in the visual field. And there'd be many of these features, but they're learned. They're learned by giving it examples of things that you want to discriminate. And there are several layers of these. Now the transform, the nonlinear transform, between them is a very simple one.
In fact, they converged on the very same architecture that you find in the visual cortex, namely a linear filter followed by a rectified linear unit, a threshold. And then pooling, which means you take the same feature across a larger part of the visual field-- the pooling stage is called a complex cell, and the filter stage is called a simple cell in the visual cortex. And then you normalize it. That means you change the gain so that you're working within a limited operating range. And in the visual cortex, that's done with recurrent inhibition.
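A minimal sketch of one such stage in Python-- linear filter, rectification, pooling, then normalization. The hand-coded edge filter and random image are placeholders; in the networks being described, the filter values are learned from examples.

```python
# One convolutional stage: filter -> ReLU -> pool -> normalize (plain numpy).
import numpy as np

def conv2d(image, kernel):
    # Slide a linear filter over every location of the image (valid convolution).
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def relu(x):
    # Rectified linear unit: the thresholding nonlinearity ("simple cell" stage).
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    # Pool the same feature over a larger region of the visual field ("complex cell").
    H, W = x.shape
    H, W = H - H % size, W - W % size
    return x[:H, :W].reshape(H // size, size, W // size, size).max(axis=(1, 3))

def normalize(x, eps=1e-6):
    # Gain control so responses stay within a limited operating range.
    return x / (np.sqrt(np.mean(x ** 2)) + eps)

image = np.random.rand(16, 16)            # stand-in for an input patch
edge_filter = np.array([[1., 0., -1.],    # a simple oriented-edge filter
                        [2., 0., -2.],
                        [1., 0., -1.]])
feature_map = normalize(max_pool(relu(conv2d(image, edge_filter))))
```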
Now what I've just described to you is what we knew about the visual system in the 1960s. And we knew it from the people who are standing here in this photo. Here's Steve Kuffler, here is David Hubel, and here is Torsten Wiesel, when they were a lot younger. And it was Steve Kuffler who first recorded from single ganglion cells in the cat, in vivo, and described the two classes of cells-- the most common types of cells that carry information up toward the cortex. One of them has an excitatory on-center and an inhibitory off-surround. And then there's a complementary one with an off-center and an annulus of surround which is excitatory.
And he did this when he was at Hopkins. And if you know Steve Kuffler, you know that he was a synaptic electrophysiologist-- he loved synapses. I once asked him, when we were working together, why did you do this experiment? I mean, it was so uncharacteristic of what you did. And he actually had to build a special apparatus, a dual-beam ophthalmoscope, in order to have an electrode positioned on a ganglion cell at the same time that he could move a spot of light around, so he could control it.
And he said, well, you know, at the time I was at the Wilmer Eye Institute at Hopkins, and I was feeling guilty because I wasn't working on vision. So he did this classic experiment, in '53, which is in all the textbooks, right? And he said, well, you know, I'd paid my debt. So what he did was, he had two great postdocs, and he said, why don't you record upstream and see what you find up the road? Of course, that was David Hubel and Torsten Wiesel. And that became their life's work, and of course, they eventually won the Nobel Prize for it.
I'm going to illustrate something that is, I think, a really wonderful way to imagine what it's like to transform the world into a bunch of spikes. OK, so this is a neuromorphic retina that Tobi Delbruck developed. It's really based on the simplest properties of these ganglion cells. Every pixel has two output lines. Along one of the lines, there is a pulse every time the intensity increments by a small, fixed amount. If it doesn't change, then there's nothing. And if it decreases by a small amount, the second line gets a pulse. So it's the on and the off, right? When it goes up and when it goes down.
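As a rough sketch of that on/off pixel logic, here is how such events could be generated from a sequence of frames in Python. A real silicon retina works asynchronously on log intensity in analog hardware; the fixed threshold and frame-based loop here are simplifications for illustration.

```python
# Emit (time, row, col, polarity) events from frames, event-camera style.
import numpy as np

def events_from_frames(frames, threshold=0.1):
    """Yield an event for each pixel whose log intensity has changed by more
    than the threshold since that pixel last fired; silent pixels send nothing."""
    log_ref = np.log(frames[0] + 1e-6)      # reference level for each pixel
    for t, frame in enumerate(frames[1:], start=1):
        log_now = np.log(frame + 1e-6)
        diff = log_now - log_ref
        on = diff >= threshold              # intensity went up: ON line pulses
        off = diff <= -threshold            # intensity went down: OFF line pulses
        for r, c in zip(*np.nonzero(on | off)):
            yield (t, r, c, +1 if on[r, c] else -1)
        # pixels that fired reset their reference; the rest keep theirs
        log_ref = np.where(on | off, log_now, log_ref)

# Only changing pixels (edges, moving objects) emit events.
frames = np.random.rand(10, 8, 8)
events = list(events_from_frames(frames))
```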
And here's what it looks like. This is a movie of what it looks like. What we're looking at now is Tobi; he's moving back and forth. You'll notice that most of the pixels are gray. It only sends information when there is some change. And that's where all the information is that you're trying to recover from the scene-- where things are, where the boundaries are, where the outlines are. It's insensitive to the absolute light level-- another great property. It normalizes.
And this is even more interesting, I think. If you have a little spot that's going around in a circle 200 times a second, a regular camera would only see an annulus, because it's averaging over 30 or 40 milliseconds. But you can see here that the spikes from the leading edge and the trailing edge can be detected just by looking at the relative timing of spikes [INAUDIBLE] This is a single spot, a single pixel. And this is what it sounds like driving through the streets of Zurich.
[CLICKING]
Of course, your retina is sending a million of these spike trains into the brain. Now I want you to think about this. You look out in the world and you see this fantastic visual scene. [INAUDIBLE] It's spikes all the way up. There is no image in the brain. There's no TV set, there's no homunculus looking at it. It's just spikes, right? And that's what we're going to try to understand. That is the challenge.
OK, very quickly, this is just to give you an idea of the scale. That's a pyramidal cell in the cortex. That's the microelectrode-- Dave Hubel actually invented it. This is a simple cell, showing you the excitatory and inhibitory regions. This is a complex cell. It's oriented, but it will give you a response no matter where the bar is in that little region called a receptive field. Now finally, this is the magic sauce for deep learning: the fact that we know that in the visual system-- this is the retina down here, and this is primary visual cortex-- there are something like 12 layers of processing. Each one of them has a map of the visual field, and each one is extracting different features.
And these are the features that Jim DiCarlo recorded from the monkey. He compared them to the features that are learned by deep learning and showed statistically that at every level, they have the same probability distribution, which means that somehow the learning that was going on in this artificial network was mimicking or recapitulating the actual statistical properties of neurons at different levels-- at different layers here, up to the inferotemporal cortex.
Now here's why that's important. It's not just the performance, I don't think. If you look at the performance, that's not really what we're trying to do. We're trying to understand vision, right? Now what Hubel and Wiesel did, and Dave Van Essen, and all the others, was basically just describe what's there. But, you know, they didn't know what the simple cell was doing there. I mean, they guessed it was an edge detector, but how did they know? It was all correlation. These are all correlations.
How do you know whether something is important or not? There are a lot of things there that may not be important for vision-- maybe for something else. But Yann LeCun, who was actually the guy who put together the convolutional network, here on the right, is an engineer. And what he would do, over 20 years, is fiddle with it and say, well, instead of a sigmoid, let's put in the rectified linear unit. And it would improve it by 5%. Oh, let's put in the pooling, and see if that helps. Yes, that helps by 10%.
So in other words, he would have many, many, many things that he would do that he thought might work. And at the end of the day, he converges on exactly the same features that you see in the visual system. So this is more than recapitulating or just copying blindly what's in the brain. It's showing how each one of those features is contributing to the performance, right? And that gives you more insight than simply a correlation. It gives you insight about performance.
OK. Here are a few examples of how this deep learning network works. So this is an image that it's never seen before. And here are its top five choices. And the one at the top is the one in which it has the most confidence. Here it's not terribly confident. But you can see it's really confident this is a container ship. It thinks this one is a mite-- it might be a mite, but if not, some other creature. This is actually a pretty tough photo, because it's occluded and there are a lot of other things going on. But it was pretty sure it was a motor scooter. And it also was pretty sure this is a leopard.
But these ones are more interesting. OK, it got this one wrong-- the label was grille, which was its number two choice. It said it was a convertible. Well, who's right? I mean, this is a human label, right? Yeah, maybe the grille is the biggest thing here, but, you know, convertible is actually a very good choice. And this is an example of where it knows more than most humans. Most humans would say these are mushrooms, but in fact it's a particular kind of mushroom, because it knows about 20 different kinds of mushrooms, right?
And this is a funny one, because humans picked out the cherries, maybe because they were hungry, but the reality is that the dalmatian dog is really much more salient, at least for dog lovers. And this is one where the label is completely wrong. This is not a cat. Does anybody recognize what this creature is? It's from Madagascar, but it's not a cat.
AUDIENCE: A lemur.
TERRENCE SEJNOWSKI: What?
AUDIENCE: A lemur.
TERRENCE SEJNOWSKI: Yes, it is a lemur. It is, in fact, a ring-tailed lemur. And it didn't have lemur among its trained categories. But it thought it was some kind of a monkey. It's not a monkey, it's a lemur, OK. It was actually much closer than the human who had put this label on it.
OK, now every year since 2013, there has been a blockbuster. Somebody has come up with something new. And typically, what they add is a new feature from the brain. For example, AlexNet, which is the one that made this big breakthrough, was feedforward. But then they put in recurrence, they put in feedback. And now they could do sequences. And so here, they took the output from AlexNet and passed it through a recurrent network that would spit out a sequence of words describing the picture. So this is picture captioning, right? This is language processing.
So let's see what it did. OK, it had this picture. And the top choice for the caption was, a group of people shopping at an outdoor market. Now this is not just a good caption, it's actually linguistically correct. In other words, it has learned what a sentence is. It has learned how to put the words together in the right order. This is kind of a miracle-- the fact that these networks were able to scale up and, with recurrence, perform linguistic tasks at this level is something that surprised everybody.
So here are a couple more examples. A woman is throwing a frisbee in the park. Now this particular network not only gives you the caption, but for each word, it tells you which part of the picture it's picking it up from. And you can see here that it has put the spotlight of attention right there on the frisbee. A dog is standing on the hardwood floor. There is the dog. A stop sign is on a road with a mountain in the background. I find this totally astonishing. I mean, this is not something that I would have ever guessed.
On your cell phone, if you have Google Translate, and if you happen to be in Japan, you can take a picture of a menu, and it will translate it for you into English. Now I have a friend whose son went to China recently. And what he would do is-- deep learning has completely transformed speech recognition to the point now where it is really good.
And so what he would do is talk into his Android phone. It would translate that into Chinese and then produce the Chinese sentence. And he would use it for taxicabs and restaurants. And he said that it was almost like, you know, the Star Trek universal translator. There it is in your hand. This is science fiction. It's really happening. It's really remarkable.
But the most exciting thing, I think, is unsupervised learning. It's now possible-- you don't have to label anymore. There are unsupervised learning algorithms for which you just need a lot of images. For example, these are called generative adversarial networks. Now the networks I've shown you are ones where you give it an input and you get an output, right? But these are networks that actually generate new things. You know, there's no input after the learning. It will generate new images.
And I'm not going to explain in detail how it does it. It does it by having an adversary. One network has to distinguish between what's a real picture and what's a fake picture. And eventually, the other network gets really good at making fake pictures. So after seeing many images of volcanoes, it just spews out these. You get a couple of control knobs. But every one of these volcanoes doesn't exist. It just created it out of thin pixels, right?
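For readers who do want a concrete picture of that adversarial game, here is a minimal PyTorch sketch on toy two-dimensional data rather than images; the network sizes, the ring-shaped "real" distribution, and the training schedule are all illustrative stand-ins, not how image-scale GANs are configured.

```python
# A minimal generative adversarial network: a generator and a discriminator
# trained against each other on a toy 2-D distribution.
import math
import torch
import torch.nn as nn

gen = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))    # noise -> fake sample
disc = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))   # sample -> real/fake logit
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def real_batch(n=64):
    # Stand-in for "many images of volcanoes": points on a noisy ring.
    angle = torch.rand(n, 1) * 2 * math.pi
    return torch.cat([angle.cos(), angle.sin()], dim=1) + 0.05 * torch.randn(n, 2)

for step in range(2000):
    real = real_batch()
    fake = gen(torch.randn(real.size(0), 8))

    # The adversary learns to tell real samples from fakes.
    d_loss = bce(disc(real), torch.ones(real.size(0), 1)) + \
             bce(disc(fake.detach()), torch.zeros(real.size(0), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # The generator learns to make fakes the adversary calls real.
    g_loss = bce(disc(fake), torch.ones(real.size(0), 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

# After training, gen(torch.randn(k, 8)) produces brand-new samples from the learned class.
```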
I mean, it's kind of artistic, because it's really generalizing what it is like to be a volcano, and it's creating new images that are in the same class. And you can do that for monasteries, you can do it for ants, you can do it for particular species of birds. So this is really exciting.
Now everything that has been done so far in deep learning is meant to model what goes on in the cortex. But there's a lot more to the brain than the cortex. There's a lot more to intelligence than pattern recognition, although it's bedrock important for performance, for survival.
Now there's another part of the brain which is just below the cortex. It's called the basal ganglia. And back in the '90s, we wondered, what is going on down there? It receives an input from the entire cortical mantle. And then it projects back up to the cortex. So this is a loop, a 100-millisecond loop. And it also projects to these dopamine neurons that are buried in the midbrain-- which project up, in fact, to the entire cortical mantle and to the basal ganglia. So these dopamine neurons, we know, are very important for motivation and for learning. They're part of the reward system.
And Richard Sutton, who's an engineer, a computer person, came up with this actor-critic model back in the '80s. And the idea of the actor-critic model is that you compute what's called a value function. The input comes in from the environment, and it's represented-- in the brain-- in the cerebral cortex, which projects through the basal ganglia. And the value function basically tells you the likelihood of reward if you make a choice. You take the state you're in, and you have choices-- you can grab this or you can grab that, or you can walk over here. Which of those has the highest likelihood of reward? It tells you the value of those choices, and then reports back to the cortex.
And it was at that time that Peter Dayan and Read Montague, who were postdocs in my lab, realized that this particular learning algorithm, called temporal difference or reward prediction error learning, could solve a problem-- a very difficult problem, which is, what if you have to make a whole sequence of choices? Not just, you know, right now, should I eat this or not? But what should I do in order to be able to eat tomorrow, or the next day, to survive?
And it turns out the temporal difference learning algorithm has been proven to solve that problem, where you have a sequence of choices to make and you don't get the reward until the very end, right? So it's delayed gratification. And what they did was to realize that these dopamine neurons might actually be representing the reward prediction error, which is the difference between the reward you actually get and what you expected. And based on whether you get more than you expected or less, you change the strength of the synapses so that next time you'll have a better estimate.
Now this will be the only technical slide. But I just want to show you what the math looks like. So the idea is that in the state at time t that you happen to be in, you take action a, and that takes you to a new state, and you get some reward. And that continues on-- many choices, many states. And then you want to estimate the total reward you're going to get into the infinite future, discounted by gamma, which is between 0 and 1. If it's zero, it means I'm only interested in the immediate reward. If it's one, it means I'm interested in all future rewards equally.
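Since the slide itself is not transcribed, here is a hedged reconstruction of the quantities being described, in standard temporal-difference notation (r is the reward, gamma the discount factor, V and Q the value functions):

```latex
% Discounted future reward, the value of a state-action pair, and the
% reward prediction error (standard temporal-difference notation).
\begin{align}
R_t &= r_{t+1} + \gamma\, r_{t+2} + \gamma^2\, r_{t+3} + \cdots
     = \sum_{k=0}^{\infty} \gamma^k\, r_{t+k+1}, \qquad 0 \le \gamma \le 1,\\
Q(s_t, a_t) &= \mathbb{E}\!\left[\, R_t \mid s_t, a_t \,\right]
     \qquad \text{(value of taking action } a_t \text{ in state } s_t\text{)},\\
\delta_t &= r_{t+1} + \gamma\, V(s_{t+1}) - V(s_t)
     \qquad \text{(the reward prediction error).}
\end{align}
```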
So this is the value function. It tells you the value of taking an action at time t, in state S_t. And you build this up through experience. And then your choice-- it's called the policy-- is to pick, for a given state, the action which maximizes this function. This can all be implemented very easily, and there are signals in the basal ganglia which actually represent these quantities.
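And here is a minimal tabular sketch of that value function and policy in Python, using the standard Q-learning form of the temporal-difference update; the five-state chain with a single delayed reward at the far end is a made-up stand-in for the "sequence of choices" problem.

```python
# Tabular temporal-difference (Q-learning) on a tiny chain with a delayed reward.
import numpy as np

n_states, n_actions = 5, 2           # a 5-state chain; action 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))  # value of taking each action in each state
gamma, lr, eps = 0.9, 0.1, 0.1
rng = np.random.default_rng(0)

def step(state, action):
    # Reward only at the far right end of the chain: delayed gratification.
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward

for episode in range(500):
    s = 0
    for t in range(20):
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next, r = step(s, a)
        # Reward prediction error: what you got (plus what you now expect)
        # minus what you expected before.
        delta = r + gamma * np.max(Q[s_next]) - Q[s, a]
        Q[s, a] += lr * delta
        s = s_next
        if r > 0:
            break

policy = np.argmax(Q, axis=1)        # the policy: best action in each state
```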
What does the basal ganglia buy you? So let's lash the deep learning cortex onto a model of the basal ganglia and see what it can do. Well, what it did in 2017 was beat the world's best Go player. Now for those of you who don't play Go, it's the ultimate board game. It's 19 by 19, and it is to chess what chess is to checkers, just to give you an idea of the complexity. And in Asia, especially in China and Korea, it is a very, very high art form. And the champions are rock stars. They're like, you know, LeBron James.
I gave a talk recently where I said this, and somebody in the audience said, who's LeBron James? It shows you that every culture has its own heroes, right? In any case, AlphaGo had beaten the South Korean champion in 2016, and Ke Jie, who is the reigning Chinese champion in Go, was very confident. He's a 19-year-old, he's really brilliant-- AlphaGo can't beat me. So in 2017, after losing three games really badly, he said, "After humanity spent thousands of years improving our tactics, computers tell us that humans are completely wrong." Losing face in China is really serious, because, you know, you've suddenly let down your whole species.
Now what's remarkable is that this AlphaGo program had played itself hundreds of millions of times. Nobody knew how good it was until it played the world's best player. It made moves that no human had ever thought of before-- brilliant moves that only later everyone realized were like rewriting the book. These are things that nobody even knew about. It was creative. This is a creative AI. This is not just writing down rules and things that humans tell it. It is actually making discoveries that no human has ever made before.
Let me give you one more example. Now, you know, Go is good because it is a problem that humans have put a lot of time and effort into, and it takes a lot of practice and intellect to be good at it. But it is a game of perfect knowledge. Everybody sees the board and nobody has any advantage. But the real world is a lot more complicated. The real world is uncertain. It changes, and in order to survive in the real world, you really have to be much, much more able to deal with new circumstances and generalize from your previous experience.
And so I wanted to pick out a real-world problem and see how far we can push this, all right? So here's the problem I picked out with Massimo Vergassola, who is a colleague of mine at UCSD in the Physics Department. Birds are known to soar, and the way they do this is by finding a thermal-- a convection cell with an upwelling-- following it up, and then gliding to the next one. There are birds that migrate thousands of miles, for weeks at a time, without ever coming down, just by going from thermal to thermal-- two, three weeks from the northern latitudes all the way down to the equator.
Here is an example of someone who actually instrumented a hawk with a GPS. And we're going to show you how that bird found the thermal and what it did when it found it. Oops. So the color tells you what the altitude is. There it's looking for the thermal. It went down, and up, and down, and up. And there it is-- it's caught it. There's wind blowing to the left, so it's following the thermal as it drifts to the left while going up. Pretty good-- I mean, that takes some talent.
So the question is, can we learn to soar like a bird? Well, it turns out humans glide, too. We have hang gliders, we have big gliders. And this is a sport. There are a lot of people who are very good at this. Michael [INAUDIBLE] was telling me that he learned to glide like this when he was 14. Now here's the problem. The problem is the atmosphere is not just a column of hot air going up-- it's very turbulent. And we have to deal with that turbulence if we're going to soar like a bird, because birds are in the real world.
So this is what we put together. And this is something that came from the physicists. It's called Rayleigh-Benard convection. These are the differential equations that govern the vertical velocity u and the temperature theta. And it's a convection problem that has to be solved numerically on a computer. And here is a simulation that Gautam Reddy did. So first of all, look at the temperature field here. You see it's not simple, uniform levels-- there are these little spicules coming up. And the vertical velocity here, you can see, is very turbulent, and it's changing constantly. So this is a really complex environment to be trying to learn how to soar in.
Here is our model. In the computer, we simulate flight using aerodynamic principles. And the two control parameters are the bank angle, mu, and the angle of attack-- whether you're pitched up or down. And here are some of the sensors. We had a lot of sensors in the model. We had a vertical velocity sensor and a vertical velocity gradient sensor, which turned out to be very important. And here is an example where the velocity going up here is higher than here, which means it will lift the wing on the left and tilt the bird. Then finally, a vertical acceleration sensor, which tells you not only that you're going up, but that you're going up faster.
So here's the very beginning. You know, we have to give it many, many trials in order for it to get better and better. But at the very beginning, you can see it starts out here, and it's really good at going down. But after lots of examples and training, here's what it does. It takes a while, right? This is color-coded-- the yellow and red is going up. And you can see it's kind of searching for the thermal. And it eventually finds it, like that bird you saw. Once it hooks on, it does exactly what the bird does, the nice tight loops. And it goes up.
And so this was, you know, really a great example of how you could actually discover something. The bird-- the real bird-- can perhaps use the same features as the model. But we could do more. We can actually go in and ask what it is in the model, because we know what the inputs are. We know which ones were important for this performance. And so we can make some predictions.
So here-- and this is something we didn't guess, right? It just turned out this way. First of all, the temperature had absolutely no value in learning. You could take it out and there'd be no difference. Almost all glider pilots think that knowing the temperature is important, because, of course, thermals are hotter air, so you would think that would matter. But no, that turns out not to be important.
The two most important features are the vertical acceleration and the vertical velocity gradient-- a and tau here. And in fact, if you put these two together, you get the best performance of all. And these other features-- for example, control over the angle of attack, which is represented here by alpha-- if you add alpha, it actually does worse, right? So this really told us, out of all the features, these are the two that you should pay the most attention to.
Now, we published this in PNAS and we got some interest. But simulating something like this isn't the real world. I mean, it's a surrogate for the real world. So let's see if we can actually teach a real glider to soar. And this is where Gautam got interested in a club in San Diego. This is in Poway, which is a suburb. And there's a club of people who fly radio-controlled gliders. And it's a competition-- they go out for the day and they see who can get their glider to go up. And they're controlling them with their RCs.
Here's the glider. It has a 6-foot wingspan, it cost about $3,000, and it's been instrumented with a GPS. It has controls, and it can measure velocities and accelerations. And it took about a week of trials. And the way it works is really neat. They have a winch that catapults the glider up to about 100 meters. And then you control it. And this was after the training. It follows the path and it finds the thermal. And this is real-- this is in Poway.
And it zips up, and at this point here, at 600 meters-- 1,800 feet-- it lost radio control. So it was automatically programmed to come down. But it did this about twice as fast as the other gliders that the humans were controlling. And of course, the humans couldn't feel it. Now when we talk to real glider pilots, here's what they tell us. They look for something called the bump. They're sitting in their glider and suddenly they feel a bump. What is that? It's vertical acceleration, right?
And the other thing that they say is, when the wing suddenly goes up like that, I know that we're close, and so you steer into it-- it's like when you're skiing, you steer into the hill. That's exactly what this algorithm learns how to do. So this is really, I think, discovering something that has been known by the birds for a long time, and by a lot of humans. So here's another example of a successful soaring flight-- this is superimposing it on the wide-angle shot.
Now the last thing I want to tell you about is another real-world example of where AI is going, and this is social robots. This is a project that came out of a Science of Learning Center sponsored by the NSF. It was based at UCSD, but involved 12 institutions around the world. And there are six of these centers, each one focused on something different. Ours focused on machine learning and neuroscience.
And so what we wanted to do was take what's best known about learning in the brain and bring it into the classroom. And so here is the project. Javier Movellan is the person who created this robot called RUBI. And here's the first version, 1.0. And the idea is that this is a robot that's going to interact with 18-month-old toddlers, right? This is a preschool. And these little kids here, if you've ever interacted with them, they're frenetic, right? They run around like crazy, they pick up a toy, they play with it for about 10 seconds and they throw it down, right? Very short attention spans.
And so they put this robot in-- by the way, it has two cameras, it swivels, it's expressive, the eyebrows go up and down. I don't know about the mustache, but it's meant to be interactive. And it has a Teletubby here, OK. So here's what happened. They put this into the classroom, all ready to go. The boys run over to it-- what is this? And they grab the arm, and they rip it off. Of course, you know little boys-- what do you expect?
So they'd go back to the shop, and, you know, the idea was, OK, let's put an industrial arm on there. We'll show them, right? It'll be able to pick the kid up, right? No, no, no, you don't want to do that. And so Marni Stewart Bartlett said, I have a better solution. This is social engineering. So what happened was they put in a cheap pressure sensor. And when it went above a certain value, RUBI would cry. And the boys would back off, and the girls would go up and hug it.
And the project was one revelation like that after the next. In other words, we thought we were going into the classroom, we were going to be using science to improve things. No, the little kids were telling us things that we didn't know. And it got to the point where the little kids were so-- you know, RUBI was part of the class. And it was part of their social interaction.
By the way, we learned that in order to get a little kid to pay attention, and not throw it down like it's a toy, RUBI has to react to them within a second. So if a little kid points to the clock, and RUBI doesn't look at the clock within one second, the kid loses interest and walks away. That's the social engagement that you need. You need to be able to interact with the kid.
And now this is version 4.0. We were afraid that the teacher in the class would feel threatened, because this is going to take their job away, right? This is a teaching robot. Quite to the contrary-- it turns out that the teachers loved it, because it's great for control. You know, otherwise, the kids are all running around independently. But now they're, as you can see, gathering around RUBI. And so when the teacher has to do something else, RUBI is basically like an assistant.
And this has actually happened over and over again in many, many of these applications. Something that was felt to be threatening, like it's going to replace the expert-- the teacher, the physician. If a system can do a diagnosis better than a physician, and the physician can use that information, he'll be a better doctor, and he'll appreciate that. There are a lot of applications in medicine for reading X-rays and skin lesions at levels that are better than the best doctor.
And the reason is that we can teach the network with many, many more examples than any doctor would ever see in their lifetime, right? And that means it can pick up rare cases that most doctors wouldn't have noticed, because they've just never been to Africa, and that's a very rare disease for someone here in the US.
So this is something that the press has blown completely out of proportion. You know, this breakthrough in artificial intelligence is not going to replace you or your job-- it's going to make you smarter. It's going to make your job easier. It's going to get rid of the repetition. You know, radiologists-- do they really want to spend all day looking at X-rays? No, they don't. They'd rather spend their time doctoring, working with patients. And if they can get an assistant that can read the X-ray better than they can, of course they're going to welcome that. It's going to help everybody.
Now at the last NIPS meeting, last December in Long Beach, 8,000 people showed up. And there were 100 sponsors. I mean, this machine learning has become a very hot topic. At UCSD, the machine learning course, there are 400 people that took it. Stanford, they had 800 people who took it. Starting salaries for people with a PhD is between $500,000 and a million a year, plus bonus. I'm tempted myself, I have to say.
So I do some consulting, but the reality is that there's a real transformation occurring. And it's really about big data. If you have more data than anyone else in a certain area-- whether it's weather, or social networking, as Facebook has-- then you're going to win. And of all places, who has more accessible data on the medical side? China. They have a billion people there. And their privacy laws are much weaker than what we have here. Our data is in silos, and it's very difficult to get permissions and aggregate the data.
But in China, they're going to do that much faster and better than we can. So they're making huge investments-- billions of dollars going into training engineers, new AI labs, new supporting industry. In China, they can do things really quite quickly. They can put up skyscrapers in a couple of months if they want to. Things work around the clock. So this is something that is happening really, really rapidly. Within the next couple of years, you're going to be seeing self-driving cars-- well, already they're out there, they're being tested-- and so many other areas of life.
Now the last thing I want to do-- it's getting late-- is to say that at the same time this revolution has occurred in artificial intelligence, there is a similar revolution occurring in neuroscience. And this is important, because the current AI is based on what we knew about the brain 20 years ago. And now things are really, really shifting, just within the last three years. You can see the change. Now, almost everything we know about the brain is organized in a diagram of levels, from molecules up to the entire central nervous system. And there's structure at every one of these levels. But the one that we know the most about is recording from single neurons with the microelectrode that you've already seen.
Now, you know, you record from 100 neurons, you write it up, and that's a paper. But there's 100 billion neurons. And furthermore, these neurons are talking to each other. So the information is distributed, it's represented in many parts of the brain. And trying to figure out how the brain works by recording from one neuron at a time is a little bit like looking at the world through a soda straw, one pixel at a time.
Just think how difficult it would be to figure out what's out there, let alone having people running around and doing things, right? How would you follow them? All you see is the contrast in one pixel. It really is a tough problem. And a lot of progress has been made, but the point is, though, that it's very slow.
And in 2013, Barack Obama announced the BRAIN Initiative. This is a $5 billion, 10-year program, one of the grand challenges, like, for example, Kennedy's man-on-the-moon project, or the war on cancer, or most recently, the Human Genome Project. And the goal here is to map the circuits of the brain, focusing on them at the network level, measure the fluctuating patterns of electrical and chemical activity flowing within those circuits, and understand how their interplay creates our unique cognitive and behavioral capabilities.
This is a little capsule of a 150-page report that I helped write. A group of 15 of us-- Cori Bargmann and Bill Newsome were the chairs. And we spent a year and a half on this. So this is the first time anyone has ever asked me to write a 10-year, $5 billion grant proposal. But it was hard, because we had to set priorities. You can't fund everything. And there's already $5 billion a year at NIH for funding all the research in neuroscience. And so this has to be in addition.
And so, really, the goal in the BRAIN Initiative, we decided, was to have engineers partner with neuroscientists so that we can create innovative neurotechnologies. And that has happened on a grand scale, in ways that I could never have imagined. It's now possible to record from tens of thousands of neurons simultaneously, from three or more areas simultaneously. And what we're seeing is something that could never have been seen with a single microelectrode.
I'm going to give you one example from my own research. This is a hypnogram. When you fall asleep, you go through different stages. You go into deep slow-wave sleep. The EEG changes from high frequencies to low frequencies with high amplitude. And you spend half your time in stage two, which is characterized by these mysterious sleep spindles. They last for about one or two seconds, they're about 10 to 14 hertz, and they're known to be coherent across the entire cortex.
And there's a lot of evidence now that they're very important for memory consolidation during the night. This means experiences you've had during the day are being consolidated into the circuits within the cortex while you sleep, so that when you wake up in the morning, you're able to integrate that information into your daily behavior.
So I collaborated with Syd Cash, who's a neurologist at Mass General. And these are epilepsy patients who have grids of electrodes placed on the surface of the cortex in order to locate the seizure focus, which is where the seizure starts. The idea is that if you cut that out, it may help. These are patients who have had drug-resistant epilepsy for many years, right? So this is the last step that they could take. And it's very invasive. You have to open the scalp, you have to place the grid in, and then for one or two weeks, the subject is just sitting there waiting for a seizure, right? Because they're recording continuously.
And we were able to get the data, especially during sleep, and study the-- oops, you weren't supposed to see that-- and study the patterns of activity during sleep spindles. Now, everybody thought it was synchronous across the whole cortex. And it's not. And I'm going to show you a recording and let you decide for yourself. OK, so a peak is white and the trough, the minimum, is black. So if you just follow the peak around, you can see that it's circular. It's a traveling wave, from the temporal to the parietal to the prefrontal cortex. Does everybody see that?
We've done this now with a dozen patients, with literally 50,000 spindles that we've analyzed. About half of them are circular; the other half have other patterns, expanding patterns. Interestingly, of the ones that are circular, 80% go in one direction and 20% go in the other. So what do we think is going on here? We have a hypothesis. We think this has something to do with information being activated in different parts of the cortex with different time delays, because there are long-range axons connecting different cortical regions, and there are conduction delays on those axons. We think those delays are very important for regulating the traveling wave.
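To give a sense of how you might quantify that rotation, here is a small sketch, again my own illustration and not the analysis we published, that takes the spindle-band phase at each electrode and asks whether phase varies systematically with the electrode's angular position around the grid; the sign of that relationship distinguishes the two directions of rotation. The array names and shapes are assumptions.

```python
# Sketch: does spindle-band phase rotate around an electrode grid at one moment in time?
# lfp: (n_electrodes, n_samples) voltage traces; xy: (n_electrodes, 2) grid positions.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def circ_mean(a):
    return np.angle(np.mean(np.exp(1j * a)))

def circ_corr(a, b):
    # Fisher-Lee circular-circular correlation coefficient.
    sa = np.sin(a - circ_mean(a))
    sb = np.sin(b - circ_mean(b))
    return np.sum(sa * sb) / np.sqrt(np.sum(sa ** 2) * np.sum(sb ** 2))

def rotation_at_time(lfp, fs, xy, t_idx, band=(10.0, 14.0)):
    # Instantaneous phase in the spindle band at every electrode.
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    phase = np.angle(hilbert(filtfilt(b, a, lfp, axis=1), axis=1))
    # Angular position of each electrode around the grid centre.
    centred = xy - xy.mean(axis=0)
    theta = np.arctan2(centred[:, 1], centred[:, 0])
    # Strong positive or negative values suggest rotation in one direction or the other.
    return circ_corr(theta, phase[:, t_idx])
```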
The caveat here is that these are epilepsy patients, so we don't know to what extent the epilepsy may have had something to do with it. By the way, this reminded us of Princess Leia. Do you recognize Princess Leia? She has these buns that look just like a traveling wave.
OK, so we also collaborated with April Benasich, who studies babies and infants. This study was done with infants from six months to 18 months of age, using high-density EEG. We analyzed her data, and the beauty of it is that a baby's skull is very thin, so recording from the scalp is almost as good as having an electrode directly on the cortex. So we get recordings with very high signal-to-noise. And I'm going to show you what we saw.
Now, in the epilepsy patients we only saw one hemisphere, so you don't know what's going on in the other hemisphere. Here we have the whole cortex. We were rotating the array at the same time that we're looking at these patterns. And as you can see, these patterns are flowing over the entire cortex. Every baby shows this, and it's a very, very powerful signal. Babies spend half their time sleeping, so they're really putting a lot of effort into absorbing the world and integrating it into their cortical circuits.
OK, now the last thing I want to show you is a Chinese curse. I told you that we've developed these instruments, that the engineers have built things for us that are so powerful we don't know what to do with them. They're so powerful that they're generating data at a rate we can't possibly analyze. This is an example of work that was done by Misha Ahrens at Janelia. This is a zebrafish, and it has a genetically encoded calcium indicator. You can see the outlines of the neurons, and a nice, brilliant flash whenever a neuron fires a burst of action potentials.
The time resolution is about 100 milliseconds, so it's fast enough that you can follow the activity as we look at the zebrafish. Now, the zebrafish is embedded in agar, so it can't move, and it's in the dark, so there's no sensory input. So we're dealing with a brain that is spontaneously active. There's no sensory input, there's no motor output. But you'll see that there's a tremendous amount of activity throughout the entire brain.
So we're looking down from the top, we're looking from the front, and we're looking from the side. There are about 100,000 neurons, and we can record from about 80,000 of them. It's a 3D reconstruction made with a very, very fancy optical system. OK, let's get this going.
OK, so this is the forebrain, and these are the eyes here, in front. This is the spinal cord; you can see things going up and down the spinal cord, and this is the midbrain. But what's going on? What does it mean? Every once in a while there are these flashes of highly coherent activity, flashing up and down the spinal cord. What was that? Maybe it was the zebrafish thinking, right? How would you go about analyzing this? This is intrinsic, internal activity. Of course, this is happening in your brain all the time. When you're sitting there just thinking to yourself, there's spontaneous activity. This is you. This is the zebrafish. And that's the challenge.
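One common first pass on data like this, and this is just a generic sketch under assumed array names, not the analysis Misha's group actually runs, is to treat the recording as a neurons-by-time matrix and pull out the dominant low-dimensional patterns of co-activation with an SVD or PCA.

```python
# Sketch: find the dominant spatiotemporal modes of spontaneous whole-brain activity.
# activity: (n_neurons, n_timepoints) dF/F traces; names and shapes are assumptions.
import numpy as np

def dominant_patterns(activity, n_components=10):
    # Center each neuron's trace so components reflect co-fluctuation, not baseline.
    centered = activity - activity.mean(axis=1, keepdims=True)
    # SVD gives spatial modes (U), their strength (S), and time courses (Vt).
    # For tens of thousands of neurons you would likely use a randomized SVD instead.
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    spatial_modes = U[:, :n_components]    # which neurons participate in each mode
    time_courses = Vt[:n_components]       # how each mode waxes and wanes over time
    explained = (S[:n_components] ** 2) / np.sum(S ** 2)
    return spatial_modes, time_courses, explained
```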
Fortunately, at the very time that we can create these humongous data sets, we also have the ability to use machine learning to analyze them. So the very same deep learning networks are being used to analyze these recordings. And not just the recordings: it turns out that EM reconstruction for connectomics has been revolutionized by deep learning. It's now possible to automate it, and it's much better than humans now in terms of the speed and the amount of data that you can reconstruct.
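To make that concrete, here is a toy sketch of the kind of network involved: a tiny convolutional net, written in PyTorch purely for illustration, that labels each pixel of an EM image as membrane or not, which is the sort of boundary map that automated reconstruction pipelines build on. The real systems use much deeper architectures such as U-Nets or flood-filling networks; nothing here corresponds to an actual published model, and the data in the example is fake.

```python
# Toy sketch only: per-pixel membrane/boundary classification on EM image patches.
import torch
import torch.nn as nn

class TinyBoundaryNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=1),  # per-pixel logit for "membrane"
        )

    def forward(self, x):       # x: (batch, 1, H, W) raw EM patch
        return self.net(x)      # (batch, 1, H, W) boundary logits

# One training step: binary cross-entropy against human-traced boundary masks.
model = TinyBoundaryNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

em_patch = torch.randn(4, 1, 128, 128)                         # fake EM patches
boundary_mask = torch.randint(0, 2, (4, 1, 128, 128)).float()  # fake labels
optimizer.zero_grad()
loss = loss_fn(model(em_patch), boundary_mask)
loss.backward()
optimizer.step()
```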
And it's being used by people doing very large-scale genetic screens and large-scale genetic profiles. We saw [INAUDIBLE], for example; [INAUDIBLE] this morning gave us a beautiful talk from the Allen Brain Institute, showing the tremendous amount of data they're collecting on connections and on genes being activated in different parts of different neurons. The data explosion is really happening in all the sciences at the same time. It's happening in physics, it's happening in astronomy.
And all of those fields are benefiting greatly, probably neuroscience more than any other, from the new analytic techniques that have been developed over the last 10 or 20 years in machine learning. Deep learning is just one of many techniques that are available. There are Bayesian networks, and there's a lot of really excellent work being done on graphical models. And each one of these approaches is appropriate for a different problem that you have to tackle.
So I mentioned I was on the NIH team. Francis Collins was the one who put the team together. And this photo was taken just a few minutes before the announcement was made at the White House. He brought together some of the principal people who were responsible for the BRAIN Initiative, foremost Miyoung Chun, who was the science director of the Kavli Foundation, and Bob Conn, who is its president. She was the one who brought the white paper into the White House, and the president had asked all of his cabinet members and agency heads, the head of NASA, all of them, to come up with a grand challenge project.
And we were on the shortlist, along with NASA, which wanted to bring an asteroid into orbit around the moon, which seems kind of neat, you know, space cowboys. The Department of Energy wanted to build a better battery, a great project. But what can be more exciting than the brain? So we won. And I was really pleased to be there. I won't introduce everybody here, except to say that just before this picture was taken, we were milling around and talking to each other, and I was standing next to this woman. I had no idea who she was, but she was very knowledgeable; she's an engineer. So I finally said, what do you do for a living? And she said, I'm the director of DARPA. And that's when I learned that the White House is a great place to be if you want to network.
Now, I was talking to her, and suddenly there was a tap on my shoulder. I turned around, and towering over me was the president. You have no idea how tall he is unless you're there in person. He had come through this door, and I was the first one he came up to, and I was flabbergasted. I had nothing prepared; I didn't even introduce myself. I mean, how stupid can you be? All I said was, thank you, Mr. President. And thank you.
[LAUGHTER AND APPLAUSE]