The Assembly Hypothesis: Emergent Computation and Learning in a Rigorous Model of the Brain
Date Posted:
February 2, 2024
Date Recorded:
August 12, 2023
Speaker(s):
Santosh Vempala, Georgia Tech
Brains, Minds and Machines Summer Course 2023
MODERATOR: This is a really exciting framework on the assembly hypothesis. We'll be hearing about it from Santosh Vempala, who's been pioneering this work for the past six or seven years.
And hopefully-- so I've seen different iterations of this talk over the last two or three years. And every time, Santosh has a new, exciting demo that makes you really wonder how something so complicated comes out of such a simple model. So looking forward to hearing what Santosh has been working on recently.
SANTOSH VEMPALA: Thanks. Yeah. Great. The talk is a hypothesis, so I'm looking for counterexamples and suggestions to build an actual theory. Yes, I have the word emergent there. I know that's a little controversial, but you'll see what exactly we mean.
Please interrupt-- lots of questions. It's meant to be a complete model of the brain-- so that should already be controversial-- but mathematical-- so of course, it's a toy model-- and completely rigorous. So every detail is specified.
So before that, let me step back a little bit and ask, what is computation? Take a minute to do this. And I mean something extremely general. It's just a well-defined sequence of state changes, perhaps with purpose-- that's in brackets-- but the state and the change of state. And so what is a state? State could be memory content, input, output. That's your state.
And then there is the Church-Turing thesis which says that anything computable can be computed by a Turing machine. It's worthwhile remembering this. Just, again, this is the model. It's a thesis, not a theorem.
Yeah, but it's been very, very useful and powerful. And Turing machine-- maybe we'll discuss a little bit later. It's just a finite-state machine plus a tape where you can store stuff along the way.
OK, now the idea is computation is universal. I mean, we can think of planetary systems as computing their next positions in time. Weather is computational. The temperature and pressure right here will be something tomorrow-- maybe more interesting outside-- and it will be computed by nature by change of state.
Changes of state-- metabolic networks in your body, ant colonies, and of course, brains. At this point, it's completely clear that the brain is a computational system. OK, now here is my-- I'm presenting this slide not because I'm a neuroscientist-- far from it-- but because it's useful, perhaps, for you to know my view of the brain, so that the model that follows seems passable.
So the brain-- a network of 80 billion neurons, each with a thousand to ten thousand connections. These connections, the synapses, are directed and have strengths. New ones may form, and existing ones might disappear. And then individual neurons spike or fire based on rules that are nonlinear.
And a very common model is a threshold of weighted inputs. But there are more than a thousand different types. And these firings, these signals, have a temporal aspect with rates and patterns. So that's the complete-- this is the nugget from neuro.
And the question we're going to be-- how should I say it-- naive enough to ask is, how does the mind-- the perception, cognition, higher-level thought-- emerge from the brain, which we think of as a substrate with neurons and synapses, as we just mentioned on the previous slide?
So just to emphasize again, this is an exciting time to be in this field, as it has been for the past 50 years. Despite the great accelerating progress and lots of insight that keeps on coming, there is no overarching theory.
It's not like physics, where there is a theory, even if it may not be complete and even if it's yet to be tested, or there is evolution in biology. Here in the brain, there isn't really much of a theory.
And so there is this gap. And this quote from about five years ago now continues to be a great inspiration for us. So Axel said, "We do not have a logic for the transformation of neural activity into thought. And I view discerning this logic as the most important future direction of neuroscience." Yeah.
OK, so what kind of logic would qualify? What kind of formal theory would be reasonable? Going back a little bit, there are things like Hopfield nets, which are designed for specific tasks and have very nice properties. Back in '95, Les Valiant, who also proposed PAC learning, came up with the neuroidal model, which is very interesting and allows for extremely general operations, such as state changes on individual neurons and individual synapses, in order to capture computation.
So what are we looking for? We're looking for a computational system that's consistent with our understanding of the brain. It may not capture every aspect, but it should not directly contradict some basic feature. It explains cognitive phenomena. And then the question is, what are its basic data types, and what are its operations?
So we have to ask, then, what's the right level to think about this? If it's the whole brain, there's neurons and synapses that we can understand in detail. Dendrites, molecules-- there's interesting things happening at all these levels. But maybe something intermediate.
And so here's where we're going to take this. I will call it NEMO, NEural MOdel. It's a formal, probabilistic model of the brain. There's one basic data type-- just one-- and a few elementary operations-- a handful. There's a completeness theorem about what can be computed using this, what kind of things can be introduced, and what you might call a killer app.
So there's a bunch of papers. We don't need to read them now, but there's a building theory. OK. There's one mathematical background slide I want to put up-- forgive me if you're already very familiar with it-- and that's the notion of a random graph.
So the classical model is the Erdős–Rényi random graph, which is a graph on n vertices. And every pair is connected by an edge independently with probability p. So you toss a coin of probability p. If it comes up heads, you put an edge. Otherwise, no edge. That's the model.
I mean, no graph in nature actually behaves like this, but it's a fantastic model for almost anything. So p = 0 would be an empty graph, and p = 1 would be the complete graph, or clique. And this model has lots of very nice structure. For example, you might expect that the maximum degree is concentrated near whatever its expectation is. It is. But so is the size of the largest clique in the graph.
And as p increases from 0 to 1-- and this is a more surprising property-- any edge-monotone property-- so a property that, once true, remains true upon adding edges. So, for example, is the graph Hamiltonian? Maybe it's not. At some point, when you add edges, it becomes Hamiltonian. And then it will stay Hamiltonian if you keep adding edges. A property that's not monotone would be planarity. Is the graph planar? You add edges, you might break it.
Any edge-monotone property has a sharp threshold, meaning there is a narrow interval of probabilities-- narrower than any constant-- within which the property goes from holding with probability near 0 to holding with probability near 1-- for example, the graph being connected, the existence of a perfect matching, let's say, or a Hamilton cycle, et cetera.
We will look at the directed version of this, where the pairs are ordered: the edge from i to j is present with probability p, and j to i might also be present with probability p, independently. So this is a simple, unrealistic model of a network. But it's going to be very useful. And we can think of it as a model for the connectome.
Yeah, so the classical model is undirected. We will also use the directed model. In fact, for the brain, this will be the more relevant one because, as you know, synapses are directed.
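To ground this, here's a minimal sketch of sampling the directed random graph in numpy; the values of n and p are illustrative, not the talk's.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 0.05  # illustrative values, not the talk's

# Directed Erdos-Renyi: each ordered pair (i, j), i != j, gets an edge
# independently with probability p; A[i, j] = 1 means an edge from i to j.
A = (rng.random((n, n)) < p).astype(np.uint8)
np.fill_diagonal(A, 0)  # no self-loops

print("mean out-degree:", A.sum(axis=1).mean())  # concentrates near p * (n - 1)
```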
So in the next two slides, I'll give you the complete model, and then we'll see what comes out of it. OK, so we think of the brain as a finite number of brain areas. Finite means a small number-- a dozen, half a dozen brain areas. This is conceptual. It doesn't have to be a physical thing. This is about organization.
Each one contains n neurons. n is a parameter that could differ across areas, but for the purpose of this model and for everything we have derived so far, we can assume they all have the same number n. Each of these regions is supposed to be a brain area.
In each area, only k out of the n neurons are allowed to fire at a time. k is going to be something much smaller than n. This is a sort of a version of inhibition.
Some pairs of areas are connected-- maybe not all pairs-- and the connections might be directed. For example, this area is connected to this area only in this direction; that area is connected bidirectionally; and so on. And all areas are recurrently connected to themselves.
Everywhere, we'll use edge probability p. Again, this could be different. There could be denser areas, some less dense areas.
But let's just say, within each area, there's a probability p of connection. From one area to another area, if there are connections, every pair is connected with probability p. So there's only two parameters-- n, p-- and the number of areas.
OK, that's part 1. Now, what about the activity, the dynamics? We'll assume that there's discrete time, for simplicity. So there's some clock. Neurons will fire only in these discrete steps.
And which ones fire? At each step, it's the k neurons with the highest input from the previous step, highest-weighted input. So the synapses are also weighted. They have weights, which we assume are nonnegative. Sorry. Yeah. And the highest total weighted input-- the top k-- will fire.
Now, connections between areas-- so from area 1 to area 2-- even though they exist, could be inhibited. Like, these connections are not allowed-- nobody feels the firing activity there-- or disinhibited, meaning you do feel it. There is no geometry at the moment. This is just a topological setup. It's a meta graph that's telling you which areas are connected. That's it.
So we abstracted that away and just said that only k neurons in each area are allowed to fire. That's by inhibition. So there is-- if you like, for each area, there is a population of inhibitory neurons that makes sure that, at most, k are firing.
How is that? Because the activity of these will stimulate the inhibitory neurons, which will then suppress the firing activity, and it will stabilize to k. At the moment, yeah, k is the same. It could be different, but we'll just assume n, k, p are the same across areas. It's population control. So each area has its own population of inhibitory neurons, which are randomly connected.
Yeah. And this population of neurons-- as you know, inhibitory neurons fire and integrate faster than excitatory neurons. And this control actually happens very fast over a wide range of sizes. At every step, k are firing within each area. Yeah. Great.
Now, the second aspect of dynamics-- so this was firing. The second aspect of dynamics is the weight change, plasticity. And we will assume the simplest form of Hebbian plasticity: if neuron i fires, followed by neuron j firing in the next step, then the connection from i to j is strengthened by a multiplicative factor of 1 plus beta. Beta is a fixed plasticity rate-- again, the same across everything.
i fires, then j fires. This connection increases by 1 plus beta. That's it. And then there's homeostasis, which is just to make sure the weights don't blow up, since everything is nonnegative: every once in a while, each neuron will normalize its input weights back down to total 1, say. Yes?
AUDIENCE: [INAUDIBLE]
SANTOSH VEMPALA: No, it can happen very infrequently. We just don't want things to go off to infinity. No, this is the complete model. I have not left out any detail. Yep, this is it. Yeah. No hidden hyperparameters, no cleverness in the training. Every synapse has plasticity, and the same plasticity.
Because we have made time discrete. That's an assumption of the model. Time has some clock, some--
AUDIENCE: [INAUDIBLE] assumption.
SANTOSH VEMPALA: Oh, I mean, is time discrete or not? I mean, I don't know. I mean, it just depends how fine you make it. Yes, one would like-- I don't know. Is time discrete or continuous?
[LAUGHTER]
In the model, it's discrete. It can be whatever scale it is, but it's discrete. In reality, I don't know. Maybe we have a real physicist here.
But as far as the model is concerned, forget about inhibition. In each area out of the n, exactly k fire. That's it. And then there is a story about why that's realistic. And the story says that you can achieve k out of n firing by using a population of inhibitory neurons for each area.
That's a backstory of why this model is realistic. How realistic is this? Well, it's reasonably so because it's a completely precise model. The discrete step assumption, as somebody asked, is unrealistic for multiple reasons, but hopefully it's not distortive, like we still get something useful.
Plasticity is used in this model on a rapid timescale, which is true for some aspects of brain activity, but there's also plasticity at slower timescales and so on, which is not incorporated. And so hopefully this is a useful compromise between reality and usefulness, without contradicting something important from neuroscience. But then the question is, will computation and learning emerge from these rules, which don't seem to have anything to do with it, without having to be programmed? That's the point.
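Since every detail of the model has now been specified, here is a minimal numpy sketch of one area receiving a fixed stimulus, with the top-k rule and the Hebbian update exactly as just described. All parameter values are illustrative, and homeostasis is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, p, beta, T = 1000, 50, 0.05, 0.10, 25  # illustrative parameters

# Random directed connectome: stimulus -> area, plus recurrent area -> area.
S = (rng.random((k, n)) < p).astype(float)  # k stimulus neurons into the area
W = (rng.random((n, n)) < p).astype(float)  # recurrent synapses within the area
np.fill_diagonal(W, 0)

def top_k(drive, k):
    """The k neurons with the highest total weighted input."""
    return np.argsort(drive)[-k:]

stim_drive = S.sum(axis=0)       # all k stimulus neurons fire at every step
active = top_k(stim_drive, k)    # step 1: driven by the stimulus alone
support = set(active)            # every neuron that has ever fired

for t in range(T):
    new_active = top_k(stim_drive + W[active, :].sum(axis=0), k)
    # Hebbian plasticity: i fired, then j fired => w_ij *= (1 + beta).
    W[np.ix_(active, new_active)] *= 1 + beta
    S[:, new_active] *= 1 + beta  # stimulus synapses are plastic too
    support |= set(new_active)
    active = new_active

assembly = active  # the stable winners: this is the projected assembly
print(f"total support: {len(support)} vs k = {k}")  # k + o(k) for large beta
```

With beta well above the threshold discussed below, the printed support stays close to k; set beta near 0 and it balloons, matching the convergence behavior described in a moment.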
OK, so what is the basic data type? Larger than a neuron, smaller than the brain. And so the basic data type is something that neuroscientists have proposed for a long time. It's an assembly of neurons.
And what is an assembly? It's going to be large and densely interconnected-- so more dense than the base density of synapses. And the point is that this subset has the following interpretation. Its firing is equivalent to recall of a particular concept.
So there will be an assembly for a person, an assembly for this room, an assembly for this course. Every concept, everything that's stored in your memory is an assembly. So this subset, when it fires, it means you're thinking of a panda or maybe a specific panda. Whatever. That could be different.
Buzsáki calls assemblies "the alphabet of the brain." We're calling them a data type. He's not a computer scientist.
So what is the assembly hypothesis? There is an intermediate level of brain computation-- let's say, the assembly calculus-- that's going to be implicated in higher cognitive functions-- reasoning, planning, language, et cetera. And assemblies of neurons are its basic representation, the data type that's doing this.
And so then I'll tell you, what are the operations? And again, the operations should happen by themselves. But what are these operations that we can capture? Remember, we didn't have anything called an assembly in the model I told you. They have to emerge.
Even the assemblies themselves have to come out somehow. We'll see. And operations on them also have to come out somehow. The first is project. Let's think of some of the areas as sensory areas, where the neurons that fire in the sensory area are driven by external stimulus-- maybe odor receptors, maybe things in your visual field, and so on.
And so that's happening because you smell something, and that projects to a different area, a higher brain area. And projection just means that this activity in the sensory area is creating a representation, meaning an assembly, in a different brain area-- a copy of itself, if you like. We call that projection, which, later, when you smell again, this will fire again.
Associate is the operation where things that co-occur start increasing their overlap. And therefore, when you think of two things that seem to co-occur, seeing or sensing one ignites memories of the other-- fires the other assembly. Pattern complete-- firing part of an assembly allows you to recover the whole thing.
Merge-- assemblies can be hierarchical. There's no reason for them to be flat. You have an assembly for Woods Hole and for neuroscience, and you think of, oh, the Summer School and so on. And a few control commands-- we'll discuss it more.
OK, so here's what projection looks like. This is an advanced simulation, so please focus. On the left, you have a sensory area, and on the right, you have a brain area, some other brain area. And so here, you have an assembly firing. That's supposed to be k neurons.
And that results in some top k neurons on the right firing, presumably the ones that have the highest connectivity to that subset. And now, what happens in the next step is that the input-- the sensory area neurons are still firing. You're still smelling something. And meanwhile, these k also are firing.
OK, now, what's the next top k? It doesn't have to be the same one. So there's some other top k. Great. And this fires now, and there are some other top k. And you've got this process.
And the question is, will this process lead to any kind of convergence? So the sensory area is firing. You're smelling the same thing consistently. You have a top k, top k, top k, top k. Is it going to converge?
AUDIENCE: [INAUDIBLE]?
SANTOSH VEMPALA: The stimulus, for now, let's say is held.
AUDIENCE: Why [INAUDIBLE]?
SANTOSH VEMPALA: Because top k depends on the previous top k. The top k is sensing not only the stimulus but also itself. There are recurrent connections here.
AUDIENCE: What is the top k?
SANTOSH VEMPALA: The k neurons that have the highest total weighted input from the previous round.
Yes, between-- from one step to the next step, if i fires, and then j fires right after, the i-to-j connection will be strengthened. That's the rule, yes. You would expect that, OK, the first cap had the highest input from there. They have the advantage.
And now, if somebody inside also got more from this top k-- and since it's a random graph, maybe there should be at least some small fraction that has this advantage. But then-- yeah. I don't want to go back to the Big Bang, but let's just say there is some current activity. But the introduction of a stimulus is a huge change to the system, because the total firing activity is going from k to 2k.
AUDIENCE: So basically, [INAUDIBLE].
SANTOSH VEMPALA: For example, yeah. To me, that's as good an assumption as-- yeah. Yeah. OK, so here's what we can prove about this. The projection process converges exponentially fast in a wide range of parameters-- I'm not playing with the parameters-- with high probability.
The high probability is only over the initial random graph. Everything else is deterministic. It's just the original connectome that's random. And what does convergence even mean?
It's not going to be the case that you will just get to one subset k that keeps firing. That's actually false, almost always. What, instead, will happen-- what we will be able to show-- is that the total support-- meaning, you fire k, and the next time, there's another subset, another subset.
If you look at how many distinct neurons were ever activated, counting repeats only once, then that total is only a little bit more than k. It's k plus little o of k-- a term that vanishes as a fraction of k-- as long as the plasticity is above a certain threshold. So if the plasticity is above the threshold, you converge with very few additional neurons fired.
And for any positive plasticity, there is convergence, but you will see a much larger number of neurons touched in the process. And the threshold that's provably valid is basically a constant over a quantity that depends on the density: p is the probability of an edge, and since there are k neurons firing, pk is the expected degree into the firing set.
And as long as pk is bigger than a logarithmic term, this is a constant, so we're talking about a constant, overall. And these are both upper bounds-- so it's no more than this and no more than this.
But when the plasticity beta is below the threshold, you see this number is bigger than this number. At least-- one very interesting question you're asking is, is there a sharp threshold here? And that's a question that I'm going to put at the end of my slides, too.
Is there a sharp threshold-- does the plasticity parameter give you a sharp threshold for the creation of these stable assemblies? I do think that if beta goes sufficiently below this-- maybe by even just a large constant factor-- then the number of neurons you touch does go up, and that shouldn't be too hard to prove.
So the convergence just is in terms of the total number of neurons that are activated at any point in time in this process. So the process is there's k sensory neurons that are being fired, and then the top k are going around. Every time you do top k, potentially, if I do t steps, there could be k times t neurons touched.
But this is saying, no. Because of plasticity, as long as the plasticity is greater than a certain threshold, the total number of neurons that will fire is actually k plus a tiny bit more. Yes?
AUDIENCE: What is n?
SANTOSH VEMPALA: n is the number of neurons in the area, which is typically much larger than k. And beta equal to 0-- it's not obvious. What happens at beta equal to 0 is that the number of neurons touched becomes-- and this is not hard to show-- at least a polynomial in n. You start hitting a large number of neurons. Yeah, good.
OK, so that's the first step-- a very basic thing-- the creation of memories. So this is how assemblies are created. And now, what are these doing?
So it takes about a dozen steps. And these plots here are showing you how many neurons actually get activated. And when you have high enough plasticity, it's very close to k. As your plasticity becomes smaller and smaller-- this is the plasticity value-- you start seeing more and more neurons touched in the process.
AUDIENCE: This is from simulation?
SANTOSH VEMPALA: Simulation-- this is just simulation. All of the code is publicly available. I'll put it up. For those of you who are curious how such a proof might work, you see it's a little bit-- how should I say-- nonstandard, in the sense that you can't really apply a general principle from dynamical systems, come up with a Lyapunov function or some nice potential that converges continuously, because the fixed point is not a fixed point in any standard sense.
So you have to combine some discrete reasoning with probabilistic arguments-- with what we know, or can prove, about the random graph. And so what happens at every step? The main thing is that there is a competition between the previous winners, who are at an advantage because they had the most total input, and a large population of other neurons.
And just because the population is large, the best of these are genuine competitors for who will make it into the cap. And so there is this competition between previous winners and a much larger population of-- I'd say, not-at-an-advantage players. And this balance, for high enough plasticity, tilts in favor of the previous winners.
Yeah, and you can establish these thresholds. So the fraction of previous winners that survive as winners just keeps increasing exponentially fast-- or the fraction of new winners decreases exponentially fast. OK, so that's the proof. I'm happy to maybe save this for discussion later.
Now, here's the first property you get out of these assemblies. First thing is recall, which means that if I present in the future the same stimulus which created the assembly in the first place, then I don't need to wait for these iterations. It immediately fires the assembly that was created because, now, there's recurrence. One fire-- boom. That's the one that gets lit up.
What about firing only a subset? This is one of these really nice properties of Hopfield nets. And we get this. In particular, for any epsilon that you can choose up front, if you present the stimulus sufficiently many times-- you could have stopped when the assembly was, let's say, stabilized, but you go a little bit further, present a few more times, rehearse.
Then igniting an epsilon fraction of the assembly completes to the whole thing very fast, just because, within the assembly, the weights of these connections will go up with each presentation. So this is the benefit of recurrent connections. You don't have this in insects, but you have it in mice.
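Continuing the projection sketch from the model slides (it reuses W, assembly, top_k, rng, and k from that code), here is a minimal check of pattern completion; the 30% ignition fraction and the five steps are arbitrary choices.

```python
# Continues the projection sketch above (reuses W, assembly, top_k, rng, k).
eps = 0.3                                   # ignite 30% of the assembly
active = rng.choice(assembly, size=int(eps * k), replace=False)

for t in range(5):                          # stimulus off: recurrent drive only
    active = top_k(W[active, :].sum(axis=0), k)

recovered = len(set(active) & set(assembly)) / k
print(f"fraction of assembly recovered: {recovered:.2f}")  # near 1 after rehearsal
```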
OK, the next one is association. So if I have two stimuli that are cofiring, and there are already assemblies in here for them-- maybe you learned them separately-- over time, their overlap increases. The number of neurons that are actually in both increases. And there is this experiment that was, for us, very useful-- it came out around the time I started working on this-- from Ison et al., where they did this with epilepsy patients. Yes?
OK, so this is what happened in the experiment. They were recording from several hundred neurons-- I think about 600-odd neurons-- in the MTL, the medial temporal lobe. And they first presented familiar, well-known places and recorded 10 to 20 neurons they could see were consistently firing for specific images.
And then well-known people also-- again, they saw activity, and very little overlap. And then they did this interesting thing where they superimposed the two images and presented those. OK, fine. And then presented just the place again-- so they deliberately created this association between two existing assemblies.
And guess what happened? Some of the neurons that were previously firing only for Obama also started firing for the Eiffel Tower. So there was an actual increase in association that they noticed in this experiment. And this is a provable outcome in this model. It's a theorem in the model that, if you present them together like this, you will actually increase the association.
AUDIENCE: [INAUDIBLE]
SANTOSH VEMPALA: There are more complicated operations where you build hierarchies of assemblies using merge. And so just to recap, here are a few operations. And for now, I'm saying there are also a few control commands, which is something I promised shouldn't be there-- but let's say, for now, they are-- where you can, say, activate an assembly, or inhibit an area, like you were asking, meaning you can't have activity going into that area-- you cut it off-- and so on.
For the moment, using these control commands and the set of operations that naturally run-- there is no special program for them-- you could ask, how powerful is this system? What can you actually-- what computations can you execute? And the answer is it can perform arbitrary computation, anything that you can do on a Turing machine.
Using square root of n space, that is: whatever a Turing machine can do with square root of n space, you can do in this model with half a dozen areas and n neurons per area. So that's what we mean by a completeness theorem. By the end of the talk, I'll show you that we don't need any control commands-- that everything will be done using just the hardware setup, which is these brain areas and a random graph, and a sequence of stimuli.
So how sensitive is this to noise? It's a good question. There is certainly a threshold of noise it can tolerate automatically-- up to the point where the top k are not changed, for example.
And if the top k itself, let's say, is an approximate top k, I expect that we should get very similar theorems in spirit-- actual theorems-- but it's not proven yet. So the noise sensitivity of this whole setup is yet to be established. Yeah.
OK, so in the rest of the talk, we'll get to higher-level things that you might hope for from this learning course-- sequences of inputs, which seem to be a very fundamental thing for the brain, and then giving up control. I don't want these control commands that, say, inhibit an area or inhibit this pair of areas and so on. And finally, moving away from G(n,p), from this uniform random graph assumption-- we know the brain has geometry, so what about it? I'll also mention language.
OK, so how do brains learn? Well, this is a very important problem for neuroscience but also for this model. And the only rules we have are plasticity. So that's it. It's just local-- plasticity is the only change in what's happening.
And so there isn't much evidence that there is gradient descent going on, even though it's super successful in machine learning. And so the natural question is, is synaptic plasticity an actual, effective learning mechanism? OK, so as a first step to demonstrate nontrivial learning, you can ask, suppose-- so we know that the assembly projection is effective.
But can it create higher-level assemblies that represent a class of stimuli rather than one per stimulus? So for example, if you draw inputs from a distribution, can you have one assembly created to represent the distribution and then another for a different distribution that's sufficiently different, so that, when you get something from this distribution, you get the assembly 1, when you get from the second distribution, you get assembly 2? So it's a higher-level assembly, but representing the class.
And so the answer is yes, after you define these stimulus classes properly. Now, this is a definition of the inputs and classes, not of the brain model. And the definition is just this: think of stimuli as 0-1 vectors, sensory vectors, and let two different classes have two different base subsets of neurons-- their anchor subsets, if you like-- that have very little overlap.
And then the rest of the neurons can fire randomly with some probability. So half of their weight comes from the base anchor set of neurons, and then the other half is spread out randomly over the rest. So if you look at these distributions, they look focused on these k subsets and spread out randomly.
So they're pretty different from each other. But any two stimuli are quite different from each other, too. And now, if you present these, it turns out it forms a higher-level assembly for each stimulus class after a small number of presentations.
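As a sketch of this input model: two classes with disjoint anchor sets, where each stimulus puts roughly half its activity on the anchor neurons and spreads the rest at random. The sizes are illustrative, and the exact half/half split is the simplification described on the slide.

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, k = 1000, 50  # illustrative: number of sensory neurons, anchor-set size

def make_class(anchor):
    """A stimulus class: each sample is a 0-1 vector with half of its active
    neurons drawn from the class's anchor set and half spread at random."""
    rest = np.setdiff1d(np.arange(n_in), anchor)
    def sample():
        x = np.zeros(n_in)
        x[rng.choice(anchor, size=k // 2, replace=False)] = 1  # anchor half
        x[rng.choice(rest, size=k // 2, replace=False)] = 1    # random half
        return x
    return sample

perm = rng.permutation(n_in)  # two disjoint anchor sets: very little overlap
class_a, class_b = make_class(perm[:k]), make_class(perm[k:2 * k])
```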
Yeah, what I'm doing here is summarizing the theorems, and then we'll go over a simulation. I think we'll have time for that. OK, what about a more classical learning theory problem of learning a halfspace or a linear threshold function?
So let's say there is a halfspace-- there's an unknown vector, v, a linear threshold. And what we're going to do is pick Bernoulli input vectors. We'll just present five examples that are on one side of the threshold and five from the other side.
Or actually, we can just do with one side and the rest. So it really looks like that. There is a margin. We're assuming there is a margin. And then here's what happens.
So this matrix is representing the overlap of two stimuli as 0-1 vectors-- the inner product-- from the same class and between classes. And of course, it almost all looks random, because there's only one direction in which they are separated, and that's a random direction in this experiment. So if I look at two random vectors and ask what their overlap is-- yeah, it's quite high; these are plus/minus-1 vectors.
On the other hand, after you present these five examples and create the assembly, and now you ask what the overlap of the assemblies in the brain area is-- the firing activity-- then it becomes much more focused within versus across classes. And so these firing activities are for the assembly that's formed after five examples. There is some overlap, certainly, but it's much smaller than in the input neurons.
So this is what we mean by creating an assembly. The overlap structure is preserved. And this is a theorem for this model: under this model of the input, presenting these inputs to the model-- whose entire setup you know exactly-- you will create two assemblies that have very little overlap. And therefore, classification is easy. If I give you a new example from the positive class, this is the assembly that will fire.
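A sketch of the example generation just described: a random direction v, plus/minus-1 input vectors, and rejection sampling to enforce the margin. The dimension and the margin value are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n_in, margin = 1000, 5.0  # illustrative dimension and margin

v = rng.choice([-1.0, 1.0], size=n_in)  # the unknown, random direction

def sample_side(sign, count):
    """Plus/minus-1 vectors on one side of the halfspace v.x >= 0, kept only
    if they are at least `margin` away from the boundary (rejection sampling)."""
    out = []
    while len(out) < count:
        x = rng.choice([-1.0, 1.0], size=n_in)
        if sign * (v @ x) >= margin:
            out.append(x)
    return np.array(out)

positives, negatives = sample_side(+1, 5), sample_side(-1, 5)  # five per side
```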
Now let's go to sequences. You're welcome to keep interrupting. We'll just stop at noon. Memorizing and learning sequences seems to be very fundamental and natural to brains. On a computer, what's the big difference-- sequence, table, whatever? But here, it's-- yes.
Actually, we need, crucially, that the examples come from-- that's the only label information we're getting. Everything on one side-- you get five examples from the positive side, and then you get five from the negative side.
Yes. Yeah, just the label. If I present them in random order, then I don't get a chance to form an assembly. But if I see a few together, that's enough to form an assembly. And from then on, you're OK. Yeah, because, otherwise, it looks like an unsupervised problem. But there is actually label information because of the grouping here.
OK, but I'm now talking about a different kind of sequence. I mean, anybody who's learned to say, A, B, C, D, if you learn it to a tune, much easier. Whatever we do, we seem to have some kind of mnemonic or a story, a location, a map, whatever.
Now, assembly sequences-- sequences of assemblies-- are observed in experiments with various mammals. In this experiment from Buzsáki's lab, they have these animals running around this structure. And they have to make complicated, nontrivial decisions at various points to get their rewards and avoid the penalty.
So there's a sequence of assemblies that are created. And very interestingly, after the animal is trained, as it enters the arena, it preplays the sequence before it starts executing the task. So yeah, there's also preplay. This has been replicated in modified settings.
OK, so how do I pose this as a problem for the assembly model? Here's the simplest sequence problem. I present not one stimulus, but a sequence of stimuli-- A, B, C, D, E-- maybe a few times: A, B, C, D, E; A, B, C, D, E. And this should create assemblies, first of all-- one for A, one for B, one for C, one for D, one for E. Sure.
But then what we'd also like is, if I now fire A, the assembly for A fires, sure. But it also leads to firing B, C, D, E, even though I didn't present those. It triggers the rest of the sequence. And not only A-- if you give me something in the middle, it should trigger the suffix, the rest of it.
We'll try it soon. The theorem is, yes, it does. And you need log n presentations to guarantee that it does-- or at least we can prove that, with log n presentations, it will happen with high probability. Yes?
That's a good question, in general. If I have two inputs-- let's say, sensory stimuli-- that have large overlap, then the projected assemblies also have large overlap. And this applies to sequences, too. It turns out, though, the overlap just depends on the overlap in the sensory areas, not on where they are in the sequence. Yeah, this is single sequences so far. Yeah.
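Here is a self-contained sketch of the sequence experiment in the same style as the earlier projection code: present items A through E in order for several passes, let the Hebbian rule strengthen the links from each item's assembly to the next, then ignite A's assembly alone and watch the rest replay. The parameters, the per-pass reset, and the omission of stimulus-synapse plasticity are all simplifications; how cleanly the replay works depends on the parameter choices.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k, p, beta = 1000, 50, 0.05, 0.2  # illustrative parameters
L, passes = 5, 20                    # sequence A..E, number of presentations

stims = [(rng.random((k, n)) < p).astype(float) for _ in range(L)]
W = (rng.random((n, n)) < p).astype(float)
np.fill_diagonal(W, 0)
top_k = lambda d: np.argsort(d)[-k:]

finals = [None] * L
for _ in range(passes):
    active = np.array([], dtype=int)     # presentations are separated in time
    for i, S in enumerate(stims):        # present A, B, C, D, E in order
        rec = W[active, :].sum(axis=0) if len(active) else 0.0
        new = top_k(S.sum(axis=0) + rec)
        if len(active):                  # strengthen links: item i-1 -> item i
            W[np.ix_(active, new)] *= 1 + beta
        finals[i], active = new, new

# Recall: ignite item A's assembly alone; recurrence should replay the suffix.
active = finals[0]
for i in range(1, L):
    active = top_k(W[active, :].sum(axis=0))
    hit = len(set(active) & set(finals[i])) / k
    print(f"overlap with item {i}'s assembly: {hit:.2f}")
```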
OK, now why are sequences fundamental? Why are they interesting? Well, there's powerful consequences, besides the fact that they are a very human-- or natural thing. The first thing is that you can then go to finite-state automata.
I mean, finite-state automata are fantastic. That's an algorithm. Every algorithm is a finite-state automaton plus memory, plus state. Just telling you, if you're in this state, and you're seeing this input, you should take this action or produce this-- that's an algorithm, or any algorithm is that.
And what we can do now is basically take a finite-state machine, memorize each arc as a short sequence-- now you've got a whole set of sequences-- and then just run the machine in the brain model. I'll show you this.
And in fact, once we do this, we will not need any more control commands. All we'll need is-- yeah, you'll see. OK, so both finite-state machines and Turing machines will operate without control commands. What we use are what are called long-range interneurons. So previously, we had neurons suppressing activity within an area.
Long-range interneurons will be used-- so when they are activated, they will suppress an entire area. But they are set up already a priori in the hardware. There are long-range neurons between area A and area C, between area whatever. That's your substrate.
OK, so the way we'll simulate a state machine-- these are supposed to be states and transitions-- is that there will be areas for symbols-- symbols like stimuli, areas for states, and areas for transitions or arcs. So if your finite-state machine is some finite size-- it has five states and 10 arcs-- then you still only have three areas, only three brain areas. There are separate assemblies for each symbol, separate assemblies for each state, and separate assemblies for each transition, and that's it. And the transitions will be learned by just presenting the arcs repeatedly.
So this is what we mean by simulation of a general Turing machine without control commands. We can also read the input and memorize it, but let's not do that. That also can be done by simply presenting the input as a sequence enough times.
OK, let me just do one more slide, and we can switch to the-- how do we make two areas fire alternately? This is an important operation-- to make two areas fire alternately. You're not allowed to both fire at the same time.
And so for this, we use inhibition in a somewhat new way in the model. It's neuroscience. It's not new. We have populations-- so these are two areas, A, B. And I want the effect that A is allowed to fire then B, then A, then B. These are not assemblies. These are areas.
So some top k in area A will fire, then top k in area B, then top k in area A. I just want that alternating effect. How do I make that happen? Well, we have inhibitory populations for each of them. Fine. And when those fire, they suppress. That's the nature of inhibition.
But then we also have disinhibitory neurons, which inhibit the inhibitors. And those are directly driven by the areas. So when area A fires, it drives this circuit for B so that B's inhibitory neurons fire, and that suppresses B.
And that's how we get the alternation. And now, when B fires, it will suppress A. So that's just-- inhibition is enough to do this, and this is the setup. OK.
So the conclusion here is that the realization and execution of finite-state machines is provably emergent here. You just need a sequence of inputs. And then all the components of the assembly calculus-- this model, NEMO-- operate according to prescribed rules. There's no overall control. All of the simulation is here. We'll get to this in a second.
So with this setup, exactly the following behavior comes out: A fires, then B fires, then A fires, then B fires-- I mean, some subset of k within each. Language is a fantastic domain for such models because it's unique to the brain.
And maybe you've seen this. This is exactly about the pace at which most of us read. These are the spiking neurons, 50 hertz. And this experiment was another one that came out around the same time, which was just fantastic. If you read this aloud-- "fret, ship, hill, give, true, melt, fans, blue, guess, hits, then, cats"-- nonsense-- this is what you observe in terms of the frequency of firing: basically one per word, 4 hertz.
And then this was the ingenious experiment. There's "bad, cats, eat, fish, new, plan, gave, joy, little, boy, kicks, ball." What do you think is going to happen?
Yeah, they had three different frequencies-- one for the words, still; one for phrases; and one for complete little sentences. Yeah. OK, so this is suggesting that there are little parse trees being formed on the fly as you're doing this. And there's all kinds of neuroscience evidence suggesting this.
And indeed, this tree-building step-- what we've seen here-- about a dozen spikes are enough to-- a dozen timesteps are enough to create an assembly to represent the next level of a phrase and so on. OK, there are many research directions.
We've discussed several, but let me just go straight to the demo now in the remaining few minutes, after I thank-- there's a whole topic, which I haven't covered, which is going away from G(n,p) to models with more geometry, random geometric graphs. And I will not do that right now, because I don't want to fly through it.
But there is very interesting behavior there, where, even without plasticity, you get convergence phenomena. And some things about it are provable. Thanks to Christos, collaborators, and students-- Max and Mirabel are current students. I should have recommended the Summer School to them-- and to you. And let me now switch to this demo.
So here's what's happening here. Oops. That's already loaded-- fine. from brain import RecurrentArea-- the input area is 1,000, neurons 1,000, cap_size 30-- that's your k. Density is p. Plasticity is 0.1, just to quickly ground us.
I'm presenting a stimulus for 10 rounds, and you're seeing what is actually activated when I present that. And you see, initially, it's something-- and they're sorted by activation, and so it quickly focuses onto the same cap_size-- what is it-- 30 neurons. OK.
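The `brain` module is only partially quoted in the transcript, so the following is a hypothetical reconstruction of the cell being narrated; the keyword names, the helper methods, and the density value are guesses, not the actual API.

```python
# Hypothetical reconstruction of the demo cell; the real API of the speaker's
# `brain` module is only partially visible in the transcript, so the method
# names and the density value are guesses.
from brain import RecurrentArea

area = RecurrentArea(input_size=1000, n_neurons=1000,
                     cap_size=30,      # that's your k
                     density=0.05,     # p; the exact value isn't stated
                     plasticity=0.1)   # beta

stimulus = area.random_stimulus()      # hypothetical helper
for _ in range(10):                    # "presenting a stimulus for 10 rounds"
    area.fire(stimulus)                # hypothetical method name
```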
Classifying stimuli-- you generate some stimulus, and this is what the actual stimulus looks like. The blue stimulus consists mostly of activity over here and some activity spread out everywhere. Similarly, the green; similarly, the orange. The green was plotted last. That's why you see more of it.
Now we create these assemblies by presenting them and then visualize them. And this is what the created assemblies look like within the brain, just by the presentation-- exactly what I said in the talk.
You can test their overlap. And of course, the prediction is almost perfect, because the majority of the assembly comes from the same class. This is not a hard machine learning problem. It's just happening without any special rules here.
OK, now let's go to sequences. So I'm defining a sequence of inputs of length 25-- a sequence of 25 things. And we are presenting the entire sequence 10 times. That's it-- 1 through 25, 10 times. And now we plot the results.
Now, what this is showing you, for each of the 25 sequence items, is the recall fraction-- the fraction of the correct assembly recalled. And if you present only three times, recall drops off very fast along the sequence. Let me explain what this orange line is in a second. But think about the blue line for a second.
After six presentations, the blue line-- you get good recall. And even the last element is recalled well after about eight presentations. So I did 10, but I could have seen, what was the status after three presentations, after six presentations, after 10?
After 10 presentations, this is telling you you're recalling basically even the last item perfectly. And this is the average. But I just want to tell you what this orange line is. This is something very human.
You know how when we learn ABC, we learn with a song, with an existing sequence? So what we did here is we used two brain areas. One of them already had a sequence, and one of them is the one that was getting projected. And we just projected inside, simultaneously let these things fire.
And when you do this, it turns out that three presentations are enough. So basically, the number of presentations you need drops by a factor of 2 when you allow yourself a scaffold-- an existing sequence in the brain. It doesn't matter what it is. It could have nothing to do with it sensory-wise, stimulus-wise. Yes?
AUDIENCE: Does the--
SANTOSH VEMPALA: OK, so here we get to the finite-state machine. So this is sort of a classic finite-state machine-- recognize numbers divisible by 3. I give you an input binary number, and you want to know-- or not binary-- decimal number. You want to know, is it divisible by 3? This is something we can do in our head. Just keep the count mod 3. That's a finite-state machine.
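For reference, the classical machine itself, written directly in a few lines of Python (this is the target behavior, not the assembly simulation):

```python
def divisible_by_3(digits: str) -> bool:
    """Classical finite-state machine: keep the running value mod 3. Reading
    digit d moves state s to (10*s + d) % 3, which equals (s + d) % 3."""
    state = 0
    for d in digits:
        state = (state + int(d)) % 3
    return state == 0  # accept iff the number is divisible by 3

assert divisible_by_3("729729927")  # the sequence from the demo
```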
How is this going to learn it? We're going to learn it by just presenting the transitions one by one. This has just four states plus the accept state. That's the network. We present the transitions 20 times.
And now, let's give it a sequence. Anybody want to give me a sequence of digits? All right, fine. Sorry? 7-- yeah. Just to be-- there. OK.
Now it's done its computation. Plotting is the more complicated thing. How do I show you exactly what it's done? We'll see it in a second. Here it is. So this is 729729927, end of sequence.
And this is what's getting activated. It starts here. This represents 0 mod 3, 1 mod 3, 2 mod 3. And these are the accept and reject states. Those are the five states in the network.
When I present 7, it moved to 1 mod 3, because 7 mod 3 is 1. And then when you present-- what is it, 2? 1 plus 2 is 3. It goes back to 0. 9 stays here. 7-- and so on. And at the end, it accepted.
OK, now, here's the interesting-- how about a probabilistic finite automaton? This is closer to your question. What if the sequences are not deterministic? For example, here is a very simple probabilistic finite automaton for a fragment of English.
An article, a noun, a verb: "the dog chases a ball." And then we allow ourselves to go back to the beginning. "And then the boy chases"-- whatever.
But the point is this. Nobody's telling us anything about this. We're just learning this by presentation as little sequences.
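For comparison, here is the target automaton written directly as a sampler; the word lists and the 50/50 stop-or-continue probability are guesses reconstructed from the sample sentences in the demo, not the talk's exact automaton.

```python
import random

# Toy probabilistic finite automaton for the English fragment; word lists and
# the 50/50 stop-or-continue choice are reconstructed guesses, not the demo's
# exact transition probabilities.
ARTICLES = ["the", "a"]
NOUNS = ["dog", "boy", "ball", "stick"]
VERBS = ["chases", "throws"]

def sample_sentence(rng):
    words = []
    while True:
        words += [rng.choice(ARTICLES), rng.choice(NOUNS), rng.choice(VERBS),
                  rng.choice(ARTICLES), rng.choice(NOUNS)]
        if rng.random() < 0.5:       # stop here, or loop back with "then"
            return " ".join(words)
        words.append("then")

print(sample_sentence(random.Random(5)))  # e.g., "the boy throws the stick"
```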
Yep. This is the-- and then we present it 20 times. Now I sample from the model, which means I start at the start state of this PFA and see what happens. And here is the output. This is a sample output. Let's just say, print.
OK, the boy throws the stick. Let's sample again. Now, you're not training again. I'm just sampling. You see, the same model is now giving me "The dog chases the ball, then a dog chases a stick, then a dog chases the stick." OK, it's obviously not thinking.
But the point is that the probabilities themselves-- this is what I want to say; this is ongoing work-- are generated by assemblies. How do I make a random choice between two assemblies? This can be done using just a random initial k-subset, it turns out-- that gives you probability one half.
OK, this is the last thing here. I'm plotting the activations for what actually-- oh, sorry, there's one more thing. This is just the sequence of activations corresponding to the sentence that was generated-- "a boy"-- what was it? Whatever it was. Anyway, so that's the probabilistic environment.
Now, the last one, very last one-- simulating a Turing machine. You see, a Turing machine is much more powerful than just a finite-state machine with no memory. One example of this is a palindrome. If I tell you ABBABBA, is that the same when I reverse it? You can't get a finite-state machine to do it, but Turing machine, no problem-- you and I, no problem. Write it down, and then start marking off the ends and so on.
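Written directly, the marking-off strategy the machine follows is just this (again, the target behavior, not the assembly simulation):

```python
def is_palindrome(s: str) -> bool:
    """The Turing machine's strategy: compare the two ends of the tape, erase
    them if they match, and repeat until at most one symbol remains."""
    while len(s) > 1:
        if s[0] != s[-1]:
            return False   # mismatch: reject
        s = s[1:-1]        # mark off (erase) both ends; the tape shrinks
    return True            # empty or single symbol: accept

assert is_palindrome("abbabbbabba")  # the string from the demo
```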
And this is a finite-state machine with the input alongside. That's just the finite-state machine, and then you need to have the input on which you can move left and right. This R and L represents whether you're moving left or right on the tape.
And now, how is this model learning it? Well, first, the alphabet right now is just A or B. And we have an end symbol. That's why it's plus 1.
These are the transitions of that finite-state machine. We create symbol assemblies and state assemblies and present them to the network-- so this is, again, just by presenting the transitions of the finite-state machine. And now we ask about a string-- for example, abbabbbabba. Is that a palindrome? Yes? OK.
And now, we run it. And let's see. Yeah, it takes a little bit. Here we go.
So what's happening here? This is just what's on the tape-- abbabbbabba. And that's the end symbol. And in the first step, it is starting here and writing this down.
And then what it does is just checking if the last was the same as the first. And if so, it erases it. That's why you get the smaller tape, and smaller tape, and smaller tape. I mean, it's just following the finite-state machine correctly but for a large number of transitions, till you're down to just bbb, and then just b. And it accepts, in the end-- move to the accept state.
Anyway, I'll stop here. All of this is available on GitHub. And everything was done here with less than 7,000 neurons.
[APPLAUSE]