One of the most important features of animal behavior is the ability to learn from experience. Such adaptive behavior is important for survival in our ever-changing environment. The brain has evolved many mechanisms of learning, which likely contribute differentially to distinct types of learning. Thus declarative learning, procedural learning, fear learning, spatial learning and motor learning may each involve different changes in distinct neuronal circuits in different parts of the brain. In a previous video, video 6.2, we briefly discussed evidence that NMDA receptors in the CA1 region of the hippocampus might play an important role in spatial learning. Indeed, the hippocampus and neocortex are likely to be involved in many forms of learning, and this is supported by increasing amounts of evidence from experiments both correlating and manipulating neuronal activity in the hippocampus and neocortex. However, in this video we'll focus our attention on a form of learning that might be more ancient, and thus perhaps more fundamental, one that likely existed long before the neocortex evolved. Here we will specifically focus on reward-based learning. Presumably an important motivation for carrying out various actions is a desire to achieve positive outcomes, accompanied by immediate or future rewards. Sometimes we're successful, either by design or by chance, and it's important for us to learn from the outcome of our actions so that we can maximize our reward. In this video, we will discuss one attractive hypothesis for how reward-based learning might occur in the mammalian brain. However, it's good to bear in mind that there are many other hypotheses about how learning takes place, and this is a very active field of research.

Let's think a bit about how reward-based learning might work. We imagine there's some sensory input, some sort of context provided by that input; that's how we know where we are. That's processed by the brain, by the synaptically connected neuronal networks that govern brain function, and that then gives rise to some form of motor output, an action. That action, that motor output, might give rise to reward. It may be a successful action, and we may be rewarded for performing that action in the context in which that sensory input arrived. So the basic idea behind reward-based learning is to learn that in a specific stimulus context we should perform a specific motor output, and that will then lead us to reward.

At the level of neuronal networks, we can begin to think a bit more specifically about what types of learning ought to occur in the brain. We again imagine our context provided by the sensory input, and of course there are many different channels and types of sensory input that we could have. That comes in, excites neurons or groups of neurons inside the brain, and that then feeds into these connected neuronal networks where different neurons in different parts of the brain talk to each other. They process the information and give rise to motor output, and of course there are many different motor outputs that we can imagine. For one particular behavior, we might imagine that a given sensory stimulus comes in, a given context provided to the animal, and generates activity in one node of the network; in order for the animal to receive reward, it needs to carry out a specific action that, it turns out, is driven by activity in one particular motor node of the network.
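As a cartoon of this idea, we can write the network down as a weight matrix from sensory context nodes to motor nodes, with reward gating which connections get strengthened. This is only a minimal sketch with arbitrary sizes, learning rate and trial numbers, not a claim about how the brain implements it; the biological candidates are discussed later in the video.

```python
import numpy as np

# Toy version of the network sketched above: a sensory context vector drives
# 'motor nodes' through a weight matrix W, and reward gates a Hebbian-style
# strengthening of the context-to-action connection.
rng = np.random.default_rng(0)
n_sensory, n_motor = 4, 3
W = 0.05 * rng.random((n_motor, n_sensory))   # weak initial connections

context = np.zeros(n_sensory)
context[0] = 1.0               # e.g. a 'whisker stimulus' context
rewarded_action = 2            # the motor output that yields reward
eta = 0.3                      # learning rate (arbitrary)

for trial in range(50):
    action = rng.integers(n_motor)                    # explore actions at random
    reward = 1.0 if action == rewarded_action else 0.0
    # Reward-gated update: strengthen synapses from the active sensory
    # context onto the chosen motor node only when reward arrives.
    W[action] += eta * reward * context

best_action = int(np.argmax(W @ context))             # greedy readout after learning
print(best_action)                                    # -> 2: the rewarded action now dominates
```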
So what the animal needs to learn is to convert this sensory input into this motor output. In terms of how one might do this at the level of individual neurons, you might want to strengthen the most direct pathway that links the sensory stimulation to the motor output. If we imagine that these are individual neurons connected by glutamatergic synapses, what we'd want to do is increase the efficacy of those synapses, perhaps through synaptic plasticity in the form of long-term potentiation. If we could strengthen the link between this sensory input and that motor output, then that would give rise to an enhanced sensorimotor transformation, which would in turn bring reward to the animal. So what reward should do, then, is to drive this form of plasticity in the sensorimotor networks of the brain.

So a key question, obviously, is how reward is represented in the brain. In the now classic work of Wolfram Schultz, who recorded from dopaminergic neurons of the midbrain in the monkey, an interesting reward signal was found in the dopamine neurons. What Wolfram Schultz observed is that when he gave reward to his monkeys, there would be a brief, transient increase in the action potential firing of the dopamine neurons, and that appears to encode a reward signal. More recently, these experiments have been repeated in mice in the work of Naoshige Uchida, where, in addition, it is possible to verify that the recorded cells really are dopaminergic neurons through detailed examination of their properties. The results of Naoshige Uchida and Wolfram Schultz are in close agreement: when reward is delivered to an animal, typically in the form of juice squirted into the mouth of a thirsty monkey, some 100 milliseconds or so afterwards there's a phasic increase in the action potential firing of the dopaminergic neurons. So there's an extracellular recording electrode that's been lowered to where the midbrain dopamine neurons are. These are spontaneously active neurons, so they will fire action potentials every few hundred milliseconds, with a basal firing rate of perhaps around five hertz, but when reward is delivered, some hundred milliseconds later there's an increase in the action potential firing of these dopaminergic neurons, a transient signal that again lasts about 100 milliseconds or so. So reward is associated with a brief increase in dopaminergic signalling.

Interestingly, if a cue is reliably presented before the reward, say with a one-second interval between the cue and the reward, and this cue could be a sound or a visual stimulus that predicts reward delivery, then the dopaminergic neurons fire not so much to the reward but rather to the cue that predicts the reward. So in fact what drives the dopamine signal is not so much reward itself as reward prediction. This phasic dopamine signal seems to move to the earliest predictor of reward coming the animal's way. That's clearly a very interesting signal for learning how the animal should best behave in order to achieve more reward, with dopaminergic signals driving the sensorimotor loops that learn how to behave and how to obtain more reward. Having found this reward signal in the dopamine neurons, it's obviously interesting to see where the signal goes in the brain.
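Before following the dopamine signal downstream, it is worth noting that this shift of the phasic response from the reward to its earliest predictor is exactly what a temporal-difference (TD) learning model produces, which is one reason the signal is interpreted as a reward prediction error. The sketch below is a minimal textbook-style TD(0) simulation, not a model fitted to the recordings discussed here; the trial structure, learning rate and the clamped pre-cue state are arbitrary simplifications standing in for the unpredictable arrival of the cue.

```python
import numpy as np

# Minimal TD(0) sketch: the prediction-error signal moves from the time of
# reward to the time of the predictive cue, mirroring the dopamine recordings.
T = 5                       # time steps per trial: cue appears at the t=0 transition, reward at t=T
alpha, gamma = 0.2, 0.95    # learning rate and discount factor (arbitrary)
V = np.zeros(T + 2)         # V[0] = pre-cue (ITI) state, clamped at 0; V[T+1] = post-trial

def run_trial(update=True):
    """Return the TD error at each time step of one cue -> reward trial."""
    delta = np.zeros(T + 1)
    for t in range(T + 1):
        r = 1.0 if t == T else 0.0               # reward only at the final step
        delta[t] = r + gamma * V[t + 1] - V[t]   # TD error (the 'dopamine-like' signal)
        if update and t >= 1:                    # never update the clamped pre-cue state
            V[t] += alpha * delta[t]
    return delta

print(np.round(run_trial(update=False), 2))  # before learning: error peaks at reward time (t=T)
for _ in range(500):
    run_trial()
print(np.round(run_trial(update=False), 2))  # after learning: error peaks at cue onset (t=0)
```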
The midbrain dopaminergic neurons project most strongly to the striatum, so that's the location where they release dopamine most prominently in the brain. The striatum is a relatively large brain area. It can be divided into the dorsal and the ventral striatum, and the ventral striatum is sometimes called the nucleus accumbens. The midbrain dopaminergic neurons can also be divided into two groups. There's the so-called ventral tegmental area, which heavily innervates the nucleus accumbens and also signals to the frontal cortex, and there's the substantia nigra pars compacta, which signals to the dorsal striatum. The heavy dopaminergic innervation of the striatum suggests that dopamine probably carries out some important functions there.

If we now zoom in on a small part of the striatum, we see dopaminergic axons with swellings along them. At these swellings there are many synaptic vesicles that contain dopamine, and it's thought that dopamine is released from these swellings very much as transmitter is released at other synapses of the brain. There are, however, no postsynaptic membranes tightly apposed to these release sites, so the dopamine is thought to diffuse in the extracellular space. There it can bind to dopamine receptors that are present on membranes around it. There are two important types of dopamine receptors: the dopamine type 1 receptors and the dopamine type 2 receptors. Both of them are seven-transmembrane receptors that couple through G proteins, so the next downstream signal after dopamine binds to the receptor is G protein activation. The dopamine type 1 receptor signals through Gs and Golf, two different types of G proteins, which act through a variety of pathways including the stimulation of adenylate cyclase, leading to increased production of the second messenger cyclic AMP. The D2 receptors signal through other G proteins, Gi or Go, which have a variety of signalling targets including phospholipase C, but in addition they inhibit adenylate cyclase and thus reduce cyclic AMP concentrations. Although these receptors probably have many other functions, at this level at least the dopamine type 1 receptors increase cyclic AMP and the dopamine type 2 receptors decrease cyclic AMP, so they have in some respects opposing functions.

Interestingly, these dopamine receptors are expressed on distinct neurons in the striatum. There are two types of striatal projection neurons: those that express the dopamine type 1 receptors, and another type of neuron that expresses the dopamine type 2 receptor. We can look at these specifically through mouse genetics. We can express Cre recombinase downstream of the promoter of the D1 receptor, or we can express Cre recombinase downstream of the promoter for the D2 receptor. If we now inject a virus that's activated by Cre recombinase to express green fluorescent protein, then we can see where the axons of these neurons go, and that's exactly what we've done here. So here's the striatum. This is a sagittal section of the mouse brain from a D1-Cre mouse. We've injected virus here into the striatum, so we've infected cells and expressed green fluorescent protein in the D1-Cre animals. The GFP here is shown in pink, and so here we see the cells that are infected by the virus, and further away we see the axonal projections of these neurons.
So they send axons that go all the way over here to the substantia nigra pars reticulata. That's actually immediately next to where the dopamine neurons of the substantia nigra pars compacta are, the ones that signal back here to the striatum and release dopamine. Here we're looking at the reverse projection: the striatal projection neurons that express D1 receptors send axons to the substantia nigra pars reticulata. All of the projection neurons here in the striatum are GABAergic neurons, so this direct pathway inhibits the substantia nigra pars reticulata. The indirect pathway we can study in these D2-Cre expressing mice. Again, here we have a sagittal section, we've injected a virus that gets activated by Cre recombinase, we infect cells in the striatum, and now we see that there's a very different axonal projection that targets an area much closer to the striatum, the so-called external part of the globus pallidus. It's a GABAergic projection from these D2-expressing neurons that inhibits the globus pallidus. The globus pallidus, in turn, contains GABAergic neurons that then inhibit the substantia nigra pars reticulata. So not only do these two pathways express dopamine receptors with, at least at some level, opposing functions, the one increasing and the other decreasing cyclic AMP concentration, but they also have opposing functions in terms of their impact upon the substantia nigra pars reticulata. The direct pathway directly inhibits the substantia nigra pars reticulata, whereas the indirect pathway inhibits the globus pallidus, which in turn inhibits the substantia nigra pars reticulata. That's a disinhibitory pathway, so activity in the indirect pathway in the end increases reticulata activity, whereas activity in the direct pathway decreases it. These two pathways are rather interesting in the sense that they're very clearly distinct in their projections, their functions, and also in terms of the dopamine receptors that they express.

How does that help us understand reward-based learning? Let's have a look and see what happens if we imagine that some reward causes activation of the dopamine pathway and release of dopamine into the striatum. The dopamine diffuses in the extracellular space and then binds to D1 or D2 receptors, which, as we've seen, are expressed in different cell types. The action of the dopamine receptors seems to be particularly involved in regulating synaptic plasticity at the glutamatergic synapses that arrive onto the striatal projection neurons. It turns out that a large amount of the excitation that arrives in the striatum comes from the neocortex, and so cortical projections come into the striatum and release glutamate onto the striatal projection neurons. That's true of both pathways, the direct and the indirect. At these glutamatergic synapses, dopamine acts on the D1 receptors to promote the insertion of both AMPA and NMDA receptors into the postsynaptic membrane, and D1 activation appears to be essential for promoting long-term potentiation at this synapse. So the cortical input onto the direct striatal projection neurons appears to be potentiated by the action of dopamine; that's a strengthening of that pathway in the presence of reward. In contrast, the activation of D2 receptors seems to be involved in just the opposite type of synaptic plasticity. Activation of D2 receptors prevents calcium entry through the NMDA receptor and also seems to be involved in promoting long-term depression at this synapse.
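Written out as a toy three-factor update rule, with reward-evoked dopamine as the third factor gating opposite-signed plasticity on the two pathways, the idea looks like the sketch below. This is only a bookkeeping of the sign conventions just described, with arbitrary units, not a quantitative plasticity model.

```python
def corticostriatal_update(w_direct, w_indirect, cortex, d_spn, i_spn, dopamine, eta=0.1):
    """Toy dopamine-gated plasticity at cortical synapses onto the two SPN types.

    cortex: presynaptic (cortical) activity; d_spn / i_spn: postsynaptic activity of
    direct- and indirect-pathway striatal projection neurons; dopamine: phasic reward
    signal; eta: learning rate. All quantities are arbitrary units.
    """
    # D1 receptors on direct-pathway SPNs: dopamine promotes LTP (weight increase)
    w_direct += eta * dopamine * cortex * d_spn
    # D2 receptors on indirect-pathway SPNs: dopamine promotes LTD (weight decrease)
    w_indirect = max(0.0, w_indirect - eta * dopamine * cortex * i_spn)
    return w_direct, w_indirect

print(corticostriatal_update(0.5, 0.5, cortex=1.0, d_spn=0.5, i_spn=0.5, dopamine=1.0))
# -> (0.55, 0.45): with dopamine present, the direct pathway strengthens and the
#    indirect pathway weakens; with dopamine = 0.0, neither weight changes.
```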
So D2 activation gives rise to long-term depression, here onto the indirect striatal projection neurons. So dopamine seems to have two opposing functions: it strengthens the direct pathway, from cortex onto the direct striatal projection neurons, and it weakens the corresponding pathway onto the indirect striatal projection neurons.

Let's look at that in the context of one goal-directed sensorimotor transformation that we've already been thinking about. Let's take the whisker detection task, where we stimulate a whisker and the mouse needs to learn to make a goal-directed motor output, licking a reward spout, in order to obtain reward. At the level of a circuit diagram, this is how it might work. The sensory input arrives at the periphery and drives excitation of the somatosensory thalamus, which in turn excites the somatosensory neocortex, and we already saw that activity in the neocortex is essential for carrying out this whisker detection task. The neocortex in turn sends glutamatergic projections to the striatum, and one interesting place that could be reinforced by the dopamine reward signal is the glutamatergic input onto the direct striatal projection neurons, the D1 receptor-expressing pathway. Dopamine might potentiate that input and drive more activity in the direct pathway downstream of sensory activity. The activity in these direct pathway neurons might then inhibit the substantia nigra pars reticulata, which contains GABAergic neurons that are tonically active and normally inhibit the brainstem motor nuclei, preventing the animal from moving about in some random way. So there's a tonic inhibition of the brainstem motor nuclei, and that tonic activity in the substantia nigra might be what's directly inhibited by the direct striatal projection neurons. The cortical input onto the D2 receptor-expressing indirect pathway striatal projection neurons might be weakened by dopamine-dependent synaptic plasticity. There would then be less inhibition onto the globus pallidus neurons, and they would in turn inhibit the substantia nigra pars reticulata more. So we would have two ways in which the basal ganglia circuit, via striatum and substantia nigra, might contribute to the licking motor output in this goal-directed sensorimotor transformation of the whisker detection task.

So that's a nice hypothesis. Let's see if there's any data that might support it. In the first membrane potential recordings of striatal projection neurons during goal-directed behavior, we found that there is in fact a brief, transient sensory response in the direct striatal projection neurons, and that seems to be missing, or at least much weaker, in the indirect striatal projection neurons. So this transient excitation of the direct pathway could contribute to a go signal initiating the licking motor response during the whisker detection task. To see if that brief activation of the direct pathway might be sufficient to drive licking behavior, we tried an optogenetic substitution experiment. We trained the animals as before in the whisker detection task: we stimulate the whisker, the animal licks, it gets a reward. Once the animal has learned the whisker task, we then start stimulating with blue light the direct striatal projection neurons that have been infected by our virus expressing channelrhodopsin. So we can activate the striatal projection neurons, these dSPNs that are normally activated by the sensory stimulus during the whisker detection task. Now we bypass the sensory stimulus and we directly stimulate those neurons.
We give them a brief, few-millisecond blue light flash, and that turns out to drive the licking motor output. So apparently this activity in the direct striatal projection neurons is sufficient to drive the licking motor response. Here you can see the group statistics: the stimulation of the direct striatal projection neurons is in fact even better than the whisker stimulus at driving licking, and of course false alarm rates are low under these conditions, whereas stimulating the indirect striatal projection neurons in no way drives further licking. If anything, it prevents the animal from licking. So there's a highly specific pathway through the direct striatal projection neurons that drives licking, and the indirect striatal projection neurons seem to have the opposing function.

If we now put this together at the level of a circuit and synaptic wiring diagram, we think that the neocortex is involved in the performance of this whisker detection task, and one of the outputs of the neocortex is the glutamatergic innervation of the striatum. The dopamine reward signal that is presumably involved in the learning of this task might contribute to long-term potentiation at the neocortical synapse onto the direct pathway striatal projection neurons through the activation of D1 receptors. That would then strengthen the go pathway, whereas the glutamatergic input onto the indirect striatal projection neurons might be decreased in efficacy through long-term depression driven by the D2 receptors, and the depression of this pathway might then reduce the no-go signal. So altogether, what we see is that this pathway could be involved in the reward-based learning of this very simple goal-directed sensorimotor loop downstream of the whisker stimulus.

In this video, we've discussed one specific hypothesis that attempts to provide a mechanistic account, at the level of neuronal circuits and synaptic plasticity, of reward-based learning. The basic concept is simple: reward-based learning should reinforce the synaptic circuits driving goal-directed sensorimotor transformations. That way, actions leading to rewards should be carried out in the appropriate context. Mechanistically, the transient action potential firing of dopaminergic neurons in the midbrain appears to signal reward, or cues that predict reward; more formally, this signal is thought to encode reward prediction errors. This could form the signal driving synaptic plasticity at corticostriatal synapses, which might participate importantly in reward-based learning. These same circuits that are so useful for learning goal-directed behavior also go wrong in some brain diseases. We'll discuss that in the next video.
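For readers who want to see the whole hypothesis in one place, here is a toy simulation of the corticostriatal loop for the whisker detection task: cortical drive onto the two striatal pathways, disinhibition of licking through the substantia nigra pars reticulata, and dopamine-gated plasticity with opposite signs on the two pathways. Every number (weights, thresholds, learning rate, exploratory lick noise) is an arbitrary illustrative choice, not a parameter from the experiments described in this video.

```python
import numpy as np

rng = np.random.default_rng(1)
w_direct, w_indirect = 0.5, 0.5    # cortex -> dSPN and cortex -> iSPN weights (arbitrary units)
eta = 0.2                          # learning rate

def run_trial(whisker, w_d, w_i):
    cortex = 1.0 if whisker else 0.0              # whisker -> thalamus -> somatosensory cortex
    d_spn, i_spn = w_d * cortex, w_i * cortex     # direct / indirect striatal projection neurons
    gpe = max(0.0, 1.0 - i_spn)                   # indirect pathway inhibits the globus pallidus
    snr = max(0.0, 2.0 - d_spn - gpe)             # SNr is inhibited by dSPNs and by the GPe
    motor = max(0.0, 1.5 - snr)                   # SNr tonically inhibits brainstem licking circuits
    lick = motor + 0.3 * rng.random() > 0.75      # noisy threshold allows occasional exploratory licks
    return lick, cortex, d_spn, i_spn

for _ in range(200):                              # training: whisker stimulus on every trial
    lick, cortex, d_spn, i_spn = run_trial(True, w_direct, w_indirect)
    dopamine = 1.0 if lick else 0.0               # phasic dopamine on rewarded (hit) trials
    # Dopamine-gated, pathway-specific plasticity: D1-mediated LTP on the direct
    # pathway and D2-mediated LTD on the indirect pathway (weights kept bounded).
    w_direct = min(2.0, w_direct + eta * dopamine * cortex * d_spn)
    w_indirect = max(0.0, w_indirect - eta * dopamine * cortex * i_spn)

hits = np.mean([run_trial(True, w_direct, w_indirect)[0] for _ in range(100)])
false_alarms = np.mean([run_trial(False, w_direct, w_indirect)[0] for _ in range(100)])
print(hits, false_alarms)   # hit rate rises toward 1; false alarms stay near the exploratory baseline
```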