Reinforcement learning in complex natural environments is a challenging task because the agent must generalize from the outcomes of actions taken in one state of the world to future actions in different states of the world. The extent to which human experts find the proper level of generalization is unclear. Here we show, using the sequences of field goal attempts made by professional basketball players, that the outcome of even a single field goal attempt has a considerable effect on the rate of subsequent 3-point shot attempts, in line with standard models of reinforcement learning. However, this change in behaviour is associated with negative correlations between the outcomes of successive field goal attempts. These results indicate that despite years of experience and high motivation, professional players overgeneralize from the outcomes of their most recent actions, which leads to decreased performance.
What fundamental properties of synaptic connectivity in the neocortex stem from the ongoing dynamics of synaptic changes? In this study, we seek to find the rules shaping the stationary distribution of synaptic efficacies in the cortex. To address this question, we combined chronic imaging of hundreds of spines in the auditory cortex of mice in vivo over weeks with modeling techniques to quantitatively study the dynamics of spines, the morphological correlates of excitatory synapses in the neocortex. We found that the stationary distribution of spine sizes of individual neurons can be exceptionally well described by a log-normal function. We furthermore show that spines exhibit substantial volatility in their sizes at timescales that range from days to months. Interestingly, the magnitude of changes in spine sizes is proportional to the size of the spine. Such multiplicative dynamics are in contrast with conventional models of synaptic plasticity, learning, and memory, which typically assume additive dynamics. Moreover, we show that the ongoing dynamics of spine sizes can be captured by a simple phenomenological model that operates at two timescales of days and months. This model converges to a log-normal distribution, bridging the gap between synaptic dynamics and the stationary distribution of synaptic efficacies.
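The convergence of multiplicative dynamics to a log-normal distribution can be sketched numerically. This is a minimal caricature, not the paper's fitted two-timescale model: each update multiplies the spine size by a random factor (so the magnitude of change is proportional to the current size), which makes the log-sizes perform an additive random walk; by the central limit theorem the sizes themselves then approach a log-normal distribution. The noise level `sigma` and the step count are illustrative choices.

```python
import math
import random
import statistics

random.seed(0)

def simulate_spine(x0=1.0, steps=200, sigma=0.1):
    """Purely multiplicative size dynamics: x <- x * random factor."""
    x = x0
    for _ in range(steps):
        x *= math.exp(random.gauss(0.0, sigma))  # change proportional to size
    return x

sizes = [simulate_spine() for _ in range(2000)]
logs = [math.log(x) for x in sizes]

# Log-sizes should look Gaussian: mean near 0, stdev near sigma * sqrt(steps),
# i.e. the sizes themselves are approximately log-normally distributed.
print(statistics.mean(logs), statistics.stdev(logs))
```

Under additive dynamics (x <- x + noise) the same simulation would instead produce an approximately Gaussian size distribution, which is one way to see why the observed log-normal stationary distribution points toward multiplicative updates.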
Delayed comparison tasks are widely used in the study of working memory and perception in psychology and neuroscience. It has long been known, however, that decisions in these tasks are biased. When the two stimuli in a delayed comparison trial are small in magnitude, subjects tend to report that the first stimulus is larger than the second stimulus. In contrast, subjects tend to report that the second stimulus is larger than the first when the stimuli are relatively large. Here we study the computational principles underlying this bias, also known as the contraction bias. We propose that the contraction bias results from a Bayesian computation in which a noisy representation of a magnitude is combined with a priori information about the distribution of magnitudes to optimize performance. We test our hypothesis on choice behavior in a visual delayed comparison experiment by studying the effect of (i) changing the prior distribution and (ii) changing the uncertainty in the memorized stimulus. We show that choice behavior in both manipulations is consistent with the performance of an observer who uses Bayesian inference to improve performance. Moreover, our results suggest that the contraction bias arises during memory retrieval/decision making and not during memory encoding. These results support the notion that the contraction bias illusion can be understood as resulting from optimality considerations.
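The Bayesian account can be illustrated with a deterministic toy computation (all parameter values here are illustrative, not fitted to the experiment): with Gaussian prior and likelihood, the posterior mean is a weighted average of the measurement and the prior mean, and the noisier memorized first stimulus is pulled more strongly toward the prior mean than the just-presented second stimulus.

```python
def bayes_estimate(measured, noise_var, prior_mean=10.0, prior_var=4.0):
    """Posterior mean for a Gaussian prior and Gaussian measurement noise."""
    w = prior_var / (prior_var + noise_var)  # weight on the measurement
    return w * measured + (1 - w) * prior_mean

s = 6.0  # a "small" stimulus, below the prior mean of 10

est_first = bayes_estimate(s, noise_var=4.0)   # held in memory: high noise
est_second = bayes_estimate(s, noise_var=1.0)  # just presented: low noise

# The first estimate is contracted more toward the prior mean, so for
# equal small stimuli the first is judged larger -- the contraction bias.
print(est_first, est_second)
```

For stimuli above the prior mean the contraction works downward, so the same computation predicts the opposite report for large stimulus pairs, matching the pattern described in the abstract.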
To what extent are the properties of neuronal networks constrained by computational considerations? Comparative analysis of the vertical lobe (VL) system, a brain structure involved in learning and memory, in two phylogenetically close cephalopod mollusks, Octopus vulgaris and the cuttlefish Sepia officinalis, provides a surprising answer to this question.
RESULTS:
We show that in both the octopus and the cuttlefish the VL is characterized by the same simple fan-out fan-in connectivity architecture, composed of the same three neuron types. Yet, the sites of short- and long-term synaptic plasticity and neuromodulation are different. In the octopus, synaptic plasticity occurs at the fan-out glutamatergic synaptic layer, whereas in the cuttlefish plasticity is found at the fan-in cholinergic synaptic layer.
CONCLUSIONS:
Does this dramatic difference in physiology imply a difference in function? Not necessarily. We show that the physiological properties of the VL neurons, particularly the linear input-output relations of the intermediate layer neurons, allow the two different networks to perform the same computation. The convergence of different networks to the same computational capacity indicates that it is the computation, not the specific properties of the network, that is self-organized or selected for by evolutionary pressure.
According to the theory of Melioration, organisms in repeated choice settings shift their choice preference in favor of the alternative that provides the highest return. The goal of this paper is to explain how this learning behavior can emerge from microscopic changes in the efficacies of synapses, in the context of a two-alternative repeated-choice experiment. I consider a large family of synaptic plasticity rules in which changes in synaptic efficacies are driven by the covariance between reward and neural activity. I construct a general framework that predicts the learning dynamics of any decision-making neural network that implements this synaptic plasticity rule and show that melioration naturally emerges in such networks. Moreover, the resultant learning dynamics follows the Replicator equation which is commonly used to phenomenologically describe changes in behavior in operant conditioning experiments. Several examples demonstrate how the learning rate of the network is affected by its properties and by the specifics of the plasticity rule. These results help bridge the gap between cellular physiology and learning behavior.
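A hedged simulation sketch of this idea (not the paper's derivation; the reward probabilities, learning rates, and decision rule are all made-up illustrative choices): each synaptic efficacy changes in proportion to the covariance between reward and the "activity" of its action, here simply whether that action was chosen. Preference then shifts toward the higher-return alternative, as melioration predicts.

```python
import random

random.seed(1)

w = [1.0, 1.0]        # efficacies driving the two actions
returns = [0.3, 0.7]  # reward probability of each alternative
rbar = 0.5            # running estimate of the mean reward
eta = 0.05            # learning rate

for _ in range(5000):
    p = w[0] / (w[0] + w[1])               # probability of choosing action 0
    a = 0 if random.random() < p else 1
    r = 1.0 if random.random() < returns[a] else 0.0
    # covariance-driven update: (reward - mean reward) x activity of action a
    w[a] = max(w[a] + eta * (r - rbar), 1e-6)
    rbar += 0.01 * (r - rbar)              # slowly track the mean reward

# Preference shifts toward alternative 1, whose return exceeds the mean.
print(w, w[1] / (w[0] + w[1]))
```

The weight of an action grows only while its return exceeds the current mean reward, which is the qualitative signature of melioration.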
Over the past several decades, economists, psychologists, and neuroscientists have conducted experiments in which a subject, human or animal, repeatedly chooses between alternative actions and is rewarded based on choice history. While individual choices are unpredictable, aggregate behavior typically follows Herrnstein's matching law: the average reward per choice is equal for all chosen alternatives. In general, matching behavior does not maximize the overall reward delivered to the subject, and therefore matching appears inconsistent with the principle of utility maximization. Here we show that matching can be made consistent with maximization by regarding the choices of a single subject as being made by a sequence of multiple selves, one for each instant of time. If each self is blind to the state of the world and discounts future rewards completely, then the resulting game has at least one Nash equilibrium that satisfies both Herrnstein's matching law and the unpredictability of individual choices. This equilibrium is, in general, Pareto suboptimal, and can be understood as a mutual defection of the multiple selves in an intertemporal prisoner's dilemma. The mathematical assumptions about the multiple selves should not be interpreted literally as psychological assumptions. Humans and animals do remember past choices and care about future rewards. However, they may be unable to comprehend or take into account the relationship between past and future. This can be made more explicit when a mechanism that converges on the equilibrium, such as reinforcement learning, is considered. Using specific examples, we show that there exist behaviors that satisfy the matching law but are not Nash equilibria. We expect that these behaviors will not be observed experimentally in animals and humans. If this is the case, the Nash equilibrium formulation can be regarded as a refinement of Herrnstein's matching law.
The ability to represent time is an essential component of cognition but its neural basis is unknown. Although extensively studied both behaviorally and electrophysiologically, a general theoretical framework describing the elementary neural mechanisms used by the brain to learn temporal representations is lacking. It is commonly believed that the underlying cellular mechanisms reside in high order cortical regions but recent studies show sustained neural activity in primary sensory cortices that can represent the timing of expected reward. Here, we show that local cortical networks can learn temporal representations through a simple framework predicated on reward dependent expression of synaptic plasticity. We assert that temporal representations are stored in the lateral synaptic connections between neurons and demonstrate that reward-modulated plasticity is sufficient to learn these representations. We implement our model numerically to explain reward-time learning in the primary visual cortex (V1), demonstrate experimental support, and suggest additional experimentally verifiable predictions.
It is widely believed that learning is due, at least in part, to long-lasting modifications of the strengths of synapses in the brain. Theoretical studies have shown that a family of synaptic plasticity rules, in which synaptic changes are driven by covariance, is particularly useful for many forms of learning, including associative memory, gradient estimation, and operant conditioning. Covariance-based plasticity is inherently sensitive. Even a slight mistuning of the parameters of a covariance-based plasticity rule is likely to result in substantial changes in synaptic efficacies. Therefore, the biological relevance of covariance-based plasticity models is questionable. Here, we study the effects of mistuning parameters of the plasticity rule in a decision making model in which synaptic plasticity is driven by the covariance of reward and neural activity. An exact covariance plasticity rule yields Herrnstein's matching law. We show that although the effect of slight mistuning of the plasticity rule on the synaptic efficacies is large, the behavioral effect is small. Thus, matching behavior is robust to mistuning of the parameters of the covariance-based plasticity rule. Furthermore, the mistuned covariance rule results in undermatching, which is consistent with experimentally observed behavior. These results substantiate the hypothesis that approximate covariance-based synaptic plasticity underlies operant conditioning. However, we show that the mistuning of the mean subtraction makes behavior sensitive to the mistuning of the properties of the decision making network. Thus, there is a tradeoff between the robustness of matching behavior to changes in the plasticity rule and its robustness to changes in the properties of the decision making network.
The probability of choosing an alternative in a long sequence of repeated choices is proportional to the total reward derived from that alternative, a phenomenon known as Herrnstein's matching law. This behavior is remarkably conserved across species and experimental conditions, but its underlying neural mechanisms are still unknown. Here, we propose a neural explanation of this empirical law of behavior. We hypothesize that there are forms of synaptic plasticity driven by the covariance between reward and neural activity and prove mathematically that matching is a generic outcome of such plasticity. Two hypothetical types of synaptic plasticity, embedded in decision-making neural network models, are shown to yield matching behavior in numerical simulations, in accord with our general theorem. We show how this class of models can be tested experimentally by making reward not only contingent on the choices of the subject but also directly contingent on fluctuations in neural activity. Maximization is shown to be a generic outcome of synaptic plasticity driven by the sum of the covariances between reward and all past neural activities.
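The link between matching and covariance can be checked with a toy numeric example (the data below are illustrative, not from the paper): when the average reward per choice is equal across alternatives, i.e. Herrnstein's matching law holds, the covariance between reward and each choice indicator vanishes, which is the fixed-point condition of covariance-driven plasticity.

```python
def cov(x, y):
    """Sample covariance of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n

# Toy choice data obeying matching: return per choice is 0.5 on both sides.
choices = [0] * 6 + [1] * 4
rewards = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0] + [1.0, 0.0, 1.0, 0.0]
ind0 = [1.0 if c == 0 else 0.0 for c in choices]  # indicator of choosing 0

ret0 = sum(r for r, c in zip(rewards, choices) if c == 0) / 6
ret1 = sum(r for r, c in zip(rewards, choices) if c == 1) / 4

# Equal returns <=> zero covariance between reward and the choice indicator.
print(ret0, ret1, cov(rewards, ind0))
```

If one alternative yielded a higher return than the other, the covariance would be nonzero and a covariance-driven plasticity rule would keep changing the efficacies, which is why matching emerges as the generic stable outcome.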
A persistent change in neuronal activity after brief stimuli is a common feature of many neuronal microcircuits. This persistent activity can be sustained by ongoing reverberant network activity or by the intrinsic biophysical properties of individual cells. Here we demonstrate that rat and guinea pig cerebellar Purkinje cells in vivo show bistability of membrane potential and spike output on the time scale of seconds. The transition between membrane potential states can be bidirectionally triggered by the same brief current pulses. We also show that sensory activation of the climbing fiber input can switch Purkinje cells between the two states. The intrinsic nature of Purkinje cell bistability and its control by sensory input can be explained by a simple biophysical model. Purkinje cell bistability may have a key role in the short-term processing and storage of sensory information in the cerebellar cortex.
Many neurons in the brain remain active even when an animal is at rest. Over the past few decades, it has become clear that, in some neurons, this activity can persist even when synaptic transmission is blocked and is thus endogenously generated. This “spontaneous” firing, originally described in invertebrate preparations (Alving, 1968; Getting, 1989), arises from specific combinations of intrinsic membrane currents expressed by spontaneously active neurons (Llinas, 1988). Recent work has confirmed that, far from being a biophysical curiosity, spontaneous firing plays a central role in transforming synaptic input into spike output and encoding plasticity in a wide variety of neural circuits. This mini-symposium highlights several key recent advances in our understanding of the origin and significance of spontaneous firing in the mammalian brain.
The calculation and memory of position variables by temporal integration of velocity signals is essential for posture, the vestibulo-ocular reflex (VOR) and navigation. Integrator neurons exhibit persistent firing at multiple rates, which represent the values of memorized position variables. A widespread hypothesis is that temporal integration is the outcome of reverberating feedback loops within recurrent networks, but this hypothesis has not been proven experimentally. Here we present a single-cell model of a neural integrator. The nonlinear dynamics of calcium gives rise to propagating calcium wave-fronts along dendritic processes. The wave-front velocity is modulated by synaptic inputs such that the front location covaries with the temporal sum of its previous inputs. Calcium-dependent currents convert this information into concomitant persistent firing. Calcium dynamics in single neurons could thus be the physiological basis of the graded persistent activity and temporal integration observed in neurons during analog memory tasks.
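The integration principle itself can be caricatured in a few lines (this is not the paper's calcium model; the gain, time step, and input pulses are arbitrary): if the input modulates the velocity of a front, the front's position is the time integral of the input, and in the absence of input the position persists at any graded value.

```python
def integrate(inputs, dt=0.01, gain=1.0, x0=0.0):
    """Front position whose velocity is proportional to the input signal."""
    x = x0
    trace = []
    for u in inputs:
        x += gain * u * dt  # velocity modulated by input -> x integrates u
        trace.append(x)
    return trace

# A positive pulse, a silent holding period, then a weaker negative pulse.
pulses = [1.0] * 100 + [0.0] * 100 + [-1.0] * 50
trace = integrate(pulses)

# Position after the pulse (~1.0) persists through the silent period and is
# then partially pulled back (~0.5): graded, persistent analog memory.
print(trace[99], trace[199], trace[-1])
```

The silent holding period is the analogue of the persistent firing at multiple rates described in the abstract: any position reached remains stable until a new input moves the front.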
The cerebellar cortex contains the majority of the neurons in the central nervous system, which are well organized in a lattice-like structure. Despite this apparent simplicity, the function of the olivo-cerebellar system is still largely unknown. In this thesis I have tried to contribute to the understanding of the system by studying three aspects of the dynamics of neurons and their relation to the function (see below). Although these questions have emerged from the study of the olivo-cerebellar system, the results are more general, and relate to neuronal dynamics and the function of other brain structures.
In many biological systems, the electrical coupling of nonoscillating cells generates synchronized membrane potential oscillations. This work describes a dynamical mechanism in which the electrical coupling of identical nonoscillating cells destabilizes the homogeneous fixed point and leads to network oscillations via a Hopf bifurcation. Each cell is described by a passive membrane potential and additional internal variables. The dynamics of the internal variables, in isolation, is oscillatory, but their interaction with the membrane potential damps the oscillations and therefore constructs nonoscillatory cells. The electrical coupling reveals the oscillatory nature of the internal variables and generates network oscillations. This mechanism is analyzed near the bifurcation point, where the spatial structure of the membrane potential oscillations is determined by the network architecture and in the limit of strong coupling, where the membrane potentials of all cells oscillate in-phase and multiple cluster states dominate the dynamics. In particular, we have derived an asymptotic behavior for the spatial fluctuations in the limit of strong coupling in fully connected networks and in a one-dimensional lattice architecture.
Tremor is a potentially disabling pathology that affects millions of people. The inferior olive (IO) has been implicated in several types of tremor [1,2]. In particular, electrical synapses have been shown to be essential for the generation of oscillatory activity in the IO [3], which may manifest as tremor. In a recent paper [4], we described how the electrical coupling of non-oscillating cells can generate oscillatory network behavior. Here we apply this dynamic mechanism to the IO and discuss the possible clinical applications...
In several biological systems, the electrical coupling of nonoscillating cells generates synchronized membrane potential oscillations. Because the isolated cell is nonoscillating and electrical coupling tends to equalize the membrane potentials of the coupled cells, the mechanism underlying these oscillations is unclear. Here we present a dynamic mechanism by which the electrical coupling of identical nonoscillating cells can generate synchronous membrane potential oscillations. We demonstrate this mechanism by constructing a biologically feasible model of electrically coupled cells, characterized by an excitable membrane and calcium dynamics. We show that strong electrical coupling in this network generates multiple oscillatory states with different spatio-temporal patterns and discuss their possible role in the cooperative computations performed by the system.