paper: Mastering the Game of Go with Deep Neural Networks and Tree Search by David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van deniessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot,nder Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, Demis Hassabis

summary: they create a convolutional neural network with 13 layers to select moves (given a game position, it outputs a probability distribution over all legal moves, trying to assign higher probabilities to better moves). They train the network on databases of expert matches, save a copy of the trained network as ‘SL’ then train it further by playing it against randomly selected previous iterations of itself. Then they use the history of the move-selecting network playing against itself to generate a new training set consisting of 30 million game positions and the outcome of that game, with each of the 30 million positions coming from a separate game. They use this training set to train a new convolutional neural network (with 13 layers again, i think) to appraise the value of a board position (given a board position, it outputs a single scalar that attempt to predict the game outcome of that board position).

They also train ANOTHER move-predicting classifier called the ‘fast rollout’ policy; the reason for another one is that the fast rollout policy is supposed to be very fast to run, unlike the neural nets. The fast rollout policy is a linear softmax of small pattern features (move matches one or more response features, Move saves stone(s) from capture, Move is 8-connected to previous move, Move matches nakade patterns at captured stone, Move matches 12-point diamond pattern near previous move, move matches 3×3 pattern around candidate move). When a feature is “move matches some pattern”, i don’t understand if they mean that “match any pattern” is the feature, or if each possible pattern is its own feature; i suspect the latter, even though that’s a zillion features to compute. The feature weights of the fast rollout classifier are trained on a database of expert games.

Now they will use three of those classifiers, the ‘SL’ neural network (the saved network that tried to learn which move an expert would have made, before further training against itself), and the board-position-value-predicting network, plus the ‘rollout’ policy.

The next part, the Monte Carlo Tree Search combined with the neural networks, is kinda complicated and i don’t fully understand it, so the following is likely to be wrong. The idea of Monte Carlo Tree Search is to estimate the value of a board position by simulating all or part of game in which both players in the simulation are running as their policy a classifier without lookahead (eg within the simulation, neither player does any lookahead at each step); this simulation is (eventually) done many times and the results are averaged together. This simulation is called ‘rollout’. Each time the Monte Carlo simulation is done, the policy is updated. In this application, the ‘fast rollout’ is used in some parts of this simulation, but not all, because the ‘fast rollout’ policy is fixed after training, but the idea of Monte Carlo Tree Search is that the policy used is updated after each simulation.

In order to take one turn in the real game, the program does zillions of iterations; in each iteration, it simulates a game-within-a-game:

It simulates a game where the players use the current policy, which is represented as a tree of game states whose root is the current actual game state, whose edges are potential moves, and whose nodes or edges are labeled with the current policy’s estimated values for game states (plus a factor encouraging exploration of unexplored or underexplored board states).

When the simulation has visited the parent of a ‘leaf node’ (a game state which has not yet been analyzed but which is a child of a node which is not a leaf node) more than some threshold, the leaf node is added to a queue for an asynchronous process to ‘expand the leaf node’ (analyze it) (the visit-count-before-expansion threshold is adaptively adjusted to keep the queue short). This process estimates the value of the leaf node via a linear combination of (a) the board-position-value-predicting network’s output and (b) the outcome of running a simulation of the rest of the game (a game within a game within a game) with both players using the ‘fast rollout’ policy. Then, the SL neural network is used to give initial estimates of the value of each move from that board position (because you only have to run SL once to get an estimate for all possible moves from that board position, whereas it would take a long time to recurse into each of the many possible successor board positions and run the board-position-value-predicting network for each of these).

Because the expansion of a leaf node (including running SL) is asynchronous, in the mean time (until the node reaches the front of the analysis queue and is analyzed) the leaf node is provisionally expanded and a ‘tree policy’ is used to give a quick estimate of the value of each possible move from the leaf node board state. The tree policy is like the quick rollout policy but with a few more features (move allows stones to be captured, manhattan distance to two previous moves, Move matches 12-point diamond pattern centered around candidate move). The tree policy’s estimate will be replaced when the node reaches the front of the queue and is fully analyzed.

At the end of each iteration, the action values of all (non-leaf) nodes visited are updated, and a ‘visit count’ for each of these nodes is updated.

At the end of all of these iterations, the program actually plays the move that had the maximium visit count in the monte carlo tree search (“this is less sensitive to outliers than maximizing action-value”).

some more details:

- During monte carlo tree search, they also use a heuristic called ‘last good reply’ which is sorta similar to caching.
- the move-predicting networks are for the most part just fed the board state as input, but they also get a computed feature “the outcome of a ladder search”
- because Go is symmetric w/r/t rotations of the board, the move-predicting networks are wrapped by a procedure that either randomly selects a rotation, or runs them for all rotations and averages the results (depending on whether or not the network is being used for monte carlo tree search or not)

Links:

]]>Travis J. A. Craddock, Jack A. Tuszynski, Stuart Hameroff. Cytoskeletal Signaling: Is Memory Encoded in Microtubule Lattices by CaMKII Phosphorylation? PLoS Computational Biology, 2012; 8 (3): e1002421 DOI: 10.1371/journal.pcbi.1002421.

]]>From sparsely sampled data, we can attempt to estimate some of topological structure of the data.

Toplogical structure is here represented by Betti numbers. The paper explains this best:

Consider a world where objects are made of elastic rubber. Two objects are considered equivalent if they can be deformed into each other without tearing the material. If such a transformation between X and Y exists, we say they are topologically equivalent……it is evident that a possible reason for two objects not to be equivalent is that they differ in the number of holes. Thus, simply counting holes can provide a signature for the object at hand. Holes can exist in different dimensions. A one-dimensional hole is exposed when a one-dimensional loop (a closed curve) on the object cannot be deformed into a single point without tearing the loop. If two such loops can be deformed into one another they define the same hole, which should be counted only once. Analogous definitions can be invoked in higher dimensions. For example, a two-dimensional hole is revealed when a closed two-dimensional oriented surface on the object cannot be deformed into a single point.

This notion of counting holes of different dimensions is formalized by the definition of Betti numbers. The Betti numbers of an object X can be arranged in a sequence, b ( X )=( b 0 , b 1 , b 2 , I ), where b 0 represents the number of connected components, b 1 represents the number of one- dimensional holes, b 2 the number of two-dimensional holes, and so forth. An important property of Betti sequences is that if two objects are topologically equiv- alent (they can be deformed into each other) they share the same Betti sequence. One must note, as we will shortly illustrate, that the reverse is not always true: two objects can be different but have the same Betti sequence.

A technique is presented for estimating the Betti numbers of sampled data using “Rips complexes” and “barcodes”. To put this technique to use on neural data, the spiking of 5 cells (mostly “complex cells in the superficial layers”) with high spontaineous rate in V1 in Macaques were recorded from. The spikes were binned and a point cloud in 5D was constructed (so i think the coordinates of the point cloud representing the spike rate in each of the 5 dimensions).

This was done in two experimental conditions, when a stimulus was being presented, and when the eyes were occluded. In both cases, the topological structure varied between a circle and a sphere, although the circle structure was found with higher probability in the stimulus condition. The authors present a model of circular structure generated “if cortical activity is dominated by neuronal responses to stimulus orientation”, and a model of toroidal structure generated “A toroidal representation may arise from a neuronal population responding to two circular variables, such as orientation and color hue”. Note that a torus wasn’t actually observed in the data; a circle and a sphere was. In the conclusions the authors speculate what could have caused the sphere.

The authors conclude that the topology of spiking patterns for “both the data for spontaneous and driven conditions have similar topological structures, with the signatures of the circle and the sphere dominating the results”.

]]>NYT article: http://www.nytimes.com/2013/04/02/science/obama-to-unveil-initiative-to-map-the-human-brain.html

http://www.whitehouse.gov/sites/default/modules/wh_multimedia/EOP_OVP_player.swf

]]>Meanwhile, another rat was implanted with 6 stimulating electrodes in the same area as the first rat. It was trained to choose between the same two alternatives based on a stimulation pattern conveyed via the electrodes.

Then the signals recorded from the first rat’s brain were processed ald sent into the second rat’s brain. Both rats were trained together and both were rewarded when both made the right choice. The second rat learned to make the same choice as the first rat 60% of the time.

Miguel Pais-Vieira, Mikhail Lebedev, Carolina Kunicki, Jing Wang, Miguel A. L. Nicolelis. A Brain-to-Brain Interface for Real-Time Sharing of Sensorimotor Information. Scientific Reports 3, Article number: 1319. Received 20 December 2012.

http://www.reuters.com/article/2013/02/28/us-science-brain-mindmeld-idUSBRE91R0U620130228

Excerpt from the abstract: “We genetically labeled and manipulated MrgprA3+ neurons in the dorsal root ganglion (DRG) and found that they exclusively innervated the epidermis of the skin and responded to multiple pruritogens. Ablation of MrgprA3+ neurons led to substantial reductions in scratching evoked by multiple pruritogens and occurring spontaneously under chronic itch conditions, whereas pain sensitivity remained intact.”

]]>

Inbal Goshen, Matthew Brodsky, Rohit Prakash, Jenelle Wallace, Viviana Gradinaru, Charu Ramakrishnan, Karl Deisseroth. Dynamics of Retrieval Strategies for Remote Memories. Cell, Volume 147, Issue 3, 28 October 2011, Pages 678-689.

Prevailing theory suggests that long-term memories are encoded via a two-phase process requiring early involvement of the hippocampus followed by the neocortex. Contextual fear memories in rodents rely on the hippocampus immediately following training but are unaffected by hippocampal lesions or pharmacological inhibition weeks later. With fast optogenetic methods, we examine the real-time contribution of hippocampal CA1 excitatory neurons to remote memory and find that contextual fear memory recall, even weeks after training, can be reversibly abolished by temporally precise optogenetic inhibition of CA1. When this inhibition is extended to match the typical time course of pharmacological inhibition, remote hippocampus dependence converts to hippocampus independence, suggesting that long-term memory retrieval normally depends on the hippocampus but can adaptively shift to alternate structures. Further revealing the plasticity of mechanisms required for memory recall, we confirm the remote-timescale importance of the anterior cingulate cortex (ACC) and implicate CA1 in ACC recruitment for remote recall.

]]>Scientists use brain imaging to reveal the movies in our mind.

]]>