Google/Deepmind/Silver and Huang’s new Go-playing program

There’s some buzz around recent improvements in Go-playing programs. Google made a program that is pretty good against human opponents but it uses 170GPUs and 1200CPUs!

paper: Mastering the Game of Go with Deep Neural Networks and Tree Search by David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van deniessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot,nder Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, Demis Hassabis

summary: they create a convolutional neural network with 13 layers to select moves (given a game position, it outputs a probability distribution over all legal moves, trying to assign higher probabilities to better moves). They train the network on databases of expert matches, save a copy of the trained network as ‘SL’ then train it further by playing it against randomly selected previous iterations of itself. Then they use the history of the move-selecting network playing against itself to generate a new training set consisting of 30 million game positions and the outcome of that game, with each of the 30 million positions coming from a separate game. They use this training set to train a new convolutional neural network (with 13 layers again, i think) to appraise the value of a board position (given a board position, it outputs a single scalar that attempt to predict the game outcome of that board position).

They also train ANOTHER move-predicting classifier called the ‘fast rollout’ policy; the reason for another one is that the fast rollout policy is supposed to be very fast to run, unlike the neural nets. The fast rollout policy is a linear softmax of small pattern features (move matches one or more response features, Move saves stone(s) from capture, Move is 8-connected to previous move, Move matches nakade patterns at captured stone, Move matches 12-point diamond pattern near previous move, move matches 3×3 pattern around candidate move). When a feature is “move matches some pattern”, i don’t understand if they mean that “match any pattern” is the feature, or if each possible pattern is its own feature; i suspect the latter, even though that’s a zillion features to compute. The feature weights of the fast rollout classifier are trained on a database of expert games.

Now they will use three of those classifiers, the ‘SL’ neural network (the saved network that tried to learn which move an expert would have made, before further training against itself), and the board-position-value-predicting network, plus the ‘rollout’ policy.

The next part, the Monte Carlo Tree Search combined with the neural networks, is kinda complicated and i don’t fully understand it, so the following is likely to be wrong. The idea of Monte Carlo Tree Search is to estimate the value of a board position by simulating all or part of game in which both players in the simulation are running as their policy a classifier without lookahead (eg within the simulation, neither player does any lookahead at each step); this simulation is (eventually) done many times and the results are averaged together. This simulation is called ‘rollout’. Each time the Monte Carlo simulation is done, the policy is updated. In this application, the ‘fast rollout’ is used in some parts of this simulation, but not all, because the ‘fast rollout’ policy is fixed after training, but the idea of Monte Carlo Tree Search is that the policy used is updated after each simulation.

In order to take one turn in the real game, the program does zillions of iterations; in each iteration, it simulates a game-within-a-game:

It simulates a game where the players use the current policy, which is represented as a tree of game states whose root is the current actual game state, whose edges are potential moves, and whose nodes or edges are labeled with the current policy’s estimated values for game states (plus a factor encouraging exploration of unexplored or underexplored board states).

When the simulation has visited the parent of a ‘leaf node’ (a game state which has not yet been analyzed but which is a child of a node which is not a leaf node) more than some threshold, the leaf node is added to a queue for an asynchronous process to ‘expand the leaf node’ (analyze it) (the visit-count-before-expansion threshold is adaptively adjusted to keep the queue short). This process estimates the value of the leaf node via a linear combination of (a) the board-position-value-predicting network’s output and (b) the outcome of running a simulation of the rest of the game (a game within a game within a game) with both players using the ‘fast rollout’ policy. Then, the SL neural network is used to give initial estimates of the value of each move from that board position (because you only have to run SL once to get an estimate for all possible moves from that board position, whereas it would take a long time to recurse into each of the many possible successor board positions and run the board-position-value-predicting network for each of these).

Because the expansion of a leaf node (including running SL) is asynchronous, in the mean time (until the node reaches the front of the analysis queue and is analyzed) the leaf node is provisionally expanded and a ‘tree policy’ is used to give a quick estimate of the value of each possible move from the leaf node board state. The tree policy is like the quick rollout policy but with a few more features (move allows stones to be captured, manhattan distance to two previous moves, Move matches 12-point diamond pattern centered around candidate move). The tree policy’s estimate will be replaced when the node reaches the front of the queue and is fully analyzed.

At the end of each iteration, the action values of all (non-leaf) nodes visited are updated, and a ‘visit count’ for each of these nodes is updated.

At the end of all of these iterations, the program actually plays the move that had the maximium visit count in the monte carlo tree search (“this is less sensitive to outliers than maximizing action-value”).

some more details:

  • During monte carlo tree search, they also use a heuristic called ‘last good reply’ which is sorta similar to caching.
  • the move-predicting networks are for the most part just fed the board state as input, but they also get a computed feature “the outcome of a ladder search”
  • because Go is symmetric w/r/t rotations of the board, the move-predicting networks are wrapped by a procedure that either randomly selects a rotation, or runs them for all rotations and averages the results (depending on whether or not the network is being used for monte carlo tree search or not)



Emerald Therapeutics: they do your experiments for you

They bought a bunch of machines to automate common experimental techniques and wrote software allowing the machines to be remotely programmed over the web. They plan to charge on a per-experiment basis. They are soliciting beta testers for 2015.

Here’s the techniques that they can run.

Here’s (slightly) more detail.

Computing with microtubules (Craddock, Tuszynski, Hameroff 2012)

This paper hypothesizes that postsynaptic CaMKII (calcium/calmodulin-dependent protein kinase II) receives synaptic input and then interacts with via phosphorylation, suggesting that memories may be encoded in the microtubules in this way. They note that the size and shape of CaMKII appears to be just right to phosphorylate the hexagonal lattices of tubulin proteins in microtubules. The paper also can “demonstrate microtubule-associated protein logic gates, and show how patterns of phosphorylated tubulins in microtubules can control neuronal functions by triggering axonal firings, regulating synapses, and traversing scale.”. Via ScienceDaily.

Travis J. A. Craddock, Jack A. Tuszynski, Stuart Hameroff. Cytoskeletal Signaling: Is Memory Encoded in Microtubule Lattices by CaMKII Phosphorylation? PLoS Computational Biology, 2012; 8 (3): e1002421 DOI: 10.1371/journal.pcbi.1002421.

Topological analysis of population activity in visual cortex

Singh, G., Memoli, F., Ishkhanov, T., Sapiro, G., Carlsson, G., & Ringach, D. L. (2008). Topological analysis of population activity in visual cortex. Journal of Vision, 8(8):11, 1–18,, doi:10.1167/8.8.11

From sparsely sampled data, we can attempt to estimate some of topological structure of the data.

Toplogical structure is here represented by Betti numbers. The paper explains this best:

Consider a world where objects are made of elastic rubber. Two objects are considered equivalent if they can be deformed into each other without tearing the material. If such a transformation between X and Y exists, we say they are topologically equivalent……it is evident that a possible reason for two objects not to be equivalent is that they differ in the number of holes. Thus, simply counting holes can provide a signature for the object at hand. Holes can exist in different dimensions. A one-dimensional hole is exposed when a one-dimensional loop (a closed curve) on the object cannot be deformed into a single point without tearing the loop. If two such loops can be deformed into one another they define the same hole, which should be counted only once. Analogous definitions can be invoked in higher dimensions. For example, a two-dimensional hole is revealed when a closed two-dimensional oriented surface on the object cannot be deformed into a single point.

This notion of counting holes of different dimensions is formalized by the definition of Betti numbers. The Betti numbers of an object X can be arranged in a sequence, b ( X )=( b 0 , b 1 , b 2 , I ), where b 0 represents the number of connected components, b 1 represents the number of one- dimensional holes, b 2 the number of two-dimensional holes, and so forth. An important property of Betti sequences is that if two objects are topologically equiv- alent (they can be deformed into each other) they share the same Betti sequence. One must note, as we will shortly illustrate, that the reverse is not always true: two objects can be different but have the same Betti sequence.

A technique is presented for estimating the Betti numbers of sampled data using “Rips complexes” and “barcodes”. To put this technique to use on neural data, the spiking of 5 cells (mostly “complex cells in the superficial layers”) with high spontaineous rate in V1 in Macaques were recorded from. The spikes were binned and a point cloud in 5D was constructed (so i think the coordinates of the point cloud representing the spike rate in each of the 5 dimensions).

This was done in two experimental conditions, when a stimulus was being presented, and when the eyes were occluded. In both cases, the topological structure varied between a circle and a sphere, although the circle structure was found with higher probability in the stimulus condition. The authors present a model of circular structure generated “if cortical activity is dominated by neuronal responses to stimulus orientation”, and a model of toroidal structure generated “A toroidal representation may arise from a neuronal population responding to two circular variables, such as orientation and color hue”. Note that a torus wasn’t actually observed in the data; a circle and a sphere was. In the conclusions the authors speculate what could have caused the sphere.

The authors conclude that the topology of spiking patterns for “both the data for spontaneous and driven conditions have similar topological structures, with the signatures of the circle and the sphere dominating the results”.

Technique named 'clarity' makes chunks of dead brain transparent, allowing fluorescent labeling

This technique takes a dead brain and permeates it with a transparent hydrogel matrix to keep proteins and nucleic acids in place. Then it removes the lipids. I guess the lipids are all that makes the brain opaque. At this point the brain is transparent but maintains its original structure so you can still label the proteins and nucleic acids.

Neuroscience as a new national priority

President Obama: “Now, it’s time to get to work.”

NYT article:

Limited mediated telepathy

A rat was implanted with a 32-unit microelectrode cortical array in either M1 or S1. The rat was then trained to choose between two alternatives based on external stimuli.

Meanwhile, another rat was implanted with 6 stimulating electrodes in the same area as the first rat. It was trained to choose between the same two alternatives based on a stimulation pattern conveyed via the electrodes.

Then the signals recorded from the first rat’s brain were processed ald sent into the second rat’s brain. Both rats were trained together and both were rewarded when both made the right choice. The second rat learned to make the same choice as the first rat 60% of the time.

Miguel Pais-Vieira, Mikhail Lebedev, Carolina Kunicki, Jing Wang, Miguel A. L. Nicolelis. A Brain-to-Brain Interface for Real-Time Sharing of Sensorimotor Information. Scientific Reports 3, Article number: 1319. Received 20 December 2012.