Old 22-10-2017, 04:23 PM
gary
Neural-network AlphaGo Zero teaches itself to become a Go master in three days

In an 18 Oct 2017 paper in Nature, Silver et al. from Google's DeepMind Technologies Limited describe
AlphaGo Zero, their latest evolution of AlphaGo, which was the
first program ever to defeat a world champion at the ancient Chinese
game of Go.

Both programs use neural networks. However, whereas AlphaGo was first
trained on human expert moves and then refined by reinforcement
learning through self-play, AlphaGo Zero was given nothing beyond the
rules of the game: it taught itself to become an expert player
entirely through self-play.

In fact, it went from beginner to grandmaster, without any human help,
in three days.
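The shape of that training recipe — a network that learns to predict its own move choices and the eventual winner, improving purely through self-play — can be caricatured in a few lines. The sketch below is only an illustration: it swaps in a trivial made-up game and a lookup-table "policy" where the real system uses a deep network and tree search, and every name in it is hypothetical.

```python
import random

random.seed(0)  # reproducible toy run

# Stand-in "game": players alternate turns; a turn either banks 1 point
# (the safe move) or gambles for 0 or 2; first to 3 points wins.  The
# "network" is a lookup table mapping states to the probability of
# playing the safe move.

def self_play_game(policy):
    """Play one game against itself, recording every (state, move) pair."""
    scores, history, player = [0, 0], [], 0
    while max(scores) < 3:
        state = (player, scores[0], scores[1])
        move = 0 if random.random() < policy.get(state, 0.5) else 1
        scores[player] += 1 if move == 0 else random.choice([0, 2])
        history.append((state, move))
        player = 1 - player
    return history, scores.index(max(scores))

def reinforce(policy, history, winner, lr=0.1):
    """Nudge the policy toward moves the eventual winner made and away
    from moves the loser made -- the self-play learning signal."""
    for state, move in history:
        mover = state[0]
        target = 1.0 if move == 0 else 0.0   # desired prob. of the safe move
        p = policy.get(state, 0.5)
        step = lr * (target - p)
        p = p + step if mover == winner else p - step
        policy[state] = min(1.0, max(0.0, p))

# The AlphaGo Zero loop in miniature: generate games by self-play,
# then learn from their outcomes, with no human data anywhere.
policy = {}
for _ in range(500):
    history, winner = self_play_game(policy)
    reinforce(policy, history, winner)
```

In the real system the lookup table is a deep neural network, the move choice comes from a tree search guided by that network, and the update is gradient descent on the predicted moves and game outcome — but the loop is the same: play yourself, learn from the result, repeat.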

Quote:
Originally Posted by Silver et al., Google DeepMind, in Nature
A long-standing goal of artificial intelligence is an algorithm that learns,
tabula rasa, superhuman proficiency in challenging domains.

Recently, AlphaGo became the first program to defeat a world champion in the game of Go.

The tree search in AlphaGo evaluated positions and selected moves using deep neural networks. These neural networks were
trained by supervised learning from human expert moves, and by reinforcement learning from self-play.

Here we introduce an algorithm based solely on reinforcement learning, without human data, guidance or domain knowledge beyond game
rules. AlphaGo becomes its own teacher: a neural network is trained to predict AlphaGo’s own move selections and also
the winner of AlphaGo’s games.

This neural network improves the strength of the tree search, resulting in higher quality
move selection and stronger self-play in the next iteration.

Starting tabula rasa, our new program AlphaGo Zero achieved
superhuman performance, winning 100–0 against the previously published, champion-defeating AlphaGo.
Quote:
Originally Posted by Silver et al., Google DeepMind, in Nature
AlphaGo Zero discovered a remarkable level of Go knowledge during its self-play training process.
This included not only fundamental elements of human Go knowledge, but also non-standard strategies
beyond the scope of traditional Go knowledge.
Nature paper "Mastering the game of Go without human knowledge" here :-
https://www.nature.com/articles/natu...wjxeTUgZAUMnRQ

Article and video at DeepMind here :-
https://deepmind.com/blog/alphago-ze...rning-scratch/