DeepMind wins against a Go world champion

Source: The Verge, Mar 2016

“It’s one of the great intellectual mind sports of the world,” says Toby Manning, treasurer of the British Go Association and referee of AlphaGo’s victory over European champion Fan Hui last year. “It’s got extremely simple rules, but these rules give rise to an awful lot of complexity.” Manning cites a classic quote from noted 20th-century chess and Go player Edward Lasker: “While the baroque rules of chess could only have been created by humans, the rules of Go are so elegant, organic, and rigorously logical that if intelligent life forms exist elsewhere in the universe, they almost certainly play Go.”

Because of Go’s deep intricacy, human players become experts through years of practice, honing their intuition and learning to recognize gameplay patterns. “The immediate appeal is that the rules are simple and easy to understand, but then the long-term appeal is that you can’t get tired of this game because there is such a depth,” says Korea Baduk Association secretary general Lee Ha-jin.

Every Go player I’ve spoken to says the same thing about the game: its appeal lies in depth through simplicity. And that also gets to the heart of why it’s so difficult for computers to master. There’s limited data available just from looking at the board, and choosing a good move demands a great deal of intuition.

“Go has no dominant heuristics. From the human’s point of view, the knowledge is pattern-based, complex, and hard to program. Until AlphaGo, no one had been able to build an effective evaluation function.”

DeepMind continually reinforces and improves the system’s ability by making it play millions of games against tweaked versions of itself. This trains a “policy” network to help AlphaGo predict the next moves, which in turn trains a “value” network to ascertain and evaluate those positions.
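The self-play idea can be caricatured in a few lines. What follows is a minimal Python sketch under heavy simplifying assumptions, not DeepMind's pipeline. It uses a hypothetical toy game in place of Go (one pile of stones, a move takes 1 to 3, whoever takes the last stone wins). The hypothetical `sample_move` function plays the role of the current policy and plays an identical copy of itself rather than tweaked versions. A lookup table of averaged game outcomes (`fit_value_table`, also hypothetical) stands in for the value network that learns from self-play results.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy game standing in for Go: one pile of stones, a move removes
# 1, 2 or 3 stones, and the player who takes the last stone wins.


def sample_move(pile: int, rng: random.Random, epsilon: float) -> int:
    """Stand-in for the current policy: usually leave a multiple of 4 stones,
    but with probability epsilon (or when no such move exists) play at random."""
    moves = [k for k in (1, 2, 3) if k <= pile]
    good = [m for m in moves if (pile - m) % 4 == 0]
    if good and rng.random() >= epsilon:
        return good[0]
    return rng.choice(moves)


def self_play_game(start: int, rng: random.Random,
                   epsilon: float) -> List[Tuple[int, float]]:
    """Play one game of the policy against itself and label every position
    with the final result (+1 win, -1 loss) for the player to move there."""
    history = []
    pile = start
    while pile > 0:
        history.append(pile)
        pile -= sample_move(pile, rng, epsilon)
    labelled = []
    outcome = 1.0  # whoever moved from the last position took the last stone
    for position in reversed(history):
        labelled.append((position, outcome))
        outcome = -outcome  # players alternate, so the result flips each ply
    return labelled


def fit_value_table(games: int, start: int = 10, epsilon: float = 0.3,
                    seed: int = 0) -> Dict[int, float]:
    """Stand-in for training a value network: average the self-play
    outcomes observed from each position."""
    rng = random.Random(seed)
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for _ in range(games):
        for position, outcome in self_play_game(start, rng, epsilon):
            totals[position] += outcome
            counts[position] += 1
    return {p: totals[p] / counts[p] for p in sorted(counts)}


table = fit_value_table(2000)
print(table[1])  # prints 1.0: with one stone left, the player to move always wins
```

Positions holding a multiple of four stones should come out with negative estimates, because this policy usually punishes a player left facing them; the exact numbers depend on `epsilon` and the random seed.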

AlphaGo looks ahead at possible moves and permutations, going through various eventualities before selecting the one it deems most likely to succeed. The combined neural nets save AlphaGo from doing excess work: the policy network helps reduce the breadth of moves to search, while the value network saves it from having to internally play out the entirety of each match to come to a conclusion.
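The division of labor between the two networks can be illustrated with a small search. This is a minimal Python sketch, assuming a hypothetical toy game in place of Go (one pile of stones, a move takes 1 to 3, whoever takes the last stone wins). The functions `policy` and `value` are hand-written heuristics playing the roles of the two networks, not DeepMind's code. The search itself is a plain depth-limited negamax rather than the Monte Carlo tree search AlphaGo actually uses. The policy prior limits each node to its `top_k` highest-ranked moves (less breadth), and at the depth limit the value estimate replaces a full playout (less depth).

```python
from typing import List

# Hypothetical toy game standing in for Go: one pile of stones, a move removes
# 1, 2 or 3 stones, and the player who takes the last stone wins.


def legal_moves(pile: int) -> List[int]:
    return [k for k in (1, 2, 3) if k <= pile]


def policy(pile: int) -> List[float]:
    """Stand-in for a policy network: a prior over legal_moves(pile).
    This heuristic favours moves that leave a multiple of 4 stones."""
    scores = [1.0 if (pile - m) % 4 == 0 else 0.1 for m in legal_moves(pile)]
    total = sum(scores)
    return [s / total for s in scores]


def value(pile: int) -> float:
    """Stand-in for a value network: estimated result in [-1, 1] for the
    player to move, obtained without playing the game out."""
    return -1.0 if pile % 4 == 0 else 1.0


def candidate_moves(pile: int, top_k: int) -> List[int]:
    """Breadth reduction: keep only the top_k moves the policy ranks highest."""
    ranked = sorted(zip(policy(pile), legal_moves(pile)), reverse=True)
    return [move for _, move in ranked[:top_k]]


def search(pile: int, depth: int, top_k: int) -> float:
    """Depth-limited negamax score for the player to move."""
    if pile == 0:
        return -1.0  # the opponent just took the last stone, so this player lost
    if depth == 0:
        return value(pile)  # depth reduction: estimate instead of playing out
    return max(-search(pile - m, depth - 1, top_k)
               for m in candidate_moves(pile, top_k))


def best_move(pile: int, depth: int = 2, top_k: int = 2) -> int:
    """Pick the candidate whose resulting position is worst for the opponent.
    Assumes pile > 0."""
    return max(candidate_moves(pile, top_k),
               key=lambda m: -search(pile - m, depth - 1, top_k))


print(best_move(10))  # prints 2
```

With these defaults, `best_move(10)` expands only two of the three legal moves at each node and stops two plies down, yet still picks 2, leaving the opponent a multiple of four.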
