What's new: AlphaGo's initial iteration was trained on a database of human Go games; the newer AlphaGo Zero takes only the current state of the board as input to its artificial neural network. Through trial and error, with feedback in the form of winning, the AI learned how to play.
It then used that same network to choose its next move, whereas AlphaGo relied on a separate network for that. This reinforcement learning strategy, which AlphaGo also used extensively, has its roots in psychology: the neural network learns from rewards, much as humans do.
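To make the reward-driven idea concrete, here is a minimal sketch of reinforcement learning: a toy agent that learns, purely from win feedback, to walk to the right end of a short track. This is illustrative only and is not DeepMind's algorithm (AlphaGo Zero pairs a deep network with Monte Carlo tree search); the toy game, constants, and helper names here are all hypothetical.

```python
# Toy tabular Q-learning: an agent on a 6-cell track gets a reward of 1
# only when it reaches the rightmost cell (a "win") and learns from that
# feedback alone. Illustrative sketch, not AlphaGo Zero's actual method.
import random

N_STATES = 6                 # positions 0..5; position 5 is the winning state
ACTIONS = (-1, +1)           # step left or step right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def choose(state):
    """Epsilon-greedy choice: explore sometimes, otherwise exploit,
    breaking ties between equally valued actions at random."""
    if random.random() < EPS:
        return random.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if q[(state, a)] == best])

for episode in range(500):
    state = 0
    while state != N_STATES - 1:
        action = choose(state)
        nxt = min(max(state + action, 0), N_STATES - 1)
        reward = 1.0 if nxt == N_STATES - 1 else 0.0   # reward only on a win
        best_next = max(q[(nxt, a)] for a in ACTIONS)
        # Nudge the estimate toward the reward plus discounted future value.
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
        state = nxt

# After training, every non-terminal state should prefer stepping right.
print({s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)})
```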
The DeepMind researchers wrote: "the self-learned player performed much better overall, defeating the human-trained player within the first 24h of training. This suggests that AlphaGo Zero may be learning a strategy that is qualitatively different to human play."
How they did it: AlphaGo Zero uses less computing power than earlier versions, but Google's immense hardware resources were still key. The sheer number of games the AI can play against itself is an advance in its own right, says Pedro Domingos, author of The Master Algorithm.
He points out, though, that the roughly 5 million games of self-play it took for AlphaGo Zero to beat AlphaGo is "vastly more" than the number of games Lee Sedol had played to become a champion.
Recent work suggests simpler forms of learning could achieve similar goals. A paper published earlier this year by OpenAI showed how a technique similar to hill-climbing, in which the AI starts with a candidate solution and then makes small tweaks to optimize it, can solve Atari games, albeit ones simpler than Go.
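A bare-bones version of that hill-climbing idea fits in a few lines: keep a candidate solution, propose a small random tweak, and keep the tweak only if the score improves. The score() function below is a made-up stand-in for an Atari game's episode reward, and this illustrates the generic technique rather than OpenAI's exact method (their paper used a related population-based approach, evolution strategies).

```python
# Minimal hill-climbing: start with a solution, make small random tweaks,
# and keep each tweak only if it improves the score. Illustrative sketch;
# score() is a hypothetical stand-in for a game's episode reward.
import random

def score(params):
    # Made-up objective: higher is better, with its peak at `target`.
    target = [0.3, -1.2, 2.0]
    return -sum((p - t) ** 2 for p, t in zip(params, target))

params = [0.0, 0.0, 0.0]                   # initial candidate solution
best = score(params)

for step in range(10_000):
    candidate = list(params)
    i = random.randrange(len(candidate))
    candidate[i] += random.gauss(0, 0.1)   # small random tweak
    s = score(candidate)
    if s > best:                           # keep only improvements
        params, best = candidate, s

print(params, best)   # params should end up near [0.3, -1.2, 2.0]
```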