AI's Latest and Greatest

Outrageous Artificial Intelligence: (Game 1) DeepMind’s AlphaZero crushes Stockfish Chess WC



1 minute per move, 100 game match, match score: 28 wins, 72 draws, AI Landmark game, Stockfish crushed, Bishop pair worth more than knight and 4 pawns

Research paper: “Mastering Chess and Shogi by Self-Play with a
General Reinforcement Learning Algorithm” :

David Silver,1∗ Thomas Hubert,1∗
Julian Schrittwieser,1∗
Ioannis Antonoglou,1 Matthew Lai,1 Arthur Guez,1 Marc Lanctot,1
Laurent Sifre,1 Dharshan Kumaran,1 Thore Graepel,1
Timothy Lillicrap,1 Karen Simonyan,1 Demis Hassabis1

https://arxiv.org/pdf/1712.01815.pdf

The game of chess is the most widely-studied domain in the history of artificial intelligence.
The strongest programs are based on a combination of sophisticated search techniques,
domain-specific adaptations, and handcrafted evaluation functions that have been
refined by human experts over several decades. In contrast, the AlphaGo Zero program
recently achieved superhuman performance in the game of Go, by tabula rasa reinforcement
learning from games of self-play. In this paper, we generalise this approach into
a single AlphaZero algorithm that can achieve, tabula rasa, superhuman performance in
many challenging domains. Starting from random play, and given no domain knowledge
except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in
the games of chess and shogi (Japanese chess) as well as Go, and convincingly defeated a
world-champion program in each case ….

Read more at: https://arxiv.org/pdf/1712.01815.pdf

What is reinforcement learning?
https://en.wikipedia.org/wiki/Reinforcement_learning

“Reinforcement learning (RL) is an area of machine learning inspired by behaviourist psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. The problem, due to its generality, is studied in many other disciplines, such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, statistics and genetic algorithms. In the operations research and control literature, the field where reinforcement learning methods are studied is called approximate dynamic programming. The problem has been studied in the theory of optimal control, though most studies are concerned with the existence of optimal solutions and their characterization, and not with the learning or approximation aspects. In economics and game theory, reinforcement learning may be used to explain how equilibrium may arise under bounded rationality.

In machine learning, the environment is typically formulated as a Markov decision process (MDP), as many reinforcement learning algorithms for this context utilize dynamic programming techniques.[1] The main difference between the classical techniques and reinforcement learning algorithms is that the latter do not need knowledge about the MDP and they target large MDPs where exact methods become infeasible.

Reinforcement learning differs from standard supervised learning in that correct input/output pairs are never presented, nor sub-optimal actions explicitly corrected. Instead the focus is on on-line performance, which involves finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge).[2] The exploration vs. exploitation trade-off in reinforcement learning has been most thoroughly studied through the multi-armed bandit problem and in finite MDPs.”

What is this company called Deepmind ?
https://en.wikipedia.org/wiki/DeepMind

DeepMind Technologies Limited is a British artificial intelligence company founded in September 2010.

Acquired by Google in 2014, the company has created a neural network that learns how to play video games in a fashion similar to that of humans,[4] as well as a Neural Turing machine,[5] or a neural network that may be able to access an external memory like a conventional Turing machine, resulting in a computer that mimics the short-term memory of the human brain.[6][7]

The company made headlines in 2016 in nature after its AlphaGo program beat a human professional Go player for the first time in October 2015.[8] and again when AlphaGo beat Lee Sedol the world champion in a five-game tournament, which was the subject of a documentary film.

♚Play at: http://www.chessworld.net/chessclubs/asplogin.asp?from=1053

►Kingscrusher chess resources: http://www.chessworld.net/chessclubs/learn_coaching_chessable.asp
►Kingscrusher’s “Crushing the King” video course with GM Igor Smirnov: http://chess-teacher.com/affiliates/idevaffiliate.php?id=1933&url=2396
►FREE online turn-style chess at http://www.chessworld.net/chessclubs/asplogin.asp?from=1053
http://goo.gl/7HJcDq
►Kingscrusher resources: http://www.chessworld.net/chessclubs/learn_coaching_chessable.asp
►Playlists: http://goo.gl/FxpqEH
►Follow me at Google+ : http://www.google.com/+kingscrusher ►Play and follow broadcasts at Chess24: https://chess24.com/premium?ref=kingscrusher

UCDUDDmslypVXYoUsZafHSUQ

source

Similar Posts

WP2Social Auto Publish Powered By : XYZScripts.com