Google Teases Large-Scale Reinforcement Learning Infrastructure

“The new infrastructure reduces the training time from eight hours down to merely one hour compared to a strong baseline.”

Current state-of-the-art reinforcement learning (RL) techniques require many iterations over many samples from the environment to learn a target task. For instance, an agent trained to play the game Dota 2 learns from batches of 2 million frames every 2 seconds. Infrastructure that handles RL at this scale must not only be good at collecting a large number of samples, but must also be able to iterate quickly over this volume of samples during training.
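To put the Dota 2 figure in perspective, the sketch below does the back-of-the-envelope arithmetic for the throughput the training side must sustain. It uses only the batch size and interval quoted above. The helper names are hypothetical, and the per-actor rate of 100 frames per second is an illustrative assumption, not a number from the source.

```python
import math


def required_throughput(frames_per_batch: int, seconds_per_batch: float) -> float:
    """Frames per second the learner must consume to keep pace with batching."""
    return frames_per_batch / seconds_per_batch


def frames_per_day(fps: float) -> float:
    """Total frames processed in 24 hours at a sustained rate of `fps`."""
    return fps * 60 * 60 * 24


def actors_needed(fps: float, per_actor_fps: float) -> int:
    """Number of sample-collecting actors needed if each produces `per_actor_fps`.

    `per_actor_fps` is a hypothetical figure chosen for illustration only.
    """
    return math.ceil(fps / per_actor_fps)


# Figures from the text: 2 million frames every 2 seconds.
fps = required_throughput(2_000_000, 2)
print(f"learner throughput: {fps:,.0f} frames/s")
print(f"frames per day:     {frames_per_day(fps):,.0f}")
# Hypothetical assumption: each actor steps its environment at 100 frames/s.
print(f"actors at 100 fps:  {actors_needed(fps, 100):,}")
```

At one million frames per second, both sides of the system are under pressure: collection has to be spread across many actors, and the learner has to consume every batch before the next one arrives.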

#performance, #reinforcement-learning