Discovering Reinforcement Learning Algorithms

Reinforcement learning (RL) algorithms update an agent’s parameters according to one of several possible rules, discovered manually through years of research. Automating the discovery of update rules from data could lead to more efficient algorithms, or algorithms that are better adapted to specific environments. Although there have been prior attempts at addressing this significant scientific challenge, it remains an open question whether it is feasible to discover alternatives to fundamental concepts of RL such as value functions and temporal-difference learning. This paper introduces a new meta-learning approach that discovers an entire update rule, which includes both ‘what to predict’ (e.g. value functions) and ‘how to learn from it’ (e.g. bootstrapping), by interacting with a set of environments. The output of this method is an RL algorithm that we call Learned Policy Gradient (LPG). Empirical results show that our method discovers its own alternative to the concept of value functions. Furthermore, it discovers a bootstrapping mechanism to maintain and use its predictions. Surprisingly, when trained solely on toy environments, LPG generalises effectively to complex Atari games and achieves non-trivial performance. This shows the potential to discover general RL algorithms from data. Read More

#reinforcement-learning

What can I do here? A Theory of Affordances in Reinforcement Learning

Reinforcement learning algorithms usually assume that all actions are always available to an agent. However, both people and animals understand the general link between the features of their environment and the actions that are feasible. Gibson (1977) coined the term “affordances” to describe the fact that certain states enable an agent to do certain actions, in the context of embodied agents. In this paper, we develop a theory of affordances for agents who learn and plan in Markov Decision Processes. Affordances play a dual role in this case. On one hand, they allow faster planning, by reducing the number of actions available in any given situation. On the other hand, they facilitate more efficient and precise learning of transition models from data, especially when such models require function approximation. We establish these properties through theoretical results as well as illustrative examples. We also propose an approach to learn affordances and use it to estimate transition models that are simpler and generalize better. Read More
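
The planning speed-up is easy to see in miniature: if value iteration only maximises over the actions an affordance deems feasible in each state, the inner loop shrinks accordingly. Below is a minimal sketch of that idea in a toy tabular MDP; it is not the paper’s algorithm, and the transition matrix, rewards, and affordance mapping are all made-up illustrative data.

```python
import numpy as np

n_states, n_actions, gamma = 5, 4, 0.9
rng = np.random.default_rng(0)
# P[s, a] is a distribution over next states; R[s, a] is the reward.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.random((n_states, n_actions))

# Affordance: state -> subset of feasible actions (a made-up toy mapping).
afford = {s: [a for a in range(n_actions) if (s + a) % 2 == 0]
          for s in range(n_states)}

# Value iteration that only maximises over afforded actions.
V = np.zeros(n_states)
for _ in range(100):
    V = np.array([max(R[s, a] + gamma * P[s, a] @ V for a in afford[s])
                  for s in range(n_states)])
```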

#human, #reinforcement-learning

A Deep Dive into Reinforcement Learning

Let’s take a deep dive into reinforcement learning. In this article, we will tackle a concrete problem with modern libraries such as TensorFlow, TensorBoard, Keras, and OpenAI gym. You will see how to implement deep Q-learning, one of the fundamental algorithms, and learn its inner workings. Regarding hardware, the whole code will work on a typical PC and use all available CPU cores (this is handled out of the box by TensorFlow). Read More
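
For orientation before reading, here is a minimal sketch of the deep Q-learning loop the article implements, using Keras and gym on CartPole. This is illustrative rather than the article’s exact code: it assumes the pre-0.26 gym API of that era, and the hyperparameters (network size, learning rate, epsilon, buffer size) are untuned placeholders.

```python
import random
from collections import deque

import gym
import numpy as np
import tensorflow as tf

env = gym.make("CartPole-v1")
obs_dim = env.observation_space.shape[0]
n_actions = env.action_space.n

# Q-network: maps an observation to one Q-value per action.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(obs_dim,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(n_actions),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")

replay = deque(maxlen=10_000)          # experience replay buffer
gamma, epsilon, batch_size = 0.99, 0.1, 64

for episode in range(200):
    state = env.reset()                # pre-0.26 gym: reset() returns obs only
    done = False
    while not done:
        # Epsilon-greedy exploration.
        if random.random() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(model.predict(state[None], verbose=0)[0]))
        next_state, reward, done, _ = env.step(action)
        replay.append((state, action, reward, next_state, done))
        state = next_state

        if len(replay) >= batch_size:
            batch = random.sample(replay, batch_size)
            s, a, r, s2, d = map(np.array, zip(*batch))
            # TD target: r + gamma * max_a' Q(s', a'), zeroed at terminal states.
            q_next = model.predict(s2, verbose=0).max(axis=1)
            targets = model.predict(s, verbose=0)
            targets[np.arange(batch_size), a] = r + gamma * q_next * (1 - d)
            model.fit(s, targets, verbose=0)
```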

#python, #reinforcement-learning

A deep reinforcement learning framework to identify key players in complex networks

Network science is an academic field that aims to unveil the structure and dynamics behind networks, such as telecommunication, computer, biological and social networks. One of the fundamental problems that network scientists have been trying to solve in recent years entails identifying an optimal set of nodes that most influence a network’s functionality, referred to as key players.

Identifying key players could greatly benefit many real-world applications, for instance, enhancing techniques for the immunization of networks, as well as aiding epidemic control, drug design and viral marketing. The problem is NP-hard, however, so no exact polynomial-time algorithm is known, and solving it exactly on large networks has proved highly challenging.

Researchers at National University of Defense Technology in China, University of California, Los Angeles (UCLA), and Harvard Medical School (HMS) have recently developed a deep reinforcement learning (DRL) framework, dubbed FINDER, that could identify key players in complex networks more efficiently. Read More
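The article does not reproduce the FINDER architecture, but the task it solves is easy to make concrete with a classic baseline: adaptively remove the highest-degree node and watch the giant connected component shrink. The sketch below runs that baseline with networkx on a toy scale-free graph; it is a point of comparison, not the DRL method itself.

```python
import networkx as nx

G = nx.barabasi_albert_graph(200, 3)   # toy scale-free network
H = G.copy()
key_players = []
while len(key_players) < 20:
    # Adaptive step: recompute degrees, then remove the current hub.
    node = max(H.degree, key=lambda nd: nd[1])[0]
    key_players.append(node)
    H.remove_node(node)
    giant = max(nx.connected_components(H), key=len)
    print(f"removed {node}; giant component now has {len(giant)} nodes")
```
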

#reinforcement-learning

Building AI Trading Systems

About two years ago I wrote a little piece about applying Reinforcement Learning to the markets. A few people asked me what became of it. So this post covers some high-level things I’ve learned. It’s more of a rant than an organized post, really. If there is enough interest in this topic I’d be happy to go into more technical detail in future posts, but that’s TBD. Please let me know in the comments or on Twitter.

Over the past few years I’ve built four and a half trading systems. The first one was crap. The second one I never finished because I realized early on that it could never work either. The third one was abandoned for personal and political reasons. The fourth one worked extremely well for 12-18 months, producing something on the order of a full-time salary from a tiny investment of a few thousand dollars. Then profits started decreasing and I decided to move on to other things. I lacked the motivation to build yet another system. Some of the systems I worked on were for the financial markets, but the last one was applied to the crypto markets. So keep that in mind while reading. Read More

#reinforcement-learning

#investing

David Silver: AlphaGo, AlphaZero, and Deep Reinforcement Learning

Read More

#reinforcement-learning, #videos

Massively Scaling Reinforcement Learning with SEED RL

Reinforcement learning (RL) has seen impressive advances over the last few years, as demonstrated by the recent success in solving games such as Go and Dota 2. Models, or agents, learn by exploring an environment, such as a game, while optimizing for specified goals. However, current RL techniques require increasingly large amounts of training to successfully learn even simple games, which makes iterating on research and product ideas computationally expensive and time-consuming. Read More
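
The excerpt stops short of describing SEED RL’s design, but its central idea is an actor/learner split in which actors only step environments while a central learner performs all neural-network inference and training. The toy schematic below mimics that split with two processes and queues on one machine; it is a conceptual stand-in, not SEED RL’s actual accelerator-based implementation, and the “policy” is a placeholder.

```python
import multiprocessing as mp
import numpy as np

def actor(obs_q, act_q, steps=100):
    obs = np.zeros(4)                 # stand-in for env.reset()
    for _ in range(steps):
        obs_q.put(obs)                # ship observation to the learner
        action = act_q.get()          # block until the learner replies
        obs = np.random.randn(4)      # stand-in for env.step(action)

def learner(obs_q, act_q, steps=100):
    for _ in range(steps):
        obs = obs_q.get()
        act_q.put(int(obs.sum() > 0))  # stand-in for policy inference
        # ...batch observations and run training updates here...

if __name__ == "__main__":
    obs_q, act_q = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=actor, args=(obs_q, act_q)),
             mp.Process(target=learner, args=(obs_q, act_q))]
    for p in procs: p.start()
    for p in procs: p.join()
```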

Code

#python, #reinforcement-learning

An algorithm that learns through rewards may show how our brain does too

In 1951, Marvin Minsky, then a student at Harvard, borrowed observations from animal behavior to try to design an intelligent machine. Drawing on the work of physiologist Ivan Pavlov, who famously used dogs to show how animals learn through punishments and rewards, Minsky created a computer that could continuously learn through similar reinforcement to solve a virtual maze.

At the time, neuroscientists had yet to figure out the mechanisms within the brain that allow animals to learn in this way. But Minsky was still able to loosely mimic the behavior, thereby advancing artificial intelligence. Several decades later, as reinforcement learning continued to mature, it in turn helped the field of neuroscience discover those mechanisms, feeding into a virtuous cycle of advancement between the two fields. Read More

#reinforcement-learning

Reimagining Reinforcement Learning – Upside Down

For all the hype around winning gameplay and self-driving cars, traditional Reinforcement Learning (RL) has yet to deliver as a reliable tool for ML applications. Here we explore the main drawbacks, as well as an innovative approach to RL that dramatically reduces the compute required and the time needed to train. Read More
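
The “upside down” in the title presumably refers to Upside-Down RL (Schmidhuber, 2019), which replaces value functions and TD targets with plain supervised learning: a behaviour function is trained to map (state, desired return, desired horizon) to actions, using past episodes relabelled with what they actually achieved. Below is a minimal sketch of that relabel-and-fit step on toy random data; all shapes, names, and hyperparameters are illustrative.

```python
import numpy as np
import tensorflow as tf

obs_dim, n_actions = 4, 2
# Behaviour function: (state, desired_return, desired_horizon) -> action logits.
behavior = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(obs_dim + 2,)),
    tf.keras.layers.Dense(n_actions),
])
behavior.compile(optimizer="adam",
                 loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

# One recorded episode: states, actions, rewards (toy random data).
states = np.random.randn(10, obs_dim).astype(np.float32)
actions = np.random.randint(n_actions, size=10)
rewards = np.random.rand(10).astype(np.float32)

# Relabel each step with the return-to-go and horizon it actually achieved.
returns_to_go = np.cumsum(rewards[::-1])[::-1]
horizons = np.arange(len(states), 0, -1, dtype=np.float32)
commands = np.stack([returns_to_go, horizons], axis=1)

x = np.concatenate([states, commands], axis=1)
behavior.fit(x, actions, verbose=0)   # plain supervised learning, no TD targets

# At test time, condition on an ambitious command (e.g. a high desired return).
query = np.concatenate([states[:1], [[returns_to_go.max(), 10.0]]], axis=1)
action = int(np.argmax(behavior(query.astype(np.float32))[0]))
```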

#reinforcement-learning

Breakthrough Research In Reinforcement Learning From 2019

Reinforcement learning (RL) continues to be less valuable for business applications than supervised learning, and even unsupervised learning. It is successfully applied only in areas where huge amounts of simulated data can be generated, like robotics and games.

However, many experts recognize RL as a promising path towards Artificial General Intelligence (AGI), or true intelligence. Thus, research teams from top institutions and tech leaders are seeking ways to make RL algorithms more sample-efficient and stable.

We’ve selected and summarized 10 research papers that we think are representative of the latest research trends in reinforcement learning. Read More

#reinforcement-learning