Coagent Networks: Reinforcement Learning with Asynchronous Stochastic Neural Networks
Content
Speaker: James Kostas
Reinforcement learning (RL) is a machine learning paradigm that studies how to optimize an algorithm’s interaction with an environment over time, based on a reward signal. In recent years, RL research has generated many ground-breaking results in simulated domains where data is plentiful, such as simulated robotics problems or game playing [3] [4] [2] [5] [9]. However, these methods all rely on deep learning (DL), that is, deep neural networks that are (typically) trained with backpropagation. Despite its successes, DL has significant limitations. For example, 1) it is brittle and prone to overfitting or divergence (particularly for RL), 2) it typically requires specialized computing hardware, 3) it is arguably biologically implausible [7], and 4) the network components must be differentiable and must be computed in a specific order. As a result of these limitations, and despite its successes in data-rich simulated environments, deep-learning-based RL (DRL) has produced few recent breakthroughs in real-world applications such as robotics, healthcare systems, recommender systems, and automated vehicles.
Inspired both by the success of DRL and by its limitations, we study a connectionist alternative based on stochastic neural networks (SNNs): networks in which the outputs of some units are not deterministic functions of those units’ inputs. The class of SNNs we study is called coagent networks [8]; coagent networks are SNNs designed for RL.
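To make the idea concrete, the following is a minimal illustrative sketch of a coagent network, not the implementation studied in this work: each unit samples its output from a distribution parameterized by its local inputs, so the network as a whole is stochastic. The names Coagent and CoagentNetwork and the choice of Bernoulli units are assumptions made purely for illustration; only NumPy is assumed.

    import numpy as np

    class Coagent:
        # A stochastic unit: its output is sampled from a Bernoulli
        # distribution parameterized by its local inputs, rather than
        # being a deterministic function of those inputs.
        def __init__(self, n_inputs, rng):
            self.w = 0.01 * rng.standard_normal(n_inputs)
            self.rng = rng

        def act(self, x):
            p = 1.0 / (1.0 + np.exp(-self.w @ x))   # firing probability
            y = float(self.rng.random() < p)        # sampled output
            return y, p

    class CoagentNetwork:
        # One layer of coagents feeding a single stochastic output unit.
        def __init__(self, n_obs, n_hidden, rng):
            self.hidden = [Coagent(n_obs, rng) for _ in range(n_hidden)]
            self.out = Coagent(n_hidden, rng)

        def act(self, obs):
            h = np.array([unit.act(obs)[0] for unit in self.hidden])
            action, _ = self.out.act(h)
            return action   # e.g., a binary action for a two-action task

    rng = np.random.default_rng(0)
    net = CoagentNetwork(n_obs=4, n_hidden=8, rng=rng)
    print(net.act(np.ones(4)))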
Coagent networks are less constrained by, or entirely free of, the DL limitations above. For example, coagent networks 1) may be more biologically plausible than deep networks trained with backpropagation, 2) may be computed in a fully distributed, asynchronous fashion, and 3) can be constructed from types of components that cannot be used in DL. Additionally, because our approach generalizes the theoretical foundations that underlie a large class of DRL algorithms, coagent networks can even include DL networks as components of the larger coagent network; this may allow practitioners to leverage the advantages of both approaches simultaneously.
In this work, we show that coagent networks may be trained and run asynchronously (that is, the units need not execute simultaneously or at the same rate), and we give principled, theoretically grounded learning rules for training these networks in this setting. We prove that asynchronous coagent networks generalize a popular class of RL algorithms known as option-critic algorithms [1], and show how the theory of coagent networks can simplify the creation and analysis of option-critic algorithms.
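As an illustration of the flavor of such a learning rule (a sketch under assumptions, not the exact update derived in this work), each coagent can treat the rest of the network as part of its environment and apply a REINFORCE-style local update driven by a globally broadcast reinforcement signal such as a TD error. The function name local_update and the step size alpha below are hypothetical.

    import numpy as np

    def local_update(w, x, y, p, delta, alpha=0.01):
        # One local policy-gradient step for a Bernoulli unit with
        # firing probability p = sigmoid(w @ x) that produced output y.
        # delta is a globally broadcast reinforcement signal (e.g., a TD error).
        # The gradient of log Pr(y) with respect to w is (y - p) * x.
        return w + alpha * delta * (y - p) * x

    # Example: a unit that fired (y = 1) with probability p = 0.6 and then
    # observed a positive signal becomes more likely to fire again.
    w = np.zeros(3)
    x = np.array([1.0, 0.5, -0.2])
    w = local_update(w, x, y=1.0, p=0.6, delta=1.0)
    print(w)

Because each unit's update depends only on its own inputs, its own sampled output, and the shared reinforcement signal, a unit could apply it whenever it happens to execute, at its own rate; this is the sense in which training can proceed asynchronously.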
Finally, we build on prior work to extend the theory and increase the practical capabilities of coagent networks. We extend and generalize the theory of asynchronous coagent networks to address the case where parameters are shared within the network. We demonstrate that coagent networks can scale to a variety of complex, high-dimensional domains, and we provide best practices and empirical studies for scaling these networks to solve these difficult problems, many of which have previously been solved only by DRL architectures.
Faculty Advisor: Philip Thomas