1. Real-Time Recurrent Reinforcement Learning (arXiv)
Abstract : Recent advances in reinforcement learning for partially-observable Markov decision processes (POMDPs) rely on the biologically implausible backpropagation through time algorithm (BPTT) to perform gradient-descent optimisation. In this paper we propose a novel reinforcement learning algorithm that makes use of random feedback local online learning (RFLO), a biologically plausible approximation of real-time recurrent learning (RTRL), to compute the gradients of the parameters of a recurrent neural network in an online manner. By combining it with TD(λ), a variant of temporal-difference reinforcement learning with eligibility traces, we create a biologically plausible, recurrent actor-critic algorithm capable of solving discrete and continuous control tasks in POMDPs. We compare BPTT, RTRL and RFLO as well as different network architectures, and find that RFLO can perform just as well as RTRL while exceeding even BPTT in terms of complexity. The proposed method, called real-time recurrent reinforcement learning (RTRRL), serves as a model of learning in biological neural networks, mimicking reward pathways in the mammalian brain.
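To make the TD(λ) ingredient concrete, here is a minimal sketch of one online TD(λ) update with an accumulating eligibility trace. It is not the authors' implementation: it assumes a linear value function v(s) = w · x(s), so the gradient of v with respect to w is simply the feature vector x. In the paper's setting, that gradient would instead come from the recurrent network via RTRL or RFLO. The function name `td_lambda_step`, the step size, and the toy two-state episode are all hypothetical choices made for illustration.

```python
import numpy as np

def td_lambda_step(w, z, x, x_next, reward, done,
                   alpha=0.1, gamma=0.99, lam=0.9):
    """One online TD(lambda) update for a linear value function v(s) = w @ x.

    w: weight vector, z: eligibility trace, x / x_next: feature vectors
    of the current and next state (illustrative assumption: linear critic).
    """
    v = float(w @ x)
    v_next = 0.0 if done else float(w @ x_next)  # terminal states have value 0
    delta = reward + gamma * v_next - v          # TD error
    z = gamma * lam * z + x                      # decay the trace, add grad of v (= x)
    w = w + alpha * delta * z                    # credit all recently visited states
    return w, z, delta

# Toy two-step episode: state 0 -> state 1 -> terminal, reward only at the end.
w, z = np.zeros(2), np.zeros(2)
s0, s1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
w, z, delta = td_lambda_step(w, z, s0, s1, reward=0.0, done=False)
w, z, delta = td_lambda_step(w, z, s1, s1, reward=1.0, done=True)
```

After the second step, the weight for state 0 has also moved, even though the reward arrived one step later: the trace carried credit backwards without storing or replaying the episode, which is what makes the update usable online.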
2. Simultaneous Discovery of Quantum Error Correction Codes and Encoders with a Noise-Aware Reinforcement Learning Agent (arXiv)
Abstract : Finding optimal ways to protect quantum states from noise remains an outstanding challenge across all quantum technologies, and quantum error correction (QEC) is the most promising strategy to address this issue. Constructing QEC codes is a complex task that has historically been powered by human creativity with the discovery of a large zoo of families of codes. However, in the context of real-world scenarios there are two challenges: these codes have typically been categorized only for their performance under an idealized noise model and the implementation-specific optimal encoding circuit is not known. In this work, we train a Deep Reinforcement Learning agent that automatically discovers both QEC codes and their encoding circuits for a given gate set, qubit connectivity, and error model. We introduce the concept of a noise-aware meta-agent, which learns to produce encoding strategies simultaneously for a range of noise models, thus leveraging transfer of insights between different situations. Moreover, thanks to the use of the stabilizer formalism and a vectorized Clifford simulator, our RL implementation is extremely efficient, allowing us to produce many codes and their encoders from scratch within seconds, with code distances varying from 3 to 5 and with up to 20 physical qubits. Our approach opens the door towards hardware-adapted accelerated discovery of QEC approaches across the full spectrum of quantum hardware platforms of interest.
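As a small illustration of why the stabilizer formalism keeps such simulations cheap, the sketch below checks commutation between Pauli strings with simple counting, with no state vectors involved. It is only an illustration and not the paper's vectorized Clifford simulator: the helper names `commute` and `syndrome` are hypothetical, signs and phases are ignored, and the generators used are those of the well-known five-qubit [[5,1,3]] code rather than a code discovered by the agent.

```python
def commute(p, q):
    """Two Pauli strings commute iff they hold different non-identity
    Paulis on an even number of qubits."""
    clashes = sum(1 for a, b in zip(p, q)
                  if a != "I" and b != "I" and a != b)
    return clashes % 2 == 0

def syndrome(error, generators):
    """Syndrome bit is 1 where the error anticommutes with a generator."""
    return tuple(0 if commute(error, g) else 1 for g in generators)

# Stabilizer generators of the five-qubit [[5,1,3]] code.
GENERATORS = ["XZZXI", "IXZZX", "XIXZZ", "ZXIXZ"]

# A valid stabilizer group needs mutually commuting generators.
all_commute = all(commute(g, h) for g in GENERATORS for h in GENERATORS)

# Syndrome of a single X error on the first qubit.
x_error_syndrome = syndrome("XIIII", GENERATORS)
```

Each check costs time linear in the number of qubits, which hints at why tracking stabilizers rather than amplitudes lets many candidate codes be evaluated quickly.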