TU Berlin

Neural Information Processing

Approximate Reinforcement Learning


Fully autonomous agents that interact with their environment (such as humans and robots) pose challenges very different from those of classical machine learning. The agent must balance the future benefits of its actions against their costs, without a teacher and without prior knowledge of the environment. Moreover, the objective need not be the expected benefit (or reward) alone; it may well be formulated as a trade-off between different objectives (for example, reward vs. risk).
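As a minimal illustration of such a trade-off, the sketch below scores each action by its mean return minus a weighted variance penalty. This mean-variance criterion is only one common way to express reward vs. risk and is not necessarily the formulation used in the project; the action names, the sampled returns and the risk_weight parameter are hypothetical.

```python
from statistics import mean, pvariance


def risk_adjusted_score(returns, risk_weight):
    """Mean return minus risk_weight times the variance of the returns."""
    return mean(returns) - risk_weight * pvariance(returns)


def choose_action(sampled_returns, risk_weight):
    """Pick the action whose sampled returns score best under the trade-off."""
    return max(
        sampled_returns,
        key=lambda action: risk_adjusted_score(sampled_returns[action], risk_weight),
    )


# Hypothetical returns observed for two hypothetical actions.
sampled_returns = {
    "safe": [1.0, 1.0, 1.0, 1.0],     # low reward, no variability
    "risky": [5.0, -2.0, 5.0, -2.0],  # higher mean reward, high variability
}

# risk_weight = 0 considers the expected reward only.
risk_neutral_choice = choose_action(sampled_returns, risk_weight=0.0)
# A positive risk_weight penalises variable outcomes.
risk_averse_choice = choose_action(sampled_returns, risk_weight=0.1)
```

With these numbers a risk-neutral agent prefers the "risky" action because of its higher mean return, whereas a small positive risk_weight is enough to make the "safe" action preferable.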
Exact solutions in reinforcement learning scale badly with task complexity and are rarely applicable in practice. To close the gap between theory and reality, this project aims for approximate solutions that not only make favourable decisions but also avoid irrational behaviour and dead ends. The highly adaptive nature of the approximation allows it to be applied directly to the agent's sensor data and thus to close a full sensor-actuator control loop. Newly developed algorithms are tested in simulations and on robotic systems. Reinforcement and reward-based learning is also investigated in the context of understanding and modelling human decision making. For details, see the "Research" page "Perception and Decision Making in Uncertain Environments".

Acknowledgements: Research is funded by Deutsche Forschungsgemeinschaft (DFG), Human-Centric Communication Cluster (H-C3) and Technische Universität Berlin.

Selected Publications:

Towards Structural Generalization: Factored Approximate Planning
Citation key: Boehmer2013b
Author: Böhmer, W. and Obermayer, K.
Year: 2013
Journal: ICRA Workshop on Autonomous Learning
Abstract: Autonomous agents do not always have access to the number of samples that machine learning methods require. Structural assumptions such as factored MDPs allow experiences to be generalized beyond traditional metrics to entirely new situations. This paper introduces a novel framework to exploit such knowledge for approximate policy iteration. At the heart of the framework, a novel factored approximate planning algorithm is derived. The algorithm requires no real observations and optimizes control for given linear reward and transition models. It is empirically compared with least-squares policy iteration in a continuous navigation task. Computational leverage in constructing the linear models without observing the entire state space, and in the representation of the solution, is discussed as well.

