TU Berlin

Neural Information ProcessingApproximate Reinforcement Learning

Neuronale Informationsverarbeitung

Page Content

to Navigation

Approximate Reinforcement Learning


Fully autonomous agents that interact with the environment (like humans and robots) present challenges very different from classic machine learning. The agent must balance future benefits of actions against their costs without the advantage of a teacher or prior knowledge of the environment. In addition costs may not only include the expected benefits (or rewards), but may well be formulated as a trade-off between different objectives (for example: rewards vs. risk).
Exact solutions in the field of Reinforcement Learning scale badly with the task's complexity and are rarely applicable in practice. To close the gap between theory and reality, this project aims for approximate solutions that not only make favourable decisions but also avoid irrational behaviour or dead ends. The approximation's highly adaptive nature allows a direct application onto the agent's sensor data and therefore a full sensor-actor control loop. Newly developed algorithms are tested in simulations and on robotic systems. Reinforcement and reward-based learning is also investigated in the context of understanding and modeling human decision making. For details see "Research" page "Perception and Decision Making in Uncertain Environments".

Acknowledgements: Research is funded by Deutsche Forschungsgemeinschaft (DFG), Human-Centric Communication Cluster (H-C3) and Technische Universität Berlin.

Selected Publications:

Risk-sensitive Markov Control Processes
Citation key Shen2013
Author Shen, Y. and Stannat, W. and Obermayer, K.
Pages 3652–3672
Year 2013
DOI 10.1137/120899005
Journal SIAM Journal on Control and Optimization
Volume 51
Number 5
Abstract We introduce a general framework for measuring risk in the context of Markov control processes with risk maps on general Borel spaces that generalize known concepts of risk measures in mathematical finance, operations research, and behavioral economics. Within the framework, apply- ing weighted norm spaces to incorporate unbounded costs also, we study two types of infinite-horizon risk-sensitive criteria, discounted total risk and average risk, and solve the associated optimization problems by dynamic programming. For the discounted case, we propose a new discount scheme, which is different from the conventional form but consistent with the existing literature, while for the average risk criterion, we state Lyapunov-like stability conditions that generalize known conditions for Markov chains to ensure the existence of solutions to the optimality equation.
Bibtex Type of Publication Selected:main selected:reinforcement selected:decision selected:publications
Link to publication Link to original publication Download Bibtex entry


Quick Access

Schnellnavigation zur Seite über Nummerneingabe