
Approximate Reinforcement Learning

Fully autonomous agents that interact with their environment (such as humans and robots) face challenges very different from those of classic machine learning. The agent must weigh the future benefits of its actions against their costs without the help of a teacher or prior knowledge of the environment. In addition, the quantity to be optimized need not be the expected benefit (reward) alone; it may well be formulated as a trade-off between different objectives (for example, reward vs. risk).
Exact solutions in the field of Reinforcement Learning scale badly with the complexity of the task and are rarely applicable in practice. To close the gap between theory and reality, this project develops approximate solutions that not only make favourable decisions but also avoid irrational behaviour and dead ends. Because these approximations are highly adaptive, they can be applied directly to the agent's sensor data, which closes the full sensor-actor control loop. Newly developed algorithms are tested in simulations and on robotic systems. Reinforcement and reward-based learning are also investigated in the context of understanding and modelling human decision making. For details, see the "Research" page "Perception and Decision Making in Uncertain Environments".


Acknowledgements: Research is funded by Deutsche Forschungsgemeinschaft (DFG), Human-Centric Communication Cluster (H-C3) and Technische Universität Berlin.

Selected Publications:

Construction of Approximation Spaces for Reinforcement Learning
Citation key: Boehmer2013a
Authors: Böhmer, W. and Grünewälder, S. and Shen, Y. and Musial, M. and Obermayer, K.
Pages: 2067–2118
Year: 2013
Journal: Journal of Machine Learning Research
Volume: 14
Month: July
Abstract: Linear reinforcement learning (RL) algorithms like least-squares temporal difference learning (LSTD) require basis functions that span approximation spaces of potential value functions. This article investigates methods to construct these bases from samples. We hypothesize that an ideal approximation space should encode diffusion distances and that slow feature analysis (SFA) constructs such spaces. To validate our hypothesis we provide theoretical statements about the LSTD value approximation error and the induced metric of approximation spaces constructed by SFA and by the state-of-the-art methods Krylov bases and proto-value functions (PVF). In particular, we prove that SFA minimizes the average (over all tasks in the same environment) bound on the above approximation error. Compared to other methods, SFA is very sensitive to sampling and can sometimes fail to encode the whole state space. We derive a novel importance sampling modification to compensate for this effect. Finally, the LSTD and least-squares policy iteration (LSPI) performance of approximation spaces constructed by Krylov bases, PVF, SFA and PCA is compared in benchmark tasks and a visual robot navigation experiment (both in a realistic simulation and with a robot). The results support our hypothesis and suggest that (i) SFA provides subspace-invariant features for MDPs with self-adjoint transition operators, which allows strong guarantees on the approximation error, (ii) the modified SFA algorithm is best suited for LSPI in both discrete and continuous state spaces and (iii) approximation spaces encoding diffusion distances benefit LSPI performance.
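The pipeline described in the abstract (construct a basis from samples with SFA, then evaluate a fixed policy with LSTD) can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation: a hypothetical 10-state reflecting random walk with reward 1 for entering the rightmost state, one-hot state encodings as the SFA input, a constant feature added to the (zero-mean) SFA outputs, and no importance-sampling correction. The helper names `simulate_chain`, `linear_sfa`, `features` and `lstd` are hypothetical.

```python
import numpy as np


def simulate_chain(n_states, n_steps, rng):
    """Sample states of a reflecting random walk on 0..n_states-1 (hypothetical toy MDP)."""
    s = np.empty(n_steps + 1, dtype=int)
    s[0] = rng.integers(n_states)
    for t in range(n_steps):
        step = 1 if rng.random() < 0.5 else -1
        s[t + 1] = min(max(s[t] + step, 0), n_states - 1)  # stay put at either end
    return s


def linear_sfa(X, k, eps=1e-10):
    """Linear slow feature analysis on a sample sequence X of shape (T, d).

    Returns a map to the k slowest output signals; on X they have zero mean,
    unit variance and are mutually decorrelated.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    evals, evecs = np.linalg.eigh(Xc.T @ Xc / len(Xc))
    keep = evals > eps                              # drop degenerate directions
    whiten = evecs[:, keep] / np.sqrt(evals[keep])
    dZ = np.diff(Xc @ whiten, axis=0)               # temporal differences
    _, dvecs = np.linalg.eigh(dZ.T @ dZ / len(dZ))  # ascending: slowest first
    W = whiten @ dvecs[:, :k]
    return lambda x: (x - mean) @ W


def features(sfa, x):
    """Constant feature plus SFA outputs (assumption: the SFA outputs alone are zero-mean)."""
    return np.column_stack([np.ones(len(x)), sfa(x)])


def lstd(phi, phi_next, rewards, gamma):
    """LSTD(0): solve A theta = b with A = Phi^T (Phi - gamma Phi'), b = Phi^T r."""
    A = phi.T @ (phi - gamma * phi_next)
    b = phi.T @ rewards
    return np.linalg.solve(A, b)


n, gamma = 10, 0.9
rng = np.random.default_rng(0)
states = simulate_chain(n, 50_000, rng)
onehot = np.eye(n)[states]                     # tabular encoding of each sample
rewards = (states[1:] == n - 1).astype(float)  # reward 1 for entering the right end

sfa = linear_sfa(onehot, k=3)
phi = features(sfa, onehot)
theta = lstd(phi[:-1], phi[1:], rewards, gamma)
v_sfa = features(sfa, np.eye(n)) @ theta       # approximate value of each state

# Exact values of the same policy for comparison: V = (I - gamma P)^{-1} E[r | s].
P = np.zeros((n, n))
for s in range(n):
    P[s, max(s - 1, 0)] += 0.5
    P[s, min(s + 1, n - 1)] += 0.5
v_true = np.linalg.solve(np.eye(n) - gamma * P, P[:, n - 1])

print("true:", np.round(v_true, 2))
print("SFA :", np.round(v_sfa, 2))
```

Raising `k` to `n - 1` makes the constant plus SFA features span every function on the ten states, so LSTD then returns the same estimate as a tabular (one-hot) representation; smaller `k` keeps only the slowest-varying directions of the walk. The importance-sampling modification and the LSPI control experiments from the article are not covered by this sketch.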
