
Approximate Reinforcement Learning


Fully autonomous agents that interact with their environment (like humans and robots) present challenges very different from those of classic machine learning. The agent must balance the future benefits of its actions against their costs, without the advantage of a teacher or prior knowledge of the environment. Moreover, the objective may not consist of expected rewards alone, but may be formulated as a trade-off between several criteria (for example, reward versus risk).
Exact solutions in reinforcement learning scale poorly with task complexity and are rarely applicable in practice. To close the gap between theory and practice, this project aims for approximate solutions that not only make favourable decisions but also avoid irrational behaviour and dead ends. The approximation's highly adaptive nature allows it to be applied directly to the agent's sensor data, yielding a full sensor-actor control loop. Newly developed algorithms are tested in simulations and on robotic systems. Reinforcement and reward-based learning is also investigated in the context of understanding and modelling human decision making; for details, see the "Research" page "Perception and Decision Making in Uncertain Environments".
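To illustrate the basic idea of approximate reinforcement learning, the following sketch runs Q-learning with a linear function approximator on a toy chain environment. The environment, the one-hot feature map, and all hyper-parameters are illustrative assumptions for this sketch, not the project's actual algorithms.

```python
import numpy as np

# A minimal sketch of approximate reinforcement learning: Q-learning with a
# linear function approximator, Q(s, a) = w . phi(s, a), on a toy 5-state
# chain. Environment, features, and hyper-parameters are assumptions.

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2            # chain states 0..4; actions 0=left, 1=right

def features(s, a):
    """One-hot features over (state, action) pairs (tabular as a special case)."""
    phi = np.zeros(n_states * n_actions)
    phi[s * n_actions + a] = 1.0
    return phi

def step(s, a):
    """Deterministic chain: stepping onto the right end (state 4) pays
    reward 1 and teleports the agent back to the start state 0."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    if s_next == n_states - 1:
        return 0, 1.0
    return s_next, 0.0

w = np.zeros(n_states * n_actions)    # linear Q-function weights
gamma, alpha, epsilon = 0.9, 0.1, 0.1

state = 0
for _ in range(5000):
    q = np.array([w @ features(state, b) for b in range(n_actions)])
    if rng.random() < epsilon:        # epsilon-greedy exploration
        a = int(rng.integers(n_actions))
    else:                             # greedy action with random tie-breaking
        a = int(rng.choice(np.flatnonzero(q == q.max())))
    s_next, r = step(state, a)
    # semi-gradient Q-learning update on the temporal-difference error
    td_target = r + gamma * max(w @ features(s_next, b) for b in range(n_actions))
    w += alpha * (td_target - w @ features(state, a)) * features(state, a)
    state = s_next

# Greedy policy over the visited states 0..3: it should always move right.
greedy = [int(np.argmax([w @ features(s, b) for b in range(n_actions)]))
          for s in range(n_states - 1)]
```

With richer feature maps the same update rule applies directly to high-dimensional sensor data, which is the sense in which approximation closes the sensor-actor loop described above.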


Acknowledgements: This research is funded by the Deutsche Forschungsgemeinschaft (DFG), the Human-Centric Communication Cluster (H-C3), and Technische Universität Berlin.

Selected Publications:

Böhmer, W., Grünewalder, S., Nickisch, H., and Obermayer, K. (2012). "Generating feature spaces for linear algorithms with regularized sparse kernel slow feature analysis." Machine Learning 89(1):67–86. Springer US. ISSN 0885-6125. DOI: 10.1007/s10994-012-5300-0.
Abstract: Without non-linear basis functions, many problems cannot be solved by linear algorithms. This article proposes a method to automatically construct such basis functions with slow feature analysis (SFA). Non-linear optimization of this unsupervised learning method generates an orthogonal basis on the unknown latent space for a given time series. In contrast to methods like PCA, SFA is thus well suited for techniques that make direct use of the latent space. Real-world time series can be complex, and current SFA algorithms are either not powerful enough or tend to over-fit. We make use of the kernel trick in combination with sparsification to develop a kernelized SFA algorithm which provides a powerful function class for large data sets. Sparsity is achieved by a novel matching pursuit approach that can be applied to other tasks as well. For small data sets, however, the kernel SFA approach leads to over-fitting and numerical instabilities. To enforce a stable solution, we introduce regularization to the SFA objective. We hypothesize that our algorithm generates a feature space that resembles a Fourier basis in the unknown space of latent variables underlying a given real-world time series. We evaluate this hypothesis on a vowel classification task in comparison to sparse kernel PCA. Our results show excellent classification accuracy and demonstrate the superiority of kernel SFA over kernel PCA in encoding latent variables.
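The paper's kernelized, sparsified SFA builds on plain linear SFA. As a rough illustration of the underlying principle only (not the paper's algorithm), the sketch below recovers a slow latent signal from a two-dimensional mixture; the toy signal and dimensions are assumptions for demonstration.

```python
import numpy as np

# A minimal sketch of plain linear slow feature analysis (SFA), the principle
# the paper extends with kernels, sparsification, and regularization.
# The toy two-dimensional signal below is an illustrative assumption.

T = 2000
t = np.linspace(0, 4 * np.pi, T)
slow = np.sin(t)                       # slow latent variable
fast = np.sin(29 * t)                  # fast distractor
X = np.column_stack([fast + 0.3 * slow, fast - 0.3 * slow])  # observed mixture

# 1. Center and whiten so the inputs have zero mean and unit covariance.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (T - 1)
eigval, eigvec = np.linalg.eigh(cov)
Z = Xc @ (eigvec @ np.diag(eigval ** -0.5) @ eigvec.T)

# 2. The slowest feature minimizes the variance of the temporal derivative:
#    the eigenvector of the difference covariance with smallest eigenvalue.
dZ = np.diff(Z, axis=0)
dval, dvec = np.linalg.eigh(dZ.T @ dZ / (len(dZ) - 1))  # ascending eigenvalues
slow_feature = Z @ dvec[:, 0]

# The extracted feature should be strongly correlated with the true latent.
corr = abs(np.corrcoef(slow_feature, slow)[0, 1])
```

Replacing the linear map with kernel-induced basis functions, and controlling their number by sparsification and the fit by regularization, yields the kernel SFA variant the paper develops.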
