Technical Reports

Optimality of LSTD and its Relation to TD and MC
Citation key gruen06
Author S. Grünewälder and K. Obermayer
Year 2006
Institution Berlin University of Technology
Abstract In this analytical study we compare the risk of three well-known reinforcement learning estimators: temporal difference learning (TD), Monte Carlo estimation (MC) and least-squares TD (LSTD). We find that neither TD nor Monte Carlo estimation is in general superior to the other. However, we can prove that for acyclic Markov Reward Processes (MRPs), LSTD, which is related to TD, has minimal risk for any convex loss function in the class of unbiased estimators. We analyze the relation between TD and LSTD by means of a new estimator that is similar to both LSTD and TD. We prove that the new estimator converges almost surely and in the average. When comparing the Monte Carlo estimator, which does not assume a Markov structure, with LSTD, we find that the Monte Carlo estimator is equivalent to LSTD if both estimators have the same amount of information. The theoretical results are supported by an empirical evaluation of the estimators.
