TU Berlin

Machine Learning

All Publications

Optimal Gradient-Based Learning Using Importance Weights
Citation key: hochreit2005a
Author: Hochreiter, S. and Obermayer, K.
Title of Book: Proceedings of the International Joint Conference on Neural Networks
Pages: 114–119
Year: 2005
ISBN: 0-7803-9048-2
ISSN: 2161-4393
DOI: 10.1109/IJCNN.2005.1555815
Volume: 1
Publisher: IEEE
Abstract: We introduce a novel "importance weight" method (IW) to speed up learning of "difficult" data sets, including unbalanced data, highly non-linear data, or long-term dependencies in sequences. An importance weight is assigned to every training data point and controls its contribution to the total weight update. The importance weights are obtained by solving a quadratic optimization problem and determine the learning informativeness of a data point. For linear classifiers we show that IW is equivalent to standard support vector learning. We apply IW to feedforward multi-layer perceptrons and to recurrent neural networks (LSTM). Benchmarks with QuickProp and standard gradient descent methods show that IW is usually much faster in terms of epochs as well as in terms of absolute CPU time, and that it provides equal or better prediction results. IW improved gradient descent results on "real world" protein datasets. In the "latching benchmark" for sequence prediction, IW was able to extract dependencies between sites that are 1,000,000 sequence elements apart, a new record.
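The core idea described in the abstract, per-example weights that scale each data point's contribution to the gradient update, can be sketched as follows. This is only an illustrative sketch: in the paper the importance weights are the solution of a quadratic optimization problem, whereas here they are fixed by hand, and the model (a small logistic-regression classifier) and all data are invented for the example.

```python
import math

def weighted_grad_step(w, data, labels, imp, lr=0.1):
    """One gradient-descent step on logistic loss; each example's
    gradient is scaled by its importance weight imp[i] before being
    summed into the total update (the core of the IW idea)."""
    g = [0.0] * len(w)
    for x, y, a in zip(data, labels, imp):
        z = sum(wi * xi for wi, xi in zip(w, x))
        p = 1.0 / (1.0 + math.exp(-z))          # predicted probability
        for j, xj in enumerate(x):
            g[j] += a * (p - y) * xj            # importance-weighted gradient
    return [wi - lr * gi for wi, gi in zip(w, g)]

def total_loss(w, data, labels):
    """Unweighted cross-entropy loss, for monitoring progress."""
    total = 0.0
    for x, y in zip(data, labels):
        z = sum(wi * xi for wi, xi in zip(w, x))
        p = 1.0 / (1.0 + math.exp(-z))
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

# Toy data; the importance weights below are hand-chosen for
# illustration, NOT the QP solution used in the paper.
data = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
labels = [1, 0, 1, 0]
imp = [1.0, 1.0, 2.0, 0.5]

w = [0.0, 0.0]
for _ in range(50):
    w = weighted_grad_step(w, data, labels, imp)
```

In a faithful implementation the vector `imp` would be recomputed by the quadratic program so that informative (hard) examples dominate the update, which is what the abstract credits for the speed-up on unbalanced and long-term-dependency data.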
