Abstract
In recent years, long short-term memory neural networks (LSTMs) followed by a connectionist temporal classification (CTC) layer have shown strength in solving handwritten text recognition problems. Such networks can handle not only sequence variability but also geometric variation by using a convolutional front end at the input side. Although different approaches have been introduced for decoding activations in the CTC output layer, only limited consideration has been given to the use of proper label-coding schemes. In this paper, we use a limited-size ensemble of end-to-end convolutional LSTM neural networks to evaluate four label-coding schemes. Additionally, we evaluate two CTC search techniques: best-path search and dual-state word-beam search (DSWBS). The classifiers in the ensemble have comparable architectures but different numbers of hidden units. We tested the coding and search approaches on three datasets: the standard IAM benchmark dataset (English) and two more difficult historical handwritten datasets (diaries and field notes, highly multilingual). Results show that stressing the word endings in the label-coding scheme yields higher performance, especially for DSWBS. However, stressing the start-of-word shapes with a token appears to be disadvantageous.
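Best-path search, the simpler of the two CTC search techniques named in the abstract, is commonly described as taking the most probable label at each time step, collapsing repeated labels, and removing blanks. The following is a minimal sketch of that general idea, not the authors' implementation. The function name `best_path_decode`, the blank index `0`, the two-letter alphabet, and the toy activation matrix are all assumptions made for illustration.

```python
from itertools import groupby

BLANK = 0  # assumption: label index 0 is the CTC blank


def best_path_decode(activations, alphabet):
    """Greedy (best-path) CTC decoding.

    activations: list of per-time-step score lists, one score per label.
    alphabet: string in which alphabet[i - 1] is the character for label i.
    """
    # 1. Pick the highest-scoring label at every time step.
    path = [max(range(len(step)), key=step.__getitem__) for step in activations]
    # 2. Collapse runs of identical consecutive labels.
    collapsed = [label for label, _ in groupby(path)]
    # 3. Drop blanks and map the remaining labels to characters.
    return "".join(alphabet[label - 1] for label in collapsed if label != BLANK)


# Hypothetical toy activations over the labels (blank, 'a', 'b').
toy = [
    [0.10, 0.80, 0.10],  # 'a'
    [0.10, 0.70, 0.20],  # 'a' again, merged with the previous step
    [0.90, 0.05, 0.05],  # blank, separates the two 'a' characters
    [0.20, 0.70, 0.10],  # 'a'
    [0.10, 0.20, 0.70],  # 'b'
]
print(best_path_decode(toy, "ab"))  # prints "aab"
```

The blank between the second and fourth time steps is what allows the doubled letter to survive the collapse step. A word-beam search such as DSWBS instead constrains the hypotheses with a lexicon, which this sketch does not attempt.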
Original language | English |
---|---|
Title of host publication | Proceedings of the 2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR) |
Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
Pages | 13-18 |
Number of pages | 6 |
ISBN (Electronic) | 978-1-7281-9966-5 |
ISBN (Print) | 978-1-7281-9967-2 |
Publication status | Published - 25-Nov-2020 |
Event | 2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR) - Dortmund, Germany; Duration: 25-Nov-2020 → 25-Nov-2020 |
Conference
Conference | 2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR) |
---|---|
Country | Germany |
City | Dortmund |
Period | 25/11/2020 → 25/11/2020 |