The snowball principle for handwritten word-image retrieval: The importance of labelled data and humans in the loop

Jean-Paul van Oosten

Research output: Thesis; thesis fully internal (DIV)

Handwriting recognition is an active field of research, even though
today, most text is produced digitally. Large collections of handwritten
text are stored in archives, like the Dutch National Archives. Usually,
these older manuscripts are hard to read and especially hard to search.

Besides the technical assumptions that are frequently made in the field
of handwriting recognition, this dissertation examines assumptions
about the process of handwriting recognition itself. One of the most
important insights is that collecting the labels, which indicate
which word is written in an image, is usually regarded as a separate
process. However, for an unknown script, it is particularly important
that human users can train a search engine quickly. Through such
a collaboration, these users can use their labelling to influence the
quality of the algorithms.

Important aspects of a search engine are the machine learning and
feature extraction techniques. The big question is: Where can
researchers best focus their time and attention? Although machine
learning is usually regarded as the most important part, this
dissertation argues that label collection is often overlooked. Machine
learning, feature extraction and label collection are connected, and the
role of humans in this interplay is very important.

The goal is to create a snowball effect in which collecting labels
keeps getting easier. The most important conclusion
is that handwriting recognition is not a static process that is applied
only once, but a dynamic process that needs continuous maintenance.
Original language: English
Qualification: Doctor of Philosophy
Awarding Institution:
  • University of Groningen
  • Schomaker, Lambert, Supervisor
Award date: 26-Mar-2021
Place of Publication: [Groningen]
Publication status: Published - 2021
