The Monk Line Segmentation (MLS) Dataset (17 May 2013)

Open Access

Dataset

Description

The MLS dataset available from this page consists of 31 handwritten page scans. It contains medieval, historical and contemporary manuscripts and is intended for testing line-segmentation algorithms. The collection covers a wide variety of common problems in handwriting recognition: lines with overlapping ascenders and descenders, slightly rotated scans, and curved baselines.

The MLS dataset was collected from the Monk system as of Friday, 17 May 2013, 14:15:04 CEST. It was collected by Lambert Schomaker at the Institute of Artificial Intelligence and Cognitive Engineering (ALICE), University of Groningen.
Date made available: 17-May-2013
Publisher: University of Groningen
Date of data production: 17-May-2013

Keywords on Datasets

  • document analysis benchmark
  • line segmentation
  • historical manuscripts
  • handwriting recognition
