1QIsaa data collection (binarized images, feature files, and plotting scripts) for writer identification test using artificial intelligence and image-based pattern recognition techniques



The Great Isaiah Scroll (1QIsaa) data set for writer identification

According to ImageMagick's 'identify' tool, the original images, which come from the Brill collection, are grayscale JPEG files (.jpg) in '8-bit Gray 256c' format. These images pass through several preprocessing steps to make them suitable for pattern-recognition techniques. The first step is image binarization. To prevent classification of the text-column images based on irrelevant background patterns, a specific binarization technique (BiNet) was applied, keeping the original ink traces intact. After binarization, the images were cleaned further by removing the adjacent columns that partially appear in the target columns' images. Finally, a few minor affine transformations and stretching corrections were performed in a restrictive manner. These corrections are also intended to align the text where text lines have become twisted due to degradation of the leather writing surface. The directory therefore contains the cleaned images alongside the directly binarized images. No effort has been made to obtain a balanced set in any way.
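
For readers unfamiliar with binarization, the sketch below shows what the step does to an 8-bit grayscale image: each pixel becomes either ink (0) or background (255). It is not the BiNet method used for this data set. BiNet is a learned model, and this sketch swaps in Otsu's global threshold as a simple stand-in. The function names `otsu_threshold` and `binarize` and the tiny sample image are hypothetical and chosen only for illustration.

```python
def otsu_threshold(pixels):
    """Return the 8-bit threshold t that maximizes between-class variance.

    Pixels with value <= t form the dark (ink) class.
    Stand-in for illustration only; the data set itself was binarized with BiNet.
    """
    hist = [0] * 256
    for value in pixels:
        hist[value] += 1

    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the dark class so far
    sum0 = 0    # sum of gray values in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # no dark pixels yet
        w1 = total - w0
        if w1 == 0:
            break               # every pixel is already in the dark class
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(image):
    """Map a grayscale image (list of rows of 0..255 ints) to 0 (ink) / 255 (background)."""
    flat = [value for row in image for value in row]
    t = otsu_threshold(flat)
    return [[0 if value <= t else 255 for value in row] for row in image]


if __name__ == "__main__":
    # Hypothetical 2x3 patch: dark ink strokes on a light background.
    patch = [[20, 30, 200],
             [210, 25, 220]]
    print(binarize(patch))
```

A global threshold like this tends to fail on unevenly degraded backgrounds, which is one reason a dedicated technique is preferable for manuscript images of this kind.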

Note the stated terms of use; follow the DOI for details.
Date made available: 26-Jan-2021
Publisher: University of Groningen

Keywords on Datasets

  • Writer identification
  • Artificial intelligence
  • Pattern recognition
  • Document analysis
  • Historical manuscript dating
  • Great Isaiah Scroll
