Dead Sea Scrolls data collection (images, labels, prediction plots) for dating ancient manuscripts using radiocarbon and AI-based writing style analysis

Dataset

Description

The dataset is associated with the following article:

Title: Dating ancient manuscripts using radiocarbon and AI-based writing style analysis
Authors: Mladen Popović, Maruf A. Dhali, Lambert Schomaker, Johannes van der Plicht, Kaare Lund Rasmussen, Jacopo La Nasa, Ilaria Degano, Maria Perla Colombini, and Eibert Tigchelaar (Under review)

This data set was collected for the ERC project:

The Hands that Wrote the Bible: Digital Palaeography and Scribal Culture of the Dead Sea Scrolls
PI: Mladen Popović
Grant agreement ID: 640497
Project website: https://cordis.europa.eu/project/id/640497

Copyright (c) University of Groningen, 2024. All rights reserved.

Disclaimer and copyright notice for all data contained in the *.tar.gz files:

1) Permission is hereby granted to use the data for research purposes. It is not allowed to distribute this data for commercial purposes.
2) The provider gives no express or implied warranty of any kind, and any implied warranties of merchantability and fitness for purpose are disclaimed.
3) The provider shall not be liable for any direct, indirect, special, incidental, or consequential damages arising out of any use of this data.
4) The user should refer to the first public article mentioned above on this data set.
5) The recipient should refrain from proliferating the data set to third parties external to his/her local research group. Please refer interested researchers to this site to obtain their own copy.

Organization of the data:

Update on 19 April 2024: The OxCal data for accepted 2-sigma ranges have been updated with the inclusion and exclusion of minor peaks. New prediction plots have been added, produced after training the model with accepted 2-sigma ranges including minor peaks. The old plots are also kept.

Update on 07-Feb-2024: OxCal data for selected ranges have been added in a new directory, in addition to the previously available original OxCal data. Enoch's prediction plots and test images have been reorganized for easier access.

Please use the files from this version and disregard the previous two versions: 10.5281/zenodo.10629480 and 10.5281/zenodo.8168210

There are five *.tar.gz files:

- C14-Oxcal-data-updated.tar.gz contains one directory with radiocarbon data (OxCal [1] raw data) for all 30 manuscripts. Three additional directories contain name-corrected files for the original OxCal data, files with accepted ranges, and files with accepted ranges including minor peaks. Please refer to the original article for details about the OxCal data and the manuscripts. The OxCal data for 25 of the 30 manuscripts (accepted ranges only) are used as the training labels for Enoch, the date prediction model.
- train-images-c14.tar.gz contains the clean and preprocessed (binarized, aligned, and arrangement-corrected) training images for the 25 radiocarbon-dated training manuscripts (including 4Q52; 64 images in total).
- test-images-all.tar.gz contains the clean and preprocessed test images for 135 previously undated manuscripts. The images are organized in three directories: the first with all 359 images for the 135 manuscripts, the second with the selected 135 images, and the third with 25 images that illustrate poor image quality.
- Enoch-prediction-new-with-minor-peaks.tar.gz contains the new date prediction plots for each of the 135 test images, where Enoch was trained with the inclusion of minor peaks for the 2-sigma accepted ranges and with a data balancing threshold of 0.05. These plots were used in the expert palaeographers' evaluation of Enoch's style-based date predictions for the 135 previously undated manuscripts.
- Enoch-predictions.tar.gz contains the date prediction plots for each of the 135 test images. There are two directories inside the *.tar.gz file:
  - prediction-plots-for-selected-135: Prediction plots with a data balancing threshold of 0.05.
  - extra-plots: contains four additional directories:
    - Enoch-predictions-c14wo4Q52-balanced05: Prediction plots with a data balancing threshold of 0.05.
    - Enoch-predictions-c14wo4Q52-balanced10: Prediction plots with a data balancing threshold of 0.1.
    - Enoch-predictions-c14wo4Q52-unbalanced: Unbalanced raw predictions.
    - Enoch-predictions-c14wo4Q52-combined: Combined plots with all three prediction plots (unbalanced, 0.05, 0.1).

Please refer to the original article for more details.

The updated code to produce the plots is available here: https://doi.org/10.5281/zenodo.10998860

If you have any questions, please get in touch with us:
Mladen Popović
Maruf A. Dhali
Lambert Schomaker

References:

1. Bronk Ramsey, C. (2001). Development of the radiocarbon calibration program. Radiocarbon, 43(2A), 355-363.
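
The archive layout given under "Organization of the data" can be checked after download with a short script. What follows is a minimal sketch and not part of the dataset or of the authors' code. It assumes the archives sit in the current working directory under the file names listed above. The helper name `count_files_per_directory` and its `depth` parameter are hypothetical, chosen only for illustration. The script makes no assumption about directory names inside the archives; it only reports what it finds, so the totals can be compared by hand with the figures stated above (for example 64 training images, or 359, 135, and 25 test images).

```python
import os
import tarfile
from collections import Counter
from pathlib import PurePosixPath

# Archive file names as listed in the description (assumed to be in the
# current working directory; adjust the paths as needed).
ARCHIVES = [
    "C14-Oxcal-data-updated.tar.gz",
    "train-images-c14.tar.gz",
    "test-images-all.tar.gz",
    "Enoch-prediction-new-with-minor-peaks.tar.gz",
    "Enoch-predictions.tar.gz",
]


def count_files_per_directory(archive_path, depth=2):
    """Count regular files in a .tar.gz archive.

    Files are grouped by the first `depth` components of their parent
    directory; files at the archive root are grouped under ".".
    """
    counts = Counter()
    with tarfile.open(archive_path, mode="r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, and other entries
            parent_parts = PurePosixPath(member.name).parts[:-1]
            key = "/".join(parent_parts[:depth]) or "."
            counts[key] += 1
    return counts


if __name__ == "__main__":
    for name in ARCHIVES:
        if not os.path.exists(name):
            print(f"{name}: not found, skipping")
            continue
        print(name)
        for directory, n in sorted(count_files_per_directory(name).items()):
            print(f"  {directory}: {n} files")
```

Reading the archive member by member avoids unpacking anything to disk, which is convenient for a first look at the larger image archives. A larger `depth` shows nested directories such as those under extra-plots separately.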
Date made available: 19-Apr-2024
Publisher: ZENODO
