A data lineage model for distributed sub-image processing

Johnson Mwebaze, John McFarland, Danny Booxhorn, Edwin Valentijn

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

An important challenge facing e-Science is the development of scalable systems and analysis techniques that allow client applications to locate data and services in increasingly large-scale distributed environments. e-Science systems should achieve three main goals: (i) efficient and selective processing of data; (ii) support for network collaboration without clogging distribution networks; and (iii) transparency of experiments through repeatability and verifiability. Several systems have addressed limited combinations of these properties; we address all three in this work. We describe the architecture and implementation of such a framework in Astro-WISE, an astronomical approach to distributed data processing, discovery and retrieval of datasets that achieves scalability via dynamic linking (data lineage) maintained within the system. We show that lineage data collected during the processing and analysis of datasets can be reused to perform selective reprocessing (at sub-image level) on datasets while the remainder of the dataset is left untouched, a process that is rather difficult to automate without lineage.

Original language: English
Title of host publication: Fountains of Computing Research - Proceedings of SAICSIT 2010 Annual Research Conference of the South African Institute of Computer Scientists and Information Technologists
Publisher: ACM Press Digital Library
Pages: 209-219
Number of pages: 11
ISBN (Print): 9781605589503
DOIs
Publication status: Published - 2010
Event: 2010 Annual Research Conference of the South African Institute of Computer Scientists and Information Technologists, SAICSIT 2010 - Bela Bela, South Africa
Duration: 11-Oct-2010 → 13-Oct-2010

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 2010 Annual Research Conference of the South African Institute of Computer Scientists and Information Technologists, SAICSIT 2010
Country/Territory: South Africa
City: Bela Bela
Period: 11/10/2010 → 13/10/2010

Keywords

  • data lineage
  • data reduction
  • provenance
  • scientific computing
  • subimage processing
  • target processing
