Leveraging data lineage to infer logical relationships between astronomical catalogs

Hugo Buddelmeijer*, Edwin A. Valentijn

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

A novel method to infer logical relationships between sets is presented. The sets can be any collections of elements, for example astronomical catalogs of celestial objects. The method does not require the contents of the sets to be known explicitly. It combines incomplete knowledge about the relationships between sets to infer relationships that are a priori unknown. Relationships between sets are represented by sets of Boolean hypercubes, so deductive reasoning reduces to applying logical operators to these sets of hypercubes. Pseudocode for an efficient implementation is given. The method is used in the Astro-WISE information system to infer relationships between catalogs of astronomical objects. These catalogs can be very large and, more importantly, their contents do not have to be available at all times. Science products are stored in Astro-WISE with references to the science products from which they are derived, their dependencies. This creates a full data lineage that links every science product all the way back to the raw data. Catalogs are created in a way that maximizes knowledge about their relationships with their dependencies. The presented algorithm leverages this information to determine which objects a catalog represents.
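The core idea of the abstract, combining partial knowledge about sets to derive a relationship that was never stated, can be illustrated with a small Python sketch. This is not the efficient hypercube-set algorithm of the paper: it is a brute-force stand-in that enumerates every vertex of the Boolean hypercube (one vertex per Venn-diagram region) and keeps the regions that may still be non-empty. The catalog names and all function names are hypothetical.

```python
from itertools import product

def all_regions(names):
    """Every vertex of {0,1}^n: one Venn-diagram region per membership pattern."""
    return set(product((0, 1), repeat=len(names)))

def subset_of(names, a, b):
    """Regions compatible with 'a is a subset of b' (nothing in a but outside b)."""
    i, j = names.index(a), names.index(b)
    return {r for r in all_regions(names) if not (r[i] == 1 and r[j] == 0)}

def disjoint(names, a, b):
    """Regions compatible with 'a and b share no elements'."""
    i, j = names.index(a), names.index(b)
    return {r for r in all_regions(names) if not (r[i] == 1 and r[j] == 1)}

def infer(names, allowed, a, b):
    """Relationships between a and b that follow from the allowed regions."""
    i, j = names.index(a), names.index(b)
    pairs = {(r[i], r[j]) for r in allowed}
    facts = set()
    if (1, 0) not in pairs:   # no element can be in a without being in b
        facts.add("subset")
    if (0, 1) not in pairs:   # no element can be in b without being in a
        facts.add("superset")
    if (1, 1) not in pairs:   # no element can be in both
        facts.add("disjoint")
    return facts

# Hypothetical lineage: two selections from a source catalog, and a
# further selection from one of them. No catalog contents are needed.
names = ["source", "bright", "faint", "very_bright"]
allowed = (
    subset_of(names, "bright", "source")
    & subset_of(names, "faint", "source")
    & disjoint(names, "bright", "faint")
    & subset_of(names, "very_bright", "bright")
)

print(infer(names, allowed, "very_bright", "source"))
print(infer(names, allowed, "very_bright", "faint"))
```

Combining knowledge is a plain set intersection here, and the two queries return relationships (a subset and a disjointness) that were never given as input. The enumeration grows as 2^n in the number of sets, which is why a compact representation such as the sets of hypercubes described in the abstract is needed in practice.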

Original language: English
Pages (from-to): 227-244
Number of pages: 18
Journal: Experimental Astronomy
Volume: 35
Issue number: 1
Publication status: Published - Jan 2013

Keywords

  • Data mining
  • Data lineage
  • Algorithms
  • Automated theorem solving
  • Astro-WISE
