TY - GEN
T1 - Trusted Provenance of Collaborative, Adaptive, Process-Based Data Processing Pipelines
AU - Stage, Ludwig
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
PY - 2024/3/2
Y1 - 2024/3/2
N2 - The abundance of data available today offers many opportunities to gain insights across domains. Data processing pipelines are one way to automate data processing approaches and are widely used in both industry and academia. In many cases, data and processing are available in distributed environments, and workflow technology is well suited to automating data processing pipelines while also supporting collaborative, trial-and-error experimentation in terms of pipeline architecture for different application and scientific domains. In addition to the need for flexibility during pipeline execution, there is a lack of trust in such collaborative settings, where interactions cross organisational boundaries. Capturing provenance information about the pipeline execution and the processed data is common and is certainly a first step towards enabling trusted collaborations. However, current solutions do not capture changes to any aspect of the processing pipelines themselves or changes to the data used, and thus do not allow for provenance of change. Therefore, the objective of this work is to investigate how provenance of workflow or data change during execution can be enabled. As a first step, we have developed a preliminary architecture of a service, the Provenance Holder, which enables provenance of collaborative, adaptive data processing pipelines in a trusted manner. In our future work, we will focus on the concepts necessary to enable trusted provenance of change, as well as on the detailed service design, realization and evaluation.
KW - Collaborative Processes
KW - Data Processing Pipelines
KW - Provenance of ad-hoc workflow change
KW - Provenance of Change
KW - Reproducibility
KW - Trust
KW - Workflow evolution provenance
UR - http://www.scopus.com/inward/record.url?scp=85187801480&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-54712-6_25
DO - 10.1007/978-3-031-54712-6_25
M3 - Conference contribution
AN - SCOPUS:85187801480
SN - 978-3-031-54711-9
T3 - Lecture Notes in Business Information Processing
SP - 363
EP - 370
BT - Enterprise Design, Operations, and Computing. EDOC 2023 Workshops - IDAMS, iRESEARCH, MIDas4CS, SoEA4EE, EDOC Forum, Demonstrations Track and Doctoral Consortium, 2023, Revised Selected Papers
A2 - Sales, Tiago Prince
A2 - de Kinderen, Sybren
A2 - Proper, Henderik A.
A2 - Pufahl, Luise
A2 - Karastoyanova, Dimka
A2 - van Sinderen, Marten
PB - Springer
CY - Cham
T2 - Several workshops, EDOC Forum and Demonstrations and Doctoral Consortium track, which were held in conjunction with the 27th International Conference on Enterprise Design, Operations, and Computing, EDOC 2023
Y2 - 30 October 2023 through 3 November 2023
ER -