Abstract
This thesis addresses the challenge of training machine learning models on distributed data while preserving the privacy and integrity of that data. It explores methods for maintaining the scalability and accuracy of learning algorithms across diverse environments, with particular attention to compliance with privacy regulations. The central research question asks how machine learning models can be trained over distributed data in a privacy-preserving manner; it is broken down into sub-questions that examine different facets of this challenge.
Initial research examined the feasibility of performing machine learning operations on encrypted data using multi-party computation. This foundational work led to further investigations into federated learning and synthetic data generation. A novel framework combining homomorphic encryption and differential privacy was introduced, offering a practical approach to synthetic data generation in distributed settings. The thesis also proposes secure protocols for efficiently aggregating information from distributed sources, providing security against both semi-honest and Byzantine adversaries.
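The secure-aggregation idea mentioned above can be illustrated with a minimal sketch. This is not the thesis's actual protocol, only a textbook additive-secret-sharing example: each party splits its private value into random shares that sum to the value modulo a public modulus, so an aggregator can recover the total without ever seeing any individual input.

```python
import random

MODULUS = 2**31 - 1  # public modulus; shares cancel exactly in this ring


def mask_inputs(values):
    """Split each party's private value into len(values) random additive
    shares; the shares of party i sum to values[i] mod MODULUS."""
    n = len(values)
    shares = []
    for v in values:
        parts = [random.randrange(MODULUS) for _ in range(n - 1)]
        parts.append((v - sum(parts)) % MODULUS)  # last share fixes the sum
        shares.append(parts)
    return shares


def aggregate(shares):
    """Sum the shares each receiver holds; individual inputs stay hidden,
    yet the grand total equals the sum of the original values."""
    n = len(shares)
    partials = [sum(shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    return sum(partials) % MODULUS


values = [12, 7, 30]  # hypothetical per-party inputs
print(aggregate(mask_inputs(values)))  # 49
```

Each individual share is uniformly random, so no single receiver learns anything about a party's input; only the combined total is revealed.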
These methods were rigorously evaluated across a range of datasets and scenarios, demonstrating their practical effectiveness in enhancing privacy and security in distributed learning environments. The resulting protocols integrate cryptographic techniques with machine learning algorithms to ensure privacy, marking a significant step towards practical, privacy-preserving machine learning. The findings lay a foundation for future research and applications that balance leveraging large-scale data for machine learning with upholding stringent data privacy standards.
| Original language | English |
| --- | --- |
| Qualification | Doctor of Philosophy |
| Awarding Institution | |
| Supervisors/Advisors | |
| Award date | 3-Dec-2024 |
| Place of Publication | [Groningen] |
| Publisher | |
| DOIs | |
| Publication status | Published - 2024 |