Federated Learning (FL) has recently emerged as the de facto framework for distributed machine learning (ML) that preserves the privacy of data, especially given the proliferation of mobile and edge devices with increasing capacity for storage and computation. To fully utilize the vast amount of geographically distributed, diverse and privately owned data stored across these devices, FL provides a platform on which local devices can build their own local models whose training processes are synchronized by sharing differential parameter updates. This is done without exposing their private training data, which helps mitigate the risk of privacy violations in light of recent policies such as the General Data Protection Regulation (GDPR). This potential has since drawn explosive attention from the ML community, resulting in a vast and growing body of theoretical and empirical literature that pushes FL closer to being the new standard of ML as a democratized data analytics service.
Interestingly, as FL comes closer to being deployable in real-world scenarios, it also surfaces a growing set of challenges in trustworthiness, fairness, auditability, scalability, robustness, security, privacy preservation, decentralizability, data ownership and personalizability, all of which are becoming increasingly important in many interrelated aspects of our digitized society. Such challenges are particularly important in economic landscapes that lack big tech corporations with big data and are instead driven by government agencies and institutions whose valuable data is locked up, or by small-to-medium enterprises and start-ups with limited data and little funding. With this in mind, the workshop envisions the establishment of an AI ecosystem that facilitates data and model sharing between data curators and parties interested in the data and models, while protecting personal data ownership.
This raises the following questions:
1. Data curators may own different types of ML models. Given their interest in protecting their IP, there is no reason to believe that they would be willing to share information on their model architectures or parameters. If we are to facilitate meaningful collaboration in such cases, how then do data curators aggregate and distill latent knowledge from their heterogeneous, black-box models to bring home the distilled model(s) for future use?
2. How do we incentivize data curators to come together and share their data for model building? How does a participant know that the other data curator(s) are contributing valuable, authentic and safe data to the collaboration (and vice versa)? In this view, there is a need to consider data auditability and fairness in data sharing based on the participants' respective contributions. Furthermore, as far as personal data ownership goes, how do we guarantee the right to be forgotten in terms of the participant’s data footprint?
We believe that addressing these challenges will mark another key milestone in shaping FL as a democratized ML service supported by a trustworthy AI ecosystem built on the aforementioned concepts.