The Swiss Data Custodian | Part 2: how to enhance both data sharing and data privacy

Our digital self is becoming an important part of ourselves and a very descriptive representation of our physical body and mind. It is thus very important that we protect it.
By Martin Fontanet
April 17, 2020

In a previous article, we discussed today’s data economy and the problems that arise from it. We presented the Swiss Data Custodian (SDC), a new project from the Swiss Data Science Center (SDSC) that is meant to protect the rights of individuals and organizations while generating new opportunities for the economy, science and the common good.

A World of Data

Data pervades the fabric of our digital universe. Our daily interactions with online businesses and public services are permanently recorded. Our digital self is becoming an important part of ourselves and a very descriptive representation of our physical body and mind, so it is very important that we protect it. Data is also critical to the economy, and there are legitimate needs to process it for everybody’s benefit. It can be analyzed to improve everyday life (for instance, to raise productivity and make more accurate decisions) and to act in extraordinary circumstances: during the recent coronavirus outbreak, some countries started processing data to stop the spread of the disease.

However, people are often unaware of how important their data is, and that it can be used for their benefit but also abused against them. It is important that they realize their data can have a great impact on everyday life, and that its use can drift from its original goal: from travel assistance to mass surveillance, from improving user experience to influencing individuals’ behavior, or even from targeted advertising to presidential election manipulation.

The main problem with today’s data economy is its asymmetry: we do not know what others know about us, and we do not realize how much they know.

We need to find a way to let people regain sovereignty over their data, in other words, to let them keep the right to decide how it is used. This is a difficult task, because once data is shared, it can be copied and people lose control over it. On the other hand, we should also avoid creating a dark web where people can hide illegal activities. The ultimate goal is a fair-trade data economy from which everyone can benefit. For that, we need a neutral body that takes no advantage of its position, to ensure that everybody’s rights are respected and that data is used ethically.

The Swiss Data Custodian aims to prevent abuse, promote transparency, educate people and let them regain sovereignty over their data. In this article, we describe the SDC concept in more detail and investigate how it can improve privacy, the economy and science at the same time.

Actors and Concepts

To keep it simple, we can distinguish three main roles in today’s data economy: data sources, data storage and data consumers.

A data source is an entity that creates data. When users enter their date of birth in a registration form, or when a transport company generates GPS data from its buses, they create information. Even though this information is theirs, they do not have any legally binding ownership over it. Moreover, data ownership is a difficult problem because data may have multiple owners: to share CCTV recordings of a residential area, for example, we may require the agreement of a majority of its residents.

Data storage entities are the ones handling the information. They acquire and store the data, often unencrypted, and decide who and what can access it. The data source no longer has any control over it. Data storage entities can use the information, share it, sell it, or even leak it to a data consumer.

Data consumers are any entities asking for data. Most of the time, they extract value from the data to improve their services. For example, in the healthcare sector, data can be used to train machine learning models and obtain better diagnoses.

These actors are mutually non-trusting parties: data sources may not trust the data storage entities, and the data storage entities may not trust the data consumers. The Swiss Data Custodian provides a way for them to interact securely while preserving privacy and sovereignty over personal data.

Decentralizing the Data Economy

Today, there is often no separation between data and its use: sensitive data is held in silos, and the storage entity is free to access, modify or remove it. Two problems arise from that: users do not have any control over their data, and the data is barely reusable. For example, it is difficult for a city to obtain transport data about its citizens without turning to the tech giants. The SDC addresses this problem by providing a decentralized network of Custodian actors who manage data storage and processing on behalf of the users. The Custodian distributes the different roles (source, storage, processing) and permissions to ensure that no entity can gain control beyond its role.

The decentralized network is a Multi-Sided Platform (MSP) composed of different Custodian infrastructures hosted by data storage entities. These infrastructures are called Independent Administrative Domains (IADs). Companies can join the Custodian by running an IAD and providing storage and computing resources. Users can choose IADs for their personal data spaces, which store their data in encrypted form, and decide who can access it. The IAD that stores the encrypted data cannot infer any information from it unless the users have given their explicit consent.

Instead of providing data consumers with raw data, the processing is done within the Custodian and only the results are transmitted. If a company wants to analyze data using a specific algorithm, it asks the Custodian to deploy computing resources running that algorithm. Depending on the nature of the data, the Custodian employs different methods: either it deploys the computing resource in the same IAD where the data is stored, to minimize the information transmitted between IADs and to divide the knowledge of how to decrypt the data, or it deploys it piecemeal across multiple IADs, so that no single entity gains too much knowledge. Combinations of the two yield a spectrum of methods. When users agree to share their information, the data is sent to the computing resources, which run the algorithm on the data and send the output to the company.

To minimize privacy issues, the SDC offers privacy-preserving distributed computing. Before data is shared with the computation parties, it is anonymized, aggregated and distorted to make sure it cannot be linked back to the user. The Custodian also makes sure that only the minimal necessary information is transmitted to the computing resources.

A neutral hierarchical governance body audits IADs and data consumers from the sidelines to make sure that the algorithms, storage and usage of the data respect individuals’ privacy. Before a computing resource is deployed, its algorithm is reviewed by the governance body. The context, the goal of the analysis and any conflicts of interest of the different parties are taken into account to ensure that users’ rights are respected. For example, a mobile operator would not be allowed to analyze health records (unless there is a strong and ethical reason for doing so).
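The idea of spreading a computation across several IADs so that none of them learns too much can be illustrated with additive secret sharing. The article does not say which technique the SDC actually uses, so this is only a minimal sketch under that assumption; the names `MODULUS`, `split_into_shares`, `aggregate` and `private_sum` are hypothetical and chosen for illustration.

```python
from __future__ import annotations

import secrets

# Hypothetical modulus: all share arithmetic is done modulo this value.
MODULUS = 2**61 - 1


def split_into_shares(value: int, n_parties: int) -> list[int]:
    """Split a non-negative value into n shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def aggregate(per_party_shares: list[list[int]]) -> int:
    """Each party sums the shares it holds; only the partial sums are combined."""
    partial_sums = [sum(shares) % MODULUS for shares in per_party_shares]
    return sum(partial_sums) % MODULUS


def private_sum(values: list[int], n_parties: int = 3) -> int:
    """Compute the sum of all users' values without any party seeing one value."""
    # held[i] collects the shares sent to hypothetical IAD number i.
    held: list[list[int]] = [[] for _ in range(n_parties)]
    for value in values:
        for i, share in enumerate(split_into_shares(value, n_parties)):
            held[i].append(share)
    return aggregate(held)
```

Each individual share is a uniformly random number, so a single IAD learns nothing about a user's value from the share it holds; only the aggregate is reconstructed. The sketch assumes the true sum is smaller than `MODULUS`.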

The Data Journey

When a company wants to collect information about users, it must first ask for their consent. Consent forms must be transparent and must inform individuals about how their data will be processed. Users are given a fine-grained list of data elements and processing activities. They can decide which data elements will be collected and for which processing each can be used. The Custodian provides the user with an evaluation of the privacy risk of each option in the consent form, based on a privacy score. Each consent is subject to audits to ensure compliance with the GDPR and that only the information necessary to perform the service is used.

Data collected from a user’s machine is encrypted before it leaves the device. It is then sent to the storage platform. Only the user and the entities that have been granted access and hold the decryption keys are able to use it. Because users regulate who can access their data, they can be confident that it is not being used without their knowledge and that their sovereignty is inalienable.

Entities requesting access to users’ data can only ask for the strict minimum of information needed for the purpose they state. In principle, the data consumer should not have direct access to the raw data, but only to the results of its analysis. The data is sent to different computing parties that analyze it for the consumer. Splitting the data between IADs ensures that no single IAD can gain control of it or trace it back to a user. The SDC uses differential privacy to make sure the different parties do not infer any information about the user from the data.

For example, if a company wants to know how far apart two people are, it asks for the absolute distance between them instead of their locations. To compute the result, the Custodian adds noise to the raw data and splits the task across multiple computation parties (for example, one computes the difference between latitudes, one computes the difference between longitudes and another computes the norm). When the computation is done, the noisy absolute distance is sent to the data consumer. The SDC gives the consumer the noise used to distort the data, so that the consumer can subtract it from the final result and obtain the accurate absolute distance.
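The distance example can be sketched in code. The article does not specify how the noise is generated or removed, so the following is one possible variant rather than the SDC protocol: positions are treated as planar coordinates, both people are shifted by the same random offset so that no party sees a true position, and each per-axis party adds a mask to its squared difference. The consumer subtracts the total mask before taking the square root. All function names are hypothetical, and the random floats used here keep the sketch short but do not provide cryptographic hiding.

```python
from __future__ import annotations

import math
import random
from typing import Tuple

Point = Tuple[float, float]  # planar (x, y) position; a simplifying assumption


def custodian_prepare(p1: Point, p2: Point, rng: random.Random):
    """Custodian side: hide absolute positions and draw one mask per axis."""
    # The same offset is added to both people, so differences are preserved.
    off_x, off_y = rng.uniform(-1e4, 1e4), rng.uniform(-1e4, 1e4)
    shifted_1 = (p1[0] + off_x, p1[1] + off_y)
    shifted_2 = (p2[0] + off_x, p2[1] + off_y)
    masks = (rng.uniform(0.0, 1e4), rng.uniform(0.0, 1e4))
    return shifted_1, shifted_2, masks


def axis_party(a: float, b: float, mask: float) -> float:
    """One party per axis: returns the masked squared difference on that axis."""
    return (a - b) ** 2 + mask


def combining_party(masked_x: float, masked_y: float) -> float:
    """Sees only masked values: the squared distance plus both masks."""
    return masked_x + masked_y


def consumer_unmask(masked_sq_dist: float, total_mask: float) -> float:
    """Consumer side: subtract the noise from the Custodian, then take the root."""
    return math.sqrt(max(masked_sq_dist - total_mask, 0.0))


def private_distance(p1: Point, p2: Point, seed: int | None = None) -> float:
    """Run the whole pipeline end to end for two positions."""
    rng = random.Random(seed)
    s1, s2, (mask_x, mask_y) = custodian_prepare(p1, p2, rng)
    masked_x = axis_party(s1[0], s2[0], mask_x)
    masked_y = axis_party(s1[1], s2[1], mask_y)
    noisy = combining_party(masked_x, masked_y)
    return consumer_unmask(noisy, mask_x + mask_y)
```

In this variant the masks are applied to squared differences rather than to the final distance, because noise added before a square root cannot be removed by a simple subtraction afterwards. Each axis party still learns the difference along its own axis, so the sketch only limits what the combining party and the consumer see.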

What makes the Swiss Data Custodian Trustworthy?

Every action performed on users’ data is logged and accessible to them. Data is handled with full transparency, which means that users know who had access to it, how they used it, why and when. They know with whom they shared the keys, and that this information is complete, accurate and verifiable. If parties have been sharing data, the users must also be informed. When users see that a decision has been made based on their data (for example, an advertisement they are shown), they can ask to know why and how.

Furthermore, users can control where their data is stored. If they are not satisfied with an IAD, they can decide to move their data elsewhere.
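One common way to make such an access log verifiable is to chain its records with a hash, so that changing any past record invalidates every hash after it. The article does not describe how the SDC implements its logs; the sketch below is an assumption for illustration, and the `AccessLog` class and its field names are hypothetical.

```python
from __future__ import annotations

import hashlib
import json


def _digest(entry: dict, prev_hash: str) -> str:
    """Hash a log entry together with the hash of the previous record."""
    payload = json.dumps(entry, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AccessLog:
    """Append-only log in which each record commits to the one before it."""

    GENESIS = "0" * 64  # placeholder hash used before the first record

    def __init__(self) -> None:
        self.records: list[dict] = []

    def append(self, who: str, action: str, purpose: str, when: str) -> None:
        """Record who accessed the data, how, why and when."""
        entry = {"who": who, "action": action, "purpose": purpose, "when": when}
        prev_hash = self.records[-1]["hash"] if self.records else self.GENESIS
        self.records.append({"entry": entry, "hash": _digest(entry, prev_hash)})

    def verify(self) -> bool:
        """Recompute the chain; return False if any record was altered."""
        prev_hash = self.GENESIS
        for record in self.records:
            if record["hash"] != _digest(record["entry"], prev_hash):
                return False
            prev_hash = record["hash"]
        return True
```

A user who keeps a copy of the latest hash can later check that the history they are shown has not been rewritten. Detecting a log that has been truncated at the end would additionally require publishing or countersigning that latest hash, which this sketch leaves out.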

What are the benefits?

The Swiss Data Custodian can lead to advances in many domains. It would help protect individuals’ privacy while enhancing the data economy and supporting research and scientific progress. Not only does it provide transparency and give data sovereignty back to users, it is also a tool to educate less computer-savvy users and increase their awareness of the value of their data and the importance of keeping it safe.

People would know that their data is safely stored and would no longer fear losing control over it. They would therefore be more willing to share it, creating a better data ecosystem. That way, startups and small companies could gain access to data to improve their services. It would also make the GDPR, a great burden for small companies, much easier to comply with: companies would only have to use the data, without dealing with its encryption, security and privacy assessment. Data sharing can create coopetition between companies, which can result in great advances in research and science. Finally, this would let people know better who they are dealing with when they use a service, and instill trust in them while avoiding the abuse we see in today’s data world.

The next steps

Currently, the Swiss Data Custodian can only guarantee users security and privacy when they deal with companies that have decided to integrate it. The vision is to provide a Swiss-level Custodian that, by differentiating itself on privacy, offers a compelling alternative to less privacy-conscious solutions. For example, some email and messaging services could be handled by the SDC, giving citizens an alternative with better privacy protection than the solutions proposed by Google, Facebook or other data-driven industries. In the future, we imagine that citizens will have access to their own personal data spaces provided by the Custodian, where they will be able to interact with public and private services without putting their digital self at risk. Ultimately, the objective of the Swiss Data Custodian is to demonstrate that it is possible to foster a data-driven economy that restores citizens’ sovereignty over their data and encourages more stringent privacy protection laws that further promote new initiatives in this direction.

About the author

Martin Fontanet
Privacy & Security Expert

Martin joined the SDSC in October 2019 to work on the Swiss Data Custodian project as a computer scientist. He obtained a BSc and an MSc in Communication Systems from EPFL. His strong interest in cybersecurity led him to specialize in Information Security and Privacy during his master’s degree. At the end of his studies, he did an internship in which he combined machine learning and security to develop a tool that automatically analyzes log files and detects behavioural anomalies. After that, he returned to EPFL for his master’s thesis, in which he investigated Website Fingerprinting, an attack that aims to break the privacy of Tor and other encrypted and anonymous networks.
