Game theory: an innovative way to protect biomedical data against hackers

Game theory, which seeks to predict how competitors’ behavior affects the choices other players make, can help researchers find the best ways to share biomedical data while protecting the anonymity of the people who contribute it against hackers.

Modern biomedical research projects, such as the National COVID Cohort Collaborative and the Personal Genome Project, require large amounts of data on individuals.

Making detailed datasets available to the public without violating anyone’s privacy is a critical challenge for such projects.

To do this, many programs that collect and publish genomic data hide the personal information in the data that could be exploited to re-identify subjects. However, the remaining data can still be used to track down personal information from other sources, which can then be linked to the biomedical data to discover people’s identities.

For example, comparing a person’s DNA data with public genealogical databases such as Ancestry.com can sometimes reveal the person’s surname, which can be combined with demographic data to trace the person’s identity through search engines or online public records sites like PeopleFinders.

A research group at the Center for Genetic Privacy and Identity in Community Settings has developed methods to help evaluate and mitigate privacy risks in sharing biomedical data.

The research group’s methods can be used to protect different types of data, such as personal demographics or genome sequences, from attacks on anonymity.

Leaders and followers model

The group’s latest model uses a leader-follower game to model the interactions between a data subject and a potentially malicious data user.

In this model, the data subject moves first and decides which data to share. The adversary then moves and decides whether to attack based on the shared data.

Using game theory to evaluate data sharing approaches involves scoring each strategy on its impact on both the privacy and the value of the shared data. Each strategy is a trade-off between withholding or obscuring pieces of data to protect identities and keeping the data as useful as possible.
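To make the leader-follower idea concrete, here is a minimal sketch of a toy version of such a game. It is not the research group's model: the attribute names, utilities, re-identification probabilities, costs, and the independence assumption are all hypothetical values chosen for illustration.

```python
from itertools import product

# All numbers below are illustrative assumptions, not values from the study.
ATTRIBUTES = ["age", "zip_code", "sex"]
UTILITY = {"age": 3.0, "zip_code": 4.0, "sex": 1.0}        # value of sharing each attribute
REID_PROB = {"age": 0.10, "zip_code": 0.25, "sex": 0.02}   # re-identification chance each one adds
ATTACK_COST = 2.0     # attacker's cost of attempting re-identification
ATTACK_GAIN = 10.0    # attacker's payoff if re-identification succeeds
PRIVACY_LOSS = 20.0   # data subject's loss if re-identified


def success_probability(shared):
    """Chance an attack succeeds, assuming attributes contribute independently."""
    p_fail = 1.0
    for attr in shared:
        p_fail *= 1.0 - REID_PROB[attr]
    return 1.0 - p_fail


def attacker_attacks(shared):
    """Follower's best response: attack only if expected gain exceeds the cost."""
    return ATTACK_GAIN * success_probability(shared) > ATTACK_COST


def subject_payoff(shared):
    """Leader's payoff: utility of the shared data minus expected privacy loss."""
    payoff = sum(UTILITY[a] for a in shared)
    if attacker_attacks(shared):
        payoff -= PRIVACY_LOSS * success_probability(shared)
    return payoff


def best_sharing_strategy():
    """Enumerate every share/withhold combination and keep the leader's best."""
    strategies = []
    for mask in product([False, True], repeat=len(ATTRIBUTES)):
        strategies.append(tuple(a for a, keep in zip(ATTRIBUTES, mask) if keep))
    return max(strategies, key=subject_payoff)


print(best_sharing_strategy())
```

With these made-up numbers, the data subject does best by sharing age and sex while withholding the ZIP code, because that combination keeps the attacker's expected gain below the cost of attacking. Enumerating every combination only works here because there are three attributes.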

The optimal strategy for the data subject allows the most data to be shared with the least risk. However, finding it is challenging because genome sequencing data has many dimensions, which makes it impractical to search exhaustively through all possible data sharing strategies.

To overcome this problem, the research team has developed algorithms that focus on a small subset of strategies that are most likely to contain the optimal one. The team showed that this method is effective in terms of both the usefulness of the data to the public and the privacy of the data subject.
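The article does not describe the team's algorithms, so the sketch below swaps in a plain greedy local search to illustrate the general idea of examining only a small part of the strategy space. The function names and the toy payoff are hypothetical, and a greedy search like this is a heuristic with no guarantee of finding the optimal strategy.

```python
def greedy_search(attributes, payoff):
    """Start from sharing nothing and toggle one attribute at a time,
    keeping a change only when it improves the payoff."""
    current = frozenset()
    current_value = payoff(current)
    improved = True
    while improved:
        improved = False
        for attr in attributes:
            candidate = current ^ {attr}   # share it if withheld, withhold it if shared
            value = payoff(candidate)
            if value > current_value:
                current, current_value = candidate, value
                improved = True
    return current, current_value


def toy_payoff(shared):
    """Hypothetical payoff: each attribute adds value, but sharing the
    ZIP code is assumed to trigger an attack that costs 5.0."""
    value = {"age": 3.0, "zip_code": 4.0, "sex": 1.0}
    total = sum(value[a] for a in shared)
    if "zip_code" in shared:
        total -= 5.0
    return total


strategy, score = greedy_search(["age", "zip_code", "sex"], toy_payoff)
print(sorted(strategy), score)
```

Each pass evaluates one candidate per attribute rather than every possible combination, which is what makes this kind of search feasible when the data has many dimensions.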

Why is this important?

The worst-case scenario, in which the attacker has unlimited capabilities and does not care about financial losses, is often highly unlikely. However, data managers sometimes focus on these scenarios, which can lead them to overestimate the risk of re-identification and, to be safe, share much less data than they could.
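The gap between the two views of risk can be illustrated with a small calculation. This is a sketch under stated assumptions: the per-record success probabilities, the attacker's gain, and the cost per attack are hypothetical numbers, and the function name is made up for this example.

```python
def expected_reidentifications(success_probs, gain, cost, worst_case=False):
    """Expected number of re-identified records.

    A worst-case attacker attacks every record regardless of cost.
    A rational attacker attacks a record only if expected gain exceeds cost.
    """
    total = 0.0
    for p in success_probs:
        attacks = worst_case or gain * p > cost
        if attacks:
            total += p
    return total


# Hypothetical chance of a successful attack on each of four records.
records = [0.05, 0.10, 0.40, 0.02]

worst = expected_reidentifications(records, gain=10.0, cost=2.0, worst_case=True)
rational = expected_reidentifications(records, gain=10.0, cost=2.0)
print(worst, rational)
```

Under these assumptions the worst-case estimate counts all four records as attacked, while the cost-aware attacker only finds the third record worth attacking, so the estimated risk is lower.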

The goal is to create a structured approach to reasoning about risk that also accounts for the value of the shared data. A game-based approach not only provides a more realistic estimate of the risk of re-identification, but also finds data sharing strategies that strike the right balance between utility and privacy.

What other research is being done?

Data managers use encryption techniques to protect biomedical data. Other methods include adding noise to the data and masking parts of it.
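As a rough illustration of the last two methods, the sketch below perturbs a numeric field with random noise and masks part of a ZIP code. The helper names, the record, and the noise scale are hypothetical, and real deployments would choose the noise distribution and masking level far more carefully.

```python
import random


def mask_zip(zip_code, keep_digits=2):
    """Keep the leading digits of a ZIP code and mask the rest."""
    return zip_code[:keep_digits] + "*" * (len(zip_code) - keep_digits)


def add_noise(value, scale, rng):
    """Perturb a numeric value by a uniform random offset in [-scale, scale]."""
    return value + rng.uniform(-scale, scale)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
record = {"age": 42, "zip_code": "37212"}  # made-up record
protected = {
    "age": add_noise(record["age"], scale=2.0, rng=rng),
    "zip_code": mask_zip(record["zip_code"]),
}
print(protected)
```

Both transformations reduce how precisely a record can be matched against outside sources, at the price of making the shared data less exact.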

This work builds on a previous study, which pioneered the use of game theory to assess the risks of re-identification in health data and to protect against identity attacks on genetic data.

The present study is the first to consider an attack in which the attacker can access multiple resources and combine them step by step.

What’s next?

The research group is now expanding its game-based approach to model the uncertainty and rationality of a player. It is also working to account for environments that consist of multiple data providers and multiple types of data recipients.
