Overcoming privacy concerns in medical research databases

By Melissa Fassbender


Researchers have developed a new system that permits database queries for genome-wide association studies, but reduces the chances of privacy compromises to almost zero. (Illustration: Christine Daniloff/MIT)

A new system developed by MIT researchers helps ensure privacy in genomic research databases by “slightly perturbing” analysis results.

The research, conducted at MIT’s Computer Science and Artificial Intelligence Laboratory and Indiana University at Bloomington, was recently published in the journal Cell Systems.

According to the researchers, the new system reduces the chances of privacy compromises to almost zero, addressing one of the pivotal issues facing data sharing initiatives.

Sean Simmons, an MIT postdoc in mathematics and first author on the new paper, and Bonnie Berger, a mathematics professor at MIT and the corresponding author on the paper, told us the system is based on the ideas of differential privacy.

“The basic concept is that, by slightly perturbing analysis results, one is able to guarantee privacy for research participants,” the researchers said.

“Though these ideas had been applied to some genomic statistics, the existing technologies could not deal with the diverse ancestries present in many real world genomic data sets that are known to be critical to accurate genomic studies. Our goal was to develop methods that overcame this hurdle.”

According to the researchers, the most challenging part was determining how to overcome the effect of outliers.

“If one individual is very different from all the other individuals in a study, their inclusion can greatly affect the result, leading to privacy loss,” said Simmons and Berger. “We dealt with this by slightly modifying our definition of privacy to focus on protecting information about private disease status, a realistic goal as it is the data that is most sensitive.”

How does it work?

The system “perturbs the results” of a genomic analysis to ensure privacy, yet the results remain accurate enough to retain useful information.

“In particular, it allows users to determine if a particular genomic alteration is correlated with a disease of interest in a dataset, or to produce a list of locations in the genome that are highly associated with the disease,” the researchers said.
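
To make the idea of perturbing results concrete, here is a minimal Python sketch of the Laplace mechanism, a standard building block of differential privacy. It is not the researchers' method, which works on association statistics and accounts for diverse ancestries. It only adds calibrated noise to a simple count of participants who both carry a variant and have the disease. The function names, the toy data and the privacy parameter `epsilon` are assumptions made for illustration.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace distribution centred at 0."""
    # The difference of two independent exponential samples with
    # mean `scale` follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(has_variant, has_disease, epsilon: float,
                  rng: random.Random) -> float:
    """Return a noisy count of participants with both variant and disease.

    Hypothetical helper for illustration only. Changing one participant's
    record changes the true count by at most 1, so the sensitivity is 1
    and the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for v, d in zip(has_variant, has_disease) if v and d)
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    # Toy data (invented): one entry per study participant.
    variant = [True, True, False, True, True]
    disease = [True, False, True, True, True]
    rng = random.Random()
    # Smaller epsilon means more noise and stronger privacy.
    print(private_count(variant, disease, epsilon=1.0, rng=rng))
```

Each query returns a slightly different answer, so no single participant's disease status can be confidently inferred from the output, while the average over a large study stays close to the true count.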

Unlike previous methods, the system also addresses a common source of false positives in genomic studies. Specifically, it corrects for population stratification – false positives due to different ancestries in a sample.

While the research addresses privacy issues in genomic databases, the researchers said the ideas of differential privacy can be applied to almost any area where private human data is collected.

“One reason that data is not shared is due to concern over the privacy of individuals in the study,” explained Simmons and Berger. “Our approach helps overcome that particular roadblock.”
