UK-based Lifebit, Genomics England, and NIHR Cambridge are joining forces on an ambitious project: creating the largest cancer database on the planet. The project aims to balance researchers' need for access to the data required to develop much-needed therapies with the need to protect patient confidentiality.
To learn more about the project and the possibilities it could create for future cancer research, Outsourcing-Pharma connected with Thorben Seeger, chief business officer with Lifebit.
OSP: Could you please share the ‘elevator presentation’ description of Lifebit—who you are, how you got started, your key areas of specialty, and what sets you apart from other companies in this sphere?
TS: In a nutshell, Lifebit has created a patented, federated technology that brings analysis and computation to where data resides. This enables researchers to run analyses on multiple, distributed datasets in-situ, avoiding the risky movement of highly sensitive data. The result is enabling precision medicine and therapeutics by giving researchers a means of securely accessing and analyzing siloed biomedical data.
We address legitimate concerns about patient data confidentiality and security by allowing research to be done while the data remains under the control of the organizations that generated it. Lifebit’s CloudOS enables researchers to link to other data sources virtually, for faster data insights.
Lifebit was founded in 2017 by Maria Chatzou Dunford and Pablo Prieto Barja, who had worked together at the Center for Genomic Regulation (Centro de Regulación Genómica, CRG), in Barcelona. Pursuing big data analysis of genomics markers (what is known as bioinformatics), they understood the problems inherent in making genomic data securely accessible and developed their vision of democratizing this data.
Lifebit differentiates itself from other providers through its federated approach: it is building powerful, secure trusted research environments (TREs) that connect researchers with the data they need to make research discoveries, without ever moving that data.
Lifebit’s leadership in this field has been validated by venture capital investment totaling over $70m [USD] across two rounds. The most recent round of investment came in September 2021 from Tiger Global.
OSP: Please take a moment to share your views on how the collection and analysis of data in health research have evolved in recent years.
TS: Technologies that enable next-generation sequencing of human genes have become dramatically more affordable, and as a result they are generating vast amounts of data. How large? An individual whole genome, the complete set of that individual’s genetic information, can be as large as 300 gigabytes in size.
Add to that the volumes of available clinical data and the numbers start becoming inconceivably large. Consider this: The combined genomic data of the world’s population would be on the order of 2.4 trillion gigabytes.
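As a rough check on that figure, the arithmetic can be sketched as follows. The population of roughly 8 billion is an assumption that is not stated in the interview; the 300 gigabytes per genome is the upper bound quoted above.

```python
# Back-of-the-envelope check of the worldwide genomic data estimate.
GENOME_SIZE_GB = 300               # upper bound per whole genome, as quoted above
WORLD_POPULATION = 8_000_000_000   # assumption: roughly 8 billion people


def total_genomic_data_gb(population: int, genome_gb: int) -> int:
    """Total storage, in gigabytes, if every person had one whole genome stored."""
    return population * genome_gb


total_gb = total_genomic_data_gb(WORLD_POPULATION, GENOME_SIZE_GB)
print(f"{total_gb:,} GB")                      # 2,400,000,000,000 GB
print(f"{total_gb / 1e12:.1f} trillion GB")    # 2.4 trillion GB
```

Under those two assumptions the product comes to 2.4 trillion gigabytes, which matches the order of magnitude given in the answer.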
The unfortunate truth for scientific advancement, however, has been that this data is kept in silos, under the control and management of the organizations that gathered it. Global security and patient confidentiality concerns have made it difficult to share data.
According to the World Economic Forum, roughly 97% of available health data goes completely unused. So, generating large amounts of biomedical data has become relatively straightforward – the “easy” part, if any of this could be called easy. The difficult, almost impossible part is how organizations can access and make use of data stored across thousands of disconnected locations.
We are on the cusp of eliminating or greatly reducing the complexity of this problem with the increased acceptance of the cloud in life science, combined with bioinformatics and artificial intelligence. While not a requirement for federation per se, the cloud accelerates the creation of federated networks across research, pharmaceutical, and government organizations, as bioinformatics has taken advantage of the power of the cloud to acquire, store, analyze, and disseminate large-scale biological data.
OSP: Please share some of the challenges/limitations that still remain around collecting data and making optimal use of it.
TS: The answer to this question falls in the realm of “be careful what you wish for, for you will surely get it.”
The greatest challenge or limitation underlying optimal use of collected data is the data itself: not the quality, but the sheer volume. Access to data is a mixed blessing. The larger the dataset in a clinical trial, the more confidence there can be in its findings. By some calculations, using more data in clinical trials can as much as double the chances of faster regulatory approval.
Still, the most robust cloud solutions and the most sophisticated acquisition, storage, and analysis capabilities of bioinformatics do little to address the problem of how to get through the analysis of the massive volumes of data being unlocked by genetic scientists and researchers today.
The addition of increasingly robust artificial intelligence and machine learning capabilities is enabling the analysis of the hundreds of millions of gigabytes of data to finally become a reality.
OSP: Could you tell us how Lifebit came to collaborate with Genomics England and the NIHR Cambridge Biomedical Research Centre (BRC)?
TS: In 2020, Genomics England launched a next-generation genomic medicine research platform, or TRE, using Lifebit technology to support the effort. Genomics England’s TRE has been central to the UK Government’s research response to COVID-19. It also has facilitated medical advancements in cancer and rare diseases.
In an exciting new project led by a consortium including The University of Cambridge, NIHR Cambridge Biomedical Research Centre, Genomics England, Eastern AHSN, Cambridge University Health Partners, and Lifebit, we will be bridging the two TREs of Genomics England and the NIHR Cambridge BRC in what will be the first multi-party federated architecture between a national organization and a higher education institution. By bridging the TREs of these institutions, we will be enabling researchers to analyze a much larger cohort of fully consented clinical-genomic data from patients with cancer.
The project is funded by UK Research & Innovation as part of the DARE UK (Data and Analytics Research Environments UK) program, which is delivered in partnership with Health Data Research UK (HDR UK) and ADR UK (Administrative Data Research UK).
OSP: How did the concept to federate between Genomics England and the NIHR Cambridge Biomedical Research Centre come about?
TS: A growing number of research organizations and data custodians have their own TREs, but these are not able to communicate with others, meaning the biomedical data in these TREs remain siloed. This is slowing down research and delaying new discoveries.
With this bridging technology, researchers will be able to work with their combined data, without any data leaving either secure source. With more health data accessible for research, researchers' analyses can have greater power, and this could hold the key to us better understanding, diagnosing, and treating cancers and rare diseases.
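The general idea of federated analysis can be illustrated with a minimal sketch. This is not Lifebit's implementation: the `Site` class, the `age_at_diagnosis` field, and the toy records are all hypothetical, and a real TRE would add authentication, governance, and disclosure controls. The sketch only shows the pattern of sending an analysis to each data holder and combining the summaries that come back.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Site:
    """A hypothetical data custodian; its records are only read inside run()."""
    name: str
    records: List[dict]

    def run(self, analysis: Callable[[List[dict]], Dict[str, float]]) -> Dict[str, float]:
        # The analysis executes where the data lives; only its summary is returned.
        return analysis(self.records)


def local_summary(records: List[dict]) -> Dict[str, float]:
    """Runs at each site and returns aggregates, never row-level data."""
    ages = [r["age_at_diagnosis"] for r in records]
    return {"n": float(len(ages)), "sum": float(sum(ages))}


def federated_mean(sites: List[Site]) -> float:
    """Combine per-site counts and sums into one overall mean."""
    summaries = [site.run(local_summary) for site in sites]
    n = sum(s["n"] for s in summaries)
    total = sum(s["sum"] for s in summaries)
    return total / n


# Toy data, invented purely for illustration.
site_a = Site("site_a", [{"age_at_diagnosis": 50}, {"age_at_diagnosis": 60}])
site_b = Site("site_b", [{"age_at_diagnosis": 70}])

print(federated_mean([site_a, site_b]))  # 60.0
```

The coordinator sees only a count and a sum from each site, yet the combined mean is the same as if the records had been pooled in one place.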
Lifebit’s technology, Lifebit CloudOS, is being used to modernize and future-proof the computational infrastructure for genomics and wider ‘omics’ data across the NIHR Cambridge BRC. It powers a new cloud-based TRE, named CYNAPSE, which will serve as a scalable and secure data management and analysis platform for NIHR Cambridge BRC researchers.
The delivery of CYNAPSE is well underway, and the NIHR Cambridge BRC is set to join a growing number of research organizations that are standing up TREs that make use of Lifebit’s pioneering federated technology to make sensitive biomedical data securely available for research.
OSP: Also, please share your thoughts about challenges with keeping patient data private and secure, and how this project will work to ensure that (and balance safety with speed).
TS: TREs are secure spaces in which researchers can access and analyze sensitive data; they are designed to help prevent unauthorized access and the re-identification of individuals from de-identified data. Maintaining participant confidentiality and data security is at the core of this project, and a federated approach enables all participant data to stay securely within the CYNAPSE platform at all times.
Patient and public involvement has also been essential from the outset. The CYNAPSE project team is working with patient groups to develop data governance and federation best practices to safely and securely maximize the use of data for research.