2,314 exabytes: that is how much data the International Data Corporation (IDC), in a health trends report published in 2017, projected the healthcare industry would generate by 2020, at a compound annual growth rate (CAGR) of 47% between 2013 and 2020. If you are a deep-learning enthusiast like me, these numbers should send chills down your spine. After all, data at this scale opens up enormous possibilities for building and training very large models that could power highly accurate healthcare solutions. However, this raises a natural question: “If this huge amount of data is actually available, why doesn’t every healthcare brand use it to build its own proprietary solutions?” If you have some experience in this domain, the answer won’t be hard to find. Several obstacles restrict the use of the data available in the healthcare and life sciences industry, and the most glaring of them is privacy.

Data within the healthcare industry comprises sensitive information about patients’ medical histories, payment records, and identification details. Medical data sources are siloed across distant geographical regions, and many jurisdictions have already enacted their own laws and regulations to protect private data. Laws such as the Health Insurance Portability and Accountability Act (HIPAA) in the USA and the General Data Protection Regulation (GDPR) in the European Union and the European Economic Area regulate the use and disclosure of protected health information and other personal data. That is a good thing for consumers. On the other hand, such regulations restrict the flow of information and make it very difficult to use a large proportion of the data to build robust healthcare solutions.

Then there is another problem, even with data that can be used without violating any privacy protection policy: heterogeneity. Medical records have properties that make them extremely hard to integrate. They range from free-text clinical notes to medical films and images, and no single machine learning algorithm handles this variety well. Even the readily available data is often not good enough to train intelligent systems, because biases and imbalances are quite common in medical records.

Federated learning, introduced by Google in 2016, is a powerful concept that is especially well suited to the healthcare and life sciences domain. It brings the model to the data: each participant trains on its own records, and only model updates travel back to a central server, so not a single bit of the data has to leave its location. While federated learning looks like a promising answer to many data privacy concerns in healthcare, it is not yet mature enough to handle all the challenges developers face when designing solutions with medical data. Heterogeneous devices, data imbalance, data poisoning, and the traceability and reproducibility of learning events are some of the open issues for federated learning in the healthcare and medical industries.
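To make the "bring the model to the data" idea concrete, here is a minimal sketch of federated averaging, one common way to combine client updates. It is an illustration under stated assumptions, not a production design: the three "hospitals", their synthetic data, the tiny linear model, and every function name below are hypothetical.

```python
import random


def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few gradient-descent steps on one client's private data.

    `weights` is [slope, intercept]; `data` is a list of (x, y) pairs.
    Only the updated weights are returned, never the data itself.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def federated_average(client_weights, client_sizes):
    """Server step: average client weights, weighted by dataset size."""
    total = sum(client_sizes)
    dims = len(client_weights[0])
    return [
        sum(cw[d] * n for cw, n in zip(client_weights, client_sizes)) / total
        for d in range(dims)
    ]


def make_client_data(n, rng):
    """Synthetic stand-in for private records: y = 3x + 1 plus noise."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 3.0 * x + 1.0 + rng.gauss(0, 0.05)) for x in xs]


def train_federated(clients, rounds=50):
    """Alternate local training on each client with server-side averaging."""
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in clients]
        sizes = [len(data) for data in clients]
        global_weights = federated_average(updates, sizes)
    return global_weights


rng = random.Random(0)
# Three hypothetical hospitals with differently sized local datasets.
clients = [make_client_data(n, rng) for n in (20, 50, 80)]
slope, intercept = train_federated(clients)
print(f"learned slope={slope:.2f}, intercept={intercept:.2f}")
```

The server only ever sees the weight lists returned by `local_update`. Note that this sketch ignores the open issues listed above: all clients are assumed to be honest, to share the same model, and to have similarly distributed data.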

Federated learning could advance significantly through heterogeneous architectures designed to handle the intrinsic heterogeneity of the participants (clients) and datasets in a federated learning framework. These architectures should be flexible enough to support different models at the lower levels (trained on the client devices), which can then be aggregated into a single centralized model (kept on the server).
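One possible way to aggregate clients that run different model types is to have them share predictions on a common, non-private reference set rather than weights, and to fit the server model to the averaged predictions. The sketch below illustrates that idea only; it is not a design proposed in this article. The client classes, the reference inputs, and the threshold-based server model are all hypothetical, and sharing predictions is not by itself a privacy guarantee.

```python
from statistics import mean


class ThresholdClient:
    """Hypothetical client whose local model is a single learned threshold."""

    def __init__(self, data):
        # data: list of (measurement, label) pairs, label in {0, 1}
        positives = [x for x, y in data if y == 1]
        negatives = [x for x, y in data if y == 0]
        self.threshold = (mean(positives) + mean(negatives)) / 2

    def predict_proba(self, x):
        return 1.0 if x > self.threshold else 0.0


class NearestNeighbourClient:
    """Hypothetical client whose local model is 1-nearest-neighbour."""

    def __init__(self, data):
        self.data = data

    def predict_proba(self, x):
        _, label = min(self.data, key=lambda pair: abs(pair[0] - x))
        return float(label)


def consensus_labels(clients, reference_inputs):
    """Average the clients' predictions on shared, non-private inputs."""
    return [mean(c.predict_proba(x) for c in clients) for x in reference_inputs]


def fit_central_threshold(reference_inputs, soft_labels):
    """Server model: pick the threshold that best matches the consensus."""

    def disagreement(t):
        return sum(
            abs((1.0 if x > t else 0.0) - p)
            for x, p in zip(reference_inputs, soft_labels)
        )

    return min(sorted(reference_inputs), key=disagreement)


# Two clients with different model types and different private data.
clients = [
    ThresholdClient([(1.0, 0), (2.0, 0), (6.0, 1), (7.0, 1)]),
    NearestNeighbourClient([(0.5, 0), (3.0, 0), (5.0, 1), (8.0, 1)]),
]
# Assumed to be public or synthetic inputs that every party may see.
reference_inputs = [0.0, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 8.0]

soft_labels = consensus_labels(clients, reference_inputs)
central_threshold = fit_central_threshold(reference_inputs, soft_labels)
print("central threshold:", central_threshold)
```

The design choice here is that clients only need to agree on an input format and a `predict_proba` interface, not on a model architecture, which is what lets a threshold rule and a nearest-neighbour model contribute to the same server model.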

Federated learning is still in an early developmental phase, and new directions are being explored vigorously. This leaves an entire universe of possibilities for leading software and engineering organizations. Persistent’s life sciences innovation & engineering team can develop end-to-end, data-driven products that deploy federated learning to support a diverse portfolio of healthcare businesses.

This new approach to privacy preservation has a lot to offer the healthcare industry, and Persistent is certain that federated learning will redefine data protection and learning from private data within the life sciences domain. Read on to learn more about Persistent’s Life Sciences Innovation & Engineering capabilities, which combine technological expertise with extensive domain experience to solve complex data challenges using AI and ML.