The client is an innovator and trendsetter in the biotechnology industry, applying cutting-edge technology to DNA sequencing. The results of its analyses are used primarily for diagnosis and for predicting the chances of a person developing a health condition in the future.
The client’s genomic applications and databases were split between its on-prem infrastructure and a leading provider’s public cloud. The on-prem system couldn’t handle the client’s Machine Learning models, and the previous cloud provider could not deliver the scale and cost savings the client needed.
The client wanted to work with a strategic partner to move to Google Cloud for its scalability and cost-effectiveness. Persistent was brought in as the digital engineering partner to begin phase one of the client’s digital transformation projects.
Phase 1: Infrastructure migration and building Lab Workflow Management Tool
Compute, storage, genomic databases, and networking resources were migrated to Google Cloud. The data had previously been stored in a relational SQL database. The Persistent team migrated it to a NoSQL database according to the use case requirements, and moved very large genome datasets to BigQuery. ETL jobs kept all of this data in sync for data warehousing and reporting. Data processing and orchestration were also migrated to Google Cloud to reduce the time and cost of ML training.
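The routing rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Dataset` fields, the `choose_target` and `plan_migration` helpers, and the 10 GiB cutoff are all hypothetical, since the case study says only that "huge" genome data went to BigQuery and does not give a threshold.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the case study does not state what size counted as "huge".
LARGE_DATASET_BYTES = 10 * 1024**3  # 10 GiB, an assumed value


@dataclass(frozen=True)
class Dataset:
    """Illustrative description of one dataset to migrate (names are assumptions)."""
    name: str
    size_bytes: int
    is_genome_data: bool


def choose_target(dataset: Dataset) -> str:
    """Return the assumed destination store for a dataset.

    Very large genome datasets go to BigQuery; everything else goes to the
    NoSQL database, mirroring the rule described in the text.
    """
    if dataset.is_genome_data and dataset.size_bytes >= LARGE_DATASET_BYTES:
        return "bigquery"
    return "nosql"


def plan_migration(datasets: list[Dataset]) -> dict[str, list[str]]:
    """Group dataset names by destination so each group can be migrated together."""
    plan: dict[str, list[str]] = {"bigquery": [], "nosql": []}
    for dataset in datasets:
        plan[choose_target(dataset)].append(dataset.name)
    return plan


if __name__ == "__main__":
    example = [
        Dataset("sample_metadata", 200 * 1024**2, is_genome_data=False),
        Dataset("run_reads", 500 * 1024**3, is_genome_data=True),
    ]
    print(plan_migration(example))
```

In a real migration the decision would likely also depend on access patterns and query needs, which the text groups under "use case requirements"; size is used here only because it is the one criterion the text names.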
The Google Cloud products used include Google Cloud Storage; AI Platform; Compute Engine for virtual machines; Dataflow for data processing; Pub/Sub for event ingestion and delivery system pipelines; Kubeflow for developing, orchestrating, deploying, and running scalable and portable ML workloads; Cloud Functions for serverless compute; and Google Kubernetes Engine for developing containerized applications.
For the Lab Workflow Management Tool, the Persistent team scoped out the requirements of the client’s scientists and built a web application. The scientists can now track their experiments in real time, end to end through the different phases of execution, and make rapid adjustments to optimize results.
More in phase 1: Data labeling service and architecture planning
Persistent also performed requirement analysis and planned the architecture for data capture, signal processing pipelines, and orchestration of the client’s advanced sequencing workflows.
“The analysis included the details of all the required functions, the data pipeline gaps that needed to be addressed, and how to achieve seamless scaling of the platform for growth.”
Asheesh Sharma, Google Business Unit, Persistent
Persistent also proposed Google’s labeling service for labeling the images generated during genome sequencing. Persistent consultants conducted training sessions to provide detailed knowledge transfer on the workflow.
Phase 2: Improving analytics and the UX of the Lab Workflow Management Tool
The second phase comprises the following projects:
- Improving the user experience of the lab tool: More features will be added to the client’s laboratory information management system, such as data export, dashboards, different data views, notifications, reminders, and email summaries.
- Advanced analytics: The on-prem reporting tools accessed data from local files and relational databases that were not optimized for reporting and analytics. The team is therefore working on improving the reporting and analytics capabilities.
“We are working on ensuring that all the inputs for reporting are delivered from the centralized data store, BigQuery. Also, we are integrating the reporting tool with the Lab Workflow Management Tool so that the reports can be accessed on the tool itself.”
Asheesh
Optimizing the DNA sequencing workflow
The client is now seeing lower costs and shorter run times for its DNA sequencing jobs. Because its ML models run on Google Cloud, there are almost no hardware limitations. Once phase two is complete, the scientists and the client teams are expected to see a markedly better user experience in their lab management tool.