Next Generation Sequencing in the Cloud
In the last few years, genome sequencing has steadily gained traction in biomedical research and clinical communities. Before the early 2000s, Sanger sequencing, introduced in 1977 by Frederick Sanger, was the most commonly used method of genome sequencing. However, following the publication of the first draft of the human genome in 2001, interest in sequencing entire genomes spiked. This increased demand provided a strong impetus for the development of Next Generation Sequencing (NGS) platforms.
NGS is a group of technologies, the first of which was commercially launched by 454 Life Sciences in 2005, capable of generating millions of genomic sequences at lower cost and with less effort than the traditional Sanger sequencing method. Illumina, Life Technologies, and Roche 454 offer the most widely used platforms, namely the HiSeq, Ion Torrent/SOLiD, and GS FLX series respectively. Each of these platforms produces anywhere from 1 GB to 600 GB (in the case of the HiSeq 2000 platform) of sequence data per run. The sequence data is used to generate a genome sequence (de novo genome assembly) and to identify genetic variation and gene signatures.
Analyzing this quantity of data requires a large amount of computational power, typically more than 32 GB of RAM, 4 processors, and 1 TB of disk space per sample. The requirement increases sharply as the quantity of data to be analyzed, the genome size, and the complexity of the genome increase. It is not uncommon to find labs equipped with multiple servers, each with 512 GB of RAM, 48 cores, and more than 10 TB of disk space. Servers with this capacity run into the tens of thousands of dollars, not including personnel and maintenance costs. This kind of infrastructure investment might work for a laboratory that has the resources and dedicated IT staff. However, for small labs and labs that have sporadic, but large, NGS data analysis projects, this investment can be a significant roadblock to undertaking such projects.
In this scenario, Cloud computing offers an attractive alternative, giving users with sporadic workloads the flexibility they need. Pay-as-you-go pricing lets users pay only for the time the infrastructure is actually in use and scale up or down as required. Apart from scalability and a low-cost model, Cloud computing provides agility and multi-tenancy to user operations.
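To make the pay-per-use argument concrete, here is a rough back-of-the-envelope sketch in Python that compares buying a server with renting equivalent capacity by the hour. Every figure in it (purchase price, yearly upkeep, hourly rate, hours per project) is a hypothetical placeholder chosen purely for illustration, not a quote from any provider or vendor, and the function names are made up for this example.

```python
def on_premises_cost(purchase_price, annual_upkeep, years):
    """Total cost of owning a server: upfront price plus yearly upkeep."""
    return purchase_price + annual_upkeep * years


def cloud_cost(hourly_rate, hours_per_project, projects_per_year, years):
    """Total pay-per-use cost: you are billed only for hours actually used."""
    return hourly_rate * hours_per_project * projects_per_year * years


def break_even_projects_per_year(purchase_price, annual_upkeep, years,
                                 hourly_rate, hours_per_project):
    """Projects per year at which renting costs the same as owning."""
    owned = on_premises_cost(purchase_price, annual_upkeep, years)
    cost_per_project = hourly_rate * hours_per_project
    return owned / (cost_per_project * years)


# Hypothetical inputs (assumptions, not real prices).
PURCHASE_PRICE = 30_000    # one large-memory server, USD
ANNUAL_UPKEEP = 5_000      # power, maintenance, staff share, USD per year
YEARS = 3                  # planning horizon
HOURLY_RATE = 4.0          # comparable cloud instance, USD per hour
HOURS_PER_PROJECT = 100    # compute hours for one analysis project

owned = on_premises_cost(PURCHASE_PRICE, ANNUAL_UPKEEP, YEARS)
rented = cloud_cost(HOURLY_RATE, HOURS_PER_PROJECT, projects_per_year=4, years=YEARS)
threshold = break_even_projects_per_year(
    PURCHASE_PRICE, ANNUAL_UPKEEP, YEARS, HOURLY_RATE, HOURS_PER_PROJECT
)

print(f"Owning for {YEARS} years:           ${owned:,.0f}")
print(f"Renting for 4 projects per year:  ${rented:,.0f}")
print(f"Break-even at about {threshold:.1f} projects per year")
```

Under these made-up numbers, a lab running only a handful of projects a year spends far less by renting, while a lab that keeps a server busy most of the time may still be better off owning one. The sketch ignores data transfer and storage charges, which can be significant for sequencing data and would need to be added for a realistic estimate.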
Cloud computing typically comes in three flavors: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). For NGS data analysis, the IaaS and SaaS models work well. Cloud computing has generated a lot of buzz in the NGS data analysis field, and a few data analysis solutions, such as Galaxy, are currently available on the Cloud. However, widespread adoption of Cloud computing in NGS is still lagging.
In spite of the initial teething problems, Cloud computing still presents an exciting proposition in the world of NGS, and providers are looking at Cloud alternatives to address concerns such as data security. For instance, institutes that have a high throughput of data and are sensitive to data security issues are exploring other options, including Private or Hybrid Clouds, which promise more control over the security, availability, and accountability of their data.
In a nutshell, Cloud computing and NGS are a potent combination, capable of providing the right incentives and affordable options for smaller labs and continuing to drive innovation in the Life Sciences space.