Genomics is one of the defining scientific fields of our age. It was only two decades ago that we first sequenced the human genome, and since then the discipline has driven far-reaching changes in the world around us.
Medicine, biology and agriculture are all using genomics to push their disciplines forward in profound ways.
It’s helping to fight new pathogens like Covid-19 as well as the genetic diseases and cancers that continue to plague human societies. Genome sequencing is being used to detect genetic predispositions to diabetes, heart disease, asthma and various cancers, shifting healthcare from cure to prevention. Genomic epidemiology was carried out on an unprecedented scale to sequence and track the spread of new Covid-19 variants. As the smoke clears from the pandemic, we’ll be relying on this field more and more to help defend against new viruses and diseases.
When the first human genome was sequenced 20 years ago, the effort cost $2.7 billion; today, sequencing a genome costs just under $1,000. That drop was driven largely by advances in computing power, and the field still relies on those advances to push its discoveries forward.
We can’t take those gains for granted. Behind those past achievements and expected innovations is a dense network of machines, people and processes that drive genomic research forward. A critical part of that is storage.
Sequencing genomes generates a huge amount of data. A human genome, for example, contains three billion base pairs, and sequencing it outputs all of that data for analysis. The resulting map of a single human genome takes up around 100 gigabytes. Labs regularly produce hundreds of terabytes per month, and that requires a commensurate amount of storage. When labs can’t get it, research suffers.
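The figures above lend themselves to a quick back-of-envelope estimate. The sketch below is a hypothetical illustration only: the ~100 GB per sequenced human genome comes from this article, the throughput figure is an invented example, and `monthly_storage_tb` is not a real tool.

```python
# Rough storage estimate from the article's figures.
# GB_PER_GENOME (~100 GB per sequenced human genome) is taken from the
# article; the example throughput below is an assumed, illustrative number.

GB_PER_GENOME = 100  # approximate size of one mapped human genome, in GB

def monthly_storage_tb(genomes_per_month: int,
                       gb_per_genome: int = GB_PER_GENOME) -> float:
    """Estimate a lab's monthly storage demand in terabytes."""
    return genomes_per_month * gb_per_genome / 1000  # 1 TB = 1000 GB

# A lab sequencing 2,000 genomes a month would need roughly:
print(monthly_storage_tb(2000))  # 200.0 TB per month
```

At that assumed pace, storage demand lands squarely in the "hundreds of terabytes per month" range the article describes, which is why capacity planning matters for sequencing labs.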
The Center for Microbiome Innovation (CMI) at the University of California San Diego performs cutting-edge research into the microbial communities that populate the natural world. When the center decided to scale up its research, it ran into serious problems: the storage it had been using couldn’t handle the increased I/O requirements, research slowed to a crawl, and discovery efforts were seriously hobbled. Only when CMI approached Panasas, which outfitted the center with an ActiveStor solution, was it able to advance its research again.
Jeff DeReus, CMI’s systems administrator, said that now, “we have more people that want to get work done, using data sets that are orders of magnitude larger than anything we’ve seen.” He added that, “having technology that takes the pain away from sluggish file systems or longer run times is really important.”
The hunger for storage in genomic research shows no sign of diminishing. One study from 2015 found that the storage needs of even the largest genomic institutions did not exceed 22 petabytes at the time. By 2025, the same study projects, as many as 40 exabytes may be needed to store human genome data alone.
Storage might appear only tenuously related to genomics, but this seemingly peripheral concern has a disproportionate effect on the pace of research. The reality is that genomics research ambitions are spiraling upwards, and they need storage capabilities that can keep up.