Estimated reading time: 4 minutes
Genomics’ reliance on data storage might not seem particularly profound at first, but choices about storage solutions have serious impacts on research outcomes.
Storage often hinders genomics research by providing insufficient compute or I/O capacity. Those issues can then escalate to service outages, lengthy restoring times, spiralling costs, and — most importantly — disrupted discoveries.
In 2020, Panasas commissioned a Hyperion survey asking data managers about the Total Cost of Ownership (TCO) for their High Performance Computing (HPC) storage. Nearly 50% of organizations experienced some kind of storage failure at least once a month. Around 40% of those organizations would require more than two days to restore that system to full functionality after a failure.
Since genomics research produces vast amounts of data, the field is especially vulnerable to such storage disturbances. According to the National Human Genome Research Institute, a single human genome sequence can require up to 200 gigabytes of space. In a business context, this might not seem like a huge quantity; however, labs regularly produce many times that amount each week, and their storage requirements are only growing.
The Center for Microbiome Innovation (CMI) at the University of California San Diego offers an example of storage’s significant effects on genomics. CMI is currently exploring the microbiomes that are increasingly critical to advancements in medicine. When their storage needs exceeded their systems’ capacities, though, the resultant issues threatened their research efforts.
CMI’s 300 users had been gathering massive amounts of data and trying to store them on individually allotted clusters of traditional storage. Their increasingly demanding methods, such as DNA sequence processing or multi-terabyte data set analysis, taxed their storage systems. The strain degraded their performance and ultimately compromised researchers’ discovery efforts. When CMI started carrying out large scale metagenomics sequencing, things really started to break down.
It was only when Panasas outfitted them with an ActiveStor solution that CMI could advance their research efforts again. Yoshiki Vázquez-Baeza, Associate Director of Bioinformatics Integration, said that if they’d had to limit their methods because of storage concerns, “we wouldn’t be able to explore the full breadth of scientific options. Panasas technology supports the mission of the Center because it never limits our exploration.”
Unfortunately, accounting practices for calculating the TCO of storage solutions often ignore the kinds of costs and downsides discussed here. This is why many opt for the superficially cheaper option of open source solutions. They prefer the lower upfront fee, but they don’t realize that what they appear to be saving in the short term may end up costing them substantially more in the long term.
Because if the storage infrastructure on which researchers depend can’t meet their data needs, or it fails, or it slows down their research — this is all an exacting loss. It can have directly damaging financial implications, such as maintenance costs or missed product and grant deadlines. Our Hyperion survey, for example, found that a single day of downtime can cost anywhere between $100,000 and $1 million. And some of that survey’s respondents reported that outages could last as long as one week.
And there’s broader loss, too, for which there is no metric to easily measure: the time, productivity, and energy lost when storage fails or slows research. Our Hyperion survey showed that more than 75% of respondents had experienced reduced productivity as a direct result of storage related problems. Every minute of downtime is a minute of lost movement toward the next breakthrough. That might not be easy to quantify, but it is clearly a profound loss.
Genomics is helping us better understand life on this planet and advancing innovations in medicine, biotechnology, agriculture, and a range of other fields. While storage advancements have accelerated genomics discoveries, short-sighted storage decisions can jeopardize that progress.