Last month in Dallas, SC18 broke its attendance record, and the excitement continues to build as the supercomputing market grows rapidly, particularly around important emerging applications such as AI, autonomous vehicles, and precision medicine.
Many of the commercial enterprises at SC18 are either adopting—or thinking of adopting—these new technologies, but are unsure which HPC storage solution makes the most sense. Should they go with a commercial or an open-source storage solution? Is a hybrid approach best, or should an all-flash deployment be considered? And while everyone is looking for performance, the cost and ease of administering the storage cluster are equally important, as companies deal with the increased complexity of new applications and rapidly changing workloads.
Here are some of the comments that were shared with us at SC18:
Commercial vs. Open Source
Open-source storage filesystems running on commodity hardware are an attractive option for organizations that are working at the performance fringe and have the substantial IT resources required to deploy, tune, and maintain the storage.
However, enterprises that have deployed open-source storage solutions talked about the need for more simplicity and the challenge of staying on top of administrative costs, which can quickly exceed the initial cost of acquisition, the very thing that attracted them to open source in the first place. Lustre-based systems in particular were mentioned as a cause for dissatisfaction, not only because of their complexity and high management cost, but also because of uncertainty about the future of Lustre in the hands of DDN.
Commercial storage solutions that meet high performance requirements and deliver a cost-effective, easily deployed and managed storage system are high on everyone’s radar. A high performance parallel filesystem deployed on commodity hardware and delivered as an integrated appliance was viewed as a highly attractive choice for those who are looking for the best of both worlds: fast and simple.
AI
Enterprise IT executives are faced with a dilemma. Because AI generates enormous amounts of data and requires processing complex, mixed workloads, they believe their only choice is expensive flash memory. But because the data sets are so large, the economics don’t add up.
What’s needed is an HPC parallel filesystem with intelligent data placement, tiered across a mix of storage media. A solution that delivers the required performance while being vastly more economical than flash. Intelligent data placement distributes data across three tiers of media – metadata on non-volatile memory express (NVMe), small files on SSDs, and large files on HDDs. The result? Optimized performance for all data types at a highly competitive price point.
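The placement policy described above can be sketched as a simple decision function. This is an illustrative sketch, not Panasas’s actual PanFS implementation; the tier names follow the text, but the small-file size cutoff is an assumption:

```python
# Illustrative sketch of intelligent data placement across three tiers:
# metadata on NVMe, small files on SSD, large files on HDD.
# The 64 KiB small-file threshold is an assumed cutoff, not a PanFS value.

SMALL_FILE_THRESHOLD = 64 * 1024  # 64 KiB (assumption for illustration)

def place(kind: str, size_bytes: int) -> str:
    """Return the storage tier for a piece of data."""
    if kind == "metadata":
        return "NVMe"   # metadata on non-volatile memory express
    if size_bytes <= SMALL_FILE_THRESHOLD:
        return "SSD"    # small files on solid-state drives
    return "HDD"        # large files on hard disk drives

print(place("metadata", 512))       # NVMe
print(place("file", 4 * 1024))      # SSD
print(place("file", 10 * 1024**3))  # HDD
```

The point of the sketch is that the cost/performance trade-off is made per data type and size, so the expensive media hold only the data that actually benefits from them.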
Autonomous Vehicles
The technology used to power autonomous vehicles has its own unique set of demands. To meet rigorous safety and certification standards, companies must collect enormous amounts of data, then analyze and reanalyze it. We have received RFPs for systems holding more than 100PB of data that must be reprocessed on a regular basis. Enterprises often add 4PB per month, creating a challenging level of complexity and change. With open-source systems, enterprises may get the performance they need, but they must hire a small army of highly skilled tech experts to manage it.
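To put those figures in perspective, here is a back-of-the-envelope capacity projection using the 4PB/month growth rate and 100PB target quoted above; starting from zero capacity is an assumption made purely for illustration:

```python
# Back-of-the-envelope projection: how long until a deployment growing at
# 4 PB/month (the rate quoted above) reaches the 100 PB scale seen in RFPs.
# Starting capacity of 0 PB is an assumption for illustration.

GROWTH_PB_PER_MONTH = 4
TARGET_PB = 100

months = 0
capacity_pb = 0
while capacity_pb < TARGET_PB:
    capacity_pb += GROWTH_PB_PER_MONTH
    months += 1

print(f"{TARGET_PB} PB reached after {months} months "
      f"(~{months / 12:.1f} years)")  # 25 months, ~2.1 years
```

At that pace a storage system crosses the 100PB mark in roughly two years, which is why administrative overhead per petabyte matters as much as raw performance.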
Utilizing a fully integrated plug-and-play appliance with a portable parallel filesystem running on industry-standard commodity hardware provides the highest performance and reliability at a competitive price. Our customers are working with a storage solution that is consistently fast, regardless of the complexity, and one that linearly scales performance without limitation.
Precision Medicine
Compute time, performance requirements, and storage capacity are the key challenges faced by IT departments across life science organizations. Across the myriad life science applications, different types of data need to be analyzed within a specific workflow by multiple users (sometimes more than 1,000) running hundreds of sessions at a time.
The majority of data by volume is genomics-based, though imaging continues to rapidly grow as a component of storage. Next-generation sequencing (NGS) produces up to 50TB per week, per machine. Cryo-Electron Microscopy (CryoEM), which involves high-resolution images of small and large molecules, features raw data sets of up to 12TB that are generated in less than a week.
High-performance, high-capacity turnkey storage solutions are key to creating a productive research environment. Storage needs to adapt to constantly changing workloads, support geographically dispersed users, and require as little manual interaction as possible.
Key Takeaways
- Enterprises are looking more like HPC shops, processing large and complex data sets with high precision. The hassle of managing complex storage deployments limits productivity and affects business outcomes.
- Total cost-of-ownership (acquisition, administration, and scalability) is becoming increasingly important when trying to balance performance requirements with the need to deploy a cost-effective storage solution.
- Storage flexibility and adaptability are a must as emerging applications drive new and evolving use cases. Storage software running on industry-standard commodity hardware allows for the rapid adoption of new features and technologies, and offers customers a broad choice of storage options.
To learn how Panasas drives business innovation with ActiveStor high-performance storage and the portable PanFS parallel filesystem, visit www.panasas.com/products/.