A megabyte of data is like water vapor: apart from some insulation, it only needs a few empty pipes to get from one place to another. A gigabyte of data is more like liquid water: you still need the pipes, but you also need electric pumps to move it from one place to another. A petabyte of data, finally, is more comparable to ice, which does not flow through pipes no matter how hard you push it; you would have to cut it into blocks or crush it and place it on a conveyor belt. That process is not only far more energy-intensive, it also requires a completely different physical infrastructure than transporting the water or the vapor. In other words, the amount of data (like the temperature of the water) makes the difference, and it determines how the data must be moved and processed.
“Parallel file systems” were invented as offshoots of conventional network file systems such as NFS or SMB/CIFS precisely because HPC systems need the “conveyor belt” rather than the simple “pipe.”
These two analogies offer simple mental models for why HPC storage solutions are so different from enterprise storage solutions. Data at the scale HPC typically processes requires a physical infrastructure and a level of energy that the storage solutions of most companies simply do not provide.
About the author:
Curtis Anderson is a data storage expert with more than 34 years of experience, focusing on the implementation of file systems. He was one of the five original authors of the XFS file system, which is now widely used in Linux, and worked on the Veritas VxFS file system before the company’s launch. He was also a member of the IEEE for 14 years, including serving as sponsor chair of the IEEE 1244 Working Group, which coordinated and published a formal standard for sharing tape drives and tape robots among multiple hosts in a SAN. As software architect at Panasas, Anderson coordinates the technology teams working on the various elements that make up Panasas’ parallel storage file system. Before joining Panasas, he worked as Technical Director at NetApp and as an architect at EMC/Data Domain. Anderson holds 10 patents, in areas including continuous data backup and the replication of deduplicated file data over a network.
The authors are responsible for the content and accuracy of their contributions. The opinions presented reflect the views of the authors.
Read the original article in German here.