Recently, a fellow Panasas employee asked how our new Panasas ActiveStor Ultra, running the Panasas PanFS file system, compares in performance to the Lustre, IBM Spectrum Scale (GPFS), and BeeGFS parallel file systems. I knew what our numbers were, but I had to figure out how they stacked up against these other parallel file systems. Since they are all scale-out parallel file systems, their maximum performance is theoretically limitless, so I needed to find some way to make a fair comparison.
Compounding the problem, the commercially available systems that integrate these parallel file systems come in different form factors, with different server/JBOD ratios and hard disk drive (HDD) counts (for this analysis, I limited the scope to HDD-based systems). The diagram below shows an example of some of the solutions and form factors on offer.
How to Compare the Performance of These?
As you can see, some systems in the diagram have GB/s performance numbers identified, and the other systems of various sizes also have performance data associated with them. But how do we compare systems with such different sizes and form factors?
What Do They All Have in Common?
I started by comparing performance per server, and even performance per rack, but our benchmarking engineer, who also has experience benchmarking Lustre and GPFS, recommended throughput per HDD as the figure of merit, since the number of disks has the primary impact on system footprint, TCO, and performance efficiency. And if you think about it, isn't that what a high-performance file system is supposed to do: read and write data to the storage media as fast as possible? Bandwidth per drive is a fair way to compare storage systems with significantly different hardware and software architectures, and it maps closely to an MB/s-per-dollar metric. Since this is an HPC storage blog, I multiplied the per-HDD throughput by 100 to get a more relevant comparative figure of merit, throughput per 100 drives, and present the results in GB/s.
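The figure of merit reduces to one line of arithmetic. Here is a small sketch (the example numbers are the published IBM ESS GL4S figures discussed in the next section):

```python
def gbps_per_100_hdd(throughput_mbps: float, hdd_count: int) -> float:
    """Normalize a system's throughput to GB/s per 100 HDDs."""
    per_drive_mbps = throughput_mbps / hdd_count
    return per_drive_mbps * 100 / 1000  # scale to 100 drives, MB/s -> GB/s

# Example: a system rated at 24 GB/s (24,000 MB/s) across 334 drives
print(round(gbps_per_100_hdd(24_000, 334), 1))  # → 7.2
```

Any system in the comparison can be dropped into the same function, which is what makes the metric useful across different form factors.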
IBM Spectrum Scale (GPFS) Comparative Performance
Let’s start with the published performance results of the IBM Elastic Storage Server (ESS). IBM does an exemplary job documenting its products, and ESS performance is detailed here. Looking at slide 7, the HDD-based Model GL4S is rated at 24 GB/s across 334 disk drives, or 71.86 MB/s/HDD; multiplying the per-HDD throughput by 100 gives 7.2 GB/s/100 HDD. IBM advertises read throughput, and I will focus on that in this blog (since it is the benchmark number most readily supplied by vendors), but note that IBM GPFS write speeds can be almost 2X slower than reads. I picked the GL4S because it is a bit faster per HDD than the larger GL6S, but the reader can verify that all IBM ESS systems, regardless of size, fall within a tight range of GB/s/100 HDD values (which further supports the per-HDD figure of merit).
For comparison, a four-ASU Panasas ActiveStor Ultra with PanFS and 96 HDDs has a read throughput of 12,465 MB/s, or 13.0 GB/s/100 HDD. Plotting the results below shows Panasas ActiveStor Ultra read performance to be nearly 2X that of GPFS.
Comparative Read Performance of PanFS v. GPFS ESS GL4S in GB/s
And PanFS read and write performance are nearly equivalent to each other (making PanFS write performance nearly 4X faster than GPFS). It should be noted that IBM ESS uses the GPFS “Scatter” data placement mode, in which data is written to random locations on disk. While somewhat slower, Scatter mode has the valuable customer benefit of maintaining uniform performance as the file system fills, avoiding the performance loss caused by fragmentation, a popular feature not common in most file systems. PanFS also maintains similarly consistent performance, but at the much higher performance rates shown.
Under the Hood, What Makes PanFS on ActiveStor Ultra So Fast?
The latest release of PanFS features a multi-tier intelligent data placement architecture that matches the right type of storage for each type of data:
- Small files are stored on low-latency flash SSD
- Large files are stored on low-cost, high-capacity, high-bandwidth HDD
- Metadata is stored on low-latency NVMe SSD
- An NVDIMM intent-log stages both data & metadata operations
- Unmodified data and metadata are cached in DRAM
Because PanFS protects newly written data in NVDIMM, it can flush data to the HDDs fully asynchronously, coalescing writes and placing data on the drives in the most efficient pattern for performance. Accumulating newly written data into larger sequential regions reduces fragmentation, so later reads of that data will also be sequential. In addition, ActiveStor Ultra features a balanced design that provisions the right amounts of CPU, memory, storage media, and networking, with no hardware bottlenecks from the NIC down to the disk, to deliver maximum PanFS performance and the best price/performance.
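To make the tiering idea concrete, here is a deliberately simplified sketch of the placement logic described above. This is purely illustrative: the tier names, the size threshold, and the routing function are my own assumptions for the example, not actual PanFS internals.

```python
# Toy illustration of multi-tier data placement (NOT real PanFS code).
# The 1 MiB small-file threshold below is an assumed value for the example.
SMALL_FILE_THRESHOLD = 1 << 20  # 1 MiB

def place(kind: str, size_bytes: int = 0) -> str:
    """Route an object to a storage tier by type and size."""
    if kind == "metadata":
        return "nvme_ssd"   # metadata goes to low-latency NVMe SSD
    if kind == "data":
        # small files to low-latency flash, large files to high-bandwidth HDD
        return "flash_ssd" if size_bytes < SMALL_FILE_THRESHOLD else "hdd"
    raise ValueError(f"unknown kind: {kind}")

print(place("data", 4096))     # small file → flash_ssd
print(place("data", 1 << 30))  # large file → hdd
print(place("metadata"))       # → nvme_ssd
```

In the real system, the NVDIMM intent log sits in front of all of these tiers, absorbing incoming writes before they are asynchronously destaged.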
Next Up, Lustre
For Lustre, I looked at data from a famous large customer site (slides 67, 68) with the figure of merit calculated to be 50,000 MB/s / 656 HDD x 100 = 7.6 GB/s/100 HDD. These customer site results (achieved no doubt by the best Cray, Seagate and KAUST engineers) v. PanFS results are shown below.
Comparative Read Performance of PanFS v. Lustre ClusterStor L300 in GB/s
I previously discussed how the performance of most file systems degrades as they fill over time, with GPFS in Scatter mode and PanFS being the exceptions. A great example of that behavior is described in this Lustre presentation, with the chart below illustrating how significant the performance drop can be.
Lustre Performance Degradation v. Capacity Fill
Questions recorded during the video of the presentation show the concern some Lustre users had over these findings.
BeeGFS Comparative Performance
Lastly, let’s look at BeeGFS. Several example systems have performance documented on the BeeGFS website, ranging from fast but unprotected RAID 0 systems to slower ZFS-based systems. For a high number of clients in cluster or enterprise environments, BeeGFS recommends servers with 24 to 72 disks arranged in several RAID 6 groups, usually of 10 or 12 drives each, so I analyzed the ThinkParQ-authored performance whitepaper for a system configured that way. The result there was 3,750 MB/s / 48 HDD x 100 = 7.8 GB/s/100 HDD, plotted below against the results for Panasas ActiveStor Ultra.
Comparative Read Performance of PanFS v. BeeGFS in GB/s
To Cache or Not to Cache
The results above (including those of Panasas ActiveStor Ultra) were achieved without caching affecting performance. There is one more parallel file system solution to look at: the newly released Lustre-based DDN EXAScaler SFA18K (slide 21). Its numbers work out to 60,000 MB/s / 400 HDD x 100 = 15.0 GB/s/100 HDD. DDN does not disclose whether these are cached or un-cached results. Lustre does have caches, including write-through (where data immediately written to disk also stays in cache for later readback). And although there is scant cache information available on the SFA18K datasheet, previous SFAs had large 512 GB DDR4 RAM caches. Knowing DDN, let’s assume these are cached results for now.
The Panasas ActiveStor Ultra also has very large caches (32 GB of DDR4 RAM per ASU node), and our write-through, read-back cached result is 25.4 GB/s/100 HDD. Big caches can have beneficial effects on application performance, which is why Panasas and DDN go to the expense of including them in their systems. The comparative performance between Panasas and DDN is shown in the chart below.
Comparative Read Performance of PanFS v. DDN Lustre ES18K in GB/s
A summary slide showing the comparative (non-cached) performance of the parallel file systems mentioned is shown below. PanFS is nearly twice as efficient as the other parallel file systems at delivering bandwidth from a given set of hardware.
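The summary comparison above is just the same per-drive arithmetic applied to each system's published figures; the non-cached numbers quoted throughout this post can be reproduced in a few lines:

```python
# Published read throughput (MB/s) and HDD count for each system in this post
systems = {
    "IBM ESS GL4S (GPFS)":       (24_000, 334),
    "Lustre ClusterStor L300":   (50_000, 656),
    "BeeGFS (RAID 6)":           (3_750, 48),
    "Panasas ASU (PanFS)":       (12_465, 96),
}

for name, (mbps, hdds) in systems.items():
    fom = mbps / hdds * 100 / 1000  # GB/s per 100 HDD
    print(f"{name}: {fom:.1f} GB/s/100 HDD")
```

Running this prints 7.2, 7.6, 7.8, and 13.0 GB/s/100 HDD respectively, matching the chart.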
Comparative Read Performance of PanFS v. Competitive Parallel File Systems in GB/s
Admittedly, this is just a simple first-order method for assessing the relative performance of parallel-file-system-based high-performance storage systems using easily found public information. Everyone’s application mix and use cases are different, and targeted benchmarking is required to see how each system would perform against an organization’s specific workload.
But it does show that Panasas ActiveStor Ultra with PanFS is serious competition for these systems and should be on your short list for new high-performance storage deployments.