Storage nodes form the data layer. Within the architecture, user data and metadata are stored exclusively on these nodes, so both data types scale in step with each other. The storage nodes are commercially available systems; however, their hardware is carefully balanced in terms of hard disk, SSD, NVMe, and DRAM capacity, CPU performance, network bandwidth, and so on.
Finally, the DirectFlow client driver is a loadable file system driver installed on the compute servers, which any application can use like any other file system. In cooperation with the director and storage nodes, it presents the behavior of a fully POSIX-compliant file system in a single namespace across all servers in the compute cluster. The Panasas driver supports all major Linux distributions and versions.
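Because DirectFlow presents PanFS as an ordinary POSIX file system, applications need no special API. A minimal sketch (the mount path is an assumption for illustration, not a Panasas default):

```python
import os

# Hypothetical PanFS mount point; the actual path depends on the site's setup.
MOUNT = "/panfs/scratch"

path = os.path.join(MOUNT, "results.dat")

# Standard POSIX calls work unchanged, because DirectFlow exposes
# PanFS to applications like a regular local file system.
with open(path, "wb") as f:
    f.write(b"simulation output\n")

with open(path, "rb") as f:
    print(f.read())

# Metadata operations also behave as on any POSIX file system.
print(os.stat(path).st_size)
```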
PanFS is designed to scale linearly: adding 50 percent more storage nodes increases storage capacity by 50 percent, and adding more director nodes increases metadata processing speed accordingly. There is no architectural upper limit on performance or capacity, which makes the file system extremely well suited for high-performance computing (HPC) as well as for AI workloads.
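The scaling claim is simple arithmetic; a small illustrative model makes it concrete (the per-node figures are invented for the example, not Panasas specifications):

```python
# Illustrative linear-scaling model; per-node numbers are assumptions.
NODE_CAPACITY_TB = 100   # assumed usable capacity per storage node
NODE_BANDWIDTH_GBS = 3   # assumed sequential bandwidth per storage node

def cluster_totals(nodes: int) -> tuple[int, int]:
    """Capacity and bandwidth grow linearly with the number of storage nodes."""
    return nodes * NODE_CAPACITY_TB, nodes * NODE_BANDWIDTH_GBS

print(cluster_totals(20))  # (2000, 60)
print(cluster_totals(30))  # 50% more nodes -> (3000, 90), i.e. 50% more of each
```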
Storage management – features and functions
As a parallel file system, PanFS can provide far more bandwidth than the NFS and CIFS/SMB protocols. Each file stored by PanFS is distributed across many storage nodes, so the components of a file can be read and written in parallel. This substantially increases file-access performance.
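To make the idea concrete, here is a simplified striping sketch: a file is cut into fixed-size components that are assigned round-robin to storage nodes, so different components can be accessed in parallel. The stripe size and node count are assumptions for illustration, not PanFS parameters:

```python
STRIPE_UNIT = 1 << 20  # assumed 1 MiB stripe unit, for illustration only

def stripe_layout(file_size: int, num_nodes: int) -> list[tuple[int, int]]:
    """Map each stripe unit of a file to (node, offset-within-file), round-robin."""
    layout = []
    for i, offset in enumerate(range(0, file_size, STRIPE_UNIT)):
        layout.append((i % num_nodes, offset))
    return layout

# A 5 MiB file over 4 storage nodes: units land on nodes 0, 1, 2, 3, 0, ...
for node, offset in stripe_layout(5 * (1 << 20), num_nodes=4):
    print(f"node {node}: bytes {offset}..{offset + STRIPE_UNIT - 1}")
```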
Because PanFS is also a direct file system, the compute server can talk to all storage nodes over the network. Comparable products route file access through so-called “head nodes” running the NFS or CIFS/SMB protocols, and through an additional backend network. These head nodes become the bottleneck, and the backend network adds cost. In PanFS, the client driver on the compute server talks directly to the storage nodes, and the director nodes are not involved in the data path at all (“out-of-band”). As a result, there are hardly any bottlenecks, hotspots, or performance fluctuations, as is the case in scale-out NAS systems.
File maps and erasure coding
Because the components of a file are distributed, each file requires a file map that records on which storage nodes those components are located. The client driver uses this file map to identify which storage nodes it needs to contact, and it can access them directly and in parallel.
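A sketch of how a client might use such a map: the map lists where each component lives, and the client fetches all components concurrently. The data structures and the fetch function here are hypothetical, purely to illustrate the access pattern:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical file map: component index -> (storage node, object id).
FILE_MAP = {0: ("node-a", "obj-17"), 1: ("node-b", "obj-42"), 2: ("node-c", "obj-09")}

def fetch_component(node: str, obj_id: str) -> bytes:
    """Stand-in for a direct network read from one storage node."""
    return f"<data of {obj_id} from {node}>".encode()

# All components are read directly and in parallel; no director node
# sits in the data path.
with ThreadPoolExecutor() as pool:
    parts = pool.map(lambda entry: fetch_component(*entry), FILE_MAP.values())

file_contents = b"".join(parts)
print(file_contents)
```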
PanFS also uses network erasure coding during the distribution process (striping) to ensure a high level of data integrity and reliability. Because PanFS is fully POSIX-compliant, all processes on the compute servers running the client driver see the same file system namespace, metadata, and user file contents. The DirectFlow client driver also ensures cache coherency.
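The reliability idea behind erasure coding can be shown with the simplest possible scheme, single XOR parity. PanFS's actual per-file erasure code is more sophisticated; this is only a conceptual sketch:

```python
def xor_parity(units: list[bytes]) -> bytes:
    """Compute a parity unit over equally sized stripe units."""
    parity = bytearray(len(units[0]))
    for unit in units:
        for i, byte in enumerate(unit):
            parity[i] ^= byte
    return bytes(parity)

data = [b"AAAA", b"BBBB", b"CCCC"]  # stripe units on different storage nodes
parity = xor_parity(data)           # parity unit stored on yet another node

# If one unit is lost, XOR of the survivors and the parity recovers it.
lost = data.pop(1)
recovered = xor_parity(data + [parity])
assert recovered == lost
print("recovered:", recovered)
```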
Data management and encryption
To secure the system, PanFS provides Access Control Lists (ACLs) not only for files but also for directories. These complement the common Linux-style permission bits such as “-rwxr-xr-x” but are much more fine-grained. Per-volume snapshots (at least one logical volume must be set up) allow users to recover older file versions without requiring an administrator. To keep data confidential, it can be encrypted with Data At Rest Encryption (DARE).
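The difference between mode bits and ACLs can be illustrated with a tiny permission check. This models the concept only, not PanFS's actual ACL format or API:

```python
# Conceptual ACL: per-principal entries, far finer-grained than the three
# owner/group/other classes encoded in "-rwxr-xr-x". Not PanFS's on-disk format.
acl = {
    "owner:alice": {"read", "write", "execute"},
    "user:bob":    {"read"},             # a specific extra user
    "group:hpc":   {"read", "execute"},  # a specific extra group
    "other":       set(),
}

def allowed(principal: str, action: str) -> bool:
    """Grant access if the principal's entry (or the 'other' entry) permits it."""
    return action in acl.get(principal, acl["other"])

print(allowed("user:bob", "read"))   # True
print(allowed("user:bob", "write"))  # False: bob has a read-only entry
```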
Mixed workload performance
In a storage system, file sizes, access patterns, and workloads can change significantly over time. PanFS supports all of them, which considerably broadens its range of use cases. In high-performance computing (HPC), large files are far from the only case: PanFS supports genetic research just as well as hosting central directories with a cloud provider.
Dynamic Data Acceleration
Since 2020, PanFS has offered a “Dynamic Data Acceleration” (DDA) feature for HPC and AI workloads. This control feature is designed to accelerate storage operations on Panasas’ ActiveStor Ultra appliances by using storage media such as SSDs and hard drives more efficiently. The key factor is not access frequency, as with tiering, but file size. To let DDA do this work automatically, an algorithm in the orchestrator monitors how and where metadata and user data are stored.
By dynamically controlling the movement of files between SSDs and HDDs and exploiting the full potential of NVMe, PanFS delivers the highest possible performance for HPC and AI workloads at a reasonable total cost of ownership, and, just as importantly, does so in a consistent, predictable manner. The DDA algorithm controls the sweeper software, which performs the actual redistribution of the small files.
To keep SSD utilization at about 80 percent of capacity, the sweeper moves small files onto that medium. “If an SSD is 80 percent full, the sweeper moves the largest files to disk. If a hard drive is ‘only’ 70 percent full, the sweeper moves the ‘smallest’ files to the faster SSDs,” explains Curtis Anderson. “DDA manages the relocation of small files between SSDs and hard drives to increase access performance, as well as the performance of workloads that use small files, by keeping them isolated from streaming workflows.” This is just one example of how PanFS keeps system performance at an optimal level.
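Anderson's thresholds translate into a simple policy loop. The sketch below is one illustrative reading of that quote; everything beyond the 80 and 70 percent figures (data structures, names, the one-file-per-call behavior) is an assumption:

```python
# Illustrative sweeper policy based on the thresholds quoted above.
# Everything except the 80%/70% figures is an assumption for the sketch.

def sweep(ssd_files: dict[str, int], hdd_files: dict[str, int],
          ssd_used: float, hdd_used: float) -> None:
    """Move one file per call according to the quoted DDA policy."""
    if ssd_used >= 0.80 and ssd_files:
        # SSD nearly full: evict the largest file to hard disk.
        name = max(ssd_files, key=ssd_files.get)
        hdd_files[name] = ssd_files.pop(name)
        print(f"SSD -> HDD: {name}")
    elif hdd_used <= 0.70 and hdd_files:
        # HDD only moderately full: promote the smallest file to SSD.
        name = min(hdd_files, key=hdd_files.get)
        ssd_files[name] = hdd_files.pop(name)
        print(f"HDD -> SSD: {name}")

ssd = {"a.log": 4096, "big.dat": 10**9}
hdd = {"tiny.cfg": 512}
sweep(ssd, hdd, ssd_used=0.85, hdd_used=0.40)  # evicts big.dat to HDD
```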