Object-based storage for Linux clusters
By Garth Gibson, co-founder and CTO, Panasas Inc.
OCTOBER 30, 2003
http://www.computerworld.com/hardwaretopics/storage/story/0,10801,86634,00.html
Linux cluster computing has transformed the architecture of high-performance computing applications. High-cost supercomputers are being replaced by low-cost Linux clusters to solve the most challenging computing problems. To complement the performance potential of these Linux compute clusters, a new storage paradigm is needed. Object-based storage clustering is the foundation for a new class of storage systems that scale in capacity and performance to meet the demands of the most powerful Linux-based clusters.
For years, high-performance cluster computing has delivered solutions to the world's most challenging technical computing problems. More recently, these successes have been replicated in high-performance commercial applications using Linux clusters. Geophysicists are developing more capable seismic-analysis techniques to create images of the Earth's substructure and guide oil-field drilling and extraction operations. Pharmaceutical companies mine massive genomic data sets to provide better insight into human disease and develop more effective therapies. And Internet portals such as Yahoo Inc. and Google Inc. index and serve the content of the Internet.
An increasing appetite for shared storage performance
In addition to hefty computational requirements, these applications are characterized by high-performance I/O needs. Rapid access to shared data sets, often multiple terabytes in size, is critical for ensuring optimal use of compute cluster assets. Without it, already scant resources sit idle. These data sets need to be made globally available to all processes executing on the compute cluster in order to simplify development and systems management activities. Traditional networked storage systems are incapable of providing the necessary performance to serve the aggressive shared-access requirements of these expanding clusters.
For example, animation-rendering applications distribute scene generation tasks to hundreds of cluster compute nodes, each generating an individual frame of the final segment. Shared-scene and character information and per-frame rendering instructions must be accessed by each of the participating compute nodes, and each node generates as much as 50MB of output per frame. The individual frames are then sequenced and assembled into their final form for review. This is a common data-access scenario across many cluster computing applications.
Shortcomings of traditional shared storage
The natural inclination of cluster computing developers is to deploy shared storage that can be accessed by all nodes in the cluster. However, standard shared-storage technologies provided by file servers built from direct-attached storage are sufficient only for small clusters. Larger clusters require more scalable storage. Storage-area network (SAN) and optimized network-attached storage (NAS) architectures have been employed for modest-sized clusters, but these architectures have severe limitations as clusters become larger. Neither SAN nor NAS architectures support the aggressive concurrency and high per-client throughput requirements of these cluster computing applications.
Because of these limitations, organizations are forced to adopt a process in which data from a shared-storage system is "staged" (copied) to the compute nodes, processing is performed, and results are "destaged" from the nodes back to shared storage when done. In many applications, the staging setup time can be appreciable: up to several hours for large clusters.
Object-based storage: An emerging standard
For the growing community of cluster computer users, object-based storage is emerging as the foundation for building massively parallel storage systems that leverage commodity processing, networking and storage components to deliver unprecedented scalability and aggregate throughput in a cost-effective and manageable package.
At the core of this architecture are storage "objects," fundamental containers that house both application data and an extensible set of storage attributes. Traditional user and application files are decomposed into a set of storage objects and distributed across one or more "smart disks," also called object-based storage devices (OSD). Each OSD includes local processing capabilities, local memory for data and attribute caching and its own network connection. OSDs form the core of a distributed storage architecture in which much of the traditional storage-allocation activity can be offloaded from the file system layer, removing a key performance bottleneck present in current storage systems. Object attributes include security information and usage statistics available for enforcing credential-based access and quality-of-service policies as well as supporting dynamic data redistribution for cross-OSD load balancing. The object storage architecture mirrors the scale-out architecture of cluster computing systems, providing a balanced growth model that adds network bandwidth and processing capability in step with capacity increments to ensure scalability.
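The idea of an object as a container for both data and attributes, held by a device that handles its own allocation, can be illustrated with a small Python sketch. This is a toy in-memory model, not the OSD command set: the names StorageObject, ObjectStorageDevice, create, write and read, and the attribute keys "size", "reads" and "writes", are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class StorageObject:
    """Hypothetical storage object: application data plus extensible attributes."""
    object_id: int
    data: bytearray = field(default_factory=bytearray)
    attributes: dict = field(default_factory=dict)


class ObjectStorageDevice:
    """Toy OSD that allocates space and tracks usage statistics locally."""

    def __init__(self, name):
        self.name = name
        self.objects = {}
        self._next_id = 0

    def create(self, **attributes):
        # The device, not the file system layer, picks the object ID and
        # initializes the usage statistics (illustrative attribute names).
        oid = self._next_id
        self._next_id += 1
        attrs = {"size": 0, "reads": 0, "writes": 0}
        attrs.update(attributes)
        self.objects[oid] = StorageObject(oid, attributes=attrs)
        return oid

    def write(self, oid, offset, payload):
        obj = self.objects[oid]
        end = offset + len(payload)
        if end > len(obj.data):
            # Grow the object with zero bytes; the caller never sees blocks.
            obj.data.extend(bytes(end - len(obj.data)))
        obj.data[offset:end] = payload
        obj.attributes["size"] = len(obj.data)
        obj.attributes["writes"] += 1

    def read(self, oid, offset, length):
        obj = self.objects[oid]
        obj.attributes["reads"] += 1
        return bytes(obj.data[offset:offset + length])


osd = ObjectStorageDevice("osd0")
oid = osd.create(owner="alice")
osd.write(oid, 0, b"hello")
print(osd.read(oid, 0, 5), osd.objects[oid].attributes)
```

In this sketch the caller addresses data by object ID and byte offset and never sees how the device lays it out, which is the sense in which allocation work moves off the file system layer. The per-object counters stand in for the usage statistics that a real system could consult for load balancing.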
A standard for OSDs is being defined by technical working groups within the Storage Networking Industry Association (SNIA) and the T10 Technical Committee of Accredited Standards. The standard includes a command set designed for the iSCSI protocol, in essence providing object extensions to the traditional SCSI block command set. Together, the object specification and command set define a new wave of intelligent storage devices that can be integrated into massively parallel, high-performance, IP-based storage environments. The effort has the participation of many leading storage companies, including EMC Corp., Hewlett-Packard Co., IBM, Intel Corp., Seagate Technology LLC and Veritas Software Corp.
Pulling it all together
The object storage architecture provides the foundation for a new wave of networked storage systems. In emerging implementations, it's combined with a scalable metadata management layer that provides the file system interface to applications. This layer manages information such as directory membership and file ownership and permission attributes. It's also responsible for striping "component objects" (portions of files) across OSDs and ensuring data reliability and availability, for example, by coordinating backup and online redundant encoding, such as RAID Level 1 or 5. It's the layer through which client processes make requests (such as to open or close files), are authenticated and receive the information required for them to directly and securely access the cluster of OSDs, reading and writing file data without additional intervention by the metadata manager.
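Striping with redundant encoding can be sketched in a few lines of Python. The example below assumes a RAID Level 5 style layout, in which each stripe holds one XOR parity unit whose position rotates across the OSDs. The function names (xor_bytes, stripe_with_parity, rebuild_unit), the list-of-lists stand-in for OSDs and the tiny stripe unit are all hypothetical and do not describe any particular product.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def stripe_with_parity(data, num_osds, unit):
    """Stripe data across num_osds devices with one rotating parity unit per stripe.

    Returns one list per OSD; entry s of each list is that OSD's unit for stripe s.
    """
    data_units = num_osds - 1
    stripe_bytes = data_units * unit
    osds = [[] for _ in range(num_osds)]
    for s, start in enumerate(range(0, len(data), stripe_bytes)):
        # Zero-pad the final stripe so every unit has the same length.
        chunk = data[start:start + stripe_bytes].ljust(stripe_bytes, b"\0")
        units = [chunk[i * unit:(i + 1) * unit] for i in range(data_units)]
        parity = reduce(xor_bytes, units)
        parity_osd = s % num_osds  # rotate parity so no single OSD holds it all
        remaining = iter(units)
        for osd in range(num_osds):
            osds[osd].append(parity if osd == parity_osd else next(remaining))
    return osds


def rebuild_unit(osds, stripe, lost_osd):
    """Recover the unit a failed OSD held by XOR-ing the surviving units."""
    survivors = [osds[i][stripe] for i in range(len(osds)) if i != lost_osd]
    return reduce(xor_bytes, survivors)


layout = stripe_with_parity(b"ABCDEFGHIJKL", num_osds=4, unit=2)
print(layout[1])                  # units held by OSD 1
print(rebuild_unit(layout, 0, 2))  # contents of OSD 2's unit in stripe 0
```

Because every unit of a stripe lives on a different OSD, a client that has obtained the layout from the metadata manager could read or write the units in parallel, and the loss of any one OSD can be repaired from the others. With two OSDs the same code degenerates to mirroring, since the parity of a single data unit is the unit itself.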
When implemented as part of a scalable, clustered file system, the object storage architecture is capable of delivering high aggregate bandwidth to hundreds of clients. In short, it delivers cost-effective, shared storage for high-performance Linux clusters.
Garth Gibson is co-founder and chief technology officer of Panasas Inc., a Fremont, Calif.-based vendor of storage system clusters specializing in object storage and targeted at Linux cluster computing. He received a Ph.D. in computer science from the University of California at Berkeley in 1991, where he also did the groundwork research and co-wrote the seminal paper on redundant arrays of independent disks (RAID). He joined the faculty at Carnegie Mellon University in Pittsburgh in 1991 and is now on leave from an associate professor appointment in the computer science and electrical and computer engineering departments.