In storage, “crash recovery” means that upon coming up after a crash, a server must remember exactly what promises it made to clients, and must be able to carry out those promises.
For example, consider a directory on a server’s disk. The server may, for expediency, keep a copy of the directory in memory, so that routine lookups do not require access to disk. Now suppose that a client requests that a file in the directory be renamed. The server may carry out the rename in its in-memory copy of the directory and notify the client that the operation is complete, without waiting for the modification to be written back to disk. If the server then crashes, it must remember, upon coming back up, what modification still has to be made to the disk copy of the directory.
Intentions, data, and meta-data must be preserved across a crash, even if the crash is a power failure. This constrains the use of RAM, which normally loses its contents at power failure.
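The rename example can be made concrete with a toy simulation. The sketch below is not Panasas code: the `Server` class, its method names, and the use of a plain Python list as a stand-in for memory that survives a crash are all assumptions made for illustration. It shows only the ordering that matters: the intention is recorded durably before the client is acknowledged, and recovery replays whatever intentions have not yet reached disk.

```python
class Server:
    """Toy directory server; every name here is hypothetical."""

    def __init__(self, disk_dir, intent_log):
        self.disk_dir = disk_dir        # on-disk directory: name -> file id
        self.intent_log = intent_log    # stand-in for memory that survives a crash
        self.mem_dir = dict(disk_dir)   # volatile in-memory copy used for lookups
        self.replay()                   # carry out promises made before a crash

    def rename(self, old, new):
        # 1. Record the promise durably BEFORE acknowledging the client.
        self.intent_log.append(("rename", old, new))
        # 2. Apply it to the fast in-memory copy.
        self.mem_dir[new] = self.mem_dir.pop(old)
        # 3. Acknowledge; the write-back to disk happens later.
        return "ok"

    def write_back(self):
        # Lazily apply logged intentions to the disk copy, then retire them.
        for op, old, new in self.intent_log:
            if op == "rename" and old in self.disk_dir:
                self.disk_dir[new] = self.disk_dir.pop(old)
        self.intent_log.clear()

    def replay(self):
        # Re-apply logged intentions to the in-memory copy after a restart,
        # skipping any whose source name is already gone.
        for op, old, new in self.intent_log:
            if op == "rename" and old in self.mem_dir:
                self.mem_dir[new] = self.mem_dir.pop(old)


disk = {"a.txt": 17}
log = []
s = Server(disk, log)
s.rename("a.txt", "b.txt")   # acknowledged; the disk copy still says a.txt
del s                        # crash: the in-memory copy is gone
s = Server(disk, log)        # recovery replays the surviving log
print(s.mem_dir)             # {'b.txt': 17}
```

In this toy model a crash can only happen between method calls; a real server must also survive a crash in the middle of any one of these steps, which is the subject of the discipline described later in this section.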
Older Panasas systems relied on an on-board battery to help solve this problem. When power failure was imminent, the system had a few seconds to tidy up by shoveling critical items out to disk, from which they could be retrieved at recovery.
But Panasas is moving in the direction of using off-the-shelf hardware components and standard form factors. Instead of relying on batteries, in version 7 of the system, Panasas’s director modules rely on NVDIMMs [1] to hold items that must be preserved; in a future version, Panasas’s storage modules will do the same.
The NVDIMMs that Panasas uses can be accessed as if they were regular RAM. But the requirement that the system must always be crash-ready, with no interval of a few seconds to tidy up, imposes a stern discipline on the software that writes to the NVDIMM:
- Routine memory accesses go through a processor cache. But writes to the NVDIMM must write through that cache, or bypass it altogether, since on power failure, the cache is toast.
- Operations on data in NVDIMM memory must be done atomically. That is, if an operation is in progress at the time of a crash, then upon recovery, either the operation should be complete, or it should be clear that the operation was not completed, and any modifications made for it can be ignored.
- The atomicity property, described above, requires that data structures in NVDIMM memory never be modified in place. Instead, a copy of the data structure is made at another location in NVDIMM memory; the copy is modified; and finally, the pointer to the original data structure is changed to point to the new copy. This can be called write-anywhere modification (to use nomenclature from NetApp’s WAFL, “Write Anywhere File Layout”).
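The three rules above can be sketched with a small simulation. This is a toy model under stated assumptions, not the Panasas implementation: `ToyNvdimm`, `ROOT`, and `update` are hypothetical names, the "cache" is a Python dictionary that is emptied on a simulated crash, and `flush` stands in for whatever write-through or cache-bypass mechanism the real hardware provides. On real hardware the final step also relies on the pointer store itself being atomic, which is a property of the platform and is simply assumed here.

```python
class ToyNvdimm:
    """Simulated persistent memory fronted by a volatile cache (hypothetical model)."""

    def __init__(self):
        self.persistent = {}   # survives a crash
        self.cache = {}        # lost on a crash

    def store(self, addr, value):
        self.cache[addr] = value                  # ordinary cached write

    def flush(self, addr):
        if addr in self.cache:                    # push one cached entry to the NVDIMM
            self.persistent[addr] = self.cache.pop(addr)

    def load(self, addr):
        return self.cache.get(addr, self.persistent.get(addr))

    def crash(self):
        self.cache.clear()                        # power failure: cache contents vanish


ROOT = "root"   # well-known slot holding the address of the live copy


def update(mem, new_addr, modify):
    """Write-anywhere update: copy, modify the copy, persist it, then swing ROOT."""
    old = mem.load(mem.load(ROOT))
    mem.store(new_addr, modify(dict(old)))        # the original is never touched
    mem.flush(new_addr)                           # the copy must be durable first
    mem.store(ROOT, new_addr)                     # single-slot pointer swing
    mem.flush(ROOT)                               # commit point


mem = ToyNvdimm()
mem.store(0, {"name": "a.txt"}); mem.flush(0)
mem.store(ROOT, 0); mem.flush(ROOT)

update(mem, 1, lambda d: {**d, "name": "b.txt"})
mem.crash()
print(mem.load(mem.load(ROOT)))   # {'name': 'b.txt'}
```

If the simulated crash happens before the final `flush(ROOT)`, the root slot still names the old copy, so recovery sees the structure exactly as it was before the update began; the half-finished copy is unreachable and can be ignored.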
The payoff for adhering to this discipline is that client operations can take the fast path – for most write operations, the client does not have to wait for disk accesses to complete.
[1] An NVDIMM is a type of non-volatile memory built from DRAM, just enough flash to store the contents of that DRAM, and a supercapacitor ("supercap") that can power them both long enough to flush the DRAM to the flash if main power fails.