OneFS is designed to withstand multiple simultaneous component failures (currently four per node pool) while still affording unfettered access to the entire file system and dataset. Data protection is implemented at the file level using Reed-Solomon erasure coding and thus does not depend on hardware RAID controllers. This implementation provides many benefits, including the ability to add new data protection schemes as market conditions or hardware attributes and characteristics evolve. Because protection is applied at the file level, a OneFS software upgrade is all that is required to make new protection and performance schemes available.
OneFS employs the popular Reed-Solomon erasure coding algorithm for its protection calculations. Protection is applied at the file level, enabling the cluster to recover data quickly and efficiently. Inodes, directories, and other metadata are protected at the same or higher level as the data blocks they reference. Because all data, metadata, and forward error correction (FEC) blocks are striped across multiple nodes, dedicated parity drives are not required. Striping across multiple nodes guards against single points of failure and bottlenecks, allows file reconstruction to be highly parallelized, and ensures that all hardware components in a cluster always do useful work.
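The recovery idea behind erasure coding can be sketched in a few lines. The following is a simplified illustration using XOR single-parity as a stand-in for the Reed-Solomon math OneFS actually uses (real Reed-Solomon operates over a Galois field and can tolerate multiple simultaneous failures; XOR parity tolerates exactly one):

```python
def make_parity(stripe_units: list[bytes]) -> bytes:
    """XOR all stripe units together to produce one parity (FEC) unit."""
    parity = bytearray(len(stripe_units[0]))
    for unit in stripe_units:
        for i, b in enumerate(unit):
            parity[i] ^= b
    return bytes(parity)

def reconstruct(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing stripe unit from survivors plus parity."""
    return make_parity(surviving + [parity])

units = [b"AAAA", b"BBBB", b"CCCC"]   # data stripe units
p = make_parity(units)

# Simulate losing the second unit and rebuilding it from the rest:
rebuilt = reconstruct([units[0], units[2]], p)
assert rebuilt == b"BBBB"
```

Because each stripe unit (and the FEC unit) lives on a different node, any surviving node can participate in the rebuild, which is why reconstruction parallelizes well.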
OneFS supports several protection schemes. These schemes include the ubiquitous +2d:1n, which protects against two drive failures or one node failure.
Note: The best practice is to use the recommended protection level for a particular cluster configuration. The recommended level of protection is clearly marked as “suggested” in the OneFS WebUI storage pools configuration pages and is typically configured by default.
The hybrid protection schemes are particularly useful for high-density node configurations, such as the PowerScale A3000 chassis, where the probability of multiple drives failing surpasses that of an entire node failure. In the unlikely event of multiple simultaneous device failures that collectively exceed a file's protection level, OneFS will reprotect everything possible and report errors on the affected individual files to the cluster's logs.
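The intuition that multi-drive failures dominate in dense chassis can be shown with simple arithmetic. The failure rates below are made-up illustrations, not Dell-published figures, and the model assumes independent failures:

```python
from math import comb

# Assumed, illustrative annual failure rates:
P_DRIVE = 0.02   # per-drive
P_NODE = 0.005   # per-node (excluding drive faults)

def p_two_or_more_drive_failures(num_drives: int, p: float = P_DRIVE) -> float:
    """P(at least 2 of num_drives fail), assuming independent failures."""
    p_none = (1 - p) ** num_drives
    p_one = comb(num_drives, 1) * p * (1 - p) ** (num_drives - 1)
    return 1 - p_none - p_one

# In a dense 60-drive enclosure, a double-drive failure is far more
# likely than the assumed whole-node failure rate:
assert p_two_or_more_drive_failures(60) > P_NODE
```

Under these assumptions, a scheme like +2d:1n or +3d:1n, which tolerates more drive failures than node failures, matches the actual risk profile of dense hardware.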
OneFS also provides various mirroring options ranging from 2x to 8x, allowing from two to eight mirrors of the specified content. Mirroring is the method used to protect OneFS metadata, which by default is mirrored at one level above the file's FEC protection. For example, if a file is protected at +1n, its associated metadata object is 3x mirrored.
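The "one level above" rule can be expressed as a small helper. The function below is a hypothetical illustration of the mapping described above, not a OneFS API:

```python
def metadata_mirror_count(tolerated_failures: int) -> int:
    """Mirror copies for metadata of a file protected at +Nn.

    Hypothetical illustration: a file at +Nn tolerates N failures; its
    metadata is protected one level higher, i.e. tolerates N + 1
    failures, which for mirroring means N + 2 total copies.
    """
    return tolerated_failures + 2

# A +1n file has 3x-mirrored metadata, matching the example in the text.
assert metadata_mirror_count(1) == 3
```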
The full range of OneFS protection levels are summarized in the following table:
| Protection level | Description |
|---|---|
| +1n | Tolerate failure of one drive OR one node |
| +2d:1n | Tolerate failure of two drives OR one node |
| +2n | Tolerate failure of two drives OR two nodes |
| +3d:1n | Tolerate failure of three drives OR one node |
| +3d:1n1d | Tolerate failure of three drives OR one node AND one drive |
| +3n | Tolerate failure of three drives OR three nodes |
| +4d:1n | Tolerate failure of four drives OR one node |
| +4d:2n | Tolerate failure of four drives OR two nodes |
| +4n | Tolerate failure of four drives OR four nodes |
| 2x to 8x | Mirrored over two through eight nodes, depending on configuration |
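The protection-level names in the table above follow a regular pattern, which a short parser can make explicit. The interpretation (drive tolerance, node tolerance) follows the table; the function itself is a hypothetical illustration, not a OneFS API:

```python
import re

def parse_protection(level: str) -> dict:
    """Decode a protection-level name into its failure tolerances."""
    if m := re.fullmatch(r"(\d)x", level):              # mirroring, e.g. "2x"
        copies = int(m.group(1))
        return {"mirrors": copies, "tolerates": copies - 1}
    if m := re.fullmatch(r"\+(\d)n", level):            # e.g. "+2n"
        n = int(m.group(1))
        return {"drives": n, "nodes": n}
    if m := re.fullmatch(r"\+(\d)d:(\d)n", level):      # e.g. "+3d:1n"
        return {"drives": int(m.group(1)), "nodes": int(m.group(2))}
    if m := re.fullmatch(r"\+(\d)d:(\d)n(\d)d", level):  # e.g. "+3d:1n1d"
        return {"drives": int(m.group(1)),
                "nodes": int(m.group(2)),
                "extra_drives": int(m.group(3))}
    raise ValueError(f"unknown protection level: {level}")

assert parse_protection("+2d:1n") == {"drives": 2, "nodes": 1}
```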
SmartPools can also manage data and metadata objects separately, because data and metadata structures are logically separated within the single OneFS file system.
OneFS stores file and directory metadata in inodes and B-trees, allowing the file system to scale to billions of objects and still provide fast lookups of data or metadata. OneFS is a symmetric and fully distributed file system with data and metadata spread across multiple hardware devices. Generally, data is protected using erasure coding for space-efficiency reasons, enabling utilization levels around 80 percent on clusters of five or more nodes. Metadata (which generally makes up around 2 percent of the system) is mirrored for performance and availability. Protection levels are dynamically configurable at a per-file or per-file system granularity, or anything in between. Data and metadata access and locking are coherent across the cluster, and this symmetry is fundamental to the simplicity and resiliency of the OneFS shared nothing architecture.
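The space-efficiency contrast between erasure coding and mirroring is simple arithmetic. The stripe widths below are illustrative, not actual OneFS layout decisions:

```python
def efficiency(data_units: int, redundancy_units: int) -> float:
    """Fraction of raw capacity available for user data."""
    return data_units / (data_units + redundancy_units)

# A stripe of 8 data units plus 2 FEC units tolerates 2 failures:
ec = efficiency(8, 2)       # 0.8, i.e. 80% usable capacity

# 3x mirroring also tolerates 2 failures but stores every block thrice:
mirror = efficiency(1, 2)   # about 0.33, i.e. 33% usable capacity

assert ec > mirror
```

This is why OneFS erasure-codes bulk data (reaching the roughly 80 percent utilization noted above) while reserving mirroring for the small metadata fraction, where read performance and availability matter more than capacity.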
When a client connects to a OneFS-managed node and performs a write operation, files are broken into smaller logical chunks, or stripe units, before being written to disk. These chunks are then striped across the cluster's nodes and protected either through erasure coding or mirroring. OneFS primarily uses Reed-Solomon erasure coding for data protection and mirroring for metadata. OneFS file-level protection typically provides industry-leading levels of utilization, and for clusters of nine or more nodes, OneFS can sustain up to four full node failures while still providing full access to data.
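The striping step above can be sketched as splitting a byte stream into fixed-size units and placing each unit on a different node. The unit size, node count, and round-robin placement here are simplified illustrations, not OneFS's actual layout algorithm:

```python
UNIT_SIZE = 4  # bytes; real stripe units are far larger

def stripe(data: bytes, num_nodes: int) -> dict[int, list[bytes]]:
    """Return a mapping of node index -> stripe units placed there."""
    layout: dict[int, list[bytes]] = {n: [] for n in range(num_nodes)}
    units = [data[i:i + UNIT_SIZE] for i in range(0, len(data), UNIT_SIZE)]
    for i, unit in enumerate(units):
        layout[i % num_nodes].append(unit)  # simple round-robin placement
    return layout

placement = stripe(b"abcdefghijklmnop", num_nodes=3)
# Units land on nodes 0, 1, 2, 0 in turn; no single node holds the file.
assert placement[0] == [b"abcd", b"mnop"]
```

In a real cluster, FEC units computed from each stripe (as in the parity example earlier) would be placed on yet other nodes, so losing any one node leaves the file fully reconstructible.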
OneFS uses multiple data layout methods to optimize for maximum efficiency and performance according to the data’s access pattern—for example, streaming, concurrency, random, and so on. As with protection, these performance attributes can also be applied per file or per file system.
Note: For more information about OneFS data protection levels, see the OneFS Technical Overview white paper.