If you are using L3 cache, we recommend the following best practices:
- Use a small number of large-capacity SSDs (ideally no more than two per node) rather than many small SSDs.
- Size the SSD capacity to fit your working dataset. The isi_cache_stats utility can help determine this on existing clusters. A useful rule of thumb is to size L3 SSD capacity per node according to the following formula:
L2 capacity + L3 capacity >= 150% of working set size.
- While L3 cache can support up to a 2:1 hard-drive-to-SSD ratio per node, use at most two to three SSDs for L3 per node.
- Repeated random read workloads will typically benefit most from L3 cache through latency improvements.
- Although not recommended, both L3 cache and Global Namespace Acceleration (GNA) are supported within the same cluster.
- The procedure for replacing a failed L3 cache SSD is the same as for any other storage drive. However, L3 cache SSDs do not require FlexProtect or AutoBalance to run after replacement, so the process is typically much faster.
- For a legacy node pool using a SmartPools metadata-write strategy, do not convert to L3 cache unless:
- The SSDs are seriously underutilized.
- The SSDs in the pool are oversubscribed and spilling over to hard disk.
- Your primary concern is SSD longevity.
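The sizing rule above (L2 capacity + L3 capacity >= 150% of working set size) can be sketched as a quick back-of-the-envelope calculation. The helper below is a hypothetical illustration for planning purposes, not part of OneFS or isi_cache_stats; all names and the even per-node split are assumptions.

```python
def l3_ssd_needed_per_node(working_set_tib, l2_capacity_tib, nodes, headroom=1.5):
    """Estimate L3 SSD capacity required per node (hypothetical sizing sketch).

    Applies the rule of thumb: L2 + L3 >= headroom * working set,
    with headroom = 1.5 (150%) by default, then spreads the required
    L3 capacity evenly across the nodes in the pool.
    """
    required_cache_tib = headroom * working_set_tib      # total cache target
    l3_total_tib = max(0.0, required_cache_tib - l2_capacity_tib)
    return l3_total_tib / nodes                          # per-node L3 SSD capacity

# Example: 40 TiB working set, 2 TiB aggregate L2 cache, 10-node pool
per_node_tib = l3_ssd_needed_per_node(40, 2, 10)         # -> 5.8 TiB per node
```

In practice you would round the result up to the nearest available SSD size, keeping in mind the guidance above to use a small number of large-capacity drives.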
L3 cache considerations
When deploying L3 cache, keep the following considerations in mind:
- All the SSDs within a node pool can be used either for L3 cache or for SmartPools data strategies (metadata-ro, metadata-rw, data), but not for a mix of both.
- L3 cache is not applicable to nodes containing 16 or more SSDs, and all-SSD node pools are not eligible for L3 cache enablement.
- Enabling L3 cache on an existing node pool with SSDs takes some time. The data and metadata on the SSDs have to be evacuated to other drives before the SSDs can be formatted for caching. Conversely, disabling L3 cache is a very fast operation because no data has to be moved and drive reformatting can begin right away.
- If you are concerned about metadata being evicted from L3, you can deploy more SSDs per node to accommodate a larger working set. Alternatively, you can disable L3 and retain traditional SmartPools metadata acceleration (either metadata read-only or read/write) for that node pool.
- It is possible to have GNA and L3 in the same cluster (in different node pools), although some manual setup is required, including a SmartPools policy that avoids SSD storage on the L3 node pool. Note that L3 node pool hard drive space does count toward GNA limits.
- All the SSDs in an L3 cache node pool must be the same size.
- If an L3 cache SSD fails, OneFS does not need to run FlexProtect or AutoBalance jobs, as it would for a regular file system SSD. However, after the failed SSD is replaced, some time is needed for the cache to repopulate.
- All new node pools containing SSDs will have L3 cache enabled by default.
- Existing node pools with SSDs will not be modified to use L3 cache on upgrade.
- SSDs displace HDDs: more SSDs mean fewer hard drive spindles, which can impact streaming and concurrency performance as well as total capacity.
- L3 cache is intentionally bypassed for streaming reads during data prefetch operations: streaming requests are served from the spinning disks (HDDs), while the SSDs are reserved for random I/O.
- L3 cache node pool hard drive space does not count in GNA SSD percentage calculations.
- In L3 cache, metadata is preferentially cached over data blocks.
- When a node reboots, there is no automatic flushing of L2 blocks to L3 cache.
- Unlike HDDs and SSDs that are used for storage, when an SSD used for L3 cache fails, its drive state should immediately change to REPLACE without a FlexProtect job running, because an L3 cache SSD contains only cache data that does not have to be protected by FlexProtect. After the drive state changes to REPLACE, you can pull and replace the failed SSD.
- Although there is no percentage-completion reporting when converting node pools to use L3 cache, you can estimate progress by tracking SSD space usage throughout the job run. The job impact policy of the FlexProtectPlus or SmartPools job responsible for the L3 conversion can also be reprioritized to run faster or slower.
- InsightIQ reports current and historical L3 cache statistics.
- For L3 cache, the isi_cache_stats prefetch statistics will always read zero because it is purely an eviction cache and does not use data or metadata prefetch.
- L3 cache has a metadata-only mode (as opposed to data and metadata) to support high-density archive storage nodes.
For more information, see the OneFS SmartFlash white paper.