For read caching to be beneficial, the cache must already contain data before it is requested. The storage system must accurately determine file access patterns and prepopulate the cache with data and metadata blocks before they are requested. OneFS uses two primary sources of information to predict a file's access pattern.
This technique is known as prefetching: predictively copying data into a cache before it is requested mitigates the latency of the operation. Data prefetching is employed extensively in OneFS and is a significant beneficiary of the OneFS flexible file allocation strategy.
Flexible allocation means determining the best layout for a file based on several factors.
The performance effect of flexible allocation is that a file is placed across as many drives as possible.
The most straightforward application of prefetch is file data, where linear access is common for unstructured data, such as media files. Reading and writing of such files generally starts at the beginning and continues unimpeded to the end of the file. After a few requests, it becomes highly likely that a file is being streamed to the end.
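The streaming heuristic described above can be sketched in a few lines. This is an illustrative model, not OneFS code: the class name, threshold, and return convention are assumptions chosen to show how a short run of sequential reads can justify switching to aggressive prefetch.

```python
# Hedged sketch (not OneFS source): after a few back-to-back sequential
# reads, a file is likely being streamed and prefetch can run ahead.

class SequentialDetector:
    """Tracks read offsets on one file and flags a streaming pattern."""

    def __init__(self, threshold=3):
        self.threshold = threshold   # sequential hits before we call it streaming
        self.next_expected = 0       # byte offset of the next sequential read
        self.hits = 0

    def observe(self, offset, length):
        """Record a read; return True once access looks like streaming."""
        if offset == self.next_expected:
            self.hits += 1
        else:
            self.hits = 0            # a non-sequential read resets the run
        self.next_expected = offset + length
        return self.hits >= self.threshold


det = SequentialDetector()
reads = [(0, 4096), (4096, 4096), (8192, 4096), (12288, 4096)]
print([det.observe(off, ln) for off, ln in reads])  # [False, False, True, True]
```

Once the detector fires, a real implementation would begin issuing read-ahead requests for blocks the client has not yet asked for.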
However, for streaming reads from low-latency SSD media, the cache benefit of prefetching is typically less than its overhead. To address this, OneFS 9.5 and later releases automatically disable L2 cache prefetching for concurrent and streaming reads from SSD media, in a process known as Direct Read. L2 caching is still used when prefetching data blocks from spinning disk (HDD).
Similarly, OneFS 9.7 introduces Direct Write (also known as NCIO, or non-cached I/O), a feature that targets the NVMe-based F-series platforms and bypasses cache to increase write throughput.
Writes to newly allocated blocks are identified and queued directly to the drives, skipping the L2 cache and journal. This allows OneFS to better utilize the NVMe drives, reducing I/O access latency and freeing up L2 cache and journal bandwidth, which benefits streaming and heavy sequential write workloads.
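The write routing described above can be summarized as a simple decision. The function below is a hedged illustration only (the name and return strings are invented for clarity), showing the two paths a write can take under Direct Write.

```python
# Hedged illustration (not OneFS code): Direct Write sends writes to
# newly allocated blocks straight to the drive queue, while writes to
# existing blocks take the normal cached-and-journaled path.

def route_write(block_is_newly_allocated: bool) -> str:
    """Pick a write path in the spirit of Direct Write (NCIO)."""
    if block_is_newly_allocated:
        return "direct-to-drive"      # skip L2 cache and journal
    return "cached-and-journaled"     # existing blocks take the normal path

print(route_write(True))   # direct-to-drive
print(route_write(False))  # cached-and-journaled
```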
Direct Write is analogous to the L2-bypass Direct Read functionality described above. Enabled by default in OneFS 9.7, it requires no license and is not user configurable, so it has no CLI or WebUI interface.
OneFS data prefetch strategies can be configured either from the command line or using SmartPools. File data prefetch behavior can be controlled down to a per-file granularity using the isi set command’s access pattern setting. The available selectable file access patterns include concurrency (the default), streaming, and random.
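As a hedged illustration of the per-file setting (paths are examples, and exact flag syntax and accepted values can vary by OneFS release, so consult the `isi set` man page or the OneFS CLI guide for your version):

```shell
# Set a streaming access pattern on a media file:
isi set -a streaming /ifs/data/media/clip.mov

# Revert to the default (concurrency) pattern:
isi set -a default /ifs/data/media/clip.mov

# Apply recursively to a directory tree:
isi set -R -a streaming /ifs/data/media
```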
Metadata prefetch occurs for the same reason as file data, particularly benefiting scanning operations, such as finds and tree walks. To ensure that metadata keeps up with data block prefetching, OneFS 9.5 and later releases include both a metadata prefetcher and lock prefetcher. They are automatically enabled for concurrent and streaming data access patterns, but remain inactive for random access where there is typically little benefit.
OneFS also provides a mechanism for prefetching files based on their nomenclature. In film and TV production, "streaming" often takes a different form than streaming a single audio or video file: each frame of a movie is frequently stored in an individual file. Streaming therefore means reading a set of image files in sequence, so prefetching across files is important. Because these files are often only a subset of a directory, directory entry prefetch does not apply. Ideally, the client application would drive this prefetching itself, but in practice it rarely does.
To address this issue, OneFS has a file name prefetch facility. File name prefetch is disabled by default but, like file data prefetch, can be enabled through file access settings. When enabled, file name prefetch guesses the next sequence of files to be read by matching against several generic naming patterns.
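The idea of guessing successors from a naming pattern can be sketched as follows. This is a minimal illustration, not OneFS's actual matcher: the function name and the single trailing-number pattern are assumptions, whereas the real facility matches several generic patterns.

```python
# Hedged sketch: guess the next files in a numerically sequenced set
# (e.g. per-frame image files) so they could be prefetched ahead of time.
import re

def guess_next_files(name, count=3):
    """Return likely successors of a numerically sequenced file name."""
    m = re.search(r"(\d+)(\.\w+)$", name)    # trailing number before the extension
    if not m:
        return []                            # no sequence number to extrapolate
    digits, ext = m.group(1), m.group(2)
    prefix = name[: m.start(1)]
    width = len(digits)                      # preserve zero padding
    start = int(digits)
    return [f"{prefix}{start + i:0{width}d}{ext}" for i in range(1, count + 1)]

print(guess_next_files("frame_0041.dpx"))
# ['frame_0042.dpx', 'frame_0043.dpx', 'frame_0044.dpx']
```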
Flexible file handle affinity (FHA) is a read-side algorithm designed to make better use of the internal threads that read files. Using system configuration options and read access profiling, the number of operations per thread can be tuned to improve the efficiency of reads. FHA maps file handles to worker threads based on a combination of factors.
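One way to picture a handle-to-thread affinity is a stable hash mapping, so that reads of the same file keep landing on the same thread and its warm state. This is a hedged sketch under assumed names and pool size, not the FHA implementation:

```python
# Hedged sketch (illustrative, not FHA source): deterministically map a
# file handle to a worker thread so repeated reads of one file reuse the
# same thread.
import zlib

NUM_WORKERS = 8  # assumed worker-thread pool size

def worker_for_handle(file_handle: int) -> int:
    """Return a stable worker index in [0, NUM_WORKERS) for a handle."""
    return zlib.crc32(file_handle.to_bytes(8, "little")) % NUM_WORKERS

print(worker_for_handle(1234) == worker_for_handle(1234))  # True
```

A real affinity scheme would additionally weigh load and the tuned operations-per-thread setting rather than hashing alone.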
Note: Prefetch does not directly apply to the L3 cache because blocks that are evicted from L2 cache exclusively populate L3 cache. However, prefetch can affect L3 cache indirectly if and when prefetched blocks are evicted from L2 cache and are considered for inclusion in L3 cache.