ECS metadata
ECS maintains its own metadata to track data locations and transaction history. This metadata is kept in logical tables and journals.
The tables hold key-value pairs that store information about the objects. A hash function enables fast lookups of the value associated with a key. These key-value pairs are stored in a B+ tree for fast indexing of data locations. Storing the key-value pairs in a balanced, searchable tree structure such as a B+ tree allows the locations of data and metadata to be accessed quickly. To further enhance query performance of these logical tables, ECS implements a two-level log-structured merge (LSM) tree: a smaller tree is held in memory (the memory table), and the main B+ tree resides on disk. A lookup of a key-value pair first queries the memory table; if the value is not in memory, the lookup falls back to the main B+ tree on disk.
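This two-level lookup can be pictured with a short sketch. The following Python snippet is illustrative only, not ECS source code: the memory table is a plain dictionary, the on-disk B+ tree is approximated with a sorted key list and binary search, and class names such as DiskTree and TwoLevelTable are hypothetical.

```python
import bisect


class DiskTree:
    """Stand-in for the on-disk B+ tree: sorted keys with parallel values."""

    def __init__(self):
        self._keys = []
        self._values = []

    def put(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def get(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None


class TwoLevelTable:
    """Memory table in front of the on-disk tree, queried in that order."""

    def __init__(self):
        self.memtable = {}          # recent key-value pairs still in memory
        self.disktree = DiskTree()  # bulk of the table on disk

    def get(self, key):
        if key in self.memtable:       # 1. query the memory table first
            return self.memtable[key]
        return self.disktree.get(key)  # 2. fall back to the B+ tree on disk


table = TwoLevelTable()
table.disktree.put("ImgA", "C1:offset:length")   # already dumped to disk
table.memtable["FileA"] = "C4:offset:length"     # still only in memory
print(table.get("ImgA"), table.get("FileA"))
```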
Transaction history is recorded in journal logs, which are written to disk. The journals track the index transactions that are not yet committed to the B+ tree. After a transaction is logged in the journal, the memory table is updated. When the memory table becomes full, or after a set period of time, the table is sorted, merged, and dumped to the B+ tree on disk, and a checkpoint is then recorded in the journal. The following figure illustrates this process.
Both the journals and B+ trees are written to chunks.
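The write path described above can be sketched in the same illustrative style. The snippet below is a minimal sketch with hypothetical names (IndexWriter, put, _dump); it shows only the order of operations: log to the journal, update the memory table, then sort, merge, and dump to the on-disk tree and record a checkpoint.

```python
class IndexWriter:
    def __init__(self, dump_threshold=4):
        self.journal = []        # append-only transaction log (written to chunks)
        self.memtable = {}       # in-memory table of recent updates
        self.disktree = {}       # stand-in for the B+ tree on disk
        self.dump_threshold = dump_threshold

    def put(self, key, value):
        self.journal.append(("put", key, value))   # 1. log the transaction
        self.memtable[key] = value                 # 2. update the memory table
        if len(self.memtable) >= self.dump_threshold:
            self._dump()                           # 3. dump when the table is full

    def _dump(self):
        for key in sorted(self.memtable):          # sort and merge into the tree
            self.disktree[key] = self.memtable[key]
        self.memtable.clear()
        self.journal.append(("checkpoint",))       # 4. record a checkpoint


writer = IndexWriter(dump_threshold=2)
writer.put("C1", "Node1:Disk1:File1:offset1:length")
writer.put("C2", "Node1:Disk3:File1:offset1:length")
print(writer.journal[-1])   # ('checkpoint',)
```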
ECS uses several different tables, and each table can grow quite large. To optimize the performance of table lookups, each table is divided into partitions that are distributed across the nodes in a VDC/site. The node where a partition is written becomes the owner, or authority, of that partition (section) of the table.
One such table is the chunk table, which tracks the physical location of chunk fragments and replica copies on disk. The following table shows a sample partition of the chunk table. Each chunk location is identified by the node, the disk within the node, the file within that disk, the offset within the file, and the length of the data. Chunk ID C1 is erasure coded, and chunk ID C2 is triple mirrored. For more details about triple mirroring and erasure coding, see Advanced data-protection methods.
| Chunk ID | Chunk location |
|---|---|
| C1 | Node1:Disk1:File1:offset1:length, Node2:Disk1:File1:offset1:length, Node3:Disk1:File1:offset1:length, Node4:Disk1:File1:offset1:length, Node5:Disk1:File1:offset1:length, Node6:Disk1:File1:offset1:length, Node7:Disk1:File1:offset1:length, Node8:Disk1:File1:offset1:length, Node1:Disk2:File1:offset1:length, Node2:Disk2:File1:offset1:length, Node3:Disk2:File1:offset1:length, Node4:Disk2:File1:offset1:length, Node5:Disk2:File1:offset1:length, Node6:Disk2:File1:offset1:length, Node7:Disk2:File1:offset1:length, Node8:Disk2:File1:offset1:length |
| C2 | Node1:Disk3:File1:offset1:length, Node2:Disk3:File1:offset1:length, Node3:Disk3:File1:offset1:length |
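In code form, each row of this table can be thought of as a chunk ID mapped to a list of physical locations. The snippet below is a hypothetical data model for illustration; the field and variable names are not ECS internals, and the offset and length values are placeholders.

```python
from dataclasses import dataclass


@dataclass
class ChunkLocation:
    """One physical location: node, disk, file, offset, and length."""
    node: str
    disk: str
    file: str
    offset: str
    length: str


# Partition of the chunk table: chunk ID -> physical locations on disk.
# A triple-mirrored chunk (C2) has three complete copies; an erasure-coded
# chunk (C1) would instead list one location per fragment.
chunk_table = {
    "C2": [
        ChunkLocation("Node1", "Disk3", "File1", "offset1", "length"),
        ChunkLocation("Node2", "Disk3", "File1", "offset1", "length"),
        ChunkLocation("Node3", "Disk3", "File1", "offset1", "length"),
    ],
}

print(len(chunk_table["C2"]))  # 3 mirrored copies
```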
Another example is the object table, which maps object names to chunks. The following table shows an example of a partition of an object table; it lists, for each object, the chunk or chunks in which the object resides and the object's location within them.
| Object name | Chunk ID |
|---|---|
| ImgA | C1:offset:length |
| FileA | C4:offset:length, C6:offset:length |
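Taken together, the object table and the chunk table describe how a read is resolved: the object name is looked up to find its chunk references, and each chunk is looked up to find its physical locations. The following sketch assumes placeholder contents for both tables (including locations for C4 and C6 that are not shown above) and a hypothetical helper named locate_object.

```python
# Partition of the object table: object name -> (chunk ID, offset, length).
object_table = {
    "ImgA": [("C1", "offset", "length")],
    "FileA": [("C4", "offset", "length"), ("C6", "offset", "length")],
}

# Partition of the chunk table: chunk ID -> physical locations (placeholders).
chunk_table = {
    "C1": ["Node1:Disk1:File1:offset1:length", "Node2:Disk1:File1:offset1:length"],
    "C4": ["Node1:Disk4:File1:offset1:length"],
    "C6": ["Node2:Disk5:File1:offset1:length"],
}


def locate_object(name):
    """Resolve an object name to the physical locations of its data."""
    locations = []
    for chunk_id, offset, length in object_table[name]:
        locations.append((chunk_id, offset, length, chunk_table.get(chunk_id, [])))
    return locations


print(locate_object("FileA"))
```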
A service called vnest, which runs on all nodes, maintains the mapping of table partition owners. The following table shows an example of a portion of a vnest mapping table.
| Table ID | Table partition owner |
|---|---|
| Table 0 P1 | Node 1 |
| Table 0 P2 | Node 2 |
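As a sketch, directing a lookup to the right node amounts to finding the partition for a key and reading that partition's owner from the vnest-style mapping. The partition-selection logic below (hashing the key modulo the partition count) and the function owner_for_key are assumptions for illustration, not the actual ECS scheme.

```python
# vnest-style mapping: (table, partition) -> owning node
partition_owner = {
    ("Table 0", "P1"): "Node 1",
    ("Table 0", "P2"): "Node 2",
}


def owner_for_key(table, key, num_partitions=2):
    """Pick a partition for the key (illustrative hash) and return its owner."""
    partition = "P{}".format(hash(key) % num_partitions + 1)
    return partition, partition_owner[(table, partition)]


print(owner_for_key("Table 0", "ImgA"))
```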