Introducing the Next Generation of PowerScale – the AI Ready Data Platform
Tue, 20 Feb 2024 19:07:47 -0000
Generative AI systems thrive on vast amounts of unstructured data, which are essential for training algorithms to recognize patterns, make predictions, and generate new content. Unstructured data – such as text, images, and audio – does not follow a predefined model, making it more complex and varied than structured data.
Preprocessing unstructured data
Unstructured data, such as text, images, audio, video, and documents, has no predefined format or schema. Preprocessing it involves cleaning, normalizing, and transforming the data into a structured or semi-structured form that AI models can interpret and that can be used for analysis or machine learning.
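As a small illustration of turning unstructured text into a structured form, the sketch below normalizes a raw document and emits a record with a fixed set of fields. The function name `to_record` and the record schema (`id`, `clean_text`, `num_tokens`, `term_counts`) are hypothetical choices for this example, not part of any PowerScale or OneFS interface.

```python
import re
from collections import Counter


def to_record(doc_id: str, raw_text: str) -> dict:
    """Turn one free-text document into a structured record (hypothetical schema)."""
    # Normalize: lowercase and collapse runs of whitespace (including newlines).
    text = re.sub(r"\s+", " ", raw_text.lower()).strip()
    # Tokenize on runs of letters and digits, which drops punctuation.
    tokens = re.findall(r"[a-z0-9]+", text)
    return {
        "id": doc_id,
        "clean_text": text,
        "num_tokens": len(tokens),
        "term_counts": dict(Counter(tokens)),
    }


if __name__ == "__main__":
    rec = to_record("doc-1", "  The F710 is FAST.\nThe F210 is dense!  ")
    print(rec["num_tokens"])           # prints 8
    print(rec["term_counts"]["the"])   # prints 2
```

Real pipelines typically use richer tokenizers or embedding models; the point here is only that free text ends up as rows with named, typed fields.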
Preprocessing unstructured data for generative AI is a crucial step that involves preparing the raw data for use in training AI models. The goal is to enhance the quality and structure of the data to improve the performance of generative models.
There are different steps and techniques for preprocessing unstructured data, depending on the type and purpose of the data. Some common steps are:
- Data completion: This step handles missing or incomplete data, either by filling gaps with average or estimated values or by discarding the data points that have missing fields.
- Data noise reduction: This step involves removing or reducing irrelevant, redundant, or erroneous data, such as duplicates, spelling errors, hidden objects, or background noise.
- Data transformation: This step involves converting the data into a standard or consistent format, including scaling and normalizing numerical data, encoding categorical data, or extracting features from text, image, audio, or video data.
- Data reduction: This step involves reducing the dimensionality or size of the data, either by selecting a subset of relevant features or data points or by applying techniques such as principal component analysis, clustering, or sampling.
- Data validation: This step involves checking the quality and accuracy of the preprocessed data by using statistical methods, visualization tools, or domain knowledge.
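The steps above can be sketched as a small pipeline over numeric features that have already been extracted from unstructured data. This is a minimal, standard-library-only sketch under stated assumptions: missing values are `None` and each column has at least one observed value; completion uses mean imputation; noise reduction only drops exact duplicate rows; transformation is min-max scaling; and reduction uses a simple variance-threshold feature selection (one way of "selecting a subset of relevant features") rather than principal component analysis or clustering. All function names are hypothetical.

```python
from statistics import mean, pvariance
from typing import List, Optional, Tuple

Table = List[List[Optional[float]]]


def complete(rows: Table) -> Table:
    """Data completion: replace missing values (None) with the column mean."""
    cols = list(zip(*rows))
    # Assumes every column has at least one non-missing value.
    means = [mean(v for v in col if v is not None) for col in cols]
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def deduplicate(rows: Table) -> Table:
    """Noise reduction: drop exact duplicate rows, keeping the first occurrence."""
    seen, out = set(), []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


def min_max_scale(rows: Table) -> Table:
    """Transformation: rescale each column to [0, 1]; constant columns become 0.0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
             for j, v in enumerate(row)] for row in rows]


def select_features(rows: Table, min_var: float = 1e-3) -> Tuple[Table, List[int]]:
    """Reduction: keep only the columns whose variance exceeds min_var."""
    cols = list(zip(*rows))
    keep = [j for j, c in enumerate(cols) if pvariance(c) > min_var]
    return [[row[j] for j in keep] for row in rows], keep


def validate(rows: Table) -> bool:
    """Validation: every value is present and lies inside [0, 1]."""
    return all(v is not None and 0.0 <= v <= 1.0 for row in rows for v in row)


def preprocess(rows: Table) -> Tuple[Table, List[int]]:
    rows = complete(rows)
    rows = deduplicate(rows)
    rows = min_max_scale(rows)
    rows, kept = select_features(rows)
    if not validate(rows):
        raise ValueError("preprocessed data failed validation")
    return rows, kept


if __name__ == "__main__":
    raw = [[1.0, 5.0, 7.0], [None, 5.0, 3.0], [3.0, 5.0, None], [3.0, 5.0, None]]
    clean, kept = preprocess(raw)
    print(kept)        # prints [0, 2]; the constant middle column is dropped
    print(len(clean))  # prints 3; one duplicate row was removed
```

Ordering matters: imputing before deduplication lets rows that become identical after completion be caught, and scaling before the variance check makes the threshold comparable across columns.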
These steps can help enhance the quality, reliability, and interpretability of the data, which can improve the performance and outcomes of the analysis or machine learning models.
PowerScale F210 and F710 platform
PowerScale’s continuous innovation extends into the AI era with the introduction of the next generation of PowerEdge-based nodes, including the PowerScale F210 and F710. The new PowerScale all-flash nodes leverage Dell PowerEdge R660, unlocking the next generation of performance. On the software front, the F210 and F710 take advantage of significant performance improvements in PowerScale OneFS 9.7. Combining the hardware and software innovations, the F210 and F710 tackle the most demanding workloads with ease.
The F210 and F710 offer greater density in a 1U platform, with the F710 supporting 10 NVMe SSDs per node and the F210 offering a 15.36 TB drive option. The Sapphire Rapids CPUs provide 19% lower cycles per instruction (CPI), and PCIe Gen 5 doubles per-lane throughput compared to PCIe Gen 4. Additionally, the nodes use DDR5 memory, which offers greater speed and bandwidth.
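To put the PCIe and CPI figures in concrete terms, the sketch below works through the arithmetic. The per-lane signaling rates (16 GT/s for Gen 4, 32 GT/s for Gen 5) and the 128b/130b line encoding come from the PCI Express specifications rather than this article, and the results ignore packet and protocol overhead. The CPI calculation assumes, as an idealization, the same clock frequency and instruction count, which will not hold exactly when comparing different CPU models.

```python
# Per-lane raw signaling rates in GT/s; PCIe Gen 3 and later use 128b/130b encoding.
GT_PER_S = {4: 16.0, 5: 32.0}
ENCODING_EFFICIENCY = 128 / 130


def pcie_gbytes_per_s(gen: int, lanes: int = 16) -> float:
    """Approximate bandwidth per direction in GB/s, before protocol overhead."""
    return GT_PER_S[gen] * ENCODING_EFFICIENCY * lanes / 8  # 8 bits per byte


def throughput_gain_from_cpi(cpi_reduction: float) -> float:
    """Relative instruction-throughput gain at a fixed clock when CPI drops by cpi_reduction."""
    # Instructions per second = clock / CPI, so the speedup factor is 1 / (1 - reduction).
    return 1 / (1 - cpi_reduction) - 1


if __name__ == "__main__":
    print(f"Gen 4 x16: {pcie_gbytes_per_s(4):.1f} GB/s")  # prints Gen 4 x16: 31.5 GB/s
    print(f"Gen 5 x16: {pcie_gbytes_per_s(5):.1f} GB/s")  # prints Gen 5 x16: 63.0 GB/s
    print(f"19% lower CPI -> {throughput_gain_from_cpi(0.19):.1%} more instructions/s")
```

The last line prints a gain of about 23.5%: a 19% drop in CPI corresponds to more than a 19% increase in instructions per second, because throughput scales with the reciprocal of CPI.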
From a software perspective, PowerScale OneFS 9.7 introduces a significant leap in performance. OneFS 9.7 updates the protocol stack, locking, and direct-write. To learn more about OneFS 9.7, check out this article on PowerScale OneFS 9.7.
The OneFS journal in the all-flash F210 and F710 nodes uses a 32 GB configuration of the Dell Software Defined Persistent Memory (SDPM) technology. Previous platforms used NVDIMM-n for persistent memory, which consumed a DIMM slot.
For more details about the F210 and F710, see our other blog post at Dell.com: https://www.dell.com/en-us/blog/next-gen-workloads-require-next-gen-storage/.
Performance
The PowerScale F210 and F710 nodes capitalize on significant hardware and software advances over previous generations. OneFS 9.7 introduces performance-oriented updates to the protocol stack, locking, and direct-write, and the PowerEdge-based servers offer a substantial hardware leap. Together, these advances deliver large performance gains, particularly for streaming reads and writes.
PowerScale F210
The PowerScale F210 is a 1U chassis based on the PowerEdge R660. A cluster requires a minimum of three nodes and supports a maximum of 252 nodes. The F210 is node-pool compatible with the F200.
Table 1. F210 specifications
| Attribute | PowerScale F210 specification |
|---|---|
| Chassis | 1U Dell PowerEdge R660 |
| CPU | Single socket, Intel Sapphire Rapids 4410Y (2.0 GHz, 12 cores) |
| Memory | 128 GB dual-rank DDR5 RDIMMs (8 x 16 GB) |
| Journal | 1 x 32 GB SDPM |
| Front-end networking | 2 x 100 GbE or 25 GbE |
| Infrastructure networking | 2 x 100 GbE or 25 GbE |
| NVMe SSD drives | 4 |
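A quick way to reason about F210 cluster sizes is to multiply out the figures given here: 4 NVMe drives per node, the 15.36 TB drive option, and a cluster size of 3 to 252 nodes. This sketch computes raw capacity only; it deliberately ignores OneFS data-protection overhead and other reserves, so usable capacity will be lower. The constant and function names are hypothetical.

```python
F210_DRIVES_PER_NODE = 4
F210_DRIVE_TB = 15.36  # the 15.36 TB drive option; other drive sizes are not modeled here
MIN_NODES, MAX_NODES = 3, 252


def f210_raw_capacity_tb(nodes: int, drive_tb: float = F210_DRIVE_TB) -> float:
    """Raw (pre-protection) capacity in TB of a cluster built only from F210 nodes."""
    if not MIN_NODES <= nodes <= MAX_NODES:
        raise ValueError(f"a cluster needs {MIN_NODES}-{MAX_NODES} nodes, got {nodes}")
    return nodes * F210_DRIVES_PER_NODE * drive_tb


if __name__ == "__main__":
    print(f"{f210_raw_capacity_tb(3):.2f} TB")    # prints 184.32 TB
    print(f"{f210_raw_capacity_tb(252):.2f} TB")  # prints 15482.88 TB
```

Rejecting node counts outside the 3 to 252 range keeps the helper consistent with the cluster limits stated for the platform.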
PowerScale F710
The PowerScale F710 is a 1U chassis based on the PowerEdge R660. A minimum of three nodes is required to form a cluster, with a maximum of 252 nodes.
Table 2. F710 specifications
| Attribute | PowerScale F710 specification |
|---|---|
| Chassis | 1U Dell PowerEdge R660 |
| CPU | Dual socket, Intel Sapphire Rapids 6442Y (2.6 GHz, 24 cores per socket) |
| Memory | 512 GB dual-rank DDR5 RDIMMs (16 x 32 GB) |
| Journal | 1 x 32 GB SDPM |
| Front-end networking | 2 x 100 GbE or 25 GbE |
| Infrastructure networking | 2 x 100 GbE |
| NVMe SSD drives | 10 |
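To compare the two platforms side by side, the sketch below encodes the per-node figures from the F210 and F710 specification tables and totals them for a cluster of a given size. The `NodeSpec` class and `cluster_totals` helper are hypothetical, and the 24 cores of the 6442Y are treated as per socket, giving 48 cores per F710 node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    name: str
    sockets: int
    cores_per_socket: int
    memory_gb: int
    nvme_drives: int

    @property
    def cores(self) -> int:
        return self.sockets * self.cores_per_socket


# Per-node figures from the F210 and F710 specification tables.
F210 = NodeSpec("F210", sockets=1, cores_per_socket=12, memory_gb=128, nvme_drives=4)
F710 = NodeSpec("F710", sockets=2, cores_per_socket=24, memory_gb=512, nvme_drives=10)


def cluster_totals(spec: NodeSpec, nodes: int) -> dict:
    """Aggregate cores, memory, and drive count for a homogeneous cluster."""
    if not 3 <= nodes <= 252:
        raise ValueError("a cluster needs between 3 and 252 nodes")
    return {
        "cores": spec.cores * nodes,
        "memory_gb": spec.memory_gb * nodes,
        "nvme_drives": spec.nvme_drives * nodes,
    }


if __name__ == "__main__":
    print(cluster_totals(F210, 3))  # prints {'cores': 36, 'memory_gb': 384, 'nvme_drives': 12}
    print(cluster_totals(F710, 3))  # prints {'cores': 144, 'memory_gb': 1536, 'nvme_drives': 30}
```

Per node, the F710 has four times the cores and memory of the F210 and 2.5 times the drive bays, while the memory-per-core ratio stays roughly the same (about 10.7 GB per core) on both platforms.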
For more details on the new PowerScale all-flash platforms, see the PowerScale All-Flash F210 and F710 white paper.
Author: Aqib Kazi