Accelerate Genomics Insights and Discovery with High-Performing, Scalable Architecture from Dell and Intel
Thu, 05 Oct 2023 19:52:19 -0000
Summary
The field of genomics requires the storage and processing of vast amounts of data. In this brief, Intel and Dell technologists discuss key considerations for successfully deploying BeeGFS-based storage for genomics applications on the 16th Generation Dell PowerEdge server portfolio.
Market Positioning
The life sciences industry faces intense pressure to accelerate results and bring new treatments to market while lowering costs, especially in genomics. But life-changing discoveries often depend on processing, storing, and analyzing enormous volumes of genomic sequencing data: one organization alone generates more than 20 TB of new data per day[1], and each modern genome sequencer can produce up to 10 TB of new data per day. Researchers need high-performing solutions that can handle this volume of data alongside demanding analytics and artificial intelligence (AI) workloads, and that are also easy to deploy and scale.
Dell and Intel have collaborated on a bill of materials (BoM) that provides life science organizations with a scalable solution for genomics. This solution features high-performance compute and storage building blocks for one of the leading parallel cluster file systems, BeeGFS. The BoM features four Dell PowerEdge rack server nodes powered by 4th Generation Intel® Xeon® Scalable processors, which deliver the performance needed for faster results and time to production.
The BoM can be tailored for each organization’s architectural needs. For dense configurations, customers can use the Dell PowerEdge C6600 enclosure with PowerEdge C6620 server nodes instead of standard PowerEdge R660 servers (each PowerEdge C6600 chassis can hold up to four PowerEdge C6620 server nodes). If they already have a storage solution in place using InfiniBand fabric, the nodes can be equipped with an additional Mellanox ConnectX-6 HDR100 InfiniBand adapter.
Key Considerations
Key considerations for deploying genomics solutions on Dell PowerEdge servers include:
- Core count: Life sciences organizations often process whole genomes on a cluster, and this processing scales linearly with core count. The Dell PowerEdge solution offers up to 32 cores per CPU to meet performance requirements.
- Memory requirements: This BoM provides 512 GB of DRAM to support specific tasks in workloads that have higher memory requirements, such as running Burrows-Wheeler Aligner algorithms.
- Local and distributed storage: Input/output (I/O) is a big consideration for genomics workloads because datasets can reach hundreds of gigabytes in size. Dell and Intel recommend 3.2 TB of local storage specifically for commonly used genomics tools that read and write many temporary files.
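The per-server figures in the considerations above can be rolled up into cluster totals with simple arithmetic. The following is a minimal sketch, not part of the Dell and Intel BoM itself: the names `NodeSpec` and `cluster_totals` are hypothetical, and it assumes that the DRAM and local storage figures are per-server values, that each server has two CPUs (as listed in the configuration table), and that 1 TB = 1000 GB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Per-server resources taken from the brief (hypothetical helper type)."""
    cpus: int = 2                  # two CPUs per server, per the configuration table
    cores_per_cpu: int = 32        # up to 32 cores per CPU
    dram_gb: int = 512             # assumed to be a per-server figure
    local_scratch_gb: int = 3200   # 3.2 TB local storage, decimal units assumed


def cluster_totals(node: NodeSpec, nodes: int = 4) -> dict:
    """Multiply per-server resources by the node count (four nodes in the BoM)."""
    return {
        "cores": nodes * node.cpus * node.cores_per_cpu,
        "dram_gb": nodes * node.dram_gb,
        "local_scratch_gb": nodes * node.local_scratch_gb,
    }


totals = cluster_totals(NodeSpec())
print(totals)
```

Under these assumptions, the four-node configuration provides 256 cores, 2048 GB of DRAM, and 12.8 TB of local scratch space in aggregate. Changing `nodes` gives a rough view of how the building block scales.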
Available Configurations
| Feature | Configuration |
| --- | --- |
| Platform | 4 x Dell R660 supporting 8 x 2.5" NVMe drives (direct connection) |
| CPU (per server) | 2 x Intel Xeon Gold 6438Y+ (32 cores @ 2.0 GHz) |
| DRAM | 512 GB (16 x 32 GB DDR5-4800) |
| Boot device | Dell BOSS-N1 with 2 x 480 GB M.2 NVMe SSD (RAID 1) |
| Local storage | 1 x 3.2 TB Solidigm D7-P5620 SSD (PCIe Gen4, mixed-use) |
| Capacity storage | Dell Ready Solutions for HPC BeeGFS Storage: 500 GB capacity per 30x coverage whole genome sequence (WGS) to be processed; 800 MB/s total (200 MB/s per node) |
| NIC | Intel E810-XXV Dual Port 10/25GbE SFP28, OCP NIC 3.0 |
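The capacity storage guidance in the table is a pair of simple ratios: 500 GB of BeeGFS capacity per 30x WGS and 200 MB/s of throughput per node. The sketch below shows how those ratios might be applied when sizing a deployment. The function names are hypothetical, the genome count in the example is illustrative only, and decimal units (1 TB = 1000 GB) are assumed.

```python
GB_PER_WGS_30X = 500   # BeeGFS capacity per 30x coverage whole genome, from the table
MBPS_PER_NODE = 200    # storage throughput per compute node, from the table


def beegfs_capacity_tb(genomes: int) -> float:
    """Capacity in TB for a number of 30x genomes (1 TB = 1000 GB assumed)."""
    return genomes * GB_PER_WGS_30X / 1000


def aggregate_throughput_mbps(nodes: int) -> int:
    """Total storage throughput in MB/s for a number of compute nodes."""
    return nodes * MBPS_PER_NODE


# Illustrative inputs: 100 genomes to be processed on the four-node BoM.
print(beegfs_capacity_tb(100))        # TB of capacity for 100 genomes
print(aggregate_throughput_mbps(4))   # MB/s for four nodes
```

For the four-node BoM this reproduces the 800 MB/s total quoted in the table. Actual sizing should also account for retention policy and file system overhead, which this sketch does not model.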
Learn More
Contact your Dell or Intel account team for a customized quote at 1-877-289-3355.
Intel Select Solutions for Genomics Analysis: https://www.intel.com/content/dam/www/public/us/en/documents/solution-briefs/select-genomics-analytics.pdf
Dell HPC Ready Architecture for Genomics: https://infohub.delltechnologies.com/static/media/6cb85249-c458-4c06-bcec-ef35c1a363ca.pdf?dgc=SM&cid=1117&lid=spr4502976221&linkId=112053582
Dell Ready Solutions for HPC BeeGFS Storage: https://www.dell.com/support/kbdoc/en-us/000130963/dell-emc-ready-solutions-for-hpc-beegfs-high-performance-storage
[1] Broad Institute. “Sharing Data and Tools to Enable Discovery.” https://www.broadinstitute.org/sharing-data-and-tools/cloud-computing#top