Reducing secondary analysis time to keep pace with NGS data generation
Due to the size of individual sample data and the volume of samples, WGS secondary analysis is a compute- and storage-intensive process. The most commonly used and cited methods for secondary analysis are the Burrows-Wheeler Aligner (BWA-MEM) (Li, 2009) and the Genome Analysis Toolkit (GATK) (McKenna, 2010). The Broad GATK Best Practices workflow (pipeline) requires over 30 hours to process a 40x WGS sample. The system used had 48 cores of Intel Xeon E5-2697 v2 processors (12 cores each, 2.7 GHz) with 128 GB RAM and 3.2 TB SSD, running CentOS 6 (Goyal, 2017). The most recent Dell Technologies test results, gathered as part of a DTVD, show roughly 24 hours to process a 50x WGS sample with a single Intel Xeon Platinum 8358 processor. The system used was a Dell PowerEdge C6520 with two Intel Xeon Platinum 8358 processors (32 cores, 2.60 GHz) and 512 GB RAM, with BeeGFS storage (see the DTVD for HPC BeeGFS High Capacity Storage), running RHEL 8.3 (kernel 4.18.0-240.22.1). Analyzing a few genomes per day is far from ideal: a modern, high-throughput NGS instrument can generate unanalyzed, raw NGS data for 20 or more WGS samples per day.
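The gap between sequencing output and analysis throughput can be made concrete with simple arithmetic. The sketch below estimates how many compute nodes must run secondary analysis in parallel to keep pace with an instrument, assuming (as an illustration, not a sizing recommendation) that each sample fully occupies one node for its entire wall-clock time:

```python
import math

def nodes_to_keep_pace(hours_per_sample: float, samples_per_day: float) -> int:
    """Minimum number of nodes running pipelines concurrently so that
    daily analysis throughput matches daily sequencer output.

    Assumes one sample occupies one node end to end (a simplification;
    real pipelines may share or oversubscribe nodes).
    """
    samples_per_node_per_day = 24.0 / hours_per_sample
    return math.ceil(samples_per_day / samples_per_node_per_day)

# Figures from the text: ~24 h per 50x WGS on one server, and a
# sequencer producing 20 or more WGS per day.
print(nodes_to_keep_pace(24.0, 20.0))  # -> 20

# With the older ~30 h-per-40x-WGS GATK baseline, the gap widens.
print(nodes_to_keep_pace(30.0, 20.0))  # -> 25
```

Under these assumptions, a single node falls 19 or more genomes behind every day, which is the motivation for accelerating the per-sample pipeline rather than only scaling out.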
When choosing technologies that enable secondary analysis of NGS data, consider all the critical variables that may affect total secondary analysis (wall-clock) time. These variables are wide-ranging: the type of NGS sequencing application, including the sequencing coverage per sample; the supporting analysis software and strategies specific to that application; output file types; application file-access patterns; and the number and type of available computing resources.