Clara Parabricks secondary analysis pipeline
Analysis performed on next-generation sequencing (NGS) data is often described as a pipeline. A pipeline is a defined workflow consisting of a collection of methods or operations, where the output of one operation becomes the input for the next. Four critical operations (mapping, alignment, preprocessing, and variant calling) make up most secondary analysis pipelines for whole genome sequencing (WGS).
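The chaining described above can be sketched in a few lines of Python. This is a minimal illustration, not Clara Parabricks code: the stage functions (`map_reads`, `align`, `preprocess`, `call_variants`) and the `run_pipeline` helper are hypothetical placeholders that only record which step ran, where real tools would read and write FASTQ, BAM, and VCF files.

```python
from functools import reduce
from typing import Callable, List, Sequence

# Hypothetical placeholder stages: each one takes the previous stage's
# output and returns its own output with a label appended.
def map_reads(data: List[str]) -> List[str]:
    return data + ["mapped"]

def align(data: List[str]) -> List[str]:
    return data + ["aligned"]

def preprocess(data: List[str]) -> List[str]:
    return data + ["preprocessed"]

def call_variants(data: List[str]) -> List[str]:
    return data + ["variants_called"]

Stage = Callable[[List[str]], List[str]]

def run_pipeline(stages: Sequence[Stage], data: List[str]) -> List[str]:
    """Feed the output of each stage into the next stage, in order."""
    return reduce(lambda acc, stage: stage(acc), stages, data)

result = run_pipeline([map_reads, align, preprocess, call_variants], ["raw_reads"])
print(result)
```

Because each stage depends only on the output of the previous one, an individual stage can in principle be swapped for a functionally equivalent implementation without changing the rest of the workflow.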
Clara Parabricks is a software suite of genomic analysis methods that are designed to take advantage of GPU acceleration. Many of the Clara Parabricks methods are functionally equivalent to existing open-source methods, often achieving greater than 99.9% concordance with their results. Clara Parabricks operations are stitched together to create a secondary analysis pipeline that is matched to the requirements of the sequencing application of interest, such as germline or somatic analysis.
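To make the idea of concordance concrete, the sketch below compares two sets of variant calls. The text does not state how the concordance figure was measured, so the definition used here (shared calls divided by all distinct calls) is an assumption, and the `concordance` function and the example variants are hypothetical.

```python
from typing import Set, Tuple

# A variant is keyed by (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]

def concordance(baseline: Set[Variant], candidate: Set[Variant]) -> float:
    """Fraction of all distinct calls that both call sets agree on.

    Assumed definition for illustration: |intersection| / |union|.
    """
    union = baseline | candidate
    if not union:
        return 1.0  # two empty call sets agree trivially
    return len(baseline & candidate) / len(union)

# Made-up example calls, not real data.
baseline = {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"), ("chr2", 500, "G", "A")}
candidate = {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"), ("chr2", 900, "T", "C")}

print(f"concordance = {concordance(baseline, candidate):.2f}")
```

In practice, call sets are usually compared with dedicated benchmarking tools that also normalize variant representation; the set comparison here only conveys the basic idea.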
Clara Parabricks is available as either a Docker or Singularity container and uses various server GPU resources. The figure below highlights the Clara Parabricks v3.7.0-1 application suite.
Calling the genetic variants present in an individual genome relies on millions to billions of short, error-prone sequence reads. The hand-crafted, parameterized statistical models that are used for variant calling still produce thousands of errors and missed variants in each genome, despite over a decade of effort by thousands of dedicated researchers (Poplin, 2016). To minimize the likelihood of missing a variant, many groups run consensus variant calling pipelines that use more than one variant calling method. Clara Parabricks contains multiple variant callers to enable this approach. For this study, the germline pipeline was used, and the steps are listed in the figure below.
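The consensus approach can be sketched as a merge of the call sets produced by each variant caller. This is a simplified illustration under stated assumptions: the `merge_calls` helper, the caller names, and the example variants are hypothetical, and real pipelines operate on VCF files rather than in-memory sets.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

# A variant is keyed by (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]

def merge_calls(calls_by_caller: Dict[str, Set[Variant]]) -> Dict[Variant, Set[str]]:
    """Map every variant reported by any caller to the callers that reported it."""
    support: Dict[Variant, Set[str]] = defaultdict(set)
    for caller, variants in calls_by_caller.items():
        for variant in variants:
            support[variant].add(caller)
    return dict(support)

# Made-up example calls from two hypothetical callers.
calls = {
    "caller_a": {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T")},
    "caller_b": {("chr1", 1000, "A", "G"), ("chr2", 500, "G", "A")},
}

support = merge_calls(calls)

# Union: keeps anything either caller found, which reduces missed variants.
union_calls = set(support)
# Intersection: keeps only calls that every caller agrees on.
agreed_calls = {v for v, callers in support.items() if len(callers) == len(calls)}

print(len(union_calls), "calls in union;", len(agreed_calls), "supported by all callers")
```

Keeping the per-variant support set, rather than only the union, lets downstream filtering decide how much weight to give calls reported by a single method.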
DeepVariant, a variant calling method developed by Google, applies a deep convolutional neural network and has been shown to outperform expert-driven statistical methods. However, calling variants for a 30x human genome and writing them out to a gVCF file takes approximately four hours and requires at least 1,024 compute cores. The Clara Parabricks GPU-accelerated version of DeepVariant runs in less than 20 minutes for a 30x genome. This fast analysis time makes it practical to use DeepVariant alone, or with other germline callers such as GATK HaplotypeCaller, while minimizing the risk of creating a secondary analysis backlog.
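The runtimes quoted above imply a lower bound on the speedup, which the following back-of-the-envelope calculation works through. It uses only the two figures from the text; the per-day throughput numbers additionally assume back-to-back DeepVariant runs on a single system with no other pipeline steps, which is an illustrative simplification rather than a measured result.

```python
# Figures taken from the text.
cpu_minutes = 4 * 60   # "approximately four hours" for the CPU version
gpu_minutes = 20       # upper bound: "less than 20 minutes" on GPU

# Because 20 minutes is an upper bound, this ratio is a lower bound.
min_speedup = cpu_minutes / gpu_minutes

# Assumption for illustration: runs execute back to back, 24 hours a day.
minutes_per_day = 24 * 60
cpu_genomes_per_day = minutes_per_day // cpu_minutes
gpu_genomes_per_day = minutes_per_day // gpu_minutes

print(f"speedup of at least {min_speedup:.0f}x")
print(f"about {cpu_genomes_per_day} genomes/day on CPU, "
      f"at least {gpu_genomes_per_day} genomes/day on GPU")
```

Under these assumptions, variant calling moves from roughly six genomes per day to at least 72, which is why the shorter runtime reduces the chance of a secondary analysis backlog.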