GROMACS — with Ice Lake on Dell EMC PowerEdge Servers
Fri, 02 Dec 2022 05:33:27 -0000
3rd Generation Intel Xeon® Scalable processors (architecture code named Ice Lake) are Intel's successors to Cascade Lake. New features include up to 40 cores per processor, eight memory channels with support for 3200 MT/s memory speed, and PCIe Gen4.
The HPC and AI Innovation Lab at Dell EMC had access to a few systems and this blog presents the results of our initial benchmarking study on a popular open-source molecular dynamics application – GROningen MAchine for Chemical Simulations (GROMACS).
Molecular dynamics (MD) simulation is a popular technique for studying the atomistic behavior of any molecular system. It analyzes the trajectories of atoms and molecules as the dynamics of the system progress over time.
At the HPC and AI Innovation Lab, we have conducted research on SARS-CoV-2, where applications like GROMACS helped researchers identify molecules that bind to the spike protein of the virus and block it from infecting human cells. Another use case of MD simulation in medicinal biology is iterative drug design through prediction of protein-ligand docking (usually modelling a drug-to-target protein interaction).
Overview of GROMACS
GROMACS is a versatile package for performing MD simulations, that is, solving the Newtonian equations of motion for systems with hundreds to millions of particles. GROMACS can run on CPUs and GPUs in single-node and multi-node (cluster) configurations. It is free, open-source software released under the GNU General Public License (GPL). Check out this page for more details on GROMACS.
Hardware and software configurations
Table 1: Hardware and Software testbed details
Component | Dell EMC PowerEdge R750 server | Dell EMC PowerEdge R750 server | Dell EMC PowerEdge C6520 server | Dell EMC PowerEdge C6520 server | Dell EMC PowerEdge C6420 server | Dell EMC PowerEdge C6420 server |
SKU | Xeon 8380 | Xeon 8358 | Xeon 8352Y | Xeon 6330 | Xeon 8280 | Xeon 6252 |
Cores/Socket | 40 | 32 | 32 | 28 | 28 | 24 |
Base Frequency | 2.30 GHz | 2.60 GHz | 2.20 GHz | 2.00 GHz | 2.70 GHz | 2.10 GHz |
TDP | 270 W | 250 W | 205 W | 205 W | 205 W | 150 W |
L3 Cache | 60 MB | 48 MB | 48 MB | 42 MB | 38.5 MB | 37.75 MB |
Operating System | Red Hat Enterprise Linux 8.3 (kernel 4.18.0-240.22.1.el8_3.x86_64) on all servers |
Memory | 16 GB x 16 (2Rx8) 3200 MT/s on the Ice Lake servers; 16 GB x 12 (2Rx8) 2933 MT/s on the Cascade Lake servers |
BIOS/CPLD | 1.1.2/1.0.1 |
Interconnect | NVIDIA Mellanox HDR on the Ice Lake servers; NVIDIA Mellanox HDR100 on the Cascade Lake servers |
Compiler | Intel Parallel Studio 2020 (Update 4) |
GROMACS | 2021.1 |
Datasets used for performance analysis
Table 2: Description of datasets used for performance analysis
Datasets/Download Link | Description | Electrostatics | Atoms | System Size |
Movement of Water | Simulates the motion of many water molecules for a given box size and temperature. | Particle Mesh Ewald (PME) | 1536K | Small |
Movement of Water | The same water system at double the size. | Particle Mesh Ewald (PME) | 3072K | Large |
HECBioSim | 1.4M atom system: a pair of hEGFR dimers of 1IVO and 1IVO. | Particle Mesh Ewald (PME) | 1.5M | Small |
HECBioSim | 3M atom system: a pair of hEGFR tetramers of 1IVO and 1IVO. | Particle Mesh Ewald (PME) | 3M | Large |
Prace – Lignocellulose | Simulates lignocellulose; the tpr file was obtained from the PRACE website. | Reaction Field (rf) | 3M | Large |
Compilation Details
We compiled GROMACS from source (version 2021.1) using the Intel 2020 Update 5 compiler to take advantage of AVX2 and AVX-512 optimizations, and the Intel MKL FFT library. The new version of GROMACS delivers a significant performance gain due to improvements in its parallelization algorithms. The GROMACS build system and the gmx mdrun tool have built-in, configurable intelligence that detects your hardware and makes effective use of it.
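For reference, the following is a minimal sketch of a build driver reflecting the configuration described above (MPI build, AVX-512 SIMD kernels, MKL FFT). The paths, install prefix, and compiler wrapper names are illustrative assumptions, not our exact build scripts, and they presume the Intel compilers and MKL are already loaded in the environment.

```python
# build_gromacs.py -- sketch of the CMake configuration used for GROMACS 2021.1.
# Assumes Intel Parallel Studio compiler wrappers (mpiicc/mpiicpc) and MKL are
# available via environment modules; paths below are illustrative placeholders.
import subprocess

SRC = "gromacs-2021.1"                  # unpacked source tree (hypothetical path)
BUILD = f"{SRC}/build"
PREFIX = "/opt/apps/gromacs/2021.1"     # hypothetical install prefix

configure = [
    "cmake", "-S", SRC, "-B", BUILD,
    "-DCMAKE_C_COMPILER=mpiicc",
    "-DCMAKE_CXX_COMPILER=mpiicpc",
    "-DGMX_MPI=ON",                     # MPI build for multi-node runs
    "-DGMX_SIMD=AVX_512",               # AVX-512 kernels on Ice Lake / Cascade Lake
    "-DGMX_FFT_LIBRARY=mkl",            # Intel MKL FFT
    f"-DCMAKE_INSTALL_PREFIX={PREFIX}",
]
subprocess.run(configure, check=True)
subprocess.run(["cmake", "--build", BUILD, "--parallel", "16"], check=True)
subprocess.run(["cmake", "--install", BUILD], check=True)
```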
Objective of Benchmarking
Our objective is to quantify the performance of GROMACS across different test cases: first, a performance evaluation on the different Ice Lake processor models listed in Table 1; then a comparison of 2nd Generation and 3rd Generation Xeon Scalable processors (Cascade Lake vs. Ice Lake); and finally, multi-node scalability with hyper-threading enabled and disabled.
To evaluate the datasets with an appropriate metric, we added the relevant high-level compiler flags, used electrostatic load balancing (such as PME), tested multiple rank counts and separate PME ranks, varied nstlist values, and established a benchmarking paradigm for GROMACS.
The typical time scales of the simulated systems are on the order of microseconds (µs) or nanoseconds (ns). We measure performance for each dataset's simulation in nanoseconds of simulated time per day of wall-clock time (ns/day).
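To illustrate this benchmarking paradigm, the sketch below sweeps MPI ranks, separate PME ranks, and nstlist values for one dataset and reads the ns/day figure back from the mdrun log. The binary name, tpr file, and parameter values are placeholders, not our exact job scripts; the actual runs used the MPI launcher and core counts appropriate to each server in Table 1.

```python
# sweep_mdrun.py -- sketch of the benchmark sweep: vary MPI ranks, separate PME
# ranks, and nstlist, then parse ns/day from the GROMACS log.
import itertools
import subprocess

TPR = "water1536.tpr"    # hypothetical benchmark input
RANKS = [40, 80]         # MPI ranks per node (one or two sockets' worth)
NPME = [0, 8]            # 0 = no separate PME ranks
NSTLIST = [40, 80]       # neighbor-list update intervals to test

def ns_per_day(logfile: str) -> float:
    """Pull the ns/day value from the 'Performance:' line of an mdrun log."""
    with open(logfile) as fh:
        for line in fh:
            if line.lstrip().startswith("Performance:"):
                return float(line.split()[1])
    raise RuntimeError(f"no Performance line in {logfile}")

for nranks, npme, nstlist in itertools.product(RANKS, NPME, NSTLIST):
    log = f"md_np{nranks}_pme{npme}_nl{nstlist}.log"
    cmd = ["mpirun", "-np", str(nranks), "gmx_mpi", "mdrun",
           "-s", TPR, "-npme", str(npme), "-nstlist", str(nstlist),
           "-resethway", "-noconfout", "-nsteps", "10000", "-g", log]
    subprocess.run(cmd, check=True)
    print(f"{log}: {ns_per_day(log):.2f} ns/day")
```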
Performance Analyses on Single Node
Figure 1(a): Single node performance of Water 1536K and Water 3072K on Ice Lake processor models
Figure 1(b): Single node performance of Lignocellulose 3M on Ice Lake processor models
Figure 1(c): Single node performance of HecBioSim 1.4M and HecBioSim 3M on Ice Lake processor models
Figures 1(a), 1(b), and 1(c) show the single-node performance analyses for the three datasets listed in Table 2 across the four processor models available for evaluating GROMACS.
Figure 2: Relative performance of GROMACS across the datasets with Intel Ice Lake processor models
For ease of comparison across the various datasets, the relative performance of each processor model has been consolidated into a single graph. However, it is worth noting that each dataset behaves individually when performance is considered, as each uses different molecular topology input files (tpr) and configuration files.
Individual dataset performance is mentioned in Figures 1(a), 1(b), and 1(c) respectively.
Figure 2 shows that increasing the core count of the processor model increases performance, with the gain depending on the dataset used. Here, we observe that the smaller datasets (Water 1536K and HecBioSim 1.4M) have an additional 5 to 6 percent performance gain compared to the larger datasets (Water 3072K, HecBioSim 3M, and Lignocellulose 3M).
Next, by comparing the relative numbers of the Xeon 8380 (40C) against the baseline Xeon 6330 (28C), we found a 30 to 50 percent performance gain, depending on the dataset, as the core count increases from 28 to 40. A fraction of this gain comes from the higher frequency of the processor model.
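For clarity, the relative performance in Figure 2 is simply each SKU's measured ns/day divided by the Xeon 6330 baseline for the same dataset. A minimal helper is sketched below; it takes whatever ns/day values you have measured (no results are implied here).

```python
# relative_perf.py -- normalize ns/day per SKU to the Xeon 6330 baseline,
# as done for Figure 2. Callers supply their own measured values.
def relative_performance(ns_day_by_sku: dict, baseline: str = "Xeon 6330") -> dict:
    """Return each SKU's ns/day divided by the baseline SKU's ns/day."""
    base = ns_day_by_sku[baseline]
    return {sku: value / base for sku, value in ns_day_by_sku.items()}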
Performance Analyses on Cascade Lake vs Ice Lake
Figure 3(a): Performance of GROMACS on Cascade Lake (Xeon 6252) vs Ice Lake (Xeon 6330)
Figure 3(b): Performance of GROMACS on Cascade Lake (Xeon 8280) vs Ice Lake (Xeon 8380)
We verified that the memory configuration was appropriately sized for the datasets. To begin, we compared each processor with its previous-generation counterpart. For the performance benchmark comparisons, we selected the Cascade Lake SKUs closest to their Ice Lake counterparts in terms of hardware features such as cache size, TDP, and processor base/turbo frequency, and recorded the maximum ns/day value attained for each of the datasets listed in Table 2.
Figure 3a shows that the Ice Lake Xeon 6330 is up to 50 to 75 percent faster than the Xeon 6252. The Xeon 6330 has 16 percent more cores and 9 percent higher memory bandwidth. Figure 3b shows that the Ice Lake Xeon 8380 is up to 50 to 65 percent faster than the Xeon 8280 in single-node tests; this is in line with its 42 percent more cores and 9 percent higher memory bandwidth.
This result is also due to higher processor speed, so more data can be accessed by each core; the datasets are memory intensive, and some percentage of the gain comes from the frequency improvement. Overall, the Ice Lake results demonstrated a substantial performance improvement for GROMACS over Cascade Lake processors.
Performance Analysis on Multi-Node
Figure 4(a): Scalability of Water 1536K with hyper-threading disabled (80C) vs hyper-threading enabled (160C) w/ Xeon 8380; the dotted line represents the delta between hyper-threading enabled and hyper-threading disabled
Figure 4(b): Scalability of Water 3072K with hyper-threading disabled (80C) vs hyper-threading enabled (160C) w/ Xeon 8380; the dotted line represents the delta between hyper-threading enabled and hyper-threading disabled
Figure 4(c): Scalability of HecBioSim 1.4M with hyper-threading disabled (80C) vs hyper-threading enabled (160C) w/ Xeon 8380; the dotted line represents the delta between hyper-threading enabled and hyper-threading disabled
Figure 4(d): Scalability of HecBioSim 3M with hyper-threading disabled (80C) vs hyper-threading enabled (160C) w/ Xeon 8380; the dotted line represents the delta between hyper-threading enabled and hyper-threading disabled
Figure 4(e): Scalability of Lignocellulose 3M with hyper-threading disabled (80C) vs hyper-threading enabled (160C) w/ Xeon 8380; the dotted line represents the delta between hyper-threading enabled and hyper-threading disabled
For multi-node tests, the test bed was configured with an NVIDIA Mellanox HDR interconnect running at 200 Gbps, with each server based on Ice Lake processors. We achieved the expected near-linear performance scalability for GROMACS up to eight nodes with hyper-threading disabled, and approximately 7.25x at eight nodes with hyper-threading enabled, across the datasets. All cores in each server were used while running these benchmarks. The performance increase is close to linear across all the dataset types as the core count increases.
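The scaling curves and the hyper-threading delta shown in Figure 4 can be derived from the measured ns/day values with a small helper such as the sketch below (the inputs are placeholders for measured data, not results):

```python
# gmx_scaling.py -- sketch of how the Figure 4 curves are computed: normalize
# ns/day at each node count by the single-node value, then report the delta
# between hyper-threading enabled and disabled. Inputs are placeholders.
def speedup(ns_day_by_nodes: dict) -> dict:
    """Map node count -> speedup relative to the single-node run."""
    base = ns_day_by_nodes[1]
    return {nodes: value / base for nodes, value in ns_day_by_nodes.items()}

def ht_delta_percent(ht_off: dict, ht_on: dict) -> dict:
    """Percent difference of HT-enabled vs HT-disabled speedup per node count."""
    s_off, s_on = speedup(ht_off), speedup(ht_on)
    return {n: 100.0 * (s_on[n] - s_off[n]) / s_off[n] for n in s_off}
```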
Conclusion
The Ice Lake processor-based Dell EMC PowerEdge servers, with notable hardware feature upgrades over Cascade Lake, show up to a 50 to 60 percent performance gain across the datasets used for benchmarking GROMACS. Hyper-threading should be disabled for the benchmarks addressed in this blog to get better scalability above eight nodes. The smaller datasets in this blog benefit by an additional 5 to 6 percent compared to the larger ones as the core count increases.
Watch our blog site for updates!
Related Blog Posts
LAMMPS — with Ice Lake on Dell EMC PowerEdge Servers
Mon, 30 Aug 2021 21:09:22 -0000
3rd Generation Intel Xeon® Scalable processors (architecture code named Ice Lake) are Intel's successors to Cascade Lake. New features include up to 40 cores per processor, eight memory channels with support for 3200 MT/s memory speed, and PCIe Gen4. The HPC and AI Innovation Lab at Dell EMC had access to a few systems, and this blog presents the results of our initial benchmarking study.
LAMMPS Overview
Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) is an open-source, well-parallelized collection of packages for molecular dynamics (MD) research. LAMMPS has a nice collection of "atom styles", force fields, and many contributed packages. LAMMPS can run on a single processor or on the largest parallel supercomputers. It also has packages that provide force calculations accelerated on GPUs. It can do simulations with billions of atoms!
LAMMPS can be run on a single processor or in parallel using some form of message passing, such as Message Passing Interface (MPI). The most current source code for LAMMPS is written in C++. For more information about LAMMPS, see the following link: https://www.lammps.org/.
Objective
In this study we measure the performance of LAMMPS on the different Ice Lake processor models listed in Table 1, with a comparison to the previous-generation Cascade Lake systems. Single-node as well as multi-node scalability tests were conducted.
Compilation Details
The LAMMPS version used for testing was the lammps-2July-2021 release, compiled with the Intel 2020 Update 5 compiler to take advantage of AVX2 and AVX-512 optimizations and the Intel MKL FFT library. We used the INTEL package, which ships with LAMMPS and provides well-optimized versions of several atom pair styles that use the vector instructions on Intel processors. The datasets used for our study are described in Table 2, along with the configured atom sizes and run steps. The unit of performance is timesteps per second, and higher is better.
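For reference, a single-node run with the INTEL package enabled looks roughly like the sketch below. The binary name, input script, and rank/thread counts are placeholder assumptions; they should be adjusted to whatever your LAMMPS installation provides and to the server being tested.

```python
# run_lammps_intel.py -- sketch of a single-node LAMMPS run with the INTEL
# package enabled (-sf intel / -pk intel), parsing timesteps/s from the log.
import subprocess

def timesteps_per_s(logfile: str) -> float:
    """Grab the timesteps/s value from the LAMMPS 'Performance:' line."""
    with open(logfile) as fh:
        for line in fh:
            if line.lstrip().startswith("Performance:"):
                tokens = line.replace(",", " ").split()
                return float(tokens[tokens.index("timesteps/s") - 1])
    raise RuntimeError(f"no Performance line in {logfile}")

cmd = ["mpirun", "-np", "80", "lmp",       # e.g. 2 x 40 cores on a Xeon 8380 node
       "-in", "in.intel.lj",               # LJ benchmark input (placeholder name)
       "-sf", "intel",                     # use INTEL-accelerated styles
       "-pk", "intel", "0", "omp", "1",    # 0 co-processors, 1 OpenMP thread/rank
       "-log", "lj.log"]
subprocess.run(cmd, check=True)
print(f"{timesteps_per_s('lj.log'):.2f} timesteps/s")
```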
Hardware and software configurations
Table 1: Hardware and Software test bed details
Component | Dell EMC PowerEdge R750 server | Dell EMC PowerEdge R750 server | Dell EMC PowerEdge C6520 server | Dell EMC PowerEdge C6520 server | Dell EMC PowerEdge C6420 server | Dell EMC PowerEdge C6420 server |
CPU model | Xeon 8380 | Xeon 8358 | Xeon 8352Y | Xeon 6330 | Xeon 8280 | Xeon 6248R |
Cores/Socket | 40 | 32 | 32 | 28 | 28 | 24 |
Base Frequency | 2.30 GHz | 2.60 GHz | 2.20 GHz | 2.00 GHz | 2.70 GHz | 3.00 GHz |
TDP | 270 W | 250 W | 205 W | 205 W | 205 W | 205 W |
Operating System | Red Hat Enterprise Linux 8.3 (kernel 4.18.0-240.22.1.el8_3.x86_64) on all servers |
Memory | 16 GB x 16 (2Rx8) 3200 MT/s on the Ice Lake servers; 16 GB x 12 (2Rx8) 2933 MT/s on the Cascade Lake servers |
BIOS/CPLD | 1.1.2/1.0.1 |
Interconnect | NVIDIA Mellanox HDR on the Ice Lake servers; NVIDIA Mellanox HDR100 on the Cascade Lake servers |
Compiler | Intel Parallel Studio 2020 (Update 4) |
LAMMPS | 2July2021 |
Datasets used for performance analysis
Table 2: Description of datasets used for performance analysis
Datasets | Description | Units | Atomic Style | Atom Size | Step Size |
Lennard Jones | Atomic fluid (LJ Benchmark) | lj | atomic | 512000 | 7900 |
Rhodo | Protein (Rhodopsin Benchmark) | real | full | 512000 | 520 |
Liquid crystal | Liquid Crystal w/ Gay-Berne potential | lj | ellipsoid | 524288 | 840 |
EAM | Copper benchmark with the Embedded Atom Method | metal | atomic | 512000 | 3100 |
Stillinger-Weber | Silicon benchmark with Stillinger-Weber | metal | atomic | 512000 | 6200 |
Tersoff | Silicon benchmark with Tersoff | metal | atomic | 512000 | 2420 |
Water | Coarse-grain water benchmark using Stillinger-Weber | real | atomic | 512000 | 2600 |
Polyethylene | Polyethylene benchmark with AIREBO | metal | atomic | 522240 | 550 |
Figure 1: Image view of the datasets from OVITO (scientific data visualization and analysis software for molecular and other particle-based simulation models). Subfigures 1a-1h each show a small portion of the simulation domain for the Atomic Fluid (Lennard-Jones), Rhodo (protein), Liquid Crystal (lc), Copper (eam), Stillinger-Weber (sw), Tersoff, Water, and Polyethylene datasets, respectively.
Table 2 and Figure 1 show the datasets used for the single-node and multi-node analyses. Visualization of all datasets was done with OVITO, scientific data visualization and analysis software for molecular and other particle-based simulation models. For the single-node performance study, all the datasets shown in Table 2 were used; for the multi-node study, the Atomic Fluid dataset was used for benchmarking.
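As a side note, the OVITO Python module can also be used to load and inspect the same benchmark outputs programmatically. A minimal sketch is shown below; the dump file name is hypothetical and stands in for whichever benchmark output you have on disk.

```python
# ovito_view.py -- sketch of loading a simulation snapshot with OVITO's Python
# module, as used to produce the Figure 1 images. The file name is a placeholder.
from ovito.io import import_file

pipeline = import_file("dump.lj")   # hypothetical LAMMPS dump file
data = pipeline.compute()           # evaluate the pipeline at the first frame
print(f"{data.particles.count} particles in frame 0")
```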
Performance Analyses on Single Node
Figure 2: Single node performance of LAMMPS across the datasets with Intel Ice Lake processor models. Each graph in Figure 2 is an individual subfigure, labeled (a-h) in the order shown. Each subfigure (2a-2h) represents a single-node performance comparison across the Xeon processors, with the Xeon 6330 as baseline, for the Atomic Fluid (Lennard-Jones), Rhodo (protein), Liquid Crystal (lc), Copper (eam), Stillinger-Weber (sw), Tersoff, Water, and Polyethylene datasets, respectively.
Figure 2 shows the single-node performance for the eight datasets (subfigures 2a-2h) listed in Table 2 with the four Ice Lake processor models available for evaluating LAMMPS.
For ease of comparison across the processor models, the relative performance for each dataset is presented in a separate graph. However, it is worth noting that each dataset behaves individually when performance is considered, as each uses different molecular potentials and has a different number of atoms. Figure 2 shows that increasing the core count of the processor model increases performance across the datasets used. Next, by comparing the relative numbers of the Xeon 8380 (40C) against the baseline Xeon 6330 (28C), we measured a 30 to 45 percent performance gain with these datasets. A fraction of this boost was due to the higher frequency of the processor model.
Figure 3a: Performance of LAMMPS on Cascade Lake (Xeon 6248R) in comparison to Ice Lake (Xeon 6330)
Figure 3b: Performance of LAMMPS on Cascade Lake (Xeon 8280) in comparison to Ice Lake (Xeon 8380)
Figure 3 compares the performance of the mid-bin Cascade Lake Xeon 6248R (24 cores) to the Ice Lake Xeon 6330 (28 cores), and the top-end Cascade Lake Xeon 8280 (28 cores) to the Ice Lake Xeon 8380 (40 cores). From Figure 3a, the Ice Lake 6330 is up to 30 percent faster than the 6248R; the Xeon 6330 has 16 percent more cores and 9 percent higher memory bandwidth. Figure 3b shows that the Ice Lake 8380 is up to 75 percent faster than the Xeon 8280 in single-node tests; this is in line with its 42 percent additional cores and 9 percent higher memory bandwidth. These results are due to a higher processor speed, wherein more data can be accessed by each core.
Performance Analysis on Multi-Node
To analyze scalability with strong and weak scaling, we used the Atomic Fluid (LJ) dataset from the INTEL package. The job ran for 7900 steps with 512000 atoms in the base simulation system.
Figure 4a: Fixed-size Atomic Fluid (LJ) for different problem sizes (strong scaling) w/ Xeon 8380
In strong scaling, the problem size is fixed while the number of parallel processes increases (Amdahl's law); in weak scaling, we varied the atom count from 512000 to 4096000 atoms as the number of parallel processes increased (Gustafson-Barsis law). The test bed comprised Dell EMC PowerEdge R750 servers, each with dual Ice Lake Xeon 8380 processors, and an NVIDIA Networking HDR interconnect running at 200 Gbps. Figure 4a plots the fixed-size relative performance for four different problem sizes, namely 512000, 1024000, 2048000, and 4096000 atoms, on different numbers of nodes.
The relative performance is normalized by single node performance. Hence, the single node performance for each curve is 1.00 (unity). Relative Performance for fixed size Atomic fluid was calculated by the following equation:
Relative performance = loop time on a single node / loop time on 'N' nodes
Loop time is the total wall-clock time for the simulation to run. We observe that relative performance increases with problem size, because for smaller problems the system spends more time in inter-node communication. The time spent in communication at 8 nodes is 61.91%, 59.74%, 48.42%, and 45.04% for the 512000, 1024000, 2048000, and 4096000 atom sizes, respectively.
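A small helper illustrating the fixed-size normalization is sketched below; the loop times are placeholders supplied by the caller, not measured values.

```python
# strong_scaling.py -- sketch of the fixed-size (strong scaling) normalization
# used for Figure 4a: each curve is divided by its own single-node loop time.
def strong_scaling_speedup(loop_time_by_nodes: dict) -> dict:
    """Map node count -> relative performance for a fixed problem size."""
    t1 = loop_time_by_nodes[1]
    return {n: t1 / t for n, t in loop_time_by_nodes.items()}
```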
Figure 4b: Scaled size Atomic fluid (LJ) with 512000 atoms per node (weak scaling) w/ Xeon 8380
Figure 4b plots the scaled-size efficiency for runs with 512000 atoms per node. Thus, a scaled-size 2-node run has 1024000 atoms and an 8-node run has 4096000 atoms. Relative performance for the scaled-size Atomic Fluid runs was calculated with the following equation:
Relative performance = loop time on a single node / loop time on 'N' nodes (equivalently, the total throughput on 'N' nodes divided by 'N' times the single-node throughput, since each node carries 512000 atoms)
Weak-scaling efficiency decreases as the number of nodes increases over the investigated range, because at larger node counts more time is spent in MPI communication. The time spent in communication for the scaled-size runs on 1, 2, 4, and 8 nodes is 27.17%, 32.42%, 40.87%, and 45.04%, respectively.
Figure 5: Multi-node efficiency for Atomic Fluid (LJ) w/ Xeon 8380
Figure 5 plots the multi-node efficiency for Atomic Fluid with the Xeon 8380. The relative performance is normalized by the single-node performance with 512000 atoms. Hence, the single-node performance for 512000 atoms is 1.00 (unity) and is taken as the baseline for the other comparisons.
Performance rating = (number of atoms * loop time for 512000 atoms on 1 node) / (loop time * number of nodes * 512000)
We observed that for smaller systems (fewer atoms), strong-scaling efficiency decreases because the system spends more time in MPI communication, whereas for larger systems (more atoms), strong-scaling efficiency improves because the time spent in pair-wise force calculation becomes dominant. For weak scaling, efficiency decreases as the number of nodes increases.
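The same rating can be computed with a short helper. This sketch follows the normalized per-node throughput form given above; the loop times are placeholders, not measured values.

```python
# performance_rating.py -- sketch of the Figure 5 normalization: throughput per
# node for any (atoms, nodes) run, relative to 512000 atoms on a single node.
def performance_rating(loop_time: float, atoms: int, nodes: int,
                       base_loop_time: float, base_atoms: int = 512000) -> float:
    """Rating = (atoms * base_loop_time) / (loop_time * nodes * base_atoms)."""
    return (atoms * base_loop_time) / (loop_time * nodes * base_atoms)
```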
Conclusion
The Ice Lake processor-based Dell EMC PowerEdge servers, with their hardware feature upgrades over Cascade Lake, demonstrate up to a 50 to 70 percent performance gain across the datasets used for benchmarking LAMMPS. Watch our blog site for updates!
MD Simulation of GROMACS with AMD EPYC 7003 Series Processors on Dell EMC PowerEdge Servers
Thu, 19 Aug 2021 20:06:53 -0000
AMD has recently announced and launched its third-generation EPYC 7003 series processor family (code named Milan). These processors build upon the preceding-generation 7002 series (Rome) processors, with an improved L3 cache architecture and increased memory bandwidth for workloads such as High Performance Computing (HPC).
The Dell EMC HPC and AI Innovation Lab has been evaluating these new processors with Dell EMC's latest 15G PowerEdge servers, and this blog reports our initial findings for the molecular dynamics (MD) application GROMACS.
Given the enormous health impact of the ongoing COVID-19 pandemic, researchers and scientists are working closely with the HPC and AI Innovation Lab to obtain the appropriate computing resources to improve the performance of molecular dynamics simulations. Among these resources, GROMACS is an extensively used application for MD simulations. It has been evaluated with standard datasets by combining the latest AMD EPYC Milan processors (based on Zen 3 cores) with Dell EMC PowerEdge servers to get the most out of the MD simulations.
In a previous blog, Molecular Dynamic Simulation with GROMACS on AMD EPYC- ROME, we published benchmark data for a GROMACS application study on a single node and multinode with AMD EPYC ROME based Dell EMC servers.
The results featured in this blog come from the test bed described in the following table. We performed a single-node and multi-node application study on Milan processors, using the latest AMD stack shown in Table 1, with GROMACS 2020.4 to understand the performance improvement over the older generation processor (Rome).
Table 1: Testbed hardware and software details
Server | Dell EMC PowerEdge 2-socket servers (with AMD Milan processors) | Dell EMC PowerEdge 2-socket servers (with AMD Rome processors) |
Processor / Cores per socket / Frequency (Base–Boost) / Default TDP / L3 cache / Processor bus speed | 7763 (Milan) / 64 / 2.45 GHz – 3.5 GHz / 280 W / 256 MB / 16 GT/s | 7H12 (Rome) / 64 / 2.6 GHz – 3.3 GHz / 280 W / 256 MB / 16 GT/s |
Processor / Cores per socket / Frequency (Base–Boost) / Default TDP / L3 cache / Processor bus speed | 7713 (Milan) / 64 / 2.0 GHz – 3.675 GHz / 225 W / 256 MB / 16 GT/s | 7702 (Rome) / 64 / 2.0 GHz – 3.35 GHz / 200 W / 256 MB / 16 GT/s |
Processor / Cores per socket / Frequency (Base–Boost) / Default TDP / L3 cache / Processor bus speed | 7543 (Milan) / 32 / 2.8 GHz – 3.7 GHz / 225 W / 256 MB / 16 GT/s | 7542 (Rome) / 32 / 2.9 GHz – 3.4 GHz / 225 W / 128 MB / 16 GT/s |
Operating system | Red Hat Enterprise Linux 8.3 (4.18.0-240.el8.x86_64) | Red Hat Enterprise Linux 7.8 |
Memory | DDR4 256 GB (16 GB x 16) 3200 MT/s | DDR4 256 GB (16 GB x 16) 3200 MT/s |
BIOS/CPLD | 2.0.2 / 1.1.12 |
Interconnect | NVIDIA Mellanox HDR | NVIDIA Mellanox HDR100 |
Table 2: Benchmark datasets used for GROMACS performance evaluation
Datasets | Details |
Water | 1536K and 3072K atom systems |
HECBioSim | 1400K and 3000K atom systems |
Prace – Lignocellulose | 3M atom system |
The following information describes the performance evaluation for the processor stack listed in Table 1.
Rome processors compared to Milan processors (GROMACS)
Figure 1: GROMACS performance comparison with AMD Rome processors
For performance benchmark comparisons, we selected Rome processors that are closest to their Milan counterparts in terms of hardware features such as cache size, TDP values, and Processor Base/Turbo Frequency, and marked the maximum value attained for Ns/day by each of the datasets mentioned in Table 2.
Figure 1 shows that a 32C Milan processor delivers higher performance (19 percent for Water 1536, 21 percent for Water 3072, and 10 to approximately 12 percent for the HECBioSim and lignocellulose datasets) compared to a 32C Rome processor. This result is due to a higher processor speed and the improved L3 cache, wherein more data can be accessed by each core.
Next, with the higher-end processors we see only about a 10 percent gain on the water datasets, as they are more memory intensive; some additional percentage comes from the frequency improvement for the remaining datasets. Overall, the Milan processor results demonstrated a substantial performance improvement for GROMACS over Rome processors.
Milan processors comparison (32C processors compared to 64C processors)
Figure 2: GROMACS performance with Milan processors
Figure 2 shows performance relative to that obtained on the 7543 processor. For instance, the performance of Water 1536 improves from the 32C processor to the 64-core (64C) processors by 41 percent (7713) and 57 percent (7763). The improvement is due to the increased core count and higher CPU core frequency. We observed that GROMACS is frequency sensitive, but not to a great extent. Greater gains may be seen when running GROMACS across multiple ensemble runs or on datasets with a higher number of atoms.
We recommend comparing the price-to-performance ratio before choosing a processor based on higher CPU core frequency for your datasets, as processors with a higher number of lower-frequency cores may provide better total performance.
Multi-node study with 7713 64C processors
Figure 3: Multi-node study with 7713 64C SKUs
For multi-node tests, the test bed was configured with an NVIDIA Mellanox HDR interconnect running at 200 Gbps, with each server equipped with AMD EPYC 7713 processors. We achieved the expected linear performance scalability for GROMACS up to four nodes across each of the datasets. All cores in each server were used while running the benchmarks. The performance increases are close to linear across all the dataset types as the core count increases.
Conclusion
For the various datasets we evaluated, GROMACS exhibited strong scaling and was compute intensive. We recommend a processor with a high core count for the smaller datasets (Water 1536, HEC 1400); the larger datasets (Water 3072, Ligno, HEC 3000) benefit from more memory per core. Configuring the best BIOS options is important to get the best performance out of the system.
For more information and updates, follow this blog site.