Single-server performance
To evaluate the performance of the proposed Dell Validated Design configuration shown in Table 2, Table 3, and Table 4, we ran STAC-A2 benchmarks on a single server. After we optimized the system, STAC® conducted a formal audit of the benchmark. The audited results are published in the STAC-A2 report for SUT ID NVDA221007 (https://www.stacresearch.com/NVDA221007).
Table 4 shows the overall performance of the system under test (SUT). The formal STAC audit confirmed that the results set three new STAC-A2 performance records for the following metrics:
Compared to a solution using twice as many GPUs with twice as much memory (SUT ID NVDA210914), the results:
(STAC-A2.β2.HPORTFOLIO.SPACE_EFF)
(STAC-A2.β2.GREEKS.TIME.COLD)
(STAC-A2.β2.GREEKS.10-100k-1260.TIME.COLD)
(STAC-A2.β2.GREEKS.TIME.WARM)
STAC-A2 (beta 2) report card: STAC-A2 Pack for CUDA (Rev G) / Dell PowerEdge XE8545 / 2 x AMD EPYC 7713 and 4 x NVIDIA A100 SXM 40 GB (SUT ID: NVDA221007)
| Performance benchmark | Description | Result | Units |
|---|---|---|---|
| STAC-A2.β2.HPORTFOLIO.SPEED | Ratio of options completed to elapsed time | 168.8 | Options per second |
| STAC-A2.β2.HPORTFOLIO.ENERG_EFF | Energy efficiency = HPORTFOLIO.OPTIONS_DONE / energy consumed | 211,413 | Options per kWh |
| STAC-A2.β2.HPORTFOLIO.SPACE_EFF | Space efficiency = HPORTFOLIO.SPEED / effective volume | 99.0 | Options per hour per cubic inch |
| STAC-A2.β2.GREEKS.TIME.WARM | Time to compute all Greeks with 5 assets, 25K paths, and 252 timesteps | 0.018 | Seconds |
| STAC-A2.β2.GREEKS.TIME.COLD | Time to compute all Greeks with 5 assets, 25K paths, and 252 timesteps | 0.138 | Seconds |
| STAC-A2.β2.GREEKS.10-100k-1260.TIME.WARM | Time to compute all Greeks with 10 assets, 100K paths, and 1,260 timesteps | 1.4 | Seconds |
| STAC-A2.β2.GREEKS.10-100k-1260.TIME.COLD | Time to compute all Greeks with 10 assets, 100K paths, and 1,260 timesteps | 2.2 | Seconds |
| STAC-A2.β2.GREEKS.MAX_ASSETS | Maximum assets completed in 10 minutes with 25K paths and 252 timesteps (using cold test runs) | 180 | Assets |
| STAC-A2.β2.GREEKS.MAX_PATHS | Maximum paths completed in 10 minutes with 5 assets and 252 timesteps (using cold test runs) | 51,200,000 | Paths |
| STAC-A2.β2.PATHGEN.PVSTDERR | Worst-case standard error in path generation across 5 assets | 0.031 | |
| STAC-A2.β2.EARLYEX.ERR | Relative error in Longstaff-Schwartz valuation compared to Black-Scholes binomial approximation | 0.006 | |
The STAC-A2 benchmark harness was configured to interface with a Yokogawa WT210 power meter, which reported SUT power consumption to the harness. Using these readings and the volumetric data available for the SUT, the harness and tools calculated the energy efficiency and space efficiency of the SUT. Table 5 shows the average power consumption while the SUT was idle and the energy consumed while the SUT ran the HPORTFOLIO tests. It also shows space efficiency, calculated by dividing the portfolio speed by the effective volume of the SUT.
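The paper does not describe how the harness turns meter readings into energy. As a minimal sketch, assuming the meter delivers timestamped power samples in watts, energy in kWh can be obtained by integrating power over time with the trapezoidal rule. The function name `energy_kwh` and the sample data are hypothetical.

```python
"""Minimal sketch: converting sampled power-meter readings into energy.

Assumption: the meter delivers (timestamp_seconds, watts) samples in time
order. The STAC-A2 harness's actual method is not described in this paper.
"""
from typing import Sequence, Tuple

JOULES_PER_KWH = 3.6e6  # 1 kWh = 3,600,000 J


def energy_kwh(samples: Sequence[Tuple[float, float]]) -> float:
    """Integrate (time_s, power_w) samples with the trapezoidal rule; return kWh."""
    if len(samples) < 2:
        return 0.0  # no interval to integrate over
    joules = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t1 < t0:
            raise ValueError("samples must be in time order")
        # Average power over the interval times its duration gives joules.
        joules += 0.5 * (p0 + p1) * (t1 - t0)
    return joules / JOULES_PER_KWH


if __name__ == "__main__":
    # Hypothetical readings: one hour at a steady 1,384 W
    # (the STAC-A2.β2.IDLE.POWER value), sampled every 60 seconds.
    idle = [(float(t), 1384.0) for t in range(0, 3601, 60)]
    print(f"{energy_kwh(idle):.3f} kWh")
```

Trapezoidal integration tolerates uneven sampling intervals, which is why the sketch uses timestamps rather than assuming a fixed sample rate.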
STAC-A2 (beta 2) efficiency: STAC-A2 Pack for CUDA (Rev G) / Dell PowerEdge XE8545 / 2 x AMD EPYC 7713 and 4 x NVIDIA A100 SXM 40 GB (SUT ID: NVDA221007)
| Performance benchmark | Description | Result | Units |
|---|---|---|---|
| Energy consumed (not a benchmark) | Energy consumed in processing HPORTFOLIO.OPTIONS_DONE | 0.497 | kWh |
| STAC-A2.β2.HPORTFOLIO.ENERG_EFF | Energy efficiency = HPORTFOLIO.OPTIONS_DONE / energy consumed | 211,413 | Options per kWh |
| STAC-A2.β2.IDLE.POWER | Average power consumption while no STAC-A2 algorithms are running | 1,384 | Watts |
| Effective volume (not a benchmark) | Physical space rendered unavailable by the SUT | 6,136 (100,557) | Cubic inches (cubic centimeters) |
| STAC-A2.β2.HPORTFOLIO.SPACE_EFF | Space efficiency = HPORTFOLIO.SPEED / effective volume | 99.0 | Options per hour per cubic inch |
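The two efficiency formulas in the table above can be written out as a short sketch. The function names are hypothetical; only the formulas and input values come from the table. Because HPORTFOLIO.SPEED is reported in options per second while SPACE_EFF is reported in options per hour per cubic inch, the sketch converts seconds to hours. HPORTFOLIO.OPTIONS_DONE is not listed here, so energy efficiency is shown only as a formula.

```python
"""Sketch of the STAC-A2 efficiency formulas quoted in the table.

Function names are hypothetical. Formulas:
  ENERG_EFF = HPORTFOLIO.OPTIONS_DONE / energy consumed (kWh)
  SPACE_EFF = HPORTFOLIO.SPEED / effective volume
"""

SECONDS_PER_HOUR = 3600


def energy_efficiency(options_done: float, energy_kwh: float) -> float:
    """Options completed per kWh consumed."""
    return options_done / energy_kwh


def space_efficiency(speed_options_per_s: float, volume_in3: float) -> float:
    """Options per hour per cubic inch of effective volume."""
    return speed_options_per_s * SECONDS_PER_HOUR / volume_in3


if __name__ == "__main__":
    speed = 168.8    # STAC-A2.β2.HPORTFOLIO.SPEED, options per second
    volume = 6136.0  # effective volume, cubic inches
    print(f"SPACE_EFF ~ {space_efficiency(speed, volume):.1f}")
```

With the reported speed and effective volume, the result rounds to the 99.0 options per hour per cubic inch shown in the table.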
The STAC-A2 Pack for CUDA also reports the time to complete component operations and various risk characteristics.
The following table contains two results for each operation: the elapsed time of the cold run and the mean elapsed time of the warm runs.
STAC-A2 (beta 2) performance decomposition of baseline workload: STAC-A2 Pack for CUDA (Rev G) / Dell PowerEdge XE8545 / 2 x AMD EPYC 7713 and 4 x NVIDIA A100 SXM 40 GB (SUT ID: NVDA221007). Problem size: 5 assets, 25K paths, 252 timesteps (63 million values in path cube). Elapsed times in seconds.
| Performance benchmark | Description | Cold run | Mean of warm runs |
|---|---|---|---|
| STAC-A2.β2.SQRT.TIME | Time to compute square roots | 0.001 | 0.001 |
| STAC-A2.β2.EXP.TIME | Time to compute exponentials | 0.001 | 0.001 |
| STAC-A2.β2.LOG.TIME | Time to compute logs | 0.001 | 0.001 |
| STAC-A2.β2.UNR.TIME | Time to compute unit-normal randoms | 0.079 | 0.002 |
| STAC-A2.β2.CORRAND.TIME | Time to compute correlated randoms | 0.002 | 0.001 |
| STAC-A2.β2.PATHGEN.TIME | Time to generate paths | 0.083 | 0.003 |
| STAC-A2.β2.EARLYEX.TIME | Time to compute payoff with early exercise | 0.011 | 0.007 |
| STAC-A2.β2.THETA.TIME | Time to compute theta | 0.107 | 0.010 |
| STAC-A2.β2.RHO.TIME | Time to compute rho | 0.107 | 0.010 |
| STAC-A2.β2.DELTA.TIME | Time to compute delta | 0.135 | 0.010 |
| STAC-A2.β2.GAMMA.TIME | Time to compute gamma | 0.125 | 0.010 |
| STAC-A2.β2.CROSSGAMMA.TIME | Time to compute cross-gamma | 0.128 | 0.013 |
| STAC-A2.β2.MODELVEGA.TIME | Time to compute model vega | 0.126 | 0.012 |
| STAC-A2.β2.CORRVEGA.TIME | Time to compute correlation vega | 0.123 | 0.011 |
| STAC-A2.β2.GREEKS.TIME | End-to-end time to compute all Greeks | 0.138 | 0.018 |
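One way to read the decomposition table is to compare each cold run with the mean of the warm runs. The sketch below computes the absolute gap and the cold-to-warm ratio for a few operations, using values copied from the table. Treating the gap as one-time, first-run cost is an assumption of this sketch, not a statement from the benchmark report; the names `TIMINGS`, `cold_overhead`, and `cold_to_warm_ratio` are hypothetical.

```python
"""Compare cold-run and warm-run times from the decomposition table.

Assumption (not stated in the report): the gap between the cold run and
the warm-run mean approximates costs paid only on the first run.
"""

# (cold_s, warm_mean_s) for a subset of operations, copied from the table.
TIMINGS = {
    "UNR": (0.079, 0.002),
    "PATHGEN": (0.083, 0.003),
    "EARLYEX": (0.011, 0.007),
    "DELTA": (0.135, 0.010),
    "GREEKS": (0.138, 0.018),
}


def cold_overhead(cold_s: float, warm_s: float) -> float:
    """Seconds by which the cold run exceeds the warm-run mean."""
    return cold_s - warm_s


def cold_to_warm_ratio(cold_s: float, warm_s: float) -> float:
    """How many times longer the cold run took than the warm-run mean."""
    return cold_s / warm_s


if __name__ == "__main__":
    for name, (cold, warm) in TIMINGS.items():
        print(f"{name:8s} gap {cold_overhead(cold, warm):.3f} s, "
              f"ratio {cold_to_warm_ratio(cold, warm):.1f}x")
```

Because the table values are rounded to milliseconds, ratios for the fastest operations (for example, 0.001 s entries) are not meaningful and are left out of the sketch.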