To evaluate the performance of the proposed Dell Validated Design configuration shown in Table 2, Table 3, and Table 4, we ran the STAC-A2 benchmarks on a single server. After we optimized the system, STAC® conducted a formal audit of the benchmark. The findings of the STAC audit were used to generate the STAC-A2 report (SUT ID: NVDA221007), available at https://www.stacresearch.com/NVDA221007.
Table 4 shows the overall performance of the system under test (SUT). The formal audit by STAC showed that the results established three new STAC-A2 performance records for the following STAC-A2 metrics:
Compared to a solution using twice as many GPUs with twice as much memory (SUT ID NVDA210914), the results:
(STAC-A2.β2.HPORTFOLIO.SPACE_EFF)
(STAC-A2.β2.GREEKS.TIME.COLD)
(STAC-A2.β2.GREEKS.10-100k-1260.TIME.COLD)
(STAC-A2.β2.GREEKS.TIME.WARM)
STAC-A2 (beta 2) report card: STAC-A2 Pack for CUDA (Rev G) / Dell PowerEdge XE8545 / 2 x AMD EPYC 7713 and 4 x NVIDIA A100 SXM 40 GB (SUT ID: NVDA221007)
Performance benchmark | Description | Value | Unit |
STAC-A2.β2.HPORTFOLIO.SPEED | Ratio of options completed to elapsed time | 168.8 | Options per second |
STAC-A2.β2.HPORTFOLIO.ENERG_EFF | Energy efficiency = HPORTFOLIO.OPTIONS_DONE/Energy Consumed | 211,413 | Options per kWh |
STAC-A2.β2.HPORTFOLIO.SPACE_EFF | Space efficiency = HPORTFOLIO.SPEED/Effective Volume | 99.0 | Options per hour per cubic inch |
STAC-A2.β2.GREEKS.TIME.WARM | Seconds to compute all Greeks with 5 assets, 25 K paths, and 252 timesteps (warm runs) | 0.018 | Seconds |
STAC-A2.β2.GREEKS.TIME.COLD | Seconds to compute all Greeks with 5 assets, 25 K paths, and 252 timesteps (cold run) | 0.138 | Seconds |
STAC-A2.β2.GREEKS.10-100k-1260.TIME.WARM | Seconds to compute all Greeks with 10 assets, 100 K paths, and 1,260 timesteps (warm runs) | 1.4 | Seconds |
STAC-A2.β2.GREEKS.10-100k-1260.TIME.COLD | Seconds to compute all Greeks with 10 assets, 100 K paths, and 1,260 timesteps (cold run) | 2.2 | Seconds |
STAC-A2.β2.GREEKS.MAX_ASSETS | Max assets completed in 10 minutes with 25 K paths and 252 timesteps (using cold test runs) | 180 | |
STAC-A2.β2.GREEKS.MAX_PATHS | Max paths completed in 10 minutes with 5 assets and 252 timesteps (using cold test runs) | 51,200,000 | |
STAC-A2.β2.PATHGEN.PVSTDERR | Worst case standard error in path generation across 5 assets | 0.031 | |
STAC-A2.β2.EARLYEX.ERR | Relative error in Longstaff-Schwartz valuation compared to Black-Scholes binomial approximation | 0.006 | |
The STAC-A2 benchmark harness was configured to interface with a Yokogawa WT210 power meter, which reported SUT power consumption to the harness. Using volumetric data available for the SUT, the benchmark harness and tools calculated the energy and space efficiency of the SUT. Table 5 shows the power usage when the SUT was idle and the energy consumed while the SUT was running the HPORTFOLIO tests. It also shows the space efficiency, calculated by dividing the portfolio speed by the effective volume of the SUT.
STAC-A2 (beta 2) efficiency: STAC-A2 Pack for CUDA (Rev G) / Dell PowerEdge XE8545 / 2 x AMD EPYC 7713 and 4 x NVIDIA A100 SXM 40 GB (SUT ID: NVDA221007)
Performance benchmark | Description | Value | Unit |
Energy consumed (not a benchmark) | Energy consumed in processing HPORTFOLIO.OPTIONS_DONE | 0.497 | kWh |
STAC-A2.β2.HPORTFOLIO.ENERG_EFF | Energy efficiency = HPORTFOLIO.OPTIONS_DONE/Energy Consumed | 211,413 | Options per kWh |
STAC-A2.β2.IDLE.POWER | Average power consumption while no STAC-A2 algorithms are running | 1,384 | Watts |
Effective Volume (not a benchmark) | Physical space rendered unavailable by SUT | 6,136 | Cubic inches |
Effective Volume (not a benchmark) | Physical space rendered unavailable by SUT | 100,557 | Cubic cm |
STAC-A2.β2.HPORTFOLIO.SPACE_EFF | Space efficiency = HPORTFOLIO.SPEED/Effective Volume | 99.0 | Options per hour per cubic inch |
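The efficiency figures in the table above follow from simple ratios, and it can be useful to reproduce them as a sanity check. The following Python sketch is illustrative only and is not part of the STAC-A2 harness; the function and constant names are hypothetical. It recomputes the space efficiency from the reported speed and effective volume, and back-derives the number of options processed from the reported energy figures (that count is not stated in the tables here, so it is an estimate from rounded values).

```python
# Illustrative check of the reported efficiency metrics.
# All names here are hypothetical; only the numeric inputs come from the tables.

SECONDS_PER_HOUR = 3600.0


def space_efficiency(speed_options_per_s: float, volume_cubic_in: float) -> float:
    """Options per hour per cubic inch = hourly speed / effective volume."""
    return speed_options_per_s * SECONDS_PER_HOUR / volume_cubic_in


def implied_options_done(energy_eff_options_per_kwh: float, energy_kwh: float) -> float:
    """Invert ENERG_EFF = OPTIONS_DONE / energy to estimate OPTIONS_DONE."""
    return energy_eff_options_per_kwh * energy_kwh


# Reported values (rounded as published in the tables).
SPEED = 168.8            # HPORTFOLIO.SPEED, options per second
VOLUME_IN3 = 6136.0      # effective volume, cubic inches
ENERGY_KWH = 0.497       # energy consumed, kWh
ENERGY_EFF = 211413.0    # HPORTFOLIO.ENERG_EFF, options per kWh

space_eff = space_efficiency(SPEED, VOLUME_IN3)
options_done = implied_options_done(ENERGY_EFF, ENERGY_KWH)

print(f"Space efficiency: {space_eff:.1f} options per hour per cubic inch")
print(f"Implied options processed (estimate): {options_done:,.0f}")
```

Because the inputs are rounded, the recomputed space efficiency agrees with the reported 99.0 only to about one decimal place.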
The STAC-A2 Pack for CUDA reports the time to complete component operations and various risk characteristics.
The following table contains two results for each operation: the time for the cold run and the mean time of the warm runs.
STAC-A2 (beta 2) performance decomposition of baseline workload: STAC-A2 Pack for CUDA (Rev G) / Dell PowerEdge XE8545 / 2 x AMD EPYC 7713 / NVIDIA A100 SXM 40 GB (SUT ID: NVDA221007)
Problem size: 5 assets, 25 K paths, 252 timesteps (63 million values in path cube). Elapsed times are in seconds.
Performance benchmark | Description | Cold run | Mean of warm runs |
STAC-A2.β2.SQRT.TIME | Time to compute square roots | 0.001 | 0.001 |
STAC-A2.β2.EXP.TIME | Time to compute exponentials | 0.001 | 0.001 |
STAC-A2.β2.LOG.TIME | Time to compute logs | 0.001 | 0.001 |
STAC-A2.β2.UNR.TIME | Time to compute unit-normal randoms | 0.079 | 0.002 |
STAC-A2.β2.CORRAND.TIME | Time to compute correlated randoms | 0.002 | 0.001 |
STAC-A2.β2.PATHGEN.TIME | Time to generate paths | 0.083 | 0.003 |
STAC-A2.β2.EARLYEX.TIME | Time to compute payoff with early exercise | 0.011 | 0.007 |
STAC-A2.β2.THETA.TIME | Time to compute theta | 0.107 | 0.010 |
STAC-A2.β2.RHO.TIME | Time to compute rho | 0.107 | 0.010 |
STAC-A2.β2.DELTA.TIME | Time to compute delta | 0.135 | 0.010 |
STAC-A2.β2.GAMMA.TIME | Time to compute gamma | 0.125 | 0.010 |
STAC-A2.β2.CROSSGAMMA.TIME | Time to compute cross-gamma | 0.128 | 0.013 |
STAC-A2.β2.MODELVEGA.TIME | Time to compute model vega | 0.126 | 0.012 |
STAC-A2.β2.CORRVEGA.TIME | Time to compute correlation vega | 0.123 | 0.011 |
STAC-A2.β2.GREEKS.TIME | End-to-end time to compute all Greeks | 0.138 | 0.018 |
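One way to read the decomposition table is to compare each cold-run time with the mean warm-run time. The Python sketch below is an illustrative calculation only; the dictionary and variable names are hypothetical, and only a subset of rows is included. Because the published times are rounded to three decimal places, the ratios for the fastest operations are coarse.

```python
# Cold-to-warm ratios for selected rows of the decomposition table.
# Names are hypothetical; (cold, warm) times in seconds are taken from the table.

timings = {
    "UNR": (0.079, 0.002),       # unit-normal randoms
    "PATHGEN": (0.083, 0.003),   # path generation
    "EARLYEX": (0.011, 0.007),   # payoff with early exercise
    "DELTA": (0.135, 0.010),     # delta
    "GREEKS": (0.138, 0.018),    # end-to-end, all Greeks
}

# Ratio > 1 means the cold run took longer than the mean warm run.
ratios = {name: cold / warm for name, (cold, warm) in timings.items()}

# Print rows from the largest cold-run penalty to the smallest.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    cold, warm = timings[name]
    print(f"{name:8s} cold={cold:.3f}s warm={warm:.3f}s ratio={ratio:.1f}x")
```

In this subset, the largest ratio belongs to the unit-normal random number step and the smallest to the early-exercise payoff step.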