Single-server performance

To evaluate the performance of the proposed Dell Validated Design configuration shown in Table 2, Table 3, and Table 4, we ran STAC-A2 benchmarks on a single server. After we optimized the system, STAC® conducted a formal audit of the benchmark. The findings of the STAC audit were used to generate the STAC-A2 report (SUT ID: NVDA221007), available at https://www.stacresearch.com/NVDA221007.

STAC-A2 performance summary

Table 4 shows the overall performance of the system under test (SUT). The formal audit by STAC showed that the results established three new STAC-A2 performance records for the following STAC-A2 metrics:

STAC-A2.β2.HPORTFOLIO.SPACE_EFF
STAC-A2.β2.GREEKS.TIME.COLD
STAC-A2.β2.GREEKS.10-100k-1260.TIME.COLD

Compared to a solution using twice as many GPUs with twice as much memory (SUT ID: NVDA210914), this system:

Had 1.2x the space efficiency (STAC-A2.β2.HPORTFOLIO.SPACE_EFF)
Was 2.9x as fast in the cold runs of the baseline Greeks benchmark (STAC-A2.β2.GREEKS.TIME.COLD)
Was 1.1x as fast in the cold runs of the large Greeks benchmark with 10 assets, 100 K paths, and 1,260 timesteps (STAC-A2.β2.GREEKS.10-100k-1260.TIME.COLD)
Ran at 67 percent of the speed in the warm runs of the baseline Greeks benchmark (STAC-A2.β2.GREEKS.TIME.WARM)

Table 4. STAC-A2 report card

STAC-A2 (beta 2) report card STAC-A2 Pack for CUDA (Rev G)/ Dell PowerEdge XE8545/2 x AMD EPYC 7713 and 4 x NVIDIA A100 SXM 40 GB (SUT ID: NVDA221007)
Performance benchmark	Description	Run	Value	Units
STAC-A2.β2.HPORTFOLIO.SPEED	Ratio of options completed to elapsed time		168.8	Options per second
STAC-A2.β2.HPORTFOLIO.ENERG_EFF	Energy efficiency = HPORTFOLIO.OPTIONS_DONE/Energy Consumed		211,413	Options per kWh
STAC-A2.β2.HPORTFOLIO.SPACE_EFF	Space efficiency = HPORTFOLIO.SPEED/Effective Volume		99.0	Options per hour per cubic inch
STAC-A2.β2.GREEKS.TIME	Seconds to compute all Greeks with 5 assets, 25 K paths, and 252 timesteps	WARM	0.018
STAC-A2.β2.GREEKS.TIME		COLD	0.138
STAC-A2.β2.GREEKS.10-100k-1260.TIME	Seconds to compute all Greeks with 10 assets, 100 K paths, and 1,260 timesteps	WARM	1.4
STAC-A2.β2.GREEKS.10-100k-1260.TIME		COLD	2.2
STAC-A2.β2.GREEKS.MAX_ASSETS	Max assets completed in 10 minutes with 25 K paths and 252 timesteps (using cold test runs)		180
STAC-A2.β2.GREEKS.MAX_PATHS	Max paths completed in 10 minutes with 5 assets and 252 timesteps (using cold test runs)		51,200,000
STAC-A2.β2.PATHGEN.PVSTDERR	Worst case standard error in path generation across 5 assets		0.031
STAC-A2.β2.EARLYEX.ERR	Relative error in Longstaff-Schwartz valuation compared to Black-Scholes binomial approximation		0.006

STAC-A2 efficiency benchmarks

The STAC-A2 benchmark harness was configured to interface with the Yokogawa WT210 power meter, which reported SUT power consumption to the harness. Using volumetric data available for the SUT, the benchmark harness and tools calculated the energy and space efficiency of the SUT. Table 5 shows the power usage when the SUT was idle and the energy consumed when the SUT was running the HPORTFOLIO tests. It also shows the space efficiency, calculated by dividing the portfolio speed by the effective volume of the SUT.

Table 5. STAC-A2 efficiency benchmarks

STAC-A2 (beta 2) efficiency STAC-A2 Pack for CUDA (Rev G)/ Dell PowerEdge XE8545/2 x AMD EPYC 7713 and 4 x NVIDIA A100 SXM 40 GB (SUT ID: NVDA221007)
Performance benchmark	Description	Value	Units
Energy consumed (not a benchmark)	Energy consumed in processing HPORTFOLIO.OPTIONS_DONE	0.497	kWh
STAC-A2.β2.HPORTFOLIO.ENERG_EFF	Energy efficiency = HPORTFOLIO.OPTIONS_DONE/Energy Consumed	211,413	Options per kWh
STAC-A2.β2.IDLE.POWER	Average power consumption while no STAC-A2 algorithms are running	1,384	Watts
Effective Volume (not a benchmark)	Physical space rendered unavailable by SUT	6,136	Cubic inches
Effective Volume (not a benchmark)	Physical space rendered unavailable by SUT	100,557	Cubic cm
STAC-A2.β2.HPORTFOLIO.SPACE_EFF	Space efficiency = HPORTFOLIO.SPEED/Effective Volume	99.0	Options per hour per cubic inch
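
The efficiency definitions in Table 5 can be checked with a few lines of arithmetic. The following Python sketch is not part of the STAC-A2 harness; the function and variable names are illustrative assumptions, and the inputs are the rounded values published in the table. It assumes that the speed in options per second is converted to options per hour before dividing by the effective volume, which matches the stated units of the space efficiency metric.

```python
# Illustrative check of the Table 5 efficiency definitions.
# All names here are hypothetical; the inputs are the rounded published values.

SPEED_OPTIONS_PER_SEC = 168.8          # STAC-A2.β2.HPORTFOLIO.SPEED
EFFECTIVE_VOLUME_CU_IN = 6136.0        # Effective volume, cubic inches
ENERGY_CONSUMED_KWH = 0.497            # Energy consumed during HPORTFOLIO
ENERGY_EFF_OPTIONS_PER_KWH = 211_413   # STAC-A2.β2.HPORTFOLIO.ENERG_EFF


def space_efficiency(speed_per_sec: float, volume_cu_in: float) -> float:
    """Space efficiency in options per hour per cubic inch."""
    return speed_per_sec * 3600.0 / volume_cu_in


def implied_options_done(energy_eff: float, energy_kwh: float) -> float:
    """Invert ENERG_EFF = OPTIONS_DONE / energy to estimate OPTIONS_DONE."""
    return energy_eff * energy_kwh


space_eff = space_efficiency(SPEED_OPTIONS_PER_SEC, EFFECTIVE_VOLUME_CU_IN)
options_done = implied_options_done(ENERGY_EFF_OPTIONS_PER_KWH, ENERGY_CONSUMED_KWH)

print(f"Space efficiency: {space_eff:.1f} options per hour per cubic inch")
print(f"Implied options completed: {options_done:,.0f}")
```

The computed space efficiency rounds to the 99.0 reported in the table. The implied option count is only an estimate, because it is derived from rounded inputs rather than from the harness output.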

STAC-A2 speed benchmarks

The STAC-A2 Pack for CUDA reports the time to complete component operations and to compute various risk characteristics.

Table 6 contains two results for each operation: the elapsed time of the cold run and the mean elapsed time of the warm runs.

Table 6. STAC-A2 speed benchmarks

STAC-A2 (beta 2) performance decomposition of baseline workload STAC-A2 Pack for CUDA (Rev G)/ Dell PowerEdge XE8545/2 x AMD EPYC 7713 and 4 x NVIDIA A100 SXM 40 GB (SUT ID: NVDA221007) Problem size: 5 assets, 25 K paths, 252 timesteps (63 million values in path cube) (Elapsed times in seconds)
Performance benchmark	Description	Cold run	Mean of warm runs
STAC-A2.β2.SQRT.TIME	Time to compute square roots	0.001	0.001
STAC-A2.β2.EXP.TIME	Time to compute exponentials	0.001	0.001
STAC-A2.β2.LOG.TIME	Time to compute logs	0.001	0.001
STAC-A2.β2.UNR.TIME	Time to compute unit-normal randoms	0.079	0.002
STAC-A2.β2.CORRAND.TIME	Time to compute correlated randoms	0.002	0.001
STAC-A2.β2.PATHGEN.TIME	Time to generate paths	0.083	0.003
STAC-A2.β2.EARLYEX.TIME	Time to compute payoff with early exercise	0.011	0.007
STAC-A2.β2.THETA.TIME	Time to compute theta	0.107	0.010
STAC-A2.β2.RHO.TIME	Time to compute rho	0.107	0.010
STAC-A2.β2.DELTA.TIME	Time to compute delta	0.135	0.010
STAC-A2.β2.GAMMA.TIME	Time to compute gamma	0.125	0.010
STAC-A2.β2.CROSSGAMMA.TIME	Time to compute cross-gamma	0.128	0.013
STAC-A2.β2.MODELVEGA.TIME	Time to compute model vega	0.126	0.012
STAC-A2.β2.CORRVEGA.TIME	Time to compute correlation vega	0.123	0.011
STAC-A2.β2.GREEKS.TIME	End-to-end time to compute all Greeks	0.138	0.018
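
One way to read Table 6 is to compare the cold run with the mean of the warm runs for each operation. The following Python sketch computes that ratio for a few rows; the dictionary and function names are illustrative assumptions, and because the published times are rounded to the millisecond, the ratios are approximate, especially for operations that complete in 0.001 to 0.003 seconds.

```python
# Illustrative cold-to-warm comparison using rounded values from Table 6.
# Keys are shortened benchmark names; values are (cold, mean of warm) in seconds.

TIMES_SECONDS = {
    "UNR": (0.079, 0.002),       # unit-normal randoms
    "PATHGEN": (0.083, 0.003),   # path generation
    "EARLYEX": (0.011, 0.007),   # payoff with early exercise
    "GREEKS": (0.138, 0.018),    # end-to-end, all Greeks
}


def cold_to_warm_ratio(cold: float, warm: float) -> float:
    """How many times longer the cold run took than the mean warm run."""
    if warm <= 0:
        raise ValueError("warm time must be positive")
    return cold / warm


ratios = {
    name: cold_to_warm_ratio(cold, warm)
    for name, (cold, warm) in TIMES_SECONDS.items()
}

for name, ratio in ratios.items():
    print(f"{name}: cold run took about {ratio:.1f}x the mean warm time")
```

With these inputs, the end-to-end Greeks computation takes roughly 7.7 times as long cold as warm, while the early-exercise payoff shows a much smaller gap of about 1.6 times.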
