Home Workload Solutions Oracle Blogs

Have you sized your Oracle database lately? Understanding server core counts and Oracle performance.

Mon, 24 Oct 2022 14:08:36 -0000

Read Time: 0 minutes

Matt Kaberlein

Oracle specialists at Dell Technologies are frequently engaged by customers to help with database (DB) server sizing. That is, sizing DB servers that are stand-alone or part of a VxRail or PowerFlex system.

Server sizing is important in the fact that it helps ensure the CPU core count matches your workload needs, provides optimal performance, and helps manage Oracle DB software licensing costs.

Understanding CPU frequency

When sizing a server for an Oracle DB, CPU frequency matters. By CPU frequency, we are referring to the CPU cycle time, which can basically be thought of as the time it takes for the CPU to execute a simple instruction, for example, an Addition instruction.

In this blog post, we will use the base CPU frequency metric, which is the frequency the CPU would run at if turbo-boost and power management options were disabled in the system BIOS.

Focus during the sizing process is therefore on:

Single-threaded transactions
The best host-based performance per DB session

Packing as many CPU cores as possible into a server will increase the aggregate workload capability of the server more than it will improve per-DB session performance. Moreover, in most cases, a server that runs Oracle does not require 40+ CPU cores. Using the highest base-frequency CPU will provide the best performance per DB session.

Why use a 3.x GHz frequency CPU?

CPU cycle time is essentially the time it takes to perform a simple instruction, for example, an Addition instruction. A 3.x GHz CPU cycle time is lower on the higher frequency CPU:

A 2.3 GHz CPU cycle time is .43 ns
A 3.4 GHz CPU cycle time is .29 ns

A 3.x GHz CPU achieves an approximate 33% reduction in cycle time with the higher base frequency CPU.

Although a nanosecond may seem like an extremely short period of time, in a typical DB working environment, those nanoseconds add up. For example, if a DB is driving 3,000,000+ logical reads per second while also doing all the other host-based work, speed may suffer.

An Oracle DB performs an extreme amount of host-based processing that eats up CPU cycles. For example:

Traversing the Cache Buffer Chain (CBC)
1. The CBC is a collection of hash buckets and linked lists located in server memory.
2. They are used to locate a DB block in the server buffer cache and then find the needed row and column the query needs.
SQL Order by processing
1. Sorting and/or grouping the SQL results set helps the server memory create the results set in the order the user intended to see.
Parsing SQL
1. Before a SQL statement can be started, it must be parsed.
2. CPU is used for soft and hard parsing.
PLSQL
1. PLSQL is the procedural logic the application uses to run IF-THEN-ELSE, and DO-WHILE logic. It is the application logic on the DB server.
Chaining a DB block just read from storage
1. Once a new DB block is read from storage, it must be placed on the CBC and a memory buffer is “allocated” for its content.
DB session logons and logoffs
1. Session logons and logoffs allocate and de-allocate memory areas for the DB session to run.
ACO – Advanced Compression Option
1. ACO is eliminating identical column data in a DB block – once again, more CPU and RAM work to walk through server memory to eliminate redundant data.
TDE – Transparent Data Encryption
1. TDE encrypts and decrypts data in server memory when accessed by the user – on large data Inserts/Updates extra CPU is needed to encrypt the data during the transaction.

All of this host-based processing drives CPU instruction execution and memory access (remote/local DIMMs and L1, L2, and L3 caches). When there are millions of these operations occurring per second, it supports the need for the fastest CPU to complete CPU instruction execution in the shortest amount of time. Nanosecond execution time adds up.

Keep in mind, the host O/S and the CPU itself are:

Running lots of background memory processes supporting the SGA and PGA accesses
Using CPU cycles to traverse memory addresses to find and move the data the DB session needs
Working behind the scenes to transfer cache lines from NUMA node to NUMA node, DIMM to the CPU core, and Snoopy/MESI protocol work to keep cache lines in-sync

In the end, these processes consume some additional CPU cycles on top of what the DB is consuming, adding to the need for the fastest CPU for our DB workload.

SLOB Logical Read testing:

The 3.x GHz recommendation is based on various Oracle logical read tests that were run with a customer using various Intel CPU models of the same family. The customer goal was to find the CPU that performs the highest number of logical read tests per second to support a critical application that predominantly runs single-threaded queries (such as no PQ – Parallel Query).

Each of the tests showed a higher base-frequency CPU outperformed a lower base-frequency CPU on a per-CPU core basis running SLOB logical read tests (i.e., no physical I/O was done, all host-based processing – only using CPU, Cache, DIMMs, and remote NUMA accesses).

In the tests performed, the comparison point across the various CPUs and their tests was the number of logical reads per second per CPU core. Moreover, the host was 100% dedicated to our testing and no physical I/O was done for the transactions to avoid introducing physical I/O response times in the results.

The following graphic depicts a few examples from the many tests run using a single SLOB thread (a DB session), and the captured logical read-per-second counts per test.

The CPU baseline was an E5-4627 v2, which has a 3.3GHz base frequency, but a 7.2GTs QPI and 4 x DDR3-1866 DIMM channels, versus the Skylake CPUs having a 10.2GTs UPI and 6 x DDR4-2666 DIMMs.

Why is there a variance in the number of logical reads per second between the test run of the same CPU model on the same server? Because SLOB runs a random read DB block profile which leads to more or less DB blocks being found on the local NUMA node the DB session was running on in the various tests. Hence, where fewer DB blocks were present, the remote NUMA node had to transfer DB blocks over the UPI to the local NUMA node, which takes time to complete.

If you are still wondering if Dell Technologies can help you find the right CPU and PowerEdge server for your Oracle workloads, consider the following:

Dell has many PowerEdge server models to choose from, all of which provide the ability to support specific workload needs.
Dell PowerEdge servers let you choose from several Intel and AMD CPU models and RAM sizes.
The Oracle specialists at Dell Technologies can perform AWR workload assessments to ensure you get the right PowerEdge server, CPU model, and RAM configurations to achieve optimal Oracle performance.

Summary

If per-DB session, host-based performance is important to you and your business, then CPU matters. Always use the highest base-frequency CPU you can, to meet CPU power needs and maintain faster cycle times for Oracle DB performance.

If you need more information or want to discuss this topic further, a Dell Oracle specialist is ready to help analyze your application and database environment and to recommend the appropriate CPU and PowerEdge server models to meet your needs. Contact your Dell representative for more information or email us at askaworkloadspecialist@dell.com

Tags:

Hardware	Details
Server	Dell PowerEdge C6620
Processor Model	Intel® Xeon® Gold 6414U (Sapphire Rapids)
Processors per Node	2
Processor Core Count	32
Processor Frequency	2GHz
Host Memory	256 GB, 8 x 32GB

Software	Details
Omnia	https://github.com/dell/omnia
Rocky Linux 8.8	https://rockylinux.org/
Intel® Extensions for Transformers	https://github.com/intel/intel-extension-for-transformers
Intel® Neural Speed	https://github.com/intel/neural-speed

Your Browser is Out of Date

Have you sized your Oracle database lately? Understanding server core counts and Oracle performance.

Understanding CPU frequency

Why use a 3.x GHz frequency CPU?

SLOB Logical Read testing:

Summary

Related Blog Posts

CPU to the Rescue: LLMs for Everyone

Optimizing Large Language Models

Hardware

Llama 2 with Intel® Neural Speed

Conclusion

Improving Oracle Performance with New Dell 4 Socket Servers