
Have you sized your Oracle database lately? Understanding server core counts and Oracle performance.
Mon, 24 Oct 2022 14:08:36 -0000
Oracle specialists at Dell Technologies are frequently engaged by customers to help with database (DB) server sizing. That is, sizing DB servers that are stand-alone or part of a VxRail or PowerFlex system.
Server sizing is important because it helps ensure that the CPU core count matches your workload needs, provides optimal performance, and helps manage Oracle DB software licensing costs.
Understanding CPU frequency
When sizing a server for an Oracle DB, CPU frequency matters. By CPU frequency, we are referring to the CPU cycle time: the time it takes for the CPU to execute a simple instruction, for example, an Addition instruction.
In this blog post, we will use the base CPU frequency metric, which is the frequency the CPU would run at if turbo-boost and power management options were disabled in the system BIOS.
Focus during the sizing process is therefore on:
- Single-threaded transactions
- The best host-based performance per DB session
Packing as many CPU cores as possible into a server increases the aggregate workload capability of the server more than it improves per-DB-session performance. Moreover, in most cases, a server that runs Oracle does not require 40+ CPU cores. Using the highest base-frequency CPU will provide the best performance per DB session.
Why use a 3.x GHz frequency CPU?
Because cycle time is the inverse of frequency, a higher-frequency CPU completes each simple instruction sooner:
- A 2.3 GHz CPU cycle time is 0.43 ns
- A 3.4 GHz CPU cycle time is 0.29 ns
The 3.4 GHz CPU therefore achieves an approximately 33% reduction in cycle time over the 2.3 GHz CPU.
Although a nanosecond may seem like an extremely short period of time, in a typical DB working environment, those nanoseconds add up. For example, if a DB is driving 3,000,000+ logical reads per second while also doing all the other host-based work, speed may suffer.
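To see how those nanoseconds accumulate, here is a small worked example. The cycle times come from the figures above; the cycles-per-logical-read constant is purely an illustrative assumption, not a measured Oracle value.

```python
# Worked example of the cycle-time arithmetic above; the cycles-per-logical-read
# value is an illustrative assumption, not a measured Oracle figure.
def cycle_time_ns(freq_ghz: float) -> float:
    """One CPU cycle in nanoseconds: the inverse of frequency in GHz."""
    return 1.0 / freq_ghz

slow = cycle_time_ns(2.3)   # ~0.43 ns
fast = cycle_time_ns(3.4)   # ~0.29 ns
print(f"Cycle-time reduction: {(slow - fast) / slow:.1%}")  # ~32-33%

# Aggregate effect at 3,000,000 logical reads/s. ASSUMPTION: each logical read
# costs roughly 5,000 CPU cycles (a hypothetical round number for illustration).
CYCLES_PER_LOGICAL_READ = 5_000
reads_per_sec = 3_000_000
saved_s = reads_per_sec * CYCLES_PER_LOGICAL_READ * (slow - fast) / 1e9
print(f"CPU core time saved per wall-clock second: {saved_s:.1f} s")  # ~2.1 s
```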
An Oracle DB performs an extreme amount of host-based processing that eats up CPU cycles. For example:
- Traversing the Cache Buffer Chain (CBC)
- The CBC is a collection of hash buckets and linked lists located in server memory.
- They are used to locate a DB block in the server buffer cache and then find the row and column the query needs (a simplified sketch appears below).
- SQL Order by processing
- Sorting and/or grouping the SQL result set in server memory so that it is returned in the order the user requested.
- Parsing SQL
- Before a SQL statement can be started, it must be parsed.
- CPU is used for soft and hard parsing.
- PL/SQL
- PL/SQL is the procedural language applications use for IF-THEN-ELSE and DO-WHILE logic. It is the application logic that runs on the DB server.
- Chaining a DB block just read from storage
- Once a new DB block is read from storage, it must be placed on the CBC and a memory buffer is “allocated” for its content.
- DB session logons and logoffs
- Session logons and logoffs allocate and de-allocate memory areas for the DB session to run.
- ACO – Advanced Compression Option
- ACO eliminates identical column data in a DB block – once again, more CPU and RAM work to walk through server memory and eliminate redundant data.
- TDE – Transparent Data Encryption
- TDE encrypts and decrypts data in server memory when it is accessed by the user – on large Inserts/Updates, extra CPU is needed to encrypt the data during the transaction.
All of this host-based processing drives CPU instruction execution and memory access (remote/local DIMMs and L1, L2, and L3 caches). When there are millions of these operations occurring per second, it supports the need for the fastest CPU to complete CPU instruction execution in the shortest amount of time. Nanosecond execution time adds up.
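To make the CBC traversal concrete, the following is a simplified, hypothetical sketch of a hash-bucket/linked-list buffer cache lookup. It is not Oracle's actual implementation (which involves latches, buffer headers, and touch counts), but it shows why every logical read is a chain of memory accesses that a faster CPU completes sooner.

```python
# Simplified, hypothetical model of a Cache Buffer Chain style lookup.
# Real Oracle internals differ (latches, buffer headers, touch counts);
# this only illustrates the hash-bucket + linked-list traversal work.
from dataclasses import dataclass
from typing import Optional

@dataclass
class BufferHeader:
    file_no: int
    block_no: int
    buffer: bytes                             # cached DB block content
    next: Optional["BufferHeader"] = None     # next header in this bucket's chain

N_BUCKETS = 1024                              # arbitrary bucket count for the sketch

def bucket_for(file_no: int, block_no: int) -> int:
    # Hash the data block address into a bucket (toy hash function).
    return hash((file_no, block_no)) % N_BUCKETS

class ToyBufferCache:
    def __init__(self) -> None:
        self.buckets: list[Optional[BufferHeader]] = [None] * N_BUCKETS

    def get(self, file_no: int, block_no: int) -> Optional[bytes]:
        # Walk this bucket's linked list; every hop is a memory access
        # (L1/L2/L3 cache or DIMM) that burns CPU cycles.
        hdr = self.buckets[bucket_for(file_no, block_no)]
        while hdr is not None:
            if hdr.file_no == file_no and hdr.block_no == block_no:
                return hdr.buffer             # logical read: found in cache
            hdr = hdr.next
        return None                           # miss: physical I/O would follow

    def put(self, file_no: int, block_no: int, buffer: bytes) -> None:
        # "Chaining" a block just read from storage: link a new header
        # onto the front of its bucket's list.
        b = bucket_for(file_no, block_no)
        self.buckets[b] = BufferHeader(file_no, block_no, buffer, self.buckets[b])
```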
Keep in mind, the host O/S and the CPU itself are:
- Running lots of background memory processes supporting the SGA and PGA accesses
- Using CPU cycles to traverse memory addresses to find and move the data the DB session needs
- Working behind the scenes to transfer cache lines from NUMA node to NUMA node and from DIMM to CPU core, with cache-snooping/MESI protocol work to keep cache lines in sync
In the end, these processes consume some additional CPU cycles on top of what the DB is consuming, adding to the need for the fastest CPU for our DB workload.
SLOB logical read testing
The 3.x GHz recommendation is based on a series of Oracle logical read tests run with a customer across several Intel CPU models of the same family. The customer's goal was to find the CPU that performs the highest number of logical reads per second to support a critical application that predominantly runs single-threaded queries (that is, no Parallel Query (PQ)).
Each test showed that a higher base-frequency CPU outperformed a lower base-frequency CPU on a per-CPU-core basis when running SLOB logical read tests (that is, no physical I/O was done; all processing was host-based, using only CPU, caches, DIMMs, and remote NUMA accesses).
In the tests performed, the comparison point across the various CPUs and their tests was the number of logical reads per second per CPU core. Moreover, the host was 100% dedicated to our testing and no physical I/O was done for the transactions to avoid introducing physical I/O response times in the results.
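As a sketch of how such results can be normalized for comparison (the CPU labels and numbers below are placeholders, not the customer's actual test data), the metric is simply logical reads per second divided by the cores used:

```python
# Hypothetical normalization of SLOB results; the CPU labels and counts below
# are placeholders, not the customer's actual data.
tests = [
    {"cpu": "2.3 GHz base CPU", "logical_reads_per_sec": 600_000, "cores_used": 1},
    {"cpu": "3.4 GHz base CPU", "logical_reads_per_sec": 800_000, "cores_used": 1},
]
for t in tests:
    per_core = t["logical_reads_per_sec"] / t["cores_used"]
    print(f"{t['cpu']}: {per_core:,.0f} logical reads/s per core")
```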
The following graphic depicts a few examples from the many tests run using a single SLOB thread (a DB session), and the captured logical read-per-second counts per test.
The CPU baseline was an E5-4627 v2, which has a 3.3 GHz base frequency but a 7.2 GT/s QPI and 4 x DDR3-1866 DIMM channels, versus the Skylake CPUs, which have a 10.2 GT/s UPI and 6 x DDR4-2666 DIMM channels.
Why is there a variance in the number of logical reads per second between test runs of the same CPU model on the same server? Because SLOB uses a random-read DB block profile, more or fewer of the requested DB blocks are found on the local NUMA node the DB session is running on in any given test. Where fewer blocks were local, the remote NUMA node had to transfer DB blocks over the UPI to the local NUMA node, which takes time to complete.
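One way to reduce that NUMA variance in your own testing, sketched below under the assumption of a Linux host where NUMA node 0 owns cores 0-15 (verify your topology with lscpu or numactl --hardware), is to pin the test process to a single node's cores so most buffer accesses stay local:

```python
# Sketch: pin this process to NUMA node 0's cores on Linux so buffer accesses
# stay mostly local. ASSUMPTION: node 0 owns cores 0-15; confirm your topology
# with `lscpu` or `numactl --hardware` first.
import os

NODE0_CORES = set(range(16))            # hypothetical core list for node 0
os.sched_setaffinity(0, NODE0_CORES)    # pid 0 = the calling process
print("Restricted to cores:", sorted(os.sched_getaffinity(0)))
```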
If you are still wondering if Dell Technologies can help you find the right CPU and PowerEdge server for your Oracle workloads, consider the following:
- Dell has many PowerEdge server models to choose from, each designed to support specific workload needs.
- Dell PowerEdge servers let you choose from several Intel and AMD CPU models and RAM sizes.
- The Oracle specialists at Dell Technologies can perform AWR workload assessments to ensure you get the right PowerEdge server, CPU model, and RAM configurations to achieve optimal Oracle performance.
Summary
If per-DB session, host-based performance is important to you and your business, then CPU matters. Always use the highest base-frequency CPU you can, to meet CPU power needs and maintain faster cycle times for Oracle DB performance.
If you need more information or want to discuss this topic further, a Dell Oracle specialist is ready to help analyze your application and database environment and to recommend the appropriate CPU and PowerEdge server models to meet your needs. Contact your Dell representative for more information or email us at askaworkloadspecialist@dell.com.
Related Blog Posts

Get the most from your Oracle database
Wed, 30 Mar 2022 15:09:53 -0000
This year, the Oracle team at Dell Technologies set out to help our customers get the most from their database investments. To determine optimal price performance, we started by validating three different PowerEdge R750 configurations.
The first configuration was an Intel Xeon Gold 5318Y CPU with 24 processor cores, a 2.10 GHz clock speed, and a 36 MB cache. This PowerEdge configuration had the highest core count of the three configurations, so we expected it to be the performance leader in our workload tests.
The second configuration included two Intel Xeon Gold 6334 CPUs, each with eight processor cores, for a total of 16 cores. Each CPU had a clock speed of 3.6 GHz and an 18 MB cache, for a total of 36 MB. This configuration had a much higher clock speed, even though it had eight fewer processor cores.
The third and final configuration had the fewest cores: a single Intel Xeon Gold 6334 CPU with eight processor cores, a 3.6 GHz clock speed, and an 18 MB cache. We did not expect this entry-level configuration to match the performance of the first configuration, as it has 16 fewer processor cores. The main reason we included this configuration was to provide insight into how a PowerEdge design can start with one processor and scale up when a second processor is added.
All three configurations were tested with an OLTP workload using the HammerDB load-generation tool. The TPROC-C workload configuration is described in Table 1.
Table 1: Virtual users and related HammerDB workload parameters
| Specifications | Use case 1: 1 x 5318Y CPU | Use case 2: 2 x 6334 CPUs | Use case 3: 1 x 6334 CPU |
|---|---|---|---|
| Virtual Users | 120 | 120 | 60 |
| User Delay (ms) | 500 | 500 | 500 |
| Repeat Delay (ms) | 500 | 500 | 500 |
| Iterations | 1 | 1 | 1 |
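As a rough sketch of how Table 1's parameters map onto a HammerDB CLI run (the file handling and flow are illustrative, and option names should be verified against your HammerDB version's documentation; connection and schema setup are omitted), a Python wrapper could generate and execute the run script:

```python
# Rough sketch: generate a HammerDB CLI run script from Table 1's parameters
# and execute it with hammerdbcli. Verify option names against your HammerDB
# version; Oracle connection and schema-build steps are omitted here.
import subprocess
import tempfile
import textwrap

script = textwrap.dedent("""\
    # Assumes the Oracle schema is already built and connection options set.
    dbset db ora
    loadscript
    vuset vu 120          ;# Virtual Users (120 for use cases 1-2, 60 for use case 3)
    vuset delay 500       ;# User Delay (ms)
    vuset repeat 500      ;# Repeat Delay (ms)
    vuset iterations 1    ;# Iterations
    vucreate
    vurun
""")

with tempfile.NamedTemporaryFile("w", suffix=".tcl", delete=False) as f:
    f.write(script)

subprocess.run(["hammerdbcli", "auto", f.name], check=True)
```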
We measured our results in New Orders Per Minute (NOPM), as this metric facilitates price performance comparison between different database systems.
| | Use Case 1: 1 x 5318Y CPU | Use Case 2: 2 x 6334 CPUs | Use Case 3: 1 x 6334 CPU |
|---|---|---|---|
| Compared to use case 1 | 100% (baseline) | 75% of NOPM | 47% of NOPM |
Figure 1: NOPM for each use case
Unsurprisingly, use case 1 was the performance leader. Although we expected this result, we did not expect the high NOPM performance for use cases 2 and 3. Use case 2 supported 75 percent of the NOPM workload of use case 1, with eight fewer processor cores. Use case 3 supported 47 percent of the NOPM of use case 1, with 16 fewer processor cores. The higher base clock speed in both use cases 2 and 3 seemed to increase efficiency.
Estimated costs are important factors that impact the efficiency of each PowerEdge configuration. Table 2 shows the cost of the PowerEdge configurations combined with the Oracle Database Enterprise Edition (EE) licensing using the processor metric. For a more detailed overview of costs, read the white paper.
Table 2: Server and Oracle EE processor core license costs (non-discounted, no support costs)
| | 1 CPU w/ 24 cores, 2.1 GHz | 2 CPUs w/ 8 cores, 3.6 GHz | 1 CPU w/ 8 cores, 3.6 GHz |
|---|---|---|---|
| License totals | $570,000.00 | $380,000.00 | $190,000.00 |
| PowerEdge R750 costs | $70,320.00 | $98,982.00 | $71,790.00 |
| Grand total costs | $640,320.00 | $478,982.00 | $261,790.00 |
Price per performance-efficiency-per-processor-core is calculated by dividing the total cost of each configuration by the number of NOPM per processor core.
| | 1 CPU w/ 24 cores, 2.1 GHz | 2 CPUs w/ 8 cores, 3.6 GHz | 1 CPU w/ 8 cores, 3.6 GHz |
|---|---|---|---|
| Total cost | $640,320.00 | $478,982.00 | $261,790.00 |
| NOPM per processor core | 62,414 | 69,806 | 87,598 |
| Price per performance-efficiency-per-processor-core | $10.26 | $6.86 | $2.17 |
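As a quick check of the arithmetic, shown here for use case 1 only (the other columns follow the same formula):

```python
# Price per performance-efficiency-per-processor-core = total cost / NOPM per core.
total_cost = 640_320.00   # Grand total from Table 2 (use case 1)
nopm_per_core = 62_414    # NOPM per processor core (use case 1)
print(f"${total_cost / nopm_per_core:.2f} per NOPM-per-core")  # -> $10.26
```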
Use cases 2 and 3 were more performance-efficient than use case 1, as both configurations processed more NOPM per processor core. Although use case 1 was the performance leader, it had the highest price per performance-efficiency-per-processor-core; this metric reflects how much it costs to generate each NOPM per core that the system processes. More information about the databases and wait events for each use case is available in this white paper.
The PowerEdge configuration for use case 2 had twice the RAM of use case 1, making the system more expensive. However, the additional memory had minimal impact on performance, as the database SGA size remained the same for each validation test.
Use case 3 had the best price performance-efficiency per core at $2.17, which provided a 79 percent savings compared to use case 1 ($10.26). This result shows that customers can start with a small Oracle database configuration with one low-core-count, high-frequency CPU and achieve up to 47 percent of the performance that a very high-core-count, low-frequency CPU can achieve. For the processor configurations used in use cases 1 and 3, the savings equated to $379,530, which is significant in dollar terms!
The two-CPU configuration in use case 2 cost $6.86, which provided a 33 percent savings compared to use case 1 ($10.26). This result shows that when a larger workload must be supported, customers can scale up by adding a second CPU to their server and achieve up to 75 percent of the performance that a very high-core-count, low-frequency CPU can achieve, while still paying less than if they had started with the high-core-count, low-frequency CPU. For the processor configurations used in use cases 1 and 2, these savings equated to $161,388, again a good savings in dollar terms.
Each database is unique, and the savings results demonstrated in our validation tests may vary depending on your database system. Dell Technologies has an experienced Oracle team that can configure PowerEdge systems based on your performance requirements and can help you optimize price efficiency for your databases. For customers who are interested in more information, see the following next steps:
- Read the white paper: Accelerate Oracle Databases and Maximize Your Investment
- Connect with your Dell Technologies Representative and mention that you want to upgrade your Oracle databases with this price efficiency approach

Introducing the Accelerator Nodes – the Latest Additions to the Dell PowerScale Family
Thu, 20 Jan 2022 14:45:39 -0000
The Dell PowerScale family recently grew with the release of accelerator nodes. Accelerator nodes contribute additional CPU, memory, and network bandwidth to a cluster that already has adequate storage resources.
The PowerScale accelerator nodes include the PowerScale P100 performance accelerator and the PowerScale B100 backup accelerator. Both the P100 and B100 are based on 1U PowerEdge R640 servers and can be part of a PowerScale cluster that is powered by OneFS 9.3 or later. The accelerator nodes contain boot media only and are optimized for their CPU and memory configurations. A single P100 or B100 node can be added to a cluster, and expansion is in single-node increments.
PowerScale all-flash and all-NVMe storage delivers the necessary performance to meet demanding workloads. If additional capabilities are required, new nodes can be non-disruptively added to the cluster to provide both performance and capacity. There may be specialized compute-bound workloads that require extra performance but don't need any additional capacity. These types of workloads may benefit from adding the PowerScale P100 performance accelerator node to the cluster. The accelerator node contributes CPU, memory, and network bandwidth capabilities to the cluster. This accelerated storage solution delivers incremental performance at a lower cost. Let's look at each node in detail.
A PowerScale P100 Performance Accelerator node adds performance to the workflows on a PowerScale cluster that is composed of CPU-bound nodes. The P100 provides a dedicated cache, separate from the cluster. Adding CPU to the cluster will improve performance where there are read/re-read intensive workloads. The P100 also provides additional network bandwidth to a cluster through the additional front-end ports.
With rapid data growth, organizations are challenged by shrinking backup windows that impact business productivity and the ability to meet IT requirements for tape backup and compliance archiving. In such an environment, providing fast, efficient, and reliable data protection is essential. Given the 24x7 nature of the business, a high-performance backup solution delivers the performance and scale to address the SLAs of the business. Adding one or more PowerScale B100 backup accelerator nodes to a PowerScale cluster can reduce risk while addressing backup protection needs.
A PowerScale B100 Backup Accelerator enables backing up a PowerScale cluster using a two-way NDMP protocol. The B100 is delivered in a cost-effective form factor to address the SLA targets and tape backup needs of a wide variety of workloads. Each node includes Fibre Channel ports that can connect directly to a tape subsystem or a Storage Area Network (SAN). The B100 can benefit backup operations as it reduces overhead on the cluster, by going through the Fibre Channel ports directly, thereby separating front-end and NDMP traffic.
The PowerScale P100 and B100 nodes can be monitored using the same tools available today, including the OneFS web administration interface, the OneFS command-line interface, Dell DataIQ, and InsightIQ.
In a world where unstructured data is growing rapidly and taking over the data center, organizations need an enterprise storage solution that provides the flexibility to address the additional performance needs of certain workloads, and that meets the organization’s overall data protection requirements.
The following information provides the technical specifications and best practice design considerations of the PowerScale Accelerator nodes:
- PowerScale Accelerator Nodes Specification Sheet
- PowerScale: NDMP Technical Overview and Design Considerations
- PowerScale: Accelerator Nodes Overview and General Best Practices
Author: Cris Banson