Have you sized your Oracle database lately? Understanding server core counts and Oracle performance.
Mon, 24 Oct 2022 14:08:36 -0000
Oracle specialists at Dell Technologies are frequently engaged by customers to help with database (DB) server sizing. That is, sizing stand-alone DB servers or servers that are part of a VxRail or PowerFlex system.
Server sizing is important because it helps ensure that the CPU core count matches your workload needs, delivers optimal performance, and keeps Oracle DB software licensing costs in check.
Understanding CPU frequency
When sizing a server for an Oracle DB, CPU frequency matters. By CPU frequency, we are referring to the CPU cycle time, which can basically be thought of as the time it takes for the CPU to execute a simple instruction, for example, an Addition instruction.
In this blog post, we will use the base CPU frequency metric, which is the frequency the CPU would run at if turbo-boost and power management options were disabled in the system BIOS.
Focus during the sizing process is therefore on:
- Single-threaded transactions
- The best host-based performance per DB session
Packing as many CPU cores as possible into a server will increase the aggregate workload capability of the server more than it will improve per-DB session performance. Moreover, in most cases, a server that runs Oracle does not require 40+ CPU cores. Using the highest base-frequency CPU will provide the best performance per DB session.
Why use a 3.x GHz frequency CPU?
Cycle time shrinks as base frequency rises:
- A 2.3 GHz CPU has a cycle time of 0.43 ns
- A 3.4 GHz CPU has a cycle time of 0.29 ns
Moving from 2.3 GHz to 3.4 GHz therefore cuts cycle time by approximately 33%.
Although a nanosecond may seem like an extremely short period of time, in a typical DB working environment those nanoseconds add up. For example, if a DB is driving 3,000,000+ logical reads per second on top of all its other host-based work, per-session performance can suffer.
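To make the arithmetic concrete, here is a quick sketch in Python. The cycles-per-logical-read figure is a hypothetical assumption for illustration only, not a measured Oracle number:

```python
def cycle_time_ns(ghz):
    """Cycle time in nanoseconds at a given base frequency in GHz."""
    return 1.0 / ghz

t_slow = cycle_time_ns(2.3)   # ~0.43 ns
t_fast = cycle_time_ns(3.4)   # ~0.29 ns
reduction_pct = (t_slow - t_fast) / t_slow * 100   # roughly a third shorter

# Illustrative assumption: each logical read costs ~20,000 CPU cycles.
# At 3,000,000 logical reads/s, the cycle-time difference translates
# into whole CPU cores' worth of work (CPU-seconds per wall second):
cycles_per_read = 20_000
reads_per_sec = 3_000_000
cores_slow = reads_per_sec * cycles_per_read * t_slow / 1e9
cores_fast = reads_per_sec * cycles_per_read * t_fast / 1e9
print(f"{reduction_pct:.1f}% shorter cycles: "
      f"{cores_slow:.1f} vs {cores_fast:.1f} busy cores")
```

Fewer busy cores for the same logical read rate is exactly where the licensing angle comes in, because Oracle DB licensing scales with core count.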
An Oracle DB performs an extreme amount of host-based processing that eats up CPU cycles. For example:
- Traversing the Cache Buffer Chain (CBC)
- The CBC is a collection of hash buckets and linked lists located in server memory.
- They are used to locate a DB block in the server buffer cache and then find the rows and columns the query needs.
- SQL Order by processing
- Sorting and/or grouping the SQL result set in server memory delivers the results in the order the user requested.
- Parsing SQL
- Before a SQL statement can be started, it must be parsed.
- CPU is used for soft and hard parsing.
- PL/SQL execution
- PL/SQL is the procedural logic the application uses to run IF-THEN-ELSE and DO-WHILE logic. It is the application logic running on the DB server.
- Chaining a DB block just read from storage
- Once a new DB block is read from storage, it must be placed on the CBC and a memory buffer is “allocated” for its content.
- DB session logons and logoffs
- Session logons and logoffs allocate and de-allocate memory areas for the DB session to run.
- ACO – Advanced Compression Option
- ACO eliminates identical column data within a DB block – once again, more CPU and RAM work to walk through server memory and eliminate redundant data.
- TDE – Transparent Data Encryption
- TDE encrypts and decrypts data in server memory when accessed by the user – on large inserts and updates, extra CPU is needed to encrypt the data during the transaction.
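As a mental model, the Cache Buffer Chain lookup described above can be sketched as a hash table of short chains. This is a simplified illustration of the idea, not Oracle's actual implementation; all names here are hypothetical:

```python
N_BUCKETS = 1024  # illustrative bucket count

class BlockHeader:
    """A cached DB block: identified by (file_no, block_no)."""
    def __init__(self, file_no, block_no, data):
        self.file_no = file_no
        self.block_no = block_no
        self.data = data

class BufferCache:
    def __init__(self):
        # Hash buckets, each holding a chain (here a Python list) of headers.
        self.buckets = [[] for _ in range(N_BUCKETS)]

    def _bucket(self, file_no, block_no):
        return self.buckets[hash((file_no, block_no)) % N_BUCKETS]

    def get(self, file_no, block_no):
        # Walking the chain burns CPU cycles for every buffer visited;
        # a hit here is a "logical read" served entirely from memory.
        for hdr in self._bucket(file_no, block_no):
            if hdr.file_no == file_no and hdr.block_no == block_no:
                return hdr
        return None  # miss: a physical read from storage would be required

    def put(self, file_no, block_no, data):
        # Chaining a block just read from storage onto its bucket.
        self._bucket(file_no, block_no).append(
            BlockHeader(file_no, block_no, data))
```

Every `get` is pure CPU and memory work, which is why millions of logical reads per second reward the shortest possible cycle time.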
All of this host-based processing drives CPU instruction execution and memory access (remote/local DIMMs and L1, L2, and L3 caches). When there are millions of these operations occurring per second, it supports the need for the fastest CPU to complete CPU instruction execution in the shortest amount of time. Nanosecond execution time adds up.
Keep in mind, the host O/S and the CPU itself are:
- Running lots of background memory processes supporting the SGA and PGA accesses
- Using CPU cycles to traverse memory addresses to find and move the data the DB session needs
- Working behind the scenes to transfer cache lines from NUMA node to NUMA node and from DIMM to CPU core, and performing snoop/MESI protocol work to keep cache lines in sync
In the end, these processes consume some additional CPU cycles on top of what the DB is consuming, adding to the need for the fastest CPU for our DB workload.
SLOB logical read testing
The 3.x GHz recommendation is based on a series of Oracle logical read tests run with a customer using various Intel CPU models of the same family. The customer's goal was to find the CPU that delivers the highest number of logical reads per second to support a critical application that predominantly runs single-threaded queries (that is, no Parallel Query (PQ)).
Every test showed that a higher base-frequency CPU outperformed a lower base-frequency CPU on a per-core basis when running SLOB logical read tests. The comparison point across the CPUs was the number of logical reads per second per CPU core. The host was 100% dedicated to the testing, and no physical I/O was performed, so all work was host-based (CPU, cache, DIMMs, and remote NUMA accesses) and no physical I/O response times skewed the results.
The following graphic depicts a few examples from the many tests run using a single SLOB thread (a DB session), and the captured logical read-per-second counts per test.
The CPU baseline was an E5-4627 v2, which has a 3.3 GHz base frequency but a 7.2 GT/s QPI and 4 x DDR3-1866 DIMM channels, versus the Skylake CPUs with a 10.2 GT/s UPI and 6 x DDR4-2666 DIMM channels.
Why is there a variance in the number of logical reads per second between test runs of the same CPU model on the same server? Because SLOB uses a random DB block read profile, a varying share of DB blocks was found on the local NUMA node where the DB session ran. When fewer blocks were local, the remote NUMA node had to transfer DB blocks over the UPI to the local node, which takes time to complete.
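This local-versus-remote effect can be modeled as a weighted average of access latencies. The nanosecond figures below are hypothetical placeholders for illustration; real values vary by platform and must be measured:

```python
def avg_access_ns(local_ns, remote_ns, remote_fraction):
    """Expected memory-access latency when a fraction of DB blocks must
    come from the remote NUMA node over the interconnect."""
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns

# Hypothetical latencies for illustration only:
LOCAL_NS, REMOTE_NS = 90.0, 140.0
for frac in (0.1, 0.3, 0.5):
    avg = avg_access_ns(LOCAL_NS, REMOTE_NS, frac)
    print(f"{frac:.0%} remote blocks -> {avg:.0f} ns average access")
```

The more blocks land remote, the higher the average latency, which is exactly the run-to-run variance seen in the SLOB results.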
If you are still wondering if Dell Technologies can help you find the right CPU and PowerEdge server for your Oracle workloads, consider the following:
- Dell has many PowerEdge server models to choose from, each designed to address specific workload needs.
- Dell PowerEdge servers let you choose from several Intel and AMD CPU models and RAM sizes.
- The Oracle specialists at Dell Technologies can perform AWR workload assessments to ensure you get the right PowerEdge server, CPU model, and RAM configurations to achieve optimal Oracle performance.
If per-DB session, host-based performance is important to you and your business, then CPU matters. Always use the highest base-frequency CPU you can, to meet CPU power needs and maintain faster cycle times for Oracle DB performance.
If you need more information or want to discuss this topic further, a Dell Oracle specialist is ready to help analyze your application and database environment and to recommend the appropriate CPU and PowerEdge server models to meet your needs. Contact your Dell representative for more information or email us at email@example.com
Related Blog Posts
Do We Always Need GPUs for AI Workloads?
Sun, 10 Sep 2023 15:52:30 -0000
Graphics Processing Units (GPUs) have long been the preferred choice for accelerating AI workloads, especially deep learning tasks. However, the assumption that GPUs are indispensable for all AI applications merits a closer examination.
In this blog, we shift the focus to Central Processing Units (CPUs), delve into the role of CPU performance in AI workloads, and investigate scenarios where CPUs might offer competitive or even superior performance compared to GPUs.
To measure the performance of AI inference workloads on CPUs, we used the TensorFlow benchmark. The benchmark provides implementations of popular convolutional neural networks for large-scale image recognition (VGG-16, AlexNet, GoogLeNet, and ResNet-50) across various batch sizes (16, 32, 64, 256, and 512). It supports workloads running on a single machine as well as workloads distributed across multiple hosts. The study looks at all subtests in the benchmark.
We looked at the performance trend of each model across the different batch sizes to decide whether the 1-socket PowerEdge R7615 or the 2-socket PowerEdge R7625 is better suited to a CPU-based AI inference workload.
The following figures show the performance of convolutional models on different batch sizes in the balanced, 12-DIMMs-per-socket configuration with memory capacity of 64 GB per DIMM in the PowerEdge R7625 and R7615 with 4th Gen AMD EPYC 9654 and 9654P processors:
Figure 1. Performance of convolutional models on different batch sizes in a balanced, 12-DIMMs-per-socket configuration with memory capacity of 64 GB per DIMM with default BIOS settings
The batch size can vary depending on several factors, including the specific application, available computational resources, and hardware constraints. Generally, larger batch sizes are preferred because they offer better parallelization and computational efficiency, but they also require more memory. As the line graphs show, the 2-socket server (PowerEdge R7625) outperforms the 1-socket server (PowerEdge R7615) by up to 150 percent at smaller batch sizes.
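The percentage comparison above is computed as shown below. The throughput figures here are hypothetical placeholders for illustration, not measured results from the benchmark:

```python
def speedup_pct(base_throughput, new_throughput):
    """Percent by which new_throughput exceeds base_throughput."""
    return (new_throughput - base_throughput) / base_throughput * 100.0

# Hypothetical images/sec figures for illustration only (not measured):
r7615_ips = 120.0   # 1-socket PowerEdge R7615
r7625_ips = 300.0   # 2-socket PowerEdge R7625
print(f"2-socket advantage: {speedup_pct(r7615_ips, r7625_ips):.0f}%")  # → 150%
```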
We found that CPUs handle smaller batch sizes well, and we suggest that customers choose their configuration based on performance, business requirements, and future scalability.
In practice, the choice between CPU-based and GPU-based AI inference depends on the specific requirements of the application. Some AI workloads benefit more from the parallel processing capabilities of GPUs, while others may prioritize low latency and versatile processing, which CPUs can provide.
Ultimately, the choice between using GPUs or CPUs for AI workloads should be based on a thorough understanding of the workload's characteristics, performance requirements, available hardware, and budget considerations. In some cases, a combination of different hardware components might also be a viable solution to optimize performance and cost.
You can find more about this on CPU-based AI inference | Workload-Based DDR5 Memory Guidance for Next-Generation PowerEdge Servers | Dell Technologies Info Hub.
Author: Swaraj Mohapatra
Dell PowerStore Native Integration with Dell PowerProtect DD Series Appliances for DP in Oracle Environments
Wed, 10 May 2023 13:29:40 -0000
Having many years of production database experience, I know DBAs will be eager to give this new capability a try.
PowerStoreOS 3.5 has just been released, and with it come more data protection capabilities. This translates into a more robust arsenal of tools for the DBA's toolkit. The new feature provides a way to quickly back up, restore, and thinly provision database clones for different environments from a remote data source!
There are several data protection enhancements added to PowerStoreOS 3.5. One that excites me is a native backup solution that integrates with remote data storage appliances: Dell PowerProtect DD series appliances.
Integration with PowerProtect DD series appliances
The native integration from PowerStore to PowerProtect DD series appliances allows DBAs to create a crash consistent backup image of a database in the form of a remote snapshot. Administrators can initiate this remote snapshot directly from PowerStore Manager with a remote connection to a PowerProtect DD appliance. Because the snapshot is created directly on the PowerProtect DD appliance, no additional storage is consumed on PowerStore, making this backup strategy storage efficient. This close integration significantly streamlines the process of transmitting backups to the remote PowerProtect DD Series appliance by eliminating the need for a dedicated backup host, and reduces time and complexity.
This removes the worries of having to work with other types of remote storage, including tape, for your Oracle crash consistent images!
Remote Backup Rules
New in PowerStoreOS 3.5 is a protection rule called a Remote Backup Rule, which sets the schedule on which remote snapshots are taken. The rule lets users create remote crash-consistent Oracle database snapshots on a PowerProtect DD series appliance.
A Remote Backup Rule can be scheduled for any day of the week with a frequency of every 6, 12, or 24 hours, or at a specific time of day. The default retention period for the remote snapshots is 7 days but can be configured up to 70 years!
After a Remote Backup Rule is created, it can be added to a protection policy. In the following figure, the remote backup rule ORA-ASM-DB2 has been added to the ORA-ASM-DB2 protection policy.
After the remote backup rule is added to a protection policy, the protection policy can be assigned to the storage resource used by the Oracle database, such as a volume group with write-order consistency enabled. The following figure shows that the PowerStore volume group ORA-ASM-DB2 is assigned the ORA-ASM-DB2 protection policy.
Remote snapshots will now be automatically generated, based on the defined schedule in the remote backup rule ORA-ASM-DB2.
Creating remote snapshots
Creating remote snapshots is just a few clicks away! To create a remote snapshot manually, select Protection, then Remote Backup. Click BACKUP SESSIONS, select the desired backup session, then click BACKUP. That's all that's needed to create the snapshot on the remote PowerProtect DD appliance.
Effortless retrieval of remote backups of Oracle databases
The PowerStoreOS 3.5 retrieve option has several use cases. Here I’ll mention two. One is to retrieve a remote snapshot for a currently existing PowerStore resource and restore it. Another is to create a thin clone of the PowerStore resource from the retrieved remote snapshot. For Oracle databases, the PowerStore resource is the PowerStore Volume Group that contains the volumes that are hosting the Oracle database.
To retrieve a remote snapshot for a volume group, it's as simple as six mouse clicks from the Remote Backup feature:
RESOURCES --> select the PowerStore resource --> MANAGE SNAPSHOTS:
Select the remote snapshot --> RETRIEVE --> RETRIEVE:
Once the PowerStore storage resources have been restored from the snapshot, simply start the Oracle database, and Oracle does the rest with crash recovery.
PowerStoreOS 3.5 provides enhanced data protection features that let DBAs back up, recover, restore, and clone databases with a few mouse clicks. DBAs will find this feature a great addition to their toolkit.
PowerStoreOS 3.5 is truly a game changer! This new data protection capability is just one of the numerous features introduced in PowerStoreOS 3.5. Be sure to keep an eye out for additional blog posts that showcase these valuable features.
Author: Mark Tomczik