Talking CloudIQ: Capacity Monitoring and Planning
Fri, 09 Dec 2022 15:37:42 -0000
Introduction
This is the third in a series of blogs discussing CloudIQ. In my first blog, I provided a high-level overview of CloudIQ and some of its key features. My second blog talked about the CloudIQ Proactive Health Score. I will continue the series with a discussion of the capacity monitoring and planning features in CloudIQ.
Planning ahead
Capacity monitoring helps you plan for expansions of storage arrays, data protection appliances, storage-as-a-service, and hyperconverged infrastructure (HCI), and helps you overcome unexpected spikes in storage consumption. CloudIQ uses advanced analytics to provide short-term capacity prediction analysis, longer-term capacity forecasting, and capacity anomaly detection. Capacity anomaly detection is the identification of a sudden surge in utilization that may result in a space-full condition in less than 24 hours.
The CloudIQ Home page displays the Capacity Approaching Full tile, which identifies storage entities that are full or expected to be full in each of the following time ranges:
- Imminent (predicted to run out of space within 24 hours)
- Full
- Within a week
- Within a month
- Within a quarter
Figure 1. The Capacity Approaching Full tile
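To make the categories concrete, here is a minimal Python sketch of how a predicted time-to-full could be mapped onto the tile's buckets. This is not CloudIQ code: the function name `capacity_bucket`, the 30-day and 90-day cutoffs for "month" and "quarter", and the "Not approaching full" fallback are all assumptions made for illustration. Only the 24-hour Imminent window comes from the description above.

```python
from datetime import timedelta
from typing import Optional


def capacity_bucket(percent_used: float, time_to_full: Optional[timedelta]) -> str:
    """Map a storage entity onto a Capacity Approaching Full category.

    Hypothetical helper: the month (30 days) and quarter (90 days)
    cutoffs are assumptions, not documented CloudIQ values.
    """
    if percent_used >= 100:
        return "Full"
    if time_to_full is None:
        # No prediction available, or usage is not growing.
        return "Not approaching full"
    if time_to_full <= timedelta(hours=24):
        return "Imminent"
    if time_to_full <= timedelta(days=7):
        return "Within a week"
    if time_to_full <= timedelta(days=30):
        return "Within a month"
    if time_to_full <= timedelta(days=90):
        return "Within a quarter"
    return "Not approaching full"
```

For example, a pool at 80% used with a predicted five hours remaining would fall into the Imminent bucket.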
In situations where there is a storage entity in the Imminent category, CloudIQ identifies the components of the entity that are experiencing the sudden increase in utilization. This gives users the necessary information about where to look to correct the offending behavior. In the following example, CloudIQ has identified a storage pool that is expected to run out of space in five hours. The pool details page identifies the file systems and LUNs that are the top contributors to the expected rise in utilization.
Figure 2. Capacity Forecast for a pool that has a capacity anomaly
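For intuition only, the idea of a "time until full" estimate can be sketched with a straight-line fit over recent usage samples. CloudIQ's actual analytics are not described in this blog and are certainly more sophisticated; the least-squares approach, the function name `hours_until_full`, and the sample data below are assumptions chosen purely to illustrate the concept.

```python
from typing import List, Optional, Tuple


def hours_until_full(samples: List[Tuple[float, float]], capacity_tb: float) -> Optional[float]:
    """Estimate hours until a pool fills, using a least-squares line.

    samples: (hour, used_tb) pairs in time order. Illustrative only;
    this is not the algorithm CloudIQ uses.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    # Slope of the fitted line = growth rate in TB per hour.
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    if slope <= 0:
        return None  # Usage is flat or shrinking; no fill time to predict.
    _, last_used = samples[-1]
    return max(0.0, (capacity_tb - last_used) / slope)


# A pool growing 10 TB per hour with 50 TB left has about five hours remaining.
estimate = hours_until_full([(0, 50.0), (1, 60.0), (2, 70.0)], capacity_tb=120.0)
```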
Two other CloudIQ features help you quickly find a solution for storage that is fast approaching full. First, there is the identification of reclaimable storage that shows you where you can recover unused capacity in a system. Second, there is the multisystem capacity view that lets you scan all your storage systems to pinpoint which have excess capacity to relieve approaching-full systems of their workloads.
Reclaimable storage
CloudIQ identifies different types of storage that are potentially reclaimable. The following criteria are used to identify reclaimable storage:
- Block objects with no hosts attached
- File objects with no front-end I/O in the past week
- Block objects with no front-end I/O in the past week
- Block-based virtual machines that have been shut down for the past week
- File-based virtual machines that have been shut down for the past week
Users can quickly see the storage objects, where each object resides, and the amount of reclaimable space. The Last IO Time is provided for block and file objects that have no detected I/O activity in the last week. For VMs that have been shut down for at least a week, the storage object on which the VM resides, the vCenter, and the time that the VM was shut down are available. The following figure shows an example of reclaimable storage for block objects that have had no front-end I/O activity in the past week.
Figure 3. The Reclaimable Storage page – Block Objects with no front-end I/O activity
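The block and file criteria above can be expressed as a simple filter. The following Python sketch is an illustration under stated assumptions: the `StorageObject` record, its field names, and the `reclaim_reasons` helper are hypothetical and do not reflect CloudIQ's data model or API. The virtual machine criteria are omitted for brevity.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class StorageObject:
    # Hypothetical record; field names are assumptions for this sketch.
    name: str
    kind: str  # "block" or "file"
    attached_hosts: int
    last_io_time: Optional[datetime]  # None means no I/O has been observed


def reclaim_reasons(obj: StorageObject, now: datetime,
                    idle: timedelta = timedelta(days=7)) -> List[str]:
    """Return the reclaimable-storage criteria that an object matches."""
    reasons = []
    if obj.kind == "block" and obj.attached_hosts == 0:
        reasons.append("no hosts attached")
    if obj.last_io_time is None or now - obj.last_io_time >= idle:
        reasons.append("no front-end I/O in the past week")
    return reasons


now = datetime(2022, 12, 9)
idle_lun = StorageObject("lun-01", "block", 0, datetime(2022, 11, 1))
busy_fs = StorageObject("fs-01", "file", 0, datetime(2022, 12, 8))
```

An object that returns an empty list matches none of the criteria and would not be flagged as reclaimable.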
Multisystem capacity view
The multisystem capacity view provides a quick view of physical usable, used, and free capacity, along with storage efficiencies, across all storage, HCI, and data protection systems monitored by CloudIQ. This allows users to see quickly which systems are low on usable space, determine which systems are good targets for workload migration, and verify that their storage efficiencies and data reduction numbers are what they expect.
Figure 4. Multisystem capacity view for storage
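The same reasoning you would apply when scanning this view, finding systems with enough free space to absorb a workload, can be sketched in a few lines of Python. The `SystemCapacity` record, the `migration_targets` helper, and the 80% post-migration ceiling are assumptions for illustration; they are not part of CloudIQ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SystemCapacity:
    # Hypothetical row from a multisystem capacity listing.
    name: str
    usable_tb: float
    used_tb: float

    @property
    def free_tb(self) -> float:
        return self.usable_tb - self.used_tb


def migration_targets(systems: List[SystemCapacity], needed_tb: float,
                      max_percent_after: float = 80.0) -> List[SystemCapacity]:
    """Rank systems that can absorb a workload of needed_tb.

    A system qualifies if it has enough free space and would stay at or
    below max_percent_after once the workload lands (an assumed policy).
    """
    candidates = [
        s for s in systems
        if s.free_tb >= needed_tb
        and 100.0 * (s.used_tb + needed_tb) / s.usable_tb <= max_percent_after
    ]
    # Most free space first.
    return sorted(candidates, key=lambda s: s.free_tb, reverse=True)


fleet = [
    SystemCapacity("array-a", 100.0, 90.0),
    SystemCapacity("array-b", 200.0, 50.0),
    SystemCapacity("array-c", 100.0, 40.0),
]
targets = [s.name for s in migration_targets(fleet, needed_tb=20.0)]
```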
Storage system details
Detailed capacity views for storage systems and storage objects provide additional information, including data efficiencies and data reduction metrics. The following figure shows the physical and logical storage breakdown and data reduction charts for a PowerStore cluster.
Figure 5. PowerStore cluster storage details
For APEX block storage service subscriptions, CloudIQ provides both subscribed and physical storage views. Subscribed views show storage usage broken down into base and on-demand usage.
Figure 6. APEX block storage services subscription view
Custom reports
With custom reports and the use of custom tags, users can create meaningful business reports and schedule those reports to be delivered to the required end users. Reports can include both line charts and tables and can be filtered on any field. The following figure shows a simple table that includes used and free capacities, data reduction values, and several custom tags.
Figure 7. Custom report for storage
Conclusion
CloudIQ’s intelligence and predictive analytics help users proactively manage and accurately plan data storage and workload expansions, and act quickly to avoid rapidly approaching capacity-full conditions. Custom reports and tagging allow users to create, schedule, and deliver reports with technical and business information tailored to a wide variety of stakeholders. And for users looking to integrate data from CloudIQ with existing IT management tools, CloudIQ provides a public REST API.
Resources
How do you become more familiar with Dell Technologies and CloudIQ? The Dell Technologies Info Hub site provides expertise that helps to ensure customer success with Dell Technologies platforms. We have CloudIQ demos, white papers, and videos available at the Dell Technologies CloudIQ page. Also, feel free to reference the CloudIQ Overview Whitepaper, which provides an in-depth summary of CloudIQ. Interested in DevOps? Go to our public API page for information about integrating CloudIQ with other IT tools using Webhooks or the REST API.
Author: Derek Barboza, Senior Principal Engineering Technologist
Related Blog Posts
Talking CloudIQ: Proactive Health Scores
Fri, 05 Aug 2022 20:29:33 -0000
Introduction
This is the second in a series of blogs discussing CloudIQ. In my first blog, I provided a high-level overview of CloudIQ and some of its key features. I will continue with a series of blogs, each talking about one of the key features in more detail. This blog discusses one of CloudIQ’s key differentiating features: the Proactive Health Score.
Proactive Health Score
The Proactive Health Score uses various factors to consolidate a system’s health into a single score. Health scores are based on up to five categories: Components, Configuration, Capacity, Performance, and Data Protection. Based on the resulting health score, the system is put into one of three risk categories: Poor, Fair, or Good. The score starts at 100 and is reduced by the issue with the highest deduction.
A system in the Poor category has a score of 0 to 70 and poses an imminent critical risk. It could be a storage pool that is overprovisioned and full, meaning that systems will be trying to write to storage that is unavailable. Or it could be a significant component failure. Whatever the issue, it is something that requires your immediate attention.
A system in the Fair category has a score of 71 to 94. Systems in this category have an issue that should be looked at, but not one that requires you to get out of bed at 3:00 a.m. to address it immediately. It could be something like a storage pool predicted to be full in a week or a system inlet temperature that exceeds the upper warning threshold on a PowerEdge server.
A system in the Good category has a score of 95 to 100 and is doing fine. There may be a minor issue that you need to look at, but nothing significant that is expected to cause any near-term problems. An example would be a fibre port with a warning status on a Connectrix switch.
Now what happens if there are multiple issues on a system? We hinted at this earlier. The score is only affected by the most critical issue. Let’s say that there are four issues on a system: one 30-point deduction, one 10-point deduction, and two 5-point deductions. In this case, the health score is 70. When the 30-point deduction is addressed, the score becomes 90, because the 10-point deduction is now the largest remaining one. We do this to prevent a system with several minor issues from appearing at high risk or at a higher risk than a system with a significant issue.
Figure 1. System Health page
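The scoring rule described above is simple enough to sketch in Python. The function names `health_score` and `risk_category` are made up for this illustration and are not CloudIQ code; the thresholds and the "largest deduction only" rule are taken from the description in this blog.

```python
from typing import Iterable


def health_score(deductions: Iterable[int]) -> int:
    """Start at 100 and subtract only the single largest deduction."""
    return max(0, 100 - max(deductions, default=0))


def risk_category(score: int) -> str:
    """Map a score onto the three risk categories described above."""
    if score >= 95:
        return "Good"
    if score >= 71:
        return "Fair"
    return "Poor"


# Four issues: the 30-point deduction alone determines the score.
before = health_score([30, 10, 5, 5])   # 70
# Once it is resolved, the 10-point deduction becomes the largest.
after = health_score([10, 5, 5])        # 90
```

With these values, the system moves from Poor (70) to Fair (90) after the most critical issue is addressed, even though three issues remain open.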
Recommended resolution
So now that we have been notified of an issue on a system, what do we do next? CloudIQ offers recommended remediation actions to address the issue before it has a significant impact on the environment. This may come in the form of a recommended configuration change or other action, a knowledge base article with a resolution, or some commands to run to gain the necessary information to resolve the issue.
Figure 2. Recommended remediation
Health Score History
CloudIQ also tracks the history of the Proactive Health Score. We can see both new and resolved issues along a chart with a selectable date range. Details of the issues are listed below the chart. By providing the history of the health score, CloudIQ allows users to identify possible recurring issues in the environment.
Figure 3. Health Score history
Notifications
What if we do not want to log in to CloudIQ on a daily or weekly basis to check our systems? We can easily be notified by email any time a system health change occurs. These notifications can be set up for a configurable set of systems, allowing users to receive notifications only for those systems for which they are responsible.
For the more motivated user, CloudIQ supports Webhooks. With this feature, users can send a Webhook for any health change notification to integrate with third-party tools such as ServiceNow, Slack, or Teams. Webhooks are sent for both open and closed issues with a unique identifier. This allows users to correlate the resolved issue with the open issue to automatically close out any created incident. Some Webhook integration examples can be found here.
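As a rough illustration of that correlation, the Python sketch below tracks incidents keyed by the issue's unique identifier. The payload shape is an assumption: the field names `issue_id`, `state`, and `description` are hypothetical placeholders, not the actual CloudIQ Webhook schema, so consult the public API documentation for the real field names.

```python
from typing import Dict


def apply_health_event(open_incidents: Dict[str, str], event: Dict[str, str]) -> Dict[str, str]:
    """Open or close a tracked incident based on a health-change event.

    The keys "issue_id", "state", and "description" are hypothetical
    stand-ins for whatever the real Webhook payload provides.
    """
    issue_id = event["issue_id"]
    if event["state"] == "open":
        # A ticketing integration would create an incident here.
        open_incidents[issue_id] = event.get("description", "")
    elif event["state"] == "closed":
        # The same identifier arrives on resolution, so the matching
        # incident can be closed automatically.
        open_incidents.pop(issue_id, None)
    return open_incidents


incidents: Dict[str, str] = {}
apply_health_event(incidents, {"issue_id": "abc-123", "state": "open",
                               "description": "Pool predicted full in a week"})
opened = dict(incidents)
apply_health_event(incidents, {"issue_id": "abc-123", "state": "closed"})
```

In a real integration, the dictionary would be replaced by calls to the ticketing or chat tool that receives the Webhook.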
Conclusion
Whether it be for storage, networking, hyperconverged, servers, or data protection, the Proactive Health Score summarizes the health of a system into a single number, providing an immediate indication of the status of each system. Any issues identified for a system are accompanied by recommended remediation, developed in tandem with experts from each product team, to help with self-service and quickly reduce risk. And with email notifications and Webhooks, users can be notified proactively any time an issue is identified.
Resources
How do you become more familiar with Dell Technologies and CloudIQ? The Dell Technologies Info Hub site provides expertise that helps to ensure customer success with Dell Technologies platforms. We have CloudIQ demos, white papers, and videos available at the Dell Technologies CloudIQ page. Also, feel free to reference the CloudIQ Overview Whitepaper, which provides an in-depth summary of CloudIQ. Interested in DevOps? Go to our public API page for information about integrating CloudIQ with other IT tools using Webhooks or the REST API.
Stay tuned for my next blog, where I'll talk about capacity forecasting and capacity anomaly detection in CloudIQ.
Author: Derek Barboza, Senior Principal Engineering Technologist
CloudIQ: Cloud-based Monitoring for your Dell Technologies IT Environment
Wed, 25 May 2022 19:49:28 -0000
Introduction
CloudIQ is Dell’s cloud-based AIOps application for monitoring Dell core, edge, and cloud. Born out of the Dell Unity storage product group several years ago, CloudIQ has quickly grown to cover a broad range of Dell Technologies products. With the latest addition of PowerSwitch, CloudIQ now covers Dell’s entire infrastructure portfolio, including compute, networking, CI/HCI, data protection, and storage systems.
According to a survey conducted last year, IT organizations were able to resolve infrastructure issues two to ten times faster and save a day per week on average with CloudIQ.[1]
Supported Platforms
- Storage: PowerStore, PowerMax, PowerScale, PowerVault, Dell Unity XT, Dell Unity, SC Series, XtremIO, VMAX, and Isilon
- Converged & HyperConverged: VxBlock, VxRail, and PowerFlex
- Networking: PowerSwitch and Connectrix
- Data Protection: PowerProtect DD Series, PowerProtect DD Virtual Edition, and PowerProtect Data Manager
- APEX Data Storage Services
- VMware integration
Figure 1. CloudIQ Supported Platforms
Key Features
CloudIQ has a variety of innovative features based on machine learning and other algorithms that help you reduce risk, plan ahead, and improve productivity. These features include the proactive health score, performance impact and anomaly detection, workload contention identification, capacity forecasting and anomaly detection, cybersecurity monitoring, reclaimable storage identification, and VMware integration.
With custom reporting features, Webhooks, and a REST API, you can integrate data from CloudIQ into ticketing, collaboration, and automation tools and processes that you use in day-to-day IT operations.
Best of all, CloudIQ comes with your standard Dell ProSupport and ProSupport Plus contracts at no extra cost.
Keep an eye out for follow-up blogs discussing CloudIQ’s key features in more detail!
Figure 2. CloudIQ Overview Page
Conclusion
With the addition of PowerSwitch support, CloudIQ now gives users the ability to monitor the full range of their Dell Technologies IT infrastructure from a single user interface. And the fact that it is a cloud offering hosted in a secure Dell IT environment means that it is accessible from virtually anywhere. Simply open a web browser, point to https://cloudiq.dell.com, and log in with your Dell support credentials. As a cloud-based application, it also means that you always have access to the latest features because CloudIQ’s agile development process allows for continuous and seamless updates without any effort from you. There is also a mobile app, so you can take it anywhere.
Resources
How do you become more familiar with Dell Technologies and CloudIQ? The Dell Technologies Info Hub site provides expertise that helps to ensure customer success with Dell Technologies platforms. We have CloudIQ demos, white papers, and videos available at the Dell Technologies CloudIQ page. Also, feel free to reference the CloudIQ Overview Whitepaper, which provides an in-depth summary of CloudIQ.
[1] Based on a Dell Technologies survey of CloudIQ users conducted May-June 2021. Actual results may vary.
Author: Derek Barboza, Senior Principal Engineering Technologist