Short topics related to data storage essentials.
Mon, 19 Feb 2024 19:45:14 -0000
In the last several years, there has been an increased desire for deeper visibility and insights into what is going on within customers’ data centers. Especially with wider adoption of AI/ML, demand for insight-driven outcomes has increased. Customers are looking to have a single pane of glass that has visibility into their infrastructure.
One of the major benefits I see for customers who have invested in Dell's broad portfolio is that CloudIQ truly becomes that single pane of glass. It enables customers to integrate CloudIQ with external tools using Webhooks and the REST API and to create actionable processes. One example would be integration with ServiceNow. The other benefit is the breadth of the insights based on AI/ML algorithms and our capability to be not only descriptive in our recommendations, but also increasingly prescriptive.
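As a rough illustration of what such a Webhook-driven integration can look like, the sketch below maps a hypothetical CloudIQ alert payload to a ServiceNow-style incident record. All field names on both sides are assumptions for illustration, not the documented schemas of either product.

```python
import json

def cloudiq_alert_to_snow_incident(alert: dict) -> dict:
    """Map a (hypothetical) CloudIQ webhook alert payload to a
    ServiceNow-style incident record. Field names on both sides are
    illustrative assumptions, not the documented schemas."""
    severity_map = {"CRITICAL": 1, "ERROR": 2, "WARNING": 3, "INFO": 4}
    return {
        "short_description": f"[CloudIQ] {alert.get('description', 'Unknown issue')}",
        "cmdb_ci": alert.get("system_name", ""),
        "impact": severity_map.get(alert.get("severity", "INFO"), 4),
        # Keep the raw payload attached for troubleshooting context
        "comments": json.dumps(alert, indent=2),
    }
```

In practice, a small receiver service would accept the CloudIQ Webhook POST, run a transformation like this, and create the incident through the ServiceNow API.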
I could go on and on describing the benefits of CloudIQ, but in this blog, I would like to focus on the CloudIQ Collector. Although customers are accustomed to using VMware vCenter to look up configuration and performance details specific to virtual machines and vVols, with AIOps-based tools like CloudIQ, the goal is to bring this information together in a single management pane of glass. Customers using Dell primary storage solutions can leverage the CloudIQ Collector to bring visibility at the VMware virtual machine level into the CloudIQ portal. I can see this capability enabling customers to use CloudIQ for a number of use cases.
The Dell CloudIQ Collector is a VMware Open Virtual Appliance (OVA) using Open Virtualization Format (OVF) and is installed as a virtual machine that collects data from VMware environments, Dell Connectrix switches, and Dell PowerSwitch devices. The Collector retrieves information from the target objects (vCenter or switches) and sends the collected data back to CloudIQ using a Secure Connect Gateway. For VMware, the Collector communicates to vCenter by using the VMware API and requires a user with read-only privileges. For Connectrix and PowerSwitch devices, the Collector communicates to the individual switches using REST API and uses a nonprivileged user. A single collector can be used for VMware, Connectrix, and PowerSwitch.
The theme again is to provide overall visibility across different pieces of infrastructure to our customers. The CloudIQ Collector Overview white paper does a nice job on how to implement the Collector, but here I will go more into the functionality and what data we present to our customers.
Once the CloudIQ Collector is installed and fully configured, VMware data will appear in CloudIQ within 24 hours and will be accessible within the following views in the CloudIQ portal.
Traditionally, customers with Dell's primary storage have had a certain level of visibility into their VMware environment, typically accomplished by linking VMware vCenter with our management tools for products like PowerMax, PowerStore, and Unity XT. To keep this blog concise, I will focus on PowerStore, but as mentioned above, other Dell primary storage products have visibility into the VMware environment from their respective element managers.
The Dell PowerStore management UI is called PowerStore Manager. Integrating PowerStore Manager with VMware vCenter is straightforward. If integration is successful, you will see the status turn to green and show OK.
Figure 1. Registered vCenter in PowerStore Manager
This integration with vCenter will populate the Virtual Machine tab in PowerStore Manager.
Figure 2. Virtual Machines page in PowerStore Manager
As you can see, we support vVol-, VMFS-, and NFS-based virtual machines. You can also expand the view with additional columns by clicking “Show/Hide Table Columns” on the right side of the screen.
The Virtual Machine Name column lets users click each virtual machine to see additional details.
Figure 3. Virtual machine details
The above image demonstrates a detailed view of a vVol virtual machine. You can navigate through multiple tabs that show additional and deeper details, such as performance and storage-related metrics, data protection policies applied, and so on.
The other integration point you can explore is the datastore a virtual machine resides in. This comes in handy when customers need to troubleshoot a specific issue or simply map out the components. A PowerStore administrator can trace the virtual machine directly to a storage container, a VMFS block LUN, or an NFS-based datastore, without leaving the Virtual Machines view of PowerStore Manager.
Figure 4. Storage container details
In the above image, I selected a Storage Container that holds one of the vVols. Once again, you see a consistent view, with multiple tabs allowing you to easily navigate and look up additional details.
VMFS- or NFS-based virtual machines follow the same logic. We collect and present slightly fewer details than for vVol-based virtual machines, but this is where the CloudIQ Collector supplements the view.
Figure 5. VMFS virtual machine performance chart
I have been guiding all my customers to embrace CloudIQ over the past several years. And although CloudIQ is provided to customers as a Software-as-a-Service application, the CloudIQ Collector is one of the elements that will need to be installed inside the customers’ data center to monitor VMware, Connectrix switches, and PowerSwitch devices.
Logging in to CloudIQ is based on the customers’ accounts registered with a Dell support contract. In addition to this, customers can leverage Role-Based Access Control (RBAC) implemented within the CloudIQ portal.
Once logged in, customers can explore the categories shown on the left side of the CloudIQ portal. The categories that we will be focusing on in this blog are under the ‘Monitor’ category.
Figure 6. Virtualization View in CloudIQ
The Virtualization view enables you to view and manage components such as the vCenter, data center, and clusters using the tree view and the table view. It also displays information about each VMware vCenter server in the system. For customers who use Dell HCI solutions like VxRail, Dell primary storage products like PowerStore, or simply a VMware ESXi environment managed by a vCenter, this page provides a consolidated view of all these environments.
Across the top, customers can see a quick snapshot of the overall status of the environment.
Figure 7. Summary banner
The navigation panel on the left shows you all vCenters with their respective clusters and data centers. Customers can browse through the list and select a particular cluster. As the image below shows, you can start zooming in on each virtual machine listed under the VMs tab. The areas I highlighted below are hyperlinks and allow customers to get additional details for each virtual machine.
Figure 8. Virtual Machines tab
Clicking the Backup_VM1 virtual machine leads me to the VM details page.
Figure 9. Virtual machine details page
This is where it starts to get interesting. For example, customers can see our AI/ML algorithms in action in the form of anomaly detection. CloudIQ collects telemetry data and compares metrics against historical seasonality. We can identify issues, like increased latency, as we compare data against what we saw in the past for the same period.
Figure 10. Performance anomaly detection
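The idea behind seasonality-based anomaly detection can be shown with a toy version: compare the current sample against an expected band derived from matching historical samples (for example, the same hour on previous days). CloudIQ's actual algorithms are proprietary; this sketch only conveys the concept.

```python
from statistics import mean, stdev

def is_anomalous(history: list[float], current: float, n_sigma: float = 3.0) -> bool:
    """Toy seasonality check: flag the current sample if it falls outside
    an expected band (mean +/- n_sigma standard deviations) computed from
    comparable historical samples. Illustrative only -- not CloudIQ's
    actual anomaly-detection algorithm."""
    mu, sigma = mean(history), stdev(history)
    return abs(current - mu) > n_sigma * sigma
```

A latency reading of 12 ms would be flagged against a history hovering around 5 ms, while a reading near the historical mean would not.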
Toward the bottom of the view, you can see a section called “Configuration Changes.” We display hourly aggregated configuration changes that have been made to this virtual machine; by charting them along the time axis, you can potentially correlate a configuration change with a change in the performance profile.
Figure 11. Configuration change tracking
The right side of this view shows three tabs:
Figure 12. End to end map
End to End Map displays an interactive topology map showing the components including inventory and basic performance. Selecting the cluster, host, datastore, network, storage entity, or array displays more object details underneath the topology map.
Storage Paths provides information for the datastore storage paths including the associated host adapter worldwide name (WWN), fabric, and array adapter.
Figure 13. Storage paths
Configuration Changes displays configuration changes for the last 24 hours for the virtual machine.
Figure 14. Configuration changes
If you use other solutions from the Dell Technologies portfolio, such as PowerEdge servers for your VMware ESXi clusters, there is yet another option/view you can explore. You can navigate between the VM details page and the PowerEdge details page to quickly see related information.
Figure 15. PowerEdge system details page
To round off our discussion, customers also have reporting capabilities that can be leveraged.
Figure 16. Report browser
Customers can generate several types of reports:
If you would like to report on the inventory of Virtual Machines, a table would be sufficient.
Figure 17. Example of a custom table
When creating a table, there is a set of default columns preselected. You can choose to include additional columns from the available columns list or remove some of the preselected ones.
Figure 18. Customizing columns in a table
The second option is to generate a line chart which shows historical performance data. As I am demonstrating below, you can select ‘VMware’ as the product category and ‘Virtual Machine’ as the subcategory. This selection will show you all the virtual machines available in the inventory. Feel free to select one or more virtual machines and go to the next screen. Filtering capabilities are available to display and select specific VMs.
Figure 19. Configuring a line chart
The next screen is where you select the metrics you want to include in your report.
Figure 20. Metric selection
By default, the resulting report shows you data for the last 24 hours. Since CloudIQ keeps 2 years of historical data, you can define a larger window by clicking the drop-down menu.
Figure 21. Line chart example
As you can see above, you can correlate performance for virtual machines that might have dependencies. You can also click either virtual machine on the right side to dim its line so the graph doesn't become crowded.
Once you are happy with the data on the screen, you can schedule the report and save it in PDF format.
As you can see, there is a plethora of information available to customers across Dell management software. In CloudIQ, there are many other views that can show additional details about virtual machines and volumes, for example when browsing a server or a datastore. I encourage you to connect with a Dell representative and schedule a full demo of this product.
Important Links:
https://www.dell.com/en-us/dt/solutions/cloudiq.htm
https://infohub.delltechnologies.com/t/cloudiq-a-detailed-review/
https://infohub.delltechnologies.com/t/dell-cloudiq-collector-an-overview/
https://developer.dell.com/apis
Authors:
Michael Aharon, Advisory Solutions Consultant;
Derek Barboza, Senior Principal Engineering Technologist
Wed, 08 Nov 2023 16:32:28 -0000
In my previous blogs, I have focused on a specific feature in CloudIQ. This blog talks about various CloudIQ features for Dell’s PowerEdge servers. Dell CloudIQ continues to expand its feature set for PowerEdge assets. CloudIQ integrates with Dell’s OpenManage Enterprise at each of your sites, to efficiently collect and aggregate telemetry data to give you a multisite, enterprise-wide view of all your PowerEdge servers and chassis. And with OpenManage Enterprise 4.0, onboarding your PowerEdge servers to CloudIQ is easier than ever!
Since the introduction of PowerEdge support in CloudIQ, health, inventory, and performance monitoring for PowerEdge servers have all been available. CloudIQ provides an overall health score for each PowerEdge server and recommended remediation when an issue is identified. Inventory reporting provides numerous properties about each server, including contract status, component firmware versions, licensing information, and hardware listings to name a few. CloudIQ displays key performance metrics and not only shows historical trends but identifies performance anomalies and provides performance forecasting. This information allows you to see unexpected performance patterns, and plan future resource needs based on trending workloads.
Figure 1. Example of a performance forecasting chart for PowerEdge
Cybersecurity is a feature in CloudIQ that allows you to compare your existing security configuration settings to a predefined set of desired security configuration settings. The configuration is continuously monitored, notifying you when a configuration does not meet its desired setting. Cybersecurity monitors up to 31 server configuration settings and 18 chassis configuration settings tied to NIST security standards. Without automated continuous checking, it's impractical to manually check all settings on all servers every day. Lab tests show that it takes six minutes on average to manually check just 15 settings on a single server.
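At its core, this kind of continuous check is a comparison of actual settings against a desired baseline. The sketch below shows the concept; the setting names are illustrative, not the exact NIST-derived settings CloudIQ evaluates.

```python
def security_drift(actual: dict, desired: dict) -> dict:
    """Report settings whose actual value deviates from the desired
    baseline. Setting names used by callers are illustrative -- not the
    exact configuration settings CloudIQ monitors."""
    return {
        name: {"expected": want, "actual": actual.get(name)}
        for name, want in desired.items()
        if actual.get(name) != want
    }
```

Running a check like this automatically across every server, every day, is exactly what is impractical to do by hand at fleet scale.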
Users can also see a list of applicable Dell Security Advisories (DSAs) for their PowerEdge systems. By intelligently matching attributes like models and code versions, users can quickly see which DSAs are applicable to their systems, allowing them to take immediate action to remediate these security vulnerabilities.
Figure 2. The Security Assessment page for a PowerEdge chassis
You can now initiate BIOS and firmware updates for PowerEdge servers and chassis from CloudIQ. Users with a Server Admin role in CloudIQ can initiate these upgrades across multiple systems with just a few clicks. This feature simplifies the process of keeping your fleet of servers consistent and secure.
Figure 3. Multisystem update for PowerEdge servers and chassis
The integration of PowerEdge into the Virtualization View consolidates and simplifies resource information about PowerEdge servers running ESXi. Available details include the OS version, model, resource consumption per virtual machine, and health issues with recommendations for remediation. A hyperlink lets you quickly navigate to the system details page for the PowerEdge server for more troubleshooting. Another hyperlink directs you to vCenter to perform virtualized resource administration.
Figure 4. PowerEdge support in the Virtualization View
CloudIQ has introduced carbon footprint analysis support for PowerEdge servers and chassis. CloudIQ takes power and energy metrics and calculates carbon emissions based on international standards and conversion factors for location. CloudIQ Administrators can override and customize these values with their own unique location emission factors.
Figure 5. Energy, power, and carbon emissions for a PowerEdge server
You can generate custom reports using both tables and charts for PowerEdge servers:
You can also take advantage of custom tags in your reports. For example, you can create a list of PowerEdge servers in a certain business unit with their BIOS and firmware versions, contract expiration dates, average power consumption, and service tags. And with Webhooks and REST API access, you can integrate data and events from CloudIQ with ServiceNow, Slack, and other IT tools to help you monitor your entire Dell IT infrastructure.
Figure 6. Custom reporting table for PowerEdge with custom tags
As IT resources become more remote and isolated, it has become increasingly time consuming to maintain, manage, and secure resources in the data center and at the edge. CloudIQ simplifies monitoring and management by providing a single portal to view all your PowerEdge servers across your entire environment. With cybersecurity monitoring of PowerEdge servers and chassis, you can quickly see where security configuration settings may be incorrectly set or accidentally changed, opening those systems to cyberattacks, and receive instructions to remediate. With the new maintenance and management features, CloudIQ simplifies the process of keeping your entire fleet at consistent, secure, and desired BIOS and firmware versions. The carbon footprint page in CloudIQ helps you meet sustainability goals. And with Webhook and REST API support, CloudIQ can be integrated with other IT tools to help you monitor not only your PowerEdge servers, but your entire Dell IT portfolio.
This Knowledge Base Article discusses how to onboard PowerEdge devices to CloudIQ.
For a quick demo about CloudIQ PowerEdge support, see the CloudIQ videos section on the Info Hub.
Direct from Development Tech Note: Dell CloudIQ Cybersecurity for PowerEdge: The Benefits of Automation
See other informative blogs: Overview of CloudIQ, Proactive Health Scores, Capacity Monitoring and Planning, Cybersecurity, and Custom Reports and Tags.
How do you become more familiar with Dell Technologies and CloudIQ? The Dell Technologies Info Hub site provides expertise that helps to ensure customer success with Dell Technologies platforms. We have CloudIQ demos, white papers, and videos available at the Dell Technologies CloudIQ page. Also, feel free to reference the white paper CloudIQ: A Detailed Overview which provides an in-depth summary of CloudIQ.
Author: Derek Barboza, Senior Principal Engineering Technologist
Wed, 04 Oct 2023 16:03:26 -0000
In this blog post, we'll cover a topic that is top of mind for all organizations, small and large: energy efficiency. I'll also highlight how Dell Technologies helps customers increase energy efficiency using our vast portfolio. First, let's define what energy efficiency is.
“Simply put, energy efficiency means using less energy to get the same job done
– and in the process, cutting energy bills and reducing pollution.”
Reference: Energy Efficiency | ENERGY STAR
As organizations undergo digital transformation and modernization, there is a massive explosion in the amount of data that needs to be stored. This data expansion is driven by technologies like cloud computing, artificial intelligence, and streaming services, to name a few. This in turn impacts how much power organizations are now consuming in their data centers, which forces IT vendors to make their solutions more efficient and reduce emissions and carbon footprint.
Dell Technologies has been helping customers harness the power of technology to drive human progress for several decades. Our latest Environmental, Social and Governance report focuses on the investments Dell has made to support these initiatives.
If you’re interested in delving deeper, check out Dell's FY23 Environmental, Social and Governance.
Energy concerns were of paramount importance for our customers in 2022, not only in response to rising energy costs but also as they worked toward reducing emissions. As a leader in sustainable technology, Dell partnered with customers to make the transition to more energy efficient data centers with advanced cooling and thermals, power management tools, and as-a-Service (aaS) solutions to “right size” data storage. With the cost of energy commodities expected to be on average 46% higher in 2023, we will continue to set the standard on data center infrastructure solutions to drive efficient operational and environmental outcomes for our customers.
Dell reinvests over $4B in R&D on an annual basis, continuing to lead the market with our innovation in storage and data reduction efficiencies to save energy and reduce our carbon and hardware footprint.
Dell's commitment to reducing carbon footprint is exemplified by the introduction of innovative ideas to optimize our portfolio. Recognized as one of the winners of Fast Company's 2023 World Changing Ideas Awards, Dell's Concept Luna was designed to showcase how the future of electronic devices can be one where they're repaired instead of thrown out. Feel free to review the full article, How Dell is infusing sustainability across its businesses, to learn more.
Based on what we covered so far, we truly believe that informing our customers of critical data points that contribute to overall awareness of power, energy consumption, and carbon footprint is essential.
Several years ago, Dell Technologies developed a product called CloudIQ, the cloud-based AIOps proactive monitoring and predictive analytics application for Dell systems. CloudIQ leverages machine learning and other algorithms, notifications, and recommendations to help customers optimize compute, storage, data protection, and network health, performance, and capacity. CloudIQ supports a broad range of Dell Technologies products, including:
Over 90% of our customers actively use CloudIQ as their centralized dashboard to inform them proactively about KPIs across their Dell Technologies estate.
Introducing Carbon Footprint, an additional capability within CloudIQ designed to provide insights for power, energy consumption, and carbon footprint forecasting across all systems. At the time of the initial release, we are supporting the following products from our portfolio:
and focusing on the following KPIs:
Later in 2023, we will also add support for PowerSwitch.
Having Carbon Footprint enabled and KPIs exposed within CloudIQ is beneficial to internal stakeholders within an organization and allows you to make confident decisions when optimizing your environment.
Based on the Software-as-a-Service (SaaS) model and agile development methodology employed by CloudIQ, you’ll benefit from having access to new features as soon as they become available.
Most Dell Technologies products supported by CloudIQ leverage our call home functionality called SupportAssist / Secure Connect Gateway. Depending on the product, you will need to enable the CloudIQ feature, after which the CloudIQ dashboard will populate with data.
For the full overview of the CloudIQ product, please see the detailed review whitepaper here.
To access the Carbon Footprint feature in the CloudIQ dashboard, select Monitor > Carbon Footprint on the left-hand side of the CloudIQ console, as shown in the following figure.
On this screen, CloudIQ users with the CloudIQ Admin role will be able to adjust and personalize their geographical location metrics, such as CO2E and PUE, as illustrated in the following figure. The location labels reflect the specific locations where the physical assets are installed.
Side note: What do these metrics mean?
The Total Carbon Emissions CO2e section can be displayed using either a Bar Chart or a Line Chart. Simply select the gear wheel on the right-hand side and pick your preferred view.
The Total Carbon Emissions CO2e chart can increase or decrease based on how the system’s energy / emission factor / PUE changes over time. If new systems are added, the total will increase. Similarly, the total can decrease if power is capped (as is available for PowerEdge), workloads are reconciled, and/or some systems are shut down.
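The relationship between the inputs mentioned above can be sketched as a single formula: IT energy use is scaled by the data-center PUE to account for cooling and other overhead, then converted to CO2e with a location-specific emission factor. This is a common convention; CloudIQ's exact calculation may differ.

```python
def carbon_emissions_kg(energy_kwh: float, pue: float, emission_factor: float) -> float:
    """Estimate carbon emissions (kg CO2e) from IT energy use.

    energy_kwh      -- IT equipment energy consumed (kWh)
    pue             -- Power Usage Effectiveness of the facility (>= 1.0)
    emission_factor -- location-specific kg CO2e per kWh

    A sketch of the standard convention, not CloudIQ's exact formula.
    """
    return energy_kwh * pue * emission_factor
```

For example, 100 kWh of IT load in a facility with a PUE of 1.5 and an emission factor of 0.4 kg CO2e/kWh works out to roughly 60 kg CO2e, which is why capping power or shutting down systems lowers the total over time.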
For larger environments with multiple assets, applying filters is a breeze. The following example shows the system filtered based on Unity arrays only.
This table displays several columns that represent the asset itself, its location, site name, etc. In addition, we show the following data points:
The entire table with all assets or a subset thereof can be exported into a CSV file.
To see more details for each of the assets and how they perform in comparison to historical data, select the details icon next to the asset itself. As displayed in the following figure, the two graphs will display data points over the last seven (7) days and forecasted data points for the next thirty (30) days. By toggling the radio button, you can switch from one view to another. The grey area shows a range based on historical data collected for the previous seven (7) days, and the blue line is charted based on the last seven (7) days. If the blue line is within the boundaries of the grey area, this means there have been no unforeseen changes in Energy Consumption or the CO2e in the past seven (7) days.
What we’ve shown here is just the beginning. We will continue improving and enhancing CloudIQ capabilities to ensure you enjoy the most relevant and accurate KPIs and can act upon them expeditiously.
Dell has several other tools and assessments that you can benefit from. For example, we can run a tool called Live Optics that collects configuration and performance data from your Dell and third-party products and produces a report that can be analyzed to propose options like optimization, consolidation, or a refresh with a new technology. The other option is to request a #GetEfficient report, which will be directly focused on reduction in physical footprint and power consumption.
Authors: Michael Aharon and Derek Barboza
Tue, 13 Jun 2023 16:29:55 -0000
Every organization must report on their IT infrastructure. Whether it be to provide an inventory of assets or determine resource utilization, CloudIQ custom reporting with custom tags helps automate this task, saving time and delivering these reports right to your inbox.
Custom tags are customer-specific metadata that you can enter into CloudIQ to identify resources with customer information, such as application name, service level, business unit, department, and so on. You can enter custom tags against the system or against components of a system. Examples of component tags include hosts, PowerMax storage groups, volumes, file systems, storage pools, and virtual machines. We can quickly see the benefit of applying an application name to a storage group, or a business unit to a virtual machine. By doing so, we can generate application-level reports or asset reports by department.
Figure 1. Custom tags in the Storage Inventory View
Custom reports in CloudIQ can contain tables, charts, or a combination of both. Charts can be either common line charts or anomaly charts. Anomaly charts allow users to see unexpected activities in performance by charting the metric along with the expected range of the metric – which has been determined by CloudIQ’s machine learning algorithms.
Tables are available to provide lists of assets, code versions, contract information, capacity metrics, and average performance metrics. You can also take advantage of custom tags to either be included in the report or to be used as a filter to capture only those assets that meet your business needs, based on the values of those custom tags. For example, you can create a list of PowerEdge servers in a certain business unit with their BIOS and firmware versions, contract expiration dates, average power consumption, and service tags.
Figure 2. Table showing a business unit’s custom tag
Perhaps you want to keep an eye on the performance profile of a critical storage system, tracking system bandwidth and IOPS looking for any unusual activity. With just a few clicks you can create the report to chart the metrics, along with the expected lower and upper bounds. A few additional clicks and you can schedule this report to be delivered to yourself or anyone else at the interval you choose. You can give this report a quick look to identify if there are any unusual spikes that could be from an unexpected workload or even from some type of malicious attack.
Figure 3. Examples of performance anomaly charts
An IT infrastructure monitoring tool must be flexible and have automated ways to extract and report on assets, capacity, and performance in a meaningful way for your organization. By applying customer-specific metadata in the form of custom tags to assets in CloudIQ, you have the power to generate and automate the delivery of insightful and information rich custom reports to IT infrastructure stakeholders. Extracting the powerful information and machine learning data from CloudIQ allows you to efficiently maintain existing infrastructure and plan for future resource needs.
For a quick demo on custom reports and other CloudIQ features, see the CloudIQ videos section on the Info Hub.
For other informative blogs, see: Overview of CloudIQ, Proactive Health Scores, Capacity Monitoring and Planning, and Cybersecurity.
How do you become more familiar with Dell Technologies and CloudIQ? The Dell Technologies Info Hub site provides expertise that helps to ensure customer success with Dell Technologies platforms. We also have CloudIQ demos, white papers, and videos available at the Dell Technologies CloudIQ page. And feel free to reference the CloudIQ Overview White Paper which provides in-depth summary of CloudIQ.
Author: Derek Barboza, Senior Principal Engineering Technologist
Wed, 01 Mar 2023 17:16:08 -0000
Dell Storage Resource Manager (SRM) provides comprehensive monitoring, reporting, and analysis for heterogeneous block, file, object, and virtualized storage environments. It enables you to visualize application-to-storage dependencies and to monitor and analyze configurations and capacity growth. It has visibility into the environment's physical and virtual relationships to ensure consistent service levels.
To enable storage administrators to monitor their physical and virtual compute environment, Dell provides SRM solution packs. These solution packs include SolutionPack for Physical Hosts, Microsoft Hyper-V, IBM LPAR, Brocade FC Switch and Cisco MDS/Nexus with passive host discovery options, VMware vSphere & vSAN, and Dell VxRail.
With the new SolutionPack for iDRAC PowerEdge, we can monitor the status of server hardware components such as power supplies, temperature probes, cooling fans, and battery. We can also gather historical information about electrical energy usage and other key performance indicators that measure the proper functioning of a server device.
To illustrate SRM’s cross-domain functionality, we examine the most common use case, where Dell PowerEdge physical servers are deployed as part of VMware hypervisor clusters.
SolutionPack for VMware vSphere & vSAN provides capacity, performance, and relationship data for all VMware discovered components, such as VMs, hypervisors, clusters, and datastores, as well as their relationship with fabric and backend storage arrays. Here is one example of the end-to-end topology of the virtualized environment:
Figure 1. Example of end-to-end topology of a virtualized environment
To access the PowerEdge servers and their hardware components, we rely on the integrated Dell Remote Access Controller (iDRAC), a baseboard management controller embedded in PowerEdge servers.
iDRAC exposes hardware component data through several APIs, one of them being SNMP. With the SRM SNMP collector, which is part of the SolutionPack for iDRAC PowerEdge, we discover iDRACs from which we pull PowerEdge server data. This data includes electrical energy usage (Wh), probe temperatures (°C), power supply output (W), and cooling device speed (RPM). It also includes the status of power supplies, batteries, cooling devices, and temperature probes, as well as server availability. SRM provides historical reports for all metrics, with a maximum 7-year data retention for weekly aggregates.
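Retaining seven years of weekly aggregates implies a roll-up step somewhere in the pipeline. The sketch below shows the kind of down-sampling involved, keying daily readings by ISO (year, week); it is a conceptual illustration, not SRM's actual implementation.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

def weekly_averages(samples: list[tuple[date, float]]) -> dict[tuple[int, int], float]:
    """Down-sample per-day metric readings to weekly averages, keyed by
    ISO (year, week). A sketch of the kind of roll-up a monitoring tool
    keeps for long-term retention -- not SRM's actual implementation."""
    buckets: dict[tuple[int, int], list[float]] = defaultdict(list)
    for day, value in samples:
        iso = day.isocalendar()
        buckets[(iso[0], iso[1])].append(value)
    return {week: mean(vals) for week, vals in buckets.items()}
```

Storing one averaged point per metric per week keeps long-horizon reporting cheap while preserving the trend.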
With the data available from the iDRAC PowerEdge, VMware vSphere & vSAN, and relevant fabric and storage array solution packs, users can seamlessly navigate from the context of physical server hardware component reports to the context of the physical server reports within the broader SAN environment.
Let’s examine the component status data, performance data, and alerts provided by the SolutionPack for iDRAC PowerEdge.
The Summary page Card View and Table View for PowerEdge servers show hardware component status (temperature probes, cooling devices, battery, power supply), server availability, daily electrical energy usage (kWh), energy cost ($), and daily carbon emissions (kgCO2e). Energy cost and carbon footprint metrics are calculated based on server location. In the following example, we see a significant difference in daily carbon emissions between Poland and Germany, even though there is only a small difference in daily energy usage. The same applies to energy costs.
Figure 2. Card view of hardware component status
Figure 3. Table view of hardware component status (first 10 columns)
Figure 4. Table view of hardware component status (final columns—continuation of preceding figure)
Energy cost and carbon emissions per country are calculated dynamically based on data enrichment enabled on SRM collectors. Metrics collected from each iDRAC are automatically tagged with location, carbon intensity, and energy cost properties. Here is an example of data enrichment configuration from the SRM admin UI:
Figure 5. SRM admin UI showing data enrichment configuration
CSV files that contain values for energy cost and carbon intensity per country are available publicly and can be transferred automatically through FTP to SRM collectors as part of the data enrichment process. Here is a CSV file excerpt that contains kWh cost ($) per country:
Figure 6. Excerpt of kwh-cost-per-country CSV file
And here is a CSV file excerpt that contains carbon intensity per kWh per country:
Figure 7. Excerpt of carbon-intensity-by-country CSV file
The CSV file for data enrichment that maps each device to a location is specific to every customer.
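The enrichment math described above is a straightforward multiplication of daily energy usage by the per-country factors. A minimal sketch in Go; the cost and carbon-intensity values below are illustrative assumptions, not the published CSV values:

```go
package main

import "fmt"

// Sketch of the data-enrichment math: daily energy usage multiplied by the
// per-country cost and carbon-intensity factors tagged onto each metric.
// The factor values used in main() are made up for illustration.
func dailyCost(kwh, costPerKwh float64) float64 {
	return kwh * costPerKwh
}

func dailyEmission(kwh, kgCO2ePerKwh float64) float64 {
	return kwh * kgCO2ePerKwh
}

func main() {
	kwh := 120.0 // same daily energy usage in both locations
	fmt.Printf("Germany: $%.2f, %.1f kgCO2e\n", dailyCost(kwh, 0.40), dailyEmission(kwh, 0.38))
	fmt.Printf("Poland:  $%.2f, %.1f kgCO2e\n", dailyCost(kwh, 0.18), dailyEmission(kwh, 0.77))
}
```

With identical energy usage, the higher carbon intensity of one grid yields a much larger daily emission, which is the Poland/Germany difference visible in the card view.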
From the initial Card View or Table View, you can drill down to the PowerEdge server end-to-end topology map. This is a host-based landing page where you can see the server’s relationship with the rest of the SAN components, as well as server attributes, performance, capacity, alerts, and inventory data. This is an example of SRM’s powerful capability to aggregate and visualize data from multiple contexts within the same reporting UI.
Figure 8. End-to-end topology map
The iDRAC PowerEdge Inventory report shows servers’ hardware component names, quantities, server hostname, serial number, operating system version, model, and IP address:
Figure 9. Inventory report (first six columns)
Figure 10. Inventory report (final columns—continuation of preceding figure)
Drilling down from the preceding table leads to the daily status dashboard of a selected server’s hardware components. Here are a few examples:
Figure 11. Status of cooling devices
Figure 12. Power supply output watts
Figure 13. Energy usage (Wh)
The iDRAC PowerEdge Performance report shows key metric values for servers’ hardware components, such as probe temperature (C), temperature lower and upper thresholds, cooling device speed (RPM), and cooling device critical and non-critical thresholds. Selecting a row interactively plots historical performance data on the charts below the table, including server electrical energy usage (Wh), probe temperature (C), and cooling device speed (RPM).
Figure 14. Trend chart—Electrical energy usage (Wh)
Figure 15. Trend chart—Probes temperature (C) values plotted alongside threshold values
The following trend chart shows cooling device (RPM) values plotted on the same chart with the active alerts relevant to the cooling device. The alert is displayed as a black dot with pop-up details of the issue that caused the alert. This feature greatly improves troubleshooting and is another example of SRM’s powerful capability to aggregate and visualize data from multiple contexts within the same reporting UI.
Figure 16. Trend chart—Cooling device (RPM) values plotted on the same chart with the active alerts relevant to the cooling device
The following bar charts show Carbon Emission, Energy Cost ($), Cooling (RPM), Energy Usage (kWh), and Temperature (C) per location during the last month. You can drill down on each bar chart to see reports for each location to analyze the top 10 contributing items per device type (hypervisor, host) and per server.
Figure 17. Carbon Emission and Energy Cost bar charts
Figure 18. Energy Usage and Temperature bar charts
The iDRAC PowerEdge Operations report shows currently active alerts received from iDRAC as SNMP traps. The solution pack contains 80 certified alert definitions that cover iDRAC System Health and Storage category alerts, including AmperageProbe, Battery, Cable, CMC, Fan, FC, LinkStatus, MemoryDevice, Network, OS, PhysicalDisk, PowerSupply, PowerUsage, TemperatureProbe, TemperatureStatistics, VoltageProbe, LiquidCoolingLeak, and others.
You can enable any or all alerts on each iDRAC under Configuration > System Settings > Alert Configuration > Alerts. You can configure SNMP trap receivers under Configuration > System Settings > Alert Configuration > SNMP Traps. In this case, the SNMP trap receiver is the SRM collector server.
Figure 19. Active alerts on iDRAC PowerEdge Operations report
By right-clicking an alert row, you can acknowledge, assign, close, take ownership of, or assign a ticket ID to the alert.
Figure 20. Acting on an alert
By clicking on an alert row, you can see a detailed report about the alert. Also, the SRM alerting module includes functionality to forward selected alerts to external applications, such as ServiceNow ITSM through a Webhook API or fault management applications through an SNMP trap or email.
You can navigate directly from the alerts report to the affected server’s landing page by clicking the device name link in the Device column of the All Alerts report. SRM relates alert-specific data with the time-series data originating from the same device and seamlessly navigates through corresponding reports. The following figure shows an affected server’s summary report with the topology and underlying Operations section showing the server’s active alerts.
Figure 21. Server summary report with topology and active alerts
SRM’s powerful framework allows storage administrators to easily integrate environmental data for PowerEdge physical servers into the existing end-to-end SAN inventory, performance, capacity, and alert reports. SRM reduces the time that is required to identify the cause of issues occurring in the data center.
With the new SolutionPack for iDRAC PowerEdge, administrators can monitor PowerEdge hardware components and obtain historical information about energy usage and other key performance indicators.
The iDRAC PowerEdge Solution Pack supports:
Author: Dejan Stojanovic
Mon, 20 Feb 2023 21:08:34 -0000
|Read Time: 0 minutes
This is the fourth in a series of blogs discussing CloudIQ. Previous blogs provide an overview of CloudIQ and discuss proactive health scores and capacity monitoring and planning. This blog discusses the cybersecurity feature in CloudIQ. Cyber-attacks have become a significant issue for all companies across all industries. The immediate economic consequences, combined with the longer-term impact of the loss of organizational reputation, can have both immediate and lasting effects.
Misconfigurations of infrastructure systems can open your organization to cyber intrusion and are a leading threat to data security. The CloudIQ cybersecurity feature proactively monitors infrastructure security configurations for Dell PowerStore and PowerMax storage systems and PowerEdge servers, and notifies users of security risks. A risk level is assigned to each system, placing the system into one of four categories, depending on the number and severity of the issues: Normal, Low, Medium, or High.
Figure 1. Cybersecurity system risk levels
When a security risk is found, remediation instructions are provided to help you address the issue as quickly as possible.
Figure 2. Cybersecurity details with remediation
CloudIQ evaluates outgoing Dell Security Advisories (DSAs) and intelligently notifies users when those advisories are applicable to their specific Dell system models with specific system software and firmware versions. This eliminates the need for users to investigate if a Security Advisory applies to their systems and allows them to immediately focus on remediation.
Figure 3. Dell Security Advisory listing
By using CloudIQ Cybersecurity policy templates, users can quickly set up security configuration evaluation tests and assign them to large numbers of systems with just a few clicks. Once assigned, the test plan is evaluated against each associated system, and the system administrator is notified in minutes of any unwanted configuration settings.
Testing has shown that it takes less than 3 minutes to set policies and automate security configuration checking for 1 to 1,000 systems. That’s a dramatic time savings versus the 6 minutes that it would take to manually check each individual system’s security configuration.1
Figure 4. Evaluation plan templates
Cybersecurity has clearly become a challenge and priority for companies of all sizes. With the large and growing number of systems distributed across core and edge locations, it is impractical for any IT organization to manually check those systems for misconfigurations. Dell CloudIQ eliminates manual checking by automating it and recommending how to quickly mitigate misconfiguration risks that can lead to unwanted intrusions threatening data security. With the intelligent evaluation of Dell Security Advisories, CloudIQ identifies applicable DSAs, further saving time and expediting remediation.
For additional cybersecurity related information, see the following documents:
How do you become more familiar with Dell Technologies and CloudIQ? The Dell Technologies Info Hub provides expertise that helps to ensure customer success with Dell platforms. We have CloudIQ demos, white papers, and videos available at the Dell Technologies CloudIQ page. Also, you can refer to the CloudIQ: A Detailed Review white paper, which provides an in-depth summary of CloudIQ.
Author: Derek Barboza, Senior Principal Engineering Technologist
1Dell CloudIQ Cybersecurity for PowerEdge: The Benefits of Automation
Thu, 12 Jan 2023 19:27:23 -0000
Made available on December 20th, 2022, the 1.5 release of our flagship cloud-native storage management products, Dell CSI Drivers and Dell Container Storage Modules (CSM), is here!
See the official changelog in the CHANGELOG directory of the CSM repository.
First, this release extends support for Red Hat OpenShift 4.11 and Kubernetes 1.25 to every CSI Driver and Container Storage Module.
Featured in the previous CSM release (1.4), avid customers may recall a few new additions to the portfolio made available in tech preview. Primarily:
Building on these three new modules, Dell Technologies is adding deeper capabilities and major improvements as part of today’s 1.5 release for CSM, including:
For the platform updates included in today’s 1.5 release, the major new features are:
This feature is named “Auto RDM over FC” in the CSI/CSM documentation.
The concept is that the CSI driver connects to both the Unisphere and vSphere APIs to create the respective objects.
When deployed with “Auto-RDM,” the driver can only function in that mode. It is not possible to combine iSCSI and FC access within the same driver installation.
The same limitation applies to RDM usage. You can learn more about it at RDM Considerations and Limitations on the VMware website.
That’s all for CSM 1.5! Feel free to share feedback or send questions to the Dell team on Slack: https://dell-csm.slack.com.
Author: Florian Coulombel
Fri, 23 Dec 2022 21:50:39 -0000
Velero is one of the most popular tools for backup and restore of Kubernetes resources.
You can use Velero for different backup options to protect your Kubernetes cluster. The three modes are:
In all cases, Velero syncs the information (YAML and restic data) to a storage object.
PowerScale is Dell Technologies’ leading scale-out NAS solution. It supports many different access protocols including NFS, SMB, HTTP, FTP, HDFS, and, in the case that interests us, S3!
Note: PowerScale is not 100% compatible with the AWS S3 protocol (for details, see the PowerScale OneFS S3 API Guide).
For a simple backup solution of a few terabytes of Kubernetes data, PowerScale and Velero are a perfect duo.
To deploy this solution, you need to configure PowerScale and then install and configure Velero.
Prepare PowerScale to be a target for the backup as follows:
You can check that in the UI under Protocols > Object Storage (S3) > Global Settings or in the CLI.
In the UI:
In the CLI:
PS1-1% isi s3 settings global view
        HTTP Port: 9020
       HTTPS Port: 9021
       HTTPS only: No
S3 Service Enabled: Yes
2. Create a bucket with the permission to write objects (at a minimum).
That action can also be done from the UI or CLI.
In the UI:
In the CLI:
See isi s3 buckets create in the PowerScale OneFS CLI Command Reference.
3. Create a key for the user that will be used to upload the objects.
Important notes:
Now that PowerScale is ready, we can proceed with the Velero deployment.
We assume that the Velero binary is installed and has access to the Kubernetes cluster. If not, see the Velero installation document for the deployment instructions.
Configure Velero:
$ cat ~/credentials-velero
[default]
aws_access_key_id = 1_admin_accid
aws_secret_access_key = 0**************************i
…
$ velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.5.1 \
    --bucket velero-backup \
    --secret-file ./credentials-velero \
    --use-volume-snapshots=false \
    --cacert ./ps2-cacert.pem \
    --backup-location-config region=powerscale,s3ForcePathStyle="true",s3Url=https://192.168.1.21:9021
…
The preceding command shows the simplest and most secure way to use Velero.
It is possible to add parameters to enable protection with snapshots. Every Dell CSI driver has snapshot support. To take advantage of that support, we use the install command with this addition:
velero install \
    --features=EnableCSI \
    --plugins=velero/velero-plugin-for-aws:v1.5.1,velero/velero-plugin-for-csi:v0.3.0 \
    --use-volume-snapshots=true \
    ...
Now that CSI snaps are enabled, we can enable restic to move data out of those snapshots into our backup target by adding:
--use-restic
As you can see, we are using the velero/velero-plugin-for-aws:v1.5.1 image, which is the latest available at the time of the publication of this article. You can obtain the current version from GitHub: https://github.com/vmware-tanzu/velero-plugin-for-aws
After the Velero installation is done, check that everything is correct:
kubectl logs -n velero deployment/velero
If you have an error with the certificates, you should see it quickly.
You can now back up and restore your Kubernetes resources with the usual Velero commands. For example, to protect the entire Kubernetes except kube-system, including the data with PV snapshots:
velero backup create backup-all --exclude-namespaces kube-system
You can check the actual content directly from PowerScale file system explorer:
Here is a demo:
Conclusion
For easy protection of small Kubernetes clusters, Velero combined with PowerScale S3 is a great solution. If you are looking for broader features (for a greater amount of data or more destinations that go beyond Kubernetes), look to Dell PowerProtect Data Manager, a next-generation, comprehensive data protection solution.
Interestingly, Dell PowerProtect Data Manager uses the Velero plug-in to protect Kubernetes resources!
Fri, 09 Dec 2022 15:37:42 -0000
This is the third in a series of blogs discussing CloudIQ. In my first blog, I provided a high-level overview of CloudIQ and some of its key features. My second blog talked about the CloudIQ Proactive Health Score. I will continue the series with a discussion of the capacity monitoring and planning features in CloudIQ.
Capacity monitoring helps you plan for expansions of storage arrays, data protection appliances, storage-as-a-service, and hyperconverged infrastructure (HCI) to help overcome unexpected spikes in storage consumption. CloudIQ uses advanced analytics to provide short-term capacity prediction analysis, longer-term capacity forecasting, and capacity anomaly detection. Capacity anomaly detection is the identification of a sudden surge in utilization that may result in a space full condition in less than 24 hours.
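The idea behind time-to-full prediction can be sketched with simple arithmetic. This is a conceptual illustration only, not CloudIQ's actual forecasting algorithm, which uses advanced analytics:

```go
package main

import "fmt"

// Conceptual illustration only -- not CloudIQ's model. With a steady
// growth rate, time to full is remaining capacity divided by the rate.
func hoursUntilFull(totalTB, usedTB, growthTBPerHour float64) float64 {
	return (totalTB - usedTB) / growthTBPerHour
}

func main() {
	// A capacity anomaly: a pool with 10 TB free that suddenly starts
	// growing at 2 TB/hour is expected to fill in well under 24 hours.
	fmt.Printf("%.0f hours until full\n", hoursUntilFull(100, 90, 2))
}
```

A surge like this is what places a system in the Imminent (less than 24 hours) category described below.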
The CloudIQ Home page displays the Capacity Approaching Full tile which identifies storage entities that are full or expected to be full in each of the following time ranges:
Figure 1. The Capacity Approaching Full tile
In situations where there is a storage entity in the Imminent category, CloudIQ identifies the components of the entity that are experiencing the sudden increase in utilization. This gives users the necessary information about where to look to correct the offending behavior. In the following example, CloudIQ has identified a storage pool that is expected to run out of space in five hours. The pool details page identifies the file systems and LUNs that are the top contributors to the expected rise in utilization.
Figure 2. Capacity Forecast for a pool that has a capacity anomaly
Two other CloudIQ features help you quickly find a solution for storage that is fast approaching full. First, there is the identification of reclaimable storage that shows you where you can recover unused capacity in a system. Second, there is the multisystem capacity view that lets you scan all your storage systems to pinpoint which have excess capacity to relieve approaching-full systems of their workloads.
CloudIQ identifies different types of storage that are potentially reclaimable. The following criteria are used to identify reclaimable storage:
Users can quickly see the storage objects, where each object resides, and the amount of reclaimable space. The Last IO Time is provided for block and file objects that have had no detected IO activity in the last week. For VMs that have been shut down for at least a week, CloudIQ shows the storage object on which the VM resides, along with the vCenter and the time that the VM was shut down. The following figure shows an example of reclaimable storage for block objects that have had no front-end IO activity in the past week.
Figure 3. The Reclaimable Storage page – Block Objects with no front end IO activity
The multisystem capacity view provides a quick view of physical usable, used, free, and storage efficiencies across all storage, HCI, and data protection systems monitored by CloudIQ. This allows users to see quickly which systems are low on usable space, determine which systems are good targets for workload migration, and verify that their storage efficiencies and data reduction numbers are what they are expecting.
Figure 4. Multisystem capacity view for storage
Detailed capacity views for storage systems and storage objects provide additional information, including data efficiencies and data reduction metrics. The following figure shows the physical and logical storage breakdown and data reduction charts for a PowerStore cluster.
Figure 5. PowerStore cluster storage details
For APEX block storage service subscriptions, CloudIQ provides both subscribed and physical storage views. Subscribed views provide the storage usage including base and on-demand storage usage.
Figure 6. APEX block storage services subscription view
With custom reports and the use of custom tags, users can create meaningful business reports and schedule those reports to be delivered to the required end users. Reports can include both line charts and tables and can be filtered on any field. The following figure shows a simple table that includes used and free capacities, data reduction values, and several custom tags.
Figure 7. Custom report for storage
CloudIQ’s intelligence and predictive analytics help users proactively manage and accurately plan data storage and workload expansions, and act quickly to avoid rapidly approaching capacity-full conditions. Custom reports and tagging allow users to create, schedule, and deliver reports with technical and business information tailored to a wide variety of stakeholders. And for users looking to integrate data from CloudIQ with existing IT management tools, CloudIQ provides a public REST API.
How do you become more familiar with Dell Technologies and CloudIQ? The Dell Technologies Info Hub site provides expertise that helps to ensure customer success with Dell Technologies platforms. We have CloudIQ demos, white papers, and videos available at the Dell Technologies CloudIQ page. Also, feel free to reference the CloudIQ Overview Whitepaper which provides an in-depth summary of CloudIQ. Interested in DevOps? Go to our public API page for information about integrating CloudIQ with other IT tools using Webhooks or REST API.
Author: Derek Barboza, Senior Principal Engineering Technologist
Tue, 22 Nov 2022 17:27:08 -0000
This tutorial blog demonstrates how to use CloudIQ Webhooks to integrate CloudIQ health notifications with BigPanda (https://www.bigpanda.io/), an event management processing tool. This allows users to integrate CloudIQ notifications with events from other IT tools into BigPanda. We will show how to create a REST API Integration in BigPanda and provide an example of intermediate code that uses Google Cloud functions to process Webhooks.
BigPanda offers a solution that puts a modern twist on the event management process. The main product is a fully customizable cloud-hosted event management console for event integration, reporting, correlation, and enrichment.
A CloudIQ Webhook is a notification that is sent when a health issue changes. CloudIQ sends the Webhook notification when a new or resolved health issue is identified in CloudIQ. A Webhook is an HTTP POST composed of a header and a JSON payload that is sent to a user-configured destination. Webhooks are available under the Admin > Integrations menu in the CloudIQ UI. Users must have the CloudIQ DevOps role to access the Integrations menu.
A Webhook consists of data in the header and the payload. The header includes control information; the payload is a JSON data structure that includes useful details about the notification and the health issue. Examples of the header and payload JSON files can be found here.
In CloudIQ, we enable Webhook integration by configuring a name, destination, and the secret to sign the payload.
In BigPanda, we have a couple of possibilities for third-party integration:
In our example, we use the REST API. Note that some of the requirements of the Open Integration Hub (alert severity, configurable application key, and so on) are not configurable today in CloudIQ Webhooks.
The main challenge when integrating CloudIQ health events with BigPanda alerts is implementing a mapping function to translate CloudIQ fields to BigPanda fields.
To do this, we will use a serverless function to:
In this integration, the serverless function is a Google Cloud Function. Any other serverless framework can work.
The first step is to create an application for integration in BigPanda. Do the following:
1. Log into the BigPanda console.
2. Click the Integrations button at the top of the console.
3. Click the blue New Integration button.
4. Select Alerts Rest API (the first card).
5. Set an integration name, then click Generate App Key.
6. Save the generated app key and bearer token.
If you forgot to save the “application key” or “token”, you can obtain them later by selecting `Review Instructions`.
Note that the “application key” and “token” will be needed later to configure the trigger to post data to that endpoint.
This step is very similar to what has been presented in the CloudIQ to Slack tutorial. The only changes are that we are using a golang runtime and we store the authentication token in a secret instead of in a plain text environment variable.
2. Provide a name (BP_TOKEN in this example).
3. Paste the Authorization token from the HTTP headers section of the BigPanda integration into the ‘Secret value’ field.
4. Select Create Function and provide a function name (ciq-bigpanda-integration in this example).
5. Under the Trigger section, keep a trigger type of HTTP and select Allow unauthenticated invocations.
6. Take note of the Trigger URL because it will be used as the Payload URL when configuring the Webhook in CloudIQ.
7. Select SAVE.
8. Expand the RUNTIME, BUILD AND CONNECTIONS SETTINGS section.
9. Under the RUNTIME tab, click the + ADD VARIABLE button to create the following variable:
BP_APP_KEY. The value is set to the application key obtained after creating the BigPanda integration.
10. Select the SECURITY AND IMAGE REPO tab.
11. Select REFERENCE A SECRET.
12. Select the BP_TOKEN secret from the pulldown.
13. Select Exposed as environment variable from the Reference Method pulldown.
14. Enter BP_TOKEN as the environment variable name.
15. Select DONE, then click Next.
16. Select Go 1.16 from the Runtime pulldown.
17. Change the Entry point to CiqEventToBigPandaAlert.
18. Replace the code for function.go with the example function.go code.
19. Replace the go.mod with the example go.mod code.
20. Select DEPLOY.
Following Go's static-typing-first approach, we have clearly defined `struct` types for the input (`CiqHealthEvent`) and output (`BigPandaAlerts`).
Most of the logic consists of mapping one field to the other.
func CiqEventMapping(c *CiqHealthEvent, bp *BigPandaClient) *BigPandaAlerts {
	log.Println("mapping input CloudIQ event: ")
	log.Printf("%+v", c)
	alert := BigPandaAlerts{
		AppKey:  bp.AppKey,
		Cluster: "CloudIQ",
		Host:    c.SystemName,
	}
	if len(c.NewIssues) > 0 {
		for _, v := range c.NewIssues {
			alert.Alerts = append(alert.Alerts, BigPandaAlert{
				Status:             statusForScore(c.CurrentScore),
				Timestamp:          c.Timestamp,
				Host:               c.SystemName,
				Description:        v.Description,
				Check:              v.RuleID,
				IncidentIdentifier: v.ID,
			})
		}
	}
	return &alert
}
Two things to note here:
1. Because CloudIQ doesn't have the notion of severity, we convert the score to a status using the code below.
2. CloudIQ has an event identifier that will help to deduplicate the alert in BigPanda or reopen a closed event in case of a re-notify.
// BigPanda status values: ok,ok-suspect,warning,warning-suspect,critical,critical-suspect,unknown,acknowledged,oksuspect,warningsuspect,criticalsuspect,ok_suspect,warning_suspect,critical_suspect,ok suspect,warning suspect,critical suspect
func statusForScore(s int) string {
	if s == 100 {
		return "ok"
	} else if s <= 99 && s > 95 {
		return "ok suspect"
	} else if s <= 95 && s > 70 {
		return "warning"
	} else if s <= 70 {
		return "critical"
	} else {
		return "unknown"
	}
}
Behind the scenes, the GCP Cloud Functions are built and executed as a container. To develop and test the code locally (instead of doing everything in the GCP Console), we can develop locally and then build the package using buildpack (https://github.com/googlecloudplatform/buildpacks) as GCP does:
pack build \
    --builder gcr.io/buildpacks/builder:v1 \
    --env GOOGLE_RUNTIME=go \
    --env GOOGLE_FUNCTION_SIGNATURE_TYPE=http \
    --env GOOGLE_FUNCTION_TARGET=ciq-bigpanda-integration \
    ciq-bigpanda-integration
After the build is successful, we can test it with something similar to:
docker run --rm -p 8080:8080 -e BP_TOKEN=xxxxx -e BP_APP_KEY=yyyyy ciq-bigpanda-integration
Alternatively, you can create a “main.go” and run it with:
FUNCTION_TARGET=CiqEventToBigPandaAlert go run cmd/main.go
Users can choose to deploy the function outside of the GCP console. You can publish it with:
gcloud functions deploy ciq-bigpanda-integration --runtime go116 --entry-point ciq-bigpanda-integration --trigger-http --allow-unauthenticated
It is time to point the CloudIQ Webhook to the GCP Function trigger URL. From the Admin > Integrations menu in CloudIQ, go to the Webhooks tab.
To ease the simulation of a Webhook event, go to the CloudIQ Integration and click the TEST WEBHOOK button. This sends a ping request to the destination. You can also go to CloudIQ and redeliver an existing event.
For an actual event and not just a `ping`, use the `easy_post.sh` script after configuring the appropriate ENDPOINT.
#!/bin/bash
HEADERS_FILE=${HEADERS_FILE-./headers.json}
PAYLOAD_FILE=${PAYLOAD_FILE-./payload.json}
ENDPOINT=${ENDPOINT-https://webhook.site/6fd7d650-1b5b-4b8c-9781-2043005bdf2d}

mapfile -t HEADERS < <(jq -r '. | to_entries[] | "-H \(.key):\(.value)"' < ${HEADERS_FILE})

curl -k -H "Content-Type: application/json" ${HEADERS[@]} --request POST --data @${PAYLOAD_FILE} ${ENDPOINT}
If everything flows correctly, you will see the health alerts delivered to the BigPanda console. This allows users to consolidate CloudIQ notifications with events from other IT tools into a single monitoring interface.
Author: Derek Barboza
Mon, 26 Sep 2022 15:17:45 -0000
One of the first things I do after deploying a Kubernetes cluster is to install a CSI driver to provide persistent storage to my workloads. Coupled with a GitOps workflow, it takes only seconds to be able to run stateful workloads.
The GitOps process is nothing more than a few principles:
Nonetheless, to ensure that the process runs smoothly, you must make certain that the application you will manage with GitOps complies with these principles.
This article describes how to use the Microsoft Azure Arc GitOps solution to deploy the Dell CSI driver for Dell PowerMax and affiliated Container Storage Modules (CSMs).
The platform we will use to implement the GitOps workflow is Azure Arc with GitHub. Still, other solutions are possible using Kubernetes agents such as Argo CD, Flux CD, and GitLab.
Azure GitOps itself is built on top of Flux CD.
The first step is to onboard your existing Kubernetes cluster within the Azure portal.
The Azure agent needs to connect to the Internet. In my case, the installation of the Arc agent failed from the Dell network with the error described here: https://docs.microsoft.com/en-us/answers/questions/734383/connect-openshift-cluster-to-azure-arc-secret-34ku.html
Certain URLs (even when bypassing the corporate proxy) don't play well when communicating with Azure. I have seen some services receive a self-signed certificate, which causes the issue.
The solution for me was to put an intermediate transparent proxy between the Kubernetes cluster and the corporate proxy. That way, we have better control over the responses given by the proxy.
In this example, we install Squid on a dedicated box with the help of Docker. To make it work, I used the Squid image by Ubuntu and made sure that Kubernetes requests were direct with the help of always_direct:
docker run -d --name squid-container ubuntu/squid:5.2-22.04_beta
docker cp squid-container:/etc/squid/squid.conf ./
egrep -v '^#' squid.conf > my_squid.conf
docker rm -f squid-container
Then add the following section:
acl k8s port 6443 # k8s https
always_direct allow k8s
You can now install the agent per the following instructions: https://docs.microsoft.com/en-us/azure/azure-arc/kubernetes/quickstart-connect-cluster?tabs=azure-cli#connect-using-an-outbound-proxy-server.
export HTTP_PROXY=http://mysquid-proxy.dell.com:3128
export HTTPS_PROXY=http://mysquid-proxy.dell.com:3128
export NO_PROXY=https://kubernetes.local:6443

az connectedk8s connect --name AzureArcCorkDevCluster \
    --resource-group AzureArcTestFlorian \
    --proxy-https http://mysquid-proxy.dell.com:3128 \
    --proxy-http http://mysquid-proxy.dell.com:3128 \
    --proxy-skip-range 10.0.0.0/8,kubernetes.default.svc,.svc.cluster.local,.svc \
    --proxy-cert /etc/ssl/certs/ca-bundle.crt
If everything worked well, you should see the cluster with detailed info from the Azure portal:
To benefit from all the features that Azure Arc offers, give the agent the privileges to access the cluster.
The first step is to create a service account:
kubectl create serviceaccount azure-user
kubectl create clusterrolebinding demo-user-binding --clusterrole cluster-admin --serviceaccount default:azure-user
kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: azure-user-secret
  annotations:
    kubernetes.io/service-account.name: azure-user
type: kubernetes.io/service-account-token
EOF
Then, from the Azure UI, when you are prompted to give a token, you can obtain it as follows:
kubectl get secret azure-user-secret -o jsonpath='{$.data.token}' | base64 -d | sed $'s/$/\\\n/g'
Then paste the token in the Azure UI.
The GitOps agent installation can be done with a CLI or in the Azure portal.
As of now, the Microsoft documentation presents the CLI-based deployment in detail, so let's see how it works with the Azure portal:
The Git repository organization is a crucial part of the GitOps architecture. It hugely depends on how internal teams are organized, the level of information you want to expose and share, the location of the different clusters, and so on.
In our case, the requirement is to connect multiple Kubernetes clusters owned by different teams to a couple of PowerMax systems using only the latest and greatest CSI driver and affiliated CSM for PowerMax.
Therefore, the monorepo approach is well suited.
The organization follows this structure:
.
├── apps
│ ├── base
│ └── overlays
│ ├── cork-development
│ │ ├── dev-ns
│ │ └── prod-ns
│ └── cork-production
│ └── prod-ns
├── clusters
│ ├── cork-development
│ └── cork-production
└── infrastructure
├── cert-manager
├── csm-replication
├── external-snapshotter
└── powermax
You can see all files in https://github.com/coulof/fluxcd-csm-powermax.
Note: The GitOps agent comes with multi-tenancy support; therefore, we cannot cross-reference objects between namespaces. The Kustomization and HelmRelease objects must be created in the same namespace as the agent (here, flux-system), with a targetNamespace pointing to the namespace where the resource will be installed.
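To make the constraint concrete, here is an illustrative HelmRelease that lives in the agent's namespace but deploys into another. The chart and repository names are assumptions for the sketch; the field layout follows the Flux helm.toolkit.fluxcd.io API:

```yaml
# Illustrative HelmRelease: created in the agent's namespace, deploying elsewhere
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: csi-powermax
  namespace: flux-system        # same namespace as the GitOps agent
spec:
  targetNamespace: powermax     # where the driver is actually installed
  interval: 10m
  chart:
    spec:
      chart: csi-powermax       # assumed chart name
      sourceRef:
        kind: HelmRepository
        name: dell-csm          # assumed repository name
```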
This article is the first of a series exploring the GitOps workflow. Next, we will see how to manage application and persistent storage with the GitOps workflow, how to upgrade the modules, and so on.
Tue, 23 Aug 2022 17:09:57 -0000
|Read Time: 0 minutes
Network connectivity is an essential part of any infrastructure architecture. When it comes to how Kubernetes connects to PowerScale, there are several options to configure the Container Storage Interface (CSI). In this post, we will cover the concepts and configuration you can implement.
The story starts with CSI plugin architecture.
Like all other Dell storage CSI, PowerScale CSI follows the Kubernetes CSI standard by implementing functions in two components.
The CSI controller plugin is deployed as a Kubernetes Deployment, typically with two or three replicas for high availability, with only one instance acting as leader. The controller is responsible for communicating with PowerScale through the Platform API to manage volumes (for PowerScale, that means creating and deleting directories, NFS exports, and quotas), to update the NFS client list when a Pod moves, and so on.
A CSI node plugin is a Kubernetes DaemonSet, running on all nodes by default. It’s responsible for mounting the NFS export from PowerScale, to map the NFS mount path to a Pod as persistent storage, so that applications and users in the Pod can access the data on PowerScale.
Because CSI needs to access both PAPI (PowerScale Platform API) and NFS data, a single user role typically isn’t secure enough: the role for PAPI access will need more privileges than normal users.
According to the PowerScale CSI manual, CSI requires a user that has the following privileges to perform all CSI functions:
Privilege | Type
ISI_PRIV_LOGIN_PAPI | Read Only
ISI_PRIV_NFS | Read Write
ISI_PRIV_QUOTA | Read Write
ISI_PRIV_SNAPSHOT | Read Write
ISI_PRIV_IFS_RESTORE | Read Only
ISI_PRIV_NS_IFS_ACCESS | Read Only
ISI_PRIV_IFS_BACKUP | Read Only
Among these privileges, ISI_PRIV_SNAPSHOT and ISI_PRIV_QUOTA are only available in the System zone. And this complicates things a bit. To fully utilize these CSI features, such as volume snapshot, volume clone, and volume capacity management, you have to allow the CSI to be able to access the PowerScale System zone. If you enable the CSM for replication, the user needs the ISI_PRIV_SYNCIQ privilege, which is a System-zone privilege too.
By contrast, there isn’t any specific role requirement for applications/users in Kubernetes to access data: the data is shared by the normal NFS protocol. As long as they have the right ACL to access the files, they are good. For this data accessing requirement, a non-system zone is suitable and recommended.
These two access zones are defined in different places in CSI configuration files:
If an admin really cannot expose their System zone to the Kubernetes cluster, they have to disable the snapshot and quota features in the CSI installation configuration file (values.yaml). In this way, the PAPI access zone can be a non-System access zone.
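As a rough sketch, the relevant values.yaml toggles might look like this. The flag names are illustrative and vary between CSI PowerScale driver versions, so check the values.yaml shipped with your release:

```yaml
# Illustrative values.yaml excerpt -- flag names vary between driver versions
enableQuota: false    # ISI_PRIV_QUOTA no longer required
snapshot:
  enabled: false      # ISI_PRIV_SNAPSHOT no longer required
```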
The following diagram shows how the Kubernetes cluster connects to PowerScale access zones.
Normally a Kubernetes cluster comes with many networks: a pod inter-communication network, a cluster service network, and so on. Luckily, the PowerScale network doesn’t have to join any of them. The CSI pods can access a host’s network directly, without going through the Kubernetes internal network. This also has the advantage of providing a dedicated high-performance network for data transfer.
For example, on a Kubernetes host, there are two NICs: IP 192.168.1.x and 172.24.1.x. NIC 192.168.1.x is used for Kubernetes, and is aligned with its hostname. NIC 172.24.1.x isn’t managed by Kubernetes. In this case, we can use NIC 172.24.1.x for data transfer between Kubernetes hosts and PowerScale.
Because, by default, the CSI driver uses the IP aligned with the host's hostname, to let CSI recognize the second NIC (172.24.1.x) we have to explicitly set the IP range in “allowedNetworks” in the values.yaml file during CSI driver installation. For example:
allowedNetworks: [172.24.1.0/24]
Also, in this network configuration, it’s unlikely that the Kubernetes internal DNS can resolve the PowerScale FQDN. So we also have to make sure that “dnsPolicy” is set to “ClusterFirstWithHostNet” in the values.yaml file. With this dnsPolicy, the CSI pods use the DNS servers from /etc/resolv.conf on the host OS, not the internal DNS server of Kubernetes.
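Putting the two settings together, the values.yaml fragment might look like this (a sketch; key names as documented for the driver):

```yaml
# values.yaml fragment for a dedicated data NIC
allowedNetworks: [172.24.1.0/24]     # data traffic goes through the second NIC
dnsPolicy: ClusterFirstWithHostNet   # resolve the PowerScale FQDN via the host's DNS
```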
The following diagram shows the configuration mentioned above:
Please note that the “allowedNetworks” setting only affects the data access zone, and not the PAPI access zone. In fact, CSI just uses this parameter to decide which host IP should be set as the NFS client IP on the PowerScale side.
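To illustrate how the parameter is used, here is a toy sketch of the selection logic. A simplistic /24 prefix match stands in for real CIDR arithmetic; this is not the driver's actual code:

```shell
# Toy version of the selection: pick the host IP inside allowedNetworks.
# Real CSI code does proper CIDR matching; this prefix match is only illustrative.
allowed_prefix="172.24.1."
selected=""
for ip in 192.168.1.10 172.24.1.10; do
  case "$ip" in
    "$allowed_prefix"*) selected="$ip" ;;
  esac
done
echo "NFS client IP registered on PowerScale: $selected"
# → NFS client IP registered on PowerScale: 172.24.1.10
```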
Regarding the network routing, CSI simply follows the OS route configuration. Because of that, if we want the PAPI access zone to go through the primary NIC (192.168.1.x), and have the data access zone to go through the second NIC (172.24.1.x), we have to change the route configuration of the Kubernetes host, not this parameter.
Hopefully this blog helps you understand the network configuration for PowerScale CSI better. Stay tuned for more information on Dell Containers & Storage!
Authors: Sean Zhan, Florian Coulombel
Fri, 05 Aug 2022 20:29:33 -0000
|Read Time: 0 minutes
This is the second in a series of blogs discussing CloudIQ. In my first blog, I provided a high-level overview of CloudIQ and some of its key features. I will continue with a series of blogs, each talking about one of the key features in more detail. This blog discusses one of CloudIQ’s key differentiating features: the Proactive Health Score.
The Proactive Health Score uses various factors to provide a consolidated view of a system’s health into a single health score. Health scores are based on up to five categories: Components, Configuration, Capacity, Performance, and Data Protection. Based on the resulting health score, the system is put into one of three risk categories: Poor, Fair, or Good. The score starts at 100 and is reduced by the issue with the highest deduction.
A system in the Poor category has a score of 0 to 70 and poses an imminent critical risk. It could be a storage pool that is overprovisioned and full, meaning that systems will be trying to write to storage that is unavailable. Or it could be a significant component failure. Whatever the issue, it is something that requires your immediate attention.
A system in the Fair category has a score of 71 to 94. Systems in this category have an issue that should be looked at, but certainly not something that requires you to get out of bed at 3:00am to address immediately. It could be something like a storage pool predicted to be full in a week or a system inlet temperature that exceeds the upper warning threshold on a PowerEdge server.
A system in the Good category has a score of 95 to 100 and is doing fine. There may be a minor issue that you need to look at, but nothing significant that is expected to cause any near-term problems. An example would be a fibre port with a warning status on a Connectrix switch.
Now what happens if there are multiple issues on a system? We hinted at this earlier. The score is only affected by the most critical issue. Let’s say that there are four issues on a system: one 30-point deduction, one 10-point deduction, and two 5-point deductions. In this case, the health score is 70. When the 30-point deduction is addressed, the score would become 90. We do this to prevent a system with several minor issues from appearing at high risk or at a higher risk than a system with a significant issue.
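The scoring rule is easy to sketch. In this toy version (not CloudIQ code), only the largest deduction is subtracted, and the ranges described above decide the category:

```shell
# Toy health-score calculation: only the single largest deduction counts.
deductions="30 10 5 5"
max=0
for d in $deductions; do
  if [ "$d" -gt "$max" ]; then max="$d"; fi
done
score=$((100 - max))                     # 100 - 30 = 70
if [ "$score" -le 70 ]; then category=Poor
elif [ "$score" -le 94 ]; then category=Fair
else category=Good
fi
echo "Health score: $score ($category)"
# → Health score: 70 (Poor)
```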
Figure 1. System Health page
So now that we have been notified of an issue on a system, what do we do next? Well, with CloudIQ, we will offer up recommended remediation actions to address the issue before it has a significant impact on the environment. This may come in the form of a recommended configuration change or other action, a knowledge base article with a resolution, or some commands to run to gain the necessary information to resolve the issue.
Figure 2. Recommended remediation
CloudIQ also tracks the history of the Proactive Health Score. We can see both new and resolved issues along a chart with a selectable date range. Details of the issues are listed below the chart. By providing the history of the health score, CloudIQ allows users to identify possible recurring issues in the environment.
Figure 3. Health Score history
What if we do not want to log in to CloudIQ on a daily or weekly basis to check our systems? We can easily be notified by email any time a system health change occurs. These notifications can be set up for a configurable set of systems, allowing users only to receive notifications for those systems for which they are responsible.
For the more motivated user, CloudIQ supports Webhooks. With this feature, users can send a Webhook for any health change notification to integrate with third-party tools such as ServiceNow, Slack, or Teams. Webhooks are sent for both open and closed issues with a unique identifier. This allows users to correlate the resolved issue with the open issue to automatically close out any created incident. Some Webhook integration examples can be found here.
Whether it be for storage, networking, hyperconverged, servers, or data protection, the Proactive Health Score summarizes the health of a system into a single number, providing an immediate indication of the status of each system. Developed in tandem with experts from each product team, any issues identified for a system are accompanied by recommended remediation to help with self-service and quickly reduce risk. And with email notifications and Webhooks, users can be notified proactively any time an issue is identified.
How do you become more familiar with Dell Technologies and CloudIQ? The Dell Technologies Info Hub site provides expertise that helps to ensure customer success with Dell Technologies platforms. We have CloudIQ demos, white papers, and videos available at the Dell Technologies CloudIQ page. Also, feel free to reference the CloudIQ Overview Whitepaper which provides an in-depth summary of CloudIQ. Interested in DevOps? Go to our public API page for information about integrating CloudIQ with other IT tools using Webhooks or REST API.
Stay tuned for my next blog, where I'll talk about capacity forecasting and capacity anomaly detection in CloudIQ.
Author: Derek Barboza, Senior Principal Engineering Technologist
Thu, 17 Nov 2022 15:04:10 -0000
|Read Time: 0 minutes
At Dell Technologies, we are proud to announce a new interactive demo for Storage Resource Manager (SRM), located here:
This interactive demo is based on the SRM release 4.7.0.0, which introduces several new features, enhancements, and platform supports.
The landing page of the interactive demo provides a summary of the use cases and features covered. This demo has the same look and feel as the actual HTML-based SRM user interface, where you can scroll up and down the page and click on each page object.
Dell SRM provides insight into data center operations from application to storage. Through automated discovery and reporting, Dell SRM breaks down the silos. Its simple use-case driven user interface simplifies tasks such as:
There are eight independent interactive demo modules available, each of which covers a main SRM use case or feature:
Here is a peek inside each of the eight demo modules:
The data that is available in this comprehensive eight module demo is from the following supported vendors and technologies:
Enjoy this demo and let us know how you like it!
Author: Dejan Stojanovic
Wed, 25 May 2022 19:49:28 -0000
|Read Time: 0 minutes
CloudIQ is Dell’s cloud-based AIOps application for monitoring Dell core, edge, and cloud. Born out of the Dell Unity storage product group several years ago, CloudIQ has quickly grown to cover a broad range of Dell Technologies products. With the latest addition of PowerSwitch, CloudIQ now covers Dell’s entire infrastructure portfolio, including compute, networking, CI/HCI, data protection, and storage systems.
According to a survey conducted last year, IT organizations were able to resolve infrastructure issues two to ten times faster and save a day per week on average with CloudIQ.1
Figure 1. CloudIQ Supported Platforms
CloudIQ has a variety of innovative features based on machine learning and other algorithms that help you reduce risk, plan ahead, and improve productivity. These features include the proactive health score, performance impact and anomaly detection, workload contention identification, capacity forecasting and anomaly detection, cybersecurity monitoring, reclaimable storage identification, and VMware integration.
With custom reporting features, Webhooks, and a REST API, you can integrate data from CloudIQ into ticketing, collaboration, and automation tools and processes that you use in day-to-day IT operations.
Best of all, CloudIQ comes with your standard Dell ProSupport and ProSupport Plus contracts at no extra cost.
Keep an eye out for follow up blogs discussing CloudIQ’s key features in more detail!
Figure 2. CloudIQ Overview Page
With the addition of PowerSwitch support, CloudIQ now gives users the ability to monitor the full range of their Dell Technologies IT infrastructure from a single user interface. And the fact that it is a cloud offering hosted in a secure Dell IT environment means that it is accessible from virtually anywhere. Simply open a web browser, point to https://cloudiq.dell.com, and log in with your Dell support credentials. As a cloud-based application, it also means that you always have access to the latest features because CloudIQ’s agile development process allows for continuous and seamless updates without any effort from you. There is also a mobile app, so you can take it anywhere.
How do you become more familiar with Dell Technologies and CloudIQ? The Dell Technologies Info Hub site provides expertise that helps to ensure customer success with Dell Technologies platforms. We have CloudIQ demos, white papers, and videos available at the Dell Technologies CloudIQ page. Also, feel free to reference the CloudIQ Overview Whitepaper which provides an in-depth summary of CloudIQ.
[1] Based on a Dell Technologies survey of CloudIQ users conducted May-June 2021. Actual results may vary.
Author: Derek Barboza, Senior Principal Engineering Technologist
Wed, 20 Apr 2022 21:28:38 -0000
|Read Time: 0 minutes
With all the Dell Container Storage Interface (CSI) drivers and dependencies being open-source, anyone can tweak them to fit a specific use case.
This blog shows how to create a patched version of a Dell CSI Driver for PowerScale.
As a practical example, the following steps show how to create a patched version of Dell CSI Driver for PowerScale that supports a longer mounted path.
The CSI Specification defines that a driver must accept a maximum path length of at least 128 bytes:
// SP SHOULD support the maximum path length allowed by the operating
// system/filesystem, but, at a minimum, SP MUST accept a max path
// length of at least 128 bytes.
Dell drivers use the gocsi library as common boilerplate for CSI development. That library enforces the 128-byte maximum path length.
The PowerScale hardware supports path lengths up to 1023 characters, as described in the File system guidelines chapter of the PowerScale spec. We’ll therefore build a csi-powerscale driver that supports that maximum length path value.
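A quick way to see why the limit bites, using an illustrative deep directory path (the check below mirrors the byte-length test, not gocsi's actual code):

```shell
# Build an illustrative 164-byte path and compare it to the stock gocsi limit.
path="/ifs/data/csi/$(printf 'x%.0s' $(seq 1 150))"
len=${#path}   # ASCII only, so characters == bytes here
if [ "$len" -gt 128 ]; then
  echo "$len-byte path: rejected by stock gocsi, fine after the patch"
fi
# → 164-byte path: rejected by stock gocsi, fine after the patch
```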
The Dell CSI drivers are all built with golang and, obviously, run as a container. As a result, the prerequisites are relatively simple. You need:
The first thing to do is to clone the official csi-powerscale repository in your GOPATH source directory.
cd $GOPATH/src/github.com/
git clone git@github.com:dell/csi-powerscale.git dell/csi-powerscale
cd dell/csi-powerscale
You can then pick the version of the driver you want to patch; git tag gives the list of versions.
In this example, we pick the v2.1.0 with git checkout v2.1.0 -b v2.1.0-longer-path.
The next step is to obtain the library we want to patch.
gocsi and every other open-source component maintained for Dell CSI are available on https://github.com/dell.
The following figure shows how to fork the repository on your private github:
Now we can get the library with:
cd $GOPATH/src/github.com/
git clone git@github.com:coulof/gocsi.git coulof/gocsi
cd coulof/gocsi
To simplify the maintenance and merge of future commits, it is wise to add the original repo as an upstream branch with:
git remote add upstream git@github.com:dell/gocsi.git
The next important step is to pick and choose the correct library version used by our version of the driver.
We can check the csi-powerscale dependency file with: grep gocsi $GOPATH/src/github.com/dell/csi-powerscale/go.mod and create a branch of that version. In this case, the version is v1.5.0, and we can branch it with: git checkout v1.5.0 -b v1.5.0-longer-path.
Now it’s time to write our patch, which is… just a one-liner:
--- a/middleware/specvalidator/spec_validator.go
+++ b/middleware/specvalidator/spec_validator.go
@@ -770,7 +770,7 @@ func validateVolumeCapabilitiesArg(
}
const (
- maxFieldString = 128
+ maxFieldString = 1023
maxFieldMap = 4096
maxFieldNodeId = 256
)
We can then commit and push our patched library with a nice tag:
git commit -a -m 'increase path limit'
git push --set-upstream origin v1.5.0-longer-path
git tag -a v1.5.0-longer-path
git push --tags
With the patch committed and pushed, it’s time to build the CSI driver binary and its container image.
Let’s go back to the csi-powerscale main repo: cd $GOPATH/src/github.com/dell/csi-powerscale
As mentioned in the introduction, we can take advantage of the replace directive in the go.mod file to point to the patched lib. In this case we add the following:
diff --git a/go.mod b/go.mod
index 5c274b4..c4c8556 100644
--- a/go.mod
+++ b/go.mod
@@ -26,6 +26,7 @@ require (
)
replace (
+ github.com/dell/gocsi => github.com/coulof/gocsi v1.5.0-longer-path
k8s.io/api => k8s.io/api v0.20.2
k8s.io/apiextensions-apiserver => k8s.io/apiextensions-apiserver v0.20.2
k8s.io/apimachinery => k8s.io/apimachinery v0.20.2
When that is done, we obtain the new module from the online repo with: go mod download
Note: If you want to test the changes locally only, we can use the replace directive to point to the local directory with:
replace github.com/dell/gocsi => ../../coulof/gocsi
We can then build our new driver binary locally with: make build
After compiling it successfully, we can create the image. The shortest path to do that is to replace the csi-isilon binary from the dellemc/csi-isilon docker image with:
cat << EOF > Dockerfile.patch
FROM dellemc/csi-isilon:v2.1.0
COPY "csi-isilon" .
EOF
docker build -t coulof/csi-isilon:v2.1.0-long-path -f Dockerfile.patch .
Alternatively, you can rebuild the entire docker image using the provided Makefile.
By default, the driver image is built on the Red Hat Universal Base Image (UBI) minimal. That base image sometimes lacks dependencies, so you can use another flavor, such as:
BASEIMAGE=registry.fedoraproject.org/fedora-minimal:latest REGISTRY=docker.io IMAGENAME=coulof/csi-powerscale IMAGETAG=v2.1.0-long-path make podman-build
The image is ready to be pushed in whatever image registry you prefer. In this case, this is hub.docker.com: docker push coulof/csi-isilon:v2.1.0-long-path.
The last step is to replace the driver image used in your Kubernetes with your custom one.
Again, multiple solutions are possible, and the one to choose depends on how you deployed the driver.
If you used the helm installer, you can add the following block at the top of the myvalues.yaml file:
images:
driver: docker.io/coulof/csi-powerscale:v2.1.0-long-path
Then update or uninstall/reinstall the driver as described in the documentation.
If you decided to use the Dell CSI Operator, you can simply point to the new image:
apiVersion: storage.dell.com/v1
kind: CSIIsilon
metadata:
name: isilon
spec:
driver:
common:
image: "docker.io/coulof/csi-powerscale:v2.1.0-long-path"
...
Or, if you want to do a quick and dirty test, you can create a patch file (here named path_csi-isilon_controller_image.yaml) with the following content:
spec:
template:
spec:
containers:
- name: driver
image: docker.io/coulof/csi-powerscale:v2.1.0-long-path
You can then apply it to your existing install with: kubectl patch deployment -n powerscale isilon-controller --patch-file path_csi-isilon_controller_image.yaml
In all cases, you can check that everything works by first making sure that the Pod is started:
kubectl get pods -n powerscale
and that the logs are clean:
kubectl logs -n powerscale -l app=isilon-controller -c driver
As demonstrated, thanks to open source, it’s easy to fix and improve Dell CSI drivers or Dell Container Storage Modules.
Keep in mind that Dell officially supports (through tickets, Service Requests, and so on) the image and binary, but not the custom build.
Thanks for reading and stay tuned for future posts on Dell Storage and Kubernetes!
Author: Florian Coulombel
Thu, 07 Apr 2022 14:26:51 -0000
|Read Time: 0 minutes
We are happy to announce the release of the new SRM hands on lab:
This new SRM hands on lab is based on the latest SRM release (4.7.0.0), which introduced many new features, enhancements, and platform supports.
To find this lab, go to the demo center (https://democenter.delltechnologies.com) and enter “srm” in the search box. This link to the lab will appear:
The welcome screen on the lab looks like this. It includes a network diagram and a comprehensive lab guide:
In the first module, called “What’s New”, the lab focuses on the following new features, enhancements, and newly supported platforms:
The rest of the modules cover in-depth SRM use-cases listed below. Each module is independent so that you can focus on your area of interest:
and some of the main SRM features:
Check out some of the SRM dashboards available:
The lab includes a great variety of SRM reports containing data from supported vendors and technologies:
The SRM 4.7.0.0 hands on lab helps you experience SRM use-cases and features, by browsing through the powerful user interface and elaborating on data from multiple vendors and technologies.
Enjoy the SRM hands on lab! If you have any questions, please contact us at support@democenter.dell.com.
Author: Dejan Stojanovic
Mon, 21 Mar 2022 14:42:31 -0000
|Read Time: 0 minutes
The quarterly update for Dell CSI Drivers & Dell Container Storage Modules (CSM) is here! Here’s what we’re planning.
Dell Container Storage Modules (CSM) add data services and features that are not in the scope of the CSI specification today. The new CSM Operator simplifies the deployment of CSMs. With an ever-growing ecosystem and added features, the deployment of a driver and its affiliated modules needs to be carefully planned before you begin.
The new CSM Operator:
In the short to medium term, the CSM Operator will deprecate the experimental CSM Installer.
For disaster recovery protection, PowerScale implements data replication between appliances by means of the SyncIQ feature. SyncIQ replicates data between two sites, where one is read-write and the other read-only, similar to other Dell storage backends with async or sync replication.
The role of the CSM replication module and underlying CSI driver is to provision the volume within Kubernetes clusters and prepare the export configurations, quotas, and so on.
CSM Replication for PowerScale has been designed and implemented in such a way that it won’t collide with your existing Superna Eyeglass DR utility.
A live-action demo will be posted in the coming weeks on our VP YouTube channel: https://www.youtube.com/user/itzikreich/.
In this release, each CSI driver:
Kubernetes v1.19 introduced the fsGroupPolicy to give more control to the CSI driver over the permission sets in the securityContext.
There are three possible options:
In all cases, Dell CSI drivers let kubelet perform the change ownership operations and do not do it at the driver level.
Drivers for PowerFlex and Unity can now be installed with the help of the install scripts we provide under the dell-csi-installer directory.
A standalone Helm chart helps to easily integrate the driver installation with the agent for Continuous Deployment like Flux or Argo CD.
Note: To ensure that you install the driver on a supported Kubernetes version, the Helm charts take advantage of the kubeVersion field. Some Kubernetes distributions use labels in kubectl version (such as v1.21.3-mirantis-1 and v1.20.7-eks-1-20-7) that require manual editing.
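For example, a Chart.yaml constraint like the one below will reject v1.21.3-mirantis-1 out of the box, because SemVer treats the distro suffix as a pre-release. Appending a -0 pre-release marker to the range is the usual Helm workaround; the exact range here is an illustration:

```yaml
# Chart.yaml excerpt (illustrative range)
kubeVersion: ">= 1.21.0-0"   # the -0 lets distro builds like v1.21.3-mirantis-1 match
```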
Drivers for PowerFlex and Unity implement Volume Health Monitoring.
This feature is currently in alpha in Kubernetes (in Q1-2022), and is disabled with a default installation.
Once enabled, the drivers will expose the standard storage metrics, such as capacity usage and inode usage, through the Kubernetes /metrics endpoint. The metrics will flow natively into popular dashboards like the ones built into OpenShift Monitoring:
All Dell drivers and dependencies like gopowerstore, gobrick, and more are now on Github and will be fully open-sourced. The umbrella project is and remains https://github.com/dell/csm, from which you can open tickets and see the roadmap.
The Dell partnership with Google continues, and the latest CSI drivers for PowerScale and PowerStore support Anthos v1.9.
Both CSI PowerScale and PowerStore now allow setting the default permissions for the newly created volume. To do this, you can use POSIX octal notation or ACL.
For more details you can:
Author: Florian Coulombel
Tue, 15 Mar 2022 19:24:40 -0000
|Read Time: 0 minutes
Dell Technologies takes a comprehensive approach to cyber resiliency and is committed to helping customers achieve their security objectives and requirements. Storage Engineering Technologists Richard Pace, Justin Bastin, and Derek Barboza worked together, cross platform, to deliver three independent cyber security white papers for PowerMax, Mainframe, and PowerStore:
Each paper acts as a single point where customers can gain an understanding of the robust features and data services available to safeguard sensitive and mission-critical data in the event of a cybercrime. All three papers leverage CloudIQ and the CyberSecurity feature to give customers insight into anomaly detection.
The following figure shows a CloudIQ anomaly that indicates unusual behavior in a customer’s environment:
Backed by CyberSecurity in CloudIQ, we can see how quickly CloudIQ detects the issue and provides the details for manual remediation.
Dell has an ingrained culture of security. We follow a 'shift-left' approach that ensures that security is baked into every process in the development life cycle. The Dell Secure Development Lifecycle (SDL) defines security controls based on industry standards that Dell product teams adopt while developing new features and functionality. Our SDL includes both analysis activities and prescriptive proactive controls around key risk areas.
Dell strives to help our customers minimize risk associated with security vulnerabilities in our products. Our goal is to provide customers with timely information, guidance, and mitigation options to address vulnerabilities. The Dell Product Security Incident Response Team (Dell PSIRT) is chartered and responsible for coordinating the response and disclosure for all product vulnerabilities that are reported to Dell. Dell employs a rigorous process to continually evaluate and improve our vulnerability response practices, and regularly benchmarks these against the rest of the industry.
Authors: Richard Pace, Justin F. Bastin
Thu, 14 Oct 2021 11:40:35 -0000
|Read Time: 0 minutes
The quarterly update for Dell CSI Driver is here! But today marks a significant milestone because we are also announcing the availability of Dell EMC Container Storage Modules (CSM). Here’s what we’re covering in this blog:
Dell Container Storage Modules is a set of modules that aims to extend Kubernetes storage features beyond what is available in the CSI specification.
The CSM modules will expose storage enterprise features directly within Kubernetes, so developers are empowered to leverage them for their deployment in a seamless way.
Most of these modules are released as sidecar containers that work with the CSI driver for the Dell storage array technology you use.
CSM modules are open source and freely available from https://github.com/dell/csm.
Many stateful apps can run on top of multiple volumes. For example, we can have a transactional DB like Postgres with a volume for its data and another for the redo log, or Cassandra that is distributed across nodes, each having a volume, and so on.
When you want to take a recoverable snapshot, it is vital to take them consistently at the exact same time.
Dell CSI Volume Group Snapshotter solves that problem for you. With the help of a CustomResourceDefinition, an additional sidecar to the Dell CSI drivers, and vanilla Kubernetes snapshots, you can manage the life cycle of crash-consistent snapshots. This means you can create a group of volumes for which the driver takes snapshots simultaneously, and restore or move them in one shot!
To take a crash-consistent snapshot, you can either use labels on your PersistentVolumeClaims, or be explicit and pass the list of PVCs that you want to snap. For example:
apiVersion: volumegroup.storage.dell.com/v1alpha2
kind: DellCsiVolumeGroupSnapshot
metadata:
  # Name must be 13 characters or less in length
  name: "vg-snaprun1"
spec:
  driverName: "csi-vxflexos.dellemc.com"
  memberReclaimPolicy: "Retain"
  volumesnapshotclass: "poweflex-snapclass"
  pvcLabel: "vgs-snap-label"
  # pvcList:
  #   - "pvcName1"
  #   - "pvcName2"
For the first release, CSI for PowerFlex supports Volume Group Snapshot.
The CSM Observability module is delivered as an open-telemetry agent that collects array-level metrics to scrape them for storage in a Prometheus DB.
The integration is as easy as creating a Prometheus ServiceMonitor for Prometheus. For example:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: otel-collector
  namespace: powerstore
spec:
  endpoints:
    - path: /metrics
      port: exporter-https
      scheme: https
      tlsConfig:
        insecureSkipVerify: true
  selector:
    matchLabels:
      app.kubernetes.io/instance: karavi-observability
      app.kubernetes.io/name: otel-collector
With the Observability module, you gain visibility into the capacity of the volumes you manage with Dell CSI drivers, as well as their performance in terms of bandwidth, IOPS, and response time.
Thanks to pre-canned Grafana dashboards, you can browse the history of these metrics and follow the topology from a Kubernetes PersistentVolume (PV) down to its backing LUN or file share on the array.
The Kubernetes admin can also collect array-level metrics to check overall capacity and performance directly from the familiar Prometheus/Grafana tools.
For the first release, Dell EMC PowerFlex and Dell EMC PowerStore support CSM Observability.
Each Dell storage array supports replication capabilities: asynchronous replication with an associated recovery point objective, synchronous replication between sites, or even active-active.
Each replication type serves a different purpose, depending on your use case and the constraints of your data centers.
The Dell CSM Replication module lets you create persistent volumes with any of the three replication types -- synchronous, asynchronous, and metro -- assuming the underlying storage array supports it.
The Kubernetes architecture can build on a stretched cluster between two sites, or on two or more independent clusters. The module itself is composed of three main components: a replication controller running in each cluster, a replication sidecar deployed alongside the Dell CSI driver, and the repctl command-line tool.
The usual workflow is to create a PVC that is replicated with a classic Kubernetes directive by just picking the right StorageClass. You can then use repctl or edit the DellCSIReplicationGroup CRD to launch operations like Failover, Failback, Reprotect, Suspend, Synchronize, and so on.
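As a sketch of that workflow, a replication-enabled StorageClass could look like the following. The `replication.storage.dell.com/*` parameter names follow the CSM Replication conventions, and the class name, remote cluster ID, and remote class name used here are hypothetical:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: powerstore-replicated
provisioner: csi-powerstore.dellemc.com
reclaimPolicy: Retain
parameters:
  # Enable replication for every volume created from this class
  replication.storage.dell.com/isReplicationEnabled: "true"
  # Hypothetical identifiers for the paired cluster and its StorageClass
  replication.storage.dell.com/remoteClusterID: "target-cluster"
  replication.storage.dell.com/remoteStorageClassName: "powerstore-replicated"
```

Any PVC that references this class is then replicated automatically, with no extra step for the application developer.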
For the first release, Dell EMC PowerMax and Dell EMC PowerStore support CSM Replication.
With CSM Authorization we are giving back more control of storage consumption to the storage administrator.
The authorization module is an independent service, installed and owned by the storage admin.
Within that module, the storage administrator will create access control policies and storage quotas to make sure that Kubernetes consumers are not overconsuming storage or trying to access data that doesn’t belong to them.
CSM Authorization makes multi-tenant architecture real by enforcing Role-Based Access Control on storage objects coming from multiple and independent Kubernetes clusters.
The authorization module acts as a proxy between the CSI driver and the backend array. Access is granted with an access token that can be revoked at any point in time. Quotas can be changed on the fly to limit or increase storage consumption from the different tenants.
For the first release, Dell EMC PowerMax and Dell EMC PowerFlex support CSM Authorization.
When dealing with StatefulApps, if a node goes down, vanilla Kubernetes is pretty conservative.
Indeed, from the Kubernetes control plane, the failing node is only seen as NotReady. That can be because the node is actually down, because of network partitioning between the control plane and the node, or simply because the kubelet is down. In the latter two scenarios, the StatefulApp is still running and possibly writing data to disk. Therefore, Kubernetes takes no action and lets the admin manually trigger a Pod deletion if desired.
The CSM Resiliency module (sometimes named PodMon) aims to improve that behavior with the help of collected metrics from the array.
Because the driver has access to the storage backend from pretty much all other nodes, we can see the volume status (mapped or not) and its activity (whether there are IOPS or not). So when a node goes into the NotReady state and we see no IOPS on the volume, Resiliency relocates the Pod to a new node and cleans up whatever leftover objects might exist.
The entire process happens in seconds between the moment a node is seen down and the rescheduling of the Pod.
To protect an app with the Resiliency module, you only have to add the podmon.dellemc.com/driver label to its Pods.
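For illustration, here is a minimal sketch of a StatefulSet whose Pods carry that label; the application name and the label value (which names the protecting CSI driver) are assumptions for the example:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-db
spec:
  serviceName: my-db
  replicas: 1
  selector:
    matchLabels:
      app: my-db
  template:
    metadata:
      labels:
        app: my-db
        # Opt this Pod into CSM Resiliency (PodMon) monitoring;
        # the value naming the CSI driver is assumed here
        podmon.dellemc.com/driver: csi-vxflexos.dellemc.com
    spec:
      containers:
        - name: db
          image: postgres:14
```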
For more details on the module’s design, you can check the documentation here.
For the first release, Dell EMC PowerFlex and Dell EMC Unity support CSM Resiliency.
Each module above is released either as an independent helm chart or as an option within the CSI Drivers.
For more complex deployments, which may involve multiple Kubernetes clusters or a mix of modules, it is possible to use the CSM Installer.
The CSM Installer, built on top of Carvel, gives the user a single command line to create their CSM-CSI applications and to manage them from outside the Kubernetes cluster.
For the first release, all drivers and modules support the CSM Installer.
For each driver, this release provides:
VMware Tanzu offers storage management by means of its CNS-CSI driver, but that driver doesn't support the ReadWriteMany access mode.
If your workload needs concurrent access to the file system, you can now rely on the CSI drivers for PowerStore, PowerScale, and Unity through the NFS protocol. All three platforms are officially supported and qualified on Tanzu.
The NFS drivers for PowerStore, PowerScale, and Unity have also been tested and work when the Kubernetes cluster is behind a private network.
By default, the CSI driver for PowerScale creates volumes with 0777 POSIX permissions on the directory.
Now, with the isiVolumePathPermissions parameter, you can use ACLs or any custom POSIX mode instead.
The isiVolumePathPermissions parameter can be configured in the ConfigMap that holds the PowerScale settings, or at the StorageClass level. The accepted values are private_read, private, public_read, public_read_write, and public for ACLs, or any POSIX mode bits (for example, 0755).
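As a sketch, a PowerScale StorageClass setting this parameter might look like the following; the class name and the chosen ACL value are examples, while isiVolumePathPermissions and its accepted values come from the release notes above:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: powerscale-private
provisioner: csi-isilon.dellemc.com
reclaimPolicy: Delete
parameters:
  # One of: private_read, private, public_read,
  # public_read_write, public, or a POSIX mode such as "0755"
  isiVolumePathPermissions: "private"
```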
For more details you can:
Author: Florian Coulombel