Blogs

Short topics related to data storage essentials.

VMware Visibility in CloudIQ

Michael Aharon and Derek Barboza

Mon, 19 Feb 2024 19:45:14 -0000


Introduction

In the last several years, there has been an increased desire for deeper visibility and insights into what is going on within customers’ data centers. Especially with wider adoption of AI/ML, demand for insight-driven outcomes has increased. Customers are looking to have a single pane of glass that has visibility into their infrastructure.

Benefits

One of the major benefits I see for customers who have invested in Dell's broad portfolio is that CloudIQ truly becomes that single pane of glass. It enables customers to integrate CloudIQ with external tools using Webhooks and the REST API and to create actionable processes. One example would be integration with ServiceNow. The other benefit is the breadth of insights based on AI/ML algorithms and our capability to make recommendations that are not only descriptive but increasingly prescriptive.

I could go on and on describing the benefits of CloudIQ, but in this blog I would like to focus on the CloudIQ Collector. Although customers are accustomed to using VMware vCenter to look up configuration and performance details specific to virtual machines and vVols, with AIOps-based tools like CloudIQ the goal is to bring this information together in a single management pane of glass. Customers using Dell primary storage solutions can leverage the CloudIQ Collector to bring visibility at the VMware virtual machine level into the CloudIQ portal. I can see this capability enabling customers to use CloudIQ for the following use cases:

  1. See the end-to-end map making up the virtual infrastructure. This information includes the ESXi cluster, the ESXi host, the switch, the storage array, the datastore, and the virtual machine.
  2. Simplify troubleshooting. You will see later in this blog that having visibility into the entire end-to-end map allows customers to view performance impacts (IOPS, bandwidth, latency) as well as ESXi host-specific performance metrics.
  3. Leverage CloudIQ integration through Webhooks and REST API with external tools for notifications, alerting, reporting, and so on (a rough sketch follows below).
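
As a rough illustration of the third use case, the sketch below polls CloudIQ's REST API for virtual machine inventory. The base URL, resource path, and token handling are assumptions made for illustration; the authoritative endpoints and authentication flow are documented at https://developer.dell.com/apis.

# Hypothetical sketch: poll CloudIQ's REST API for VM inventory.
# The endpoint path and auth flow are assumptions; verify them at
# https://developer.dell.com/apis before use.
TOKEN="<access-token-obtained-per-the-API-docs>"
curl -s -H "Authorization: Bearer ${TOKEN}" \
  "https://cloudiq.apis.dell.com/cloudiq/rest/v1/virtual_machines" | jq .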

CloudIQ Collector

The Dell CloudIQ Collector is a VMware Open Virtual Appliance (OVA) delivered in Open Virtualization Format (OVF) and installed as a virtual machine that collects data from VMware environments, Dell Connectrix switches, and Dell PowerSwitch devices. The Collector retrieves information from the target objects (vCenter or switches) and sends the collected data back to CloudIQ through a Secure Connect Gateway. For VMware, the Collector communicates with vCenter using the VMware API and requires a user with read-only privileges. For Connectrix and PowerSwitch devices, the Collector communicates with the individual switches using the REST API and a nonprivileged user. A single Collector can be used for VMware, Connectrix, and PowerSwitch.

The theme, again, is to provide our customers with overall visibility across different pieces of infrastructure. The CloudIQ Collector Overview white paper does a nice job of explaining how to implement the Collector, so here I will focus on the functionality and the data we present to our customers.

Once the CloudIQ Collector is installed and fully configured, VMware data appears in CloudIQ within 24 hours and is accessible in the following views in the CloudIQ portal:

  1. On the Virtualization page:
    1. Monitor > Virtualization
  2. On a system inventory page:
    1. Monitor > Systems (click an array, click the inventory tab, click the virtual machines tab)
  3. On a hosting storage object (Pool, LUN/Volume, or Storage Group) inventory page:
    1. Monitor > Systems (click an array, click the inventory tab, click the pools tab, click a pool, click the virtual machines tab)
    2. Monitor > Systems (click an array, click the inventory tab, click the storage groups tab, click a storage group, click the virtual machines tab)
    3. Monitor > Systems (click an array, click the inventory tab, click the storage tab, click a storage object (LUN/volume/file system), click the virtual machines tab)
  4. On a host or server properties page:
    1. Monitor > Systems (click an array, click the inventory tab, click the hosts or servers tab, click a host, click the virtual machines tab)
  5. Through global search

VMware visibility in management tools

Traditionally, customers with Dell's primary storage have had a certain level of visibility into their VMware environment, typically accomplished by linking VMware vCenter with our management tools for products like PowerMax, PowerStore, and Unity XT. To keep this blog concise, I will focus on PowerStore, but as mentioned above, other Dell primary storage products have visibility into the VMware environment from their respective element managers.

The Dell PowerStore management UI is called PowerStore Manager. Integrating PowerStore Manager with VMware vCenter is straightforward. If integration is successful, you will see the status turn green and show OK.

 

Figure 1. Registered vCenter in PowerStore Manager

This integration with vCenter will populate the Virtual Machine tab in PowerStore Manager.

Figure 2. Virtual Machines page in PowerStore Manager

As you can see, we support vVol-, VMFS-, and NFS-based virtual machines. You can also expand the view by adding columns: click "Show/Hide Table Columns" on the right side of the screen.

The virtual machine name column allows users to click each virtual machine and see additional details.

Figure 3. Virtual machine details

The above image demonstrates a detailed view of a vVol virtual machine. You can navigate through multiple tabs that show additional and deeper details, such as performance and storage-related metrics, data protection policies applied, and so on.

The other integration point you can explore is the datastore a virtual machine resides in. This comes in handy when customers need to troubleshoot a specific issue or simply map out the components. A PowerStore administrator can trace a virtual machine directly to its storage container, VMFS block LUN, or NFS-based datastore without leaving the virtual machines view of PowerStore Manager.

Figure 4. Storage container details

In the above image, I selected a Storage Container that holds one of the vVols. Once again, you see a consistent view, with multiple tabs allowing you to easily navigate and look up additional details.

VMFS- and NFS-based virtual machines follow the same logic. We collect and present slightly fewer details than for vVol-based virtual machines, but this is where the CloudIQ Collector supplements the view.

Figure 5. VMFS virtual machine performance chart

VMware visibility in CloudIQ

I have been guiding all my customers to embrace CloudIQ for the past several years. Although CloudIQ is provided as a Software-as-a-Service application, the CloudIQ Collector is one element that must be installed inside the customer's data center to monitor VMware, Connectrix switches, and PowerSwitch devices.

Logging in to CloudIQ is based on the customer's accounts registered with a Dell support contract. In addition, customers can leverage the Role-Based Access Control (RBAC) implemented within the CloudIQ portal.

Once logged in, customers can explore the categories shown on the left side of the CloudIQ portal. The views we focus on in this blog are under the 'Monitor' category.

Figure 6. Virtualization View in CloudIQ

The Virtualization view enables you to view and manage components such as vCenters, data centers, and clusters using the tree view and the table view. It also displays information about each VMware vCenter server in the system. For customers who use Dell HCI solutions like VxRail, Dell primary storage products like PowerStore, or simply a VMware ESXi environment managed by vCenter, this page consolidates all of these environments.

Across the top, customers can see a quick snapshot of the overall status of the environment.

Figure 7. Summary banner

The navigation panel on the left shows you all vCenters with their respective clusters and data centers. Customers can browse through the list and select a particular cluster. As the image below shows, you can start zooming in on each virtual machine listed under the VMs tab. The areas I highlighted below are hyperlinks and allow customers to get additional details for each virtual machine.

Figure 8. Virtual Machines tab

Clicking the Backup_VM1 virtual machine leads me to the VM details page.

Figure 9. Virtual machine details page

This is where it starts to get interesting. For example, customers can see our AI/ML algorithms in action in the form of anomaly detection. CloudIQ collects telemetry data and compares metrics against historical seasonality. We can identify issues, like increased latency, by comparing data against what we saw in the past for the same period.
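
CloudIQ's exact models are not published, but the idea can be sketched as a seasonal-baseline comparison (a generic formulation, not Dell's specific algorithm): a metric sample is flagged when it leaves the range that history predicts for that point in the daily or weekly cycle.

flag x(t) as anomalous if | x(t) - mu_season(t) | > k * sigma_season(t)

Here x(t) is the observed metric (latency, for example), mu_season(t) and sigma_season(t) are the mean and standard deviation observed at the same point in past cycles, and k sets the sensitivity.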

Figure 10. Performance anomaly detection

Toward the bottom of the view, you can see a section called "Configuration Changes." We display hourly aggregated configuration changes that have been made to this virtual machine; by charting them along the time axis, you can potentially correlate a configuration change with a change in the performance profile.

Figure 11. Configuration change tracking

The right side of this view shows three tabs:

  • End to end map
  • Storage paths
  • Configuration changes

Figure 12. End to end map

End to End Map displays an interactive topology map showing the components, including inventory and basic performance information. Selecting the cluster, host, datastore, network, storage entity, or array displays more object details underneath the topology map.

Storage Paths provides information for the datastore storage paths including the associated host adapter worldwide name (WWN), fabric, and array adapter.

Figure 13. Storage paths

 Configuration Changes displays configuration changes for the last 24 hours for the virtual machine.

Figure 14. Configuration changes

If you use other solutions from the Dell Technologies portfolio, such as PowerEdge servers for your VMware ESXi clusters, there is yet another option/view you can explore. You can navigate between the VM details page and the PowerEdge details page to quickly see related information.

Figure 15. PowerEdge system details page

Custom Reports

To round off our discussion, customers also have reporting capabilities they can leverage.

Figure 16. Report browser

Customers can generate several types of reports:

  • Anomaly Chart
  • Line Chart
  • Table

If you would like to report on the inventory of Virtual Machines, a table would be sufficient.

Figure 17. Example of a custom table

When creating a table, a set of default columns is preselected. You can include additional columns from the available columns list or remove some of the preselected ones.

Figure 18. Customizing columns in a table

The second option is to generate a line chart, which shows historical performance data. As I demonstrate below, you can select 'VMware' as the product category and 'Virtual Machine' as the subcategory. This selection shows all the virtual machines available in the inventory. Select one or more virtual machines and go to the next screen. Filtering capabilities are available to display and select specific VMs.

Figure 19. Configuring a line chart

The next screen is where you select the metrics you want to include in your report.

Figure 20. Metric selection

By default, the resulting report shows data for the last 24 hours. Because CloudIQ keeps two years of historical data, you can define a larger window by clicking the drop-down menu.

Figure 21. Line chart example

As you can see above, you can correlate performance for virtual machines that might have dependencies. You can also click either virtual machine on the right side to dim its line in the graph so that it doesn't crowd the screen.

Once you are happy with the data on the screen, you can schedule the report and save it in PDF format.

Conclusion

As you can see, there is a plethora of information available to customers across Dell management software. In CloudIQ, there are many other views that can show additional details about virtual machines and volumes, for example when browsing a server or a datastore. I encourage you to connect with a Dell representative and schedule a full demo of this product.

Resources

Important Links:

https://www.dell.com/en-us/dt/solutions/cloudiq.htm

https://infohub.delltechnologies.com/t/cloudiq-a-detailed-review/

https://infohub.delltechnologies.com/t/dell-cloudiq-collector-an-overview/

https://developer.dell.com/apis

Authors: 

Michael Aharon, Advisory Solutions Consultant; 

Derek Barboza, Senior Principal Engineering Technologist

  • PowerEdge
  • edge
  • CloudIQ
  • sustainability

Talking CloudIQ: PowerEdge

Derek Barboza

Wed, 08 Nov 2023 16:32:28 -0000


Introduction

In my previous blogs, I focused on specific features in CloudIQ. This blog covers various CloudIQ features for Dell PowerEdge servers. Dell CloudIQ continues to expand its feature set for PowerEdge assets. CloudIQ integrates with Dell OpenManage Enterprise at each of your sites to efficiently collect and aggregate telemetry data, giving you a multisite, enterprise-wide view of all your PowerEdge servers and chassis. And with OpenManage Enterprise 4.0, onboarding your PowerEdge servers to CloudIQ is easier than ever!

Health, inventory, and performance

Since the introduction of PowerEdge support in CloudIQ, health, inventory, and performance monitoring for PowerEdge servers have all been available. CloudIQ provides an overall health score for each PowerEdge server and recommends remediation when an issue is identified. Inventory reporting provides numerous properties about each server, including contract status, component firmware versions, licensing information, and hardware listings, to name a few. CloudIQ displays key performance metrics and not only shows historical trends but also identifies performance anomalies and provides performance forecasting. This information allows you to see unexpected performance patterns and plan future resource needs based on trending workloads.

Figure 1.  Example of a performance forecasting chart for PowerEdge

Cybersecurity

Cybersecurity is a feature in CloudIQ that allows you to compare your existing security configuration settings to a predefined set of desired security configuration settings. The configuration is continuously monitored, notifying you when a configuration does not meet its desired setting. Cybersecurity monitors up to 31 server configuration settings and 18 chassis configuration settings tied to NIST security standards. Without automated continuous checking, it's impractical to manually check all settings on all servers every day. Lab tests show that it takes six minutes on average to manually check just 15 settings on a single server.

Users can also see a list of applicable Dell Security Advisories (DSAs) for their PowerEdge systems. By intelligently matching attributes like models and code versions, users can quickly see which DSAs are applicable to their systems, allowing them to take immediate action to remediate these security vulnerabilities.

Figure 2.  The Security Assessment page for a PowerEdge chassis

System Management

You can now initiate BIOS and firmware updates for PowerEdge servers and chassis from CloudIQ. Users with a Server Admin role in CloudIQ can initiate these upgrades across multiple systems with just a few clicks. This feature simplifies the process of keeping your fleet of servers consistent and secure.

Figure 3.  Multisystem update for PowerEdge servers and chassis

Virtualization View

The integration of PowerEdge into the Virtualization View consolidates and simplifies resource information about PowerEdge servers running ESXi. Available details include the OS version, model, resource consumption per virtual machine, and health issues with recommendations for remediation. A hyperlink lets you quickly navigate to the system details page for the PowerEdge server for more troubleshooting. Another hyperlink directs you to vCenter to perform virtualized resource administration.

Figure 4.  PowerEdge support in the Virtualization View

Carbon footprint monitoring

CloudIQ has introduced carbon footprint analysis support for PowerEdge servers and chassis. CloudIQ takes power and energy metrics and calculates carbon emissions based on international standards and conversion factors for location. CloudIQ Administrators can override and customize these values with their own unique location emission factors.

Figure 5.  Energy, power, and carbon emissions for a PowerEdge server

Custom reports and IT integrations

You can generate custom reports using both tables and charts for PowerEdge servers:

  • Tables are available to provide lists of assets, code versions, contract information, capacity metrics, and average performance metrics.
  • Charts can be used to see historical performance trends and performance anomalies.

You can also take advantage of custom tags in your reports. For example, you can create a list of PowerEdge servers in a certain business unit with their BIOS and firmware versions, contract expiration dates, average power consumption, and service tags. And with Webhooks and REST API access, you can integrate data and events from CloudIQ with ServiceNow, Slack, and other IT tools to help you monitor your entire Dell IT infrastructure.
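
To make the integration side concrete, here is a minimal sketch of the kind of relay a Webhook consumer might perform, posting an alert summary to a Slack incoming webhook. The webhook URL and alert text are placeholders; a real relay would parse the CloudIQ payload first.

# Minimal sketch: post an alert summary to a Slack incoming webhook.
# The URL and message are placeholders, not CloudIQ-specific values.
SLACK_URL="https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX"
curl -X POST -H 'Content-type: application/json' \
  --data '{"text":"CloudIQ alert: health score dropped on array Sales-PS-01"}' \
  "${SLACK_URL}"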

Figure 6.  Custom reporting table for PowerEdge with custom tags

Conclusion

As IT resources become more remote and isolated, it has become increasingly time consuming to maintain, manage, and secure resources in the data center and at the edge. CloudIQ simplifies monitoring and management by providing a single portal to view all your PowerEdge servers across your entire environment. With cybersecurity monitoring of PowerEdge servers and chassis, you can quickly see where security configuration settings may be incorrectly set or accidentally changed, opening those systems to cyberattacks, and receive instructions to remediate. With the new maintenance and management features, CloudIQ simplifies the process of keeping your entire fleet at consistent, secure, and desired BIOS and firmware versions. The carbon footprint page in CloudIQ helps you meet sustainability goals. And with Webhook and REST API support, CloudIQ can be integrated with other IT tools to help you monitor not only your PowerEdge servers, but your entire Dell IT portfolio.

Resources

This Knowledge Base Article discusses how to onboard PowerEdge devices to CloudIQ.

For a quick demo about CloudIQ PowerEdge support, see the CloudIQ videos section on the Info Hub.

Direct from Development Tech Note: Dell CloudIQ Cybersecurity for PowerEdge: The Benefits of Automation

See other informative blogs: Overview of CloudIQ, Proactive Health Scores, Capacity Monitoring and Planning, Cybersecurity, and Custom Reports and Tags.

How do you become more familiar with Dell Technologies and CloudIQ? The Dell Technologies Info Hub site provides expertise that helps to ensure customer success with Dell Technologies platforms. We have CloudIQ demos, white papers, and videos available at the Dell Technologies CloudIQ page. Also, feel free to reference the white paper CloudIQ: A Detailed Overview which provides an in-depth summary of CloudIQ.

Author: Derek Barboza, Senior Principal Engineering Technologist

  • CloudIQ
  • Sustainability
  • Carbon footprint
  • Energy Usage

CloudIQ - Carbon Footprint Analysis

Derek Barboza and Michael Aharon

Wed, 04 Oct 2023 16:03:26 -0000


In this blog post, we'll cover a topic that is top of mind for all organizations, small and large: energy efficiency. I'll also highlight how Dell Technologies helps customers increase energy efficiency using our vast portfolio. First, let's define what energy efficiency is.

“Simply put, energy efficiency means using less energy to get the same job done 

– and in the process, cutting energy bills and reducing pollution.”

Reference: Energy Efficiency | ENERGY STAR

As organizations undergo digital transformation and modernization, there is a massive explosion in the amount of data that needs to be stored. This data expansion is driven by technologies like cloud computing, artificial intelligence, and streaming services, to name a few. This in turn affects how much power organizations consume in their data centers, which forces IT vendors to make their solutions more efficient and to reduce emissions and carbon footprint.

Dell Technologies has been helping customers harness the power of technology to drive human progress for several decades. Our latest Environmental, Social and Governance report focuses on the investments Dell has made to support these initiatives.

 

If you’re interested in delving deeper, check out Dell's FY23 Environmental, Social and Governance report.

Energy concerns were of paramount importance for our customers in 2022, not only in response to rising energy costs but also as they worked toward reducing emissions. As a leader in sustainable technology, Dell partnered with customers to make the transition to more energy efficient data centers with advanced cooling and thermals, power management tools, and as-a-Service (aaS) solutions to “right size” data storage. With the cost of energy commodities expected to be on average 46% higher in 2023, we will continue to set the standard on data center infrastructure solutions to drive efficient operational and environmental outcomes for our customers.

Dell reinvests over $4B in R&D on an annual basis, continuing to lead the market with our innovation in storage and data reduction efficiencies to save energy and reduce our carbon and hardware footprint.

Dell’s commitment to reducing carbon footprint is exemplified by the introduction of innovative ideas to optimize our portfolio. Recognized as one of the winners of Fast Company’s 2023 World Changing Ideas Awards, Dell’s Concept Luna was designed to showcase a future in which electronic devices are repaired instead of thrown out. Feel free to review the full article, How Dell is infusing sustainability across its businesses, to learn more.

Based on what we have covered so far, we truly believe it is essential to inform our customers of the critical data points that contribute to overall awareness of power, energy consumption, and carbon footprint.

CloudIQ Carbon Footprint: integrating energy efficiency across systems 

Several years ago, Dell Technologies developed a product called CloudIQ, the cloud-based AIOps proactive monitoring and predictive analytics application for Dell systems. CloudIQ leverages machine learning and other algorithms, notifications, and recommendations to help customers optimize compute, storage, data protection, and network health, performance, and capacity. CloudIQ supports a broad range of Dell Technologies products, including:

  • Servers: PowerEdge
  • Storage: PowerStore, PowerMax, PowerScale, PowerVault, Unity, Unity XT, XtremIO, and SC Series
  • Data protection: PowerProtect DD and PowerProtect Data Manager
  • Converged and hyperconverged infrastructure: VxBlock, VxRail, and PowerFlex
  • Networking: PowerSwitch and Connectrix, plus Dell Technologies APEX Data Storage Services

Over 90% of our customers actively use CloudIQ as their centralized dashboard to inform them proactively about KPIs across their Dell Technologies estate.

Introducing Carbon Footprint, an additional capability within CloudIQ designed to provide insights into power, energy consumption, and carbon footprint forecasting across all systems. At the time of the initial release, we support the following products from our portfolio:

  • PowerEdge
  • VxRail
  • Unity
  • PowerScale
  • Connectrix Systems

and focusing on the following KPIs:

  • Total carbon emissions for this year (YTD)
  • Energy consumption trends (monthly and YTD)
  • 24-hour power consumption and average 24-hour load
  • Historical and forecast data for energy and carbon footprint

Later in 2023, we will also add support for PowerSwitch.

Having Carbon Footprint enabled and KPIs exposed within CloudIQ is beneficial to internal stakeholders within an organization and allows you to make confident decisions when optimizing your environment.

Based on the Software-as-a-Service (SaaS) model and agile development methodology employed by CloudIQ, you’ll benefit from having access to new features as soon as they become available.

Most Dell Technologies products supported by CloudIQ leverage our call home functionality called SupportAssist / Secure Connect Gateway. Depending on the product, you will need to enable the CloudIQ feature, after which the CloudIQ dashboard will populate with data. 

For the full overview of the CloudIQ product, please see the detailed review whitepaper here.

Accessing and using Carbon Footprint

To access the Carbon Footprint feature in the CloudIQ dashboard, select Monitor > Carbon Footprint on the left-hand side of the CloudIQ console, as shown in the following figure.

On this screen, CloudIQ users with the CloudIQ Admin role can adjust and personalize their geographical location metrics, such as CO2e and PUE, as illustrated in the following figure. The location labels reflect the specific locations where the physical assets are installed.

Side note: What do these metrics mean?

  • Carbon dioxide equivalent (CO2e) refers to the number of metric tons of CO2 emissions with the same global warming potential as one metric ton of another greenhouse gas. Other greenhouse gases, like methane, have different global warming potentials (a measure of the potential impact a greenhouse gas has on global warming over a given period) compared to carbon dioxide. By converting all greenhouse gas emissions into CO2e units, it becomes easier to compare the impact of different types of emissions and to create strategies for reducing GHG emissions.
  • Power usage effectiveness (PUE) is used to determine the energy efficiency of a data center. The best PUE ratio is 1.0, indicating a perfectly efficient data center in which 100% of the facility's power is delivered to IT equipment. This means that no power is used for any other purpose in the facility such as cooling, lighting, or any other overhead that supports the equipment.
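
Putting the two metrics together, a plausible back-of-the-envelope calculation looks like this (CloudIQ's exact methodology and conversion factors may differ):

facility energy (kWh) = IT energy (kWh) x PUE
CO2e (kg) = facility energy (kWh) x emission factor (kg CO2e per kWh)

For example, 10,000 kWh of IT load at a PUE of 1.5 and an emission factor of 0.4 kg CO2e per kWh yields 10,000 x 1.5 x 0.4 = 6,000 kg CO2e.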

The Total Carbon Emissions CO2e section can be displayed using either a Bar Chart or a Line Chart. Simply select the gear wheel on the right-hand side and pick your preferred view.

The Total Carbon Emissions CO2e chart can increase or decrease based on how the system’s energy / emission factor / PUE changes over time. If new systems are added, the total will increase. Similarly, the total can decrease if power is capped (as is available for PowerEdge), workloads are reconciled, and/or some systems are shut down.

For larger environments with multiple assets, applying filters is a breeze. The following example shows the list filtered to Unity arrays only.

This table displays several columns that represent the asset itself, its location, site name, etc. In addition, we show the following data points:

  • YTD Energy (kWh) - YTD value measured from when power consumption data collection started, which may not have been the start of the calendar year
  • Energy Forecast (kWh) - forecasted energy consumption through the end of the year (December 31) in kilowatt-hours (kWh)
  • YTD CO2e (kg) - YTD value of carbon emissions measured from when data collection started, which may not have been the start of the calendar year
  • CO2e Forecast (kg) - forecasted CO2 (carbon dioxide) equivalent produced through the end of the year (December 31) in kilograms (kg)

The entire table with all assets or a subset thereof can be exported into a CSV file.

To see more details for each asset and how it performs in comparison to historical data, select the details icon next to the asset. As displayed in the following figure, the two graphs display data points over the last seven days and forecasted data points for the next thirty days. By toggling the radio button, you can switch from one view to the other. The grey area shows a range based on historical data collected for the previous seven days, and the blue line is charted based on the last seven days. If the blue line stays within the boundaries of the grey area, there have been no unforeseen changes in energy consumption or CO2e in the past seven days.

What we’ve shown here is just the beginning. We will continue improving and enhancing CloudIQ capabilities to ensure you enjoy the most relevant and accurate KPIs and can act upon them expeditiously.

Where do we go from here?

Dell has several other tools and assessments that you can benefit from. For example, we can run a tool called Live Optics that collects configuration and performance data from your Dell and third-party products and produces a report that can be analyzed to propose options like optimization, consolidation, or a refresh with new technology. The other option is to request a #GetEfficient report, which focuses directly on reducing physical footprint and power consumption.


Authors: Michael Aharon and Derek Barboza



  • CloudIQ
  • automation
  • Custom Reporting

Talking CloudIQ: Custom Reports with Custom Tags

Derek Barboza

Tue, 13 Jun 2023 16:29:55 -0000


Introduction

Every organization must report on their IT infrastructure. Whether it be to provide an inventory of assets or determine resource utilization, CloudIQ custom reporting with custom tags helps automate this task, saving time and delivering these reports right to your inbox.

Custom tags

Custom tags are customer-specific metadata that you can enter into CloudIQ to identify resources with customer information, such as application name, service level, business unit, department, and so on. You can enter custom tags against the system or against components of a system. Examples of component tags include hosts, PowerMax storage groups, volumes, file systems, storage pools, and virtual machines. We can quickly see the benefit of applying an application name to a storage group, or a business unit to a virtual machine. By doing so, we can generate application-level reports or asset reports by department.

Figure 1.  Custom tags in the Storage Inventory View

Custom reports

Custom reports in CloudIQ can contain tables, charts, or a combination of both. Charts can be either common line charts or anomaly charts. Anomaly charts allow users to see unexpected activity in performance by charting the metric along with its expected range, as determined by CloudIQ’s machine learning algorithms.

Tables

Tables are available to provide lists of assets, code versions, contract information, capacity metrics, and average performance metrics. You can also take advantage of custom tags to either be included in the report or to be used as a filter to capture only those assets that meet your business needs, based on the values of those custom tags. For example, you can create a list of PowerEdge servers in a certain business unit with their BIOS and firmware versions, contract expiration dates, average power consumption, and service tags.

Figure 2.  Table showing a business unit’s custom tag

Line and anomaly charts

Perhaps you want to keep an eye on the performance profile of a critical storage system, tracking system bandwidth and IOPS looking for any unusual activity. With just a few clicks you can create the report to chart the metrics, along with the expected lower and upper bounds. A few additional clicks and you can schedule this report to be delivered to yourself or anyone else at the interval you choose. You can give this report a quick look to identify if there are any unusual spikes that could be from an unexpected workload or even from some type of malicious attack.

Figure 3.  Examples of performance anomaly charts

Conclusion

An IT infrastructure monitoring tool must be flexible and have automated ways to extract and report on assets, capacity, and performance in a way that is meaningful for your organization. By applying customer-specific metadata in the form of custom tags to assets in CloudIQ, you have the power to generate and automate the delivery of insightful, information-rich custom reports to IT infrastructure stakeholders. Extracting this powerful information and machine learning data from CloudIQ allows you to efficiently maintain existing infrastructure and plan for future resource needs.

Resources

For a quick demo on custom reports and other CloudIQ features, see the CloudIQ videos section on the Info Hub.

For other informative blogs, see: Overview of CloudIQ, Proactive Health Scores, Capacity Monitoring and Planning, and Cybersecurity.

How do you become more familiar with Dell Technologies and CloudIQ? The Dell Technologies Info Hub site provides expertise that helps to ensure customer success with Dell Technologies platforms. We also have CloudIQ demos, white papers, and videos available at the Dell Technologies CloudIQ page. And feel free to reference the CloudIQ Overview White Paper, which provides an in-depth summary of CloudIQ.

Author: Derek Barboza, Senior Principal Engineering Technologist

  • PowerEdge
  • SRM
  • iDRAC

SolutionPack for iDRAC PowerEdge

Dejan Stojanovic

Wed, 01 Mar 2023 17:16:08 -0000


Summary

Dell Storage Resource Manager (SRM) provides comprehensive monitoring, reporting, and analysis for heterogeneous block, file, object, and virtualized storage environments. It enables you to visualize application-to-storage dependencies and to monitor and analyze configurations and capacity growth. It has visibility into the environment’s physical and virtual relationships to ensure consistent service levels.

To enable storage administrators to monitor their physical and virtual compute environment, Dell provides SRM solution packs. These solution packs include SolutionPack for Physical Hosts, Microsoft Hyper-V, IBM LPAR, Brocade FC Switch and Cisco MDS/Nexus with passive host discovery options, VMware vSphere & vSAN, and Dell VxRail.

With the new SolutionPack for iDRAC PowerEdge, we can monitor the status of server hardware components such as power supplies, temperature probes, cooling fans, and batteries. We can also gather historical information about electrical energy usage and other key performance indicators that measure the proper functioning of a server device.

SRM cross-domain functionality

To illustrate SRM’s cross-domain functionality, we examine the most common use case, where Dell PowerEdge physical servers are deployed as part of VMware hypervisor clusters.

SolutionPack for VMware vSphere & vSAN provides capacity, performance, and relationship data for all VMware discovered components, such as VMs, hypervisors, clusters, and datastores, as well as their relationship with fabric and backend storage arrays. Here is one example of the end-to-end topology of the virtualized environment:

Figure 1. Example of end-to-end topology of a virtualized environment

To gain physical access to the PowerEdge servers and their hardware components, we rely on the integrated Dell Remote Access Controller (iDRAC), a baseboard management controller built into PowerEdge servers.

iDRAC exposes hardware component data through several APIs, one of them being SNMP. With the SRM SNMP collector, which is part of the SolutionPack for iDRAC PowerEdge, we discover iDRACs and pull PowerEdge server data from them. This data includes electrical energy usage (Wh), probe temperature (C), power supply output (W), and cooling device speed (RPM). It also includes the status of power supplies, battery, cooling devices, and temperature probes, as well as server availability. SRM provides historical reports for all the metrics, with a maximum 7-year data retention for weekly aggregates.
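
As a quick sanity check outside of SRM, you can confirm that an iDRAC answers SNMP before discovery. A sketch, assuming SNMP v2c and the commonly documented IDRAC-MIB-SMIv2 subtree (verify the OID, credentials, and SNMP version for your environment):

# Walk the Dell iDRAC MIB subtree to confirm SNMP reachability.
# The IP and community string are placeholders; 674 is Dell's
# registered enterprise OID.
snmpwalk -v2c -c public 192.0.2.10 1.3.6.1.4.1.674.10892.5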

With the data available from the iDRAC PowerEdge, VMware vSphere & vSAN, and relevant fabric and storage array solution packs, users can seamlessly navigate from the context of physical server hardware component reports to the context of the physical server reports within the broader SAN environment.

SolutionPack for iDRAC PowerEdge data collection and alerts

Let’s examine the component status data, performance data, and alerts provided by the SolutionPack for iDRAC PowerEdge.

Status and performance data

Initial Card View and Table View

The Summary page Card View and Table View for PowerEdge servers show hardware component status (temperature probes, cooling devices, battery, power supply), server availability, daily electrical energy usage (kWh), energy cost ($), and daily carbon emissions (kgCO2e). Energy cost and carbon footprint metrics are calculated based on server location. In the following example, we see a significant difference in daily carbon emissions between Poland and Germany, even though the difference in daily energy usage is small. The same applies to energy cost.

Figure 2. Card view of hardware component status

 

Figure 3. Table view of hardware component status (first 10 columns)

 

Figure 4. Table view of hardware component status (final columns—continuation of preceding figure)

Energy cost and carbon emissions per country are calculated dynamically based on data enrichment enabled on SRM collectors. Metrics collected from each iDRAC are automatically tagged with location, carbon intensity, and energy cost properties. Here is an example of data enrichment configuration from the SRM admin UI:

 Figure 5. SRM admin UI showing data enrichment configuration 

CSV files that contain values for energy cost and carbon intensity per country are available publicly and can be transferred automatically through FTP to SRM collectors as part of the data enrichment process. Here is a CSV file excerpt that contains kWh cost ($) per country:

Figure 6. Excerpt of kwh-cost-per-country CSV file

And here is a CSV file excerpt that contains carbon intensity per kWh per country:

Figure 7. Excerpt of carbon-intensity-by-country CSV file

The CSV file for data enrichment with device-to-location mapping is specific to each customer.
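
As an illustration, such a mapping file might look like the following (the hostnames and locations are invented for the example):

device,location
idrac-fra-01,Germany
idrac-waw-01,Poland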

End-to-end topology map

From the initial Card View or Table View, you can drill down to the PowerEdge server end-to-end topology map. This is a host-based landing page where you can see the server’s relationship with the rest of the SAN components, as well as server attributes, performance, capacity, alerts, and inventory data. This is an example of SRM’s powerful capability to aggregate and visualize data from multiple contexts within the same reporting UI.

Figure 8. End-to-end topology map

iDRAC PowerEdge Inventory report

The iDRAC PowerEdge Inventory report shows servers’ hardware component names, quantities, server hostname, serial number, operating system version, model, and IP address:

Figure 9. Inventory report (first six columns)

Figure 10. Inventory report (final columns—continuation of preceding figure) 

Drilling down from the preceding table leads to the daily status dashboard of a selected server’s hardware components. Here are a few examples:


Figure 11. Status of cooling devices

Figure 12. Power supply output watts

  Figure 13. Energy usage (Wh) 

iDRAC PowerEdge Performance report

The iDRAC PowerEdge Performance report shows key metric values for server hardware components, such as probe temperature (C), temperature lower and upper thresholds, cooling device speed (RPM), and cooling device critical and non-critical thresholds. Selecting a row interactively plots historical performance data on the charts below the table, including server electrical energy usage (Wh), probe temperature (C), and cooling device speed (RPM).

Figure 14. Trend chart—Electrical energy usage (Wh)

 

Figure 15. Trend chart—Probes temperature (C) values plotted alongside threshold values 

The following trend chart shows cooling device (RPM) values plotted on the same chart with the active alerts relevant to the cooling device. The alert is displayed as a black dot with pop-up details of the issue that caused the alert. This feature greatly improves troubleshooting and is another example of SRM’s powerful capability to aggregate and visualize data from multiple contexts within the same reporting UI.

Figure 16. Trend chart—Cooling device (RPM) values plotted on the same chart with the active alerts relevant to the cooling device

The following bar charts show Carbon Emission, Energy Cost ($), Cooling (RPM), Energy Usage (kWh), and Temperature (C) per location during the last month. You can drill down on each bar chart to see reports for each location to analyze the top 10 contributing items per device type (hypervisor, host) and per server.

Figure 17. Carbon Emission and Energy Cost bar charts

 

Figure 18. Energy Usage and Temperature bar charts

Alerts

The iDRAC PowerEdge Operations report shows currently active alerts received from iDRAC as SNMP traps. The solution pack contains 80 certified alert definitions that cover iDRAC System Health and Storage category alerts, including AmperageProbe, Battery, Cable, CMC, Fan, FC, LinkStatus, MemoryDevice, Network, OS, PhysicalDisk, PowerSupply, PowerUsage, TemperatureProbe, TemperatureStatistics, VoltageProbe, LiquidCoolingLeak, and others.

You can enable any or all alerts on each iDRAC under Configuration > System Settings > Alert Configuration > Alerts. You can configure SNMP trap receivers under Configuration > System Settings > Alert Configuration > SNMP Traps. In this case, the SNMP trap receiver is the SRM collector server.

Figure 19. Active alerts on iDRAC PowerEdge Operations report

By right-clicking an alert row, you can acknowledge, assign, close, take ownership of, or assign a ticket ID to the alert.

Figure 20. Acting on an alert

By clicking an alert row, you can see a detailed report about the alert. The SRM alerting module also includes functionality to forward selected alerts to external applications, such as ServiceNow ITSM through a Webhook API, or fault management applications through an SNMP trap or email.

You can navigate directly from the alerts report to the affected server’s landing page by clicking the device name link in the Device column of the All Alerts report. SRM relates alert-specific data with the time-series data originated from the same device and seamlessly navigates through corresponding reports. The following figure shows an affected server’s summary report with the topology and underlying Operations section showing the server’s active alerts.

Figure 21. Server summary report with topology and active alerts

Conclusion

SRM’s powerful framework allows storage administrators to easily integrate environmental data for PowerEdge physical servers into the existing end-to-end SAN inventory, performance, capacity, and alert reports. SRM reduces the time that is required to identify the cause of issues occurring in the data center.

With the new SolutionPack for iDRAC PowerEdge, administrators can monitor PowerEdge hardware components and obtain historical information about energy usage and other key performance indicators.

Supported platforms

The iDRAC PowerEdge Solution Pack supports:

  • Dell iDRAC MIB v4.3
  • Dell PowerEdge models listed at https://www.dell.com/support/home/en-us/drivers/driversdetails?driverid=96cdj#SupportedOs


Author: Dejan Stojanovic

  • CloudIQ
  • cybersecurity

Talking CloudIQ: Cybersecurity

Derek Barboza

Mon, 20 Feb 2023 21:08:34 -0000


Introduction

This is the fourth in a series of blogs discussing CloudIQ. Previous blogs provide an overview of CloudIQ and discuss proactive health scores and capacity monitoring and planning. This blog discusses the cybersecurity feature in CloudIQ. Cyberattacks have become a significant issue for all companies across all industries. The immediate economic consequences, combined with the longer-term damage to organizational reputation, can have lasting effects.

Reduce risk

Misconfiguration of infrastructure systems can open your organization to cyber intrusion and is a leading threat to data security. The CloudIQ cybersecurity feature proactively monitors infrastructure security configurations for Dell PowerStore and PowerMax storage systems and PowerEdge servers, and it notifies users of security risks. A risk level is assigned to each system, placing the system into one of four categories depending on the number and severity of the issues: Normal, Low, Medium, or High.

Figure 1.  Cybersecurity system risk levels

When a security risk is found, remediation instructions are provided to help you address the issue as quickly as possible.

Figure 2.  Cybersecurity details with remediation

Security Advisory integration

CloudIQ evaluates outgoing Dell Security Advisories (DSAs) and intelligently notifies users when those advisories are applicable to their specific Dell system models running specific software and firmware versions. This eliminates the need for users to investigate whether a Security Advisory applies to their systems and allows them to focus immediately on remediation.

Figure 3.  Dell Security Advisory listing

Manage policy

By using CloudIQ Cybersecurity policy templates, users can quickly set up security configuration evaluation tests and assign them to large numbers of systems with just a few clicks. Once assigned, the test plan is evaluated against each associated system, and the system administrator is notified in minutes of any unwanted configuration settings.

Testing has shown that it takes less than 3 minutes to set policies and automate security configuration checking for 1 to 1,000 systems. That’s a dramatic time savings versus the 6 minutes that it would take to manually check each individual system’s security configuration.1 At 6 minutes per system, manually checking 1,000 systems would take roughly 100 hours.

Figure 4.  Evaluation plan templates

Conclusion

Cybersecurity has clearly become a challenge and priority for companies of all sizes. With the large and growing number of systems distributed across core and edge locations, it is impractical for any IT organization to manually check those systems for misconfigurations. Dell CloudIQ eliminates manual checking by automating it and recommending how to quickly mitigate misconfiguration risks that can lead to unwanted intrusions threatening data security. With the intelligent evaluation of Dell Security Advisories, CloudIQ identifies applicable DSAs, further saving time and expediting remediation.

Resources

For additional cybersecurity-related information, see the following resources:

How do you become more familiar with Dell Technologies and CloudIQ? The Dell Technologies Info Hub provides expertise that helps to ensure customer success with Dell platforms. We have CloudIQ demos, white papers, and videos available at the Dell Technologies CloudIQ page. Also, you can refer to the CloudIQ: A Detailed Review white paper, which provides an in-depth summary of CloudIQ.


Author: Derek Barboza, Senior Principal Engineering Technologist


1Dell CloudIQ Cybersecurity for PowerEdge: The Benefits of Automation

 


  • Red Hat
  • containers
  • Kubernetes
  • OpenShift Container Platform
  • CSM

Dell Container Storage Modules 1.5 Release

Florian Coulombel

Thu, 12 Jan 2023 19:27:23 -0000


Made available on December 20th, 2022, the 1.5 release of our flagship cloud-native storage management products, Dell CSI Drivers and Dell Container Storage Modules (CSM), is here!


See the official changelog in the CHANGELOG directory of the CSM repository.

First, this release extends support for Red Hat OpenShift 4.11 and Kubernetes 1.25 to every CSI Driver and Container Storage Module.

Avid customers may recall a few new additions to the portfolio that were featured in the previous CSM release (1.4) and made available in tech preview. Primarily:

  • CSM Application Mobility: Enables the movement of Kubernetes resources and data from one cluster to another no matter the source and destination (on-prem, co-location, cloud) and any type of backend storage (Dell or non-Dell)
  • CSM Secure: Allows for on-the-fly encryption of PV data
  • CSM Operator: Manages CSI and CSM as a single stack

Building on these three new modules, Dell Technologies is adding deeper capabilities and major improvements as part of today’s 1.5 release for CSM, including:

  • CSM Application Mobility: Users can now schedule backups
  • CSM Secure: Users can now “rekey” an encrypted PV
  • CSM Operator: Support added for Dell’s PowerFlex CSI Driver, the Authorization Proxy Server, and the CSM Observability module for Dell PowerFlex and Dell PowerScale

For the platform updates included in today’s 1.5 release, the major new features are:

  • It is now possible to set the Quality of Service of a Dell PowerFlex persistent volume. Two new parameters can be set in the StorageClass (bandwidthLimitInKbps and iopsLimit) to limit the consumption of a volume; a StorageClass sketch appears after this section. Watch this short video to learn how it works.

  • For Dell PowerScale, when a Kubernetes node is decommissioned from the cluster, the NFS export created by the driver will “Ignore the Unresolvable Hosts” and clean them up later.
  • Last but not least, when you have a Kubernetes cluster that runs on top of virtual machines backed by VMware, the CSI driver can mount Fibre Channel-attached LUNs.

This feature is named “Auto RDM over FC” in the CSI/CSM documentation.

The concept is that the CSI driver connects to both the Unisphere and vSphere APIs to create the respective objects.

 

When deployed with “Auto-RDM,” the driver can only function in that mode. It is not possible to combine iSCSI and FC access within the same driver installation.

The same limitation applies to RDM usage in general. You can learn more about it at RDM Considerations and Limitations on the VMware website.
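
Returning to the QoS feature mentioned at the top of this list, here is a minimal sketch of a StorageClass using the two new parameters. The provisioner name and limit values are assumptions, and required PowerFlex-specific parameters (storage pool, system ID, and so on) are omitted; see the CSI PowerFlex documentation for the full StorageClass definition.

# Sketch only: a StorageClass carrying the new PowerFlex QoS parameters.
# Other required parameters (storage pool, system ID) are omitted.
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vxflexos-qos
provisioner: csi-vxflexos.dellemc.com
parameters:
  bandwidthLimitInKbps: "10240"
  iopsLimit: "100"
reclaimPolicy: Delete
EOF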

That’s all for CSM 1.5! Feel free to share feedback or send questions to the Dell team on Slack: https://dell-csm.slack.com.

Author: Florian Coulombel


  • Kubernetes
  • backup
  • PowerScale

Velero Backup to PowerScale S3 Bucket

Florian Coulombel

Fri, 23 Dec 2022 21:50:39 -0000


Velero is one of the most popular tools for backup and restore of Kubernetes resources.

You can use Velero for different backup options to protect your Kubernetes cluster. The three modes are:

  • Protect the Kubernetes resource objects such as Pod, Namespace, and so on, with CRDs included
  • Protect the PersistentVolume data with the help of VolumeSnapshot
  • Protect the content of the PVs with the help of restic

In all cases, Velero syncs the information (YAML and restic data) to an object store.

PowerScale is Dell Technologies’ leading scale-out NAS solution. It supports many different access protocols including NFS, SMB, HTTP, FTP, HDFS, and, in the case that interests us, S3!

Note: PowerScale is not 100% compatible with the AWS S3 protocol (for details, see the PowerScale OneFS S3 API Guide). 

For a simple backup solution of a few terabytes of Kubernetes data, PowerScale and Velero are a perfect duo.

Deployment

To deploy this solution, you need to configure PowerScale and then install and configure Velero.

PowerScale S3 configuration

Prepare PowerScale to be a target for the backup as follows:

  1. Make sure the S3 protocol is enabled.

You can check that in the UI under Protocols > Object Storage (S3) > Global Settings or in the CLI.

In the UI:

 

In the CLI:

PS1-1% isi s3 settings global view
         HTTP Port: 9020
        HTTPS Port: 9021
        HTTPS only: No
S3 Service Enabled: Yes

 

2.  Create a bucket with the permission to write objects (at a minimum).

That action can also be done from the UI or CLI.

In the UI:

In the CLI:

See isi s3 buckets create in the PowerScale OneFS CLI Command Reference.

3. Create a key for the user that will be used to upload the objects.

Important notes:

    • The username is the one indicated in the interface, not the one from the file system or provider (for example, here the admin user’s S3 username is 1_admin_accid)
    • The key is only displayed upon creation and cannot be retrieved later. Be sure to copy it right away.
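
For those who prefer the CLI end to end, here is a hedged sketch of steps 2 and 3; flag names vary between OneFS releases, so confirm them in the CLI Command Reference cited above.

# Sketch: create the bucket and an S3 key from the CLI.
# Verify the exact flags in the PowerScale OneFS CLI Command Reference.
isi s3 buckets create velero-backup /ifs/velero --owner admin --create-path
isi s3 keys create admin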

Now that PowerScale is ready, we can proceed with the Velero deployment.

Velero installation and configuration

We assume that the Velero binary is installed and has access to the Kubernetes cluster. If not, see the Velero installation document for the deployment instructions.

Configure Velero:

  1. Create a file with the credentials previously obtained from PowerScale.
    $ cat ~/credentials-velero
    [default]
    aws_access_key_id = 1_admin_accid
    aws_secret_access_key = 0**************************i
    …
  2. Optionally, obtain the PowerScale SSL certificate.
    In our case, the HTTPS endpoint uses a self-signed certificate, so we have to get it and pass it to Velero. Note that we can use HTTP protocol, and that step can be skipped at the cost of plain text data transit. For more information on the self-signed certificates in the context of Velero, see https://velero.io/docs/v1.9/self-signed-certificates/.

  3. Install Velero itself:
$ velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.5.1 \
    --bucket velero-backup \
    --secret-file ./credentials-velero \
    --use-volume-snapshots=false \
    --cacert ./ps2-cacert.pem \
    --backup-location-config region=powerscale,s3ForcePathStyle="true",s3Url=https://192.168.1.21:9021
…

The preceding command shows how to use Velero in its simplest and most secure form.

It is possible to add parameters to enable protection with snapshots. Every Dell CSI driver has snapshot support. To take advantage of that support, we use the install command with this addition:

velero install \
--features=EnableCSI \
--plugins=velero/velero-plugin-for-aws:v1.5.1,velero/velero-plugin-for-csi:v0.3.0 \
--use-volume-snapshots=true
...

Now that CSI snaps are enabled, we can enable restic to move data out of those snapshots into our backup target by adding:

--use-restic

As you can see, we are using the velero/velero-plugin-for-aws:v1.5.1 image, which is the latest available at the time of the publication of this article. You can obtain the current version from GitHub: https://github.com/vmware-tanzu/velero-plugin-for-aws

After the Velero installation is done, check that everything is correct:

kubectl logs -n velero deployment/velero

If you have an error with the certificates, you should see it quickly.
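
If the logs do show a certificate problem, one common cause is a missing or stale CA file. A quick way to re-capture the self-signed certificate from step 2, assuming the endpoint used in the install example above:

# Extract the PowerScale HTTPS endpoint's certificate for --cacert.
openssl s_client -showcerts -connect 192.168.1.21:9021 </dev/null 2>/dev/null \
  | openssl x509 -outform PEM > ps2-cacert.pem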

You can now back up and restore your Kubernetes resources with the usual Velero commands. For example, to protect the entire Kubernetes cluster except kube-system, including the data with PV snapshots:

velero backup create backup-all --exclude-namespaces kube-system

You can check the actual content directly from the PowerScale file system explorer.
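
When you need to bring resources back, restores follow the same pattern. For example, to restore everything from the backup created above:

velero restore create --from-backup backup-all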



Conclusion

For easy protection of small Kubernetes clusters, Velero combined with PowerScale S3 is a great solution. If you are looking for broader features (for a greater amount of data or more destinations that go beyond Kubernetes), look to Dell PowerProtect Data Manager, a next-generation, comprehensive data protection solution.

Interestingly, Dell PowerProtect Data Manager uses the Velero plug-in to protect Kubernetes resources!

 

Resources

PowerScale OneFS S3 Overview

 

 

  • data analytics
  • data storage
  • CloudIQ
  • capacity planning

Talking CloudIQ: Capacity Monitoring and Planning

Derek Barboza

Fri, 09 Dec 2022 15:37:42 -0000


Introduction

This is the third in a series of blogs discussing CloudIQ. In my first blog, I provided a high-level overview of CloudIQ and some of its key features. My second blog talked about the CloudIQ Proactive Health Score. I will continue the series with a discussion of the capacity monitoring and planning features in CloudIQ.

Planning ahead

Capacity monitoring helps you plan for expansions of storage arrays, data protection appliances, storage-as-a-service, and hyperconverged infrastructure (HCI) to help overcome unexpected spikes in storage consumption. CloudIQ uses advanced analytics to provide short-term capacity prediction analysis, longer-term capacity forecasting, and capacity anomaly detection. Capacity anomaly detection is the identification of a sudden surge in utilization that may result in a space full condition in less than 24 hours.

The CloudIQ Home page displays the Capacity Approaching Full tile which identifies storage entities that are full or expected to be full in each of the following time ranges:

  • Imminent (predicted to run out of space within 24 hours)
  • Full
  • Within a week
  • Within a month
  • Within a quarter

 

Figure 1.  The Capacity Approaching Full tile

In situations where there is a storage entity in the Imminent category, CloudIQ identifies the components of the entity that are experiencing the sudden increase in utilization. This gives users the necessary information about where to look to correct the offending behavior. In the following example, CloudIQ has identified a storage pool that is expected to run out of space in five hours. The pool details page identifies the file systems and LUNs that are the top contributors to the expected rise in utilization.

Figure 2.  Capacity Forecast for a pool that has a capacity anomaly

Two other CloudIQ features help you quickly find a solution for storage that is fast approaching full. First, there is the identification of reclaimable storage that shows you where you can recover unused capacity in a system. Second, there is the multisystem capacity view that lets you scan all your storage systems to pinpoint which have excess capacity to relieve approaching-full systems of their workloads.

Reclaimable storage

CloudIQ identifies different types of storage that are potentially reclaimable. The following criteria are used to identify reclaimable storage:

  • Block objects with no hosts attached
  • File objects with no front-end I/O in the past week
  • Block objects with no front-end I/O in the past week
  • Block-based virtual machines that have been shut down for the past week
  • File-based virtual machines that have been shut down for the past week

Users can quickly see the storage objects, where the object resides, and the amount of reclaimable space. The Last IO Time is provided for block and file objects that have no detected IO activity in the last week. For VMs that have been shut down for at least a week, the storage object on which the VM resides along with the vCenter and time that the VM was shut down is available. The following figure shows an example of reclaimable storage for block objects that have had no front-end IO activity in the past week.

Figure 3.  The Reclaimable Storage page – Block Objects with no front-end I/O activity

Multisystem capacity view

The multisystem capacity view provides a quick view of physical usable, used, free, and storage efficiencies across all storage, HCI, and data protection systems monitored by CloudIQ. This allows users to see quickly which systems are low on usable space, determine which systems are good targets for workload migration, and verify that their storage efficiencies and data reduction numbers are what they are expecting.

Figure 4.  Multisystem capacity view for storage

Storage system details

Detailed capacity views for storage systems and storage objects provide additional information, including data efficiencies and data reduction metrics. The following figure shows the physical and logical storage breakdown and data reduction charts for a PowerStore cluster.

Figure 5.  PowerStore cluster storage details

For APEX block storage service subscriptions, CloudIQ provides both subscribed and physical storage views. Subscribed views provide the storage usage including base and on-demand storage usage.

Figure 6.  APEX block storage services subscription view

Custom reports

With custom reports and the use of custom tags, users can create meaningful business reports and schedule those reports to be delivered to the required end users. Reports can include both line charts and tables and can be filtered on any field. The following figure shows a simple table that includes used and free capacities, data reduction values, and several custom tags.

Figure 7.  Custom report for storage

Conclusion

CloudIQ’s intelligence and predictive analytics help users proactively manage and accurately plan data storage and workload expansions, and act quickly to avoid rapidly approaching capacity-full conditions. Custom reports and tagging allow users to create, schedule, and deliver reports with technical and business information tailored to a wide variety of stakeholders. And for users looking to integrate data from CloudIQ with existing IT management tools, CloudIQ provides a public REST API.

Resources

How do you become more familiar with Dell Technologies and CloudIQ? The Dell Technologies Info Hub site provides expertise that helps to ensure customer success with Dell Technologies platforms. We have CloudIQ demos, white papers, and videos available at the Dell Technologies CloudIQ page. Also, feel free to reference the CloudIQ Overview Whitepaper which provides an in-depth summary of CloudIQ. Interested in DevOps? Go to our public API page for information about integrating CloudIQ with other IT tools using Webhooks or REST API.

Author: Derek Barboza, Senior Principal Engineering Technologist

Read Full Blog
  • CloudIQ
  • REST API
  • BigPanda

Integrating CloudIQ Webhooks with BigPanda Events

Derek Barboza Derek Barboza

Tue, 22 Nov 2022 17:27:08 -0000

|

Read Time: 0 minutes

This tutorial blog demonstrates how to use CloudIQ Webhooks to integrate CloudIQ health notifications with BigPanda (https://www.bigpanda.io/), an event management processing tool. This allows users to integrate CloudIQ notifications with events from other IT tools into BigPanda. We will show how to create a REST API Integration in BigPanda and provide an example of intermediate code that uses Google Cloud functions to process Webhooks.

BigPanda overview

BigPanda offers a solution that puts a modern twist on the event management process. The main product is a fully customizable, cloud-hosted event management console for event integration, reporting, correlation, and enrichment.

Webhook overview

A CloudIQ Webhook is a notification that is sent when a health issue changes. CloudIQ sends the Webhook notification when a new or resolved health issue is identified in CloudIQ. A Webhook is an HTTP POST composed of a header and a JSON payload that is sent to a user-configured destination. Webhooks are available under the Admin > Integrations menu in the CloudIQ UI. Users must have the CloudIQ DevOps role to access the Integrations menu.

Webhook event details

A Webhook consists of data in the header and the payload. The header includes control information; the payload is a JSON data structure that includes useful details about the notification and the health issue. Examples of the header and payload JSON files can be found here.

BigPanda integration

In CloudIQ, we enable Webhook integration by configuring a name, destination, and the secret to sign the payload.

In BigPanda, we have a couple of possibilities for third-party integration: the Alerts REST API and the Open Integration Hub.

In our example, we use the REST API. Note that some of the requirements of the Open Integration Hub (alert severity, configurable application key, and so on) are not configurable today in CloudIQ Webhooks.

Architecture

The main challenge when integrating CloudIQ health events with BigPanda alerts is implementing a mapping function to translate CloudIQ fields to BigPanda fields.

To do this, we will use a serverless function to:

  • Receive the health event from a CloudIQ Webhook trigger
  • Convert the CloudIQ health event to a BigPanda alert
  • Post that alert to BigPanda

In this integration, the serverless function is a Google Cloud Function. Any other serverless framework can work.

  

Create a BigPanda REST application

The first step is to create an application for integration in BigPanda. Do the following:

1. Log into the BigPanda console.

2. Click the Integrations button at the top of the console.

3. Click the blue New Integration button.

4. Select Alerts Rest API (the first card).

5. Set an integration name, then click Generate App Key.

6. Save the generated app key and bearer token.

If you forgot to save the “application key” or “token”, you can obtain them later by selecting `Review Instructions`.

Note that the “application key” and “token” will be needed later to configure the trigger to post data to that endpoint.
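
To sanity-check the integration at this point, you can post a sample alert straight to BigPanda with curl. This is a sketch based on BigPanda's Alerts REST API; the exact endpoint and payload for your account are shown on the integration's Review Instructions page:

curl -X POST https://api.bigpanda.io/data/v2/alerts \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{"app_key": "<application key>", "status": "warning", "host": "test-host", "check": "ciq-integration-test"}'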

Create the GCP Cloud function

This step is very similar to what has been presented in the CloudIQ to Slack tutorial. The only changes are that we are using a golang runtime and we store the authentication token in a secret instead of in a plain text environment variable.

  1. Select Create Secret from the Secret Manager.
  2. Provide a name (BP_TOKEN in this example).
  3. Paste the Authorization token from the HTTP headers section of the BigPanda integration into the ‘Secret value’ field.
  4. Select Create Function and provide a function name (ciq-bigpanda-integration in this example).
  5. Under the Trigger section, keep a trigger type of HTTP and select Allow unauthenticated invocations.
  6. Take note of the Trigger URL because it will be used as the Payload URL when configuring the Webhook in CloudIQ.
  7. Select SAVE.
  8. Expand the RUNTIME, BUILD AND CONNECTIONS SETTINGS section.
  9. Under the RUNTIME tab, click the + ADD VARIABLE button to create the BP_APP_KEY variable. The value is set to the application key obtained after creating the BigPanda integration.
  10. Select the SECURITY AND IMAGE REPO tab.
  11. Select REFERENCE A SECRET.
  12. Select the BP_TOKEN secret from the pulldown.
  13. Select Exposed as environment variable from the Reference Method pulldown.
  14. Enter BP_TOKEN as the environment variable name.
  15. Select DONE, then click Next.
  16. Select Go 1.16 from the Runtime pulldown.
  17. Change the Entry point to CiqEventToBigPandaAlert.
  18. Replace the code for function.go with the example function.go code.
  19. Replace the go.mod with the example go.mod code.
  20. Select DEPLOY.

Implement the Mapping

Go is statically typed, so we have clearly defined structs for the input (`CiqHealthEvent`) and the output (`BigPandaAlerts`).

Most of the logic consists of mapping fields from one structure to the other.

func CiqEventMapping(c *CiqHealthEvent, bp *BigPandaClient) *BigPandaAlerts {
        log.Println("mapping input CloudIQ event: ")
        log.Printf("%+v", c)
        alert := BigPandaAlerts{
               AppKey:  bp.AppKey,
               Cluster: "CloudIQ",
               Host:    c.SystemName,
        }
        if len(c.NewIssues) > 0 {
               for _, v := range c.NewIssues {
                       alert.Alerts = append(alert.Alerts, BigPandaAlert{
                               Status:             statusForScore(c.CurrentScore),
                               Timestamp:          c.Timestamp,
                               Host:               c.SystemName,
                               Description:        v.Description,
                               Check:              v.RuleID,
                               IncidentIdentifier: v.ID,
                       })
               }
        }
        return &alert
}

Two things to note here:

1. Because CloudIQ doesn't have the notion of severity, we convert the score to a status using the code below.

2. CloudIQ has an event identifier that will help to deduplicate the alert in BigPanda or reopen a closed event in case of a re-notify.

// BigPanda status values: ok,ok-suspect,warning,warning-suspect,critical,critical-suspect,unknown,acknowledged,oksuspect,warningsuspect,criticalsuspect,ok_suspect,warning_suspect,critical_suspect,ok suspect,warning suspect,critical suspect
func statusForScore(s int) string {
        if s == 100 {
               return "ok"
        } else if s <= 99 && s > 95 {
               return "ok suspect"
        } else if s <= 95 && s > 70 {
               return "warning"
        } else if s <= 70 {
               return "critical"
        } else {
               return "unknown"
        }
}
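
For completeness, here is a minimal sketch of how the mapped alerts might be posted to BigPanda. It is not the exact code from the example repository; it assumes the usual bytes, encoding/json, fmt, and net/http imports, and that BigPandaClient carries the endpoint URL and bearer token in Endpoint and Token fields (illustrative names):

func (bp *BigPandaClient) Post(alerts *BigPandaAlerts) error {
        // Serialize the BigPanda alert payload.
        body, err := json.Marshal(alerts)
        if err != nil {
                return err
        }
        // bp.Endpoint is the Alerts REST API URL, for example https://api.bigpanda.io/data/v2/alerts
        req, err := http.NewRequest(http.MethodPost, bp.Endpoint, bytes.NewReader(body))
        if err != nil {
                return err
        }
        req.Header.Set("Content-Type", "application/json")
        req.Header.Set("Authorization", "Bearer "+bp.Token)
        resp, err := http.DefaultClient.Do(req)
        if err != nil {
                return err
        }
        defer resp.Body.Close()
        if resp.StatusCode >= 300 {
                return fmt.Errorf("unexpected BigPanda response: %s", resp.Status)
        }
        return nil
}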

Build

Behind the scenes, the GCP Cloud Functions are built and executed as a container. To develop and test the code locally (instead of doing everything in the GCP Console), we can develop locally and then build the package using buildpack (https://github.com/googlecloudplatform/buildpacks) as GCP does:

pack build \
      --builder gcr.io/buildpacks/builder:v1 \
      --env GOOGLE_RUNTIME=go \
      --env GOOGLE_FUNCTION_SIGNATURE_TYPE=http \
      --env GOOGLE_FUNCTION_TARGET=CiqEventToBigPandaAlert \
      ciq-bigpanda-integration

Run

After the build is successful, we can test it with something similar to:

docker run --rm -p 8080:8080 -e BP_TOKEN=xxxxx -e BP_APP_KEY=yyyyy ciq-bigpanda-integration

Alternatively, you can create a “main.go” and run it with:

FUNCTION_TARGET=CiqEventToBigPandaAlert go run cmd/main.go

Deploy

Users can choose to deploy the function outside of the GCP console. You can publish it with:

gcloud functions deploy ciq-bigpanda-integration --runtime go116 --entry-point CiqEventToBigPandaAlert --trigger-http --allow-unauthenticated


Configure CloudIQ

It is time to point the CloudIQ Webhook to the GCP Function trigger URL. From the Admin > Integrations menu in CloudIQ, go to the Webhooks tab.

  1. Click Add Webhook.
  2. Enter a Name for the Webhook.
  3. Enter the Payload URL. This is the Trigger URL from the GCP Function.
  4. Because we did not use a Webhook secret, enter any text.
  5. Click ADD WEBHOOK to save the configuration.


Testing

From CloudIQ

To ease the simulation of a Webhook event, go to the CloudIQ Integration and click the TEST WEBHOOK button. This sends a ping request to the destination. You can also go to CloudIQ and redeliver an existing event.

Easy post script

For an actual event and not just a `ping`, use the `easy_post.sh` script after configuring the appropriate ENDPOINT.

#!/bin/bash
# Send a sample CloudIQ Webhook (headers + payload) to an endpoint.
# Defaults can be overridden through environment variables.
HEADERS_FILE=${HEADERS_FILE-./headers.json}
PAYLOAD_FILE=${PAYLOAD_FILE-./payload.json}
ENDPOINT=${ENDPOINT-https://webhook.site/6fd7d650-1b5b-4b8c-9781-2043005bdf2d}
# Turn each header key/value pair from the JSON file into a curl -H argument.
mapfile -t HEADERS < <(jq -r '. | to_entries[] | "-H \(.key):\(.value)"' < ${HEADERS_FILE})
curl -k -H "Content-Type: application/json" ${HEADERS[@]} --request POST --data @${PAYLOAD_FILE} ${ENDPOINT}
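
For example, with the header and payload JSON saved locally and the GCP Function trigger URL as the endpoint (the URL shown is a placeholder):

ENDPOINT=https://<trigger-url> HEADERS_FILE=./headers.json PAYLOAD_FILE=./payload.json ./easy_post.sh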

Conclusion

If everything flows correctly, you will see the health alerts delivered to the BigPanda console. This allows users to consolidate CloudIQ notifications with events from other IT tools into a single monitoring interface.

Author: Derek Barboza

Read Full Blog
  • data storage
  • Kubernetes
  • CSI
  • Microsoft Azure Arc

Dell Container Storage Modules—A GitOps-Ready Platform!

Florian Coulombel Florian Coulombel

Mon, 26 Sep 2022 15:17:45 -0000

|

Read Time: 0 minutes

One of the first things I do after deploying a Kubernetes cluster is to install a CSI driver to provide persistent storage to my workloads. Coupled with a GitOps workflow, it takes only seconds to be able to run stateful workloads.

The GitOps process is nothing more than a few principles:

  • Git as the single source of truth
  • Resources declared explicitly
  • Pull-based

Nonetheless, to ensure that the process runs smoothly, you must make certain that the application you will manage with GitOps complies with these principles.

This article describes how to use the Microsoft Azure Arc GitOps solution to deploy the Dell CSI driver for Dell PowerMax and affiliated Container Storage Modules (CSMs).

The platform we will use to implement the GitOps workflow is Azure Arc with GitHub. Still, other solutions are possible using Kubernetes agents such as Argo CD, Flux CD, and GitLab.

Azure GitOps itself is built on top of Flux CD.

Install Azure Arc

The first step is to onboard your existing Kubernetes cluster within the Azure portal.

The Azure Arc agent must be able to connect to the Internet. In my case, the installation of the Arc agent fails from the Dell network with the error described here: https://docs.microsoft.com/en-us/answers/questions/734383/connect-openshift-cluster-to-azure-arc-secret-34ku.html

Certain URLs (even when bypassing the corporate proxy) don't play well when communicating with Azure. I have seen some services get a self-signed certificate, causing the issue.

The solution for me was to put an intermediate transparent proxy between the Kubernetes cluster and the corporate proxy. That way, we can have better control over the responses given by the proxy.

 

In this example, we install Squid on a dedicated box with the help of Docker. To make it work, I used the Squid image by Ubuntu and made sure that Kubernetes requests were direct with the help of always_direct:

docker run -d --name squid-container ubuntu/squid:5.2-22.04_beta ; docker cp squid-container:/etc/squid/squid.conf ./ ; egrep -v '^#' squid.conf > my_squid.conf
docker rm -f squid-container

Then add the following section:

acl k8s        port 6443        # k8s https
always_direct allow k8s

You can now install the agent per the following instructions: https://docs.microsoft.com/en-us/azure/azure-arc/kubernetes/quickstart-connect-cluster?tabs=azure-cli#connect-using-an-outbound-proxy-server.

export HTTP_PROXY=http://mysquid-proxy.dell.com:3128
export HTTPS_PROXY=http://mysquid-proxy.dell.com:3128
export NO_PROXY=https://kubernetes.local:6443
 
az connectedk8s connect --name AzureArcCorkDevCluster \
                        --resource-group AzureArcTestFlorian \
                        --proxy-https http://mysquid-proxy.dell.com:3128 \
                        --proxy-http http://mysquid-proxy.dell.com:3128 \
                        --proxy-skip-range 10.0.0.0/8,kubernetes.default.svc,.svc.cluster.local,.svc \
                        --proxy-cert /etc/ssl/certs/ca-bundle.crt

If everything worked well, you should see the cluster with detailed info in the Azure portal.

 

Add a service account for more visibility in Azure portal

To benefit from all the features that Azure Arc offers, give the agent the privileges to access the cluster.

The first step is to create a service account:

kubectl create serviceaccount azure-user
kubectl create clusterrolebinding demo-user-binding --clusterrole cluster-admin --serviceaccount default:azure-user
kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: azure-user-secret
  annotations:
    kubernetes.io/service-account.name: azure-user
type: kubernetes.io/service-account-token
EOF

Then, from the Azure UI, when you are prompted to give a token, you can obtain it as follows:

kubectl get secret azure-user-secret -o jsonpath='{$.data.token}' | base64 -d | sed $'s/$/\\\n/g'

Then paste the token in the Azure UI.

Install the GitOps agent

The GitOps agent installation can be done with a CLI or in the Azure portal.

As of now, the Microsoft documentation presents in detail the deployment that uses the CLI, so let's see how it works with the Azure portal.

Organize the repository

The Git repository organization is a crucial part of the GitOps architecture. It hugely depends on how internal teams are organized, the level of information you want to expose and share, the location of the different clusters, and so on.

In our case, the requirement is to connect multiple Kubernetes clusters owned by different teams to a couple of PowerMax systems using only the latest and greatest CSI driver and affiliated CSM for PowerMax.

Therefore, the monorepo approach is well suited.

The organization follows this structure:

.
├── apps
│   ├── base
│   └── overlays
│       ├── cork-development
│       │   ├── dev-ns
│       │   └── prod-ns
│       └── cork-production
│           └── prod-ns
├── clusters
│   ├── cork-development
│   └── cork-production
└── infrastructure
    ├── cert-manager
    ├── csm-replication
    ├── external-snapshotter
    └── powermax

  • apps: Contains the applications to be deployed on the clusters, with a different overlay per cluster.
  • clusters: Usually contains the cluster-specific Flux CD main configuration; using Azure Arc, none is needed.
  • infrastructure: Contains the deployments that are used to run the infrastructure services; they are common to every cluster.
    • cert-manager: Is a dependency of the powermax reverse proxy
    • csm-replication: Is a dependency of powermax to support SRDF replication
    • external-snapshotter: Is a dependency of powermax to support snapshots
    • powermax: Contains the driver installation

 You can see all files in https://github.com/coulof/fluxcd-csm-powermax.

Note: The GitOps agent comes with multi-tenancy support; therefore, we cannot cross-reference objects between namespaces. The Kustomization and HelmRelease objects must be created in the same namespace as the agent (here, flux-system) and have a targetNamespace pointing to the namespace where the resource is to be installed.
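
As an illustration, a HelmRelease for the PowerMax driver could look like the following sketch (the chart and repository names here are assumptions; see the repository linked above for the actual manifests):

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: powermax
  namespace: flux-system      # same namespace as the GitOps agent
spec:
  interval: 10m
  targetNamespace: powermax   # namespace where the driver is actually installed
  chart:
    spec:
      chart: csi-powermax
      sourceRef:
        kind: HelmRepository
        name: dell
        namespace: flux-system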

Conclusion

This article is the first of a series exploring the GitOps workflow. Next, we will see how to manage application and persistent storage with the GitOps workflow, how to upgrade the modules, and so on.


Read Full Blog
  • data storage
  • CSI
  • PowerScale

Network Design for PowerScale CSI

Sean Zhan Florian Coulombel Sean Zhan Florian Coulombel

Tue, 23 Aug 2022 17:09:57 -0000

|

Read Time: 0 minutes

Network connectivity is an essential part of any infrastructure architecture. When it comes to how Kubernetes connects to PowerScale, there are several options to configure the Container Storage Interface (CSI). In this post, we will cover the concepts and configuration you can implement.

The story starts with the CSI plugin architecture.

CSI plugins

Like all other Dell storage CSI, PowerScale CSI follows the Kubernetes CSI standard by implementing functions in two components.

  • CSI controller plugin
  • CSI node plugin

The CSI controller plugin is deployed as a Kubernetes Deployment, typically with two or three replicas for high availability, with only one instance acting as the leader. The controller is responsible for communicating with PowerScale, using the Platform API to manage volumes (on PowerScale, that means creating and deleting directories, NFS exports, and quotas), to update the NFS client list when a Pod moves, and so on.

The CSI node plugin is a Kubernetes DaemonSet, running on all nodes by default. It’s responsible for mounting the NFS export from PowerScale and mapping the NFS mount path into a Pod as persistent storage, so that applications and users in the Pod can access the data on PowerScale.

Roles, privileges, and access zone

Because CSI needs to access both PAPI (PowerScale Platform API) and NFS data, a single user role typically isn’t secure enough: the role for PAPI access will need more privileges than normal users.

According to the PowerScale CSI manual, CSI requires a user that has the following privileges to perform all CSI functions:

Privilege                 Type
ISI_PRIV_LOGIN_PAPI       Read Only
ISI_PRIV_NFS              Read Write
ISI_PRIV_QUOTA            Read Write
ISI_PRIV_SNAPSHOT         Read Write
ISI_PRIV_IFS_RESTORE      Read Only
ISI_PRIV_NS_IFS_ACCESS    Read Only
ISI_PRIV_IFS_BACKUP       Read Only

Among these privileges, ISI_PRIV_SNAPSHOT and ISI_PRIV_QUOTA are only available in the System zone. And this complicates things a bit. To fully utilize these CSI features, such as volume snapshot, volume clone, and volume capacity management, you have to allow the CSI to be able to access the PowerScale System zone. If you enable the CSM for replication, the user needs the ISI_PRIV_SYNCIQ privilege, which is a System-zone privilege too.

By contrast, there isn’t any specific role requirement for applications/users in Kubernetes to access data: the data is shared by the normal NFS protocol. As long as they have the right ACL to access the files, they are good. For this data accessing requirement, a non-system zone is suitable and recommended.

These two access zones are defined in different places in CSI configuration files:

  • The PAPI access zone name (FQDN) needs to be set in the secret yaml file as “endpoint”, for example “f200.isilon.com”.
  • The data access zone name (FQDN) needs to be set in the storageclass yaml file as “AzServiceIP”, for example “openshift-data.isilon.com”.

If an admin really cannot expose their System zone to the Kubernetes cluster, they have to disable the snapshot and quota features in the CSI installation configuration file (values.yaml). In this way, the PAPI access zone can be a non-System access zone.

The following diagram shows how the Kubernetes cluster connects to PowerScale access zones.

Network

Normally a Kubernetes cluster comes with many networks: a pod inter-communication network, a cluster service network, and so on. Luckily, the PowerScale network doesn’t have to join any of them. The CSI pods can access a host’s network directly, without going through the Kubernetes internal network. This also has the advantage of providing a dedicated high-performance network for data transfer.

For example, on a Kubernetes host, there are two NICs: IP 192.168.1.x and 172.24.1.x. NIC 192.168.1.x is used for Kubernetes, and is aligned with its hostname. NIC 172.24.1.x isn’t managed by Kubernetes. In this case, we can use NIC 172.24.1.x for data transfer between Kubernetes hosts and PowerScale.

By default, the CSI driver uses the IP that is aligned with the hostname. To let CSI recognize the second NIC (172.24.1.x), we have to explicitly set the IP range in “allowedNetworks” in the values.yaml file of the CSI driver installation. For example:

allowedNetworks: [172.24.1.0/24]

Also, in this network configuration, it’s unlikely that the Kubernetes internal DNS can resolve the PowerScale FQDN. So, we also have to make sure the “dnsPolicy” has been set to “ClusterFirstWithHostNet” in the values.yaml file. With this dnsPolicy, the CSI pods will reach the DNS server in /etc/resolv.conf in the host OS, not the internal DNS server of Kubernetes.
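
Putting the two settings together, the relevant excerpt of values.yaml looks like this (a minimal sketch; all other required values are omitted):

allowedNetworks: [172.24.1.0/24]    # data traffic goes through the second NIC
dnsPolicy: ClusterFirstWithHostNet  # resolve the PowerScale FQDNs through the host's /etc/resolv.conf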

The following diagram shows the configuration mentioned above.

Please note that the “allowedNetworks” setting only affects the data access zone, and not the PAPI access zone. In fact, CSI just uses this parameter to decide which host IP should be set as the NFS client IP on the PowerScale side.

Regarding network routing, CSI simply follows the OS route configuration. Because of that, if we want the PAPI access zone traffic to go through the primary NIC (192.168.1.x) and the data access zone traffic to go through the second NIC (172.24.1.x), we have to change the route configuration of the Kubernetes host, not this parameter.
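
For example, a static route on each Kubernetes host can steer the data access zone traffic through the second NIC; the subnet and gateway below are hypothetical:

ip route add 172.24.8.0/24 via 172.24.1.254 dev eth1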

Hopefully this blog helps you understand the network configuration for PowerScale CSI better. Stay tuned for more information on Dell Containers & Storage!

Authors: Sean Zhan, Florian Coulombel

Read Full Blog
  • data storage
  • CloudIQ

Talking CloudIQ: Proactive Health Scores

Derek Barboza Derek Barboza

Fri, 05 Aug 2022 20:29:33 -0000

|

Read Time: 0 minutes

Introduction

This is the second in a series of blogs discussing CloudIQ. In my first blog, I provided a high-level overview of CloudIQ and some of its key features. I will continue with a series of blogs, each talking about one of the key features in more detail. This blog discusses one of CloudIQ’s key differentiating features: the Proactive Health Score.

Proactive Health Score

The Proactive Health Score uses various factors to provide a consolidated view of a system’s health into a single health score. Health scores are based on up to five categories: Components, Configuration, Capacity, Performance, and Data Protection. Based on the resulting health score, the system is put into one of three risk categories: Poor, Fair, or Good. The score starts at 100 and is reduced by the issue with the highest deduction.

A system in the Poor category has a score of 0 to 70 and poses an imminent critical risk. It could be a storage pool that is overprovisioned and full, meaning that systems will be trying to write to storage that is unavailable. Or it could be a significant component failure. Whatever the issue, it is something that requires your immediate attention.  

A system in the Fair category has a score of 71 to 94. Systems in this category have an issue that should be looked at, but certainly not something that requires you to get out of bed at 3:00am to address immediately. It could be something like a storage pool predicted to be full in a week or a system inlet temperature that exceeds the upper warning threshold on a PowerEdge server.

A system in the Good category has a score of 95 to 100 and is doing fine. There may be a minor issue that you need to look at, but nothing significant that is expected to cause any near-term problems. An example would be a fibre port with a warning status on a Connectrix switch.

Now what happens if there are multiple issues on a system? We hinted at this earlier. The score is only affected by the most critical issue. Let’s say that there are four issues on a system: one 30-point deduction, one 10-point deduction, and two 5-point deductions. In this case, the health score is 70. When the 30-point deduction is addressed, the score would become 90. We do this to prevent a system with several minor issues from appearing at high risk or at a higher risk than a system with a significant issue. 

Figure 1.  System Health page

Recommended resolution

So now that we have been notified of an issue on a system, what do we do next? Well, with CloudIQ, we will offer up recommended remediation actions to address the issue before it has a significant impact on the environment. This may come in the form of a recommended configuration change or other action, a knowledge base article with a resolution, or some commands to run to gain the necessary information to resolve the issue.

Figure 2.  Recommended remediation

Health Score History

CloudIQ also tracks the history of the Proactive Health Score. We can see both new and resolved issues along a chart with a selectable date range. Details of the issues are listed below the chart. By providing the history of the health score, CloudIQ allows users to identify possible recurring issues in the environment.

Figure 3.  Health Score history

Notifications

What if we do not want to log in to CloudIQ on a daily or weekly basis to check our systems? We can easily be notified by email any time a system health change occurs. These notifications can be set up for a configurable set of systems, allowing users to receive notifications only for those systems for which they are responsible.

For the more motivated user, CloudIQ supports Webhooks. With this feature, users can send a Webhook for any health change notification to integrate with third-party tools such as ServiceNow, Slack, or Teams. Webhooks are sent for both open and closed issues with a unique identifier. This allows users to correlate the resolved issue with the open issue to automatically close out any created incident. Some Webhook integration examples can be found here.

Conclusion

Whether it be for storage, networking, hyperconverged, servers, or data protection, the Proactive Health Score summarizes the health of a system into a single number, providing an immediate indication of the status of each system. Developed in tandem with experts from each product team, any issues identified for a system are accompanied by recommended remediation to help with self-service and quickly reduce risk. And with email notifications and Webhooks, users can be notified proactively any time an issue is identified.

Resources

How do you become more familiar with Dell Technologies and CloudIQ? The Dell Technologies Info Hub site provides expertise that helps to ensure customer success with Dell Technologies platforms. We have CloudIQ demos, white papers, and videos available at the Dell Technologies CloudIQ page. Also, feel free to reference the CloudIQ Overview Whitepaper which provides an in-depth summary of CloudIQ. Interested in DevOps? Go to our public API page for information about integrating CloudIQ with other IT tools using Webhooks or REST API.

Stay tuned for my next blog, where I'll talk about capacity forecasting and capacity anomaly detection in CloudIQ.

Author: Derek Barboza, Senior Principal Engineering Technologist

Read Full Blog
  • data management
  • SRM
  • Storage Resource Manager

Explore Real-World Cases with the Dell SRM Interactive Demo

Dejan Stojanovic Dejan Stojanovic

Thu, 17 Nov 2022 15:04:10 -0000

|

Read Time: 0 minutes

Summary

At Dell Technologies, we are proud to announce a new interactive demo for Storage Resource Manager (SRM), located here.

This interactive demo is based on the SRM release 4.7.0.0, which introduces several new features, enhancements, and newly supported platforms.

Interactive Demo Info

The landing page of the interactive demo provides a summary of the use cases and features covered. This demo has the same look and feel as the actual HTML-based SRM user interface, where you can scroll up and down the page and click on each page object.

 

Dell SRM provides insight into data center operations from application to storage. Through automated discovery and reporting, Dell SRM breaks down the silos. Its simple use-case driven user interface simplifies tasks such as:

  1. Capacity Planning 
  2. Performance Analysis 
  3. Configuration Compliance 
  4. Chargeback
  5. Workload Analysis

There are eight independent interactive demo modules available, each of which covers a main SRM use case or feature:

  1. Enterprise Capacity Dashboard
  2. Capacity Planning What-If Scenario
  3. Performance Analysis - Host to LUN Troubleshooting
  4. Topology and End-To-End Relationships
  5. Chargeback Report by VirtualMachine
  6. Configuration Compliance Policies
  7. Configuration Compliance What-if Analysis
  8. Custom Report Wizard

Supported Platforms

The data that is available in this comprehensive eight-module demo is from the following supported vendors and technologies:

  • amazon s3
  • brocade
  • chargeback
  • cisco mds 
  • cisco ucs 
  • dell centera
  • dell datadomain
  • dell dpa
  • dell ecs
  • dell powerflex
  • dell powerscale 
  • dell powerstore
  • dell recoverpoint 
  • dell sc
  • dell unity/vnx
  • dell vmax/pmax
  • dell vplex 
  • dell vxrail 
  • dell xtremio
  • hitachi device manager
  • hp3par storeserv
  • hp storageworks
  • hpe-nimble
  • huawei 
  • ibm ds (3k/4k/5k)
  • ibm flashsystem
  • ibm-lpar
  • ibm-svc 
  • ibm-xiv
  • ms azure
  • ms-hyper-v 
  • ms sql server
  • netapp
  • oracle mysql
  • physical hosts
  • pure storage
  • vmware vsphere & vsan

 

Enjoy this demo and let us know how you like it!

Resources

Demo: End-to-End Monitoring Across the Data Center: Real-World Cases with Dell Storage Resource Manager

Author: Dejan Stojanovic

Read Full Blog
  • PowerMax
  • XtremIO
  • data storage
  • SC Series
  • PowerStore
  • PowerScale
  • CloudIQ
  • PowerVault
  • Dell Unity XT

CloudIQ: Cloud-based Monitoring for your Dell Technologies IT Environment

Derek Barboza Derek Barboza

Wed, 25 May 2022 19:49:28 -0000

|

Read Time: 0 minutes

Introduction

CloudIQ is Dell’s cloud-based AIOps application for monitoring Dell core, edge, and cloud. Born out of the Dell Unity storage product group several years ago, CloudIQ has quickly grown to cover a broad range of Dell Technologies products. With the latest addition of PowerSwitch, CloudIQ now covers Dell’s entire infrastructure portfolio, including compute, networking, CI/HCI, data protection, and storage systems.

According to a survey conducted last year, IT organizations were able to resolve infrastructure issues two to ten times faster and save a day per week on average with CloudIQ.[1]

Supported Platforms

  • Storage: PowerStore, PowerMax, PowerScale, PowerVault, Dell Unity XT, Dell Unity, SC Series, XtremIO, VMAX, and Isilon
  • Converged & HyperConverged: VxBlock, VxRail, and PowerFlex
  • Networking: PowerSwitch and Connectrix
  • Data Protection: PowerProtect DD Series, PowerProtect DD Virtual Edition, and PowerProtect Data Manager
  • APEX Data Storage Services
  • VMware integration

Figure 1.   CloudIQ Supported Platforms

Key Features

CloudIQ has a variety of innovative features based on machine learning and other algorithms that help you reduce risk, plan ahead, and improve productivity. These features include the proactive health score, performance impact and anomaly detection, workload contention identification, capacity forecasting and anomaly detection, cybersecurity monitoring, reclaimable storage identification, and VMware integration.

With custom reporting features, Webhooks, and a REST API, you can integrate data from CloudIQ into ticketing, collaboration, and automation tools and processes that you use in day-to-day IT operations.

Best of all, CloudIQ comes with your standard Dell ProSupport and ProSupport Plus contracts at no extra cost.

Keep an eye out for follow up blogs discussing CloudIQ’s key features in more detail!

Figure 2.    CloudIQ Overview Page

Conclusion

With the addition of PowerSwitch support, CloudIQ now gives users the ability to monitor the full range of their Dell Technologies IT infrastructure from a single user interface. And the fact that it is a cloud offering hosted in a secure Dell IT environment means that it is accessible from virtually anywhere. Simply open a web browser, point to https://cloudiq.dell.com, and log in with your Dell support credentials. As a cloud-based application, it also means that you always have access to the latest features because CloudIQ’s agile development process allows for continuous and seamless updates without any effort from you. There is also a mobile app, so you can take it anywhere.

Resources

How do you become more familiar with Dell Technologies and CloudIQ? The Dell Technologies Info Hub site provides expertise that helps to ensure customer success with Dell Technologies platforms. We have CloudIQ demos, white papers, and videos available at the Dell Technologies CloudIQ page. Also, feel free to reference the CloudIQ Overview Whitepaper which provides an in-depth summary of CloudIQ.

[1] Based on a Dell Technologies survey of CloudIQ users conducted May-June 2021. Actual results may vary.

Author: Derek Barboza, Senior Principal Engineering Technologist

Read Full Blog
  • containers
  • data storage
  • Kubernetes
  • CSI

How to Build a Custom Dell CSI Driver

Florian Coulombel Florian Coulombel

Wed, 20 Apr 2022 21:28:38 -0000

|

Read Time: 0 minutes

With all the Dell Container Storage Interface (CSI) drivers and dependencies being open-source, anyone can tweak them to fit a specific use case.

This blog shows how to create a patched version of a Dell CSI Driver for PowerScale.

The premise

As a practical example, the following steps show how to create a patched version of Dell CSI Driver for PowerScale that supports a longer mounted path.

The CSI specification defines that a driver must accept a maximum path length of at least 128 bytes:

// SP SHOULD support the maximum path length allowed by the operating
// system/filesystem, but, at a minimum, SP MUST accept a max path
// length of at least 128 bytes.

Dell drivers use the gocsi library as a common boilerplate for CSI development. That library enforces the 128-byte maximum path length.

The PowerScale hardware supports path lengths up to 1023 characters, as described in the File system guidelines chapter of the PowerScale specifications. We’ll therefore build a csi-powerscale driver that supports that maximum path length.

Steps to patch a driver

Dependencies

The Dell CSI drivers are all built with golang and, obviously, run as a container. As a result, the prerequisites are relatively simple. You need: 

  • Golang (v1.16 minimum at the time of publication of this post)
  • Podman or Docker
  • And optionally make to run our Makefile

Clone, branch, and patch

The first thing to do is to clone the official csi-powerscale repository in your GOPATH source directory.

cd $GOPATH/src/github.com/
git clone git@github.com:dell/csi-powerscale.git dell/csi-powerscale
cd dell/csi-powerscale

You can then pick the version of the driver you want to patch; git tag gives the list of versions.

In this example, we pick the v2.1.0 with git checkout v2.1.0 -b v2.1.0-longer-path.

The next step is to obtain the library we want to patch.

gocsi and every other open-source component maintained for Dell CSI are available on https://github.com/dell.

First, fork the repository on your own GitHub account.

Now we can get the library with:

cd $GOPATH/src/github.com/
git clone git@github.com:coulof/gocsi.git coulof/gocsi
cd coulof/gocsi

To simplify the maintenance and merge of future commits, it is wise to add the original repo as an upstream branch with:

git remote add upstream git@github.com:dell/gocsi.git

The next important step is to pick and choose the correct library version used by our version of the driver.

We can check the csi-powerscale dependency file with: grep gocsi $GOPATH/src/github.com/dell/csi-powerscale/go.mod and create a branch of that version. In this case, the version is v1.5.0, and we can branch it with: git checkout v1.5.0 -b v1.5.0-longer-path.

Now it’s time to hack our patch! It’s… just a one-liner:

--- a/middleware/specvalidator/spec_validator.go
+++ b/middleware/specvalidator/spec_validator.go
@@ -770,7 +770,7 @@ func validateVolumeCapabilitiesArg(
 }
 
 const (
-       maxFieldString = 128
+       maxFieldString = 1023
        maxFieldMap    = 4096
        maxFieldNodeId = 256
 )

We can then commit and push our patched library with a nice tag:

git commit -a -m 'increase path limit'
git push --set-upstream origin v1.5.0-longer-path
git tag -a v1.5.0-longer-path
git push --tags

Build

With the patch committed and pushed, it’s time to build the CSI driver binary and its container image.

Let’s go back to the csi-powerscale main repo: cd $GOPATH/src/github.com/dell/csi-powerscale

As mentioned in the introduction, we can take advantage of the replace directive in the go.mod file to point to the patched lib. In this case we add the following:

diff --git a/go.mod b/go.mod
index 5c274b4..c4c8556 100644
--- a/go.mod
+++ b/go.mod
@@ -26,6 +26,7 @@ require (
 )
 
 replace (
+       github.com/dell/gocsi => github.com/coulof/gocsi v1.5.0-longer-path
        k8s.io/api => k8s.io/api v0.20.2
        k8s.io/apiextensions-apiserver => k8s.io/apiextensions-apiserver v0.20.2
        k8s.io/apimachinery => k8s.io/apimachinery v0.20.2

When that is done, we obtain the new module from the online repo with: go mod download

Note: If you want to test the changes locally only, we can use the replace directive to point to the local directory with:

replace github.com/dell/gocsi => ../../coulof/gocsi

We can then build our new driver binary locally with: make build

After compiling it successfully, we can create the image. The shortest path to do that is to replace the csi-isilon binary from the dellemc/csi-isilon docker image with:

cat << EOF > Dockerfile.patch
FROM dellemc/csi-isilon:v2.1.0
COPY "csi-isilon" .
EOF


docker build -t coulof/csi-isilon:v2.1.0-long-path -f Dockerfile.patch . 

Alternatively, you can rebuild an entire docker image using provided Makefile.

By default, the driver uses a Red Hat Universal Base Image minimal. That base image sometimes misses dependencies, so you can use another flavor, such as:

BASEIMAGE=registry.fedoraproject.org/fedora-minimal:latest REGISTRY=docker.io IMAGENAME=coulof/csi-powerscale IMAGETAG=v2.1.0-long-path make podman-build

The image is ready to be pushed in whatever image registry you prefer. In this case, this is hub.docker.com: docker push coulof/csi-isilon:v2.1.0-long-path.

Update CSI Kubernetes deployment

The last step is to replace the driver image used in your Kubernetes with your custom one.

Again, multiple solutions are possible, and the one to choose depends on how you deployed the driver.

If you used the helm installer, you can add the following block at the top of the myvalues.yaml file:

images:
  driver: docker.io/coulof/csi-powerscale:v2.1.0-long-path

Then update or uninstall/reinstall the driver as described in the documentation.

If you decided to use the Dell CSI Operator, you can simply point to the new image:

apiVersion: storage.dell.com/v1
kind: CSIIsilon
metadata:
  name: isilon
spec:
  driver:
    common:
      image: "docker.io/coulof/csi-powerscale:v2.1.0-long-path"
...

Or, if you want to do a quick and dirty test, you can create a patch file (here named path_csi-isilon_controller_image.yaml) with the following content:

spec:
  template:
    spec:
      containers:
      - name: driver 
        image: docker.io/coulof/csi-powerscale:v2.1.0-long-path

You can then apply it to your existing install with: kubectl patch deployment -n powerscale isilon-controller --patch-file path_csi-isilon_controller_image.yaml

In all cases, you can check that everything works by first making sure that the Pod is started:

kubectl get pods -n powerscale 

and that the logs are clean:

kubectl logs -n powerscale -l app=isilon-controller -c driver

Wrap-up and disclaimer

As demonstrated, thanks to open source, it’s easy to fix and improve Dell CSI drivers or Dell Container Storage Modules.

Keep in mind that Dell officially supports (through tickets, Service Requests, and so on) the image and binary, but not the custom build.

Thanks for reading and stay tuned for future posts on Dell Storage and Kubernetes!

Author: Florian Coulombel


Read Full Blog
  • data storage
  • SRM
  • lab

Announcing: the New Dell SRM Hands on Lab

Dejan Stojanovic Dejan Stojanovic

Thu, 07 Apr 2022 14:26:51 -0000

|

Read Time: 0 minutes

We are happy to announce the release of the new SRM hands on lab:

  • SRM 4.7.0.0 - Visualize, Analyze and Optimize Data Center Infrastructure with Dell SRM

This new SRM hands on lab is based on the latest SRM release (4.7.0.0), which introduced many new features, enhancements, and newly supported platforms.

To find this lab, go to the demo center (https://democenter.delltechnologies.com) and enter “srm” in the search box. A link to the lab will appear.

 

Lab Info

The welcome screen of the lab includes a network diagram and a comprehensive lab guide.

 

In the first module, called “What’s New”, the lab focuses on the following new features, enhancements, and newly supported platforms:

  1. New features dialog
  2. Dell VxRail support
  3. MS Azure support
  4. Huawei Oceanstor support
  5. IBM FlashSystem support
  6. Chargeback trends reports
  7. Correlate performance data with alerts
  8. New business groups and operations dashboards
  9. Webhook API for auto ticketing
  10. In-context User Feedback

The rest of the modules cover in-depth SRM use-cases listed below. Each module is independent so that you can focus on your area of interest:

  • Configuration compliance
  • Workload analysis
  • Capacity planning
  • Performance troubleshooting
  • Chargeback reporting

and some of the main SRM features:

  • Topology and end-to-end relationships
  • Data extraction and automation tasks via REST API

Sample Reports

Check out some of the SRM dashboards available, such as the configuration compliance dashboard and the active alerts dashboard.

Supported Platforms

The lab includes a great variety of SRM reports containing data from supported vendors and technologies:

  • amazon s3
  • brocade
  • chargeback
  • cisco mds 
  • cisco ucs 
  • dell centera
  • dell datadomain
  • dell dpa
  • dell ecs
  • dell powerflex
  • dell powerscale 
  • dell powerstore
  • dell recoverpoint 
  • dell sc
  • dell unity/vnx
  • dell vmax/pmax
  • dell vplex 
  • dell vxrail 
  • dell xtremio
  • hitachi device manager
  • hp3par storeserv
  • hp storageworks
  • hpe-nimble
  • huawei 
  • ibm ds (3k/4k/5k)
  • ibm flashsystem
  • ibm-lpar
  • ibm-svc 
  • ibm-xiv
  • ms azure
  • ms-hyper-v 
  • ms sql server
  • netapp
  • oracle mysql
  • physical hosts
  • pure storage
  • vmware vsphere & vsan

 

To wrap up

The SRM 4.7.0.0 hands on lab helps you experience SRM use cases and features by browsing through the powerful user interface and exploring data from multiple vendors and technologies.

Enjoy the SRM hands on lab! If you have any questions, please contact us at support@democenter.dell.com.

Author: Dejan Stojanovic


Read Full Blog
  • Unity
  • PowerMax
  • containers
  • Kubernetes
  • PowerFlex
  • PowerStore
  • PowerScale

Looking Ahead: Dell Container Storage Modules 1.2

Florian Coulombel Florian Coulombel

Mon, 21 Mar 2022 14:42:31 -0000

|

Read Time: 0 minutes

The quarterly update for Dell CSI Drivers & Dell Container Storage Modules (CSM) is here! Here’s what we’re planning.

CSM Features

New CSM Operator!

Dell Container Storage Modules (CSM) add data services and features that are not in the scope of the CSI specification today. The new CSM Operator simplifies the deployment of CSMs. With an ever-growing ecosystem and added features, the deployment of a driver and its affiliated modules needs to be carefully planned before you begin.

The new CSM Operator:

  • Serves as a one-stop-shop for deploying all Dell CSI driver and Container Storage Modules 
  • Simplifies the install and upgrade operations
  • Leverages the Operator framework to give a clear status of the deployment of the resources
  • Is certified by Red Hat OpenShift

In the short to medium term, the CSM Operator will deprecate the experimental CSM Installer.

Replication support with PowerScale

For disaster recovery protection, PowerScale implements data replication between appliances by means of the SyncIQ feature. SyncIQ replicates the data between two sites, where one is read-write while the other is read-only, similar to Dell storage backends with async or sync replication.

The role of the CSM replication module and underlying CSI driver is to provision the volume within Kubernetes clusters and prepare the export configurations, quotas, and so on.

CSM Replication for PowerScale has been designed and implemented in such a way that it won’t collide with your existing Superna Eyeglass DR utility.

A live-action demo will be posted in the coming weeks on our VP YouTube channel: https://www.youtube.com/user/itzikreich/.

CSI features

Across the portfolio

In this release, each CSI driver receives the following updates:

fsGroupPolicy support

Kubernetes v1.19 introduced the fsGroupPolicy to give more control to the CSI driver over the permission sets in the securityContext.

There are three possible options: 

  • None -- which means that the fsGroup directive from the securityContext will be ignored 
  • File -- which means that the fsGroup directive will be applied on the volume. This is the default setting for NAS systems such as PowerScale or Unity-File.
  • ReadWriteOnceWithFSType -- which means that the fsGroup directive will be applied on the volume if it has fsType defined and is ReadWriteOnce. This is the default setting for block systems such as PowerMax and PowerStore-Block.

In all cases, Dell CSI drivers let kubelet perform the change ownership operations and do not do it at the driver level.
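
For illustration, here is roughly how the policy surfaces on the Kubernetes CSIDriver object. This is a sketch; the driver name shown is the PowerScale one, and your deployment tooling normally sets this field for you:

apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: csi-isilon.dellemc.com
spec:
  fsGroupPolicy: File  # None | File | ReadWriteOnceWithFSType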

Standalone Helm install

Drivers for PowerFlex and Unity can now be installed with the help of the install scripts we provide under the dell-csi-installer directory.

A standalone Helm chart helps to easily integrate the driver installation with the agent for Continuous Deployment like Flux or Argo CD.

Note: To ensure that you install the driver on a supported Kubernetes version, the Helm charts take advantage of the kubeVersion field. Some Kubernetes distributions use labels in kubectl version (such as v1.21.3-mirantis-1 and v1.20.7-eks-1-20-7) that require manual editing.
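
For reference, the constraint is expressed through the kubeVersion field of the chart's Chart.yaml; an illustrative sketch:

# Chart.yaml (illustrative excerpt)
kubeVersion: ">= 1.21.0-0 < 1.25.0-0"

Version strings with vendor suffixes (such as v1.21.3-mirantis-1) are treated as pre-releases by semantic versioning, which is why the -0 suffix and, in some cases, the manual editing mentioned above are needed.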

Volume Health Monitoring support

Drivers for PowerFlex and Unity implement Volume Health Monitoring.

This feature is currently in alpha in Kubernetes (in Q1-2022), and is disabled with a default installation.

Once enabled, the drivers will expose the standard storage metrics, such as capacity usage and inode usage, through the Kubernetes /metrics endpoint. The metrics will flow natively into popular dashboards like the ones built into OpenShift Monitoring.

Pave the way for full open source!

All Dell drivers and dependencies like gopowerstore, gobrick, and more are now on Github and will be fully open-sourced. The umbrella project is and remains https://github.com/dell/csm, from which you can open tickets and see the roadmap.

Google Anthos 1.9

The Dell partnership with Google continues, and the latest CSI drivers for PowerScale and PowerStore support Anthos v1.9.

NFSv4 POSIX and ACL support

Both CSI PowerScale and PowerStore now allow setting the default permissions for the newly created volume. To do this, you can use POSIX octal notation or ACL.

  • In PowerScale, you can use plain ACL or built-in values such as private_read, private, public_read, public_read_write, public or custom ones
  • In PowerStore, you can use the custom ones such as A::OWNER@:RWX, A::GROUP@:RWX, and A::OWNER@:rxtncy.


Author: Florian Coulombel

Read Full Blog
  • PowerMax
  • security
  • PowerStore
  • CloudIQ

PowerMax and PowerStore Cyber Security

Richard Pace Justin Bastin Richard Pace Justin Bastin

Tue, 15 Mar 2022 19:24:40 -0000

|

Read Time: 0 minutes

Dell Technologies takes a comprehensive approach to cyber resiliency and is committed to helping customers achieve their security objectives and requirements. Storage Engineering Technologists Richard Pace, Justin Bastin, and Derek Barboza worked together, cross-platform, to deliver three independent cyber security white papers for PowerMax, Mainframe, and PowerStore.

Each paper acts as a single point where customers can gain an understanding of the respective robust features and data services available to safeguard sensitive and mission-critical data in the event of a cybercrime. All three papers leverage CloudIQ and the CyberSecurity feature to provide customers insight into anomaly detection.

The following figure shows a CloudIQ anomaly that indicates unusual behavior in a customer’s environment:

Backed by CyberSecurity in CloudIQ, we can see how quickly CloudIQ detects the issue and provides the details for manual remediation.


Dell has an ingrained culture of security. We follow a 'shift-left' approach that ensures that security is baked into every process in the development life cycle. The Dell Secure Development Lifecycle (SDL) defines security controls based on industry standards that Dell product teams adopt while developing new features and functionality. Our SDL includes both analysis activities and prescriptive proactive controls around key risk areas.

Dell strives to help our customers minimize risk associated with security vulnerabilities in our products. Our goal is to provide customers with timely information, guidance, and mitigation options to address vulnerabilities. The Dell Product Security Incident Response Team (Dell PSIRT) is chartered and responsible for coordinating the response and disclosure for all product vulnerabilities that are reported to Dell. Dell employs a rigorous process to continually evaluate and improve our vulnerability response practices, and regularly benchmarks these against the rest of the industry. 

Authors: Richard Pace, Justin F. Bastin


Read Full Blog
  • containers
  • data storage
  • Kubernetes
  • CSI

CSI drivers 2.0 and Dell EMC Container Storage Modules GA!

Florian Coulombel Florian Coulombel

Thu, 14 Oct 2021 11:40:35 -0000

|

Read Time: 0 minutes

The quarterly update for Dell CSI Driver is here! But today marks a significant milestone because we are also announcing the availability of Dell EMC Container Storage Modules (CSM). Here’s what we’re covering in this blog:

Container Storage Modules

Dell Container Storage Modules is a set of modules that aims to extend Kubernetes storage features beyond what is available in the CSI specification.

The CSM modules expose enterprise storage features directly within Kubernetes, so developers can seamlessly leverage them in their deployments.

Most of these modules are released as sidecar containers that work with the CSI driver for the Dell storage array technology you use.

CSM modules are open source and freely available from https://github.com/dell/csm.

Volume Group Snapshot

Many stateful apps run on top of multiple volumes. For example, a transactional DB like Postgres can have one volume for its data and another for the redo log, Cassandra can be distributed across nodes with each node having its own volume, and so on.

When you want a recoverable snapshot of such an app, it is vital that the snapshots of all its volumes be taken consistently, at the exact same time.

Dell CSI Volume Group Snapshotter solves that problem for you. With the help of a CustomResourceDefinition, an additional sidecar to the Dell CSI drivers, and vanilla Kubernetes snapshots, you can manage the life cycle of crash-consistent snapshots. This means the drivers snapshot a whole group of volumes at the same instant, and you can restore or move the group in one shot!

To take a crash-consistent snapshot, you can either use labels on your PersistentVolumeClaims, or be explicit and pass the list of PVCs that you want to snap. For example:

apiVersion: volumegroup.storage.dell.com/v1alpha2
kind: DellCsiVolumeGroupSnapshot
metadata:
  # Name must be 13 characters or less in length
  name: "vg-snaprun1"
spec:
  driverName: "csi-vxflexos.dellemc.com"
  # Keep the underlying snapshots when the group is deleted
  memberReclaimPolicy: "Retain"
  volumesnapshotclass: "powerflex-snapclass"
  # Select member PVCs by label...
  pvcLabel: "vgs-snap-label"
  # ...or list them explicitly instead:
  # pvcList:
  #   - "pvcName1"
  #   - "pvcName2"

For the first release, CSI for PowerFlex supports Volume Group Snapshot.

Observability

The CSM Observability module is delivered as an OpenTelemetry agent that collects array-level metrics and exposes them for scraping into a Prometheus database.

The integration is as easy as creating a Prometheus ServiceMonitor. For example:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: otel-collector
  namespace: powerstore
spec:
  endpoints:
  # Scrape the collector's HTTPS metrics endpoint
  - path: /metrics
    port: exporter-https
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  # Match the CSM Observability OpenTelemetry collector service
  selector:
    matchLabels:
      app.kubernetes.io/instance: karavi-observability
      app.kubernetes.io/name: otel-collector

With the Observability module, you gain visibility into the capacity of the volumes you manage with Dell CSI drivers, as well as their performance in terms of bandwidth, IOPS, and response time.

Thanks to pre-canned Grafana dashboards, you can browse the history of these metrics and follow the topology from a Kubernetes PersistentVolume (PV) down to its backing LUN or file share on the array.

The Kubernetes admin can also collect array-level metrics to check overall capacity and performance directly from the familiar Prometheus/Grafana tools.

For the first release, Dell EMC PowerFlex and Dell EMC PowerStore support CSM Observability.

Replication

Each Dell storage array supports replication capabilities, whether asynchronous with an associated recovery point objective, synchronous between sites, or even active-active.

Each replication type serves a different purpose related to the use-case or the constraint you have on your data centers.

The Dell CSM replication module allows you to create a persistent volume of any of these three replication types (synchronous, asynchronous, or metro), assuming the underlying storage array supports it.

The Kubernetes architecture can build on a stretched cluster between two sites or on two or more independent clusters.  The module itself is composed of three main components:

  • The Replication controller whose role is to manage the CustomResourceDefinition that abstracts the concept of Replication across the Kubernetes cluster
  • The Replication sidecar for the CSI driver that will convert the Replication controller request to an actual call on the array side
  • The repctl utility, to simplify managing replication objects across multiple Kubernetes clusters

The usual workflow is to create a PVC that is replicated, using classic Kubernetes directives, by just picking the right StorageClass. You can then use repctl or edit the DellCSIReplicationGroup CRD to launch operations such as Failover, Failback, Reprotect, Suspend, and Synchronize.
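
As a minimal sketch rather than the definitive syntax, the pair of objects below shows the idea: a StorageClass flagged for replication and a PVC that simply references it. The StorageClass name and the replication.storage.dell.com/* parameter names are assumptions to check against your CSM replication deployment.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: powerstore-replicated
provisioner: csi-powerstore.dellemc.com
parameters:
  # Assumed replication parameters; verify the exact names in the CSM docs
  replication.storage.dell.com/isReplicationEnabled: "true"
  replication.storage.dell.com/remoteStorageClassName: "powerstore-replicated"
  replication.storage.dell.com/remoteClusterID: "target-cluster"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pg-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  # Picking the replication-enabled StorageClass is all the workflow requires
  storageClassName: powerstore-replicated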

For the first release, Dell EMC PowerMax and Dell EMC PowerStore support CSM Replication.

Authorization

With CSM Authorization we are giving back more control of storage consumption to the storage administrator.

The authorization module is an independent service, installed and owned by the storage admin.

Within that module, the storage administrator will create access control policies and storage quotas to make sure that Kubernetes consumers are not overconsuming storage or trying to access data that doesn’t belong to them.

CSM Authorization makes multi-tenant architecture real by enforcing Role-Based Access Control on storage objects coming from multiple and independent Kubernetes clusters.

The authorization module acts as a proxy between the CSI driver and the backend array. Access is granted with an access token that can be revoked at any point in time. Quotas can be changed on the fly to limit or increase storage consumption from the different tenants.

For the first release, Dell EMC PowerMax and Dell EMC PowerFlex support CSM Authorization.

Resiliency

When dealing with stateful applications, if a node goes down, vanilla Kubernetes is pretty conservative.

Indeed, from the Kubernetes control plane, the failing node is seen as NotReady. That can be because the node is down, because of network partitioning between the control plane and the node, or simply because the kubelet is down. In the latter two scenarios, the stateful application is still running and possibly writing data to disk. Therefore, Kubernetes won’t take action and lets the admin manually trigger a Pod deletion if desired.

The CSM Resiliency module (sometimes named PodMon) aims to improve that behavior with the help of collected metrics from the array.

Because the driver has access to the storage backend from pretty much all the nodes, it can see the volume status (mapped or not) and its activity (whether there are IOPS or not). So when a node goes into the NotReady state and no IOPS are seen on the volume, Resiliency relocates the Pod to a new node and cleans up whatever leftover objects might exist.

The entire process, from the moment a node is seen as down to the rescheduling of the Pod, happens in seconds.

To protect an app with the Resiliency module, you only have to add the podmon.dellemc.com/driver label to it.
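
For example, here is a sketch of a StatefulSet protected by Resiliency. The label key comes from this post; the label value (a driver name) and the rest of the manifest are illustrative assumptions to adapt to your environment.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-db
spec:
  serviceName: my-db
  replicas: 1
  selector:
    matchLabels:
      app: my-db
  template:
    metadata:
      labels:
        app: my-db
        # Opt this Pod into CSM Resiliency (PodMon) monitoring;
        # the value is an assumed driver name, so match it to your CSI driver
        podmon.dellemc.com/driver: csi-vxflexos
    spec:
      containers:
      - name: db
        image: postgres:14
        env:
        - name: POSTGRES_PASSWORD
          value: "demo-only"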

For more details on the module’s design, you can check the documentation here.

For the first release, Dell EMC PowerFlex and Dell EMC Unity support CSM Resiliency.

Installer

Each module above is released either as an independent Helm chart or as an option within the CSI drivers.

For more complex deployments, which may involve multiple Kubernetes clusters or a mix of modules, it is possible to use the CSM Installer.

The CSM Installer, built on top of Carvel, gives the user a single command line to create their CSM-CSI applications and to manage them from outside the Kubernetes cluster.

For the first release, all drivers and modules support the CSM Installer.

New CSI features

Across the portfolio

For each driver, this release provides:

  • Support for OpenShift 4.8
  • Support for Kubernetes 1.22
  • Support for Rancher Kubernetes Engine 2
  • Normalized configurations between drivers
  • Dynamic Logging Configuration
  • New CSM installer

VMware Tanzu Kubernetes Grid

VMware Tanzu offers storage management by means of its CNS-CSI driver, but it doesn’t support ReadWriteMany access mode.

If your workload needs concurrent access to the same filesystem, you can now rely on the CSI drivers for PowerStore, PowerScale, and Unity through the NFS protocol. The three platforms are officially supported and qualified on Tanzu.
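
As a minimal sketch, a ReadWriteMany claim against an NFS-backed StorageClass could look like the following; the StorageClass name is a placeholder for whatever your driver installation defines.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  # ReadWriteMany lets several Pods mount the same NFS volume concurrently
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  # Placeholder name; use the NFS StorageClass created for your driver
  storageClassName: powerstore-nfs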

NFS behind NAT

The NFS drivers for PowerStore, PowerScale, and Unity have all been tested and work when the Kubernetes cluster sits behind a NAT on a private network.

PowerScale

By default, the CSI driver creates volumes with 777 POSIX permissions on the directory.

Now, with the isiVolumePathPermissions parameter, you can use ACLs or whatever POSIX rights suit your needs instead.

The isiVolumePathPermissions parameter can be configured as part of the ConfigMap with the PowerScale settings, or at the StorageClass level. The accepted values are private_read, private, public_read, public_read_write, and public for an ACL, or any POSIX octal mode.
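
As a sketch of the StorageClass-level option, using one of the built-in ACL values (the provisioner name is an assumption; check your CSI PowerScale installation):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: powerscale-private-read
# Provisioner name is an assumption; check your CSI PowerScale installation
provisioner: csi-isilon.dellemc.com
reclaimPolicy: Delete
parameters:
  # Built-in ACL value applied to each newly created volume directory
  isiVolumePathPermissions: "private_read"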

Useful links

For more details you can:

Author: Florian Coulombel

 
