The OpenManage collection

Dell OpenManage Enterprise (OME) is a fleet management tool for managing and monitoring Dell PowerEdge server infrastructure. Dell recently announced OME 4.0, which adds a long list of new functionality that my colleague Mark detailed in another blog. Here, we'll explore how to automate inventory management of devices managed by OME using Ansible.

Prerequisites

Before we get started, ensure you have Ansible and Python installed on your system. Additionally, you will need to install Dell’s openmanage Ansible collection from Ansible Galaxy using the following command:

ansible-galaxy collection install dellemc.openmanage
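The fault-extraction tasks later in this blog use the json_query filter, which ships in the community.general collection and needs the jmespath Python library on the control node. As a sketch, both collections could be pinned in a requirements file (the file name requirements.yml is a common convention, not something this blog's repo is known to use):

```yaml
# requirements.yml (illustrative; install with:
#   ansible-galaxy collection install -r requirements.yml)
collections:
  - name: dellemc.openmanage   # OME and iDRAC/Redfish modules
  - name: community.general    # provides the json_query filter
```

The jmespath library itself is a Python package and is installed separately, for example with pip.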

The source code and examples for the openmanage collection can be found on GitHub as well. Note that this collection includes modules and roles for iDRAC/Redfish interfaces as well as modules for OpenManage Enterprise with complete fleet management workflows. In this blog, we will look at examples from the OME modules within the collection.

Figure 1. Dell openmanage Ansible modules on GitHub

Inventory management workflows

Inventory management typically involves gathering details such as which devices are under management and their health status. The dellemc.openmanage.ome_device_info module is the best starting point because it returns most of this information. Let’s dig into some tasks to retrieve it.

Retrieve basic inventory

This task retrieves basic inventory information for all devices managed by OME:

   - name: Retrieve basic inventory of all devices
     dellemc.openmanage.ome_device_info:
       hostname: "{{ hostname }}"
       username: "{{ username }}"
       password: "{{ password }}"
       validate_certs: false
     register: device_info_result
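The tasks in this blog are shown as fragments. A minimal sketch of a playbook that could wrap them is shown below. It assumes the play runs on the control node and that the OME address and credentials come from environment variables; the variable names OME_HOSTNAME, OME_USERNAME, and OME_PASSWORD are hypothetical and not required by the collection:

```yaml
---
# Sketch of a wrapper playbook. The environment variable names are
# assumptions for illustration only.
- name: OME inventory management
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    hostname: "{{ lookup('ansible.builtin.env', 'OME_HOSTNAME') }}"
    username: "{{ lookup('ansible.builtin.env', 'OME_USERNAME') }}"
    password: "{{ lookup('ansible.builtin.env', 'OME_PASSWORD') }}"
  tasks:
    - name: Retrieve basic inventory of all devices
      dellemc.openmanage.ome_device_info:
        hostname: "{{ hostname }}"
        username: "{{ username }}"
        password: "{{ password }}"
        validate_certs: false
      register: device_info_result
```

Keeping credentials out of the playbook file (environment variables here, or Ansible Vault) avoids committing them to source control.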

Once we have the output of this captured in a variable like device_info_result, we can drill down into the object to retrieve data like the number of servers and their service tags and print such information using the debug task:

   - name: Device count
     debug:
       msg: "Number of devices: {{ device_info_result.device_info.value | length }}"

   # default([]) lets the first iteration run before service_tags exists
   - name: Get all service tags
     set_fact:
       service_tags: "{{ service_tags | default([]) + [item.DeviceServiceTag] }}"
     loop: "{{ device_info_result.device_info.value }}"
     no_log: true

   - name: List service tags of devices
     debug:
       msg: "{{ service_tags }}"
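As an alternative sketch, the same list can be built in a single task without a loop by using the standard Jinja2 map filter. It assumes the same device_info_result structure used above:

```yaml
# Loop-free alternative: pull DeviceServiceTag out of every device entry.
- name: Get all service tags in one step
  ansible.builtin.set_fact:
    service_tags: "{{ device_info_result.device_info.value | map(attribute='DeviceServiceTag') | list }}"
```

This avoids one task result per device in the play output, which can matter for a large fleet.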

Note that device_info_result is a huge object. To view all the information that is available to extract, write the contents of the variable to a JSON file:

   - name: Save device_info to a file
     copy:
       content: "{{ device_info_result | to_nice_json }}"
       dest: "./output-json/device_info_result.json"
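The copy task expects the ./output-json directory to exist already. If it might not, a task like the following sketch, placed before the copy task, creates it first:

```yaml
# Create the output directory used by the "Save ... to a file" tasks.
- name: Ensure the output directory exists
  ansible.builtin.file:
    path: "./output-json"
    state: directory
    mode: "0755"
```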

Subsystem health

Subsystem health is another extremely granular body of information. It is not returned by the module's default task. To get this data, we need to explicitly set the fact_subset option to subsystem_health. The following task retrieves subsystem health information for devices identified by their service tags. We pass the entire array of service tags to get all the information at once:

   - name: Retrieve subsystem health of specified devices identified by service tags
     dellemc.openmanage.ome_device_info:
       hostname: "{{ hostname }}"
       username: "{{ username }}"
       password: "{{ password }}"
       validate_certs: false
       fact_subset: "subsystem_health"
       system_query_options:
         device_service_tag: "{{ service_tags }}"
     register: health_info_result

Using the register directive, we loaded the subsystem health information into the variable health_info_result. Once again, we recommend writing this information to a JSON file using the following code in order to see the level of granularity that you can extract:

   - name: Save device health info to a file
     copy:
       content: "{{ health_info_result | to_nice_json }}"
       dest: "./output-json/health_info_result.json"

To identify device health issues, we loop through all the devices in the service_tags variable and check whether any faults are reported for each device. When faults are found, we append an entry to a list variable, shown as inventory_issues in the following code. Each entry is a dictionary with three fields: the service tag, the fault summary, and the fault list. Note that the fault list itself is an array containing all the faults for the device:

   # The loop item is the service tag, so it can index the health data directly.
   # default([]) lets the first iteration run before inventory_issues exists.
   - name: Gather information for devices with issues
     set_fact:
       inventory_issues: >-
         {{
           (inventory_issues | default([])) +
           [{
             'service_tag': item,
             'fault_summary': health_info_result['device_info']['device_service_tag'][item]['value'] | json_query('[?FaultSummaryList].FaultSummaryList[]'),
             'fault_list': health_info_result['device_info']['device_service_tag'][item]['value'] | json_query('[?FaultList].FaultList[]')
           }]
         }}
     loop: "{{ service_tags }}"
     when: "(health_info_result['device_info']['device_service_tag'][item]['value'] | json_query('[?FaultList].FaultList[]') | length) > 0"
     no_log: true
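Before drilling further, it can help to confirm which devices were flagged. A small sketch, assuming inventory_issues has the structure built in the task above:

```yaml
# Print only the service tags of devices that reported at least one fault.
- name: List service tags of devices with faults
  ansible.builtin.debug:
    msg: "{{ inventory_issues | default([]) | map(attribute='service_tag') | list }}"
```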

In the next task, we loop through the devices with issues and gather more detailed fault information for each. The tasks that perform this extraction live in an external task file named device_issues.yml, which is run once for every entry in the inventory_issues list. Note that we pass device_item and device_index as variables for each iteration of device_issues.yml:

   - name: Gather fault details
     include_tasks: device_issues.yml
     vars:
       device_item: "{{ item }}"
       device_index: "{{ index }}"
     loop: "{{ inventory_issues }}"
     loop_control:
       index_var: index
     no_log: true

Within device_issues.yml, we first initialize a dictionary variable that gathers information about the faults for the device. The variable has fields for the subsystem, the fault messages, and the recommended actions:

- name: Initialize specifics structure
  set_fact:
    current_device:
      service_tag: ''
      subsystem: []
      Faults: []
      Recommendations: []

We loop through all the faults for the device and populate the fields of the dictionary variable:

- name: Assign fault specifics
  set_fact:
    current_device:
      service_tag: "{{ device_item.service_tag }}"
      # Carried over unchanged so the key is not dropped when the fact is rebuilt
      subsystem: "{{ current_device.subsystem }}"
      Faults: "{{ current_device.Faults + [fault.Message] }}"
      Recommendations: "{{ current_device.Recommendations + [fault.RecommendedAction] }}"
  # default([]) keeps the loop from failing when fault_list is missing
  loop: "{{ device_item.fault_list | default([]) }}"
  loop_control:
    loop_var: fault
  no_log: true

We then append to a global variable that is aggregating the information for all the devices:

- name: Append current device to fault_details
  set_fact:
    fault_details: "{{ fault_details | default([]) + [current_device] }}"

Back in the main playbook, once all the information is captured in fault_details, we can print the details we need or store them in a file:

    - name: Print fault details
      debug:
        msg: "Fault details: {{ item.Faults }}"
      loop: "{{ fault_details }}"
      loop_control:
        label: "{{ item.service_tag }}"

    - name: Print recommendations
      debug:
        msg: "Recommended actions: {{ item.Recommendations }}"
      loop: "{{ fault_details }}"
      loop_control:
        label: "{{ item.service_tag }}"
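To store the aggregated faults rather than only print them, the same copy pattern used earlier for the raw results applies. In this sketch the file name fault_details.json is an assumption, and the ./output-json directory must already exist:

```yaml
# Write the aggregated per-device fault details to a JSON file.
- name: Save fault details to a file
  ansible.builtin.copy:
    content: "{{ fault_details | default([]) | to_nice_json }}"
    dest: "./output-json/fault_details.json"
```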

Check out the following video to see how the different steps of the workflow are run:

Conclusion

To recap, we looked at the various information gathering tasks within inventory management of a large PowerEdge server footprint. Note that I used health information objects to demonstrate how to drill down to find the information you need. However, you can do this with any fact subset that can be retrieved using the dellemc.openmanage.ome_device_info module. You can find the code from this blog on GitHub as part of this Automation examples repo.
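As a sketch of the same pattern with a different fact subset, the following task requests detailed inventory for the same devices. It assumes the service_tags list built earlier; the register name detailed_info_result is hypothetical:

```yaml
# Same module, different fact subset: detailed inventory by service tag.
- name: Retrieve detailed inventory of specified devices
  dellemc.openmanage.ome_device_info:
    hostname: "{{ hostname }}"
    username: "{{ username }}"
    password: "{{ password }}"
    validate_certs: false
    fact_subset: "detailed_inventory"
    system_query_options:
      device_service_tag: "{{ service_tags }}"
  register: detailed_info_result
```

Writing detailed_info_result to a JSON file, as done for the other results, is a quick way to see which fields are available before writing any queries against it.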

Author: Parasar Kodati, Engineering Technologist, Dell ISG
