Dell OpenManage Enterprise Operations with Ansible Part 1: Inventory Management
Tue, 14 Nov 2023 14:36:09 -0000
The OpenManage collection
Dell OpenManage Enterprise (OME) is a powerful fleet management tool for managing and monitoring Dell PowerEdge server infrastructure. Very recently, Dell announced OME 4.0, complete with a litany of new functionality that my colleague Mark detailed in another blog. Here, we'll explore how to automate inventory management of devices managed by OME using Ansible.
Prerequisites
Before we get started, ensure you have Ansible and Python installed on your system. Additionally, you will need to install Dell’s openmanage Ansible collection from Ansible Galaxy using the following command:
ansible-galaxy collection install dellemc.openmanage
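If you would rather track the dependency in a file, ansible-galaxy can also install collections from a requirements file. Here is a minimal sketch, assuming a file named requirements.yml (a common convention, not necessarily what the repo for this blog uses):

```yaml
# requirements.yml (assumed file name)
collections:
  - name: dellemc.openmanage
```

You would then install it with ansible-galaxy collection install -r requirements.yml.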
The source code and examples for the openmanage collection can be found on GitHub as well. Note that this collection includes modules and roles for iDRAC/Redfish interfaces as well as modules for OpenManage Enterprise with complete fleet management workflows. In this blog, we will look at examples from the OME modules within the collection.
Figure 1. Dell openmanage Ansible modules on GitHub
Inventory management workflows
Inventory management typically involves gathering details such as which devices are under management, their health information, and so on. The dellemc.openmanage.ome_device_info module retrieves most of this information. Let’s dig into some tasks to get it.
Retrieve basic inventory
This task retrieves basic inventory information for all devices managed by OME:
- name: Retrieve basic inventory of all devices
  dellemc.openmanage.ome_device_info:
    hostname: "{{ hostname }}"
    username: "{{ username }}"
    password: "{{ password }}"
    validate_certs: no
  register: device_info_result
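The task above expects hostname, username, and password to be defined somewhere. One way to supply them is a small wrapper playbook, sketched below. The play name, the OME address, the user name, and the OME_PASSWORD environment variable are all placeholders chosen for illustration, not values from this blog's repo:

```yaml
# Hypothetical wrapper playbook: every value under vars is a placeholder.
- name: OME inventory management
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    hostname: "ome.example.com"
    username: "admin"
    # Read the password from the environment instead of hard-coding it.
    password: "{{ lookup('ansible.builtin.env', 'OME_PASSWORD') }}"
  tasks:
    - name: Retrieve basic inventory of all devices
      dellemc.openmanage.ome_device_info:
        hostname: "{{ hostname }}"
        username: "{{ username }}"
        password: "{{ password }}"
        validate_certs: no
      register: device_info_result
```

The OME modules talk to the appliance over its REST API, so running the play against localhost is usually sufficient.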
Once the output is captured in a variable such as device_info_result, we can drill down into the object to retrieve data such as the number of servers and their service tags, and print it with the debug module:
- name: Device count
  debug:
    msg: "Number of devices: {{ device_info_result.device_info.value | length }}"

# default([]) covers the first iteration, when service_tags may not be defined yet
- name: get all service tags
  set_fact:
    service_tags: "{{ (service_tags | default([])) + [item.DeviceServiceTag] }}"
  loop: "{{ device_info_result.device_info.value }}"
  no_log: true

- name: List service tags of devices
  debug:
    msg: "{{ service_tags }}"
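As an alternative to the loop, the same list can probably be built in a single expression with the Jinja2 map filter. This is a sketch, assuming every entry in device_info_result.device_info.value carries a DeviceServiceTag field, as the loop version also assumes:

```yaml
# Alternative sketch: collect the service tags without a loop.
- name: Get all service tags in one expression
  set_fact:
    service_tags: "{{ device_info_result.device_info.value | map(attribute='DeviceServiceTag') | list }}"
```

Because this version assigns the list in one step, it does not depend on service_tags having been initialized earlier.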
Note that device_info_result is a huge object. To view all the information that is available to extract, write the contents of the variable to a JSON file:
- name: Save device_info to a file
  copy:
    content: "{{ device_info_result | to_nice_json }}"
    dest: "./output-json/device_info_result.json"
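The copy task will typically fail if the ./output-json directory does not exist yet, so it can help to create the directory first. A minimal sketch using the built-in file module (the mode value is an assumption; adjust it to your environment), to be placed before the copy task:

```yaml
# Run this before any task that writes into ./output-json.
- name: Make sure the output directory exists
  ansible.builtin.file:
    path: "./output-json"
    state: directory
    mode: "0755"
```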
Subsystem health
Subsystem health is another extremely granular body of information, and the module does not return it by default. To get this data, we need to explicitly set the fact_subset option to subsystem_health. The following task retrieves subsystem health information for devices identified by their service tags. We pass the entire array of service tags to get all the information at once:
- name: Retrieve subsystem health of specified devices identified by service tags.
  dellemc.openmanage.ome_device_info:
    hostname: "{{ hostname }}"
    username: "{{ username }}"
    password: "{{ password }}"
    validate_certs: no
    fact_subset: "subsystem_health"
    system_query_options:
      device_service_tag: "{{ service_tags }}"
  register: health_info_result
Using the register directive, we loaded the subsystem health information into the variable health_info_result. Once again, we recommend writing this information to a JSON file using the following code in order to see the level of granularity that you can extract:
- name: Save device health info to a file
  copy:
    content: "{{ health_info_result | to_nice_json }}"
    dest: "./output-json/health_info_result.json"
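The same pattern should carry over to the module's other fact subsets. The sketch below assumes the detailed_inventory subset and the device_id query option behave as described in the collection's module documentation. The device IDs and the detailed_info_result variable name are placeholders:

```yaml
# Sketch only: the device IDs below are placeholders, not real devices.
- name: Retrieve detailed inventory of specific devices by ID
  dellemc.openmanage.ome_device_info:
    hostname: "{{ hostname }}"
    username: "{{ username }}"
    password: "{{ password }}"
    validate_certs: no
    fact_subset: "detailed_inventory"
    system_query_options:
      device_id:
        - 11111
        - 22222
  register: detailed_info_result
```

As with the health data, writing detailed_info_result to a JSON file is a quick way to see which fields are available.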
To identify device health issues, we loop through all the devices in the service_tags variable and check whether any faults are reported for each device. When faults are found, we append the fault information to a list variable, shown as inventory_issues in the following code. Each entry in the list is a dictionary with three fields: the service tag, the fault summary, and the fault list. Note that the fault list is itself an array containing all the faults for the device:
# default([]) covers the first match, when inventory_issues may not be defined yet
- name: Gather information for devices with issues
  set_fact:
    inventory_issues: >
      {{ (inventory_issues | default([])) + [{
        'service_tag': item,
        'fault_summary': health_info_result['device_info']['device_service_tag'][service_tags[index]]['value'] | json_query('[?FaultSummaryList].FaultSummaryList[]'),
        'fault_list': health_info_result['device_info']['device_service_tag'][service_tags[index]]['value'] | json_query('[?FaultList].FaultList[]')
      }] }}
  loop: "{{ service_tags }}"
  when: "(health_info_result['device_info']['device_service_tag'][service_tags[index]]['value'] | json_query('[?FaultList].FaultList[]') | length) > 0"
  loop_control:
    index_var: index
  no_log: true
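Before drilling further, a quick sanity check on what was collected can be useful. The task below is a sketch: the task name is made up for illustration, and default([]) guards against the case where no device reported a fault and inventory_issues was never set:

```yaml
# Sketch: summarize which devices ended up in inventory_issues.
- name: Summarize devices with issues
  debug:
    msg: >-
      {{ inventory_issues | default([]) | length }} device(s) reporting faults:
      {{ inventory_issues | default([]) | map(attribute='service_tag') | list }}
```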
In the next task, we loop through the devices with issues and gather more detailed fault information for each. The tasks that perform this extraction live in an external task file named device_issues.yml, which is run for every entry in the inventory_issues list. Note that we pass device_item and device_index as variables on each iteration of device_issues.yml:
- name: Gather fault details
  include_tasks: device_issues.yml
  vars:
    device_item: "{{ item }}"
    device_index: "{{ index }}"
  loop: "{{ inventory_issues }}"
  loop_control:
    index_var: index
  no_log: true
Within device_issues.yml, we first initialize a dictionary variable to hold the fault information for the device. It has fields for the service tag, subsystem, fault messages, and recommended actions:
- name: Initialize specifics structure
  set_fact:
    current_device: { 'service_tag': '', 'subsystem': [], 'Faults': [], 'Recommendations': [] }
We loop through all the faults for the device and populate the fields of the dictionary variable:
- name: Assign fault specifics
  set_fact:
    current_device:
      service_tag: "{{ device_item.service_tag }}"
      Faults: "{{ current_device.Faults + [fault.Message] }}"
      Recommendations: "{{ current_device.Recommendations + [fault.RecommendedAction] }}"
  loop: "{{ device_item.fault_list }}"
  loop_control:
    loop_var: fault
  when: device_item.fault_list is defined
  no_log: true
We then append to a global variable that is aggregating the information for all the devices:
# default([]) covers the first device, when fault_details may not be defined yet
- name: Append current device to all_faults
  set_fact:
    fault_details: "{{ (fault_details | default([])) + [current_device] }}"
Back in the main playbook, once all the information is captured in fault_details, we can print whatever we need or store it in a file:
- name: Print fault details
  debug:
    msg: "Fault details: {{ item.Faults }}"
  loop: "{{ fault_details }}"
  loop_control:
    label: "{{ item.service_tag }}"

- name: Print recommendations
  debug:
    msg: "Recommended actions: {{ item.Recommendations }}"
  loop: "{{ fault_details }}"
  loop_control:
    label: "{{ item.service_tag }}"
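To keep the aggregated results rather than just print them, the same copy pattern used earlier for the raw module output should work here as well. A sketch, assuming the ./output-json directory exists; the fault_details.json file name is a placeholder:

```yaml
# Sketch: persist the aggregated fault details next to the raw JSON dumps.
- name: Save fault details to a file
  copy:
    content: "{{ fault_details | default([]) | to_nice_json }}"
    dest: "./output-json/fault_details.json"
```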
Check out the following video to see how the different steps of the workflow are run:
Conclusion
To recap, we looked at the various information-gathering tasks involved in inventory management of a large PowerEdge server footprint. Note that I used health information objects to demonstrate how to drill down to the information you need; however, you can do the same with any fact subset that can be retrieved using the dellemc.openmanage.ome_device_info module. You can find the code from this blog on GitHub as part of this Automation examples repo.
Author: Parasar Kodati, Engineering Technologist, Dell ISG