OneFS SupportAssist Architecture and Operation
Fri, 21 Apr 2023 16:41:36 -0000
|Read Time: 0 minutes
The previous article in this series looked at an overview of OneFS SupportAssist. Now, we’ll turn our attention to its core architecture and operation.
Under the hood, SupportAssist relies on the following infrastructure and services:
Service | Name |
ESE | Embedded Service Enabler. |
isi_rice_d | Remote Information Connectivity Engine (RICE). |
isi_crispies_d | Coordinator for RICE Incidental Service Peripherals including ESE Start. |
Gconfig | OneFS centralized configuration infrastructure. |
MCP | Master Control Program – starts, monitors, and restarts OneFS services. |
Tardis | Configuration service and database. |
Transaction journal | Task manager for RICE. |
Of these, ESE, isi_crispies_d, isi_rice_d, and the Transaction Journal are new in OneFS 9.5 and exclusive to SupportAssist. By contrast, Gconfig, MCP, and Tardis are all legacy services that are used by multiple other OneFS components.
The Remote Information Connectivity Engine (RICE) represents the new SupportAssist ecosystem for OneFS to connect to the Dell backend. The high level architecture is as follows:
Dell’s Embedded Service Enabler (ESE) is at the core of the connectivity platform and acts as a unified communications broker between the PowerScale cluster and Dell Support. ESE runs as a OneFS service and, on startup, looks for an on-premises gateway server. If none is found, it connects back to the connectivity pipe (SRS). The collector service then interacts with ESE to send telemetry, obtain upgrade packages, transmit alerts and events, and so on.
Depending on the available resources, ESE provides a base functionality with additional optional capabilities to enhance serviceability. ESE is multithreaded, and each payload type is handled by specific threads. For example, events are handled by event threads, binary and structured payloads are handled by web threads, and so on. Within OneFS, ESE gets installed to /usr/local/ese and runs as ‘ese’ user and group.
The responsibilities of isi_rice_d include listening for network changes, getting eligible nodes elected for communication, monitoring notifications from CRISPIES, and engaging Task Manager when ESE is ready to go.
The Task Manager is a core component of the RICE engine. Its responsibility is to watch the incoming tasks that are placed into the journal and to assign workers to step through the tasks until completion. It controls the resource utilization (Python threads) and distributes tasks that are waiting on a priority basis.
The ‘isi_crispies_d’ service exists to ensure that ESE is only running on the RICE active node, and nowhere else. It acts, in effect, like a specialized MCP just for ESE and RICE-associated services, such as IPA. This entails starting ESE on the RICE active node, re-starting it if it crashes on the RICE active node, and stopping it and restarting it on the appropriate node if the RICE active instance moves to another node. We are using ‘isi_crispies_d’ for this, and not MCP, because MCP does not support a service running on only one node at a time.
The core responsibilities of ‘isi_crispies_d’ include:
- Starting and stopping ESE on the RICE active node
- Monitoring ESE and restarting, if necessary. ‘isi_crispies_d’ restarts ESE on the node if it crashes. It will retry a couple of times and then notify RICE if it’s unable to start ESE.
- Listening for gconfig changes and updating ESE. Stopping ESE if unable to make a change and notifying RICE.
- Monitoring other related services.
The state of ESE, and of other RICE service peripherals, is stored in the OneFS tardis configuration database so that it can be checked by RICE. Similarly, ‘isi_crispies_d’ monitors the OneFS Tardis configuration database to see which node is designated as the RICE ‘active’ node.
The ‘isi_telemetry_d’ daemon is started by MCP and runs when SupportAssist is enabled. It does not have to be running on the same node as the active RICE and ESE instance. Only one instance of ‘isi_telemetry_d’ will be active at any time, and the other nodes will be waiting for the lock.
You can query the current status and setup of SupportAssist on a PowerScale cluster by using the ‘isi supportassist settings view’ CLI command. For example:
# isi supportassist settings view Service enabled: Yes Connection State: enabled OneFS Software ID: ELMISL08224764 Network Pools: subnet0:pool0 Connection mode: direct Gateway host: - Gateway port: - Backup Gateway host: - Backup Gateway port: - Enable Remote Support: Yes Automatic Case Creation: Yes Download enabled: Yes
You can also do this from the WebUI by navigating to Cluster management > General settings > SupportAssist:
You can enable or disable SupportAssist by using the ‘isi services’ CLI command set. For example:
# isi services isi_supportassist disable The service 'isi_supportassist' has been disabled. # isi services isi_supportassist enable The service 'isi_supportassist' has been enabled. # isi services -a | grep supportassist isi_supportassist SupportAssist Monitor Enabled
You can check the core services, as follows:
# ps -auxw | grep -e 'rice' -e 'crispies' | grep -v grep root 8348 9.4 0.0 109844 60984 - Ss 22:14 0:00.06 /usr/libexec/isilon/isi_crispies_d /usr/bin/isi_crispies_d root 8183 8.8 0.0 108060 64396 - Ss 22:14 0:01.58 /usr/libexec/isilon/isi_rice_d /usr/bin/isi_rice_d
Note that when a cluster is provisioned with SupportAssist, ESRS can no longer be used. However, customers that have not previously connected their clusters to Dell Support can still provision ESRS, but will be presented with a message encouraging them to adopt the best practice of using SupportAssist.
Additionally, SupportAssist in OneFS 9.5 does not currently support IPv6 networking, so clusters deployed in IPv6 environments should continue to use ESRS until SupportAssist IPv6 integration is introduced in a future OneFS release.
Author: Nick Trimbee
Related Blog Posts
OneFS Log Gather Transmission
Wed, 17 Apr 2024 15:45:51 -0000
|Read Time: 0 minutes
The OneFS isi_gather_info utility is the ubiquitous method for collecting and uploading a PowerScale cluster’s context and configuration to assist with the identification and resolution of bugs and issues. As such, it performs the following roles:
- Executes many commands, scripts, and utilities on a cluster, and saves their results
- Collates, or gathers, all these files into a single ‘gzipped’ package
- Optionally transmits this log gather package back to Dell using a choice of several transport methods
By default, a log gather tarfile is written to the /ifs/data/Isilon_Support/pkg/ directory. It can also be uploaded to Dell by the following means:
Upload mechanism | Description | TCP port | OneFS release support |
SupportAssist / ESRS | Uses Dell Secure Remote Support (SRS) for gather upload. | 443/8443 | Any |
FTP | Use FTP to upload the completed gather. | 21 | Any |
FTPS | Use SSH-based encrypted FTPS to upload the gather. | 22 | Default in OneFS 9.5 and later |
HTTP | Use HTTP to upload the gather. | 80/443 | Any |
As indicated in this table, OneFS 9.5 and later releases now leverage FTPS as the default option for FTP upload, thereby protecting the upload of cluster configuration and logs with an encrypted transmission session.
Under the hood, the log gather process comprises an eight phase workflow, with transmission comprising the penultimate ‘Upload’ phase:
The details of each phase are as follows:
Phase | Description |
1. Setup | Reads from the arguments passed in, and from any config files on disk, and sets up the config dictionary, which will be used throughout the rest of the codebase. Most of the code for this step is contained in isilon/lib/python/gather/igi_config/configuration.py. This is also the step in which the program is most likely to exit, if some config arguments end up being invalid. |
2. Run local | Executes all the cluster commands, which are run on the same node that is starting the gather. All these commands run in parallel (up to the current parallelism value). This is typically the second longest running phase. |
3. Run nodes | Executes the node commands across all of the cluster’s nodes. This runs on each node, and while these commands run in parallel (up to the current parallelism value), they do not run in parallel with the ‘Run local’ step. |
4. Collect | Ensures that all of the results end up on the overlord node (the node that started the gather). If the gather is using /ifs, it is very fast; if it is not using /ifs, it needs to SCP all the node results to a single node. |
5. Generate Extra Files | Generates nodes_info.xml and package_info.xml. These two files are present in every gather, and provide important metadata about the cluster. |
6. Packing | Packs (tars and gzips) all the results. This is typically the longest running phase, often by an order of magnitude. |
7. Upload | Transports the tarfile package to its specified destination using SupportAssist, ESRS, FTPS, FTP, HTTP, and so on. Depending on the geographic location, this phase might also be lengthy. |
8. Cleanup | Cleans up any intermediary files that were created on the cluster. This phase will run even if the gather fails, or is interrupted. |
Because the isi_gather_info tool is primarily intended for troubleshooting clusters with issues, it runs as root (or compadmin in compliance mode), because it needs to be able to execute under degraded conditions (such as without GMP, during upgrade, and under cluster splits, and so on). Given these atypical requirements, isi_gather_info is built as a standalone utility, rather than using the platform API for data collection.
While FTPS is the new default and recommended transport, the legacy plaintext FTP upload method is still available in OneFS 9.5 and later. As such, Dell’s log server, ftp.isilon.com, also supports both encrypted FTPS and plaintext FTP, so will not impact older release FTP log upload behavior.
This OneFS 9.5 FTPS security enhancement encompasses three primary areas where an FTPS option is now supported:
- Directly executing the /usr/bin/isi_gather_info utility
- Running using the isi diagnostics gather CLI command set
- Creating a diagnostics gather through the OneFS WebUI
For the isi_gather_info utility, two new options are included in OneFS 9.5 and later releases:
New option for isi_gather_info | Description | Default value |
--ftp-insecure | Enables the gather to use unencrypted FTP transfer. | False |
--ftp-ssl-cert | Enables the user to specify the location of a special SSL certificate file. | Empty string. Not typically required. |
Similarly, there are two corresponding options in OneFS 9.5 and later for the isi diagnostics CLI command:
New option for isi diagnostics | Description | Default value |
--ftp-upload-insecure | Enables the gather to use unencrypted FTP transfer. | No |
--ftp-upload-ssl-cert | Enables the user to specify the location of a special SSL certificate file. | Empty string. Not typically required. |
Based on these options, the following table provides some command syntax usage examples, for both FTPS and FTP uploads:
FTP upload type | Description | Example isi_gather_info syntax | Example isi diagnostics syntax |
Secure upload (default) | Upload cluster logs to the Dell log server (ftp.isilon.com) using encrypted FTP (FTPS). | # isi_gather_info Or # isi_gather_info --ftp | # isi diagnostics gather start Or # isi diagnostics gather start --ftp-upload-insecure=no |
Secure upload | Upload cluster logs to an alternative server using encrypted FTPS. | # isi_gather_info --ftp-host <FQDN> --ftp-ssl-cert <SSL_CERT_PATH> | # isi diagnostics gather start --ftp-upload-host=<FQDN> --ftp-ssl-cert= <SSL_CERT_PATH> |
Unencrypted upload | Upload cluster logs to the Dell log server (ftp.isilon.com) using plaintext FTP. | # isi_gather_info --ftp-insecure | # isi diagnostics gather start --ftp-upload-insecure=yes |
Unencrypted upload | Upload cluster logs to an alternative server using plaintext FTP. | # isi_gather_info --ftp-insecure --ftp-host <FQDN> | # isi diagnostics gather start --ftp-upload-host=<FQDN> --ftp-upload-insecure=yes |
Note that OneFS 9.5 and later releases provide a warning if the cluster admin elects to continue using non-secure FTP for the isi_gather_info tool. Specifically, if the --ftp-insecure option is configured, the following message is displayed, informing the user that plaintext FTP upload is being used, and that the connection and data stream will not be encrypted:
# isi_gather_info --ftp-insecure
You are performing plain text FTP logs upload.
This feature is deprecated and will be removed
in a future release. Please consider the possibility
of using FTPS for logs upload. For further information,
please contact PowerScale support
...
In addition to the command line, log gathers can also be configured using the OneFS WebUI by navigating to Cluster management > Diagnostics > Gather settings.
The Edit gather settings page in OneFS 9.5 and later has been updated to reflect FTPS as the default transport method, plus the addition of radio buttons and text boxes to accommodate the new configuration options.
If plaintext FTP upload is configured, the healthcheck command will display a warning that plaintext upload is used and is no longer a recommended option. For example:
For reference, the OneFS 9.5 and later isi_gather_info CLI command syntax includes the following options:
Option | Description |
--upload <boolean> | Enable gather upload. |
--esrs <boolean> | Use ESRS for gather upload. |
--noesrs | Do not attempt to upload using ESRS. |
--supportassist | Attempt SupportAssist upload. |
--nosupportassist | Do not attempt to upload using SupportAssist. |
--gather-mode (incremental | full) | Type of gather: incremental or full. |
--http-insecure <boolean> | Enable insecure HTTP upload on completed gather. |
--http-host <string> | HTTP Host to use for HTTP upload. |
--http-path <string> | Path on HTTP server to use for HTTP upload. |
--http-proxy <string> | Proxy server to use for HTTP upload. |
--http-proxy-port <integer> | Proxy server port to use for HTTP upload. |
--ftp <boolean> | Enable FTP upload on completed gather. |
--noftp | Do not attempt FTP upload. |
--set-ftp-password | Interactively specify alternate password for FTP. |
--ftp-host <string> | FTP host to use for FTP upload. |
--ftp-path <string> | Path on FTP server to use for FTP upload. |
--ftp-port <string> | Specifies alternate FTP port for upload. |
--ftp-proxy <string> | Proxy server to use for FTP upload. |
--ftp-proxy-port <integer> | Proxy server port to use for FTP upload. |
--ftp-mode <value> | Mode of FTP file transfer. Valid values are both, active, and passive. |
--ftp-user <string> | FTP user to use for FTP upload. |
--ftp-pass <string> | Specify alternative password for FTP. |
--ftp-ssl-cert <string> | Specifies the SSL certificate to use in FTPS connection. |
--ftp-upload-insecure <boolean> | Whether to attempt a plaintext FTP upload. |
--ftp-upload-pass <string> | FTP user to use for FTP upload password. |
--set-ftp-upload-pass | Specify the FTP upload password interactively. |
When a logfile gather arrives at Dell, it is automatically unpacked by a support process and analyzed using the logviewer tool.
Author: Nick Trimbee
OneFS and HTTP Security
Mon, 22 Apr 2024 20:35:30 -0000
|Read Time: 0 minutes
To enable granular HTTP security configuration, OneFS provides an option to disable nonessential HTTP components selectively. This can help reduce the overall attack surface of your infrastructure. Disabling a specific component’s service still allows other essential services on the cluster to continue to run unimpeded. In OneFS 9.4 and later, you can disable the following nonessential HTTP services:
Service | Description |
PowerScaleUI | The OneFS WebUI configuration interface. |
Platform-API-External | External access to the OneFS platform API endpoints. |
Rest Access to Namespace (RAN) | REST-ful access by HTTP to a cluster’s /ifs namespace. |
RemoteService | Remote Support and In-Product Activation. |
SWIFT (deprecated) | Deprecated object access to the cluster using the SWIFT protocol. This has been replaced by the S3 protocol in OneFS. |
You can enable or disable each of these services independently, using the CLI or platform API, if you have a user account with the ISI_PRIV_HTTP RBAC privilege.
You can use the isi http services CLI command set to view and modify the nonessential HTTP services:
# isi http services list ID Enabled ------------------------------ Platform-API-External Yes PowerScaleUI Yes RAN Yes RemoteService Yes SWIFT No ------------------------------ Total: 5
For example, you can easily disable remote HTTP access to the OneFS /ifs namespace as follows:
# isi http services modify RAN --enabled=0
You are about to modify the service RAN. Are you sure? (yes/[no]): yes
Similarly, you can also use the WebUI to view and edit a subset of the HTTP configuration settings, by navigating to Protocols > HTTP settings:
That said, the implications and impact of disabling each of the services is as follows:
Service | Disabling impacts |
WebUI | The WebUI is completely disabled, and access attempts (default TCP port 8080) are denied with the warning Service Unavailable. Please contact Administrator. If the WebUI is re-enabled, the external platform API service (Platform-API-External) is also started if it is not running. Note that disabling the WebUI does not affect the PlatformAPI service. |
Platform API | External API requests to the cluster are denied, and the WebUI is disabled, because it uses the Platform-API-External service. Note that the Platform-API-Internal service is not impacted if/when the Platform-API-External is disabled, and internal pAPI services continue to function as expected. If the Platform-API-External service is re-enabled, the WebUI will remain inactive until the PowerScaleUI service is also enabled. |
RAN | If RAN is disabled, the WebUI components for File System Explorer and File Browser are also automatically disabled. From the WebUI, attempts to access the OneFS file system explorer (File System > File System Explorer) fail with the warning message Browse is disabled as RAN service is not running. Contact your administrator to enable the service. This same warning also appears when attempting to access any other WebUI components that require directory selection. |
RemoteService | If RemoteService is disabled, the WebUI components for Remote Support and In-Product Activation are disabled. In the WebUI, going to Cluster Management > General Settings and selecting the Remote Support tab displays the message The service required for the feature is disabled. Contact your administrator to enable the service. In the WebUI, going to Cluster Management > Licensing and scrolling to the License Activation section displays the message The service required for the feature is disabled. Contact your administrator to enable the service. |
SWIFT | Deprecated object protocol and disabled by default. |
You can use the CLI command isi http settings view to display the OneFS HTTP configuration:
# isi http settings view Access Control: No Basic Authentication: No WebHDFS Ran HTTPS Port: 8443 Dav: No Enable Access Log: Yes HTTPS: No Integrated Authentication: No Server Root: /ifs Service: disabled Service Timeout: 8m20s Inactive Timeout: 15m Session Max Age: 4H Httpd Controlpath Redirect: No
Similarly, you can manage and change the HTTP configuration using the isi http settings modify CLI command.
For example, to reduce the maximum session age from four to two hours:
# isi http settings view | grep -i age Session Max Age: 4H # isi http settings modify --session-max-age=2H # isi http settings view | grep -i age Session Max Age: 2H
The full set of configuration options for isi http settings includes:
Option | Description |
--access-control <boolean> | Enable Access Control Authentication for the HTTP service. Access Control Authentication requires at least one type of authentication to be enabled. |
--basic-authentication <boolean> | Enable Basic Authentication for the HTTP service. |
--webhdfs-ran-https-port <integer> | Configure Data Services Port for the HTTP service. |
--revert-webhdfs-ran-https-port | Set value to system default for --webhdfs-ran-https-port. |
--dav <boolean> | Comply with Class 1 and 2 of the DAV specification (RFC 2518) for the HTTP service. All DAV clients must go through a single node. DAV compliance is NOT met if you go through SmartConnect, or using 2 or more node IPs. |
--enable-access-log <boolean> | Enable writing to a log when the HTTP server is accessed for the HTTP service. |
--https <boolean> | Enable the HTTPS transport protocol for the HTTP service. |
--https <boolean> | Enable the HTTPS transport protocol for the HTTP service. |
--integrated-authentication <boolean> | Enable Integrated Authentication for the HTTP service. |
--server-root <path> | Document root directory for the HTTP service. Must be within /ifs. |
--service (enabled | disabled | redirect | disabled_basicfile) | Enable/disable the HTTP Service or redirect to WebUI or disabled BasicFileAccess. |
--service-timeout <duration> | The amount of time (in seconds) that the server will wait for certain events before failing a request. A value of 0 indicates that the service timeout value is the Apache default. |
--revert-service-timeout | Set value to system default for --service-timeout. |
--inactive-timeout <duration> | Get the HTTP RequestReadTimeout directive from both the WebUI and the HTTP service. |
--revert-inactive-timeout | Set value to system default for --inactive-timeout. |
--session-max-age <duration> | Get the HTTP SessionMaxAge directive from both WebUI and HTTP service. |
--revert-session-max-age | Set value to system default for --session-max-age. |
--httpd-controlpath-redirect <boolean> | Enable or disable WebUI redirection to the HTTP service. |
Note that while the OneFS S3 service uses HTTP, it is considered a tier-1 protocol, and as such is managed using its own isi s3 CLI command set and corresponding WebUI area. For example, the following CLI command forces the cluster to only accept encrypted HTTPS/SSL traffic on TCP port 9999 (rather than the default TCP port 9021):
# isi s3 settings global modify --https-only 1 –https-port 9921 # isi s3 settings global view HTTP Port: 9020 HTTPS Port: 9999 HTTPS only: Yes S3 Service Enabled: Yes
Additionally, you can entirely disable the S3 service with the following CLI command:
# isi services s3 disable The service 's3' has been disabled.
Or from the WebUI, under Protocols > S3 > Global settings:
Author: Nick Trimbee