The following features are the primary methods for monitoring ObjectScale:
- ObjectScale portal—The dashboard on the ObjectScale portal provides the first view into the system's health. From the dashboard, you can review major issues by using the portal's monitoring panes. Dashboard items to watch include:
  - Health alerts. Examine the health pane to identify critical, error, and warning alerts.
  - Capacity. View the Metering pane to see ObjectScale capacity utilization and determine whether disks need to be added.
  - Performance data. Use the historical view to determine whether an object store's performance is as expected for the workload.
  - Grafana integration. Click Metrics to open the Grafana portal, which provides additional monitoring metrics, including latency, bandwidth, and capacity.
- Event notifications—ObjectScale sends alerts and events to the UI, SupportAssist, and SNMP targets.
- ObjectScale service logs—ObjectScale service logs provide additional diagnostic information. ObjectScale deploys rsyslog as a StatefulSet with pod anti-affinity rules to ensure that every node has at least one rsyslog instance.
  - rsyslog stores logs under the following path inside the rsyslog pods: /var/log/namespace/<objectstore name>/<pod name>/<pod_name>_<container_name>_<log_name>
  - svc_log is a command-line tool for working with ObjectScale logs and rsyslog. It automatically locates logs, parallelizes searches, and provides powerful filtering options with a relatively simple syntax.
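The rsyslog path layout can be captured in a small helper, for example when scripting your own search inside an rsyslog pod instead of using svc_log. This is a minimal sketch, not part of ObjectScale: the function names, the sample object store and pod names, and the reading of the `namespace` path segment as the Kubernetes namespace are all hypothetical assumptions.

```python
from pathlib import PurePosixPath
from typing import NamedTuple

# Root of the log tree inside rsyslog pods, per the documented layout.
LOG_ROOT = PurePosixPath("/var/log")


class LogFile(NamedTuple):
    pod_name: str
    container_name: str
    log_name: str


def rsyslog_log_path(namespace: str, objectstore: str, pod: str,
                     container: str, log_name: str) -> PurePosixPath:
    """Build the path of one log file inside an rsyslog pod.

    Hypothetical helper: assumes the `namespace` segment of the documented
    path stands for the Kubernetes namespace of the object store.
    """
    file_name = f"{pod}_{container}_{log_name}"
    return LOG_ROOT / namespace / objectstore / pod / file_name


def parse_log_file_name(file_name: str) -> LogFile:
    """Split '<pod_name>_<container_name>_<log_name>' into its parts.

    Kubernetes pod and container names cannot contain underscores, so the
    first two underscores are the separators and the log name keeps any
    underscores of its own. Raises ValueError if there are fewer than two.
    """
    pod, container, log_name = file_name.split("_", 2)
    return LogFile(pod, container, log_name)


if __name__ == "__main__":
    # Sample names below are made up for illustration.
    print(rsyslog_log_path("objectscale", "os1", "os1-ss-0", "ss", "ssm.log"))
    print(parse_log_file_name("os1-ss-0_ss_data_path.log"))
```

Splitting on only the first two underscores is the design choice that keeps parsing unambiguous even when a log name itself contains underscores.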
Monitoring recommendations include:
- Watch for uneven CPU, memory, and network bandwidth utilization across nodes.
- Learn the system's normal performance and the metrics you can expect over time, so that you can start an investigation when rates fall outside the normal range.
- Do not let object stores exceed 90% of capacity; filling an object store impacts system stability.
- Monitor alarms and alerts to understand the system health status. Watch for a higher-than-normal number of failed requests and determine the root cause.
- Regularly check events and audit logs.
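Several of these recommendations can be turned into simple automated checks over metric values you have already collected (how you collect them is outside the scope of this sketch). This is a minimal, hypothetical sketch rather than an ObjectScale API: only the 90% capacity ceiling comes from the recommendations, while the function names, the 25% node-imbalance tolerance, and the three-standard-deviation baseline rule are illustrative assumptions to tune for your environment.

```python
from statistics import mean, stdev

# 90% is the recommended capacity ceiling; the other defaults below are
# hypothetical starting points, not ObjectScale values.
CAPACITY_LIMIT = 0.90


def over_capacity(used_bytes: int, total_bytes: int,
                  limit: float = CAPACITY_LIMIT) -> bool:
    """True when an object store has reached the capacity ceiling."""
    return used_bytes / total_bytes >= limit


def uneven_nodes(per_node: dict[str, float],
                 tolerance: float = 0.25) -> list[str]:
    """Return nodes whose CPU, memory, or bandwidth value deviates from the
    cluster mean by more than `tolerance` (a fraction of the mean)."""
    avg = mean(per_node.values())
    return sorted(node for node, value in per_node.items()
                  if abs(value - avg) > tolerance * avg)


def outside_baseline(history: list[float], current: float,
                     k: float = 3.0) -> bool:
    """True when `current` lies more than k standard deviations from the
    mean of past samples. Needs at least two samples in `history`."""
    mu, sigma = mean(history), stdev(history)
    return abs(current - mu) > k * sigma


if __name__ == "__main__":
    print(over_capacity(92, 100))
    print(uneven_nodes({"node-1": 40.0, "node-2": 42.0, "node-3": 75.0}))
    print(outside_baseline([100, 102, 98, 101, 99], 130))
```

The same `outside_baseline` check can be applied to failed-request counts per interval to flag a higher-than-normal failure rate before looking for the root cause.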