Blogs

Short articles related to Dell PowerScale.


  • PowerScale
  • OneFS

PowerScale OneFS 9.8

Nick Trimbee

Tue, 09 Apr 2024 14:00:00 -0000


It’s launch season here at Dell Technologies, and PowerScale is already scaling up spring with the innovative OneFS 9.8 release, which shipped today, 9th April 2024. This new 9.8 release has something for everyone, introducing PowerScale innovations in cloud, performance, serviceability, and ease of use.

Figure 1. OneFS 9.8 release features: APEX File Storage for Azure, NFSv4.1 over RDMA, Job Engine SmartThrottling, SmartLog and auto-analysis serviceability enhancements, IPv6 source-based routing, streaming write performance improvements, and the multipath client driver

APEX File Storage for Azure

After the debut of APEX File Storage for AWS last year, OneFS 9.8 amplifies PowerScale’s presence in the public cloud by introducing APEX File Storage for Azure.

Figure 2. OneFS 9.8 APEX File Storage for Azure

In addition to providing the same customer-managed OneFS software platform on-prem and in the cloud for full control, APEX File Storage for Azure in OneFS 9.8 provides linear capacity and performance scaling from four to eighteen SSD nodes and up to 3 PB per cluster. This makes it a solid fit for AI, ML, and analytics applications, as well as for traditional file shares, home directories, and vertical workloads such as M&E, healthcare, life sciences, and financial services.

Figure 3. Dell PowerScale scale-out architecture, showing OneFS running within PowerScale and alongside APEX File Storage for Azure and AWS, with multi-protocol access, data reduction, CloudPools, SmartQuotas, SyncIQ, SnapshotIQ, SmartQoS, and SmartConnect

PowerScale’s scale-out architecture can be deployed on customer-managed AWS and Azure infrastructure, providing the capacity and performance needed to run a variety of unstructured workflows in the public cloud.

Once in the cloud, existing PowerScale investments can be further leveraged by accessing and orchestrating your data through the platform's multi-protocol access and APIs. 

This includes the common OneFS control plane (CLI, WebUI, and platform API) and the same enterprise features, such as multi-protocol access, SnapshotIQ, SmartQuotas, identity management, and so on.
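For example, the platform API can be scripted against in the same way whether the cluster runs on-premises or in the cloud. The hostname, credentials, and API version prefix below are illustrative assumptions; confirm the endpoints available on your cluster against its API documentation:

```
# Illustrative only: query basic cluster configuration through the OneFS
# platform API. The hostname, credentials, and version prefix ("/3/") are
# assumptions -- check your cluster's API documentation.
curl -k -u admin \
  "https://cluster.example.com:8080/platform/3/cluster/config"
```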

Simplicity and efficiency

OneFS 9.8 SmartThrottling is an automated impact control mechanism for the job engine, allowing the cluster to automatically throttle job resource consumption if it exceeds pre-defined thresholds in order to prioritize client workloads. 

OneFS 9.8 also delivers automatic on-cluster core file analysis, and SmartLog provides an efficient, granular log file gathering and transmission framework. Both of these new features help dramatically accelerate the ease and time to resolution of cluster issues.

Performance

OneFS 9.8 also adds support for NFSv4.1 over Remote Direct Memory Access (RDMA) for applications and clients. This allows substantially higher throughput performance – especially for single-connection and read-intensive workloads such as machine learning and generative AI model training – while also reducing both cluster and client CPU utilization, and it provides the foundation for interoperability with NVIDIA’s GPUDirect.

RDMA over NFSv4.1 in OneFS 9.8 leverages the RoCEv2 network protocol. OneFS CLI and WebUI configuration options include global enablement plus IP pool configuration, filtering, and verification of RoCEv2-capable network interfaces. NFS over RDMA is available on all PowerScale platforms containing Mellanox ConnectX network adapters on the front end, with a choice of 25, 40, or 100 Gigabit Ethernet connectivity. The OneFS user interface helps easily identify which of a cluster’s NICs support RDMA.
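On the client side, once RDMA is enabled for an IP pool on the cluster, a Linux host with an RDMA-capable NIC can mount the export over NFSv4.1 with RDMA as the transport. A minimal sketch, assuming a standard Linux NFS client; the hostname and export path are placeholders:

```
# Mount a OneFS NFS export over NFSv4.1 with RDMA as the transport.
# cluster.example.com and /ifs/data are placeholders; port 20049 is the
# conventional NFS/RDMA port on Linux clients.
sudo mount -t nfs -o vers=4.1,proto=rdma,port=20049 \
  cluster.example.com:/ifs/data /mnt/data
```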

Under the hood, OneFS 9.8 introduces efficiencies such as lock sharding and parallel thread handling, delivering a substantial performance boost for streaming write-heavy workloads such as generative AI inferencing and model training. Performance scales linearly as compute is increased, keeping GPUs busy and allowing PowerScale to easily support AI and ML workflows both small and large. OneFS 9.8 also includes infrastructure support for future node hardware platform generations.

Multipath Client Driver

The addition of a new Multipath Client Driver helps expand PowerScale’s role in Dell Technologies’ strategic collaboration with NVIDIA, delivering the first and only end-to-end large-scale AI system. This is based on the PowerScale F710 platform in conjunction with PowerEdge XE9680 GPU servers and NVIDIA’s Spectrum-X Ethernet switching platform to optimize performance and throughput at scale.

In summary, OneFS 9.8 brings the following new features to the Dell PowerScale ecosystem:

Feature

Info

Cloud

  • APEX File Storage for Azure
  • Up to 18 SSD nodes and 3PB per cluster

Simplicity

  • Job Engine SmartThrottling
  • Source-based routing for IPv6 networks

Performance

  • NFSv4.1 over RDMA
  • Streaming write performance enhancements
  • Infrastructure support for next generation all-flash node hardware platform

Serviceability

  • Automatic on-cluster core file analysis
  • SmartLog efficient, granular log file gathering

We’ll be taking a deeper look at this new functionality in blog articles over the course of the next few weeks. 

Meanwhile, the new OneFS 9.8 code is available on the Dell Online Support site, both as an upgrade and reimage file, allowing installation and upgrade of this new release.

 

Author: Nick Trimbee


Unveiling APEX File Storage for Microsoft Azure – Running PowerScale OneFS on Azure

Vincent Shen

Tue, 09 Apr 2024 20:30:02 -0000


Overview

PowerScale OneFS 9.8 now brings a new offering to Azure — APEX File Storage for Microsoft Azure! It is a software-defined cloud file storage service that provides high-performance, flexible, secure, and scalable file storage for Microsoft Azure environments. It is also a fully customer-managed service designed to meet the needs of enterprise-scale file workloads running on Azure. This offering joins the native cloud solution released last year, APEX File Storage for AWS. For more information, refer to: https://www.dell.com/en-us/dt/apex/storage/public-cloud/file.htm?hve=explore+file

Benefits of running OneFS in the cloud

APEX File Storage for Microsoft Azure brings the OneFS distributed file system software into the public cloud, allowing users to have the same management experience in the cloud as with their on-premises PowerScale appliance.

With APEX File Storage for Microsoft Azure, you can easily deploy and manage file storage on Azure. The service provides a scalable and elastic storage infrastructure that can grow according to your actual business needs.

Some of the key features and benefits of APEX File Storage for Microsoft Azure include:

  • Scale-out: APEX File Storage for Microsoft Azure is powered by the Dell PowerScale OneFS distributed file system. You can start with a small OneFS cluster (a minimum of 4 nodes) and then expand it incrementally as your data storage requirements grow, up to 5.6 PiB of cluster capacity in a single namespace (a maximum of 18 nodes). This large capacity helps support the most demanding, data-intensive workloads such as AI.
  • Data management: APEX File Storage for Microsoft Azure provides powerful data management capabilities such as snapshots, data replication, and backup and restore. Because OneFS features are the same in the cloud as they are on-premises, organizations can simplify operations and reduce management complexity with a consistent user experience.
  • Simplified journey to hybrid cloud: More and more organizations operate in a hybrid cloud environment, where they need to move data between on-premises and cloud-based environments. APEX File Storage for Microsoft Azure can help you bridge this gap by facilitating seamless data mobility between on-premises and the cloud with native replication (see the SyncIQ sketch after this list) and by providing a consistent data management platform across both environments. Once in the cloud, customers can take advantage of enterprise-class OneFS features such as multi-protocol support, CloudPools, data reduction, security, and snapshots, to run their workloads in the same way as they do on-premises.
  • Data resilience: Ensuring data resilience is critical for businesses to maintain continuity and to safeguard information. APEX File Storage for Microsoft Azure implements erasure coding techniques. This advanced approach optimizes storage efficiency and enhances fault tolerance, enabling the cluster to withstand multiple node failures. By spreading nodes across different racks using an Azure availability set, the cluster ensures that data accessibility is maintained in the event of a rack failure.
  • High performance: APEX File Storage for Microsoft Azure delivers high-performance file storage with low-latency access to data, ensuring that you can access data quickly and efficiently. Compared to Azure NetApp Files, Dell APEX File Storage for Microsoft Azure enables about 6x greater cluster performance, up to 11x larger namespace, up to 23x more snapshots per volume, 2x higher cluster resiliency, and easier, more robust cluster expansion.
  • Proactive support: With a 97% customer satisfaction rate, Dell Support Services provides highly trained experts around the clock and around the globe to address your OneFS needs, minimize disruptions, and help you maintain a high level of productivity and outcomes.
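As a rough illustration of that native replication, a SyncIQ policy that mirrors an on-premises path to an APEX File Storage for Microsoft Azure cluster might be created along the following lines. The policy name, paths, target host, and schedule are placeholders, and the exact argument form should be verified against the SyncIQ documentation for your OneFS release:

```
# Sketch only: replicate /ifs/data/projects to a cloud target cluster daily.
# Verify the exact 'isi sync policies create' syntax for your OneFS version.
isi sync policies create cloud-dr sync /ifs/data/projects \
  apex-azure.example.com /ifs/data/projects \
  --schedule="every 1 days"
```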

Architecture

APEX File Storage for Microsoft Azure is a software-defined cloud file storage service that combines the power of OneFS distributed file system with the flexibility and scalability of cloud infrastructure. It is a fully customer-managed service that is designed to meet the needs of enterprise-scale file workloads running on Azure.

The architecture of APEX File Storage for Microsoft Azure is built on the OneFS distributed file system. This architecture uses multiple cluster nodes to establish a single global namespace. Each cluster node operates as an instance of the OneFS software, running on an Azure VM to deliver storage capacity and compute resources. It is worth noting that the network bandwidth limit at the Azure VM level is shared between the cluster internal network and the external network.

APEX File Storage for Microsoft Azure uses cloud-native technologies and leverages the elasticity of cloud infrastructure, so that you can easily scale the storage infrastructure as your business requirements grow. APEX File Storage for Microsoft Azure can dynamically scale storage capacity and performance to meet changing demands. It can add cluster nodes without disruption, enabling the storage infrastructure to scale in a more cost-effective and efficient manner. To guarantee the durability and resiliency of data, APEX File Storage for Microsoft Azure distributes data across multiple nodes within the cluster. It also uses advanced data protection techniques such as erasure coding, and it provides features such as SyncIQ to ensure that data is available. Even in the event of one or more node failures, the data remains accessible from the remaining cluster nodes.

 

Availability set and proximity placement group: APEX File Storage for Microsoft Azure is designed to run in an availability set, and the availability set is associated with a dedicated proximity placement group. In this way, APEX File Storage for Microsoft Azure gains better reliability while also ensuring more consistent, lower latency on the cluster back-end network.

Virtual network: APEX File Storage for Microsoft Azure requires an Azure virtual network to provide network connectivity; a minimal subnet-layout sketch follows the list below.

  • OneFS cluster internal subnet: The cluster nodes communicate with each other through the internal subnet. The internal subnet must be isolated from VMs that are not in the cluster; therefore, a dedicated subnet that is not shared with other Azure VMs is required for the internal network interfaces of the cluster nodes.
  • OneFS cluster external subnet: The cluster nodes communicate with clients through the external subnet by using different protocols, such as NFS, SMB, and S3. 
  • OneFS cluster internal network interfaces: Network interfaces are in the internal subnet.
  • OneFS cluster external network interfaces: Network interfaces are in the external subnet.
  • Network security group: The network security group applies to the cluster network interfaces and allows or denies specific traffic to the OneFS cluster.
  • Azure VMs: These VMs serve as cluster nodes running the OneFS file system, backed by Azure managed disks. Each node within the cluster is strategically placed in an availability set and a proximity placement group. This configuration ensures that all nodes reside in separate fault domains, enhancing reliability, and it brings them physically closer together to enable lower network latency between cluster nodes. See the Azure availability sets overview and Azure proximity placement groups documentation for more details.
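As a minimal sketch of the subnet layout described above – the resource names, address ranges, and region are hypothetical, and this is not the official deployment procedure (the deployment guide in the Resources section covers the supported method):

```
# Hypothetical sketch: one virtual network with a dedicated internal subnet
# (cluster back-end traffic only) and a separate external subnet for clients.
az network vnet create \
  --resource-group onefs-rg --name onefs-vnet \
  --address-prefixes 10.10.0.0/16 \
  --subnet-name onefs-internal --subnet-prefixes 10.10.1.0/24

az network vnet subnet create \
  --resource-group onefs-rg --vnet-name onefs-vnet \
  --name onefs-external --address-prefixes 10.10.2.0/24
```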

Overall, APEX File Storage for Microsoft Azure offers a powerful and flexible scale-out file storage solution that can help you improve data management, optimize costs, and enhance scalability and security in a cloud-based environment.

Supported cluster configurations

Table 1 shows the supported configurations for APEX File Storage for Microsoft Azure. It gives you the flexibility to choose among different cluster sizes, Azure VM sizes/SKUs, and Azure disk options to meet your business requirements. For a detailed explanation of these configurations, refer to https://infohub.delltechnologies.com/en-US/t/apex-file-storage-for-microsoft-azure-deployment-guide.

Table 1. Supported configuration for a single cluster

Configuration items

Supported options

Cluster size

4 to 18 nodes

Azure VM size/SKU

All nodes in a cluster must use the same VM size/SKU. The supported VM sizes are: 

Azure managed disk type

All nodes in a cluster must use the same disk type. The supported disk types are:

Note: Premium SSDs are only supported with Ddsv5-series and Edsv5-series

Azure managed disk size

All nodes in a cluster must use the same disk size. The supported disk sizes are:

  • 0.5 TiB: P20 or E20 
  •    1 TiB: P30 or E30 
  •    2 TiB: P40, E40, or S40
  •    4 TiB: P50, E50, or S50
  •    8 TiB: P60, E60, or S60
  •  16 TiB: P70, E70, or S70

Disk count per node

All nodes in a cluster must use the same disk count. The supported disk counts are:

  • 5, 6, 10, 12, 15, 18, 20, 24, 25, or 30

Cluster raw capacity

Minimum: 10 TiB, maximum: 5760 TiB (see the worked example after the table)

Cluster protection level

Default is +2n. Also supports +2d:1n with additional capacity restrictions. 
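The raw-capacity bounds in the table follow from cluster size x disks per node x disk size. As a quick sanity check – the specific drive combinations shown are inferred from the limits above, not an officially stated configuration:

```
# Raw capacity = nodes x disks per node x disk size (TiB).
echo "minimum: 4 nodes x 5 disks x 0.5 TiB = 10 TiB"
echo "maximum: 18 nodes x 20 disks x 16 TiB = $((18 * 20 * 16)) TiB"   # 5760 TiB
```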

Support Regions

APEX File Storage for Microsoft Azure is globally available. For the detailed regions, refer to https://infohub.delltechnologies.com/en-US/t/apex-file-storage-for-microsoft-azure-deployment-guide.

Performance

Compared to Azure NetApp Files, Dell APEX File Storage for Microsoft Azure enables about 6x greater cluster performance, up to 11x larger namespace, up to 23x more snapshots per volume, 2x higher cluster resiliency, and easier and more robust cluster expansion.

Beyond that, I will show an example of how sequential read and sequential write performance can scale out linearly from 4 nodes to 18 nodes, so you can be confident that the cluster meets your business requirements.

This is what we have set up: 

We used the Azure Standard_D48ds_v5 VM type and scaled from 10 nodes to 14 nodes and finally to 18 nodes for testing purposes. With each deployment, we kept all other factors the same, maintaining 12 P40 Azure Premium SSD data disks in each node. The following table displays the configuration, followed by a sketch of how such a sequential-throughput test could be driven:

Node type | Node count | Data disk type | Data disk count
Standard_D48ds_v5 | 10 | P40 | 12
Standard_D48ds_v5 | 14 | P40 | 12
Standard_D48ds_v5 | 18 | P40 | 12
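The blog does not state which load generator produced these results, so the following is only a generic sketch of how sequential throughput could be measured from an NFS client, assuming fio is installed and the cluster export is mounted at /mnt/onefs:

```
# Generic sequential-read throughput sketch (not the actual test harness used here).
fio --name=seq-read --directory=/mnt/onefs --rw=read \
    --bs=1M --size=8G --numjobs=8 --direct=1 --group_reporting

# Repeat with --rw=write for the sequential-write case.
```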

The diagram below demonstrates how read performance increases when we scale out our APEX File Storage for Microsoft Azure cluster. You can see a clear positive trend from 10 nodes to 18 nodes. The same conclusion also applies to write performance.

 

Another example: you can also scale up the overall performance of an APEX File Storage for Microsoft Azure cluster by choosing a more powerful Azure VM size/SKU.

In this example, we tested the following Azure VM size/SKU with the same node number (4) and disk number per node (12):

  • D32ds_v5
  • D48ds_v5
  • D64ds_v5
  • E104ids_v5

The results show that as the Azure VM size/SKU scales up, both read and write performance increase:

For more details on the performance results and best practices, refer to the following white paper: https://infohub.delltechnologies.com/en-US/t/introduction-to-apex-file-storage-for-azure-1/.

Resources

https://infohub.delltechnologies.com/en-US/t/apex-file-storage-for-microsoft-azure-deployment-guide

https://infohub.delltechnologies.com/en-US/t/introduction-to-apex-file-storage-for-azure-1/

https://www.dell.com/en-us/blog/ai-anywhere-with-apex-file-storage-for-microsoft-azure/

Authors:

Vincent Shen, Lieven Lin, and Jason He

 

 

 

  • AI
  • deep learning
  • machine learning
  • PowerScale
  • OneFS
  • Unstructured Data

Optimizing AI: Meeting Unstructured Storage Demands Efficiently

Aqib Kazi

Thu, 21 Mar 2024 14:46:23 -0000


The surge in artificial intelligence (AI) and machine learning (ML) technologies has sparked a revolution across industries, pushing the boundaries of what's possible. However, this innovation comes with its own set of challenges, particularly when it comes to storage. The heart of AI's potential lies in its ability to process and learn from vast amounts of data, most of which is unstructured. This has placed unprecedented demands on storage solutions, becoming a critical bottleneck for advancing AI technologies.

Navigating the complex landscape of unstructured data storage is no small feat. Traditional storage systems struggle to keep up with the scale and flexibility required by AI workloads. Enterprises find themselves at a crossroads, seeking solutions that can provide scalable, affordable, and fault-tolerant storage. The quest for such a platform is not just about meeting current needs but also paving the way for the future of AI-driven innovation.

The current state of ML and AI

The evolution of ML and AI technologies has reshaped industries far and wide, setting new expectations for data processing and analysis capabilities. These advancements are directly tied to an organization's capacity to handle vast volumes of unstructured data, a domain where traditional storage solutions are being outpaced.

ML and AI applications demand unprecedented levels of data ingestion and computational power, necessitating scalable and flexible storage solutions. Traditional storage systems—while useful for conventional data storage needs—grapple with scalability issues, particularly when faced with the immense file quantities AI and ML workloads generate.

Although traditional object storage methods are capable of managing data as objects within a pool, they fall short when meeting the agility and accessibility requirements essential for AI and ML processes. These storage models struggle with scalability and facilitating the rapid access and processing of data crucial for deep learning and AI algorithms.

The dire necessity of a new kind of storage solution is evident, as the current infrastructure is unable to cope with the silos of unstructured data. These silos make it challenging to access, process, and unify data sources, which in turn cripples the effectiveness of AI and ML projects. Furthermore, the maximum storage capacity of traditional storage, capped at tens of terabytes, is insufficient for the needs of AI-driven initiatives, which often require petabytes of data to train sophisticated models.

As ML and AI continue to advance, the quest for a storage solution that can support the growing demands of these technologies remains pivotal. The industry is in dire need of systems that provide ample storage and ensure the flexibility, reliability, and performance efficiency necessary to propel AI and ML into their next phase of innovation.

Understanding unstructured storage demands for AI

The advent of AI and ML has brought unprecedented advancements across industries, enhancing efficiency, accuracy, and the ability to manage and process large datasets. However, the core of these technologies relies on the capability to store, access, and analyze unstructured data efficiently. Understanding the storage demands essential for AI applications is crucial for businesses looking to harness the full power of AI technology.

High throughput and low latency

For AI and ML applications, time is of the essence. The ability to process data at high speeds with high throughput and access it with minimal delay and low latency are non-negotiable requirements. These applications often involve complex computations performed on vast datasets, necessitating quick access to data to maintain a seamless process. For instance, in real-time AI applications such as voice recognition or instant fraud detection, any delay in data processing can critically impact performance and accuracy. Therefore, storage solutions must be designed to accommodate these needs, delivering data as swiftly as possible to the application layer.

Scalability and flexibility

As AI models evolve and the volume of data increases, the need for scalability in storage solutions becomes paramount. The storage architecture must accommodate growth without compromising on performance or efficiency. This is where the flexibility of the storage solutions comes into play. An ideal storage system for AI would scale in capacity and performance, adapting to the changing demands of AI applications over time. Combining the best of on-premises and cloud storage, hybrid storage solutions offer a viable path to achieving this scalability and flexibility. They enable businesses to leverage the high performance of on-premise solutions and the scalability and cost-efficiency of cloud storage, ensuring the storage infrastructure can grow with the AI application needs.

Data durability and availability

Ensuring the durability and availability of data is critical for AI systems. Data is the backbone of any AI application, and its loss or unavailability can lead to significant setbacks in development and performance. Storage solutions must, therefore, provide robust data protection mechanisms and redundancies to safeguard against data loss. Additionally, high availability is essential to ensure that data is always accessible when needed, particularly for AI applications that require continuous operation. Implementing a storage system with built-in redundancy, failover capabilities, and disaster recovery plans is essential to maintain continuous data availability and integrity.

In the context of AI where data is continually ingested, processed, and analyzed, the demands on storage solutions are unique and challenging. Key considerations include maintaining high throughput and low latency for real-time processing, establishing scalability and flexibility to adapt to growing data volumes, and ensuring data durability and availability to support continuous operation. Addressing these demands is critical for businesses aiming to leverage AI technologies effectively, paving the way for innovation and success in the digital era.

What needs to be stored for AI?

The evolution of AI and its underlying models depends significantly on various types of data and artifacts generated and used throughout its lifecycle. Understanding what needs to be stored is crucial for ensuring the efficiency and effectiveness of AI applications.

Raw data

Raw data forms the foundation of AI training. It's the unmodified, unprocessed information gathered from diverse sources. For AI models, this data can be in the form of text, images, audio, video, or sensor readings. Storing vast amounts of raw data is essential as it provides the primary material for model training and the initial step toward generating actionable insights.

Preprocessed data

Once raw data is collected, it undergoes preprocessing to transform it into a more suitable format for training AI models. This process includes cleaning, normalization, and transformation. As a refined version of raw data, preprocessed data needs to be stored efficiently to streamline further processing steps, saving time and computational resources.

Training datasets

Training datasets are a selection of preprocessed data used to teach AI models how to make predictions or perform tasks. These datasets must be diverse and comprehensive, representing real-world scenarios accurately. Storing these datasets allows AI models to learn and adapt to the complexities of the tasks they are designed to perform.

Validation and test datasets

Validation and test datasets are critical for evaluating an AI model's performance. These datasets are separate from the training data and are used to tune the model's parameters and test its generalizability to new, unseen data. Proper storage of these datasets ensures that models are both accurate and reliable.

Model parameters and weights

An AI model learns to make decisions through its parameters and weights. These elements are fine-tuned during training and crucial for the model's decision-making processes. Storing these parameters and weights allows models to be reused, updated, or refined without retraining from scratch.

Model architecture

The architecture of an AI model defines its structure, including the arrangement of layers and the connections between them. Storing the model architecture is essential for understanding how the model processes data and for replicating or scaling the model in future projects.

Hyperparameters

Hyperparameters are the configuration settings used to optimize model performance. Unlike parameters, hyperparameters are not learned from the data but set prior to the training process. Storing hyperparameter values is necessary for model replication and comparison of model performance across different configurations.

Feature engineering artifacts

Feature engineering involves creating new input features from the existing data to improve model performance. The artifacts from this process, including the newly created features and the logic used to generate them, need to be stored. This ensures consistency and reproducibility in model training and deployment.

Results and metrics

The results and metrics obtained from model training, validation, and testing provide insights into model performance and effectiveness. Storing these results allows for continuous monitoring, comparison, and improvement of AI models over time.

Inference data

Inference data refers to new, unseen data that the model processes to make predictions or decisions after training. Storing inference data is key for analyzing the model's real-world application and performance and making necessary adjustments based on feedback.

Embeddings

Embeddings are dense representations of high-dimensional data in lower-dimensional spaces. They play a crucial role in processing textual data, images, and more. Storing embeddings allows for more efficient computation and retrieval of similar items, enhancing model performance in recommendation systems and natural language processing tasks.

Code and scripts

The code and scripts used to create, train, and deploy AI models are essential for understanding and replicating the entire AI process. Storing this information ensures that models can be retrained, refined, or debugged as necessary.

Documentation and metadata

Documentation and metadata provide context, guidelines, and specifics about the AI model, including its purpose, design decisions, and operating conditions. Proper storage of this information supports ethical AI practices, model interpretability, and compliance with regulatory standards.

Challenges of unstructured data in AI

In the realm of AI, handling unstructured data presents a unique set of challenges that must be navigated carefully to harness its full potential. As AI systems strive to mimic human understanding, they face the intricate task of processing and deriving meaningful insights from data that lacks a predefined format. This section delves into the core challenges associated with unstructured data in AI, primarily focusing on data variety, volume, and velocity.

Data variety

Data variety refers to the myriad types of unstructured data that AI systems are expected to process, ranging from texts and emails to images, videos, and audio files. Each data type possesses its unique characteristics and demands specific preprocessing techniques to be effectively analyzed by AI models.

  • Richer Insights but Complicated Processing: While the diverse data types can provide richer insights and enhance model accuracy, they significantly complicate the data preprocessing phase. AI tools must be equipped with sophisticated algorithms to identify, interpret, and normalize various data formats.
  • Innovative AI Applications: The advantage of mastering data variety lies in the development of innovative AI applications. By handling unstructured data from different domains, AI can contribute to advancements in natural language processing, computer vision, and beyond.

Data volume

The sheer volume of unstructured data generated daily is staggering. As digital interactions increase, so does the amount of data that AI systems need to analyze.

  • Scalability Challenges: The exponential growth in data volume poses scalability challenges for AI systems. Storage solutions must not only accommodate current data needs but also be flexible enough to scale with future demands.
  • Efficient Data Processing: AI must leverage parallel processing and cloud storage options to keep up with the volume. Systems designed for high-throughput data analysis enable quicker insights, which are essential for timely decision-making and maintaining relevance in a rapidly evolving digital landscape.

Data velocity

Data velocity refers to the speed at which new data is generated and the pace at which it needs to be processed to remain actionable. In the age of real-time analytics and instant customer feedback, high data velocity is both an opportunity and a challenge for AI.

  • Real-Time Processing Needs: AI systems are increasingly required to process information in real-time or near-real-time to provide timely insights. This necessitates robust computational infrastructure and efficient data streaming technologies.
  • Constant Adaptation: The dynamic nature of unstructured data, coupled with its high velocity, demands that AI systems constantly adapt and learn from new information. Maintaining accuracy and relevance in fast-moving data environments is critical for effective AI performance.

In addressing these challenges, AI and ML technologies are continually evolving, developing more sophisticated systems capable of handling the complexity of unstructured data. The key to unlocking the value hidden within this data lies in innovative approaches to data management where flexibility, scalability, and speed are paramount.

Strategies to manage unstructured data in AI

The explosion of unstructured data poses unique challenges for AI applications. Organizations must adopt effective data management strategies to harness the full potential of AI technologies. In this section, we delve into key strategies like data classification and tagging and the use of PowerScale clusters to efficiently manage unstructured data in AI.

Data classification and tagging

Data classification and tagging are foundational steps in organizing unstructured data and making it more accessible for AI applications. This process involves identifying the content and context of data and assigning relevant tags or labels, which is crucial for enhancing data discoverability and usability in AI systems.

  • Automated tagging tools can significantly reduce the manual effort required to label data, employing AI algorithms to understand the content and context automatically.
  • Custom metadata tags allow for the creation of a rich set of file classification information. This not only aids in the classification phase but also simplifies later iterations and workflow automation.
  • Effective data classification enhances data security by accurately categorizing sensitive or regulated information, enabling compliance with data protection regulations.

Implementing these strategies for managing unstructured data prepares organizations for the challenges of today's data landscape and positions them to capitalize on the opportunities presented by AI technologies. By prioritizing data classification and leveraging solutions like PowerScale clusters, businesses can build a strong foundation for AI-driven innovation.


Best practices for implementing AI storage solutions

Implementing the right AI storage solutions is crucial for businesses seeking to harness the power of artificial intelligence. With the explosive growth of unstructured data, adhering to best practices that optimize performance, scalability, and cost is imperative. This section delves into key practices to ensure your AI storage infrastructure meets the demands of modern AI workloads.

Assess workload requirements

Before diving into storage solutions, one must thoroughly assess AI workload requirements. Understanding the specific needs of your AI applications—such as the volume of data, the necessity for high throughput/low latency, and the scalability and availability requirements—is fundamental. This step ensures you select the most suitable storage solution that meets your application's needs.

AI workloads are diverse, with each having unique demands on storage infrastructure. For instance, training a machine learning model could require rapid access to vast amounts of data, whereas inference workloads may prioritize low latency. An accurate assessment leads to an optimized infrastructure, ensuring that storage solutions are neither overprovisioned nor underperforming, thereby supporting AI applications efficiently and cost-effectively.

Leverage PowerScale

For managing large volumes and varieties of unstructured data, leveraging PowerScale nodes offers a scalable and efficient solution. PowerScale nodes are designed to handle the complexities of AI and machine learning workloads, offering optimized performance, scalability, and data mobility. These clusters allow organizations to store and process vast amounts of data efficiently for a range of AI use cases due to the following:

  • Scalability is a key feature, with PowerScale clusters capable of growing with the organization's data needs. They support massive capacities, allowing businesses to store petabytes of data seamlessly.
  • Performance is optimized for the demanding workloads of AI applications with the ability to process large volumes of data at high speeds, reducing the time for data analyses and model training.
  • Data mobility within PowerScale clusters on-premise and in the cloud ensures that data can be accessed when and where needed, supporting various AI and machine learning use cases across different environments.

PowerScale clusters allow businesses to start small and grow capacity as needed, ensuring that storage infrastructure can scale alongside AI initiatives without compromising on performance. The ability to handle multiple data types and protocols within a single storage infrastructure simplifies management and reduces operational costs, making PowerScale nodes an ideal choice for dynamic AI environments.

Utilize PowerScale OneFS 9.7.0.0

PowerScale OneFS 9.7.0.0 is the latest version of the Dell PowerScale operating system for scale-out network-attached storage (NAS). OneFS 9.7.0.0 introduces several enhancements in data security, performance, cloud integration, and usability.

OneFS 9.7.0.0 extends and simplifies the PowerScale offering in the public cloud, providing more features across various instance types and regions. Some of the key features in OneFS 9.7.0.0 include:

  • Cloud Innovations: Extends cloud capabilities and features, building upon the debut of APEX File Storage for AWS
  • Performance Enhancements: Enhancements to overall system performance
  • Security Enhancements: Enhancements to data security features
  • Usability Improvements: Enhancements to make managing and using PowerScale easier

Employ PowerScale F210 and F710

PowerScale, through its continuous innovation, extends into the AI era by introducing the next generation of PowerEdge-based nodes: the PowerScale F210 and F710. These new all-flash nodes leverage the Dell PowerEdge R660 platform, unlocking enhanced performance capabilities.

On the software front, both the F210 and F710 nodes benefit from significant performance improvements in PowerScale OneFS 9.7. These nodes effectively address the most demanding workloads by combining hardware and software innovations. The PowerScale F210 and F710 nodes represent a powerful combination of hardware and software advancements, making them well-suited for a wide range of workloads. For more information on the F210 and F710, see PowerScale All-Flash F210 and F710 | Dell Technologies Info Hub.

Ensure data security and compliance

Given the sensitivity of the data used in AI applications, robust security measures are paramount. Businesses must implement comprehensive security strategies that include encryption, access controls, and adherence to data protection regulations. Safeguarding data protects sensitive information and reinforces customer trust and corporate reputation.

Compliance with data protection laws and regulations is critical to AI storage solutions. As regulations can vary significantly across regions and industries, understanding and adhering to these requirements is essential to avoid significant fines and legal challenges. By prioritizing data security and compliance, organizations can mitigate risks associated with data breaches and non-compliance.

Monitor and optimize

Continuous storage environment monitoring and optimization are essential for maintaining high performance and efficiency. Monitoring tools can provide insights into usage patterns, performance bottlenecks, and potential security threats, enabling proactive management of the storage infrastructure.

Regular optimization efforts can help fine-tune storage performance, ensuring that the infrastructure remains aligned with the evolving needs of AI applications. Optimization might involve adjusting storage policies, reallocating resources, or upgrading hardware to improve efficiency, reduce costs, and ensure that storage solutions continue to effectively meet the demands of AI workloads.

By following these best practices, businesses can build and maintain a storage infrastructure that supports their current AI applications and is poised for future growth and innovation.

Conclusion

Navigating the complexities of unstructured storage demands for AI is no small feat. Yet, by adhering to the outlined best practices, businesses stand to benefit greatly. The foundational steps include assessing workload requirements, selecting the right storage solutions, and implementing robust security measures. Furthermore, integrating PowerScale nodes and a commitment to continuous monitoring and optimization are key to sustaining high performance and efficiency. As the landscape of AI continues to evolve, these practices will not only support current applications but also pave the way for future growth and innovation. In the dynamic world of AI, staying ahead means being prepared, and these strategies offer a roadmap to success.

Frequently asked questions

How big are AI data centers?

Data centers catering to AI, such as those by Amazon and Google, are immense, comparable to the scale of football stadiums.

How does AI process unstructured data?

AI processes unstructured data including images, documents, audio, video, and text by extracting and organizing information. This transformation turns unstructured data into actionable insights, propelling business process automation and supporting AI applications.

How much storage does an AI need?

AI applications, especially those involving extensive data sets, might require significant memory, potentially as much as 1TB or more. Such vast system memory efficiently facilitates the processing and statistical analysis of entire data sets.

Can AI handle unstructured data?

Yes, AI is capable of managing both structured and unstructured data types from a variety of sources. This flexibility allows AI to analyze and draw insights from an expansive range of data, further enhancing its utility across diverse applications.

 

Author: Aqib Kazi, Senior Principal Engineer, Technical Marketing

  • AI
  • PowerScale
  • Storage
  • Security
  • safety and security
  • Video

The Influence of Artificial Intelligence on Video, Safety, and Security

Mordekhay Shushan and Brian St.Onge

Fri, 23 Feb 2024 22:45:15 -0000


SIA recently unveiled its 2024 Security Megatrend report in which AI prominently claims the top position, dominating all four top spots. With AI making waves across global industries, there arises a set of concerns that demand thoughtful consideration. The key megatrends highlighted are as follows:

  • AI: Security of AI
  • AI: Visual Intelligence (Distinct from Video Surveillance)
  • AI: Generative AI
  • AI: Regulations of AI

This discussion will specifically delve into the first two trends—AI Security and Visual Intelligence.

Security of AI

The top spot on the list is occupied by the security of AI. Ironically, the most effective security for AI is AI itself. AI is tasked with monitoring behaviors related to data creation and access, identifying anomalies indicative of potential malicious activities. As businesses increasingly adopt AI, the value of data rises significantly for the organization. However, with AI becoming a more integral operational component, a cyber incident could disrupt not only data but also overall operations and production, particularly when there's a lack of metadata for decision-making.

Ensuring robust cyber protection for data becomes crucial, and solutions like the Ransomware Defender in Dell Technologies' unstructured data offering play a key role. Cyber recovery strategies are also imperative to swiftly resume normal operations. An air-gapped cyber recovery vault is essential, minimizing disruptions and securing a clean and complete dataset for rapid recovery from incidents.

Figure 1. Air-gapped cyber recovery vault. An operational air gap separates the cyber recovery vault and ensures a clean and complete dataset is available for rapid recovery from incidents.

AI visual intelligence

AI Visual Intelligence has been increasingly used across various industries for a multitude of purposes, including object recognition and classification, anomaly detection, predictive analytics, customer insights and experience enhancement, autonomous systems, healthcare diagnostics, environmental monitoring, and surveillance and security. By integrating AI Visual Intelligence into their operations, businesses can harness the power of visual data to improve decision-making, automate processes, enhance efficiencies, and unlock new opportunities for innovation and growth.

Video extends beyond security to impact business operations, enhancing efficiencies as the metadata collected from cameras serves business use cases beyond security functions. Examples of this metadata include image metadata, timestamps, object metadata, geolocation, and more. Collecting this metadata necessitates a robust storage solution to preserve complete datasets readily available for models to achieve desired outcomes. This data is considered a mission-critical workload, demanding optimal uptime for storage solutions.

Adopting an N+X node-based storage architecture on-premises guarantees that data is consistently written and available, providing 99.9999% (six nines) availability in an on-prem cloud environment. Dell Unstructured Data Solutions align perfectly with this workload, ensuring uninterrupted business operations, unlike server-based storage solutions that face challenges during deployment or encounter issues with public cloud connectivity. The potentially cost-prohibitive nature of public cloud storage for the data required in regular AI modeling may lead to a continued trend of cloud repatriation to on-premises infrastructure.

Security practitioners evaluating the need for cameras must now strategically map out potential stakeholders within organizations to determine camera requirements aligned with their business outcomes. This strategic approach is anticipated to drive a higher demand for cameras and associated services.

Resources

Check out Dell PowerScale for more information about Dell PowerScale solutions.

 

Authors: Mordekhay Shushan and Brian St.Onge


  • PowerScale
  • OneFS
  • F210
  • F710

Introducing the Next Generation of PowerScale – the AI Ready Data Platform

Aqib Kazi

Tue, 20 Feb 2024 19:07:47 -0000


Generative AI systems thrive on vast amounts of unstructured data, which are essential for training algorithms to recognize patterns, make predictions, and generate new content. Unstructured data – such as text, images, and audio – does not follow a predefined model, making it more complex and varied than structured data.

Preprocessing unstructured data

Unstructured data does not have a predefined format or schema, including text, images, audio, video, or documents. Preprocessing unstructured data involves cleaning, normalizing, and transforming the data into a structured or semi-structured form that the AI can understand and that can be used for analysis or machine learning.

Preprocessing unstructured data for generative AI is a crucial step that involves preparing the raw data for use in training AI models. The goal is to enhance the quality and structure of the data to improve the performance of generative models.

There are different steps and techniques for preprocessing unstructured data, depending on the type and purpose of the data. Some common steps are:

  • Data completion: This step involves filling in missing or incomplete data, either by using average or estimated values or by discarding or ignoring the data points with missing fields.
  • Data noise reduction: This step involves removing or reducing irrelevant, redundant, or erroneous data, such as duplicates, spelling errors, hidden objects, or background noise.
  • Data transformation: This step involves converting the data into a standard or consistent format, including scaling and normalizing numerical data, encoding categorical data, or extracting features from text, image, audio, or video data.
  • Data reduction: This step involves reducing the dimensionality or size of the data, either by selecting a subset of relevant features or data points or by applying techniques such as principal component analysis, clustering, or sampling.
  • Data validation: This step involves checking the quality and accuracy of the preprocessed data by using statistical methods, visualization tools, or domain knowledge.

These steps can help enhance the quality, reliability, and interpretability of the data, which can improve the performance and outcomes of the analysis or machine learning models.

PowerScale F210 and F710 platform

PowerScale’s continuous innovation extends into the AI era with the introduction of the next generation of PowerEdge-based nodes, including the PowerScale F210 and F710. The new PowerScale all-flash nodes leverage Dell PowerEdge R660, unlocking the next generation of performance. On the software front, the F210 and F710 take advantage of significant performance improvements in PowerScale OneFS 9.7. Combining the hardware and software innovations, the F210 and F710 tackle the most demanding workloads with ease.

The F210 and F710 offer greater density in a 1U platform, with the F710 supporting 10 NVMe SSDs per node and the F210 offering a 15.36 TB drive option. The Sapphire Rapids CPUs provide 19% lower cycles-per-instruction. PCIe Gen 5 doubles throughput compared to PCIe Gen 4. Additionally, the nodes take advantage of DDR5, offering greater speed and bandwidth.

From a software perspective, PowerScale OneFS 9.7 introduces a significant leap in performance. OneFS 9.7 updates the protocol stack, locking, and direct-write. To learn more about OneFS 9.7, check out this article on PowerScale OneFS 9.7.

The OneFS journal in the all-flash F210 and F710 nodes uses a 32 GB configuration of the Dell Software Defined Persistent Memory (SDPM) technology. Previous platforms used NVDIMM-n for persistent memory, which consumed a DIMM slot.

For more details about the F210 and F710, see our other blog post at Dell.com: https://www.dell.com/en-us/blog/next-gen-workloads-require-next-gen-storage/.

Performance

The introduction of the PowerScale F210 and F710 nodes capitalizes on significant leaps in hardware and software from the previous generations. OneFS 9.7 introduces tremendous performance-oriented updates, including the protocol stack, locking, and direct-write. The PowerEdge-based servers offer a substantial hardware leap from previous generations. The hardware and software advancements combine to offer enormous performance gains, particularly for streaming reads and writes.

PowerScale F210

The PowerScale F210 is a 1U chassis based on the PowerEdge R660. A minimum of three nodes is required to form a cluster, with a maximum of 252 nodes. The F210 is node pool compatible with the F200.

An image of the PowerScale F210 front bezel

Table 1. F210 specifications

Attribute | PowerScale F210 Specification
Chassis | 1U Dell PowerEdge R660
CPU | Single Socket – Intel Sapphire Rapids 4410Y (2G/12C)
Memory | Dual Rank DDR5 RDIMMs 128 GB (8 x 16 GB)
Journal | 1 x 32 GB SDPM
Front-end networking | 2 x 100 GbE or 25 GbE
Infrastructure networking | 2 x 100 GbE or 25 GbE
NVMe SSD drives | 4

PowerScale F710

The PowerScale F710 is a 1U chassis based on the PowerEdge R660. A minimum of three nodes is required to form a cluster, with a maximum of 252 nodes.

An image of the PowerScale F710 front bezel

Table 2. F710 specifications

Attribute | PowerScale F710 Specification
Chassis | 1U Dell PowerEdge R660
CPU | Dual Socket – Intel Sapphire Rapids 6442Y (2.6G/24C)
Memory | Dual Rank DDR5 RDIMMs 512 GB (16 x 32 GB)
Journal | 1 x 32 GB SDPM
Front-end networking | 2 x 100 GbE or 25 GbE
Infrastructure networking | 2 x 100 GbE
NVMe SSD drives | 10

For more details on the new PowerScale all-flash platforms, see the PowerScale All-Flash F210 and F710 white paper.


Author: Aqib Kazi

  • Isilon
  • PowerScale
  • OneFS
  • ACL
  • Permission

OneFS Access Control Lists Overview

Lieven Lin

Thu, 18 Jan 2024 22:29:13 -0000


As we know, when users access OneFS cluster data via different protocols, the final permission enforcement happens on the OneFS file system. In OneFS, this is achieved by the Access Control Lists (ACLs) implementation, which provides granular permission control on directories and files. In this article, we will look at the basics of OneFS ACLs.

OneFS ACL

OneFS provides a single namespace for multiprotocol access and has its own internal ACL representation to perform access control. The internal ACL is presented as protocol-specific views of permissions, so that NFSv3 exports display POSIX mode bits while NFSv4 and SMB display ACLs.

When connecting to a PowerScale cluster with SSH, you can manage not only POSIX mode bits but also ACLs with standard UNIX tools such as the chmod command. In addition, you can edit ACL policies through the web administration interface to configure OneFS permissions management for networks that mix Windows and UNIX systems.

The OneFS ACL design is derived from Windows NTFS ACL. As such, many of its concept definitions and operations are similar to the Windows NTFS ACL, such as ACE permissions and inheritance.

OneFS synthetic ACL and real ACL

To deliver cross-protocol file access seamlessly, OneFS stores an internal representation of a file-system object’s permissions. The internal representation can contain information from the POSIX mode bits or the ACL. 

OneFS has two types of ACLs to fulfill different scenarios:

  • OneFS synthetic ACL: Under the default ACL policy, if no inheritable ACL entries exist on a parent directory – such as when a file or directory is created through an NFS or SSH session on OneFS within that parent directory – the new file or directory will contain only POSIX mode bits permissions. OneFS uses the internal representation to generate a OneFS synthetic ACL, which is an in-memory structure that approximates the POSIX mode bits of a file or directory for an SMB or NFSv4 client.
  • OneFS real ACL: Under the default ACL policy, when a file or directory is created through SMB, or when the synthetic ACL of a file or directory is modified through an NFSv4 or SMB client, the OneFS real ACL is initialized and stored on disk. The OneFS real ACL can also be initialized using the OneFS enhanced chmod command tool with the +a, -a, or =a option to modify the ACL (see the example after this list).
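For illustration, initializing a real ACL from the CLI might look like the following sketch. The argument order is an assumption based on typical OneFS CLI examples, and the user and path are placeholders; verify the exact syntax in the OneFS CLI reference for your release.

```
# Sketch only: grant the user 'alice' generic directory read and traverse
# rights, which initializes a real (on-disk) ACL on the directory.
chmod +a user alice allow dir_gen_read,dir_gen_execute /ifs/data/projects

# Remove that access control entry again.
chmod -a user alice allow dir_gen_read,dir_gen_execute /ifs/data/projects
```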

OneFS access control entries

In contrast to the Windows DACL and NFSv4 ACL, the OneFS ACL access control entry (ACE) adds an additional identity type. OneFS ACEs contain the following information:

  • Identity name: The name of a user or group
  • ACE type: The type of the ACE (allow or deny)
  • ACE permissions and inheritance flags: A list of permissions and inheritance flags separated with commas

OneFS ACE permissions

Similar to the Windows permission level, OneFS divides permissions into the following three types:

  • Standard ACE permissions: These apply to any object in the file system
  • Generic ACE permissions: These map to a bundle of specific permissions
  • Constant ACE permissions: These are specific permissions for file-system objects

The standard ACE permissions that can appear for a file-system object are shown in the following table:

ACE permission | Applies to | Description
std_delete | Directory or file | The right to delete the object
std_read_dac | Directory or file | The right to read the security descriptor, not including the SACL
std_write_dac | Directory or file | The right to modify the DACL in the object's security descriptor
std_write_owner | Directory or file | The right to change the owner in the object's security descriptor
std_synchronize | Directory or file | The right to use the object as a thread synchronization primitive
std_required | Directory or file | Maps to std_delete, std_read_dac, std_write_dac, and std_write_owner

The generic ACE permissions that can appear for a file system object are shown in the following table:

ACE permission | Applies to | Description
generic_all | Directory or file | Read, write, and execute access. Maps to file_gen_all or dir_gen_all.
generic_read | Directory or file | Read access. Maps to file_gen_read or dir_gen_read.
generic_write | Directory or file | Write access. Maps to file_gen_write or dir_gen_write.
generic_exec | Directory or file | Execute access. Maps to file_gen_execute or dir_gen_execute.
dir_gen_all | Directory | Maps to dir_gen_read, dir_gen_write, dir_gen_execute, delete_child, and std_write_owner.
dir_gen_read | Directory | Maps to list, dir_read_attr, dir_read_ext_attr, std_read_dac, and std_synchronize.
dir_gen_write | Directory | Maps to add_file, add_subdir, dir_write_attr, dir_write_ext_attr, std_read_dac, and std_synchronize.
dir_gen_execute | Directory | Maps to traverse, std_read_dac, and std_synchronize.
file_gen_all | File | Maps to file_gen_read, file_gen_write, file_gen_execute, delete_child, and std_write_owner.
file_gen_read | File | Maps to file_read, file_read_attr, file_read_ext_attr, std_read_dac, and std_synchronize.
file_gen_write | File | Maps to file_write, file_write_attr, file_write_ext_attr, append, std_read_dac, and std_synchronize.
file_gen_execute | File | Maps to execute, std_read_dac, and std_synchronize.

The constant ACE permissions that can appear for a file-system object are shown in the following table:

| ACE permission | Applies to | Description |
|---|---|---|
| modify | File | Maps to file_write, append, file_write_ext_attr, file_write_attr, delete_child, std_delete, std_write_dac, and std_write_owner |
| file_read | File | The right to read file data |
| file_write | File | The right to write file data |
| append | File | The right to append to a file |
| execute | File | The right to execute a file |
| file_read_attr | File | The right to read file attributes |
| file_write_attr | File | The right to write file attributes |
| file_read_ext_attr | File | The right to read extended file attributes |
| file_write_ext_attr | File | The right to write extended file attributes |
| delete_child | Directory or file | The right to delete children, including read-only files within a directory; this is currently not used for a file, but can still be set for Windows compatibility |
| list | Directory | List entries |
| add_file | Directory | The right to create a file in the directory |
| add_subdir | Directory | The right to create a subdirectory |
| traverse | Directory | The right to traverse the directory |
| dir_read_attr | Directory | The right to read directory attributes |
| dir_write_attr | Directory | The right to write directory attributes |
| dir_read_ext_attr | Directory | The right to read extended directory attributes |
| dir_write_ext_attr | Directory | The right to write extended directory attributes |

OneFS ACL inheritance

Inheritance allows permissions to be layered or overridden as needed in an object hierarchy and simplifies permissions management. The semantics of OneFS ACL inheritance are the same as Windows ACL inheritance, so they will feel familiar to anyone versed in Windows NTFS ACL inheritance. The following table shows the ACE inheritance flags defined in OneFS:

| ACE inheritance flag | Set on directory or file | Description |
|---|---|---|
| object_inherit | Directory only | Indicates an ACE applies to the current directory and files within the directory |
| container_inherit | Directory only | Indicates an ACE applies to the current directory and subdirectories within the directory |
| inherit_only | Directory only | Indicates an ACE applies to subdirectories only, files only, or both within the directory |
| no_prop_inherit | Directory only | Indicates an ACE applies to the current directory or only the first-level contents of the directory, not the second-level or subsequent contents |
| inherited_ace | File or directory | Indicates an ACE is inherited from the parent directory |
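
For example, a single inheritable ACE set on a directory can cover both new files (object_inherit) and new subdirectories (container_inherit). The following sketch uses the same illustrative enhanced chmod syntax as above; verify the flag and permission names against your OneFS release:

# Allow read access that propagates to files and subdirectories created later
chmod +a user jsmith allow dir_gen_read,object_inherit,container_inherit /ifs/data/projects
ls -led /ifs/data/projects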

 

Author: Lieven Lin

Read Full Blog
  • PowerScale
  • OneFS
  • CloudPools

CloudPools Operation Workflows

Jason He Jason He

Fri, 12 Jan 2024 21:01:01 -0000

|

Read Time: 0 minutes

The Dell PowerScale CloudPools feature of OneFS allows tiering cold or infrequently accessed data to lower-cost cloud storage. CloudPools extends the PowerScale namespace to the private cloud or the public cloud. For CloudPools supported cloud providers, see the CloudPools Supported Cloud Providers blog.

This blog focuses on the following CloudPools operation workflows:

  • Archive
  • Recall
  • Read
  • Update

Archive

The archive operation is the CloudPools process of moving file data from the local PowerScale cluster to cloud storage. Files are archived either by the SmartPools job or from the command line (see the sketch below). The CloudPools archive process can be paused or resumed.
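
As a rough sketch, an on-demand archive from the CLI looks like the following. The path and policy name are placeholders, and the exact options should be checked with isi cloud archive --help on your release:

isi cloud archive /ifs/data/cold --recursive true --policy my-cloudpools-policy
# Archive and recall run as CloudPools jobs that can be listed, paused, and resumed
isi cloud jobs list
isi cloud jobs pause <job-id>
isi cloud jobs resume <job-id>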

The following figure shows the workflow of the CloudPools archive.

 This figure illustrates the workflow of the CloudPools archive: 1. A file matches a file pool policy. 2. The file data is split into chunks called Cloud Data Objects (CDOs). 3. The chunks are sent from the PowerScale cluster to the cloud. 4. The file is truncated into a SmartLink file, and a Cloud Metadata Object (CMO) is written to the cloud.

Figure 1.  Archive workflow

More workflow details include:

  • The file pool policy in Step 1 specifies a cloud target and cloud-specific parameters, which include:
  • Encryption: CloudPools provides an option to encrypt data before it is sent to the cloud storage. It uses the PowerScale key management module for data encryption and uses AES-256 as the encryption algorithm. The benefit is that only encrypted data is sent over the network.
  • Compression: CloudPools provides an option to compress data before it is sent to the cloud storage. It implements block-level compression using the zlib compression library. CloudPools does not compress data that is already compressed.
  • Local data cache: Caching supports local reads and writes of SmartLink files. It optimizes performance and reduces bandwidth costs by eliminating repeated fetching of file data for repeated reads and writes. The data cache temporarily holds file data from the cloud storage on PowerScale disk storage for files that CloudPools has moved off cluster.
  • Data retention: Data retention determines how long to keep cloud objects in the cloud storage.
  • When chunks are sent from the PowerScale cluster to the cloud in Step 3, a checksum is applied to each chunk to ensure data integrity.

Recall

The recall operation is the CloudPools process that reverses the archive: it replaces the SmartLink file by restoring the original file data on the PowerScale cluster and removes the corresponding cloud objects from the cloud. The recall process can only be performed from the command line (see the sketch below). The CloudPools recall process can be paused or resumed.
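
A minimal recall sketch from the CLI (paths are placeholders; confirm the options with isi cloud recall --help on your release):

isi cloud recall /ifs/data/cold/bigfile.dat
# Recall an entire directory tree
isi cloud recall /ifs/data/cold --recursive true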

The following figure shows the workflow of CloudPools recall. 

This figure illustrates the workflow of the CloudPools recall: 1. OneFS retrieves the CDOs from cloud to the PowerScale cluster. 2. The SmartLink file is replaced by restoring the original file data. 3. The cloud objects are removed in cloud asynchronously if the data retention period has expired.

Figure 2.  Recall workflow

Read

The read operation is the CloudPools process of client data access, known as inline access. When a client opens a file for read, the blocks are added to the cache in the associated SmartLink file by default. The cache can be disabled by setting the accessibility in the file pool policy for CloudPools. The accessibility setting is used to specify how data is cached in SmartLink files when a user or application accesses a SmartLink file on the PowerScale cluster. Values are cached (default) and no cache.

The following figure shows the workflow of CloudPools read by default. 

This figure illustrates the workflow of the CloudPools read: 1. Client accesses the file through the SmartLink file. 2. OneFS retrieves CDOs from cloud to the local cache on the PowerScale cluster. 3. File data is sent to the client from the local cache on the PowerScale cluster. 4. OneFS purges expired cache information for the SmartLink file.

Figure 3.  Read workflow

Starting with OneFS 9.1.0.0, a cloud object cache was introduced to enhance how CloudPools communicates with the cloud. In Step 1, OneFS looks for data in the object cache first and retrieves it from the object cache if it is already there. The cloud object cache reduces the number of requests sent to the cloud when reading a file.

Prior to OneFS 9.1.0.0, in Step 1, OneFS looks for data in the local data cache first and moves to Step 3 if the data is already in the local data cache.

Note: Cloud object cache is per node. Each node maintains its own object cache on the cluster. 

Update

The update operation is the CloudPools process that occurs when clients update data. When a client changes a SmartLink file, CloudPools first writes the changes to the local data cache and then periodically sends the updated file data to the cloud. The space used by the cache is temporary and configurable.

The following figure shows the workflow of CloudPools update. 

This figure illustrates the workflow of the CloudPools update: 1. The client accesses the file through the SmartLink file. 2. OneFS retrieves CDOs from cloud, putting the file data in the local cache. 3. The client updates the file and those changes are stored in the local cache. 4. OneFS sends the updated file data from the local cache to cloud. 5. OneFS purges expired cache information for the SmartLink file.

Figure 4.  Update workflow

Thank you for taking the time to read this blog, and congratulations on gaining a clear understanding of how the OneFS CloudPools operation works!

Author: Jason He, Principal Engineering Technologist

Read Full Blog
  • PowerScale
  • OneFS
  • CloudPools

CloudPools Reporting

Jason He Jason He

Fri, 12 Jan 2024 20:33:21 -0000

|

Read Time: 0 minutes

This blog focuses on CloudPools reporting, specifically:

  • CloudPools network stats
  • The isi_fsa_pools_usage feature

CloudPools network stats

Dell PowerScale CloudPools network stats collect every network transaction and provide network activity statistics from CloudPools connections to the cloud storage.

Displaying network activity statistics

The network activity statistics include bytes In, bytes Out, and the number of GET, PUT, and DELETE operations. CloudPools network stats are available in two categories:

  • Per CloudPools account
  • Per file pool policy

Note: CloudPools network stats do not provide file statistics, such as the file list being archived or recalled.

Run the following command to check the CloudPools network stats by CloudPools account:

isi_test_cpool_stats -Q --accounts <account_name>

For example, the following command shows the current CloudPools network stats by CloudPools account:

isi_test_cpool_stats -Q --accounts testaccount
Account Name   Bytes In    Bytes Out   Num Reads   Num Writes   Num Deletes
testaccount    4194896000  4194905034  4000        2001         8001  

Similarly, you can run the following command to check the CloudPools network stats by file pool policy:

isi_test_cpool_stats -Q --policies <policy_name>

And here is an example of current CloudPools network stats by file pool policy:

isi_test_cpool_stats -Q --policies testpolicy
Policy Name    Bytes In       Bytes Out      Num Reads      Num Writes
testpolicy     4154896000     4154905034     4000           2001

Note: The command output does not include the number of deletes by file pool policy.

Run the following command to check the history for CloudPools network stats:

isi_test_cpool_stats -q -s <number of seconds in the past to start stat query>

Use the -s parameter to define the number of seconds in the past. For example, set it to 86,400 to query CloudPools network stats over the last day, as in the following example:

isi_test_cpool_stats -q -s 86400
Account          bytes-in     bytes-out    gets   puts   deletes
testaccount    | 4194896000 | 4194905034 | 4000 | 2001 | 8001

You can also run the following command to flush stats from memory to database and get the real-time CloudPools network stats:

isi_test_cpool_stats -f

Displaying stats for CloudPools activities

The cloud statistics namespace with CloudPools is added in OneFS 9.4.0.0. This feature leverages existing OneFS daemons and systems to track statistics about CloudPools activities. The statistics include bytes In, bytes Out, and the number of Reads, Writes, and Deletions. CloudPools statistics are available in two categories:

  • Per CloudPools account
  • Per file pool policy

Note: The cloud statistics namespace with CloudPools does not provide file statistics, such as the file list being archived or recalled.

You can run these isi statistics cloud commands to view statistics about CloudPools activities:

isi statistics cloud --account <account_name>
isi statistics cloud --policy <policy_name>

The following command shows an example of current CloudPools statistics by CloudPools account:

isi statistics cloud --account s3                    
Account Policy In      Out     Reads   Writes  Deletions       Cloud      Node
s3             218.5KB 218.7KB 1       2       0               AWS        3
s3             0.0B    0.0B    0       0       0               AWS        1
s3             0.0B    0.0B    0       0       0               AWS        2

The following command shows an example of current CloudPools statistics by file pool policy:

isi statistics cloud --policy s3policy        
Account Policy         In      Out     Reads   Writes  Deletions  Cloud       Node
s3      s3policy       218.5KB 218.7KB  1      2       0          AWS         3
s3      s3policy       0.0B    0.0B     0      0       0          AWS         1
s3      s3policy       0.0B    0.0B     0      0       0          AWS         2

The isi_fsa_pools_usage feature

Starting with OneFS 8.2.2, you can run the following command to list the logical size and physical size of stubs (SmartLink files) in a directory. This feature leverages the IndexUpdate and FSA (File System Analytics) jobs. To enable it, you must:

  • Schedule the IndexUpdate job. Running it every four hours is recommended.
  • Schedule the FSA job. Running it every day is recommended, but not more often than the IndexUpdate job. (See the scheduling sketch after the example output below.)
isi_fsa_pools_usage /ifs
Node Pool                  Dirs  Files  Streams  Logical Size   Physical Size
Cloud                      0     1       0       338.91k           24.00k
h500_30tb_3.2tb-ssd_128gb  42    300671  0       879.23G            1.20T
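
The following sketch shows one way to put the prerequisite jobs on a schedule or run them on demand. It assumes the FSA job appears as FSAnalyze in the job engine and that your OneFS release accepts these schedule strings; adjust them to match the isi job syntax on your cluster:

isi job types modify IndexUpdate --schedule "every 4 hours"
isi job types modify FSAnalyze --schedule "every 1 days"
# Or start them on demand
isi job jobs start IndexUpdate
isi job jobs start FSAnalyze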

Now you know how to use the commands for CloudPools reporting. It's simple and straightforward. Thanks for reading!

Author: Jason He, Principal Engineering Technologist

Read Full Blog
  • PowerScale
  • OneFS
  • CloudPools

Protecting CloudPools SmartLink Files

Jason He Jason He

Fri, 12 Jan 2024 17:20:14 -0000

|

Read Time: 0 minutes

Dell PowerScale CloudPools SmartLink files are the sole means to access file data stored in the cloud, so ensure that you protect them from accidental deletion.

Note: SmartLink files cannot be backed up using a copy command, such as secure copy (scp).

This blog focuses on backing up SmartLink files using OneFS SyncIQ and NDMP (Network Data Management Protocol).

CloudPools handles cross-version compatibility when the source cluster and the target PowerScale cluster run different CloudPools versions.

NDMP and SyncIQ provide two types of copy or backup:

  • Shallow copy (SC)/backup: Replicates or backs up SmartLink files to the target PowerScale cluster or tape as SmartLink files without file data.
  • Deep copy (DC)/backup: Replicates or backs up SmartLink files to the target PowerScale cluster or tape as regular (unarchived) files. The backup or replication is slower than a shallow copy, and disk space is consumed on the target cluster for the replicated data.

The following table shows the CloudPools and OneFS version mapping. CloudPools 2.0 was released with OneFS 8.2.0; CloudPools 1.0 runs on OneFS 8.0.x and 8.1.x.

Table 1.  CloudPools and OneFS mapping information

| OneFS version | CloudPools version |
|---|---|
| OneFS 8.0.x/OneFS 8.1.x | CloudPools 1.0 |
| OneFS 8.2.0 or higher | CloudPools 2.0 |

The following table shows the NDMP and SyncIQ supported use cases when different versions of CloudPools are running on the source and target clusters. As noted in the following table, if CloudPools 2.0 is running on the source PowerScale cluster and CloudPools 1.0 is running on the target PowerScale cluster, shallow copies are not allowed.

Table 2.  NDMP and SyncIQ supported use cases with CloudPools  

| Source | Target | SC NDMP | DC NDMP | SC SyncIQ replication | DC SyncIQ replication |
|---|---|---|---|---|---|
| CloudPools 1.0 | CloudPools 2.0 | Supported | Supported | Supported | Supported |
| CloudPools 2.0 | CloudPools 1.0 | Not Supported | Supported | Not Supported | Supported |

SyncIQ

SyncIQ is CloudPools-aware, but consider the guidance on snapshot efficiency, especially where snapshot retention periods on the target cluster will be long.

SyncIQ policies support two types of data replication for CloudPools:

  • Shallow copy: This option is used to replicate files as SmartLink files without file data from the source PowerScale cluster to the target PowerScale cluster.
  • Deep copy: This option is used to replicate files as regular files or unarchived files from the source PowerScale cluster to the target PowerScale cluster.

SyncIQ, SmartPools, and CloudPools licenses are required on both the source and target PowerScale clusters. It is highly recommended to set up a scheduled SyncIQ backup of the SmartLink files, for example:
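
The following is an illustrative policy definition that keeps SmartLink files as SmartLink files on the target (shallow copy). The policy name, paths, target host, and schedule are placeholders, and the --cloud-deep-copy option should be verified with isi sync policies create --help on your release:

isi sync policies create cloudpools-dr sync /ifs/data/cloudpools target-cluster.example.com /ifs/data/cloudpools --cloud-deep-copy deny --schedule "every 1 days at 01:00"
# Use --cloud-deep-copy force instead to replicate fully rehydrated (deep copy) files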

When SyncIQ replicates SmartLink files, it also replicates the local cache state and unsynchronized cache data from the source PowerScale cluster to the target PowerScale cluster. The following figure shows SyncIQ replication of directories containing SmartLink files and unarchived regular files. Both unidirectional and bidirectional replication are supported.

Note: OneFS manages cloud access at the cluster level and does not support managing cloud access at the directory level. When failing over a SyncIQ directory containing SmartLink files to a target cluster, you need to remove cloud access on the source cluster and add cloud access on the target cluster. If there are multiple CloudPools storage accounts, removing/adding cloud access will impact all CloudPools storage accounts on the source/target cluster.

Protecting CloudPools SmartLink files using SyncIQ replication. This figure illustrates the SyncIQ replication when replicating directories including SmartLink files and unarchived normal files from source Site 1 to the target Site 2. The figure also shows the supported SyncIQ unidirectional (from Site 1 to Site 2 only) and bi-directional replication (from Site 1 to Site 2 and from Site 2 to Site 1).

Figure 1.  SyncIQ replication

Note: If encryption is enabled in a file pool policy for CloudPools, SyncIQ also replicates all the relevant encryption keys to the secondary PowerScale cluster along with the SmartLink files.

NDMP

NDMP is also CloudPools-aware and supports three backup and restore methods for CloudPools:

  • DeepCopy: This option is used to back up files as regular files or unarchived files. Files can only be restored as regular files.
  • ShallowCopy: This option is used to back up files as SmartLink files without file data. Files can only be restored as SmartLink files.
  • ComboCopy: This option is used to back up files as SmartLink files with file data. Files can be restored as regular files or SmartLink files.

It is possible to update the file data and send the updated data to the cloud storage. Multiple versions of SmartLink files can be backed up to tape using NDMP, and multiple versions of CDOs (Cloud Data Objects) are protected in the cloud under the data retention setting. You can restore a specific version of a SmartLink file from tape to a PowerScale cluster and continue to access (read or update) the file as before.

Note: If encryption is enabled in the file pool policy for CloudPools, NDMP also backs up all the relevant encryption keys to tapes along with the SmartLink files.

Thank you for taking the time to read this blog. You now know the options for protecting SmartLink files using OneFS SyncIQ and NDMP.

Author: Jason He, Principal Engineering Technologist

Read Full Blog
  • PowerScale
  • AWS
  • APEX

How to Size Disk Capacity When Cluster Has Data Reduction Enabled

Yunlong Zhang Yunlong Zhang

Mon, 08 Jan 2024 18:22:11 -0000

|

Read Time: 0 minutes

When sizing a storage solution for OneFS, two major aspects need to be considered – capacity and performance. In this blog, we will talk about how to calculate the raw capacity needed on each node in the AWS cloud environment.

Consider a customer who wants to have 30TB of data capacity on APEX File Storage on AWS. The data reduction ratio is 1.6, and the cluster contains 6 nodes. How much capacity is needed for each node of the cluster?

1. The usable capacity is calculated by dividing the application data size by the data reduction ratio: 30TB/1.6 = 18.75TB

2. OneFS in the AWS environment uses +2n as the default protection level. The +2n protection level striping pattern for 6 nodes is 4+2, which gives a striping efficiency of about 66% (4 of every 6 stripe units hold data). The raw capacity needed is calculated by dividing the usable capacity by the striping efficiency: 18.75TB/66% = 28.41TB

3. The raw capacity per node is then calculated by dividing the total raw capacity by the number of nodes involved: 28.41TB/6 nodes = 4.735TB

4. When each node contains 10 disks, each disk's raw capacity should be about 474GB (4.735TB/10 disks).

OK, let's take a look at the formula of this calculation:

single disk capacity = (((application data size/data reduction ratio)/striping efficiency)/cluster node count)/node disk count

For reference, the striping patterns of 4, 5, and 6 nodes are listed as follows:

  • 4 nodes: 2+2 (50%)
  • 5 nodes: 3+2 (60%)
  • 6 nodes: 4+2 (66%)

Now, knowing the logical data capacity, you can calculate the appropriate capacity for each individual EBS volume in the cluster.
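
As a convenience, the same arithmetic can be scripted. The following is a small sketch using bc with the numbers from this example; plug in your own workload values:

# Per-EBS-volume raw capacity in TB:
# application data / reduction ratio / striping efficiency / nodes / disks per node
echo "scale=3; 30 / 1.6 / 0.66 / 6 / 10" | bc
# Prints .473 TB, or roughly 474 GB per EBS volume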


Author: Yunlong Zhang

Read Full Blog
  • OneFS
  • S3
  • Performance

Running COSBench Performance Test on PowerScale

Yunlong Zhang Yunlong Zhang

Tue, 09 Jan 2024 14:21:02 -0000

|

Read Time: 0 minutes

Starting with OneFS version 9.0, PowerScale enables data access through the Amazon Simple Storage Service (Amazon S3) application programming interface (API) natively. PowerScale implements the S3 API as a first-class protocol along with other NAS protocols on top of its distributed OneFS file system.

COSBench is a popular benchmarking tool to measure the performance of Cloud Object Storage services and supports the S3 protocol. In the following blog, we will walk through how to set up COSBench to test the S3 performance of a PowerScale cluster.

Step 1: Choose the v0.4.2.c4 version

I suggest choosing the v0.4.2 release candidate 4 instead of the latest v0.4.2 release, especially if you receive an error message like the following and your COSBench service cannot be started:

# cat driver-boot.log     
Listening on port 0.0.0.0/0.0.0.0:18089 ...
!SESSION 2020-06-03 10:12:59.683 -----------------------------------------------
eclipse.buildId=unknown
java.version=1.7.0_261
java.vendor=Oracle Corporation
BootLoader constants: OS=linux, ARCH=x86_64, WS=gtk, NL=en_US
Command-line arguments:  -console 18089
!ENTRY org.eclipse.osgi 4 0 2020-06-03 10:13:00.367
!MESSAGE Bundle plugins/cosbench-castor not found.
!ENTRY org.eclipse.osgi 4 0 2020-06-03 10:13:00.368
!MESSAGE Bundle plugins/cosbench-log4j not found.
!ENTRY org.eclipse.osgi 4 0 2020-06-03 10:13:00.368
!MESSAGE Bundle plugins/cosbench-log@6:start not found.
!ENTRY org.eclipse.osgi 4 0 2020-06-03 10:13:00.369
!MESSAGE Bundle plugins/cosbench-config@6:start not found.

Step 2: Install Java

Both Java 1.7 and 1.8 work well with COSBench.

Step 3: Configure Ncat

Ncat is necessary for COSBench to work. Without it, you will receive the following error message:

[root]hopisdtmelabs14# bash ./start-driver.sh  
Launching osgi framwork ...
Successfully launched osgi framework!
Booting cosbench driver ...
which: no nc in (/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/tme/bin:/usr/local/tme/tme_portal/perf_web/bin)
No appropriate tool found to detect cosbench driver status.

Use the following commands to install Ncat (the example here is for CentOS 7) and configure it for COSBench:

yum -y install wget
wget https://nmap.org/dist/ncat-7.80-1.x86_64.rpm
yum localinstall ncat-7.80-1.x86_64.rpm
cd /usr/bin
ln -s ncat nc

Step 4: Unzip the COSBench files

After you download the 0.4.2.c4.zip, you can unzip it to a directory:

unzip 0.4.2.c4.zip

Make all the bash scripts executable:

chmod +x /tmp/cosbench/0.4.2.c4/*.sh

Step 5: Start drivers and controller

On the drivers and the controller, find cosbench-start.sh, locate the line that launches Java, and add the following two options:

-Dcom.amazonaws.services.s3.disableGetObjectMD5Validation=true
-Dcom.amazonaws.services.s3.disablePutObjectMD5Validation=true

The COSBench tool has two roles: controller and driver. You can use the following command to start the driver:

bash ./cosbench/start-driver.sh

Before we start the controller, we need to change the configuration so that the controller knows how many drivers it has and their addresses. This is done by filling in the controller's main configuration file, which is under ./conf and named controller.conf. The following is an example controller.conf:

[controller]
drivers = 4
log_level = INFO
log_file = log/system.log
archive_dir = archive
 
[driver1]
name = driver1
url = http://10.245.109.115:18088/driver
 
[driver2]
name = driver2
url = http://10.245.109.116:18088/driver
 
[driver3]
name = driver3
url = http://10.245.109.117:18088/driver
 
[driver4]
name = driver4
url = http://10.245.109.118:18088/driver

Run the start-controller.sh to start the controller role:

bash ./start-controller.sh

Step 6: Prepare PowerScale

First, you need to prepare your PowerScale cluster for the S3 test. Make sure to record the secret key of the newly created user, s3. Run the following commands to prepare PowerScale for the S3 performance test:

isi services s3 enable
isi s3 settings global modify --https-only=false
isi auth users create s3 --enabled=true
isi s3 keys create s3
mkdir -p -m 777 /ifs/s3/bkt1  
chmod 777 /ifs/s3
isi s3 buckets create --owner=s3 --name=bkt1  --path=/ifs/s3/bkt1
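
Optionally, you can sanity-check the S3 endpoint before launching COSBench. The following sketch uses the AWS CLI; the access key ID, secret, host name, and bucket mirror the example workload file in the next step and must be replaced with the values from your own cluster (isi s3 keys create s3 prints the key pair):

export AWS_ACCESS_KEY_ID="1_s3_accid"
export AWS_SECRET_ACCESS_KEY="wEUqWNWkQGmgMos70NInqW26WpGf"
aws s3 ls s3://bkt1 --endpoint-url http://f600-2:9020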

Compose the workload XML file, and use it to specify the details of the test you want to run. Here is an example:

<?xml version="1.0" encoding="UTF-8"?>
<workload name="S3-F600-Test1" description="Isilon F600 with original configuration">
        <storage type="s3" config="accesskey=1_s3_accid;secretkey=wEUqWNWkQGmgMos70NInqW26WpGf;endpoint=http://f600-2:9020/bkt1;path_style_access=true"/>
        <workflow>
               <workstage name="init-for-write-1k">
                       <work type="init" workers="1" config="cprefix=write-bucket-1k; containers=r(1,6)"/>
               </workstage>
               <workstage name="init-for-read-1k">
                       <work type="init" workers="1" config="cprefix=read-bucket-1k; containers=r(1,6)"/>
               </workstage>
               <workstage name="prepare-1k">
                       <work type="prepare" workers="1" config="cprefix=read-bucket-1k;containers=r(1,6);oprefix=1kb_;objects=r(1,1000);sizes=c(1)KB"/>
               </workstage>
               <workstage name="write-1kb">
                       <work name="main" type="normal" interval="5" division="container" chunked="false" rampup="0" rampdown="0" workers="6" totalOps="6000">
                               <operation type="write" config="cprefix=write-bucket-1k; containers=r(1,6); oprefix=1kb_; objects=r(1,1000); sizes=c(1)KB"/>
                       </work>
               </workstage>
               <workstage name="read-1kb">
                       <work name="main" type="normal" interval="5" division="container" chunked="false" rampup="0" rampdown="0" workers="6" totalOps="6000">
                               <operation type="read" config="cprefix=read-bucket-1k; containers=r(1,6); oprefix=1kb_; objects=r(1,1000)"/>
                       </work>
               </workstage>
        </workflow>
</workload>

Step 7: Run the test

You can directly submit the XML in the COSBench WebUI, or you can use the following command line in the controller console to start the test:

bash ./cli.sh submit ./conf/my-s3-test.xml

You will see the test successfully finished, as shown in the following figure.

This figure shows the logging of the stages in the workload. The State column, which is second to last, shows each stage in the workload as complete.

Figure 1. Completion screen after testing

Have fun testing!

 

Author: Yunlong Zhang


Read Full Blog
  • AWS
  • OneFS
  • APEX
  • Performance

Will More Disks Lead to Better Performance in APEX File Storage in AWS?

Yunlong Zhang Yunlong Zhang

Mon, 08 Jan 2024 18:02:59 -0000

|

Read Time: 0 minutes

Dell Technologies has developed a range of PowerScale platforms, including all flash models, hybrid models, and archive models, all of which exhibit exceptional design. The synergy between the disk system and the compute system is highly effective, showcasing a well-matched integration.

In the cloud environment, customers have the flexibility to control the number of CPU cores and memory sizes by selecting different instance types. APEX File Storage for AWS uses EBS volumes as its node disks. Customers can also select a different number of EBS volumes in each node, and for gp3 volumes, customers are able to customize the performance of each volume by specifying the throughput or IOPS capability.

With this level of flexibility, how shall we configure the disk system to make the most out of the entire OneFS system? Typically, in an on-prem appliance, the more disks a PowerScale node contains, the better performance the disk system can provide thanks to a greater number of devices contributing to the delivery of throughput or IOPS.

In a OneFS cloud environment, does it hold true that more EBS volumes mean better performance? In short, it depends. When the aggregated EBS volume performance is lower than the instance's EBS bandwidth limit, test results show that adding EBS volumes can improve performance. When the aggregated EBS volume performance already exceeds the instance's EBS bandwidth limit, adding more EBS volumes will not improve performance.

What is the best practice for setting the number of EBS volumes in each node?

1. Make the aggregated EBS volume bandwidth limit match the instance type EBS bandwidth limit. 

For example, say we want to use m5dn.16xlarge as the instance type for our OneFS cloud system. According to AWS, the EBS bandwidth of m5dn.16xlarge is 13,600 Mbps, which is 1,700 MB/sec. If we choose to use 10 EBS volumes in each node, then we should configure each gp3 EBS volume to deliver 170 MB/sec of throughput. This makes the aggregated EBS volume throughput equal to the m5dn.16xlarge EBS bandwidth limit.

Note that each gp3 EBS volume includes a baseline of 125 MB/sec of throughput and 3,000 IOPS at no additional cost. As a cost-saving measure, we can configure each node with 12 EBS volumes to better leverage this included baseline throughput.

For example, considering an m5dn.16xlarge instance type with 12 TB of raw capacity per node, the disk costs for 10 volumes and 12 volumes are as follows:

      1. For 10 drives, each EBS volume should support 170 MB/sec throughput, and each node's EBS storage cost is 1,001.20 USD a month.
      2. For 12 drives, each EBS volume should support 142 MB/sec throughput, and each node's EBS storage cost is 991.20 USD a month.

Using 12 EBS volumes can save $10 per node per month.

2. Do not set up more than 12 EBS volumes in each node.

Although APEX File Storage for AWS also supports 15, 18, and 20 gp3 volumes in each node, we do not recommend configuring more than 12 EBS volumes per node for OneFS 9.7. This best practice keeps the software journal space on each disk from becoming too small and benefits write performance.

 

Author: Yunlong Zhang


Read Full Blog

Simplifying OneFS Deployment on AWS with Terraform

Lieven Lin Lieven Lin

Wed, 20 Dec 2023 20:07:34 -0000

|

Read Time: 0 minutes

In the first release of APEX File Storage for AWS in May 2023, users gained the capability to execute file workloads in the AWS cloud, thus harnessing the power of the PowerScale OneFS scale-out NAS storage solution. However, the initial implementation required the manual provisioning of all necessary AWS resources to provision the OneFS cluster—a less than optimal experience for embarking on the APEX File Storage journey in AWS.

With the subsequent release of APEX File Storage for AWS in December 2023, we are pleased to introduce a new, user-friendly open-source Terraform module. This module is designed to enhance and simplify the deployment process, alleviating the need for manual resource provisioning. In this blog post, we will delve into the details of leveraging this Terraform module, providing you with a comprehensive guide to expedite your APEX File Storage deployment on AWS.

Overview of Terraform onefs module

The Terraform onefs module is an open-source module for auto-deploying the AWS resources for a OneFS cluster. It is released and licensed under the MPL-2.0 license. You can find more details on the onefs module in the Terraform Registry. The onefs module provides the following features to help you deploy APEX File Storage for AWS OneFS clusters in AWS:

  • Provision necessary AWS resources for a single OneFS cluster, including EC2 instances, EBS volumes, placement group, and network interfaces.
  • Expand cluster size by provisioning additional AWS resources, including EC2 instances, EBS volumes, and network interfaces.

Getting Started

To use the Terraform onefs module, you need a machine that has Terraform installed and can connect to your AWS account. After you have fulfilled the prerequisites in the documentation, you can start deploying AWS resources for a OneFS cluster.

This blog provides instructions for deploying the required AWS infrastructure resources for APEX File Storage for AWS with Terraform. This includes EC2 instances, a spread-strategy placement group, network interfaces, and EBS volumes.

1. Get the latest version of the onefs module from the Terraform Registry.

 

2. Prepare a main.tf file that uses the onefs module version collected in Step 1. The onefs module requires a set of input variables. The following is an example file named main.tf for creating a 4-node OneFS cluster.

module "onefs" {

   source  = "dell/onefs/aws"

   version = "1.0.0"

 

   region = "us-east-1"

   availability_zone = "us-east-1a"

   iam_instance_profile = "onefs-runtime-instance-profile"

   name = "vonefs-cfv"

   id = "vonefs-cfv"

   nodes = 4

   instance_type = "m5dn.12xlarge"

   data_disk_type = "gp3"

   data_disk_size = 1024

   data_disks_per_node = 6

   internal_subnet_id = "subnet-0c0106598b95ee7b6"

   external_subnet_id = "subnet-0837801239d54e245"

   contiguous_ips= true

   first_external_node_hostnum = 5

   internal_sg_id = "sg-0ee87249a52397219"

   security_group_external_id = "sg-0635f298c9cb764da"

   image_id = "ami-0f1a267119a34361c"

   credentials_hashed = true

   hashed_root_passphrase = "$5$9874f5d2c724b8ca$IFZZ5e9yfUVqNKVL82s.iFLIktr4WLavFhUVa8A"

   hashed_admin_passphrase = "$5$9874f5d2c724b8ca$IFZZ5e9yfUVqNKVL82s.iFLIktr4WLavFhUVa8A"

   dns_servers = ["169.254.169.253"]

   timezone = "Greenwich Mean Time"

}

 

output "onefs-outputs" {

   value = module.onefs

   sensitive = true

}

3. Change your current working directory to the main.tf directory.

4. Initialize the module’s root directory by installing the required providers and modules for the deployment. In the following example, the onefs module is downloaded automatically from the Terraform Registry.

# terraform init

Initializing the backend...

Initializing modules...
Downloading registry.terraform.io/dell/onefs/aws 1.0.0 for onefs...
- onefs in .terraform\modules\onefs
- onefs.onefsbase in .terraform\modules\onefs\modules\base
- onefs.onefsbase.machineid in .terraform\modules\onefs\modules\machineid

Initializing provider plugins...
- Finding latest version of hashicorp/aws...
- Installing hashicorp/aws v5.30.0...
- Installed hashicorp/aws v5.30.0 (signed by HashiCorp)

5. Verify the configuration files in the onefs directory.

# terraform validate

6. Apply the configurations by running the following command.

# terraform apply

7. Enter “yes” after you have previewed and confirmed the changes.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

8. Wait for the AWS resources to be provisioned. The output displays all the cluster information. If the deployment fails, re-run the terraform apply command to deploy.

Apply complete! Resources: 13 added, 0 changed, 0 destroyed.

Outputs:

onefs-outputs = <sensitive>

9. Get the cluster details by running the following command.

# terraform output --json

The following example output is truncated.

additional_nodes = 3
cluster_id = "vonefs-cfv"
control_ip_address = "10.0.32.5"
external_ip_addresses = [
  "10.0.32.5",
  "10.0.32.6",
  "10.0.32.7",
  "10.0.32.8",
]
gateway_hostnum = 1
instance_id = [
  "i-0eead1ee1dd67da6e",
  "i-054efe96f6e605009",
  "i-06e0b1ce06bad42a1",
  "i-0e463c742974641d7",
]
internal_ip_addresses = [
  "10.0.16.5",
  "10.0.16.6",
  "10.0.16.7",
  "10.0.16.8",
]
internal_network_high_ip = "10.0.16.8"
internal_network_low_ip = "10.0.16.5"
mgmt_ip_addresses = []
node_configs = {
  "0" = {
    "external_interface_id" = "eni-09ddea1fd79f0d0ab"
    "external_ips" = [
      "10.0.32.5",
    ]
    "internal_interface_id" = "eni-0caeee71581a8c429"
    "internal_ips" = [
      "10.0.16.5",
    ]
    "mgmt_interface_id" = null
    "mgmt_ips" = null /* tuple */
    "serial_number" = "SV200-930073-0000"
  }
  "1" = {
    "external_interface_id" = "eni-00869c96a27c20c93"
    "external_ips" = [
      "10.0.32.6",
    ]
    "internal_interface_id" = "eni-0471bbba5a7f6596d"
    "internal_ips" = [
      "10.0.16.6",
    ]
    "mgmt_interface_id" = null
    "mgmt_ips" = null /* tuple */
    "serial_number" = "SV200-930073-0001"
  }
  "2" = {
    "external_interface_id" = "eni-0dac5052668bd3a4f"
    "external_ips" = [
      "10.0.32.7",
    ]
    "internal_interface_id" = "eni-09d35ffa61b3dcd60"
    "internal_ips" = [
      "10.0.16.7",
    ]
    "mgmt_interface_id" = null
    "mgmt_ips" = null /* tuple */
    "serial_number" = "SV200-930073-0002"
  }
  "3" = {
    "external_interface_id" = "eni-028d211ef2d5b577c"
    "external_ips" = [
      "10.0.32.8",
    ]
    "internal_interface_id" = "eni-02a99febea713f2d1"
    "internal_ips" = [
      "10.0.16.8",
    ]
    "mgmt_interface_id" = null
    "mgmt_ips" = null /* tuple */
    "serial_number" = "SV200-930073-0003"
  }
}
region = "us-east-1"

10. Write down the following output variables for setting up a cluster as described in the documentation (a jq sketch for pulling these values in scripts follows the list):

  • control_ip_address: The external IP address of the cluster’s first node
  • external_ip_addresses: The external IP addresses of all provisioned cluster nodes
  • internal_ip_addresses: The internal IP addresses of all provisioned cluster nodes
  • internal_network_high_ip: The highest internal IP address assigned
  • internal_network_low_ip: The lowest internal IP address assigned
  • instance_id: The EC2 instance IDs of the cluster nodes
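
If you want to feed these values into scripts, they can also be pulled individually from the module output with jq. This is a sketch that assumes the single onefs-outputs output block from the main.tf above; adjust the jq paths to match your own output definitions:

terraform output -json onefs-outputs | jq -r '.control_ip_address'
terraform output -json onefs-outputs | jq -r '.external_ip_addresses[]'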

11. All AWS resources are now provisioned. After the cluster's first node starts, it forms a single-node cluster. You can then use the cluster's first node to add the remaining nodes to the cluster, as described in the documentation. Below are the provisioned AWS EC2 instances created with the Terraform onefs module.

Available input variables

The Terraform onefs module provides a set of input variables for specifying your own settings for both the AWS resources and the OneFS cluster, for example AWS network resources, the cluster name, and passwords. See the table below for details on the variables used in the main.tf file.

 

| Variable name | Type | Description |
|---|---|---|
| region | string | (Required) The AWS region of OneFS cluster nodes. |
| availability_zone | string | (Required) The AWS availability zone of OneFS cluster nodes. |
| iam_instance_profile | string | (Required) The AWS instance profile name of OneFS cluster nodes. For more details, see the AWS documentation Instance profiles. |
| name | string | (Required) The OneFS cluster name. Cluster names must begin with a letter and can contain only numbers, letters, and hyphens. If the cluster is joined to an Active Directory domain, the cluster name must be 11 characters or fewer. |
| id | string | (Required) The ID of the OneFS cluster. The onefs module uses the ID to add tags to the AWS resources. It is recommended to set the ID to your cluster name. |
| nodes | number | (Required) The number of OneFS cluster nodes: it should be 4, 5, or 6. |
| instance_type | string | (Required) The EC2 instance type of OneFS cluster nodes. All nodes in a cluster must have the same instance size. The supported instance sizes are m5dn.8xlarge, m5dn.12xlarge, m5dn.16xlarge, m5dn.24xlarge; m6idn.8xlarge, m6idn.12xlarge, m6idn.16xlarge, m6idn.24xlarge; m5d.24xlarge; and i3en.12xlarge. Note: You must run a PoC if you intend to use the m5d.24xlarge or i3en.12xlarge EC2 instance types. For details, contact your Dell account team. |
| data_disk_type | string | (Required) The EBS volume type for the cluster, gp3 or st1. |
| data_disk_size | number | (Required) The single EBS volume size in GiB. Considering the supported cluster configurations, it should be 1024 to 16384 for gp3, and 4096 or 10240 for st1. |
| data_disks_per_node | number | (Required) The number of EBS volumes per node. Considering the supported cluster configurations, it should be 5, 6, 10, 12, 15, 18, or 20 for gp3, and 5 or 6 for st1. |
| internal_subnet_id | string | (Required) The AWS subnet ID for the cluster internal network interfaces. |
| external_subnet_id | string | (Required) The AWS subnet ID for the cluster external network interfaces. |
| contiguous_ips | bool | (Required) A boolean flag to indicate whether to allocate contiguous IPv4 addresses to the cluster nodes' external network interfaces. It is recommended to set it to true. |
| first_external_node_hostnum | number | (Required if contiguous_ips=true) The host number of the first node's external IP address in the given AWS subnet. The default is 5: the first four IP addresses in an AWS subnet are reserved by AWS, so the onefs module allocates the fifth IP address to the cluster's first node. If that IP is in use, the module will fail, so when setting contiguous_ips=true, ensure that you set a host number with sufficient contiguous IPs for your cluster. Refer to the Terraform cidrhost function for more details about host numbers. |
| internal_sg_id | string | (Required) The AWS security group ID for the cluster internal network interfaces. |
| security_group_external_id | string | (Required) The AWS security group ID for the cluster external network interfaces. |
| image_id | string | (Required) The OneFS AMI ID described in Find the OneFS AMI ID. |
| credentials_hashed | bool | (Required) A boolean flag to indicate whether the credentials are hashed or in plain text. |
| hashed_root_passphrase | string | (Required if credentials_hashed=true) The hashed root password for the OneFS cluster. |
| hashed_admin_passphrase | string | (Required if credentials_hashed=true) The hashed admin password for the OneFS cluster. |
| root_password | string | (Required if credentials_hashed=false) The root password for the OneFS cluster. |
| admin_password | string | (Required if credentials_hashed=false) The admin password for the OneFS cluster. |
| dns_servers | list(string) | (Optional) The cluster DNS servers; the default is ["169.254.169.253"], which is the AWS Route 53 Resolver. For details, see Amazon DNS server. |
| dns_domains | list(string) | (Optional) The cluster DNS domains; the default is ["<region>.compute.internal"]. |
| timezone | string | (Optional) The cluster time zone; the default is "Greenwich Mean Time". Available options include Greenwich Mean Time, Eastern Time Zone, Central Time Zone, Mountain Time Zone, and Pacific Time Zone. You can change the time zone after the cluster is deployed by following the steps in the OneFS documentation section "Set the cluster date and time". |
| resource_tags | map(string) | (Optional) The tags that will be attached to provisioned AWS resources. For example, resource_tags={"project": "onefs-poc", "tester": "bob"}. |

 

Learn More

In this article, we have shown how to use the Terraform onefs module. You can refer to the documentation below for more details about APEX File Storage for AWS:

 

Author: Lieven Lin

 

Read Full Blog
  • backup
  • PowerScale
  • NDMP

OneFS NDMP Backup Overview

Jason He Jason He

Fri, 15 Dec 2023 15:00:00 -0000

|

Read Time: 0 minutes

NDMP (Network Data Management Protocol) specifies a common architecture and data format for backups and restores of NAS (network-attached storage), allowing heterogeneous network file servers to communicate directly with tape devices for backup and restore operations. NDMP addresses the problems caused by integrating different backup software or DMAs (Data Management Applications), file servers, and tape devices.

The NDMP architecture is a client/server model with the following characteristics:

    • The NDMP host is a file server that is being protected with an NDMP backup solution.
    • The NDMP server is a virtual state machine on the NDMP host that is controlled using NDMP.
    • The backup software is considered a client to the NDMP server.

OneFS supports the following two types of NDMP backups:

    • NDMP two-way backup
    • NDMP three-way backup

In both backup models, OneFS takes a snapshot of the backup directory to ensure consistency of data. The backup operates on the snapshot instead of the source directory, which allows users to continue read/write activities as normal. OneFS makes entries in the file history that are transferred from the PowerScale cluster to the backup server during the backup.

NDMP two-way backup

The NDMP two-way backup is also known as the local or direct NDMP backup, which is considered the most efficient model and usually provides the best performance. The backup moves the backup data directly from the PowerScale cluster to the tape devices without moving to the backup server over the network.

In this model, OneFS must detect the tape devices before you can back up data to them. As shown in the following figure, you can connect the PowerScale cluster to a Backup Accelerator node and attach tape devices to that node. The Backup Accelerator node (synonymous with a Fibre Attached Storage node) adds no primary storage and offloads NDMP workloads from the primary storage nodes. Tape devices can be attached directly to the Fibre Channel ports on the PowerScale cluster or Backup Accelerator node, or through Fibre Channel switches connected to those ports.

 Figure 1. NDMP two-way backup with B100 backup accelerator connected to the PowerScale cluster

The following table shows details of the NDMP two-way backup supported by PowerScale:  

| NDMP two-way backup option | Generation 5 PowerScale nodes with an InfiniBand back end | Generation 6+ PowerScale nodes with an InfiniBand back end | Generation 6+ PowerScale nodes with an Ethernet back end |
|---|---|---|---|
| B100 backup accelerator | Supported | Supported | Supported |


Note: The B100 backup accelerator requires OneFS 9.3.0.0 or later.


 NDMP three-way backup

The NDMP three-way backup, also known as the remote NDMP backup, is shown in the following figure.

Figure 2. NDMP three-way backup

In this backup mode, the tape devices are connected to the backup media server. OneFS does not detect tape devices on the PowerScale cluster, and Fibre Channel ports are not required on the PowerScale cluster. The NDMP service runs on the NDMP server or the PowerScale cluster. The NDMP tape service runs on the backup media server. A DMA on the backup server instructs the PowerScale cluster to start backing up data from the PowerScale cluster to the backup media server over the network. The backup media server moves the backup data to tape devices. Both servers are connected to each other across the network boundary. Sometimes, the backup server and backup media server reside on the same physical machine.

Some DMAs can write NDMP data to non-NDMP devices. For example, Dell NetWorker software writes NDMP data to non-NDMP devices, including tape, virtual tape, Advanced File Type Device (AFTD), and Dell PowerProtect DD series appliances. For more information on data protection with Dell NetWorker using NDMP, refer to this guide: Dell PowerScale: Data Protection with Dell NetWorker using NDMP.

 

Author: Jason He, Principal Engineering Technologist


Read Full Blog
  • Isilon
  • PowerScale
  • AWS
  • OneFS
  • APEX

Unveiling APEX File Storage for AWS Enhancements

Lieven Lin Lieven Lin

Wed, 13 Dec 2023 15:36:10 -0000

|

Read Time: 0 minutes

We are thrilled to announce the latest version of APEX File Storage for AWS! This release brings a multitude of enhancements to elevate your AWS file storage experience, including expanded AWS regions with the support for additional EC2 instance types, a Terraform module for streamlined deployment, larger raw capacity, and additional OneFS features support.

APEX File Storage delivers Dell's leading enterprise-class, high-performance scale-out file storage as a software-defined, customer-managed offer in the public cloud. Based on PowerScale OneFS, APEX File Storage for AWS brings enterprise file capabilities and performance to the cloud and delivers operational consistency across multicloud environments. It simplifies hybrid cloud environments by facilitating seamless data mobility between on-premises and the cloud with native replication, making it a perfect option for running AI workloads. APEX File Storage can enhance customers' development and innovation initiatives by combining proven data services, such as multi-protocol access, security features, and a proven scale-out architecture, with the flexibility of public cloud infrastructure and services. APEX File Storage enables organizations to run the software they trust directly in the public cloud without retraining their staff or refactoring their storage architecture.

What's New?

1. Additional EC2 instance types support

We've expanded compatibility by adding support for a wider range of EC2 instance types. This means you have more flexibility in choosing the instance type that best suits your performance and resource requirements. We now support the following EC2 instance types:

    • EC2 m5dn instances: m5dn.8xlarge, m5dn.12xlarge, m5dn.16xlarge, m5dn.24xlarge
    • EC2 m6idn instances: m6idn.8xlarge, m6idn.12xlarge, m6idn.16xlarge, m6idn.24xlarge
    • EC2 m5d instances: m5d.24xlarge
    • EC2 i3en instances: i3en.12xlarge

Please note that you must run a PoC if you intend to use the m5d.24xlarge or i3en.12xlarge EC2 instance types. Contact your Dell account team for details.

2. Extended AWS regions support

APEX File Storage is now available in more AWS regions than ever before. A total of 28 regions are available for you. We understand that our users operate globally, and this expansion ensures that you can leverage APEX File Storage wherever your AWS resources are located. The following table lists all available regions for different EC2 instance types:

3. Terraform module: auto-deployment made effortless

Simplify your deployment process with our new Terraform module, which automates the AWS resource deployment process to ensure a smooth and error-free experience.

Once you fulfill the deployment prerequisites, you can deploy a cluster with a single Terraform command. For more details, refer to documentation: APEX File Storage for AWS Deployment Guide with Terraform. Stay tuned for a blog with additional details coming soon. 

4. Larger raw capacity: more room for your data

Your data is growing, and so should your storage capacity. APEX File Storage for AWS can now support up to 1.6PiB raw capacity, enabling workloads that produce a vast amount of data such as AI and ensuring that you have ample space to store, manage, and scale your data effortlessly.

5. Additional OneFS features support

The OneFS features not supported in the first release of APEX File Storage for AWS are now supported, including:

    • Enhanced Protocols: With HDFS protocol support, you can seamlessly integrate HDFS into your workflows, enhancing your data processing capabilities in AWS. Enjoy expanded connectivity with support for HTTP and FTP protocols, providing more flexibility in accessing and managing your files.
    • Quality of Service – SmartQoS: Ensure a consistent and reliable user experience with SmartQoS, which enables you to prioritize workloads and applications based on performance requirements.
    • Immutable Data Protection - SmartLock: Enhance data protection by leveraging SmartLock to create Write Once Read Many (WORM) files, providing an added layer of security against accidental or intentional data alteration.
    • Large File Support: Address the needs of large-scale data processing with improved support for large files, facilitating efficient storage and retrieval. A single file size can be up to 16TiB now.

Learn More

For deployment instructions and detailed information on these exciting new features, refer to our documentation:

Author: Lieven Lin

Read Full Blog
  • REST API
  • IIQ 5.0.0

REST API in IIQ 5.0.0

Vincent Shen Vincent Shen

Tue, 12 Dec 2023 15:00:00 -0000

|

Read Time: 0 minutes

REST APIs have been introduced in IIQ 5.0.0, providing the equivalent of the CLI commands in previous IIQ versions. The CLI is not available in IIQ 5.0.0. To understand how REST APIs work in IIQ, we will cover:

  • REST API Authentication
  • Creating a REST API Session
  • Getting a REST API Session
  • Managing PowerScale Clusters using REST API
  • Exporting a Performance Report
  • Deleting a REST API Session

Let’s get started!

REST API Authentication

IIQ 5.0.0 leverages JSON Web Tokens (JWT) along with x-csrf-token for session-based authentication. Following are some of the benefits of using JWTs:

  • JWTs contain the user's details
  • JWTs incorporate digital signatures to ensure their integrity and protect against unauthorized modifications by potential attackers
  • JWTs offer efficiency and rapid verification processes

Creating a REST API Session

A POST request to /insightiq/rest/security-iam/v1/auth/login/ creates a session with a JWT cookie and an x-csrf-token. A status code of 201 (Created) is returned upon successful user authentication. If the authentication process fails, the API responds with a status code of 401 (Unauthorized).

The following is an example of getting the JWT token with the POST method.

The POST request is:

curl -vk -X POST https://172.16.202.71:8000/insightiq/rest/security-iam/v1/auth/login -d '{"username": "administrator", "password": "a"}'  -H 'accept: application/json'  -H 'Content-Type: application/json'

The POST response is:

Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying 172.16.202.71:8000...
* Connected to 172.16.202.71 (172.16.202.71) port 8000 (#0)
* ALPN: offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN: server accepted h2
* Server certificate:
*  subject: O=Test; CN=cmo.ingress.dell
*  start date: Dec  4 05:59:14 2023 GMT
*  expire date: Dec  4 07:59:44 2023 GMT
*  issuer: C=US; ST=TX; L=Round Rock; O=DELL EMC; OU=Storage; CN=Platform Root CA; emailAddress=a@dell.com
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* using HTTP/2
* h2h3 [:method: POST]
* h2h3 [:path: /insightiq/rest/security-iam/v1/auth/login]
* h2h3 [:scheme: https]
* h2h3 [:authority: 172.16.202.71:8000]
* h2h3 [user-agent: curl/8.0.1]
* h2h3 [accept: application/json]
* h2h3 [content-type: application/json]
* h2h3 [content-length: 46]
* Using Stream ID: 1 (easy handle 0x5618836b5eb0)
> POST /insightiq/rest/security-iam/v1/auth/login HTTP/2
> Host: 172.16.202.71:8000
> user-agent: curl/8.0.1
> accept: application/json
> content-type: application/json
> content-length: 46
> 
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* We are completely uploaded and fine
< HTTP/2 201
< server: istio-envoy
< date: Mon, 04 Dec 2023 07:19:19 GMT
< content-type: application/json
< content-length: 54
< set-cookie: insightiq_auth=eyJ0eXAiOiJKV1QiLCJhbGciOiJQUzUxMiJ9.eyJjc3JmIjoiN3Z0eG5sMWRxbHIzaGtubGp3MjdwYXl3eW54bzQzdGs0Zmx4IiwiZXhwIjoxNzAxNzg5MTM5LCJpYXQiOjE3MDE3NDU5MzksImlzcyI6IkRlbGwgVGVjaG5vbG9naWVzIiwicm9sZSI6ImFkbWluIiwic2Vzc2lvbiI6InJ6ZnA3ZTRpMXdzd2xuYjBuNGo3YmQwNmF5dWs3emNkeXp1ZSIsInN1YiI6ImFkbWluaXN0cmF0b3IifQ.yKyfXbezscqn6UPa9fXxxjh71MCgeRAXPZhXkG-v92siwXAEP40ASb5bQUFHnAmWwwtlB4Jt8lX9kY8LmRkqi1V7B3v0LgxUp68heAc0HZAh6XO92ac9AfZ9dAuE9H3U4RNELm4vVx8mGrGmuzQymWUG5yRCNk03SpeW8esHnTPRVoGGsE4Cf6ta3BrUXBfic-D_TL01YgyY3Dy_T8Z1oqhkD508GPEYnEeNMU1QtZAwkmj6MJHtGmp69T0ljtQdIW2oi5xYdPs-ZHGSFRGG4j2o8xAEFV8A4igzP-5XOkE9NCcx2mkj67OdvVgNBxCcY-X7cnYyfLgagkanyQSgdA; Secure; HttpOnly; Path=/
< set-cookie: csrf_token=7vtxnl1dqlr3hknljw27paywynxo43tk4flx; Secure; HttpOnly; Path=/
< x-csrf-token: 7vtxnl1dqlr3hknljw27paywynxo43tk4flx
< x-envoy-upstream-service-time: 3179
< content-security-policy: default-src 'self' 'unsafe-inline' 'unsafe-eval' data:; style-src 'unsafe-inline' 'self';
< x-frame-options: sameorigin
< x-xss-protection: 1; mode=block
< x-content-type-options: nosniff
< referrer-policy: strict-origin-when-cross-origin
< 
{"timeout_absolute":43200,"username":"administrator"}
* Connection #0 to host 172.16.202.71 left intact

The JWT cookie and x-csrf-token are returned in the set-cookie and x-csrf-token headers of the POST response above. The timeout for the session is 43,200 seconds (12 hours). You can save them for future use:

export TOK="insightiq_auth=eyJ0eXAiOiJKV1QiLCJhbGciOiJQUzUxMiJ9.eyJjc3JmIjoiN3Z0eG5sMWRxbHIzaGtubGp3MjdwYXl3eW54bzQzdGs0Zmx4IiwiZXhwIjoxNzAxNzg5MTM5LCJpYXQiOjE3MDE3NDU5MzksImlzcyI6IkRlbGwgVGVjaG5vbG9naWVzIiwicm9sZSI6ImFkbWluIiwic2Vzc2lvbiI6InJ6ZnA3ZTRpMXdzd2xuYjBuNGo3YmQwNmF5dWs3emNkeXp1ZSIsInN1YiI6ImFkbWluaXN0cmF0b3IifQ.yKyfXbezscqn6UPa9fXxxjh71MCgeRAXPZhXkG-v92siwXAEP40ASb5bQUFHnAmWwwtlB4Jt8lX9kY8LmRkqi1V7B3v0LgxUp68heAc0HZAh6XO92ac9AfZ9dAuE9H3U4RNELm4vVx8mGrGmuzQymWUG5yRCNk03SpeW8esHnTPRVoGGsE4Cf6ta3BrUXBfic-D_TL01YgyY3Dy_T8Z1oqhkD508GPEYnEeNMU1QtZAwkmj6MJHtGmp69T0ljtQdIW2oi5xYdPs-ZHGSFRGG4j2o8xAEFV8A4igzP-5XOkE9NCcx2mkj67OdvVgNBxCcY-X7cnYyfLgagkanyQSgdA"

Getting a REST API Session

Use the GET method against /insightiq/rest/security-iam/v1/auth/session/ to get the session information. In the request header, include the cookie and the x-csrf-token field for authentication.

curl -k -v -X GET https://172.16.202.71:8000/insightiq/rest/security-iam/v1/auth/session --cookie $TOK -H 'accept: application/json'  -H 'Content-Type: application/json' -H 'x-csrf-token: 7vtxnl1dqlr3hknljw27paywynxo43tk4flx'

The response is:

Note: Unnecessary use of -X or --request, GET is already inferred.
*   Trying 172.16.202.71:8000...
* Connected to 172.16.202.71 (172.16.202.71) port 8000 (#0)
* ALPN: offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN: server accepted h2
* Server certificate:
*  subject: O=Test; CN=cmo.ingress.dell
*  start date: Dec  5 03:16:34 2023 GMT
*  expire date: Dec  5 05:17:04 2023 GMT
*  issuer: C=US; ST=TX; L=Round Rock; O=DELL EMC; OU=Storage; CN=Platform Root CA; emailAddress=a@dell.com
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* using HTTP/2
* h2h3 [:method: GET]
* h2h3 [:path: /insightiq/rest/security-iam/v1/auth/session]
* h2h3 [:scheme: https]
* h2h3 [:authority: 172.16.202.71:8000]
* h2h3 [user-agent: curl/8.0.1]
* h2h3 [cookie: insightiq_auth=eyJ0eXAiOiJKV1QiLCJhbGciOiJQUzUxMiJ9.eyJjc3JmIjoiN3Z0eG5sMWRxbHIzaGtubGp3MjdwYXl3eW54bzQzdGs0Zmx4IiwiZXhwIjoxNzAxNzg5MTM5LCJpYXQiOjE3MDE3NDU5MzksImlzcyI6IkRlbGwgVGVjaG5vbG9naWVzIiwicm9sZSI6ImFkbWluIiwic2Vzc2lvbiI6InJ6ZnA3ZTRpMXdzd2xuYjBuNGo3YmQwNmF5dWs3emNkeXp1ZSIsInN1YiI6ImFkbWluaXN0cmF0b3IifQ.yKyfXbezscqn6UPa9fXxxjh71MCgeRAXPZhXkG-v92siwXAEP40ASb5bQUFHnAmWwwtlB4Jt8lX9kY8LmRkqi1V7B3v0LgxUp68heAc0HZAh6XO92ac9AfZ9dAuE9H3U4RNELm4vVx8mGrGmuzQymWUG5yRCNk03SpeW8esHnTPRVoGGsE4Cf6ta3BrUXBfic-D_TL01YgyY3Dy_T8Z1oqhkD508GPEYnEeNMU1QtZAwkmj6MJHtGmp69T0ljtQdIW2oi5xYdPs-ZHGSFRGG4j2o8xAEFV8A4igzP-5XOkE9NCcx2mkj67OdvVgNBxCcY-X7cnYyfLgagkanyQSgdA]
* h2h3 [accept: application/json]
* h2h3 [content-type: application/json]
* h2h3 [x-csrf-token: 7vtxnl1dqlr3hknljw27paywynxo43tk4flx]
* Using Stream ID: 1 (easy handle 0x561f902392c0)
> GET /insightiq/rest/security-iam/v1/auth/session HTTP/2
> Host: 172.16.202.71:8000
> user-agent: curl/8.0.1
> cookie: insightiq_auth=eyJ0eXAiOiJKV1QiLCJhbGciOiJQUzUxMiJ9.eyJjc3JmIjoiN3Z0eG5sMWRxbHIzaGtubGp3MjdwYXl3eW54bzQzdGs0Zmx4IiwiZXhwIjoxNzAxNzg5MTM5LCJpYXQiOjE3MDE3NDU5MzksImlzcyI6IkRlbGwgVGVjaG5vbG9naWVzIiwicm9sZSI6ImFkbWluIiwic2Vzc2lvbiI6InJ6ZnA3ZTRpMXdzd2xuYjBuNGo3YmQwNmF5dWs3emNkeXp1ZSIsInN1YiI6ImFkbWluaXN0cmF0b3IifQ.yKyfXbezscqn6UPa9fXxxjh71MCgeRAXPZhXkG-v92siwXAEP40ASb5bQUFHnAmWwwtlB4Jt8lX9kY8LmRkqi1V7B3v0LgxUp68heAc0HZAh6XO92ac9AfZ9dAuE9H3U4RNELm4vVx8mGrGmuzQymWUG5yRCNk03SpeW8esHnTPRVoGGsE4Cf6ta3BrUXBfic-D_TL01YgyY3Dy_T8Z1oqhkD508GPEYnEeNMU1QtZAwkmj6MJHtGmp69T0ljtQdIW2oi5xYdPs-ZHGSFRGG4j2o8xAEFV8A4igzP-5XOkE9NCcx2mkj67OdvVgNBxCcY-X7cnYyfLgagkanyQSgdA
> accept: application/json
> content-type: application/json
> x-csrf-token: 7vtxnl1dqlr3hknljw27paywynxo43tk4flx
> 
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
< HTTP/2 200
< server: istio-envoy
< date: Tue, 05 Dec 2023 03:19:41 GMT
< content-type: application/json
< content-length: 57
< x-envoy-upstream-service-time: 5
< content-security-policy: default-src 'self' 'unsafe-inline' 'unsafe-eval' data:; style-src 'unsafe-inline' 'self';
< x-frame-options: sameorigin
< x-xss-protection: 1; mode=block
< x-content-type-options: nosniff
< referrer-policy: strict-origin-when-cross-origin
< 
{"username": "administrator", "timeout_absolute": 42758}
* Connection #0 to host 172.16.202.71 left intact

The response body will return the username of the session and its remaining timeout value in seconds.

Managing PowerScale Clusters using REST API

You can also use the IIQ REST API to manage your PowerScale clusters. As in the previous examples, all requests must include the session cookie and the x-csrf-token header for authentication. When adding clusters to IIQ, you also need to provide the cluster IP address along with a username and password. For details, refer to the following table:

Table 1. Using REST API to manage PowerScale Clusters

Functionality

REST API Endpoint

REST API Details

Add Cluster to IIQ

POST /insightiq/rest/clustermanager/v1/clusters
curl -k -v -X 'POST' \
'https://<EXTERNAL_IP>:8000/insightiq/rest/clustermanager/v1/clusters' \
  --cookie <COOKIE> \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -H 'x-csrf-token: <X-CSRF-TOKEN>' \
  -d '{
    "host": "<HOST>",
    "username": "<USERNAME>",
    "password": "<PASSWORD>"
}'

Delete Cluster from IIQ

DELETE /insightiq/rest/clustermanager/v1/clusters/<GUID>
curl -k -v -X 'DELETE' \
'https://<EXTERNAL_IP>:8000/insightiq/rest/clustermanager/v1/clusters/<GUID>' \
--cookie <COOKIE> \
-H 'accept: application/json' \
-H 'x-csrf-token: <X-CSRF-TOKEN>'

Exporting a Performance Report

To export a performance report from IIQ 5.0.0, you can use the GET method against /iiq/reporting/api/v1/timeseries/download_data with the following query parameters:

  • cluster – PowerScale cluster ID
  • start_time – UNIX epoch timestamp of the beginning of the data range. Defaults to the most recent saved report date and time.
  • end_time – UNIX epoch timestamp of the end of the data range. Defaults to the most recent saved report date and time.
  • key – the performance key. To get a list of all the supported keys, use the GET method against https://<IP>:8000/iiq/reporting/api/v1/reports/data-element-types

Note: To get the performance key, use the values from the data-element-types response: look at definition.timeseries_keys for data elements where report_group equals performance and definition.layout equals chart. See the following screenshot for an example; in this example, the performance key is ext_net.


 Figure 1. Getting the performance key where the key is ext_net
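Before exporting, you may want to enumerate the available data element types to find the key you need. The following is a minimal sketch of such a request, assuming the session cookie ($TOK) and CSRF token from the earlier examples are still valid; <EXTERNAL_IP> and <X-CSRF-TOKEN> are placeholders:

curl -k "https://<EXTERNAL_IP>:8000/iiq/reporting/api/v1/reports/data-element-types" --cookie $TOK -H 'accept: application/json' -H 'x-csrf-token: <X-CSRF-TOKEN>'

The response contains the data elements and the timeseries_keys described in the note above.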

The following is an example of using the IIQ REST API to export the cluster performance of external network throughput to a CSV file:

curl -vk -X GET "https://10.246.159.113:8000/iiq/reporting/api/v1/timeseries/download_data?cluster=0007433384d03e80b4582103b56e1cac33a2&start_time=1694143511&end_time=1694366711&key=ext_net" -H 'x-csrf-token: vny27rem4l6ww29hvkhuaka0ix7x172wufbv' --cookie $TOK >>perf.csv

Deleting a REST API Session

To remove an IIQ REST API session, use the following API:

curl -k -v -X GET https://<EXTERNAL_IP>:8000/insightiq/rest/security-iam/v1/auth/logout --cookie <COOKIE> -H 'accept: application/json'  -H 'Content-Type: application/json' -H 'x-csrf-token: <X-CSRF-TOKEN>'
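For example, using the cluster IP address, saved cookie ($TOK), and CSRF token from the earlier session examples, the logout call would look like this:

curl -k -v -X GET https://172.16.202.71:8000/insightiq/rest/security-iam/v1/auth/logout --cookie $TOK -H 'accept: application/json' -H 'Content-Type: application/json' -H 'x-csrf-token: 7vtxnl1dqlr3hknljw27paywynxo43tk4flx'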

Conclusion

The IIQ REST API is a powerful tool. Please refer to the Dell Technologies InsightIQ 5.0.0 User Guide for more information. For any questions, feel free to reach out to me at Vincent.shen@dell.com

 

Author: Vincent Shen


Read Full Blog
  • Isilon
  • PowerScale
  • AWS
  • OneFS
  • APEX

Alert in IIQ 5.0.0 – Part I

Vincent Shen Vincent Shen

Wed, 13 Dec 2023 17:40:06 -0000

|

Read Time: 0 minutes

Alert is a new feature introduced with the release of IIQ 5.0.0. It provides the capability and flexibility to configure alerts based on KPI thresholds.

This blog will walk you through the following aspects of this feature:

  1. Introduction to Alert
  2. How to configure alerts using Alert

Let’s get started:

Introduction

IIQ 5.0.0 can send email alerts based on your defined KPI and threshold. The supported KPIs are listed in the following table:

KPI Name

Description

Scope

Protocol Latency SMB

Average latency within last 10 minutes required for the various operations for the SMB protocol

Across all nodes and clients per cluster.

Protocol Latency NFS

Average latency within last 10 minutes required for the various operations for the NFS protocol.

Across all nodes and clients per cluster.

Active Clients NFS

The current number of active clients using NFS. The client is active when it is transmitting or receiving data.

Across all nodes per cluster.

Active Clients SMB 1

 The current number of active clients using SMB 1. The client is active when it is transmitting or receiving data.

Across all nodes per cluster.

Active Clients SMB 2

The current number of active clients using SMB 2. The client is active when it is transmitting or receiving data.

Across all nodes per cluster.

Connected Clients NFS

The current number of connected clients using NFS. The client is connected when it has an open TCP connection to the cluster. It can transmit or receive data or it can be in an idle state.

Across all nodes per cluster.

Connected Clients SMB

The current number of connected clients using SMB. The client is connected when it has an open TCP connection to the cluster. It can transmit or receive data or it can be in an idle state.

Across all nodes per cluster.

Pending Disk Operation Count

The average pending disk operation count within the last 10 minutes. It is the number of I/O operations that are pending at the file system level and waiting to be issued to an individual drive.

Across all disks per cluster.

CPU Usage

The average usage of CPU cores, including physical cores and hyperthreaded cores, within the last 10 minutes.

Across all nodes per cluster.

Cluster Capacity

The current used capacity for the cluster.

N/A

Nodepool Capacity

The current used capacity for the node pool in a cluster.

N/A

Drive Capacity

The current used capacity for a drive in a cluster.

N/A

Node Capacity

The current used capacity for a node in a cluster.

N/A

Network Throughput Equivalency

Checks whether the network throughput for each node within the last 10 minutes is within the specified threshold percentage of the average network throughput of all nodes in the node pool for the same time.

Across all nodes per node pool.

 

Each KPI requires a threshold and a severity level, together forming an alert rule. You can customize the alert rules to align with specific business use cases.

 

Here is an example of an alert rule:

If CPU usage (KPI) is greater than or equal to 96% (threshold), a critical alert (severity) will be triggered.

The supported severities are:

  1. Emergency
  2. Critical
  3. Warning
  4. Information

You can combine multiple alert rules into a single alert policy for easy management purposes.

If you take a look at the chart above, you will find a new concept called a Notification Rule. This is used to define the recipients' email addresses and the severities for which they will receive an email:

An example of a notification rule: user A (user_a@lled.com) and user B (user_b@lled.com) will both receive email alerts for all severities.

If you combine the two examples above into a single alert policy, you get the following:

 

At this point, you should understand the big picture of the alert feature in IIQ 5.0.0. In my next post, I will walk you through the details of how to configure it.

 

 

Read Full Blog

Alert in IIQ 5.0.0 – Part II

Vincent Shen Vincent Shen

Mon, 11 Dec 2023 16:10:19 -0000

|

Read Time: 0 minutes

My previous post introduced one of the key features in IIQ 5.0.0 – Alert – and explained how it works. In this blog, we will go into the details of how to configure it.

How to configure an alert in IIQ 5.0.0

Configure SMTP server in IIQ

Follow the steps below to add the SMTP server in IIQ:

  1. Access Configure SMTP under Settings from the left side menu.
  2. Enter the SMTP Server IP or FQDN. Username and Password are optional.
  3. Click the Save button.

You can send a test email to verify the settings.

 

Figure: SMTP configuration

Note: If you keep the SMTP Port number blank, the default will be 25 or 587 for TLS.

Manage Alerts

Create Alert Rules

To create alert rules, follow these steps:

  1. Navigate to Manage Alerts under Alerts from the left side menu.
  2. Click the Alert Rules.
  3. Click the Create Alert Rule button and a pop-up window will appear as shown below:

Figure: Create Alert Rule

  4. Specify the KPI, Severity, and Threshold for it. Click the Save button.
  5. (Optional) You can create multiple Alert Rules.

Create a Notification Rule

A notification rule specifies the recipient(s) of SMTP alerts and the associated alert severities. To create a notification rule, follow these steps:

  1. Navigate to Manage Alerts located under Alerts from the left side menu.
  2. Click the Notification Rules.
  3. Click the Create Notification Rule button and a pop-up window will appear as shown below.

Figure: Create Notification Rule

  4. Input the Recipient Email ID(s) and choose the severity from the dropdown list of Receive Emails for.
  5. Click the Save button.

Create an alert policy

To create an alert policy, follow these steps:

  1. Navigate to Manage Alerts located under Alerts from the left side menu.
  2. Click the Alert Policies.
  3. Click the Create Policy button.
  4. Input the Name and Description in the Policy Details window and click the Next button.
  5. In the Alert Rules subpage, you can choose either Existing Alert Rules or Create Alert Rule by clicking the corresponding button. After you create the alert rules, click the Next button.

Figure: Add Alert Rules

  6. On the Cluster subpage, choose the cluster to which you want to apply the alert settings, and click the Next button.

Figure: Choose clusters

  7. On the Notification Rules subpage, you can choose either Existing Notification Rules or Create Notification Rule by clicking the corresponding button. After you choose the rule, click the Next button.

Figure: Specify Notification Rules

  8. Click the Save button on the final Review subpage.

The following screenshot is a sample alert email:

Figure: Sample email alert

View Alerts

All the alerts can be accessed in Alerts > View Alerts from the left side menu.

Figure: View Alerts

On this page you can:

  1. Filter alerts by selecting the Duration.
  2. Show alerts by choosing specific Clusters.
  3. Categorize alerts by different severity levels.
  4. Sort alerts by the Date & Time.

I hope you enjoyed the read. If you have any questions or suggestions about this feature, please feel free to reach out to me at Vincent.shen@dell.com.

Read Full Blog
  • PowerScale OneFS
  • InsightIQ

Mastering Monitoring and Reporting with InsightIQ 5.0.0

Shaofei Liu Shaofei Liu

Mon, 11 Dec 2023 16:32:33 -0000

|

Read Time: 0 minutes

Overview

In the complex landscape of data management, having robust tools to monitor and analyze your data is paramount. InsightIQ 5.0.0 is your gateway to exploring the depths of historical data sourced from PowerScale OneFS. By leveraging its capabilities, you can monitor cluster activities, analyze performance, and gain insights to ensure optimal functionality.

Monitoring Clusters with Dynamic Dashboard

The InsightIQ Dashboard stands as a central hub for monitoring all your clusters, offering a comprehensive overview of their statuses and vital statistics. The dashboard facilitates quick interpretation of data and the action-based navigation links allow you to easily check on the observed statuses.

Here's a breakdown of the essential sections within this powerful monitoring interface:

 

Figure 1 IIQ Dashboard

InsightIQ Status

This section provides an overview of connected clusters, highlighting monitoring errors and any suspended monitoring activities. The InsightIQ Status icons offer a quick assessment of the monitored clusters: green signifies active monitoring, red indicates monitoring errors, and grey denotes suspended monitoring or incomplete credentials. A fourth status icon, blue, indicates the number of PowerScale clusters whose status falls outside the green, red, or grey values, typically due to an internal error.

Additionally, the InsightIQ Datastore Usage icons provide insights into datastore health, with green indicating health, yellow signaling near-full capacity, and red alerting that the datastore has reached its maximum limit.

Alerts

The Alerts section within InsightIQ is a pivotal area displaying crucial data accumulated over the past 24 hours, categorized by severity: emergency, critical, warning, and information. This section shows the top three clusters with the highest number of alerts, granting immediate visibility into potential issues impacting PowerScale clusters. The dashboard offers a swift way to access the Alerts section, where you can create alerts by defining Key Performance Indicators (KPIs) and thresholds; these are easily viewable on the dashboard for prompt action. This comprehensive alert system ensures timely responses to potential issues.

Aggregated Capacity for monitored clusters

Get insights into the used and free raw capacity across monitored clusters, as well as the estimated total usable capacity.

Performance Overview

This section presents average values for critical performance metrics like protocol latency, network throughput, CPU usage, protocol operations, active clients, and active jobs, displaying changes in statistics over the past 24 hours. It also offers a convenient link to navigate to the 'View Performance Report', facilitating in-depth analysis across various metrics.   

Monitored Clusters by % Used Capacity

This detailed breakdown showcases used capacity, free raw capacity, estimated usable capacity, and data reduction ratio for each monitored cluster. While it doesn't offer historical data, it provides real-time insights into the present cluster status. It offers quick navigation links to access both Capacity Reports and the Data Reduction Report for easy reference.   

Performance and File System Reports

The heart of InsightIQ lies in its ability to provide detailed performance reports and file system reports. These reports can be standardized or tailored to your specific needs, enabling you to track storage cluster performance efficiently. You also have the flexibility to generate Performance Reports as PDF files on a predefined schedule, enabling easy distribution via email attachments. 

InsightIQ reports are configured using modules, breakouts, and filter rules, providing a granular view of cluster components at specific data points. By employing modules and applying breakouts or filter rules, users can focus on distinct cluster components or specific attributes across the entire report. This flexibility allows the creation of tailored reports for various monitoring purposes.

Harnessing detailed metrics and insights empowers decision-making for crafting insightful performance reports. For instance, if network traffic surpasses anticipated levels across all monitored clusters, InsightIQ enables the creation of customized reports displaying detailed network throughput data. Analyzing direction-specific throughput assists in pinpointing any specific contribution to the overall traffic, aiding in precise troubleshooting and optimization strategies.

 

Figure 2 Sample Cluster Performance Report

Partitioned Performance

The Partition Performance report presents data from configurable datasets, offering insights into the top workloads consuming the most resources within a specific time range. Key data modules include: Dataset Summary, Workload Latency, Workload IOPS, Workload CPU Time, Workload Throughput, and Workload L2/L3 Cache Hits.

For more detailed information, users can focus on modules by average, top workload by max value, or pinned workload by average. 

Note: To access the Partitioned Performance report, the InsightIQ user on the PowerScale cluster needs the ISI_PRIV_PERFORMANCE privilege with read permission. If you are unable to view the report, contact the InsightIQ admin or refer to the Dell Technologies InsightIQ 5.0.0 Administration Guide for permission configuration.

 

Figure 3  Sample Partitioned Performance Report

File System Analytic Report

File System Analytics (FSA) reports offer a comprehensive overview of the files stored within a cluster, providing essential insights into their types, locations, and usage.

InsightIQ supports two key FSA report categories: 

  • Data Usage reports focus on individual file data, revealing details like unchanged file durations. 
  • Data Property reports offer insights into the entire cluster file system, showcasing data such as file changes over specific periods and facilitating comparisons between different timeframes.

These reports help in understanding relative changes in file counts based on physical size, offering nuanced perspectives for effective file system management. For instance, by comparing Data Property reports of different clusters, you can observe patterns in file utilization—identifying clusters with regular file changes versus those housing less frequently modified files. Detecting inactive files through Data Usage reports facilitates efficient storage archiving strategies, optimizing cluster space.

These reports also play a pivotal role in verifying the expected behavior of cluster file systems. For example, dedicated archival clusters can be monitored using Data Property reports to observe file count changes. An unexpectedly high count might prompt storage admins to consider relocating files to development clusters, ensuring efficient resource utilization.

Figure 4 Data Properties Report

These FSA reports within InsightIQ not only provide visibility into cluster file systems but also serve as strategic tools for efficient storage management and troubleshooting unexpected discrepancies.

Conclusion: Empowering Data Management

InsightIQ isn't just a monitoring tool. It's a comprehensive suite offering a multitude of functionalities. It's about transforming data into actionable insights, enabling users to make informed decisions and stay ahead in the dynamic world of data management. The robust features, customizable reports, and analytics capabilities make it an invaluable asset for ensuring the optimal performance and health of PowerScale OneFS clusters.

Read Full Blog
  • PowerScale OneFS
  • InsightIQ

Understanding InsightIQ 5.0.0 Deployment Options: Simple vs. Scale

Shaofei Liu Shaofei Liu

Mon, 11 Dec 2023 16:32:33 -0000

|

Read Time: 0 minutes

Overview

InsightIQ 5.0.0 introduces two distinct deployment options catering to varying scalability needs: InsightIQ Simple and InsightIQ Scale. Let's delve into the overview of both offerings to guide your deployment decision-making process.

InsightIQ Simple

Designed for straightforward deployment and moderate scalability, IIQ Simple accommodates up to 252 nodes or 10 clusters. Here's a snapshot of its key requirements:

  • Target Use Case: Simple deployment scenarios with moderate scaling requirements.
  • Deployment Method: VMware-based deployment using the OVA template.
  • OS Requirements: ESXi 7.0.3 and 8.0.2.
  • Hardware Requirements: VMware hardware version 15 or higher, with 12 vCPUs, 32GB memory, and 1.5TB disk space.

InsightIQ Scale

For organizations demanding extreme scalability, IIQ Scale steps in, supporting up to 504 nodes or 20 clusters, with potential expansion post IIQ 5.0. The details include:

  • Target Use Case: Extreme scalability requirements.
  • Deployment Method: RHEL-based deployment utilizing a specialized deployment script.
  • OS Requirements: RHEL 8.6 x64.
  • Hardware Requirements: Three virtual machines or physical servers, each with a configuration of 12 vCPU, 32GB memory, and specific storage options based on the chosen datastore location.

Here is the summary table:


InsightIQ Simple

InsightIQ Scale

Target Use Case  

Simple deployment and moderate scalability – up to 252 nodes or 10 clusters 

Extreme scalability – up to 504 nodes or 20 clusters (more in post-5.0) 

Deployment Method 

On VMware, using OVA template   

On Red Hat Enterprise Linux (RHEL) system, using an installation script

OS Requirements

ESXi 7.0.3 and 8.0.2 

RHEL 8.6 x64 

Hardware Requirements 

VMware hardware version 15 and higher – 

  • CPU: 12 vCPU 
  • Memory: 32GB 
  • Disk: 1.5TB
  • (Optional) NFS export should contain 1.5 TB

Compute: 

3 virtual machines or physical servers, with each VM or server having: 

  • CPU: 12 vCPU or Cores
  • Memory: 32GB 

 

Storage (one of the following options):

  • InsightIQ datastore on an NFS server:
    • 200GB of local disk space per VM or server
    • 1.5TB on the NFS server
  • InsightIQ datastore on the local partition "/":
    • 1TB per VM or physical server

Networking Requirements 

2 static IPs on the same network subnet, with PowerScale cluster connectivity

4 static IPs on the same network subnet, with PowerScale cluster connectivity

Leveraging the NFS export

In InsightIQ deployments, leveraging an NFS export for the datastore, whether from a PowerScale cluster or a Linux NFS server, can significantly enhance scalability. However, to ensure a seamless setup, specific prerequisites must be addressed:

Access and Permissions:

  • Guarantee accessibility of the NFS server and export from all servers/VMs where InsightIQ is deployed. This accessibility is crucial for uninterrupted data flow.
  • Set read/write permissions (chmod 777 <export_path>) to ensure unrestricted access for all users utilizing the NFS export (see the example export entry following the Security Measures list).

Security Measures:

  • Root User Mapping: Avoid mapping the root user (no_root_squash) to maintain secure access control.
  • Mount Access: Enable mount access to subdirectories for streamlined data retrieval and utilization.
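Putting the access and security prerequisites together, an export entry for the InsightIQ datastore on a Linux NFS server might look like the following. This is only an illustrative sketch: the export path is hypothetical, and the exact export options and subdirectory mount settings depend on your NFS server.

# /etc/exports entry on the NFS server (the path is hypothetical)
/srv/iiq_datastore *(rw,no_root_squash)

# Re-export and set open permissions per the prerequisites above
exportfs -ra
chmod 777 /srv/iiq_datastore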

Resource Allocation:

  • Allocate a substantial 1.5TB for the NFS export, ensuring ample space for data storage and future scalability.
  • Allocate 200GB free space on the root partition ("/") of all servers/VMs hosting InsightIQ.

By ensuring compliance with these guidelines, organizations can unlock the full potential of InsightIQ while maintaining a robust and reliable infrastructure.

IIQ 5.0.0 Support Matrix

The following is a concise summary of the supported OneFS versions, host OS, recommended client display configurations, and browser compatibility for both IIQ Simple and IIQ Scale deployments.

InsightIQ uses TLS 1.3 exclusively. A web browser without TLS 1.3 enabled or supported cannot access InsightIQ 5.0.0.

InsightIQ Simple

InsightIQ Scale

PowerScale OneFS

From OneFS 9.2.0.0 to 9.5.x, OneFS 9.7.x

Host OS

ESXi 7.0.3 and 8.0.2

RHEL 8.6 x64

Recommended Client display configuration

  • 1920 x 1080 resolution
  • 100% browser zoom
  • 100% "Display scale and layout" (under OS Display Properties)
  • Maximized browser window

Supported Browser

Chrome (recommended)

Mozilla Firefox

Microsoft Edge

Summary

Unlocking the full potential of InsightIQ 5.0.0 is a journey that begins with a clear understanding of deployment options and their prerequisites. Embracing your organization's scalability needs and infrastructure nuances ensures a seamless deployment.

Read Full Blog
  • PowerScale
  • OneFS
  • CloudPools

CloudPools Supported Cloud Providers

Jason He Jason He

Thu, 07 Dec 2023 20:43:11 -0000

|

Read Time: 0 minutes

The Dell PowerScale CloudPools feature of OneFS allows tiering, enabling you to move cold or infrequently accessed data to lower-cost cloud storage. CloudPools extends the PowerScale namespace to the private cloud and the public cloud.

This blog focuses on CloudPools supported cloud providers.

CloudPools supported cloud providers

Each cloud provider offers a range of storage classes that you can choose from based on data access and cost requirements. CloudPools does not support certain storage classes whose object read/write latency would cause CloudPools operations to fail.

Table 1. Supported and unsupported cloud providers and storage classes for CloudPools

Cloud providers

Supported storage classes

Unsupported storage classes

Dell ECS

All

N/A

Amazon S3/C2S S3

S3 Standard

  • S3 Intelligent-Tiering
  • S3 Standard-IA
  • S3 One Zone-IA
  • S3 Glacier Instant Retrieval
  • S3 Glacier Flexible Retrieval
  • S3 Glacier Deep Archive
  • S3 Outposts

Microsoft Azure Blob Storage

Hot access tier

  • Cool access tier
  • Cold access tier
  • Archive access tier

Google Cloud

  • Standard storage
  • Nearline storage
  • Coldline storage

Archive storage

Alibaba Cloud

Standard

  • Infrequent Access (IA)
  • Archive
  • Cold Archive
  • Deep Cold Archive

To address cost requirements, users can move objects from a higher-cost storage class to a lower-cost storage class in the cloud. The tiers are as follows:

  • Supported/Tier1 storage classes: CloudPools supports these storage classes. 
  • Tier2 storage classes: Amazon S3 supports using an S3 Lifecycle policy to move CloudPools objects from a Tier1 storage class to a Tier2 storage class. This movement will not break CloudPools operations (see the example lifecycle policy after this list).
  • Tier3 storage classes: Amazon S3 supports using an S3 Lifecycle policy to move CloudPools objects from a Tier1 storage class to a Tier3 storage class. This movement will break CloudPools operations; CloudPools objects must first be moved back to a Tier2 or Tier1 storage class before they can be accessed.
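To illustrate the Tier1-to-Tier2 movement on Amazon S3, the following is a minimal, hypothetical lifecycle configuration that transitions objects from S3 Standard to S3 Standard-IA after 30 days, applied with the AWS CLI. The bucket name, rule ID, prefix, and day count are assumptions for the example only; consult the CloudPools and AWS documentation before applying a policy to a production bucket.

cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "cloudpools-tier1-to-tier2",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" }
      ]
    }
  ]
}
EOF
aws s3api put-bucket-lifecycle-configuration --bucket <cloudpools-bucket> --lifecycle-configuration file://lifecycle.json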

Table 2. Storage classes per tier per cloud provider in CloudPools

Cloud providers

Supported/Tier1 storage classes

Tier2 storage classes

Tier3 storage classes

Amazon S3/C2S S3

S3 Standard

  • S3 Intelligent-Tiering[1]
  • S3 Standard-IA
  • S3 One Zone-IA[3]
  • S3 Glacier Instant Retrieval
  • S3 Outposts
  • S3 Glacier Flexible Retrieval
  • S3 Glacier Deep Archive

Microsoft Azure Blob Storage

Hot access tier

  • Cool access tier
  • Cold access tier
  • Archive access tier

Google Cloud

  • Standard storage
  • Nearline storage
  • Coldline storage

Archive storage

 

Alibaba Cloud

Standard

  • Infrequent Access (IA)
  • Archive[2]
  • Cold Archive
  • Deep Cold Archive

[1] Assumes no opt-in to Deep Archive Access Tier.
[2] Assumes real-time access is enabled.
[3] S3 directory buckets only allow objects stored in the S3 Express One Zone storage class and do not support S3 Lifecycle policies.

 

Resources

 

Author: Jason He, Principal Engineering Technologist


Read Full Blog
  • backup
  • PowerScale
  • OneFS
  • clusters
  • restore

Backing Up and Restoring PowerScale Cluster Configurations in OneFS 9.7

Vincent Shen Vincent Shen

Wed, 13 Dec 2023 14:00:00 -0000

|

Read Time: 0 minutes

Backing up and restoring OneFS cluster configurations is not new, as it was introduced in OneFS 9.2. However, only a limited set of components could be backed up or restored. This is a popular feature, and we have received a lot of feedback asking us to add more supported components. Now, with the release of OneFS 9.7, this feature gets a big enhancement. The following is the complete list of the components supported in 9.7. (The new ones are marked in blue.)

Some other enhancements include:

  1. Lock configuration during backup
  2. Support custom rules for restoring subnet IP addresses

Next, I’ll walk you through an example and explain the details of these enhancements.

Let’s take a look at the backup first.

As in the previous version, backup and restore are only available through PAPI and the CLI (there is no WebUI support at this stage), but the overall process is very simple and straightforward. If you are familiar with how it worked in the previous version, it's almost the same.

You can use the following CLI command to back up a cluster configuration:

isi cluster config exports create [--components …]

Here is an example where I want to export the network configuration:

# isi cluster config exports create --components=Network
The following components' configuration are going to be exported:
['Network']
Notice:
    The exported configuration will be saved in plain text. It is recommended to encrypt it according to your specific requirements.
Do you want to continue? (yes/[no]): yes
This may take a few seconds, please wait a moment
Created export task 'vshen-0eis0wn-20231128032252'

You can see that once the backup is triggered, a task is automatically created, and you can use the following command to view the details of the task:

isi cluster config exports view <export-id>

Here is what I have in my environment:

# isi cluster config exports view --id vshen-0eis0wn-20231128032252
     ID: vshen-0eis0wn-20231128032252
 Status: Successful
   Done: ['network']
 Failed: []
Pending: []
Message:
   Path: /ifs/data/Isilon_Support/config_mgr/backup/vshen-0eis0wn-20231128032252

During backup, to ensure a consistent configuration, a temporary lock is enabled to prevent new PAPI calls such as POST, PUT, and DELETE. (GET requests are not impacted.) In most cases, the backup job completes quickly and releases the lock when it finishes running.

You can use the following command to view the backup lock:

# isi cluster config lock view
Configuration lock enabled: Yes

You can also use the CLI command to manually enable or disable the lock:

# isi cluster config lock modify --action=enable
WARNING: User won't be able to make any configuration changes after enabling configuration lock.
Are you sure you want to enable configuration lock? (yes/[no]): yes
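To release the lock manually, the same command presumably accepts a disable action; this is a sketch assuming --action also takes the value disable:

# isi cluster config lock modify --action=disable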

After the backup task completes, the backup files are generated under /ifs/data/Isilon_Support/config_mgr/backup. Although the backup files are in plain-text format, sensitive information does not appear in them.

cat ./network_vshen-0eis0wn-20231128032252.json
{
  "description": {
    "component": "network",
    "release": "9.7.0.0",
    "action": "backup",
    "job_id": "vshen-0eis0wn-20231128032252",
    "result": "successful",
    "errors": []
  },
  "network": {
    "dnscache": {
      "cache_entry_limit": 65536,
       "cluster_timeout": 5,
       "dns_timeout": 5,
       "eager_refresh": 0,
       "testping_delta": 30,
       "ttl_max_noerror": 3600,
       "ttl_max_nxdomain": 3600,
…

When doing an import, you can use a command similar to the following:

# isi cluster config imports create --export-id=vshen-0eis0wn-20231128032252
Source Cluster Information:
          Cluster name: vshen-0eis0wn
       Cluster version: 9.7.0.0
            Node count: 1
  Restoring components: ['network']
Notice:
    Please review above information and make sure the target cluster has the same hardware configuration as the source cluster, otherwise the restore may fail due to hardware incompatibility. Please DO NOT use or change the cluster while configurations are being restored. Concurrent modifications are not guaranteed to be retained and some data services may be affected.
Do you want to continue? (yes/[no]): yes
This may take a few seconds, please wait a moment
Created import task 'vshen-0eis0wn-20231128064821'

When restoring the network component, you can avoid breaking connectivity by restoring the configuration without destroying the IP addresses of any existing subnets or pools.

To do this, use the parameter “--network-subnets-ip”:

# isi cluster config imports create --export-id=vshen-0eis0wn-20231128032252 --network-subnets-ip="groupnet0.subnet0:10.242.114.0/24"
Source Cluster Information:
          Cluster name: vshen-0eis0wn
       Cluster version: 9.7.0.0
            Node count: 1
  Restoring components: ['network']
Notice:
    Please review above information and make sure the target cluster has the same hardware configuration as the source cluster, otherwise the restore may fail due to hardware incompatibility. Please DO NOT use or change the cluster while configurations are being restored. Concurrent modifications are not guaranteed to be retained and some data services may be affected.
Do you want to continue? (yes/[no]): yes
This may take a few seconds, please wait a moment
Created import task 'vshen-0eis0wn-20231128070157'

That’s how it works! As I said, it’s very simple and straightforward. If you see any errors, you can check the log: /var/log/config_mgr.log.

Author: Vincent Shen

Read Full Blog
  • PowerScale
  • AWS
  • OneFS

PowerScale OneFS 9.7

Nick Trimbee Nick Trimbee

Wed, 13 Dec 2023 13:55:00 -0000

|

Read Time: 0 minutes

Dell PowerScale is already powering up the holiday season with the launch of the innovative OneFS 9.7 release, which shipped today (13th December 2023). This new 9.7 release is an all-rounder, introducing PowerScale innovations in Cloud, Performance, Security, and ease of use.

After the debut of APEX File Storage for AWS earlier this year, OneFS 9.7 extends and simplifies the PowerScale offering in the public cloud, delivering more features on more instance types across more regions.

In addition to providing the same OneFS software platform on-prem and in the cloud, and customer-managed for full control, APEX File Storage for AWS in OneFS 9.7 sees a 60% capacity increase, providing linear capacity and performance scaling up to six SSD nodes and 1.6 PiB per namespace/cluster, and up to 10GB/s reads and 4GB/s writes per cluster. This can make it a solid fit for traditional file shares and home directories, vertical workloads like M&E, healthcare, life sciences, finserv, and next-gen AI, ML and analytics applications.

Enhancements to APEX File Storage for AWS

PowerScale’s scale-out architecture can be deployed on customer managed AWS EBS and ECS infrastructure, providing the scale and performance needed to run a variety of unstructured workflows in the public cloud. Plus, OneFS 9.7 provides an ‘easy button’ for streamlined AWS infrastructure provisioning and deployment.

Once in the cloud, you can further leverage existing PowerScale investments by accessing and orchestrating your data through the platform's multi-protocol access and APIs.

This includes the common OneFS control plane (CLI, WebUI, and platform API), and the same enterprise features: Multi-protocol, SnapshotIQ, SmartQuotas, Identity management, and so on.

With OneFS 9.7, APEX File Storage for AWS also sees the addition of support for HDFS and FTP protocols, in addition to NFS, SMB, and S3. Granular performance prioritization and throttling is also enabled with SmartQoS, allowing admins to configure limits on the maximum number of protocol operations that NFS, S3, SMB, or mixed protocol workloads can consume on an APEX File Storage for AWS cluster.

Security

With data integrity and protection being top of mind in this era of unprecedented cyber threats, OneFS 9.7 brings a bevy of new features and functionality to keep your unstructured data and workloads more secure than ever. These new OneFS 9.7 security enhancements help address US Federal and DoD mandates, such as FIPS 140-2 and DISA STIGs – in addition to general enterprise data security requirements. Included in the new OneFS 9.7 release is a simple cluster configuration backup and restore utility, address space layout randomization, and single sign-on (SSO) lookup enhancements.

Data mobility

On the data replication front, SmartSync sees the introduction of GCP as an object storage target in OneFS 9.7, in addition to ECS, AWS and Azure. The SmartSync data mover allows flexible data movement and copying, incremental resyncs, push and pull data transfer, and one-time file to object copy.

Performance improvements

Building on the streaming read performance delivered in a prior release, OneFS 9.7 also unlocks dramatic write performance enhancements, particularly for the all-flash NVMe platforms - plus infrastructure support for future node hardware platform generations. A sizable boost in throughput to a single client helps deliver performance for the most demanding GenAI workloads, particularly for the model training and inferencing phases. Additionally, the scale-out cluster architecture enables performance to scale linearly as GPUs are increased, allowing PowerScale to easily support AI workflows from small to large.

Cluster support for InsightIQ 5.0

The new InsightIQ 5.0 software expands PowerScale monitoring capabilities, including a new user interface, automated email alerts, and added security. InsightIQ 5.0 is available today for all existing and new PowerScale customers at no additional charge. These innovations are designed to simplify management, expand scale and security, and automate operations for PowerScale performance monitoring for AI, GenAI, and all other workloads.

In summary, OneFS 9.7 brings the following new features and functionality to the Dell PowerScale ecosystem:

We’ll be taking a deeper look at these new features and functionality in blog articles over the course of the next few weeks. 

Meanwhile, the new OneFS 9.7 code is available on the Dell Support site, as both an upgrade and reimage file, allowing both installation and upgrade of this new release. 

Author: Nick Trimbee
Read Full Blog
  • PowerScale
  • OneFS

PowerScale Platform Update

Nick Trimbee Nick Trimbee

Thu, 07 Dec 2023 00:51:33 -0000

|

Read Time: 0 minutes

In this article, we’ll take a quick peek at the new PowerScale Hybrid H700/7000 and Archive A300/3000 hardware platforms that were released last month. So, the current PowerScale platform family hierarchy is as follows:

 

Here’s the lowdown on the new additions to the hardware portfolio: 

Model

Tier

Chassis Type & Drives per Chassis

Max Chassis Capacity (16TB HDD)

CPU per Node

Memory per Node

Network

H700

Hybrid/Utility

Standard:

60 x 3.5” HDD

960TB

CPU: 2.9Ghz, 16c

Mem: 384GB

FE: 100GbE

BE: 100GbE or IB

H7000

Hybrid/Utility

Deep:

80 x 3.5” HDD

1280TB

CPU: 2.9Ghz, 16c

Mem: 384GB

FE: 100GbE

BE: 100GbE or IB

A300

Archive

Standard:

60 x 3.5” HDD

960TB

CPU: 1.9Ghz, 6c

Mem: 96GB

FE: 25GbE

BE: 25GbE or IB

A3000

Archive

Deep:

80 x 3.5” HDD

1280TB

CPU: 1.9Ghz, 6c

Mem: 96GB

FE: 25GbE

BE: 25GbE or IB

 

The PowerScale H700 provides performance and value to support demanding file workloads. With up to 960 TB of HDD per chassis, the H700 also includes inline compression and deduplication capabilities to further extend the usable capacity.

The PowerScale H7000 is a versatile, high performance, and high capacity hybrid platform with up to 1280 TB per chassis. The deep-chassis H7000 is ideal for consolidating a range of file workloads on a single platform. The H7000 includes inline compression and deduplication capabilities.

On the active archive side, the PowerScale A300 combines performance, near-primary accessibility, value, and ease of use. The A300 provides between 120 TB to 960 TB per chassis and scales to 60 PB in a single cluster. The A300 includes inline compression and deduplication capabilities. 

The PowerScale A3000 is an ideal solution for high performance, high density, and deep archive storage that safeguards data efficiently for long-term retention. The A3000 stores up to 1280 TB per chassis and scales to north of 80 PB in a single cluster. The A3000 also includes inline compression and deduplication.

These new H700/7000 and A300/3000 nodes require OneFS 9.2.1 and can be seamlessly added to an existing cluster. They offer the full complement of OneFS data services, including snapshots, replication, quotas, analytics, data reduction, load balancing, and local and cloud tiering. All also contain SSDs.

Unlike the all-flash PowerScale F900, F600, and F200 stand-alone nodes, which require a minimum of three nodes to form a cluster, a single chassis of four nodes is required to create a cluster. The chassis offers support for both InfiniBand and Ethernet back-end network connectivity.

Each H700/7000 and A300/3000 chassis contains four compute modules (one per node) and five drive containers, or sleds, per node. These sleds occupy bays in the front of each chassis, with a node's drive sleds stacked vertically:

 

 

The drive sled is a tray that slides into the front of the chassis and contains between three and four 3.5-inch drives in an H700/0 or A300/0, depending on the drive size and configuration of the particular node. Both regular hard drives and self-encrypting drives (SEDs) are available in 2, 4, 8, 12, and 16TB capacities.

 

 

Each drive sled has a white ‘not safe to remove’ LED on its front top left, as well as a blue power/activity LED, and an amber fault LED.

The compute modules for each node are housed in the rear of the chassis and contain the CPU, memory, networking, SSDs, and power supplies. Nodes 1 & 2 are a node pair, as are nodes 3 & 4. Each node pair shares a mirrored journal and two power supplies.

Here's the detail of an individual compute module, which contains a multi-core Cascade Lake CPU, memory, an M.2 flash journal, up to two SSDs for L3 cache, six DIMM channels, front-end 40/100 or 10/25 Gb Ethernet, back-end 40/100 or 10/25 Gb Ethernet or InfiniBand, an Ethernet management interface, and a power supply and cooling fans:

Of particular interest is the ‘journal active’ LED, which is displayed as a white ‘hand icon’. When this is illuminated, it indicates that the mirrored journal is actively vaulting. 

A node's compute module should not be removed from the chassis while this LED is lit!

On the front of each chassis is an LCD front panel control with back-lit buttons and 4 LED Light Bar Segments - 1 per Node. These LEDs typically display blue for normal operation or yellow to indicate a node fault. This LCD display is hinged so it can be swung clear of the drive sleds for non-disruptive HDD replacement, for example.

So, in summary, the new Gen6 hardware delivers:

  • More Power
    • More cores, more memory and more cache 
    • A300/3000 up to 2x faster than previous generation (A200/2000)
  • More Choice
    • 100GbE, 25GbE and Infiniband options for cluster interconnect
    • Node compatibility for all hybrid and archive nodes
    • 30TB to 320TB per rack unit 
  • More Value
    • Inline data reduction across the PowerScale family
    • Lowest $/GB and most density among comparable solutions

 

Read Full Blog
  • AI
  • PowerScale
  • ECS
  • safety and security
  • UDS

Harnessing Artificial Intelligence for Safety and Security

Mordekhay Shushan Brian St.Onge Mordekhay Shushan Brian St.Onge

Wed, 22 Nov 2023 00:17:57 -0000

|

Read Time: 0 minutes

In the rapidly evolving landscape of technology, we find ourselves on the brink of a major technological leap with the integration of artificial intelligence (AI) into our daily lives. The potential impact of AI on the global economy is staggering, with forecasts predicting a whopping $13 trillion contribution. While the idea of AI isn't entirely new in the security sector which has previously employed analytics to monitor and report pixel changes in CCTV footage, the integration of AI technologies such as machine and deep learning has opened up a world of possibilities. One particularly rich source of data that organizations are eager to harness is video data, which is pivotal in a variety of use cases including operational improvements for retail, marketing strategies, and the enhancement of overall customer experiences.

Industries across the board are exploring AI's ability to enhance business efficiency, underscored by a whopping 63% of enterprise clients considering their security data as mission critical. That said, the success of AI deployments hinges on the collection and storage of data. AI models thrive on large, diverse datasets to achieve effectiveness and accuracy. For instance, when analyzing traffic patterns within a city, having access to comprehensive data spanning multiple seasons allows for more accurate planning. This necessity has led to the emergence of exceptionally large storage volumes to cater to AI's insatiable appetite for data.

A considerable portion of data – approximately 80% – collected by organizations is unstructured, including video data. Data scientists are faced with the arduous task of mapping this unstructured data into their models, due in part to the fragmented nature of security solutions. Shockingly, over 79% of a data scientist's time is consumed by data wrangling and collection rather than actual data analysis because of siloed data storage. Complex scenarios involving thousands of cameras pointed at different targets further complicate the application of AI models to this data.

Recent discussions in the field of AI have introduced the concept of ‘Data Fuzion,’ which underscores the importance of consolidating and harmonizing data, overcoming the current infrastructure's obstacles, and making data more accessible and usable for data science applications in the security industry. There is a significant divide between the potential for data science solutions to drive business outcomes and the actual implementation, largely attributed to – as previously mentioned – the fragmented, siloed nature of data storage and the scarcity of in-house data science expertise.

The AI solutions available today in the security domain often come as black box offerings with pre-programmed models, however end-users are increasingly seeking low- or no-code AI tools that allow them to tailor and modify models to meet their specific organizational needs. This shift enables organizations to fine-tune AI to their precise requirements, further optimizing business outcomes. Additionally, the rise of cloud computing has presented budgetary challenges as organizations are increasingly paying for data access, leading to a trend of cloud repatriation – moving data back to on-premises environments to better manage costs and reduce latency in real-time applications.

AI is transforming the way organizations protect not only their external security but also their internal data. Dell Technologies, for example, offers a solution known as Ransomware Defender within its unstructured data offerings, an AI-based detection tool which identifies anomalies and takes action when malicious actors attempt to encrypt or delete data by modeling typical behaviors and sounding alarms when suspicious activities occur. Check out the Dell Technologies cyber security solution page for more information.

To fully harness the power of AI and navigate these complex data landscapes, organizations are turning to single-volume unstructured data solutions that embody the concept of ‘Data Fuzion.’ Dell Technologies Unstructured Data Solutions, with their petabyte-scale single-volume architecture, offer not only the ability to support this burgeoning workload but also robust cyber protection and multi-cloud capabilities. In this way, organizations can chart a seamless path towards AI adoption while ensuring data-driven security and efficiency. Visit the Dell Technologies PowerScale solutions page to learn more.

Resources

Authors: Mordekhay Shushan | Safety and Security Solution Architect & Brian St.Onge | Business Development Manager, Video Safety and Security


 

Read Full Blog
  • PowerScale
  • OneFS
  • troubleshooting
  • SSO

OneFS WebUI Single Sign-on Management and Troubleshooting

Nick Trimbee Nick Trimbee

Thu, 16 Nov 2023 20:53:16 -0000

|

Read Time: 0 minutes

Earlier in this series, we took a look at the architecture of the new OneFS WebUI SSO functionality. Now, we move on to its management and troubleshooting.

As we saw in the previous article, once the IdP and SP are configured, a cluster admin can enable SSO per access zone using the OneFS WebUI by navigating to Access > Authentication providers > SSO. From here, select the desired access zone and click the ‘Enable SSO’ toggle:

Or from the OneFS CLI using the following syntax:

# isi auth sso settings modify --sso-enabled 1

Once complete, the SSO configuration can be verified from a client web browser by browsing to the OneFS login screen. If all is operating correctly, redirection to the ADFS login screen will occur. For example:

After successful authentication with ADFS, cluster access is granted and the browser session is redirected back to the OneFS WebUI.

In addition to the new SSO WebUI pages, OneFS 9.5 also adds a subcommand to the ‘isi auth’ command set for configuring SSO from the CLI. This new syntax includes:

  • isi auth sso idps
  • isi auth sso settings  
  • isi auth sso sp

With these, you can use the following procedure to configure and enable SSO using the OneFS command line.

1. Define the ADFS instance in OneFS.

Enter the following command to create the IdP account:

# isi auth ads create <domain_name> <user> --password=<password> ...

where:

Attribute

Description

<domain_name>

Fully qualified Active Directory domain name that identifies the ADFS server. For example, idp1.isilon.com.

<user>

The user account that has permission to join machines to the given domain.

<password>

The password for <user>.

2. Next, add the IdP to the pertinent OneFS zone. Note that each of a cluster’s access zone(s) must have an IdP configured for it. The same IdP can be used for all the zones, but each access zone must be configured separately.

# isi zone zones modify --add-auth-providers

For example:

# isi zone zones modify system --add-auth-providers=lsa-activedirectoryprovider:idp1.isilon.com

3. Verify that OneFS can find users in Active Directory.

# isi auth users view idp1.isilon.com\\<username>

In the output, ensure that an email address is displayed. If not, return to Active Directory and assign email addresses to users.

4. Configure the OneFS hostname for SAML SSO.

# isi auth sso sp modify --hostname=<name>

Where <name> is the name that SAML SSO can use to represent the OneFS cluster to ADFS. SAML redirects clients to this hostname.
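For example, assuming a cluster hostname of mycluster.isilon.com (this value is purely illustrative):

# isi auth sso sp modify --hostname=mycluster.isilon.com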

5. Obtain the ADFS metadata and store it under /ifs on the cluster.

In the following example, an HTTPS GET request is issued using the 'curl' utility to obtain the metadata from the IDP and store it under /ifs on the cluster.

# curl -o /ifs/adfs.xml https://idp1.isilon.com/FederationMetadata/2007-06/FederationMetadata.xml

6. Create the IdP on OneFS using the ‘metadata-location’ path for the xml file in the previous step.

# isi auth sso idps create idp1.isilon.com --metadata-location="/ifs/adfs.xml"

7. Enable SSO:

# isi auth sso settings modify --sso-enabled=yes --zone <zone>

Use the following syntax to view the IdP configuration:

# isi auth sso idps view <idp_ID>

For example:

# isi auth sso idps view idp
ID: idp
Metadata Location: /ifs/adfs.xml
Entity ID: https://dns.isilon.com/adfs/services/trust
Login endpoint of the IDP
URL: https://dns.isilon.com/adfs/ls/
Binding: urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect
Logout endpoint of the IDP
URL: https://dns.isilon.com/adfs/ls/
Binding: urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect
Response URL: -
Type: metadata
Signing Certificate: -
        Path:
        Issuer: CN=ADFS Signing - dns.isilon.com
        Subject: CN=ADFS Signing - dns.isilon.com
        Not Before: 2023-02-02T22:22:00
        Not After: 2024-02-02T22:22:00
        Status: valid
Value and Type
        Value: -----BEGIN CERTIFICATE-----
MITC9DCCAdygAwIBAgIQQQQc55appr1CtfPNj5kv+DANBgkqhk1G9w8BAQsFADA2
<snip>

Troubleshooting

If the IdP and/or SP signing certificate happens to expire, users will be unable to log in to the cluster with SSO, and an error message will be displayed on the login screen.

In this example, the IdP certificate has expired, as described in the alert message. When this occurs, a warning is also displayed on the SSO Authentication page, as shown here:

To correct this, download either a new signing certificate from the identity provider or a new metadata file containing the IdP certificate details. When this is complete, you can then update the cluster’s IdP configuration by uploading the XML file or the new certificate.
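For example, a refreshed copy of the ADFS metadata can be pulled down to the cluster with curl, in the same way as during the initial configuration, and then used to update the IdP entry via the WebUI or the ‘isi auth sso idps’ command set (the file name below is just an illustration):

# curl -o /ifs/adfs_new.xml https://idp1.isilon.com/FederationMetadata/2007-06/FederationMetadata.xml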

Similarly, if the SP certificate has expired, the following notification alert is displayed upon attempted login:

The following error message is also displayed on the WebUI SSO tab, under Access > Authentication providers > SSO, along with a link to regenerate the metadata file:

The expired SP signing key and certificate can also be easily regenerated from the OneFS CLI:

# isi auth sso sp signing-key rekey
This command will delete any existing signing key and certificate and replace them with a newly generated signing key and certificate. Make sure the newly generated certificate is added to the IDP to ensure that the IDP can verify messages sent from the cluster. Are you sure?  (yes/[no]):   yes
# isi auth sso sp signing-key dump
-----BEGIN CERTIFICATE-----
MIIE6TCCAtGgAwIBAgIJAP30nSyYUz/cMA0GCSqGSIb3DQEBCwUAMCYxJDAiBgNVBAMMG1Bvd2VyU2NhbGUgU0FNTCBTaWduaWSnIEtleTAeFw0yMjExMTUwMzU0NTFaFw0yMzExMTUwMzU0NTFaMCYxJDAiBgNVBAMMG1Bvd2VyU2NhbGUgU0FNTCBTaWduaWSnIEtleTCCAilwDQYJKoZIhvcNAQEBBQADggIPADCCAgoCggIBAMOOmYJ1aUuxvyH0nbUMurMbQubgtdpVBevy12D3qn+x7rgym8/v50da/4xpMmv/zbE0zJ0IVbWHZedibtQhLZ1qRSY/vBlaztU/nA90XQzXMnckzpcunOTG29SMO3x3Ud4*fqcP4sKhV
<snip>

When it is regenerated, either the XML file or certificate can be downloaded, and the cluster configuration updated by either metadata download or manual copy:

Finally, upload the SP details back to the identity provider.

For additional troubleshooting of OneFS SSO and authentication issues, there are some key log files to check. These include:

  • /var/log/isi_saml_d.log: SAML-specific log messages logged by isi_saml_d.
  • /var/log/apache2/webui_httpd_error.log: WebUI error messages, including some SAML errors, logged by the WebUI HTTP server.
  • /var/log/jwt.log: Errors related to token generation, logged by the JWT service.
  • /var/log/lsassd.log: General authentication errors logged by the ‘lsassd’ service, such as failing to look up users by email.
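When triaging across a multi-node cluster, it can also be handy to sweep a log on every node at once. For instance, the following illustrative one-liner uses the isi_for_array utility to pull the most recent SAML daemon errors from each node:

# isi_for_array -s 'grep -i error /var/log/isi_saml_d.log | tail -5'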

Author: Nick Trimbee

Read Full Blog
  • Isilon
  • PowerScale
  • OneFS
  • NAS
  • Dell
  • Cluster
  • Scale-out

OneFS NFS Locking and Reporting – Part 2

Nick Trimbee Nick Trimbee

Mon, 13 Nov 2023 17:58:49 -0000

|

Read Time: 0 minutes

In the previous article in this series, we took a look at the new NFS locks and waiters reporting CLI command set and API endpoints. Next, we turn our attention to some additional context, caveats, and NFSv3 lock removal.

Before the NFS locking enhancements in OneFS 9.5, the legacy CLI commands were somewhat inefficient. Their output also included other advisory domain locks, such as SMB, which made it more difficult to parse. The table below maps the new 9.5 CLI commands (and corresponding handlers) to the old NLM syntax.

Type / Command set     OneFS 9.5 and later          OneFS 9.4 and earlier
Locks                  isi nfs locks                isi nfs nlm locks
Sessions               isi nfs nlm sessions         isi nfs nlm sessions
Waiters                isi nfs locks waiters        isi nfs nlm locks waiters

Note that the isi_classic nfs locks and waiters CLI commands have also been deprecated in OneFS 9.5.

When upgrading to OneFS 9.5 or later from a prior release, the legacy platform API handlers continue to function both during and after the upgrade. Thus, any legacy scripts and automation are protected from this lock reporting deprecation. Additionally, while the new platform API handlers will work during a rolling upgrade in mixed mode, they will only return results for the nodes that have already been upgraded (‘high nodes’).

Be aware that the NFS locking CLI framework does not support partial responses. If a node is down or the cluster has a rolling upgrade in progress, the alternative is to query the equivalent platform API endpoint instead.

Performance-wise, on very large busy clusters, there is the possibility that the lock and waiter CLI commands’ output will be sluggish. In such instances, the --timeout flag can be used to increase the command timeout window. Output filtering can also be used to reduce the number of locks reported.
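For instance, a longer timeout and a version filter might be combined as follows (the values shown are purely illustrative):

# isi --timeout 120 nfs locks list --version=v3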

When a lock is in a transition state, there is a chance that it may not have or report a version. In these instances, the Version field will be displayed as ‘-’. For example:

# isi nfs locks list -v
Client: 1/TMECLI1:487722/10.22.10.250
Client ID: 487722351064074
LIN: 4295164422
Path: /ifs/locks/nfsv3/10.22.10.250_1
Lock Type: exclusive
Range: 0, 92233772036854775807
Created: 2023-08-18T08:03:52
Version: -
---------------------------------------------------------------
Total: 1

This behavior should be experienced very infrequently. However, if it is encountered, simply execute the CLI command again, and the lock version should be reported correctly.

When it comes to troubleshooting NFSv3/NLM issues, if an NFSv3 client is consistently experiencing NLM_DENIED or other lock management issues, this is often a result of incorrectly configured firewall rules. For example, take the following packet capture (PCAP) excerpt from an NFSv3 Linux client:

   21 08:50:42.173300992  10.22.10.100 → 10.22.10.200 NLM 106    V4 LOCK Reply (Call In 19) NLM_DENIED

Often, the assumption is that only the lockd or statd ports on the server side of the firewall need to be opened, and that the client always initiates the connection. However, this is not the case. Instead, the server responds with a ‘let me get back to you’ and later connects back to the client. As such, if the firewall blocks access to rpcbind, lockd, or statd on the client, connection failures will likely occur.
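One quick, illustrative sanity check is to query the client’s portmapper from the cluster (or any other host on the server side of the firewall) and confirm that the nlockmgr and status services are registered and reachable:

# rpcinfo -p 10.22.10.250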

Occasionally, it does become necessary to remove NLM locks and waiters from the cluster. Traditionally, the isi_classic nfs clients rm command was used; however, that command has limitations and is fully deprecated in OneFS 9.5 and later. Instead, the preferred method is to use the isi nfs nlm sessions CLI utility in conjunction with various other ancillary OneFS CLI commands to clear problematic locks and waiters.

Note that the isi nfs nlm sessions CLI command, available in all current OneFS versions, is access zone-aware. This is reflected in the output for the client holding the lock, which now shows the zone ID number at the beginning. For example:

 4/tme-linux1/10.22.10.250 

This represents:

Zone ID 4 / Client tme-linux1 / IP address of cluster node holding the connection.

A basic procedure to remove NLM locks and waiters from a cluster is as follows: 
 
1. List the NFS locks and search for the pertinent filename. 

In OneFS 9.5 and later, the locks list can be filtered using the --path argument.

# isi nfs locks list --path=<path> | grep <filename>

Be aware that the full path must be specified, starting with /ifs. There is no partial matching or substitution for paths in this command set.

For OneFS 9.4 and earlier, the following CLI syntax can be used:

#  isi_for_array -sX 'isi nfs nlm locks list | grep <filename>'


2. List the lock waiters associated with the same filename using |grep.

For OneFS 9.5 and later, the waiters list can also be filtered using the --path syntax:

# isi nfs locks waiters --path=<path> | grep <filename>

With OneFS 9.4 and earlier, the following CLI syntax can be used:

# isi_for_array -sX 'isi nfs nlm locks waiters |grep -i <filename>'


3. Confirm the client and logical inode number (LIN) being waited upon. 

This can be accomplished by querying the efs.advlock.failover.lock_waiters sysctl. For example:

# isi_for_array -sX 'sysctl efs.advlock.failover.lock_waiters'

[truncated output]
 ...
 client = { '4/tme-linux1/10.20.10.200', 0x26593d37370041 }
 ...
resource = 2:df86:0218

Note that for sanity checking, the isi get -L CLI utility can be used to confirm the path of a file from its LIN:

# isi get -L <LIN>


4. Remove the unwanted locks which are causing waiters to stack up. 

Keep in mind that the isi nfs nlm sessions command syntax is access zone-aware.

List the access zones by their IDs.

# isi zone zones list -v | grep -iE "Zone ID|name"

Once the desired zone ID has been determined, the isi_run -z CLI utility can be used to specify the appropriate zone in which to run the isi nfs nlm sessions commands: 

# isi_run -z 4 -l root

Next, the isi nfs nlm sessions delete CLI command will remove the specific lock waiter which is causing the issue. The command syntax requires specifying the client hostname and the IP address of the node holding the lock.

# isi nfs nlm sessions delete --zone <AZ_zone_ID> <hostname> <cluster-ip>

For example:

# isi nfs nlm sessions delete --zone 4 tme-linux1 10.20.10.200
 Are you sure you want to delete all NFSv3 locks associated with client tme-linux1 against cluster IP 10.20.10.200? (yes/[no]): yes


5. Repeat the commands in step 1 to confirm that the desired NLM locks and waiters have been successfully culled.
 


BEFORE applying the process....

 # isi_for_array -sX 'isi nfs nlm locks list |grep JUN'
 TME-1: 4/tme-linux1/192.168.2.214  /ifs/tmp/TME/sequences/mncr_fabjob_seq_file_27JUN2017
 TME-1: 4/ tme-linux1/192.168.2.214  /ifs/tmp/TME/sequences/mncr_fabjob_seq_file_28JUN2017
 TME-2: 4/ tme-linux1/192.168.2.214  /ifs/tmp/TME/sequences/mncr_fabjob_seq_file_27JUN2017
 TME-2: 4/ tme-linux1/192.168.2.214  /ifs/tmp/TME/sequences/mncr_fabjob_seq_file_28JUN2017
 TME-3: 4/ tme-linux1/192.168.2.214  /ifs/tmp/TME/sequences/mncr_fabjob_seq_file_27JUN2017
 TME-3: 4/ tme-linux1/192.168.2.214  /ifs/tmp/TME/sequences/mncr_fabjob_seq_file_28JUN2017
 TME-4: 4/ tme-linux1/192.168.2.214  /ifs/tmp/TME/sequences/mncr_fabjob_seq_file_27JUN2017
 TME-4: 4/ tme-linux1/192.168.2.214  /ifs/tmp/TME/sequences/mncr_fabjob_seq_file_28JUN2017
 TME-5: 4/ tme-linux1/192.168.2.214  /ifs/tmp/TME/sequences/mncr_fabjob_seq_file_27JUN2017
 TME-5: 4/ tme-linux1/192.168.2.214  /ifs/tmp/TME/sequences/mncr_fabjob_seq_file_28JUN2017
 TME-6: 4/ tme-linux1/192.168.2.214  /ifs/tmp/TME/sequences/mncr_fabjob_seq_file_27JUN2017
 TME-6: 4/ tme-linux1/192.168.2.214  /ifs/tmp/TME/sequences/mncr_fabjob_seq_file_28JUN2017
 
 
 # isi_for_array -sX 'isi nfs nlm locks waiters |grep -i JUN'
 TME-1: 4/ tme-linux1/192.168.2.214  /ifs/tmp/TME/sequences/mncr_fabjob_seq_file_28JUN2017
 TME-1: 4/ tme-linux1/192.168.2.214  /ifs/tmp/TME/sequences/mncr_fabjob_seq_file_28JUN2017
 TME-2 exited with status 1
 TME-3 exited with status 1
 TME-4 exited with status 1
 TME-5 exited with status 1
 TME-6 exited with status 1


AFTER...

TME-1# isi nfs nlm sessions delete --hostname=tme-linux1 --cluster-ip=192.168.2.214
 Are you sure you want to delete all NFSv3 locks associated with client tme-linux1 against cluster IP 192.168.2.214? (yes/[no]): yes
 TME-1#
 TME-1#
 TME-1# isi_for_array -sX 'sysctl efs.advlock.failover.locks |grep 2:ce75:0319'
 TME-1 exited with status 1
 TME-2 exited with status 1
 TME-3 exited with status 1
 TME-4 exited with status 1
 TME-5 exited with status 1
 TME-6 exited with status 1
 TME-1#
 TME-1# isi_for_array -sX 'isi nfs nlm locks list |grep -i JUN'
 TME-1 exited with status 1
 TME-2 exited with status 1
 TME-3 exited with status 1
 TME-4 exited with status 1
 TME-5 exited with status 1
 TME-6 exited with status 1
 TME-1#
 TME-1# isi_for_array -sX 'isi nfs nlm locks waiters |grep -i JUN'
 TME-1 exited with status 1
 TME-2 exited with status 1
 TME-3 exited with status 1
 TME-4 exited with status 1
 TME-5 exited with status 1
 TME-6 exited with status 1

 

Author: Nick Trimbee


Read Full Blog
  • Isilon
  • PowerScale
  • OneFS
  • NAS
  • Dell
  • Cluster
  • Scale-out

OneFS NFS Locking

Nick Trimbee Nick Trimbee

Mon, 13 Nov 2023 17:56:59 -0000

|

Read Time: 0 minutes

Included among the plethora of OneFS 9.5 enhancements is an updated NFS lock reporting infrastructure, command set, and corresponding platform API endpoints. This new functionality includes enhanced listing and filtering options for both locks and waiters, based on NFS major version, client, LIN, path, creation time, etc. But first, some backstory.

The ubiquitous NFS protocol underwent some fundamental architectural changes between its versions 3 and 4. One of the major differences concerns the area of file locking.

NFSv4 is the most current major version of the protocol, natively incorporating file locking and thereby avoiding the need for any additional (and convoluted) RPC callback mechanisms necessary with prior NFS versions. With NFSv4, locking is built into the main file protocol and supports new lock types, such as range locks, share reservations, and delegations/oplocks, which emulate those found in Windows and SMB.

File lock state is maintained at the server under a lease-based model. A server defines a single lease period for all states held by an NFS client. If the client does not renew its lease within the defined period, all states associated with the client's lease may be released by the server. If released, the client may either explicitly renew its lease or simply issue a read request or other associated operation. Additionally, with NFSv4, a client can elect whether to lock the entire file or a byte range within a file. 

In contrast to NFSv4, the NFSv3 protocol is stateless and does not natively support file locking. Instead, the ancillary Network Lock Manager (NLM) protocol supplies the locking layer. Since file locking is inherently stateful, NLM itself is considered stateful. For example, when an NFSv3 filesystem mounted on an NFS client receives a request to lock a file, it generates an NLM remote procedure call instead of an NFS remote procedure call. 

The NLM protocol itself consists of remote procedure calls that emulate the standard UNIX file control (fcntl) arguments and outputs. Because a process blocks waiting for a lock that conflicts with another lock holder – also known as a ‘blocking lock’ – the NLM protocol has the notion of callbacks from the file server to the NLM client to notify that a lock is available. As such, the NLM client sometimes acts as an RPC server in order to receive delayed results from lock calls. 

State
  • NFSv3: Stateless. A client does not technically establish a new session if it has the correct information to ask for files and so on. This allows for simple failover between OneFS nodes using dynamic IP pools.
  • NFSv4: Stateful. NFSv4 uses sessions to handle communication. As such, both client and server must track session state to continue communicating.

Presentation
  • NFSv3: User and group info is presented numerically. Client and server communicate user information by numeric identifiers, allowing the same user to appear as different names between client and server.
  • NFSv4: User and group info is presented as strings. Both the client and server must resolve the names of the numeric information stored. The server must look up names to present, while the client must remap those to numbers on its end.

Locking
  • NFSv3: File locking is out of band. Uses NLM to perform locks. This requires the client to respond to RPC messages from the server to confirm locks have been granted, etc.
  • NFSv4: File locking is in band. No longer uses a separate protocol for file locking, instead making it a type of call that is usually compounded with OPENs, CREATEs, or WRITEs.

Transport
  • NFSv3: Can run over TCP or UDP. This version of the protocol can run over UDP instead of TCP, leaving handling of loss and retransmission to the software instead of the operating system. We always recommend using TCP.
  • NFSv4: Only supports TCP. Version 4 of NFS has left loss and retransmission up to the underlying operating system. It can batch a series of calls in a single packet, allowing the server to process all of them and reply at the end. This is used to reduce the number of calls involved in common operations.

Since NFSv3 is stateless, it requires more complexity to recover from failures like client and server outages and network partitions. If an NLM server crashes, NLM clients that are holding locks must reestablish them on the server when it restarts. The NLM protocol deals with this by having the status monitor on the server send a notification message to the status monitor of each NLM client that was holding locks. The initial period after a server restart is known as the grace period, during which only requests to reestablish locks are granted. Thus, clients that reestablish locks during the grace period are guaranteed to not lose their locks. 

When an NLM client crashes, ideally any locks it was holding at the time are removed from the pertinent NLM server(s). The NLM protocol handles this by having the status monitor on the client send a message to each server's status monitor once the client reboots. The client reboot indication informs the server that the client no longer requires its locks. However, if the client crashes and fails to reboot, the client's locks will persist indefinitely. This is undesirable for two primary reasons: resources are leaked indefinitely, and eventually another client will want a conflicting lock on at least one of the files the crashed client had locked and, as a result, will be blocked indefinitely.

Therefore, having NFS server utilities to swiftly and accurately report on lock and waiter status and utilities to clear NFS lock waiters is highly desirable for administrators – particularly on clustered storage architectures.

Prior to OneFS 9.5, the legacy NFS locking CLI commands were somewhat inefficient and also showed other advisory domain locks, which rendered the output confusing. The following table shows the new CLI commands (and corresponding handlers) that replace the older NLM syntax.

Type / Command set     OneFS 9.4 and earlier        OneFS 9.5
Locks                  isi nfs nlm locks            isi nfs locks
Sessions               isi nfs nlm sessions         isi nfs nlm sessions
Waiters                isi nfs nlm locks waiters    isi nfs locks waiters

In OneFS 9.5 and later, the old API handlers still exist to avoid breaking existing scripts and automation; however, the old CLI command syntax is deprecated and no longer works.

Be aware that the isi_classic nfs locks and waiters CLI commands have also been disabled in OneFS 9.5. Attempting to run them yields the following warning message:

# isi_classic nfs locks
This command has been disabled. Please use isi nfs for this functionality.

The new isi nfs locks CLI command output includes the following locks object fields:

  • Client: The client host name, fully qualified domain name (FQDN), or IP address.
  • Client_ID: The client ID (internally generated).
  • Created: The UNIX epoch time at which the lock was created.
  • ID: The lock ID (needed for platform API sorting; not shown in CLI output).
  • LIN: The logical inode number (LIN) of the locked resource.
  • Lock_type: The type of lock (shared, exclusive, none).
  • Path: Path of the locked file.
  • Range: The byte range within the file that is locked.
  • Version: The NFS major version: v3 or v4.

Note that the ISI_NFS_PRIV RBAC privilege is required in order to view the NFS locks or waiters via the CLI or PAPI. In addition to ‘root’, the cluster’s ‘SystemAdmin’ and ‘SecurityAdmin’ roles contain this privilege by default.

Additionally, the new locks CLI command sets have a default timeout of 60 seconds. If the cluster is very large, the timeout may need to be increased for the CLI command. For example:

# isi --timeout <timeout value> nfs locks list

 The basic architecture of the enhanced NFS locks reporting framework is as follows:

The new API handlers leverage the platform API proxy, yielding increased performance over the legacy handlers. Additionally, updated syscalls have been implemented to facilitate filtering by NFS service and major version.

Since NFSv3 is stateless, the cluster does not know when a client has lost its state unless it reconnects. For maximum safety, the OneFS locking framework (lk) holds locks forever. The isi nfs nlm sessions CLI command allows administrators to manually free NFSv3 locks in such cases, and this command remains available in OneFS 9.5 as well as prior versions. NFSv3 locks may also be leaked on delete, since a valid inode is required for lock operations. As such, lkf has a lock reaper which periodically checks for locks associated with deleted files.

In OneFS 9.5 and later, current NFS locks can be viewed with the new isi nfs locks list command. This command set also provides a variety of options to limit and format the display output. In its basic form, this command generates a basic list of client IP address and the path. For example:

# isi nfs locks list
Client                              Path
-------------------------------------------------------------------
1/TMECLI1:487722/10.22.10.250       /ifs/locks/nfsv3/10.22.10.250_1
1/TMECLI1:487722/10.22.10.250       /ifs/locks/nfsv3/10.22.10.250_2
Linux NFSv4.0 TMECLI1:487722/10.22.10.250       /ifs/locks/nfsv4/10.22.10.250_1
Linux NFSv4.0 TMECLI1:487722/10.22.10.250       /ifs/locks/nfsv4/10.22.10.250_2
-------------------------------------------------------------------
Total: 4

To include more information, the -v flag can be used to generate a verbose locks listing:

 # isi nfs locks list -v
Client: 1/TMECLI1:487722/10.22.10.250
Client ID: 487722351064074
LIN: 4295164422
Path: /ifs/locks/nfsv3/10.22.10.250_1
Lock Type: exclusive
Range: 0, 92233772036854775807
Created: 2023-08-18T08:03:52
Version: v3
---------------------------------------------------------------
Client: 1/TMECLI1:487722/10.22.10.250
Client ID: 5175867327774721
LIN: 42950335042
Path: /ifs/locks/nfsv3/10.22.10.250_1
Lock Type: exclusive
Range: 0, 92233772036854775807
Created: 2023-08-18T08:10:31
Version: v3
---------------------------------------------------------------
Client: Linux NFSv4.0 TMECLI1:487722/10.22.10.250
Client ID: 487722351064074
LIN: 429516442
Path: /ifs/locks/nfsv3/10.22.10.250_1
Lock Type: exclusive
Range: 0, 92233772036854775807
Created: 2023-08-18T08:19:48
Version: v4
---------------------------------------------------------------
Client: Linux NFSv4.0 TMECLI1:487722/10.22.10.250
Client ID: 487722351064074
LIN: 4295426674
Path: /ifs/locks/nfsv3/10.22.10.250_2
Lock Type: exclusive
Range: 0, 92233772036854775807
Created: 2023-08-18T08:17:02
Version: v4
---------------------------------------------------------------
Total: 4

The previous syntax returns more detailed information for each lock, including client ID, LIN, path, lock type, range, created date, and NFS version.

The lock listings can also be filtered by client or client-id. Note that the --client option must be the full name in quotes:

# isi nfs locks list --client="full_name_of_client/IP_address" -v

For example:

# isi nfs locks list --client="1/TMECLI1:487722/10.22.10.250" -v
Client: 1/TMECLI1:487722/10.22.10.250
Client ID: 5175867327774721
LIN: 42950335042
Path: /ifs/locks/nfsv3/10.22.10.250_1
Lock Type: exclusive
Range: 0, 92233772036854775807
Created: 2023-08-18T08:10:31
Version: v3

Additionally, be aware that the CLI does not support partial names, so the full name of the client must be specified.

Filtering by NFS version can be helpful when attempting to narrow down which client has a lock. For example, to show just the NFSv3 locks:

# isi nfs locks list --version=v3 
Client                              Path
-------------------------------------------------------------------
1/TMECLI1:487722/10.22.10.250       /ifs/locks/nfsv3/10.22.10.250_1
1/TMECLI1:487722/10.22.10.250       /ifs/locks/nfsv3/10.22.10.250_2
-------------------------------------------------------------------
Total: 2

Note that the --version flag supports both v3 and nlm as arguments and will return the same v3 output in either case. For example:

# isi nfs locks list --version=nlm
Client                              Path
-------------------------------------------------------------------
1/TMECLI1:487722/10.22.10.250       /ifs/locks/nfsv3/10.22.10.250_1
1/TMECLI1:487722/10.22.10.250       /ifs/locks/nfsv3/10.22.10.250_2
-------------------------------------------------------------------
Total: 2

Filtering by LIN or path is also supported. For example, to filter by LIN:

# isi nfs locks list --lin=42950335042 -v
Client: 1/TMECLI1:487722/10.22.10.250
Client ID: 5175867327774721
LIN: 42950335042
Path: /ifs/locks/nfsv3/10.22.10.250_1
Lock Type: exclusive
Range: 0, 92233772036854775807
Created: 2023-08-18T08:10:31
Version: v3

Or by path:

# isi nfs locks list --path=/ifs/locks/nfsv3/10.22.10.250_2 -v
Client: Linux NFSv4.0 TMECLI1:487722/10.22.10.250
Client ID: 487722351064074
LIN: 4295426674
Path: /ifs/locks/nfsv3/10.22.10.250_2
Lock Type: exclusive
Range: 0, 92233772036854775807
Created: 2023-08-18T08:17:02
Version: v4

Be aware that the full path must be specified, starting with /ifs. There is no partial matching or substitution for paths in this command set.

Filtering can also be performed by creation time, for example:

# isi nfs locks list --created=2023-08-17T09:30:00 -v 

Note that when filtering by created, the output will include all locks that were created before or at the time provided.

The --limit argument can be used to curtail the number of results returned, and it can be used in conjunction with all other query options. For example, to limit the output of the NFSv4 locks listing to one lock:

# isi nfs locks list --version=v4 --limit=1

Note that --limit can be used with the full range of query types.

The filter options are mutually exclusive, with the exception of version, which can be combined with any of the other filter options (for example, filtering by both created and version, as shown below).

This can be helpful when troubleshooting and trying to narrow down results.
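For instance, the following illustrative command combines the version and created filters to show only the NFSv3 locks created at or before a given time:

# isi nfs locks list --version=v3 --created=2023-08-18T08:00:00 -v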

In addition to locks, OneFS 9.5 also provides the isi nfs locks waiters CLI command set. Note that waiters are specific to NFSv3 clients, and the CLI reports any v3 locks that are pending and not yet granted.

Since NFSv3 is stateless, a cluster does not know when a client has lost its state unless it reconnects. For maximum safety, lk holds locks forever. The isi nfs nlm command allows administrators to manually free locks in such cases. Locks may also be leaked on delete, since a valid inode is required for lock operations. Thus, lkf has a lock reaper which periodically checks for locks associated with deleted files:

# isi nfs locks waiters

The waiters CLI syntax uses a similar range of query arguments as the isi nfs locks list command set.
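For example, based on that similarity, a waiters query might be narrowed by path in the same way as the locks listing (the path shown is illustrative):

# isi nfs locks waiters --path=/ifs/locks/nfsv3/10.22.10.250_1 -v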

In addition to the CLI, the platform API can also be used to query both NFS locks and NFSv3 waiters. For example, using curl to view the waiters via the OneFS pAPI:

# curl -k -u <username>:<passwd> "https://localhost:8080/platform/protocols/nfs/waiters"
{
  "total" : 1,
  "waiters" : [
    {
      "client" : "1/TMECLI1487722/10.22.10.250",
      "client_id" : "4894369235106074",
      "created" : "1668146840",
      "id" : "1 1YUIAEIHVDGghSCHGRFHTiytr3u243567klj212-MANJKJHTTy1u23434yui-ouih23ui4yusdftyuySTDGJSDHVHGDRFhgfu234447g4bZHXhiuhsdm",
      "lin" : "4295164422",
      "lock_type" : "exclusive",
      "path" : "/ifs/locks/nfsv3/10.22.10.250_1",
      "range" : [ 0, 92233772036854775807 ],
      "version" : "v3"
    }
  ]
}

Similarly, using the platform API to show locks filtered by client ID:

# curl -k -u <username>:<passwd> "https://<address>:8080/platform/protocols/nfs/locks?client=<client_ID>"

For example:

# curl -k -u <username>:<passwd> "https://localhost:8080/platform/protocols/nfs/locks?client=1/TMECLI1487722/10.22.10.250"
{
  "locks" : [
    {
      "client" : "1/TMECLI1487722/10.22.10.250",
      "client_id" : "487722351064074",
      "created" : "1668146840",
      "id" : "1 1YUIAEIHVDGghSCHGRFHTiytr3u243567FCUJHBKD34NMDagNLKYGHKHGKjhklj212-MANJKJHTTy1u23434yui-ouih23ui4yusdftyuySTDGJSDHVHGDRFhgfu234447g4bZHXhiuhsdm",
      "lin" : "4295164422",
      "lock_type" : "exclusive",
      "path" : "/ifs/locks/nfsv3/10.22.10.250_1",
      "range" : [ 0, 92233772036854775807 ],
      "version" : "v3"
    }
  ],
  "total" : 1
}

Note that, as with the CLI, the platform API does not support partial name matches, so the full name of the client must be specified.

 

Author: Nick Trimbee


Read Full Blog
  • Isilon
  • PowerScale
  • OneFS
  • NAS
  • Dell
  • Cluster
  • Scale-out

OneFS SSL Certificate Creation and Renewal – Part 2

Nick Trimbee Nick Trimbee

Mon, 13 Nov 2023 17:56:44 -0000

|

Read Time: 0 minutes

In the initial article in this series, we took a look at the OneFS SSL architecture, plus the first two steps in the basic certificate renewal or creation flow detailed below:

Backup existing SSL certificate > Renew/create certificate > Sign SSL certificate > Add certificate to cluster > Verify SSL certificate

The following procedure includes options to complete a self-signed certificate replacement or renewal or to request an SSL replacement or renewal from a Certificate Authority (CA).


Signing the SSL Certificate

Sign SSL certificate

At this point, depending on the security requirements of the environment, the certificate can either be self-signed or signed by a Certificate Authority.

Self-Sign the SSL Certificate 

The following CLI syntax can be used to self-sign the certificate with the key, creating a new signed certificate which, in this instance, is valid for 1 year (365 days):     

# openssl x509 -req -days 365 -in server.csr -signkey server.key -out server.crt

To verify that the key matches the certificate, ensure that the output of the following CLI commands return the same md5 checksum value:    

# openssl x509 -noout -modulus -in server.crt | openssl md5           
# openssl rsa -noout -modulus -in server.key | openssl md5

Next, proceed to the Add certificate to cluster section of this article once this step is complete. 

Use a CA to Sign the Certificate

If a CA is signing the certificate, ensure that the new SSL certificate is in x509 format and includes the entire certificate trust chain.

Note that the CA may return the new SSL certificate, the intermediate cert, and the root cert in different files. If this is the case, the PEM formatted certificate will need to be created manually.

Notably, the correct ordering is important when creating the PEM-formatted certificate. The SSL cert must be at the top of the file, followed by the intermediate certificate(s), with the root certificate at the bottom. For example:


-----BEGIN CERTIFICATE-----

<Contents of new SSL certificate>

-----END CERTIFICATE-----

-----BEGIN CERTIFICATE-----

<Contents of intermediate certificate>

<Repeat as necessary for every intermediate certificate provided by your CA>

-----END CERTIFICATE-----

-----BEGIN CERTIFICATE-----

<Contents of root certificate file>

-----END CERTIFICATE-----


A simple method for creating the PEM formatted file from the CLI is to cat them in the correct order as follows:

# cat CA_signed.crt intermediate.crt root.crt > onefs_pem_formatted.crt

Copy the onefs_pem_formatted.crt file to /ifs/tmp and rename it to server.crt

Note that if any of the aforementioned files are generated with a .cer extension, they should be renamed with a .crt extension instead.

The attributes and integrity of the certificate can be sanity checked with the following CLI syntax:       

# openssl x509 -text -noout -in server.crt

         

Adding the certificate to the cluster    

Add certificate to cluster

The first step in adding the certificate involves importing the new certificate and key into the cluster:      

# isi certificate server import /ifs/tmp/server.crt /ifs/tmp/server.key

Next, verify that the certificate imported successfully:     

# isi certificate server list -v 

The following CLI command can be used to show the names and corresponding IDs of the certificates:

# isi certificate server list -v | grep -A1 "ID:"

Set the imported certificate as default:      

# isi certificate settings modify --default-https-certificate=<id_of_cert_to_set_as_default>

Confirm that the imported certificate is being used as default by verifying status of Default HTTPS Certificate:     

# isi certificate settings view

If there is an unused or outdated cert, it can be deleted with the following CLI syntax:      

# isi certificate server delete --id=<id_of_cert_to_delete>

Next, view the new imported cert with command:      

# isi certificate server view --id=<id_of_cert>

Note that ports 8081 and 8083 still use the certificate from the local directory for SSL. Follow the steps below if you want to use the new certificates for port 8081/8083:

# isi services -a isi_webui disable
# chmod 640 server.key
# chmod 640 server.crt
# isi_for_array -s 'cp /ifs/tmp/server.key /usr/local/apache2/conf/ssl.key/server.key'
# isi_for_array -s 'cp /ifs/tmp/server.crt /usr/local/apache2/conf/ssl.crt/server.crt'
# isi services -a isi_webui enable


Verifying the SSL certificate

Verify SSL certificate

There are two methods for verifying the updated SSL certificate:

  • Via the CLI, using the openssl command as follows:
# echo QUIT | openssl s_client -connect localhost:8080
  • Or via a web browser, using the following URL:

https://<cluster_name>:8080

Note that <cluster_name> is the FQDN or IP address that is typically used to access the cluster’s WebUI interface. The security details for the web page will contain the location and contact info, as above.

In both cases, the output includes location and contact info. For example:      

Subject: C=US, ST=<yourstate>, L=<yourcity>, O=<yourcompany>, CN=isilon.example.com/emailAddress=tme@isilon.com

Additionally, OneFS provides warning of an impending certificate expiry by sending a CELOG event alert, similar to the following:


SW_CERTIFICATE_EXPIRING: X.509 certificate default is nearing expiration: 
 
Event: 400170001
Certificate 'default' in '**' store is nearing expiration:
 


Note that OneFS does not attempt to automatically renew a certificate. Instead, an expiring cert has to be renewed manually, per the procedure described above.

When adding an additional certificate, the matching cert is used any time you connect to that SmartConnect name via HTTPS. If no matching certificate is found, OneFS will automatically revert to using the default self-signed certificate.

 

Author: Nick Trimbee 


Read Full Blog
  • PowerScale
  • OneFS
  • SSL

OneFS SSL Certificate Renewal – Part 1

Nick Trimbee Nick Trimbee

Thu, 16 Nov 2023 04:57:00 -0000

|

Read Time: 0 minutes

When using either the OneFS WebUI or platform API (pAPI), all communication sessions are encrypted using SSL (Secure Sockets Layer), also known as Transport Layer Security (TLS). In this series, we will look at how to replace or renew the SSL certificate for the OneFS WebUI.

SSL requires a certificate that serves two principal functions: It grants permission to use encrypted communication using Public Key Infrastructure and authenticates the identity of the certificate’s holder.

Architecturally, SSL consists of four fundamental components:

  • Alert: Reports issues.
  • Change cipher spec: Implements negotiated crypto parameters.
  • Handshake: Negotiates crypto parameters for the SSL session. Can be used for many SSL/TCP connections.
  • Record: Provides encryption and MAC.

These sit in the stack as follows:

The basic handshake process begins with a client requesting an HTTPS WebUI session to the cluster. OneFS then returns the SSL certificate and public key. The client creates a session key, encrypted with the public key it received from OneFS; at this point, only the client knows the session key. The client then sends its encrypted session key to the cluster, which decrypts it with the private key. Now, both the client and OneFS know the session key. Finally, the session, encrypted using a symmetric session key, can be established. OneFS automatically defaults to the best supported version of SSL, based on the client request.
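As a quick illustrative check of what was actually negotiated, the openssl s_client utility can be pointed at the WebUI port and the protocol and cipher lines extracted from its output:

# echo QUIT | openssl s_client -connect <cluster_name>:8080 2>/dev/null | grep -E "Protocol|Cipher"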

A PowerScale cluster initially contains a self-signed certificate, which can be used as-is or replaced with a certificate issued by a third-party certificate authority (CA). If the self-signed certificate is used, upon expiry it must be replaced with either a third-party (public or private) CA-issued certificate or another self-signed certificate generated on the cluster. The following are the default locations for the server.crt and server.key files.

  • SSL certificate: /usr/local/apache2/conf/ssl.crt/server.crt
  • SSL certificate key: /usr/local/apache2/conf/ssl.key/server.key

The ‘isi certificate settings view’ CLI command displays all of the certificate-related configuration options. For example:

# isi certificate settings view

         Certificate Monitor Enabled: Yes

Certificate Pre Expiration Threshold: 4W2D

           Default HTTPS Certificate

                                      ID: default

                                 Subject: C=US, ST=Washington, L=Seattle, O="Isilon", OU=Isilon, CN=Dell, emailAddress=tme@isilon.com

                                  Status: valid

The above ‘certificate monitor enabled’ and ‘certificate pre expiration threshold’ configuration options govern a nightly cron job, which monitors the expiration of each managed certificate and fires a CELOG alert if a certificate is set to expire within the configured threshold. Note that the default threshold is 30 days (4W2D, which represents 4 weeks plus 2 days). The ‘ID: default’ configuration option indicates that this certificate is the default TLS certificate.

The basic certificate renewal or creation flow is as follows:

The steps below include options to complete a self-signed certificate replacement or renewal, or to request an SSL replacement or renewal from a Certificate Authority (CA).

Backing up the existing SSL certificate

The first task is to obtain the list of certificates by running the following CLI command, and identify the appropriate one to renew:

# isi certificate server list

ID      Name    Status  Expires

-------------------------------------------

eb0703b default valid   2025-10-11T10:45:52

-------------------------------------------

It’s always a prudent practice to save a backup of the original certificate and key. This can be easily accomplished using the following CLI commands, which, in this case, create the ‘/ifs/data/ssl_bkup’ directory, set the permissions to root-only access, and copy the original key and certificate to it:

# mkdir -p /ifs/data/ssl_bkup

# chmod 700 /ifs/data/ssl_bkup

# cp /usr/local/apache24/conf/ssl.crt/server.crt /ifs/data/ssl_bkup

# cp /usr/local/apache24/conf/ssl.key/server.key /ifs/data/ssl_bkup

# cd !$

cd /ifs/data/ssl_bkup

# ls

server.crt      server.key

Renewing or creating a certificate

The next step in the process involves either the renewal of an existing certificate or creation of a certificate from scratch. In either case, first, create a temporary directory, for example /ifs/tmp:

# mkdir /ifs/tmp; cd /ifs/tmp

a)       Renew an existing self-signed Certificate.

The following syntax creates a renewal certificate based on the existing ssl.key. The value of the ‘-days’ parameter can be adjusted to generate a certificate with the desired expiration date. For example, the following command will create a one-year certificate.

# cp /usr/local/apache2/conf/ssl.key/server.key ./ ; openssl req -new -days 365 -nodes -x509 -key server.key -out server.crt

Answer the system prompts to complete the self-signed SSL certificate generation process, entering the pertinent location and contact information. For example:

Country Name (2 letter code) [AU]:US
 State or Province Name (full name) [Some-State]:Washington
 Locality Name (eg, city) []:Seattle
 Organization Name (eg, company) [Internet Widgits Pty Ltd]:Isilon
 Organizational Unit Name (eg, section) []:TME
 Common Name (e.g. server FQDN or YOUR name) []:isilon.com
 Email Address []:tme@isilon.com

When all the information has been successfully entered, the server.crt and server.key files will be present under the /ifs/tmp directory.

Optionally, the attributes and integrity of the certificate can be verified with the following syntax:

# openssl x509 -text -noout -in server.crt

Next, proceed directly to the ‘Add the certificate to the cluster’ steps in section 4 of this article.

b)      Alternatively, a certificate and key can be generated from scratch, if preferred.

The following CLI command can be used to create a 2048-bit RSA private key:

# openssl genrsa -out server.key 2048

Generating RSA private key, 2048 bit long modulus

............+++++

 

...........................................................+++++

 

e is 65537 (0x10001)

Next, create a certificate signing request:

# openssl req -new -nodes -key server.key -out server.csr

For example: 

# openssl req -new -nodes -key server.key -out server.csr -reqexts SAN -config <(cat /etc/ssl/openssl.cnf <(printf "[SAN]\nsubjectAltName=DNS:isilon.com"))

You are about to be asked to enter information that will be incorporated

into your certificate request.

What you are about to enter is what is called a Distinguished Name or a DN.

There are quite a few fields but you can leave some blank

For some fields there will be a default value,

If you enter '.', the field will be left blank.

-----

Country Name (2 letter code) [AU]:US

State or Province Name (full name) [Some-State]:WA

Locality Name (eg, city) []:Seattle

Organization Name (eg, company) [Internet Widgits Pty Ltd]:Isilon

Organizational Unit Name (eg, section) []:TME

Common Name (e.g. server FQDN or YOUR name) []:h7001

Email Address []:tme@isilon.com

Please enter the following 'extra' attributes

to be sent with your certificate request

A challenge password []:1234

An optional company name []:

#

Answer the system prompts to complete the self-signed SSL certificate generation process, entering the pertinent location and contact information. Additionally, a ‘challenge password’ of at least 4 bytes in length will need to be selected and entered.

As prompted, enter the information to be incorporated into the certificate request. When completed, the server.csr and server.key files will appear in the /ifs/tmp directory.

If desired, a CSR file that includes Subject Alternative Names (SANs) can be generated for a Certificate Authority. For example, additional hostname entries can be added as a comma-separated list (for example, DNS:isilon.com,DNS:www.isilon.com).

In the next article, we will look at the certificate signing, addition, and verification steps of the process.

 

Read Full Blog
  • OneFS

SMB Redirector Encryption

Nick Trimbee Nick Trimbee

Fri, 10 Nov 2023 19:37:15 -0000

|

Read Time: 0 minutes

As on-the-wire encryption becomes increasingly commonplace, and often mandated via regulatory compliance security requirements, the policies applied in enterprise networks are rapidly shifting towards fully encrypting all traffic.

The OneFS SMB protocol implementation (lwio) has supported encryption for Windows and other SMB client connections to a PowerScale cluster since OneFS 8.1.1.

 

However, prior to OneFS 9.5, this did not include encrypted communications between the SMB redirector and Active Directory (AD) domain controller (DC). While Microsoft added support for SMB encryption in SMB 3.0, the redirector in OneFS 9.4 and prior releases only supported Microsoft’s earlier SMB 2.002 dialect.

When OneFS connects to Active Directory for tasks requiring remote procedure calls (RPCs), such as joining a domain, NTLM authentication, or resolving usernames and SIDs, these SMB connections are established from OneFS as the client connecting to a domain controller server.

As outlined in the Windows SMB security documentation, by default, and starting with Windows 2012 R2, domain admins can choose to encrypt access to a file share, which can include a domain controller. When encryption is enabled, only SMB3 connections are permitted.

With OneFS 9.5, the OneFS SMB redirector now supports SMB3, thereby allowing the Local Security Authority Subsystem Service (LSASS) daemon to communicate with domain controllers running Windows Server 2012 R2 and later over an encrypted session.

The OneFS redirector, also known as the ‘rdr driver’, is a stripped-down SMB client with minimal functionality, only supporting what is absolutely necessary.

Under the hood, OneFS SMB encryption and decryption use standard OpenSSL functions, and AES-128-CCM encryption is negotiated during the SMB negotiation phase.

Although everything stems from the NTLM authentication requested by the SMB server, the sequence of calls leads to the redirector establishing an SMB connection to the AD domain controller.

With OneFS 9.5, no configuration is required to enable SMB encryption in most situations, and there are no WebUI or CLI configuration settings for the redirector.

With the default OneFS configuration, the redirector supports encryption if negotiated but it does not require it. Similarly, if the Active Directory domain requires encryption, the OneFS redirector will automatically enable and use encryption. However, if the OneFS redirector is explicitly configured to require encryption and the domain controller does not support encryption, the connection will fail.

The OneFS redirector encryption settings include:

  • Smb3EncryptionEnabled: Boolean. Default is ‘1’ (enabled). Enables or disables SMB3 encryption for the OneFS redirector.
  • Smb3EncryptionRequired: Boolean. Default is ‘0’ (not required). Requires, or does not require, the redirector connection to be encrypted.
  • MaxSmb2DialectVersion: Default is ‘max’ (SMB 3.0.2). Sets the maximum SMB dialect that the redirector will support; the maximum is currently SMB 3.0.2.

 

The above keys and values are stored in the OneFS Likewise SMB registry and can be viewed and configured with the ‘lwregshell’ utility. For example, to view the SMB redirector encryption config settings:

# /usr/likewise/bin/lwregshell list_values "HKEY_THIS_MACHINE\Services\lwio\Parameters\Drivers\rdr" | grep -i encrypt

    "Smb3EncryptionEnabled"   REG_DWORD       0x00000001 (1)

    "Smb3EncryptionRequired" REG_DWORD       0x00000000 (0)

The following syntax can be used to enable the ‘Smb3EncryptionRequired’ parameter by setting it to ‘1’ (that is, to require encryption):

# /usr/likewise/bin/lwregshell set_value "[HKEY_THIS_MACHINE\Services\lwio\Parameters\Drivers\rdr]" "Smb3EncryptionRequired" "0x00000001"

# /usr/likewise/bin/lwregshell list_values "HKEY_THIS_MACHINE\Services\lwio\Parameters\Drivers\rdr" | grep -i encrypt

    "Smb3EncryptionEnabled"   REG_DWORD       0x00000001 (1)

   "Smb3EncryptionRequired" REG_DWORD       0x00000001 (1)

Similarly, to restore the ‘Smb3EncryptionRequired’ parameter’s default value of ‘0’ (that is, not required):

# /usr/likewise/bin/lwregshell set_value "[HKEY_THIS_MACHINE\Services\lwio\Parameters\Drivers\rdr]" "Smb3EncryptionRequired" "0x00000000"

Note that, during the upgrade to OneFS 9.5, any nodes still running the old version will not be able to NTLM-authenticate if the DC they have affinity with requires encryption.

While redirector encryption is implemented in user space (in contrast to the SMB server, which is in the kernel), because it uses OpenSSL, the library takes advantage of hardware acceleration on the processor and utilizes AES-NI. As such, performance is only minimally impacted, even when the number of NTLM authentications to the AD domain is very large.

Also note that redirector encryption currently supports only the AES-128-CCM encryption provided in the SMB 3.0.0 and 3.0.2 dialects. OneFS does not use the AES-128-GCM encryption available in the SMB 3.1.1 dialect (the latest) at this time.

When it comes to troubleshooting the redirector, the lwregshell tool can be used to verify its configuration settings. For example, to view the redirector encryption settings:

# /usr/likewise/bin/lwregshell list_values "HKEY_THIS_MACHINE\Services\lwio\Parameters\Drivers\rdr" | grep -i encrypt

    "Smb3EncryptionEnabled"   REG_DWORD       0x00000001 (1)

    "Smb3EncryptionRequired" REG_DWORD       0x00000000 (0)

Similarly, to find the maximum SMB version supported by the redirector:

# /usr/likewise/bin/lwregshell list_values "HKEY_THIS_MACHINE\Services\lwio\Parameters\Drivers\rdr" | grep -i dialect

    "MaxSmb2DialectVersion"   REG_SZ          "max"

The ‘lwsm’ CLI utility with the following syntax will confirm the status of the various lsass components:

# /usr/likewise/bin/lwsm list | grep lsass

lsass                       [service]     running (lsass: 5164)

netlogon                    [service]     running (lsass: 5164)

rdr                         [driver]      running (lsass: 5164)

It can also be used to show and modify the logging level. For example:

# /usr/likewise/bin/lwsm get-log rdr

<default>: syslog LOG_CIFS at WARNING

# /usr/likewise/bin/lwsm set-log-level rdr - debug

# /usr/likewise/bin/lwsm get-log rdr

<default>: syslog LOG_CIFS at DEBUG

When finished, rdr logging can be returned to its previous log level as follows:

# /usr/likewise/bin/lwsm set-log-level rdr - warning

# /usr/likewise/bin/lwsm get-log rdr

<default>: syslog LOG_CIFS at WARNING

Additionally, the existing ‘lwio-tool’ utility has been modified in OneFS 9.5 to include functionality allowing simple test connections to domain controllers (no NTLM) via the new ‘rdr’ syntax:

# /usr/likewise/bin/lwio-tool rdr openpipe //<domain_controller>/NETLOGON

The ‘lwio-tool’ usage in OneFS 9.5 is as follows:

# /usr/likewise/bin/lwio-tool -h

Usage: lwio-tool <command> [command-args]

   commands:

    iotest rundown

    rdr [openpipe|openfile] username@password://domain/path

    srvtest transport [query|start|stop]

    testfileapi [create|createnp] <path>

 

Author: Nick Trimbee

 

Read Full Blog
  • AI
  • data analytics
  • PowerEdge
  • GPU
  • PowerScale
  • performance metrics
  • GenAI

AI and Model Development Performance

Darren Miller Darren Miller

Thu, 31 Aug 2023 20:47:58 -0000

|

Read Time: 0 minutes

There has been a tremendous surge of information about artificial intelligence (AI), and generative AI (GenAI) has taken center stage as a key use case. Companies are looking to learn more about how to build architectures to successfully run AI infrastructures. In most cases, creating a GenAI solution involves fine-tuning a pretrained foundational model and deploying it as an inference service. Dell recently published a design guide – Generative AI in the Enterprise – Inferencing, that provides an outline of the overall process.

All AI projects should start with understanding the business objectives and key performance indicators. Planning, data prep, and training make up the other phases of the cycle. At the core of the development are the systems that drive these phases – servers, GPUs, storage, and networking infrastructures. Dell is well equipped to deliver everything an enterprise needs to build, develop, and maintain analytic models that serve business needs.

GPUs and accelerators have become common practice within AI infrastructures. They pull in data and train/fine-tune models within the computational capabilities of the GPU. As GPUs have evolved, so has their ability to handle larger models and parallel development cycles. This has left a lot of us wondering: how do we build an architecture that will support the model development our business needs? It helps to understand a few parameters.

Defining business objectives and use cases will help shape your architecture requirements.

  • The size and location of the training data set
  • Model size in number of parameters and type of model being trained/fine-tuned
  • Training parallelism and time to complete the training/fine-tuning.

Answering these questions helps determine how many GPUs are needed to train/fine-tune the model. Consider two main factors in GPU sizing. First is the amount of GPU memory needed to store model parameters and optimizer state. Second is the number of floating-point operations (FLOPs) needed to execute the model. Both generally scale with model size. Large models often exceed the resources of a single GPU and require spreading a single model over multiple GPUs.
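As a rough, illustrative rule of thumb (assuming 16-bit weights at 2 bytes per parameter and an Adam-style optimizer adding on the order of 12 additional bytes per parameter during training), a 13-billion-parameter model needs roughly 26 GB just for its weights and around 180 GB once optimizer state is included. Even before activations are considered, such a model would need to be sharded across at least three 80 GB GPUs.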

Estimating the number of GPUs needed to train/fine-tune the model helps determine the server technologies to choose. When sizing servers, it’s important to balance the right GPU density and interconnect, power consumption, PCI bus technology, external port capacity, memory, and CPU. Dell PowerEdge servers include a variety of options for GPU types and density. PowerEdge XE servers can host up to eight NVIDIA H100 GPUs in a single server (see GenAI on PowerEdge XE9680), as well as the latest technologies, including NVLink, NVIDIA GPUDirect, PCIe 5.0, and NVMe disks. PowerEdge mainstream servers range from two to four GPU configurations, offering a variety of GPUs from different manufacturers. PowerEdge servers provide outstanding performance for all phases of model development. Visit Dell.com for more on PowerEdge Servers.

Now that we understand how many GPUs are needed and the servers to host them, it’s time to tackle storage. At a minimum, the storage should have the capacity to host the training data set, the checkpoints taken during model training, and any other data that relates to the pruning/preparing phase. The storage also needs to deliver the data at the rate the GPUs request it. The required rate of delivery is multiplied by the degree of model parallelism, that is, the number of models being trained in parallel and, consequently, the number of GPUs requesting data concurrently. Ideally, every GPU is running at 90% utilization or better to maximize the investment, and a storage system that supports high concurrency is well suited for these types of workloads.
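To put some illustrative numbers on this: if each GPU in a training job streams data at roughly 2 GB/s and 32 GPUs are reading concurrently across parallel fine-tuning runs, the storage must sustain on the order of 64 GB/s of aggregate read throughput, in addition to absorbing periodic checkpoint writes. The exact figures will vary by model and data pipeline, but the multiplication is the point: per-GPU demand times the number of concurrent GPUs sets the storage performance target.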

Tools such as FIO or its cousin GDSIO (used to understand the speeds and feeds of the storage system) are great for generating hero numbers or theoretical maximums for reads/writes, but they are not representative of the performance requirements of the AI development cycle. Data prep and staging show up on the storage as random reads and writes, while during the training/fine-tuning phase, the GPUs concurrently stream reads from the storage system. Checkpoints throughout training are handled as writes back to the storage. These different points in the AI lifecycle require storage that can successfully handle these workloads at the scale determined by our model calculations and parallel development cycles.

Data scientists at Dell take great effort in understanding how different model development affects server and storage requirements. For example, language models like BERT and GPT have little effect on storage performance and resources, whereas image sequencing and DLRM models place significant, near worst-case demands on storage performance and resources. For this reason, the Dell storage teams focus testing and benchmarking on AI deep learning workflows based on popular image models like ResNet, with real GPUs, to understand the performance requirements needed to deliver data to the GPU during model training. The following image shows an architecture designed with Dell PowerEdge servers and networking with PowerScale scale-out storage.

Dell PowerScale scale-out file storage is especially suited for these workloads.  Each node in a PowerScale cluster delivers equivalent performance as the cluster and workloads scale. The following images show how PowerScale performance scales linearly as GPUs are increased, while the performance of each individual GPU remains constant. The scale-out architecture of PowerScale file storage easily supports AI workflows from small to large.

Figure 1.  PowerScale linear performance 

Figure 2.  Consistent GPU performance with scale

The predictability of PowerScale allows us to estimate the storage resources needed for model training and fine-tuning. We can easily scale these architectures based on the model type and size along with the number and type of GPUs required.

Architecting for small and large AI workloads is challenging and takes planning. Understanding performance needs and how the components in the architecture will perform as the AI workload demand scales is critical.

Author: Darren Miller

Read Full Blog
  • PowerScale
  • Media and Entertainment
  • NAS

Dell PowerScale and Marvel Partner to Create Optimal Media Workflows

Brian Cipponeri Brian Cipponeri

Tue, 01 Aug 2023 17:03:47 -0000

|

Read Time: 0 minutes

Now in its 9th generation, Emmy-award-winning Dell PowerScale storage has been field proven in media workflows for over two decades and is the world’s most flexible1, efficient2, and secure3 scale-out NAS solution.  

Our partnership with Marvel Studios is a wonderful example of the innovations we collaborate on with leading media and entertainment companies around the world—with PowerScale as the preeminent storage solution that enables data-driven workflows to accelerate content-creation pipelines. 

Hear about Marvel Studios’ implementation of PowerScale directly in this educational video series from The Advanced Imaging Society: 

 

The PowerScale OneFS advantage

The underlying OneFS file system leverages the foundations of clustered high-performance computing to solve the challenges of data protection at scale and client accessibility in a massively parallelized way. In practice, a single namespace that easily scales out with nodes to increase performance and capacity is a fundamentally game-changing architecture. 

Media workflows require increased levels of access for applications and users to enable workflow collaboration, balanced with security that doesn't impede performance. Further, performance and access can't be impeded even during events such as drive rebuilds after a hardware failure or system upgrades, so that production work can continue uninterrupted while maintenance is performed in the background.

Maximizing uptime correlates with fundamental business needs including meeting project timelines and budgets while ensuring that personnel have access to the content at the required performance levels, even during a background maintenance activity. 

As a sufficiently advanced enterprise-class solution, PowerScale incorporates these capabilities to eliminate complexity and provide for increased uptime through its self-healing and self-managing functionality. (For more information, see the PowerScale OneFS Technical Overview.) This takes many of the traditional storage management burdens off the administrator’s plate, lowering the overhead and time needed to maintain storage, which is often increasing in size and scale.

While the benefits of collaboration over Ethernet-based storage are inherent in PowerScale, the user experience is also paramount in correlation with the performance of the network and underlying storage system. Operations such as playout and scrubbing need to perform reliably with no frame drops while providing response times that look and feel equivalent to working from the local workstation. 

As I’ve participated in the development of media storage solutions from SCSI, Fibre Channel, iSCSI, SAN, and Ethernet, I’ve been able to test and compare solutions over the years and have closely watched the evolution and trends of these protocols in relation to their ability to support media workflows. 

In 2007, I demonstrated real-time color grading with uncompressed content over 1 Gb Ethernet, the first of its kind. At the time, using Ethernet-based storage for color grading was largely unheard of, with few applications supporting it. That was more of an exercise to showcase the art of the possible in comparison with Fibre Channel-based solutions. The wider adoption of Ethernet for this particular use case was not yet of high interest because Ethernet speeds still needed to evolve. However, 1 Gb Ethernet was very appropriate for compressed media workflows and rendering, which were well aligned with the high-performance, scale-out design of PowerScale. 

As 10 Gb Ethernet speeds became prevalent, there was a significant uptick in the adoption of Ethernet-based storage compared to Fibre Channel-based solutions for media use cases. I also started to see more datasets being moved over Ethernet rather than by sneaker net, physically delivering drives and tapes between locations. This led to cost and time savings for project timelines and budgets, among other benefits. 

Fast-forward to 2014, when, with the OneFS 7.1.1 version supporting SMB multi-channel, we were able to use two 10 Gb Ethernet connections to support a stream of full resolution uncompressed 4K, whereas a single 10 Gb connection was only capable of supporting 2K full resolution streams. This began an adoption trend of Ethernet solutions for 4K full-resolution workflows. 

In 2017, with the release of the F800 all-flash PowerScale and OneFS 8.1, 40 Gb Ethernet speeds were supported, and the floodgates were opened for media workflows. Multiple full-resolution 2K and 4K workflows could run on a single shared OneFS namespace with uncompromised performance. Workloads could be consolidated, which started to eliminate the need for multiple discrete storage solutions that each supported a different part of the pipeline, bringing them together under a single unified OneFS namespace to streamline environments.

Complete pipeline transformations were taking place and began to replace iSCSI and Fibre Channel-based solutions at an accelerated pace, as those solutions were siloed within workgroups and inflexible with the emerging needs of collaboration. When the PowerScale F900 NVMe solution supporting 100 Gb Ethernet came out in 2021, the technology was set to change the industry yet again.  

With the increasing prevalence of 100 Gb Ethernet over these past few years, performance parity with Fibre Channel-based solutions to support full-resolution 4K, 8K, and all related media workflows in between is no longer in question. Native Ethernet-based solutions are preferred for many reasons—including cloud capability, scale, cost, and supportability—to facilitate unstructured media datasets, leveraging the abundance of network engineering talent in comparison to Fibre Channel-trained engineers.  

With reliability, performance, and shared access for collaboration delivering uncompromised benefits, we now look to several PowerScale storage capabilities that enable rich media ecosystems to be further streamlined and flourish. 

There are four additional key areas of focus and their underlying feature sets that are increasingly important to today’s media ecosystems. They encompass: 

  • Security, physical and logical 
  • API orchestration 
  • Data movement 
  • Quality of service 

Security

  • In relation to PowerScale being the world’s most secure scale-out NAS solution and in alignment with the Trusted Partner Network (TPN), the OneFS operating system meets or exceeds the Motion Picture Association (MPA) Content Security Best Practices (CSBP) in all relevant areas of the Content Security Model (CSM).4

  • Further, I’m seeing greater adoption of self-encrypting drives (SEDs), which provide encryption at the physical layer. Security auditing and multi-factor authentication are among the features being employed to protect the logical layer. Specifically, auditing has been available and used for many years now to provide a range of benefits beyond security.  

  • The real-time audit logs can be parsed to provide performance introspection, analysis, and user-trend insights in addition to identifying abnormal data-access patterns. The logs can also be used to correlate with levels of access to specific projects and files, correlating back to business-level insights and reports as well. 

  • I’m also keen on mentioning the OneFS embedded firewall, which provides connection-level protection at the storage network layer. Firewalls are typically employed in front of the storage or further upstream on the network, so having an additional firewall within the storage that protects the network ports on the storage itself is a powerful layer of security. 

For more information about OneFS security, see Dell PowerScale OneFS: Security Considerations.

Data orchestration

  • Data orchestration is paramount to workflow automation. If aspects of workflows can be automated and don’t require an operator to make a decision, they should be automated to remove the possibility for operator error and streamline the environment to accelerate workflows where possible. 

  • Orchestration is enabled through API calls to integrate PowerScale with the environment’s application layers, which can take mundane repeatable tasks off the operator’s plate and increase workflow efficiency. 

For more information about PowerScale data orchestration, see the OneFS documentation on Dell Support.

Data movement

  • Data movement is integral to media workflows today, and for that we look to the high-performance, highly reliable SyncIQ protocol embedded in PowerScale OneFS. SyncIQ facilitates secure, parallelized transfer of datasets between PowerScale solutions, providing strong benefits for media workflows. Replication policies can be set up or initiated ad hoc to transfer datasets between PowerScale solutions over the network.  

  • With SyncIQ, PowerScale is both a storage platform and a data transfer engine, so additional servers and transfer applications, which would incur additional cost and management overhead, don’t need to be implemented in front of PowerScale.

The PowerScale Backup and Recovery Guide provides more information about data movement capabilities in OneFS.

Quality of service

  • Quality of service is increasingly important and has always been of interest for media workflows. SmartQoS is an embedded OneFS feature that monitors front-end protocol traffic over NFS, SMB, and S3. It allows limits to be set for the number of protocol operations to tie them back to performance SLAs, prioritization of workloads, and support for throttling to prevent specific clients from saturating a connection. I’ve seen an unthrottled copy job use the available connection bandwidth and interrupt a playout, so that’s an example of a use case where SmartQoS can be applied. 

  • Clients can be logically grouped and monitored with all kinds of metrics being captured to quantify and profile workloads. Introspection into read and write latencies, IOPS, and many other metrics on a per-protocol, path, IP address, user, and group basis can be captured and correlated in real time. Metrics can be tracked and enforced to provide quality of service for specific classes of users and workflows, which can all be defined to manage workloads. 

For more information about quality of service in OneFS, see this blog post: OneFS SmartQoS.

Summary

The capabilities of PowerScale storage with OneFS deliver unparalleled scale and feature benefits that elevate media and entertainment use cases, from the highest-performance workflows to highly dense archives. Standardization on this enterprise-class, secure, and collaborative platform is the key to unlocking innovation and advancing your media pipelines.


1 Based on internal analysis of publicly available information sources, February 2023. CLM-0013892.

2 Based on Dell analysis comparing efficiency-related features: data reduction, storage capacity, data protection, hardware, space, lifecycle management efficiency, and ENERGY STAR certified configurations, June 2023. CLM-008608.

3 Based on Dell analysis comparing cyber-security software capabilities offered for Dell PowerScale vs. competitive products, September 2022.

4 Dell Technologies Executive Summary of Compliance with Media Industry Security Guidelines, https://www.delltechnologies.com/asset/en-ae/products/storage/briefs-summaries/tpn-executive-summary-compliance-statement.pdf.


Author: Brian Cipponeri, Global Solutions Architect 
Dell Technologies – Unstructured Data Solutions 

Read Full Blog
  • Isilon
  • PowerScale
  • OneFS
  • troubleshooting
  • statistics

Diary of a VFX Systems Engineer—Part 1: isi Statistics

Andy Copeland Andy Copeland

Thu, 17 Aug 2023 20:57:36 -0000

|

Read Time: 0 minutes

Welcome to the first in a series of blog posts to reveal some helpful tips and tricks when supporting media production workflows on PowerScale OneFS.

OneFS has an incredible user-driven toolset under the hood that can give you access to data so valuable to your workflow that you'll wonder how you ever lived without it.

When working on productions in the past I’ve witnessed and had to troubleshoot many issues that arise in different parts of the pipeline. Often these are in the render part of the pipeline, which is what I’m going to focus on in this blog.

Render pipelines are normally fairly straightforward in their makeup, but they require everything to be just right so that you don't starve the cluster of resources. If your cluster is at the center of all of your production operations, that starvation can cause a whole-studio outage, impacting your creatives, losing revenue, and introducing unnecessary delays in production.

Did you know that any command run on a OneFS cluster is an API call down to the OneFS platform API? You can observe this by adding the --debug flag to any command that you run on the CLI. As shown here, this displays the call information that was sent to gather the requested data, which is helpful if you're integrating your own administration tools into your pipeline.

# isi --debug statistics client list
        2023-06-22 10:24:41,086 DEBUG rest.py:80: >>>GET ['3', 'statistics', 'summary', 'client']
        2023-06-22 10:24:41,086 DEBUG rest.py:81:    args={'sort': 'operation_rate,in,out,time_avg,node,protocol,class,user.name,local_name,remote_name', 'degraded': 'False', 'timeout': '15'}
        body={}
        2023-06-22 10:24:41,212 DEBUG rest.py:106: <<<(200, {'content-type': 'application/json', 'allow': 'GET, HEAD', 'status': '200 Ok'}, b'n{\n"client" : [  ]\n}\n')

There are so many potential applications for OneFS API calls, from monitoring statistics on the cluster to using your own tools for creating shares, and so on. (We'll go deeper into the API in a future post!)
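
As a quick illustration of that point, the same statistics summary shown in the --debug trace above can be fetched directly over HTTPS. The following Python sketch is illustrative only: it assumes the platform API is reachable on port 8080 and that the account used is permitted to authenticate with basic auth, and it disables certificate verification purely for brevity. The endpoint path is taken from the debug output; the cluster address and credentials are placeholders.

# Minimal sketch: query the OneFS platform API endpoint seen in the --debug
# trace above. Cluster address and credentials below are placeholders.
import requests

CLUSTER = "https://cluster-node:8080"   # hypothetical cluster node address
AUTH = ("admin", "password")            # placeholder credentials

resp = requests.get(
    f"{CLUSTER}/platform/3/statistics/summary/client",
    auth=AUTH,
    verify=False,    # self-signed cluster certificate; for illustration only
    timeout=15,
)
resp.raise_for_status()
for client in resp.json().get("client", []):
    print(client)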

When we are facing production-stopping activities on a cluster, they're often caused by a rogue process outside the OneFS environment that is as yet unknown to us, which means we have to figure out what that process is and what it is doing.

In walks isi statistics.

By using the isi statistics command, we can very quickly see what is happening on a cluster at any given time. It can give us live reports on which user or connection is causing an issue, how much I/O they're generating, what their IP address is, which protocol they're connected with, and so on.

If the cluster is experiencing a sudden slowdown (during a render, for example), we can run a couple of simple statistics commands to show us what the cluster is doing and who's hitting it the hardest. Some examples of these commands are as follows:

isi statistics system --n=all --format=top

Displays all nodes’ real-time statistics in a *NIX “top” style format:

# isi statistics system --n=all --format=top
Node   CPU SMB FTP HTTP NFS HDFS  S3 Total NetIn NetOut DiskIn DiskOut
 All 33.7% 0.0 0.0  0.0 0.0   0.0 0.0   0.0 401.6  215.6     0.0     0.0
   1 33.7% 0.0 0.0  0.0 0.0   0.0 0.0   0.0 401.6  215.6     0.0     0.0

isi statistics client list --totalby=UserName --sort=Ops

This command displays all clients connected and shows their stats, including the UserName they are connected with. It places the users with the highest number of total Ops at the top so that you can track down the user or account that is hitting the storage the hardest.

# isi statistics client --totalby=UserName --sort=Ops
 Ops     In  Out  TimeAvg   Node  Proto  Class   UserName  LocalName  RemoteName
-----------------------------------------------------------------------------
12.8 12.6M 1.1k  95495.8     *       *      *      root          *           *
-----------------------------------------------------------------------------

isi statistics client --user-names=<username> --sort=Ops

This command goes a bit further and breaks down ALL of the Ops by type being requested by that user. If you know which protocol the user you're investigating is connecting with, you can also add the option "--proto=<nfs/smb>" to the command.

# isi statistics client --user-names=root --sort=Ops
 Ops     In   Out  TimeAvg   Node  Proto           Class  UserName        LocalName    RemoteName
----------------------------------------------------------------------------------------------
 5.8   6.1M 487.2 142450.6     1   smb2           write      root 192.168.134.101 192.168.134.1
 2.8 259.2 332.8    497.2      1   smb2      file_state      root 192.168.134.101 192.168.134.1
 2.6 985.6 549.8  10255.1      1   smb2          create      root 192.168.134.101 192.168.134.1
 2.6 275.0 570.6   3357.5      1   smb2  namespace_read      root 192.168.134.101 192.168.134.1
 0.4   85.6  28.0   3911.5      1   smb2 namespace_write      root 192.168.134.101 192.168.134.1
----------------------------------------------------------------------------------------------

The other useful command, particularly when troubleshooting ad hoc performance issues, is isi statistics heat.

isi statistics heat list --totalby=path --sort=Ops | head -12

This command shows the top 10 file paths that are being hit by the largest number of I/O operations.

# isi statistics heat list --totalby=path --sort=Ops | head -12
  Ops   Node  Event  Class Path
----------------------------------------------------------------------------------------------------
141.7     *       *      * /ifs/
127.8     *       *      * /ifs/.ifsvar
 86.3      *      *      * /ifs/.ifsvar/modules
 81.7      *      *      * SYSTEM (0x0)
 33.3      *      *      * /ifs/.ifsvar/modules/tardis
 28.6      *      *      * /ifs/.ifsvar/modules/tardis/gconfig
 28.3      *      *      * /ifs/.ifsvar/upgrade
 13.1      *      *      * /ifs/.ifsvar/upgrade/logs/UpgradeLog-1.db
 11.9      *      *      * /ifs/.ifsvar/modules/tardis/namespaces/healthcheck_schedules.sqlite
 10.5      *      *      * /ifs/.ifsvar/modules/cloud

Once you have all this information, you can find the user or process (based on IP, UserName, and so on) and figure out what that user is doing and what is causing the render failure or the high I/O. In many situations, it will be an asset that is either sitting on a lower-performance tier of the cluster or, if you're using a front-side render cache, an asset sitting outside of the pre-cached path, so the spindles in the cluster are taking the I/O hit.

For more tips and tricks that can help to save you valuable time, keep checking back. In the meantime, if you have any questions, please feel free to get in touch and I'll do my best to help!

Author: Andy Copeland
Media & Entertainment Solutions Architect

Read Full Blog
  • security
  • PowerScale
  • OneFS

OneFS Key Manager Rekey Support

Nick Trimbee Nick Trimbee

Mon, 24 Jul 2023 19:16:34 -0000

|

Read Time: 0 minutes

The OneFS key manager is a backend service that orchestrates the storage of sensitive information for PowerScale clusters. To satisfy Dell’s Secure Infrastructure Ready requirements and other public and private sector security mandates, the manager provides the ability to replace, or rekey, cryptographic keys.

The quintessential consumer of OneFS key management is data-at-rest encryption (DARE). Protecting sensitive data stored on the cluster with cryptography ensures that it’s guarded against theft, in the event that drives or nodes are removed from a PowerScale cluster. DARE is a requirement for federal and industry regulations, ensuring data is encrypted when it is stored. OneFS has provided DARE solutions for many years through secure encrypted drives (SEDs) and the OneFS key management system.

A 256-bit master key (MK) encrypts the Key Manager Database (KMDB) for the SED and cluster domains. In OneFS 9.2 and later, the MK for SEDs can either be stored off-cluster on a KMIP server or locally on a node (the legacy behavior).

However, there are a variety of other consumers of the OneFS key manager, in addition to DARE. These include services and protocols such as:

Service        Description
CELOG          Cluster event log
CloudPools     Cluster tier-to-cloud service
Email          Electronic mail
FTP            File transfer protocol
IPMI           Intelligent Platform Management Interface for remote cluster console access
JWT            JSON web tokens
NDMP           Network Data Management Protocol for cluster backups and DR
Pstore         Active Directory and Kerberos password store
S3             S3 object protocol
SyncIQ         Cluster replication service
SmartSync      OneFS push-and-pull cluster and cloud replication service
SNMP           Simple Network Management Protocol
SRS            Legacy Dell remote support cluster connectivity
SSO            Single sign-on
SupportAssist  Remote cluster connectivity to Dell Support

 OneFS 9.5 introduces a number of enhancements to the venerable key manager, including:

  • The ability to rekey keystores. A rekey operation generates a new MK and re-encrypts all stored entries with the new key.
  • New CLI commands and WebUI options to perform a rekey operation or schedule key rotation on a time interval.
  • New commands to monitor the progress and status of a rekey operation.

As such, OneFS 9.5 now provides the ability to rekey the MK, irrespective of where it is stored.

Note that when you are upgrading from an earlier OneFS release, the new rekey functionality is only available once the OneFS 9.5 upgrade has been committed.

Under the hood, each provider store in the key manager consists of secure backend storage and an MK. Entries are kept in a SQLite database or key-value store. A provider datastore uses its MK to encrypt all its entries within the store.

During the rekey process, the old MK is only deleted after a successful re-encryption with the new MK. If for any reason the process fails, the old MK is available and remains as the current MK. The rekey daemon retries the rekey every 15 minutes if the process fails.

The OneFS rekey process is as follows:

  1. A new MK is generated, and internal configuration is updated.
  2. Any entries in the provider store are decrypted and encrypted with the new MK.
  3. If the prior steps are successful, the previous MK is deleted.

To support the rekey process, the MK in OneFS 9.5 now has an ID associated with it. All entries have a new field referencing the MK ID.

During the rekey operation, there are two MK values with different IDs, and all entries in the database will associate which key they are encrypted by.

In OneFS 9.5, the rekey configuration and management is split between the cluster keys and the SED keys:

Rekey component  Detail

SED
  • SED provider keystore is stored locally on each node.
  • SED provider domain already had existing CLI commands for handling KMIP settings in prior releases.

Cluster
  • Controls all cluster-wide keystore domains.
  • Status shows information of all cluster provider domains.

SED keys rekey

The SED key manager rekey operation can be managed through a DARE cluster’s CLI or WebUI, and it can either be automatically scheduled or run manually on demand. The following CLI syntax can be used to manually initiate a rekey:

# isi keymanager sed rekey start

Alternatively, to schedule a rekey operation, for example, to schedule a key rotation every two months:

# isi keymanager sed rekey modify --key-rotation=2m

The key manager status for SEDs can be viewed as follows:

# isi keymanager sed status
 Node Status  Location   Remote Key ID  Key Creation Date   Error Info(if any)
-----------------------------------------------------------------------------
1   LOCAL   Local                    1970-01-01T00:00:00
-----------------------------------------------------------------------------
Total: 1

Alternatively, from the WebUI, go to Access > Key Management >  SED/Cluster Rekey, select Automatic rekey for SED keys, and configure the rekey frequency:

Note that for SED rekey operations, if a migration from local cluster key management to a KMIP server is in progress, the rekey process will begin once the migration is complete.

Cluster keys rekey

As mentioned previously, OneFS 9.5 also supports the rekey of cluster keystore domains. This cluster rekey operation is available through the CLI and the WebUI and may either be scheduled or run on demand. The available cluster domains can be queried by running the following CLI syntax:

# isi keymanager cluster status
Domain     Status  Key Creation Date   Error Info(if any)
----------------------------------------------------------
CELOG      ACTIVE  2023-04-06T09:19:16
CERTSTORE  ACTIVE  2023-04-06T09:19:16
CLOUDPOOLS ACTIVE   2023-04-06T09:19:16
EMAIL      ACTIVE  2023-04-06T09:19:16
FTP        ACTIVE  2023-04-06T09:19:16
IPMI_MGMT  IN_PROGRESS  2023-04-06T09:19:16
JWT        ACTIVE  2023-04-06T09:19:16
LHOTSE     ACTIVE  2023-04-06T09:19:11
NDMP       ACTIVE  2023-04-06T09:19:16
NETWORK    ACTIVE  2023-04-06T09:19:16
PSTORE     ACTIVE  2023-04-06T09:19:16
RICE       ACTIVE  2023-04-06T09:19:16
S3         ACTIVE  2023-04-06T09:19:16
SIQ        ACTIVE  2023-04-06T09:19:16
SNMP       ACTIVE  2023-04-06T09:19:16
SRS        ACTIVE  2023-04-06T09:19:16
SSO        ACTIVE  2023-04-06T09:19:16
----------------------------------------------------------
Total: 17

The rekey process generates a new key and re-encrypts the entries for the domain. The old key is then deleted.

Performance-wise, the rekey process does consume cluster resources (CPU and disk) as a result of the re-encryption phase, which is fairly write-intensive. As such, a good practice is to perform rekey operations outside of core business hours or during scheduled cluster maintenance windows.

During the rekey process, the old MK is only deleted once a successful re-encryption with the new MK has been confirmed. In the event of a rekey process failure, the old MK is available and remains as the current MK.

A rekey may be requested immediately or may be scheduled with a cadence. The rekey operation is available through the CLI and the WebUI. In the WebUI, go to Access > Key Management > SED/Cluster Rekey.

To start a rekey of the cluster domains immediately, from the CLI run the following syntax:

# isi keymanager cluster rekey start 
Are you sure you want to rekey the master passphrase? (yes/[no]):yes

Alternatively, from the WebUI, go to Access under the SED/Cluster Rekey tab, and click Rekey Now next to Cluster keys:

A scheduled rekey of the cluster keys (excluding the SED keys) can be configured from the CLI with the following syntax:

# isi keymanager cluster rekey modify --key-rotation [YMWDhms]

Specify the frequency in the Key Rotation field as an integer plus a unit suffix, using Y for years, M for months, W for weeks, D for days, h for hours, m for minutes, and s for seconds. For example, the following command schedules the cluster rekey operation to run every six weeks:

# isi keymanager cluster rekey view
 Rekey Time: 1970-01-01T00:00:00
 Key Rotation: Never
 # isi keymanager cluster rekey modify --key-rotation 6W
 # isi keymanager cluster rekey view
 Rekey Time: 2023-04-28T18:38:45
 Key Rotation: 6W

The rekey configuration can be easily reverted back to on demand from a schedule as follows:

# isi keymanager cluster rekey modify --key-rotation Never
 # isi keymanager cluster rekey view
 Rekey Time: 2023-04-28T18:38:45
 Key Rotation: Never

Alternatively, from the WebUI, under the SED/Cluster Rekey tab, select the Automatic rekey for Cluster keys checkbox and specify the rekey frequency. For example:

In the event of a rekey failure, a CELOG KeyManagerRekeyFailed or KeyManagerSedsRekeyFailed event is created. Because SED rekey is a node-local operation, the KeyManagerSedsRekeyFailed event information also includes which node experienced the failure.

Additionally, current cluster rekey status can also be queried with the following CLI command:

# isi keymanager cluster status
Domain     Status  Key Creation Date   Error Info(if any)
----------------------------------------------------------
CELOG      ACTIVE  2023-04-06T09:19:16
CERTSTORE  ACTIVE  2023-04-06T09:19:16
CLOUDPOOLS ACTIVE   2023-04-06T09:19:16
EMAIL      ACTIVE  2023-04-06T09:19:16
FTP        ACTIVE  2023-04-06T09:19:16
IPMI_MGMT  ACTIVE  2023-04-06T09:19:16
JWT        ACTIVE  2023-04-06T09:19:16
LHOTSE     ACTIVE  2023-04-06T09:19:11
NDMP       ACTIVE  2023-04-06T09:19:16
NETWORK    ACTIVE  2023-04-06T09:19:16
PSTORE     ACTIVE  2023-04-06T09:19:16
RICE       ACTIVE  2023-04-06T09:19:16
S3         ACTIVE  2023-04-06T09:19:16
SIQ        ACTIVE  2023-04-06T09:19:16
SNMP       ACTIVE  2023-04-06T09:19:16
SRS        ACTIVE  2023-04-06T09:19:16
SSO        ACTIVE  2023-04-06T09:19:16
----------------------------------------------------------
Total: 17

Or, for SEDs rekey status:

# isi keymanager sed status
 Node Status  Location   Remote Key ID  Key Creation Date   Error Info(if any)
-----------------------------------------------------------------------------
1   LOCAL   Local                    1970-01-01T00:00:00
2   LOCAL   Local                    1970-01-01T00:00:00
3   LOCAL   Local                    1970-01-01T00:00:00
4   LOCAL   Local                    1970-01-01T00:00:00
-----------------------------------------------------------------------------
Total: 4

The rekey process also outputs to the /var/log/isi_km_d.log file, which is a useful source for additional troubleshooting.

If an error in rekey occurs, the previous MK is not deleted, so entries in the provider store can still be created and read as normal. The key manager daemon will retry the rekey operation in the background every 15 minutes until it succeeds.

Author: Nick Trimbee

Read Full Blog
  • security
  • PowerScale
  • OneFS

OneFS Password Security Policy

Nick Trimbee Nick Trimbee

Mon, 24 Jul 2023 20:08:49 -0000

|

Read Time: 0 minutes

Among the slew of security enhancements introduced in OneFS 9.5 is the ability to mandate a more stringent password policy. This is required to comply with security requirements such as the U.S. military STIG, which stipulates:

Requirement         Description
Length              An OS or network device must enforce a minimum 15-character password length.
Percentage          An OS must require the change of at least 50% of the total number of characters when passwords are changed.
Position            A network device must require that when a password is changed, the characters are changed in at least eight of the positions within the password.
Temporary password  The OS must allow the use of a temporary password for system logons with an immediate change to a permanent password.

The OneFS password security architecture can be summarized as follows:

Within the OneFS security subsystem, authentication is handled in OneFS by LSASSD, the daemon used to service authentication requests for lwiod.

Component       Description
LSASSD          The local security authority subsystem service (LSASS) handles authentication and identity management as users connect to the cluster.
File provider   Includes users from /etc/passwd and groups from /etc/groups.
Local provider  Includes local cluster accounts such as anonymous, guest, and so on.
SSHD            The OpenSSH daemon provides secure encrypted communications between a client and a cluster node over an insecure network.
pAPI            The OneFS platform API provides programmatic interfaces to OneFS configuration and management through a RESTful HTTPS service.

In OneFS AIMA, there are several different kinds of backend providers: Local provider, file provider, AD provider, NIS provider, and so on. Each provider is responsible for the management of users and groups inside the provider. For OneFS password policy enforcement, the local and file providers are the focus.

The local provider is based on a SamDB-style file stored under the /ifs/.ifsvar prefix path, and its provider settings can be viewed with the following CLI syntax:

# isi auth local view System 

On the other hand, the file provider is based on the FreeBSD spwd.db file, and its configuration can be viewed by the following CLI command: 

# isi auth file view System

Each provider stores and manages its own users. For the local provider, the isi auth users create CLI command creates a user inside the provider by default. However, for the file provider, there is no corresponding command. Instead, the OneFS pw CLI command can be used to create a new file provider user.

After the user is created, the isi auth users modify <USER> CLI command can be used to change the attributes of the user for both the file and local providers. However, not all attributes are supported for both providers. For example, the file provider does not support password expiry.

The fundamental password policy CLI changes introduced in OneFS 9.5 are as follows:

Operation        OneFS 9.5 change  Details
change-password  Modified          The old password must now be provided, so that OneFS can calculate how many characters and what percentage changed
reset-password   Added             Generates a temporary password that meets the current password policy for the user to log in
set-password     Deprecated        Does not require the old password

A user’s password can now be set, changed, and reset by either root or admin. This is supported by the new isi auth users change-password or isi auth users reset-password CLI command syntax. The latter, for example, returns a temporary password and requires the user to change it on next login. After logging in with the temporary (albeit secure) password, OneFS immediately forces the user to change it:

# whoami
admin
# isi auth users reset-password user1
4$_x\d\Q6V9E:sH
# ssh user1@localhost
(user1@localhost) Password:
(user1@localhost) Your password has expired.
You are required to immediately change your password.
Changing password for user1
New password:
(user1@localhost) Re-enter password:
Last login: Wed May 17 08:02:47 from 127.0.0.1
PowerScale OneFS 9.5.0.0
# whoami
user1

Also in OneFS 9.5 and later, the CLI isi auth local view system command sees the addition of four new fields:

  • Password Chars Changed
  • Password Percent Changed
  • Password Hash Type
  • Max Inactivity Days

For example:

# isi auth local view system
                    Name: System
                  Status: active
          Authentication: Yes
    Create Home Directory: Yes
 Home Directory Template: /ifs/home/%U
        Lockout Duration: Now
       Lockout Threshold: 0
          Lockout Window: Now
             Login Shell: /bin/zsh
            Machine Name:
        Min Password Age: Now
        Max Password Age: 4W
      Min Password Length: 0
     Password Prompt Time: 2W
      Password Complexity: -
 Password History Length: 0
   Password Chars Changed: 0
Password Percent Changed: 0
      Password Hash Type: NTHash
      Max Inactivity Days: 0

The following CLI command syntax configures OneFS to require a minimum password length of 15 characters, a 50% or greater change, and 8 or more characters to be altered for a successful password reset:

# isi auth local modify system --min-password-length 15 --password-chars-changed 8 --password-percent-changed 50

Next, a command is issued to create a new user, user2, with a 10-character password:

# isi auth users create user2 --password 0123456789
Failed to add user user1: The specified password does not meet the configured password complexity or history requirements

This attempt fails because the password does not meet the configured password criteria (15 chars, 50% change, 8 chars to be altered).

Instead, the password for the new account, user2, is set to an appropriate value: 0123456789abcdef. Also, the --prompt-password-change flag is used to force the user to change their password on next login.

# isi auth users create user2 --password 0123456789abcdef --prompt-password-change 1

When the user logs in to the user2 account, OneFS immediately prompts for a new password. In the following example, a non-compliant password (012345678zyxw) is entered. 

0123456789abcdef -> 012345678zyxw = Failure

This returns an unsuccessful change attempt failure because it does not meet the 15-character minimum:

# su user2
New password:
Re-enter password:
The specified password does not meet the configured password complexity requirements.
Your password must meet the following requirements:
  * Must contain at least 15 characters.
  * Must change at least 8 characters.
  * Must change at least 50% of characters.
New password:

Instead, a compliant password and successful change could be: 

0123456789abcdef -> 0123456zyxwvuts = Success
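
To see why the first attempt fails and the second succeeds, here is a rough conceptual sketch of the checks involved, written in Python. It is purely illustrative and is not the actual OneFS validation code.

# Conceptual illustration of the length / chars-changed / percent-changed
# checks (not the actual OneFS implementation).
def password_change_ok(old, new, min_len=15, min_chars=8, min_percent=50):
    if len(new) < min_len:
        return False
    changed = sum(1 for i, ch in enumerate(new) if i >= len(old) or ch != old[i])
    return changed >= min_chars and (100 * changed / len(new)) >= min_percent

print(password_change_ok("0123456789abcdef", "012345678zyxw"))    # False: only 13 characters
print(password_change_ok("0123456789abcdef", "0123456zyxwvuts"))  # True: 15 chars, 8 changed (53%)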

The following command can also be used to change the password for a user. For example, to update user2’s password:

# isi auth users change-password user2
Current password (hit enter if none):
New password:
Confirm new password:

If a non-compliant password is entered, the following error is returned:

Password change failed: The specified password does not meet the configured password complexity or history requirements

When employed, OneFS hardening automatically enforces security-based configurations. The hardening engine is profile-based, and its STIG security profile is predicated on security mandates specified in the U.S. Department of Defense (DoD) Security Requirements Guides (SRGs) and Security Technical Implementation Guides (STIGs).

On applying the STIG hardening security profile to a cluster (isi hardening apply --profile=STIG), the password policy settings are automatically reconfigured to the following values:

Field                     Normal value  STIG hardened
Lockout Duration          Now           Now
Lockout Threshold         0             3
Lockout Window            Now           15m
Min Password Age          Now           1D
Max Password Age          4W            8W4D
Min Password Length       0             15
Password Prompt Time      2W            2W
Password Complexity       -             lowercase, numeric, repeat, symbol, uppercase
Password History Length   0             5
Password Chars Changed    0             8
Password Percent Changed  0             50
Password Hash Type        NTHash        SHA512
Max Inactivity Days       0             35

For example:

# uname -or
Isilon OneFS 9.5.0.0
 
# isi hardening list
Name  Description                       Status
---------------------------------------------------
STIG  Enable all STIG security settings Applied
---------------------------------------------------
Total: 1
 
# isi auth local view system
                    Name: System
                  Status: active
          Authentication: Yes
   Create Home Directory: Yes
 Home Directory Template: /ifs/home/%U
        Lockout Duration: Now
       Lockout Threshold: 3
          Lockout Window: 15m
             Login Shell: /bin/zsh
             Machine Name:
        Min Password Age: 1D
        Max Password Age: 8W4D
     Min Password Length: 15
    Password Prompt Time: 2W
     Password Complexity: lowercase, numeric, repeat, symbol, uppercase
 Password History Length: 5
  Password Chars Changed: 8
Password Percent Changed: 50
      Password Hash Type: SHA512
     Max Inactivity Days: 35

Note that Password Hash Type is changed from the default NTHash to the more secure SHA512 encoding, in addition to setting the various password criteria.

The OneFS 9.5 WebUI also sees several additions and alterations to the Password policy page. These include:

Operation       OneFS 9.5 change  Details
Policy page     Added             New Password policy page under Access > Membership and roles
reset-password  Added             Generates a random password that meets current password policy for user to log in

The most obvious change is the transfer of the policy configuration elements from the local provider page to a new dedicated Password policy page.

Here’s the OneFS 9.4 View a local provider page, under Access > Authentication providers > Local providers > System:

This is replaced and augmented in the OneFS 9.5 WebUI with the following page, located under Access > Membership and roles > Password policy:

New password policy configuration options are included to require uppercase, lowercase, numeric, or special characters and limit the number of contiguous repeats of a character, and so on.

When it comes to changing a password, only the permitted user can make the change. This can be performed from a couple of locations in the WebUI. First, the user options on the task bar at the top of each screen now provide a Change password option:

A pop-up warning message will also be displayed by the WebUI, informing the user when password expiration is imminent. This warning provides a Change Password link:

Clicking on the Change Password link displays the following page:

A new password complexity tool-tip message is also displayed, informing the user of safe password selection.

Note that re-login is required after a password change.

On the Users page under Access > Membership and roles > Users, the Action drop-down list now also contains a Reset Password option:

The successful reset confirmation pop-up offers both show and copy options, while reminding the cluster administrator to share the new password with the user, who must change it during their next login:

The Create user page now provides an additional field that requires password confirmation. Additionally, the password complexity tool-tip message is also displayed:

The redesigned Edit user details page no longer provides a field to edit the password directly:

Instead, the Action drop-down list on the Users page now contains a Reset Password option. 


Author: Nick Trimbee

 

Read Full Blog
  • PowerScale
  • OneFS
  • single sign-on
  • SSO
  • SAML

OneFS WebUI Single Sign-on Configuration and Deployment

Nick Trimbee Nick Trimbee

Thu, 20 Jul 2023 18:27:32 -0000

|

Read Time: 0 minutes

In the first article in this series, we took a look at the architecture of the new OneFS WebUI SSO functionality. Now, we move on to its provisioning and setup.

SSO on PowerScale can be configured through either the OneFS WebUI or the CLI. OneFS 9.5 debuts a new dedicated WebUI SSO configuration page under Access > Authentication Providers > SSO. Alternatively, for command line aficionados, the CLI now includes a new isi auth sso command set.

Here is the overall configuration flow:

 

 
1.  Upgrade to OneFS 9.5

First, ensure the cluster is running OneFS 9.5 or a later release. If upgrading from an earlier OneFS version, note that the SSO service requires this upgrade to be committed prior to configuration and use.

Next, configure an SSO administrator. In OneFS, this account requires at least one of the following privileges:

Privilege               Description
ISI_PRIV_LOGIN_PAPI     Required for the admin to use the OneFS WebUI to administer SSO
ISI_PRIV_LOGIN_SSH      Required for the admin to use the OneFS CLI through SSH to administer SSO
ISI_PRIV_LOGIN_CONSOLE  Required for the admin to use the OneFS CLI on the serial console to administer SSO

The user account used for identity provider management should have an associated email address configured.

2.  Setup Identity Provider

OneFS SSO activation also requires having a suitable identity provider (IdP), such as ADFS, provisioned and available before setting up OneFS SSO.

ADFS can be configured through either the Windows GUI or command shell, and detailed information on the deployment and configuration of ADFS can be found in the Microsoft Windows Server documentation.

 
The Windows remote desktop utility (RDP) can be used to provision, connect to, and configure an ADFS server.

  1. When connected to ADFS, configure a rule defining access. For example, the following command line syntax can be used to create a simple rule that permits all users to log in:
    $AuthRules = @" 
    @RuleTemplate="AllowAllAuthzRule" => issue(Type = "http://schemas.microsoft.com/ 
    authorization/claims/permit", Value="true"); 
    "@

    or from the ADFS UI:


    Note that more complex rules can be crafted to meet the particular requirements of an organization.
  2. Create a rule parameter to map the Active Directory user email address to the SAML NameID.
    $TransformRules = @" 
    @RuleTemplate = "LdapClaims" 
    @RuleName = "LDAP mail" 
    c:[Type == "http://schemas.microsoft.com/ws/2008/06/identity/claims/ 
    windowsaccountname", Issuer == "AD AUTHORITY"] 
          => issue(store = "Active Directory", 
               types = 
               ("http://schemas.xmlsoap.org/ws/2005/05/identity/claims/
               emailaddress"), query = ";mail;{0}", param = c.Value); 
    @RuleTemplate = "MapClaims" 
    @RuleName = "NameID" 
    c:[Type == 
    "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress"] 
          => issue(Type = 
    "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/ 
    nameidentifier", Issuer = c.
                Issuer, OriginalIssuer = c.OriginalIssuer, 
                Value = c.Value, ValueType = c.ValueType, 
                Properties["http://schemas.xmlsoap.org/ws/2005/05/identity
                / claimproperties/format"] = 
                "urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress"); 
    "@
  3. Configure AD to trust the OneFS WebUI certificate.
  4. Create the relying party trust.

    Add-AdfsRelyingPartyTrust -Name <cluster-name> `
         -MetadataUrl "https://<cluster-node-ip>:8080/session/1/saml/metadata" `
         -IssuanceAuthorizationRules $AuthRules `
         -IssuanceTransformRules $TransformRules

or from Windows Server Manager:


3.  Select Access Zone

Because OneFS SSO is zone-aware, the next step involves choosing the access zone to configure. Go to Access > Authentication providers > SSO, select an access zone (for example, the System zone), and click Add IdP.

Note that each of a cluster's access zones must have an IdP configured for it. The same IdP can be used for all the zones, but each access zone must be configured separately.

4.  Add IdP Configuration

 In OneFS 9.5 and later, the WebUI SSO configuration is a wizard-driven, “guided workflow” process involving the following steps:

 
First, go to Access > Authentication providers > SSO, select an access zone (for example, the System zone), and then click Add IdP.

 
On the Add Identity Provider page, enter a unique name for the IdP. For example, Isln-IdP1 in this case:

 
When done, click Next, select the default Upload metadata XML option, and browse to the XML file downloaded from the ADFS system:

 
Alternatively, if the preference is to enter the information by hand, select Manual entry and complete the configuration form fields:

 
If the manual entry method is selected, you must have the IdP certificate ready to upload. With the manual entry option, the following information is required:

Field                Description
Binding              Select POST or Redirect binding.
Entity ID            Unique identifier of the IdP as configured on the IdP. For example: http://idp1.isilon.com/adfs/services/trust
Login URL            Log in endpoint for the IdP. For example: http://idp1.isilon.com/adfs/ls/
Logout URL           Log out endpoint for the IdP. For example: http://idp1.example.com/adfs/ls/
Signing Certificate  Provide the PEM encoded certificate obtained from the IdP. This certificate is required to verify messages from the IdP.

Upload the IdP certificate:

 
For example:

Repeat this step for each access zone in which SSO is to be configured.

When complete, click Next to move on to the service provider configuration step.

5.  Configure Service Provider

 On the Service Provider page, confirm that the current access zone is carried over from the previous page.


Select Metadata download or Manual copy, depending on the chosen method of entering OneFS details about this service provider (SP) to the IdP.

 
Provide the hostname or IP address for the SP for the current access zone.

 
Click Generate to create the information (metadata) about OneFS and this access zone for use in configuring the IdP.


This generated information can now be used to configure the IdP (in this case, Windows ADFS) to accept requests from PowerScale as the SP and its configured access zone.

As shown, the WebUI page provides two methods for obtaining the information:

Method             Action
Metadata download  Download the XML file that contains the signing certificate, etc.
Manual copy        Select Copy Link in the lower half of the form to copy the information to the IdP.

 
Next, download the Signing Certificate.

 
When completed, click Next to finish the configuration.

6.  Enable SSO and Verify Operation

Once the IdP and SP are configured, a cluster admin can enable SSO per access zone through the OneFS WebUI by going to Access > Authentication providers > SSO. From here, select the access zone and select the toggle to enable SSO:

 Or from the OneFS CLI, use the following syntax:

# isi auth sso settings modify --sso-enabled 1

  

Author: Nick Trimbee

 

Read Full Blog
  • PowerScale
  • OneFS
  • single sign-on
  • SSO
  • SAML

OneFS WebUI Single Sign-on

Nick Trimbee Nick Trimbee

Thu, 20 Jul 2023 16:32:13 -0000

|

Read Time: 0 minutes

The Security Assertion Markup Language (SAML) is an open standard for sharing security information about identity, authentication, and authorization across different systems. SAML is implemented using the Extensible Markup Language (XML) standard for sharing data. The SAML framework enables single sign-on (SSO), which allows users to log in once and reuse that credential to authenticate with, and access, other service providers. It defines several entities, including end users, service providers, and identity providers, and is used to manage identity information. For example, Windows Active Directory Federation Services (ADFS) is one of the most ubiquitous identity providers used in SAML contexts.

Entity                   Description
End user                 Requires authentication prior to being allowed to use an application.
Identity provider (IdP)  Performs authentication and passes the user's identity and authorization level to the service provider, for example, ADFS.
Service provider (SP)    Trusts the identity provider and authorizes the given user to access the requested resource. With SAML 2.0, a PowerScale cluster is a service provider.
SAML Assertion           XML document that the identity provider sends to the service provider, containing the user authorization.

OneFS 9.5 introduces SAML-based SSO for the WebUI to provide a more convenient authentication method, in addition to meeting the security compliance requirements for federal and enterprise customers. In OneFS 9.5, the WebUI’s initial login page has been redesigned to support SSO and, when enabled, a new Log in with SSO button is displayed on the login page under the traditional username and password text boxes. For example:

 
OneFS SSO is also zone-aware in support of multi-tenant cluster configurations. As such, a separate IdP can be configured independently for each OneFS access zone.


Under the hood, OneFS SSO employs the following high-level architecture:

 

In OneFS 9.5, the SSO operates through HTTP REDIRECT and POST bindings, with the cluster acting as the service provider. 

There are three different types of SAML Assertions—authentication, attribute, and authorization decision.

  • Authentication assertions prove identification of the user and provide the time the user logged in and what method of authentication they used (for example, Kerberos, two-factor, and so on).
  • The attribute assertion passes the SAML attributes to the service provider. SAML attributes are specific pieces of data that provide information about the user.
  • An authorization decision assertion states whether the user is authorized to use the service, or whether the identity provider denied their request due to a password failure or lack of rights to the service.

SAML SSO works by transferring the user’s identity from one place (the identity provider) to another (the service provider). This is done through an exchange of digitally signed XML documents.

A SAML Request, also known as an authentication request, is generated by the service provider to “request” an authentication.

A SAML Response is generated by the identity provider and contains the actual assertion of the authenticated user. In addition, a SAML Response may contain additional information, such as user profile information and group/role information, depending on what the service provider can support. Note that the service provider never directly interacts with the identity provider, with a browser acting as the agent facilitating any redirections.

Because SAML authentication is asynchronous, the service provider does not maintain the state of any authentication requests. As such, when the service provider receives a response from an identity provider, the response must contain all the necessary information.

The general flow is as follows:


When OneFS redirects a user to the configured IdP for login, it makes an HTTP GET request (SAMLRequest), instructing the IdP that the cluster is attempting to perform a login (SAMLAuthnRequest). When the user successfully authenticates, the IdP responds back to OneFS with an HTTP POST containing an HTML form (SAMLResponse) that indicates whether the login was successful, who logged in, plus any additional claims configured on the IdP. 

On receiving the SAMLResponse, OneFS verifies the signature using the public key (X.509 certificate) to ensure that the response really came from its trusted IdP and that none of the contents have been tampered with. OneFS then extracts the identity of the user, along with any other pertinent attributes. At this point, the user is redirected back to the OneFS WebUI dashboard (landing page), as if they had logged into the site manually.
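
For orientation, here is a heavily abridged sketch of the kind of XML carried in a SAMLResponse for the flow above. The issuer URL, timestamp, and email address are placeholders, and the digital signature and most required elements are omitted; this illustrates the general SAML 2.0 structure rather than an exact OneFS or ADFS artifact.

<samlp:Response xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol"
                xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion">
  <samlp:Status>
    <samlp:StatusCode Value="urn:oasis:names:tc:SAML:2.0:status:Success"/>
  </samlp:Status>
  <saml:Assertion>
    <saml:Issuer>http://idp1.isilon.com/adfs/services/trust</saml:Issuer>
    <!-- digital signature omitted for brevity -->
    <saml:Subject>
      <saml:NameID Format="urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress">user1@example.com</saml:NameID>
    </saml:Subject>
    <saml:AuthnStatement AuthnInstant="2023-05-17T08:02:47Z"/>
  </saml:Assertion>
</samlp:Response>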

In the next article in this series, we’ll take a detailed look at the following procedure to deploy SSO on a PowerScale cluster:

 

Author: Nick Trimbee

Read Full Blog
  • security
  • PowerScale
  • OneFS
  • STIG

OneFS Account Security Policy

Nick Trimbee Nick Trimbee

Thu, 20 Jul 2023 16:23:21 -0000

|

Read Time: 0 minutes

Another of the core security enhancements introduced in OneFS 9.5 is the ability to enforce strict user account security policies. This is required for compliance with both private and public sector security mandates. For example, the account policy restriction requirements expressed within the U.S. military STIG requirements stipulate:

Requirement  Description
Delay        The OS must enforce a delay of at least 4 seconds between logon prompts following a failed logon attempt.
Disable      The OS must disable account identifiers (individuals, groups, roles, and devices) after 35 days of inactivity.
Limit        The OS must limit the number of concurrent sessions to ten for all accounts and/or account types.

 To directly address these security edicts, OneFS 9.5 adds the following account policy restriction controls:

Account policy function  Details

Delay after failed login
  • After a failed login, OneFS enforces a configurable delay for subsequent logins on the same cluster node.
  • Only applicable to administrative logins (not protocol logins).

Disable inactive accounts
  • Disables an inactive account after a specified number of days.
  • Only applicable to local user accounts.
  • Cluster-wide.

Concurrent session limit
  • Limits the number of active sessions a user can have on a cluster node.
  • Only applicable to administrative logins.
  • Node-specific.

Architecture

OneFS provides a variety of access mechanisms for administering a cluster. These include SSH, serial console, WebUI, and platform API, all of which use different underlying access methods. The serial console and SSH are standard FreeBSD third-party applications and are accounted for per node, whereas the WebUI and pAPI use HTTP module extensions to facilitate access to the system and services and are accounted for cluster-wide. Before OneFS 9.5, there was no common mechanism to represent or account for sessions across these disparate applications.

Under the hood, the OneFS account security policy framework encompasses the following high-level architecture: 

 


With SSH, there’s no explicit or reliable “log-off” event sent to OneFS, beyond actually disconnecting the connection. As such, accounting for active sessions can be problematic and unreliable, especially when connections time out or unexpectedly disconnect. However, OneFS does include an accounting database that stores records of system activities like user login and logout, which can be queried to determine active SSH sessions. Each active SSH connection has an isi_ssh_d process owned by the account associated with it, and this information can be gathered via standard syscalls. OneFS enumerates the number of SSHD processes per account to calculate the total number of active established sessions. This value is then used as part of the total concurrent administrative sessions limit. Since SSH only supports user access through the system zone, there is no need for any zone-aware accounting.

The WebUI and platform API use JSON web tokens (JWTs) for authenticated sessions. OneFS stores the JWTs in the cluster-wide kvstore, and the access policy uses valid session tokens in the kvstore to account for active sessions when a user logs on through the WebUI or pAPI. When the user logs off, the associated token is removed, and a message is sent to the JWT service with an explicit log-off notification. If a session times out or disconnects, the JWT service will not get an event, but the tokens have a limited, short lifespan, and any expired tokens are purged from the list on a scheduled basis in conjunction with the JWT timer. OneFS enumerates the unique session IDs associated with each user’s JWT tokens in the kvstore to get the number of active WebUI and pAPI sessions to use as part of the user’s session limit check.

For serial console access accounting, the process table holds information when an STTY connection is active, and OneFS extrapolates user data from it to determine the session count, in the same way as for SSH, using a syscall to gather process data. An accounting database that stores records of system activities, such as user login and logout, is also queried for active console sessions. Because serial console access is only available from the System zone, there is no need for zone-aware accounting.

An API call retrieves user session data from the process table and kvstore to calculate the number of active sessions for a user. As such, the checking and enforcement of session limits is performed in a similar manner to the verification of user privileges for SSH, serial console, or WebUI access.

Delaying failed login reconnections

OneFS 9.5 provides the ability to enforce a configurable delay period, specified in seconds. After an unsuccessful authentication attempt, the user is denied the ability to reconnect to the cluster until the configured delay period has passed. The login delay period is defined through the FailedLoginDelayTime global attribute and, by default, OneFS is configured for no delay, with a FailedLoginDelayTime value of 0. When a cluster is placed into hardened mode with the STIG policy enacted, the delay value is automatically set to 4 seconds. Note that the delay happens in the lsass client, so the authentication service itself is not affected.

The configured failed login delay time limit can be viewed with the following CLI command:

# isi auth settings global view
                            Send NTLMv2: No
                      Space Replacement:
                              Workgroup: WORKGROUP
               Provider Hostname Lookup: disabled
                          Alloc Retries: 5
                 User Object Cache Size: 47.68M
                       On Disk Identity: native
                         RPC Block Time: Now
                       RPC Max Requests: 64
                            RPC Timeout: 30s
Default LDAP TLS Revocation Check Level: none
                   System GID Threshold: 80
                   System UID Threshold: 80
                         Min Mapped Rid: 2147483648
                              Group UID: 4294967292
                               Null GID: 4294967293
                               Null UID: 4294967293
                            Unknown GID: 4294967294
                            Unknown UID: 4294967294
                Failed Login Delay Time: Now
               Concurrent Session Limit: 0


Similarly, the following syntax will configure the failed login delay time to a value of 4 seconds:

# isi auth settings global modify --failed-login-delay-time 4s
# isi auth settings global view | grep -i delay
                Failed Login Delay Time: 4s

However, when a cluster is put into STIG hardening mode, the failed login delay time is automatically reconfigured to meet the hardening requirements. For example:

# isi auth settings global view | grep -i delay
                Failed Login Delay Time: 10s

The delay time after login failure can also be configured from the WebUI under Access > Settings > Global provider settings:


The valid range of the FailedLoginDelayTime global attribute is from 0 to 65535 seconds, and the delay is enforced per cluster node.

Note that this failed login delay is only applicable to administrative logins.

Disabling inactive accounts

In OneFS 9.5, any user account that has been inactive for a configurable duration can be automatically disabled. Administrative intervention is required to re-enable a deactivated user account. The last activity time of a user is determined by their previous logon, and a timer runs at midnight each day, during which all inactive accounts are disabled. If the last logon record for a user is unavailable, or stale, the timestamp when the account was enabled is taken as their last activity instead. If inactivity tracking is enabled after a user’s last logon (or enable) time, the time at which tracking was enabled is used when calculating the inactivity period.

This feature is disabled by default in OneFS, and all users are exempted from inactivity tracking until configured otherwise. Individual accounts can be included in, or exempted from, this behavior through the user-specific DisableWhenInactive attribute. For example:

# isi auth user view user1 | grep -i inactive
   Disable When Inactive: Yes
# isi auth user modify user1 --disable-when-inactive 0
# isi auth user view user1 | grep -i inactive
   Disable When Inactive: No

If a cluster is put into STIG hardened mode, the value for the MaxInactivityDays parameter is automatically reconfigured to 35, meaning a user will be disabled after 35 days of inactivity. All the local users are removed from exemption when in STIG hardened mode.

Note that this functionality is limited to only the local provider and does not apply to file providers.

The inactive account disabling configuration can be viewed from the CLI with the following syntax. In this example, the MaxInactivityDays attribute is configured for 35 days:

# isi auth local view system
                    Name: System
                  Status: active
          Authentication: Yes
   Create Home Directory: Yes
 Home Directory Template: /ifs/home/%U
        Lockout Duration: Now
       Lockout Threshold: 0
          Lockout Window: Now
             Login Shell: /bin/zsh
            Machine Name:
        Min Password Age: Now
        Max Password Age: 4W
     Min Password Length: 15
    Password Prompt Time: 2W
     Password Complexity: -
 Password History Length: 0
  Password Chars Changed: 8
Password Percent Changed: 50
      Password Hash Type: NTHash
     Max Inactivity Days: 35

Inactive account disabling can also be configured from the WebUI under Access > Authentication providers > Local provider:


The valid range of the MaxInactivityDays parameter is from 0 to UINT_MAX. For example, the following CLI syntax will configure the maximum number of days a user account can be inactive before it is disabled to 10 days:

# isi auth local modify system --max-inactivity-days 10
# isi auth local view system | grep -i inactiv
     Max Inactivity Days: 10

Setting this value to 0 days will disable the feature:

# isi auth local modify system --max-inactivity-days 0
# isi auth local view system | grep -i inactiv
     Max Inactivity Days: 0

Inactive account disabling, as well as password expiry, can also be configured granularly, per user account. For example, user1 has a default configuration with the Disable When Inactive setting at No:

# isi auth users view user1
                    Name: user1
                      DN: CN=user1,CN=Users,DC=GLADOS
              DNS Domain: -
                  Domain: GLADOS
                Provider: lsa-local-provider:System
        Sam Account Name: user1
                     UID: 2000
                     SID: S-1-5-21-1839173366-2940572996-2365153926-1000
                 Enabled: Yes
                 Expired: No
                  Expiry: -
                  Locked: No
                   Email: -
                   GECOS: -
           Generated GID: No
           Generated UID: No
           Generated UPN: Yes
           Primary Group
                          ID: GID:1800
                        Name: Isilon Users
          Home Directory: /ifs/home/user1
        Max Password Age: 4W
        Password Expired: No
         Password Expiry: 2023-06-15T17:45:55
       Password Last Set: 2023-05-18T17:45:55
        Password Expires: No
              Last Logon: -
                   Shell: /bin/zsh
                     UPN: user1@GLADOS
User Can Change Password: Yes
   Disable When Inactive: No


The following CLI command will activate the account inactivity disabling setting and enable password expiry for the user1 account:

# isi auth users modify user1 --disable-when-inactive Yes --password-expires Yes 

Inactive account disabling can also be configured from the WebUI under Access > Membership and roles > Users > Providers:

 

Limiting concurrent sessions

OneFS 9.5 can limit the number of administrative sessions active on a OneFS cluster node, and all WebUI, SSH, pAPI, and serial console sessions are accounted for when calculating the session limit. The SSH and console session count is node-local, whereas WebUI and pAPI sessions are tracked cluster-wide. As such, the formula used to calculate a node’s total active sessions is as follows:

Total active user sessions on a node = Total WebUI and pAPI sessions across the cluster + Total SSH and Console sessions on the node
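For example (the numbers here are purely illustrative): a user with three WebUI or pAPI sessions anywhere on the cluster and two SSH sessions on node 1 is counted as having five active sessions on node 1, but only three on node 2, where they hold no SSH or console sessions.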

This feature leverages the cluster-wide session management through JWT for calculating the total number of sessions on a cluster node. By default, OneFS 9.5 has no configured limit, and the Concurrent Session Limit parameter has a value of 0. For example:

# isi auth settings global view
                            Send NTLMv2: No
                      Space Replacement:
                              Workgroup: WORKGROUP
               Provider Hostname Lookup: disabled
                          Alloc Retries: 5
                 User Object Cache Size: 47.68M
                       On Disk Identity: native
                         RPC Block Time: Now
                       RPC Max Requests: 64
                            RPC Timeout: 30s
Default LDAP TLS Revocation Check Level: none
                   System GID Threshold: 80
                   System UID Threshold: 80
                         Min Mapped Rid: 2147483648
                              Group UID: 4294967292
                               Null GID: 4294967293
                               Null UID: 4294967293
                            Unknown GID: 4294967294
                            Unknown UID: 4294967294
                Failed Login Delay Time: Now
               Concurrent Session Limit: 0

The following CLI syntax will configure Concurrent Session Limit to a value of 5:

# isi auth settings global modify --concurrent-session-limit 5
# isi auth settings global view | grep -i concur
                Concurrent Session Limit: 5

Once the session limit has been exceeded, attempts to connect, in this case as root through SSH, will be met with the following Access denied error message:

login as: root
Keyboard-interactive authentication prompts from server:
| Password:
End of keyboard-interactive prompts from server                      
Access denied
password:

The concurrent sessions limit can also be configured from the WebUI under Access > Settings > Global provider settings:


However, when a cluster is put into STIG hardening mode, the concurrent session limit is automatically set to a maximum of 10 sessions.

Note that this maximum session limit is only applicable to administrative logins.

Performance

Disabling an account after a period of inactivity in OneFS requires a SQLite database update every time a user has successfully logged on to the OneFS cluster. After a successful logon, the time to logon is recorded in the database, which is later used to compute the inactivity period.

Inactivity tracking is disabled by default in OneFS 9.5 but can be easily enabled by configuring the MaxInactivityDays attribute to a non-zero value. In cases where inactivity tracking is enabled and many users are not exempt from it, a large number of logons within a short period of time can generate a significant volume of SQLite database requests. However, OneFS consolidates multiple database updates during user logon into a single commit to minimize the overall load.

Troubleshooting

When it comes to troubleshooting OneFS account security policy configurations, the main log files to check are:

  • /var/log/lsassd.log
  • /var/log/messages
  • /var/log/isi_papi_d.log

For additional reporting detail, debug level logging can be enabled on the lsassd.log file with the following CLI command:

# /usr/likewise/bin/lwsm set-log-level lsass - debug

When finished, logging can be returned to the regular error level:

# /usr/likewise/bin/lwsm set-log-level lsass - error
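With debug logging enabled, the log can then be searched directly for relevant events. For example (the search pattern and line count here are purely illustrative):

# grep -i login /var/log/lsassd.log | tail -20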


Author: Nick Trimbee

Read Full Blog
  • NVMe
  • PowerScale
  • OneFS
  • Premiere
  • Media and Entertainment
  • Adobe

OneFS 9.5 Performance Enhancements for Video Editing

Gregory Shiff Gregory Shiff

Wed, 19 Jul 2023 18:16:59 -0000

|

Read Time: 0 minutes

Of the many changes in OneFS 9.5, the most exciting are the performance enhancements on the NVMe-based PowerScale nodes: F900 and F600. These performance increases are the result of some significant changes “under-the-hood” to OneFS. In the lead-up to the National Association of Broadcasters show last April, I wanted to qualify how much of a difference the extra performance would make for Adobe Premiere Pro video editing workflows. Adobe is one of Dell’s biggest media software partners, and Premiere Pro is crucial to all sorts of media production, from broadcast to cinema.

The awesome news is that the changes to OneFS make a big difference. I saw 40% more video streams with the software upgrade: up to 140 streams of UHD ProRes422 from a single F900 node!

Changes to OneFS

Broadly speaking, there were changes to three areas in OneFS that resulted in the performance boost in version 9.5. These areas are L2 cache, backend networking, and prefetch.

L2 cache -- Being smart about how and when to bypass L2 cache and read directly from NVMe is one part of the OneFS 9.5 performance story. PowerScale OneFS clusters maintain a globally accessible L2 cache for all nodes in the cluster. Manipulating L2 cache can be “expensive” computationally speaking. During a read, the cluster needs to determine what data is in cache, whether the read should be added to cache, and what data should be expired from cache. NVMe storage is so performant that bypassing the L2 cache and reading data directly from NVMe frees up cluster resources. Doing so results in even faster reads on nodes that support it.  

Backend networking -- OneFS uses a private backend network for internode communication. With the massive performance of NVMe-based storage and the introduction of 100 GbE, limits were being reached on this private network. OneFS 9.5 gets around these limitations with a custom multichannel approach (similar in concept to nconnect from the NFS world, for the Linux folks out there). In OneFS 9.5, the connection channels on the backend network are bonded in a carefully orchestrated way to parallelize some aspects while still keeping a predictable message ordering.

Prefetch -- The last part of the performance boost for OneFS 9.5 comes from improved file prefetch. How OneFS prefetches file system metadata was reworked to more optimally read ahead at the different depths of the metadata tree. Efficiency was improved and “jitter” between file system processes minimized.

Our lab setup

First, a little background on PowerScale and OneFS. PowerScale is the updated name for the Isilon product line. The new PowerScale nodes are based on Dell servers with compute, RAM, networking, and storage. PowerScale is a scale-out, clustered network-attached storage (NAS) solution. To build a OneFS file system, PowerScale nodes are joined to create a cluster. The cluster creates a single NAS file system with the aggregate resources of all the nodes in the cluster. Client systems connect using a DNS name, and OneFS SmartConnect balances client connections between the various nodes. No matter which node the client connects to, that client has the potential to access all the data on the entire cluster. Further, the client systems benefit from all the nodes acting in concert.

Even before the performance enhancements in OneFS 9.5, the NVMe-based PowerScale nodes were speedy, so a robust lab environment was going to be needed to stress the system. For this particular set of tests, I had access to 16 workstations running the latest version of Adobe Premiere Pro 2023. Each workstation ran Windows 10 with an Nvidia GPU, an Intel processor, and 10 GbE networking. On the storage side, the tests were performed against a minimum-sized 3-node F900 PowerScale cluster with 100 GbE networking.

Adobe Premiere Pro excels at compressed video editing. The trick with compressed video is that an individual client workstation will get overwhelmed long before the storage system. As such, it is critical to evaluate whether any dropped frames are the result of storage or an overwhelmed workstation. A simple test is to take a single workstation and start playing back parallel compressed video streams, such as ProRes 422. Keeping a close watch on the workstation performance monitors, at a certain point CPU and GPU usage will spike and frames will drop. This test will show the maximum number of streams that a single workstation can handle. Because this test is all about storage performance, keeping the number of streams per workstation to a healthy range takes individual workstation performance out of the equation.

I settled on 10x streams of ProRes 422 UHD video running at 30 frames per second per workstation. Each individual video stream was ~70 MBps (560 Mbps). Running ten of these streams meant each workstation was pulling around 700 MBps (though with Premiere Pro prefetching, this number was closer to 800 MBps). With this number of video streams, the workstation wasn’t working too hard, and it was well within what would fit down a 10 GbE network pipe.

Running some quick math here, 16 workstations each pulling 800-ish MBps works out to about 12.5 GBps of total throughput. This is not enough throughput to overwhelm even a small 3-node F900 cluster. In order to stress the system, all 16 workstations were manually pointed to a single 100 GbE port on a single F900 node. Due to the clustered nature of OneFS, the clients still benefit from the entire cluster. But even with the rest of the cluster behind it, at a certain point a single F900 node is going to get overwhelmed.

Figure 1.  OneFS Lab configuration

Test methodology

The first step was to import test media for playback. Each workstation accessed its own unique set of 10x one-hour long UHD ProRes422 clips. Then a separate Premiere Pro project was created for each workstation with 10 simultaneous layers of video. The plan was to start playback one by one on each workstation and see where the tipping point was for that single PowerScale F900 node. The test was to be run first with OneFS 9.4 and then with OneFS 9.5.

Adobe Premiere Pro has a debug overlay called DogEars. In addition to showing dropped frames, DogEars provides some useful metrics about how “healthy” video playback is in Premiere Pro. Even before a system starts to drop frames, latency spikes and low prefetch buffers show when Premiere Pro is struggling to sustain playback.

The metrics in DogEars that I was focused on were the following:

Dropped frames: This metric is obvious: dropped frames are unacceptable. However, at times Premiere Pro will show single-digit dropped frames at playback start.

FramePrefetchLatency: This metric only shows up during playback. The latency starts high while the prefetch frame buffer is filling. When that buffer gets up to slightly over 300 frames, the latency drops down to around 20 to 30 milliseconds. When the storage system was overwhelmed, this prefetch latency goes well above 30 milliseconds and stays there.

CompleteAheadOfPlay: This metric also only shows up during playback. The number of frames creeps up during playback and settles in at slightly over 300 prefetched frames. The FramePrefetchLatency above will be high (in the 100ms range or so) until the 300 frames are prefetched, at which point the latency will drop down to 30ms or lower. When the storage system is stressed, Premiere Pro is never able to fill this prefetch buffer, and it never gets up to the 300+ frames.


Figure 2.  Premiere Pro with Dogears overlay

Test results

With the test environment configured and the individual projects loaded, it was time to see what the system could provide.  

With the PowerScale cluster running OneFS 9.4, playback was initiated on each Adobe Premiere workstation. Keep in mind that all the workstations were artificially pointed to a single node in this 3-node F900 cluster. That single F900 node running OneFS 9.4 could handle 10x of the workstations, each playing back 10x UHD streams. That’s 100x streams of UHD ProRes 422 video from one node. Not too shabby.  

At 110x streams (11 workstations), no frames were dropped, but the CompleteAheadOfPlay number on all the workstations started to go below 300. Also, the FramePrefetchLatency spiked to over 100 milliseconds. Clearly, the storage node was unable to provide more performance.

After reproducing these results several times to confirm accuracy, we unmounted the storage from each workstation and upgraded the F900 cluster to OneFS 9.5. Time to see how much of a difference the OneFS 9.5 performance boost would make for Premiere Pro.

As before, each workstation loaded a unique project with unique ProRes media. At 100x streams of video, playback chugged along fine. Time to load up additional streams and see where things break. 110, 120, 130, 140… playback from the single F900 node continued to chug along with no drops and acceptable latency. It was only at 150 streams of video that playback began to suffer. By this time, that single F900 node was pumping close to 10 GBps out of that single 100 GbE NIC port. These 14x workstations were not entirely saturating the connection, but getting close. And the performance was a 40% bump from the OneFS 9.4 numbers. Impressive.


Figure 3.  isi statistics output with 140 streams of video from a single node

These results exceeded my expectations going into the project. Getting a 40% performance boost with a code upgrade to existing hardware is impressive. This increase lined up with some of the benchmarking tools used by engineering. But performance from a benchmark tool and performance from a real-world application are often two entirely different things. Benchmark tools are particularly inaccurate for video playback, where small increases in latency can produce unacceptable results. Because Adobe Premiere is one of the most widely used applications with PowerScale storage, it made sense as a test platform to gauge these differences. For more information about PowerScale storage and media, check out https://Dell.to/media.

Click here to learn more about the author, Gregory Shiff


Read Full Blog
  • PowerScale
  • OneFS
  • Media and Entertainment
  • 8K
  • Baselight
  • FilmLight
  • finishing
  • uncompressed
  • 4K

Success with Dell PowerScale and Baselight by FilmLight

Gregory Shiff Gregory Shiff

Wed, 19 Jul 2023 18:19:27 -0000

|

Read Time: 0 minutes

In my role as technical lead for media workflows at Dell Technologies, I’m fortunate to partner with companies making some of the best tools for creatives. FilmLight is undeniably one of those companies. Baselight by FilmLight is used in the highest end of feature film production. I was eager to put the latest all-flash PowerScale OneFS nodes to the test and see how those storage nodes could support Baselight workflows. I’m pleased to say that PowerScale supports Baselight very well, and I’m able to share best practices for integrating PowerScale into Baselight environments.

Baselight is a color grading and image-processing system that is widely used in cinematic production. Traditionally, Baselight DI workflows are the domain of SAN or block storage. The journey towards supporting modern DI workflows on PowerScale started with OneFS’s support of NFS-over-RDMA. Using the RDMA protocol with PowerScale all flash storage allows for high throughput workflows that are unobtainable with TCP. Using RDMA for media applications is well documented in the blog and white paper: NFS over RDMA for Media Workflows.

With successful RDMA testing on other color correction software complete, I was confident that we could add Baselight to the list of supported platforms. The time seemed ripe, and FilmLight agreed to work with us on getting it done. In partnership with the FilmLight team in LA, we got Baselight One up and running in the Seattle media lab.

FilmLightOS already has a driver installed that supports RDMA for the NIC in the workstation. This made configuration easy, because no additional software had to be installed to support the protocol (at least in our case). While RDMA remains the best choice for using PowerScale with Baselight, not all networks can support RDMA. The good news here is that there is another option: nconnect.

The Linux distribution that Baselight runs on also supports the NFS nconnect mount option. Nconnect allows for multiple TCP connections between the Baselight client and the PowerScale storage. Testing with nconnect demonstrated enough throughput to support 8K uncompressed playback from PowerScale. While RDMA is preferred, it is not an absolute requirement.
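As an illustrative sketch of the two client-side mount approaches (the cluster name, export path, mount point, and option values are placeholders, and support for these options depends on the workstation’s kernel and NIC), an RDMA mount and an nconnect mount of a PowerScale export from a Linux client would look something like this:

# mount -t nfs -o vers=3,proto=rdma,port=20049 cluster:/ifs/media /mnt/media
# mount -t nfs -o vers=3,nconnect=8 cluster:/ifs/media /mnt/media

The first form requires RDMA support end to end, while the second simply opens multiple TCP connections to the same export.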


With the storage mounted and performing as expected, we set about adjusting Baselight threads and DirectIO settings to optimize the interaction of Baselight and PowerScale. The results of this testing showed that increasing Baselight’s thread count to 16 improved performance. (These threads are unrelated to the nconnect connections mentioned above.) DirectIO is a mechanism that bypasses some caching layers in Linux. DirectIO improved Baselight’s write performance but degraded read performance. Thankfully, Baselight is flexible enough to selectively enable DirectIO only for writes.

PowerScale is an easy win for Baselight One. However, Baselight also comes in other variations: Baselight Two and Baselight X. These versions of Baselight have separate processing nodes and host UI devices to tackle the most challenging workflows. These Baselight systems share configuration files, which can cause issues with how the storage is mounted on the processing nodes as compared to the host UI nodes. When using RDMA, the processing nodes will use an RDMA mount while the host UI will use TCP. Working with the FilmLight team in LA, changes were made to support separate mount options for the processing nodes versus the host UI node.


Getting to know Baselight and partnering with FilmLight on this project was highly satisfying. It would not have been easy to understand the finer intricacies of how Baselight interacts with storage without their help (the rendering and caching mechanisms within Baselight are awesome).

For more details about how to use PowerScale with Baselight, check out the full white paper: PowerScale OneFS: Baselight by FilmLight Best Practices and Configuration.

For more information, and the latest content on Dell Media and Entertainment storage solutions, visit us online.

Click here to learn more about the author, Gregory Shiff  

Read Full Blog
  • security
  • PowerScale
  • OneFS

OneFS Restricted Shell—Log Viewing and Recovery

Nick Trimbee Nick Trimbee

Tue, 27 Jun 2023 20:37:27 -0000

|

Read Time: 0 minutes

Complementary to the restricted shell itself, which was covered in the previous article in this series, OneFS 9.5 also sees the addition of a new log viewer, plus a recovery shell option.

 

The new isi_log_access CLI utility enables an SSH user to read, page, and query the log files in the /var/log directory. The ability to run this tool is governed by the user’s role being granted the ISI_PRIV_SYS_SUPPORT role-based access control (RBAC) privilege.

OneFS RBAC is used to explicitly limit who has access to the range of cluster configurations and operations. This granular control allows for crafting of administrative roles, which can create and manage the various OneFS core components and data services, isolating each to specific security roles or to admin only, and so on.

In this case, a cluster security administrator selects the access zone, creates a zone-aware role within it, assigns the ISI_PRIV_SYS_SUPPORT privilege for isi_log_access use, and then assigns users to the role.

Note that the integrated OneFS AuditAdmin RBAC role does not contain the ISI_PRIV_SYS_SUPPORT privilege by default. Also, the integrated RBAC roles cannot be reconfigured:

# isi auth roles modify AuditAdmin --add-priv=ISI_PRIV_SYS_SUPPORT
The privileges of built-in role AuditAdmin cannot be modified

Therefore, the ISI_PRIV_SYS_SUPPORT privilege has to be added to a custom role.

For example, the following CLI syntax adds the user usr_admin_restricted to the rl_ssh role and adds the privilege ISI_PRIV_SYS_SUPPORT to the rl_ssh role:

# isi auth roles modify rl_ssh --add-user=usr_admin_restricted
# isi auth roles modify rl_ssh --add-priv=ISI_PRIV_SYS_SUPPORT
# isi auth roles view rl_ssh
        Name: rl_ssh
 Description: -
     Members: u_ssh_restricted
              u_admin_restricted
  Privileges
              ID: ISI_PRIV_LOGIN_SSH
      Permission: r
             ID: ISI_PRIV_SYS_SUPPORT
      Permission: r

The usr_admin_restricted user could also be added to the AuditAdmin role:

# isi auth roles modify AuditAdmin --add-user=usr_admin_restricted
# isi auth roles view AuditAdmin | grep -i member
     Members: usr_admin_restricted

The isi_log_access tool supports the following command options and arguments:

  • --grep: Match a pattern against the file and display the results on stdout
  • --help: Display the command description and usage message
  • --list: List all the files in the /var/log tree
  • --less: Display the file on stdout with a pager in secure_mode
  • --more: Display the file on stdout with a pager in secure_mode
  • --view: Display the file on stdout
  • --watch: Display the end of the file and new content as it is written
  • --zgrep: Match a pattern against the unzipped file contents and display the results on stdout
  • --zview: Display an unzipped version of the file on stdout

Here, the u_admin_restricted user logs in over SSH and runs the isi_log_access utility to list the log files under /var/log:

# ssh u_admin_restricted@10.246.178.121
 (u_admin_restricted@10.246.178.121) 
 Password:
 Last login: Wed May  3 18:02:18 2023 from 10.246.159.107
 Copyright (c) 2001-2023 Dell Inc. or its subsidiaries. All Rights Reserved.
 Copyright (c) 1992-2018 The FreeBSD Project.
 Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
         The Regents of the University of California. All rights reserved.
PowerScale OneFS 9.5.0.0
Allowed commands are
         clear ...
         isi ...
         isi_recovery_shell ...
         isi_log_access ...
         exit
         logout
 # isi_log_access --list
 LAST MODIFICATION TIME         SIZE       FILE
 Mon Apr 10 14:22:18 2023       56         alert.log
 Fri May  5 00:30:00 2023       62         all.log
 Fri May  5 00:30:00 2023       99         all.log.0.gz
 Fri May  5 00:00:00 2023       106        all.log.1.gz
 Thu May  4 00:30:00 2023       100        all.log.2.gz
 Thu May  4 00:00:00 2023       107        all.log.3.gz
 Wed May  3 00:30:00 2023       99         all.log.4.gz
 Wed May  3 00:00:00 2023       107        all.log.5.gz
 Tue May  2 00:30:00 2023       100        all.log.6.gz
 Mon Apr 10 14:22:18 2023       56         audit_config.log
 Mon Apr 10 14:22:18 2023       56         audit_protocol.log
 Fri May  5 17:23:53 2023       82064      auth.log
 Sat Apr 22 12:09:31 2023       10750      auth.log.0.gz
 Mon Apr 10 15:31:36 2023       0          bam.log
 Mon Apr 10 14:22:18 2023       56         boxend.log
 Mon Apr 10 14:22:18 2023       56         bwt.log
 Mon Apr 10 14:22:18 2023       56         cloud_interface.log
 Mon Apr 10 14:22:18 2023       56         console.log
 Fri May  5 18:20:32 2023       23769      cron
 Fri May  5 15:30:00 2023       8803       cron.0.gz
 Fri May  5 03:10:00 2023       9013       cron.1.gz
 Thu May  4 15:00:00 2023       8847       cron.2.gz
 Fri May  5 03:01:02 2023       3012       daily.log
 Fri May  5 00:30:00 2023       101        daily.log.0.gz
 Fri May  5 00:00:00 2023       1201       daily.log.1.gz
 Thu May  4 00:30:00 2023       102        daily.log.2.gz
 Thu May  4 00:00:00 2023       1637       daily.log.3.gz
 Wed May  3 00:30:00 2023       101        daily.log.4.gz
 Wed May  3 00:00:00 2023       1200       daily.log.5.gz
 Tue May  2 00:30:00 2023       102        daily.log.6.gz
 Mon Apr 10 14:22:18 2023       56         debug.log
 Tue Apr 11 12:29:37 2023       3694       diskpools.log
 Fri May  5 03:01:00 2023       244566     dmesg.today
 Thu May  4 03:01:00 2023       244662     dmesg.yesterday
 Tue Apr 11 11:49:32 2023       788        drive_purposing.log
 Mon Apr 10 14:22:18 2023       56         ethmixer.log
 Mon Apr 10 14:22:18 2023       56         gssd.log
 Fri May  5 00:00:35 2023       41641      hardening.log
 Mon Apr 10 15:31:05 2023       17996      hardening_engine.log
 Mon Apr 10 14:22:18 2023       56         hdfs.log
 Fri May  5 15:51:28 2023       31359      hw_ata.log
 Fri May  5 15:51:28 2023       56527      hw_da.log
 Mon Apr 10 14:22:18 2023       56         hw_nvd.log
 Mon Apr 10 14:22:18 2023       56         idi.log

In addition to parsing an entire log file with the more and less flags, the isi_log_access utility can also be used to watch (that is, tail) a log. For example, the /var/log/messages log file:

% isi_log_access --watch messages
 2023-05-03T18:00:12.233916-04:00 <1.5> h7001-2(id2) limited[68236]: Called ['/usr/bin/isi_log_access', 'messages'], which returned 2.
 2023-05-03T18:00:23.759198-04:00 <1.5> h7001-2(id2) limited[68236]: Calling ['/usr/bin/isi_log_access'].
 2023-05-03T18:00:23.797928-04:00 <1.5> h7001-2(id2) limited[68236]: Called ['/usr/bin/isi_log_access'], which returned 0.
 2023-05-03T18:00:36.077093-04:00 <1.5> h7001-2(id2) limited[68236]: Calling ['/usr/bin/isi_log_access', '--help'].
 2023-05-03T18:00:36.119688-04:00 <1.5> h7001-2(id2) limited[68236]: Called ['/usr/bin/isi_log_access', '--help'], which returned 0.
 2023-05-03T18:02:14.545070-04:00 <1.5> h7001-2(id2) limited[68236]: Command not in list of allowed commands.
 2023-05-03T18:02:50.384665-04:00 <1.5> h7001-2(id2) limited[68594]: Calling ['/usr/bin/isi_log_access', '--list'].
 2023-05-03T18:02:50.440518-04:00 <1.5> h7001-2(id2) limited[68594]: Called ['/usr/bin/isi_log_access', '--list'], which returned 0.
 2023-05-03T18:03:13.362411-04:00 <1.5> h7001-2(id2) limited[68594]: Command not in list of allowed commands.
 2023-05-03T18:03:52.107538-04:00 <1.5> h7001-2(id2) limited[68738]: Calling ['/usr/bin/isi_log_access', '--watch', 'messages'].

As expected, the last few lines of the messages log file are displayed. These log entries include the command audit entries for the u_admin_restricted user running the isi_log_access utility with the --help, --list, and --watch arguments.

The isi_log_access utility also allows zipped log files to be read (--zview) or searched (--zgrep) without uncompressing them. For example, to find all the usr_admin entries in the zipped vmlog.0.gz file:

# isi_log_access --zgrep usr_admin vmlog.0.gz
0.0 64468 usr_admin_restricted /usr/local/bin/zsh 
    0.0 64346 usr_admin_restricted python /usr/local/restricted_shell/bin/restricted_shell.py (python3.8)
    0.0 64468 usr_admin_restricted /usr/local/bin/zsh
    0.0 64346 usr_admin_restricted python /usr/local/restricted_shell/bin/restricted_shell.py (python3.8)
    0.0 64342 usr_admin_restricted sshd: usr_admin_restricted@pts/3 (sshd)
    0.0 64331 root               sshd: usr_admin_restricted [priv] (sshd)
    0.0 64468 usr_admin_restricted /usr/local/bin/zsh
    0.0 64346 usr_admin_restricted python /usr/local/restricted_shell/bin/restricted_shell.py (python3.8)
    0.0 64342 usr_admin_restricted sshd: usr_admin_restricted@pts/3 (sshd)
    0.0 64331 root               sshd: usr_admin_restricted [priv] (sshd)
    0.0 64468 usr_admin_restricted /usr/local/bin/zsh
    0.0 64346 usr_admin_restricted python /usr/local/restricted_shell/bin/restricted_shell.py (python3.8)
    0.0 64342 usr_admin_restricted sshd: usr_admin_restricted@pts/3 (sshd)
    0.0 64331 root               sshd: usr_admin_restricted [priv] (sshd)
    0.0 64468 usr_admin_restricted /usr/local/bin/zsh
    0.0 64346 usr_admin_restricted python /usr/local/restricted_shell/bin/restricted_shell.py (python3.8)
    0.0 64342 usr_admin_restricted sshd: u_admin_restricted@pts/3 (sshd)
    0.0 64331 root               sshd: usr_admin_restricted [priv] (sshd)
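The --grep option works in the same way against uncompressed files. For example, to search the auth.log file listed earlier (the search pattern is purely illustrative):

% isi_log_access --grep "Access denied" auth.log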

OneFS recovery shell

The purpose of the recovery shell is to allow a restricted shell user to access a regular UNIX shell and its associated command set, if needed. As such, the recovery shell is primarily designed and intended for reactive cluster recovery operations and other unforeseen support issues. Note that the isi_recovery_shell CLI command can only be run, and the recovery shell entered, from within the restricted shell.

The ISI_PRIV_RECOVERY_SHELL privilege is required for a user to elevate their shell from restricted to recovery. The following syntax can be used to add this privilege to a role, in this case the rl_ssh role:

% isi auth roles modify rl_ssh --add-priv=ISI_PRIV_RECOVERY_SHELL
% isi auth roles view rl_ssh
        Name: rl_ssh
 Description: -
     Members: usr_ssh_restricted
              usr_admin_restricted
  Privileges
              ID: ISI_PRIV_LOGIN_SSH
      Permission: r
             ID: ISI_PRIV_SYS_SUPPORT
      Permission: r
             ID: ISI_PRIV_RECOVERY_SHELL
      Permission: r

However, note that the --restricted-shell-enabled security parameter must be set to true before a user with the ISI_PRIV_RECOVERY_SHELL privilege can enter the recovery shell. For example:

% isi security settings view | grep -i restr
Restricted shell Enabled: No
% isi security settings modify --restricted-shell-enabled=true
% isi security settings view | grep -i restr
Restricted shell Enabled: Yes

The restricted shell user must enter the cluster’s root password to successfully enter the recovery shell. For example:

% isi_recovery_shell -h
 Description:
         This command is used to enter the Recovery shell i.e. normal zsh shell from the PowerScale Restricted shell. This command is supported only in the PowerScale Restricted shell.
Required Privilege:
         ISI_PRIV_RECOVERY_SHELL
Usage:
         isi_recovery_shell
            [{--help | -h}]

If the root password is entered incorrectly, the following error is displayed:

% isi_recovery_shell
 Enter 'root' credentials to enter the Recovery shell
 Password:
 Invalid credentials.
 isi_recovery_shell: PAM Auth Failed

A successful recovery shell launch is as follows:

$ ssh u_admin_restricted@10.246.178.121
 (u_admin_restricted@10.246.178.121) Password:
 Last login: Thu May  4 17:26:10 2023 from 10.246.159.107
 Copyright (c) 2001-2023 Dell Inc. or its subsidiaries. All Rights Reserved.
 Copyright (c) 1992-2018 The FreeBSD Project.
 Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
         The Regents of the University of California. All rights reserved.
PowerScale OneFS 9.5.0.0
Allowed commands are
         clear ...
         isi ...
         isi_recovery_shell ...
         isi_log_access ...
         exit
         logout
% isi_recovery_shell
 Enter 'root' credentials to enter the Recovery shell
 Password:
 %

At this point, regular shell/UNIX commands (including the vi editor) are available again:

% whoami
 u_admin_restricted
% pwd
 /ifs/home/u_admin_restricted
 % top | head -n 10
 last pid: 65044;  load averages:  0.12,  0.24,  0.29  up 24+04:17:23    18:38:39
 118 processes: 1 running, 117 sleeping
 CPU:  0.1% user,  0.0% nice,  0.9% system,  0.1% interrupt, 98.9% idle
 Mem: 233M Active, 19G Inact, 2152K Laundry, 137G Wired, 60G Buf, 13G Free
 Swap:
   PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
  3955 root          1 -22  r30    50M    14M select  24 142:28   0.54% isi_drive_d
  5715 root         20  20    0   231M    69M kqread   5  55:53   0.15% isi_stats_d
  3864 root         14  20    0    81M    21M kqread  16 133:02   0.10% isi_mcp

The specifics of the recovery shell (ZSH) for the u_admin_restricted user are reported as follows:

% printenv
 _=/usr/bin/printenv
 PAGER=less
 SAVEHIST=2000
 HISTFILE=/ifs/home/u_admin_restricted/.zsh_history
 HISTSIZE=1000
 OLDPWD=/ifs/home/u_admin_restricted
 PWD=/ifs/home/u_admin_restricted
 SHLVL=1
 LOGNAME=u_admin_restricted
 HOME=/ifs/home/u_admin_restricted
 RECOVERY_SHELL=TRUE
 TERM=xterm
 PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin:/root/bin

Shell logic conditions and scripts can be run. For example:

% while true; do uptime; sleep 5; done
  5:47PM  up 24 days,  3:26, 5 users, load averages: 0.44, 0.38, 0.34
  5:47PM  up 24 days,  3:26, 5 users, load averages: 0.41, 0.38, 0.34

ISI commands can be run, and cluster management tasks can be performed.

% isi hardening list
 Name  Description                       Status
 ---------------------------------------------------
 STIG  Enable all STIG security settings Not Applied
 ---------------------------------------------------
 Total: 1

For example, creating and deleting a snapshot:

% isi snap snap list
 ID Name Path
 ------------
 ------------
 Total: 0
% isi snap snap create /ifs/data
% isi snap snap list
 ID   Name  Path
 --------------------
 2    s2    /ifs/data
 --------------------
 Total: 1
% isi snap snap delete 2
 Are you sure? (yes/[no]): yes

Sysctls can be read and managed:

% sysctl efs.gmp.group
efs.gmp.group: <10539754> (4) :{ 1:0-14, 2:0-12,14,17, 3-4:0-14, smb: 1-4, nfs: 1-4, all_enabled_protocols: 1-4, isi_cbind_d: 1-4, lsass: 1-4, external_connectivity: 1-4 }

The restricted shell can be disabled:

% isi security settings modify --restricted-shell-enabled=false
% isi security settings view | grep -i restr
 Restricted shell Enabled: No

However, the isi underscore (isi_*) commands, such as isi_for_array, are still not permitted to run:

% /usr/bin/isi_for_array -s uptime
 zsh: permission denied: /usr/bin/isi_for_array
% isi_gather_info
 zsh: permission denied: isi_gather_info
% isi_cstats
 isi_cstats: Syscall ifs_prefetch_lin() failed: Operation not permitted

When finished, the user can either end the session entirely with the logout command or quit the recovery shell through exit and return to the restricted shell:

% exit
Allowed commands are
         clear ...
         isi ...
         isi_recovery_shell ...
         isi_log_access ...
         exit
         logout
 %

 
Author: Nick Trimbee

 

Read Full Blog
  • security
  • PowerScale
  • OneFS

OneFS Restricted Shell

Nick Trimbee Nick Trimbee

Tue, 27 Jun 2023 19:59:59 -0000

|

Read Time: 0 minutes

In contrast to many other storage appliances, PowerScale has always included an extensive, rich, and capable command line, drawing from its FreeBSD heritage. As such, it incorporates a choice of full UNIX shells (that is, ZSH), the ability to script in a variety of languages (Perl, Python, and so on), full data access, a variety of system and network management and monitoring tools, plus the comprehensive OneFS isi command set. However, what is a bonus for usability can also present a risk from a security point of view.

With this in mind, among the bevy of security features that debuted in OneFS 9.5 release is the addition of a restricted shell for the CLI. This shell heavily curtails access to cluster command line utilities, eliminating areas where commands and scripts could be run and files modified maliciously and unaudited.

The new restricted shell can help both public and private sector organizations to meet a variety of regulatory compliance and audit requirements, in addition to reducing the security threat surface when OneFS is administered.

 

Written in Python, the restricted shell constrains users to a tight subset of the commands available in the regular OneFS command line shells, plus a couple of additional utilities. These include:

  • ISI commands: The isi or “isi space” commands, such as isi status, and so on. For the full set of isi commands, run isi --help.
  • Shell commands: The supported shell commands include clear, exit, logout, and CTRL+D.
  • Log access: The isi_log_access tool can be used if the user possesses the ISI_PRIV_SYS_SUPPORT privilege.
  • Recovery shell: The recovery shell isi_recovery_shell can be used if the user possesses the ISI_PRIV_RECOVERY_SHELL privilege and the Restricted shell Enabled security setting is configured to true.

For a OneFS CLI command to be audited, its handler needs to call through the platform API (pAPI). This occurs with the regular isi commands but not necessarily with the “isi underscore” commands such as isi_for_array, and so on. While some of these isi_* commands write to log files, there is no uniform or consistent auditing or logging.

On the data access side, /ifs file system auditing works through the various OneFS protocol heads (NFS, SMB, S3, and so on). So if the CLI is used with an unrestricted shell to directly access and modify /ifs, any access and changes are unrecorded and unaudited.

In OneFS 9.5, the new restricted shell is included in the permitted shells list (/etc/shells):

# grep -i restr /etc/shells
/usr/local/restricted_shell/bin/restricted_shell.py

It can be easily set for a user through the CLI. For example, to configure the admin account to use the restricted shell, instead of its default of ZSH:

# isi auth users view admin | grep -i shell
                   Shell: /usr/local/bin/zsh
# isi auth users modify admin --shell=/usr/local/restricted_shell/bin/restricted_shell.py
# isi auth users view admin | grep -i shell
                   Shell: /usr/local/restricted_shell/bin/restricted_shell.py

OneFS can also be configured to limit non-root users to just the restricted shell:

# isi security settings view | grep -i restr
  Restricted shell Enabled: No
# isi security settings modify --restricted-shell-enabled=true
# isi security settings view | grep -i restr
  Restricted shell Enabled: Yes

The underlying configuration changes to support this include only allowing non-root users with approved shells in /etc/shells to log in through the console or SSH and having just /usr/local/restricted_shell/bin/restricted_shell.py in the /etc/shells config file.

Note that no users’ shells are changed when the configuration commands above are enacted. If users are intended to have shell access, their login shell must be changed before they can log in. Users will also require the privileges ISI_PRIV_LOGIN_SSH and/or ISI_PRIV_LOGIN_CONSOLE to be able to log in through SSH and the console, respectively.

While the WebUI in OneFS 9.5 does not provide a restricted shell configuration page, the restricted shell can be enabled from the platform API, in addition to the CLI. The pAPI security settings now include a restricted_shell_enabled key, which can be enabled by setting its value to 1, from its default of 0.
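As a rough sketch of driving this through the platform API (the endpoint path, cluster address, and credentials below are assumptions for illustration; consult the OneFS API reference for the exact URI and version):

# curl -k -u admin -X PUT "https://cluster.example.com:8080/platform/security/settings" -H "Content-Type: application/json" -d '{"restricted_shell_enabled": 1}'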

Be aware that, upon configuring a OneFS 9.5 cluster to run in hardened mode with the STIG profile (that is, isi hardening enable STIG), the restricted-shell-enabled security setting is automatically set to true. This means that only root and users with ISI_PRIV_LOGIN_SSH and/or ISI_PRIV_LOGIN_CONSOLE privileges and the restricted shell as their shell will be permitted to log in to the cluster. We will focus on OneFS security hardening in a future article.
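The cluster’s current hardening state can be confirmed at any time from the CLI. For example:

# isi hardening list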

So let’s take a look at some examples of the restricted shell’s configuration and operation. 

First, we log in as the admin user and modify the file and local auth provider password hash types to the more secure SHA512 from their default value of NTHash:

# ssh 10.244.34.34 -l admin
# isi auth file view System | grep -i hash
     Password Hash Type: NTHash
# isi auth local view System | grep -i hash
      Password Hash Type: NTHash
# isi auth file modify System --password-hash-type=SHA512
# isi auth local modify System --password-hash-type=SHA512

Note that a cluster’s default user admin uses role-based access control (RBAC), whereas root does not. As such, the root account should ideally be used as infrequently as possible and, ideally, considered solely as the account of last resort.

Next, the admin and root passwords are changed to generate new passwords using the SHA512 hash:

# isi auth users change-password root
# isi auth users change-password admin

An rl_ssh role is created and the SSH access privilege is added to it:

# isi auth roles create rl_ssh
# isi auth roles modify rl_ssh --add-priv=ISI_PRIV_LOGIN_SSH

Then a regular user (usr_ssh_restricted) and an admin user (usr_admin_restricted) are created with restricted shell privileges:

# isi auth users create usr_ssh_restricted --shell=/usr/local/restricted_shell/bin/restricted_shell.py --set-password
# isi auth users create usr_admin_restricted --shell=/usr/local/restricted_shell/bin/restricted_shell.py --set-password

We then assign roles to the new users. For the restricted SSH user, we add to our newly created rl_ssh role:

# isi auth roles modify rl_ssh --add-user=usr_ssh_restricted

The admin user is then added to the security admin and the system admin roles:

# isi auth roles modify SecurityAdmin --add-user=usr_admin_restricted
# isi auth roles modify SystemAdmin --add-user=usr_admin_restricted

Next, we connect to the cluster through SSH and authenticate as the usr_ssh_restricted user:

$ ssh usr_ssh_restricted@10.246.178.121
 (usr_ssh_restricted@10.246.178.121) Password:
 Copyright (c) 2001-2023 Dell Inc. or its subsidiaries. All Rights Reserved.
 Copyright (c) 1992-2018 The FreeBSD Project.
 Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
         The Regents of the University of California. All rights reserved.
 PowerScale OneFS 9.5.0.0
Allowed commands are
         clear ...
         isi ...
         isi_recovery_shell ...
         isi_log_access ...
         exit
         logout
%

This account has no cluster RBAC privileges beyond SSH access, so it cannot run the various isi commands. For example, attempting to run isi status returns no data and, instead, warns of the need for event, job engine, and statistics privileges:

% isi status
Cluster Name: h7001
 __
 *** Capacity and health information require ***
 ***   the privilege: ISI_PRIV_STATISTICS.   ***
Critical Events:
*** Requires the privilege: ISI_PRIV_EVENT. ***
Cluster Job Status:
 __
*** Requires the privilege: ISI_PRIV_JOB_ENGINE. ***
Allowed commands are
         clear ...
         isi ...
         isi_recovery_shell ...
         isi_log_access ...
         exit
         logout
%

Similarly, standard UNIX shell commands, such as pwd and whoami, are also prohibited:

% pwd
Allowed commands are
        clear ...
        isi ...
        isi_recovery_shell ...
        isi_log_access ...
        exit
        logout
% whoami
Allowed commands are
        clear ...
        isi ...
        isi_recovery_shell ...
        isi_log_access ...
        exit
        logout


Indeed, without additional OneFS RBAC privileges, the only commands the usr_ssh_restricted user can actually run in the restricted shell are clear, exit, and logout.

Note that the restricted shell automatically logs out an inactive session after a short period of inactivity.

Next, we log in in with the usr_admin_restricted account:

$ ssh usr_admin_restricted@10.246.178.121
(usr_admin_restricted@10.246.178.121) Password:
Copyright (c) 2001-2023 Dell Inc. or its subsidiaries. All Rights Reserved.
Copyright (c) 1992-2018 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
PowerScale OneFS 9.5.0.0
Allowed commands are
         clear ...
         isi ...
         isi_recovery_shell ...
         isi_log_access ...
         exit
         logout
 %

The isi commands now work because the user has the SecurityAdmin and SystemAdmin roles and privileges:

% isi auth roles list
Name
---------------
AuditAdmin
BackupAdmin
BasicUserRole
SecurityAdmin
StatisticsAdmin
SystemAdmin
VMwareAdmin
rl_console
rl_ssh
---------------
Total: 9
Allowed commands are
        clear ...
        isi ...
        isi_recovery_shell ...
        isi_log_access ...
        exit
        logout
% isi auth users view usr_admin_restricted
                    Name: usr_admin_restricted
                      DN: CN=usr_admin_restricted,CN=Users,DC=H7001
              DNS Domain: -
                  Domain: H7001
                Provider: lsa-local-provider:System
        Sam Account Name: usr_admin_restricted
                     UID: 2003
                     SID: S-1-5-21-3745626141-289409179-1286507423-1003
                 Enabled: Yes
                 Expired: No
                   Expiry: -
                  Locked: No
                   Email: -
                   GECOS: -
           Generated GID: No
           Generated UID: No
           Generated UPN: Yes
           Primary Group
                          ID: GID:1800
                        Name: Isilon Users
          Home Directory: /ifs/home/usr_admin_restricted
        Max Password Age: 4W
        Password Expired: No
         Password Expiry: 2023-05-30T17:16:53
       Password Last Set: 2023-05-02T17:16:53
        Password Expires: Yes
              Last Logon: -
                   Shell: /usr/local/restricted_shell/bin/restricted_shell.py
                     UPN: usr_admin_restricted@H7001
User Can Change Password: Yes
   Disable When Inactive: No
Allowed commands are
        clear ...
        isi ...
        isi_recovery_shell ...
        isi_log_access ...
        exit
        logout
%

However, the OneFS “isi underscore” commands are not supported under the restricted shell. For example, attempting to use the isi_for_array command:

% isi_for_array -s uname -a
Allowed commands are
        clear ...
        isi ...
        isi_recovery_shell ...
        isi_log_access ...
        exit
        logout

Note that, by default, the SecurityAdmin and SystemAdmin roles do not grant the usr_admin_restricted user the privileges needed to run the new isi_log_access and isi_recovery_shell commands.

In the next article in this series, we’ll take a look at these associated isi_log_access and isi_recovery_shell utilities that are also introduced in OneFS 9.5.

Author: Nick Trimbee

 

Read Full Blog
  • PowerScale
  • OneFS
  • troubleshooting
  • firewall

OneFS Firewall Management and Troubleshooting

Nick Trimbee Nick Trimbee

Thu, 25 May 2023 14:41:59 -0000

|

Read Time: 0 minutes

In the final blog in this series, we’ll focus on step five of the OneFS firewall provisioning process and turn our attention to some of the management and monitoring considerations and troubleshooting tools associated with the firewall.

One can manage and monitor the firewall in OneFS 9.5 using the CLI, platform API, or WebUI. Because data security threats come from inside an environment as well as out, such as from a rogue IT employee, a good practice is to constrain the use of all-powerful ‘root’, ‘administrator’, and ‘sudo’ accounts as much as possible. Instead of granting cluster admins full rights, a preferred approach is to use OneFS’ comprehensive authentication, authorization, and accounting framework.

OneFS role-based access control (RBAC) can be used to explicitly limit who has access to configure and monitor the firewall. A cluster security administrator selects the desired access zone, creates a zone-aware role within it, assigns privileges, and then assigns members. For example, from the WebUI under Access > Membership and roles > Roles:

When these members log in to the cluster from a configuration interface (WebUI, platform API, or CLI), they inherit their assigned privileges.

Accessing the firewall from the WebUI and CLI in OneFS 9.5 requires the new ISI_PRIV_FIREWALL administration privilege.

# isi auth privileges -v | grep -i -A 2 firewall
         ID: ISI_PRIV_FIREWALL
Description: Configure network firewall
       Name: Firewall
   Category: Configuration
 Permission: w

This privilege can be assigned one of four permission levels for a role, including:

  • (no indicator) – No permission.
  • R – Read-only permission.
  • X – Execute permission.
  • W – Write permission.

By default, the built-in ‘SystemAdmin’ role is granted write privileges to administer the firewall, while the built-in ‘AuditAdmin’ role has read permission to view the firewall configuration and logs.
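These defaults can be verified from the CLI. For example, to confirm the SystemAdmin role’s firewall permission:

# isi auth roles view SystemAdmin | grep -A2 -i firewall
             ID: ISI_PRIV_FIREWALL
      Permission: w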

With OneFS RBAC, an enhanced security approach for a site could be to create two additional roles on a cluster, each with an increasing realm of trust. For example:

1.  An IT ops/helpdesk role with ‘read’ access to the firewall attributes would permit monitoring and troubleshooting of the firewall, but no changes:

  • RBAC role: IT_Ops
  • Firewall privilege: ISI_PRIV_FIREWALL
  • Permission: Read

For example:

# isi auth roles create IT_Ops
# isi auth roles modify IT_Ops --add-priv-read ISI_PRIV_FIREWALL
# isi auth roles view IT_Ops | grep -A2 -i firewall
             ID: ISI_PRIV_FIREWALL
      Permission: r

2.  A Firewall Admin role would provide full firewall configuration and management rights:

  • RBAC role: FirewallAdmin
  • Firewall privilege: ISI_PRIV_FIREWALL
  • Permission: Write

For example:

# isi auth roles create FirewallAdmin
# isi auth roles modify FirewallAdmin --add-priv-write ISI_PRIV_FIREWALL
# isi auth roles view FirewallAdmin | grep -A2 -i firewall
             ID: ISI_PRIV_FIREWALL
      Permission: w

Note that when configuring OneFS RBAC, remember to remove the ‘ISI_PRIV_AUTH’ and ‘ISI_PRIV_ROLE’ privileges from all but the most trusted administrators.
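For example, a brief sketch (assuming the ‘--remove-priv’ option is available in your OneFS release) of stripping these privileges from the less-trusted IT_Ops role created earlier:

# isi auth roles modify IT_Ops --remove-priv ISI_PRIV_AUTH
# isi auth roles modify IT_Ops --remove-priv ISI_PRIV_ROLE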

Additionally, enterprise security management tools such as CyberArk can also be incorporated to manage authentication and access control holistically across an environment. These can be configured to change passwords on trusted accounts frequently (every hour or so), require multi-level approvals prior to retrieving passwords, and track and audit password requests and trends.

OneFS firewall limits

When working with the OneFS firewall, there are some upper bounds to the configurable attributes to keep in mind. These include:

  • MAX_INTERFACES (500) – Maximum number of L2 interfaces (Ethernet, VLAN, and LAGG) on a node.
  • MAX_SUBNETS (100) – Maximum number of subnets within a OneFS cluster.
  • MAX_POOLS (100) – Maximum number of network pools within a OneFS cluster.
  • DEFAULT_MAX_RULES (100) – Default value of maximum rules within a firewall policy.
  • MAX_RULES (200) – Upper limit of maximum rules within a firewall policy.
  • MAX_ACTIVE_RULES (5000) – Upper limit of total active rules across the whole cluster.
  • MAX_INACTIVE_POLICIES (200) – Maximum number of policies that are not applied to any network subnet or pool. These are not written into the ipfw table.

Firewall performance

Be aware that, while the OneFS firewall can greatly enhance the network security of a cluster, by nature of its packet inspection and filtering activity, it does come with a slight performance penalty (generally less than 5%).

Firewall and hardening mode

If OneFS STIG hardening (that is, from ‘isi hardening apply’) is applied to a cluster with the OneFS firewall disabled, the firewall will be automatically activated. On the other hand, if the firewall is already enabled, then there will be no change and it will remain active.
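For example, the firewall’s global state can be confirmed after hardening has been applied:

# isi network firewall settings view
Enabled: True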

Firewall and user-configurable ports

Some OneFS services allow the TCP/UDP ports on which the daemon listens to be changed. These include:

  • NDMP – isi ndmp settings global modify --port (default port 10000)
  • S3 – isi s3 settings global modify --https-port (default ports 9020, 9021)
  • SSH – isi ssh settings modify --port (default port 22)

The default ports for these services are already configured in the associated global policy rules. For example, for the S3 protocol:

# isi network firewall rules list | grep s3
default_pools_policy.rule_s3                  55     Firewall rule on s3 service                                                              allow
# isi network firewall rules view default_pools_policy.rule_s3
          ID: default_pools_policy.rule_s3
        Name: rule_s3
       Index: 55
 Description: Firewall rule on s3 service
    Protocol: TCP
   Dst Ports: 9020, 9021
Src Networks: -
   Src Ports: -
      Action: allow

Note that the global policies, or any custom policies, do not auto-update if these ports are reconfigured. This means that the firewall policies must be manually updated when changing ports. For example, if the NDMP port is changed from 10000 to 10001:

# isi ndmp settings global view
                       Service: False
                           Port: 10000
                            DMA: generic
          Bre Max Num Contexts: 64
MSB Context Retention Duration: 300
MSR Context Retention Duration: 600
        Stub File Open Timeout: 15
             Enable Redirector: False
              Enable Throttler: False
       Throttler CPU Threshold: 50
# isi ndmp settings global modify --port 10001
# isi ndmp settings global view | grep -i port
                           Port: 10001

The firewall’s NDMP rule port configuration must also be reset to 10001:

# isi network firewall rule list | grep ndmp
default_pools_policy.rule_ndmp                44     Firewall rule on ndmp service                                                            allow
# isi network firewall rule modify default_pools_policy.rule_ndmp --dst-ports 10001 --live
# isi network firewall rule view default_pools_policy.rule_ndmp | grep -i dst
   Dst Ports: 10001

Note that the --live flag is specified to enact this port change immediately.

Firewall and source-based routing

Under the hood, OneFS source-based routing (SBR) and the OneFS firewall both leverage ‘ipfw’. As such, SBR and the firewall share the single ipfw table in the kernel. However, the two features use separate ipfw table partitions.

This allows SBR and the firewall to be activated independently of each other. For example, even if the firewall is disabled, SBR can still be enabled and any configured SBR rules displayed as expected (that is, using ipfw set 0 show).
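For example, a brief sketch (assuming the ‘--sbr’ option on ‘isi network external modify’ is available in your OneFS release) of enabling SBR while the firewall remains disabled, and then inspecting SBR’s partition of the ipfw table:

# isi network external modify --sbr true
# ipfw set 0 show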

Firewall and IPv6

Note that the firewall’s global default policies have a rule allowing ICMP6 by default. For IPv6 enabled networks, ICMP6 is critical for the functioning of NDP (Neighbor Discovery Protocol). As such, when creating custom firewall policies and rules for IPv6-enabled network subnets/pools, be sure to add a rule allowing ICMP6 to support NDP. As discussed in a previous blog, an alternative (and potentially easier) approach is to clone a global policy to a new one and just customize its ruleset instead.
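For example, a minimal sketch (the policy name is hypothetical) of cloning the default pools policy as a starting point for an IPv6-enabled pool, which preserves its existing ICMP6 rule:

# isi network firewall policies clone default_pools_policy ipv6_pools_policy
# isi network firewall policies modify ipv6_pools_policy --add-pools groupnet0.subnet0.pool0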

Firewall and FTP

The OneFS FTP service can work in two modes: Active and Passive. Passive mode is the default, where FTP data connections are created on top of random ephemeral ports. However, because the OneFS firewall requires fixed ports to operate, it only supports the FTP service in Active mode. Attempts to enable the firewall with FTP running in Passive mode will generate the following warning:

# isi ftp settings view | grep -i active
          Active Mode: No
# isi network firewall settings modify --enabled yes
FTP service is running in Passive mode. Enabling network firewall will lead to FTP clients having their connections blocked. To avoid this, please enable FTP active mode and ensure clients are configured in active mode before retrying. Are you sure you want to proceed and enable network firewall? (yes/[no]):

To activate the OneFS firewall in conjunction with the FTP service, first ensure that the FTP service is running in Active mode before enabling the firewall. For example:

# isi ftp settings view | grep -i enable
  FTP Service Enabled: Yes
# isi ftp settings view | grep -i active
          Active Mode: No
# isi ftp settings modify --active-mode true
# isi ftp settings view | grep -i active
          Active Mode: Yes
# isi network firewall settings modify --enabled yes

Note: Verify FTP active mode support and/or firewall settings on the client side, too.

Firewall monitoring and troubleshooting

When it comes to monitoring the OneFS firewall, the following logfiles and utilities provide a variety of information and are a good source to start investigating an issue:

  • /var/log/isi_firewall_d.log – Main OneFS firewall log file, which includes information from the firewall daemon.
  • /var/log/isi_papi_d.log – Log file for the platform API, including firewall-related handlers.
  • isi_gconfig -t firewall – CLI command that displays all firewall configuration info.
  • ipfw show – CLI command that displays the ipfw table residing in the FreeBSD kernel.

Note that the preceding files and command output are automatically included in logsets generated by the ‘isi_gather_info’ data collection tool.

You can run the isi_gconfig command with the ‘-q’ flag to identify any values that are not at their default settings. For example, the stock (default) isi_firewall_d gconfig context will not report any configuration entries:

# isi_gconfig -q -t firewall
[root] {version:1}

The firewall can also be run in the foreground for additional active rule reporting and debug output. For example, first shut down the isi_firewall_d service:

# isi services -a isi_firewall_d disable
The service 'isi_firewall_d' has been disabled.

Next, start up the firewall with the ‘-f’ flag.

# isi_firewall_d -f
Acquiring kevents for flxconfig
Acquiring kevents for nodeinfo
Acquiring kevents for firewall config
Initialize the firewall library
Initialize the ipfw set
ipfw: Rule added by ipfw is for temporary use and will be auto flushed soon. Use isi firewall instead.
cmd:/sbin/ipfw set enable 0 normal termination, exit code:0
isi_firewall_d is now running
Loaded master FlexNet config (rev:312)
Update the local firewall with changed files: flx_config, Node info, Firewall config
Start to update the firewall rule...
flx_config version changed!                              latest_flx_config_revision: new:312, orig:0
node_info version changed!                               latest_node_info_revision: new:1, orig:0
firewall gconfig version changed!                                latest_fw_gconfig_revision: new:17, orig:0
Start to update the firewall rule for firewall configuration (gconfig)
Start to handle the firewall configure (gconfig)
Handle the firewall policy default_pools_policy
ipfw: Rule added by ipfw is for temporary use and will be auto flushed soon. Use isi firewall instead.
32043 allow tcp from any to any 10000 in
cmd:/sbin/ipfw add 32043 set 8 allow TCP from any  to any 10000 in  normal termination, exit code:0
ipfw: Rule added by ipfw is for temporary use and will be auto flushed soon. Use isi firewall instead.
32044 allow tcp from any to any 389,636 in
cmd:/sbin/ipfw add 32044 set 8 allow TCP from any  to any 389,636 in  normal termination, exit code:0
Snip...

If the OneFS firewall is enabled and some network traffic is blocked, either this or the ipfw show CLI command will often provide the first clues.

Please note that the ipfw command should NEVER be used to modify the OneFS firewall table!

For example, say a rule is added to the default pools policy denying traffic on port 9876 from all source networks (0.0.0.0/0):

# isi network firewall rules create default_pools_policy.rule_9876 --index=100 --dst-ports 9876 --src-networks 0.0.0.0/0 --action deny --live
# isi network firewall rules view default_pools_policy.rule_9876
          ID: default_pools_policy.rule_9876
        Name: rule_9876
       Index: 100
 Description:
    Protocol: ALL
   Dst Ports: 9876
Src Networks: 0.0.0.0/0
   Src Ports: -
      Action: deny

Running ipfw show and grepping for the port will show this new rule:

# ipfw show | grep 9876
32099            0               0 deny ip from any to any 9876 in

The ipfw show command output also reports how many IP packets have matched each rule. This can be incredibly useful when investigating firewall issues. For example, a telnet session is initiated to the cluster on port 9876 from a client:

# telnet 10.224.127.8 9876
Trying 10.224.127.8...
telnet: connect to address 10.224.127.8: Operation timed out
telnet: Unable to connect to remote host

The connection attempt will time out because the port 9876 ‘deny’ rule will silently drop the packets. At the same time, the ipfw show command will increment its counter to report on the denied packets. For example:

# ipfw show | grep 9876
32099            9             540 deny ip from any to any 9876 in

If this behavior is not anticipated or desired, you can find the rule name by searching the rules list for the port number, in this case port 9876:

# isi network firewall rules list | grep 9876
default_pools_policy.rule_9876                100                                                                 deny

The offending rule can then be reverted to ‘allow’ traffic on port 9876:

# isi network firewall rules modify default_pools_policy.rule_9876 --action allow --live

Or easily deleted, if preferred:

# isi network firewall rules delete default_pools_policy.rule_9876 --live
Are you sure you want to delete firewall rule default_pools_policy.rule_9876? (yes/[no]): yes

Author: Nick Trimbee




Read Full Blog
  • Isilon
  • PowerScale
  • OneFS
  • APEX

Running PowerScale OneFS in Cloud - APEX File Storage for AWS

Lieven Lin Lieven Lin

Wed, 28 Feb 2024 20:58:19 -0000

|

Read Time: 0 minutes

PowerScale OneFS 9.6 now brings a new offering in AWS cloud — APEX File Storage for AWS. APEX File Storage for AWS is a software-defined cloud file storage service that provides high-performance, flexible, secure, and scalable file storage for AWS environments. It is a fully customer managed service that is designed to meet the needs of enterprise-scale file workloads running on AWS.

Benefits of running OneFS in Cloud

APEX File Storage for AWS brings the OneFS distributed file system software into the public cloud, allowing users to have the same management experience in the cloud as with their on-premises PowerScale appliance.

With APEX File Storage for AWS, you can easily deploy and manage file storage on AWS, without the need for hardware or software management. The service provides a scalable and elastic storage infrastructure that can grow or shrink, according to your actual business needs.

Some of the key features and benefits of APEX File Storage for AWS include:

  • Scale-out: APEX File Storage for AWS is powered by the Dell PowerScale OneFS distributed file system. You can start with a small OneFS cluster and then expand it incrementally as your data storage requirements grow.  
  • Data management: APEX File Storage for AWS provides powerful data management capabilities, such as snapshot, data replication, and backup and restore. Because OneFS features are the same in the cloud as in on-premises, organizations can simplify operations and reduce management complexity with a consistent user experience.
  • Simplified journey to hybrid cloud: More and more organizations operate in a hybrid cloud environment, where they need to move data between on-premises and cloud-based environments. APEX File Storage for AWS can help you bridge this gap by facilitating seamless data mobility between on-premises and the cloud with native replication and by providing a consistent data management platform across both environments. Once in the cloud, customers can take advantage of enterprise-class OneFS features such as multi-protocol support, CloudPools, data reduction, and snapshots, to run their workloads in the same way as they do on-premises. APEX File Storage for AWS can use CloudPools to tier cold or infrequently accessed data to lower cost cloud storage, such as AWS S3 object storage. CloudPools extends the OneFS namespace to the private/public cloud and allows you to store much more data than the usable cluster capacity.
  • High performance: APEX File Storage for AWS delivers high-performance file storage with low-latency access to data, ensuring that you can access data quickly and efficiently.

Architecture

The architecture of APEX File Storage for AWS is based on the OneFS distributed file system, which consists of multiple cluster nodes to provide a single global namespace. Each cluster node is an instance of OneFS software that runs on an AWS EC2 instance and provides storage capacity and compute resources. The following diagram shows the architecture of APEX File Storage for AWS.

  • Availability zone: APEX File Storage for AWS is designed to run in a single AWS availability zone to get the best performance.
  • Virtual Private Cloud (VPC): APEX File Storage for AWS requires an AWS VPC to provide network connectivity.
  • OneFS cluster internal subnet: The cluster nodes communicate with each other through the internal subnet, which must be isolated from instances that are not in the cluster. A dedicated subnet is therefore required for the cluster nodes’ internal network interfaces, and it must not be shared with other EC2 instances.
  • OneFS cluster external subnet: The cluster nodes communicate with clients through the external subnet by using different protocols, such as NFS, SMB, and S3.
  • OneFS cluster internal network interfaces: Network interfaces that are located in the internal subnet.
  • OneFS cluster external network interfaces: Network interfaces that are located in the external subnet.
  • OneFS cluster internal security group: The security group applies to the cluster internal network interfaces and allows all traffic between the cluster nodes’ internal network interfaces only.
  • OneFS cluster external security group: The security group applies to cluster external network interfaces and allows specific ingress traffic from clients.
  • Elastic Compute Cloud (EC2) instance nodes: Cluster nodes that run the OneFS filesystem backed by Elastic Block Store (EBS) volumes and that provide network bandwidth.

 

Supported cluster configuration

APEX File Storage for AWS provides two types of cluster configurations:

  • Solid State Drive (SSD) cluster: APEX File Storage for AWS supports clusters backed by General Purpose SSD (gp3) EBS volumes with up to 1PiB cluster raw capacity. The gp3 EBS volumes are the latest generation of General Purpose SSD volumes, and the lowest cost SSD volume offered by AWS EBS. They balance price and performance for a wide variety of workloads.

Configuration items and supported options:

  • Cluster size: 4 to 6 nodes
  • EC2 instance type: m5dn.8xlarge, m5dn.12xlarge, m5dn.16xlarge, or m5dn.24xlarge (all nodes in a cluster must be the same instance size). See Amazon EC2 m5 instances for more details.
  • EBS volume (disk) type: gp3
  • EBS volume (disk) counts per node: 5, 6, 10, 12, 15, 18, or 20
  • Single EBS volume sizes: 1TiB - 16TiB
  • Cluster raw capacity: 24TiB - 1PiB
  • Cluster protection level: +2n

  • Hard Disk Drive (HDD) cluster: APEX File Storage for AWS supports clusters backed by Throughput Optimized HDD (st1) EBS volumes with up to 360TiB cluster raw capacity. The st1 EBS volumes provide low-cost magnetic storage that defines performance in terms of throughput rather than IOPS. This volume type is a good fit for large sequential workloads.

Configuration items and supported options:

  • Cluster size: 4 to 6 nodes
  • EC2 instance type: m5dn.8xlarge, m5dn.12xlarge, m5dn.16xlarge, or m5dn.24xlarge (all nodes in a cluster must be the same instance size). See Amazon EC2 m5 instances for more details.
  • EBS volume (disk) type: st1
  • EBS volume (disk) counts per node: 5 or 6
  • Single EBS volume sizes: 4TiB or 10TiB
  • Cluster raw capacity: 80TiB - 360TiB
  • Cluster protection level: +2n

APEX File Storage for AWS can deliver 10 GB/s sequential read and 4 GB/s sequential write performance as the cluster size grows. To learn more about APEX File Storage for AWS, see the product documentation.

Author: Lieven Lin


Read Full Blog
  • security
  • PowerScale
  • OneFS

OneFS Firewall Configuration–Part 2

Nick Trimbee Nick Trimbee

Wed, 17 May 2023 19:13:33 -0000

|

Read Time: 0 minutes

In the previous article in this OneFS firewall series, we reviewed the upgrade, activation, and policy selection components of the firewall provisioning process.

Now, we turn our attention to the firewall rule configuration step of the process.

As stated previously, role-based access control (RBAC) explicitly limits who has access to manage the OneFS firewall. So, ensure that the user account that will be used to enable and configure the OneFS firewall belongs to a role with the ‘ISI_PRIV_FIREWALL’ write privilege.

4. Configuring Firewall Rules

When the desired policy is created, the next step is to configure the rules. Clearly, the first step here is to decide which ports and services need securing or opening, beyond the defaults.

The following CLI syntax returns a list of all the firewall’s default services, plus their respective ports, protocols, and aliases, sorted by ascending port number:

# isi network firewall services list
Service Name     Port  Protocol   Aliases
---------------------------------------------
ftp-data         20    TCP        -
ftp              21    TCP        -
ssh              22    TCP        -
smtp             25    TCP        -
dns              53    TCP        domain
                       UDP
http             80    TCP        www
                                  www-http
kerberos         88    TCP        kerberos-sec
                       UDP
rpcbind          111   TCP        portmapper
                       UDP        sunrpc
                                  rpc.bind
ntp              123   UDP        -
dcerpc           135   TCP        epmap
                       UDP        loc-srv
netbios-ns       137   UDP        -
netbios-dgm      138   UDP        -
netbios-ssn      139   UDP        -
snmp             161   UDP        -
snmptrap         162   UDP        snmp-trap
mountd           300   TCP        nfsmountd
                       UDP
statd            302   TCP        nfsstatd
                       UDP
lockd            304   TCP        nfslockd
                       UDP
nfsrquotad       305   TCP        -
                       UDP
nfsmgmtd         306   TCP        -
                       UDP
ldap             389   TCP        -
                       UDP
https            443   TCP        -
smb              445   TCP        microsoft-ds
hdfs-datanode    585   TCP        -
asf-rmcp         623   TCP        -
                       UDP
ldaps            636   TCP        sldap
asf-secure-rmcp  664   TCP        -
                       UDP
ftps-data        989   TCP        -
ftps             990   TCP        -
nfs              2049  TCP        nfsd
                       UDP
tcp-2097         2097  TCP        -
tcp-2098         2098  TCP        -
tcp-3148         3148  TCP        -
tcp-3149         3149  TCP        -
tcp-3268         3268  TCP        -
tcp-3269         3269  TCP        -
tcp-5667         5667  TCP        -
tcp-5668         5668  TCP        -
isi_ph_rpcd      6557  TCP        -
isi_dm_d         7722  TCP        -
hdfs-namenode    8020  TCP        -
isi_webui        8080  TCP        apache2
webhdfs          8082  TCP        -
tcp-8083         8083  TCP        -
ambari-handshake 8440  TCP        -
ambari-heartbeat 8441  TCP        -
tcp-8443         8443  TCP        -
tcp-8470         8470  TCP        -
s3-http          9020  TCP        -
s3-https         9021  TCP        -
isi_esrs_d       9443  TCP        -
ndmp             10000 TCP        -
cee              12228 TCP        -
nfsrdma          20049 TCP        -
                       UDP
tcp-28080        28080 TCP        -
---------------------------------------------
Total: 55
Similarly, the following CLI command generates a list of existing rules and their associated policies, sorted in alphabetical order. For example, to show the first five rules:

# isi network firewall rules list --limit 5
ID                                             Index  Description                                                    Action
----------------------------------------------------------------------------------------------------------------------------
default_pools_policy.rule_ambari_handshake     41     Firewall rule on ambari-handshake service                      allow
default_pools_policy.rule_ambari_heartbeat     42     Firewall rule on ambari-heartbeat service                      allow
default_pools_policy.rule_catalog_search_req   50     Firewall rule on service for global catalog search requests    allow
default_pools_policy.rule_cee                  52     Firewall rule on cee service                                   allow
default_pools_policy.rule_dcerpc_tcp           18     Firewall rule on dcerpc(TCP) service                           allow
----------------------------------------------------------------------------------------------------------------------------
Total: 5

Both the ‘isi network firewall rules list’ and ‘isi network firewall services list’ commands also have a ‘-v’ verbose option, and can return their output in csv, list, table, or json format via the ‘--format’ flag.
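For example, a quick sketch, assuming the standard OneFS ‘--format’ option, of retrieving the services list as JSON:

# isi network firewall services list --format json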

To view the detailed info for a given firewall rule, in this case the default SMB rule, use the following CLI syntax:

# isi network firewall rules view default_pools_policy.rule_smb
          ID: default_pools_policy.rule_smb
        Name: rule_smb
       Index: 3
 Description: Firewall rule on smb service
    Protocol: TCP
   Dst Ports: smb
Src Networks: -
   Src Ports: -
      Action: allow

Existing rules can be modified and new rules created and added into an existing firewall policy with the ‘isi network firewall rules create’ CLI syntax. Command options include:

  • --action – The action to take on matching packets: allow (pass packets), deny (silently drop packets), or reject (reply with an ICMP error code).
  • id – Specifies the ID of the new rule to create. The rule must be added to an existing policy. The ID can be up to 32 alphanumeric characters long and can include underscores or hyphens, but cannot include spaces or other punctuation. Specify the rule ID in the format <policy_name>.<rule_name>. The rule name must be unique in the policy.
  • --index – The rule index in the policy. The valid value is between 1 and 99, with lower values having higher priority. If not specified, the rule is automatically assigned the next available index (before the default rule at index 100).
  • --live – Must only be used when creating, modifying, or deleting a rule in an active policy. Such changes take effect immediately on all network subnets and pools associated with the policy. Using the --live option on a rule in an inactive policy will be rejected, and an error message will be returned.
  • --protocol – Specifies the protocol matched for the inbound packets. Available values are tcp, udp, icmp, and all. If not configured, the default protocol all is used.
  • --dst-ports – Specifies the network ports/services provided by the storage system, identified by destination port(s). The protocol specified by --protocol is applied to these destination ports.
  • --src-networks – Specifies one or more IP addresses with corresponding netmasks that are to be allowed by this firewall rule. The correct format for this parameter is address/netmask, similar to “192.0.2.128/25”. Separate multiple address/netmask pairs with commas. Use the value 0.0.0.0/0 for “any”.
  • --src-ports – Specifies the network ports/services identified by source port(s). The protocol specified by --protocol is applied to these source ports.

Note that, unlike for firewall policies, there is no provision for cloning individual rules.

The following CLI syntax can be used to create new firewall rules. For example, to add ‘allow’ rules for the HTTP and SSH protocols, plus a ‘deny’ rule for port TCP 9876, into firewall policy fw_test1:

# isi network firewall rules create fw_test1.rule_http --index 1 --dst-ports http --src-networks 10.20.30.0/24,20.30.40.0/24 --action allow
# isi network firewall rules create fw_test1.rule_ssh --index 2 --dst-ports ssh --src-networks 10.20.30.0/24,20.30.40.0/16 --action allow
# isi network firewall rules create fw_test1.rule_tcp_9876 --index 3 --protocol tcp --dst-ports 9876 --src-networks 10.20.30.0/24,20.30.40.0/24 --action deny

When a new rule is created in a policy, if the index value is not specified, it will automatically inherit the next available number in the series (such as index=4 in this case).

# isi network firewall rules create fw_test1.rule_2049 --protocol udp --dst-ports 2049 --src-networks 30.1.0.0/16 --action deny

For a more draconian approach, a ‘deny’ rule could be created using the match-everything ‘*’ wildcard for destination ports and a 0.0.0.0/0 network and mask, which would silently drop all traffic:

# isi network firewall rules create fw_test1.rule_1234 --index 100 --dst-ports * --src-networks 0.0.0.0/0 --action deny

When modifying existing firewall rules, use the following CLI syntax, in this case to change the source network of an HTTP allow rule (index 1) in firewall policy fw_test1:

# isi network firewall rules modify fw_test1.rule_http --index 1 --protocol ip --dst-ports http --src-networks 10.1.0.0/16 --action allow

Or to modify an SSH rule (index 2) in firewall policy fw_test1, changing the action from ‘allow’ to ‘deny’:

# isi network firewall rules modify fw_test1.rule_ssh --index 2 --protocol tcp --dst-ports ssh --src-networks 10.1.0.0/16,20.2.0.0/16 --action deny

Also, to re-order the custom TCP 9876 rule from the earlier example, moving it from index 3 to index 7 in firewall policy fw_test1:

# isi network firewall rules modify fw_test1.rule_tcp_9876 --index 7

Note that all rules at index 7 or higher will have their index values incremented by one.

When deleting a rule from a firewall policy, any rule reordering is handled automatically. If the policy has been applied to a network pool, the ‘–live’ option can be used to force the change to take effect immediately. For example, to delete the HTTP rule from the firewall policy ‘fw_test1’:

# isi network firewall rules delete fw_test1.rule_http --live

Firewall rules can also be created, modified, and deleted within a policy from the WebUI by navigating to Cluster management > Firewall Configuration > Firewall Policies. For example, to create a rule that permits SupportAssist and Secure Gateway traffic on the 10.219.0.0/16 network:

Once saved, the new rule is then displayed in the Firewall Configuration page:

5. Firewall management and monitoring.

In the next and final article in this series, we’ll turn our attention to managing, monitoring, and troubleshooting the OneFS firewall (Step 5).

Author: Nick Trimbee



Read Full Blog
  • security
  • PowerScale
  • OneFS

OneFS Firewall Configuration—Part 1

Nick Trimbee Nick Trimbee

Tue, 02 May 2023 17:21:12 -0000

|

Read Time: 0 minutes

The new firewall in OneFS 9.5 enhances the security of the cluster and helps prevent unauthorized access to the storage system. When enabled, the default firewall configuration allows remote systems access to a specific set of default services for data, management, and inter-cluster interfaces (network pools).

The basic OneFS firewall provisioning process is as follows:

 

Note that role-based access control (RBAC) explicitly limits who has access to manage the OneFS firewall. In addition to the ubiquitous root, the cluster’s built-in SystemAdmin role has write privileges to configure and administer the firewall.

1.  Upgrade cluster to OneFS 9.5.

First, to provision the firewall, the cluster must be running OneFS 9.5.

If you are upgrading from an earlier release, the OneFS 9.5 upgrade must be committed before enabling the firewall.

Also, be aware that configuration and management of the firewall in OneFS 9.5 requires the new ISI_PRIV_FIREWALL administration privilege. 

# isi auth privileges | grep -i firewall
ISI_PRIV_FIREWALL                   Configure network firewall

This privilege can be granted to a role with either read-only or read/write permissions. By default, the built-in SystemAdmin role is granted write privileges to administer the firewall:

# isi auth roles view SystemAdmin | grep -A2 -i firewall
             ID: ISI_PRIV_FIREWALL
     Permission: w

Additionally, the built-in AuditAdmin role has read permission to view the firewall configuration and logs, and so on:

# isi auth roles view AuditAdmin | grep -A2 -i firewall
             ID: ISI_PRIV_FIREWALL
     Permission: r

Ensure that the user account that will be used to enable and configure the OneFS firewall belongs to a role with the ISI_PRIV_FIREWALL write privilege.
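For example, a minimal sketch (the role and user names are hypothetical, and the ‘--add-user’ option is assumed to be available) of creating a dedicated firewall administration role and adding an account to it:

# isi auth roles create FWAdmins
# isi auth roles modify FWAdmins --add-priv-write ISI_PRIV_FIREWALL
# isi auth roles modify FWAdmins --add-user fw_admin1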

2.  Activate firewall.

The OneFS firewall can be either enabled or disabled, with the latter as the default state. 

The following CLI syntax will display the firewall’s global status (in this case disabled, the default):

# isi network firewall settings view
Enabled: False

Firewall activation can be easily performed from the CLI as follows:

# isi network firewall settings modify --enabled true
# isi network firewall settings view
Enabled: True

Or from the WebUI under Cluster management > Firewall Configuration > Settings:

Note that the firewall is automatically enabled when STIG hardening is applied to a cluster.

3.  Select policies.

A cluster’s existing firewall policies can be easily viewed from the CLI with the following command:

# isi network firewall policies list
ID        Pools                    Subnets                   Rules
 -----------------------------------------------------------------------------
 fw_test1  groupnet0.subnet0.pool0  groupnet0.subnet1         test_rule1
 -----------------------------------------------------------------------------
 Total: 1

Or from the WebUI under Cluster management > Firewall Configuration > Firewall Policies:

The OneFS firewall offers four main strategies when it comes to selecting a firewall policy: 

  1. Retaining the default policy
  2. Reconfiguring the default policy
  3. Cloning the default policy and reconfiguring
  4. Creating a custom firewall policy

We’ll consider each of these strategies in order:

a.  Retaining the default policy

In many cases, the default OneFS firewall policy value provides acceptable protection for a security-conscious organization. In these instances, once the OneFS firewall has been enabled on a cluster, no further configuration is required, and the cluster administrators can move on to the management and monitoring phase.

The firewall policy for all front-end cluster interfaces (network pool) is the default. While the default policy can be modified, be aware that this default policy is global. As such, any change against it will affect all network pools using this default policy.

The following table describes the default firewall policies that are assigned to each interface:

  • Default pools policy – Contains rules for the inbound default ports for TCP and UDP services in OneFS.
  • Default subnets policy – Contains rules for DNS port 53, ICMP, and ICMP6.

These can be viewed from the CLI as follows:

# isi network firewall policies view default_pools_policy
            ID: default_pools_policy
          Name: default_pools_policy
    Description: Default Firewall Pools Policy
Default Action: deny
      Max Rules: 100
          Pools: groupnet0.subnet0.pool0, groupnet0.subnet0.testpool1, groupnet0.subnet0.testpool2, groupnet0.subnet0.testpool3, groupnet0.subnet0.testpool4, groupnet0.subnet0.poolcava
        Subnets: -
          Rules: rule_ldap_tcp, rule_ldap_udp, rule_reserved_for_hw_tcp, rule_reserved_for_hw_udp, rule_isi_SyncIQ, rule_catalog_search_req, rule_lwswift, rule_session_transfer, rule_s3, rule_nfs_tcp, rule_nfs_udp, rule_smb, rule_hdfs_datanode, rule_nfsrdma_tcp, rule_nfsrdma_udp, rule_ftp_data, rule_ftps_data, rule_ftp, rule_ssh, rule_smtp, rule_http, rule_kerberos_tcp, rule_kerberos_udp, rule_rpcbind_tcp, rule_rpcbind_udp, rule_ntp, rule_dcerpc_tcp, rule_dcerpc_udp, rule_netbios_ns, rule_netbios_dgm, rule_netbios_ssn, rule_snmp, rule_snmptrap, rule_mountd_tcp, rule_mountd_udp, rule_statd_tcp, rule_statd_udp, rule_lockd_tcp, rule_lockd_udp, rule_nfsrquotad_tcp, rule_nfsrquotad_udp, rule_nfsmgmtd_tcp, rule_nfsmgmtd_udp, rule_https, rule_ldaps, rule_ftps, rule_hdfs_namenode, rule_isi_webui, rule_webhdfs, rule_ambari_handshake, rule_ambari_heartbeat, rule_isi_esrs_d, rule_ndmp, rule_isi_ph_rpcd, rule_cee, rule_icmp, rule_icmp6, rule_isi_dm_d
 # isi network firewall policies view default_subnets_policy
            ID: default_subnets_policy
          Name: default_subnets_policy
    Description: Default Firewall Subnets Policy
Default Action: deny
      Max Rules: 100
          Pools: -
        Subnets: groupnet0.subnet0
          Rules: rule_subnets_dns_tcp, rule_subnets_dns_udp, rule_icmp, rule_icmp6

Or from the WebUI under Cluster management > Firewall Configuration > Firewall Policies:

b.  Reconfiguring the default policy

Depending on an organization’s threat levels or security mandates, there may be a need to restrict access to certain additional IP addresses and/or management service protocols.

If the default policy is deemed insufficient, reconfiguring the default firewall policy can be a good option if only a small number of rule changes are required. The specifics of creating, modifying, and deleting individual firewall rules are covered in step 4, in the next article in this series.

Note that if new rule changes behave unexpectedly, or firewall configuration generally goes awry, OneFS does provide a “get out of jail free” card. In a pinch, the global firewall policy can be quickly and easily restored to its default values. This can be achieved with the following CLI syntax:

# isi network firewall reset-global-policy
This command will reset the global firewall policies to the original system defaults. Are you sure you want to continue? (yes/[no]):

Alternatively, the default policy can also be easily reverted from the WebUI by clicking the Reset default policies button:

 c.  Cloning the default policy and reconfiguring

Another option is cloning, which can be useful when batch modification or a large number of changes to the current policy are required. By cloning the default firewall policy, an exact copy of the existing policy and its rules is generated, but with a new policy name. For example:

# isi network firewall policies clone default_pools_policy clone_default_pools_policy
# isi network firewall policies list | grep -i clone
clone_default_pools_policy -                           

Cloning can also be initiated from the WebUI under Firewall Configuration > Firewall Policies > More Actions > Clone Policy:

Enter a name for the clone in the Policy Name field in the pop-up window, and click Save:

 Once cloned, the policy can then be easily reconfigured to suit. For example, to modify the policy fw_test1 and change its default-action from deny-all to allow-all:

# isi network firewall policies modify fw_test1 --default-action allow-all

When modifying a firewall policy, you can use the --live CLI option to force it to take effect immediately. Note that the --live option is only valid when issuing a command to modify or delete an active custom policy and to modify default policy. Such changes will take effect immediately on all network subnets and pools associated with this policy. Using the --live option on an inactive policy will be rejected, and an error message returned.

Options for creating or modifying a firewall policy include:

  • --default-action – Automatically adds one rule to deny all or allow all to the bottom of the rule set for the newly created policy (index = 100).
  • --max-rule-num – By default, each policy can contain a maximum of 100 rules (including one default rule), so a user can configure up to 99 rules. This option expands the maximum rule number to a specified value, currently capped at 200 (that is, up to 199 user-configurable rules).
  • --add-subnets – Specifies the network subnet(s) to add to the policy, separated by commas.
  • --remove-subnets – Specifies the network subnets to remove from the policy, which then fall back to the global policy.
  • --add-pools – Specifies the network pool(s) to add to the policy, separated by commas.
  • --remove-pools – Specifies the network pools to remove from the policy, which then fall back to the global policy.

When you modify firewall policies, OneFS issues the following warning to verify the changes and help avoid the risk of a self-induced denial-of-service:   

# isi network firewall policies modify --pools groupnet0.subnet0.pool0 fw_test1
Changing the Firewall Policy associated with a subnet or pool may change the networks and/or services allowed to connect to OneFS. Please confirm you have selected the correct Firewall Policy and Subnets/Pools. Are you sure you want to continue? (yes/[no]): yes

Once again, having the following CLI command handy, plus console access to the cluster is always a prudent move:

# isi network firewall reset-global-policy

Note that adding network pools or subnets to a firewall policy causes their previous policy to be removed from them. Similarly, adding network pools or subnets to the global default policy reverts any custom policy configuration they might have. For example, to apply the firewall policy fw_test1 to IP pools groupnet0.subnet0.pool0 and groupnet0.subnet0.pool1:

# isi network pools view groupnet0.subnet0.pool0 | grep -i firewall
       Firewall Policy: default_pools_policy
# isi network firewall policies modify fw_test1 --add-pools groupnet0.subnet0.pool0, groupnet0.subnet0.pool1
# isi network pools view groupnet0.subnet0.pool0 | grep -i firewall
       Firewall Policy: fw_test1

Or to apply the firewall policy fw_test1 to IP Pool groupnet0.subnet0.pool0 and groupnet0.subnet0:

# isi network firewall policies modify fw_test1 --add-pools groupnet0.subnet0.pool0 --add-subnets groupnet0.subnet0
# isi network pools view groupnet0.subnet0.pool0 | grep -i firewall
 Firewall Policy: fw_test1
# isi network subnets view groupnet0.subnet0 | grep -i firewall
 Firewall Policy: fw_test1

To reapply global policy at any time, either add the pools to the default policy:

# isi network firewall policies modify default_pools_policy --add-pools groupnet0.subnet0.pool0, groupnet0.subnet0.pool1
# isi network pools view groupnet0.subnet0.pool0 | grep -i firewall
 Firewall Policy: default_pools_policy
# isi network subnets view groupnet0.subnet1 | grep -i firewall
 Firewall Policy: default_subnets_policy

Or remove the pool from the custom policy:

# isi network firewall policies modify fw_test1 --remove-pools groupnet0.subnet0.pool0,groupnet0.subnet0.pool1

You can also manage firewall policies on a network pool in the OneFS WebUI by going to Cluster configuration > Network configuration > External network > Edit pool details. For example:

 

Be aware that cloning is not limited to the default policy: clones can be made of any custom policy too. For example:

# isi network firewall policies clone clone_default_pools_policy fw_test1

d.  Creating a custom firewall policy

Alternatively, a custom firewall policy can also be created from scratch. This can be accomplished from the CLI using the following syntax, in this case to create a firewall policy named fw_test1:

# isi network firewall policies create fw_test1 --default-action deny
# isi network firewall policies view fw_test1
            ID: fw_test1
          Name: fw_test1
    Description:
Default Action: deny
      Max Rules: 100
          Pools: -
        Subnets: -
          Rules: -

Note that if a default-action is not specified in the CLI command syntax, it will automatically default to deny.

Firewall policies can also be configured in the OneFS WebUI by going to Cluster management > Firewall Configuration > Firewall Policies > Create Policy:

However, in contrast to the CLI, if a default-action is not specified when a policy is created in the WebUI, the automatic default is to Allow because the drop-down list works alphabetically.

If and when a firewall policy is no longer required, it can be swiftly and easily removed. For example, the following CLI syntax deletes the firewall policy fw_test1, clearing out any rules within this policy container:

# isi network firewall policies delete fw_test1
Are you sure you want to delete firewall policy fw_test1? (yes/[no]): yes

Note that the default global policies cannot be deleted.

# isi network firewall policies delete default_subnets_policy
Are you sure you want to delete firewall policy default_subnets_policy? (yes/[no]): yes
Firewall policy: Cannot delete default policy default_subnets_policy.

4.  Configure firewall rules.

 In the next article in this series, we’ll turn our attention to this step, configuring the OneFS firewall rules.

 

 

Read Full Blog
  • security
  • PowerScale
  • OneFS

OneFS Host-Based Firewall

Nick Trimbee Nick Trimbee

Wed, 26 Apr 2023 15:40:15 -0000

|

Read Time: 0 minutes

Among the array of security features introduced in OneFS 9.5 is a new host-based firewall. This firewall allows cluster administrators to configure policies and rules on a PowerScale cluster in order to meet the network and application management needs and security mandates of an organization.

The OneFS firewall protects the cluster’s external, or front-end, network and operates as a packet filter for inbound traffic. It is available upon installation or upgrade to OneFS 9.5 but is disabled by default in both cases. However, the OneFS STIG hardening profile automatically enables the firewall and the default policies, in addition to manual activation.

The firewall generally manages IP packet filtering in accordance with the OneFS Security Configuration Guide, especially in regards to the network port usage. Packet control is governed by firewall policies, which have one or more individual rules.

  • Firewall policy – A set of firewall rules. Rules are matched by index in ascending order, and each policy has a default action.
  • Firewall rule – Specifies what kinds of network packets should be matched by the firewall engine and what action should be taken upon them. Matching criteria include protocol, source ports, destination ports, and source network address. Available actions are allow, deny, or reject.

 A security best practice is to enable the OneFS firewall using the default policies, with any adjustments as required. The recommended configuration process is as follows:

  1. Access – Ensure that the cluster uses a default SSH or HTTP port before enabling. The default firewall policies block all nondefault ports until you change the policies.
  2. Enable – Enable the OneFS firewall.
  3. Compare – Compare your cluster network port configurations against the default ports listed in Network port usage.
  4. Configure – Edit the default firewall policies to accommodate any non-standard ports in use in the cluster. NOTE: The firewall policies do not automatically update when port configurations are changed.
  5. Constrain – Limit access to the OneFS WebUI to specific administrator terminals.

Under the hood, the OneFS firewall is built upon the ubiquitous ipfirewall, or ipfw, which is FreeBSD’s native stateful firewall, packet filter, and traffic accounting facility.

Firewall configuration and management is performed through the CLI, platform API, or WebUI, and OneFS 9.5 introduces a new Firewall Configuration page to support this. Note that the firewall is only available once a cluster is already running OneFS 9.5 and the feature has been manually enabled, activating the isi_firewall_d service. The firewall’s configuration is split between gconfig, which handles the settings and policies, and the ipfw table, which stores the rules themselves.

The firewall gracefully handles SmartConnect dynamic IP movement between nodes since firewall policies are applied per network pool. Additionally, being network pool based allows the firewall to support OneFS access zones and shared/multitenancy models. 

The individual firewall rules, which are essentially simplified wrappers around ipfw rules, work by matching packets through the 5-tuples that uniquely identify an IPv4 UDP or TCP session:

  • Source IP address
  • Source port
  • Destination IP address
  • Destination port
  • Transport protocol

The rules are then organized within a firewall policy, which can be applied to one or more network pools. 

Note that each pool can only have a single firewall policy applied to it. If there is no custom firewall policy configured for a network pool, it automatically uses the global default firewall policy.
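For example, to confirm which firewall policy a given network pool is currently using:

# isi network pools view groupnet0.subnet0.pool0 | grep -i firewall
       Firewall Policy: default_pools_policy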

When enabled, the OneFS firewall function is cluster wide, and all inbound packets from external interfaces will go through either the custom policy or default global policy before reaching the protocol handling pathways. Packets passed to the firewall are compared against each of the rules in the policy, in rule-number order. Multiple rules with the same number are permitted, in which case they are processed in order of insertion. When a match is found, the action corresponding to that matching rule is performed. A packet is checked against the active ruleset in multiple places in the protocol stack, and the basic flow is as follows: 

  1. Get the logical interface for incoming packets.
  2. Find all network pools assigned to this interface.
  3. Compare these network pools one by one with destination IP address to find the matching pool (either custom firewall policy, or default global policy).
  4. Compare each rule with service (protocol and destination ports) and source IP address in this pool in order of lowest index value.  If matched, perform actions according to the associated rule.
  5. If no rule matches, go to the final rule (deny all or allow all), which is specified upon policy creation.

The OneFS firewall automatically reserves 20,000 rules in the ipfw table for its custom and default policies and rules. By default, each policy can have a maximum of 100 rules, including one default rule. This translates to an effective maximum of 99 user-defined rules per policy, because the default rule is reserved and cannot be modified. As such, a maximum of 198 policies can be applied to pools or subnets since the default-pools-policy and default-subnets-policy are reserved and cannot be deleted.

Additional firewall bounds and limits to keep in mind include:

  • MAX_INTERFACES (500) – Maximum number of Layer 2 interfaces per node (including Ethernet, VLAN, and LAGG interfaces).
  • MAX_SUBNETS (100) – Maximum number of subnets within a OneFS cluster.
  • MAX_POOLS (100) – Maximum number of network pools within a OneFS cluster.
  • DEFAULT_MAX_RULES (100) – Default value of maximum rules within a firewall policy.
  • MAX_RULES (200) – Upper limit of maximum rules within a firewall policy.
  • MAX_ACTIVE_RULES (5000) – Upper limit of total active rules across the whole cluster.
  • MAX_INACTIVE_POLICIES (200) – Maximum number of policies that are not applied to any network subnet or pool. These are not written into the ipfw table.

The firewall default global policy is ready to use out of the box and, unless a custom policy has been explicitly configured, all network pools use this global policy. Custom policies can be configured by either cloning and modifying an existing policy or creating one from scratch. 

  • Custom policy – A user-defined container with a set of rules. A policy can be applied to multiple network pools, but a network pool can only apply one policy.
  • Firewall rule – An ipfw-like rule that can be used to restrict remote access. Each rule has an index that is valid within its policy. Index values range from 1 to 99, with lower numbers having higher priority. Source networks are described by IP and netmask, and services can be expressed either by port number (for example, 80) or service name (for example, http, ssh, smb). The * wildcard can also be used to denote all services. Supported actions include allow, deny (silently drop), and reject.
  • Default policy – A global policy to manage all default services, used for maintaining OneFS minimum running and management. While deny-any is the default action of the policy, the defined service rules have a default action to allow all remote access. All packets not matching any of the rules are automatically dropped. There are two default policies: default-pools-policy and default-subnets-policy. These two default policies cannot be deleted, but individual rule modification is permitted in each.
  • Default services – The firewall’s default predefined services include the usual suspects, such as DNS, FTP, HDFS, HTTP, HTTPS, ICMP, NDMP, NFS, NTP, S3, SMB, SNMP, SSH, and so on. A full listing is available in the isi network firewall services list CLI command output.

For a given network pool, either the global policy or a custom policy is assigned and takes effect. Additionally, all configuration changes to either policy type are managed by gconfig and are persistent across cluster reboots.

In the next article in this series we’ll take a look at the CLI and WebUI configuration and management of the OneFS firewall. 

 

 

Read Full Blog
  • security
  • PowerScale
  • OneFS
  • snapshots

OneFS Snapshot Security

Nick Trimbee Nick Trimbee

Fri, 21 Apr 2023 17:11:00 -0000

|

Read Time: 0 minutes

In this era of elevated cyber-crime and data security threats, there is increasing demand for immutable, tamper-proof snapshots. Often this need arises as part of a broader security mandate, ideally proactively, but oftentimes as a response to a security incident. OneFS addresses this requirement in the following ways:

On-cluster:

  • Read-only snapshots
  • Snapshot locks
  • Role-based administration

Off-cluster:

  • SyncIQ snapshot replication
  • Cyber-vaulting

Read-only snapshots

At its core, OneFS SnapshotIQ generates read-only, point-in-time, space efficient copies of a defined subset of a cluster’s data.

Only the changed blocks of a file are stored when updating OneFS snapshots, ensuring efficient storage utilization. They are also highly scalable and typically take less than a second to create, while generating little performance overhead. As such, the RPO (recovery point objective) and RTO (recovery time objective) of a OneFS snapshot can be very small and highly flexible, with the use of rich policies and schedules.

OneFS Snapshots are created manually, on a schedule, or automatically generated by OneFS to facilitate system operations. But whatever the generation method, when a snapshot has been taken, its contents cannot be manually altered.
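For instance, a hedged illustration (the snapshot and file names here are hypothetical): attempting to modify a file through a snapshot’s read-only view under the /ifs/.snapshot directory is refused:

# touch /ifs/.snapshot/snap1/test/file1
touch: /ifs/.snapshot/snap1/test/file1: Read-only file system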

Snapshot Locks

In addition to snapshot contents immutability, for an enhanced level of tamper-proofing, SnapshotIQ also provides the ability to lock snapshots with the ‘isi snapshot locks’ CLI syntax. This prevents snapshots from being accidentally or unintentionally deleted.

For example, a manual snapshot, ‘snaploc1’ is taken of /ifs/test:

# isi snapshot snapshots create /ifs/test --name snaploc1
# isi snapshot snapshots list | grep snaploc1
79188 snaploc1                                     /ifs/test

A lock is then placed on it (in this case lock ID=1):

# isi snapshot locks create snaploc1
# isi snapshot locks list snaploc1
ID
----
1
----
Total: 1

Attempts to delete the snapshot fail because the lock prevents its removal:

# isi snapshot snapshots delete snaploc1
Are you sure? (yes/[no]): yes
Snapshot "snaploc1" can't be deleted because it is locked

The CLI command ‘isi snapshot locks delete <lock_ID>’ can be used to clear existing snapshot locks, if desired. For example, to remove the only lock (ID=1) from snapshot ‘snaploc1’:

# isi snapshot locks list snaploc1
ID
----
1
----
Total: 1
# isi snapshot locks delete snaploc1 1
Are you sure you want to delete snapshot lock 1 from snaploc1? (yes/[no]): yes
# isi snap locks view snaploc1 1
No such lock

When the lock is removed, the snapshot can then be deleted:

# isi snapshot snapshots delete snaploc1
Are you sure? (yes/[no]): yes
# isi snapshot snapshots list| grep -i snaploc1 | wc -l
       0

Note that a snapshot can have up to a maximum of sixteen locks on it at any time. Also, lock numbers are continually incremented and not recycled upon deletion.

Like snapshot expiration, snapshot locks can also have an expiration time configured. For example, to set a lock on snapshot ‘snaploc1’ that expires at 1am on April 1st, 2024:

# isi snap lock create snaploc1 --expires '2024-04-01T01:00:00'
# isi snap lock list snaploc1
ID
----
36
----
Total: 1
# isi snap lock view snaploc1 36
     ID: 36
Comment:
Expires: 2024-04-01T01:00:00
  Count: 1

Note that if the duration period of a particular snapshot lock expires but others remain, OneFS will not delete that snapshot until all the locks on it have been deleted or expired.
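If needed, a lock’s expiration can also be extended rather than waiting for it to lapse. As a hedged sketch (the lock ID and timestamp here are illustrative; confirm the option names for your release):

# isi snapshot locks modify snaploc1 36 --expires '2024-07-01T01:00:00'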

The following table provides an example snapshot expiration schedule, with monthly locked snapshots to prevent deletion:

Snapshot Frequency | Snapshot Time                      | Snapshot Expiration | Max Retained Snapshots
Every other hour   | Start at 12:00AM, end at 11:59AM   | 1 day               | 27
Every day          | At 12:00AM                         | 1 week              |
Every week         | Saturday at 12:00AM                | 1 month             |
Every month        | First Saturday of month at 12:00AM | Locked              |

Role-based Access Control

Read-only snapshots plus locks provide robust protection for point-in-time data on a cluster. However, anyone who can log in to the cluster with sufficiently elevated administrator privileges can still remove locks and/or delete snapshots.

Because data security threats come from inside an environment as well as outside, such as from a disgruntled IT employee or other internal bad actor, another key to a robust security posture is to constrain the use of all-powerful ‘root’, ‘administrator’, and ‘sudo’ accounts as much as possible. Instead of granting cluster admins full rights, a preferred security best practice is to leverage the comprehensive authentication, authorization, and accounting framework that OneFS natively provides.

OneFS role-based access control (RBAC) can be used to explicitly limit who has access to manage and delete snapshots. This granular control allows you to craft administrative roles that can create and manage snapshot schedules, but prevent their unlocking and/or deletion. Similarly, lock removal and snapshot deletion can be isolated to a specific security role (or to root only).

A cluster security administrator selects the desired access zone, creates a zone-aware role within it, assigns privileges, and then assigns members.

For example, from the WebUI under Access > Membership and roles > Roles:

When these members access the cluster through the WebUI, PlatformAPI, or CLI, they inherit their assigned privileges.
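A comparable role can also be crafted from the CLI. The following is a rough sketch only; the role name, member account, and privilege selections are illustrative, and flag names should be verified against your OneFS release:

# isi auth roles create SnapMonitor --description "Snapshot monitoring and scheduling"
# isi auth roles modify SnapMonitor --add-priv-read ISI_PRIV_SNAPSHOT_LOCKS --add-priv-write ISI_PRIV_SNAPSHOT_SCHEDULES --add-user jsmith

In this sketch, members of the hypothetical ‘SnapMonitor’ role could view snapshot locks and manage snapshot schedules, but could not remove locks or delete snapshots.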

The specific privileges that can be used to segment OneFS snapshot management include:

Privilege                            | Description
ISI_PRIV_SNAPSHOT_ALIAS              | Aliasing for snapshots
ISI_PRIV_SNAPSHOT_LOCKS              | Locking of snapshots from deletion
ISI_PRIV_SNAPSHOT_PENDING            | Upcoming snapshots based on schedules
ISI_PRIV_SNAPSHOT_RESTORE            | Restoring a directory to a particular snapshot
ISI_PRIV_SNAPSHOT_SCHEDULES          | Scheduling for periodic snapshots
ISI_PRIV_SNAPSHOT_SETTING            | Service and access settings
ISI_PRIV_SNAPSHOT_SNAPSHOTMANAGEMENT | Manual snapshots and locks
ISI_PRIV_SNAPSHOT_SNAPSHOT_SUMMARY   | Snapshot summary and usage details

Each privilege can be assigned one of four permission levels for a role, including:

Permission Indicator | Description
(none)               | No permission
R                    | Read-only permission
X                    | Execute permission
W                    | Write permission

The ability for a user to delete a snapshot is governed by the ‘ISI_PRIV_SNAPSHOT_SNAPSHOTMANAGEMENT’ privilege. Similarly, the ‘ISI_PRIV_SNAPSHOT_LOCKS’ privilege governs lock creation and removal.

In the following example, the ‘snap’ role has ‘read’ rights for the ‘ISI_PRIV_SNAPSHOT_LOCKS’ privilege, allowing a user associated with this role to view snapshot locks:

# isi auth roles view snap | grep -I -A 1 locks
             ID: ISI_PRIV_SNAPSHOT_LOCKS
     Permission: r
--
# isi snapshot locks list snaploc1
ID
----
1
----
Total: 1

However, attempts to remove the lock ‘ID 1’ from the ‘snaploc1’ snapshot fail without write privileges:

# isi snapshot locks delete snaploc1 1
Privilege check failed. The following write privilege is required: Snapshot locks (ISI_PRIV_SNAPSHOT_LOCKS)

Write privileges are then added to ‘ISI_PRIV_SNAPSHOT_LOCKS’ in the ‘snap’ role:

# isi auth roles modify snap --add-priv-write ISI_PRIV_SNAPSHOT_LOCKS
# isi auth roles view snap | grep -I -A 1 locks
             ID: ISI_PRIV_SNAPSHOT_LOCKS
     Permission: w
--

This allows the lock ‘ID 1’ to be successfully deleted from the ‘snaploc1’ snapshot:

# isi snapshot locks delete snaploc1 1
Are you sure you want to delete snapshot lock 1 from snaploc1? (yes/[no]): yes
# isi snap locks view snaploc1 1
No such lock

Using OneFS RBAC, an enhanced security approach for a site could be to create three OneFS roles on a cluster, each with an increasing realm of trust:

1.  First, an IT ops/helpdesk role with ‘read’ access to the snapshot attributes would permit monitoring and troubleshooting, but no changes:

Snapshot Privilege                   | Permission
ISI_PRIV_SNAPSHOT_ALIAS              | Read
ISI_PRIV_SNAPSHOT_LOCKS              | Read
ISI_PRIV_SNAPSHOT_PENDING            | Read
ISI_PRIV_SNAPSHOT_RESTORE            | Read
ISI_PRIV_SNAPSHOT_SCHEDULES          | Read
ISI_PRIV_SNAPSHOT_SETTING            | Read
ISI_PRIV_SNAPSHOT_SNAPSHOTMANAGEMENT | Read
ISI_PRIV_SNAPSHOT_SNAPSHOT_SUMMARY   | Read

2.  Next, a cluster admin role, with ‘read’ privileges for ‘ISI_PRIV_SNAPSHOT_LOCKS’ and ‘ISI_PRIV_SNAPSHOT_SNAPSHOTMANAGEMENT’ would prevent snapshot and lock deletion, but provide ‘write’ access for schedule configuration, restores, and so on.

Snapshot Privilege                   | Permission
ISI_PRIV_SNAPSHOT_ALIAS              | Write
ISI_PRIV_SNAPSHOT_LOCKS              | Read
ISI_PRIV_SNAPSHOT_PENDING            | Write
ISI_PRIV_SNAPSHOT_RESTORE            | Write
ISI_PRIV_SNAPSHOT_SCHEDULES          | Write
ISI_PRIV_SNAPSHOT_SETTING            | Write
ISI_PRIV_SNAPSHOT_SNAPSHOTMANAGEMENT | Read
ISI_PRIV_SNAPSHOT_SNAPSHOT_SUMMARY   | Write

3.  Finally, a cluster security admin role (root equivalence) would provide full snapshot configuration and management, lock control, and deletion rights:

Snapshot Privilege                   | Permission
ISI_PRIV_SNAPSHOT_ALIAS              | Write
ISI_PRIV_SNAPSHOT_LOCKS              | Write
ISI_PRIV_SNAPSHOT_PENDING            | Write
ISI_PRIV_SNAPSHOT_RESTORE            | Write
ISI_PRIV_SNAPSHOT_SCHEDULES          | Write
ISI_PRIV_SNAPSHOT_SETTING            | Write
ISI_PRIV_SNAPSHOT_SNAPSHOTMANAGEMENT | Write
ISI_PRIV_SNAPSHOT_SNAPSHOT_SUMMARY   | Write

Note that when configuring OneFS RBAC, remember to remove the ‘ISI_PRIV_AUTH’ and ‘ISI_PRIV_ROLE’ privileges from all but the most trusted administrators.

Additionally, enterprise security management tools such as CyberArk can be incorporated to manage authentication and access control holistically across an environment. These can be configured to change passwords on trusted accounts frequently (that is, every hour or so), require multi-level approvals before retrieving passwords, and track and audit password requests and trends.

While this article focuses exclusively on OneFS snapshots, the expanded use of RBAC granular privileges for enhanced security is germane to most key areas of cluster management and data protection, such as SyncIQ replication, and so on.

Snapshot replication

In addition to using snapshots for its own checkpointing system, SyncIQ, the OneFS data replication engine, supports snapshot replication to a target cluster.

OneFS SyncIQ replication policies contain an option for triggering a replication policy when a snapshot of the source directory is completed. Additionally, at the onset of a new policy configuration, when the “Whenever a Snapshot of the Source Directory is Taken” option is selected, a checkbox appears to enable any existing snapshots in the source directory to be replicated. More information is available in this SyncIQ paper.

Cyber-vaulting

File data is arguably the most difficult to protect, because:

  • It is the only type of data where potentially all employees have a direct connection to the storage (with other storage types, access is mediated through an application).
  • File data is linked (or mounted) to the operating system of the client, which means that gaining access to the client OS can be enough to reach potentially critical data.
  • Users are the largest breach point.

The Cyber Security Framework (CSF) from the National Institute of Standards and Technology (NIST) categorizes the stages of an attack, from initial threat identification through recovery:

Within the ‘Protect’ phase, there are two core aspects:

  • Applying all the core protection features available on the OneFS platform, namely:

Feature             | Description
Access control      | Where the core data protection functions are executed. Assess who actually needs write access.
Immutability        | Immutable snapshots, replica versions, and so on. Augment the backup strategy with a SmartLock WORM archiving strategy.
Encryption          | Encrypting both data in flight and data at rest.
Anti-virus          | Integrating with anti-virus/anti-malware protection that performs content inspection.
Security advisories | Dell Security Advisories (DSA) inform customers about fixes to common vulnerabilities and exposures.

  • Data isolation provides a last resort copy of business critical data, and can be achieved by using an air gap to isolate the cyber vault copy of the data. The vault copy is logically separated from the production copy of the data. Data syncing happens only intermittently by closing the air gap after ensuring that there are no known issues.

The combination of OneFS snapshots and SyncIQ replication allows for granular data recovery: only the affected files are recovered, while the most recent changes are preserved for the unaffected data. While an on-prem air-gapped cyber vault can still provide secure network isolation, in the event of an attack the ability to fail over to a fully operational ‘clean slate’ remote site provides additional security and peace of mind.

We’ll explore PowerScale cyber protection and recovery in more depth in a future article.

Author: Nick Trimbee

Read Full Blog
  • PowerScale
  • OneFS
  • SupportAssist

OneFS SupportAssist Architecture and Operation

Nick Trimbee Nick Trimbee

Fri, 21 Apr 2023 16:41:36 -0000

|

Read Time: 0 minutes

The previous article in this series looked at an overview of OneFS SupportAssist. Now, we’ll turn our attention to its core architecture and operation.

Under the hood, SupportAssist relies on the following infrastructure and services:

Service             | Description
ESE                 | Embedded Service Enabler
isi_rice_d          | Remote Information Connectivity Engine (RICE)
isi_crispies_d      | Coordinator for RICE Incidental Service Peripherals, including ESE start
Gconfig             | OneFS centralized configuration infrastructure
MCP                 | Master Control Program; starts, monitors, and restarts OneFS services
Tardis              | Configuration service and database
Transaction journal | Task manager for RICE

Of these, ESE, isi_crispies_d, isi_rice_d, and the Transaction Journal are new in OneFS 9.5 and exclusive to SupportAssist. By contrast, Gconfig, MCP, and Tardis are all legacy services that are used by multiple other OneFS components.

The Remote Information Connectivity Engine (RICE) represents the new SupportAssist ecosystem for OneFS to connect to the Dell backend. The high level architecture is as follows:

Dell’s Embedded Service Enabler (ESE) is at the core of the connectivity platform and acts as a unified communications broker between the PowerScale cluster and Dell Support. ESE runs as a OneFS service and, on startup, looks for an on-premises gateway server. If none is found, it connects back to the connectivity pipe (SRS). The collector service then interacts with ESE to send telemetry, obtain upgrade packages, transmit alerts and events, and so on.

Depending on the available resources, ESE provides a base functionality with additional optional capabilities to enhance serviceability. ESE is multithreaded, and each payload type is handled by specific threads. For example, events are handled by event threads, binary and structured payloads are handled by web threads, and so on. Within OneFS, ESE gets installed to /usr/local/ese and runs as ‘ese’ user and group.

The responsibilities of isi_rice_d include listening for network changes, getting eligible nodes elected for communication, monitoring notifications from CRISPIES, and engaging Task Manager when ESE is ready to go.

The Task Manager is a core component of the RICE engine. Its responsibility is to watch the incoming tasks that are placed into the journal and to assign workers to step through those tasks until completion. It controls resource utilization (Python threads) and distributes waiting tasks on a priority basis.

The ‘isi_crispies_d’ service exists to ensure that ESE is only running on the RICE active node, and nowhere else. It acts, in effect, like a specialized MCP just for ESE and RICE-associated services, such as IPA. This entails starting ESE on the RICE active node, re-starting it if it crashes on the RICE active node, and stopping it and restarting it on the appropriate node if the RICE active instance moves to another node. We are using ‘isi_crispies_d’ for this, and not MCP, because MCP does not support a service running on only one node at a time.

The core responsibilities of ‘isi_crispies_d’ include:

  • Starting and stopping ESE on the RICE active node
  • Monitoring ESE and restarting, if necessary. ‘isi_crispies_d’ restarts ESE on the node if it crashes. It will retry a couple of times and then notify RICE if it’s unable to start ESE.
  • Listening for gconfig changes and updating ESE. Stopping ESE if unable to make a change and notifying RICE.
  • Monitoring other related services.

The state of ESE, and of other RICE service peripherals, is stored in the OneFS tardis configuration database so that it can be checked by RICE. Similarly, ‘isi_crispies_d’ monitors the OneFS Tardis configuration database to see which node is designated as the RICE ‘active’ node.

The ‘isi_telemetry_d’ daemon is started by MCP and runs when SupportAssist is enabled. It does not have to be running on the same node as the active RICE and ESE instance. Only one instance of ‘isi_telemetry_d’ will be active at any time, and the other nodes will be waiting for the lock.
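If needed, a quick way to see which nodes these daemons are currently running on is to sweep the cluster with isi_for_array. For example (a simple check; output will vary by cluster):

# isi_for_array "ps -auxw | grep -e 'isi_telemetry_d' -e 'isi_rice_d' | grep -v grep"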

You can query the current status and setup of SupportAssist on a PowerScale cluster by using the ‘isi supportassist settings view’ CLI command. For example:

# isi supportassist settings view
        Service enabled: Yes
       Connection State: enabled
      OneFS Software ID: ELMISL08224764
          Network Pools: subnet0:pool0
        Connection mode: direct
           Gateway host: -
           Gateway port: -
    Backup Gateway host: -
    Backup Gateway port: -
  Enable Remote Support: Yes
Automatic Case Creation: Yes
       Download enabled: Yes

You can also do this from the WebUI by navigating to Cluster management > General settings > SupportAssist:

You can enable or disable SupportAssist by using the ‘isi services’ CLI command set. For example:

# isi services isi_supportassist disable
The service 'isi_supportassist' has been disabled.
# isi services isi_supportassist enable
The service 'isi_supportassist' has been enabled.
# isi services -a | grep supportassist
   isi_supportassist    SupportAssist Monitor                    Enabled

You can check the core services, as follows:

# ps -auxw | grep -e 'rice' -e 'crispies' | grep -v grep
root    8348    9.4   0.0 109844  60984  -   Ss   22:14        0:00.06 /usr/libexec/isilon/isi_crispies_d /usr/bin/isi_crispies_d
root    8183    8.8   0.0 108060  64396  -   Ss   22:14        0:01.58 /usr/libexec/isilon/isi_rice_d /usr/bin/isi_rice_d

Note that when a cluster is provisioned with SupportAssist, ESRS can no longer be used. However, customers that have not previously connected their clusters to Dell Support can still provision ESRS, but will be presented with a message encouraging them to adopt the best practice of using SupportAssist.

Additionally, SupportAssist in OneFS 9.5 does not currently support IPv6 networking, so clusters deployed in IPv6 environments should continue to use ESRS until SupportAssist IPv6 integration is introduced in a future OneFS release.

Author: Nick Trimbee

Read Full Blog
  • PowerScale
  • OneFS

OneFS SupportAssist Management and Troubleshooting

Nick Trimbee Nick Trimbee

Tue, 18 Apr 2023 20:07:18 -0000

|

Read Time: 0 minutes

In this final article in the OneFS SupportAssist series, we turn our attention to management and troubleshooting.

Once the provisioning process above is complete, the isi supportassist settings view CLI command reports the status and health of SupportAssist operations on the cluster.

# isi supportassist settings view
        Service enabled: Yes
       Connection State: enabled
      OneFS Software ID: xxxxxxxxxx
          Network Pools: subnet0:pool0
        Connection mode: direct
           Gateway host: -
           Gateway port: -
    Backup Gateway host: -
    Backup Gateway port: -
  Enable Remote Support: Yes
Automatic Case Creation: Yes
       Download enabled: Yes

This can also be obtained from the WebUI by going to Cluster management > General settings > SupportAssist:

 There are some caveats and considerations to keep in mind when upgrading to OneFS 9.5 and enabling SupportAssist, including:

  • SupportAssist is disabled when STIG hardening is applied to a cluster.
  • Using SupportAssist on a hardened cluster is not supported.
  • Clusters with the OneFS network firewall enabled (isi network firewall settings) might need to allow outbound traffic on port 9443.
  • SupportAssist is supported on a cluster that’s running in Compliance mode.
  • Secure keys are held in key manager under the RICE domain.

Also, note that Secure Remote Services can no longer be used after SupportAssist has been provisioned on a cluster.

SupportAssist has a variety of components that gather and transmit various pieces of OneFS data and telemetry to Dell Support and backend services through the Embedded Service Enabler (ESE). These workflows include CELOG events; in-product activation (IPA) information; CloudIQ telemetry data; Isi-Gather-info (IGI) logsets; and provisioning, configuration, and authentication data to ESE and the various backend services.

Activity           | Information
Events and alerts  | SupportAssist can be configured to send CELOG events.
Diagnostics        | The OneFS isi diagnostics gather and isi_gather_info logfile collation and transmission commands have a SupportAssist option.
HealthChecks       | HealthCheck definitions are updated using SupportAssist.
License Activation | The isi license activation start command uses SupportAssist to connect.
Remote Support     | Remote Support uses SupportAssist and the Connectivity Hub to assist customers with their clusters.
Telemetry          | CloudIQ telemetry data is sent using SupportAssist.

CELOG

Once SupportAssist is up and running, it can be configured to send CELOG events and attachments through ESE to CLM. This is managed with the isi event channels CLI command. For example:

# isi event channels list
ID   Name                Type          Enabled
-----------------------------------------------
1    RemoteSupport       connectemc    No
2    Heartbeat Self-Test heartbeat     Yes
3    SupportAssist       supportassist No
-----------------------------------------------
Total: 3
# isi event channels view SupportAssist
     ID: 3
   Name: SupportAssist
   Type: supportassist
Enabled: No
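Based on the output above, the SupportAssist channel could then be switched on with something along the following lines (a hedged example; verify the exact parameter name in your OneFS release):

# isi event channels modify SupportAssist --enabled true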

Or from the WebUI:

CloudIQ telemetry

In OneFS 9.5, SupportAssist provides an option to send telemetry data to CloudIQ. This can be enabled from the CLI as follows:

# isi supportassist telemetry modify --telemetry-enabled 1 --telemetry-persist 0
# isi supportassist telemetry view
        Telemetry Enabled: Yes
        Telemetry Persist: No
        Telemetry Threads: 8
Offline Collection Period: 7200

Or in the SupportAssist WebUI:

Diagnostics gather

Also in OneFS 9.5, the isi diagnostics gather and isi_gather_info CLI commands both include a --supportassist upload option for log gathers, which also allows them to continue to function through a new “emergency mode” when the cluster is unhealthy. For example, to start a gather from the CLI that will be uploaded through SupportAssist:

# isi diagnostics gather start --supportassist 1

Similarly, for ISI gather info:

# isi_gather_info --supportassist

Or to explicitly avoid using SupportAssist for ISI gather info log gather upload:

# isi_gather_info --nosupportassist

This can also be configured from the WebUI at Cluster management > General configuration > Diagnostics > Gather:

License Activation through SupportAssist

PowerScale License Activation (previously known as In-Product Activation) facilitates the management of the cluster's entitlements and licenses by communicating directly with Software Licensing Central through SupportAssist.

To activate OneFS product licenses through the SupportAssist WebUI:

  1. Go to Cluster management > Licensing. 
    For example, on a new cluster without any signed licenses:


     
  2. Click the Update & Refresh button in the License Activation section. In the Activation File Wizard, select the software modules that you want in the activation file.

     

  3. Select Review changes, review the selections, click Proceed, and finally click Activate.

Note that it can take up to 24 hours for the activation to occur.

Alternatively, cluster license activation codes (LAC) can also be added manually.

Troubleshooting

When it comes to troubleshooting SupportAssist, the basic process flow is as follows:

 
The OneFS components and services above are:

Component           | Info
ESE                 | Embedded Service Enabler
isi_rice_d          | Remote Information Connectivity Engine (RICE)
isi_crispies_d      | Coordinator for RICE Incidental Service Peripherals, including ESE start
Gconfig             | OneFS centralized configuration infrastructure
MCP                 | Master Control Program; starts, monitors, and restarts OneFS services
Tardis              | Configuration service and database
Transaction journal | Task manager for RICE

Of these, ESE, isi_crispies_d, isi_rice_d, and the transaction journal are new in OneFS 9.5 and exclusive to SupportAssist. In contrast, Gconfig, MCP, and Tardis are all legacy services that are used by multiple other OneFS components. 

For its connectivity, SupportAssist elects a single leader node within the subnet pool, and NANON nodes are automatically avoided. Ports 443 and 8443 are required to be open for bi-directional communication between the cluster and Connectivity Hub, and port 9443 is used for communicating with a gateway. The SupportAssist ESE component communicates with a number of Dell backend services:

  • SRS
  • Connectivity Hub
  • CLM
  • ELMS/Licensing
  • SDR
  • Lightning
  • Log Processor
  • CloudIQ
  • ESE

Debugging backend issues might involve one or more services, and Dell Support can assist with this process.

The main log files for investigating and troubleshooting SupportAssist issues and idiosyncrasies are isi_rice_d.log and isi_crispies_d.log. There is also an ese_log, which can be useful, too. These logs can be found at:

Component | Logfile location                 | Info
Rice      | /var/log/isi_rice_d.log          | Per node
Crispies  | /var/log/isi_crispies_d.log      | Per node
ESE       | /ifs/.ifsvar/ese/var/log/ESE.log | Cluster-wide for the single-instance ESE
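For live troubleshooting, these logs can simply be followed on the node in question while reproducing an issue. For example:

# tail -f /var/log/isi_rice_d.log /var/log/isi_crispies_d.log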

Debug level logging can be configured from the CLI as follows:

# isi_for_array isi_ilog -a isi_crispies_d --level=debug+
# isi_for_array isi_ilog -a isi_rice_d --level=debug+

Note that the OneFS log gathers (such as the output from the isi_gather_info utility) will capture all the above log files, plus the pertinent SupportAssist Gconfig contexts and Tardis namespaces, for later analysis.

If needed, the Rice and ESE configurations can also be viewed as follows:

# isi_gconfig -t ese
[root] {version:1}
ese.mode (char*) = direct
ese.connection_state (char*) = disabled
ese.enable_remote_support (bool) = true
ese.automatic_case_creation (bool) = true
ese.event_muted (bool) = false
ese.primary_contact.first_name (char*) =
ese.primary_contact.last_name (char*) =
ese.primary_contact.email (char*) =
ese.primary_contact.phone (char*) =
ese.primary_contact.language (char*) =
ese.secondary_contact.first_name (char*) =
ese.secondary_contact.last_name (char*) =
ese.secondary_contact.email (char*) =
ese.secondary_contact.phone (char*) =
ese.secondary_contact.language (char*) =
(empty dir ese.gateway_endpoints)
ese.defaultBackendType (char*) = srs
ese.ipAddress (char*) = 127.0.0.1
ese.useSSL (bool) = true
ese.srsPrefix (char*) = /esrs/{version}/devices
ese.directEndpointsUseProxy (bool) = false
ese.enableDataItemApi (bool) = true
ese.usingBuiltinConfig (bool) = false
ese.productFrontendPrefix (char*) = platform/16/supportassist
ese.productFrontendType (char*) = webrest
ese.contractVersion (char*) = 1.0
ese.systemMode (char*) = normal
ese.srsTransferType (char*) = ISILON-GW
ese.targetEnvironment (char*) = PROD
 
# isi_gconfig -t rice
[root] {version:1}
rice.enabled (bool) = false
rice.ese_provisioned (bool) = false
rice.hardware_key_present (bool) = false
rice.supportassist_dismissed (bool) = false
rice.eligible_lnns (char*) = []
rice.instance_swid (char*) =
rice.task_prune_interval (int) = 86400
rice.last_task_prune_time (uint) = 0
rice.event_prune_max_items (int) = 100
rice.event_prune_days_to_keep (int) = 30
rice.jnl_tasks_prune_max_items (int) = 100
rice.jnl_tasks_prune_days_to_keep (int) = 30
rice.config_reserved_workers (int) = 1
rice.event_reserved_workers (int) = 1
rice.telemetry_reserved_workers (int) = 1
rice.license_reserved_workers (int) = 1
rice.log_reserved_workers (int) = 1
rice.download_reserved_workers (int) = 1
rice.misc_task_workers (int) = 3
rice.accepted_terms (bool) = false
(empty dir rice.network_pools)
rice.telemetry_enabled (bool) = true
rice.telemetry_persist (bool) = false
rice.telemetry_threads (uint) = 8
rice.enable_download (bool) = true
rice.init_performed (bool) = false
rice.ese_disconnect_alert_timeout (int) = 14400
rice.offline_collection_period (uint) = 7200

The -q flag can also be used in conjunction with the isi_gconfig command to identify any values that are not at their default settings. For example, the stock (default) Rice gconfig context will not report any configuration entries:

# isi_gconfig -q -t rice
[root] {version:1}

 

Read Full Blog
  • PowerScale
  • OneFS

OneFS SupportAssist Provisioning – Part 2

Nick Trimbee Nick Trimbee

Thu, 13 Apr 2023 21:29:24 -0000

|

Read Time: 0 minutes

In the previous article in this OneFS SupportAssist series, we reviewed the off-cluster prerequisites for enabling OneFS SupportAssist:

  1. Upgrading the cluster to OneFS 9.5.
  2. Obtaining the secure access key and PIN.
  3. Selecting either direct connectivity or gateway connectivity.
  4. If using gateway connectivity, installing Secure Connect Gateway v5.x.

In this article, we turn our attention to step 5: Provisioning SupportAssist on the cluster.

As part of this process, we’ll be using the access key and PIN credentials previously obtained from the Dell Support portal in step 2 above.

Provisioning SupportAssist on a cluster

SupportAssist can be configured from the OneFS 9.5 WebUI by going to Cluster management > General settings > SupportAssist. To initiate the provisioning process on a cluster, click the Connect SupportAssist link, as shown here:

If SupportAssist is not configured, the Remote support page displays the following banner, warning of the future deprecation of SRS:

Similarly, when SupportAssist is not configured, the SupportAssist WebUI page also displays verbiage recommending the adoption of SupportAssist:

There is also a Connect SupportAssist button to begin the provisioning process.

Selecting the Connect SupportAssist button initiates the setup wizard.

1.  Telemetry Notice

 


The first step requires checking and accepting the Infrastructure Telemetry Notice:



2.  Support Contract



For the next step, enter the details for the primary support contact, as prompted:

 
You can also provide the information from the CLI by using the isi supportassist contacts command set. For example:

# isi supportassist contacts modify --primary-first-name=Nick --primary-last-name=Trimbee --primary-email=trimbn@isilon.com


3.  Establish Connections

Next, complete the Establish Connections page.

This involves the following steps:

      • Selecting the network pool(s)
      • Adding the secure access key and PIN
      • Configuring either direct or gateway access
      • Selecting whether to allow remote support, CloudIQ telemetry, and auto case creation

a.  Select network pool(s).

At least one statically allocated IPv4 network subnet and pool are required for provisioning SupportAssist. OneFS 9.5 does not support IPv6 networking for SupportAssist remote connectivity. However, IPv6 support is planned for a future release.

Select one or more network pools or subnets from the options displayed. In this example, we select subnet0:pool0:



Or, from the CLI, select one or more static subnets or pools for outbound communication using the following syntax:

# isi supportassist settings modify --network-pools="subnet0.pool0"

Additionally, if the cluster has the OneFS 9.5 network firewall enabled (“isi network firewall settings”), ensure that outbound traffic is allowed on port 9443.

b.  Add secure access key and PIN.

In this next step, add the secure access key and PIN. These should have been obtained earlier in the provisioning procedure from the following Dell Support site: https://www.dell.com/support/connectivity/product/isilon-onefs.


Alternatively, if configuring SupportAssist from the OneFS CLI, add the key and PIN by using the following syntax:

# isi supportassist provision start --access-key <key> --pin <pin>


c.  Configure access.

  • Direct access

Or, to configure direct access (the default) from the CLI, ensure that the following parameter is set:

# isi supportassist settings modify --connection-mode direct
# isi supportassist settings view | grep -i "connection mode"
        Connection mode: direct
  • Gateway access

Alternatively, to connect through a gateway, select the Connect via Secure Connect Gateway button:

Complete the Gateway host and Gateway port fields as appropriate for the environment.

Alternatively, to set up a gateway configuration from the CLI, use the isi supportassist settings modify syntax. For example, to use the gateway FQDN secure-connect-gateway.yourdomain.com and the default port 9443:

# isi supportassist settings modify --connection-mode gateway
# isi supportassist settings view | grep -i "connection mode"
        Connection mode: gateway
# isi supportassist settings modify --gateway-host secure-connect-gateway.yourdomain.com --gateway-port 9443

When setting up the gateway connectivity option, Secure Connect Gateway v5.0 or later must be deployed within the data center. SupportAssist is incompatible with either ESRS gateway v3.52 or SAE gateway v4. However, Secure Connect Gateway v5.x is backward compatible with PowerScale OneFS ESRS, which allows the gateway to be provisioned and configured ahead of a cluster upgrade to OneFS 9.5.

d. Configure support options.

Finally, configure the support options:



When you have completed the configuration, the WebUI will confirm that SupportAssist is successfully configured and enabled, as follows:

 
Or from the CLI:

# isi supportassist settings view
        Service enabled: Yes
       Connection State: enabled
      OneFS Software ID: ELMISL0223BJJC
          Network Pools: subnet0.pool0, subnet0.testpool1, subnet0.testpool2, subnet0.testpool3, subnet0.testpool4
        Connection mode: gateway
           Gateway host: eng-sea-scgv5stg3.west.isilon.com
           Gateway port: 9443
    Backup Gateway host: eng-sea-scgv5stg.west.isilon.com
    Backup Gateway port: 9443
  Enable Remote Support: Yes
Automatic Case Creation: Yes
       Download enabled: Yes

 

 

Read Full Blog
  • PowerScale
  • OneFS

OneFS SupportAssist Provisioning – Part 1

Nick Trimbee Nick Trimbee

Thu, 13 Apr 2023 20:20:31 -0000

|

Read Time: 0 minutes

In OneFS 9.5, several OneFS components now leverage SupportAssist as their secure off-cluster data retrieval and communication channel. These components include:

Component          | Details
Events and Alerts  | SupportAssist can send CELOG events and attachments through Embedded Service Enabler (ESE) to CLM.
Diagnostics        | Logfile gathers can be uploaded to Dell through SupportAssist.
License activation | License activation uses SupportAssist for the isi license activation start CLI command.
Telemetry          | Telemetry is sent through SupportAssist to CloudIQ for analytics.
Health check       | Health check definition downloads now leverage SupportAssist.
Remote Support     | Remote Support now uses SupportAssist along with Connectivity Hub.

For existing clusters, SupportAssist supports the same basic workflows as its predecessor, ESRS, so the transition from old to new is generally pretty seamless.

The overall process for enabling OneFS SupportAssist is as follows:

  1. Upgrade the cluster to OneFS 9.5.
  2. Obtain the secure access key and PIN.
  3. Select either direct connectivity or gateway connectivity.
  4. If using gateway connectivity, install Secure Connect Gateway v5.x.
  5. Provision SupportAssist on the cluster.

 We’ll go through each of these configuration steps in order:

1.  Upgrading to OneFS 9.5

First, the cluster must be running OneFS 9.5 to configure SupportAssist.

There are some additional considerations and caveats to bear in mind when upgrading to OneFS 9.5 and planning on enabling SupportAssist. These include:

  • SupportAssist is disabled when STIG hardening is applied to the cluster.
  • Using SupportAssist on a hardened cluster is not supported.
  • Clusters with the OneFS network firewall enabled (”isi network firewall settings”) might need to allow outbound traffic on ports 443 and 8443, plus 9443 if gateway (SCG) connectivity is configured.
  • SupportAssist is supported on a cluster that’s running in Compliance mode.
  • If you are upgrading from an earlier release, the OneFS 9.5 upgrade must be committed before SupportAssist can be provisioned.

Also, ensure that the user account that will be used to enable SupportAssist belongs to a role with the ISI_PRIV_REMOTE_SUPPORT read and write privilege:

# isi auth privileges | grep REMOTE
ISI_PRIV_REMOTE_SUPPORT                           
  Configure remote support

 For example, for an ese user account:

# isi auth roles view SupportAssistRole
       Name: SupportAssistRole
Description: -
    Members: ese
 Privileges
             ID: ISI_PRIV_LOGIN_PAPI
     Permission: r
             ID: ISI_PRIV_REMOTE_SUPPORT
      Permission: w
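A role along these lines could also be created up front from the CLI. This is a sketch only, reusing the example role and user names above; confirm the flag names for your release:

# isi auth roles create SupportAssistRole
# isi auth roles modify SupportAssistRole --add-priv-read ISI_PRIV_LOGIN_PAPI --add-priv-write ISI_PRIV_REMOTE_SUPPORT --add-user ese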

2.  Obtaining secure access key and PIN

An access key and pin are required to provision SupportAssist, and these secure keys are held in key manager under the RICE domain. This access key and pin can be obtained from the following Dell Support site: https://www.dell.com/support/connectivity/product/isilon-onefs.

In the Quick link navigation bar, select the Generate Access key link:

 On the following page, select the appropriate button:

The credentials required to obtain an access key and pin vary, depending on prior cluster configuration. Sites that have previously provisioned ESRS will need their OneFS Software ID (SWID) to obtain their access key and pin.

The isi license list CLI command can be used to determine a cluster’s SWID. For example:

# isi license list | grep "OneFS Software ID"
OneFS Software ID: ELMISL999CKKD

However, customers with new clusters and/or customers who have not previously provisioned ESRS or SupportAssist will require their Site ID to obtain the access key and pin.

Note that any new cluster hardware shipping after January 2023 will already have an integrated key, so this key can be used in place of the Site ID.

For example, if this is the first time registering this cluster and it does not have an integrated key, select Yes, let’s register:


 Enter the Site ID, site name, and location information for the cluster:

Choose a 4-digit PIN and save it for future reference. After that, click Create My Access Key:

The access key is then generated.
 

An automated email containing the pertinent key info is sent from the Dell | Services Connectivity Team. For example:

This access key is valid for one week, after which it automatically expires.

Next, in the cluster’s WebUI, go back to Cluster management > General settings > SupportAssist and enter the access key and PIN information in the appropriate fields. Finally, click Finish Setup to complete the SupportAssist provisioning process:



3.  Deciding between direct or gateway topology 


A topology decision will need to be made between implementing either direct connectivity or gateway connectivity, depending on the needs of the environment:

  • Direct connect:



  • Gateway connect:


SupportAssist uses ports 443 and 8443 by default for bi-directional communication between the cluster and Connectivity Hub. These ports will need to be open across any firewalls or packet filters between the cluster and the corporate network edge to allow connectivity to Dell Support.

Additionally, port 9443 is used for communicating with a gateway (SCG).

# grep -i esrs /etc/services
isi_esrs_d      9443/tcp   #EMC Secure Remote Support outbound alerts
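Before provisioning, it can be worth confirming that the relevant ports are actually reachable from a cluster node. One rough check, using a placeholder gateway FQDN, is a simple TCP probe with nc:

# nc -z secure-connect-gateway.yourdomain.com 9443 && echo "gateway port 9443 reachable"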

4.  Installing Secure Connect Gateway (optional) 

This step is only required when deploying Dell Secure Connect Gateway (SCG). If a direct connect topology is preferred, go directly to step 5.

When configuring SupportAssist with the gateway connectivity option, Secure Connect Gateway v5.0 or later must be deployed within the data center.

Dell SCG is available for Linux, Windows, Hyper-V, and VMware environments, and, as of this writing, the latest version is 5.14.00.16. The installation binaries can be downloaded from https://www.dell.com/support/home/en-us/product-support/product/secure-connect-gateway/drivers.

Download SCG as follows:

  1. Sign in to www.dell.com/SCG-App. The Secure Connect Gateway - Application Edition page is displayed. If you have issues signing in using your business account or if you are unable to access the page even after signing in, contact Dell Administrative Support.
  2. In the Quick links section, click Generate Access key.
  3. On the Generate Access Key page, perform the following steps:
    1. Select a site ID, site name, or site location.
    2. Enter a four-digit PIN and click Generate key. An access key is generated and sent to your email address. NOTE: The access key and PIN must be used within seven days and cannot be used to register multiple instances of SCG.
    3. Click Done.
  4. On the Secure Connect Gateway – Application Edition page, click the Drivers & Downloads tab.
  5. Search and select the required version.
  6. In the ACTION column, click Download.

The following steps are required to set up SCG:

https://dl.dell.com/content/docu105633_secure-connect-gateway-application-edition-quick-setup-guide.pdf?language=en-us


Pertinent resources for installing SCG include the quick setup guide above, plus the Dell Support forum, which is another useful source of SCG installation, configuration, and troubleshooting information: https://www.dell.com/community/Secure-Connect-Gateway/bd-p/SCG

5.  Provisioning SupportAssist on the cluster

 At this point, the off-cluster prestaging work should be complete.

In the next article in this series, we turn our attention to the SupportAssist provisioning process on the cluster itself (step 5).

 

 

Read Full Blog
  • PowerScale
  • OneFS

Dell PowerScale OneFS Introduction for NetApp Admins

Aqib Kazi Aqib Kazi

Tue, 04 Apr 2023 17:15:00 -0000

|

Read Time: 0 minutes

For enterprises to harness the advantages of advanced storage technologies with Dell PowerScale, a transition from an existing platform is necessary. Enterprises are challenged by how the new architecture will fit into the existing infrastructure. This blog post provides an overview of PowerScale architecture, features, and nomenclature for enterprises migrating from NetApp ONTAP.

PowerScale overview

The PowerScale OneFS operating system is based on a distributed architecture, built from the ground up as a clustered system. Each PowerScale node provides compute, memory, networking, and storage. The concepts of controllers, HA, active/standby, and disk shelves are not applicable in a pure scale-out architecture. Thus, when a node is added to a cluster, the cluster performance and capacity increase collectively.

Due to the scale-out distributed architecture with a single namespace, single volume, single file system, and one single pane of management, the system management is far simpler than with traditional NAS platforms. In addition, the data protection is software-based rather than RAID-based, eliminating all the associated complexities, including configuration, maintenance, and additional storage utilization. Administrators do not have to be concerned with RAID groups or load distribution.

NetApp’s ONTAP storage operating system has evolved into a clustered system with controllers. The system includes ONTAP FlexGroups composed of aggregates and FlexVols across nodes.

OneFS is a single volume, which makes cluster management simple. As the cluster grows in capacity, the single volume automatically grows. Administrators are no longer required to migrate data between volumes manually. OneFS repopulates and balances data between all nodes when a new node is added, making the node part of the global namespace. All the nodes in a PowerScale cluster are equal in the hierarchy. Drives share data intranode and internode.

PowerScale is easy to deploy, operate, and manage. Most enterprises require only one full-time employee to manage a PowerScale cluster.

For more information about the PowerScale OneFS architecture, see PowerScale OneFS Technical Overview and Dell PowerScale OneFS Operating System.


Figure 1. Dell PowerScale scale-out NAS architecture

OneFS and NetApp software features

The single volume and single namespace of PowerScale OneFS also lead to a unique feature set. Because the entire NAS is a single file system, the concepts of FlexVols, shares, qtrees, and FlexGroups do not apply. Each NetApp volume has specific properties associated with limited storage space. Adding more storage space to NetApp ONTAP could be an onerous process depending on the current architecture. Conversely, on a PowerScale cluster, as soon as a node is added, the cluster is rebalanced automatically, leading to minimal administrator management. 

NetApp’s continued dependence on volumes creates potential added complexity for storage administrators. From a software perspective, the intricacies that arise from the concept of volumes span across all the features. Configuring software features requires administrators to base decisions on the volume concept, limiting configuration options. The volume concept is further magnified by the impacts on storage utilization. 

The fact that OneFS is a single volume means that many features are not volume dependent but, rather, span the entire cluster. SnapshotIQ, NDMP backups, and SmartQuotas do not have limits based on volumes; instead, they are cluster-specific or directory-specific.

As a single-volume NAS designed for file storage, OneFS combines scalable capacity and ease of management with the features that administrators require. Robust policy-driven features such as SmartConnect, SmartPools, and CloudPools enable maximum utilization of nodes for superior performance, storage efficiency, and value. You can use SmartConnect zones to direct client connections to nodes with specific performance characteristics. SmartPools can tier cold data to nodes with deep archive storage, and CloudPools can store frozen data in the cloud. Regardless of where the data resides, it is presented as a single namespace to the end user.

Storage utilization and data protection

Storage utilization is the amount of storage available after the NAS system overhead is deducted. The overhead consists of the space required for data protection and the operating system.

For data protection, OneFS uses software-based Reed-Solomon Error Correction with up to N+4 protection. OneFS offers several custom protection options that cover node and drive failures. The custom protection options vary according to the cluster configuration. OneFS provides data protection against more simultaneous hardware failures and is software-based, providing a significantly higher storage utilization. 

The software-based data protection stripes data across nodes in stripe units, and some of the stripe units are Forward Error Correction (FEC) or parity units. The FEC units provide a variable to reformulate the data in the case of a drive or node failure. Data protection is customizable to be for node loss or hybrid protection of node and drive failure.

With software-based data protection, the protection scheme is not per cluster. It has additional granularity that allows for making data protection specific to a file or directory—without creating additional storage volumes or manually migrating data. Instead, OneFS runs a job in the background, moving data as configured.

Figure 2. OneFS data protection

OneFS protects data stored on failing nodes, or drives in a cluster through a process called SmartFail. During the process, OneFS places a device into quarantine and, depending on the severity of the issue, places the data on the device into a read-only state. While a device is quarantined, OneFS reprotects the data on the device by distributing the data to other devices. 

NetApp’s data protection is all RAID-based, including NetApp RAID-TEC, NetApp RAID-DP, and RAID 4. NetApp only supports a maximum of triple parity, and simultaneous node failures in an HA pair are not supported. 

For more information about SmartFail, see the following blog: OneFS Smartfail. For more information about OneFS data protection, see High Availability and Data Protection with Dell PowerScale Scale-Out NAS.

NetApp FlexVols, shares, and Qtrees

NetApp requires administrators to manually create space and explicitly define aggregates and flexible volumes. The concepts of FlexVols, shares, and Qtrees are nonexistent in OneFS, because the file system is a single volume and namespace spanning the entire cluster. 

SMB shares and NFS exports are created through the web or command-line interface in OneFS. Both methods allow the user to create either one within seconds, with security options. SmartQuotas is used to manage storage limits cluster-wide, across the entire namespace. Quotas can be accounting-only, issue warning messages, or enforce hard limits, and they can be applied by directory, user, or group. 

Conversely, ONTAP quota management is at the volume or FlexGroup level, creating additional administrative overhead because the process is more onerous.

Snapshots

The OneFS snapshot feature is SnapshotIQ, which does not have specified or enforced limits for snapshots per directory or snapshots per cluster. However, the best practice is 1,024 snapshots per directory and 20,000 snapshots per cluster. OneFS also supports writable snapshots. For more information about SnapshotIQ and writable snapshots, see High Availability and Data Protection with Dell PowerScale Scale-Out NAS.

NetApp Snapshot supports 255 snapshots per volume in ONTAP 9.3 and earlier. ONTAP 9.4 and later versions support 1,023 snapshots per volume. By default, NetApp requires a space reservation of 5 percent in the volume when snapshots are used, requiring the space reservation to be monitored and manually increased if space becomes exhausted. Further, the space reservation can also affect volume availability. The space reservation requirement creates additional administration overhead and affects storage efficiency by setting aside space that might or might not be used.

Data replication

Data replication is required for disaster recovery, RPO, or RTO requirements. OneFS provides data replication through SyncIQ and SmartSync. 

SyncIQ provides asynchronous data replication, whereas NetApp’s asynchronous replication, which is called SnapMirror, is block-based replication. SyncIQ provides options for ensuring that all data is retained during failover and failback from the disaster recovery cluster. SyncIQ is fully configurable with options for execution times and bandwidth management. A SyncIQ target cluster may be configured as a target for several source clusters. 

SyncIQ offers a single-button automated process for failover and failback with Superna Eyeglass DR Edition. For more information about Superna Eyeglass DR Edition, see Superna | DR Edition (supernaeyeglass.com).

SyncIQ allows configurable options for replication down to a specific file, directory, or entire cluster. Conversely, NetApp’s SnapMirror replication starts at the volume at a minimum. The volume concept and dependence on volume requirements continue to add management complexity and overhead for administrators while also wasting storage utilization.

To address the requirements of the modern enterprise, OneFS version 9.4.0.0 introduced SmartSync. This feature replicates file-to-file data between PowerScale clusters. SmartSync cloud copy replicates file-to-object data from PowerScale clusters to Dell ECS and cloud providers, and it can also pull the replicated object data from a cloud provider back to a PowerScale cluster in file format. Having multiple target destinations allows administrators to store multiple copies of a dataset across locations, providing further disaster recovery readiness. For more information about SyncIQ, see Dell PowerScale SyncIQ: Architecture, Configuration, and Considerations. For more information about SmartSync, see Dell PowerScale SmartSync.

Quotas

OneFS SmartQuotas provides configurable options to monitor and enforce storage limits at the user, group, cluster, directory, or subdirectory level. ONTAP quotas are user-, tree-, volume-, or group-based.
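As a simple illustration of this directory-level model, a hard quota on a project directory might look something like the following from the OneFS CLI (the path and threshold are hypothetical, and option names should be checked against your release):

# isi quota quotas create /ifs/data/projects/eng directory --hard-threshold 10T --container true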

For more information about SmartQuotas, see Storage Quota Management and Provisioning with Dell PowerScale SmartQuotas.

Load balancing and multitenancy

Because OneFS is a distributed architecture across a collection of nodes, client connectivity to these nodes requires load balancing. OneFS SmartConnect provides options for balancing the client connections to the nodes within a cluster. Balancing options are round-robin or based on current load. Also, SmartConnect zones can be configured to have clients connect based on group and performance needs. For example, the Engineering group might require high-performance nodes. A zone can be configured, forcing connections to those nodes.

NetApp ONTAP supports multitenancy with Storage Virtual Machines (SVMs), formerly vServers and Logical Interfaces (LIFs). SVMs isolate storage and network resources across a cluster of controller HA pairs. SVMs require managing protocols, shares, and volumes for successful provisioning. Volumes cannot be nondisruptively moved between SVMs. ONTAP supports load balancing using LIFs, but configuration is manual and must be implemented by the storage administrator. Further, it requires continuous monitoring because it is based on the load on the controller. 

OneFS provides multitenancy through SmartConnect and access zones. Management is simple because the file system is one volume and access is provided by hostname and directory, rather than by volume. SmartConnect is policy-driven and does not require continuous monitoring. SmartConnect settings may be changed on demand as the requirements change.

SmartConnect zones allow administrators to provision DNS hostnames specific to IP pools, subnets, and network interfaces. If only a single authentication provider is required, all the SmartConnect zones map to a default access zone. However, if directory access and authentication providers vary, multiple access zones are provisioned, mapping to a directory, authentication provider, and SmartConnect zone. As a result, authenticated users of an access zone only have visibility into their respective directory. Conversely, an administrator with complete file system access can migrate data nondisruptively between directories.

For more information about SmartConnect, see PowerScale: Network Design Considerations.

Compression and deduplication

Both ONTAP and OneFS provide compression. The OneFS deduplication feature is SmartDedupe, which allows deduplication to run at a cluster-wide level, improving overall Data Reduction Rate (DRR) and storage utilization. With ONTAP, the deduplication is enabled at the aggregate level, and it cannot cross over nodes. 

For more information about OneFS data reduction, see Dell PowerScale OneFS: Data Reduction and Storage Efficiency. For more information about SmartDedupe, see Next-Generation Storage Efficiency with Dell PowerScale SmartDedupe.

Data tiering

OneFS has integrated features to tier data based on the data’s age or file type. NetApp has similar functionality with FabricPools.

OneFS SmartPools uses robust policies to enable data placement and movement across multiple types of storage. SmartPools can be configured to move data to a set of nodes automatically. For example, if a file has not been accessed in the last 90 days, it can be migrated to a node with deeper storage, allowing admins to define the value of storage based on performance. 

OneFS CloudPools migrates data to a cloud provider, with only a stub remaining on the PowerScale cluster, based on similar policies. CloudPools not only tiers data to a cloud provider but also recalls the data back to the cluster as demanded. From a user perspective, all the data is still in a single namespace, irrespective of where it resides.

Figure 3. OneFS SmartPools and CloudPools

ONTAP tiers to S3 object stores using FabricPools.

For more information about SmartPools, see Storage Tiering with Dell PowerScale SmartPools. For more information about CloudPools, see the Dell PowerScale CloudPools documentation.

Monitoring

Dell InsightIQ and Dell CloudIQ provide performance monitoring and reporting capabilities. InsightIQ includes advanced analytics to optimize applications, correlate cluster events, and accurately forecast future storage needs. NetApp provides performance monitoring and reporting with Cloud Insights and Active IQ, which are accessible within BlueXP.  

For more information about CloudIQ, see CloudIQ: A Detailed Review. For more information about InsightIQ, see InsightIQ on Dell Support.

Security

Similar to ONTAP, the PowerScale OneFS operating system comes with a comprehensive set of integrated security features. These include data at rest and data in flight encryption, a virus scanning tool, SmartLock WORM compliance, an external key manager for data at rest encryption, a STIG-hardened security profile, Common Criteria certification, and support for UEFI Secure Boot across PowerScale platforms. Further, OneFS may be configured for a Zero Trust architecture and PCI-DSS. 

Superna security 

Superna exclusively provides the following security-focused applications for PowerScale OneFS: 

  • Ransomware Defender: Provides real-time event processing through user behavior analytics. The events are used to detect and stop a ransomware attack before it occurs.
  • Easy Auditor: Offers a flat-rate license model and ease-of-use features that simplify auditing and securing PBs of data.
  • Performance Auditor: Provides real-time file I/O view of PowerScale nodes to simplify root cause of performance impacts, assessing changes needed to optimize performance and debugging user, network, and application performance.
  • Airgap: Deployed in two configurations depending on the scale of clusters and security features:
      • Basic Airgap Configuration that deploys the Ransomware Defender agent on one of the primary clusters being protected.
      • Enterprise Airgap Configuration that deploys the Ransomware Defender agent on the cyber vault cluster. This solution comes with greater scalability and additional security features.

Figure 4. Superna security

NetApp ONTAP security is limited to the integrated features listed above. Additional applications for further security monitoring, like Superna, are not available for ONTAP.

For more information about Superna security, see supernaeyeglass.com. For more information about PowerScale security, see Dell PowerScale OneFS: Security Considerations.

Authentication and access control

NetApp and PowerScale OneFS both support several methods for user authentication and access control. OneFS supports UNIX and Windows permissions for data-level access control. OneFS is designed for a mixed environment that allows the configuration of both Windows Access Control Lists (ACLs) and standard UNIX permissions on the cluster file system. In addition, OneFS provides user and identity mapping, permission mapping, and merging between Windows and UNIX environments.

OneFS supports local and remote authentication providers. Anonymous access is supported for protocols that allow it. Concurrent use of multiple authentication provider types, including Active Directory, LDAP, and NIS, is supported. For example, OneFS is often configured to authenticate Windows clients with Active Directory and to authenticate UNIX clients with LDAP.
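
As a rough sketch, configuring an Active Directory provider and an LDAP provider from the CLI might look like the following, where the domain name, LDAP server, base DN, and credentials are purely illustrative and additional options are typically required:

# isi auth ads create lab.example.com --user=Administrator
# isi auth ldap create lab-ldap --server-uris=ldap://ldap.lab.example.com --base-dn="dc=lab,dc=example,dc=com"
# isi auth status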

Role-based access control

OneFS supports role-based access control (RBAC), allowing administrative tasks to be delegated without requiring a root or administrator account. A role is a collection of OneFS privileges that are limited to an area of administration. For example, custom roles for security, auditing, storage, or backup tasks can be provisioned, and privileges are then assigned to each role. As users log in to the cluster through the platform API, the OneFS command-line interface, or the OneFS web administration interface, they are granted privileges based on their role membership.
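
A minimal sketch of this from the CLI, assuming a hypothetical ‘AuditAdmin’ role, privilege assignment, and user account (the available privilege names vary by OneFS release):

# isi auth roles create AuditAdmin
# isi auth roles modify AuditAdmin --add-priv=ISI_PRIV_AUDIT --add-user=jane
# isi auth roles view AuditAdmin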

For more information about OneFS authentication and access control, see PowerScale OneFS Authentication, Identity Management, and Authorization.

Learn more about PowerScale OneFS

To learn more about PowerScale OneFS, see the following resources:

 

Read Full Blog
  • OneFS
  • monitoring
  • troubleshooting
  • SmartQoS

OneFS SmartQoS Monitoring and Troubleshooting

Nick Trimbee Nick Trimbee

Tue, 21 Mar 2023 18:30:54 -0000

|

Read Time: 0 minutes

The previous articles in this series have covered the SmartQoS architecture, configuration, and management. Now, we’ll turn our attention to monitoring and troubleshooting.

You can use the ‘isi statistics workload’ CLI command to monitor the dataset’s performance. The ‘Ops’ column displays the current protocol operations per second. In the following example, Ops stabilize around 9.8, which is just below the configured limit value of 10 Ops.

# isi statistics workload --dataset ds1

 

Similarly, this next example from the SmartQoS WebUI shows a small NFS workflow performing 497 protocol Ops in a pinned workload with a limit of 500 Ops:

You can pin multiple paths and protocols by selecting the ‘Pin Workload’ option for a given Dataset. Here, four directory path workloads are each configured with different Protocol Ops limits:
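
From the CLI, the equivalent configuration is a matter of pinning each path and then setting its own limit. Here is a brief sketch, assuming two hypothetical export paths and that the pin commands return workload IDs 101 and 102:

# isi performance workloads pin ds1 protocol:nfs3 path:/ifs/data/proj1
# isi performance workloads modify ds1 101 --limits protocol_ops:500
# isi performance workloads pin ds1 protocol:nfs3 path:/ifs/data/proj2
# isi performance workloads modify ds1 102 --limits protocol_ops:1000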

When it comes to troubleshooting SmartQoS, there are a few areas that are worth checking right away, including the SmartQoS Ops limit configuration, isi_pp_d and isi_stats_d daemons, and the protocol service(s).

  1. For suspected Ops limit configuration issues, first confirm that the SmartQoS limits feature is enabled:

# isi performance settings view
Top N Collections: 1024
Time In Queue Threshold (ms): 10.0
Target read latency in microseconds: 12000.0
Target write latency in microseconds: 12000.0
Protocol Ops Limit Enabled: Yes

Next, verify that the workload level protocols_ops limit is correctly configured:

# isi performance workloads view <workload>

Check whether any errors are reported in the isi_tardis_d configuration log:

# cat /var/log/isi_tardis_d.log

  2. To investigate isi_pp_d, first check that the service is enabled:

# isi services -a isi_pp_d
Service 'isi_pp_d' is enabled.

If necessary, you can restart the isi_pp_d service as follows:

# isi services isi_pp_d disable
Service 'isi_pp_d' is disabled.
# isi services isi_pp_d enable
Service 'isi_pp_d' is enabled.

There’s also an isi_pp_d debug tool, which can be helpful in a pinch:

# isi_pp_d -h
Usage: isi_pp_d [-ldhs]
-l Run as a leader process; otherwise, run as a follower. Only one leader process on the cluster will be active.
-d Run in debug mode (do not daemonize).
-s Display pp_leader node (devid and lnn)
-h Display this help.
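
For example, per the usage above, the ‘-s’ flag reports which node is currently acting as the pp_leader (by devid and lnn):

# isi_pp_d -s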

You can enable debugging on the isi_pp_d log file with the following command syntax:

# isi_ilog -a isi_pp_d -l debug, /var/log/isi_pp_d.log

For example, the following log snippet shows a typical isi_pp_d.log message communication between the isi_pp_d leader and isi_pp_d followers:

/ifs/.ifsvar/modules/pp/comm/SETTINGS
[090500b000000b80,08020000:0000bfddffffffff,09000100:ffbcff7cbb9779de,09000100:d8d2fee9ff9e3bfe,090001 00:0000000075f0dfdf]      
100,,,,20,1658854839  < in the format of <workload_id, cputime, disk_reads, disk_writes, protocol_ops, timestamp>

Here, extracts from the /var/log/isi_pp_d.log files on nodes 1 and 2 of a cluster illustrate the different stages of protocol Ops limit enforcement and usage.

  3. To investigate isi_stats_d, first confirm that the isi_stats_d service is enabled:

# isi services -a isi_stats_d
Service 'isi_stats_d' is enabled.

If necessary, you can restart the isi_stats_d service as follows:

# isi services isi_stats_d disable
# isi services isi_stats_d enable

You can view the workload level statistics with the following command:

# isi statistics workload list --dataset=<name>

You can enable debugging on the isi_stats_d log file with the following command syntax:

# isi_stats_tool --action set_tracelevel --value debug
# cat /var/log/isi_stats_d.log

  4. To investigate protocol issues, the ‘isi services’ and ‘lwsm’ CLI commands can be useful. For example, to check the status of the S3 protocol:

# /usr/likewise/bin/lwsm list | grep -i protocol
hdfs                       [protocol]    stopped
lwswift                    [protocol]    running (lwswift: 8393)
nfs                        [protocol]    running (nfs: 8396)
s3                         [protocol]    stopped
srv                        [protocol]    running (lwio: 8096)
# /usr/likewise/bin/lwsm status s3
stopped
# /usr/likewise/bin/lwsm info s3
Service: s3
Description: S3 Server
Categories: protocol
Path: /usr/likewise/lib/lw-svcm/s3.so
Arguments:
Dependencies: lsass onefs_s3 AuditEnabled?flt_audit_s3
Container: s3

This CLI output confirms that the S3 protocol is inactive. You can verify that the S3 service itself is enabled as follows:

# isi services -a | grep -i s3
   s3                   S3 Service                               Enabled

Similarly, you can restart the S3 service as follows:

# /usr/likewise/bin/lwsm restart s3
Stopping service: s3
Starting service: s3

To investigate further, you can increase the protocol’s log level verbosity. For example, to set the s3 log to ‘debug’:

# isi s3 log-level view
Current logging level is 'info'
# isi s3 log-level modify debug
# isi s3 log-level view
Current logging level is 'debug'

Next, view and monitor the appropriate protocol log. For example, for the S3 protocol:

# cat /var/log/s3.log
# tail -f /var/log/s3.log

Beyond the above, you can monitor /var/log/messages for pertinent errors, because the main partitioned performance (PP) modules log to this file. You can enable debug level logging for the various PP modules as follows.

Dataset:

# sysctl ilog.ifs.acct.raa.syslog=debug+
ilog.ifs.acct.raa.syslog: error,warning,notice (inherited) -> error,warning,notice,info,debug

Workload:

# sysctl ilog.ifs.acct.rat.syslog=debug+
ilog.ifs.acct.rat.syslog: error,warning,notice (inherited) -> error,warning,notice,info,debug

Actor work:

# sysctl ilog.ifs.acct.work.syslog=debug+
ilog.ifs.acct.work.syslog: error,warning,notice (inherited) -> error,warning,notice,info,debug

When finished, you can restore the default logging levels for the above modules as follows:

# sysctl ilog.ifs.acct.raa.syslog=notice+
# sysctl ilog.ifs.acct.rat.syslog=notice+
# sysctl ilog.ifs.acct.work.syslog=notice+

Author: Nick Trimbee

Read Full Blog
  • PowerScale
  • OneFS
  • NAS
  • clusters
  • SmartQoS

OneFS SmartQoS Configuration and Setup

Nick Trimbee Nick Trimbee

Tue, 14 Mar 2023 16:06:06 -0000

|

Read Time: 0 minutes

In the previous article in this series, we looked at the underlying architecture and management of SmartQoS in OneFS 9.5. Next, we’ll step through an example SmartQoS configuration using the CLI and WebUI.

After an initial set up, configuring a SmartQoS protocol Ops limit comprises four fundamental steps. These are:

 

  1. Identify the metrics of interest. Used for tracking, to enforce an Ops limit. Example: use ‘path’ and ‘protocol’ as the metrics that identify the workload.
  2. Create a dataset. Used for tracking all of the chosen metric categories. Example: create the dataset ‘ds1’ with the metrics identified.
  3. Pin a workload. Specifies exactly which values to track within the chosen metrics. Example: path: /ifs/data/client_exports, protocol: nfs3.
  4. Set a limit. Limits Ops based on the dataset, metrics (categories), and metric values defined by the workload. Example: protocol_ops limit: 100.
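
As a quick end-to-end sketch, these four steps map to the following CLI sequence, using the NFS export path created in Step 1 below and assuming the pin command returns workload ID 100, as it does in the example later in this article:

# isi performance datasets create --name ds1 protocol path
# isi performance workloads pin ds1 protocol:nfs3 path:/ifs/test/expt_nfs
# isi performance workloads modify ds1 100 --limits protocol_ops:100
# isi performance workloads list ds1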

Step 1:

First, select a metric of interest. For this example, we’ll use the following:

  • Protocol: NFSv3
  • Path: /ifs/test/expt_nfs

If not already present, create and verify an NFS export – in this case at /ifs/test/expt_nfs:

# isi nfs exports create /ifs/test/expt_nfs
# isi nfs exports list
ID Zone Paths Description
------------------------------------------------
1 System /ifs/test/expt_nfs
------------------------------------------------

Or from the WebUI, under Protocols > UNIX sharing (NFS) > NFS exports:

Step 2:

The ‘dataset’ designation is used to categorize workload by various identification metrics, including:

  • Username: UID or SID
  • Primary groupname: primary GID or GSID
  • Secondary groupname: secondary GID or GSID
  • Zone name
  • IP address: local or remote IP address, or IP address range
  • Path: not supported for the S3 protocol
  • Share: SMB share or NFS export ID
  • Protocol: NFSv3, NFSv4, NFSoRDMA, SMB, or S3

SmartQoS in OneFS 9.5 only allows protocol Ops as the transient resources used for configuring a limit ceiling.

For example, you can use the following CLI command to create a dataset ‘ds1’, specifying protocol and path as the ID metrics:

# isi performance datasets create --name ds1 protocol path
Created new performance dataset 'ds1' with ID number 1.

Note: Resource usage tracking by the ‘path’ metric is only supported by SMB and NFS.

The following command displays any configured datasets:

# isi performance datasets list

Or, from the WebUI, by navigating to Cluster management > Smart QoS:

Step 3:

After you have created the dataset, you can pin a workload to it by specifying the metric values. For example:

# isi performance workloads pin ds1 protocol:nfs3 path:/ifs/test/expt_nfs

Pinned performance dataset workload with ID number 100.

Or from the WebUI, by browsing to Cluster management > Smart QoS > Pin workload:

After pinning a workload, the entry appears in the ‘Top Workloads’ section of the WebUI page. However, wait at least 30 seconds to start receiving updates.

To list all the pinned workloads from a specified dataset, use the following command:

# isi performance workloads list ds1

The prior command’s output indicates that there are currently no limits set for this workload.

By default, a protocol ops limit exists for each workload. However, it is set to the maximum (the maximum value of a 64-bit unsigned integer). This is represented in the CLI output by a dash (“-“) if a limit has not been explicitly configured:

# isi performance workloads list ds1
ID   Name  Metric Values           Creation Time       Cluster Resource Impact  Client Impact   Limits
--------------------------------------------------------------------------------------
100  -     path:/ifs/test/expt_nfs 2023-02-02T12:06:05  -          -              -
           protocol:nfs3
--------------------------------------------------------------------------------------
Total: 1

Step 4:

For a pinned workload in a dataset, you can configure a limit for the protocol ops limit from the CLI, using the following syntax:

# isi performance workloads modify <dataset> <workload ID> --limits protocol_ops:<value>

When configuring SmartQoS, always be aware that it is a powerful performance throttling tool which can be applied to significant areas of a cluster’s data and userbase. For example, protocol Ops limits can be configured for metrics such as ‘path:/ifs’, which would affect the entire /ifs filesystem, or ‘zone_name:System’ which would limit the System access zone and all users within it. While such configurations are entirely valid, they would have a significant, system-wide impact. As such, exercise caution when configuring SmartQoS to avoid any inadvertent, unintended, or unexpected performance constraints.

In the following example, the dataset is ‘ds1’, the workload ID is ‘100’, and the protocol Ops limit is set to the value ‘10’:

# isi performance workloads modify ds1 100 --limits protocol_ops:10
protocol_ops: 18446744073709551615 -> 10

Or from the WebUI, by browsing to Cluster management > Smart QoS > Pin and throttle workload:

You can use the ‘isi performance workloads’ command in ‘list’ mode to show details of the workloads in dataset ‘ds1’. In this case, ‘Limits’ is set to protocol_ops = 10.

# isi performance workloads list ds1
ID   Name  Metric Values           Creation Time       Cluster Resource Impact  Client Impact   Limits
--------------------------------------------------------------------------------------
100  -     path:/ifs/test/expt_nfs 2023-02-02T12:06:05  -   -  protocol_ops:10
           protocol:nfs3
--------------------------------------------------------------------------------------
Total: 1

Or in ‘view’ mode:

# isi performance workloads view ds1 100
                     ID: 100
                   Name: -
          Metric Values: path:/ifs/test/expt_nfs, protocol:nfs3
          Creation Time: 2023-02-02T12:06:05
Cluster Resource Impact: -
          Client Impact: -
                 Limits: protocol_ops:10

Or from the WebUI, by browsing to Cluster management > Smart QoS:

You can easily modify the limit value of a pinned workload with the following CLI syntax. For example, to set the limit to 100 Ops:

# isi performance workloads modify ds1 100 --limits protocol_ops:100

Or from the WebUI, by browsing to Cluster management > Smart QoS > Edit throttle:

Similarly, you can use the following CLI command to easily remove a protocol ops limit for a pinned workload:

# isi performance workloads modify ds1 100 --no-protocol-ops-limit

Or from the WebUI, by browsing to Cluster management > Smart QoS > Remove throttle:

Author: Nick Trimbee

Read Full Blog
  • PowerScale
  • OneFS

OneFS SupportAssist

Nick Trimbee Nick Trimbee

Mon, 13 Mar 2023 23:31:33 -0000

|

Read Time: 0 minutes

Among the myriad of new features included in the OneFS 9.5 release is SupportAssist, Dell’s next-gen remote connectivity system. SupportAssist is included with all support plans (features vary based on service level agreement).

Dell SupportAssist rapidly identifies, diagnoses, and resolves cluster issues and provides the following key benefits:

  • Improves productivity by replacing manual routines with automated support
  • Accelerates resolution, or avoids issues completely, with predictive issue detection and proactive remediation

Within OneFS, SupportAssist transmits events, logs, and telemetry from PowerScale to Dell support. As such, it provides a full replacement for the legacy ESRS.

Delivering a consistent remote support experience across the Dell storage portfolio, SupportAssist is intended for all sites that can send telemetry off-cluster to Dell over the Internet. SupportAssist integrates the Dell Embedded Service Enabler (ESE) into PowerScale OneFS along with a suite of daemons to allow its use on a distributed system.

SupportAssist vs. ESRS:

  • SupportAssist is Dell’s next-generation remote connectivity solution; ESRS is being phased out of service.
  • SupportAssist can either connect directly or through supporting gateways; ESRS can only use gateways for remote connectivity.
  • SupportAssist uses Connectivity Hub to coordinate support; ESRS uses ServiceLink to coordinate support.

Using the Dell Connectivity Hub, SupportAssist can connect either directly or through a Secure Connect gateway.

SupportAssist has a variety of components that gather and transmit various pieces of OneFS data and telemetry to Dell Support and backend services through the Embedded Service Enabler (ESE). These workflows include CELOG events; In-product activation (IPA) information; CloudIQ telemetry data; Isi-Gather-info (IGI) logsets; and provisioning, configuration, and authentication data to ESE and the various backend services.

  • CELOG: In OneFS 9.5, SupportAssist can be configured to send CELOG events and attachments through ESE to CLM. CELOG has a “supportassist” channel that, when active, creates an EVENT task for SupportAssist to propagate.
  • License activation: Several pieces of PowerScale and OneFS functionality require licenses and must communicate with the Dell backend services in order to register and activate those cluster licenses. In OneFS 9.5, SupportAssist is the preferred mechanism to send those license activations through ESE to the Dell backend. License information can be generated with the isi license generate CLI command and then activated with the isi license activation start command, which uses SupportAssist to connect.
  • Provisioning: SupportAssist must register with backend services in a process known as provisioning. This process must be run before ESE will respond on any of its other available API endpoints, and it can only successfully occur once per installation; subsequent provisioning tasks will fail. SupportAssist must be configured through the CLI or WebUI before provisioning. The provisioning process uses authentication information that was stored in the key manager upon the first boot.
  • Diagnostics: The OneFS isi diagnostics gather and isi_gather_info logfile collation and transmission commands have a --supportassist option.
  • Healthchecks: HealthCheck definitions are updated using SupportAssist.
  • Telemetry: CloudIQ telemetry data is sent using SupportAssist.
  • Remote Support: Remote Support uses SupportAssist and the Connectivity Hub to assist customers with their clusters.

SupportAssist requires either an access key and PIN or a hardware key in order to be enabled, with most customers likely using the access key and PIN method. Secure keys are held in the key manager under the RICE domain.

In addition to the transmission of data from the cluster to Dell, Connectivity Hub also allows inbound remote support sessions to be established for remote cluster troubleshooting.

 In the next article in this series, we’ll take a deeper look at the SupportAssist architecture and operation.

 

 

Read Full Blog
  • PowerScale
  • OneFS
  • SmartQoS

OneFS SmartQoS Architecture and Management

Nick Trimbee Nick Trimbee

Wed, 01 Mar 2023 22:34:30 -0000

|

Read Time: 0 minutes

The SmartQoS Protocol Ops limits architecture, introduced in OneFS 9.5, involves three primary capabilities:

  • Resource tracking
  • Resource limit distribution
  • Throttling

Under the hood, the OneFS protocol heads (NFS, SMB, and S3) identify and track how many protocol operations are being processed through a specific export or share. The existing partitioned performance (PP) reporting infrastructure is leveraged for cluster wide resource usage collection, limit calculation and distribution, along with new OneFS 9.5 functionality to support pinned workload protocol Ops limits.

The protocol scheduling module (LwSched) has a built-in throttling capability that allows the execution of individual operations to be delayed by temporarily pausing them, or ‘sleeping’. Additionally, in OneFS 9.5, the partitioned performance kernel modules have also been enhanced to calculate ‘sleep time’ based on operation count resource information (requested, average usage, and so on) – both within the current throttling window, and for a specific workload.

We can characterize the fundamental SmartQoS workflow as follows:

  1. Configuration, using the CLI, pAPI, or WebUI.
  2. Statistics gatherer obtains Op/s data from the partitioned performance (PP) kernel.
  3. Stats gatherer communicates Op/s data to PP leader service.
  4. Leader queries config manager for per-cluster rate limit.
  5. Leader calculates per-node limit.
  6. PP follower service is notified of per-node Op/s limit.
  7. Kernel is informed of new per-node limit.
  8. Work is scheduled with rate-limited resource.
  9. Kernel returns sleep time, if needed.

When an admin configures a per-cluster protocol Ops limit, the statistics gathering service, isi_stats_d, begins collecting workload resource information every 30 seconds by default from the partitioned performance (PP) kernel on each node in the cluster and notifies the isi_pp_d leader service of this resource info. Next, the leader gets the per-cluster protocol Ops limit plus additional resource consumption metrics from the isi_acct_cpp service from isi_tardis_d, the OneFS cluster configuration service and calculates the protocol Ops limit of each node for the next throttling window. It then instructs the isi_pp_d follower service on each node to update the kernel with the newly calculated protocol Ops limit, plus a request to reset the throttling window.

When the kernel receives a scheduling request for a work item from the protocol scheduler (LwSched), the kernel calculates the required ‘sleep time’ value, based on the current node protocol Ops limit and resource usage in the current throttling window. If insufficient resources are available, the work item execution thread is put to sleep for a specific interval returned from the PP kernel. If resources are available, or the thread is reactivated from sleeping, it executes the work item and reports the resource usage statistics back to PP, releasing any scheduling resources it may own.

SmartQoS can be configured through either the CLI, platform API, or WebUI, and OneFS 9.5 introduces a new SmartQoS WebUI page to support this. Note that SmartQoS is only available when an upgrade to OneFS 9.5 has been committed, and any attempt to configure or run the feature prior to upgrade commit will fail with the following message:

# isi performance workloads modify DS1 -w WS1 --limits protocol_ops:50000
 Setting of protocol ops limits not available until upgrade has been committed

When a cluster is running OneFS 9.5 and the release is committed, the SmartQoS feature is enabled by default. This, and the current configuration, can be confirmed using the following CLI command:

 # isi performance settings view
                   Top N Collections: 1024
        Time In Queue Threshold (ms): 10.0
 Target read latency in microseconds: 12000.0
Target write latency in microseconds: 12000.0
          Protocol Ops Limit Enabled: Yes

In OneFS 9.5, the ‘isi performance settings modify’ CLI command now includes a ‘protocol-ops-limit-enabled’ parameter to allow the feature to be easily disabled (or re-enabled) across the cluster. For example:

# isi performance settings modify --protocol-ops-limit-enabled false
protocol_ops_limit_enabled: True -> False

Similarly, the ‘isi performance settings view’ CLI command has been extended to report the protocol OPs limit state:

# isi performance settings view *
Top N Collections: 1024
Protocol Ops Limit Enabled: Yes

In order to set a protocol OPs limit on a workload from the CLI, the ‘isi performance workload pin’ and ‘isi performance workload modify’ commands now accept an optional ‘--limits’ parameter. For example, to create a pinned workload with the ‘protocol_ops’ limit set to 10000:

# isi performance workload pin test protocol:nfs3 --limits protocol_ops:10000

Similarly, to modify an existing workload’s ‘protocol_ops’ limit to 20000:

# isi performance workload modify test 101 --limits protocol_ops:20000
protocol_ops: 10000 -> 20000

When configuring SmartQoS, always be aware that it is a powerful throttling tool that can be applied to significant areas of a cluster’s data and userbase. For example, protocol OPs limits can be configured for metrics such as ‘path:/ifs’, which would affect the entire /ifs filesystem, or ‘zone_name:System’ which would limit the System access zone and all users within it.

While such configurations are entirely valid, they would have a significant, system-wide impact. As such, exercise caution when configuring SmartQoS to avoid any inadvertent, unintended, or unexpected performance constraints.

To clear a protocol Ops limit on a workload, the ‘isi performance workload modify’ CLI command has been extended to accept an optional ‘--no-protocol-ops-limit’ argument. For example:

# isi performance workload modify test 101 --no-protocol-ops-limit
protocol_ops: 20000 -> 18446744073709551615

Note that the value of ‘18446744073709551615’ in the command output above represents ‘NO_LIMIT’ set.

You can view a workload’s protocol Ops limit by using the ‘isi performance workload list’ and ‘isi performance workload view’ CLI commands, which have been modified in OneFS 9.5 to display the limits appropriately. For example:

# isi performance workload list test
ID Name Metric Values Creation Time Impact Limits
---------------------------------------------------------------------
101 - protocol:nfs3 2023-02-02T22:35:02 - protocol_ops:20000
---------------------------------------------------------------------
# isi performance workload view test 101
ID: 101
Name: -
Metric Values: protocol:nfs3
Creation Time: 2023-02-02T22:35:02
Impact: -
Limits: protocol_ops:20000

In the next article in this series, we’ll step through an example SmartQoS configuration and verification from both the CLI and WebUI.

Author: Nick Trimbee

Read Full Blog
  • PowerScale
  • OneFS
  • SmartQoS
  • performance management

OneFS SmartQoS

Nick Trimbee Nick Trimbee

Thu, 23 Feb 2023 22:34:49 -0000

|

Read Time: 0 minutes

Built atop the partitioned performance (PP) resource monitoring framework, OneFS 9.5 introduces a new SmartQoS performance management feature. SmartQoS allows a cluster administrator to set limits on the maximum number of protocol operations per second (Protocol Ops) that individual pinned workloads can consume, in order to achieve desired business workload prioritization. Among the benefits of this new QoS functionality are:

  • Enabling IT infrastructure teams to achieve performance SLAs
  • Allowing throttling of rogue or low priority workloads and hence prioritization of other business critical workloads
  • Helping minimize data unavailability events due to overloaded clusters

 

This new SmartQoS feature in OneFS 9.5 supports the NFS, SMB and S3 protocols, including mixed traffic to the same workload.

But first, a quick refresher. The partitioned performance resource monitoring framework, which initially debuted in OneFS 8.0.1, enables OneFS to track and report the use of transient system resources (resources that only exist at a given instant), providing insight into who is consuming what resources, and how much of them. Examples include CPU time, network bandwidth, IOPS, disk accesses, and cache hits, and so on.

OneFS partitioned performance is an ongoing effort that, as of OneFS 9.5, provides control in addition to insight. This allows control of the work flowing through the system, prioritization and protection of mission-critical workflows, and the ability to detect whether a cluster is at capacity.

Because identification of work is highly subjective, OneFS partitioned performance resource monitoring provides significant configuration flexibility, by allowing cluster admins to craft exactly how they want to define, track, and manage workloads. For example, an administrator might want to partition their work based on criteria such as which user is accessing the cluster, the export/share they are using, which IP address they’re coming from – and often a combination of all three.

OneFS has always provided client and protocol statistics, but they were typically front-end only. Similarly, OneFS has provided CPU, cache, and disk statistics, but they did not display who was consuming them. Partitioned performance unites these two realms, tracking the usage of the CPU, drives, and caches, and spanning the initiator/participant barrier.

OneFS collects the resources consumed and groups them into distinct workloads. The aggregation of these workloads comprises a performance dataset.

  • Workload: a set of identification metrics and the resources used. Example: {username:nick, zone_name:System} consumed {cpu:1.5s, bytes_in:100K, bytes_out:50M, …}
  • Performance dataset: the set of identification metrics by which to aggregate workloads, and the list of workloads collected that match that specification. Example: {usernames, zone_names}
  • Filter: a method for including only workloads that match specific identification metrics. Example: {username:nick, zone_name:System}, {username:jane, zone_name:System}, or {username:nick, zone_name:Perf}

The following metrics are tracked by partitioned performance resource monitoring:

Identification metrics:

  • Username / UID / SID
  • Primary groupname / GID / GSID
  • Secondary groupname / GID / GSID
  • Zone name
  • Local/remote IP address/range
  • Path
  • Share / export ID
  • Protocol
  • System name
  • Job type

Transient resources:

  • CPU usage
  • Bytes in/out – net traffic minus TCP headers
  • IOPs – protocol OPs
  • Disk reads – blocks read from disk
  • Disk writes – blocks written to the journal, including protection
  • L2 hits – blocks read from L2 cache
  • L3 hits – blocks read from L3 cache
  • Latency – sum of time taken from start to finish of an OP
  • ReadLatency
  • WriteLatency
  • OtherLatency

Performance statistics:

  • Read/Write/Other latency

Supported protocols:

  • NFS
  • SMB
  • S3
  • Jobs
  • Background services

Be aware that, in OneFS 9.5, SmartQoS currently does not support the following Partitioned Performance criteria:

  • Metrics: system name and job type.
  • Workloads: top workloads (as they are dynamically and automatically generated by the kernel) and workloads belonging to the ‘system’ dataset.
  • Protocols: jobs and background services.

When pinning a workload to a dataset, note that the more metrics there are in that dataset, the more parameters need to be defined when pinning to it. For example:

Dataset = zone_name, protocol, username

To set a limit on this dataset, you’d need to pin the workload by also specifying the zone name, protocol, and username.

When using the remote_address and/or local_address metrics, you can also specify a subnet. For example: 10.123.45.0/24
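
As a rough CLI sketch using the ‘isi performance’ command set, a dataset keyed on protocol and remote address could be created, and a subnet-scoped workload pinned and throttled, as follows (the dataset name, subnet, and limit value are illustrative):

# isi performance datasets create --name ds2 protocol remote_address
# isi performance workloads pin ds2 protocol:smb remote_address:10.123.45.0/24
# isi performance workloads modify ds2 <workload ID> --limits protocol_ops:1000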

With the exception of the system dataset, you must configure performance datasets before statistics are collected.

For SmartQoS in OneFS 9.5, you can define and configure limits as a maximum number of protocol operations (Protocol Ops) per second across the following protocols:

  • NFSv3
  • NFSv4
  • NFSoRDMA
  • SMB
  • S3

You can apply a Protocol Ops limit to up to four custom datasets. All pinned workloads within a dataset can have a limit configured, up to a maximum of 1024 workloads per dataset. If multiple workloads happen to share a common metric value with overlapping limits, the lowest configured limit is enforced.

Note that when upgrading to OneFS 9.5, SmartQoS is activated only when the new release has been successfully committed.

In the next article in this series, we’ll take a deeper look at SmartQoS’ underlying architecture and workflow.

Author: Nick Trimbee

Read Full Blog
  • PowerScale
  • OneFS
  • SmartPools

OneFS SmartPools Transfer Limits Configuration and Management

Nick Trimbee Nick Trimbee

Thu, 16 Feb 2023 15:48:08 -0000

|

Read Time: 0 minutes

In the first article in this series, we looked at the architecture and considerations of the new SmartPools transfer limits in OneFS 9.5. Now, we turn our attention to the configuration and management of this feature.

From the control plane side, OneFS 9.5 contains several WebUI and CLI enhancements to reflect the new SmartPools transfer limits functionality. Probably the most obvious change is in the Local storage usage status histogram, where tiers and their child node pools have been aggregated for a more logical grouping. Also, blue limit-lines have been added above each of the storage pools, and a red warning status is displayed for any pools that have exceeded the transfer limit.

Similarly, the storage pools status page now includes transfer limit details, with the 90% limit displayed for any storage pools using the default setting.

From the CLI, the isi storagepool nodepools view command reports the transfer limit status and percentage for a pool. The used SSD and HDD bytes percentages in the command output indicate where the pool utilization is relative to the transfer limit.

The storage pool transfer limit can easily be configured from the CLI for a specific pool, set as a default, or disabled, using the new --transfer-limit and --default-transfer-limit flags.

The following CLI command can be used to set the transfer limit for a specific storage pool:

# isi storagepool {nodepools | tiers} modify <name> --transfer-limit={0-100 | default | disabled}

For example, to set a limit of 80% on an A200 nodepool:

# isi storagepool nodepools modify a200_30tb_1.6tb-ssd_96gb --transfer-limit=80

Or to set the default limit of 90% on tier perf1:

# isi storagepool tiers modify perf1 --transfer-limit=default

Note that setting the transfer limit of a tier automatically applies to all its child node pools, regardless of any prior child limit configurations.

The global isi storagepool settings view CLI command output shows the default transfer limit, which is 90% but can be configured anywhere between 0 and 100%.

This default limit can be reconfigured from the CLI with the following syntax:

# isi storagepool settings modify --default-transfer-limit={0-100, disabled}

For example, to set a new default transfer limit of 85%:

# isi storagepool settings modify --default-transfer-limit=85

And the same changes can be made from the SmartPools WebUI, by navigating to Storage pools > SmartPools settings:

Once a SmartPools job has completed in OneFS 9.5, the job report contains a new field: ‘Files not moved due to transfer limit exceeded’.

# isi job reports view 1056 
... 
... 
Policy/testpolicy/Access changes skipped 0 
Policy/testpolicy/ADS containers matched 'head' 0 
Policy/testpolicy/ADS containers matched 'snapshot' 0 
Policy/testpolicy/ADS streams matched 'head' 0 
Policy/testpolicy/ADS streams matched 'snapshot' 0 
Policy/testpolicy/Directories matched 'head' 0 
Policy/testpolicy/Directories matched 'snapshot' 0 
Policy/testpolicy/File creation templates matched 0 
Policy/testpolicy/Files matched 'head' 0 
Policy/testpolicy/Files matched 'snapshot' 0 
Policy/testpolicy/Files not moved due to transfer limit exceeded 0 
Policy/testpolicy/Files packed 0 
Policy/testpolicy/Files repacked 0 
Policy/testpolicy/Files unpacked 0 
Policy/testpolicy/Packing changes skipped 0 
Policy/testpolicy/Protection changes skipped 0 
Policy/testpolicy/Skipped files already in containers 0 
Policy/testpolicy/Skipped packing non-regular files 0 
Policy/testpolicy/Skipped packing regular files 0

Additionally, the SYS STORAGEPOOL FILL LIMIT EXCEEDED alert is triggered at the Info level when a storage pool’s usage has exceeded its transfer limit. Each hour, CELOG fires off a monitor helper script that measures how full each storage pool is relative to its transfer limit. The usage is gathered by reading from the disk pool database, and the transfer limits are stored in gconfig. If a node pool has a transfer limit of 50% and usage of 75%, the monitor helper would report a measurement of 150%, triggering an alert.

# isi event view 126 
ID: 126 
Started: 11/29 20:32 
Causes Long: storagepool: vonefs_13gb_4.2gb-ssd_6gb:hdd usage: 33.4, transfer limit: 30.0 
Lnn: 0 
Devid: 0 
Last Event: 2022-11-29T20:32:16 
Ignore: No 
Ignore Time: Never 
Resolved: No 
Resolve Time: Never 
Ended: -- 
Events: 1 
Severity: information

And from the WebUI:


And there you have it: Transfer limits, and the first step in the evolution toward a smarter SmartPools.

 

Read Full Blog
  • PowerScale
  • OneFS
  • SmartPools

OneFS SmartPools Transfer Limits

Nick Trimbee Nick Trimbee

Wed, 15 Feb 2023 22:53:09 -0000

|

Read Time: 0 minutes

The new OneFS 9.5 release introduces the first phase of engineering’s Smarter SmartPools initiative, and delivers a new feature called SmartPools transfer limits.

The goal of SmartPools Transfer Limits is to address spill over. Previously, when file pool policies were executed, OneFS had no guardrails to protect against overfilling the destination or target storage pool. So if a pool was overfilled, data would unexpectedly spill over into other storage pools.

An overflow would result in storagepool usage exceeding 100%, and cause the SmartPools job itself to do a considerable amount of unnecessary work, trying to send files to a given storagepool. But because the pool was full, it would then have to send those files off to another storage pool that was below capacity. This would result in data going where it wasn’t intended, and the potential for individual files to end up getting split between pools. Also, if the full pool was on the most highly performing storage in the cluster, all subsequent newly created data would now land on slower storage, affecting its throughput and latency. The recovery from a spillover can be fairly cumbersome because it’s tough for the cluster to regain balance, and urgent system administration may be required to free space on the affected tier.

In order to address this, SmartPools Transfer Limits allows a cluster admin to configure a storagepool capacity-usage threshold, expressed as a percentage, and beyond which file pool policies stop moving data to that particular storage pool.

These transfer limits only take effect when running jobs that apply filepool policies, such as SmartPools, SmartPoolsTree, and FilePolicy.

The main benefits of this feature are two-fold:

  • Safety, in that OneFS avoids undesirable actions, so the customer is prevented from getting into escalation situations, because SmartPools won’t overfill storage pools.
  • Performance, because transfer limits avoid unnecessary work, and allow the SmartPools job to finish sooner.

Under the hood, a cluster’s storagepool SSD and HDD usage is calculated using the same algorithm as reported by the ‘isi storagepool list’ CLI command. This means that a pool’s VHS (virtual hot spare) reserved capacity is respected by SmartPools transfer limits. When a SmartPools job is running, there is at least one worker on each node processing a single LIN at any given time. In order to calculate the current HDD and SSD usage per storagepool, the worker must read from the diskpool database. To circumvent this potential bottleneck, the filepool policy algorithm caches the diskpool database contents in memory for up to 10 seconds.

Transfer limits are stored in gconfig, and a separate entry is stored within the ‘smartpools.storagepools’ hierarchy for each explicitly defined transfer limit.

Note that in the SmartPools lexicon, ‘storage pool’ is a generic term denoting either a tier or nodepool. Additionally, SmartPools tiers comprise one or more constituent nodepools.

Each gconfig transfer limit entry stores a limit value and the diskpool database identifier of the storagepool to which the transfer limit applies. Additionally, a ‘transfer limit state’ field specifies which of three states the limit is in:

  • Default: fall back to the default transfer limit.
  • Disabled: ignore the transfer limit.
  • Enabled: the corresponding transfer limit value is valid.

A SmartPools transfer limit does not affect the general ingress, restriping, or reprotection of files, regardless of how full the storage pool is where that file is located. So if you’re creating or modifying a file on the cluster, it will be created there anyway. This will continue up until the pool reaches 100% capacity, at which point it will then spill over.

The default transfer limit is 90% of a pool’s capacity. This applies to all storage pools where the cluster admin hasn’t explicitly set a threshold. Note also that the default limit doesn’t get set until a cluster upgrade to OneFS 9.5 has been committed. So if you’re running a SmartPools policy job during an upgrade, you’ll have the preexisting behavior, which is to send the file to wherever the file pool policy instructs it to go. It’s also worth noting that, even though the default transfer limit is set on commit, if a job was running over that commit edge, you’d have to pause and resume it for the new limit behavior to take effect. This is because the new configuration is loaded lazily when the job workers are started up, so even though the configuration changes, a pause and resume is needed to pick up those changes.

SmartPools itself needs to be licensed on a cluster in order for transfer limits to work. And limits can be configured at the tier or nodepool level. But if you change the limit of a tier, it automatically applies to all of its child nodepools, regardless of any prior child limit configurations. The transfer limit feature can also be disabled, which results in the same spillover behavior OneFS always displayed, and any configured limits will not be respected.

Note that a filepool policy’s transfer limits algorithm does not consider the size of the file when deciding whether to move it to the policy’s target storagepool, regardless of whether the file is empty, or a large file. Similarly, a target storagepool’s usage must exceed its transfer limit before the filepool policy will stop moving data to that target pool. The assumption here is that any storagepool usage overshoot is insignificant in scale compared to the capacity of a cluster’s storagepool.

A SmartPools file pool policy allows you to send snapshot or HEAD data blocks to different targets, if so desired.

Because the transfer limit applies to the storagepool itself, and not to the file pool policy, it’s important to note that, if you’ve got varying storagepool targets and one file pool policy, you may have a situation where the head data blocks do get moved. But if the snapshot is pointing at a storage pool that has exceeded its transfer limit, its blocks will not be moved.

File pool policies also allow you to specify how a mixed node’s SSDs are used: either as L3 cache, or as an SSD strategy for head and snapshot blocks. If the SSDs in a node are configured for L3, they are not being used for storage, so any transfer limits are irrelevant to it. As an alternative to L3 cache, SmartPools offers three main categories of SSD strategy:  

  • Avoid, which means send all blocks to HDD 
  • Data, which means send everything to SSD 
  • Metadata Read or Write, which sends varying numbers of metadata mirrors to SSD, and data blocks to hard disk.

To reflect this, SmartPools transfer limits are slightly nuanced when it comes to SSD strategies. That is, if the storagepool target contains both HDD and SSD, the usage capacity of both mediums needs to be below the transfer limit in order for the file to be moved to that target. For example, take two node pools, NP1 and NP2.

A file pool policy, Pol1, is configured, which matches all files under /ifs/dir1, with an SSD strategy of Metadata Write and pool NP1 as the target for HEAD’s data blocks. For snapshots, the target is NP2, with an ‘avoid’ SSD strategy, writing both snapshot data and metadata to hard disk only.

When a SmartPools job runs and attempts to apply this file pool policy, it sees that SSD usage is above the 85% configured transfer limit for NP1. So, even though the hard disk capacity usage is below the limit, neither HEAD data nor metadata will be sent to NP1.

For the snapshot, the SSD usage is also above the NP2 pool’s transfer limit of 90%.

However, because the SSD strategy is ‘avoid’, and because the hard disk usage is below the limit, the snapshot’s data and metadata get successfully sent to the NP2 HDDs.

Author: Nick Trimbee

Read Full Blog
  • security
  • PowerScale
  • OneFS
  • cybersecurity

PowerScale OneFS 9.5 Delivers New Security Features and Performance Gains

Nick Trimbee Nick Trimbee

Fri, 28 Apr 2023 19:57:51 -0000

|

Read Time: 0 minutes

PowerScale – the world’s most flexible[1] and cyber-secure scale-out NAS solution[2]  – is powering up the new year with the launch of the innovative OneFS 9.5 release. With data integrity and protection being top of mind in this era of unprecedented corporate cyber threats, OneFS 9.5 brings an array of new security features and functionality to keep your unstructured data and workloads more secure than ever, as well as delivering significant performance gains on the PowerScale nodes – such as up to 55% higher performance on all-flash F600 and F900 nodes as compared with the previous OneFS release.[3]   


OneFS and hardware security features 

New PowerScale OneFS 9.5 security enhancements include those that directly satisfy US Federal and DoD mandates, such as FIPS 140-2, Common Criteria, and DISA STIGs – in addition to general enterprise data security requirements. Multi-factor authentication (MFA), single sign-on (SSO) support, data encryption in-flight and at rest, TLS 1.2, USGv6R1 IPv6 support, SED Master Key rekey, plus a new host-based firewall are all part of OneFS 9.5. 

15TB and 30TB self-encrypting (SED) SSDs now enable PowerScale platforms running OneFS 9.5 to scale up to 186 PB of encrypted raw capacity per cluster – all within a single volume and filesystem, and before any additional compression and deduplication benefit.  

Delivering federal-grade security to protect data under a zero trust model 

Security-wise, the United States Government has stringent requirements for infrastructure providers such as Dell Technologies, requiring vendors to certify that products comply with requirements such as USGv6, STIGs, DoDIN APL, Common Criteria, and so on. Activating the OneFS 9.5 cluster hardening option implements a default maximum security configuration with AES and SHA cryptography, which automatically renders a cluster FIPS 140-2 compliant. 

OneFS 9.5 introduces SAML-based single sign-on (SSO) from both the command line and WebUI using a redesigned login screen. OneFS SSO is compatible with identity providers (IDPs) such as Active Directory Federation Services, and is also multi-tenant aware, allowing independent configuration for each of a cluster’s Access Zones. 

Federal APL requirements mandate that a system must validate all certificates in a chain up to a trusted CA root certificate. To address this, OneFS 9.5 introduces a common Public Key Infrastructure (PKI) library to issue, maintain, and revoke public key certificates. These certificates provide digital signature and encryption capabilities, using public key cryptography to provide identification and authentication, data integrity, and confidentiality. This PKI library is used by all OneFS components that need PKI certificate verification support, such as SecureSMTP, ensuring that they all meet Federal PKI requirements. 

This new OneFS 9.5 PKI and certificate authority infrastructure enables multi-factor authentication, allowing users to swipe a CAC or PIV smartcard containing their login credentials to gain access to a cluster, rather than manually entering username and password information. Additional account policy restrictions in OneFS 9.5 automatically disable inactive accounts, provide concurrent administrative session limits, and implement a delay after a failed login.  

As part of FIPS 140-2 compliance, OneFS 9.5 introduces a new key manager, providing a secure central repository for secrets such as machine passwords, Kerberos keytabs, and other credentials, with the option of using MCF (modular crypt format) with SHA256 or SHA512 hash types. OneFS protocols and services may be configured to support FIPS 140-2 data-in-flight encryption compliance, while SED clusters and the new Master Key re-key capability provide FIPS 140-2 data-at-rest encryption. Plus, any unused or non-compliant services are easily disabled.  

On the network side, the Federal APL has several IPv6 (USGv6) requirements that are focused on allowing granular control of individual components of a cluster’s IPv6 stack, such as duplicate address detection (DAD) and link local IP control. Satisfying both STIG and APL requirements, the new OneFS 9.5 front-end firewall allows security admins to restrict the management interface to a specified subnet and implement port blocking and packet filtering rules from the cluster’s command line or WebUI, in accordance with federal or corporate security policy. 

Improving performance for the most demanding workloads

OneFS 9.5 unlocks dramatic performance gains, particularly for the all-flash NVMe platforms, where the PowerScale F900 can now support line-rate streaming reads. SmartCache enhancements allow OneFS 9.5 to deliver streaming read performance gains of up to 55% on the F-series nodes, F600 and F900[3], delivering benefit to media and entertainment workloads, plus AI, machine learning, deep learning, and more. 

Enhancements to SmartPools in OneFS 9.5 introduce configurable transfer limits. These limits include maximum capacity thresholds, expressed as a percentage, above which SmartPools will not attempt to move files to a particular tier, boosting both reliability and tiering performance. 

Granular cluster performance control is enabled with the debut of PowerScale SmartQoS, which allows admins to configure limits on the maximum number of protocol operations that NFS, S3, SMB, or mixed protocol workloads can consume. 

Enhancing enterprise-grade supportability and serviceability

OneFS 9.5 enables SupportAssist, Dell’s next generation remote connectivity system for transmitting events, logs, and telemetry from a PowerScale cluster to Dell Support. SupportAssist provides a full replacement for ESRS, as well as enabling Dell Support to perform remote diagnosis and remediation of cluster issues. 

Upgrading to OneFS 9.5 

The new OneFS 9.5 code is available on the Dell Technologies Support site, as both an upgrade and reimage file, allowing both installation and upgrade of this new release.  

Author: Nick Trimbee

[1] Based on Dell analysis, August 2021.

[2] Based on Dell analysis comparing cybersecurity software capabilities offered for Dell PowerScale vs. competitive products, September 2022.

[3] Based on Dell internal testing, January 2023. Actual results will vary.


Read Full Blog
  • PowerScale
  • OneFS
  • diagnostics

OneFS Diagnostics

Nick Trimbee Nick Trimbee

Sun, 18 Dec 2022 19:43:36 -0000

|

Read Time: 0 minutes

In addition to the /usr/bin/isi_gather_info tool, OneFS also provides both a GUI and a common ‘isi’ CLI version of the tool – albeit with slightly reduced functionality. This means that a OneFS log gather can be initiated either from the WebUI, or by using the ‘isi diagnostics’ CLI command set with the following syntax:

# isi diagnostics gather start

The diagnostics gather status can also be queried as follows:

# isi diagnostics gather status
Gather is running.

When the command has completed, the gather tarfile can be found under /ifs/data/Isilon_Support.

You can also view and modify the ‘isi diagnostics’ configuration as follows:

# isi diagnostics gather settings view
                Upload: Yes
                  ESRS: Yes
         Supportassist: Yes
           Gather Mode: full
  HTTP Insecure Upload: No
      HTTP Upload Host:
      HTTP Upload Path:
     HTTP Upload Proxy:
HTTP Upload Proxy Port: -
            Ftp Upload: Yes
       Ftp Upload Host: ftp.isilon.com
       Ftp Upload Path: /incoming
      Ftp Upload Proxy:
 Ftp Upload Proxy Port: -
       Ftp Upload User: anonymous
   Ftp Upload Ssl Cert:
   Ftp Upload Insecure: No

The configuration options for the ‘isi diagnostics gather’ CLI command include:

  • --upload <boolean>: Enable gather upload.
  • --esrs <boolean>: Use ESRS for gather upload.
  • --gather-mode (incremental | full): Type of gather: incremental or full.
  • --http-insecure-upload <boolean>: Enable insecure HTTP upload on completed gather.
  • --http-upload-host <string>: HTTP host to use for HTTP upload.
  • --http-upload-path <string>: Path on HTTP server to use for HTTP upload.
  • --http-upload-proxy <string>: Proxy server to use for HTTP upload.
  • --http-upload-proxy-port <integer>: Proxy server port to use for HTTP upload.
  • --clear-http-upload-proxy-port: Clear proxy server port to use for HTTP upload.
  • --ftp-upload <boolean>: Enable FTP upload on completed gather.
  • --ftp-upload-host <string>: FTP host to use for FTP upload.
  • --ftp-upload-path <string>: Path on FTP server to use for FTP upload.
  • --ftp-upload-proxy <string>: Proxy server to use for FTP upload.
  • --ftp-upload-proxy-port <integer>: Proxy server port to use for FTP upload.
  • --clear-ftp-upload-proxy-port: Clear proxy server port to use for FTP upload.
  • --ftp-upload-user <string>: FTP user to use for FTP upload.
  • --ftp-upload-ssl-cert <string>: SSL certificate to use for the FTPS connection.
  • --ftp-upload-insecure <boolean>: Whether to attempt a plain-text FTP upload.
  • --ftp-upload-pass <string>: Password for the FTP upload user.
  • --set-ftp-upload-pass: Specify the FTP upload password interactively.
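
For example, a couple of these settings might be adjusted and then verified as follows, assuming the usual ‘view’/‘modify’ pairing of the settings subcommand (the values shown are purely illustrative):

# isi diagnostics gather settings modify --gather-mode=incremental --ftp-upload=no
# isi diagnostics gather settings view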

As mentioned above, ‘isi diagnostics gather’ does not present quite as broad an array of features as the isi_gather_info utility. This is primarily for security purposes, because ‘isi diagnostics’ does not require root privileges to run. Instead, a user account with the ‘ISI_PRIV_SYS_SUPPORT’ RBAC privilege is needed in order to run a gather from either the WebUI or ‘isi diagnostics gather’ CLI interface.

When a gather is running, a second instance cannot be started from any other node until that instance finishes. Typically, a warning similar to the following appears:

"It appears that another instance of gather is running on the cluster somewhere. If you would like to force gather to run anyways, use the --force-multiple-igi flag. If you believe this message is in error, you may delete the lock file here: /ifs/.ifsvar/run/gather.node."

You can remove this lock as follows:

# rm -f /ifs/.ifsvar/run/gather.node

You can also initiate a log gather from the OneFS WebUI by navigating to Cluster management > Diagnostics > Gather:

 

The WebUI uses the ‘isi diagnostics’ platform API handler and so, like the CLI command, offers a subset of the full isi_gather_info functionality.

A limited menu of configuration options is also available in the WebUI, under Cluster management > Diagnostics > Gather settings:

Also contained within the OneFS diagnostics command set is the ‘isi diagnostics netlogger’ utility. Netlogger captures IP traffic over a period of time for network and protocol analysis.

Under the hood, netlogger is a Python wrapper around the ubiquitous tcpdump utility, and can be run either from the OneFS command line or WebUI.

For example, from the WebUI, browse to Cluster management > Diagnostics > Netlogger: