
  • PowerScale
  • OneFS
  • HealthCheck
  • auto-updates

OneFS HealthCheck Auto-updates

Nick Trimbee

Tue, 21 May 2024 17:11:27 -0000


Prior to OneFS 9.4, Healthchecks were frequently regarded by storage administrators as yet another patch that needed to be installed on a PowerScale cluster. As a result, their adoption was routinely postponed or ignored, potentially jeopardizing a cluster’s well-being. To address this, OneFS HealthCheck auto-updates enable new Healthchecks to be automatically downloaded and non-disruptively installed on a PowerScale cluster without any user intervention.

The automated HealthCheck update framework helps accelerate the adoption of OneFS Healthchecks, by removing the need for manual checks, downloads, and installation. In addition to reducing management overhead, the automated Healthchecks integrate with CloudIQ to update the cluster health score - further improving operational efficiency, while avoiding known issues that affect cluster availability.

This figure shows the OneFS Healthcheck architecture.

Formerly known as Healthcheck patches, or RUPs, these are renamed ‘Healthcheck definitions’ in OneFS 9.4 and later. The Healthcheck framework checks for updates to these definitions using Dell Secure Remote Services (SRS).

An auto-update configuration setting in the OneFS SRS framework controls whether the Healthcheck definitions are automatically downloaded and installed on a cluster. A OneFS platform API endpoint has been added to verify the Healthcheck version, and Healthchecks also optionally support OneFS compliance mode.

Healthcheck auto-update is enabled by default in OneFS 9.4 and later, and is available for both existing and new clusters, but it can also be easily disabled from the CLI. If auto-update is on and SRS is enabled, the Healthcheck definition is downloaded to the desired staging location and then automatically and non-disruptively installed on the cluster. Any Healthcheck definitions that are automatically downloaded are signed and verified before being applied, to ensure their security and integrity.

So, the Healthcheck auto-update execution process itself is as follows:

This figure lists the six steps of the auto-update execution process:

  1. Query the current Healthcheck version.
  2. Check the Healthcheck definition availability.
  3. Compare the versions.
  4. Download the Healthcheck definition package to the cluster.
  5. Unpack and install the package.
  6. Send telemetry data and update the Healthcheck framework with the new version.

On the cluster, the Healthcheck auto-update utility isi_healthcheck_update monitors for a new package once a night, by default. This Python script checks the cluster’s current Healthcheck definition version and the availability of new updates using SRS. Next, it performs a version comparison of the install package, after which the new definition is downloaded and installed. Telemetry data is sent, and the /var/db/healthcheck_version.json file is created if it’s not already present. This JSON file is then updated with the new Healthcheck version info.
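
For example, to see the state that this utility maintains, you can check the version file it writes and the Healthcheck definitions currently installed (both commands and their output are covered in more detail later in this article):

# cat /var/db/healthcheck_version.json
# isi upgrade patches list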

To configure and use the Healthcheck auto-update functionality, you must perform the following steps:

  1. Upgrade the cluster to OneFS 9.4 or later and commit the upgrade.
  2. To use the isi_healthcheck script, OneFS needs to be licensed and connected to the ESRS gateway. OneFS 9.4 also introduces a new option for ESRS, ‘SRS Download Enabled’, which must be set to ‘Yes’ (the default value) to allow the isi_healthcheck_update utility to run. To do this, use the following syntax (in this example, using 10.12.15.50 as the primary ESRS gateway):
# isi esrs modify --enabled=yes --primary-esrs-gateway=10.12.15.50 --srs-download-enabled=true

Confirm the ESRS configuration as follows:

# isi esrs view
                                    Enabled: Yes
                       Primary ESRS Gateway: 10.12.15.50
                     Secondary ESRS Gateway: 
                        Alert on Disconnect: Yes
                       Gateway Access Pools: -
          Gateway Connectivity Check Period: 60
License Usage Intelligence Reporting Period: 86400
                           Download Enabled: No
                       SRS Download Enabled: Yes
          ESRS File Download Timeout Period: 50
           ESRS File Download Error Retries: 3
              ESRS File Download Chunk Size: 1000000
             ESRS Download Filesystem Limit: 80
        Offline Telemetry Collection Period: 7200
                Gateway Connectivity Status: Connected
  3. Next, use the CloudIQ web interface to onboard the cluster. This requires creating a site, and then from the Add Product page, configuring the serial number of each node in the cluster, along with the product type ISILON_NODE, the site ID, and then selecting Submit.

This is a CloudIQ WebUI screenshot that shows cluster onboarding.

CloudIQ cluster onboarding typically takes a couple of hours. When complete, the Product Details page shows the CloudIQ Status, ESRS Data, and CloudIQ Data fields as Enabled, as shown here:

This screenshot shows the CloudIQ product onboarding page.

  4. Examine the cluster status to verify that the cluster is available and connected in CloudIQ.

When these prerequisite steps are complete, use the new isi_healthcheck_update CLI command to enable auto-update. For example, to enable:

# isi_healthcheck_update --enable
2022-05-02 22:21:27,310 - isi_healthcheck.auto_update - INFO - isi_healthcheck_update started
2022-05-02 22:21:27,513 - isi_healthcheck.auto_update - INFO - Enable autoupdate

Similarly, you can also easily disable auto-update:

# isi esrs modify --srs-download-enabled=false

Auto-update also has the following gconfig global config options and default values:

# isi_gconfig -t healthcheck
Default values:
healthcheck_autoupdate.enabled (bool) = true
healthcheck_autoupdate.compliance_update (bool) = false
healthcheck_autoupdate.alerts (bool) = false
healthcheck_autoupdate.max_download_package_time (int) = 600
healthcheck_autoupdate.max_install_package_time (int) = 3600
healthcheck_autoupdate.number_of_failed_upgrades (int) = 0
healthcheck_autoupdate.last_failed_upgrade_package (char*) =
healthcheck_autoupdate.download_directory (char*) = /ifs/data/auto_upgrade_healthcheck/downloads
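
You can query or set an individual value with the same isi_gconfig syntax shown for the ESRS settings later in this article. For example, to check the alerts flag and, hypothetically, turn it on (the keys are those listed above; verify the behavior in your own environment before relying on it):

# isi_gconfig -t healthcheck healthcheck_autoupdate.alerts
healthcheck_autoupdate.alerts (bool) = false
# isi_gconfig -t healthcheck healthcheck_autoupdate.alerts=true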

The isi_healthcheck_update Python utility is scheduled by cron and executed across all the nodes in the cluster, as follows:

# grep -i healthcheck /etc/crontab
# Nightly Healthcheck update
0       1       *        *       *       root     /usr/bin/isi_healthcheck_update -s

This default /etc/crontab entry executes auto-update once daily at 1 am. However, this schedule can be adjusted to meet the needs of the local environment.
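
For example, to run the nightly check at 3:30 am instead of 1 am, the crontab entry might be adjusted as follows. This is purely illustrative: the change would need to be made on every node, and manual edits to /etc/crontab may be overwritten by a later upgrade.

# Nightly Healthcheck update
30      3       *        *       *       root     /usr/bin/isi_healthcheck_update -s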

Auto-update checks for the availability of a new package, performs a version comparison against the installed package, and downloads the new package. The package is then installed, telemetry data is sent, and the healthcheck_version.json file is updated with the new version.

After the Healthcheck update process has completed, you can use the following CLI command to view any automatically downloaded Healthcheck packages. For example:

# isi upgrade patches list
Patch Name               Description                                Status
-----------------------------------------------------------------------------
HealthCheck_9.4.0_32.0.3 [9.4.0 UHC 32.0.3] HealthCheck definition  Installed
-----------------------------------------------------------------------------
Total: 1

Additionally, viewing the JSON version file will also confirm this:

# cat /var/db/healthcheck_version.json
{"version": "32.0.3"}

In the unlikely event that auto-updates run into issues, the following troubleshooting steps can be useful:

  1. Confirm that Healthcheck auto-update is actually enabled:

Check the ESRS global config settings and verify they are set to ‘True’.

# isi_gconfig -t esrs esrs.enabled
esrs.enabled (bool) = true
# isi_gconfig -t esrs esrs.srs_download_enabled
esrs.srs_download_enabled (bool) = true

If not, run:

# isi_gconfig -t esrs esrs.enabled=true 
# isi_gconfig -t esrs esrs.srs_download_enabled=true 
  2. If an auto-update patch installation is not completed within 60 minutes, OneFS increments the unsuccessful installations counter for the current patch, and re-attempts installation the following day.
  3. If the unsuccessful installations counter exceeds five attempts, the installation will be aborted. However, you can reset the following auto-update gconfig values, as follows, to re-enable the installation:
# isi_gconfig -t healthcheck healthcheck_autoupdate.number_of_failed_upgrades=0
# isi_gconfig -t healthcheck healthcheck_autoupdate.last_failed_upgrade_package=""
  4. If a patch installation status is reported as ‘failed’, the recommendation is to contact Dell Support to diagnose and resolve the issue:
# isi upgrade patches list
Patch Name               Description                                Status
-----------------------------------------------------------------------------
HealthCheck_9.4.0_32.0.3 [9.4.0 UHC 32.0.3] HealthCheck definition  Failed
-----------------------------------------------------------------------------
Total: 1

However, the following CLI command can be carefully used to repair the patch system by attempting to abort the most recent failed action: 

# isi upgrade patches abort 

The isi upgrade archive --clear command stops the current upgrade and prevents it from being resumed:

# isi upgrade archive --clear

When the upgrade status is reported as ‘unknown’, run:

# isi upgrade patches uninstall
  5. The file /var/log/isi_healthcheck.log is also a great source for detailed auto-upgrade information.
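
A quick way to scan this log for recent problems is with standard grep and tail; the patterns below are just an illustration:

# grep -iE "error|fail" /var/log/isi_healthcheck.log
# tail -20 /var/log/isi_healthcheck.log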

Author: Nick Trimbee

  • PowerScale
  • OneFS
  • metadata management

OneFS Metadata

Nick Trimbee

Fri, 17 May 2024 20:07:06 -0000


OneFS uses two principal data structures to enable information about each object, or metadata, within the file system to be searched, managed, and stored efficiently and reliably. These structures are:

  • Inodes
  • B-trees

OneFS uses inodes to store file attributes and pointers to file data locations on disk. Each file, directory, link, and so on, is represented by an inode.

Within OneFS, inodes come in two sizes - either 512B or 8KB. The size that OneFS uses is determined primarily by the physical and logical block formatting of the drives in a diskpool.

All OneFS inodes have both static and dynamic sections. The static section space is limited and valuable because it can be accessed in a single I/O, and does not require a distributed lock to access it. It holds fixed-width, commonly used attributes such as POSIX mode bits, owner, and size.

Graphic illustrating the composition of a OneFS inode.

In contrast, the dynamic portion of an inode allows new attributes to be added, if necessary, without requiring an inode format update. This can be done by simply adding a new type value with code to serialize and deserialize it. Dynamic attributes are stored in the stream-style type-length-value (TLV) format, and include protection policies, OneFS ACLs, embedded b-tree roots, domain membership info, and so on.

If necessary, OneFS can also use extension blocks, which are 8KB blocks to store any attributes that cannot fully fit into the inode itself. OneFS data services such as SnapshotIQ also commonly leverage inode extension blocks.

Graphic illustrating a OneFS inode with extension blocks.

Inodes are dynamically created and stored in locations across all drives in the cluster; OneFS uses b-trees (actually B+ trees) for their indexing and rapid retrieval. The general structure of a OneFS b-tree includes a top-level block, known as the ‘root’. B-tree blocks that reference other b-tree blocks are referred to as ‘inner blocks’, and the blocks at the end of the tree are called ‘leaf blocks’.

 

Graphic depicting the general structure of a OneFS b-tree.

Only the leaf blocks actually contain metadata, whereas the root and inner blocks provide a balanced index of addresses allowing rapid identification of and access to the leaf blocks and their metadata.

A LIN, or logical inode, is accessed every time a file, directory, or b-tree is accessed. The function of the LIN Tree is to store the mapping between a unique LIN number and its inode mirror addresses.

The LIN is represented as a 64-bit hexadecimal number. Each file is assigned a single LIN and, because LINs are never reused, it is unique for the cluster’s lifespan. For example, the file /ifs/data/test/f1 has the following LIN:

# isi get -D /ifs/data/test/f1 | grep LIN:
*   LIN:                1:2d29:4204

Similarly, its parent directory, /ifs/data/test, has:

# isi get -D /ifs/data/test | grep LIN:
*   LIN:                1:0353:bb59
*   LIN:                1:0009:0004
*   LIN:                1:2d29:4204

The LIN tree entry for this file includes the mapping between the LIN and its three mirrored inode disk addresses.

# isi get -D /ifs/data/test/f1 | grep "inode"
* IFS inode: [ 92,14,524557565440:512, 93,19,399535074304:512, 95,19,610321964032:512 ]

Taking the first of these inode addresses, 92,14,524557565440:512, the following can be inferred, reading from left to right (a quick parsing example follows this list):

  • It’s on node 92.
  • Stored on drive lnum 14.
  • At block address 524557565440.
  • And is a 512-byte inode.
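
To illustrate that address format, a standard awk one-liner can split the string into its fields. This is purely illustrative, using the address from the isi get output above:

# echo "92,14,524557565440:512" | awk -F'[,:]' '{print "node="$1, "drive="$2, "block="$3, "size="$4}'
node=92 drive=14 block=524557565440 size=512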

The file’s parent LIN can also be easily determined:

# isi get -D /ifs/data/test/f1 | grep -i "Parent Lin"
*  Parent Lin          1:0353:bb59

In addition to the LIN tree, OneFS also uses b-trees to support file and directory access, plus the management of several other data services. That said, the three principal b-trees that OneFS employs are:

  • Files: Metatree or Inode Format Manager (IFM B-tree). This b-tree stores a mapping of Logical Block Number (LBN) to protection group and is responsible for storing the physical location of file blocks on disk.
  • Directories: Directory Format Manager (DFM B-tree). This b-tree stores directory entries (file names and directories/sub-directories) and includes the full /ifs namespace and everything under it.
  • System: System B-tree (SBT). A standardized B+ tree implementation that stores records for OneFS internal use, typically related to a particular feature, including Diskpool DB, IFS Domains, WORM, and Idmap. Quota (QDB) and Snapshot Tracking Files (STF) are separate, unique B+ tree implementations.

OneFS also relies heavily on several other metadata structures too, including:

  • Shadow Store - Dedupe/clone metadata structures including SINs
  • QDB – Quota Database structures
  • System B+ Tree Files
  • STF – Snapshot Tracking Files
  • WORM
  • IFM Indirect
  • Idmap
  • System Directories
  • Delta Blocks
  • Logstore Files

Both inodes and b-tree blocks are mirrored on disk. Mirror-based protection is used exclusively for all OneFS metadata because it is simple and lightweight, thereby avoiding the additional processing of erasure coding. Because metadata typically only consumes around 2% of the overall cluster’s capacity, the mirroring overhead for metadata is minimal.

The number of inode mirrors (minimum 2x up to 8x) is determined by the nodepool’s achieved protection policy and the metadata type. The following is a mapping of the default number of mirrors for all metadata types.

Protection Level   Metadata Type                  Number of Mirrors
+1n                File inode                     2 inodes per file
+2d:1n             File inode                     3 inodes per file
+2n                File inode                     3 inodes per file
+3d:1n             File inode                     4 inodes per file
+3d:1n1d           File inode                     4 inodes per file
+3n                File inode                     4 inodes per file
+4d:1n             File inode                     5 inodes per file
+4d:2n             File inode                     5 inodes per file
+4n                File inode                     5 inodes per file
2x->8x             File inode                     Same as protection level, i.e. 2x == 2 inode mirrors
+1n                Directory inode                3 inodes per file
+2d:1n             Directory inode                4 inodes per file
+2n                Directory inode                4 inodes per file
+3d:1n             Directory inode                5 inodes per file
+3d:1n1d           Directory inode                5 inodes per file
+3n                Directory inode                5 inodes per file
+4d:1n             Directory inode                6 inodes per file
+4d:2n             Directory inode                6 inodes per file
+4n                Directory inode                6 inodes per file
2x->8x             Directory inode                +1 protection level, i.e. 2x == 3 inode mirrors
-                  LIN root/master                8x
-                  LIN inner/leaf                 Variable – per-entry protection
-                  IFM/DFM b-tree                 Variable – per-entry protection
-                  Quota database b-tree (QDB)    8x
-                  System b-tree (SBT)            Variable – per-entry protection
-                  Snapshot tracking files (STF)  8x

Note that, by default, directory inodes are mirrored at one level higher than the achieved protection policy, because directories are more critical and make up the OneFS single namespace. The root of the LIN Tree is the most critical metadata type and is always mirrored at 8x.

OneFS SSD strategy governs where and how much metadata is placed on SSD or HDD. There are five SSD Strategies, and these can be configured using OneFS’ file pool policies:

  • L3 Cache: All SSDs in a node pool are used as a read-only eviction cache for L2 cache. Currently used data and metadata will fill the entire capacity of the SSD drives in this mode. Note: L3 mode does not guarantee that all metadata will be on SSD, so this may not be the most performant mode for metadata-intensive workflows.
  • Metadata Read: One metadata mirror is placed on SSD. All other mirrors will be on HDD for hybrid and archive models. This mode can boost read performance for metadata-intensive workflows.
  • Metadata Write: All metadata mirrors are placed on SSD. This mode can boost both read and write performance when there is significant demand on metadata I/O. Note: It is important to understand the SSD capacity requirements needed to support metadata strategies.
  • Data: Place data on SSD. This is not a widely used strategy, because hybrid and archive nodes have limited SSD capacities, and metadata should take priority on SSD for best performance.
  • Avoid: Avoid using SSD for a specific path. This is not a widely used strategy, but it can be handy if you have archive workflows that do not require SSD and want to dedicate your SSD space to other, more important paths or workflows.

Fundamentally, OneFS metadata placement is determined by the following attributes:

  • The model of the nodes in each node pool (F-series, H-series, A-series)
  • The current SSD Strategy on the node pool configured using the default filepool policy and custom administrator-created filepool policies
  • The cluster’s global storage pool settings

You can use the following CLI commands to verify the current SSD strategy and metadata placement details on a cluster. For example, to check whether L3 Mode is enabled on a specific node pool:

# isi storagepool nodepool list
ID     Name                       Nodes  Node Type IDs   Protection Policy  Manual
----------------------------------------------------------------------------------
1      h500_30tb_3.2tb-ssd_128gb  1      1               +2d:1n             No

In this output, there is a single H500 node pool reported with an ID of 1. To display the details of this pool, use the following command:

# isi storagepool nodepool view 1
                 ID: 1
               Name: h500_30tb_3.2tb-ssd_128gb
              Nodes: 1, 2, 3, 4, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40
      Node Type IDs: 1
  Protection Policy: +2d:1n
             Manual: No
         L3 Enabled: Yes
L3 Migration Status: l3
               Tier: -
              Usage
                Avail Bytes: 321.91T
            Avail SSD Bytes: 0.00
                   Balanced: No
                 Free Bytes: 329.77T
             Free SSD Bytes: 0.00
                Total Bytes: 643.13T
            Total SSD Bytes: 0.00
    Virtual Hot Spare Bytes: 7.86T

Note that if, as in this case, L3 is enabled on a node pool, any changes to this pool’s SSD Strategy configuration using file pool policies, and so on, will not be honored. This will remain until L3 cache has been disabled and the SSDs reformatted for use as metadata mirrors.

You can use the following command to check the cluster’s default file pool policy configuration:

# isi filepool default-policy view
          Set Requested Protection: default
               Data Access Pattern: concurrency
                  Enable Coalescer: Yes
                    Enable Packing: No
               Data Storage Target: anywhere
                 Data SSD Strategy: metadata
           Snapshot Storage Target: anywhere
             Snapshot SSD Strategy: metadata
                        Cloud Pool: -
         Cloud Compression Enabled: -
          Cloud Encryption Enabled: -
              Cloud Data Retention: -
Cloud Incremental Backup Retention: -
       Cloud Full Backup Retention: -
               Cloud Accessibility: -
                  Cloud Read Ahead: -
            Cloud Cache Expiration: -
         Cloud Writeback Frequency: -
      Cloud Archive Snapshot Files: -
                                ID: -

And to list all FilePool Policies configured on a cluster:

# isi filepool policies list

View a specific FilePool Policy:

# isi filepool policies view <Policy Name>

OneFS also provides global storagepool configuration settings that control additional metadata placement. For example: 

# isi storagepool settings view
     Automatically Manage Protection: files_at_default
Automatically Manage Io Optimization: files_at_default
Protect Directories One Level Higher: Yes
       Global Namespace Acceleration: disabled
       Virtual Hot Spare Deny Writes: Yes
        Virtual Hot Spare Hide Spare: Yes
      Virtual Hot Spare Limit Drives: 2
     Virtual Hot Spare Limit Percent: 0
             Global Spillover Target: anywhere
                   Spillover Enabled: Yes
        SSD L3 Cache Default Enabled: Yes
                     SSD Qab Mirrors: one
            SSD System Btree Mirrors: one
            SSD System Delta Mirrors: one

The CLI output below includes descriptions of the relevant metadata options available.  

# isi storagepool settings modify -h | egrep -i options -A 30
Options:
    --automatically-manage-protection (all | files_at_default | none)
        Set whether SmartPools manages files' protection settings.
    --automatically-manage-io-optimization (all | files_at_default | none)
        Set whether SmartPools manages files' I/O optimization settings.
    --protect-directories-one-level-higher <boolean>
        Protect directories at one level higher.
    --global-namespace-acceleration-enabled <boolean>
        Global namespace acceleration enabled.
    --virtual-hot-spare-deny-writes <boolean>
        Virtual hot spare: deny new data writes.
    --virtual-hot-spare-hide-spare <boolean>
        Virtual hot spare: reduce amount of available space.
    --virtual-hot-spare-limit-drives <integer>
        Virtual hot spare: number of virtual drives.
    --virtual-hot-spare-limit-percent <integer>
        Virtual hot spare: percent of total storage.
    --spillover-target <str>
        Spillover target.
    --spillover-anywhere
        Set global spillover to anywhere.
    --spillover-enabled <boolean>
        Spill writes into pools within spillover_target as needed.
    --ssd-l3-cache-default-enabled <boolean>
        Default setting for enabling L3 on new Node Pools.
    --ssd-qab-mirrors (one | all)
        Controls number of mirrors of QAB blocks to place on SSDs.
    --ssd-system-btree-mirrors (one | all)
        Controls number of mirrors of system B-tree blocks to place on SSDs.
    --ssd-system-delta-mirrors (one | all)
        Controls number of mirrors of system delta blocks to place on SSDs.

OneFS defaults to protecting directories one level higher than the configured protection policy and retaining one mirror of system b-trees on SSD. For optimal performance on hybrid platform nodes, the recommendation is to place all metadata mirrors on SSD, assuming that the capacity is available. Be aware, however, that the metadata SSD mirroring options only become active if L3 Mode is disabled.
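
As a rough sketch of how that recommendation might be applied, the default file pool policy’s SSD strategies could be changed from metadata (metadata read) to metadata write. The flag names and values below are assumptions inferred from the ‘Data SSD Strategy’ and ‘Snapshot SSD Strategy’ fields in the isi filepool default-policy view output above, so confirm them against the command’s --help output before using them:

# isi filepool default-policy modify --data-ssd-strategy metadata-write --snapshot-ssd-strategy metadata-write
# isi filepool default-policy view | grep -i ssd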

Additionally, global namespace acceleration (GNA) is a legacy option that allows nodes without SSD to place their metadata on nodes with SSD. All currently shipping PowerScale node models include at least one SSD drive.

Author: Nick Trimbee

  • security
  • PowerScale
  • OneFS
  • upgrades

OneFS Signed Upgrades

Nick Trimbee

Fri, 17 May 2024 16:42:45 -0000


Introduced as part of the OneFS security enhancements, signed upgrades help maintain system integrity by preventing a cluster from being compromised by the installation of maliciously modified upgrade packages. This is required by several industry security compliance mandates, such as the DoD Network Device Management Security Requirements Guide, which stipulates “The network device must prevent the installation of patches, service packs, or application components without verification the software component has been digitally signed using a certificate that is recognized and approved by the organization”.

With this signed upgrade functionality, all packages must be cryptographically signed before they can be installed. This applies to all upgrade types, including core OneFS, patches, cluster firmware, and drive firmware. The underlying components that comprise this feature include an updated .isi format for all package types, plus a new OneFS Catalog to store the verified packages. In OneFS 9.4 and later, the actual upgrades themselves are still performed using either the CLI or WebUI, and are very similar to previous versions.

Under the hood, the signed upgrade process works as follows:

This image depicts the OneFS signed upgrade process.

Everything goes through the catalog, which comprises four basic components. There’s a small SQLite database that tracks metadata, a library which has the basic logic for the catalog, the signature library based around OpenSSL which handles all of the verification, and a couple of directories in which to store the verified packages.

With signed upgrades, there’s a single file to download that contains the upgrade package, README text, and all signature data. No file unpacking is required.

The .isi file format is as follows:

This graphic illustrates the .isi file format.

This graphic illustrates the .isi package file format.

In the second region of the package file, you can directly incorporate a ‘readme’ text file that provides instructions, version compatibility requirements, and so on.

The first region, which contains the main package data, is also compatible with previous OneFS versions that don’t support the .isi format. This allows a signed firmware or DSP package to be installed on OneFS 9.3 and earlier.

The new OneFS catalog provides a secure place to store verified .isi packages, and only the root account has direct access. The catalog itself is stored at /ifs/.ifsvar/catalog, and all maintenance and interaction is performed using the isi upgrade catalog CLI command set. The contents, or artifacts, of the catalog each have an ID that corresponds to the SHA256 hash of the file.

Any user account with the ISI_PRIV_SYS_UPGRADE privilege can perform the following catalog-related actions, using the isi upgrade catalog CLI command set:

  • Clean: Remove any catalog artifact files that do not have associated database entries.
  • Export: Save a catalog item to a user-specified file location.
  • Import: Verify and add a new .isi package file into the catalog.
  • List: List packages in the catalog.
  • Readme: Display the README text from a catalog item or .isi package file.
  • Remove: Manually remove a package from the catalog.
  • Repair: Re-verify all catalog packages and rebuild the database.
  • Verify: Verify the signature of a catalog item or .isi package file.

Package verification leverages the OneFS OpenSSL library, which enables a SHA256 hash of the manifest to be verified against the certificate. As part of this process, the chain-of-trust for the included certificate is compared with contents of the /etc/ssl/certs directory, and the distinguished name on the manifest checked against /etc/upgrade/identities file. Finally, the SHA256 hash of the data regions is compared against values from the manifest.

To check the signature, use the isi upgrade catalog verify command. For example:

# isi upgrade catalog verify --file /ifs/install.isi
Item             Verified
--------------------------
/ifs/install.isi True
--------------------------
Total: 1

To display additional install image details, use the isi_packager view command:

# isi_packager view --package /ifs/install.isi
== Region 1 ==
Type: OneFS Install Image
Name: OneFS_Install_0x90500B000000AC8_B_MAIN_2760(RELEASE)
Hash: ef7926cfe2255d7a620eb4557a17f7650314ce1788c623046929516d2d672304
Size: 397666098
 
== Footer Details ==
Format Version: 1
 Manifest Size: 296
Signature Size: 2838
Timestamp Size: 1495
 Manifest Hash: 066f5d6e6b12081d3643060f33d1a25fe3c13c1d13807f49f51475a9fc9fd191
Signature Hash: 5be88d23ac249e6a07c2c169219f4f663220d4985e58b16be793936053a563a3
Timestamp Hash: eca62a3c7c3f503ca38b5daf67d6be9d57c4fadbfd04dbc7c5d7f1ff80f9d948
 
== Signature Details ==
Fingerprint:     33fba394a5a0ebb11e8224a30627d3cd91985ccd
Issuer:          ISLN
Subject:         US / WA / Sea / Isln OneFS.
Organization:    Isln Powerscale OneFS
Expiration:      2022-09-07 22:00:22
Ext Key Usage:   codesigning

You can list the packages in the catalog, as follows:

# isi upgrade catalog list
ID    Type  Description                                              README
-----------------------------------------------------------------------------
cdb88 OneFS OneFS 9.4.0.0_build(2797)style(11) / B_MAIN_2797(RELEASE) -
3a145 DSP   Drive_Support_v1.39.1                                    Included 
840b8 Patch HealthCheck_9.2.1_2021-09                                Included 
aa19b Patch 9.3.0.2_GA-RUP_2021-12_PSP-1643                          Included
-----------------------------------------------------------------------------
Total: 4

Note that the package ID comprises the first few characters of the package’s SHA256 hash.
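
If you want to see that relationship for yourself, you can hash a package file and compare the leading characters against the catalog ID. This assumes the ID is derived directly from the file hash as described earlier, and that the standard BSD sha256 utility is available on the node; the expected output below corresponds to the 3a145 entry in the listing above:

# sha256 -q /ifs/packages/Drive_Support_v1.39.1.isi | cut -c 1-5
3a145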

Packages are automatically imported when used, and verified upon import. You can also perform verification and import manually, if desired:

# isi upgrade catalog verify --file Drive_Support_v1.39.1.isi 
Item                                      Verified 
------------------------------------------------- 
/ifs/packages/Drive_Support_v1.39.1.isi True 
------------------------------------------------- 
# isi upgrade catalog import Drive_Support_v1.39.1.isi

You can also export packages from the catalog and copy them to another cluster, for example. Generally, exported packages can be re-imported, too.

# isi upgrade catalog list 
ID    Type  Description                                               README 
----------------------------------------------------------------------------- 
00b9c OneFS OneFS 9.5.0.0_build(2625)style(11) / B_MAIN_2625(RELEASE) – 
3a145 DSP Drive_Support_v1.39.1 Included 
----------------------------------------------------------------------------- 
Total: 5 
# isi upgrade catalog export --id 3a145 --file /ifs/Drive_Support_v1.39.1.isi

However, auto-generated OneFS images cannot be reimported.

The README column of the isi upgrade catalog list output indicates whether release notes are included for a .isi file or catalog item. If available, you can view them as follows:

# isi upgrade catalog readme --file HealthCheck_9.2.1_2021-09.isi | less
Updated: September 02, 2021
*****************************************************************************
HealthCheck_9.2.1_2021-09: Patch for OneFS 9.2.1.x. 
This patch contains the 2021-09 RUP for the Isilon HealthCheck System 
***************************************************************************** 
This patch can be installed on clusters running the following OneFS version: 
* 9.2.1.x 
:

Within a readme file, details typically include a short description of the artifact, and which minimum OneFS version the cluster is required to be running for installation.

Cleanup of patches and OneFS images is performed automatically upon commit. Any installed packages require the artifact to be present in the catalog for a successful uninstall. Similarly, the committed OneFS image is required when removing a patch or when expanding the cluster by adding a node.

You can remove artifacts manually, as follows:

# isi upgrade catalog remove --id 840b8 
This will remove the specified artifact and all related metadata. 
Are you sure? (yes/[no]): yes

However, always use caution if attempting to manually remove a package.

When it comes to catalog housekeeping, the ‘clean’ function removes any catalog artifact files without database entries, although normally this happens automatically when an item is removed.

# isi upgrade catalog clean 
This will remove any artifacts that do not have associated metadata in the database. 
Are you sure? (yes/[no]): yes

Additionally, the catalog ‘repair’ function rebuilds the database, re-imports all valid items, and re-verifies their signatures:

# isi upgrade catalog repair 
This will attempt to repair the catalog directory. This will result in all stored artifacts being re-verified. Artifacts that fail to be verified will be deleted. Additionally, a new catalog directory will be initialized with the remaining artifacts. 
Are you sure? (yes/[no]): yes

When installing a signed upgrade, patch, firmware, or drive support package (DSP) on a cluster running OneFS 9.4 or later, the command syntax used is fundamentally the same as in prior OneFS versions, with only the file extension itself having changed. The actual install file will have the ‘.isi’ extension, and the file containing the hash value for download verification will have a ‘.isi.sha256’ suffix. For example, take the OneFS install files:

  • OneFS_v9.5.0.0_Install.isi
  • OneFS_v9.5.0.0_Install.isi.sha256
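
Before importing or installing a downloaded image, it is worth confirming that its hash matches the published value. A minimal sketch, assuming the .isi.sha256 file simply contains the expected hex digest (use the BSD sha256 utility on the cluster, or sha256sum on a Linux workstation):

# cat OneFS_v9.5.0.0_Install.isi.sha256
# sha256 -q OneFS_v9.5.0.0_Install.isi

The two values should match before you proceed.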

You can use the following syntax to initiate a parallel OneFS signed upgrade:

# isi upgrade start --install-image-path /ifs/install.isi --parallel

Or, if the desired upgrade image package is already in the catalog, you can instead use the --install-image-id flag to install it:

# isi upgrade start --install-image-id 00b9c --parallel

Or to upgrade a cluster’s firmware:

# isi upgrade firmware start --fw-pkg /ifs/IsiFw_Package_v10.3.7.isi --rolling

To upgrade a cluster’s firmware using the ID of a package that’s in the catalog:

# isi upgrade firmware start --fw-pkg-id cf01b --rolling

To initiate a simultaneous upgrade of a patch:

# isi upgrade patches install --patch /ifs/patch.isi --simultaneous

And finally, to initiate a simultaneous upgrade of a drive firmware package:

# isi_dsp_install Drive_Support_v1.39.1.isi

Note that patches and drive support firmware are not currently able to be installed by their package IDs.

A committed upgrade image from the previous OneFS upgrade is automatically saved in the catalog, and also created automatically when a new cluster is configured. This image is required for new node joins, as well as when uninstalling patches. However, it’s worth noting that auto-created images will not have a signature and, although you can export them, they cannot be re-imported back into the catalog.

If the committed upgrade image is somehow missing, CELOG events are generated and the isi upgrade catalog repair command output displays an error. Additionally, when it comes to troubleshooting the signed upgrade process, it can pay to check the /var/log/messages and /var/log/isi_papi_d.log files, and the OneFS upgrade logs.

Author: Nick Trimbee

  • isilon
  • powerscale
  • f910
  • onefs

Accelerating AI Innovation and Sustainability: The High-Density, High-Performance Dell PowerScale F910

Aqib Kazi

Fri, 24 May 2024 19:47:16 -0000



In the era of rapid technological advancement, enterprises face an unprecedented challenge: accelerating AI innovation while minimizing environmental impact. As the demand for AI processing power skyrockets, so does the associated energy consumption, thus leading to a significant increase in carbon footprint. Power consumption has become a recurring topic with the introduction of AI. The New York Times reports, A.I. Could Soon Need as Much Electricity as an Entire Country (nytimes.com). These challenges call for a modern solution bridging the gap between performance and sustainability.

Enter Dell’s PowerScale F910 platform, a high-density, high-performance node designed to accelerate AI innovation while dramatically reducing an enterprise’s carbon footprint. This cutting-edge platform transforms the way organizations approach AI, helping them reach ambitious performance goals without compromising environmental responsibility.

How much more density are we talking about? The F910 offers 20% more density per rack unit than the F710 which was released merely three months ago with extraordinary performance. Moreover, the F910 delivers density with performance, as it’s up to 2.2x faster to AI insights.

In this blog post, we’ll explore how our high-density, high-performance platform redefines the landscape of AI innovation and sustainability. Discover how your enterprise can leverage this groundbreaking technology to accelerate AI initiatives, drive business growth, and make a positive impact on the environment.

AI innovation and sustainability challenges

Many challenges arise as enterprises increasingly rely on AI to drive innovation and gain a competitive edge. One of the most significant challenges is the exponentially increased demand for AI processing power. As AI models become more sophisticated and data volumes explode, enterprises require hardware solutions to keep pace with these evolving needs.

However, with great processing power comes great energy consumption. The energy-intensive nature of AI workloads has led to a concerning rise in enterprises' carbon footprints. Data centers housing AI infrastructure consume vast amounts of electricity, often generated from non-renewable resources, contributing significantly to greenhouse gas emissions. The impact is not only environmental: it is also a business risk, as consumers, investors, and regulators increasingly prioritize sustainability.

Moreover, AI hardware's high energy consumption translates into substantial operational costs for enterprises. The electricity required to power and cool AI systems can quickly eat into an organization’s bottom line, making it challenging to justify the ROI of AI initiatives. In fact, Forbes reports Generative AI Breaks The Data Center: Data Center Infrastructure And Operating Costs Projected To Increase To Over $76 Billion By 2028 (forbes.com).

To address these challenges, enterprises urgently need hardware platforms that deliver high performance while prioritizing energy efficiency. They require platforms that can handle the demanding workloads of AI innovation without compromising on sustainability goals. The industry is calling for a paradigm shift in AI hardware design that places equal emphasis on processing power and environmental responsibility.

Fortunately, as the AI factory, Dell Technologies has answered the call to balance AI innovation with sustainability, by developing the new PowerScale platform that tackles these challenges head-on. In leveraging cutting-edge technology and innovative design principles, our solution enables enterprises to accelerate AI innovation while significantly reducing their carbon footprint. In the next section, we’ll examine how our hardware platform revolutionizes the AI landscape and paves the way for a more sustainable future.

Introducing the PowerScale F910

At Dell, we understand the pressing demand for a hardware solution that can bridge the gap between AI innovation and sustainability. From this need, we’ve developed a cutting-edge hardware platform designed to address the challenges enterprises face in the AI landscape.

Our high-density, high-performance hardware platform is a testament to our commitment to push the boundaries of AI technology while prioritizing environmental responsibility. By leveraging state-of-the-art hardware and the technical innovations of OneFS, the F910 provides unparalleled processing power in a compact, energy-efficient package.

Overview

The F910's front panel has a bezel protecting the 24 NVMe SSD drives, as displayed in the image below.

The front panel has an LCD that offers a range of information and status updates. It also has an option to add a node to an existing PowerScale cluster. The LCD display is also used to view the node’s iDRAC IP address, MAC address, cluster name, asset tag, power output, and temperature information. Furthermore, the front panel has an LED for the status on the left. For example, a failed drive illuminates an amber LED.

Moving on to the rear of the F910, we can see all the connections.

The power supplies are split across the backplane, allowing maximum airflow through the center of the chassis. The front-end and back-end network interfaces are on opposite sides, offering Ethernet connectivity. The other interfaces on the rear include iDRAC, serial, and management NICs.

The F910 nodes use NVMe SSDs. In a 6 RU rack configuration of 3 nodes, the F910 raw capacity spans a minimum of 276.5 TB to a maximum of 2.16 PB. The available drive capacities for the F910 are listed in the following table.

Non-SED Drive Capacities   SED-FIPS Drive Capacities   SED-Non-FIPS Drive Capacities
3.84 TB                    3.84 TB                     15.36 TB
7.68 TB                    7.68 TB                     30.72 TB QLC
15.36 TB QLC               15.36 TB QLC*               -
30.72 TB QLC               30.72 TB QLC*               -

*Future availability

For a new cluster deployment, a minimum of 3 F910 nodes is required to form a cluster. For existing cluster deployments, the F910 is node pool compatible with the F900, allowing the F910 to be added in a multiple of 1. If an existing cluster does not have any F900s, a minimum of 3 nodes is required to form a new node pool.

High density

In addition to being a high-performance platform, the F910 is the highest-density all-flash PowerScale node. We’ve engineered our system to pack not only an exceptional amount of computing power but also drive density, maximizing AI capabilities without the need for extensive physical infrastructure. This reduces the spatial footprint of AI hardware and minimizes the energy required for cooling and maintenance. See the table below for how the F910 compares to our other all-flash platforms.

Platform          Cluster Density per Rack Unit
PowerScale F200   30.72 TB
PowerScale F210   61 TB
PowerScale F600   245 TB
PowerScale F710   307 TB
Isilon F810       231 TB
PowerScale F910   360 TB

The F910 offers 20% more density compared to the F710. This number is further magnified if we compare the F910 to the F810, where the F910 offers a 55% gain! So what would that look like in a data center? Take a scenario where a data center currently has 10 racks, each filled to its maximum capacity. With a 55% gain in density per rack unit, we can now fit 55% more computing and storage resources in each rack compared to the previous setup, so the same workload needs only 10 ÷ 1.55 ≈ 6.45 racks.
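
To sanity-check that arithmetic with nothing more than the standard bc calculator:

# echo "scale=2; 10 / 1.55" | bc
6.45
# echo "scale=2; 100 - 100 * 6.45 / 10" | bc
35.50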

In this scenario, the impact of a 55% gain in density per rack unit is still significant, as it allows the data center to reduce its rack consumption by 35%. Reducing physical space requirements can lead to cost savings, improved efficiency, and greater flexibility for future growth. Now, let’s visualize this in the image below.

Data reduction is enabled by default out of the box, further increasing the F910’s high density and capacity envelope. The inline data reduction process incorporates both compression and deduplication. When these elements are combined, they significantly boost the overall density of a cluster. As the density per Rack Unit (RU) increases, it decreases the Total Cost of Ownership (TCO) for the solution, reducing the carbon footprint.

High performance

The F910 achieves the ultimate performance envelope by taking advantage of hardware and software updates. PowerScale OneFS 9.7 and 9.8 introduced several performance-oriented updates.

OneFS 9.7 introduced a significant leap in performance by enhancing the following:

  • Implementing a round-robin distribution strategy across thread groups has significantly reduced thread lock contention, increasing OneFS's overall efficiency and performance.
  • Contention on turnstile locks has been reduced by increasing the value of Read-Write (RW) Lock retries, optimizing system performance.
  • In the context of NVMe storage nodes, writing operations are strategically executed around the journal for newly allocated blocks, therefore maintaining high performance and preventing data processing and access delays.

OneFS 9.8 further pushes the pure software performance envelope, optimizing the OneFS 9.7 updates to further build on them. Additionally, OneFS 9.8 introduces enhancements to thread handling and lock management. Finally, general code updates have brought about a significant performance leap.

On the hardware front, the F910 leverages the PowerEdge platform for extreme performance. Powered by a dual-socket Intel® Xeon® Gold 6442Y Processor, it delivers higher core counts, faster memory speeds, and improved security features. The F910 features PCIe 5.0 technology, which doubles the bandwidth and reduces the latency of the previous generation, thus enabling faster data transfers and more efficient use of accelerators. Furthermore, the F910 takes advantage of the DDR5 RAM, offering greater speed and bandwidth. The table below summarizes the F910’s hardware specifications:

Attribute                  PowerScale F910 Specification
CPU                        Dual Socket – Intel Sapphire Rapids 6442Y (2.6G/24C)
Memory                     Dual Rank DDR5 RDIMMs 512 GB (16 x 32 GB)
Front-end networking       2 x 100 GbE or 25 GbE
Infrastructure networking  2 x 100 GbE
NVMe SSD drives            24

The combination of hardware and software updates allows the F910 to tackle even the most challenging workloads, minimizing time to AI insights. Overall, the F910 delivers AI insights 2.2x faster than previous generations. Let’s put that into context for a minute. If training an AI model takes 10 hours to complete, being able to do it 2.2 times faster means it would be finished in approximately 4.55 hours. That’s a significant improvement in efficiency and productivity. You’re saving approximately 5.45 hours that can be used for other AI models.

NVIDIA DGX SuperPOD certification

Dell PowerScale is the world’s first Ethernet-based storage solution certified on NVIDIA DGX SuperPOD. The collaboration between Dell and NVIDIA is designed to help customers achieve faster and more efficient AI storage. Dell PowerScale exceeds the performance benchmark requirements for DGX SuperPOD. Integrating PowerScale and DGX SuperPOD allows for handling vast amounts of data at unprecedented speeds, thereby accelerating the process of training AI models.  

The PowerScale F910 expands the family of the already NVIDIA DGX SuperPOD-certified storage solution, while accelerating training times and balancing sustainability. For more on the PowerScale NVIDIA DGX SuperPOD certification, see h19971-powerscale-ethernet-superpod-certification.pdf (delltechnologies.com).

Services

Accelerate AI outcomes with help at every stage from Dell Professional services. Trusted experts work alongside you to align a winning strategy, validate data sets, implement, train and support GenAI models and close skills gaps to help you maintain secure and optimized F910 operations now and into the future. Furthermore, Dell services embeds sustainability throughout our services portfolio to proactively help customers approach the most pressing environmental challenges from sustainability to reducing waste. 

Customer feedback

During our F910 beta program, partners and customers tested and validated the F910's performance. We wanted to confirm our performance and density gains in a real-world environment. More importantly, we wanted to know what the gains would be on an existing workload.

At the outset, after the first batch of tests, all of the initial feedback was consistent. To paraphrase,

“I know you all claimed a lot more performance, but I didn’t think it would be this good.” 

We were ecstatic to hear that feedback. As the tests rolled on, more glowing reviews continued to come in.

In the end, John Lochausen, Technical Solutions Architect at World Wide Technology, summed up the sentiment best:

“We're hyper-focused on AI innovation in our AI Proving Ground, and the all-flash PowerScale F910 has exceeded our expectations. It doubles performance, reducing the power and energy costs required for the same workload, further advancing our customers' sustainability goals.”

Conclusion

In conclusion, what the F910 proves is that when it comes to AI innovation and sustainability, you can have your cake and eat it, too. Organizations can now accelerate AI innovation while accelerating sustainability. To summarize, the F910 checks all the modern AI workload requirements: High-Performance 🗹 High-Density 🗹 Power-Efficient 🗹 NVIDIA-Certified 🗹

For more on the PowerScale F910, see PowerScale All-Flash F210, F710, and F910 | Dell Technologies Info Hub

Author: Aqib Kazi, Senior Principal Engineering Technologist

 

  • Isilon
  • PowerScale
  • ransomware
  • cybersecurity

Future-Proof Your Data: Airgap Your Business Continuity Dataset

Aqib Kazi

Thu, 02 May 2024 17:49:35 -0000


In today's digital age, the ransomware threat looms larger than ever, posing a significant risk to businesses worldwide. As someone deeply entrenched in the intricacies of data protection, I've seen firsthand how devastating data breaches can be. It's not just about the immediate loss of data: the ripple effects can disrupt business operations for weeks, if not months. Interestingly, while nearly 91% of organizations use some form of data backup, a staggering 72% have had to recover lost data from a backup in the past year. This highlights the critical need for robust business continuity plans that go beyond simple data backup.

With the rate of new business failures varying significantly by location, the importance of being prepared cannot be overstated. For instance, in the District of Columbia, 28% of businesses fail in their first year, partly due to inadequate data protection strategies. I aim to shed light on how a well-crafted business continuity dataset can protect your company against ransomware attacks. By understanding common pitfalls and leveraging PowerScale’s Cyber Protection Suite, businesses can ensure that their data remains secure and accessible in an airgap, even in the face of unforeseen disasters.

Understanding ransomware attacks and their impact on business continuity

Ransomware attacks are not just a temporary disruption but a significant threat to a company's ongoing ability to conduct business. As a cybersecurity enthusiast committed to sharing valuable insights, I've seen firsthand how these attacks can dismantle a business's operations overnight. It's crucial for organizations to understand the ramifications of ransomware and to implement strategies that mitigate these risks. Let’s delve into the cost of ransomware to businesses and examine some case studies that highlight the importance of being prepared.

The cost of ransomware to businesses

The financial implications of ransomware attacks on businesses are staggering. Beyond the demand for ransom payments, which can reach millions of dollars, hidden costs can be even more detrimental. I've seen businesses suffer from prolonged downtime, loss of productivity, and irreversible damage to their reputation. According to recent cybersecurity reports, businesses attacked by ransomware often face comprehensive audits, increased insurance premiums, and, in some cases, legal liabilities due to compromised data.

Moreover, the recovery process incurs significant IT expenditures, including forensic analysis, system upgrades, and employee training. With organizations being attacked by ransomware every 14 seconds, the urgency for resilient data protection strategies is clear. Understanding these costs underlines the importance of investing in proactive cybersecurity measures, underscoring that the expense of prevention pales when compared to the costs of recovery.

Recent ransomware attacks

In September 2023, the MGM Grand Hotel in Las Vegas, Nevada faced a significant ransomware attack that had widespread repercussions. The attack targeted MGM Resorts International properties across the U.S. Operational disruptions were severe, including the disabling of online reservation systems, malfunctioning digital room keys, and affected slot machines on casino floors. MGM Resorts’ websites were also taken down. The financial impact was substantial, costing the company approximately $100 million due to operational disruptions and recovery efforts. The perpetrators behind this attack were the ALPHV group (also known as BlackCat), which is believed to have links to the Russian government.

Also in September 2023, further down the Las Vegas Strip, Caesars Entertainment, one of the world’s largest casino companies, fell victim to a ransomware attack. The attack targeted their systems, resulting in unauthorized access to their network. The hackers exfiltrated data, including many customers’ driver’s licenses and Social Security numbers. Caesars confirmed the breach and subsequently paid a multi-million-dollar ransom to the cybercriminals.

These attacks serve as a stark reminder that even seemingly impenetrable organizations can fall victim to sophisticated cyber threats.

Key elements of effective business continuity plans

As I delve deeper into the significance of business continuity planning for mitigating ransomware risks, it's crucial to outline the key elements that make these plans effective. First and foremost, a comprehensive risk assessment stands at the foundation. It enables businesses to identify potential vulnerabilities and ransomware threats, ensuring that all angles are covered. Next, comes the development of a robust incident response plan. This detailed guide prepares organizations to react swiftly and efficiently during a ransomware attack, minimizing operational disruptions and financial losses.

Another essential component is securing data backups. These backups must be regularly stored in an airgap to prevent them from becoming ransomware targets. Communication plans must also be in place to ensure transparent and timely information sharing with all stakeholders during and after a ransomware incident. Lastly, continuous employee training on cybersecurity best practices helps strengthen the first line of defense against ransomware attacks.

Incorporating ransomware preparedness into business continuity plans

Incorporating ransomware preparedness into business continuity plans is not just an option — it's a necessity in today's digital age. To start, adjusting the incident response plan specifically for ransomware involves identifying the most critical assets and ensuring they are protected with the most robust defenses. This can include employing advanced threat detection tools and securing endpoints to limit the spread of an attack.

Next, understanding the specific recovery requirements for key operations is vital. It involves establishing clear recovery time objectives (RTOs) and recovery point objectives (RPOs) for all critical systems. This guides the priority of system restorations to ensure that the most essential services are brought back online first, to minimize the business impact.

Testing the business continuity plan against ransomware scenarios on a regular basis is indispensable. Simulated attacks provide invaluable insights into the effectiveness of the current strategy and reveal areas for improvement. It ensures that when a real attack occurs, the organization is not caught off guard.

How an airgap protects your data

An airgap, also known as an air wall or air gapping, is a network security measure employed on one or more computers to ensure that a secure computer network is physically isolated from unsecured networks, such as the public Internet or an unsecured local area network. This strategy seeks to ensure the total isolation of a given system electromagnetically, electronically, and physically. An airgapped computer or network has no network interface controllers connected to other networks; the separation is maintained both physically and logically.

For example, data on a PowerScale cluster in an airgap is fully protected from ransomware attacks, because the malware does not have access to it. In the following figure, the PowerScale cluster resides in the Cyber Recovery Vault, which has no access to the outside world.

Figure showing the PowerScale cluster residing in the Cyber Recovery Vault, with no connectivity to the outside world.

When the Business Continuity Dataset is copied to the PowerScale Vault Cluster, it is completely airgapped from the outside world. You can configure the solution to check for new dataset updates or simply have it remain as a vault.

In this blog, I explained how to airgap a PowerScale cluster. In future blogs, I’ll cover the entire PowerScale Cyber Protection Suite. For more information about the PowerScale Cyber Protection Suite, be sure to see PowerScale Cyber Protection Suite Reference Architecture | Dell Technologies Info Hub.

Author: Aqib Kazi, Senior Principal Engineering Technologist

Read Full Blog
  • security
  • PowerScale
  • OneFS
  • HTTP

OneFS and HTTP Security

Nick Trimbee Nick Trimbee

Mon, 22 Apr 2024 20:35:30 -0000

|

Read Time: 0 minutes

To enable granular HTTP security configuration, OneFS provides an option to disable nonessential HTTP components selectively. This can help reduce the overall attack surface of your infrastructure. Disabling a specific component’s service still allows other essential services on the cluster to continue to run unimpeded. In OneFS 9.4 and later, you can disable the following nonessential HTTP services:

  • PowerScaleUI: The OneFS WebUI configuration interface.
  • Platform-API-External: External access to the OneFS platform API endpoints.
  • Rest Access to Namespace (RAN): RESTful access over HTTP to a cluster’s /ifs namespace.
  • RemoteService: Remote Support and In-Product Activation.
  • SWIFT (deprecated): Deprecated object access to the cluster using the SWIFT protocol, which has been replaced by the S3 protocol in OneFS.

You can enable or disable each of these services independently, using the CLI or platform API, if you have a user account with the ISI_PRIV_HTTP RBAC privilege.

You can use the isi http services CLI command set to view and modify the nonessential HTTP services:

# isi http services list
ID                     Enabled
------------------------------
Platform-API-External Yes
PowerScaleUI          Yes
RAN                   Yes
RemoteService         Yes
SWIFT                 No
------------------------------
Total: 5

For example, you can easily disable remote HTTP access to the OneFS /ifs namespace as follows:

# isi http services modify RAN --enabled=0

You are about to modify the service RAN. Are you sure? (yes/[no]): yes
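
To re-enable the service later, the same command can be run with the flag reversed. For example:

# isi http services modify RAN --enabled=1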

Similarly, you can also use the WebUI to view and edit a subset of the HTTP configuration settings, by navigating to Protocols > HTTP settings:

WebUI screenshot showing HTTP configuration settings.

That said, the implications and impact of disabling each of these services are as follows:

  • WebUI: The WebUI is completely disabled, and access attempts (default TCP port 8080) are denied with the warning "Service Unavailable. Please contact Administrator." If the WebUI is re-enabled, the external platform API service (Platform-API-External) is also started if it is not already running. Note that disabling the WebUI does not affect the platform API service.
  • Platform API: External API requests to the cluster are denied, and the WebUI is disabled, because it uses the Platform-API-External service. The Platform-API-Internal service is not impacted when Platform-API-External is disabled, and internal pAPI services continue to function as expected. If the Platform-API-External service is re-enabled, the WebUI remains inactive until the PowerScaleUI service is also enabled.
  • RAN: If RAN is disabled, the WebUI components for File System Explorer and File Browser are also automatically disabled. From the WebUI, attempts to access the OneFS file system explorer (File System > File System Explorer) fail with the warning "Browse is disabled as RAN service is not running. Contact your administrator to enable the service." The same warning appears when attempting to access any other WebUI component that requires directory selection.
  • RemoteService: If RemoteService is disabled, the WebUI components for Remote Support and In-Product Activation are disabled. In the WebUI, going to Cluster Management > General Settings and selecting the Remote Support tab displays the message "The service required for the feature is disabled. Contact your administrator to enable the service." The same message is displayed when navigating to Cluster Management > Licensing and scrolling to the License Activation section.
  • SWIFT: Deprecated object protocol, disabled by default.

You can use the CLI command isi http settings view to display the OneFS HTTP configuration:

# isi http settings view
            Access Control: No
      Basic Authentication: No
    WebHDFS Ran HTTPS Port: 8443
                        Dav: No
         Enable Access Log: Yes
                      HTTPS: No
 Integrated Authentication: No
               Server Root: /ifs
                    Service: disabled
           Service Timeout: 8m20s
          Inactive Timeout: 15m
           Session Max Age: 4H
Httpd Controlpath Redirect: No

Similarly, you can manage and change the HTTP configuration using the isi http settings modify CLI command.

For example, to reduce the maximum session age from four to two hours:

# isi http settings view | grep -i age
           Session Max Age: 4H
# isi http settings modify --session-max-age=2H
# isi http settings view | grep -i age
           Session Max Age: 2H
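
A modified setting can also be returned to its system default using the corresponding revert option listed below. For example:

# isi http settings modify --revert-session-max-age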

The full set of configuration options for isi http settings includes:

  • --access-control <boolean>: Enable Access Control Authentication for the HTTP service. Access Control Authentication requires at least one type of authentication to be enabled.
  • --basic-authentication <boolean>: Enable Basic Authentication for the HTTP service.
  • --webhdfs-ran-https-port <integer>: Configure the Data Services port for the HTTP service.
  • --revert-webhdfs-ran-https-port: Set --webhdfs-ran-https-port to the system default.
  • --dav <boolean>: Comply with Class 1 and 2 of the DAV specification (RFC 2518) for the HTTP service. All DAV clients must go through a single node; DAV compliance is not met when going through SmartConnect or when using two or more node IPs.
  • --enable-access-log <boolean>: Enable writing to a log when the HTTP server is accessed.
  • --https <boolean>: Enable the HTTPS transport protocol for the HTTP service.
  • --integrated-authentication <boolean>: Enable Integrated Authentication for the HTTP service.
  • --server-root <path>: Document root directory for the HTTP service. Must be within /ifs.
  • --service (enabled | disabled | redirect | disabled_basicfile): Enable or disable the HTTP service, redirect to the WebUI, or allow only basic file access (disabled_basicfile).
  • --service-timeout <duration>: The amount of time (in seconds) that the server waits for certain events before failing a request. A value of 0 uses the Apache default.
  • --revert-service-timeout: Set --service-timeout to the system default.
  • --inactive-timeout <duration>: Set the HTTP RequestReadTimeout directive for both the WebUI and the HTTP service.
  • --revert-inactive-timeout: Set --inactive-timeout to the system default.
  • --session-max-age <duration>: Set the HTTP SessionMaxAge directive for both the WebUI and the HTTP service.
  • --revert-session-max-age: Set --session-max-age to the system default.
  • --httpd-controlpath-redirect <boolean>: Enable or disable WebUI redirection to the HTTP service.

Note that while the OneFS S3 service uses HTTP, it is considered a tier-1 protocol, and as such is managed using its own isi s3 CLI command set and corresponding WebUI area. For example, the following CLI command forces the cluster to only accept encrypted HTTPS/SSL traffic on TCP port 9999 (rather than the default TCP port 9021):

# isi s3 settings global modify --https-only 1 --https-port 9999
# isi s3 settings global view
         HTTP Port: 9020
        HTTPS Port: 9999
        HTTPS only: Yes
S3 Service Enabled: Yes

Additionally, you can entirely disable the S3 service with the following CLI command:

# isi services s3 disable
The service 's3' has been disabled.

Or from the WebUI, under Protocols > S3 > Global settings:

WebUI Screenshot showing the S3 global configuration settings.
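
Conversely, the S3 service can be switched back on from the CLI with the matching enable form of the command shown above. For example:

# isi services s3 enable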

 Author: Nick Trimbee



Read Full Blog
  • PowerScale
  • OneFS
  • management ports

OneFS and PowerScale F-series Management Ports

Nick Trimbee Nick Trimbee

Mon, 22 Apr 2024 20:12:20 -0000

|

Read Time: 0 minutes

Another security enhancement that OneFS 9.5 and later releases brings to the table is the ability to configure 1GbE NIC ports dedicated to cluster management on the PowerScale F900, F710, F600, F210, and F200 all-flash storage nodes and P100 and B100 accelerators. Since these platforms were released, customers have been requesting the ability to activate the 1GbE NIC ports so that the node management activity and front end protocol traffic can be separated on physically distinct interfaces.

For background, since their introduction, the F600 and F900 have shipped with a quad port 1GbE rNDC (rack Converged Network Daughter Card) adapter. However, these 1GbE ports were non-functional and unsupported in OneFS releases prior to 9.5. As such, the node management and front-end traffic was co-mingled on the front-end interface.

In OneFS 9.5 and later, 1GbE network ports are now supported on all of the PowerScale PowerEdge based platforms for the purposes of node management, and are physically separate from the other network interfaces. Specifically, this enhancement applies to the F900, F600, F200 all-flash nodes, and P100 and B100 accelerators.

Under the hood, OneFS has been updated to recognize the 1GbE rNDC NIC ports as usable for a management interface. Note that the focus of this enhancement is on factory enablement and support for existing F600 customers that have the unused 1GbE rNDC hardware. This functionality has also been back-ported to OneFS 9.4.0.3 and later RUPs. Since the introduction of this feature, there have been several requests raised about field upgrades, but that use case is separate and will be addressed in a later release through scripts, updates of node receipts, procedures, and so on.

Architecturally, aside from some device driver and accounting work, no substantial changes were required to the underlying OneFS or platform architecture to implement this feature. This means that in addition to activating the rNDC, OneFS now supports the relocated front-end NIC in PCI slots 2 or 3 for the F200, B100, and P100.

OneFS 9.5 and later recognizes the 1GbE rNDC as usable for the management interface in the OneFS Wizard, in the same way it always has for the H-series and A-series chassis-based nodes.

All four ports in the 1GbE NIC are active, and for the Broadcom board, the interfaces are initialized and reported as bge0, bge1, bge2, and bge3.
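
On a FreeBSD-based OneFS node, one quick way to confirm which interface names are present is to list them with ifconfig (shown here as an illustrative check, not a required configuration step):

# ifconfig -l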

The pciconf CLI utility can be used to determine whether the rNDC NIC is present in a node. If it is, a variety of identification and configuration details are displayed. For example, let’s look at the following output from a Broadcom rNDC NIC in an F200 node:

# pciconf -lvV pci0:24:0:0
bge2@pci0:24:0:0: class=0x020000 card=0x1f5b1028 chip=0x165f14e4 rev=0x00 hdr=0x00
      class       = network
      subclass    = ethernet
      VPD ident   = ‘Broadcom NetXtreme Gigabit Ethernet’
      VPD ro PN   = ‘BCM95720’
      VPD ro MN   = ‘1028’
      VPD ro V0   = ‘FFV7.2.14’
      VPD ro V1   = ‘DSV1028VPDR.VER1.0’
      VPD ro V2   = ‘NPY2’
      VPD ro V3   = ‘PMT1’
      VPD ro V4   = ‘NMVBroadcom Corp’
      VPD ro V5   = ‘DTINIC’
      VPD ro V6   = ‘DCM1001008d452101000d45’

We can use the ifconfig CLI utility to determine the specific IP/interface mapping on the Broadcom rNDC interface. For example:

# ifconfig bge0
 TME-1: bge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
 TME-1:      ether 00:60:16:9e:X:X
 TME-1:      inet 10.11.12.13 netmask 0xffffff00 broadcast 10.11.12.255 zone 1
 TME-1:      inet 10.11.12.13 netmask 0xffffff00 broadcast 10.11.12.255 zone 0
 TME-1:      media: Ethernet autoselect (1000baseT <full-duplex>)
 TME-1:      status: active

In this output, the first IP address of the management interface’s pool is bound to bge0, which is the first port on the Broadcom rNDC NIC.

We can use the isi network pools CLI command to determine the corresponding interface. Within the system zone, the management interface is allocated an address from the configured IP range within its associated interface pool. For example:

# isi network pools list
ID                      SC Zone                  IP Ranges                   Allocation Method
----------------------------------------------------------------------------------------------
groupnet0.mgt.mgt       cluster_mgt_isln.com     10.11.12.13-10.11.12.20     static
# isi network pools view groupnet0.mgt.mgt | grep -i ifaces
               Ifaces: 1:mgmt-1, 2:mgmt-1, 3:mgmt-1, 4:mgmt-1, 5:mgmt-1

Or from the WebUI, under Network configuration > External network:

WebUI Network configuration screenshot, focusing on the External network tab

Drilling down into the mgt pool details shows the 1GbE management interfaces as the pool interface members:

WebUI screenshot showing 1GbE management interfaces.

Note that the 1GbE rNDC network ports are solely intended as cluster management interfaces. As such, they are not supported for use with regular front-end data traffic.

The F900 and F600 nodes already ship with a four port 1GbE rNDC NIC installed. However, the F200, B100, and P100 platform configurations have also been updated to include a quad port 1GbE rNDC card. These new configurations have been shipping by default since January 2023. This required relocating the front end network’s 25GbE NIC (Mellanox CX4) to PCI slot 2 in the motherboard. Additionally, the OneFS updates needed for this feature have also now allowed the F200 platform to be offered with a 100GbE option too. The 100GbE option uses a Mellanox CX6 NIC in place of the CX4 in slot 2.

With this 1GbE management interface enhancement, the same quad-port rNDC card (typically the Broadcom 5720) that has been shipped in the F900 and F600 since their introduction, is now included in the F200, B100 and P100 nodes as well. All four 1GbE rNDC ports are enabled and active under OneFS 9.5 and later, too.

Node port ordering continues to follow the standard, increasing numerically from left to right. However, be aware that the port labels are not visible externally because they are obscured by the enclosure’s sheet metal.

The following back-of-chassis hardware images show the new placements of the NICs in the various F-series and accelerator platforms:

F600

F600 rear view.

F900

F900 rear view.

For both the F600 and F900, the NIC placement remains unchanged, because these nodes have always shipped with the 1GbE quad port in the rNDC slot since their launch.

F200

F200 rear view.

The F200 sees its front-end NIC moved to slot 3, freeing up the rNDC slot for the quad-port 1GbE Broadcom 5720.

B100

B100 rear view.

Because the B100 backup accelerator has a fibre-channel card in slot 2, it sees its front-end NIC moved to slot 3, freeing up the rNDC slot for the quad-port 1GbE Broadcom 5720.

P100

P100 rear view.

Finally, the P100 accelerator sees its front-end NIC moved to slot 3, freeing up the rNDC slot for the quad-port 1GbE Broadcom 5720.

Note that, while there is currently no field hardware upgrade process for adding rNDC cards to legacy F200 nodes or B100 and P100 accelerators, this will be addressed in a future release.

Author: Nick Trimbee

Read Full Blog
  • PowerScale
  • API
  • OneFS
  • CLI
  • USB ports

OneFS Security and USB Device Control

Nick Trimbee Nick Trimbee

Fri, 19 Apr 2024 17:34:44 -0000

|

Read Time: 0 minutes

As we’ve seen over the course of the last several articles, OneFS 9.5 delivers a wealth of security focused features. These span the realms of core file system, protocols, data services, platform, and peripherals. Among these security enhancements is the ability to manually or automatically disable a cluster’s USB ports from either the CLI, platform API, or by activating a security hardening policy.

In support of this functionality, the basic USB port control architecture is as follows:

Graphic depicting basic USB port control architecture.

To facilitate this, OneFS 9.5 and subsequent releases see the addition of a new gconfig variable, ‘usb_ports_disabled’, in ‘security_config’, specifically to track the status of USB Ports on a cluster. On receiving an admin request either from the CLI or the platform API handler to disable the USB port, OneFS modifies the security config parameter in gconfig. For example:

# isi_gconfig -t security_config | grep -i usb
usb_ports_disabled (bool) = true

Under the hood, the MCP (master control process) daemon watches for any changes to the ‘isi_security.gcfg’ security config file on the cluster. If the value for the ‘usb_ports_disabled’ variable in the ‘isi_security.gcfg’ file is updated, then MCP executes the ‘isi_config_usb’ utility to enact the desired change. Note that because ‘isi_config_usb’ operates per-node but the MCP actions are global (executed cluster wide), isi_config_usb is invoked across each node by a Python script to enable or disable the cluster’s USB Ports.

The USB Ports enable/disable feature is only supported on PowerScale F900, F600, F200, H700/7000, and A300/3000 clusters running OneFS 9.5 and later, and PowerScale F710 and F210 running OneFS 9.7 or later.

In OneFS 9.5 and later, USB port control can be manually configured from either the CLI or platform API.

Graphic showing manual USB port control configuration from either the CLI or platform API.

Note that there is no WebUI option at this time.

The following CLI syntax is used to view and configure USB port control in OneFS 9.5 and later:

  • View (isi security settings view): Reports the state of a cluster’s USB ports.
  • Enable (isi security settings modify --usb-ports-disabled=False): Activates a cluster’s USB ports.
  • Disable (isi security settings modify --usb-ports-disabled=True): Disables a cluster’s USB ports.

For example:

# isi security settings view | grep -i usb
      USB Ports Disabled: No
# isi security settings modify --usb-ports-disabled=True
# isi security settings view | grep -i usb
      USB Ports Disabled: Yes

Similarly, to re-enable a cluster’s USB ports:

# isi security settings modify --usb-ports-disabled=False
# isi security settings view | grep -i usb
      USB Ports Disabled: No

Note that a user account with the OneFS ISI_PRIV_CLUSTER RBAC privilege is required to configure USB port changes on a cluster.

In addition to the ‘isi security settings’ CLI command, there is also a node-local CLI utility:

# whereis isi_config_usb
isi_config_usb: /usr/bin/isi_hwtools/isi_config_usb

As mentioned previously, ‘isi security settings’ acts globally on a cluster, using ‘isi_config_usb’ to effect its changes on each node.

Alternatively, cluster USB ports can also be enabled and disabled using the OneFS platform API with the following endpoints:

  • GET /16/security/settings: No argument required. Returns a JSON object for the security settings, including the USB ports setting.
  • PUT /16/security/settings: Takes a JSON object with a boolean value for the USB ports setting. Returns no content on success, or an error.

For example:

# curl -k -u <username>:<passwd> "https://localhost:8080/platform/security/settings"
 
 {
 "settings" :
 {
 "fips_mode_enabled" : false,
 "restricted_shell_enabled" : false,
 "usb_ports_disabled" : true
 }
 }
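
Similarly, the same endpoint accepts a PUT request with a JSON body to change the setting. The following is a sketch only, using the /16/security/settings endpoint listed above; substitute appropriate credentials and adjust the host, port, and API version for your environment:

# curl -k -u <username>:<passwd> -X PUT -H "Content-Type: application/json" \
       -d '{"usb_ports_disabled": true}' "https://localhost:8080/platform/16/security/settings"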

In addition to manual configuration, the USB ports are automatically disabled if the STIG security hardening profile is applied to a cluster. 

Graphic depicting the USB ports being automatically disabled if the STIG security hardening profile is applied to a cluster. 

This is governed by the following section of XML code in the isi_hardening configuration file, which can be found at /etc/isi_hardening/profiles/isi_hardening.xml:

<CONFIG_ITEM id ="isi_usb_ports" version = "1">
              <PapiOperation>
                           <DO>
                                        <URI>/security/settings</URI>
                                        <BODY>{"usb_ports_disabled": true}</BODY>
                                        <KEY>settings</KEY>
                           </DO>
                           <UNDO>
                                        <URI>/security/settings</URI>
                                        <BODY>{"usb_ports_disabled": false}</BODY>
                                        <KEY>settings</KEY>
                           </UNDO>
                           <ACTION_SCOPE>CLUSTER</ACTION_SCOPE>
                           <IGNORE>FALSE</IGNORE>
              </PapiOperation>
 </CONFIG_ITEM>

The ‘isi_config_usb’ CLI utility can be used to display the USB port status on a subset of nodes. For example:

# isi_config_usb --nodes 1-10 --mode display
    Node   |   Current  |  Pending
-----------------------------------
    TME-9  |   UNSUP    | INFO: This platform is not supported to run this script.
    TME-8  |   UNSUP    | INFO: This platform is not supported to run this script.
    TME-1  |     On     |
    TME-3  |     On     |
    TME-2  |     On     |
   TME-10  |     On     |
    TME-7  |   AllOn    |
    TME-5  |   AllOn    |
    TME-6  |   AllOn    |
Unable to connect: TME-4

Note: In addition to port status, the output identifies any nodes that do not support USB port control (nodes 8 and 9 above) or that are unreachable (node 4 above).

When investigating or troubleshooting issues with USB port control, the following log files are the first places to check:

  • /var/log/isi_papi_d.log: Logs any requests to enable or disable USB ports.
  • /var/log/isi_config_usb.log: Logs activity from isi_config_usb script execution.
  • /var/log/isi_mcp: Logs activity related to MCP actions when invoking the API.

Author: Nick Trimbee

Read Full Blog

OneFS System Configuration Auditing

Nick Trimbee Nick Trimbee

Thu, 18 Apr 2024 04:55:18 -0000

|

Read Time: 0 minutes

OneFS auditing can detect potential sources of data loss, fraud, inappropriate entitlements, access attempts that should not occur, and a range of other anomalies that are indicators of risk. This can be especially useful when the audit associates data access with specific user identities. 

In the interests of data security, OneFS provides chain of custody auditing by logging specific activity on the cluster. This includes OneFS configuration changes plus NFS, SMB, and HDFS client protocol activity which are required for organizational IT security compliance, as mandated by regulatory bodies like HIPAA, SOX, FISMA, MPAA, and more. 

OneFS auditing uses Dell’s Common Event Enabler (CEE) to provide compatibility with external audit applications. A cluster can write audit events across up to five CEE servers per node in a parallel, load-balanced configuration. This allows OneFS to deliver an end to end, enterprise grade audit solution which efficiently integrates with third party solutions like Varonis DatAdvantage. 

The following diagram outlines the basic architecture of OneFS audit: 

Both system configuration changes and protocol activity can be easily audited on a PowerScale cluster. However, the protocol path is greyed out in the diagram above, since it is outside the focus of this article. More information on OneFS protocol auditing can be found here.

As illustrated above, the OneFS audit framework is centered around three main services. 

  • isi_audit_cee: Service allowing OneFS to support third-party auditing applications. The main method of accessing protocol audit data from OneFS is through a third-party auditing application.
  • isi_audit_d: Responsible for per-node audit queues and for managing the data store for those queues. It provides a protocol on which clients may produce event payloads within a given context. It establishes a UNIX domain socket for queue producers and handles writing and rotation of the log files in /ifs/.ifsvar/audit/logs/node###/{config,protocol}/*.
  • isi_audit_syslog: Daemon providing forwarding of audit config and protocol events to syslog.

The basic configuration auditing workflow sees a cluster config change request come in through the OneFS CLI, WebUI, or platform API. The API handler infrastructure passes this request to the isi_audit_d service, which intercepts it on a client thread and adds it to the audit queue. It is then processed by a backend thread and written to the audit log files on /ifs as appropriate.

If audit syslog forwarding has been configured, the event is also passed to the isi_audit_syslog daemon, where a supervisor process instructs a writer thread to send it to syslog, which in turn updates the pertinent /var/log/ logfiles.

Similarly, if Common Event Enabler (CEE) forwarding has been enabled, the request is also passed to the isi_audit_cee service, where a delivery worker thread intercepts it and sends the event to the CEE server pool. The isi_audit_cee heartbeat task makes CEE servers available for audit event delivery: every ten seconds, the task wakes up and sends each configured CEE server a heartbeat, and audit events are only delivered to a CEE server after it has received a successful heartbeat. While CEE servers are available and events are in memory, delivery attempts are made. At shutdown, the audit log position is only saved once all events have been delivered to CEE, because audit should not lose events. It is not critical that every event is delivered before shutdown, however, since any unsaved events can be resent on the next start of isi_audit_cee, and CEE handles duplicates.

 Within OneFS, all audit data is organized by topic and is securely stored in the file system.

# isi audit topics list
Name     Max Cached Messages
-----------------------------
protocol 2048
config   1024
-----------------------------
Total: 2

Auditing can detect a variety of potential sources of data loss. These include unauthorized access attempts, inappropriate entitlements, plus a bevy of other fraudulent activities that plague organizations across the gamut of industries. Enterprises are increasingly required to comply with stringent regulatory mandates developed to protect against these sources of data theft and loss.

OneFS system configuration auditing is designed to track and record all configuration events that are handled by the platform API, including changes made through the command-line interface (CLI) and WebUI.

# isi audit topics view config
                Name: config
Max Cached Messages: 1024

Once enabled, system configuration auditing requires no additional configuration, and auditing events are automatically stored in the config audit topic directories. Audit access and management is governed by the ‘ISI_PRIV_AUDIT’ RBAC privilege, and OneFS provides a default ‘AuditAdmin’ role for this purpose.

Audit events are stored in a binary file under /ifs/.ifsvar/audit/logs. The logs automatically roll over to a new file after the size reaches 1 GB. The audit logs are consumable by auditing applications that support the Dell Common Event Enabler (CEE).

OneFS audit topics and settings can easily be viewed and modified. For example, to increase the configuration auditing maximum cached messages threshold to 2048 from the CLI:

# isi audit topics modify config --max-cached-messages 2048
# isi audit topics view config
                Name: config
Max Cached Messages: 2048

Audit configuration can also be modified or viewed per access zone and/or topic.

  • Get audit settings: isi audit settings view, or GET <cluster-ip:port>/platform/3/audit/settings
  • Modify audit settings: isi audit settings modify …, or PUT <cluster-ip:port>/platform/3/audit/settings
  • View the JSON schema for the audit settings resource, including query parameters and object properties: GET <cluster-ip:port>/platform/3/audit/settings?describe
  • View the JSON schema for the audit topics resource, including query parameters and object properties: GET <cluster-ip:port>/platform/1/audit/topics?describe

Configuration auditing can be enabled on a cluster from either the CLI or platform API. The current global audit configuration can be viewed as follows:

# isi audit settings global view
     Protocol Auditing Enabled: No
                 Audited Zones: -
               CEE Server URIs: -
                      Hostname:
       Config Auditing Enabled: No
         Config Syslog Enabled: No
         Config Syslog Servers: -
     Config Syslog TLS Enabled: No
  Config Syslog Certificate ID:
       Protocol Syslog Servers: -
   Protocol Syslog TLS Enabled: No
Protocol Syslog Certificate ID:
         System Syslog Enabled: No
         System Syslog Servers: -
     System Syslog TLS Enabled: No
  System Syslog Certificate ID:
          Auto Purging Enabled: No
              Retention Period: 180
       System Auditing Enabled: No

In this case, configuration auditing is disabled – its default setting. The following CLI syntax will enable (and verify) configuration auditing across the cluster:

# isi audit settings global modify --config-auditing-enabled 1
# isi audit settings global view | grep -i 'config audit'
       Config Auditing Enabled: Yes

In the next article, we’ll look at the config audit management, event viewing, and troubleshooting.


Read Full Blog

OneFS System Configuration Auditing – Part 2

Nick Trimbee Nick Trimbee

Thu, 18 Apr 2024 22:28:35 -0000

|

Read Time: 0 minutes

In the previous article of this series, we looked at the architecture and operation of OneFS configuration auditing. Now, we’ll turn our attention to its management, event viewing, and troubleshooting. 

The CLI command set for configuring ‘isi audit’ is split between two areas: 

  • Events (isi audit settings …): Specifies which events get logged, across three categories: audit failure, audit success, and syslog audit events.
  • Global (isi audit settings global …): Configuration of global audit parameters, including topics, zones, CEE, syslog, purging, retention, and more.



The ‘view’ argument for each command returns the following output: 

  1. Events:

# isi audit settings view
            Audit Failure: create_file, create_directory, open_file_write, open_file_read, close_file_unmodified, close_file_modified, delete_file, delete_directory, rename_file, rename_directory, set_security_file, set_security_directory
            Audit Success: create_file, create_directory, open_file_write, open_file_read, close_file_unmodified, close_file_modified, delete_file, delete_directory, rename_file, rename_directory, set_security_file, set_security_directory
      Syslog Audit Events: create_file, create_directory, open_file_write, open_file_read, close_file_unmodified, close_file_modified, delete_file, delete_directory, rename_file, rename_directory, set_security_file, set_security_directory
Syslog Forwarding Enabled: No

  2. Global:

# isi audit settings global view
     Protocol Auditing Enabled: Yes
                 Audited Zones: -
               CEE Server URIs: -
                      Hostname:
       Config Auditing Enabled: Yes
         Config Syslog Enabled: No
         Config Syslog Servers: -
     Config Syslog TLS Enabled: No
  Config Syslog Certificate ID:
       Protocol Syslog Servers: -
   Protocol Syslog TLS Enabled: No
Protocol Syslog Certificate ID:
         System Syslog Enabled: No
         System Syslog Servers: -
     System Syslog TLS Enabled: No
  System Syslog Certificate ID:
          Auto Purging Enabled: No
              Retention Period: 180
       System Auditing Enabled: No

While configuration auditing is disabled on OneFS by default, the following CLI syntax can be used to enable and verify config auditing across the cluster: 

# isi audit settings global modify --config-auditing-enabled 1
# isi audit settings global view | grep -i 'config audit'
       Config Auditing Enabled: Yes

Similarly, to enable configuration change audit redirection to syslog: 

# isi audit settings global modify --config-auditing-enabled true
# isi audit settings global modify --config-syslog-enabled true
# isi audit settings global view | grep -i 'config audit'
       Config Auditing Enabled: Yes

Or to disable redirection to syslog: 

# isi audit settings global modify --config-syslog-enabled false
# isi audit settings global modify --config-auditing-enabled false

CEE servers can be configured as follows: 

# isi audit settings global modify --add-cee-server-uris='<URL>'

For example:

# isi audit settings global modify --add-cee-server-uris='http://cee1.isilon.com:12228/cee'
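
The configured CEE targets can then be verified with a quick grep of the global settings. For example:

# isi audit settings global view | grep -i cee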

 

Auditing can be constrained by access zone, too: 

# isi audit settings modify --add-audited-zones=audit_az1

Note that, when auditing is enabled, the system zone is included by default. However, it can be excluded, if desired:

# isi audit settings modify --remove-audited-zones=System

An access zone’s audit parameters can also be configured via the ‘isi zone zones’ CLI command set. For example:

#isi zone zones create --all-auth-providers=true --audit-failure=all --audit-success=all --path=/ifs/data --name=audit_az1 

Granular audit event type configuration can be specified, if desired, to narrow the scope and reduce the amount of audit logging. 

For example, the following command syntax constrains auditing to read and logon failures and successful writes and deletes under path /ifs/data in the audit_az1 access zone:  

#isi zone zones create --all-auth-providers=true --audit-failure=read,logon --audit-success=write,delete --path=/ifs/data --name=audit_az1 

In addition to the CLI, the OneFS platform API can also be used to configure and manage auditing. For example, to enable configuration auditing on a cluster: 

PUT /platform/1/audit/settings
Authorization: Basic QWxhZGRpbjpvcGVuIHN1c2FtZQ==
{
    "config_auditing_enabled": true
}

The following ‘204’ HTTP response code from the cluster indicates that the request was successful, and that configuration auditing is now enabled on the cluster. No message body is returned for this request. 

204 No Content
Content-type: text/plain
Allow: 'GET, PUT, HEAD'

Similarly, to modify the config audit topic’s maximum cached messages threshold to a value of ‘1000’ via the API: 

PUT /1/audit/topics/config
Authorization: Basic QWxhZGRpbjpvcGVuIHN1c2FtZQ==
{
    "max_cached_messages": 1000
}

Again, no message body is returned from OneFS for this request. 

204 No Content
Content-type: text/plain
Allow: 'GET, PUT, HEAD'
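
The same platform API calls can also be driven from the CLI with curl. The following is a sketch of the topic modification above; substitute appropriate credentials for your cluster:

# curl -k -u <username>:<passwd> -X PUT -H "Content-Type: application/json" \
       -d '{"max_cached_messages": 1000}' "https://localhost:8080/platform/1/audit/topics/config"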

Note that, in the unlikely event that a cluster experiences an outage during which it loses quorum, auditing will be suspended until it is regained. Events similar to the following will be written to the /var/log/audit_d.log file: 

940b5c700]: Lost quorum! Audit logging will be disabled until /ifs is writeable again. 

2023-08-28T15:37:32.132780+00:00 <1.6> TME-1(id1) isi_audit_d[6495]: [0x345940b5c700]: Regained quorum. Logging resuming. 

When it comes to reading audit events on the cluster, OneFS natively provides the handy ‘isi_audit_viewer’ utility. For example, the following audit viewer output shows the events logged when the cluster admin added the ‘/ifs/tmp’ path to the SmartDedupe configuration and created a new user named ‘test1’: 

isi_audit_viewer 

[0: Tue Aug 29 23:01:16 2023] {"id":"f54a6bec-46bf-11ee-920d-0060486e0a26","timestamp":1693350076315499,"payload":{"user":{"token": {"UID":0, "GID":0, "SID": "SID:S-1-22-1-0", "GSID": "SID:S-1-22-2-0", "GROUPS": ["SID:S-1-5-11", "GID:5", "GID:10", "GID:20", "GID:70"], "protocol": 17, "zone id": 1, "client": "10.135.6.255", "local": "10.219.64.11" }},"uri":"/1/dedupe/settings","method":"PUT","args":{} 

,"body":{"paths":["/ifs/tmp"]} 

}} 

[1: Tue Aug 29 23:01:16 2023] {"id":"f54a6bec-46bf-11ee-920d-0060486e0a26","timestamp":1693350076391422,"payload":{"status":204,"statusmsg":"No Content","body":{}}} 

[2: Tue Aug 29 23:03:43 2023] {"id":"4cfce7a5-46c0-11ee-920d-0060486e0a26","timestamp":1693350223446993,"payload":{"user":{"token": {"UID":0, "GID":0, "SID": "SID:S-1-22-1-0", "GSID": "SID:S-1-22-2-0", "GROUPS": ["SID:S-1-5-11", "GID:5", "GID:10", "GID:20", "GID:70"], "protocol": 17, "zone id": 1, "client": "10.135.6.255", "local": "10.219.64.11" }},"uri":"/18/auth/users","method":"POST","args":{} 

,"body":{"name":"test1"} 

}} 

[3: Tue Aug 29 23:03:43 2023] {"id":"4cfce7a5-46c0-11ee-920d-0060486e0a26","timestamp":1693350223507797,"payload":{"status":201,"statusmsg":"Created","body":{"id":"SID:S-1-5-21-593535466-4266055735-3901207217-1000"} 

}} 

The audit log entries, such as those above, typically comprise the following components:


  1. Timestamp: timestamp in human-readable form.
  2. ID: unique entry ID.
  3. Timestamp: timestamp in UNIX epoch time.
  4. Node: node number.
  5. User tokens: the user token (roles and rights) of the user executing the command, comprising:
    1. User persona (UNIX/Windows)
    2. Primary group persona (UNIX/Windows)
    3. Supplemental group personas (UNIX/Windows)
    4. RBAC privileges of the user executing the command
  6. Interface: interface used to generate the command (10 = pAPI / WebUI, 16 = console CLI, 17 = SSH CLI).
  7. Zone: access zone that the command was executed against.
  8. Client IP: where the user connected from.
  9. Local node: local node address where the command was executed.
  10. Command: command syntax.
  11. Arguments: command arguments.
  12. Body: command body.


The ‘isi_audit_viewer’ utility reads the ‘config’ log topic by default, but it can also be used to read the ‘protocol’ log topic. Its CLI command syntax is as follows: 

# isi_audit_viewer -h
Usage: isi_audit_viewer [ -n <nodeid> | -t <topic> | -s <starttime>|
         -e <endtime> | -v ]
         -n <nodeid> : Specify node id to browse (default: local node)
         -t <topic>  : Choose topic to browse.
            Topics are "config" and "protocol" (default: "config")
         -s <start>  : Browse audit logs starting at <starttime>
         -e <end>    : Browse audit logs ending at <endtime>
         -v verbose  : Prints out start / end time range before printing
             records

Note that, on large clusters with heavy audit write activity (on the order of hundreds of thousands of events), running the isi_audit_viewer utility across the cluster with ‘isi_for_array’ can potentially lead to memory starvation and other issues, especially if the output is written to a directory under /ifs. As such, consider directing the output to a non-IFS location such as /var/tmp. Also, the isi_audit_viewer ‘-s’ (start time) and ‘-e’ (end time) flags can be used to limit a search (for example, to a 1-5 minute window), helping to reduce the size of the data. 
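
For example, the following invocation browses a five-minute window of config events on the local node. The timestamp format shown here is illustrative only; check the utility’s help output for the exact format accepted by your release:

# isi_audit_viewer -t config -s "2023-08-29 23:00:00" -e "2023-08-29 23:05:00"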

In addition to reading audit events, the viewer is also a useful tool for troubleshooting auditing issues. Any errors that are encountered while processing audit events, and while delivering them to an external CEE server, are written to the log file ‘/var/log/isi_audit_cee.log’. In addition, the protocol-specific logs will contain any issues that the audit filter encountered while collecting audit events. 

Author: Nick Trimbee


Read Full Blog
  • PowerScale
  • OneFS
  • logfiles
  • SupportAssist

OneFS Log Gather Transmission

Nick Trimbee Nick Trimbee

Wed, 17 Apr 2024 15:45:51 -0000

|

Read Time: 0 minutes

The OneFS isi_gather_info utility is the ubiquitous method for collecting and uploading a PowerScale cluster’s context and configuration to assist with the identification and resolution of bugs and issues. As such, it performs the following roles:

  • Executes many commands, scripts, and utilities on a cluster, and saves their results
  • Collates, or gathers, all these files into a single ‘gzipped’ package
  • Optionally transmits this log gather package back to Dell using a choice of several transport methods

By default, a log gather tarfile is written to the /ifs/data/Isilon_Support/pkg/ directory. It can also be uploaded to Dell by the following means:

  • SupportAssist / ESRS: Uses Dell Secure Remote Support (SRS) for gather upload. TCP port 443/8443. Supported in any OneFS release.
  • FTP: Uses FTP to upload the completed gather. TCP port 21. Supported in any OneFS release.
  • FTPS: Uses SSH-based encrypted FTPS to upload the gather. TCP port 22. The default in OneFS 9.5 and later.
  • HTTP: Uses HTTP to upload the gather. TCP port 80/443. Supported in any OneFS release.

As indicated above, OneFS 9.5 and later releases leverage FTPS as the default option for FTP upload, thereby protecting the upload of cluster configuration and logs with an encrypted transmission session.

Under the hood, the log gather process comprises an eight phase workflow, with transmission comprising the penultimate ‘Upload’ phase:

Graphic depicting log gathering process.

The details of each phase are as follows:

  1. Setup: Reads from the arguments passed in, and from any config files on disk, and sets up the config dictionary, which is used throughout the rest of the codebase. Most of the code for this step is contained in isilon/lib/python/gather/igi_config/configuration.py. This is also the step in which the program is most likely to exit, if any of the config arguments are invalid.
  2. Run local: Executes all the cluster commands, which are run on the same node that is starting the gather. All these commands run in parallel (up to the current parallelism value). This is typically the second longest running phase.
  3. Run nodes: Executes the node commands across all of the cluster’s nodes. This runs on each node, and while these commands run in parallel (up to the current parallelism value), they do not run in parallel with the ‘Run local’ step.
  4. Collect: Ensures that all of the results end up on the overlord node (the node that started the gather). If the gather is using /ifs, this is very fast; if it is not using /ifs, it needs to SCP all the node results to a single node.
  5. Generate extra files: Generates nodes_info.xml and package_info.xml. These two files are present in every gather and provide important metadata about the cluster.
  6. Packing: Packs (tars and gzips) all the results. This is typically the longest running phase, often by an order of magnitude.
  7. Upload: Transports the tarfile package to its specified destination using SupportAssist, ESRS, FTPS, FTP, HTTP, and so on. Depending on the geographic location, this phase might also be lengthy.
  8. Cleanup: Cleans up any intermediary files that were created on the cluster. This phase runs even if the gather fails or is interrupted.
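
For example, to run a gather but keep the resulting package local to the cluster, the ‘no upload’ options described later in this article can be combined to skip the ESRS, SupportAssist, and FTP transports (a sketch, not a required workflow):

# isi_gather_info --noesrs --nosupportassist --noftp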

Because the isi_gather_info tool is primarily intended for troubleshooting clusters with issues, it runs as root (or compadmin in compliance mode), because it needs to be able to execute under degraded conditions (such as without GMP, during upgrade, and under cluster splits, and so on). Given these atypical requirements, isi_gather_info is built as a standalone utility, rather than using the platform API for data collection.

While FTPS is the new default and recommended transport, the legacy plaintext FTP upload method is still available in OneFS 9.5 and later. As such, Dell’s log server, ftp.isilon.com, also supports both encrypted FTPS and plaintext FTP, so will not impact older release FTP log upload behavior.

This OneFS 9.5 FTPS security enhancement encompasses three primary areas where an FTPS option is now supported:

  • Directly executing the /usr/bin/isi_gather_info utility
  • Running using the isi diagnostics gather CLI command set
  • Creating a diagnostics gather through the OneFS WebUI

For the isi_gather_info utility, two new options are included in OneFS 9.5 and later releases:

  • --ftp-insecure: Enables the gather to use unencrypted FTP transfer. Default value: False.
  • --ftp-ssl-cert: Enables the user to specify the location of a special SSL certificate file. Default value: empty string (not typically required).

Similarly, there are two corresponding options in OneFS 9.5 and later for the isi diagnostics CLI command:

  • --ftp-upload-insecure: Enables the gather to use unencrypted FTP transfer. Default value: No.
  • --ftp-upload-ssl-cert: Enables the user to specify the location of a special SSL certificate file. Default value: empty string (not typically required).

Based on these options, here are some command syntax usage examples, for both FTPS and FTP uploads:

  • Secure upload to Dell (default): Upload cluster logs to the Dell log server (ftp.isilon.com) using encrypted FTPS.

# isi_gather_info
Or
# isi_gather_info --ftp

# isi diagnostics gather start
Or
# isi diagnostics gather start --ftp-upload-insecure=no

  • Secure upload to an alternative server: Upload cluster logs to an alternative server using encrypted FTPS.

# isi_gather_info --ftp-host <FQDN> --ftp-ssl-cert <SSL_CERT_PATH>

# isi diagnostics gather start --ftp-upload-host=<FQDN> --ftp-upload-ssl-cert=<SSL_CERT_PATH>

  • Unencrypted upload to Dell: Upload cluster logs to the Dell log server (ftp.isilon.com) using plaintext FTP.

# isi_gather_info --ftp-insecure

# isi diagnostics gather start --ftp-upload-insecure=yes

  • Unencrypted upload to an alternative server: Upload cluster logs to an alternative server using plaintext FTP.

# isi_gather_info --ftp-insecure --ftp-host <FQDN>

# isi diagnostics gather start --ftp-upload-host=<FQDN> --ftp-upload-insecure=yes

Note that OneFS 9.5 and later releases provide a warning if the cluster admin elects to continue using non-secure FTP for the isi_gather_info tool. Specifically, if the --ftp-insecure option is configured, the following message is displayed, informing the user that plaintext FTP upload is being used, and that the connection and data stream will not be encrypted:

# isi_gather_info --ftp-insecure
You are performing plain text FTP logs upload.
This feature is deprecated and will be removed
in a future release. Please consider the possibility
of using FTPS for logs upload. For further information,
please contact PowerScale support
...

In addition to the command line, log gathers can also be configured using the OneFS WebUI by navigating to Cluster management > Diagnostics > Gather settings.

WebUI screenshot showing FTP/FTPS upload options.

The Edit gather settings page in OneFS 9.5 and later has been updated to reflect FTPS as the default transport method, plus the addition of radio buttons and text boxes to accommodate the new configuration options.

If plaintext FTP upload is configured, the healthcheck command will display a warning that plaintext upload is used and is no longer a recommended option. For example:

CLI screenshot showing a healthcheck warning that plain-text upload is used and is no longer a recommended option.

For reference, the OneFS 9.5 and later isi_gather_info CLI command syntax includes the following options:

  • --upload <boolean>: Enable gather upload.
  • --esrs <boolean>: Use ESRS for gather upload.
  • --noesrs: Do not attempt to upload using ESRS.
  • --supportassist: Attempt SupportAssist upload.
  • --nosupportassist: Do not attempt to upload using SupportAssist.
  • --gather-mode (incremental | full): Type of gather: incremental or full.
  • --http-insecure <boolean>: Enable insecure HTTP upload on completed gather.
  • --http-host <string>: HTTP host to use for HTTP upload.
  • --http-path <string>: Path on HTTP server to use for HTTP upload.
  • --http-proxy <string>: Proxy server to use for HTTP upload.
  • --http-proxy-port <integer>: Proxy server port to use for HTTP upload.
  • --ftp <boolean>: Enable FTP upload on completed gather.
  • --noftp: Do not attempt FTP upload.
  • --set-ftp-password: Interactively specify an alternate password for FTP.
  • --ftp-host <string>: FTP host to use for FTP upload.
  • --ftp-path <string>: Path on FTP server to use for FTP upload.
  • --ftp-port <string>: Specifies an alternate FTP port for upload.
  • --ftp-proxy <string>: Proxy server to use for FTP upload.
  • --ftp-proxy-port <integer>: Proxy server port to use for FTP upload.
  • --ftp-mode <value>: Mode of FTP file transfer. Valid values are both, active, and passive.
  • --ftp-user <string>: FTP user to use for FTP upload.
  • --ftp-pass <string>: Specify an alternative password for FTP.
  • --ftp-ssl-cert <string>: Specifies the SSL certificate to use in the FTPS connection.
  • --ftp-upload-insecure <boolean>: Whether to attempt a plaintext FTP upload.
  • --ftp-upload-pass <string>: FTP password to use for FTP upload.
  • --set-ftp-upload-pass: Specify the FTP upload password interactively.
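
Several of these options can naturally be combined. For example, the following sketch runs a full gather and uploads it over FTPS to an alternative server under a specific account, prompting interactively for the password (the host and user placeholders are illustrative):

# isi_gather_info --gather-mode full --ftp-host <FQDN> --ftp-user <user> --set-ftp-password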

When a logfile gather arrives at Dell, it is automatically unpacked by a support process and analyzed using the logviewer tool.

Author: Nick Trimbee

Read Full Blog
  • security
  • PowerScale
  • cybersecurity

PowerScale Security Baseline Checklist

Aqib Kazi

Tue, 16 Apr 2024 22:36:48 -0000

|

Read Time: 0 minutes

As a security best practice, a quarterly security review is recommended. A strong security posture for a PowerScale cluster is composed of many facets, not all of which apply to every organization. An organization’s industry, clients, business, and IT administrative requirements determine what is applicable. To establish an aggressive security posture for a PowerScale cluster, use the following checklist as a baseline.

This checklist serves as a security baseline and must be adapted to specific organizational requirements. See the Dell PowerScale OneFS: Security Considerations | Dell Technologies Info Hub white paper for a comprehensive explanation of the concepts listed below.

Further, cluster security is not a single event but an ongoing process. This post will be updated as new guidance becomes available, so check back periodically, and consider implementing an organizational security review on a quarterly basis.

The items in the checklist are not listed in order of importance or hierarchy; rather, they build toward an aggressive security posture as more features are implemented. Track each item with a Complete (Y/N) status and notes for your environment.

  • Data at Rest Encryption: Implement an external key manager with SEDs. Reference: Overview | Dell PowerScale OneFS: Security Considerations | Dell Technologies Info Hub

  • Data in flight encryption: Encrypt protocol communication and data replication. References: Dell PowerScale: Solution Design and Considerations for SMB Environments (delltechnologies.com); PowerScale OneFS NFS Design Considerations and Best Practices | Dell Technologies Info Hub; Dell PowerScale SyncIQ: Architecture, Configuration, and Considerations | Dell Technologies Info Hub

  • Role Based Access Control (RBAC): Assign the lowest possible access required for each role. Reference: PowerScale OneFS Authentication, Identity Management, and Authorization | Dell Technologies Info Hub

  • Multifactor authentication: References: SSH multifactor authentication with Duo | PowerScale OneFS Authentication, Identity Management, and Authorization | Dell Technologies Info Hub; SAML-based SSO for WebUI | PowerScale OneFS Authentication, Identity Management, and Authorization | Dell Technologies Info Hub

  • Cybersecurity: Reference: PowerScale Cyber Protection Suite Reference Architecture | Dell Technologies Info Hub

  • Monitoring: Monitor cluster activity.

  • Cluster configuration backup and recovery: Ensure quarterly cluster backups. Reference: Backing Up and Restoring PowerScale Cluster Configurations in OneFS 9.7 | Dell Technologies Info Hub

  • Secure Boot: Configure PowerScale Secure Boot. Reference: Overview | Dell PowerScale OneFS: Security Considerations | Dell Technologies Info Hub

  • Auditing: Configure auditing. Reference: File System Auditing with Dell EMC PowerScale and Dell EMC Common Event Enabler | Dell Technologies Info Hub

  • Custom applications: Create a custom application for cluster monitoring. Reference: GitHub - Isilon/isilon_sdk: Official repository for isilon_sdk

  • SED and cluster Universal Key rekey: Set a frequency to automatically rekey the Universal Key for SEDs and the cluster. References: SEDs universal key rekey | Dell PowerScale OneFS: Security Considerations | Dell Technologies Info Hub; Cluster services rekey | Dell PowerScale OneFS: Security Considerations | Dell Technologies Info Hub

  • Perform a quarterly security review: Review all organizational security requirements and current implementation, and check this paper and checklist for updates. Reference: Security Advisories, Notices and Resources | Dell US

  • General cluster security best practices: See the best practices section of the Security Configuration Guide for the relevant release, at PowerScale OneFS Info Hubs | Dell US

  • Login, authentication, and privileges best practices

  • SNMP security best practices

  • SSH security best practices

  • Data-access protocols best practices

  • Web interface security best practices

  • Anti-virus: Reference: PowerScale: AntiVirus Solutions | Dell Technologies Info Hub
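Many of these checklist items can be spot-checked from the OneFS CLI. The following commands are an illustrative sample only (exact commands and output vary by OneFS release and installed features): listing configured RBAC roles, confirming the auditing configuration, reviewing global SyncIQ settings including encryption enforcement, and checking the antivirus configuration, respectively.

# isi auth roles list
# isi audit settings global view
# isi sync settings view
# isi antivirus settings view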

 

 

Author: Aqib Kazi – Senior Principal Engineering Technologist


Read Full Blog

OneFS SyncIQ and Windows File Create Date

Nick Trimbee

Fri, 17 May 2024 20:27:18 -0000

|

Read Time: 0 minutes

In the POSIX world, files typically possess three fundamental timestamps:

  • Access (atime): Access timestamp of the last read.

  • Change (ctime): Status change timestamp of the last update to the file's metadata.

  • Modify (mtime): Modification timestamp of the last write.

These timestamps can be easily viewed with a variety of file system tools and utilities. For example, running ‘stat’ from the OneFS CLI:

# stat -x tstr
  File: "tstr"
  Size: 0            FileType: Regular File
  Mode: (0600/-rw-------)         Uid: (    0/     root)  Gid: (    0/    wheel)
Device: 18446744073709551611,18446744072690335895   Inode: 5103485107    Links: 1
Access: Mon Sep 11 23:12:47 2023
Modify: Mon Sep 11 23:12:47 2023
Change: Mon Sep 11 23:12:47 2023

A typical instance of a change, or “ctime”, timestamp update occurs when a file’s access permissions are altered. Since modifying the permissions doesn’t physically open the file (that is, access the file’s data), its “atime” field is not updated. Similarly, since no modification is made to the file’s contents, the “mtime” also remains unchanged. However, the file’s metadata has changed, and the “ctime” field records this event. As such, the “ctime” stamp allows a workflow such as a backup application to know to make a fresh copy of the file, including its updated permission values. Similarly, a file rename is another operation that modifies the “ctime” entry without affecting the other timestamps.
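For example, on the OneFS CLI (or any FreeBSD-style shell), changing the permissions of the tstr file from the example above advances only its Change time, leaving Access and Modify untouched; a rename behaves the same way:

# chmod 640 tstr
# stat -x tstr | grep -E 'Access|Modify|Change'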

Certain other file systems also include a fourth timestamp: the “birthtime”, recording when the file was created. Birthtime (by definition) should never change. It’s also an attribute which organizations and their storage administrators may or may not care about.

Within the Windows file system realm, this “birthtime” timestamp is affectionately known as the “create date”. The create date of a file is essentially the date and time when its inode is “born”.

Note that this is not a recognized POSIX attribute like ctime or mtime; rather, it was introduced as part of Windows compatibility requirements. And, because it is a birth time, linking operations do not affect it unless a new inode is created.

As shown below, this create, or birth, date can differ from a file’s modified or accessed dates because the creation date is when that file’s inode version originated. So, for instance, if a file is copied, the new file’s create date will be set to the current time since it has a new inode. This can be seen in the following example where a file is copied from a flash drive mounted on a Windows client’s file system under drive “E:”, to a cluster’s SMB share mounted at drive “Z:”.

  

The “Date created” date above is ahead in time of both the “accessed” and “modified”, because the latter two were merely inherited from the source file, whereas the create date was set when the copy was made.

The corresponding “date”, “stat”, and “isi get” CLI output from the cluster confirms this:

# stat TEST.txt
18446744072690400151 5103485107 -rw------- 1 root wheel 18446744073709551615 0 "Sep 11 23:12:47 2023" "Sep 11 23:12:47 2023" "Sep 11 23:12:47 2023" "Sep 11 23:12:47 2023" 8192 48 0xe0 tstr
# isi get -Dd TEST.txt
POLICY   W   LEVEL PERFORMANCE   COAL  ENCODING       FILE              IADDRS
default      16+2/2 concurrency   on    UTF-8         tstr              <34,12,58813849600:8192>, <35,3,58981457920:8192>, <69,12,57897025536:8192> ct: 1694473967 rt: 0
*************************************************
* IFS inode: [ 34,12,58813849600:8192, 35,3,58981457920:8192, 69,12,57897025536:8192 ]
*************************************************
*
*  Inode Version:      8
*  Dir Version:        2
*  Inode Revision:     1
*  Inode Mirror Count: 3
*  Recovered Flag:     0
*  Restripe State:     0
*  Link Count:         1
*  Size:               0
*  Mode:               0100600
*  Flags:              0xe0
*  SmartLinked:        False
*  Physical Blocks:    0
*  Phys. Data Blocks:  0
*  Protection Blocks:  0
*  LIN:                1:3031:00b3
*  Logical Size:       0
*  Shadow refs:        0
*  Do not dedupe:      0
*  In CST stats:       False
*  Last Modified:      1694473967.071973000
*  Last Inode Change:  1694473967.071973000
*  Create Time:        1694473967.071973000
*  Rename Time:        0
<snip>
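The same effect can be reproduced on the cluster itself: copying a file creates a new inode, so the copy's create time is set to the time of the copy, while the original file's create time is unchanged. A quick illustrative check from the OneFS CLI (file names are placeholders):

# cp tstr tstr_copy
# isi get -D tstr_copy | grep 'Create Time'
# isi get -D tstr | grep 'Create Time'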

In releases before OneFS 9.5, when a file is replicated, its create date on the target is set to the time at which the file was copied from the source cluster (that is, when the replication job ran or, more specifically, when the individual job worker thread processed that particular file).

By way of contrast, OneFS 9.5 and later releases ensure that SyncIQ replicates the full array of metadata, preserving all values, including the birth time / create date.

The primary consideration for the new create date functionality is that it requires both source and target clusters in a replication set to be running OneFS 9.5 or later.

If either the source or the target is running pre-9.5 code, this time field retains its old behavior of being set to the time of replication (actual file creation) rather than the correct value associated with the source file.

 

In OneFS 9.5 and later releases, create date timestamping works exactly the same way as SyncIQ replication of other metadata (such as “mtime”, etc), occurring automatically as part of every file replication. Plus, no additional configuration is necessary beyond upgrading both clusters to OneFS 9.5 or later.

One other significant thing to note about this feature is that SyncIQ is changelist-based, using OneFS snapshots under the hood for its checkpointing and delta comparisons. This means that, if a replication relationship was configured prior to the upgrade to OneFS 9.5 or later, the source cluster will have valid birthtime data, but the target’s birthtime data will reflect the local creation time of the files it has copied.

Note that, upon upgrading both sides to OneFS 9.5 or later and running a SyncIQ job, nothing will change at first. This is because SyncIQ will perform its snapshot comparison, determine that no changes were made to the dataset, and so will not perform any replication work. However, if a source file is “touched” so that its mtime changes (or any other action is performed that causes a copy-on-write, or CoW), the file will show up in the snapshot diff and therefore be replicated. As part of replicating that file, the correct birth time is written on the target.

Note that a full replication (re)sync is not triggered by upgrading a replication cluster pair to OneFS 9.5 or later and thereby enabling this functionality. Instead, any create date timestamp resolution happens opportunistically and in the background as files get touched or modified, and thereby replicated. Be aware that ‘touching’ a file changes its modification time, in addition to updating the create date, which may be undesirable.

Author: Nick Trimbee

Read Full Blog
  • Isilon
  • PowerScale
  • OneFS
  • SyncIQ

Securing PowerScale OneFS SyncIQ

Aqib Kazi

Tue, 16 Apr 2024 17:55:56 -0000

|

Read Time: 0 minutes

In the data replication world, ensuring your PowerScale clusters' security is paramount. SyncIQ, a powerful data replication tool, requires encryption to prevent unauthorized access.

Concerns about unauthorized replication 

A cluster might inadvertently become the target of numerous replication policies, potentially overwhelming its resources. There’s also the risk of an administrator mistakenly specifying the wrong cluster as the replication target.

Best practices for security 

To secure your PowerScale cluster, Dell recommends enabling SyncIQ encryption as per Dell Security Advisory DSA-2020-039: Dell EMC Isilon OneFS Security Update for a SyncIQ Vulnerability | Dell US. This feature, introduced in OneFS 8.2, prevents man-in-the-middle attacks and addresses other security concerns.

Encryption in new and upgraded clusters 

SyncIQ is disabled by default for new clusters running OneFS 9.1. When SyncIQ is enabled, a global encryption flag requires all SyncIQ policies to be encrypted. This flag is also set for clusters upgraded to OneFS 9.1, unless there’s an existing SyncIQ policy without encryption.

Alternative measures 

For clusters running versions earlier than OneFS 8.2, configuring a SyncIQ pre-shared key (PSK) offers protection against unauthorized replication policies.

By following these security measures, administrators can ensure that their PowerScale clusters are safeguarded against unauthorized access and maintain the integrity and confidentiality of their data.

SyncIQ encryption: securing data in transit

Securing information as it moves between systems is paramount in the data-driven world. Dell PowerScale OneFS release 8.2 has brought a game-changing feature to the table: end-to-end encryption for SyncIQ data replication. This ensures that data is not only protected while at rest but also as it traverses the network between clusters.

Why encryption matters 

Data breaches can be catastrophic, and because data replication involves moving large volumes of sensitive information, encryption acts as a critical shield. With SyncIQ’s encryption, organizations can enforce a global setting that mandates encryption across all SyncIQ policies, to add an extra layer of security.

Test before you implement

It’s crucial to test SyncIQ encryption in a lab environment before deploying it in production. Although encryption introduces minimal overhead, its impact on workflow can vary based on several factors, such as network bandwidth and cluster resources.

Technical underpinnings 

SyncIQ encryption is powered by X.509 certificates, TLS version 1.2, and OpenSSL version 1.0.2o. These certificates are managed within the cluster’s certificate stores, ensuring a robust and secure data replication process.

Remember, this is just the beginning of a comprehensive guide about SyncIQ encryption. Stay tuned for more insights about configuration steps and best practices for securing your data with Dell PowerScale’s innovative solutions.

Configuration

Configuring SyncIQ encryption requires a supported OneFS release, certificates, and finally, the OneFS configuration. Before enabling SyncIQ encryption in production, test it in a lab environment that mimics the production setup. Measure the impact on transmission overhead by considering network bandwidth, cluster resources, workflow, and policy configuration.

Here’s a high-level summary of the configuration steps:

  1. Ensure compatibility:
    • Ensure that the source and target clusters are running OneFS 8.2 or later.
    • Upgrade and commit both clusters to OneFS release 8.2 or later.

  2. Create X.509 certificates (see the example OpenSSL commands after this list):
    • Create X.509 certificates for the source and target clusters using publicly available tools.
    • The certificate creation process results in the following components:
      • Certificate Authority (CA) certificate
      • Source certificate and private key
      • Target certificate and private key

Note: Some certificate authorities may not generate the public and private key pairs. In that case, manually generate a Certificate Signing Request (CSR) and obtain signed certificates.

  3. Transfer the certificates to each cluster.

  4. Activate each certificate as follows:
    • Add the source cluster certificate under Data Protection > SyncIQ > Certificates.
    • Add the target server certificate under Data Protection > SyncIQ > Settings.
    • Add the Certificate Authority under Access > TLS Certificates and select Import Authority.

  5. Enforce encryption:
    • Each cluster stores its certificate and its peer’s certificate.
    • The source cluster must store the target cluster’s certificate, and vice versa.
    • Storing the peer’s certificate creates a list of approved clusters for data replication.
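For step 2, the certificates can be created with any standard tooling. The following is a minimal OpenSSL sketch, assuming a private CA; the CA name, cluster subject names, key sizes, and validity periods are placeholders, so adjust them to your own security policy or substitute your organization's CA workflow:

# openssl genrsa -out ca.key 4096
# openssl req -x509 -new -nodes -key ca.key -sha256 -days 1825 -subj "/CN=SyncIQ-CA" -out ca.pem
# openssl genrsa -out source.key 4096
# openssl req -new -key source.key -subj "/CN=source-cluster" -out source.csr
# openssl x509 -req -in source.csr -CA ca.pem -CAkey ca.key -CAcreateserial -sha256 -days 825 -out source.pem
# openssl genrsa -out target.key 4096
# openssl req -new -key target.key -subj "/CN=target-cluster" -out target.csr
# openssl x509 -req -in target.csr -CA ca.pem -CAkey ca.key -CAcreateserial -sha256 -days 825 -out target.pem

This yields the CA certificate (ca.pem) plus a certificate and private key pair for each of the source and target clusters, matching the components listed in step 2.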

By following these steps, you can secure your data in transit between PowerScale clusters using SyncIQ encryption. Remember to customize the certificates and settings according to your specific environment and requirements.

For more detailed information about configuring SyncIQ encryption, see SyncIQ encryption | Dell PowerScale SyncIQ: Architecture, Configuration, and Considerations | Dell Technologies Info Hub.

SyncIQ pre-shared key

A SyncIQ pre-shared key (PSK) is configured solely on the target cluster to restrict policies from source clusters without the PSK.

Use Cases: A PSK is recommended for environments without SyncIQ encryption, such as clusters running releases earlier than OneFS 8.2, or where encryption cannot be enabled for other reasons.

SmartLock Compliance: Not supported by SmartLock Compliance mode clusters; upgrading and configuring SyncIQ encryption is advised.

Policy Update: After updating source cluster policies with the PSK, no further configuration is needed. Use the isi sync policies view command to verify.

Remember, configuring the PSK will cause all replicating jobs to the target cluster to fail, so ensure that all SyncIQ jobs are complete or canceled before proceeding.
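As a simple pre-check before configuring the PSK (commands as found in current OneFS releases; verify against your installed version), confirm that no SyncIQ jobs are running and then review each updated policy:

# isi sync jobs list
# isi sync policies view <policy-name>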

For more detailed information about configuring a SyncIQ pre-shared key, see SyncIQ pre-shared key | Dell PowerScale SyncIQ: Architecture, Configuration, and Considerations | Dell Technologies Info Hub.

Resources

Author: Aqib Kazi, Senior Principal Engineering Technologist

Read Full Blog
  • PowerScale
  • OneFS

PowerScale OneFS 9.8

Nick Trimbee

Tue, 09 Apr 2024 14:00:00 -0000

|

Read Time: 0 minutes

It’s launch season here at Dell Technologies, and PowerScale is already scaling up spring with the innovative OneFS 9.8 release which shipped today, 9th April 2024. This new 9.8 release has something for everyone, introducing PowerScale innovations in cloud, performance, serviceability, and ease of use.

Figure 1. OneFS 9.8 release features: APEX File Storage for Azure, NFSv4.1 over RDMA, Job Engine SmartThrottling, serviceability enhancements (SmartLog and auto-analysis), IPv6 source-based routing, streaming write performance improvements, and the multipath client driver.

APEX File Storage for Azure

After the debut of APEX File Storage for AWS last year, OneFS 9.8 amplifies PowerScale’s presence in the public cloud by introducing APEX File Storage for Azure.

Figure 2. OneFS 9.8 APEX File Storage for Azure

In addition to providing the same OneFS software platform on-prem and in the cloud, as well as customer-managed operation for full control, APEX File Storage for Azure in OneFS 9.8 provides linear capacity and performance scaling from four to eighteen SSD nodes and up to 3PB per cluster, making it a solid fit for AI, ML, and analytics applications, as well as traditional file shares, home directories, and vertical workloads such as M&E, healthcare, life sciences, and financial services.

Figure 3. Dell PowerScale scale-out architecture, spanning on-premises and APEX File Storage for Azure and AWS, with multi-protocol access, data reduction, CloudPools, SmartQuotas, SyncIQ, SnapshotIQ, SmartQoS, and SmartConnect.

PowerScale’s scale-out architecture can be deployed on customer-managed AWS and Azure infrastructure, providing the capacity and performance needed to run a variety of unstructured workflows in the public cloud.

Once in the cloud, existing PowerScale investments can be further leveraged by accessing and orchestrating your data through the platform's multi-protocol access and APIs. 

This includes the common OneFS control plane (CLI, WebUI, and platform API) and the same enterprise features, such as Multi-protocol, SnapshotIQ, SmartQuotas, Identity management, and so on.        

Simplicity and efficiency

OneFS 9.8 SmartThrottling is an automated impact control mechanism for the job engine, allowing the cluster to automatically throttle job resource consumption if it exceeds pre-defined thresholds in order to prioritize client workloads. 

OneFS 9.8 also delivers automatic on-cluster core file analysis, and SmartLog provides an efficient, granular log file gathering and transmission framework. Both of these new features help dramatically accelerate the ease and time to resolution of cluster issues.

Performance

OneFS 9.8 also adds support for Remote Direct Memory Access (RDMA) over NFSv4.1 for applications and clients. This allows substantially higher throughput performance, especially for single-connection and read-intensive workloads such as machine learning and generative AI model training, while also reducing both cluster and client CPU utilization, and it provides the foundation for interoperability with NVIDIA’s GPUDirect.

RDMA over NFSv4.1 in OneFS 9.8 leverages the RoCEv2 network protocol. OneFS CLI and WebUI configuration options include global enablement, IP pool configuration, filtering, and verification of RoCEv2-capable network interfaces. NFS over RDMA is available on all PowerScale platforms containing Mellanox ConnectX network adapters on the front end, with a choice of 25, 40, or 100 Gigabit Ethernet connectivity. The OneFS user interface helps easily identify which of a cluster’s NICs support RDMA.

Under the hood, OneFS 9.8 introduces efficiencies such as lock sharding and parallel thread handling, delivering a substantial performance boost for streaming write-heavy workloads such as generative AI inferencing and model training. Performance scales linearly as compute is increased, keeping GPUs busy and allowing PowerScale to easily support AI and ML workflows both small and large. OneFS 9.8 also includes infrastructure support for future node hardware platform generations.

Multipath Client Driver

The addition of a new Multipath Client Driver helps expand PowerScale’s role in Dell Technologies’ strategic collaboration with NVIDIA, delivering the first and only end-to-end large scale AI system. This is based on the PowerScale F710 platform in conjunction with PowerEdge XE9680 GPU servers and NVIDIA’s Spectrum-X Ethernet switching platform to optimize performance and throughput at scale.

In summary, OneFS 9.8 brings the following new features to the Dell PowerScale ecosystem:

Cloud

  • APEX File Storage for Azure
  • Up to 18 SSD nodes and 3PB per cluster

Simplicity

  • Job Engine SmartThrottling
  • Source-based routing for IPv6 networks

Performance

  • NFSv4.1 over RDMA
  • Streaming write performance enhancements
  • Infrastructure support for next generation all-flash node hardware platform

Serviceability

  • Automatic on-cluster core file analysis
  • SmartLog efficient, granular log file gathering

We’ll be taking a deeper look at this new functionality in blog articles over the course of the next few weeks. 

Meanwhile, the new OneFS 9.8 code is available on the Dell Online Support site, both as an upgrade and reimage file, allowing installation and upgrade of this new release.

 

Author: Nick Trimbee

Read Full Blog

Unveiling APEX File Storage for Microsoft Azure – Running PowerScale OneFS on Azure

Vincent Shen

Tue, 09 Apr 2024 20:30:02 -0000

|

Read Time: 0 minutes

Overview

PowerScale OneFS 9.8 now brings a new offering in Azure: APEX File Storage for Microsoft Azure! It is a software-defined cloud file storage service that provides high-performance, flexible, secure, and scalable file storage for Microsoft Azure environments. It is also a fully customer-managed offering designed to meet the needs of enterprise-scale file workloads running on Azure. It joins APEX File Storage for AWS, the native cloud solution released last year. For more information, see https://www.dell.com/en-us/dt/apex/storage/public-cloud/file.htm?hve=explore+file

Benefits of running OneFS in the cloud

APEX File Storage for Microsoft Azure brings the OneFS distributed file system software into the public cloud, allowing users to have the same management experience in the cloud as with their on-premises PowerScale appliance.

With APEX File Storage for Microsoft Azure, you can easily deploy and manage file storage on Azure. The service provides a scalable and elastic storage infrastructure that can grow according to your actual business needs.

Some of the key features and benefits of APEX File Storage for Microsoft Azure include:

  • Scale-out: APEX File Storage for Microsoft Azure is powered by the Dell PowerScale OneFS distributed file system. You can start with a small OneFS cluster (minimum 4 nodes) and then expand it incrementally as your data storage requirements grow, up to 5.6 PiB of cluster capacity in a single namespace (maximum 18 nodes). This large capacity helps support the most demanding, data-intensive workloads such as AI.
  • Data management: APEX File Storage for Microsoft Azure provides powerful data management capabilities such as snapshots, data replication, and backup and restore. Because OneFS features are the same in the cloud as they are on-premises, organizations can simplify operations and reduce management complexity with a consistent user experience.
  • Simplified journey to hybrid cloud: More and more organizations operate in a hybrid cloud environment, where they need to move data between on-premises and cloud-based environments. APEX File Storage for Microsoft Azure can help you bridge this gap by facilitating seamless data mobility between on-premises and the cloud with native replication and by providing a consistent data management platform across both environments. Once in the cloud, customers can take advantage of enterprise-class OneFS features such as multi-protocol support, CloudPools, data reduction, security, and snapshots, to run their workloads in the same way as they do on-premises.
  • Data resilience: Ensuring data resilience is critical for businesses to maintain continuity and to safeguard information. APEX File Storage for Microsoft Azure implements erasure coding techniques. This advanced approach optimizes storage efficiency and enhances fault tolerance, enabling the cluster to withstand multiple node failures. By spreading nodes across different racks using Azure availability set, the cluster ensures that data accessibility is maintained in the event of a rack failure.
  • High performance: APEX File Storage for Microsoft Azure delivers high-performance file storage with low-latency access to data, ensuring that you can access data quickly and efficiently. Compared to Azure NetApp Files, Dell APEX File Storage for Microsoft Azure enables about 6x greater cluster performance, up to 11x larger namespace, up to 23x more snapshots per volume, 2x higher cluster resiliency, and an easier and more robust cluster expansion.
  • Proactive support: With a 97% customer satisfaction rate, Dell Support Services provides highly trained experts around the clock and around the globe to address your OneFS needs, minimize disruptions, and help you maintain a high level of productivity and outcomes.  

Architecture

APEX File Storage for Microsoft Azure is a software-defined cloud file storage service that combines the power of OneFS distributed file system with the flexibility and scalability of cloud infrastructure. It is a fully customer-managed service that is designed to meet the needs of enterprise-scale file workloads running on Azure.

The architecture of APEX File Storage for Microsoft Azure is built on the OneFS distributed file system. This architecture uses multiple cluster nodes to establish a single global namespace. Each cluster node operates as an instance of the OneFS software, running on an Azure VM to deliver storage capacity and compute resources. It is worth noting that the network bandwidth limit at the Azure VM level is shared between the cluster internal network and the external network.

APEX File Storage for Microsoft Azure uses cloud-native technologies and leverages the elasticity of cloud infrastructure, so that you can easily scale the storage infrastructure as your business requirements grow. APEX File Storage for Microsoft Azure can dynamically scale storage capacity and performance to meet changing demands. It can add cluster nodes without disruption, enabling the storage infrastructure to scale in a more cost-effective and efficient manner. To guarantee the durability and resiliency of data, APEX File Storage for Microsoft Azure distributes data across multiple nodes within the cluster. It also uses advanced data protection techniques such as erasure coding, and it provides features such as SyncIQ to ensure that data is available. Even in the event of one or more node failures, the data remains accessible from the remaining cluster nodes.

 

Availability set and proximity placement group: APEX File Storage for Microsoft Azure is designed to run in an availability set, and the availability set is associated with a dedicated proximity placement group. This improves reliability and ensures more consistent, lower latency on the cluster back-end network.

Virtual network: APEX File Storage for Microsoft Azure requires an Azure virtual network to provide network connectivity.

  • OneFS cluster internal subnet: The cluster nodes communicate with each other through the internal subnet. The internal subnet must be isolated from VMs that are not in the cluster, so a dedicated subnet that is not shared with other Azure VMs is required for the cluster nodes' internal network interfaces. (An illustrative subnet layout is sketched after this list.)
  • OneFS cluster external subnet: The cluster nodes communicate with clients through the external subnet by using different protocols, such as NFS, SMB, and S3. 
  • OneFS cluster internal network interfaces: Network interfaces are in the internal subnet.
  • OneFS cluster external network interfaces: Network interfaces are in the external subnet.
  • Network security group: The network security group applies to the cluster network interfaces and allows or denies specific traffic to the OneFS cluster.
  • Azure VMs: These VMs serve as cluster nodes running the OneFS file system, backed by Azure managed disks. Each node within the cluster is strategically placed in an availability set and a proximity placement group. This configuration ensures that all nodes reside in separate fault domains, enhancing reliability, and it brings them physically closer together to enable lower network latency between cluster nodes. See the Azure availability sets overview and Azure proximity placement groups documentation for more details.
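As a purely illustrative sketch of the two-subnet layout described above (Azure CLI, with placeholder resource group, names, and address ranges; actual deployments should follow the APEX File Storage for Microsoft Azure deployment guide):

# az network vnet create --resource-group onefs-rg --name onefs-vnet --address-prefixes 10.0.0.0/16 --subnet-name onefs-internal --subnet-prefixes 10.0.1.0/24
# az network vnet subnet create --resource-group onefs-rg --vnet-name onefs-vnet --name onefs-external --address-prefixes 10.0.2.0/24
# az network nsg create --resource-group onefs-rg --name onefs-external-nsg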

Overall, APEX File Storage for Microsoft Azure offers a powerful and flexible scale-out file storage solution that can help you improve data management, optimize costs, scalability, and security in a cloud-based environment.

Supported cluster configurations

Table 1 shows the supported configurations for APEX File Storage for Microsoft Azure. It provides the flexibility to choose different cluster sizes, Azure VM sizes/SKUs, and Azure disk options to meet your business requirements. For a detailed explanation of these configurations, refer to https://infohub.delltechnologies.com/en-US/t/apex-file-storage-for-microsoft-azure-deployment-guide.

Table 1. Supported configuration for a single cluster

  • Cluster size: 4 to 18 nodes

  • Azure VM size/SKU: All nodes in a cluster must use the same VM size/SKU. The supported VM sizes are:

  • Azure managed disk type: All nodes in a cluster must use the same disk type. The supported disk types are: (Note: Premium SSDs are only supported with the Ddsv5-series and Edsv5-series)

  • Azure managed disk size: All nodes in a cluster must use the same disk size. The supported disk sizes are: 0.5 TiB (P20 or E20), 1 TiB (P30 or E30), 2 TiB (P40, E40, or S40), 4 TiB (P50, E50, or S50), 8 TiB (P60, E60, or S60), and 16 TiB (P70, E70, or S70)

  • Disk count per node: All nodes in a cluster must use the same disk count. The supported disk counts are 5, 6, 10, 12, 15, 18, 20, 24, 25, or 30

  • Cluster raw capacity: Minimum 10 TiB, maximum 5760 TiB

  • Cluster protection level: Default is +2n. Also supports +2d:1n with additional capacity restrictions.

  • Supported regions: APEX File Storage for Microsoft Azure is globally available. For the detailed regions, refer to https://infohub.delltechnologies.com/en-US/t/apex-file-storage-for-microsoft-azure-deployment-guide.
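For sizing purposes, raw cluster capacity is simply node count multiplied by disks per node and disk size. For example, the 10 TiB minimum corresponds to the smallest supported configuration (4 nodes x 5 disks x 0.5 TiB = 10 TiB), while the largest supported configurations reach the 5760 TiB cluster maximum (for example, 18 nodes x 20 disks x 16 TiB = 5760 TiB).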

Performance

Compared to Azure NetApp Files, Dell APEX File Storage for Microsoft Azure enables about 6x greater cluster performance, up to 11x larger namespace, up to 23x more snapshots per volume, 2x higher cluster resiliency, and easier and more robust cluster expansion.

In addition, the following example shows how sequential read and sequential write performance can be scaled out linearly by adding nodes, helping to ensure the cluster meets your business requirements.

This is the test setup:

We used the Azure Standard_D48ds_v5 VM size and scaled from 10 nodes to 14 nodes and, finally, to 18 nodes. With each deployment we kept all other factors the same, maintaining 12 P40 Azure Premium SSD data disks in each node. The following table displays the configuration:

Node type             Node count    Data disk type    Data disk count
Standard_D48ds_v5     10            P40               12
Standard_D48ds_v5     14            P40               12
Standard_D48ds_v5     18            P40               12

The diagram below demonstrates how read performance increases as APEX File Storage for Microsoft Azure is scaled out, with a clear positive trend from 10 nodes to 18 nodes. The same conclusion also applies to write performance.

 

You can also scale up the overall performance of an APEX File Storage for Microsoft Azure cluster by choosing a more powerful Azure VM size/SKU:

In this example, we tested the following Azure VM sizes/SKUs with the same node count (4) and disk count per node (12):

  • D32ds_v5
  • D48ds_v5
  • D64ds_v5
  • E104ids_v5

The results show that both read and write performance increase as the Azure VM size/SKU is scaled up.

For more details on the performance results and best practices, refer to the following white paper: https://infohub.delltechnologies.com/en-US/t/introduction-to-apex-file-storage-for-azure-1/.

Resources

https://infohub.delltechnologies.com/en-US/t/apex-file-storage-for-microsoft-azure-deployment-guide

https://infohub.delltechnologies.com/en-US/t/introduction-to-apex-file-storage-for-azure-1/

https://www.dell.com/en-us/blog/ai-anywhere-with-apex-file-storage-for-microsoft-azure/

Authors:

Vincent Shen, Lieven Lin, and Jason He

 

 

 

Read Full Blog
  • AI
  • deep learning
  • machine learning
  • PowerScale
  • OneFS
  • Unstructured Data

Optimizing AI: Meeting Unstructured Storage Demands Efficiently

Aqib Kazi

Thu, 21 Mar 2024 14:46:23 -0000

|

Read Time: 0 minutes

The surge in artificial intelligence (AI) and machine learning (ML) technologies has sparked a revolution across industries, pushing the boundaries of what's possible. However, this innovation comes with its own set of challenges, particularly when it comes to storage. The heart of AI's potential lies in its ability to process and learn from vast amounts of data, most of which is unstructured. This has placed unprecedented demands on storage solutions, becoming a critical bottleneck for advancing AI technologies.

Navigating the complex landscape of unstructured data storage is no small feat. Traditional storage systems struggle to keep up with the scale and flexibility required by AI workloads. Enterprises find themselves at a crossroads, seeking solutions that can provide scalable, affordable, and fault-tolerant storage. The quest for such a platform is not just about meeting current needs but also paving the way for the future of AI-driven innovation.

The current state of ML and AI

The evolution of ML and AI technologies has reshaped industries far and wide, setting new expectations for data processing and analysis capabilities. These advancements are directly tied to an organization's capacity to handle vast volumes of unstructured data, a domain where traditional storage solutions are being outpaced.

ML and AI applications demand unprecedented levels of data ingestion and computational power, necessitating scalable and flexible storage solutions. Traditional storage systems—while useful for conventional data storage needs—grapple with scalability issues, particularly when faced with the immense file quantities AI and ML workloads generate.

Although traditional object storage methods are capable of managing data as objects within a pool, they fall short when meeting the agility and accessibility requirements essential for AI and ML processes. These storage models struggle with scalability and facilitating the rapid access and processing of data crucial for deep learning and AI algorithms.

The need for a new kind of storage solution is evident, as the current infrastructure is unable to cope with silos of unstructured data. These silos make it challenging to access, process, and unify data sources, which in turn cripples the effectiveness of AI and ML projects. Furthermore, the maximum capacity of traditional storage, topping out at tens of terabytes, is insufficient for AI-driven initiatives, which often require petabytes of data to train sophisticated models.

As ML and AI continue to advance, the quest for a storage solution that can support the growing demands of these technologies remains pivotal. The industry is in dire need of systems that provide ample storage and ensure the flexibility, reliability, and performance efficiency necessary to propel AI and ML into their next phase of innovation.

Understanding unstructured storage demands for AI

The advent of AI and ML has brought unprecedented advancements across industries, enhancing efficiency, accuracy, and the ability to manage and process large datasets. However, the core of these technologies relies on the capability to store, access, and analyze unstructured data efficiently. Understanding the storage demands essential for AI applications is crucial for businesses looking to harness the full power of AI technology.

High throughput and low latency

For AI and ML applications, time is of the essence. The ability to process data at high speeds with high throughput and access it with minimal delay and low latency are non-negotiable requirements. These applications often involve complex computations performed on vast datasets, necessitating quick access to data to maintain a seamless process. For instance, in real-time AI applications such as voice recognition or instant fraud detection, any delay in data processing can critically impact performance and accuracy. Therefore, storage solutions must be designed to accommodate these needs, delivering data as swiftly as possible to the application layer.

Scalability and flexibility

As AI models evolve and the volume of data increases, the need for scalability in storage solutions becomes paramount. The storage architecture must accommodate growth without compromising on performance or efficiency. This is where the flexibility of the storage solutions comes into play. An ideal storage system for AI would scale in capacity and performance, adapting to the changing demands of AI applications over time. Combining the best of on-premises and cloud storage, hybrid storage solutions offer a viable path to achieving this scalability and flexibility. They enable businesses to leverage the high performance of on-premise solutions and the scalability and cost-efficiency of cloud storage, ensuring the storage infrastructure can grow with the AI application needs.

Data durability and availability

Ensuring the durability and availability of data is critical for AI systems. Data is the backbone of any AI application, and its loss or unavailability can lead to significant setbacks in development and performance. Storage solutions must, therefore, provide robust data protection mechanisms and redundancies to safeguard against data loss. Additionally, high availability is essential to ensure that data is always accessible when needed, particularly for AI applications that require continuous operation. Implementing a storage system with built-in redundancy, failover capabilities, and disaster recovery plans is essential to maintain continuous data availability and integrity.

In the context of AI where data is continually ingested, processed, and analyzed, the demands on storage solutions are unique and challenging. Key considerations include maintaining high throughput and low latency for real-time processing, establishing scalability and flexibility to adapt to growing data volumes, and ensuring data durability and availability to support continuous operation. Addressing these demands is critical for businesses aiming to leverage AI technologies effectively, paving the way for innovation and success in the digital era.

What needs to be stored for AI?

The evolution of AI and its underlying models depends significantly on various types of data and artifacts generated and used throughout its lifecycle. Understanding what needs to be stored is crucial for ensuring the efficiency and effectiveness of AI applications.

Raw data

Raw data forms the foundation of AI training. It's the unmodified, unprocessed information gathered from diverse sources. For AI models, this data can be in the form of text, images, audio, video, or sensor readings. Storing vast amounts of raw data is essential as it provides the primary material for model training and the initial step toward generating actionable insights.

Preprocessed data

Once raw data is collected, it undergoes preprocessing to transform it into a more suitable format for training AI models. This process includes cleaning, normalization, and transformation. As a refined version of raw data, preprocessed data needs to be stored efficiently to streamline further processing steps, saving time and computational resources.

Training datasets

Training datasets are a selection of preprocessed data used to teach AI models how to make predictions or perform tasks. These datasets must be diverse and comprehensive, representing real-world scenarios accurately. Storing these datasets allows AI models to learn and adapt to the complexities of the tasks they are designed to perform.

Validation and test datasets

Validation and test datasets are critical for evaluating an AI model's performance. These datasets are separate from the training data and are used to tune the model's parameters and test its generalizability to new, unseen data. Proper storage of these datasets ensures that models are both accurate and reliable.

Model parameters and weights

An AI model learns to make decisions through its parameters and weights. These elements are fine-tuned during training and crucial for the model's decision-making processes. Storing these parameters and weights allows models to be reused, updated, or refined without retraining from scratch.

Model architecture

The architecture of an AI model defines its structure, including the arrangement of layers and the connections between them. Storing the model architecture is essential for understanding how the model processes data and for replicating or scaling the model in future projects.

Hyperparameters

Hyperparameters are the configuration settings used to optimize model performance. Unlike parameters, hyperparameters are not learned from the data but set prior to the training process. Storing hyperparameter values is necessary for model replication and comparison of model performance across different configurations.

Feature engineering artifacts

Feature engineering involves creating new input features from the existing data to improve model performance. The artifacts from this process, including the newly created features and the logic used to generate them, need to be stored. This ensures consistency and reproducibility in model training and deployment.

Results and metrics

The results and metrics obtained from model training, validation, and testing provide insights into model performance and effectiveness. Storing these results allows for continuous monitoring, comparison, and improvement of AI models over time.

Inference data

Inference data refers to new, unseen data that the model processes to make predictions or decisions after training. Storing inference data is key for analyzing the model's real-world application and performance and making necessary adjustments based on feedback.

Embeddings

Embeddings are dense representations of high-dimensional data in lower-dimensional spaces. They play a crucial role in processing textual data, images, and more. Storing embeddings allows for more efficient computation and retrieval of similar items, enhancing model performance in recommendation systems and natural language processing tasks.

Code and scripts

The code and scripts used to create, train, and deploy AI models are essential for understanding and replicating the entire AI process. Storing this information ensures that models can be retrained, refined, or debugged as necessary.

Documentation and metadata

Documentation and metadata provide context, guidelines, and specifics about the AI model, including its purpose, design decisions, and operating conditions. Proper storage of this information supports ethical AI practices, model interpretability, and compliance with regulatory standards.

Challenges of unstructured data in AI

In the realm of AI, handling unstructured data presents a unique set of challenges that must be navigated carefully to harness its full potential. As AI systems strive to mimic human understanding, they face the intricate task of processing and deriving meaningful insights from data that lacks a predefined format. This section delves into the core challenges associated with unstructured data in AI, primarily focusing on data variety, volume, and velocity.

Data variety

Data variety refers to the myriad types of unstructured data that AI systems are expected to process, ranging from texts and emails to images, videos, and audio files. Each data type possesses its unique characteristics and demands specific preprocessing techniques to be effectively analyzed by AI models.

  • Richer Insights but Complicated Processing: While the diverse data types can provide richer insights and enhance model accuracy, they significantly complicate the data preprocessing phase. AI tools must be equipped with sophisticated algorithms to identify, interpret, and normalize various data formats.
  • Innovative AI Applications: The advantage of mastering data variety lies in the development of innovative AI applications. By handling unstructured data from different domains, AI can contribute to advancements in natural language processing, computer vision, and beyond.

Data volume

The sheer volume of unstructured data generated daily is staggering. As digital interactions increase, so does the amount of data that AI systems need to analyze.

  • Scalability Challenges: The exponential growth in data volume poses scalability challenges for AI systems. Storage solutions must not only accommodate current data needs but also be flexible enough to scale with future demands.
  • Efficient Data Processing: AI must leverage parallel processing and cloud storage options to keep up with the volume. Systems designed for high-throughput data analysis enable quicker insights, which are essential for timely decision-making and maintaining relevance in a rapidly evolving digital landscape.

Data velocity

Data velocity refers to the speed at which new data is generated and the pace at which it needs to be processed to remain actionable. In the age of real-time analytics and instant customer feedback, high data velocity is both an opportunity and a challenge for AI.

  • Real-Time Processing Needs: AI systems are increasingly required to process information in real-time or near-real-time to provide timely insights. This necessitates robust computational infrastructure and efficient data streaming technologies.
  • Constant Adaptation: The dynamic nature of unstructured data, coupled with its high velocity, demands that AI systems constantly adapt and learn from new information. Maintaining accuracy and relevance in fast-moving data environments is critical for effective AI performance.

In addressing these challenges, AI and ML technologies are continually evolving, developing more sophisticated systems capable of handling the complexity of unstructured data. The key to unlocking the value hidden within this data lies in innovative approaches to data management where flexibility, scalability, and speed are paramount.

Strategies to manage unstructured data in AI

The explosion of unstructured data poses unique challenges for AI applications. Organizations must adopt effective data management strategies to harness the full potential of AI technologies. In this section, we delve into key strategies like data classification and tagging and the use of PowerScale clusters to efficiently manage unstructured data in AI.

Data classification and tagging

Data classification and tagging are foundational steps in organizing unstructured data and making it more accessible for AI applications. This process involves identifying the content and context of data and assigning relevant tags or labels, which is crucial for enhancing data discoverability and usability in AI systems.

  • Automated tagging tools can significantly reduce the manual effort required to label data, employing AI algorithms to understand the content and context automatically.
  • Custom metadata tags allow for the creation of a rich set of file classification information. This not only aids in the classification phase but also simplifies later iterations and workflow automation.
  • Effective data classification enhances data security by accurately categorizing sensitive or regulated information, enabling compliance with data protection regulations.

Implementing these strategies for managing unstructured data prepares organizations for the challenges of today's data landscape and positions them to capitalize on the opportunities presented by AI technologies. By prioritizing data classification and leveraging solutions like PowerScale clusters, businesses can build a strong foundation for AI-driven innovation.


Best practices for implementing AI storage solutions

Implementing the right AI storage solutions is crucial for businesses seeking to harness the power of artificial intelligence. With the explosive growth of unstructured data, adhering to best practices that optimize performance, scalability, and cost is imperative. This section delves into key practices to ensure your AI storage infrastructure meets the demands of modern AI workloads.

Assess workload requirements

Before diving into storage solutions, one must thoroughly assess AI workload requirements. Understanding the specific needs of your AI applications—such as the volume of data, the necessity for high throughput/low latency, and the scalability and availability requirements—is fundamental. This step ensures you select the most suitable storage solution that meets your application's needs.

AI workloads are diverse, with each having unique demands on storage infrastructure. For instance, training a machine learning model could require rapid access to vast amounts of data, whereas inference workloads may prioritize low latency. An accurate assessment leads to an optimized infrastructure, ensuring that storage solutions are neither overprovisioned nor underperforming, thereby supporting AI applications efficiently and cost-effectively.

Leverage PowerScale

For managing large volumes and varieties of unstructured data, leveraging PowerScale nodes offers a scalable and efficient solution. PowerScale nodes are designed to handle the complexities of AI and machine learning workloads, offering optimized performance, scalability, and data mobility. These clusters allow organizations to store and process vast amounts of data efficiently for a range of AI use cases due to the following:

  • Scalability is a key feature, with PowerScale clusters capable of growing with the organization's data needs. They support massive capacities, allowing businesses to store petabytes of data seamlessly.
  • Performance is optimized for the demanding workloads of AI applications with the ability to process large volumes of data at high speeds, reducing the time for data analyses and model training.
  • Data mobility within PowerScale clusters on-premise and in the cloud ensures that data can be accessed when and where needed, supporting various AI and machine learning use cases across different environments.

PowerScale clusters allow businesses to start small and grow capacity as needed, ensuring that storage infrastructure can scale alongside AI initiatives without compromising on performance. The ability to handle multiple data types and protocols within a single storage infrastructure simplifies management and reduces operational costs, making PowerScale nodes an ideal choice for dynamic AI environments.

Utilize PowerScale OneFS 9.7.0.0

PowerScale OneFS 9.7.0.0 is the latest version of the Dell PowerScale operating system for scale-out network-attached storage (NAS). OneFS 9.7.0.0 introduces several enhancements in data security, performance, cloud integration, and usability.

OneFS 9.7.0.0 extends and simplifies the PowerScale offering in the public cloud, providing more features across various instance types and regions. Some of the key features in OneFS 9.7.0.0 include:

  • Cloud Innovations: Extends cloud capabilities and features, building upon the debut of APEX File Storage for AWS
  • Performance Enhancements: Enhancements to overall system performance
  • Security Enhancements: Enhancements to data security features
  • Usability Improvements: Enhancements to make managing and using PowerScale easier

Employ PowerScale F210 and F710

PowerScale, through its continuous innovation, extends into the AI era by introducing the next generation of PowerEdge-based nodes: the PowerScale F210 and F710. These new all-flash nodes are built on the Dell PowerEdge R660 platform, unlocking enhanced performance capabilities.

On the software front, both the F210 and F710 nodes benefit from significant performance improvements in PowerScale OneFS 9.7. These nodes effectively address the most demanding workloads by combining hardware and software innovations. The PowerScale F210 and F710 nodes represent a powerful combination of hardware and software advancements, making them well-suited for a wide range of workloads. For more information on the F210 and F710, see PowerScale All-Flash F210 and F710 | Dell Technologies Info Hub.

Ensure data security and compliance

Given the sensitivity of the data used in AI applications, robust security measures are paramount. Businesses must implement comprehensive security strategies that include encryption, access controls, and adherence to data protection regulations. Safeguarding data protects sensitive information and reinforces customer trust and corporate reputation.

Compliance with data protection laws and regulations is critical to AI storage solutions. As regulations can vary significantly across regions and industries, understanding and adhering to these requirements is essential to avoid significant fines and legal challenges. By prioritizing data security and compliance, organizations can mitigate risks associated with data breaches and non-compliance.

Monitor and optimize

Continuous storage environment monitoring and optimization are essential for maintaining high performance and efficiency. Monitoring tools can provide insights into usage patterns, performance bottlenecks, and potential security threats, enabling proactive management of the storage infrastructure.

Regular optimization efforts can help fine-tune storage performance, ensuring that the infrastructure remains aligned with the evolving needs of AI applications. Optimization might involve adjusting storage policies, reallocating resources, or upgrading hardware to improve efficiency, reduce costs, and ensure that storage solutions continue to effectively meet the demands of AI workloads.

By following these best practices, businesses can build and maintain a storage infrastructure that supports their current AI applications and is poised for future growth and innovation.

Conclusion

Navigating the complexities of unstructured storage demands for AI is no small feat. Yet, by adhering to the outlined best practices, businesses stand to benefit greatly. The foundational steps include assessing workload requirements, selecting the right storage solutions, and implementing robust security measures. Furthermore, integrating PowerScale nodes and a commitment to continuous monitoring and optimization are key to sustaining high performance and efficiency. As the landscape of AI continues to evolve, these practices will not only support current applications but also pave the way for future growth and innovation. In the dynamic world of AI, staying ahead means being prepared, and these strategies offer a roadmap to success.

Frequently asked questions

How big are AI data centers?

Data centers catering to AI, such as those by Amazon and Google, are immense, comparable to the scale of football stadiums.

How does AI process unstructured data?

AI processes unstructured data including images, documents, audio, video, and text by extracting and organizing information. This transformation turns unstructured data into actionable insights, propelling business process automation and supporting AI applications.

How much storage does an AI need?

AI applications, especially those involving extensive data sets, might require significant memory, potentially as much as 1TB or more. Such vast system memory efficiently facilitates the processing and statistical analysis of entire data sets.

Can AI handle unstructured data?

Yes, AI is capable of managing both structured and unstructured data types from a variety of sources. This flexibility allows AI to analyze and draw insights from an expansive range of data, further enhancing its utility across diverse applications.

 

Author: Aqib Kazi, Senior Principal Engineer, Technical Marketing

Read Full Blog
  • AI
  • PowerScale
  • Storage
  • Security
  • safety and security
  • Video

The Influence of Artificial Intelligence on Video, Safety, and Security

Mordekhay Shushan and Brian St.Onge

Fri, 23 Feb 2024 22:45:15 -0000

|

Read Time: 0 minutes

SIA recently unveiled its 2024 Security Megatrend report in which AI prominently claims the top position, dominating all four top spots. With AI making waves across global industries, there arises a set of concerns that demand thoughtful consideration. The key megatrends highlighted are as follows:

  • AI: Security of AI
  • AI: Visual Intelligence (Distinct from Video Surveillance)
  • AI: Generative AI
  • AI: Regulations of AI

This discussion will specifically delve into the first two trends—AI Security and Visual Intelligence.

Security of AI

The top spot on the list is occupied by the security of AI. Ironically, the most effective security for AI is AI itself. AI is tasked with monitoring behaviors related to data creation and access, identifying anomalies indicative of potential malicious activities. As businesses increasingly adopt AI, the value of data rises significantly for the organization. However, with AI becoming a more integral operational component, a cyber incident could disrupt not only data but also overall operations and production, particularly when there's a lack of metadata for decision-making.

Ensuring robust cyber protection for data becomes crucial, and solutions like the Ransomware Defender in Dell Technologies' unstructured data offering play a key role. Cyber recovery strategies are also imperative to swiftly resume normal operations. An air-gapped cyber recovery vault is essential, minimizing disruptions and securing a clean and complete dataset for rapid recovery from incidents.

This is an illustration of how an air-gapped cyber recovery vault works. An operational airgap separates the cyber recovery vault and ensures a clean and complete dataset is available for rapid recovery from incidents.

Figure 1. Air-gapped cyber recovery vault

AI visual intelligence

AI Visual Intelligence has been increasingly used across various industries for a multitude of purposes, including object recognition and classification, anomaly detection, predictive analytics, customer insights and experience enhancement, autonomous systems, healthcare diagnostics, environmental monitoring, and surveillance and security. By integrating AI Visual Intelligence into their operations, businesses can harness the power of visual data to improve decision-making, automate processes, enhance efficiencies, and unlock new opportunities for innovation and growth.

Video extends beyond security to impact business operations, enhancing efficiencies as the metadata collected from cameras serves business use cases beyond security functions. Examples of this metadata include image metadata, timestamps, object metadata, geolocation, and more. Collecting this metadata necessitates a robust storage solution that preserves complete datasets, readily available for models to achieve the desired outcomes. This data is considered a mission-critical workload, demanding optimal uptime from storage solutions.

Adopting an N+X node-based storage architecture on-premises guarantees that data is consistently written and available, providing 99.9999% (six nines) availability in an on-prem cloud environment. Dell Unstructured Data Solutions align perfectly with this workload, ensuring uninterrupted business operations, in contrast to server-based storage solutions that face challenges during deployment or encounter issues with public cloud connectivity. The potentially cost-prohibitive nature of public cloud storage for the data required in regular AI modeling may lead to a continued trend of cloud repatriation to on-premises infrastructure.

Security practitioners evaluating the need for cameras must now strategically map out potential stakeholders within organizations to determine camera requirements aligned with their business outcomes. This strategic approach is anticipated to drive a higher demand for cameras and associated services.

Resources

Check out Dell PowerScale for more information about Dell PowerScale solutions.

 

Authors: Mordi Shushan, Brian Stonge  


Read Full Blog
  • PowerScale
  • OneFS
  • F210
  • F710

Introducing the Next Generation of PowerScale – the AI Ready Data Platform

Aqib Kazi

Tue, 20 Feb 2024 19:07:47 -0000

|

Read Time: 0 minutes

Generative AI systems thrive on vast amounts of unstructured data, which are essential for training algorithms to recognize patterns, make predictions, and generate new content. Unstructured data – such as text, images, and audio – does not follow a predefined model, making it more complex and varied than structured data.

Preprocessing unstructured data

Unstructured data, which includes text, images, audio, video, and documents, does not have a predefined format or schema. Preprocessing unstructured data involves cleaning, normalizing, and transforming the data into a structured or semi-structured form that the AI can understand and that can be used for analysis or machine learning.

Preprocessing unstructured data for generative AI is a crucial step that involves preparing the raw data for use in training AI models. The goal is to enhance the quality and structure of the data to improve the performance of generative models.

There are different steps and techniques for preprocessing unstructured data, depending on the type and purpose of the data. Some common steps are:

  • Data completion: This step involves filling in missing or incomplete data, either by using average or estimated values or by discarding or ignoring the data points with missing fields.
  • Data noise reduction: This step involves removing or reducing irrelevant, redundant, or erroneous data, such as duplicates, spelling errors, hidden objects, or background noise.
  • Data transformation: This step involves converting the data into a standard or consistent format, including scaling and normalizing numerical data, encoding categorical data, or extracting features from text, image, audio, or video data.
  • Data reduction: This step involves reducing the dimensionality or size of the data, either by selecting a subset of relevant features or data points or by applying techniques such as principal component analysis, clustering, or sampling.
  • Data validation: This step involves checking the quality and accuracy of the preprocessed data by using statistical methods, visualization tools, or domain knowledge.

These steps can help enhance the quality, reliability, and interpretability of the data, which can improve the performance and outcomes of the analysis or machine learning models.
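As a minimal, hedged sketch (the function names and sample records are purely illustrative and not part of any Dell tooling), a few of these steps might look like the following in Python for a small text corpus:

import re
from collections import Counter

def clean_corpus(records):
    """Data completion, noise reduction, and transformation for raw text records."""
    cleaned, seen = [], set()
    for rec in records:
        text = (rec or "").strip()                 # data completion: treat missing values as empty
        if not text:
            continue                               # or substitute an estimated value instead
        text = re.sub(r"\s+", " ", text).lower()   # transformation: normalize whitespace and case
        if text in seen:
            continue                               # noise reduction: drop exact duplicates
        seen.add(text)
        cleaned.append(text)
    return cleaned

def bag_of_words(texts):
    """Transformation: extract simple term-frequency features from cleaned text."""
    return [Counter(t.split()) for t in texts]

raw = ["The quick  brown fox", None, "the quick brown fox", "jumps over the lazy dog"]
docs = clean_corpus(raw)                           # data validation could then check counts and lengths
print(bag_of_words(docs))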

PowerScale F210 and F710 platform

PowerScale’s continuous innovation extends into the AI era with the introduction of the next generation of PowerEdge-based nodes, including the PowerScale F210 and F710. The new PowerScale all-flash nodes leverage Dell PowerEdge R660, unlocking the next generation of performance. On the software front, the F210 and F710 take advantage of significant performance improvements in PowerScale OneFS 9.7. Combining the hardware and software innovations, the F210 and F710 tackle the most demanding workloads with ease.

The F210 and F710 offer greater density in a 1U platform, with the F710 supporting 10 NVMe SSDs per node and the F210 offering a 15.36 TB drive option. The Sapphire Rapids CPU provides 19% lower cycles-per-instruction, and PCIe Gen 5 doubles throughput compared to PCIe Gen 4. Additionally, the nodes take advantage of DDR5, offering greater speed and bandwidth.

From a software perspective, PowerScale OneFS 9.7 introduces a significant leap in performance. OneFS 9.7 updates the protocol stack, locking, and direct-write. To learn more about OneFS 9.7, check out this article on PowerScale OneFS 9.7.

The OneFS journal in the all-flash F210 and F710 nodes uses a 32 GB configuration of the Dell Software Defined Persistent Memory (SDPM) technology. Previous platforms used NVDIMM-n for persistent memory, which consumed a DIMM slot.

For more details about the F210 and F710, see our other blog post at Dell.com: https://www.dell.com/en-us/blog/next-gen-workloads-require-next-gen-storage/.

Performance

The introduction of the PowerScale F210 and F710 nodes capitalizes on significant leaps in hardware and software from the previous generations. OneFS 9.7 introduces tremendous performance-oriented updates, including the protocol stack, locking, and direct-write. The PowerEdge-based servers offer a substantial hardware leap from previous generations. The hardware and software advancements combine to offer enormous performance gains, particularly for streaming reads and writes.

PowerScale F210

The PowerScale F210 is a 1U chassis based on the PowerEdge R660. A minimum of three nodes is required to form a cluster, with a maximum of 252 nodes. The F210 is node pool compatible with the F200.

An image of the PowerScale F210 front bezel

Table 1. F210 specifications

Attribute                   PowerScale F210 Specification
Chassis                     1U Dell PowerEdge R660
CPU                         Single Socket – Intel Sapphire Rapids 4410Y (2G/12C)
Memory                      Dual Rank DDR5 RDIMMs 128 GB (8 x 16 GB)
Journal                     1 x 32 GB SDPM
Front-end networking        2 x 100 GbE or 25 GbE
Infrastructure networking   2 x 100 GbE or 25 GbE
NVMe SSD drives             4

PowerScale F710

The PowerScale F710 is a 1U chassis based on the PowerEdge R660. A minimum of three nodes is required to form a cluster, with a maximum of 252 nodes.

An image of the PowerScale F710 front bezel

Table 2. F710 specifications

Attribute                   PowerScale F710 Specification
Chassis                     1U Dell PowerEdge R660
CPU                         Dual Socket – Intel Sapphire Rapids 6442Y (2.6G/24C)
Memory                      Dual Rank DDR5 RDIMMs 512 GB (16 x 32 GB)
Journal                     1 x 32 GB SDPM
Front-end networking        2 x 100 GbE or 25 GbE
Infrastructure networking   2 x 100 GbE
NVMe SSD drives             10

For more details on the new PowerScale all-flash platforms, see the PowerScale All-Flash F210 and F710 white paper.


Author: Aqib Kazi

Read Full Blog
  • Isilon
  • PowerScale
  • OneFS
  • ACL
  • Permission

OneFS Access Control Lists Overview

Lieven Lin

Thu, 18 Jan 2024 22:29:13 -0000

|

Read Time: 0 minutes

As we know, when users access OneFS cluster data via different protocols, the final permission enforcement happens on the OneFS file system. In OneFS, this is achieved by the Access Control Lists (ACLs) implementation, which provides granular permission control on directories and files. In this article, we will look at the basics of OneFS ACLs.

OneFS ACL

OneFS provides a single namespace for multiprotocol access and has its own internal ACL representation to perform access control. The internal ACL is presented as protocol-specific views of permissions, so that NFS exports display POSIX mode bits for NFSv3 clients, while NFSv4 and SMB clients see an ACL.

When connecting to a PowerScale cluster with SSH, you can manage not only POSIX mode bits but also ACLs, using standard UNIX tools such as the chmod command. In addition, you can edit ACL policies through the web administration interface to configure OneFS permissions management for networks that mix Windows and UNIX systems.

The OneFS ACL design is derived from Windows NTFS ACL. As such, many of its concept definitions and operations are similar to the Windows NTFS ACL, such as ACE permissions and inheritance.

OneFS synthetic ACL and real ACL

To deliver cross-protocol file access seamlessly, OneFS stores an internal representation of a file-system object’s permissions. The internal representation can contain information from the POSIX mode bits or the ACL. 

OneFS has two types of ACLs to fulfill different scenarios:

  • OneFS synthetic ACL: Under the default ACL policy, if no inheritable ACL entries exist on a parent directory – such as when a file or directory is created through an NFS or SSH session on OneFS within that parent directory – the object will contain only POSIX mode bits permissions. OneFS uses the internal representation to generate a OneFS synthetic ACL, which is an in-memory structure that approximates the POSIX mode bits of a file or directory for an SMB or NFSv4 client. 
  • OneFS real ACL: Under the default ACL policy, when a file or directory is created through SMB, or when the synthetic ACL of a file or directory is modified through an NFSv4 or SMB client, the OneFS real ACL is initialized and stored on disk. The OneFS real ACL can also be initialized using the OneFS enhanced chmod command tool with the +a, -a, or =a option to modify the ACL, as shown in the example after this list. 
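For illustration, here is a hedged example of viewing and modifying an ACL from an SSH session; the directory path and user name are placeholders, and the exact syntax should be confirmed against the OneFS CLI reference for your release:

# View the ACL of a directory (synthetic or real)
ls -led /ifs/data/projects
# Add an allow ACE for user "jane", which initializes a real ACL on disk
chmod +a user jane allow dir_gen_read /ifs/data/projects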

OneFS access control entries

In contrast to the Windows DACL and NFSv4 ACL, the OneFS ACL access control entry (ACE) adds an additional identity type. OneFS ACEs contain the following information:

  • Identity name: The name of a user or group
  • ACE type: The type of the ACE (allow or deny)
  • ACE permissions and inheritance flags: A list of permissions and inheritance flags separated with commas

OneFS ACE permissions

Similar to the Windows permission level, OneFS divides permissions into the following three types:

  • Standard ACE permissions: These apply to any object in the file system
  • Generic ACE permissions: These map to a bundle of specific permissions
  • Constant ACE permissions: These are specific permissions for file-system objects

The standard ACE permissions that can appear for a file-system object are shown in the following table:

ACE permission     Applies to          Description
std_delete         Directory or file   The right to delete the object
std_read_dac       Directory or file   The right to read the security descriptor, not including the SACL
std_write_dac      Directory or file   The right to modify the DACL in the object's security descriptor
std_write_owner    Directory or file   The right to change the owner in the object's security descriptor
std_synchronize    Directory or file   The right to use the object as a thread synchronization primitive
std_required       Directory or file   Maps to std_delete, std_read_dac, std_write_dac, and std_write_owner

The generic ACE permissions that can appear for a file system object are shown in the following table:

ACE permission     Applies to          Description
generic_all        Directory or file   Read, write, and execute access. Maps to file_gen_all or dir_gen_all.
generic_read       Directory or file   Read access. Maps to file_gen_read or dir_gen_read.
generic_write      Directory or file   Write access. Maps to file_gen_write or dir_gen_write.
generic_exec       Directory or file   Execute access. Maps to file_gen_execute or dir_gen_execute.
dir_gen_all        Directory           Maps to dir_gen_read, dir_gen_write, dir_gen_execute, delete_child, and std_write_owner.
dir_gen_read       Directory           Maps to list, dir_read_attr, dir_read_ext_attr, std_read_dac, and std_synchronize.
dir_gen_write      Directory           Maps to add_file, add_subdir, dir_write_attr, dir_write_ext_attr, std_read_dac, and std_synchronize.
dir_gen_execute    Directory           Maps to traverse, std_read_dac, and std_synchronize.
file_gen_all       File                Maps to file_gen_read, file_gen_write, file_gen_execute, delete_child, and std_write_owner.
file_gen_read      File                Maps to file_read, file_read_attr, file_read_ext_attr, std_read_dac, and std_synchronize.
file_gen_write     File                Maps to file_write, file_write_attr, file_write_ext_attr, append, std_read_dac, and std_synchronize.
file_gen_execute   File                Maps to execute, std_read_dac, and std_synchronize.

The constant ACE permissions that can appear for a file-system object are shown in the following table:

ACE permission        Applies to          Description
modify                File                Maps to file_write, append, file_write_ext_attr, file_write_attr, delete_child, std_delete, std_write_dac, and std_write_owner
file_read             File                The right to read file data
file_write            File                The right to write file data
append                File                The right to append to a file
execute               File                The right to execute a file
file_read_attr        File                The right to read file attributes
file_write_attr       File                The right to write file attributes
file_read_ext_attr    File                The right to read extended file attributes
file_write_ext_attr   File                The right to write extended file attributes
delete_child          Directory or file   The right to delete children, including read-only files within a directory; this is currently not used for a file, but can still be set for Windows compatibility
list                  Directory           List entries
add_file              Directory           The right to create a file in the directory
add_subdir            Directory           The right to create a subdirectory
traverse              Directory           The right to traverse the directory
dir_read_attr         Directory           The right to read directory attributes
dir_write_attr        Directory           The right to write directory attributes
dir_read_ext_attr     Directory           The right to read extended directory attributes
dir_write_ext_attr    Directory           The right to write extended directory attributes

OneFS ACL inheritance

Inheritance allows permissions to be layered or overridden as needed in an object hierarchy and allows for simplified permissions management. The semantics of OneFS ACL inheritance are the same as Windows ACL inheritance and will feel familiar to someone versed in Windows NTFS ACL inheritance. The following table shows the ACE inheritance flags defined in OneFS:

ACE inheritance flag    Set on directory or file   Description
object_inherit          Directory only             Indicates an ACE applies to the current directory and files within the directory
container_inherit       Directory only             Indicates an ACE applies to the current directory and subdirectories within the directory
inherit_only            Directory only             Indicates an ACE applies to subdirectories only, files only, or both within the directory
no_prop_inherit         Directory only             Indicates an ACE applies to the current directory or only the first-level contents of the directory, not the second-level or subsequent contents
inherited_ace           File or directory          Indicates an ACE is inherited from the parent directory
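As a hedged illustration of combining permissions and inheritance flags in a single ACE (the user name and path are placeholders; confirm the syntax in the OneFS CLI reference):

chmod +a user jane allow dir_gen_read,object_inherit,container_inherit /ifs/data/projects
# New files and subdirectories created under /ifs/data/projects will inherit this ACE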

 

Author: Lieven Lin

Read Full Blog
  • PowerScale
  • OneFS
  • CloudPools

CloudPools Operation Workflows

Jason He

Fri, 12 Jan 2024 21:01:01 -0000

|

Read Time: 0 minutes

The Dell PowerScale CloudPools feature of OneFS allows cold or infrequently accessed data to be tiered to lower-cost cloud storage. CloudPools extends the PowerScale namespace to the private or public cloud. For CloudPools supported cloud providers, see the CloudPools Supported Cloud Providers blog.

This blog focuses on the following CloudPools operation workflows:

  • Archive
  • Recall
  • Read
  • Update

Archive

The archive operation is the CloudPools process of moving file data from the local PowerScale cluster to cloud storage. Files are archived either using the SmartPools Job or from the command line. The CloudPools archive process can be paused or resumed.
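For reference, a hedged example of archiving from the CLI follows; the file path and policy name are placeholders, and the available options should be confirmed in the OneFS CLI reference for your release:

isi cloud archive /ifs/data/projects/cold-report.docx --policy=cloud-archive-policy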

The following figure shows the workflow of the CloudPools archive.

 This figure illustrates the workflow of the CloudPools archive: 1. A file matches a file pool policy. 2. The file data is split into chunks Cloud Data Object (CDO). 3. The chunks are sent from the PowerScale cluster to cloud. 4. The file is truncated into a SmartLink file and a Cloud Metadata Object (CMO) is written to cloud.

Figure 1.  Archive workflow

More workflow details include:

  • The file pool policy in Step 1 specifies a cloud target and cloud-specific parameters. Policy examples include:
  • Encryption: CloudPools provides an option to encrypt data before the data is sent to the cloud storage. It uses the PowerScale key management module for data encryption and uses AES-256 as the encryption algorithm. The benefit of encryption is that only encrypted data is being sent over the network.
  • Compression: CloudPools provides an option to compress data before the data is sent to the cloud storage. It implements block-level compression using the zlib compression library. CloudPools does not compress data that is already compressed.
  • Local data cache: Caching is used to support local reading and writing of SmartLink files. To optimize performance, it reduces bandwidth costs by eliminating repeated fetching of file data for repeated reads and writes. The data cache is used for temporarily caching file data from the cloud storage on PowerScale disk storage for files that have been moved off cluster by CloudPools.
  • Data retention: Data retention is a concept used to determine how long to keep cloud objects on the cloud storage.
  • When chunks are sent from the PowerScale cluster to cloud in Step 3, a checksum is applied for each chunk to ensure data integrity.

Recall

The recall operation is the CloudPools process of reversing the archive process. It replaces the SmartLink file by restoring the original file data on the PowerScale cluster and removing the cloud objects in cloud. The recall process can only be performed using the command line. The CloudPools recall process can be paused or resumed.
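For reference, a hedged example of recalling from the CLI follows; the file path is a placeholder, and the available options (including a recursive option for directory trees) should be confirmed in the OneFS CLI reference:

isi cloud recall /ifs/data/projects/cold-report.docx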

The following figure shows the workflow of CloudPools recall. 

This figure illustrates the workflow of the CloudPools recall: 1. OneFS retrieves the CDOs from cloud to the PowerScale cluster. 2. The SmartLink file is replaced by restoring the original file data. 3. The cloud objects are removed in cloud asynchronously if the data retention period has expired.

Figure 2.  Recall workflow

Read

The read operation is the CloudPools process of client data access, known as inline access. When a client opens a file for read, the blocks are added to the cache in the associated SmartLink file by default. The cache can be disabled by setting the accessibility in the file pool policy for CloudPools. The accessibility setting is used to specify how data is cached in SmartLink files when a user or application accesses a SmartLink file on the PowerScale cluster. Values are cached (default) and no cache.

The following figure shows the workflow of CloudPools read by default. 

This figure illustrates the workflow of the CloudPools read: 1. Client accesses the file through the SmartLink file. 2. OneFS retrieves CDOs from cloud to the local cache on the PowerScale cluster. 3. FIle data is sent to the client from the local cache on the PowerScale cluster. 4. OneFS purges expired cache information for the SmartLink file.

Figure 3.  Read workflow

Starting with OneFS 9.1.0.0, a cloud object cache was introduced to enhance how CloudPools communicates with the cloud. In Step 1, OneFS looks in the object cache first and retrieves the data from there if it is already cached. The cloud object cache reduces the number of requests to the cloud when reading a file.

Prior to OneFS 9.1.0.0, in Step 1, OneFS looks for data in the local data cache first. It moves to Step 3 if the data is already in the local data cache.

Note: Cloud object cache is per node. Each node maintains its own object cache on the cluster. 

Update

The update operation is the CloudPools process that occurs when clients update data. When clients make changes to a SmartLink file, CloudPools first writes the changes to the local data cache and then periodically sends the updated file data to the cloud. The space used by the cache is temporary and configurable.

The following figure shows the workflow of CloudPools update. 

This figure illustrates the workflow of the CloudPools update: 1. The client accesses the file through the SmartLink file. 2. OneFS retrieves CDOs from cloud, putting the file data in the local cache. 3. The client updates the file and those changes are stored in the local cache. 4. OneFS sends the updated file data from the local cache to cloud. 5. OneFS purges expired cache information for the SmartLink file.

Figure 4.  Update workflow

Thank you for taking the time to read this blog, and congratulations on gaining a clear understanding of how the OneFS CloudPools operation works!

Author: Jason He, Principal Engineering Technologist

Read Full Blog
  • PowerScale
  • OneFS
  • CloudPools

CloudPools Reporting

Jason He

Fri, 12 Jan 2024 20:33:21 -0000

|

Read Time: 0 minutes

This blog focuses on CloudPools reporting, specifically:

  • CloudPools network stats
  • The isi_fsa_pools_usage feature

CloudPools network stats

Dell PowerScale CloudPools network stats collect every network transaction and provide network activity statistics from CloudPools connections to the cloud storage.

Displaying network activity statistics

The network activity statistics include bytes In, bytes Out, and the number of GET, PUT, and DELETE operations. CloudPools network stats are available in two categories:

  • Per CloudPools account
  • Per file pool policy

Note: CloudPools network stats do not provide file statistics, such as the file list being archived or recalled.

Run the following command to check the CloudPools network stats by CloudPools account:

isi_test_cpool_stats -Q --accounts <account_name>

For example, the following command shows the current CloudPools network stats by CloudPools account:

isi_test_cpool_stats -Q --accounts testaccount
Account Name   Bytes In    Bytes Out   Num Reads   Num Writes   Num Deletes
testaccount    4194896000  4194905034  4000        2001         8001  

Similarly, you can run the following command to check the CloudPools network stats by file pool policy:

isi_test_cpool_stats -Q --policies <policy_name>

And here is an example of current CloudPools network stats by file pool policy:

isi_test_cpool_stats -Q --policies testpolicy
Policy Name    Bytes In       Bytes Out      Num Reads      Num Writes
testpolicy     4154896000     4154905034     4000           2001

Note: The command output does not include the number of deletes by file pool policy.

Run the following command to check the history for CloudPools network stats:

isi_test_cpool_stats -q -s <number of seconds in the past to start stat query>

Use the -s parameter to define the number of seconds in the past. For example, set it to 86,400 to query CloudPools network stats over the last day, as in the following example:

isi_test_cpool_stats -q -s 86400
Account          bytes-in     bytes-out    gets   puts   deletes
testaccount    | 4194896000 | 4194905034 | 4000 | 2001 | 8001

You can also run the following command to flush stats from memory to the database and get real-time CloudPools network stats:

isi_test_cpool_stats -f

Displaying stats for CloudPools activities

The cloud statistics namespace for CloudPools was added in OneFS 9.4.0.0. This feature leverages existing OneFS daemons and systems to track statistics about CloudPools activities. The statistics include bytes In, bytes Out, and the number of Reads, Writes, and Deletions. CloudPools statistics are available in two categories:

  • Per CloudPools account
  • Per file pool policy

Note: The cloud statistics namespace with CloudPools does not provide file statistics, such as the file list being archived or recalled.

You can run these isi statistics cloud commands to view statistics about CloudPools activities:

isi statistics cloud --account <account_name>
isi statistics cloud --policy <policy_name>

The following command shows an example of current CloudPools statistics by CloudPools account:

isi statistics cloud --account s3                    
Account Policy In      Out     Reads   Writes  Deletions       Cloud      Node
s3             218.5KB 218.7KB 1       2       0               AWS        3
s3             0.0B    0.0B    0       0       0               AWS        1
s3             0.0B    0.0B    0       0       0               AWS        2

The following command shows an example of current CloudPools statistics by file pool policy:

isi statistics cloud --policy s3policy        
Account Policy         In      Out     Reads   Writes  Deletions  Cloud       Node
s3      s3policy       218.5KB 218.7KB  1      2       0          AWS         3
s3      s3policy       0.0B    0.0B     0      0       0          AWS         1
s3      s3policy       0.0B    0.0B     0      0       0          AWS         2

The isi_fsa_pools_usage feature

Starting with OneFS 8.2.2, you can run the following command to list the Logical Size and Physical Size of stubs in a directory. This feature leverages the IndexUpdate and FSA (File System Analytics) jobs. Enabling this feature requires:

  • Scheduling the IndexUpdate job. It is recommended to run it every four hours.
  • Scheduling the FSA job. It is recommended to run it every day, but not more often than the IndexUpdate job.
isi_fsa_pools_usage /ifs
Node Pool                  Dirs  Files  Streams  Logical Size   Physical Size
Cloud                      0     1       0       338.91k           24.00k
h500_30tb_3.2tb-ssd_128gb  42    300671  0       879.23G            1.20T

Now you know how to use the CloudPools reporting commands. It’s simple and straightforward. Thanks for reading!

Author: Jason He, Principal Engineering Technologist

Read Full Blog
  • PowerScale
  • OneFS
  • CloudPools

Protecting CloudPools SmartLink Files

Jason He

Fri, 12 Jan 2024 17:20:14 -0000

|

Read Time: 0 minutes

Dell PowerScale CloudPools SmartLink files are the sole means to access file data stored in the cloud, so ensure that you protect them from accidental deletion.

Note: SmartLink files cannot be backed up using a copy command, such as secure copy (scp).

This blog focuses on backing up SmartLink files using OneFS SyncIQ and NDMP (Network Data Management Protocol).

When the CloudPools version differs between the source and target PowerScale clusters, CloudPools handles the cross-version compatibility.

NDMP and SyncIQ provide two types of copy or backup:

  • Shallow copy (SC)/backup: Replicates or backs up SmartLink files to the target PowerScale cluster or tape as SmartLink files without file data.
  • Deep copy (DC)/backup: Replicates or backs up SmartLink files to the target PowerScale cluster or tape as regular files or unarchived files. The backup or replication will be slower than for a shallow copy backup. Disk space will be consumed on the target cluster for replicating data.

The following table shows the CloudPools and OneFS mapping information. CloudPools 2.0 was released with OneFS 8.2.0, while CloudPools 1.0 runs on OneFS 8.0.x and 8.1.x.

Table 1.  CloudPools and OneFS mapping information

OneFS version             CloudPools version
OneFS 8.0.x/OneFS 8.1.x   CloudPools 1.0
OneFS 8.2.0 or higher     CloudPools 2.0

The following table shows the NDMP and SyncIQ supported use cases when different versions of CloudPools are running on the source and target clusters. As noted in the following table, if CloudPools 2.0 is running on the source PowerScale cluster and CloudPools 1.0 is running on the target PowerScale cluster, shallow copies are not allowed.

Table 2.  NDMP and SyncIQ supported use cases with CloudPools  

Source           Target           SC NDMP         DC NDMP     SC SyncIQ replication   DC SyncIQ replication
CloudPools 1.0   CloudPools 2.0   Supported       Supported   Supported               Supported
CloudPools 2.0   CloudPools 1.0   Not Supported   Supported   Not Supported           Supported

SyncIQ

SyncIQ is CloudPools-aware, but consider the guidance on snapshot efficiency, especially where snapshot retention periods on the target cluster will be long.

SyncIQ policies support two types of data replication for CloudPools:

  • Shallow copy: This option is used to replicate files as SmartLink files without file data from the source PowerScale cluster to the target PowerScale cluster.
  • Deep copy: This option is used to replicate files as regular files or unarchived files from the source PowerScale cluster to the target PowerScale cluster.

SyncIQ, SmartPools, and CloudPools licenses are required on both the source and target PowerScale clusters. It is highly recommended to set up a scheduled SyncIQ backup of the SmartLink files.
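As a hedged sketch, a SyncIQ policy for a directory containing SmartLink files might be created as follows; the policy name, paths, and target host are placeholders, and the shallow or deep copy behavior is set per policy as described in the OneFS documentation:

isi sync policies create smartlink-dr sync /ifs/data/cloudpools target-cluster.example.com /ifs/data/cloudpools-dr

A schedule can then be attached to the policy so that the SmartLink files are backed up regularly.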

When SyncIQ replicates SmartLink files, it also replicates the local cache state and unsynchronized cache data from the source PowerScale cluster to the target PowerScale cluster. The following figure shows the SyncIQ replication when replicating directories including SmartLink files and unarchived normal files. Both unidirectional and bi-directional replication are supported.

Note: OneFS manages cloud access at the cluster level and does not support managing cloud access at the directory level. When failing over a SyncIQ directory containing SmartLink files to a target cluster, you need to remove cloud access on the source cluster and add cloud access on the target cluster. If there are multiple CloudPools storage accounts, removing/adding cloud access will impact all CloudPools storage accounts on the source/target cluster.

Protecting CloudPools SmartLink files using SyncIQ replication. This figure illustrates the SyncIQ replication when replicating directories including SmartLink files and unarchived normal files from source Site 1 to the target Site 2. The figure also shows the supported SyncIQ unidirectional (from Site 1 to Site 2 only) and bi-directional replication (from Site 1 to Site 2 and from Site 2 to Site 1).

Figure 1.  SyncIQ replication

Note: If encryption is enabled in a file pool policy for CloudPools, SyncIQ also replicates all the relevant encryption keys to the secondary PowerScale cluster along with the SmartLink files.

NDMP

NDMP is also CloudPools-aware and supports three backup and restore methods for CloudPools:

  • DeepCopy: This option is used to back up files as regular files or unarchived files. Files can only be restored as regular files.
  • ShallowCopy: This option is used to back up files as SmartLink files without file data. Files can only be restored as SmartLink files.
  • ComboCopy: This option is used to back up files as SmartLink files with file data. Files can be restored as regular files or SmartLink files.

It is possible to update the file data and send the updated data to the cloud storage. Multiple versions of SmartLink files can be backed up to tape using NDMP, and multiple versions of CDOs (Cloud Data Objects) are protected in the cloud under the data retention setting. You can restore a specific version of a SmartLink file from tape to a PowerScale cluster and continue to access (read or update) the file as before.

Note: If encryption is enabled in the file pool policy for CloudPools, NDMP also backs up all the relevant encryption keys to tapes along with the SmartLink files.

Thank you for taking the time to read this blog, and congratulations on knowing the solutions for protecting SmartLink files using OneFS SyncIQ and NDMP.

Author: Jason He, Principal Engineering Technologist

Read Full Blog
  • PowerScale
  • AWS
  • APEX

How to Size Disk Capacity When Cluster Has Data Reduction Enabled

Yunlong Zhang

Mon, 08 Jan 2024 18:22:11 -0000

|

Read Time: 0 minutes

When sizing a storage solution for OneFS, two major aspects need to be considered – capacity and performance. In this blog, we will talk about how to calculate the raw capacity in each node in the AWS cloud environment.

Consider a customer who wants to have 30TB of data capacity on APEX File Storage on AWS. The data reduction ratio is 1.6, and the cluster contains 6 nodes. How much capacity is needed for each node of the cluster?

1. The usable capacity is calculated by dividing the application data size by the data reduction ratio: 30TB/1.6 = 18.75TB

2. OneFS in the AWS environment uses +2n as the default protection level. The +2n protection level striping pattern of 6 nodes is 4+2. The raw capacity necessary can be calculated by dividing the usable capacity by the striping pattern for the number of nodes involved: 18.75TB/66% = 28.41TB

3. Single disk capacity is then calculated by dividing the total raw capacity by the number of nodes involved:  28.41TB/6 nodes = 4.735TB

4. When each node contains 10 disks, each disk’s raw capacity should be 474GB.

OK, let's take a look at the formula of this calculation:

single disk capacity = (((application data size/data reduction ratio)/striping efficiency)/cluster node count)/node disk count

For reference, the striping patterns of 4, 5, and 6 nodes are listed as follows:

  • 4 nodes: 2+2 (50%)
  • 5 nodes: 3+2 (60%)
  • 6 nodes: 4+2 (66%)

Now, knowing the logical data capacity, you can calculate the appropriate capacity for each EBS volume in the cluster.
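As a minimal sketch of the formula above (the function name is illustrative), using the values from the worked example:

import math

def single_disk_capacity_tb(app_data_tb, reduction_ratio, striping_efficiency,
                            node_count, disks_per_node):
    usable_tb = app_data_tb / reduction_ratio        # usable capacity after data reduction
    raw_tb = usable_tb / striping_efficiency         # raw capacity needed for +2n protection
    return raw_tb / node_count / disks_per_node      # raw capacity per disk

# 30 TB of application data, 1.6 reduction ratio, 4+2 striping (66%), 6 nodes, 10 disks per node
print(math.ceil(single_disk_capacity_tb(30, 1.6, 0.66, 6, 10) * 1000))   # ~474 GB per disk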


Author: Yunlong Zhang

Read Full Blog
  • OneFS
  • S3
  • Performance

Running COSBench Performance Test on PowerScale

Yunlong Zhang

Tue, 09 Jan 2024 14:21:02 -0000

|

Read Time: 0 minutes

Starting with OneFS version 9.0, PowerScale enables data access through the Amazon Simple Storage Service (Amazon S3) application programming interface (API) natively. PowerScale implements the S3 API as a first-class protocol along with other NAS protocols on top of its distributed OneFS file system.

COSBench is a popular benchmarking tool used to measure the performance of cloud object storage services, and it supports the S3 protocol. In this blog, we will walk through how to set up COSBench to test the S3 performance of a PowerScale cluster.

Step 1: Choose the v0.4.2.c4 version

I suggest choosing the v0.4.2 release candidate 4 instead of the latest v0.4.2 release, especially if you receive an error message like the following and your COSBench service cannot be started:

# cat driver-boot.log     
Listening on port 0.0.0.0/0.0.0.0:18089 ...
!SESSION 2020-06-03 10:12:59.683 -----------------------------------------------
eclipse.buildId=unknown
java.version=1.7.0_261
java.vendor=Oracle Corporation
BootLoader constants: OS=linux, ARCH=x86_64, WS=gtk, NL=en_US
Command-line arguments:  -console 18089
!ENTRY org.eclipse.osgi 4 0 2020-06-03 10:13:00.367
!MESSAGE Bundle plugins/cosbench-castor not found.
!ENTRY org.eclipse.osgi 4 0 2020-06-03 10:13:00.368
!MESSAGE Bundle plugins/cosbench-log4j not found.
!ENTRY org.eclipse.osgi 4 0 2020-06-03 10:13:00.368
!MESSAGE Bundle plugins/cosbench-log@6:start not found.
!ENTRY org.eclipse.osgi 4 0 2020-06-03 10:13:00.369
!MESSAGE Bundle plugins/cosbench-config@6:start not found.

Step 2: Install Java

Both Java 1.7 and 1.8 work well with COSBench.

Step 3: Configure Ncat

Ncat is necessary for COSBench to work. Without it, you will receive the following error message:

[root]hopisdtmelabs14# bash ./start-driver.sh  
Launching osgi framwork ...
Successfully launched osgi framework!
Booting cosbench driver ...
which: no nc in (/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/tme/bin:/usr/local/tme/tme_portal/perf_web/bin)
No appropriate tool found to detect cosbench driver status.

Use the following commands to install Ncat (the example here uses CentOS 7) and configure it for COSBench:

yum -y install wget
wget https://nmap.org/dist/ncat-7.80-1.x86_64.rpm
yum localinstall ncat-7.80-1.x86_64.rpm
cd /usr/bin
ln -s ncat nc

Step 4: Unzip the COSBench files

After you download the 0.4.2.c4.zip, you can unzip it to a directory:

unzip 0.4.2.c4.zip

Grant all the bash script permission to be executed:

chmod +x /tmp/cosbench/0.4.2.c4/*.sh

Step 5: Start drivers and controller

On the drivers and the controller, find cosbench-start.sh. Locate the Java launch line, then add the following two options:

-Dcom.amazonaws.services.s3.disableGetObjectMD5Validation=true
-Dcom.amazonaws.services.s3.disablePutObjectMD5Validation=true

The COSBench tool has two roles: controller and driver. You can use the following command to start the driver:

bash ./cosbench/start-driver.sh

Before we start the controller, we need to change the configuration to let the controller know how many drivers it has and their addresses. This is done by filling in the controller's main configuration file, which is under ./conf and named controller.conf. The following is an example of controller.conf:

[controller]
drivers = 4
log_level = INFO
log_file = log/system.log
archive_dir = archive
 
[driver1]
name = driver1
url = http://10.245.109.115:18088/driver
 
[driver2]
name = driver2
url = http://10.245.109.116:18088/driver
 
[driver3]
name = driver3
url = http://10.245.109.117:18088/driver
 
[driver4]
name = driver4
url = http://10.245.109.118:18088/driver

Run the start-controller.sh to start the controller role:

bash ./start-controller.sh

Step 6: Prepare PowerScale

First, you need to prepare your PowerScale cluster for the S3 test. Make sure to record the secret key of the newly created user, s3. Run the following commands to prepare PowerScale for the S3 performance test:

isi services s3 enable
isi s3 settings global modify --https-only=false
isi auth users create s3 --enabled=true
isi s3 keys create s3
mkdir -p -m 777 /ifs/s3/bkt1  
chmod 777 /ifs/s3
isi s3 buckets create --owner=s3 --name=bkt1  --path=/ifs/s3/bkt1

Compose the workload XML file, and use it to specify the details of the test you want to run. Here is an example:

<?xml version="1.0" encoding="UTF-8"?>
<workload name="S3-F600-Test1" description="Isilon F600 with original configuration">
        <storage type="s3" config="accesskey=1_s3_accid;secretkey=wEUqWNWkQGmgMos70NInqW26WpGf;endpoint=http://f600-2:9020/bkt1;path_style_access=true"/>
        <workflow>
               <workstage name="init-for-write-1k">
                       <work type="init" workers="1" config="cprefix=write-bucket-1k; containers=r(1,6)"/>
               </workstage>
               <workstage name="init-for-read-1k">
                       <work type="init" workers="1" config="cprefix=read-bucket-1k; containers=r(1,6)"/>
               </workstage>
               <workstage name="prepare-1k">
                       <work type="prepare" workers="1" config="cprefix=read-bucket-1k;containers=r(1,6);oprefix=1kb_;objects=r(1,1000);sizes=c(1)KB"/>
               </workstage>
               <workstage name="write-1kb">
                       <work name="main" type="normal" interval="5" division="container" chunked="false" rampup="0" rampdown="0" workers="6" totalOps="6000">
                               <operation type="write" config="cprefix=write-bucket-1k; containers=r(1,6); oprefix=1kb_; objects=r(1,1000); sizes=c(1)KB"/>
                       </work>
               </workstage>
               <workstage name="read-1kb">
                       <work name="main" type="normal" interval="5" division="container" chunked="false" rampup="0" rampdown="0" workers="6" totalOps="6000">
                               <operation type="read" config="cprefix=read-bucket-1k; containers=r(1,6); oprefix=1kb_; objects=r(1,1000)"/>
                       </work>
               </workstage>
        </workflow>
</workload>

Step 7: Run the test

You can directly submit the XML in the COSBench WebUI, or you can use the following command line in the controller console to start the test:

bash ./cli.sh submit ./conf/my-s3-test.xml

You will see the test successfully finished, as shown in the following figure.

This figure shows the logging of the stages in the workload. The State column, which is second to last, shows each stage in the workload as complete.

Figure 1. Completion screen after testing

Have fun testing!

 

Author: Yunlong Zhang


Read Full Blog
  • AWS
  • OneFS
  • APEX
  • Performance

Will More Disks Lead to Better Performance in APEX File Storage in AWS?

Yunlong Zhang

Mon, 08 Jan 2024 18:02:59 -0000

|

Read Time: 0 minutes

Dell Technologies has developed a range of PowerScale platforms, including all flash models, hybrid models, and archive models, all of which exhibit exceptional design. The synergy between the disk system and the compute system is highly effective, showcasing a well-matched integration.

In the cloud environment, customers have the flexibility to control the number of CPU cores and memory sizes by selecting different instance types. APEX File Storage for AWS uses EBS volumes as its node disks. Customers can also select a different number of EBS volumes in each node, and for gp3 volumes, customers are able to customize the performance of each volume by specifying the throughput or IOPS capability.

With this level of flexibility, how shall we configure the disk system to make the most out of the entire OneFS system? Typically, in an on-prem appliance, the more disks a PowerScale node contains, the better performance the disk system can provide thanks to a greater number of devices contributing to the delivery of throughput or IOPS.

In a OneFS cloud environment, does it hold true that more EBS volumes mean better performance? In short, it depends. When the aggregated EBS volume performance is smaller than the instance EBS bandwidth limit, test results show that more EBS volumes can improve performance. When the aggregated EBS volume performance is larger than the instance EBS bandwidth limit, adding more EBS volumes will not improve performance.

What is the best practice for setting the number of EBS volumes per node?

1. Make the aggregated EBS volume bandwidth limit match the instance type EBS bandwidth limit. 

For example, say we want to use m5dn.16xlarge as the instance type of our OneFS cloud system. According to AWS, the EBS bandwidth of m5dn.16xlarge is 13,600 Mbps, which is 1,700 MB/sec. If we choose to use 10 EBS volumes in each node, then we should configure each gp3 EBS volume to deliver 170 MB/sec of throughput. This makes the aggregated EBS volume throughput equal to the m5dn.16xlarge EBS bandwidth limit.

Note that each gp3 EBS volume includes 125 MB/sec of throughput and 3,000 IOPS at no additional cost. As a cost-saving measure, we can configure each node with 12 EBS volumes to better leverage the free EBS volume throughput.

For example, considering an m5dn.16xlarge instance type with 12 TB of raw capacity per node, the disk costs for 10 volumes and for 12 volumes are as follows:

      1. For 10 drives, each EBS volume should support 170 MB/sec throughput, and each node EBS storage cost is 1001.2 USD a month.
      2. For 12 drives, each EBS volume should support 142 MB/sec throughput, and each node EBS storage cost is 991.20 USD a month.

Using 12 EBS volumes can save $10 per node per month.
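As a minimal sketch of the throughput math above (the 13,600 Mbps figure is the instance EBS bandwidth quoted earlier; the function name is illustrative):

def per_volume_throughput_mb_s(instance_ebs_mbps, volumes_per_node):
    instance_mb_s = instance_ebs_mbps / 8        # convert megabits/sec to megabytes/sec
    return instance_mb_s / volumes_per_node      # split evenly across the node's EBS volumes

print(per_volume_throughput_mb_s(13600, 10))     # 170.0 MB/sec per volume with 10 volumes
print(per_volume_throughput_mb_s(13600, 12))     # ~141.7 MB/sec per volume with 12 volumes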

2. Do not set up more than 12 EBS volumes in each node.

Although APEX File Storage for AWS also supports 15, 18, and 20 gp3 volumes in each node, we do not recommend configuring more than 12 EBS volumes per node for OneFS 9.7. This best practice keeps the software journal space on each disk from becoming too small, which benefits write performance.

 

Author: Yunlong Zhang


Read Full Blog

Simplifying OneFS Deployment on AWS with Terraform

Lieven Lin

Wed, 20 Dec 2023 20:07:34 -0000

|

Read Time: 0 minutes

In the first release of APEX File Storage for AWS in May 2023, users gained the capability to run file workloads in the AWS cloud, harnessing the power of the PowerScale OneFS scale-out NAS storage solution. However, the initial implementation required manually provisioning all of the AWS resources needed for the OneFS cluster, which was a less than optimal experience for embarking on the APEX File Storage journey in AWS.

With the subsequent release of APEX File Storage for AWS in December 2023, we are pleased to introduce a new, user-friendly open-source Terraform module. This module is designed to enhance and simplify the deployment process, alleviating the need for manual resource provisioning. In this blog post, we will delve into the details of leveraging this Terraform module, providing you with a comprehensive guide to expedite your APEX File Storage deployment on AWS.

Overview of Terraform onefs module

The Terraform onefs module is an open-source module for automatically deploying the AWS resources for a OneFS cluster. It is released and licensed under the MPL-2.0 license. You can find more details on the onefs module in the Terraform Registry. The onefs module provides the following features to help you deploy APEX File Storage for AWS OneFS clusters in AWS:

  • Provision necessary AWS resources for a single OneFS cluster, including EC2 instances, EBS volumes, placement group, and network interfaces.
  • Expand cluster size by provisioning additional AWS resources, including EC2 instances, EBS volumes, and network interfaces.

Getting Started

To use the Terraform onefs module, you need a machine that has Terraform installed and can connect to your AWS account. After you have fulfilled the prerequisites in the documentation, you can start to deploy AWS resources for a OneFS cluster.

This blog provides instructions for deploying the required AWS infrastructure resources for APEX File Storage for AWS with Terraform. This includes EC2 instances, a spread strategy placement group, network interfaces, and EBS volumes.

1. Get the latest version of the onefs module from the Terraform Registry.

 

2. Prepare a main.tf file that uses the onefs module version collected in Step 1. The onefs module requires a set of input variables. The following is an example file named main.tf for creating a 4-node OneFS cluster.

module "onefs" {

   source  = "dell/onefs/aws"

   version = "1.0.0"

 

   region = "us-east-1"

   availability_zone = "us-east-1a"

   iam_instance_profile = "onefs-runtime-instance-profile"

   name = "vonefs-cfv"

   id = "vonefs-cfv"

   nodes = 4

   instance_type = "m5dn.12xlarge"

   data_disk_type = "gp3"

   data_disk_size = 1024

   data_disks_per_node = 6

   internal_subnet_id = "subnet-0c0106598b95ee7b6"

   external_subnet_id = "subnet-0837801239d54e245"

   contiguous_ips= true

   first_external_node_hostnum = 5

   internal_sg_id = "sg-0ee87249a52397219"

   security_group_external_id = "sg-0635f298c9cb764da"

   image_id = "ami-0f1a267119a34361c"

   credentials_hashed = true

   hashed_root_passphrase = "$5$9874f5d2c724b8ca$IFZZ5e9yfUVqNKVL82s.iFLIktr4WLavFhUVa8A"

   hashed_admin_passphrase = "$5$9874f5d2c724b8ca$IFZZ5e9yfUVqNKVL82s.iFLIktr4WLavFhUVa8A"

   dns_servers = ["169.254.169.253"]

   timezone = "Greenwich Mean Time"

}

 

output "onefs-outputs" {

   value = module.onefs

   sensitive = true

}

3. Change your current working directory to the main.tf directory.

4. Initialize the module’s root directory by installing the required providers and modules for the deployment. In the following example, the onefs module is downloaded automatically from the Terraform Registry.

# terraform init

Initializing the backend...

Initializing modules...
Downloading registry.terraform.io/dell/onefs/aws 1.0.0 for onefs...
- onefs in .terraform\modules\onefs
- onefs.onefsbase in .terraform\modules\onefs\modules\base
- onefs.onefsbase.machineid in .terraform\modules\onefs\modules\machineid

Initializing provider plugins...
- Finding latest version of hashicorp/aws...
- Installing hashicorp/aws v5.30.0...
- Installed hashicorp/aws v5.30.0 (signed by HashiCorp)

5. Verify the configuration files in the onefs directory.

# terraform validate

6. Apply the configurations by running the following command.

# terraform apply

7. Enter “yes” after you have previewed and confirmed the changes.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

8. Wait for the AWS resources to be provisioned. The output displays all the cluster information. If the deployment fails, re-run the terraform apply command to deploy.

Apply complete! Resources: 13 added, 0 changed, 0 destroyed.

Outputs:

onefs-outputs = <sensitive>

9. Get the cluster details information by running the following command.

# terraform output --json

The following example output is truncated.

additional_nodes = 3
cluster_id = "vonefs-cfv"
control_ip_address = "10.0.32.5"
external_ip_addresses = [
  "10.0.32.5",
  "10.0.32.6",
  "10.0.32.7",
  "10.0.32.8",
]
gateway_hostnum = 1
instance_id = [
  "i-0eead1ee1dd67da6e",
  "i-054efe96f6e605009",
  "i-06e0b1ce06bad42a1",
  "i-0e463c742974641d7",
]
internal_ip_addresses = [
  "10.0.16.5",
  "10.0.16.6",
  "10.0.16.7",
  "10.0.16.8",
]
internal_network_high_ip = "10.0.16.8"
internal_network_low_ip = "10.0.16.5"
mgmt_ip_addresses = []
node_configs = {
  "0" = {
    "external_interface_id" = "eni-09ddea1fd79f0d0ab"
    "external_ips" = [
      "10.0.32.5",
    ]
    "internal_interface_id" = "eni-0caeee71581a8c429"
    "internal_ips" = [
      "10.0.16.5",
    ]
    "mgmt_interface_id" = null
    "mgmt_ips" = null /* tuple */
    "serial_number" = "SV200-930073-0000"
  }
  "1" = {
    "external_interface_id" = "eni-00869c96a27c20c93"
    "external_ips" = [
      "10.0.32.6",
    ]
    "internal_interface_id" = "eni-0471bbba5a7f6596d"
    "internal_ips" = [
      "10.0.16.6",
    ]
    "mgmt_interface_id" = null
    "mgmt_ips" = null /* tuple */
    "serial_number" = "SV200-930073-0001"
  }
  "2" = {
    "external_interface_id" = "eni-0dac5052668bd3a4f"
    "external_ips" = [
      "10.0.32.7",
    ]
    "internal_interface_id" = "eni-09d35ffa61b3dcd60"
    "internal_ips" = [
      "10.0.16.7",
    ]
    "mgmt_interface_id" = null
    "mgmt_ips" = null /* tuple */
    "serial_number" = "SV200-930073-0002"
  }
  "3" = {
    "external_interface_id" = "eni-028d211ef2d5b577c"
    "external_ips" = [
      "10.0.32.8",
    ]
    "internal_interface_id" = "eni-02a99febea713f2d1"
    "internal_ips" = [
      "10.0.16.8",
    ]
    "mgmt_interface_id" = null
    "mgmt_ips" = null /* tuple */
    "serial_number" = "SV200-930073-0003"
  }
}
region = "us-east-1"

10. Write down the following output variables for setting up a cluster, as described in the documentation.

  • control_ip_address: The external IP address of the cluster’s first node
  • external_ip_addresses: The external IP addresses of all provisioned cluster nodes
  • internal_ip_addresses: The internal IP addresses of all provisioned cluster nodes
  • internal_network_high_ip: The highest internal IP address assigned
  • internal_network_low_ip: The lowest internal IP address assigned
  • instance_id: The EC2 instance IDs of the cluster nodes
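
If you want to capture these values in shell variables, the following is a minimal sketch using jq. It assumes the values appear as top-level outputs in the JSON, as in the truncated example above; if your main.tf exposes them only through the single onefs-outputs object, adjust the jq paths accordingly (for example, .["onefs-outputs"].value.control_ip_address).

# Capture selected Terraform outputs into shell variables (requires jq)
CONTROL_IP=$(terraform output -json | jq -r '.control_ip_address.value')
INTERNAL_LOW=$(terraform output -json | jq -r '.internal_network_low_ip.value')
INTERNAL_HIGH=$(terraform output -json | jq -r '.internal_network_high_ip.value')
echo "Control IP: ${CONTROL_IP}"
echo "Internal IP range: ${INTERNAL_LOW} - ${INTERNAL_HIGH}"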

11. All AWS resources are now provisioned. After the cluster’s first node starts, it forms a single-node cluster. You can then use the first node to add the remaining nodes to the cluster, as described in the deployment documentation. The following shows the AWS EC2 instances provisioned with the Terraform onefs module.

Available input variables

The Terraform onefs module provides a set of input variables that let you specify your own settings for the AWS resources and the OneFS cluster, such as AWS network resources, the cluster name, and passwords. The following list describes the variables used in the main.tf file.

 

  • region (string) – (Required) The AWS region of the OneFS cluster nodes.
  • availability_zone (string) – (Required) The AWS availability zone of the OneFS cluster nodes.
  • iam_instance_profile (string) – (Required) The AWS instance profile name of the OneFS cluster nodes. For more details, see the AWS documentation Instance profiles.
  • name (string) – (Required) The OneFS cluster name. Cluster names must begin with a letter and can contain only numbers, letters, and hyphens. If the cluster is joined to an Active Directory domain, the cluster name must be 11 characters or fewer.
  • id (string) – (Required) The ID of the OneFS cluster. The onefs module uses the ID to add tags to the AWS resources. It is recommended to set the ID to your cluster name.
  • nodes (number) – (Required) The number of OneFS cluster nodes: 4, 5, or 6.
  • instance_type (string) – (Required) The EC2 instance type of the OneFS cluster nodes. All nodes in a cluster must have the same instance size. The supported instance sizes are:
    • EC2 m5dn instances: m5dn.8xlarge, m5dn.12xlarge, m5dn.16xlarge, m5dn.24xlarge
    • EC2 m6idn instances: m6idn.8xlarge, m6idn.12xlarge, m6idn.16xlarge, m6idn.24xlarge
    • EC2 m5d instances: m5d.24xlarge
    • EC2 i3en instances: i3en.12xlarge
    Note: You must run a PoC if you intend to use the m5d.24xlarge or i3en.12xlarge EC2 instance types. For details, contact your Dell account team.
  • data_disk_type (string) – (Required) The EBS volume type for the cluster: gp3 or st1.
  • data_disk_size (number) – (Required) The size of a single EBS volume in GiB. Per the supported cluster configurations, it should be 1024 to 16384 for gp3, or 4096 or 10240 for st1.
  • data_disks_per_node (number) – (Required) The number of EBS volumes per node. Per the supported cluster configurations, it should be 5, 6, 10, 12, 15, 18, or 20 for gp3, or 5 or 6 for st1.
  • internal_subnet_id (string) – (Required) The AWS subnet ID for the cluster internal network interfaces.
  • external_subnet_id (string) – (Required) The AWS subnet ID for the cluster external network interfaces.
  • contiguous_ips (bool) – (Required) A boolean flag that indicates whether to allocate contiguous IPv4 addresses to the external network interfaces of the cluster nodes. It is recommended to set this to true.
  • first_external_node_hostnum (number) – (Required if contiguous_ips=true) The host number of the first node’s external IP address in the given AWS subnet. The default is 5: the first four IP addresses in an AWS subnet are reserved by AWS, so the onefs module allocates the fifth IP address to the cluster’s first node. If that IP is already in use, the module fails. Therefore, when setting contiguous_ips=true, ensure that you choose a host number with enough contiguous free IPs for your cluster. Refer to the Terraform cidrhost function for more details about host numbers, and see the terraform console sketch after this list.
  • internal_sg_id (string) – (Required) The AWS security group ID for the cluster internal network interfaces.
  • security_group_external_id (string) – (Required) The AWS security group ID for the cluster external network interfaces.
  • image_id (string) – (Required) The OneFS AMI ID, described in Find the OneFS AMI ID.
  • credentials_hashed (bool) – (Required) A boolean flag that indicates whether the credentials are hashed or in plain text.
  • hashed_root_passphrase (string) – (Required if credentials_hashed=true) The hashed root password for the OneFS cluster.
  • hashed_admin_passphrase (string) – (Required if credentials_hashed=true) The hashed admin password for the OneFS cluster.
  • root_password (string) – (Required if credentials_hashed=false) The root password for the OneFS cluster.
  • admin_password (string) – (Required if credentials_hashed=false) The admin password for the OneFS cluster.
  • dns_servers (list(string)) – (Optional) The cluster DNS servers. The default is ["169.254.169.253"], which is the AWS Route 53 Resolver. For details, see Amazon DNS server.
  • dns_domains (list(string)) – (Optional) The cluster DNS domains. The default is ["<region>.compute.internal"].
  • timezone (string) – (Optional) The cluster time zone. The default is "Greenwich Mean Time". Available options include Greenwich Mean Time, Eastern Time Zone, Central Time Zone, Mountain Time Zone, and Pacific Time Zone. You can change the time zone after the cluster is deployed by following the steps in the OneFS documentation – Set the cluster date and time.
  • resource_tags (map(string)) – (Optional) The tags that are attached to the provisioned AWS resources. For example, resource_tags={"project": "onefs-poc", "tester": "bob"}.
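
To see how a host number maps to an address in your external subnet, you can evaluate Terraform’s cidrhost function in terraform console. The following is a minimal sketch; the 10.0.32.0/24 CIDR is only an illustration, so substitute your own external subnet:

$ terraform console
> cidrhost("10.0.32.0/24", 5)
"10.0.32.5"
> cidrhost("10.0.32.0/24", 8)
"10.0.32.8"

With contiguous_ips=true and four nodes, host numbers 5 through 8 map to 10.0.32.5 through 10.0.32.8, which matches the external IP addresses in the example output earlier in this article.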

 

Learn More

In this article, we have shown how to use the Terraform onefs module. For more details about APEX File Storage for AWS, refer to the APEX File Storage for AWS documentation.

 

Author: Lieven Lin

 

  • backup
  • PowerScale
  • NDMP

OneFS NDMP Backup Overview

Jason He Jason He

Fri, 15 Dec 2023 15:00:00 -0000

|

Read Time: 0 minutes

NDMP (Network Data Management Protocol) specifies a common architecture and data format for backups and restores of NAS (network-attached storage), allowing heterogeneous network file servers to communicate directly with tape devices for backup and restore operations. NDMP addresses the problems caused by integrating different backup software or DMAs (Data Management Applications), file servers, and tape devices.

The NDMP architecture is a client/server model with the following characteristics:

    • The NDMP host is a file server that is being protected with an NDMP backup solution.
    • The NDMP server is a virtual state machine on the NDMP host that is controlled using NDMP.
    • The backup software is considered a client to the NDMP server.

OneFS supports the following two types of NDMP backups:

    • NDMP two-way backup
    • NDMP three-way backup

In both backup models, OneFS takes a snapshot of the backup directory to ensure consistency of data. The backup operates on the snapshot instead of the source directory, which allows users to continue read/write activities as normal. OneFS makes entries in the file history that are transferred from the PowerScale cluster to the backup server during the backup.

NDMP two-way backup

The NDMP two-way backup is also known as the local or direct NDMP backup. It is considered the most efficient model and usually provides the best performance, because the backup data moves directly from the PowerScale cluster to the tape devices without traversing the network to the backup server.

In this model, OneFS must detect the tape devices before you can back up data to them. The PowerScale cluster provides the option for NDMP two-way backups as shown in the following figure. You can connect the PowerScale cluster to a Backup Accelerator node and connect tape devices to that node. The Backup Accelerator node is synonymous with a Fibre Attached Storage node; it adds no primary storage and offloads NDMP workloads from the primary storage nodes. You can connect tape devices directly to the Fibre Channel ports on the PowerScale cluster or the Backup Accelerator node, or you can connect Fibre Channel switches between those Fibre Channel ports and the tape devices.

 Figure 1. NDMP two-way backup with B100 backup accelerator connected to the PowerScale cluster

PowerScale supports the B100 backup accelerator as the NDMP two-way backup option for Generation 5 PowerScale nodes with an InfiniBand back end, and for Generation 6+ PowerScale nodes with either an InfiniBand or an Ethernet back end.


Note: The B100 backup accelerator requires OneFS 9.3.0.0 or later.


 NDMP three-way backup

The NDMP three-way backup, also known as the remote NDMP backup, is shown in the following figure.

Figure 2. NDMP three-way backup

In this backup mode, the tape devices are connected to the backup media server. OneFS does not detect tape devices on the PowerScale cluster, and Fibre Channel ports are not required on the PowerScale cluster. The NDMP service runs on the NDMP server (the PowerScale cluster), and the NDMP tape service runs on the backup media server. A DMA on the backup server instructs the PowerScale cluster to start backing up data to the backup media server over the network, and the backup media server then moves the backup data to the tape devices. The backup server and the backup media server are connected over the network, and sometimes they reside on the same physical machine.

Some DMAs can write NDMP data to non-NDMP devices. For example, Dell NetWorker software writes NDMP data to non-NDMP devices, including tape, virtual tape, Advanced File Type Devices (AFTD), and Dell PowerProtect DD series appliances. For more information on data protection with Dell NetWorker using NDMP, refer to this guide: Dell PowerScale: Data Protection with Dell NetWorker using NDMP.

 

Author: Jason He, Principal Engineering Technologist


  • Isilon
  • PowerScale
  • AWS
  • OneFS
  • APEX

Unveiling APEX File Storage for AWS Enhancements

Lieven Lin Lieven Lin

Wed, 13 Dec 2023 15:36:10 -0000

|

Read Time: 0 minutes

We are thrilled to announce the latest version of APEX File Storage for AWS! This release brings a multitude of enhancements to elevate your AWS file storage experience, including expanded AWS regions with the support for additional EC2 instance types, a Terraform module for streamlined deployment, larger raw capacity, and additional OneFS features support.

APEX File Storage delivers Dell’s leading enterprise-class, high-performance scale-out file storage as a software-defined, customer-managed offer in the public cloud. Based on PowerScale OneFS, APEX File Storage for AWS brings enterprise file capabilities and performance to the cloud and delivers operational consistency across multicloud environments. It simplifies hybrid cloud environments by enabling seamless data mobility between on-premises and the cloud with native replication, making it a perfect option for running AI workloads. APEX File Storage can enhance customers’ development and innovation initiatives by combining proven data services, such as multi-protocol access, security features, and a proven scale-out architecture, with the flexibility of public cloud infrastructure and services. APEX File Storage enables organizations to run the software they trust directly in the public cloud without retraining their staff or refactoring their storage architecture.

What's New?

1. Additional EC2 instance types support

We've expanded compatibility by adding support for a wider range of EC2 instance types. This means you have more flexibility in choosing the instance type that best suits your performance and resource requirements. We now support the following EC2 instance types:

    • EC2 m5dn instances: m5dn.8xlarge, m5dn.12xlarge, m5dn.16xlarge, m5dn.24xlarge
    • EC2 m6idn instances: m6idn.8xlarge, m6idn.12xlarge, m6idn.16xlarge, m6idn.24xlarge
    • EC2 m5d instances: m5d.24xlarge
    • EC2 i3en instances: i3en.12xlarge

Note that a PoC is required if you intend to use the m5d.24xlarge or i3en.12xlarge EC2 instance types. Contact your Dell account team for details.

2. Extended AWS regions support

APEX File Storage is now available in more AWS regions than ever before. A total of 28 regions are available for you. We understand that our users operate globally, and this expansion ensures that you can leverage APEX File Storage wherever your AWS resources are located. The following table lists all available regions for different EC2 instance types:

3. Terraform module: auto-deployment made effortless

Simplify your deployment process with our new Terraform module, which automates the AWS resource deployment process to ensure a smooth and error-free experience.

Once you fulfill the deployment prerequisites, you can deploy a cluster with a single Terraform command. For more details, refer to documentation: APEX File Storage for AWS Deployment Guide with Terraform. Stay tuned for a blog with additional details coming soon. 

4. Larger raw capacity: more room for your data

Your data is growing, and so should your storage capacity. APEX File Storage for AWS now supports up to 1.6PiB of raw capacity, enabling workloads that produce vast amounts of data, such as AI, and ensuring that you have ample space to store, manage, and scale your data effortlessly.

5. Additional OneFS features support

The OneFS features not supported in the first release of APEX File Storage for AWS are now supported, including:

    • Enhanced Protocols: With HDFS protocol support, you can seamlessly integrate HDFS into your workflows, enhancing your data processing capabilities in AWS. Enjoy expanded connectivity with support for HTTP and FTP protocols, providing more flexibility in accessing and managing your files.
    • Quality of Service – SmartQoS: Ensure a consistent and reliable user experience with SmartQoS, which enables you to prioritize workloads and applications based on performance requirements.
    • Immutable Data Protection - SmartLock: Enhance data protection by leveraging SmartLock to create Write Once Read Many (WORM) files, providing an added layer of security against accidental or intentional data alteration.
    • Large File Support: Address the needs of large-scale data processing with improved support for large files, facilitating efficient storage and retrieval. A single file can now be up to 16TiB.

Learn More

For deployment instructions and detailed information on these exciting new features, refer to our documentation.

Author: Lieven Lin

  • REST API
  • IIQ 5.0.0

REST API in IIQ 5.0.0

Vincent Shen Vincent Shen

Tue, 12 Dec 2023 15:00:00 -0000

|

Read Time: 0 minutes

REST APIs have been introduced in IIQ 5.0.0, providing the equivalent of the CLI commands in previous IIQ versions. The CLI is not available in IIQ 5.0.0. To understand how the REST APIs work in IIQ, we will cover:

  • REST API Authentication
  • Creating a REST API Session
  • Getting a REST API Session
  • Managing PowerScale Clusters using REST API
  • Exporting a Performance Report
  • Deleting a REST API Session

Let’s get started!

REST API Authentication

IIQ 5.0.0 leverages JSON Web Tokens (JWT) along with x-csrf-token for session-based authentication. Following are some of the benefits of using JWTs:

  • JWTs contain the user’s details
  • JWTs incorporate digital signatures to ensure their integrity and protect against unauthorized modifications by potential attackers
  • JWTs offer efficiency and rapid verification processes

Creating a REST API Session

A POST request to /insightiq/rest/security-iam/v1/auth/login/ creates a session with a JWT cookie and an x-csrf-token. A status code of 201 (Created) is returned upon successful user authentication. If authentication fails, the API responds with a status code of 401 (Unauthorized).

The following is an example of getting the JWT token with the POST method.

The POST request is:

curl -vk -X POST https://172.16.202.71:8000/insightiq/rest/security-iam/v1/auth/login -d '{"username": "administrator", "password": "a"}'  -H 'accept: application/json'  -H 'Content-Type: application/json'

The POST response is:

Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying 172.16.202.71:8000...
* Connected to 172.16.202.71 (172.16.202.71) port 8000 (#0)
* ALPN: offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN: server accepted h2
* Server certificate:
*  subject: O=Test; CN=cmo.ingress.dell
*  start date: Dec  4 05:59:14 2023 GMT
*  expire date: Dec  4 07:59:44 2023 GMT
*  issuer: C=US; ST=TX; L=Round Rock; O=DELL EMC; OU=Storage; CN=Platform Root CA; emailAddress=a@dell.com
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* using HTTP/2
* h2h3 [:method: POST]
* h2h3 [:path: /insightiq/rest/security-iam/v1/auth/login]
* h2h3 [:scheme: https]
* h2h3 [:authority: 172.16.202.71:8000]
* h2h3 [user-agent: curl/8.0.1]
* h2h3 [accept: application/json]
* h2h3 [content-type: application/json]
* h2h3 [content-length: 46]
* Using Stream ID: 1 (easy handle 0x5618836b5eb0)
> POST /insightiq/rest/security-iam/v1/auth/login HTTP/2
> Host: 172.16.202.71:8000
> user-agent: curl/8.0.1
> accept: application/json
> content-type: application/json
> content-length: 46
> 
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* We are completely uploaded and fine
< HTTP/2 201
< server: istio-envoy
< date: Mon, 04 Dec 2023 07:19:19 GMT
< content-type: application/json
< content-length: 54
< set-cookie: insightiq_auth=eyJ0eXAiOiJKV1QiLCJhbGciOiJQUzUxMiJ9.eyJjc3JmIjoiN3Z0eG5sMWRxbHIzaGtubGp3MjdwYXl3eW54bzQzdGs0Zmx4IiwiZXhwIjoxNzAxNzg5MTM5LCJpYXQiOjE3MDE3NDU5MzksImlzcyI6IkRlbGwgVGVjaG5vbG9naWVzIiwicm9sZSI6ImFkbWluIiwic2Vzc2lvbiI6InJ6ZnA3ZTRpMXdzd2xuYjBuNGo3YmQwNmF5dWs3emNkeXp1ZSIsInN1YiI6ImFkbWluaXN0cmF0b3IifQ.yKyfXbezscqn6UPa9fXxxjh71MCgeRAXPZhXkG-v92siwXAEP40ASb5bQUFHnAmWwwtlB4Jt8lX9kY8LmRkqi1V7B3v0LgxUp68heAc0HZAh6XO92ac9AfZ9dAuE9H3U4RNELm4vVx8mGrGmuzQymWUG5yRCNk03SpeW8esHnTPRVoGGsE4Cf6ta3BrUXBfic-D_TL01YgyY3Dy_T8Z1oqhkD508GPEYnEeNMU1QtZAwkmj6MJHtGmp69T0ljtQdIW2oi5xYdPs-ZHGSFRGG4j2o8xAEFV8A4igzP-5XOkE9NCcx2mkj67OdvVgNBxCcY-X7cnYyfLgagkanyQSgdA; Secure; HttpOnly; Path=/
< set-cookie: csrf_token=7vtxnl1dqlr3hknljw27paywynxo43tk4flx; Secure; HttpOnly; Path=/
< x-csrf-token: 7vtxnl1dqlr3hknljw27paywynxo43tk4flx
< x-envoy-upstream-service-time: 3179
< content-security-policy: default-src 'self' 'unsafe-inline' 'unsafe-eval' data:; style-src 'unsafe-inline' 'self';
< x-frame-options: sameorigin
< x-xss-protection: 1; mode=block
< x-content-type-options: nosniff
< referrer-policy: strict-origin-when-cross-origin
< 
{"timeout_absolute":43200,"username":"administrator"}
* Connection #0 to host 172.16.202.71 left intact

The JWT cookie and x-csrf-token are returned in the set-cookie and x-csrf-token headers of the POST response shown above. The session timeout is 43,200 seconds (12 hours). You can save these values for future use, starting with the cookie:

export TOK="insightiq_auth=eyJ0eXAiOiJKV1QiLCJhbGciOiJQUzUxMiJ9.eyJjc3JmIjoiN3Z0eG5sMWRxbHIzaGtubGp3MjdwYXl3eW54bzQzdGs0Zmx4IiwiZXhwIjoxNzAxNzg5MTM5LCJpYXQiOjE3MDE3NDU5MzksImlzcyI6IkRlbGwgVGVjaG5vbG9naWVzIiwicm9sZSI6ImFkbWluIiwic2Vzc2lvbiI6InJ6ZnA3ZTRpMXdzd2xuYjBuNGo3YmQwNmF5dWs3emNkeXp1ZSIsInN1YiI6ImFkbWluaXN0cmF0b3IifQ.yKyfXbezscqn6UPa9fXxxjh71MCgeRAXPZhXkG-v92siwXAEP40ASb5bQUFHnAmWwwtlB4Jt8lX9kY8LmRkqi1V7B3v0LgxUp68heAc0HZAh6XO92ac9AfZ9dAuE9H3U4RNELm4vVx8mGrGmuzQymWUG5yRCNk03SpeW8esHnTPRVoGGsE4Cf6ta3BrUXBfic-D_TL01YgyY3Dy_T8Z1oqhkD508GPEYnEeNMU1QtZAwkmj6MJHtGmp69T0ljtQdIW2oi5xYdPs-ZHGSFRGG4j2o8xAEFV8A4igzP-5XOkE9NCcx2mkj67OdvVgNBxCcY-X7cnYyfLgagkanyQSgdA"
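
The x-csrf-token value can be saved in the same way so that later requests can reference both values; the token below is simply the one returned in the example response above:

export CSRF="7vtxnl1dqlr3hknljw27paywynxo43tk4flx"

Subsequent requests can then pass --cookie $TOK and -H "x-csrf-token: $CSRF" instead of hard-coding the values.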

Getting a REST API Session

Use the GET method against /insightiq/rest/security-iam/v1/auth/session/ to get the session information. In the request header, include the cookie and the x-csrf-token field for authentication.

curl -k -v -X GET https://172.16.202.71:8000/insightiq/rest/security-iam/v1/auth/session --cookie $TOK -H 'accept: application/json'  -H 'Content-Type: application/json' -H 'x-csrf-token: 7vtxnl1dqlr3hknljw27paywynxo43tk4flx'

The response is:

Note: Unnecessary use of -X or --request, GET is already inferred.
*   Trying 172.16.202.71:8000...
* Connected to 172.16.202.71 (172.16.202.71) port 8000 (#0)
* ALPN: offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN: server accepted h2
* Server certificate:
*  subject: O=Test; CN=cmo.ingress.dell
*  start date: Dec  5 03:16:34 2023 GMT
*  expire date: Dec  5 05:17:04 2023 GMT
*  issuer: C=US; ST=TX; L=Round Rock; O=DELL EMC; OU=Storage; CN=Platform Root CA; emailAddress=a@dell.com
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* using HTTP/2
* h2h3 [:method: GET]
* h2h3 [:path: /insightiq/rest/security-iam/v1/auth/session]
* h2h3 [:scheme: https]
* h2h3 [:authority: 172.16.202.71:8000]
* h2h3 [user-agent: curl/8.0.1]
* h2h3 [cookie: insightiq_auth=eyJ0eXAiOiJKV1QiLCJhbGciOiJQUzUxMiJ9.eyJjc3JmIjoiN3Z0eG5sMWRxbHIzaGtubGp3MjdwYXl3eW54bzQzdGs0Zmx4IiwiZXhwIjoxNzAxNzg5MTM5LCJpYXQiOjE3MDE3NDU5MzksImlzcyI6IkRlbGwgVGVjaG5vbG9naWVzIiwicm9sZSI6ImFkbWluIiwic2Vzc2lvbiI6InJ6ZnA3ZTRpMXdzd2xuYjBuNGo3YmQwNmF5dWs3emNkeXp1ZSIsInN1YiI6ImFkbWluaXN0cmF0b3IifQ.yKyfXbezscqn6UPa9fXxxjh71MCgeRAXPZhXkG-v92siwXAEP40ASb5bQUFHnAmWwwtlB4Jt8lX9kY8LmRkqi1V7B3v0LgxUp68heAc0HZAh6XO92ac9AfZ9dAuE9H3U4RNELm4vVx8mGrGmuzQymWUG5yRCNk03SpeW8esHnTPRVoGGsE4Cf6ta3BrUXBfic-D_TL01YgyY3Dy_T8Z1oqhkD508GPEYnEeNMU1QtZAwkmj6MJHtGmp69T0ljtQdIW2oi5xYdPs-ZHGSFRGG4j2o8xAEFV8A4igzP-5XOkE9NCcx2mkj67OdvVgNBxCcY-X7cnYyfLgagkanyQSgdA]
* h2h3 [accept: application/json]
* h2h3 [content-type: application/json]
* h2h3 [x-csrf-token: 7vtxnl1dqlr3hknljw27paywynxo43tk4flx]
* Using Stream ID: 1 (easy handle 0x561f902392c0)
> GET /insightiq/rest/security-iam/v1/auth/session HTTP/2
> Host: 172.16.202.71:8000
> user-agent: curl/8.0.1
> cookie: insightiq_auth=eyJ0eXAiOiJKV1QiLCJhbGciOiJQUzUxMiJ9.eyJjc3JmIjoiN3Z0eG5sMWRxbHIzaGtubGp3MjdwYXl3eW54bzQzdGs0Zmx4IiwiZXhwIjoxNzAxNzg5MTM5LCJpYXQiOjE3MDE3NDU5MzksImlzcyI6IkRlbGwgVGVjaG5vbG9naWVzIiwicm9sZSI6ImFkbWluIiwic2Vzc2lvbiI6InJ6ZnA3ZTRpMXdzd2xuYjBuNGo3YmQwNmF5dWs3emNkeXp1ZSIsInN1YiI6ImFkbWluaXN0cmF0b3IifQ.yKyfXbezscqn6UPa9fXxxjh71MCgeRAXPZhXkG-v92siwXAEP40ASb5bQUFHnAmWwwtlB4Jt8lX9kY8LmRkqi1V7B3v0LgxUp68heAc0HZAh6XO92ac9AfZ9dAuE9H3U4RNELm4vVx8mGrGmuzQymWUG5yRCNk03SpeW8esHnTPRVoGGsE4Cf6ta3BrUXBfic-D_TL01YgyY3Dy_T8Z1oqhkD508GPEYnEeNMU1QtZAwkmj6MJHtGmp69T0ljtQdIW2oi5xYdPs-ZHGSFRGG4j2o8xAEFV8A4igzP-5XOkE9NCcx2mkj67OdvVgNBxCcY-X7cnYyfLgagkanyQSgdA
> accept: application/json
> content-type: application/json
> x-csrf-token: 7vtxnl1dqlr3hknljw27paywynxo43tk4flx
> 
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
< HTTP/2 200
< server: istio-envoy
< date: Tue, 05 Dec 2023 03:19:41 GMT
< content-type: application/json
< content-length: 57
< x-envoy-upstream-service-time: 5
< content-security-policy: default-src 'self' 'unsafe-inline' 'unsafe-eval' data:; style-src 'unsafe-inline' 'self';
< x-frame-options: sameorigin
< x-xss-protection: 1; mode=block
< x-content-type-options: nosniff
< referrer-policy: strict-origin-when-cross-origin
< 
{"username": "administrator", "timeout_absolute": 42758}
* Connection #0 to host 172.16.202.71 left intact

The response body will return the username of the session and its remaining timeout value in seconds.

Managing PowerScale Clusters using REST API

You can also use the IIQ REST API to manage your PowerScale clusters. As with the requests shown so far, all requests must include the cookie and the x-csrf-token header for authentication. When adding clusters to IIQ, you also need to provide the cluster IP address along with a username and password. For details, refer to the following table:

Table 1. Using REST API to manage PowerScale Clusters

Functionality

REST API Endpoint

REST API Details

Add Cluster to IIQ

POST /insightiq/rest/clustermanager/v1/clusters
curl -k -v -X 'POST' \
  'https://<EXTERNAL_IP>:8000/insightiq/rest/clustermanager/v1/clusters' \
  --cookie <COOKIE> \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -H 'x-csrf-token: <X-CSRF-TOKEN>' \
  -d '{
    "host": "<HOST>",
    "username": "<USERNAME>",
    "password": "<PASSWORD>"
}'

Delete Cluster from IIQ

DELETE /insightiq/rest/clustermanager/v1/clusters/<GUID>
curl -k -v -X 'DELETE' \
'https://<EXTERNAL_IP>:8000/insightiq/rest/clustermanager/v1/clusters/<GUID>' \
--cookie <COOKIE> \
-H 'accept: application/json' \
-H 'x-csrf-token: <X-CSRF-TOKEN>'

Exporting a Performance Report

To export a performance report from IIQ 5.0.0, you can use the GET method against /iiq/reporting/api/v1/timeseries/download_data with the following query parameters:

  • cluster – PowerScale cluster id 
  • start_time – UNIX epoch timestamp of the beginning of the data range. Defaults to the most recent saved report date and time.
  • end_time – UNIX epoch timestamp of the end of the data range. Defaults to the most recent saved report date and time.
  • key – the performance key. To get a list of all the supported keys, use the GET method against https://<IP>:8000/iiq/reporting/api/v1/reports/data-element-types

Note: To get the performance key, use values from the response list for data-element-types in definition.timeseries_keys from data elements where report_group is equal to performance and definition.layout is equal to chart. See the following screenshot for an example. In this example, the performance key is ext_net.


 Figure 1. Getting the performance key where the key is ext_net
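
As a hedged CLI sketch of the same lookup, the following request lists the data element types and pretty-prints the JSON with jq; the <IIQ_IP> placeholder is illustrative, and $TOK and $CSRF are the session cookie and CSRF token saved earlier. In the response, look for entries where report_group is "performance" and definition.layout is "chart", and read the keys from definition.timeseries_keys.

curl -k -X GET "https://<IIQ_IP>:8000/iiq/reporting/api/v1/reports/data-element-types" \
  --cookie $TOK -H "x-csrf-token: $CSRF" -H 'accept: application/json' | jq .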

The following is an example of using the IIQ REST API to export the cluster performance of external network throughput to a CSV file:

curl -vk -X GET "https://10.246.159.113:8000/iiq/reporting/api/v1/timeseries/download_data?cluster=0007433384d03e80b4582103b56e1cac33a2&start_time=1694143511&end_time=1694366711&key=ext_net" -H 'x-csrf-token: vny27rem4l6ww29hvkhuaka0ix7x172wufbv' --cookie $TOK >>perf.csv

Deleting a REST API Session

To remove an IIQ REST API session, use the following API:

curl -k -v -X GET https://<EXTERNAL_IP>:8000/insightiq/rest/security-iam/v1/auth/logout --cookie <COOKIE> -H 'accept: application/json'  -H 'Content-Type: application/json' -H 'x-csrf-token: <X-CSRF-TOKEN>'

Conclusion

The IIQ REST API is a powerful tool. Please refer to the Dell Technologies InsightIQ 5.0.0 User Guide for more information. For any questions, feel free to reach out to me at Vincent.shen@dell.com.

 

Author: Vincent Shen


  • Isilon
  • PowerScale
  • AWS
  • OneFS
  • APEX

Alert in IIQ 5.0.0 – Part I

Vincent Shen Vincent Shen

Wed, 13 Dec 2023 17:40:06 -0000

|

Read Time: 0 minutes

Alert is a new feature introduced with the release of IIQ 5.0.0. It provides the capability and flexibility to configure alerts based on KPI thresholds.

This blog will walk you through the following aspects of this feature:

  1. Introduction to Alert
  2. How to configure alerts using Alert

Let’s get started:

Introduction

IIQ 5.0.0 can send email alerts based on your defined KPIs and thresholds. The supported KPIs are listed below:

  • Protocol Latency SMB – Average latency within the last 10 minutes for the various SMB protocol operations. Scope: across all nodes and clients per cluster.
  • Protocol Latency NFS – Average latency within the last 10 minutes for the various NFS protocol operations. Scope: across all nodes and clients per cluster.
  • Active Clients NFS – The current number of active clients using NFS. A client is active when it is transmitting or receiving data. Scope: across all nodes per cluster.
  • Active Clients SMB 1 – The current number of active clients using SMB 1. A client is active when it is transmitting or receiving data. Scope: across all nodes per cluster.
  • Active Clients SMB 2 – The current number of active clients using SMB 2. A client is active when it is transmitting or receiving data. Scope: across all nodes per cluster.
  • Connected Clients NFS – The current number of connected clients using NFS. A client is connected when it has an open TCP connection to the cluster; it can be transmitting or receiving data, or idle. Scope: across all nodes per cluster.
  • Connected Clients SMB – The current number of connected clients using SMB. A client is connected when it has an open TCP connection to the cluster; it can be transmitting or receiving data, or idle. Scope: across all nodes per cluster.
  • Pending Disk Operation Count – The average pending disk operation count within the last 10 minutes: the number of I/O operations pending at the file system level and waiting to be issued to an individual drive. Scope: across all disks per cluster.
  • CPU Usage – The average usage of CPU cores, including physical and hyperthreaded cores, within the last 10 minutes. Scope: across all nodes per cluster.
  • Cluster Capacity – The current used capacity for the cluster.
  • Nodepool Capacity – The current used capacity for a node pool in a cluster.
  • Drive Capacity – The current used capacity for a drive in a cluster.
  • Node Capacity – The current used capacity for a node in a cluster.
  • Network Throughput Equivalency – Checks whether the network throughput for each node within the last 10 minutes is within the specified threshold percentage of the average network throughput of all nodes in the node pool over the same time. Scope: across all nodes per node pool.

 

Each KPI requires a threshold and a severity level, together forming an alert rule. You can customize the alert rules to align with specific business use cases.

 

Here is an example of an alert rule:

If CPU usage (KPI) is greater than or equal to 96% (threshold), a critical alert (severity) will be triggered.

The supported severities are:

  1. Emergency
  2. Critical
  3. Warning
  4. Information

You can combine multiple alert rules into a single alert policy for easy management purposes.

Along with alert rules, there is a new concept called a Notification Rule, which defines the recipients’ email addresses and the severity levels for which they receive email.

For example, a notification rule can specify that user A (user_a@lled.com) and user B (user_b@lled.com) both receive email alerts for all severity levels.

If you combine the two examples above into a single alert policy, you get a policy that triggers a critical alert when CPU usage reaches 96% and notifies both users at all severity levels.

 

At this point, you should understand the big picture of the alert feature in IIQ 5.0.0. In my next post, I will walk you through the details of how to configure it.

 

 


Alert in IIQ 5.0.0 – Part II

Vincent Shen Vincent Shen

Mon, 11 Dec 2023 16:10:19 -0000

|

Read Time: 0 minutes

My previous post introduced one of the key features in IIQ 5.0.0 – Alert – and explained how it works. In this blog, we will go into the details of how to configure it.

How to configure an alert in IIQ 5.0.0

Configure SMTP server in IIQ

Follow the steps below to add the SMTP server in IIQ:

  1. Access Configure SMTP under Settings from the left side menu.
  2. Enter the SMTP Server IP or FQDN. Username and Password are optional.
  3. Click the Save button.

You can send a test email to verify the settings.

 

Figure: SMTP configuration

Note: If you keep the SMTP Port number blank, the default will be 25 or 587 for TLS.

Manage Alerts

Create Alert Rules

To create alert rules, follow these steps:

  1. Navigate to Manage Alerts under Alerts from the left side menu.
  2. Click Alert Rules.
  3. Click the Create Alert Rule button; a pop-up window appears as shown below:

Figure: Create Alert Rule

  4. Specify the KPI, Severity, and Threshold, and then click the Save button.
  5. (Optional) Create additional alert rules as needed.

Create a Notification Rule

A notification rule specifies the recipient(s) of SMTP alerts and its associated alert severity. To create a notification rule, follow these steps:

  1. Navigate to Manage Alerts under Alerts from the left side menu.
  2. Click Notification Rules.
  3. Click the Create Notification Rule button; a pop-up window appears as shown below.

Figure: Create Notification Rule

  4. Enter the Recipient Email ID(s) and choose the severity from the Receive Emails for drop-down list.
  5. Click the Save button.

Create an alert policy

To create an alert policy, follow these steps:

  1. Navigate to Manage Alerts under Alerts from the left side menu.
  2. Click Alert Policies.
  3. Click the Create Policy button.
  4. Enter the Name and Description in the Policy Details window and click the Next button.
  5. On the Alert Rules subpage, choose either Existing Alert Rules or Create Alert Rule by clicking the corresponding button. After you add the alert rules, click the Next button.

Figure: Add Alert Rules

  6. On the Cluster subpage, choose the clusters to which you want to apply the alert settings, and click the Next button.

Figure: Choose clusters

  7. On the Notification Rules subpage, choose either Existing Notification Rules or Create Notification Rule by clicking the corresponding button. After you choose the rule, click the Next button.

Figure: Specify Notification Rules

  8. Click the Save button on the final Review subpage.

The following screenshot shows a sample alert email:

Figure: Sample email alert

View Alerts

All the alerts can be accessed in Alerts > View Alerts from the left side menu.

Figure: View Alert

On this page you can:

  1. Filter alerts by selecting the Duration.
  2. Show alerts by choosing specific Clusters.
  3. Categorize alerts by different severity levels.
  4. Sort alerts by the Date & Time.

I hope you enjoyed the read. If you have any questions or suggestions about this feature, please feel free to reach out to me at Vincent.shen@dell.com.

  • PowerScale OneFS
  • InsightIQ

Mastering Monitoring and Reporting with InsightIQ 5.0.0

Shaofei Liu Shaofei Liu

Mon, 11 Dec 2023 16:32:33 -0000

|

Read Time: 0 minutes

Overview

In the complex landscape of data management, having robust tools to monitor and analyze your data is paramount. InsightIQ 5.0.0 is your gateway to exploring the depths of historical data sourced from PowerScale OneFS. By leveraging its capabilities, you can monitor cluster activities, analyze performance, and gain insights to ensure optimal functionality.

Monitoring Clusters with Dynamic Dashboard

The InsightIQ Dashboard stands as a central hub for monitoring all your clusters, offering a comprehensive overview of their statuses and vital statistics. The dashboard facilitates quick interpretation of data and the action-based navigation links allow you to easily check on the observed statuses.

Here's a breakdown of the essential sections within this powerful monitoring interface:

 

Figure 1. IIQ Dashboard

InsightIQ Status

This section provides an overview of connected clusters, highlighting monitoring errors and any suspended monitoring activities. The InsightIQ Status icons offer a quick assessment of the monitored clusters: green signifies active monitoring, red indicates monitoring errors, and grey denotes suspended monitoring or incomplete credentials. A fourth status icon, blue, indicates the number of PowerScale clusters whose status falls outside the green, red, or grey values, typically due to an internal error.

Additionally, the InsightIQ Datastore Usage icons provide insights into datastore health, with green indicating health, yellow signaling near-full capacity, and red alerting that the datastore has reached its maximum limit.

Alerts

The Alerts section within InsightIQ is a pivotal area displaying crucial data accumulated over the past 24 hours, categorized by severity—emergency, critical, warning, and information. This section shows the top three clusters with the highest number of alerts, granting immediate visibility into potential issues impacting PowerScale clusters. The dashboard offers a swift way to access the 'Alerts' section, where you have the capability to create alerts, defining Key Performance Indicators (KPIs) and thresholds, easily viewable on the dashboard for prompt action. This comprehensive alert system ensures timely responses to potential issues.

Aggregated Capacity for monitored clusters

Get insights into the used and free raw capacity across monitored clusters, as well as the estimated total usable capacity.

Performance Overview

This section presents average values for critical performance metrics like protocol latency, network throughput, CPU usage, protocol operations, active clients, and active jobs, displaying changes in statistics over the past 24 hours. It also offers a convenient link to navigate to the 'View Performance Report', facilitating in-depth analysis across various metrics.   

Monitored Clusters by % Used Capacity

This detailed breakdown showcases used capacity, free raw capacity, estimated usable capacity, and data reduction ratio for each monitored cluster. While it doesn't offer historical data, it provides real-time insights into the present cluster status. It offers quick navigation links to access both Capacity Reports and the Data Reduction Report for easy reference.   

Performance and File System Reports

The heart of InsightIQ lies in its ability to provide detailed performance reports and file system reports. These reports can be standardized or tailored to your specific needs, enabling you to track storage cluster performance efficiently. You also have the flexibility to generate Performance Reports as PDF files on a predefined schedule, enabling easy distribution via email attachments. 

InsightIQ reports are configured using modules, breakouts, and filter rules, providing a granular view of cluster components at specific data points. By employing modules and applying breakouts or filter rules, users can focus on distinct cluster components or specific attributes across the entire report. This flexibility allows the creation of tailored reports for various monitoring purposes.

Harnessing detailed metrics and insights empowers decision-making for crafting insightful performance reports. For instance, if network traffic surpasses anticipated levels across all monitored clusters, InsightIQ enables the creation of customized reports displaying detailed network throughput data. Analyzing direction-specific throughput assists in pinpointing any specific contribution to the overall traffic, aiding in precise troubleshooting and optimization strategies.

 

Figure 2. Sample Cluster Performance Report

Partitioned Performance

The Partition Performance report presents data from configurable datasets, offering insights into the top workloads consuming the most resources within a specific time range. Key data modules include: Dataset Summary, Workload Latency, Workload IOPS, Workload CPU Time, Workload Throughput, and Workload L2/L3 Cache Hits.

For more detailed information, users can focus on modules by average, top workload by max value, or pinned workload by average. 

Note: To access the Partitioned Performance report, the InsightIQ user on the PowerScale cluster needs the ISI_PRIV_PERFORMANCE privilege with read permission. If you are unable to view the report, contact the InsightIQ admin or refer to the Dell Technologies InsightIQ 5.0.0 Administration Guide for permission configuration.

 

Figure 3. Sample Partitioned Performance Report

File System Analytic Report

File System Analytics (FSA) reports offer a comprehensive overview of the files stored within a cluster, providing essential insights into their types, locations, and usage.

InsightIQ supports two key FSA report categories: 

  • Data Usage reports focus on individual file data, revealing details like unchanged file durations. 
  • Data Property reports offer insights into the entire cluster file system, showcasing data such as file changes over specific periods and facilitating comparisons between different timeframes.

These reports help in understanding relative changes in file counts based on physical size, offering nuanced perspectives for effective file system management. For instance, by comparing Data Property reports of different clusters, you can observe patterns in file utilization—identifying clusters with regular file changes versus those housing less frequently modified files. Detecting inactive files through Data Usage reports facilitates efficient storage archiving strategies, optimizing cluster space.

These reports also play a pivotal role in verifying the expected behavior of cluster file systems. For example, dedicated archival clusters can be monitored using Data Property reports to observe file count changes. An unexpectedly high count might prompt the storage admin to consider relocating files to development clusters, ensuring efficient resource utilization.

Figure 4. Data Properties Report

These FSA reports within InsightIQ not only provide visibility into cluster file systems but also serve as strategic tools for efficient storage management and troubleshooting unexpected discrepancies.

Conclusion: Empowering Data Management

InsightIQ isn't just a monitoring tool. It's a comprehensive suite offering a multitude of functionalities. It's about transforming data into actionable insights, enabling users to make informed decisions and stay ahead in the dynamic world of data management. The robust features, customizable reports, and analytics capabilities make it an invaluable asset for ensuring the optimal performance and health of PowerScale OneFS clusters.

  • PowerScale OneFS
  • InsightIQ

Understanding InsightIQ 5.0.0 Deployment Options: Simple vs. Scale

Shaofei Liu Shaofei Liu

Mon, 11 Dec 2023 16:32:33 -0000

|

Read Time: 0 minutes

Overview

InsightIQ 5.0.0 introduces two distinct deployment options catering to varying scalability needs: InsightIQ Simple and InsightIQ Scale. Let's delve into the overview of both offerings to guide your deployment decision-making process.

InsightIQ Simple

Designed for straightforward deployment and moderate scalability, IIQ Simple accommodates up to 252 nodes or 10 clusters. Here's a snapshot of its key requirements:

  • Target Use Case: Simple deployment scenarios with moderate scaling requirements.
  • Deployment Method: VMware-based deployment using the OVA template.
  • OS Requirements: ESXi 7.0.3 and 8.0.2.
  • Hardware Requirements: VMware hardware version 15 or higher, with 12 vCPUs, 32GB memory, and 1.5TB of disk space.

InsightIQ Scale

For organizations demanding extreme scalability, IIQ Scale steps in, supporting up to 504 nodes or 20 clusters, with potential expansion post IIQ 5.0. The details include:

  • Target Use Case: Extreme scalability requirements.
  • Deployment Method: RHEL-based deployment utilizing a specialized deployment script.
  • OS Requirements: RHEL 8.6 x64.
  • Hardware Requirements: Three virtual machines or physical servers, each with a configuration of 12 vCPU, 32GB memory, and specific storage options based on the chosen datastore location.

Here is a summary of the two deployment options:

InsightIQ Simple

  • Target use case: Simple deployment and moderate scalability – up to 252 nodes or 10 clusters
  • Deployment method: On VMware, using the OVA template
  • OS requirements: ESXi 7.0.3 and 8.0.2
  • Hardware requirements: VMware hardware version 15 or higher – 12 vCPU, 32GB memory, 1.5TB disk, and (optional) an NFS export containing 1.5TB
  • Networking requirements: 2 static IPs on the same network subnet, with PowerScale cluster connectivity

InsightIQ Scale

  • Target use case: Extreme scalability – up to 504 nodes or 20 clusters (more in post-5.0 releases)
  • Deployment method: On a Red Hat Enterprise Linux (RHEL) system, using an installation script
  • OS requirements: RHEL 8.6 x64
  • Hardware requirements (compute): 3 virtual machines or physical servers, each with 12 vCPU or cores and 32GB memory
  • Hardware requirements (storage), one of the following:
    • InsightIQ datastore on an NFS server: 200GB local disk space per VM or server, plus 1.5TB on the NFS server
    • InsightIQ datastore on the local partition “/”: 1TB per VM or physical server
  • Networking requirements: 4 static IPs on the same network subnet, with PowerScale cluster connectivity

Leveraging the NFS export

In InsightIQ deployments, leveraging an NFS export for the datastore, whether from a PowerScale cluster or a Linux NFS server, can significantly enhance scalability. To ensure a seamless setup, the following prerequisites must be addressed (a hedged example export configuration follows these lists):

Access and Permissions:

  • Guarantee accessibility of the NFS server and export from all servers/VMs where InsightIQ is deployed. This accessibility is crucial for uninterrupted data flow.
  • Set read/write permissions (chmod 777 <export_path>) to ensure unrestricted access for all users utilizing the NFS export. 

Security Measures:

  • Root User Mapping: Do not map the root user to an anonymous user (use no_root_squash) so that InsightIQ retains the required access.
  • Mount Access: Enable mount access to subdirectories for streamlined data retrieval and utilization.

Resource Allocation:

  • Allocate a substantial 1.5TB for the NFS export, ensuring ample space for data storage and future scalability.
  • Allocate 200GB free space on the root partition ("/") of all servers/VMs hosting InsightIQ.
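
As a hedged illustration of these prerequisites on a generic Linux NFS server, the export entry and commands below use a hypothetical /iiq_datastore path and client subnet; substitute your own values and follow your organization's security practices:

# /etc/exports entry on the Linux NFS server (hypothetical path and subnet)
/iiq_datastore  192.168.10.0/24(rw,sync,no_root_squash)

# Create the directory, set open permissions, and publish the export
mkdir -p /iiq_datastore
chmod 777 /iiq_datastore
exportfs -ra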

By ensuring compliance with these guidelines, organizations can unlock the full potential of InsightIQ while maintaining a robust and reliable infrastructure.

IIQ 5.0.0 Support Matrix

The support matrix provides a concise summary of the supported OneFS versions, host operating systems, recommended client display configurations, and browser compatibility for both IIQ Simple and IIQ Scale deployments.

InsightIQ uses TLS 1.3 exclusively. A web browser without TLS 1.3 enabled or supported cannot access InsightIQ 5.0.0.