Standard object and file methods are used to access ECS storage services. S3, Atmos, and Swift are accessed through RESTful APIs over HTTP, while Content Addressable Storage (CAS) uses a proprietary access method and SDK. ECS natively supports all NFSv3 procedures except LINK, and ECS buckets can also be accessed through S3A.
ECS provides multi-protocol access, in which data ingested through one protocol can be accessed through others. For example, data can be ingested through S3 and modified through NFSv3 or Swift, or vice versa. There are some exceptions to multi-protocol access due to differences in protocol semantics and design. The following list highlights the access methods and which protocols interoperate.
- S3: additional capabilities such as byte-range updates and rich ACLs; interoperates with HDFS, NFS, and Swift
- Atmos: interoperates with NFS (path-based objects only, not object-ID-style objects)
- Swift: V2 APIs and Swift and Keystone v3 authentication; interoperates with HDFS, NFS, and S3
- CAS: SDK v3.1.544 or later; no multi-protocol interoperability
- HDFS: Hadoop 3.1.1 compatibility; interoperates with S3, NFS, and Swift
- NFS: NFSv3; interoperates with S3, Swift, HDFS, and Atmos (path-based objects only, not object-ID-style objects)
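To make this interoperability concrete, the following sketch writes an object through the S3 head and reads it back through an NFS mount of the same bucket. The endpoint, namespace, bucket, and mount point are hypothetical placeholders:

# Write an object through the S3 protocol head (AWS CLI pointed at a hypothetical ECS endpoint)
aws s3api put-object --endpoint-url http://ecs.example.com:9020 \
  --bucket shared-bucket --key logs/app.log --body app.log

# Mount the same bucket as an NFSv3 export and read the object back as a file
mount -t nfs -o vers=3 ecs.example.com:/my-namespace/shared-bucket /mnt/shared-bucket
cat /mnt/shared-bucket/logs/app.log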
Data services, also referred to as head services, take client requests, extract the required information, and pass it to the storage engine for further processing. All head services are combined into a single process, dataheadsvc, running inside the infrastructure layer. This process is further encapsulated within a Docker container named object-main that runs on every node within ECS. The Infrastructure section covers Docker in more detail. ECS protocol service port requirements, such as port 9020 for S3 communication, are available in the latest ECS Security Configuration Guide.
ECS supports the S3, Atmos, Swift, and CAS APIs for object access. Except for CAS, objects are written, retrieved, updated, and deleted using the HTTP or HTTPS methods GET, POST, PUT, DELETE, and HEAD. CAS uses standard TCP communication with its own specific access methods and calls.
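The general shape of these calls can be sketched with curl. The endpoint, bucket, and credentials below are hypothetical, and a real request requires a valid AWS-style Authorization signature, which is normally generated by an SDK or the AWS CLI rather than by hand:

# Create (PUT) an object; <access_key> and <signature> are placeholders
curl -X PUT "http://ecs.example.com:9020/my-bucket/hello.txt" \
  -H "Authorization: AWS <access_key>:<signature>" \
  --data-binary @hello.txt

# Read (GET) the object back
curl "http://ecs.example.com:9020/my-bucket/hello.txt" \
  -H "Authorization: AWS <access_key>:<signature>"

# Delete the object
curl -X DELETE "http://ecs.example.com:9020/my-bucket/hello.txt" \
  -H "Authorization: AWS <access_key>:<signature>"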
ECS provides metadata search for objects using a rich query language. This powerful feature allows S3 object clients to search for objects within buckets using system and custom metadata. Although any metadata can be searched, queries against metadata that has been explicitly configured for indexing in a bucket return results more quickly, especially for buckets with billions of objects.
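As a sketch of what such a query can look like (the endpoint, bucket, and metadata key are hypothetical; see the ECS Data Access Guide for the authoritative query syntax), a search is issued as a GET on the bucket with a query parameter:

# Find objects whose indexed user metadata key x-amz-meta-color equals "red"
# (the query expression may need URL encoding in practice)
curl "http://ecs.example.com:9020/my-bucket?query=x-amz-meta-color=='red'" \
  -H "Authorization: AWS <access_key>:<signature>"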
Metadata search with tokenization allows a client to search for objects that have a specific metadata value within an array of metadata values. This behavior must be chosen when the bucket is created, by including the header x-emc-metadata-search-tokens: true in the S3 create bucket request.
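For illustration, the following sketch creates a bucket with one indexed user metadata key and tokenization enabled. The endpoint, bucket, key name, and credentials are hypothetical placeholders; the x-emc-metadata-search header used to declare indexed keys is described in the ECS Data Access Guide:

# Create a bucket that indexes x-amz-meta-tags and tokenizes its values
curl -X PUT "http://ecs.example.com:9020/my-bucket" \
  -H "x-emc-metadata-search: x-amz-meta-tags;string" \
  -H "x-emc-metadata-search-tokens: true" \
  -H "Authorization: AWS <access_key>:<signature>"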
Up to thirty user-defined metadata fields can be indexed per bucket, and they are specified at bucket creation. The metadata search feature can be enabled on buckets with server-side encryption enabled; however, any indexed user metadata attribute used as a search key is not encrypted.
Note: There is a performance impact when writing data to buckets configured to index metadata, and the impact grows as the number of indexed fields increases. Consider carefully whether to index metadata in a bucket and, if so, how many indexes to maintain.
For CAS objects, the CAS query API provides a similar ability to search for objects based on the metadata maintained for CAS objects; it does not need to be enabled explicitly.
For more information about ECS APIs and the metadata search API, see the latest ECS Data Access Guide. For the Atmos and S3 SDKs, refer to the GitHub sites Dell Data Services SDK or Dell ECS. For CAS, refer to the Centera Community site. Examples, resources, and assistance for developers are available in the ECS Community.
Client applications such as S3 Browser and Cyberduck provide a way to quickly test or access data stored in ECS. Dell also freely provides ECS Test Drive, a publicly accessible ECS system for testing and development. After registering for ECS Test Drive, users receive REST endpoints and credentials for each of the object protocols, and anyone can use it to test an S3 API application.
Note: ECS limits only the number of metadata fields that can be indexed per bucket, at thirty. There is no limit on the total amount of custom metadata stored per object; the limit applies only to the fields indexed for fast lookup.
ECS can store Hadoop file system data. Because ECS is a Hadoop-compatible file system, organizations can create big data repositories on ECS that Hadoop analytics can consume and process. The HDFS data service is compatible with Apache Hadoop 3.1.1, with support for fine-grained ACLs and extended filesystem attributes.
ECS has been validated and tested with Hortonworks. ECS also supports services such as YARN, MapReduce, Pig, Hive/HiveServer2, HBase, ZooKeeper, Flume, Spark, and Sqoop.
ECS supports the Hadoop S3A client for storing Hadoop data. S3A is an open-source connector for Hadoop, based on the official Amazon Web Services (AWS) SDK, created to address the storage scaling and cost problems that many Hadoop administrators encountered with HDFS. S3A connects Hadoop clusters to any S3-compatible object store, whether in a public, hybrid, or on-premises cloud.
Note: S3A support is available on Hadoop 3.1.1.
As shown in the preceding figure, when the Hadoop cluster is set up on traditional HDFS, its S3A configuration points to ECS object storage for all HDFS activity. On each Hadoop node, traditional Hadoop components use Hadoop's S3A client to perform that activity.
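As a minimal sketch, assuming a hypothetical endpoint, bucket, and credentials, the standard S3A properties can be supplied on the command line (or persisted in core-site.xml) to point a Hadoop command at an ECS S3 endpoint:

# List a bucket over S3A; the four fs.s3a.* properties are standard Hadoop S3A settings
hadoop fs \
  -D fs.s3a.endpoint=https://ecs.example.com:9021 \
  -D fs.s3a.access.key=hadoop-user \
  -D fs.s3a.secret.key=hadoop-secret \
  -D fs.s3a.path.style.access=true \
  -ls s3a://hadoop-bucket/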
Hadoop configuration analysis using ECS Service Console
The ECS Service Console (SC) can read and interpret Hadoop configuration parameters related to S3A connections to ECS. SC also provides a function, Get_Hadoop_Config, that reads the Hadoop cluster configuration and checks S3A settings for typos, errors, and incorrect values. Contact the ECS support team for assistance with installing SC.
Privacera implementation with Hadoop S3A
Privacera is a third-party vendor that has implemented a Hadoop client-side agent and integration with Ambari for S3 (AWS and ECS) granular security. Although Privacera supports Cloudera Distribution of Hadoop (CDH), Cloudera (another third-party vendor) does not support Privacera on CDH.
Note: CDH users must use ECS IAM security services. If you want secure access to S3A without using ECS IAM, contact the support team.
See the latest ECS Data Access Guide for more information about S3A support.
Hadoop S3A security
ECS IAM allows the Hadoop administrator to set up access policies to control access to S3A Hadoop data. Once the access policies are defined, there are two user access options for Hadoop administrators to configure:
The ECS admin and Hadoop admin need to work together to predefine appropriate policies. The fictional examples that follow outline three types of Hadoop users for whom policies are created. They are:
For more information about ECS IAM, see ECS IAM.
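As an illustration only, ECS IAM policies follow the familiar AWS-style policy grammar, so a policy granting a Hadoop user read/write access to an S3A bucket might look like the following; the bucket name is a hypothetical placeholder:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": [
        "arn:aws:s3:::hadoop-bucket",
        "arn:aws:s3:::hadoop-bucket/*"
      ]
    }
  ]
}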
Cloudera Manager is the Hadoop interface for managing the Hadoop cluster. The generated HDFS client zip file includes an HDFS client "Parcels" file that installs the viprfs client jar across the cluster; the jar file is installed on each node within a participating Hadoop cluster. ECS provides file system and storage functionality equivalent to that of the name and data nodes in a Hadoop deployment, streamlining the Hadoop workflow by eliminating the need to migrate data to local Hadoop DAS or to create a minimum of three copies. The following figure shows the ECS HDFS client jar file installed on each Hadoop compute node and the general communication flow:
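Once the client jar is installed, Hadoop commands can address ECS buckets directly through the viprfs URI scheme, which takes the form viprfs://bucket.namespace.site. As a sketch, with hypothetical bucket, namespace, and site names:

# List the contents of an ECS bucket through the ECS HDFS (viprfs) client
hadoop fs -ls viprfs://hadoop-bucket.my-namespace.Site1/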
Other enhancements added in ECS for HDFS include the following:
ECS includes native file support with NFSv3. The main features for the NFSv3 file data service include:
NFS exports, permissions, and user and group mappings are created using the WebUI or API. NFSv3-compliant clients mount exports using the namespace and bucket names. Here is a sample command to mount a bucket:
mount -t nfs -o vers=3 s3.dell.com:/namespace/bucket /mnt/bucket
To achieve client transparency during a node failure, a load balancer is recommended for this workflow.
ECS tightly integrates the supporting NFS server services, such as lockmgr, statd, nfsd, and mountd; as a result, these services do not depend on the infrastructure layer (host operating system) for management. NFSv3 support has the following features:
NFS file services process NFS requests coming from clients; however, the data is stored as objects within ECS, and an NFS file handle is mapped to an object ID. Because a file is essentially mapped to an object, NFS shares features with the object data service, including:
Several third-party software products can access ECS object storage. Independent software vendors (ISVs) such as Panzura, Ctera, and Syncplicity create a layer of services that offer client access to ECS object storage using traditional protocols such as SMB/CIFS, NFS, and iSCSI. Organizations can also access or upload data to ECS storage with the following Dell products: