These tests validate running a Spark bundle application on the OpenShift Container Platform with Delta Lake as its modern data stack storage. Dell Technologies used the Apache Spark 3.4.1 distribution prebuilt for Hadoop, together with Delta Lake 2.4.0 and 3.0.0rc1.
Dell PowerScale, ECS, and ObjectScale served as the modern data stack and Iceberg storage for read and write operations.
Spark can read and write data using different API protocols, such as:
- Hadoop (hdfs://<IP>)
- S3 object storage protocol (s3a://<bucket Name>/)
The hadoop-aws:3.2.3 library was used to access data on ECS through the S3 API protocol.
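As a minimal sketch of how these pieces might fit together, the following Python helpers assemble the Spark settings commonly used for Delta Lake over S3A and build table locations for both protocols. The endpoint, bucket, credentials, table path, and application name are hypothetical placeholders, and the helper names (`delta_s3a_conf`, `spark_submit_args`, `table_uri`) are illustrative only. The Maven coordinates assume Delta Lake 2.4.0 built for Scala 2.12, and path-style access is an assumption that is often needed for S3-compatible stores; the settings actually used in these tests are not stated here.

```python
def delta_s3a_conf(endpoint, access_key, secret_key):
    """Return Spark settings for Delta Lake tables on an S3-compatible store."""
    return {
        # Delta Lake 2.4.0 (Scala 2.12 assumed) plus the hadoop-aws connector
        "spark.jars.packages": (
            "io.delta:delta-core_2.12:2.4.0,"
            "org.apache.hadoop:hadoop-aws:3.2.3"
        ),
        # Enable Delta Lake SQL support and the Delta catalog
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": (
            "org.apache.spark.sql.delta.catalog.DeltaCatalog"
        ),
        # S3A connection details (all values are placeholders)
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Assumption: address buckets by path rather than by virtual host
        "spark.hadoop.fs.s3a.path.style.access": "true",
    }


def spark_submit_args(conf, app):
    """Turn a settings dict into a spark-submit argument list."""
    args = ["spark-submit"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


def table_uri(scheme, location, table):
    """Build a table location such as hdfs://<IP>/... or s3a://<bucket>/..."""
    return f"{scheme}://{location}/{table.lstrip('/')}"


if __name__ == "__main__":
    conf = delta_s3a_conf("https://ecs.example.com", "ACCESS_KEY", "SECRET_KEY")
    print(" ".join(spark_submit_args(conf, "delta_app.py")))
    print(table_uri("s3a", "example-bucket", "delta/events"))
    print(table_uri("hdfs", "192.0.2.10", "delta/events"))
```

In a PySpark application, the same key/value pairs could instead be applied one at a time with `SparkSession.builder.config(key, value)` before the session is created.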