These tests validate running a Spark bundle application on the platform, using Delta Lake as its modern data stack storage. Dell Technologies used the Apache Spark 3.4.1 build prebuilt for Hadoop, with Delta Lake 2.4.0 and 3.0.0rc1.
Dell PowerScale and Dell ECS served as the storage for the Delta Lake modern data stack, holding the data for read and write operations. Dell Technologies also validated Spark with an NVIDIA GPU on this platform.
Spark can read and write data using different API protocols, such as:
- Hadoop (hdfs://<IP>)
- S3 object storage protocol (s3a://<bucket Name>/)
The tests used the hadoop-aws:3.2.3 library to access the data on ECS through the S3 API protocol.
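
As a minimal sketch of how the two protocols above might be wired into a Spark session, the following Python helper builds the Spark settings and the table URIs. It is illustrative only and is not the configuration Dell Technologies used. The endpoint, credentials, host, bucket, and path values are hypothetical placeholders, and the helper names (`delta_spark_conf`, `hdfs_uri`, `s3a_uri`, `build_session`, `roundtrip`) are invented for this example. It assumes the Delta Lake 2.4.0 Maven coordinate `io.delta:delta-core_2.12:2.4.0`, and that path-style S3 access is appropriate for the object store.

```python
def delta_spark_conf(s3_endpoint, access_key, secret_key):
    """Return Spark settings for Delta Lake plus S3A access (assumed values)."""
    return {
        # Delta Lake 2.4.0 and the hadoop-aws version named in the text.
        "spark.jars.packages": (
            "io.delta:delta-core_2.12:2.4.0,"
            "org.apache.hadoop:hadoop-aws:3.2.3"
        ),
        # Standard Delta Lake session extension and catalog.
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog":
            "org.apache.spark.sql.delta.catalog.DeltaCatalog",
        # S3A connector settings; the endpoint and keys are placeholders.
        "spark.hadoop.fs.s3a.endpoint": s3_endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Assumption: the object store is addressed path-style.
        "spark.hadoop.fs.s3a.path.style.access": "true",
    }


def hdfs_uri(host, path):
    """Build an hdfs://<IP>/... URI for a table path."""
    return f"hdfs://{host}/{path.lstrip('/')}"


def s3a_uri(bucket, key):
    """Build an s3a://<bucket>/... URI for a table path."""
    return f"s3a://{bucket}/{key.lstrip('/')}"


def build_session(conf, app_name="delta-on-dell-storage"):
    """Create a SparkSession from the settings dict (requires pyspark)."""
    from pyspark.sql import SparkSession  # imported lazily on purpose

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def roundtrip(spark, uri):
    """Write a tiny Delta table to `uri` and read it back."""
    spark.range(5).write.format("delta").mode("overwrite").save(uri)
    return spark.read.format("delta").load(uri)
```

With a reachable cluster, a call such as `roundtrip(build_session(delta_spark_conf(endpoint, key, secret)), s3a_uri("my-bucket", "tables/demo"))` would exercise the S3 path, and passing `hdfs_uri("<IP>", "tables/demo")` instead would exercise the Hadoop path. The same Delta read and write calls apply to both; only the URI scheme changes.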