The platform supports multiple options for modern data stack storage using the protocols that are shown in the table below.
| Storage option | Protocol |
| --- | --- |
| PowerScale | hdfs:// (with Spark) and s3a:// (with Starburst) |
| ECS | s3a:// |
| ObjectScale | s3a:// |
These storage systems are managed and scaled independently from the core platform, providing a decoupled storage and compute architecture. A single deployment can use more than one of these options.
Applications can connect to one or more modern data stack storage options directly over the network from application-level code, using any of the following:
- The Apache Hadoop client libraries (hadoop-hdfs-client) provide the hdfs:// protocol.
- The Apache Hadoop AWS libraries (hadoop-aws) provide the s3a:// protocol.
- Other language libraries which implement the s3a:// or hdfs:// protocols.
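As an illustration of how the protocol of a storage URI determines which Hadoop library an application needs, the following is a minimal sketch in Python. The function names, the example host and bucket, and the version string are assumptions made for illustration only. The Maven group `org.apache.hadoop` and Spark's `spark.jars.packages` setting are standard, but the correct library version depends on the Hadoop build in use.

```python
from urllib.parse import urlparse

# Hadoop artifact that provides each protocol, as listed above.
ARTIFACT_BY_SCHEME = {
    "hdfs": "hadoop-hdfs-client",
    "s3a": "hadoop-aws",
}


def required_artifacts(uris):
    """Return the sorted Hadoop artifacts needed to open the given URIs."""
    needed = set()
    for uri in uris:
        scheme = urlparse(uri).scheme
        if scheme not in ARTIFACT_BY_SCHEME:
            raise ValueError(f"unsupported storage protocol in {uri!r}")
        needed.add(ARTIFACT_BY_SCHEME[scheme])
    return sorted(needed)


def spark_packages(uris, hadoop_version):
    """Build a comma-separated value suitable for spark.jars.packages."""
    return ",".join(
        f"org.apache.hadoop:{name}:{hadoop_version}"
        for name in required_artifacts(uris)
    )


# Hypothetical locations; the version is a placeholder, not a recommendation.
packages = spark_packages(
    ["hdfs://namenode.example.com:8020/raw", "s3a://curated/events"],
    "3.3.4",
)
```

A helper like this can fail fast when a job is pointed at a location whose protocol has no client library in the image or application bundle.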
Applications can access multiple modern data stack storage backends during:
- Heterogeneous queries
- Scenarios like reading source data from one object system while loading it into a Delta Lake table on a different system
Iceberg and Delta Lake are limited to one backend storage system per table.
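One way to address two S3-compatible backends from a single Spark application is the per-bucket configuration supported by the Hadoop S3A connector (`fs.s3a.bucket.<bucket>.*`). The sketch below only builds the property names. The bucket names and endpoints are hypothetical, and the path-style default is an assumption that should be checked against the target storage system.

```python
def per_bucket_s3a_conf(backends):
    """Map {bucket: settings} to Spark properties using S3A per-bucket keys."""
    conf = {}
    for bucket, settings in backends.items():
        prefix = f"spark.hadoop.fs.s3a.bucket.{bucket}."
        conf[prefix + "endpoint"] = settings["endpoint"]
        # Assumption: path-style requests; confirm what the target store expects.
        path_style = settings.get("path_style", True)
        conf[prefix + "path.style.access"] = str(path_style).lower()
    return conf


# Hypothetical buckets and endpoints, one per storage system.
conf = per_bucket_s3a_conf({
    "source-data": {"endpoint": "https://ecs.example.com"},
    "delta-tables": {"endpoint": "https://objectscale.example.com"},
})
```

Each property could then be passed to a Spark session builder with `.config(key, value)`. A job configured this way could read from `s3a://source-data/` and write to `s3a://delta-tables/`, with each Iceberg or Delta Lake table still residing entirely on one backend.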
Depending on the application and its implementation, the necessary client libraries may need to be included in the images or the application bundle. The application must handle authentication to any external storage.
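Because authentication is left to the application, credentials are often injected at runtime instead of being stored in code. The following is a minimal sketch, assuming the access key and secret key arrive in the conventional `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables. The function name is hypothetical, and other credential mechanisms may suit a given deployment better.

```python
import os


def s3a_credentials_conf(environ=os.environ):
    """Read S3 credentials from the environment and return Spark properties."""
    try:
        access_key = environ["AWS_ACCESS_KEY_ID"]
        secret_key = environ["AWS_SECRET_ACCESS_KEY"]
    except KeyError as missing:
        # Fail early with a clear message instead of at first storage access.
        raise RuntimeError(f"missing credential variable: {missing}") from None
    return {
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
    }
```

The returned properties use the standard S3A option names `fs.s3a.access.key` and `fs.s3a.secret.key`, prefixed with `spark.hadoop.` so that Spark forwards them to the Hadoop configuration.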