Overview
To convert data files to the Delta Lake format, we must build an Apache Spark image that includes the Delta Lake libraries.
- A Spark pod is deployed in OpenShift by using a YAML manifest file. In this scenario, the manifest references a custom Spark image stored in a private image registry, as sketched below.
Figure 19. Spark pod yaml manifest
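The exact manifest depends on the environment. The following is a minimal sketch of such a pod manifest; the pod name, namespace, registry path, image tag, and pull secret are illustrative assumptions, not values from the original deployment.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: spark-delta        # assumed pod name, used in the oc commands below
  namespace: spark         # assumed namespace
spec:
  containers:
    - name: spark
      # Hypothetical private-registry path; replace with your registry and image tag
      image: registry.example.internal/spark/spark-delta:3.4.1
      command: ["sleep", "infinity"]   # keep the pod alive for interactive use
  imagePullSecrets:
    - name: private-registry-credentials   # assumed pull secret for the private registry
```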
- After the pod has been deployed, verify that it is running with the oc command shown below.
Figure 20. Spark pod status
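A status check along these lines confirms that the pod has reached the Running state; the pod name and namespace are the assumed values from the manifest sketch above.

```shell
# Confirm the Spark pod is Running
# (pod name and namespace are assumptions from the manifest sketch)
oc get pod spark-delta -n spark
```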
We can now connect to the Spark pod and start a PySpark shell by running the pyspark command, supplying the ObjectScale S3 API endpoint along with the access key and secret key.
Figure 21. Spark running pyspark commands
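The exact invocation depends on the Spark and Delta Lake versions baked into the custom image. The following sketch assumes Spark 3.4 with Delta Lake 2.4, assumes the image already contains the Hadoop S3A connector, and uses placeholder values for the ObjectScale endpoint and credentials.

```shell
# Open a shell inside the Spark pod (pod name and namespace are assumptions)
oc exec -it spark-delta -n spark -- /bin/bash

# Start PySpark with the Delta Lake package and S3A settings for ObjectScale.
# Endpoint, access key, and secret key below are placeholders.
pyspark \
  --packages io.delta:delta-core_2.12:2.4.0 \
  --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
  --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
  --conf spark.hadoop.fs.s3a.endpoint=https://objectscale.example.internal:443 \
  --conf spark.hadoop.fs.s3a.access.key=<ACCESS_KEY> \
  --conf spark.hadoop.fs.s3a.secret.key=<SECRET_KEY> \
  --conf spark.hadoop.fs.s3a.path.style.access=true \
  --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
```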
When the command runs successfully, the PySpark shell welcome banner appears, as shown in the following figure.
Figure 22. Spark shell
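From this shell, the conversion described at the start of this section is a single write in Delta format. The following is an illustrative sketch only; the bucket names, paths, and Parquet source format are assumptions, not values from the original environment.

```python
# Inside the PySpark shell, the `spark` session is already defined.
# Read a source file from an ObjectScale bucket (placeholder bucket and path)
df = spark.read.parquet("s3a://source-bucket/data/sales.parquet")

# Rewrite the data in Delta Lake format (placeholder target bucket and path)
df.write.format("delta").mode("overwrite").save("s3a://delta-bucket/sales_delta")

# Read the table back through the Delta reader to confirm the conversion
spark.read.format("delta").load("s3a://delta-bucket/sales_delta").show(5)
```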