This chapter describes a reference architecture that Dell EMC considers appropriate for general-purpose data analytics with Apache Spark, covering all stages of an analytics pipeline.
Dell EMC used an inventory management scenario as the example, but the same environment can support many other analytics scenarios.
Apache Spark is a unified analytics engine for large-scale data processing. It is a flexible, general-purpose analytics framework with language APIs for Scala, Java, Python, and R developers. Because Spark runs on the Java Virtual Machine (JVM), it can run on many processor architectures and is compatible with multiple operating systems. It supports several cluster managers for parallel execution, reads from diverse data sources, and can be extended with additional libraries. However, this flexibility requires you to make many infrastructure and environment decisions.
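
To make the kind of decision described above concrete, the following minimal sketch records a few environment choices and derives the Spark master URL that corresponds to the chosen cluster manager. It is an illustration only, not part of the Dell EMC reference architecture: the `SparkEnvironment` class, its field names, and the example host names are hypothetical. The master URL forms (`local[*]`, `spark://host:port`, `yarn`, `k8s://https://host:port`) are the ones Spark accepts for its `spark.master` setting.

```python
from dataclasses import dataclass

# Master URL forms accepted by Spark for the spark.master setting.
MASTER_URL_TEMPLATES = {
    "local": "local[*]",                          # all cores on a single machine
    "standalone": "spark://{host}:{port}",        # Spark's built-in cluster manager
    "yarn": "yarn",                               # cluster is located via Hadoop config
    "kubernetes": "k8s://https://{host}:{port}",  # Kubernetes API server
}


@dataclass(frozen=True)
class SparkEnvironment:
    """Hypothetical record of a few environment decisions (illustration only)."""

    cluster_manager: str       # one of the keys in MASTER_URL_TEMPLATES
    api_language: str          # "scala", "java", "python", or "r"
    data_format: str           # for example "parquet", "csv", or "jdbc"
    host: str = "localhost"    # assumed placeholder host
    port: int = 7077           # default port of a standalone Spark master

    def master_url(self) -> str:
        """Return the master URL for the chosen cluster manager."""
        if self.cluster_manager not in MASTER_URL_TEMPLATES:
            raise ValueError(f"unknown cluster manager: {self.cluster_manager}")
        template = MASTER_URL_TEMPLATES[self.cluster_manager]
        # Templates without placeholders (local, yarn) are returned unchanged.
        return template.format(host=self.host, port=self.port)

    def spark_conf(self) -> dict:
        """Return the configuration entries implied by these decisions."""
        return {"spark.master": self.master_url()}
```

For example, `SparkEnvironment("standalone", "python", "parquet", host="spark-master.example").master_url()` yields `spark://spark-master.example:7077`, while choosing `"yarn"` yields the bare string `yarn`, because in that case the cluster location is taken from the Hadoop configuration rather than from the URL.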