The analytics pipeline is a useful metaphor for the stages that data engineers and data scientists work through:
- Data ingestion
- Data cleansing and transformation
- Data merging and testing
Dell EMC developed a story to explain both the challenges and the most successful approaches for each of these steps in the analytics pipeline. That required a dataset large and complex enough to showcase real-world challenges and solutions, without incurring the cost and time of solving an enterprise-class problem.
The story describes a machine learning, model-based approach to inventory management for a retailer with hundreds of thousands of individual Stock Keeping Units (SKUs). By developing the individual pieces against a simplified dataset and simplified requirements, it shows how Spark and related technologies provide a complete solution to real-world data analytics problems.
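
To make the pipeline stages listed above concrete, here is a minimal sketch on a toy SKU dataset. It is an illustration only, not the Dell EMC implementation: it uses plain Python rather than Spark so that it runs anywhere, and every name and value in it (`SALES_CSV`, `CATALOG_CSV`, `ingest`, `cleanse`, `merge`, `run_pipeline`, the sample rows) is hypothetical.

```python
import csv
import io

# Hypothetical raw inputs, inlined as CSV text so the sketch is self-contained.
SALES_CSV = """sku,units_sold
A100,5
a100 ,3
B200,
C300,7
"""

CATALOG_CSV = """sku,description
A100,Widget
B200,Gadget
"""


def ingest(text):
    """Ingestion: parse raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def cleanse(rows):
    """Cleansing and transformation: normalize SKUs, drop rows with no
    quantity, and cast quantities to integers."""
    cleaned = []
    for row in rows:
        units = row["units_sold"].strip()
        if not units:
            continue  # skip rows with a missing quantity
        cleaned.append({"sku": row["sku"].strip().upper(),
                        "units_sold": int(units)})
    return cleaned


def merge(sales, catalog):
    """Merging: total units per SKU, inner-joined to catalog descriptions."""
    descriptions = {r["sku"].strip().upper(): r["description"] for r in catalog}
    totals = {}
    for row in sales:
        totals[row["sku"]] = totals.get(row["sku"], 0) + row["units_sold"]
    return {sku: {"description": descriptions[sku], "units_sold": total}
            for sku, total in totals.items() if sku in descriptions}


def run_pipeline():
    """Chain the stages: ingest, cleanse, then merge."""
    return merge(cleanse(ingest(SALES_CSV)), ingest(CATALOG_CSV))


if __name__ == "__main__":
    result = run_pipeline()
    # Testing: check the merged output against a known expectation.
    assert result["A100"]["units_sold"] == 8
    print(result)
```

Keeping each stage as a separate function mirrors the idea of developing individual pieces independently: each one can be tested on a small sample before being swapped for a distributed equivalent.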