This use case story begins with a fictitious retail company whose catalog contains hundreds of thousands of SKUs, grouped into five market segments.
Their sales demand forecast, used for product ordering, is based on a manual roll-up of segment managers' estimates across all sales areas. These estimates draw on experience and knowledge of local conditions. The process is slow and often produces incomplete forecasts when staff miss submission deadlines. The company is also experiencing high inventory carrying costs: area managers, segment managers, and purchasers all tend to overestimate demand in order to avoid stock-outs.
Management wants to add a model-based demand forecasting option to the planning process, based on data from their sales order and supply purchasing systems. They were told that estimating an individual model for each product is challenging, given the number of SKUs they manage and the sales sparsity of many catalog items. The company hired an inventory management consultant who suggested hierarchical forecasting, a process common in organizations with this many products. The consultant also noted that the company may need new technology to implement the new modeling system.
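To make the consultant's suggestion concrete, here is a minimal sketch of one common hierarchical approach, top-down forecasting: aggregate the sparse SKU history up to the segment level, forecast the segment totals, and then disaggregate the forecast back to SKUs in proportion to their historical share of segment sales. The column names (`segment`, `sku`, `month`, `units`) and the trailing-mean "model" are assumptions for illustration only, not the company's actual schema or method.

```python
import pandas as pd

def top_down_forecast(history: pd.DataFrame) -> pd.DataFrame:
    """Forecast segment totals, then split them across SKUs by historical share."""
    # 1. Aggregate sparse SKU history up to the segment level.
    segment_hist = (
        history.groupby(["segment", "month"], as_index=False)["units"].sum()
        .sort_values("month")
    )

    # 2. Forecast each segment with a trailing 3-month mean (a stand-in for a real model).
    segment_fcst = (
        segment_hist.groupby("segment")["units"]
        .apply(lambda s: s.tail(3).mean())
        .rename("segment_forecast")
        .reset_index()
    )

    # 3. Compute each SKU's historical share of its segment total.
    sku_totals = history.groupby(["segment", "sku"], as_index=False)["units"].sum()
    sku_totals["share"] = (
        sku_totals["units"] / sku_totals.groupby("segment")["units"].transform("sum")
    )

    # 4. Disaggregate the segment forecast back down to SKU level.
    out = sku_totals.merge(segment_fcst, on="segment")
    out["sku_forecast"] = out["share"] * out["segment_forecast"]
    return out[["segment", "sku", "sku_forecast"]]


# Tiny illustrative history: two SKUs in one segment, one selling sparsely.
hist = pd.DataFrame({
    "segment": ["A", "A", "A", "A", "A", "A"],
    "sku":     ["A1", "A2", "A1", "A2", "A1", "A2"],
    "month":   ["2024-01", "2024-01", "2024-02", "2024-02", "2024-03", "2024-03"],
    "units":   [10, 2, 12, 0, 11, 1],
})
print(top_down_forecast(hist))
```

Top-down is only one reconciliation strategy; bottom-up, middle-out, and optimal-reconciliation methods trade off accuracy at different levels of the hierarchy, and choosing among them would be part of the consultant's design work.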
The company has extensive experience with enterprise-class relational database management systems, including a Massively Parallel Processing (MPP) database. They have recently started using a Hadoop Distributed File System (HDFS) data lake to offload some analytics from the overloaded Relational Database Management System (RDBMS), and Spark is the most popular tool for accessing data in HDFS. Management wants any new analytics processing for the inventory planning system to be developed primarily with Hadoop and Spark, if possible; IT management wants to avoid bringing in new technology silos.
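As a rough sketch of how this could fit the existing stack, the PySpark snippet below reads order history from the HDFS data lake and rolls it up to the segment-by-month totals that a hierarchical model would start from. The HDFS paths, the Parquet format, and the column names are assumptions for illustration, not the company's actual layout.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("demand-forecast-prep").getOrCreate()

# Hypothetical location of order history in the HDFS data lake.
orders = spark.read.parquet("hdfs:///data/sales/orders/")

# Roll sparse SKU-level orders up to segment-by-month totals.
monthly_by_segment = (
    orders
    .withColumn("month", F.date_trunc("month", F.col("order_date")))
    .groupBy("segment", "month")
    .agg(F.sum("units").alias("units"))
)

# Persist the aggregated input for the forecasting step.
monthly_by_segment.write.mode("overwrite").parquet(
    "hdfs:///data/forecast/segment_monthly/"
)
```

Keeping this preparation step in Spark keeps the heavy scans off the overloaded RDBMS and stays within the tools IT already supports, rather than introducing a new technology silo.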