A product recommendation engine is a model, or a group of models, that provides decision support by identifying products that a customer is likely to find appealing. The inputs, which serve as the features for training the model, can include the product that the customer is currently viewing or a history of previous purchases.
In the Dell EMC AI Innovation Lab, we tested several approaches for building our recommendation engine: Alternating Least Squares, Neural Collaborative Filtering, Random Forest, and a “Wide and Deep” structured neural network. Using Domino Data Science Platform, our data scientist was able to train all of the model candidates simultaneously, each with its own hardware and library dependencies. For example, the Neural Collaborative Filtering model was built using the Intel Analytics Zoo library and trained on our Hadoop cluster with Apache Spark, while the Random Forest techniques were run on executors running entirely on the compute grid.
The ability to connect to Apache Spark clusters from Domino Data Science Platform enables data science teams to access the data that is stored in Hadoop without copying it to multiple locations. For client mode executions of Apache Spark jobs, the driver runs on the compute grid and offloads the processing to the Spark executors, as shown in the following figure. YARN manages the resources on the Hadoop cluster, and results are returned to the driver.
Figure 8. Apache Spark and Domino Data Science Platform
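As a rough illustration of a client mode execution, the sketch below assembles a standard `spark-submit` command that keeps the driver on the submitting host and requests executors from YARN. How Domino Data Science Platform launches Spark jobs internally is not described here, so treat this only as a generic example. The script name `train_recommender.py` and the executor sizing values are hypothetical.

```python
import shlex


def build_spark_submit(app_path, num_executors=4, executor_memory="8g", executor_cores=2):
    """Assemble a spark-submit command for a YARN client mode run."""
    return [
        "spark-submit",
        "--master", "yarn",          # YARN manages resources on the Hadoop cluster
        "--deploy-mode", "client",   # the driver stays on the submitting host
        "--num-executors", str(num_executors),
        "--executor-memory", executor_memory,
        "--executor-cores", str(executor_cores),
        app_path,
    ]


# Hypothetical training script and default sizing, for illustration only.
command = build_spark_submit("train_recommender.py")
print(shlex.join(command))
```

With `--deploy-mode client`, the driver process runs where the command is issued, which corresponds to the compute grid in the figure, while the executors run on the Hadoop cluster and return results to the driver.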
For more information about the integrations between Apache Spark and Domino Data Science Platform, see this blog post.