Overview

Of the 30 queries in TPCx-BB, five (Q5, Q20, Q25, Q26, and Q28) run widely used supervised and unsupervised machine learning (ML) algorithms as part of their workflow, including K-Means, Logistic Regression, and Naïve Bayes Classification.

The following table describes the five queries that run ML algorithms to model realistic scenarios from a retail environment:

Table 1. Description of queries using ML algorithms
Query	Description
Q5	Uses Logistic Regression to build a model to predict which users are interested in each item category based on existing users' online activities.
Q20	Groups customers based on their purchase return behavior, including frequency of purchase returns, return order ratio (total number of orders partially or fully returned compared with the total number of orders), return item ratio (total number of items returned compared with the number of items purchased), and return amount ratio (total monetary amount of items returned compared with the amount purchased).
Q25	Groups customers based on how often they visit the store by considering the recency of their last visit, frequency of visits, and monetary amount spent on each visit.
Q26	Clusters customers into book buddies or club groups based on their in-store book purchasing histories.
Q28	Builds a model to classify the sentiment of each user review as Positive, Negative, or Neutral.
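
The Q20 description defines its features as ratios, which is easier to see as arithmetic. The following is a minimal sketch, not the benchmark's implementation: it computes the four return-behavior features for hypothetical customers and groups them with a plain K-means loop, assuming K-Means (one of the algorithms listed above) as the clustering method. The CustomerReturns fields, the sample data, and k=2 are all illustrative assumptions.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class CustomerReturns:
    """Hypothetical per-customer totals; field names are illustrative only."""
    orders: int             # total number of orders
    returned_orders: int    # orders partially or fully returned
    items: int              # number of items purchased
    returned_items: int     # number of items returned
    amount: float           # monetary amount purchased
    returned_amount: float  # monetary amount returned
    returns: int            # frequency of purchase returns


def ratio(part: float, whole: float) -> float:
    """Safe division: a customer with no purchases gets a ratio of 0."""
    return part / whole if whole else 0.0


def return_features(c: CustomerReturns) -> list[float]:
    """[return frequency, return order ratio, return item ratio, return amount ratio]"""
    return [
        float(c.returns),
        ratio(c.returned_orders, c.orders),
        ratio(c.returned_items, c.items),
        ratio(c.returned_amount, c.amount),
    ]


def kmeans(points: list[list[float]], k: int, iterations: int = 20, seed: int = 0):
    """Lloyd's K-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members,
        # keeping the previous centroid if a cluster ends up empty.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical customers: two rarely return items, two return often.
customers = [
    CustomerReturns(10, 0, 40, 0, 500.0, 0.0, 0),
    CustomerReturns(12, 1, 50, 1, 600.0, 10.0, 1),
    CustomerReturns(8, 6, 30, 20, 400.0, 300.0, 7),
    CustomerReturns(9, 7, 35, 25, 450.0, 350.0, 8),
]
features = [return_features(c) for c in customers]
centroids, labels = kmeans(features, k=2)
print(labels)
```

With this data, the first two customers end up in one cluster and the last two in the other, whichever pair of starting centroids the seed picks. Note that the raw return frequency is on a larger scale than the ratios and therefore dominates the distance; a real pipeline would typically standardize the features before clustering.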

In addition to these five ML queries, four queries perform Natural Language Processing (NLP) tasks. These tasks model business decision-making scenarios such as identifying physical stores with flat or declining sales over four consecutive months and checking whether they have any negative reviews, or analyzing online user reviews for negative reviews of the items with the highest number of returns across all stores (physical and online).
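
The first scenario combines a sales-trend filter with a review check. The sketch below is one hedged interpretation, not the benchmark's definition: it assumes "flat or declining" means that, within some run of four consecutive months, each month's sales are no higher than the previous month's. The store names, sales figures, and review labels are hypothetical; in the benchmark, review sentiment would come from an NLP step, whereas here the labels are simply given.

```python
def flat_or_declining(monthly_sales: list[float], window: int = 4) -> bool:
    """True if some run of `window` consecutive months never increases.

    Assumption: "flat or declining" = month-over-month sales never go up.
    """
    for start in range(len(monthly_sales) - window + 1):
        run = monthly_sales[start:start + window]
        if all(later <= earlier for earlier, later in zip(run, run[1:])):
            return True
    return False


def stores_to_review(sales_by_store, reviews_by_store):
    """Stores with flat/declining sales that also have a negative review."""
    return sorted(
        store
        for store, sales in sales_by_store.items()
        if flat_or_declining(sales)
        and "negative" in reviews_by_store.get(store, [])
    )


# Hypothetical monthly sales and pre-labeled review sentiment per store.
sales = {
    "store_a": [100, 100, 95, 90, 92],  # flat, then declining for four months
    "store_b": [80, 85, 90, 95, 100],   # growing
    "store_c": [70, 65, 60, 60],        # declining, but no negative reviews
}
store_reviews = {
    "store_a": ["positive", "negative"],
    "store_b": ["negative"],
    "store_c": ["positive"],
}
print(stores_to_review(sales, store_reviews))  # ['store_a']
```

A stricter reading (for example, a non-positive slope from a regression over the period) would change only flat_or_declining; the join against review sentiment stays the same.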

The following figure shows the workflow of the queries that include an ML model training step:

Figure 2. Reference workflow for queries that include ML model training as part of their workflow (that is, Q5 and Q28). Data flows from the data warehouse (left) through the training dataset, testing dataset, and ML model, and then to data postprocessing.
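
The stages in Figure 2 can be sketched end to end for a Q28-style sentiment task. This is a minimal illustration under stated assumptions, not the benchmark's pipeline: it uses a hand-written multinomial Naïve Bayes classifier (Naïve Bayes is among the algorithms listed above), a lowercase whitespace tokenizer, a shuffled 75/25 train/test split, and a handful of hypothetical reviews standing in for the data warehouse. Accuracy and predicted-label counts stand in for the postprocessing step.

```python
import math
import random
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def train_nb(examples):
    """Fit multinomial Naive Bayes counts from (text, label) pairs."""
    class_counts = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)  # label -> token -> count
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, text: str):
    """Pick the label with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen in training
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def run_workflow(rows, test_fraction: float = 0.25, seed: int = 0):
    """Split -> train -> predict on the testing dataset -> post-process."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    train_rows, test_rows = shuffled[:cut], shuffled[cut:]
    model = train_nb(train_rows)
    predictions = [predict_nb(model, text) for text, _ in test_rows]
    # Post-processing: accuracy on held-out reviews and predicted-label counts.
    correct = sum(pred == label for pred, (_, label) in zip(predictions, test_rows))
    accuracy = correct / len(test_rows) if test_rows else 0.0
    return model, accuracy, Counter(predictions)


# Hypothetical labeled reviews standing in for rows from the data warehouse.
reviews = [
    ("great product works well", "Positive"),
    ("love it great value", "Positive"),
    ("terrible quality broke fast", "Negative"),
    ("awful waste of money", "Negative"),
    ("it is okay average item", "Neutral"),
    ("average nothing special okay", "Neutral"),
]
model, accuracy, label_counts = run_workflow(reviews)
print(accuracy, label_counts)
```

Working in log space avoids underflow when many token probabilities are multiplied, and add-one smoothing keeps a token that never appeared with a given label from zeroing out that label's score. A Q5-style query would follow the same split, train, test, and postprocess shape with a logistic regression model in place of the classifier.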
