Having defined issue taxonomy and a comprehensive set of labels, the next step builds an ML training dataset.
As mentioned in the Introduction, Dell supports agents use CRM to record customer issues, queries, requests, and other important diagnostic/troubleshooting notes, which are known as “Case Logs.” Case logs are free text entered by agents that contain information about customer-facing issues, diagnostic steps, and resolution recommendations. Therefore, we decided to use these logs for training our ICC model. The logs are saved in a database along with other key information related to products and troubleshooting activities.
To build the training dataset, two approaches were used:
The results are multiclass datasets for consumer issues with approximately 24 K records in English, Mandarin, and Portuguese collected during 2019 to 2020. Each record primarily consists of two textual fields, which make the source: log title, log description and the target into one or more standardized T1-T2-T3 labels. In addition, a primary key was also stored for traceability – linking to each diagnostic/communication activity.