From a business standpoint, DTS's improvements in semantic-search-based PSQN article retrieval have resulted in significant cost savings across various domains. These savings include reduced manual work hours, fewer dispatches, and faster handle times. In the support space, several factors contribute to efficient and satisfactory service delivery.
SMEs support tech-support agents through the NBA tool, ensuring that essential documents are accessible and validated. These tasks involve manual symptom/keyword generation, downstream validation of affected products, and validation of online articles. Collectively, these manual validation processes previously consumed over 500 hours annually. DTS's model has effectively eliminated the need for these activities, resulting in substantial time savings.
As a large tech corporation selling products and services globally, DTS encounters significant operational costs and a high volume of users seeking technical support, with issues ranging from malfunctioning peripherals to faulty motherboards. As a first line of support, DTS's primary aim is to resolve issues through "soft troubleshooting," which addresses problems without hardware replacement, favoring software-based solutions wherever applicable. While hardware dispatches can resolve certain specific issues, dispatching with low confidence can incur unnecessary expenses such as shipping fees or write-offs.
In the consumer space, where over 1,900 different product models exist, the need for focused and efficient troubleshooting guides is important for support agents. By reducing “troubleshooting noise” (irrelevant documents or articles) by approximately 20 percent, users can quickly access relevant answers without sifting through extensive documentation.
Given the number of support agents, average support cases, and overall support costs, along with annual dispatch costs of approximately $60 million, this solution addresses approximately 12 percent of these cases, resulting in estimated annual savings of approximately $7 million (roughly 12 percent of the $60 million dispatch cost).
Semantic search, which enables advanced natural language understanding, has transformed how DTS interprets user queries. Unlike lexical search, which strictly depends on keyword matching, semantic search considers the context and intent behind a query. This nuanced approach significantly enhances the accuracy of search results, ensuring that the articles retrieved are directly relevant to the specific issues described by customers.
Consider a customer query like "no video," which in a lexical search might return less relevant articles such as "4K video playback stuttering on XPS 15 9520 and Precision 5570 Computers." It could also return "Low Audio Volume from Internal Speakers on Precision 5570," where an article about audio issues might erroneously rank in the top three due to keyword overlap. Semantic search, however, discerns that the user is likely experiencing issues with video output rather than video file playback or audio problems. It prioritizes articles like "Troubleshooting Video Output Issues on Dell Laptops," directly addressing the customer's actual concern.
As a result, support agents equipped with semantic search capabilities can resolve customer issues more swiftly and accurately, markedly enhancing the user experience and operational efficiency.
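The "no video" example above can be sketched in miniature. The hand-crafted three-dimensional vectors below stand in for a learned sentence-embedding model (the axes, display-output vs. media-playback vs. audio, are purely illustrative, not DTS's actual representation); ranking candidates by cosine similarity surfaces the video-output article even though all three titles share lexical overlap with "video" or the product name.

```python
import math

# Toy "embeddings" standing in for a learned sentence-embedding model.
# Hypothetical axes: [display-output, media-playback, audio].
EMBEDDINGS = {
    "no video": [0.9, 0.2, 0.0],
    "Troubleshooting Video Output Issues on Dell Laptops": [0.95, 0.1, 0.0],
    "4K video playback stuttering on XPS 15 9520 and Precision 5570 Computers": [0.1, 0.9, 0.0],
    "Low Audio Volume from Internal Speakers on Precision 5570": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank_articles(query):
    """Rank candidate articles by semantic similarity to the query."""
    q = EMBEDDINGS[query]
    titles = [t for t in EMBEDDINGS if t != query]
    return sorted(titles, key=lambda t: cosine(q, EMBEDDINGS[t]), reverse=True)

ranked = rank_articles("no video")
print(ranked[0])  # the video-output article ranks first
```

A purely lexical scorer would instead reward the playback and audio titles for their shared keywords; the embedding geometry is what lets intent, not vocabulary, drive the ranking.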
Technological advancements in Machine Learning and Natural Language Processing, specifically LLMs, allow for efficient content generation, keyword enhancement, and comprehensive understanding of textual data. By employing LLMs, organizations can significantly expedite development by generating high-quality content swiftly. This rapid prototyping capability accelerates product innovation to meet customer demands and fosters a competitive edge in the marketplace, driving business growth.
Data quality and availability are central to ML systems. Like many others, DTS's system leverages data from one business unit to benefit another, and alignment between those units is key to its success. This was demonstrated during DTS's data-understanding and article-tagging efforts, which required alignment on specific taxonomy data fields. Several stages of development required similar alignment, including bringing data into accessible data stores for processing, centralizing the source data, and coordinating its ongoing maintenance in line with the system's expectations.
The use of LLMs for summarizing KB articles has led to several key improvements in precision, adaptability, and searchability. LLMs accurately distill essential information from the articles, resulting in concise summaries that reduce redundancy and focus on the most critical aspects of the content. Moreover, the creation of concise summaries and well-defined keywords has greatly improved the searchability of the articles. DTS can efficiently find the most relevant articles for a query based on the summaries generated by the model. Although automated summarization has reduced manual labor and enhanced processing speeds, summaries may occasionally lose precision, particularly when dealing with complex technical content. In future iterations, DTS plans to mitigate these challenges by employing alternative automated approaches, and a human-in-the-loop strategy that integrates automated outputs with expert reviews. This approach aims to enhance the accuracy and adaptability of generated summaries while preserving the advantages of automation.
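As a minimal sketch of the human-in-the-loop direction, a router might flag automated summaries whose compression looks risky for expert review. Everything here is hypothetical: `route_summary`, the ratio bounds, and the heuristic itself are illustrative, not DTS's actual review criteria; the LLM call is abstracted behind a `summarize` callable.

```python
def route_summary(article, summarize, min_ratio=0.05, max_ratio=0.5):
    """Summarize a KB article with the supplied LLM callable and flag
    suspicious compression ratios for expert review (hypothetical heuristic:
    a summary that is implausibly short, or barely shorter than the source,
    may have lost or retained too much)."""
    summary = summarize(article)
    ratio = len(summary) / max(len(article), 1)
    needs_review = not (min_ratio <= ratio <= max_ratio)
    return summary, needs_review

# Usage with a stub standing in for a real LLM call:
stub = lambda text: text[:60]
summary, needs_review = route_summary("x" * 1000, stub)
print(needs_review)  # ratio 0.06 falls inside [0.05, 0.5] -> False
```

In practice the review signal could combine several checks (entity coverage, factual-consistency scores); the point is only that automated output and expert attention can be composed rather than traded off.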
While global thresholding is simple to implement, it has several limitations: it struggles to handle diverse queries effectively, to adapt to different relevance requirements, and to maintain consistent performance. These limitations make global thresholding less suitable for retrieving PSQNs in a dynamic and varied tech-troubleshooting domain. A single threshold cannot account for the varying levels of relevance and specificity required by different queries: some queries need a higher threshold to ensure only highly relevant PSQNs are retrieved, while others require a lower threshold to include more general information. For instance, the query "Error code 404 on Product X" demands highly relevant PSQNs, suggesting a higher threshold, whereas the more generic query "Slow performance on Product Y" benefits from a broader range of PSQNs, suggesting a lower threshold. Lack of customization, inconsistent relevance of retrieved PSQNs, and inflexibility in adapting to changes in data distribution over time are further limitations of the global approach. A global threshold can also underfit some queries and overfit others, eventually reducing agent and customer satisfaction. Conversely, dynamic thresholding adapts to the specifics of the data, providing a tailored and effective approach to PSQN retrieval with improved relevance.
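One common form of dynamic thresholding (the paper does not specify DTS's exact rule, so `alpha` and `floor` below are illustrative assumptions) is to cut off relative to the best-scoring hit: a specific query with one dominant match gets a tight cutoff, while a generic query with diffuse scores gets a looser one.

```python
def filter_hits(scored_hits, alpha=0.8, floor=0.3):
    """Dynamic thresholding sketch: keep PSQNs scoring within `alpha` of the
    best hit, subject to an absolute `floor`. A dominant top hit tightens the
    effective cutoff; diffuse scores relax it toward the floor."""
    if not scored_hits:
        return []
    top = max(score for _, score in scored_hits)
    cutoff = max(alpha * top, floor)
    return [(doc, s) for doc, s in scored_hits if s >= cutoff]

# A specific query ("Error code 404 on Product X") with one dominant hit:
specific = [("KB-101", 0.92), ("KB-102", 0.55), ("KB-103", 0.40)]
print(filter_hits(specific))  # cutoff 0.736 -> only KB-101 survives

# A generic query ("Slow performance on Product Y") with diffuse scores:
generic = [("KB-201", 0.48), ("KB-202", 0.45), ("KB-203", 0.31)]
print(filter_hits(generic))   # cutoff 0.384 -> KB-201 and KB-202 survive
```

A single global threshold cannot produce both behaviors: set at 0.7 it would starve the generic query of results, and set at 0.4 it would pass marginal hits for the specific one.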
This section describes deploying the embedding model in a multi-CPU environment using PyTorch, an optimization that substantially improves inference speed.
Problem: In virtualized environments (such as Kubernetes or Docker), PyTorch's reliance on system tools to detect the number of CPU cores can produce inaccurate results. The virtualization layer presents a virtualized view of the hardware, leading to suboptimal thread creation and degraded PyTorch performance. Accurate CPU core detection is therefore crucial for optimal PyTorch performance in virtualized environments.
Solution: Leveraging Linux kernel features, specifically control groups (cgroups), allows us to manage and allocate resources. The /sys/fs/cgroup/cpu directory (the CPU cgroup) contains control files for CPU management within a cgroup. Notably, cpu.cfs_quota_us defines the CPU time (in microseconds) a process may consume within each accounting period, and cpu.cfs_period_us defines the length of that period (also in microseconds). The ratio quota/period provides a relative measure of the allocated CPU share, enabling optimization of PyTorch thread usage.
Deriving the thread count from these values prevents oversubscription and makes efficient use of the CPU cores actually available for PyTorch computations.
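A minimal sketch of this detection logic, assuming cgroup v1 paths as described above (cgroup v2 exposes the quota differently, and the files are absent outside Linux, so the code falls back to `os.cpu_count()`):

```python
import os

def cgroup_cpu_quota(quota_path="/sys/fs/cgroup/cpu/cpu.cfs_quota_us",
                     period_path="/sys/fs/cgroup/cpu/cpu.cfs_period_us"):
    """Return the number of CPUs granted by the cgroup v1 CFS quota,
    or None when no quota applies (quota is -1, or the files are absent,
    e.g. on cgroup v2 hosts or non-Linux systems)."""
    try:
        with open(quota_path) as f:
            quota = int(f.read())
        with open(period_path) as f:
            period = int(f.read())
    except (OSError, ValueError):
        return None
    if quota <= 0 or period <= 0:  # quota == -1 means "unlimited"
        return None
    return max(1, quota // period)

# Fall back to the (possibly virtualized) host view when no quota is set.
threads = cgroup_cpu_quota() or os.cpu_count() or 1
print(threads)
# In the inference service this value would then be applied before any
# model calls, e.g.: torch.set_num_threads(threads)
```

For example, a container granted quota 200000 us per 100000 us period is allotted two cores, so PyTorch would be limited to two threads instead of spawning one per host core.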