Natural Language Processing
Tue, 02 Mar 2021 16:10:04 -0000
“Hey Google, do I look good today?”
“You’re more stunning than a new router fresh out of the box.”
“Aww, thank you!”
“You’re welcome.”
Oh, the joys of natural language processing, and one of many short conversations some of us have with our smart home or personal assistance devices.
Natural Language Processing (NLP), a subfield of AI, trains computers to understand human language so that they can communicate in that same language. NLP as we know it today grew out of interdisciplinary work across theoretical computer science, linguistics, and artificial intelligence, focused on natural human language and human-machine interaction. Linguistics supplies the formal structure of language, such as semantics, syntax, vocabulary, grammar, and phrasing, while computer science and machine/deep learning transform those linguistic structures into the NLP algorithms themselves.
Common examples of NLP in use today include:
- Email spam detection or document classification
- Website chatbots
- Automated voice response systems (IVR/AVR) on support calls
- Sentiment analysis for support and marketing, which examines written text on the Internet, in support tickets, and on social media platforms to determine whether the content expresses positive or negative sentiment about a product or service
- Real-time translation from one language to another, such as Google Translate
- Search made simple such as with Google Search
- On-demand spell checking such as in Microsoft Word
- On-demand next word prediction found in messaging applications such as on mobile phones.
- In drug trials where text is scanned to determine overlap in intellectual property during drug development.
- Personal assistance agents such as Siri, Alexa, Cortana, and Google Assistant
In the case of personal assistants as an example, NLP in action looks like the following:
- You ask Siri: "What's the weather today?"
- Siri collects your question in audio format and converts it to text, which is processed for understanding.
- Based on that understanding, a response is created, converted to audio, and then delivered to you.
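The steps above can be sketched as a simple pipeline. The function names and the stubbed speech conversion below are illustrative stand-ins, not Siri's actual components; a real assistant would call speech-recognition and speech-synthesis models at each end.

```python
# A minimal sketch of the assistant pipeline described above.
# speech_to_text and text_to_speech are hypothetical stubs standing
# in for real speech services.

def speech_to_text(audio: bytes) -> str:
    # A real system would run a speech-recognition model here.
    return audio.decode("utf-8")

def understand(text: str) -> dict:
    # Toy intent detection: look for a known keyword.
    intent = "get_weather" if "weather" in text.lower() else "unknown"
    return {"intent": intent}

def respond(parsed: dict) -> str:
    if parsed["intent"] == "get_weather":
        return "It is sunny and 72 degrees today."
    return "Sorry, I did not understand that."

def text_to_speech(text: str) -> bytes:
    # A real system would synthesize audio; we just encode the text.
    return text.encode("utf-8")

audio_in = b"What's the weather today?"
reply = text_to_speech(respond(understand(speech_to_text(audio_in))))
print(reply.decode("utf-8"))  # It is sunny and 72 degrees today.
```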
Algorithmically, NLP starts by parsing the syntax of the text to extract grammatical structure from the arrangement of words. This is the easier task, because most languages have clearly defined grammatical rules that can be used to train the algorithms. Once the syntax is understood, the algorithm works to infer meaning, nuance, and semantics, which is harder because language is not a precise science: the same thing can be said in many ways and still carry the same meaning, both within and across languages.
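To make the syntax step concrete, here is a toy part-of-speech tagger built from a tiny hand-written lexicon. Real NLP systems learn these grammatical rules from large corpora rather than hard-coding them; the lexicon and tag names below are purely illustrative.

```python
# Toy syntax analysis: assign part-of-speech tags from a small
# hand-written lexicon. Production systems learn these rules.
LEXICON = {
    "the": "DET", "a": "DET",
    "dog": "NOUN", "weather": "NOUN",
    "is": "VERB", "barks": "VERB",
    "loud": "ADJ", "sunny": "ADJ",
}

def pos_tag(sentence: str) -> list:
    # Words outside the lexicon get an UNK (unknown) tag.
    return [(w, LEXICON.get(w, "UNK")) for w in sentence.lower().split()]

print(pos_tag("The dog barks"))
# [('the', 'DET'), ('dog', 'NOUN'), ('barks', 'VERB')]
```

Notice that this only recovers grammatical roles; inferring what the sentence *means* (the semantics step) is where the real difficulty lies.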
Tools and frameworks
Tools and frameworks that support the implementation of NLP applications, like those mentioned earlier, must be able to derive high-quality information from analyzed text through Text Mining. The components of text mining enable NLP to carry out the following operations:
- Noise removal—Extraction of useful data
- Tokenization—Identification and key segmentation of the useful data
- Normalization—Translation of text into equivalent numerical values appropriate for a computer to understand
- Pattern classification—Discovery of relevance in the segmented data pieces and classification of them
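The first three operations above can be sketched as a short pipeline in plain Python. The regular expression, vocabulary, and function names are illustrative assumptions; real frameworks implement far richer versions of each stage.

```python
import re

def remove_noise(text: str) -> str:
    # Noise removal: lowercase, strip punctuation and stray symbols.
    return re.sub(r"[^a-z0-9\s]", "", text.lower()).strip()

def tokenize(text: str) -> list:
    # Tokenization: segment the cleaned text into word units.
    return text.split()

def normalize(tokens: list, vocab: dict) -> list:
    # Normalization: map tokens to numerical IDs a model can consume.
    return [vocab[t] for t in tokens if t in vocab]

# A toy vocabulary; real vocabularies hold tens of thousands of entries.
vocab = {"router": 0, "is": 1, "offline": 2}

text = "The router is OFFLINE!!"
ids = normalize(tokenize(remove_noise(text)), vocab)
print(ids)  # [0, 1, 2]
```

Pattern classification would then operate on these numeric sequences, for example by feeding them to a classifier.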
Common NLP frameworks offer the capabilities described above. The intricacies of these frameworks are outside the scope of this blog; visit each framework's own site to learn more.
Conclusion
We know where NLP came from and some of its applications today, but where is it going, and is it ready for wider adoption? What we understand about most existing AI algorithms is that they are suited to narrow implementations where they carry out a very specific task. Such algorithms are considered Artificial Narrow Intelligence rather than Artificial General Intelligence, where the latter implies expertise across many domains. Most AI has yet to fully grasp context, time, space, and causality the way humans do. NLP is no exception.
For example, an Internet search can return irrelevant results that do not answer our questions, because NLP excels at parsing large amounts of data for similarities in content rather than truly understanding what was asked. Then there is the nuance of spoken language mentioned earlier, and the variance in language rules across languages and even domains. These factors make training for complete accuracy difficult. Some ways to address this might be larger data sets, more infrastructure for training, and perhaps model-based training versus the use of neural networks. However, these come with their own challenges.
At Dell, we have successfully deployed NLP in our tech support center applications, where agents write quick descriptions of a customer's issue and the application returns predictions for the next best troubleshooting step. 3,000 agents use the tool to service over 10,000 customers per day.
We use NLP techniques on input text to generate a format that the AI model can use and have employed K-nearest neighbor (KNN) clustering and logistic regressions for predictions. Microservice APIs are in place to pass information to agents as well. To address the concerns around text as input, we worked with our subject matter experts from the tech support space to identify Dell-specific lingo, which we used to develop a library of synonyms where different entries could mean the same thing. This helped greatly with cleaning up data, providing data to train, and helped us group similar words for context.
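A minimal sketch of the approach described above might look like the following. The synonym entries, ticket data, and similarity measure are invented for illustration (Dell's production system uses its own vocabulary, KNN clustering, and logistic regression models behind microservice APIs).

```python
from collections import Counter
import math

# Hypothetical synonym library: domain-specific terms collapsed to a
# canonical form, in the spirit of the Dell support vocabulary work.
SYNONYMS = {"hdd": "disk", "harddrive": "disk"}

def canonicalize(text: str) -> list:
    return [SYNONYMS.get(w, w) for w in text.lower().split()]

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words vectors.
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Labeled historical tickets (illustrative data only).
tickets = [
    ("disk making clicking noise", "replace disk"),
    ("screen flickers on boot", "update display driver"),
]

def predict_next_step(description: str) -> str:
    # 1-nearest-neighbor over bag-of-words vectors.
    query = Counter(canonicalize(description))
    best = max(tickets,
               key=lambda t: cosine(query, Counter(canonicalize(t[0]))))
    return best[1]

print(predict_next_step("hdd clicking"))  # replace disk
```

The synonym step matters: without mapping "hdd" to "disk", the query would share no words with the matching ticket.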
For a high-turnover role such as support agents, we were able to train new agents to be successful sooner by making their onboarding process easier. The support application's ability to surface the right information quickly reduced time spent browsing large amounts of irrelevant information, which can lead to disgruntled customers and frustrated agents. We saw a 10% reduction in the time it took for customers to be serviced. The solution also made it possible to feed newly discovered issues to our engineering teams when agents reported or searched for technical issues we were not already familiar with, and conversely, to push findings from engineering back to support agents.
Our research teams at Dell are actively feeding our findings on neural machine translations into the open-source community: one of our current projects is work on AI Voice Synthesis, where NLP works so well you can’t tell that a computer is speaking!
For more information about natural language processing (BERT) MLPerf benchmark ratings for Dell PowerEdge platforms, visit the linked blog posts, then reach out to Dell’s Emerging Tech Team for help with NLP projects in your organization.