As the requirements of DellNLP evolved, text sanitization functionality was added to the library and API. This functionality can be used for the following:
Text sanitization is implemented primarily with regular expressions. These patterns have been evaluated empirically on random samples of Dell Technologies text data to confirm that they cover more than 70 percent of the data.
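The document does not say how "coverage" was measured. A minimal sketch follows, assuming a hand-labeled sample where each text lists the sensitive strings it contains. Under that assumption, coverage is the fraction of labeled strings that some pattern matches exactly. The patterns and the `coverage` helper are hypothetical and are not DellNLP's actual expressions or evaluation code.

```python
import re
from typing import Iterable, Sequence

# Hypothetical patterns for illustration; not DellNLP's actual expressions.
PATTERNS = [
    re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b"),  # numeric dates, e.g. 1/10/2018
    re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),   # simplified email addresses
]


def coverage(samples: Iterable[tuple[str, Sequence[str]]]) -> float:
    """Return the fraction of hand-labeled sensitive strings matched by a pattern.

    Each sample is (text, labeled_strings). A labeled string counts as covered
    when at least one pattern match in the text equals it exactly.
    """
    total = covered = 0
    for text, labels in samples:
        # Collect every exact substring matched by any pattern in this text.
        found = {m.group(0) for p in PATTERNS for m in p.finditer(text)}
        for label in labels:
            total += 1
            if label in found:
                covered += 1
    return covered / total if total else 0.0


if __name__ == "__main__":
    samples = [
        ("Call me on 1/10/2018 at a.b@example.org", ["1/10/2018", "a.b@example.org"]),
        ("Reach out via john at example dot com", ["john at example dot com"]),
    ]
    print(round(coverage(samples), 3))
```

In this toy sample the obfuscated address is missed, so two of three labeled strings are covered. A result like this would be compared against the 70 percent threshold mentioned above.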
Currently, the sanitizer supports the following content:
Example input: cx has ordered latitude on 1/10/2018 but warranty expires soon. Cx wants to extend the warranty and email the office email firstname.lastname@example.org.
Sanitized output: cx has ordered latitude on <DATE> but warranty expires soon. Cx wants to extend the warranty and email the office email <EMAIL>
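The example above can be approximated with two substitutions. This is a minimal sketch that assumes simple numeric-date and email patterns. `DATE_PATTERN`, `EMAIL_PATTERN`, and `sanitize` are hypothetical names, not part of the DellNLP API. Unlike the documented output, this sketch leaves the sentence-final period after `<EMAIL>` in place.

```python
import re

# Hypothetical patterns for illustration only; DellNLP's actual regular
# expressions are not documented here.
# Numeric dates such as 1/10/2018 or 01-10-2018.
DATE_PATTERN = re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b")
# A simplified email pattern; real-world address syntax is much broader.
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def sanitize(text: str) -> str:
    """Replace dates and email addresses with placeholder tokens."""
    text = DATE_PATTERN.sub("<DATE>", text)
    text = EMAIL_PATTERN.sub("<EMAIL>", text)
    return text


if __name__ == "__main__":
    sample = (
        "cx has ordered latitude on 1/10/2018 but warranty expires soon. "
        "Cx wants to extend the warranty and email the office email "
        "firstname.lastname@example.org."
    )
    print(sanitize(sample))
```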
DellNLP provides a highly configurable sanitization class for implementing similar cleaning steps in NLP pipelines.
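The following sketch shows one way a configurable sanitizer could be structured for an NLP pipeline. Rules map a placeholder token to a regular expression, callers can enable a subset of rules or add their own, and patterns are compiled once up front. The `Sanitizer` class, `DEFAULT_RULES`, `enabled`, and `add_rule` are hypothetical illustrations and do not describe the actual DellNLP class or its options.

```python
import re
from typing import Iterable, Optional

# Hypothetical default rules (placeholder -> pattern); DellNLP's real rule
# set and class API are not shown in this document.
DEFAULT_RULES: dict[str, str] = {
    "<DATE>": r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b",
    "<EMAIL>": r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b",
}


class Sanitizer:
    """Apply an ordered list of regex -> placeholder substitutions."""

    def __init__(
        self,
        rules: Optional[dict[str, str]] = None,
        enabled: Optional[Iterable[str]] = None,
    ) -> None:
        rules = dict(DEFAULT_RULES if rules is None else rules)
        if enabled is not None:
            # Keep only the placeholders the caller asked for.
            keep = set(enabled)
            rules = {tok: pat for tok, pat in rules.items() if tok in keep}
        # Compile once so repeated calls in a pipeline stay cheap.
        self._rules = [(re.compile(pat), tok) for tok, pat in rules.items()]

    def add_rule(self, placeholder: str, pattern: str) -> "Sanitizer":
        """Append a custom rule; returns self so calls can be chained."""
        self._rules.append((re.compile(pattern), placeholder))
        return self

    def __call__(self, text: str) -> str:
        # Rules run in insertion order, so earlier placeholders are
        # already in place when later patterns are applied.
        for regex, placeholder in self._rules:
            text = regex.sub(placeholder, text)
        return text


if __name__ == "__main__":
    text = "on 1/10/2018 email a@b.com"
    print(Sanitizer()(text))                    # all default rules
    print(Sanitizer(enabled=["<EMAIL>"])(text))  # emails only
```

Making the instance callable lets it drop into a pipeline wherever a plain `str -> str` function is expected.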