This solution is built upon the Red Hat OpenShift Platform. A more detailed description of the components and interfaces is provided in the following subsections.
Red Hat OpenShift is an enterprise-grade Kubernetes platform for orchestrating containerized applications. It integrates tested and trusted services to reduce the friction of developing, modernizing, deploying, running, and managing applications. It also incorporates Kubernetes enhancements that enable users to configure and use GPU resources easily. One of the key features of Red Hat OpenShift is its user-friendly graphical interface, the web console, which provides access to the settings and management of all cluster resources. The console offers two main perspectives for distinct functions. In the Administrator perspective, the user finds access to workloads, operators, storage, compute, networking, and more. The web console for the Dell APEX Cloud Platform for Red Hat OpenShift also contains a menu to monitor and manage the platform cluster. In the Developer perspective, the user can build applications and pipelines and manage Helm releases and repositories.
Red Hat OpenShift Operators are a powerful mechanism for automating the creation, configuration, and management of Kubernetes-native applications. The validated operators in the OperatorHub provide automation and easy updates on several levels, such as managing the underlying platform components and handling applications as managed services. The following operators are used in this solution:
Red Hat OpenShift AI is an open-source offering based on the Open Data Hub project that allows rapid development, training, and testing of AI/ML models. It also makes hardware acceleration access easy and supports integration with popular open-source tools such as Jupyter, TensorFlow, and PyTorch.
Red Hat OpenShift AI provides an AI sandbox platform for ML workloads in the cloud or on-premises. The platform is available as an add-on cloud service or as a self-managed software product. It can be ordered, installed, and deployed together with the APEX Cloud Platform for Red Hat OpenShift, allowing customers to get started with their AI projects quickly.
Riva offers pre-trained speech models based on BERT (Bidirectional Encoder Representations from Transformers), Transformer-based Seq2Seq, Conformer-CTC, and others. They can be used out-of-the-box for automatic speech recognition (ASR) in multiple languages and text-to-speech (TTS) synthesis with expressive, human-like voices. The models can also be retrained or fine-tuned with the NVIDIA NeMo framework to incorporate custom datasets, such as domain-specific knowledge and a custom voice. Models can be deployed as a speech service on-premises or in the cloud using Helm charts. Riva's inference, powered by NVIDIA TensorRT optimizations, delivers the real-time performance required for natural, human-like interaction. Riva is served by the NVIDIA Triton Inference Server, also part of the NVIDIA AI Enterprise platform. See the different NVIDIA Riva pipelines represented in Figure 2.
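As a concrete illustration, a Helm-based deployment of the Riva speech service typically follows the pattern below. The chart version, release name, and value keys shown are assumptions for illustration only; verify them against the current Riva chart and its values.yaml on NGC before use.

```shell
# Download the Riva API Helm chart from NGC (version is illustrative;
# substitute the current release and your own NGC API key).
export NGC_API_KEY=<your-ngc-api-key>
helm fetch https://helm.ngc.nvidia.com/nvidia/riva/charts/riva-api-2.x.0.tgz \
    --username='$oauthtoken' --password="$NGC_API_KEY" --untar

# Install the chart. On first start, riva-model-init downloads and
# optimizes the models; riva-speech-api then serves them over gRPC.
helm install riva-api riva-api/ \
    --set ngcCredentials.password="$NGC_API_KEY"
```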
The NVIDIA Riva API server is available through the NVIDIA NGC portal. Deploying this service with the Riva Helm chart installs two containers: riva-model-init and riva-speech-api. The API is exposed as a gRPC-based (general-purpose remote procedure call) microservice for low-latency streaming and high-throughput offline use cases. NVIDIA Riva supports Linux x86_64 and ARM64 architectures and can be deployed locally with Docker or on Kubernetes.
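The difference between the two gRPC usage modes can be sketched in plain Python, with no Riva dependency: offline recognition submits a whole recording in a single request, while streaming recognition sends small audio chunks as they are captured. The chunk size and function name below are illustrative, not part of the Riva API.

```python
def stream_chunks(audio: bytes, chunk_size: int = 3200):
    """Yield fixed-size chunks, emulating how a streaming gRPC client
    sends audio frames as they arrive. At 16 kHz, 16-bit mono audio,
    3200 bytes corresponds to 100 ms of speech per chunk."""
    for offset in range(0, len(audio), chunk_size):
        yield audio[offset:offset + chunk_size]

# 0.5 s of silent 16-bit mono audio at 16 kHz (16,000 bytes).
audio = bytes(16000)

# Offline mode: one request carries the entire recording.
offline_requests = [audio]

# Streaming mode: many small requests, sent as audio is captured.
streaming_requests = list(stream_chunks(audio))

print(len(offline_requests))    # 1
print(len(streaming_requests))  # 5
```

Streaming trades per-request overhead for latency: transcripts start arriving before the speaker has finished, which is what enables the real-time interaction described above.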
The following NVIDIA components are also mentioned in this paper for reference, although interacting directly with them is not required for the proposed solution.
The sample application Riva Contact is a peer-to-peer video chat with streaming ASR and NLP. Two web clients send audio streams from the video-chat participants to the Riva Contact server, which makes a gRPC call to the Riva API server and receives the ASR transcripts in return. The transcripts are then submitted to the NLP service for named entity recognition (NER), and the annotated transcripts are returned to the web clients. This sample application is a single example of the nearly infinite variety of software programs that adopters of this paper's solution can develop to meet their business needs. See Figure 3 for a graphical summary of the process.
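The round trip above can be sketched as a chain of service calls. Every function here is an illustrative stub standing in for a remote service, not the actual Riva Contact implementation; the transcript and entity values are invented for the example.

```python
def transcribe(audio: bytes) -> str:
    """Stub for the gRPC ASR call to the Riva API server."""
    return "call john at acme corp"

def extract_entities(transcript: str) -> list:
    """Stub for the Riva NLP named-entity-recognition service:
    returns (text, label) pairs found in the transcript."""
    entities = []
    for name, label in [("john", "PER"), ("acme corp", "ORG")]:
        if name in transcript:
            entities.append((name, label))
    return entities

def annotate(audio: bytes) -> dict:
    """End-to-end flow mirroring the Riva Contact server's round trip:
    audio in, transcript plus NER annotations back to the web client."""
    transcript = transcribe(audio)
    return {"transcript": transcript,
            "entities": extract_entities(transcript)}

result = annotate(b"\x00" * 3200)
print(result["entities"])  # [('john', 'PER'), ('acme corp', 'ORG')]
```

In the real application, the two stubbed calls are separate gRPC requests to the riva-speech-api service, so the server component stays a thin coordinator between the web clients and Riva.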