Test methodology | NVIDIA Riva on Red Hat OpenShift with Dell PowerFlex | Dell Technologies Info Hub


Speech recognition in Riva is a GPU-accelerated compute pipeline with optimized performance and accuracy. Riva supports offline/batch and streaming recognition modes.
Automatic Speech Recognition (ASR) takes an audio stream or audio buffer as input and returns one or more text transcripts, along with additional optional metadata.
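The difference between the two recognition modes lies mainly in how audio reaches the service: as one complete buffer, or as a sequence of small chunks. The following is a minimal sketch of that distinction, not Riva's client API. The helper name `chunk_pcm`, the 100 ms chunk size, and the 16-bit mono PCM format are assumptions chosen for illustration.

```python
from typing import Iterator


def chunk_pcm(
    audio: bytes,
    sample_rate_hz: int = 16000,   # assumed sample rate
    chunk_ms: int = 100,           # assumed chunk duration
    bytes_per_sample: int = 2,     # assumed 16-bit mono PCM
) -> Iterator[bytes]:
    """Yield fixed-duration chunks of raw PCM audio (hypothetical helper)."""
    chunk_bytes = sample_rate_hz * chunk_ms // 1000 * bytes_per_sample
    if chunk_bytes <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    for start in range(0, len(audio), chunk_bytes):
        # The final chunk may be shorter than chunk_bytes.
        yield audio[start:start + chunk_bytes]


# One second of 16-bit mono silence at 16 kHz.
one_second = bytes(16000 * 2)

# Offline/batch mode: the whole buffer is submitted in a single request.
offline_request = one_second

# Streaming mode: the same audio is submitted as a sequence of chunks.
streaming_requests = list(chunk_pcm(one_second))
```

In streaming mode a client would typically send each chunk as it becomes available and receive partial transcripts along the way, whereas offline mode returns the transcript only after the full buffer has been processed.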
The Riva text-to-speech (TTS) service is based on a two-stage pipeline. Riva first generates a mel-spectrogram from the input text using the first model, and then generates speech from that spectrogram using the second model. This pipeline forms a TTS system that enables you to synthesize natural-sounding speech from raw transcripts without any additional information such as patterns or rhythms of speech.
For this paper, the PowerFlex engineering team chose the most common use cases of Riva ASR and Riva TTS, along with basic performance tests, to demonstrate that the PowerFlex family is well suited for NVIDIA A100 GPUs in a Red Hat OpenShift environment.
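The data flow of a two-stage TTS pipeline can be sketched as the composition of two functions. This is an illustrative sketch only: the toy models below are stand-ins that produce zeros, not Riva's models, and the names `make_tts_pipeline`, `toy_spectrogram_model`, and `toy_vocoder`, as well as the 80 mel bins and 256 samples per frame, are assumptions made for the example.

```python
from typing import Callable, List

MelSpectrogram = List[List[float]]  # frames x mel bins
Waveform = List[float]              # audio samples


def make_tts_pipeline(
    spectrogram_model: Callable[[str], MelSpectrogram],
    vocoder: Callable[[MelSpectrogram], Waveform],
) -> Callable[[str], Waveform]:
    """Compose the two stages into a single text-to-waveform function."""
    def synthesize(text: str) -> Waveform:
        mel = spectrogram_model(text)  # stage 1: text -> mel-spectrogram
        return vocoder(mel)            # stage 2: mel-spectrogram -> audio
    return synthesize


def toy_spectrogram_model(text: str) -> MelSpectrogram:
    # Stand-in: one silent frame of 80 mel bins per input character (assumed).
    return [[0.0] * 80 for _ in text]


def toy_vocoder(mel: MelSpectrogram) -> Waveform:
    # Stand-in: 256 silent audio samples per spectrogram frame (assumed).
    return [0.0] * (len(mel) * 256)


synthesize = make_tts_pipeline(toy_spectrogram_model, toy_vocoder)
waveform = synthesize("hello")
```

Keeping the stages separate in this way means that either model could be swapped independently, as long as both agree on the spectrogram format passed between them.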
For Riva ASR, they considered the following use cases:
- Conversion of an audio file to a text file.
- Performance tests for Riva ASR using the LibriSpeech dataset. LibriSpeech is a corpus of approximately 1000 hours of 16 kHz read English speech. The LibriSpeech datasets are available at the Open SLR website.
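One common way to summarize ASR performance on a corpus such as LibriSpeech is the ratio of audio duration processed to wall-clock time spent. The sketch below illustrates that calculation under stated assumptions. The source does not say which metrics the team recorded, so the metric itself, the function names `audio_seconds` and `measure_throughput`, and the `transcribe` callable (a placeholder for whatever client call submits audio to the ASR service) are all hypothetical.

```python
import time
from typing import Callable, Dict, Sequence

SAMPLE_RATE_HZ = 16000  # LibriSpeech audio is sampled at 16 kHz


def audio_seconds(num_samples: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Duration in seconds of a mono utterance with num_samples samples."""
    return num_samples / sample_rate_hz


def measure_throughput(
    utterances: Sequence[Sequence[int]],
    transcribe: Callable[[Sequence[int]], str],   # hypothetical ASR call
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, float]:
    """Transcribe each utterance in turn and report audio time versus wall time."""
    total_audio = sum(audio_seconds(len(u)) for u in utterances)
    start = clock()
    for utterance in utterances:
        transcribe(utterance)
    elapsed = clock() - start
    return {
        "audio_seconds": total_audio,
        "wall_seconds": elapsed,
        # Seconds of audio processed per second of wall-clock time.
        "speedup_over_real_time": total_audio / elapsed if elapsed > 0 else float("inf"),
    }
```

This sketch processes utterances one at a time. A fuller benchmark would likely also vary the number of concurrent requests and record per-request latency, which this example does not attempt.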
For Riva TTS, they considered the following use cases:
- Conversion of text to an audio file.
- Performance tests for Riva TTS using the standard text file.