NVIDIA announced that its application framework for building conversational AI services is now available. The new NVIDIA Jarvis framework comes with pre-trained deep learning models and software tools to help developers create conversational AI services that are easily deployed from the cloud or at the edge.
The company offers automatic speech recognition and language understanding, real-time translations for multiple languages, and new text-to-speech capabilities to create expressive conversational AI agents.
The new offering was trained for several million GPU hours on more than 1 billion pages of text and 60,000 hours of speech data, covering different languages, accents, environments, and lingos, to achieve world-class accuracy, NVIDIA stated in a post.
“Conversational AI is in many ways the ultimate AI,” said Jensen Huang, founder and CEO of NVIDIA. “Deep learning breakthroughs in speech recognition, language understanding, and speech synthesis have enabled engaging cloud services. NVIDIA Jarvis brings this state-of-the-art conversational AI out of the cloud for customers to host AI services anywhere.”
Developers can choose pre-trained Jarvis models from the NVIDIA NGC catalog and fine-tune them with the NVIDIA Transfer Learning Toolkit. Models can also be deployed using just a few lines of code, so deep AI expertise isn’t needed.
NVIDIA also partnered with Mozilla Common Voice, an open-source voice data collection, to train voice-enabled apps, services, and devices.
“We launched Common Voice to teach machines how real people speak in their unique languages, accents, and speech patterns,” said Mark Surman, executive director at Mozilla. “NVIDIA and Mozilla have a common vision of democratizing voice technology — and ensuring that it reflects the rich diversity of people and voices that make up the internet.”