OpenAI is expanding ChatGPT’s capabilities by adding support for voice and image prompts. The company announced the features on Monday and said it is rolling them out to ChatGPT Plus and Enterprise subscribers over the next two weeks. While image prompts are available on all platforms, voice commands are limited to Android and iOS for now.
The company demonstrated how the new features work in videos it shared in a thread on X. Using image prompts, ChatGPT was able to give directions on how to lower a bike seat. Using voice commands, the AI also told a bedtime story. To chat about images, users need to tap the photo button or choose an existing image from their gallery on Android or iOS. The company says image understanding is powered by its GPT-3.5 and GPT-4 language models.
To activate voice prompts, users must go to ‘Settings’ in the mobile app and navigate to ‘New Features’. Once they opt into voice conversations, a headphone button will appear at the top-right corner of the screen, and they can choose a preferred voice out of five options.
“The new voice capability is powered by a new text-to-speech model, capable of generating human-like audio from just text and a few seconds of sample speech,” the company shared in a blog post. “We collaborated with professional voice actors to create each of the voices. We also use Whisper, our open-source speech recognition system, to transcribe your spoken words into text.”
OpenAI says the new text-to-speech model is capable of “crafting realistic synthetic voices from just a few seconds of real speech.” To mitigate the risks of impersonation and fraud that may arise from the new technology, the company is limiting its use to voice chat. Separately, Spotify is using the new text-to-speech technology to pilot its Voice Translation feature, which translates podcasts into other languages in the original speaker’s voice.
To make image chat safe and useful, OpenAI says it worked with Be My Eyes (a free mobile app for blind people and people with poor sight) to understand its uses and limitations. “We’ve also taken technical measures to significantly limit ChatGPT’s ability to analyse and make direct statements about people since ChatGPT is not always accurate and these systems should respect individuals’ privacy,” the company added.
While voice and image chat are currently available only to paid subscribers, the company says the features will be rolled out to other users “soon after.”