On May 13, 2024, OpenAI introduced a new AI model called GPT-4o. The “o” stands for “omni”, a nod to the model’s ability to handle text, speech, and video. Over the next few weeks, GPT-4o will be rolled out gradually across OpenAI’s products for developers and consumers.
Mira Murati, OpenAI’s CTO, explained that GPT-4o offers the same level of intelligence as GPT-4 but with improvements across different types of media. “GPT-4o can understand voice, text, and images,” said Murati during a presentation at OpenAI’s San Francisco office. “This is crucial as we look towards the future of how we interact with machines.”
Unlike GPT-4 Turbo, OpenAI’s previous top model, GPT-4o is trained on a mix of images, text, and now speech. That addition opens up a range of new capabilities.
One of the key improvements is in OpenAI’s chatbot, ChatGPT. While ChatGPT has always had a voice mode that converts the chatbot’s responses into speech, GPT-4o enhances this feature, making interactions with ChatGPT more like talking to an assistant.
Users can ask the GPT-4o-powered ChatGPT a question and even interrupt it while it’s responding. OpenAI says the model responds in “real-time” and can even pick up on subtle changes in a user’s voice, generating responses in a range of emotional styles, including singing.
GPT-4o also enhances ChatGPT’s ability to understand images. Whether it’s a photo or a screenshot, ChatGPT can quickly answer related questions, from “What’s happening in this code?” to “What brand is the shirt this person is wearing?”
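For developers, a minimal sketch of how such an image question might be posed to GPT-4o through OpenAI’s chat completions API could look like the following; the prompt and image URL are illustrative placeholders, not anything OpenAI demonstrated:

```python
# Minimal sketch: asking GPT-4o a question about an image via the
# OpenAI chat completions API. The prompt and image URL are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                # A single user turn can mix text and image parts.
                {"type": "text", "text": "What's happening in this code?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/screenshot.png"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)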
Murati says these features will continue to evolve. For example, in the future, GPT-4o could enable ChatGPT to “watch” a live sports game and explain the rules to you.
“Our models are becoming more complex, but we want the interaction experience to be more natural and easy,” said Murati. “We’ve been focusing on improving the intelligence of these models for the past few years. But this is the first time we’re making a significant leap in terms of ease of use.”
GPT-4o also supports around 50 languages. And in OpenAI’s API and Microsoft’s Azure OpenAI Service, it is twice as fast as GPT-4 Turbo, half the price, and subject to higher rate limits.
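For teams reaching the model through Azure rather than OpenAI directly, a rough sketch of the equivalent call might look like this; the endpoint, API version, and deployment name are assumptions for illustration, since Azure routes requests to a named deployment rather than the base model:

```python
# Minimal sketch: calling a GPT-4o deployment through Microsoft's
# Azure OpenAI Service. The endpoint, API version, and deployment
# name ("my-gpt-4o") are illustrative placeholders.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-05-01-preview",  # assumed; use a version your resource supports
)

response = client.chat.completions.create(
    model="my-gpt-4o",  # your deployment's name, not the base model name
    messages=[
        {"role": "user", "content": "Summarize GPT-4o's launch in one sentence."}
    ],
)

print(response.choices[0].message.content)
```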
However, not all customers have access to the voice capabilities of the GPT-4o API yet. OpenAI plans to initially launch support for GPT-4o’s new audio features to “a small group of trusted partners” due to misuse concerns.
Starting today, GPT-4o is available in the free tier of ChatGPT and to subscribers of OpenAI’s premium ChatGPT Plus and Team plans, which offer “5x higher” message limits. The improved ChatGPT voice experience powered by GPT-4o will be available in alpha for Plus users in the next month or so, along with options for enterprises.
In related news, OpenAI is launching a refreshed ChatGPT UI on the web with a new, “more conversational” home screen and message layout, and a desktop version of ChatGPT for macOS that lets users ask questions via a keyboard shortcut or take and discuss screenshots. ChatGPT Plus users will get first access to the app starting today, and a Windows version will be released later this year.
Additionally, the GPT Store, OpenAI’s library of, and creation tools for, third-party chatbots built on its AI models, is now available to users of ChatGPT’s free tier. Free users also gain ChatGPT features that were previously behind a paywall, such as a memory feature that lets ChatGPT “remember” preferences across interactions, the ability to upload files and photos, and web search for answers to timely questions.