OpenAI on Monday announced its latest artificial intelligence (AI) model, GPT-4o, which promises improved text, vision, and audio capabilities.
The company unveiled the model during a live demonstration Monday, with chief technology officer Mira Murati calling it a “huge step forward in the usability” of the system. The launch came just one day before Google’s annual developer conference, scheduled for Tuesday.
Here’s what you need to know about the GPT-4o launch.
Improved visual capabilities
According to OpenAI, users can now upload multiple photos to GPT-4o and chat with the model about the images.
One of the demonstrations shown during Monday’s announcement highlighted how this can help students solve math problems: the program walks a user through a simple problem step by step without providing the answer.
Another video, posted by the online education nonprofit Khan Academy, shows how the new model can tutor students in real time. In it, a student shares their screen and works through a problem while being guided by the model.
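For developers, the same image-plus-text capability is exposed through OpenAI’s API. Below is a minimal sketch of what a GPT-4o image query might look like with the official Python SDK; it assumes the `openai` v1.x package and an API key in the environment, and the image URL and prompt are illustrative placeholders, not from OpenAI’s demos.

```python
# Minimal sketch: asking GPT-4o about an image via OpenAI's Python SDK.
# Assumes the `openai` v1.x package and OPENAI_API_KEY set in the environment;
# the image URL and prompt below are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            # Text and images are passed together in one message.
            "content": [
                {
                    "type": "text",
                    "text": "Guide me through this math problem step by step, "
                            "but don't give away the final answer.",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/math-problem.png"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```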
Faster model with improved functionality
Murati said Monday that GPT-4o is faster and provides “GPT-4 levels of intelligence” while improving the system’s capabilities across text, vision, and audio.
“This is really shifting the paradigm to the future of collaboration, making this interaction more natural and much easier,” she said.
OpenAI said the new model can “respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds,” which the company noted is about the same amount of time it takes a human to respond in a conversation.
New model released on Monday
GPT-4o will be available starting Monday to all users of OpenAI’s ChatGPT AI chatbot, including those using the free version.
“GPT-4o’s text and image capabilities are starting to roll out today in ChatGPT. We are making GPT-4o available in the free tier, and to Plus users with up to 5x higher message limits,” the company wrote in an update Monday.
A new voice mode will be rolled out to ChatGPT Plus users in the coming weeks, OpenAI CEO Sam Altman wrote on the social platform X.
The model is “natively multimodal”
Altman also posted on X that the model is natively multimodal, meaning it can generate content and understand commands in voice, text, or images.
In a separate blog post, he said the new audio and video modes are “the best computer interface” he has ever used.
“It feels like AI from the movies, and it’s still a bit surprising to me that it’s real,” he wrote in the Monday post. “Getting to human-level response times and expressiveness turns out to be a big change.”