OpenAI may soon launch a multimodal AI digital assistant


According to a new report, OpenAI is showing some of its customers a new multimodal AI model that can both talk to you and recognize objects. The Information, citing unnamed sources who have seen it, says the model could be part of what the company plans to show off on Monday.

The new model reportedly offers faster, more accurate interpretation of images and audio than OpenAI's existing separate transcription and text-to-speech models. It would apparently be able to help customer service agents “better understand the tone of callers’ voices or whether they’re being sarcastic,” and “theoretically,” the model could help students with math or translate real-world signs, writes The Information.

The outlet's sources say the model can outperform GPT-4 Turbo at “answering certain types of questions,” but it is still prone to confidently getting things wrong.

It's possible that OpenAI is also building a new built-in ChatGPT capability for making phone calls, according to developer Ananay Arora, who posted screenshots of call-related code. Arora also spotted evidence that OpenAI had provisioned servers intended for real-time audio and video communication.

Whatever is unveiled next week, it won't be GPT-5. CEO Sam Altman has explicitly denied that the upcoming announcement has anything to do with that model, which is expected to be “materially better” than GPT-4. The Information writes that GPT-5 may be publicly released by the end of the year.
