The next generation of AI-created videos is about to be unveiled, as Google has announced a new tool that can automatically create unique soundtracks.
Several AI video generators have been impressing users in recent months, including OpenAI’s Sora, Runway’s Gen-3 Alpha, and Luma AI’s Dream Machine.
But none of these magical machines have been able to create a good soundtrack to go with the video—until now.
Google on Monday announced new video-to-audio technology from its DeepMind AI lab.
“Video generation models are advancing at an incredible pace, but many current systems can only generate silent output,” Google wrote. “One of the next major steps toward bringing generated movies to life is creating soundtracks for these silent videos.”
“Today, we're sharing progress on our video-to-audio (V2A) technology, which makes synchronized audiovisual generation possible.”
“V2A combines video pixels with natural language text prompts to generate rich soundscapes for the on-screen action,” the company explained.
This tool can be combined with video generation models such as Veo to create dramatic soundtracks that match any scene.
The AI generates music that blends with the characters’ dialogue and other tonal elements to build a fitting auditory environment.
“It can also produce soundtracks for a range of traditional footage, including archival material, silent films, and more – opening up a wide range of creative opportunities,” DeepMind said.
Google has shared impressive examples of the new technology, including clips set to a Western-style soundtrack: a cowboy riding a horse and a wolf howling at the moon.
Total creative control
Google's new V2A tool gives creators the choice of letting the AI generate a soundtrack from a clip's visuals and language cues, or of shaping the soundtrack themselves.
Users can give the tool prompts and editing cues to guide the output in the desired direction.
One of the prompts read: “Cues for audio: cinematic, thriller, horror film, music, tension, ambience, footsteps on concrete.”
The scene shows a man walking through a ruined building; at the end of the clip, the same man is seen on a sinister-looking bridge.
The AI composes a suitable soundtrack for the clip that matches the tone and pace of the narrative.
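Google has not released a public API for V2A, so the short sketch below is purely illustrative: the SoundtrackRequest class, the generate_soundtrack function, and every field name are hypothetical, invented only to make the workflow described above concrete, namely pairing a silent clip with natural-language audio cues and re-running the same clip with different cues or seeds to get different soundtracks.

# Hypothetical sketch only: Google has not published a V2A API, so every name
# below (SoundtrackRequest, generate_soundtrack, the field names) is invented
# for illustration of the prompt-plus-video workflow described in the article.
from dataclasses import dataclass

@dataclass
class SoundtrackRequest:
    video_path: str          # the silent input clip; the model reads its pixels
    audio_cues: str          # natural-language description of the desired soundscape
    negative_cues: str = ""  # optional: sounds to steer the output away from
    seed: int = 0            # changing the seed yields a different soundtrack

def generate_soundtrack(request: SoundtrackRequest) -> bytes:
    """Placeholder for the video-to-audio step the article describes.

    A real system would combine the clip's pixels with the text cues and
    return an audio track synchronized to the on-screen action.
    """
    raise NotImplementedError("V2A is not publicly available")

# The horror-scene example from above, expressed as a request:
request = SoundtrackRequest(
    video_path="ruined_building_walkthrough.mp4",  # hypothetical filename
    audio_cues="cinematic, thriller, horror film, music, tension, "
               "ambience, footsteps on concrete",
)

# Re-running with a new seed or a new cue (for example "ethereal cello
# atmosphere") would produce a different soundtrack for the very same clip.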
Endless soundtrack options
DeepMind's V2A can also generate an unlimited number of soundtrack ideas.
An example prompt read: “Cue for audio: a spaceship hurtles through the vastness of space, stars streaking past it, very high speed, science fiction.”
The video shows a spacecraft flying through the vast expanse of space, with the light of a star shining in the distance.
The first soundtrack generated by the V2A tool was an uplifting, orchestral piece that matched the image and prompt.
The second soundtrack produced by the AI from the same prompt was darker and slower.
Using the cue “Cues for audio: ethereal cello atmosphere” changed things even more.
This third soundtrack immediately set a sadder, more reflective tone.
What is Google DeepMind?
DeepMind was founded in 2010 and acquired by Google in 2014.
According to Google, “Google DeepMind brings together two of the world's leading AI labs – Google Brain and DeepMind – to form a single, focused team led by our CEO, Demis Hassabis.”
“Over the past decade, both teams were responsible for some of the biggest research breakthroughs in AI, many of which form the basis of today's thriving AI industry.”
The organization aims to bring the enormous potential of AI to everyone.
“We are a team of scientists, engineers, ethicists, and others working to build the next generation of AI systems safely and responsibly,” the company wrote.
“By solving some of the toughest scientific and engineering challenges of our time, we are working to create revolutionary technologies that advance science, transform work, serve diverse communities – and improve the lives of billions of people.”
Source: Google DeepMind
Only getting better
Google said these updates are its latest effort to improve its suite of AI content-generation tools.
The company hopes to address some of the remaining issues in upcoming versions.
“Since the quality of the audio output depends on the quality of the video input, artifacts or distortions in the video that fall outside the model's training distribution can cause significant degradation in audio quality,” Google said.
“We are also improving lip synchronization for videos that contain speech. V2A attempts to generate speech from the input transcript and synchronize it with the lip movements of the characters.”
“But the paired video generation model may not be conditioned on the transcript. This creates a mismatch, often resulting in awkward lip-syncing, as the video model does not generate mouth movements that match the transcript,” the company added.