Vertex AI is now the only platform with generative media models across video, image, speech, and music

Warren Barkley
Senior Director, Product Management, Google Cloud
Today, we’re continuing to invest in generative media by adding Lyria, Google’s text-to-music model, to Vertex AI in preview with allowlist. With the addition of music, Vertex AI is now the only platform with generative media models across all modalities – video, image, speech, and music. This means you can build a complete, production-ready asset by moving from a text prompt, to an image, to a finished video with music and speech.
In addition to Lyria, we’re launching new features and updates to improve our other generative media models:
- New editing and camera control features for Veo 2, our advanced video generation model, are available in preview with allowlist to help our customers refine and repurpose video content with precision. This gives you creative control over your video, helping your teams iterate faster, produce higher-quality content, and reduce post-production time and costs.
- Chirp 3, our groundbreaking audio generation and understanding model, now includes Instant Custom Voice, a new way to create custom voices with just 10 seconds of audio input. You can also weave AI-powered narration into your existing recordings, and add a speech transcription capability that can distinguish between speakers. Both features are available through a preview with allowlist.
- Imagen 3, our highest quality text-to-image model, now has improved image generation and inpainting capabilities for reconstructing missing or damaged portions of an image. Our latest update significantly elevates the quality of object removal, delivering a more natural and seamless editing experience.
In alignment with our AI Principles, the development and deployment of Lyria, Veo 2, Chirp 3, and Imagen 3 on Vertex AI prioritize safety and responsibility with built-in precautions like digital watermarking via SynthID, safety filters, and data governance. And, with our industry-first approach to indemnification, you can use content generated with a range of our products knowing Google will indemnify you for third-party IP claims, including copyright.
Lyria: Text-to-music model now available on Vertex AI
Lyria produces high-fidelity audio, meticulously capturing subtle nuances and delivering rich, detailed compositions across a range of musical genres. Lyria on Vertex AI can help enterprises:
- Elevate brand experiences: Quickly create soundtracks for marketing campaigns, product launches, or immersive in-store experiences, all tailored to your brand's unique identity. Lyria enables you to create sonic branding that resonates deeply with your target audience, fostering emotional connections and enhancing brand recall.
- Streamline content creation: For video production, podcasting, and digital content creation, finding the perfect royalty-free music can be a time-consuming and costly process. Lyria eliminates these hurdles, allowing you to generate custom music tracks in minutes, directly aligning with your content's mood, pacing, and narrative. This can help accelerate production workflows and reduce licensing costs.
For example:
Craft a high-octane bebop tune. Prioritize dizzying saxophone and trumpet solos, trading complex phrases at lightning speed. The piano should provide percussive, chordal accompaniment, with walking bass and rapid-fire drums driving the frenetic energy. The tone should be exhilarating, and intense. Capture the feeling of a late-night, smoky jazz club, showcasing virtuosity and improvisation. The listener should not be able to sit still.

Expanding Veo 2 with a new robust set of editing features
Today, we’re announcing the preview of a robust feature set that helps you create videos, edit them, and add visual effects with Veo 2. These features help your teams edit and repurpose video content as your needs evolve, transforming Veo on Vertex AI from a generation tool into a comprehensive video creation and editing platform. Now you can:
- Refine and enhance existing footage with:
- Inpainting: Get clean, professional edits without manual retouching. You can remove unwanted background images, logos, or distractions from your videos, making them disappear smoothly and perfectly in every single frame, so it looks like they were never there.


Clean, professional edits without manual retouching
- Outpainting: Extend the frame of existing video footage, transforming traditional video into optimized formats for web and mobile platforms. This helps make it easy to adapt your content for various screen sizes and aspect ratios – for example, converting landscape video to portrait for social media shorts.


Outpainted video with an extended frame
- Implement sophisticated cinematic techniques: New controls for shot composition, camera angles, and pacing help teams apply sophisticated cinematic techniques with ease, without requiring complex prompting or specialized expertise. For example, you can use camera presets to move the camera in different directions, create a timelapse effect, or generate a drone-style shot.

- Create a cohesive video by connecting two existing assets (interpolation): With interpolation, you can define the beginning and end of a video sequence, allowing Veo to seamlessly generate the connecting frames. This ensures smooth transitions and maintains visual continuity, creating a polished and professional final product.


Interpolation creates smooth transitions across frames
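As a rough illustration of the landscape-to-portrait outpainting case described above, the sketch below computes how large the extended canvas must be for a given target aspect ratio, and therefore how much of the frame has to be generated. It is plain geometry, not a Veo API call: the function name `outpaint_canvas` and the choice to round up to whole pixels are assumptions made for this example, and the resolutions Veo actually accepts or returns are not covered here.

```python
from typing import Tuple


def outpaint_canvas(width: int, height: int,
                    target_w: int, target_h: int) -> Tuple[int, int]:
    """Return the smallest (width, height) canvas with aspect ratio
    target_w:target_h that fully contains the original frame.

    Hypothetical helper for illustration only; sizes are rounded up
    to whole pixels.
    """
    if min(width, height, target_w, target_h) <= 0:
        raise ValueError("all dimensions must be positive")

    # Compare aspect ratios by cross-multiplying to stay in integers.
    if width * target_h >= height * target_w:
        # Source is at least as wide as the target: keep the width,
        # extend the height (integer ceiling division).
        canvas_w = width
        canvas_h = (width * target_h + target_w - 1) // target_w
    else:
        # Source is narrower than the target: keep the height,
        # extend the width.
        canvas_h = height
        canvas_w = (height * target_w + target_h - 1) // target_h
    return canvas_w, canvas_h


if __name__ == "__main__":
    src_w, src_h = 1920, 1080          # landscape source frame
    new_w, new_h = outpaint_canvas(src_w, src_h, 9, 16)
    print(f"canvas: {new_w}x{new_h}, "
          f"generated rows: {new_h - src_h}, "
          f"generated columns: {new_w - src_w}")
```

For a 1920×1080 landscape frame and a 9:16 target, this gives a 1920×3414 canvas, so 2,334 of the 3,414 rows (roughly two thirds of the final frame) would be newly generated content.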
Chirp 3: Instant Custom Voice and Transcription updates
Last month, we integrated Chirp 3, our groundbreaking audio understanding and generation model, into Vertex AI. Chirp 3’s new HD voices feature offers natural and realistic speech in over 35 languages with eight speaker options.
Now, we’re announcing two new features:
- Chirp 3: Instant Custom Voice is now generally available through an allowlist. Now, you can generate realistic custom voices from 10 seconds of audio input. This enables enterprises to personalize call centers, develop accessible content, and establish unique brand voices, all while maintaining a consistent brand identity. To ensure responsible use, Instant Custom Voice includes built-in safety features, and our allowlisting process involves rigorous diligence to verify proper voice usage permissions.
- Chirp 3: Transcription with Diarization is now available in preview with allowlist. This powerful feature accurately separates and identifies individual speakers in multi-speaker recordings, significantly improving the clarity and usability of transcriptions for applications like meeting summaries, podcast analysis, and multi-party call recordings.
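To show how speaker-labeled output might feed an application like a meeting summary, here is a minimal sketch that merges consecutive words from the same speaker into readable turns. The `Word` and `Turn` types and the speaker labels are hypothetical stand-ins for this example; they are not the Chirp 3 response format, which you should take from the product documentation.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Word:
    """One transcribed word with its speaker label (hypothetical shape)."""
    text: str
    speaker: str


@dataclass
class Turn:
    """A run of consecutive words spoken by a single speaker."""
    speaker: str
    text: str


def group_into_turns(words: Iterable[Word]) -> List[Turn]:
    """Merge consecutive words from the same speaker into one turn."""
    turns: List[Turn] = []
    for word in words:
        if turns and turns[-1].speaker == word.speaker:
            # Same speaker is still talking: extend the current turn.
            turns[-1].text += " " + word.text
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append(Turn(speaker=word.speaker, text=word.text))
    return turns


if __name__ == "__main__":
    sample = [
        Word("Shall", "speaker_1"), Word("we", "speaker_1"),
        Word("begin?", "speaker_1"),
        Word("Yes,", "speaker_2"), Word("please.", "speaker_2"),
        Word("Great.", "speaker_1"),
    ]
    for turn in group_into_turns(sample):
        print(f"{turn.speaker}: {turn.text}")
```

Grouping by turn rather than by speaker keeps the order of the conversation intact, which is usually what a transcript or summary needs.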
Imagen 3: Improvements to Imagen quality and editing
Over the last year we've made huge improvements to Imagen 3, our highest quality text-to-image model, which generates images with better detail, richer lighting, and fewer distracting artifacts than our previous models.

Imagen 3 Editing provides a powerful and user-friendly way to refine and tailor any image. We’ve made significant improvements to Imagen 3 inpainting capabilities for reconstructing missing or damaged portions of an image. Our latest update significantly elevates the quality of object removal, delivering a more natural and seamless editing experience. Here is an example of how you can quickly remove unwanted objects, blemishes, or distractions from your photos.


Easy ways to tailor images, including removing unwanted objects
Build with enterprise safety and security
Designing and developing AI to be secure, safe, and responsible is paramount. Consistent with our AI Principles, Lyria, Veo 2, Chirp 3, and Imagen 3 on Vertex AI were built with safety at the core.
- Digital watermarking: Google DeepMind's SynthID embeds invisible watermarks into every image, video and audio frame that Imagen, Veo, and Lyria produce, helping decrease misinformation and misattribution concerns.
- Safety filters: Veo, Imagen, Lyria, and Chirp all have built-in safeguards to help protect against the creation of harmful content and adhere to Google’s Responsible AI Principles. We will continue investing in new techniques to improve the safety and privacy protections of our models.
- Data governance: We do not use customer data to train our models, in accordance with Google Cloud’s built-in data governance and privacy controls. Your customer data is only processed according to your instructions.
- Copyright indemnity: Our indemnity for covered generative AI services offers peace of mind for copyright concerns.
Customers are delivering value with generative media models on Vertex AI
Generative AI is no longer a futuristic concept, but a powerful tool driving real-world business results. Companies like WPP, Agoda, Bending Spoons, Monks.Flow, The Brandtech Group, and Bloomberg Connects are using our generative media models in production. Let's look at some concrete examples of how leading enterprises are leveraging Google Cloud's generative media capabilities:
Goodby Silverstein & Partners:
In 1937, Salvador Dalí imagined “Giraffes on Horseback Salad”, a cinematic vision so surreal, so ahead of its time, that it proved impossible to produce. For almost a century, it lived only in sketches and notes. Now, with the power of Veo 2, Goodby Silverstein & Partners and The Dalí Museum have realized that vision using tools finally capable of transforming surrealism into film.
“Dalí imagined a film so surreal, so untethered from convention, that it couldn’t exist in his lifetime. Now, thanks to the astonishing capabilities of Veo 2 and Imagen 3, we’ve been able to help bring that vision to life—not as a replica, but as a reawakening. It’s one of the most creatively thrilling things we’ve ever done.” – Jeff Goodby, Co-Chairman, Goodby Silverstein & Partners.

L'Oreal Groupe:
L'Oreal Groupe is leveraging Veo and Imagen to transform the end-to-end production of high-quality video and image assets, helping foster greater creative exploration across their global marketing initiatives and upholding their commitment to trustworthy AI.
"By integrating Veo and Imagen into our creative process, we're not just speeding up marketing content creation, we're changing how we approach creativity. These models act as powerful creative partners, empowering our teams to experiment with new ideas and respond to the market. We’re expanding our qualitative video and image production across 20 additional countries and languages, all while upholding our trustworthy AI values." – Thomas Ménard, Manager of AI Center Enablement, L’Oreal Groupe
Kraft Heinz:
Kraft Heinz’s Tastemaker platform empowers their teams with access to Veo 2 and Imagen 3, dramatically accelerating creative and campaign development processes.
"With Veo 2 on Vertex AI as part of our Tastemaker platform, Kraft Heinz has unlocked unprecedented speed and efficiency in our creative workflows. What once took us eight weeks is now only taking eight hours, resulting in substantial cost savings. Implementing Google Cloud AI within our platform that is deeply trained on our brand intelligence, allows innovation and creative teams to rapidly prototype, test, and deploy content, transforming how we bring our iconic brands to life." – Justin Thomas, Head Digital Experience & Growth

By leveraging our cutting-edge AI models on Vertex AI, enterprises are achieving remarkable gains in efficiency, creativity, and customer engagement. This momentum is a testament to the power of our technology and its ability to help drive tangible business value.
Get started
Get started with Veo, Imagen, and Chirp on Vertex AI today. To get started with Lyria, reach out to your Google Cloud account representative.