Gemini Drops
Gemini is constantly evolving, but Gemini Drops makes it easier to keep up with what’s being released. Check here regularly for feature announcements, product tips, and examples of how our community is using Gemini to create, research, and do more.
Stay in the know about what’s next
The Podcast
Ani Baddepudi, Gemini Model Behavior Product Lead, joins host Logan Kilpatrick for a deep dive into Gemini’s multimodal capabilities. Their conversation explores why Gemini was built as a natively multimodal model from day one, the future of proactive AI assistants, and how we are moving towards a world where “everything is vision.” Learn about the differences between video and image understanding and token representations, higher FPS video sampling, and more.
Chapters:
0:00 - Intro
1:12 - Why Gemini is natively multimodal
2:23 - The technology behind multimodal models
5:15 - Video understanding with Gemini 2.5
9:25 - Deciding what to build next
13:23 - Building new product experiences with multimodal AI
17:15 - The vision for proactive assistants
24:13 - Improving video usability with variable FPS and frame tokenization
27:35 - What’s next for Gemini’s multimodal development
31:47 - Deep dive on Gemini’s document understanding capabilities
37:56 - The teamwork and collaboration behind Gemini
40:56 - What’s next with model behavior