Available now via the Gemini API in Google AI Studio and Vertex AI, Gemini 2.0 Flash brings multimodal input and output capabilities, setting a new benchmark for AI innovation.
Gemini 2.0 Flash builds on the success of its predecessor, 1.5 Flash, running at twice the speed of 1.5 Pro while outperforming it on key benchmarks. The new model supports not only multimodal inputs such as images, video and audio, but also multimodal outputs, including natively generated images and steerable, multilingual text-to-speech audio.
Developers can leverage the model for advanced tasks such as tool usage, including Google Search, code execution, and third-party function integration.
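For developers curious what that looks like in practice, the snippet below is a minimal sketch of a single Gemini API request with the built-in Google Search tool enabled. It assumes the google-genai Python SDK, the experimental "gemini-2.0-flash-exp" model name and a placeholder API key; check Google AI Studio's documentation for the current SDK and model identifiers.

```python
# Minimal sketch: one Gemini API call with the built-in Google Search tool.
# Assumes the google-genai Python SDK; "YOUR_API_KEY" is a placeholder.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",  # experimental model name at launch
    contents="Summarise today's top technology headlines in two sentences.",
    config=types.GenerateContentConfig(
        # The tools list is where built-in capabilities are switched on.
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)
```

In this SDK, the same tools list also accepts code execution and developer-defined function declarations, which is how third-party function integration is wired in.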
The launch also introduces a new Multimodal Live API, enabling real-time streaming of audio and video input alongside combined tool use. General availability of Gemini 2.0 Flash, along with additional model sizes, is slated for January 2025.
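Unlike the request-and-response call above, the Live API is session-based. The sketch below shows a rough, text-only session under the same assumptions (google-genai Python SDK, experimental "gemini-2.0-flash-exp" model); a real application would stream microphone audio or camera frames instead, and exact method names may differ between SDK versions.

```python
# Rough sketch of a Multimodal Live API session, text-only for brevity.
# Assumes the google-genai Python SDK; "YOUR_API_KEY" is a placeholder.
import asyncio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

async def main():
    # Ask the model to reply with text; audio output can also be requested.
    config = {"response_modalities": ["TEXT"]}
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp", config=config
    ) as session:
        await session.send(input="Hello, what can you do in real time?",
                           end_of_turn=True)
        # Stream the model's incremental responses for this turn.
        async for message in session.receive():
            if message.text:
                print(message.text, end="")

asyncio.run(main())
```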
Google is also showcasing prototypes that leverage Gemini 2.0’s native multimodal capabilities, pushing the boundaries of agentic research. Key projects include:
Project Astra: A universal AI assistant prototype with improved memory, multilingual dialogue, and tool use such as Google Search and Maps.
Project Mariner: An experimental Chrome extension exploring human-agent interaction by completing tasks within a browser environment.
Jules: An AI-powered coding assistant integrated with GitHub, designed to assist developers in planning and executing coding tasks.
These projects are in early stages but demonstrate promising applications in real-world scenarios. Trusted testers are providing feedback, paving the way for broader deployment in the future.
In a blog post, Google CEO Sundar Pichai predicted the technology contained in Gemini 2.0 will “understand more about the world around you, think multiple steps ahead and take action on your behalf, with your supervision.”
It’s a similar goal being pursued by hard-charging rivals such as OpenAI, with its ChatGPT technology, and industry powerhouses such as Microsoft, which has built a variety of similar tools into its Windows software.
Besides trying to outshine OpenAI and other ambitious startups, Google is also trying to stay a step ahead of Apple as that trendsetting company begins to blend AI into its latest iPhones and other devices.
After an earlier software update enabled the first bundle of the iPhone’s “Apple Intelligence” features, which spruced up the device’s Siri assistant, Apple released another batch of the AI technology in a free software update that also came out Wednesday.
Starting December 11, users can try a chat-optimised version of Gemini 2.0 Flash in Gemini on desktop and mobile web, with support in the Gemini mobile app coming soon.
Early next year, Google plans to integrate Gemini 2.0 into a broader range of its products.
(With inputs from AP)