The Google Gemini Generative AI Platform: Everything You Need to Know

May 11, 2024
Generative AI is revolutionizing the way we interact with technology. In this context, Google has introduced Gemini, its suite of leading generative AI models, apps, and services. Gemini represents a significant evolution from Google’s previous language models, such as LaMDA, because of its ability to work multimodally with text, images, audio, and video.
But what exactly is Gemini? How can you use it? And how does it compare with the competition? In this article we will delve into everything you need to know about Google’s new generative AI platform, from its key features to its applications and costs. We will also explore where and how you can try Gemini. So get ready to dive into the fascinating world of Gemini!
What is Gemini?
Gemini is the flagship suite of generative AI models developed by Google’s AI research labs, DeepMind and Google Research. The Gemini family includes three main models:
Gemini Ultra
The highest performing Gemini model, capable of performing a wide range of multimodal tasks, such as step-by-step physics problem solving, identifying relevant scientific papers, and generating formulas to update graphs and tables.
Gemini Pro
A “lightweight” version of Gemini, with improved reasoning, planning, and comprehension capabilities compared with Google’s previous LaMDA model. Gemini Pro can process large amounts of text, code, audio, and video, although with slower processing times.
Gemini Nano
A smaller, more efficient version of the Gemini Pro and Ultra models that can run directly on mobile devices such as the Pixel 8 Pro and Galaxy S24 smartphones. Gemini Nano powers features such as conversation summarization in the Recorder app and Smart Reply in the Gboard keyboard.
What distinguishes Gemini from other language models such as LaMDA is its multimodal nature. Whereas LaMDA was trained exclusively on text data, Gemini was pre-trained and refined on a wide range of audio, image, video, and code content in different languages. This feature enables Gemini models to understand and generate multimodal content, paving the way for a wide range of applications.
Differences Between Gemini Apps and Gemini Models
It is important to note that Gemini is distinct and independent from the Gemini apps (formerly known as Bard) available on the web and mobile devices. Gemini apps are simply an interface through which certain Gemini models can be accessed, a kind of “client” for Google’s generative AI.
In addition, Gemini is also independent of Imagen 2, Google’s image generation model available in some of its tools and development environments.
What Can Gemini Models Do?
Because of their multimodal nature, Gemini models can, in theory, perform a wide range of tasks from speech transcription to captioning images and videos to generating artwork. Some of these capabilities have already reached the product stage, while others are still under development.
Gemini Ultra
According to Google, Gemini Ultra can be used to help with physics homework, solving problems step-by-step on a worksheet and spotting possible errors in answers that have already been filled in. It can also be applied to tasks such as identifying scientific articles relevant to a given problem, extracting information from those articles, and generating formulas needed to recreate a graph with more recent data.
Although Gemini Ultra technically supports image generation, this functionality has not yet been integrated into the commercialized version of the model, perhaps because the mechanism is more complex than simply sending prompts to an image generator such as DALL-E 3 in ChatGPT.
Gemini Ultra is available through the API of Vertex AI, Google’s AI development platform, and through AI Studio, Google’s web tool for app and platform developers. It also underpins the Gemini apps, but accessing Gemini Ultra there requires the Google One AI Premium plan, a $20-per-month subscription.
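For developers curious what calling a Gemini model over the API actually looks like, here is a minimal sketch of a `generateContent` request body. The endpoint path, `v1beta` API version, and `gemini-pro` model name follow Google’s public REST documentation at the time of writing, but all three are assumptions that may change; the snippet only builds the JSON payload and does not contact the network.

```python
import json

# Endpoint for a single-turn text prompt (model name and API version are
# assumptions based on Google's public docs and may change over time).
API_URL = (
    "https://generativelanguage.googleapis.com/v1beta/"
    "models/gemini-pro:generateContent"
)

def build_request(prompt: str) -> str:
    """Return the JSON body for a single-turn text prompt."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return json.dumps(body)

payload = build_request("Solve this physics problem step by step.")
```

In practice you would POST this payload to `API_URL` with your API key (for example via `requests.post`), and the model’s reply comes back as a JSON `candidates` list.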
Gemini Pro
Google says Gemini Pro represents an improvement over LaMDA in terms of reasoning, planning and comprehension capabilities. An independent study found that the initial version of Gemini Pro was actually better than OpenAI’s GPT-3.5 in dealing with longer and more complex chains of reasoning. However, the study also found that, like all large language models, this version of Gemini Pro had particular difficulty with mathematical problems involving multiple digits, and users found examples of faulty reasoning and obvious errors.
Google promised remedies, and the first came in the form of Gemini 1.5 Pro. This improved model can process about 700,000 words or 30,000 lines of code, 35 times more than the previous version. Moreover, being multimodal, it can analyze up to 11 hours of audio or one hour of video in different languages, albeit slowly.
Gemini 1.5 Pro entered public preview on Vertex AI in April 2024. An additional endpoint, Gemini Pro Vision, is also available, which can process text and images/video and produce outputs similar to OpenAI’s GPT-4 with Vision model.
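Since Gemini Pro Vision accepts images alongside text, a multimodal request simply carries extra parts in the same payload. The sketch below assumes the `inline_data`/`mime_type` field names from Google’s public REST documentation, and the image bytes are a placeholder rather than a real picture.

```python
import base64

# Build a multimodal generateContent payload: one text part plus one inline
# image part, base64-encoded as the REST API expects. Field names are taken
# from Google's public docs and should be treated as assumptions.
def build_vision_request(prompt: str, image_bytes: bytes,
                         mime_type: str = "image/png") -> dict:
    return {
        "contents": [{
            "parts": [
                {"text": prompt},
                {"inline_data": {
                    "mime_type": mime_type,
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
            ]
        }]
    }

req = build_vision_request("Describe this chart.", b"\x89PNG...placeholder")
```

The same shape extends to video by swapping the MIME type, which is what makes the Pro Vision endpoint comparable to OpenAI’s GPT-4 with Vision.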
Gemini Nano
Gemini Nano is a much smaller version of the Gemini Pro and Ultra models, and is efficient enough to run directly on phones (such as the Pixel 8 Pro, Pixel 8, and Samsung Galaxy S24) without having to send the task to a remote server.
Currently, Gemini Nano powers some features on these devices, such as summarizing recorded conversations in the Recorder app and Smart Reply in the Gboard keyboard. These features also work offline, with no data leaving the phone, preserving users’ privacy.
How does Gemini compare with OpenAI’s GPT-4?
Google has repeatedly touted Gemini’s superiority on various benchmarks, claiming that Gemini Ultra outperforms the state of the art on 30 of 32 academic benchmarks widely used in large language model research and development. It also claims that Gemini 1.5 Pro is more capable than Gemini Ultra in some scenarios, such as content summarization, brainstorming, and writing.
However, setting aside the question of how well benchmarks actually indicate a better model, the scores Google cites appear only marginally higher than those of the corresponding OpenAI models. Moreover, some first impressions have not been great, with users and academics pointing out the older Gemini Pro version’s tendency to make basic errors, struggle with translations, and provide poor coding suggestions.
How much does Gemini cost?
Gemini 1.5 Pro is free to use in Gemini apps and, for now, in AI Studio and Vertex AI.
However, once Gemini 1.5 Pro exits its preview phase on Vertex, input will cost $0.0025 per character and output will cost $0.00005 per character. Vertex customers pay per 1,000 characters (about 140–250 words) and, for models such as Gemini Pro Vision, per image ($0.0025).
Thus, summarizing a 500-word article containing 2,000 characters with Gemini 1.5 Pro would cost $5, while generating an article of similar length would cost $0.10.
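The arithmetic behind those figures is straightforward to reproduce. The calculator below uses the per-character prices quoted above (the announced preview-exit rates, which may of course change) and ignores the small cost of the prompt when generating.

```python
# Per-character Vertex AI prices for Gemini 1.5 Pro quoted in the article
# (announced preview-exit rates; subject to change).
INPUT_PRICE_PER_CHAR = 0.0025
OUTPUT_PRICE_PER_CHAR = 0.00005

def summarize_cost(input_chars: int) -> float:
    """Cost of feeding input_chars of text into the model (input side only)."""
    return input_chars * INPUT_PRICE_PER_CHAR

def generate_cost(output_chars: int) -> float:
    """Cost of generating output_chars of text (output side only)."""
    return output_chars * OUTPUT_PRICE_PER_CHAR

# A 2,000-character article: ~$5 to summarize, ~$0.10 to generate.
print(summarize_cost(2000))
print(generate_cost(2000))
```

Note the asymmetry: at these rates, feeding text in costs 50 times more per character than getting text out, so summarization-heavy workloads dominate the bill.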
Gemini Ultra pricing has not yet been announced.
Where Can You Try Gemini?
Gemini Pro
The easiest way to experience Gemini Pro is through the Gemini apps, where the Pro and Ultra models answer queries in a wide range of languages.
Gemini Pro and Ultra are also accessible in preview via the Vertex AI API, with free use “within limits” for now and support for certain regions, including Europe, as well as features such as chat and filters.
Elsewhere, Gemini Pro and Ultra can be found in AI Studio. Using this service, developers can iterate on Gemini-based prompts and chatbots and then obtain API keys to use them in their apps or export the code to a more advanced IDE.
Gemini Code Assist (formerly Duet AI for Developers), Google’s suite of AI assistance tools for code completion and generation, uses Gemini models. Developers can make large-scale changes across a codebase, such as updating cross-file dependencies and reviewing large portions of code.
Google has also integrated Gemini models into its development tools for Chrome and Firebase, as well as into its database creation and management tools. It has also launched new Gemini-based security products, such as Gemini in Threat Intelligence, a component of Google’s Mandiant cybersecurity platform that can analyze large portions of potentially malicious code and let users run natural language searches for ongoing threats or indicators of compromise.
Gemini Nano
Gemini Nano is featured on Pixel 8 Pro, Pixel 8 and Samsung Galaxy S24 devices, and will be integrated into other devices in the future. Developers interested in incorporating the model into their Android apps can sign up for a preview.
Will Gemini be coming to the iPhone?
It might! Apple and Google are reportedly in talks to integrate Gemini into a feature set to be included in an upcoming iOS update later this year. Nothing is final yet, as Apple is also negotiating with OpenAI and working on developing its own generative AI capabilities.
It remains to be seen how the collaboration between the two tech giants will evolve and what role Gemini will play in the iOS ecosystem in the future.