Google Gemini, explained

WOWWW - News

2024-06-14

Artificial intelligence has become this year’s wonder technology. But because it comes in a lot of different flavors from a lot of different companies, it can be really confusing. You’ve not only got the ChatGPT bot created by OpenAI, but you’ve got the big three — Google, Apple, and Microsoft — cooking up their own versions.

Google’s latest attempt is called Gemini, and it’s no less confusing than the others.

When I first started researching Gemini, I did a Google search for “versions of Google Gemini.” At the top of the results, I got an AI-generated summary that started:

“Google Gemini has three versions: Ultra, Pro, and Nano. Ultra is the largest model and is designed for complex tasks, while Pro is the best model for scaling across a wide range of tasks, and Nano is the most efficient model for on-device tasks.”

Okay, good enough. But it’s not the complete story.

What is Gemini?

Gemini is the third zodiac sign, associated with the twins Castor and Pollux from Greek mythology.

Okay, sorry. I couldn’t resist. Gemini is a chatbot created by Google that has replaced its previous chatbot named Bard. It’s based on something called a large language model (or LLM), also called Gemini, which was developed by DeepMind, a part of Google.

Confusingly, Gemini is both a chatbot and an LLM. Screenshot: Google

So Gemini is both a chatbot and an LLM? How many types of Gemini are there?

How much time do you have? Seriously, though, we’re going to limit ourselves to the types of Gemini that you may encounter because the number of iterations feels endless.

Originally, when it was introduced in December 2023, Gemini offered three different versions (known as models): Nano as a lightweight Android version, Pro for everyday wear, and Ultra for heavyweight business / enterprise usage.

Then on May 14th, during its I/O 2024 event, Google showcased Gemini 1.5 Pro (first announced that February), which the company called a “mid-sized multimodal model.” According to Google, the new version of Pro is about as powerful as the previous Ultra version and is meant to enhance existing apps and create new ones for day-to-day uses.

Hold on. Multimodal?

In other words, it can accept prompts in all different modes of communication: text, images, audio, and video.

So that’s it for the models, right?

Well, not quite. There’s also Gemini 1.5 Flash, which is a faster version of Gemini for developers who will be able to use it in specific applications. In other words, unless you’re a developer, it’s not something you will be working with.

So, just to reiterate, we now have four Gemini models for developers to work with: Ultra, Pro, Flash, and Nano. (We’ll tell you how you can play with Gemini yourself in a moment.)

I watched the Google event, and they kept talking about 1 million tokens, 2 million tokens. What was that all about?

That’s what you get for watching an event that’s meant more for developers than for everyday people like us. But it’s really not all that difficult.

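Tokens are the chunks of text (whole words or pieces of words) that AI models such as Gemini break language into, both when they are trained and when they read your prompts. The “1 million tokens” figure describes how many tokens a model can take in at once. The more tokens an AI model can handle, the more info you can feed it and the better it will understand what you need and what it can give you.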
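
If you want a feel for how big a million tokens is, here is a rough Python sketch. It assumes about four characters per token, which is only a rule of thumb; real tokenizers differ by model and language. The function names and the default limit are made up for illustration and are not part of any Google API.

```python
import math

# Assumption: roughly 4 characters per token. This is a rule of thumb,
# not the actual tokenizer Gemini uses.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate for a piece of text."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(text: str, context_window: int = 1_000_000) -> bool:
    """Check whether the estimated token count fits a given limit.

    The 1,000,000 default mirrors the "1 million tokens" figure from
    the event; it is a hypothetical setting used here for illustration.
    """
    return estimate_tokens(text) <= context_window


if __name__ == "__main__":
    prompt = "Find all the photos of my grandmother working on her carpentry projects."
    print("Estimated tokens:", estimate_tokens(prompt))
    print("Fits in 1 million tokens:", fits_in_context(prompt))
```

By this estimate, a million tokens works out to around four million characters, so a single sentence barely makes a dent.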

Okay, back to Gemini 1.5 Pro. What can I do with it?

Well, if you’re a developer, you can use it to add to or create a bunch of new apps. Otherwise, Google is adding it to a lot of its existing apps and creating new ones.

Like?

Well, just as an example, let’s start with Google Photos. A new feature expected this summer, called Ask Photos, will let you search using more complex queries. Instead of just finding all the photos of your grandmother, for example, you should be able to ask it to “Find all the photos of my grandmother through the years that show her working on her carpentry projects.”

There’s also the existing Lens app, which uses both text and photos to help you identify and research stuff. Lens will now be able to find info using videos as well. Google demonstrated it by taking a video of a misbehaving record player and using that video to find out why the tonearm wasn’t contacting the record.

You know that sidebar in Google Docs, Sheets, Slides, Drive, and Gmail? The one where you can now access various other Google apps? Well, it’s going to be taken over by Gemini, which will be used to unify (or, at least, to connect) a variety of Google apps so that you’ll be able to, say, easily reference a Google Doc in an email or vice versa. It should be rolling out to subscribers next month.

The Google search page with an AI Overview at top.

AI Overviews explaining AI Overviews. Screenshot: Google

Even Google’s basic search has been affected: AI Overviews now lead off your search results, giving you an AI-generated summary of what Google thinks you’re looking for. (Although there’s been a lot of pushback on that and quite a few users looking to get rid of it.)

Those are existing apps. How about new ones?

Lots of them. Currently, some include:

Project Astra, which is essentially Google Assistant with the added ability to see (via your phone’s camera) and respond to, and with, spoken language. This is still in its early days, so you probably won’t see it for a while.

LearnLM, which will help students find answers to their questions using educational sources; according to the company, it’s already been built into some products and is being introduced to educators.

Veo, a “generative AI video model.” Generative as in it will generate 1080p videos that you ask it to create. You want a video of a cat wearing a nightgown and a top hat jumping over the Moon? Veo is what you want to use. Well, when you can: like Project Astra, it’s still being tested and won’t be available to the general public for a while.

This all sounds interesting. How can I sign up? And is it free?

You can start working with the Gemini 1.0 chatbot right now and right here. However, if you want to play with Gemini 1.5 Pro — which is faster and gives you more capabilities — you’ll need to subscribe to Gemini Advanced, which will cost $20 a month after a two-month trial. (Gemini Advanced is considered part of a Google One subscription, so you’ll also get 2TB of data storage and other Google One benefits.)

If you’re a business using Google Workspace and you want to try the more sophisticated levels of the AI (also starting at $20 a month), you can find more information here.

Anything else I need to know?

Just the usual cautions. As with all AI applications, Gemini’s answers can be iffy, or even downright wrong. The tech is definitely in its early stages, so while it can be a useful tool, you should double-check any data you get from it. Wrong information generated by AI engines is common enough to have earned its own name: hallucinations, because by accessing wrong information, the AIs are creating their own reality. So, buyer beware.

Gemini reply about woodpeckers with overlay suggesting a double check.

It’s not a bad idea to be cautious about Gemini’s answers. Screenshot: Google

That being said, it looks like AIs are going to be with us for a long time. It’s not a bad idea to get some hands-on experience in order to become familiar with them and how they work. Besides ChatGPT and Gemini, there are Microsoft’s upcoming Copilot Plus PCs, which will come with built-in AI-capable hardware, not to mention Apple’s just-announced and upcoming suite of features called Apple Intelligence. So depending on your favorite operating system, not to mention your level of curiosity, you can experiment with a variety of AI chatbots, enhanced apps, and other features.
