Saturday, April 20, 2024
HomeSoftware DevelopmentGemini 1.5 Professional Now Obtainable in 180+ International locations; With Native Audio...

Gemini 1.5 Professional Now Obtainable in 180+ International locations; With Native Audio Understanding, System Directions, JSON Mode and Extra

Posted by Jaclyn Konzelmann and Megan Li – Google Labs

Seize an API key in Google AI Studio, and get began with the Gemini API Cookbook

Lower than two months in the past, we made our next-generation Gemini 1.5 Professional mannequin obtainable in Google AI Studio for builders to check out. We’ve been amazed by what the group has been capable of debug, create and study utilizing our groundbreaking 1 million context window.

At present, we’re making Gemini 1.5 Professional obtainable in 180+ international locations through the Gemini API in public preview, with a first-ever native audio (speech) understanding functionality and a brand new File API to make it straightforward to deal with recordsdata. We’re additionally launching new options like system directions and JSON mode to provide builders extra management over the mannequin’s output. Lastly, we’re releasing our subsequent era textual content embedding mannequin that outperforms comparable fashions. Go to Google AI Studio to create or entry your API key, and begin constructing.

Unlock new use instances with audio and video modalities

We’re increasing the enter modalities for Gemini 1.5 Professional to incorporate audio (speech) understanding in each the Gemini API and Google AI Studio. Moreover, Gemini 1.5 Professional is now capable of purpose throughout each picture (frames) and audio (speech) for movies uploaded in Google AI Studio, and we sit up for including API assist for this quickly.

screen grab of a clooege professor using Gemini 1.5 Pro to create a quiz based on their latest lecture video in Google AI Studio
You possibly can add a recording of a lecture, like this 117,000+ token lecture from Jeff Dean, and Gemini 1.5 Professional can flip it right into a quiz with a solution key. [Video sped up for demo purposes]

Gemini API Enhancements

At present, we’re addressing plenty of high developer requests:

1. System directions: Information the mannequin’s responses with system directions, now obtainable in Google AI Studio and the Gemini API. Outline roles, codecs, targets, and guidelines to steer the mannequin’s habits on your particular use case.

2. JSON mode: Instruct the mannequin to solely output JSON objects. This mode permits structured knowledge extraction from textual content or pictures. You may get began with cURL, and Python SDK assist is coming quickly.

3. Enhancements to perform calling: Now you can choose modes to restrict the mannequin’s outputs, bettering reliability. Select textual content, perform name, or simply the perform itself.

A brand new embedding mannequin with improved efficiency

Beginning immediately, builders will have the ability to entry our subsequent era textual content embedding mannequin through the Gemini API. The brand new mannequin, text-embedding-004, (text-embedding-preview-0409 in Vertex AI), achieves a stronger retrieval efficiency and outperforms current fashions with comparable dimensions, on the MTEB benchmarks.

table showing Gecko: Versativel Text Embeddings Distilled from Large Language Models
‘Textual content-embedding-004’ (aka Gecko) utilizing 256 dims output outperforms all bigger 768 dim output fashions on MTEB benchmarks

These are simply the primary of many enhancements coming to the Gemini API and Google AI Studio within the subsequent few weeks. We’re persevering with to work on making Google AI Studio and the Gemini API the best method to construct with Gemini. Get began immediately in Google AI Studio with Gemini 1.5 Professional, discover code examples and quickstarts in our new Gemini API Cookbook, and be a part of our group channel on Discord.



Please enter your comment!
Please enter your name here

Most Popular

Recent Comments