
Local Small Language Models in the Browser: A First Glance at Chrome’s Built-in AI and Prompt API with Gemini Nano

Author: Christian Liebel • Published: 01.07.2024 • Category: AI

As of Chrome version 127, an exciting new feature is available behind a flag that allows you to download and use the small language model (SLM) Gemini Nano locally in Chrome. As part of the Built-in AI initiative, Chrome exposes this model through an experimental Prompt API. At Thinktecture, we have experimented with the Prompt API in our internal tools.

What is the Prompt API?

The Prompt API lets you send natural language instructions to a built-in language model. This API is part of an initiative to discover potential use cases to shape Chrome’s built-in AI roadmap. As only a small language model is used, the API is currently only suitable for a limited subset of natural language processing tasks such as summarizing, classifying, or rephrasing text. In addition, users must comply with Google’s Prohibited Use Policy, which outlines the appropriate use of Generative AI.

The Prompt API offers both streaming and non-streaming versions. Streaming is particularly interesting because it enables immediate feedback, similar to the “ChatGPT effect.” Below is an example of using the non-streaming Prompt API for sentiment detection, which returns its result only after the entire response has been generated:

const canCreate = await window.ai.canCreateTextSession();
if (canCreate !== "no") {
  const session = await window.ai.createTextSession();
  const prompt = "Is the following customer review positive or negative?\n" +
    "This is the worst customer service I have ever seen!";
  const result = await session.prompt(prompt);
  console.log(result); // Negative
}
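The streaming variant can be sketched as follows. This is a minimal sketch based on the early preview shape of the API (`canCreateTextSession`, `createTextSession`, `promptStreaming`); the names may change as the API evolves, and the helper function and its parameters are our own illustration:

```javascript
// Sketch of the streaming Prompt API, assuming the early preview shape.
// `ai` is expected to be `window.ai` in the browser; passing it in as a
// parameter keeps the helper easy to test and to feature-detect.
async function streamPrompt(ai, prompt, onChunk) {
  const canCreate = await ai.canCreateTextSession();
  if (canCreate === "no") return null; // Prompt API not available

  const session = await ai.createTextSession();
  let latest = "";
  // Note: in early preview builds, each chunk contains the full response
  // generated so far, not just the newly added tokens.
  for await (const chunk of session.promptStreaming(prompt)) {
    latest = chunk;
    onChunk(latest); // e.g., update the UI immediately
  }
  return latest;
}
```

In a browser, you would call `streamPrompt(window.ai, prompt, updateUi)` and render each chunk as it arrives, which is what produces the “ChatGPT effect” mentioned above.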

How to enable the Prompt API

To use the Prompt API, perform the following steps:

  1. Join Chrome’s early preview program for Built-in AI. The following instructions apply to Chrome version 128.0.6563.0, but they may change. Joining the early preview program will ensure you stay updated with all developments in that area.
  2. Ensure you’re running Chrome version 127.0.6512.0 or later, which can be installed via Chrome Beta or Chrome Canary.
  3. Enable the two flags chrome://flags/#optimization-guide-on-device-model and chrome://flags/#prompt-api-for-gemini-nano.
  4. Restart Chrome.
  5. Go to chrome://components.
  6. Find the Optimization Guide On Device Model component.
  7. Click Check for update.
  8. When the update has completed, you can use the Prompt API.

Please note that the Prompt API is designed for desktop systems operating on Windows 10 or 11, macOS 13 (Ventura) or later, and Linux. Your system should have at least 22 GB of free space and 4 GB of video RAM.
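Because the API is behind flags and has hardware requirements, applications should feature-detect it before use. The following is a sketch under the assumption that `canCreateTextSession()` returns `"readily"`, `"after-download"`, or `"no"`, as in the early preview builds; the helper name and return values are our own:

```javascript
// Hypothetical availability check for the experimental Prompt API.
// Pass `window.ai` in the browser; returns a simple status string.
async function promptApiStatus(ai) {
  if (!ai || typeof ai.canCreateTextSession !== "function") {
    return "unsupported"; // browser does not expose the Prompt API at all
  }
  const canCreate = await ai.canCreateTextSession();
  if (canCreate === "readily") return "ready";          // model is on disk
  if (canCreate === "after-download") return "pending"; // model still downloading
  return "unsupported"; // flags disabled or hardware requirements not met
}
```

An application could use this to fall back to a cloud-based model, or simply hide the AI-powered feature, when the status is not `"ready"`.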

Key benefits

The Prompt API provides several key benefits:

  1. Offline availability: Unlike cloud services (e.g., Azure OpenAI, Groq, Google Cloud AI), the Prompt API and Gemini Nano can be used even when the user is offline.
  2. Origin sharing: Unlike WebLLM, the model can be shared across different origins, making it very storage-efficient.
  3. Privacy: Unlike cloud services, the data remains within the user’s browser, ensuring complete privacy.
  4. Cost-effectiveness: Unlike cloud services, processing with the Prompt API incurs no fees.


The current experimental state of the Prompt API comes with the following limitations:

  1. Customization: The exact language model used cannot be specified, prompts cannot be classified (system/user prompts), no conversation history is kept, tool/function calling is not supported, and the stop sequence cannot be customized.
  2. Performance: The inference speed is limited to the performance of the user’s hardware. Cloud services like Groq can reach higher and more reliable inference speeds.
  3. Language: The Gemini Nano model currently works best in English and only has limited translation capabilities.

However, this may change in the future. For example, the Chrome team plans to introduce a fine-tuning API that could improve accuracy and capabilities.

Convincing results

We recently integrated the Prompt API into internal tools that utilize large language models. One of these internal tools is our Smart Form Filler. Inspired by Blazor’s Smart Components, this tool enables users to extract information from a text block in the clipboard to populate the corresponding inputs in a form automatically. It uses a prompt that roughly looks like this:

Extract the following information from the provided text block and fill in the corresponding form inputs:
- first_name (string)
- last_name (string)
- company (string)

Provided text:
Dr. Akio Yamamoto works for Microsoft.
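A prompt like the one above can be assembled from the form’s field definitions, and the model’s response mapped back onto the inputs. The following is a hypothetical sketch, not the actual Smart Form Filler implementation: the helper names and the assumed `key: value` response format are our own illustration.

```javascript
// Build an extraction prompt from a list of form fields, e.g.
// [{ name: "first_name", type: "string" }, ...], plus the pasted text.
function buildExtractionPrompt(fields, text) {
  const fieldList = fields.map((f) => `- ${f.name} (${f.type})`).join("\n");
  return (
    "Extract the following information from the provided text block " +
    "and fill in the corresponding form inputs:\n" +
    fieldList +
    "\n\nProvided text:\n" +
    text
  );
}

// Parse a response assumed to look like:
//   first_name: Akio
//   last_name: Yamamoto
// into an object keyed by field name.
function parseExtractionResult(response) {
  const result = {};
  for (const line of response.split("\n")) {
    const match = line.match(/^(\w+):\s*(.+)$/);
    if (match) result[match[1]] = match[2].trim();
  }
  return result;
}
```

With the early preview API, the glue code would roughly be `parseExtractionResult(await session.prompt(buildExtractionPrompt(fields, clipboardText)))`, followed by assigning the parsed values to the matching form inputs.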

The following screenshot shows some demo texts at the bottom of the page and an address form at the top. When you click the Paste button, the information from the clipboard is automatically used to fill in the form.

Integrating the Prompt API into our Smart Form Filler was straightforward: it required just a few lines of code and took less than 30 minutes. The results were surprisingly accurate, at least for English text. In the example shown, the highlighted text block was extracted into the form. While the SLM did not extract the name properly (it missed Yamamoto as the last name), it identified the country correctly.

Inference for this example takes about 4 seconds on an M1 MacBook Pro. The time to the first token could be reduced by using the streaming version of the Prompt API. Still, larger, cloud-based models are typically faster and perform better on these tasks.


Chrome’s Built-in AI initiative and the Prompt API represent an exciting step forward in making Generative AI and language models accessible locally. While there are limitations and areas for improvement, the ability to prototype and explore potential use cases locally opens up new possibilities for developers. If you want to help refine and expand these capabilities, join Chrome’s early preview program for Built-in AI.


Christian Liebel

I am a cross-platform development enthusiast thrilled by the opportunities offered by modern web technologies: I help enterprises and independent software vendors to develop modern, cross-platform business applications based on Angular. Being a Microsoft MVP and Google GDE, I speak about Progressive Web Apps at user groups and conferences, both national and international. As a member of the W3C WebApps working group, I help to move the web forward.

More about me →