How to Use Google's Gemini API for Free in 2026

Google's Gemini API is one of the most developer-friendly AI APIs available today, especially because it offers a genuinely useful free tier. Unlike some competitors that throttle free users to near-uselessness, Gemini's free tier is powerful enough to build real applications.

This guide covers everything: getting started, text and streaming generation, vision, function calling, embeddings, context caching, building RAG, and production tips.

Free Tier Limits

As of May 2026, here is what the Gemini API free tier offers:

Rate Limits

Model	Requests per minute	Requests per day	Tokens per request
Gemini 1.5 Flash	30 RPM	1,500 RPD	1M context, 8K output
Gemini 1.5 Pro	10 RPM	400 RPD	1M context, 8K output
Gemini 1.5 Flash-8B	60 RPM	3,000 RPD	1M context, 8K output
Text Embedding	30 RPM	1,500 RPD	2K input tokens
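To make the table concrete, the sketch below converts a per-minute limit into a minimum spacing between requests and estimates how long a daily quota lasts at full speed. The helper names are illustrative and not part of any SDK; the figures are the ones from the table above.

```typescript
// Illustrative helpers (hypothetical names, not part of any SDK).

/** Minimum gap between requests, in ms, to stay under a requests-per-minute limit. */
function minIntervalMs(rpm: number): number {
  return 60_000 / rpm;
}

/** Minutes until the daily quota is used up when sending at the full per-minute rate. */
function minutesUntilDailyQuota(rpm: number, rpd: number): number {
  return rpd / rpm;
}

// Gemini 1.5 Flash figures from the table: 30 RPM, 1,500 RPD.
console.log(minIntervalMs(30)); // 2000
console.log(minutesUntilDailyQuota(30, 1500)); // 50
```

At the full 30 RPM, the Flash daily quota lasts only 50 minutes, so for long-running jobs the per-day limit is usually the one to plan around.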

Models Available for Free

Gemini 1.5 Flash — Best all-rounder. Fast, cheap, supports multimodal.
Gemini 1.5 Flash-8B — An 8B parameter distilled model, great for simple tasks.
Gemini 1.5 Pro — Slower but smarter, good for complex reasoning.
Text Embedding 004 — 768-dimensional embeddings for semantic search.
Gemini 2.0 Flash (limited free quota) — Newer model with native tool use and code execution.

What is NOT free

Batch API (50% discount but requires billing)
Fine-tuning
Model tuning via Distillation
Context caching (limited or unavailable on the free tier)

Getting Your API Key

The easiest way to get started:

Go to Google AI Studio
Sign in with your Google account
Click "Create API Key"
Copy the key — it starts with AIza...

You can also create keys restricted to specific APIs in the Google Cloud Console if you want tighter security.
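The examples in this guide hard-code the key for brevity. A common alternative is to read it from an environment variable, sketched below. The variable name GEMINI_API_KEY and the helper loadApiKey are assumptions for illustration, not something the SDK requires.

```typescript
// Hypothetical helper: the variable name GEMINI_API_KEY is an assumption.
function loadApiKey(env: Record<string, string | undefined>): string {
  const key = env["GEMINI_API_KEY"];
  if (!key) {
    throw new Error("GEMINI_API_KEY is not set");
  }
  // AI Studio keys start with "AIza" (see the steps above).
  // Warn rather than fail, in case the key format differs.
  if (!key.startsWith("AIza")) {
    console.warn("API key does not start with 'AIza'; check that it was copied correctly.");
  }
  return key;
}

// In Node.js: const apiKey = loadApiKey(process.env);
```

Keeping the key out of source files also makes it harder to commit it to version control by accident.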

Installation

The official Google AI SDK for TypeScript/JavaScript is @google/genai:

npm install @google/genai

Text Generation: The Basics

Here is the simplest possible usage:

import { GoogleGenAI } from "@google/genai";

const genAI = new GoogleGenAI({ apiKey: "YOUR_API_KEY" });

async function main() {
  const response = await genAI.models.generateContent({
    model: "gemini-1.5-flash",
    contents: "Explain the transformer architecture in one paragraph.",
  });
  console.log(response.text);
}

main();

Streaming for Real-Time Output

For chat applications or long documents, streaming is essential:

import { GoogleGenAI } from "@google/genai";

const genAI = new GoogleGenAI({ apiKey: "YOUR_API_KEY" });

async function streamTest() {
  const stream = await genAI.models.generateContentStream({
    model: "gemini-1.5-flash",
    contents: "Write a short story about a robot learning to paint.",
  });

  // Print each piece of text as soon as it arrives
  for await (const chunk of stream) {
    process.stdout.write(chunk.text ?? "");
  }
}

streamTest();

The streaming API returns an async iterable. Each chunk exposes a text field when it carries text. Some chunks may have no text (for example, a chunk that carries only metadata such as the finish reason), which is why the loop above falls back to an empty string.
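If you need the full response as well as the live output, you can accumulate the chunks as they arrive. The sketch below uses a stand-in stream and a minimal chunk type (both assumptions for illustration) so it runs without an API key; with the SDK you would pass the real stream instead.

```typescript
// Minimal chunk shape for this sketch; real SDK chunks carry more fields.
interface TextChunk {
  text?: string;
}

/** Concatenates the text of every chunk, skipping chunks that have none. */
async function collectText(stream: AsyncIterable<TextChunk>): Promise<string> {
  let full = "";
  for await (const chunk of stream) {
    if (chunk.text) {
      full += chunk.text;
    }
  }
  return full;
}

// Stand-in stream for demonstration; the second chunk has no text.
async function* fakeStream(): AsyncGenerator<TextChunk> {
  yield { text: "Hello, " };
  yield {};
  yield { text: "world" };
}

collectText(fakeStream()).then((text) => console.log(text)); // Hello, world
```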

System Instructions

You can guide the model's behavior with system instructions:

const response = await genAI.models.generateContent({
  model: "gemini-1.5-flash",
  contents: "What is the capital of France?",
  config: {
    systemInstruction: "You are a geography teacher. Answer concisely and include a fun fact.",
  },
});

Vision: Understanding Images

Gemini is multimodal. You can pass images inline as base64-encoded data alongside your text prompt:

import fs from "node:fs";
import { GoogleGenAI } from "@google/genai";

const genAI = new GoogleGenAI({ apiKey: "YOUR_API_KEY" });

async function visionTest() {
  // Read the image from disk and base64-encode it
  const imagePath = "receipt.jpg";
  const imageData = fs.readFileSync(imagePath);
  const base64Image = imageData.toString("base64");

  const response = await genAI.models.generateContent({
    model: "gemini-1.5-flash",
    contents: [
      { text: "Extract all line items, prices, and the total from this receipt." },
      { inlineData: { mimeType: "image/jpeg", data: base64Image } },
    ],
  });

  console.log(response.text);
}

visionTest();

The inlineData field accepts:

mimeType: image/jpeg, image/png, image/webp, image/gif (animated too)
data: Base64-encoded binary

You can also pass multiple images in a single request, and the model can reason across them (e.g., "Find the differences between these two images").
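A multi-image request is the same parts array with one inlineData entry per image. The sketch below builds that array from already-encoded images; the helper names (mimeTypeFor, buildImagePrompt) and the local Part type are assumptions for illustration that mirror the shape used in the example above.

```typescript
// Part shapes mirror the inlineData example above; helper names are hypothetical.
type Part =
  | { text: string }
  | { inlineData: { mimeType: string; data: string } };

// MIME types listed in this guide, keyed by file extension.
const MIME_BY_EXTENSION = new Map<string, string>([
  ["jpg", "image/jpeg"],
  ["jpeg", "image/jpeg"],
  ["png", "image/png"],
  ["webp", "image/webp"],
  ["gif", "image/gif"],
]);

function mimeTypeFor(fileName: string): string {
  const ext = (fileName.split(".").pop() ?? "").toLowerCase();
  const mime = MIME_BY_EXTENSION.get(ext);
  if (!mime) throw new Error(`Unsupported image type: ${fileName}`);
  return mime;
}

/** Returns one text part followed by one inlineData part per image. */
function buildImagePrompt(
  prompt: string,
  images: { fileName: string; base64: string }[],
): Part[] {
  const imageParts: Part[] = images.map((img) => ({
    inlineData: { mimeType: mimeTypeFor(img.fileName), data: img.base64 },
  }));
  return [{ text: prompt }, ...imageParts];
}
```

The returned array can be passed as the contents value of a generateContent call, in the same way as the single-image example.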

Function Calling / Tool Use

Gemini supports function calling (what the industry now calls "tool use"). This lets the model request specific function invocations, and you execute them:

import { GoogleGenAI, Type } from "@google/genai";

const genAI = new GoogleGenAI({ apiKey: "YOUR_API_KEY" });

// Declaration the model sees: name, description, and a schema for the arguments
const getWeather = {
  name: "get_weather",
  description: "Get the current weather for a location",
  parameters: {
    type: Type.OBJECT,
    properties: {
      location: { type: Type.STRING, description: "City and state/country" },
      unit: { type: Type.STRING, enum: ["celsius", "fahrenheit"] },
    },
    required: ["location"],
  },
};

async function functionCallTest() {
  const response = await genAI.models.generateContent({
    model: "gemini-1.5-flash",
    contents: "What is the weather in Mumbai?",
    config: {
      tools: [{ functionDeclarations: [getWeather] }],
    },
  });

  // Check if the model wants to call a function
  const toolCall = response.functionCalls?.[0];
  if (toolCall) {
    console.log("Function to call:", toolCall.name);
    console.log("Arguments:", JSON.stringify(toolCall.args));
    // Execute the function and return results in a follow-up call
  }
}

functionCallTest();

After receiving a function call request, you execute the function yourself and send the result back in a follow-up turn. This is how you build agents that can query databases, send emails, or call external APIs.
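The "execute and send back" step can be sketched as a small dispatcher: look up a local handler by the requested name, run it, and wrap the result in a functionResponse part. The types, the handler table, and the canned weather data below are assumptions for illustration; a real handler would call an actual weather service.

```typescript
// Local stand-ins for the SDK's types; only the fields used here.
interface FunctionCall {
  name: string;
  args: Record<string, unknown>;
}

type Handler = (args: Record<string, unknown>) => Record<string, unknown>;

// Hypothetical handler returning canned data instead of calling a weather service.
const handlers = new Map<string, Handler>([
  [
    "get_weather",
    (args) => ({
      location: args["location"],
      temperature: 31,
      unit: args["unit"] ?? "celsius",
    }),
  ],
]);

/** Runs the requested function and wraps its result as a functionResponse part. */
function buildFunctionResponse(call: FunctionCall): {
  functionResponse: { name: string; response: Record<string, unknown> };
} {
  const handler = handlers.get(call.name);
  const response: Record<string, unknown> = handler
    ? handler(call.args)
    : { error: `Unknown function: ${call.name}` };
  return { functionResponse: { name: call.name, response } };
}

console.log(JSON.stringify(buildFunctionResponse({ name: "get_weather", args: { location: "Mumbai" } })));
```

The follow-up request would then contain the original user message, the model turn that held the function call, and a further turn holding this part, so the model can phrase a final answer from the result. Returning an error object for unknown names, rather than throwing, lets the model see what went wrong.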

Embeddings

Text embeddings convert text into vector representations for semantic search and clustering:

import { GoogleGenAI } from "@google/genai";

const genAI = new GoogleGenAI({ apiKey: "YOUR_API_KEY" });

async function embeddingTest() {
  const result = await genAI.models.embedContent({
    model: "text-embedding-004",
    contents: "What is the meaning of life?",
  });

  // The response holds one embedding per input; take the first
  const values = result.embeddings?.[0]?.values ?? [];
  console.log("Embedding dimensions:", values.length);
  console.log("First 5 values:", values.slice(0, 5));
}

embeddingTest();

With text-embedding-004, embeddings are 768-dimensional vectors. You can use them to:

Build a semantic search engine (compare cosine similarity)
Cluster documents
Classify text
Power retrieval-augmented generation (RAG) pipelines
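As one example of the uses listed above, text can be classified by comparing its embedding with the embedding of one labeled example per class and picking the closest. This is a minimal sketch: the toy 3-dimensional vectors stand in for real 768-dimensional embeddings, and the labels and function names are made up for illustration.

```typescript
/** Cosine similarity between two vectors of equal length. */
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let magA = 0;
  let magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

/** Returns the label of the example vector most similar to the input. */
function classify(
  vector: number[],
  examples: { label: string; vector: number[] }[],
): string {
  if (examples.length === 0) throw new Error("No labeled examples provided");
  let best = examples[0];
  let bestScore = -Infinity;
  for (const example of examples) {
    const score = cosine(vector, example.vector);
    if (score > bestScore) {
      bestScore = score;
      best = example;
    }
  }
  return best.label;
}

// Toy vectors; in practice these would come from embedContent.
const labeled = [
  { label: "billing", vector: [0.9, 0.1, 0.0] },
  { label: "technical", vector: [0.1, 0.9, 0.2] },
];
console.log(classify([0.8, 0.2, 0.1], labeled)); // billing
```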

Context Caching

If you have a large document or system prompt that rarely changes, context caching reduces costs and latency:

// Currently available on paid tier only, but worth knowing for when you scale
const cache = await genAI.caches.create({
  model: "models/gemini-1.5-flash-001",
  config: {
    contents: [{ role: "user", parts: [{ text: veryLongDocument }] }],
    ttl: "3600s", // cache lives for 1 hour
  },
});

const response = await genAI.models.generateContent({
  model: "models/gemini-1.5-flash-001", // use the same model the cache was created for
  contents: "Summarize the document.",
  config: { cachedContent: cache.name },
});

Context caching is essential for production use cases where you have shared knowledge bases, codebases, or documentation.

Building a Simple RAG System with Gemini

Here is a complete minimal RAG implementation using Gemini embeddings and generation:

import { GoogleGenAI } from "@google/genai";

const genAI = new GoogleGenAI({ apiKey: "YOUR_API_KEY" });

const documents = [
  "Gemini 1.5 Flash supports up to 1M tokens of context.",
  "The free tier allows 30 requests per minute for Flash models.",
  "Function calling lets Gemini invoke external APIs.",
  "Embeddings convert text into 768-dimensional vectors.",
];

// Embed a single string and return its vector
async function embed(text: string): Promise<number[]> {
  const result = await genAI.models.embedContent({
    model: "text-embedding-004",
    contents: text,
  });
  const values = result.embeddings?.[0]?.values;
  if (!values) throw new Error("No embedding returned");
  return values;
}

// Return the document most similar to the query
async function search(query: string): Promise<string> {
  const qVec = await embed(query);
  const docVecs = await Promise.all(documents.map((doc) => embed(doc)));

  // Simple cosine similarity
  const similarities = docVecs.map((vec, i) => ({
    index: i,
    score: cosineSimilarity(qVec, vec),
  }));

  similarities.sort((a, b) => b.score - a.score);
  return documents[similarities[0].index];
}

function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, val, i) => sum + val * b[i], 0);
  const magA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
  const magB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
  return dot / (magA * magB);
}

async function ragQuery(question: string) {
  const context = await search(question);
  const response = await genAI.models.generateContent({
    model: "gemini-1.5-flash",
    contents: `Answer the question based on this context:

Context: ${context}

Question: ${question}`,
  });
  console.log(response.text);
}

ragQuery("What are the rate limits for Flash?");
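The search function above embeds every document on each call, so one question costs five embedding requests with four documents. At the 30 RPM embedding limit from the rate-limit table, that is about six questions per minute. One possible refinement, sketched below, is to embed the documents once and reuse the vectors. The EmbeddingIndex class and the EmbedFn type are hypothetical names; the embedding function is injected so the sketch runs without an API key.

```typescript
// Any function that turns text into a vector, e.g. a wrapper around embedContent.
type EmbedFn = (text: string) => Promise<number[]>;

function cosineSim(a: number[], b: number[]): number {
  let dot = 0;
  let magA = 0;
  let magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

class EmbeddingIndex {
  private docs: string[];
  private embed: EmbedFn;
  private vectors: number[][] = [];

  constructor(docs: string[], embed: EmbedFn) {
    this.docs = docs;
    this.embed = embed;
  }

  /** Embeds every document once; call this before search(). */
  async build(): Promise<void> {
    this.vectors = await Promise.all(this.docs.map((doc) => this.embed(doc)));
  }

  /** Embeds only the query and returns the k most similar documents. */
  async search(query: string, k = 1): Promise<string[]> {
    const queryVector = await this.embed(query);
    return this.vectors
      .map((vector, index) => ({ index, score: cosineSim(queryVector, vector) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, k)
      .map(({ index }) => this.docs[index]);
  }
}
```

With this shape, four documents cost four embedding requests up front and one per question afterwards. For a larger corpus you would typically persist the vectors instead of rebuilding them on every start.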

Comparing Gemini with Alternatives

Feature	Gemini 1.5 Flash	GPT-4o mini	Claude 3 Haiku	Local (Llama 3)
Free tier	Yes (30 RPM)	Yes (3 RPM)	No	Yes (free hardware)
Cost (paid)	$0.075/1M in	$0.15/1M in	$0.25/1M in	Free
Context window	1M tokens	128K tokens	200K tokens	Depends on hardware
Multimodal	Yes	Yes	Yes	Limited
Speed	Very fast	Fast	Fastest	GPU-dependent
Rate limit (free)	30 RPM	3 RPM	N/A	No limit

Gemini's free tier is the most generous among the cloud providers compared here, and its 1M token context window is the largest in the table.
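To see what the paid prices mean for a concrete workload, the sketch below computes the input cost for a given token count using the per-million prices from the table above. It covers input tokens only, since the table lists no output prices, and the function name is illustrative.

```typescript
// Input prices in USD per 1M tokens, copied from the comparison table.
const inputPricePerMillion = new Map<string, number>([
  ["Gemini 1.5 Flash", 0.075],
  ["GPT-4o mini", 0.15],
  ["Claude 3 Haiku", 0.25],
]);

/** Input cost in USD for a number of input tokens. */
function inputCostUsd(model: string, inputTokens: number): number {
  const price = inputPricePerMillion.get(model);
  if (price === undefined) throw new Error(`No price for ${model}`);
  return (inputTokens / 1_000_000) * price;
}

// Cost of 10M input tokens on each model
inputPricePerMillion.forEach((_price, model) => {
  console.log(model, inputCostUsd(model, 10_000_000).toFixed(2));
});
```

For 10M input tokens this works out to roughly $0.75 on Gemini 1.5 Flash, $1.50 on GPT-4o mini, and $2.50 on Claude 3 Haiku.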

Production Tips

When moving from prototype to production:

Fallback Logic

Always implement a fallback chain:

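const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// callGemini and callOpenAI are your own wrappers around each provider's SDK
async function generateWithFallback(prompt: string): Promise<string> {
  try {
    return await callGemini(prompt);
  } catch (err) {
    if ((err as { status?: number }).status === 429) {
      await sleep(2000); // rate limited: wait, then retry once
      try {
        return await callGemini(prompt);
      } catch {
        // retry failed too; fall through to the other provider
      }
    }
    return await callOpenAI(prompt); // fall back to another provider
  }
}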
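The same idea extends to any number of providers. The sketch below tries each provider in order and reports every failure if none succeeds. The Provider type and the function name are assumptions for illustration; each call function would wrap a real SDK.

```typescript
// A provider is a name plus a function that turns a prompt into text.
type Provider = {
  name: string;
  call: (prompt: string) => Promise<string>;
};

/** Tries each provider in order and returns the first successful result. */
async function generateWithProviders(
  prompt: string,
  providers: Provider[],
): Promise<string> {
  const errors: string[] = [];
  for (const provider of providers) {
    try {
      return await provider.call(prompt);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`${provider.name}: ${message}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join("; ")}`);
}
```

Collecting the error messages makes it easier to tell whether a request failed because of rate limits on one provider or a problem common to all of them.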

Rate Limiting

The free tier's 30 RPM sounds generous, but you will hit it. Implement a token bucket limiter:

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

class RateLimiter {
  private tokens: number;
  private lastRefill: number;

  // maxTokens: bucket size; refillMs: time needed to regain one token
  constructor(private maxTokens: number, private refillMs: number) {
    this.tokens = maxTokens;
    this.lastRefill = Date.now();
  }

  async acquire(): Promise<void> {
    this.refill();
    // Tokens accrue fractionally, so wait until a whole token is available
    while (this.tokens < 1) {
      await sleep((1 - this.tokens) * this.refillMs);
      this.refill();
    }
    this.tokens -= 1;
  }

  private refill() {
    const now = Date.now();
    const elapsed = now - this.lastRefill;
    this.tokens = Math.min(this.maxTokens, this.tokens + elapsed / this.refillMs);
    this.lastRefill = now;
  }
}

Error Handling

Common Gemini API errors:

429: Rate limit exceeded — back off and retry
400: Invalid request — check your prompt format
403: API key restricted or quota exhausted
500: Google-side issue — retry with exponential backoff
503: Model overloaded — switch to a different model
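The "retry with exponential backoff" advice can be sketched as two small helpers: one that decides whether a status is worth retrying and one that computes the wait before each attempt. Treating 429, 500, and 503 as retryable follows the list above; the base delay, the cap, and the function names are assumptions for illustration.

```typescript
// Statuses from the list above where a retry can help.
const RETRYABLE_STATUSES = [429, 500, 503];

function isRetryable(status: number): boolean {
  return RETRYABLE_STATUSES.indexOf(status) !== -1;
}

/** Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at capMs. */
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 30_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Delays for the first five retries: 1000, 2000, 4000, 8000, 16000
for (let attempt = 0; attempt < 5; attempt++) {
  console.log(backoffDelayMs(attempt));
}
```

A 400 or 403 is not in the retryable set because repeating the same request would fail the same way. In practice a small random jitter is often added to each delay so that many clients do not retry at the same moment.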

Billing

Even though the free tier is generous, set a budget alert in the Google Cloud Console ($0 if you intend to stay on the free tier) so you are never surprised by a bill. Once billing is enabled, the API is pay-as-you-go, and Pro model calls and high-volume usage add up fast.

Conclusion

The Gemini API free tier is genuinely useful for building real applications. With 30 RPM on Flash, support for images, function calling, and a 1M token context window, you can build sophisticated AI features without spending a dime.

Start with Flash for most tasks, upgrade to Pro when you need deeper reasoning, and use embeddings for your RAG pipelines. The @google/genai SDK makes all of this straightforward from TypeScript.

The best part? If you outgrow the free tier, the paid input price for Flash is still lower than the GPT-4o mini and Claude 3 Haiku prices in the comparison above. Google is playing the long game with Gemini, and developers are the winners.
