The complete guide to AI API costs

From understanding tokens to calculating your exact bill — everything you need to know about AI pricing, written for developers and business owners.


What are tokens — and why do they matter?

When you send text to an AI model through an API, neither words nor characters are the fundamental unit of measurement. Instead, AI models work with tokens — fragments of text produced by a process called tokenisation.

A token is roughly 4 characters of English text, or approximately ¾ of a word. Common words like "the" or "and" are a single token, while longer or rarer words may split into two or more. Punctuation and special characters consume tokens too, and a space is usually folded into the token of the word that follows it.

Example: "The quick brown fox jumped over the lazy dog" is 9 words and comes to roughly 9–10 tokens, about one token per common word.
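For quick budgeting, the ~4-characters-per-token rule can be scripted. This is a minimal sketch that assumes English text. The function name `estimate_tokens_from_chars` is illustrative, and exact counts require the model's own tokenizer (for OpenAI models, the open-source `tiktoken` library provides one).

```python
import math


def estimate_tokens_from_chars(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate for English text using the ~4 characters/token rule.

    A heuristic only: real counts depend on the model's tokenizer vocabulary.
    """
    if not text:
        return 0
    # Round up, since even a very short string costs at least one token.
    return math.ceil(len(text) / chars_per_token)


sentence = "The quick brown fox jumped over the lazy dog"
print(len(sentence), estimate_tokens_from_chars(sentence))  # 44 11
```

Here the heuristic returns 11, a little above the 9–10 tokens quoted for this sentence. Short, common words tend to be single tokens, so character-based estimates can run high on simple prose.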

Tokens matter for one simple reason: every major AI provider bills you based on token count. OpenAI, Anthropic, Google, Mistral, and other API providers quote prices per million tokens processed, so the more tokens your prompts and responses consume, the higher your bill.

Input tokens vs output tokens

Input tokens (prompt tokens)

Everything you send to the model: your instructions, conversation history, documents, system prompts, and any context you include. Input tokens are cheaper because the model can process the entire prompt in parallel in a single pass.

Output tokens (completion tokens)

Everything the model generates in response. Output tokens cost significantly more (4–8× the input price for most models in the comparison table below) because the model must run a separate forward pass through its neural network for each token it generates, one after another.

This asymmetry has a critical implication for your costs: long AI-generated responses are expensive. If you ask a model to write a 2,000-word article, those output tokens can cost as much as sending a full chapter of a book as input.

The AI pricing formula — how to calculate your bill

AI API pricing follows a straightforward formula. All providers quote prices in cost per million tokens (abbreviated as MTok or 1M tokens), so the calculation is consistent across providers.

Standard cost formula
total_cost = (input_tokens / 1,000,000) × input_price
             + (output_tokens / 1,000,000) × output_price
// Prices quoted per 1M tokens (e.g. $3.00/MTok)
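The same formula as a small Python function. This is a sketch, not any provider's billing code. The name `api_cost` and its parameters are illustrative, and real invoices may also reflect cache or batch discounts that this formula ignores.

```python
def api_cost(input_tokens: int, output_tokens: int,
             input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Return the cost in dollars; prices are quoted per 1M tokens (e.g. 3.00 for $3.00/MTok)."""
    tokens_per_mtok = 1_000_000  # underscores, because "1,000,000" would be a tuple in Python
    input_cost = input_tokens / tokens_per_mtok * input_price_per_mtok
    output_cost = output_tokens / tokens_per_mtok * output_price_per_mtok
    return input_cost + output_cost


# 5M input tokens and 3M output tokens at $3.00 / $15.00 per MTok
print(api_cost(5_000_000, 3_000_000, 3.00, 15.00))  # 60.0
```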

Worked example: a customer support chatbot

Say you're running a chatbot on Claude Sonnet 4.6 ($3.00 input / $15.00 output per million tokens) and you process 10,000 conversations per day, with an average of 500 input tokens and 300 output tokens per conversation.

Monthly cost calculation
Daily input = 10,000 × 500 = 5,000,000 tokens
Daily output = 10,000 × 300 = 3,000,000 tokens
Daily input cost = (5M / 1M) × $3.00 = $15.00
Daily output cost = (3M / 1M) × $15.00 = $45.00
Monthly total = ($15 + $45) × 30 = $1,800 / month
Output tokens dominate your costs

In this example, output tokens account for 75% of the bill ($45 of $60 per day) while making up only 37.5% of total token volume. For most real-world applications, reducing response length is the single fastest way to cut your AI bill.
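To check where the money goes, this sketch recomputes the example's daily token volumes and compares output's share of tokens with its share of cost. The helper name is hypothetical, and the prices are the Claude Sonnet 4.6 rates used above.

```python
def output_shares(input_tokens: int, output_tokens: int,
                  input_price: float, output_price: float) -> tuple[float, float]:
    """Return (output share of token volume, output share of cost); prices per 1M tokens."""
    input_cost = input_tokens / 1_000_000 * input_price
    output_cost = output_tokens / 1_000_000 * output_price
    volume_share = output_tokens / (input_tokens + output_tokens)
    cost_share = output_cost / (input_cost + output_cost)
    return volume_share, cost_share


# Daily figures from the chatbot example: 10,000 conversations x (500 in, 300 out)
conversations = 10_000
volume_share, cost_share = output_shares(conversations * 500, conversations * 300, 3.00, 15.00)
print(f"output = {volume_share:.1%} of tokens, {cost_share:.0%} of cost")
# output = 37.5% of tokens, 75% of cost
```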

Claude vs GPT vs Gemini — which is cheaper?

The short answer: it depends entirely on which model tier you use. Every major provider offers budget, mid-range, and premium models, and the price differences within a single provider can be as large as 100× or more.

| Provider | Model | Tier | Input / 1M tokens | Output / 1M tokens | Best for |
|---|---|---|---|---|---|
| Google | Gemini 2.5 Flash-Lite | Cheapest | $0.10 | $0.40 | Classification, routing, high-volume tasks |
| OpenAI | GPT-4.1 Nano | Cheapest | $0.10 | $0.40 | Budget workloads, classification, summarisation |
| xAI | Grok 4.1 | Budget alt. | $0.20 | $0.50 | Low-cost general tasks |
| Google | Gemini 3 Flash | Best value | $0.50 | $3.00 | Agentic workflows, multi-turn chat, coding |
| Anthropic | Claude Haiku 4.5 | Budget Claude | $1.00 | $5.00 | Fast tasks with strong instruction-following |
| OpenAI | o4-mini | Best reasoning value | $1.10 | $4.40 | Chain-of-thought reasoning, coding, analysis |
| OpenAI | GPT-5 | Flagship | $1.25 | $10.00 | General intelligence, balanced price-to-quality |
| Google | Gemini 2.5 Pro | Flagship | $1.25 | $10.00 | Long context, multimodal, research |
| OpenAI | GPT-5.2 | Flagship | $1.75 | $14.00 | Advanced reasoning, coding, agents |
| Google | Gemini 3.1 Pro | Flagship | $2.00 | $12.00 | Multimodal, coding, research |
| OpenAI | GPT-5.4 | Flagship | $2.50 | $15.00 | Top-tier reasoning, agents, coding |
| Anthropic | Claude Sonnet 4.6 | Mid-flagship | $3.00 | $15.00 | Complex instruction-following, writing |
| Anthropic | Claude Opus 4.7 | Premium | $5.00 | $25.00 | Frontier reasoning, 1M context, long documents |

Prices are falling fast

AI API prices have dropped dramatically in recent years, by roughly 80% over a twelve-month period on comparable models. If the trend continues, today's rates will overstate future costs, so factor likely reductions into long-term budget planning.

What is the cheapest AI API available right now?

Two models are tied for the cheapest from major providers: Google Gemini 2.5 Flash-Lite and OpenAI GPT-4.1 Nano, both at $0.10 per million input tokens and $0.40 per million output tokens. At that rate, 10 million words of text (about 13.3 million tokens, or roughly 110 novels of 90,000 words) would cost around $1.33 in input.
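A quick way to sanity-check figures like this, assuming the ~1.33 tokens-per-word rule and the 90,000-word novel used in the word-count table later in this guide (the helper name is hypothetical):

```python
TOKENS_PER_WORD = 4 / 3    # rule of thumb: 1 token ≈ 3/4 of an English word
WORDS_PER_NOVEL = 90_000   # "full novel" size assumed in this guide


def input_cost_for_words(words: int, input_price_per_mtok: float) -> float:
    """Estimated input cost in dollars for a given number of English words."""
    tokens = words * TOKENS_PER_WORD
    return tokens / 1_000_000 * input_price_per_mtok


words = 10_000_000
novels = words / WORDS_PER_NOVEL
cost = input_cost_for_words(words, 0.10)  # $0.10/MTok input (Flash-Lite / GPT-4.1 Nano)
print(f"{words:,} words ≈ {novels:.0f} novels ≈ ${cost:.2f} of input")
```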

Cheapest models by use case

| Use case | Recommended model | Approx. input price (per 1M tokens) |
|---|---|---|
| Simple classification & routing | Gemini 2.5 Flash-Lite or GPT-4.1 Nano | $0.10 |
| General chatbots | Gemini 3 Flash or GPT-5 | $0.50–$1.25 |
| Reasoning / chain-of-thought | o4-mini | $1.10 |
| Complex reasoning (flagship) | GPT-5.4 or Gemini 3.1 Pro | $2.00–$2.50 |
| Long document analysis | Claude Opus 4.7 (1M context) | $5.00 |
| Writing & content | Claude Sonnet 4.6 | $3.00 |

How many tokens is 1,000 words?

The rule of thumb for English text

1,000 words ≈ 1,300 to 1,500 tokens. The most widely used estimate is 1,333 tokens per 1,000 words (multiply word count by 1.33).

| Content | Word count | Approx. tokens |
|---|---|---|
| Short tweet | ~15 words | ~20 |
| Email | ~200 words | ~260–300 |
| Blog post | ~1,000 words | ~1,300–1,500 |
| Novel chapter | ~10,000 words | ~13,000–15,000 |
| Full novel | ~90,000 words | ~117,000–135,000 |
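The multiply-by-1.33 rule is easy to script. This minimal sketch (the function name is hypothetical) produces estimates that fall within the ranges in the table above; treat them as estimates only, since actual counts depend on the model's tokenizer.

```python
def estimate_tokens_from_words(words: int, tokens_per_word: float = 4 / 3) -> int:
    """Rule-of-thumb estimate: ~1.33 tokens per English word (1 token ≈ 3/4 word)."""
    return round(words * tokens_per_word)


examples = [("Short tweet", 15), ("Email", 200), ("Blog post", 1_000),
            ("Novel chapter", 10_000), ("Full novel", 90_000)]
for label, words in examples:
    print(f"{label:<14}{words:>8,} words -> ~{estimate_tokens_from_words(words):,} tokens")
```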

Cost optimisation strategies — cut your AI bill by up to 90%

AI API costs are far from fixed. With the right architecture, many applications can cut their token spend by 50–90% without sacrificing quality.

01. Use prompt caching (save 50–90%)

Anthropic, OpenAI, and Google offer prompt caching: repeated context is cached, and later requests that reuse it are billed at a fraction of the normal input price (as little as 10% on some models).

02. Use batch APIs (save 50%)

Process requests asynchronously (typically within 24 hours) at 50% off standard pricing. Perfect for non-real-time workloads.

03. Downsize your model (save 30–70%)

Route simple tasks to cheaper models. Most tasks don't need the most expensive model.
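A minimal rule-based router, as a sketch. The keyword list, the length threshold, and the function name are hypothetical, and the model IDs serve only as labels (no API is called). A cheap model could itself act as the classifier instead of keyword rules.

```python
import re

# Hypothetical routing policy: the keywords and length limit are illustrative
# assumptions, not recommendations from any provider.
CHEAP_MODEL = "gpt-4.1-nano"   # labels only; nothing is sent to an API
FLAGSHIP_MODEL = "gpt-5"
SIMPLE_TASK_WORDS = {"classify", "categorise", "categorize", "extract", "tag", "route"}


def choose_model(prompt: str, max_simple_chars: int = 2_000) -> str:
    """Send short prompts that look like simple tasks to the cheaper model."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))  # whole words, not substrings
    looks_simple = bool(words & SIMPLE_TASK_WORDS)
    if looks_simple and len(prompt) <= max_simple_chars:
        return CHEAP_MODEL
    return FLAGSHIP_MODEL


print(choose_model("Classify this support ticket: 'I was charged twice.'"))
print(choose_model("Draft a migration plan for our billing service."))
```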

04. Trim your output tokens (save 20–50%)

Output tokens cost 4–8× as much as input on most models. Set explicit length limits and use structured outputs where possible.

05. Optimise system prompts (save 20–40%)

Your system prompt is billed as input on every call. A bloated 2,000-token system prompt across 100,000 daily calls adds 200M input tokens per day, much of which a leaner prompt would avoid.
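To put a price on that overhead, here is a sketch assuming Claude Sonnet 4.6's $3.00/MTok input price from the comparison table and no prompt caching. The 500-token trimmed prompt is a hypothetical target, and the helper name is illustrative.

```python
def daily_prompt_overhead(prompt_tokens: int, calls_per_day: int,
                          input_price_per_mtok: float) -> tuple[int, float]:
    """Tokens and dollars spent per day re-sending the same system prompt (no caching)."""
    tokens = prompt_tokens * calls_per_day
    return tokens, tokens / 1_000_000 * input_price_per_mtok


before_tokens, before_cost = daily_prompt_overhead(2_000, 100_000, 3.00)
after_tokens, after_cost = daily_prompt_overhead(500, 100_000, 3.00)  # hypothetical trimmed prompt
print(f"{before_tokens:,} tokens/day = ${before_cost:,.2f}/day")  # 200,000,000 tokens/day = $600.00/day
print(f"trimmed: ${after_cost:,.2f}/day, saving ${before_cost - after_cost:,.2f}/day")
```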

06. Use RAG over large context (save 30–60%)

Send only the most relevant snippets rather than entire knowledge bases. This costs a fraction as much and can achieve similar accuracy.
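A toy illustration of the retrieval step, assuming a tiny in-memory knowledge base. Real RAG pipelines usually rank passages by embedding similarity. This sketch swaps in simple word overlap so it runs with the standard library alone, and all names and snippets are hypothetical.

```python
import re
from collections import Counter


def words_in(text: str) -> list[str]:
    """Lowercase alphanumeric words, a crude stand-in for real tokenisation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def top_k_snippets(question: str, snippets: list[str], k: int = 2) -> list[str]:
    """Return the k snippets sharing the most words with the question (word overlap, not embeddings)."""
    question_words = set(words_in(question))

    def overlap(snippet: str) -> int:
        counts = Counter(words_in(snippet))
        return sum(counts[w] for w in question_words)

    return sorted(snippets, key=overlap, reverse=True)[:k]


knowledge_base = [
    "To reset your password, open Settings and choose Reset password.",
    "Invoices are emailed on the first day of each month.",
    "Our office is closed on public holidays.",
    "Password rules: at least 12 characters, including a number.",
]
question = "How do I reset my password?"
context = "\n".join(top_k_snippets(question, knowledge_base))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```

Only the two password snippets reach the model instead of the whole knowledge base. With thousands of documents, that difference is where the input-token savings come from.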

Prompt caching in depth

Prompt caching is often the single biggest cost lever available today. If the beginning of your prompt (a system prompt, a large document) stays identical between requests, the provider can cache its processed representation and charge a fraction of the normal input price for that prefix on subsequent requests.

Cache pricing at a glance

Anthropic: Cache write 1.25× input price (default 5-minute cache lifetime); cache read just 0.1× (90% savings on warm cache).

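OpenAI: Caching is automatic, and cached input tokens are billed at a discount that varies by model (for example, GPT-5's cached input rate is 10% of its normal input rate).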

Google: Context caching priced per storage hour plus reduced per-token rate.
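As a rough model of the Anthropic pricing above, this sketch compares the input cost of re-sending a fixed prompt prefix with and without caching. It assumes the default 1.25× write and 0.1× read multipliers, a cache that stays warm between requests (so only the first request pays the write price), and Claude Sonnet 4.6's $3.00/MTok input rate. The 10,000-token prefix, the 100 requests, and the function name are hypothetical.

```python
def prefix_input_cost(prefix_tokens: int, requests: int, input_price_per_mtok: float,
                      write_multiplier: float = 1.25,
                      read_multiplier: float = 0.10) -> tuple[float, float]:
    """Return (uncached, cached) input cost in dollars for a repeated prompt prefix.

    Assumes the first request writes the cache and every later request reads it,
    i.e. the cache never expires between requests.
    """
    if requests < 1:
        raise ValueError("requests must be at least 1")
    one_pass = prefix_tokens / 1_000_000 * input_price_per_mtok  # prefix billed once at the normal rate
    uncached = one_pass * requests
    cached = one_pass * write_multiplier + one_pass * read_multiplier * (requests - 1)
    return uncached, cached


uncached, cached = prefix_input_cost(10_000, 100, 3.00)
print(f"uncached ${uncached:.2f}, cached ${cached:.2f}, saving {1 - cached / uncached:.0%}")
```

With a single request, caching costs more because of the 1.25× write premium. It pays off once the same prefix is read back, which is why it suits system prompts and documents reused across many calls.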

Frequently asked questions

Is Claude, GPT, or Gemini cheaper?

At the budget tier, Gemini 2.5 Flash-Lite and GPT-4.1 Nano are tied at $0.10/$0.40 per million tokens. At the flagship tier, GPT-5 and Gemini 2.5 Pro are tied as the cheapest options at $1.25/$10.00, offering the best price-to-performance. Claude Opus 4.7 at $5.00/$25.00 is the most expensive model in the comparison, though its 1M-token context window makes it a strong choice for very long documents.

What is the cheapest AI API?

Google Gemini 2.5 Flash-Lite and OpenAI GPT-4.1 Nano are tied at $0.10/$0.40 per million tokens. xAI's Grok 4.1 is third at $0.20/$0.50.

How many tokens is 1,000 words?

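For standard English prose, 1,000 words ≈ 1,300–1,500 tokens (multiply words by 1.33). Code tokenises differently from prose, and languages written in non-Latin scripts usually need more tokens for the same content, so count those with the provider's tokenizer rather than this rule.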

Why are output tokens more expensive than input tokens?

Generating tokens is more expensive to serve than reading them. The model processes all input tokens in parallel, but it must run a separate forward pass for every output token, one after another. Most providers price output tokens at 4–8× the input rate.

Do AI API prices keep changing?

Yes, frequently. Prices have fallen dramatically since 2023. Build estimates with future reductions in mind.

Is there a free AI API I can use?

Yes. Google's Gemini API offers a free tier sufficient for prototyping. OpenAI offers free credits to new developers.

What is a context window?

The maximum number of tokens a model can handle in a single request, counting both the input and the generated output. Claude Opus 4.7 supports 1M tokens at standard pricing; most flagship models support at least 200K tokens.