The complete guide to AI API costs
From understanding tokens to calculating your exact bill — everything you need to know about AI pricing, written for developers and business owners.
What are tokens — and why do they matter?
When you send text to an AI model through an API, neither words nor characters are the fundamental unit of measurement. Instead, AI models work with tokens — fragments of text produced by a process called tokenisation.
A token is roughly 4 characters of English text, or approximately ¾ of a word. Common words like "the" or "and" are a single token, while longer or rarer words may split into two or more. Punctuation, spaces, and special characters each consume tokens too.
Tokens matter for one simple reason: every major AI provider bills you based on token count. OpenAI, Anthropic, Google, Mistral, and most other API providers quote their prices per million tokens processed. The more tokens your prompts and responses consume, the higher your bill.
Input tokens vs output tokens
Input tokens: everything you send to the model, including your instructions, conversation history, documents, system prompts, and any other context. Input tokens are cheaper than output tokens because the model can read the whole prompt in parallel.
Output tokens: everything the model generates in response. Output tokens cost significantly more, typically 3–6× the input price, because the model produces them one at a time and runs a separate forward pass through its neural network for each one.
This asymmetry has a critical implication for your costs: long AI-generated responses are expensive. If you ask a model to write a 2,000-word article, those output tokens can cost as much as sending a full chapter of a book as input.
The AI pricing formula — how to calculate your bill
AI API pricing follows a straightforward formula: total cost = (input tokens ÷ 1,000,000 × input price) + (output tokens ÷ 1,000,000 × output price). Providers quote prices per million tokens (abbreviated as MTok or 1M tokens), so the calculation is consistent across providers.
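The per-million-token pricing described above can be sketched as a small helper. This is a minimal illustration, not a billing tool: the function name `api_cost` and its parameter names are assumptions made for this example, and the prices passed in are the Claude Sonnet 4.6 figures quoted in this guide.

```python
def api_cost(input_tokens, output_tokens, input_price_per_mtok, output_price_per_mtok):
    """Return the cost in USD for a workload, given prices per million tokens."""
    input_cost = input_tokens / 1_000_000 * input_price_per_mtok
    output_cost = output_tokens / 1_000_000 * output_price_per_mtok
    return input_cost + output_cost


# One request with 500 input tokens and 300 output tokens at $3.00 / $15.00 per MTok.
cost = api_cost(500, 300, 3.00, 15.00)
print(f"${cost:.4f}")  # prints $0.0060
```

The same function works for a single request or a whole month of traffic, since only the token totals change.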
Worked example: a customer support chatbot
Say you're running a chatbot on Claude Sonnet 4.6 ($3.00 input / $15.00 output per million tokens) and you process 10,000 conversations per day, with an average of 500 input tokens and 300 output tokens per conversation.
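Daily input is 10,000 × 500 = 5M tokens, which costs 5 × $3.00 = $15.00. Daily output is 10,000 × 300 = 3M tokens, which costs 3 × $15.00 = $45.00. That totals $60.00 per day, or about $1,800 over a 30-day month. In this example, output tokens cost three times as much as input tokens despite representing only 37.5% of total volume. For most real-world applications, reducing response length is the single fastest way to cut your AI bill.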
Claude vs GPT vs Gemini — which is cheaper?
The short answer: it depends entirely on which model tier you use. Every major provider offers budget, mid-range, and premium models, and the gap within a single provider is large: in the table below, OpenAI's output prices alone range from $0.40 to $15.00 per million tokens, a difference of more than 37×.
| Provider & model | Tier | Input / 1M | Output / 1M | Best for |
|---|---|---|---|---|
| Google Gemini 2.5 Flash-Lite | Cheapest | $0.10 | $0.40 | Classification, routing, high-volume tasks |
| OpenAI GPT-4.1 Nano | Cheapest | $0.10 | $0.40 | Budget workloads, classification, summarisation |
| xAI Grok 4.1 | Budget alt. | $0.20 | $0.50 | Low-cost general tasks |
| Google Gemini 3 Flash | Best value | $0.50 | $3.00 | Agentic workflows, multi-turn chat, coding |
| Anthropic Claude Haiku 4.5 | Budget Claude | $1.00 | $5.00 | Fast tasks with strong instruction-following |
| OpenAI o4-mini | Best reasoning value | $1.10 | $4.40 | Chain-of-thought reasoning, coding, analysis |
| OpenAI GPT-5 | Flagship | $1.25 | $10.00 | General intelligence, balanced price-to-quality |
| Google Gemini 2.5 Pro | Flagship | $1.25 | $10.00 | Long context, multimodal, research |
| OpenAI GPT-5.2 | Flagship | $1.75 | $14.00 | Advanced reasoning, coding, agents |
| Google Gemini 3.1 Pro | Flagship | $2.00 | $12.00 | Multimodal, coding, research |
| OpenAI GPT-5.4 | Flagship | $2.50 | $15.00 | Top-tier reasoning, agents, coding |
| Anthropic Claude Sonnet 4.6 | Mid-flagship | $3.00 | $15.00 | Complex instruction-following, writing |
| Anthropic Claude Opus 4.7 | Premium | $5.00 | $25.00 | Frontier reasoning, 1M context, long documents |
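To see how the tiers in the table compare on one workload, the prices can be plugged into a short script. This is a sketch under stated assumptions: the prices are copied from a few rows of the table above, the workload of 5M input and 3M output tokens per day is a hypothetical figure, and the names `PRICES`, `workload_cost`, and `rank_models` are made up for this example.

```python
# Prices in USD per million tokens (input, output), copied from the table above.
PRICES = {
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
    "Claude Haiku 4.5": (1.00, 5.00),
    "GPT-5": (1.25, 10.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Opus 4.7": (5.00, 25.00),
}


def workload_cost(input_tokens, output_tokens, prices):
    """Cost in USD for the given token totals at (input, output) prices per MTok."""
    input_price, output_price = prices
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def rank_models(input_tokens, output_tokens, price_table=PRICES):
    """Return (model, cost) pairs sorted from cheapest to most expensive."""
    costs = {
        model: workload_cost(input_tokens, output_tokens, prices)
        for model, prices in price_table.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])


# Hypothetical daily workload: 5M input tokens and 3M output tokens.
for model, cost in rank_models(5_000_000, 3_000_000):
    print(f"{model:<24} ${cost:>7.2f}")
```

On this workload the cheapest listed model costs $1.70 per day and the most expensive costs $100.00, which is why choosing the tier usually matters more than choosing the provider.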
AI API prices have dropped dramatically in recent years — roughly 80% over a twelve-month period on comparable models. This trend is expected to continue, so factor future reductions into long-term budget planning.
What is the cheapest AI API available right now?
Two models are tied for the cheapest from major providers: Google Gemini 2.5 Flash-Lite and OpenAI GPT-4.1 Nano, both at $0.10 per million input tokens and $0.40 per million output tokens. At roughly 1.33 tokens per word, 10 million words of text (on the order of 100 novels at about 100,000 words each) is about 13.3 million tokens, which costs around $1.33 in input.
Cheapest models by use case
How many tokens is 1,000 words?
1,000 words ≈ 1,300 to 1,500 tokens. The most widely used estimate is 1,333 tokens per 1,000 words (multiply word count by 1.33).
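The 1.33 multiplier can be wrapped in a quick estimator for budgeting before you send anything to an API. This is only a rough sketch: it counts whitespace-separated words rather than running a real tokeniser, so actual counts will vary by model and by text, and the names `TOKENS_PER_WORD`, `estimate_tokens`, and `estimate_input_cost` are invented for this example.

```python
TOKENS_PER_WORD = 1.33  # rule of thumb for English prose, not a real tokeniser


def estimate_tokens(text):
    """Estimate the token count of a text from its whitespace-separated word count."""
    word_count = len(text.split())
    return round(word_count * TOKENS_PER_WORD)


def estimate_input_cost(text, input_price_per_mtok):
    """Estimate the input cost in USD at a given price per million tokens."""
    return estimate_tokens(text) / 1_000_000 * input_price_per_mtok


sample = "word " * 1000  # a 1,000-word placeholder text
print(estimate_tokens(sample))  # prints 1330
print(f"${estimate_input_cost(sample, 3.00):.5f}")
```

For billing-grade numbers, use the token counts the provider reports in its API responses instead of this heuristic.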
Cost optimisation strategies — cut your AI bill by up to 90%
AI API costs are almost never fixed. With the right architecture, most applications can reduce their token spend by 50–90% without sacrificing quality.
Use prompt caching
Anthropic, OpenAI, and Google offer prompt caching, where repeated context is cached and served at a reduced input price, roughly 10–50% of the normal rate depending on the provider.
Use batch APIs
Process requests asynchronously (typically within 24 hours) at 50% off standard pricing. Perfect for non-real-time workloads.
Downsize your model
Route simple tasks to cheaper models. Most tasks don't need the most expensive model.
Trim your output tokens
Output tokens cost 3–6× more than input. Set explicit length limits and use structured outputs where possible.
Optimise system prompts
A bloated 2,000-token system prompt across 100,000 daily calls adds up to 200M input tokens per day, so every token you trim from it is saved 100,000 times over.
Use RAG over large context
Send only the most relevant snippets rather than entire knowledge bases. This costs a fraction as much, often with similar accuracy.
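The effect of several of these strategies can be estimated before changing any code. The sketch below is illustrative only: the workload (100,000 daily calls, 500 tokens of user input, 300 output tokens) is a hypothetical example, the prices are the Claude Sonnet 4.6 and Claude Haiku 4.5 figures from this guide, the 50% batch discount is the figure quoted above, and `daily_cost` is a name invented here.

```python
def daily_cost(calls, input_tokens, output_tokens, input_price, output_price, discount=0.0):
    """Daily spend in USD. Prices are per million tokens; discount is a fraction (0.5 = 50% off)."""
    per_call = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return calls * per_call * (1 - discount)


CALLS = 100_000  # hypothetical daily call volume

# Baseline: 2,000-token system prompt + 500 tokens of user input, 300 output tokens, $3 / $15.
baseline = daily_cost(CALLS, 2_500, 300, 3.00, 15.00)
# Trim the system prompt from 2,000 to 500 tokens.
trimmed = daily_cost(CALLS, 1_000, 300, 3.00, 15.00)
# Send the trimmed workload through a batch API at 50% off.
batched = daily_cost(CALLS, 1_000, 300, 3.00, 15.00, discount=0.5)
# Route the trimmed workload to a cheaper model at $1 / $5 instead.
downsized = daily_cost(CALLS, 1_000, 300, 1.00, 5.00)

for label, cost in [("baseline", baseline), ("trimmed prompt", trimmed),
                    ("trimmed + batch", batched), ("trimmed + smaller model", downsized)]:
    print(f"{label:<24} ${cost:>8.2f}  ({1 - cost / baseline:.0%} saved)")
```

Under these assumptions the baseline is $1,200 per day, trimming the prompt brings it to $750, and either batching or downsizing takes it below $400. Whether a smaller model or a 24-hour batch window is acceptable depends on the task, so treat the numbers as a planning aid.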
Prompt caching in depth
Prompt caching is the single biggest cost lever available today. If a portion of your prompt remains unchanged between requests — a system prompt, a large document — the provider caches the processed representation and charges you a fraction of the normal price on subsequent requests.
Anthropic: Cache write 1.25× input price; cache read just 0.1× (90% savings on warm cache).
OpenAI: Cached input tokens cost 50% of normal rate.
Google: Context caching priced per storage hour plus reduced per-token rate.
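The Anthropic multipliers above make the savings easy to estimate. The following is a simplified sketch: it assumes the first request writes the cache and every later request reads it, with the cache staying warm throughout (real caches expire, so some requests would pay the write price again). The 2,000-token prefix, the 100,000 requests, the $3.00 input price, and the function names are all assumptions for illustration.

```python
def uncached_prefix_cost(prefix_tokens, requests, input_price):
    """Cost in USD of sending the same prefix on every request with no caching."""
    return prefix_tokens / 1_000_000 * input_price * requests


def cached_prefix_cost(prefix_tokens, requests, input_price, write_mult=1.25, read_mult=0.10):
    """Cost in USD when the first request writes the cache and the rest read from it."""
    if requests <= 0:
        return 0.0
    base = prefix_tokens / 1_000_000 * input_price  # normal cost of sending the prefix once
    return base * write_mult + base * read_mult * (requests - 1)


uncached = uncached_prefix_cost(2_000, 100_000, 3.00)
cached = cached_prefix_cost(2_000, 100_000, 3.00)
savings = 1 - cached / uncached

print(f"uncached: ${uncached:.2f}")
print(f"cached:   ${cached:.2f}")
print(f"savings:  {savings:.1%}")
```

With these inputs the prefix costs about $600 uncached and about $60 cached. The saving applies only to the cached prefix, so the rest of each prompt and all output tokens are still billed at the normal rates.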
Frequently asked questions
Which is cheaper: Claude, GPT, or Gemini?
At the budget tier, Gemini 2.5 Flash-Lite and GPT-4.1 Nano are tied at $0.10/$0.40 per million tokens. At the flagship tier, GPT-5 and Gemini 2.5 Pro are tied for the lowest price at $1.25/$10.00. Claude Opus 4.7 at $5.00/$25.00 is the most expensive model listed, but it offers a 1M-token context window at standard pricing.
What is the cheapest AI API available right now?
Google Gemini 2.5 Flash-Lite and OpenAI GPT-4.1 Nano are tied at $0.10/$0.40 per million tokens. xAI's Grok 4.1 is third at $0.20/$0.50.
How many tokens is 1,000 words?
For standard English prose, 1,000 words ≈ 1,300–1,500 tokens (multiply words by 1.33). Code and non-Latin languages often need more tokens for the same amount of content, so check real counts before budgeting for them.
Why do output tokens cost more than input tokens?
Generating tokens requires significantly more compute than reading them. The model performs a separate forward pass for every output token it produces. Most providers price output tokens at 3–6× the input rate.
Do AI API prices change?
Yes, frequently. Prices have fallen dramatically since 2023. Build estimates with future reductions in mind.
Is there a free AI API?
Yes. Google's Gemini API offers a free tier sufficient for prototyping. OpenAI offers free credits to new developers.
What is a context window?
The maximum number of tokens a model can process in a single request. Claude Opus 4.7 supports 1M tokens at standard pricing; most flagship models support at least 200K tokens.