Question 1

Is Claude, GPT, or Gemini cheaper?

Accepted Answer

At the budget tier, Google Gemini 2.5 Flash-Lite and OpenAI GPT-4.1 Nano are tied for cheapest at $0.10 input / $0.40 output per million tokens. xAI's Grok 4.1 is a close third at $0.20/$0.50. Claude's budget model (Haiku 4.5) costs more at $1.00/$5.00, in exchange for strong instruction-following.

At the flagship tier, OpenAI GPT-5 and Google Gemini 2.5 Pro are tied at $1.25/$10.00 for best price-to-performance. Gemini 3.1 Pro ($2.00/$12.00) and GPT-5.4 ($2.50/$15.00) sit above them. Claude Opus 4.7 at $5.00/$25.00 is the most expensive flagship, but it offers a 1M-token context window at standard pricing and excels at long-form writing and document analysis.

Question 2

What is the cheapest AI API?

Accepted Answer

Google Gemini 2.5 Flash-Lite and OpenAI GPT-4.1 Nano are tied for cheapest at $0.10 input / $0.40 output per million tokens. xAI's Grok 4.1 is third at $0.20/$0.50. Through third-party providers like OpenRouter, some Mistral and Qwen models are available for even less, sometimes under $0.10 per million tokens, though with different capability profiles.

Question 3

How many tokens is 1,000 words?

Accepted Answer

For standard English prose, 1,000 words is approximately 1,300 to 1,500 tokens. The widely used estimate is 1,333 tokens per 1,000 words (multiply words by 1.33). Code tokenises more efficiently (roughly 900–1,200 tokens per 1,000 words of code). Non-Latin languages like Chinese or Japanese can require 2–3× more tokens per semantic unit.
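As a rough sketch of the rule of thumb above, the helper below turns a word count into a low, typical, and high token estimate for English prose. The function names and the whitespace-based word count are illustrative assumptions; an exact count requires the tokenizer of the model you actually call.

```python
def estimate_tokens(word_count: int) -> dict[str, int]:
    """Rough token estimate for standard English prose.

    Multipliers come from the rule of thumb above:
    1.3 to 1.5 tokens per word, with 1.33 as the usual single figure.
    """
    return {
        "low": round(word_count * 1.3),
        "typical": round(word_count * 1.33),
        "high": round(word_count * 1.5),
    }


def estimate_tokens_from_text(text: str) -> dict[str, int]:
    """Count words by splitting on whitespace, then apply the multipliers."""
    return estimate_tokens(len(text.split()))


print(estimate_tokens(1000))
```

For 1,000 words the typical figure comes out at 1,330, slightly under the 1,333 quoted above because the multiplier is rounded to 1.33. Treat the result as a budgeting estimate, not a billing figure.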

Question 4

Why are output tokens more expensive than input tokens?

Accepted Answer

Generating tokens requires significantly more compute than reading them. Your input is processed in a single forward pass through the neural network, with all prompt tokens handled in parallel. Output is different: the model runs a separate forward pass for every token it produces, building the response one token at a time in an autoregressive process. A 500-token response therefore needs roughly 500 sequential forward passes, versus one pass for the prompt. Providers price output tokens at a multiple of the input rate to reflect this asymmetry: typically 4–8× for the models in the pricing table in this FAQ (Grok 4.1 is lower, at 2.5×).
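The size of the output markup varies by model. The sketch below computes the output-to-input price ratio for a handful of models, using prices copied from the pricing table in this FAQ; the dictionary and function names are illustrative only.

```python
# (input, output) prices in USD per 1M tokens, copied from the pricing table.
PRICES = {
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
    "Grok 4.1": (0.20, 0.50),
    "Claude Haiku 4.5": (1.00, 5.00),
    "GPT-5": (1.25, 10.00),
    "Claude Opus 4.7": (5.00, 25.00),
}


def output_to_input_ratio(input_price: float, output_price: float) -> float:
    """How many times more an output token costs than an input token."""
    return round(output_price / input_price, 2)


ratios = {
    model: output_to_input_ratio(input_price, output_price)
    for model, (input_price, output_price) in PRICES.items()
}

for model, ratio in ratios.items():
    print(f"{model}: output costs {ratio}x input")
```

The higher the ratio, the more a long response dominates the bill relative to a long prompt.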

Question 5

How do I estimate my monthly API costs before building?

Accepted Answer

Follow four steps: (1) Estimate your average input tokens per request by counting your typical prompt, system prompt, and any context you attach. (2) Estimate your average output tokens per request from how long typical responses need to be. (3) Multiply both averages by your expected number of requests per month to get monthly token totals. (4) Apply the formula to those monthly totals: (input_tokens / 1M × input_price) + (output_tokens / 1M × output_price), where both prices are per million tokens. Add a 20–30% safety margin for variability, and factor in prompt caching if your prompts include repeated system-level context.
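The four steps can be sketched as a small function. This is a minimal estimate under stated assumptions: the function and parameter names are made up for illustration, the workload figures (1,000 input tokens, 300 output tokens, 50,000 requests per month) are hypothetical, the prices are the Claude Haiku 4.5 row from the pricing table, and prompt caching discounts are not modelled.

```python
def estimate_monthly_cost(
    avg_input_tokens: float,
    avg_output_tokens: float,
    requests_per_month: int,
    input_price_per_million: float,
    output_price_per_million: float,
    safety_margin: float = 0.25,  # middle of the 20-30% range
) -> dict[str, float]:
    """Estimate a monthly API bill in USD from per-request averages."""
    # Step 3: scale per-request averages up to monthly token totals.
    total_input = avg_input_tokens * requests_per_month
    total_output = avg_output_tokens * requests_per_month

    # Step 4: price each total per million tokens and add them up.
    base = (
        total_input / 1_000_000 * input_price_per_million
        + total_output / 1_000_000 * output_price_per_million
    )
    return {"base": base, "with_margin": base * (1 + safety_margin)}


# Hypothetical workload priced at $1.00 input / $5.00 output per 1M tokens.
estimate = estimate_monthly_cost(1_000, 300, 50_000, 1.00, 5.00)
print(f"Base: ${estimate['base']:.2f}")
print(f"With margin: ${estimate['with_margin']:.2f}")
```

In this example the 15M output tokens cost more ($75) than the 50M input tokens ($50), which is why trimming response length is often the first lever to pull.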

Question 6

What is the difference between the API and a subscription like ChatGPT Plus?

Accepted Answer

Subscriptions (ChatGPT Plus at $20/month, Claude Pro at $20/month) give individual users flat-rate access through a web interface, subject to fair-use rate limits. They are not suitable for building applications or processing high volumes of text programmatically. The API charges per token and has no monthly cap: you pay only for what you use. It is intended for developers building applications, automations, and integrations. For low-volume personal use, a subscription is usually cheaper. For anything above 100 requests per month or any programmatic use, the API is the appropriate choice.

Question 7

Do AI API prices keep changing?

Accepted Answer

Yes, frequently. AI API prices have generally fallen dramatically since 2023, and competitive pressure (especially from Google and newer entrants like DeepSeek and xAI) has accelerated this trend. Providers regularly cut prices on older model versions when newer ones launch. Build your pricing estimates with future reductions in mind. Contracts made at today's prices may look expensive within a year.

Question 8

Is there a free AI API I can use?

Accepted Answer

Yes. Google's Gemini API (via Google AI Studio) offers a free tier with meaningful rate limits — sufficient for prototyping, development, and low-volume production use. OpenAI offers free credits to new developers. Many providers also offer free tiers on their smaller or older models via platforms like OpenRouter.

Question 9

What is a context window and does it affect my cost?

Accepted Answer

A context window is the maximum number of tokens a model can process in a single request — including both your input and the generated output. Claude Opus 4.7 supports 1M tokens at standard pricing; most flagship models now support at least 200K tokens. A larger context window does not directly increase cost — you only pay for the tokens you actually use. However, if your application routinely sends large amounts of context (long conversation histories, full documents), your token usage — and therefore your bill — will be proportionally higher. Summarisation and RAG are common techniques to keep context usage efficient.
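To see how conversation history drives token usage, here is a minimal sketch that totals the input tokens billed over a multi-turn chat. It assumes the application resends the full history on every turn, with no truncation, summarisation, or caching; the function name and the token counts in the example are hypothetical.

```python
def cumulative_input_tokens(
    system_tokens: int,
    user_tokens: int,
    reply_tokens: int,
    turns: int,
) -> int:
    """Total input tokens billed across a chat that resends its full history."""
    total = 0
    history = system_tokens  # the system prompt is sent on every turn
    for _ in range(turns):
        history += user_tokens   # the new user message joins the context
        total += history         # the whole context is billed as input this turn
        history += reply_tokens  # the reply is carried into later turns
    return total


# Hypothetical chat: 500-token system prompt, 100-token messages, 200-token replies.
with_history = cumulative_input_tokens(500, 100, 200, turns=10)
without_history = 10 * (500 + 100)  # same chat if each turn stood alone
print(with_history, without_history)
```

With these numbers, ten turns bill 19,500 input tokens instead of 6,000, and the gap widens with every additional turn. That growth is what summarisation and RAG are meant to contain.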

Provider	Model	Tier	Input / 1M tokens	Output / 1M tokens	Best for
Google	Gemini 2.5 Flash-Lite	Cheapest	$0.10	$0.40	Classification, routing, high-volume tasks
OpenAI	GPT-4.1 Nano	Cheapest	$0.10	$0.40	Budget workloads, classification, summarisation
xAI	Grok 4.1	Budget alt.	$0.20	$0.50	Low-cost general tasks
Google	Gemini 3 Flash	Best value	$0.50	$3.00	Agentic workflows, multi-turn chat, coding
Anthropic	Claude Haiku 4.5	Budget Claude	$1.00	$5.00	Fast tasks with strong instruction-following
OpenAI	o4-mini	Best reasoning value	$1.10	$4.40	Chain-of-thought reasoning, coding, analysis
OpenAI	GPT-5	Flagship	$1.25	$10.00	General intelligence, balanced price-to-quality
Google	Gemini 2.5 Pro	Flagship	$1.25	$10.00	Long context, multimodal, research
OpenAI	GPT-5.2	Flagship	$1.75	$14.00	Advanced reasoning, coding, agents
Google	Gemini 3.1 Pro	Flagship	$2.00	$12.00	Multimodal, coding, research
OpenAI	GPT-5.4	Flagship	$2.50	$15.00	Top-tier reasoning, agents, coding
Anthropic	Claude Sonnet 4.6	Mid-flagship	$3.00	$15.00	Complex instruction-following, writing
Anthropic	Claude Opus 4.7	Premium	$5.00	$25.00	Frontier reasoning, 1M context, long documents
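Which model is cheapest can depend on the mix of input and output tokens, not just the headline prices. The sketch below ranks the models in the table by cost per request for a given mix. Prices are copied from the table; the function names and the example token counts are hypothetical.

```python
# (input, output) prices in USD per 1M tokens, copied from the table above.
PRICES = {
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
    "GPT-4.1 Nano": (0.10, 0.40),
    "Grok 4.1": (0.20, 0.50),
    "Gemini 3 Flash": (0.50, 3.00),
    "Claude Haiku 4.5": (1.00, 5.00),
    "o4-mini": (1.10, 4.40),
    "GPT-5": (1.25, 10.00),
    "Gemini 2.5 Pro": (1.25, 10.00),
    "GPT-5.2": (1.75, 14.00),
    "Gemini 3.1 Pro": (2.00, 12.00),
    "GPT-5.4": (2.50, 15.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Opus 4.7": (5.00, 25.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request for the given model and token counts."""
    input_price, output_price = PRICES[model]
    return (
        input_tokens / 1_000_000 * input_price
        + output_tokens / 1_000_000 * output_price
    )


def rank_models(input_tokens: int, output_tokens: int) -> list[str]:
    """Model names sorted from cheapest to most expensive for this mix."""
    return sorted(
        PRICES, key=lambda model: request_cost(model, input_tokens, output_tokens)
    )


# An input-heavy request (long document, short answer) vs a balanced one.
for mix in [(10_000, 500), (1_000, 1_000)]:
    for model in ("GPT-5.2", "Gemini 3.1 Pro"):
        print(mix, model, f"${request_cost(model, *mix):.5f}")
```

With these prices, GPT-5.2 is cheaper than Gemini 3.1 Pro for the input-heavy request, while Gemini 3.1 Pro is cheaper for the balanced one, so it is worth ranking against your own typical request shape.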

The complete guide to AI API costs

What are tokens — and why do they matter?

Input tokens vs output tokens

The AI pricing formula — how to calculate your bill

Worked example: a customer support chatbot

Claude vs GPT vs Gemini — which is cheaper?

What is the cheapest AI API available right now?

Cheapest models by use case

How many tokens is 1,000 words?

Cost optimisation strategies — cut your AI bill by up to 90%

Use prompt caching

Use batch APIs

Downsize your model

Trim your output tokens

Optimise system prompts

Use RAG over large context

Prompt caching in depth

Frequently asked questions