- Why isn't the count exact?
- Exact tokenization requires the model's own BPE / SentencePiece table. OpenAI ships tiktoken; Anthropic and Google ship their own SDKs. Loading those tables in a browser would add ~10 MB of JavaScript per model, which isn't worth it for a quick estimate. The heuristic stays within 10% for English text and code.
- How is the style detected?
- If more than 6% of characters are symbols typical of code or JSON ({ } [ ] < > ; : = ( ) | ", '), the style is classified as 'code' and the chars-per-token ratio drops from 4 to 3.5. Everything else is treated as prose.
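The detection rule above can be sketched in a few lines. This is a minimal illustration, not the widget's actual source; the function and constant names are assumptions, but the thresholds (6% symbols, 4 vs. 3.5 chars per token) match the description.

```typescript
// Symbols the heuristic treats as indicative of code or JSON.
const CODE_SYMBOLS = new Set([..."{}[]<>;:=()|\"'"]);

function estimateTokens(text: string): { style: "code" | "prose"; tokens: number } {
  if (text.length === 0) return { style: "prose", tokens: 0 };
  let symbols = 0;
  for (const ch of text) {
    if (CODE_SYMBOLS.has(ch)) symbols++;
  }
  // More than 6% code/JSON symbols → 'code' at 3.5 chars/token; otherwise prose at 4.
  const style = symbols / text.length > 0.06 ? "code" : "prose";
  const charsPerToken = style === "code" ? 3.5 : 4;
  return { style, tokens: Math.ceil(text.length / charsPerToken) };
}
```

A JSON snippet easily clears the 6% bar, while ordinary sentences stay well under it, so the classifier rarely needs a tiebreaker in practice.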
- Are the prices current?
- Prices are updated periodically and reflect each provider's public list price for direct API access. Discounts (batch API, prompt caching, enterprise contracts) aren't applied. Cross-check the vendor's pricing page before signing a contract.
- What does 'output ratio' mean?
- An estimate of how long the model's reply will be relative to your prompt. A 1× ratio means the output is roughly the same length as the input. Classification tasks run ~0.05×; code generation 2–5×; long-form rewriting 1.5–3×.
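The ratio feeds directly into a cost estimate: output tokens are input tokens times the ratio, and each side is billed at its own rate. A minimal sketch, with placeholder prices rather than any vendor's real rates:

```typescript
interface Pricing {
  inputPerMTok: number;  // USD per million input tokens
  outputPerMTok: number; // USD per million output tokens
}

function estimateCost(inputTokens: number, outputRatio: number, p: Pricing): number {
  const outputTokens = inputTokens * outputRatio;
  return (inputTokens * p.inputPerMTok + outputTokens * p.outputPerMTok) / 1_000_000;
}

// Example: a 2,000-token prompt driving code generation at a 3x output ratio,
// with hypothetical rates of $3 / $15 per million input / output tokens.
const cost = estimateCost(2000, 3, { inputPerMTok: 3, outputPerMTok: 15 });
```

Because output rates are typically several times input rates, the ratio usually dominates the bill: the same prompt costs far more driving code generation than classification.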
- Does the cost include the context window?
- Yes. Modern API billing charges for every token in the conversation, including any system prompt, prior turns, and tool definitions. Run your full assembled prompt through the widget for the most accurate estimate.
- What about prompt caching?
- Most major vendors now offer a discounted rate (50–90% off) for repeated parts of a prompt. This calculator does not apply caching discounts because they depend on hit rate; for a production system, model the cached portion separately at the vendor's cached-input price.
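Modeling the cached portion separately comes down to blending two input rates by the expected hit rate. A sketch under assumed numbers (the discount level, hit rate, and function name are all illustrative, not a vendor's terms):

```typescript
function inputCostWithCaching(
  cachedTokens: number,   // stable prefix: system prompt, tool definitions, etc.
  freshTokens: number,    // per-request suffix that can never hit the cache
  hitRate: number,        // fraction of requests whose prefix hits the cache (0..1)
  inputPerMTok: number,   // USD per million input tokens, standard rate
  cachedPerMTok: number,  // USD per million input tokens, discounted cached rate
): number {
  // The prefix is billed at the cached rate when it hits, full rate when it misses.
  const prefixRate = hitRate * cachedPerMTok + (1 - hitRate) * inputPerMTok;
  return (cachedTokens * prefixRate + freshTokens * inputPerMTok) / 1_000_000;
}

// Example: a 10,000-token system prompt, 500 fresh tokens per request,
// a 90% hit rate, and a hypothetical 90%-off cached rate ($0.30 vs $3).
const perRequest = inputCostWithCaching(10_000, 500, 0.9, 3, 0.3);
```

Note that some vendors also charge a premium to *write* the cache on a miss; if yours does, add that surcharge to the miss branch of `prefixRate`.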
- Is non-English text more expensive?
- Yes, materially. The tokenizers were trained predominantly on English; non-English Latin scripts pay a 10–20% token premium, and CJK scripts can pay 2–4× the per-character rate. Until we ship a real tokenizer, treat the heuristic as a lower bound for non-English content.