AI tokens methodology
Token counts are heuristic estimates. Pricing is exact as of the last refresh. The two have different precision floors.
The token counter estimates how many tokens a piece of text will consume for a given large-language-model API, then multiplies by the current published pricing to approximate the cost. Both halves of that sentence have meaningful precision limits.
Token estimation: heuristic, not exact
Every modern LLM uses a tokeniser that converts text into a sequence of integer token IDs: typically BPE (Byte Pair Encoding) for GPT, Claude, and Llama 3 onward, and SentencePiece for Gemini and earlier Llama releases. The exact mapping is model-specific and, for some vendors, unpublished; running an actual tokeniser requires its model file (typically 1-5 MB) bundled into the client.
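Where a vendor does publish its tokeniser, you can count exactly. A minimal sketch using the open-source js-tiktoken package (the getEncoding call and the cl100k_base encoding name come from that package; this is an illustration of exact counting, not part of our calculator):

```typescript
// Exact token counting with an open tokeniser, for comparison with the
// heuristic described below. Requires: npm install js-tiktoken
import { getEncoding } from "js-tiktoken";

const enc = getEncoding("cl100k_base"); // GPT-4-era encoding
const exact = enc.encode("How many tokens is this sentence?").length;
console.log(exact); // the true count for this encoding
```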
We don’t bundle tokenisers because they update with model releases and the bundle size adds up across 4+ vendors. Instead we use the published character-to-token ratios from each vendor’s documentation (a code sketch of the heuristic follows this list):
- GPT-3.5/4/5: ~4 chars per token for English; higher for code; lower for non-Latin scripts.
- Claude 3/4: ~3.5 chars per token. The Claude tokeniser is slightly more aggressive than GPT’s.
- Gemini: ~4 chars per token for English.
- Llama 3/4: ~4 chars per token.
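In code, the heuristic is a one-liner. A minimal sketch, assuming a ratio table like the one above; CHARS_PER_TOKEN and estimateTokens are illustrative names, not the calculator’s actual identifiers:

```typescript
// A minimal sketch of the character-ratio heuristic. The ratios mirror
// the list above; they are per-family approximations, not exact values.
const CHARS_PER_TOKEN = {
  gpt: 4.0,    // GPT-3.5/4/5, English prose
  claude: 3.5, // Claude 3/4
  gemini: 4.0,
  llama: 4.0,  // Llama 3/4
} as const;

type ModelFamily = keyof typeof CHARS_PER_TOKEN;

function estimateTokens(text: string, family: ModelFamily): number {
  // Round up: a partial token still bills as a whole token.
  return Math.ceil(text.length / CHARS_PER_TOKEN[family]);
}

// e.g. estimateTokens("Hello, world!", "claude") -> ceil(13 / 3.5) = 4
```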
These ratios are within ~10% of the true token count for typical English prose. They drift further for code (which tokenises into more pieces because of identifier splits), non-Latin scripts (Chinese, Japanese, Arabic — sometimes 2-3× more tokens per character), and structured data (JSON, XML — somewhere between English and code).
Pricing: exact but stale
Each model has published per-token pricing for input and (separately) for output tokens. We hardcode these prices in a registry that we refresh manually when vendors update their pricing (typically every 1-3 months as new models ship and old ones get repriced).
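The registry itself is just a lookup table. A hedged sketch of its shape, assuming per-million-token USD prices; the model key and the numbers below are placeholders, not live registry values:

```typescript
// A sketch of the pricing registry, keyed by model. Input and output
// tokens are priced separately, as described above.
interface ModelPricing {
  inputPerMTok: number;  // USD per 1M input tokens
  outputPerMTok: number; // USD per 1M output tokens
}

const PRICE_REGISTRY: Record<string, ModelPricing> = {
  "example-model": { inputPerMTok: 3.0, outputPerMTok: 15.0 }, // placeholder
};
```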
Pricing in the registry is correct as of the most recent deploy. For real production cost forecasting, double-check against the vendor’s pricing page — and budget for 15-30% headroom because the actual cost depends on output length, which is non-deterministic.
What we model
For each model, the calculator estimates the following (the full calculation is sketched after this list):
- Input tokens (from the user’s prompt).
- Output tokens (from a user-specified estimate or vendor default).
- Cost = input_tokens × input_price + output_tokens × output_price.
- The total, in USD, displayed to 6 decimal places.
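Putting those pieces together, a sketch of the whole estimate, reusing the hypothetical estimateTokens and PRICE_REGISTRY from the earlier sketches:

```typescript
// End-to-end cost estimate. Output tokens come from the user (or a
// vendor default), because output length is unknown before the call.
function estimateCostUSD(
  prompt: string,
  family: ModelFamily,
  model: string,
  expectedOutputTokens: number,
): number {
  const pricing = PRICE_REGISTRY[model];
  const inputTokens = estimateTokens(prompt, family);
  const cost =
    (inputTokens / 1_000_000) * pricing.inputPerMTok +
    (expectedOutputTokens / 1_000_000) * pricing.outputPerMTok;
  // Match the calculator's display precision of 6 decimal places.
  return Number(cost.toFixed(6));
}
```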
What we don’t model
- Cached input pricing. Several vendors (OpenAI, Anthropic) offer discounted pricing for input tokens that match a recently-seen prompt prefix. Worth knowing about; not modelled here.
- Batch API discounts. Async batch endpoints often offer 50% off; not modelled.
- Image/audio/video inputs. Multi-modal token costs vary by model and are computed differently from text. Roadmap.
- Fine-tuned model pricing. Vendors price fine-tunes differently from base models.
How accurate is the estimate really?
For typical English prose at modest length (50-5000 characters), our token count is within 10% of the true count and our cost estimate is within 10-15% of the actual API bill. That’s plenty for back-of-envelope sizing — “is this prompt 1 cent or 1 dollar?” — and inadequate for cents-precise billing. For the latter, use the vendor’s official tokeniser; for everything else, ours is a useful gut check.
Published May 14, 2026