

AI tokens methodology

Token counts are heuristic estimates; pricing is exact at refresh time. The two halves have different precision floors.

The token counter estimates how many tokens a piece of text will use for a given large-language-model API, and multiplies by current published pricing to estimate cost. Both halves of that sentence have meaningful precision limits.

Token estimation: heuristic, not exact

Every modern LLM uses a tokeniser (typically BPE, Byte Pair Encoding, for GPT, Claude, and Llama 3 onward; SentencePiece for Gemini and earlier Llama releases) that converts text to a sequence of integer token IDs. The exact mapping is model-specific and, for the closed models, proprietary; running the actual tokeniser requires the tokeniser model file (typically 1-5 MB) bundled into the client.
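
Running one of these tokenisers directly looks like the sketch below, using OpenAI's open-source tiktoken library (one of the tokeniser files we choose not to ship); the sample text is arbitrary:

    # Exact token counting with OpenAI's tiktoken. Requires: pip install tiktoken
    import tiktoken

    text = "The token counter estimates how many tokens a piece of text will use."
    enc = tiktoken.encoding_for_model("gpt-4")  # fetches and caches the tokeniser model file
    token_ids = enc.encode(text)                # text -> sequence of integer token IDs

    print(f"{len(text)} chars -> {len(token_ids)} tokens "
          f"({len(text) / len(token_ids):.2f} chars per token)")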

We don’t bundle tokenisers because they update with model releases and the bundle size adds up across 4+ vendors. Instead we use the published character-to-token ratios from each vendor’s documentation:

  • GPT-3.5/4/5: ~4 chars per token for English; lower for code and non-Latin scripts, which produce more tokens per character.
  • Claude 3/4: ~3.5 chars per token. The Claude tokeniser splits text slightly more finely than GPT's, yielding more tokens for the same input.
  • Gemini: ~4 chars per token for English.
  • Llama 3/4: ~4 chars per token.

These ratios are within ~10% of the true token count for typical English prose. They drift further for code (which tokenises into more pieces because of identifier splits), non-Latin scripts (Chinese, Japanese, Arabic — sometimes 2-3× more tokens per character), and structured data (JSON, XML — somewhere between English and code).
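
In code, the heuristic is just a lookup and a division. A minimal sketch; the family keys and rounding choice are illustrative, the ratios are the ones listed above:

    # Heuristic token estimation from published chars-per-token ratios.
    import math

    CHARS_PER_TOKEN = {
        "gpt": 4.0,     # GPT-3.5/4/5, English prose
        "claude": 3.5,  # Claude 3/4
        "gemini": 4.0,
        "llama": 4.0,   # Llama 3/4
    }

    def estimate_tokens(text: str, family: str) -> int:
        """Within ~10% for typical English prose; drifts further for code,
        non-Latin scripts, and structured data."""
        return math.ceil(len(text) / CHARS_PER_TOKEN[family])

    print(estimate_tokens("Hello! This is a short test prompt.", "claude"))  # -> 10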

Pricing: exact but stale

Each model has published per-token pricing for input and (separately) for output tokens. We hardcode these prices in a registry that we refresh manually when vendors update their pricing (typically every 1-3 months as new models ship and old ones get repriced).
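
The registry itself is a small hand-maintained table keyed by model ID. A sketch of its shape; the model names and prices below are placeholders, not current vendor pricing:

    # Manually-refreshed pricing registry. Vendors quote USD per 1M tokens;
    # we store per-token values. All numbers here are illustrative placeholders.
    PRICING = {
        # model id:            (input $/token, output $/token)
        "example-gpt-model":    (2.50e-6, 10.00e-6),
        "example-claude-model": (3.00e-6, 15.00e-6),
    }
    REGISTRY_REFRESHED = "2026-05-14"  # date of the last manual refresh (placeholder)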

Pricing in the registry is correct as of the most recent deploy. For real production cost forecasting, double-check against the vendor's pricing page, and budget for 15-30% headroom because the actual cost depends on output length, which is non-deterministic.

What we model

For each model, the calculator estimates:

  • Input tokens (from the user’s prompt).
  • Output tokens (from a user-specified estimate or vendor default).
  • Cost = input_tokens × input_price + output_tokens × output_price (see the sketch after this list).
  • The total in USD with 6 decimal places.
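
Put together, the whole calculation is a few lines. A sketch that reuses estimate_tokens and PRICING from the snippets above:

    def estimate_cost_usd(prompt: str, family: str, model: str,
                          output_tokens: int) -> str:
        """Estimated cost in USD, formatted to 6 decimal places."""
        input_tokens = estimate_tokens(prompt, family)
        input_price, output_price = PRICING[model]
        cost = input_tokens * input_price + output_tokens * output_price
        return f"${cost:.6f}"

    # A 400-character prompt against the placeholder GPT pricing,
    # with a 500-token output estimate:
    print(estimate_cost_usd("x" * 400, "gpt", "example-gpt-model", 500))  # $0.005250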

What we don’t model

  • Cached input pricing. Several vendors (OpenAI, Anthropic) offer discounted pricing for input tokens that match a recently seen prompt prefix. Worth knowing about; not modelled here.
  • Batch API discounts. Async batch endpoints often offer 50% off; not modelled, though the sketch after this list shows how such discounts would compose.
  • Image/audio/video inputs. Multi-modal token costs vary by model and are computed differently from text. On the roadmap.
  • Fine-tuned model pricing. Vendors price fine-tunes differently from base models.
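
For reference, the first two items would compose as simple multipliers on the formula above. A hedged sketch of what modelling them might look like; the 0.5 discount factors are illustrative defaults, not any vendor's actual rate:

    # How cached-input and batch discounts would compose, if we modelled them.
    def discounted_cost(input_tokens: int, cached_tokens: int, output_tokens: int,
                        input_price: float, output_price: float,
                        cache_multiplier: float = 0.5,   # discount on cached input reads
                        batch_multiplier: float = 0.5):  # e.g. 50% off batch endpoints
        uncached = input_tokens - cached_tokens
        cost = (uncached * input_price
                + cached_tokens * input_price * cache_multiplier
                + output_tokens * output_price)
        return cost * batch_multiplier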

How accurate is the estimate really?

For typical English prose at modest length (50-5000 characters), our token count is within 10% of the true count and our cost estimate is within 10-15% of the actual API bill. That's plenty for back-of-envelope sizing (“is this prompt 1 cent or 1 dollar?”) but inadequate for cent-precise billing. For the latter, use the vendor's official tokeniser; for everything else, ours is a useful gut check.

Published May 14, 2026