Skip to content

Methodology

AI tokens methodology

Token count is a heuristic estimate. Pricing is exact at refresh time. Different precision floors.

By Published Updated

The token counter estimates how many tokens a piece of text will use for a given large-language-model API, and multiplies by current published pricing to estimate cost. Both halves of that sentence have meaningful precision limits.

Token estimation: heuristic, not exact

Every modern LLM uses a tokeniser — typically BPE (Byte Pair Encoding) for GPT and Claude, SentencePiece for Gemini and Llama — that converts text to a sequence of integer token IDs. The exact mapping is model-specific and proprietary; running the actual tokeniser requires the tokeniser model file (typically 1-5 MB) bundled into the client.

We don’t bundle tokenisers because they update with model releases and the bundle size adds up across 4+ vendors. Instead we use the published character-to-token ratios from each vendor’s documentation:

  • GPT-3.5/4/5: ~4 chars per token for English; higher for code; lower for non-Latin scripts.
  • Claude 3/4: ~3.5 chars per token. The Claude tokeniser is slightly more aggressive than GPT’s.
  • Gemini: ~4 chars per token for English.
  • Llama 3/4: ~4 chars per token.

These ratios are within ~10% of the true token count for typical English prose. They drift further for code (which tokenises into more pieces because of identifier splits), non-Latin scripts (Chinese, Japanese, Arabic — sometimes 2-3× more tokens per character), and structured data (JSON, XML — somewhere between English and code).

Pricing: exact but stale

Each model has published per-token pricing for input and (separately) for output tokens. We hardcode these prices in a registry that we refresh manually when vendors update their pricing (typically every 1-3 months as new models ship and old ones get repriced).

Pricing in the registry is correct as of the most recent deploy. For real production cost forecasting, double-check against the vendor’s pricing page — and budget for 15-30% headroom because the actual cost depends on output length, which is non-deterministic.

What we model

For each model, the calculator estimates:

  • Input tokens (from the user’s prompt).
  • Output tokens (from a user-specified estimate or vendor default).
  • Cost = input_tokens × input_price + output_tokens × output_price.
  • The total in USD with 6 decimal places.

What we don’t model

  • Cached input pricing. Several vendors (OpenAI, Anthropic) offer discounted pricing for input tokens that match a recently-seen prompt prefix. Worth knowing about; not modelled here.
  • Batch API discounts. Async batch endpoints often offer 50% off; not modelled.
  • Image/audio/video inputs. Multi-modal token costs vary by model and are computed differently from text. Roadmap.
  • Fine-tuned model pricing. Vendors price fine-tunes differently from base models.

Algorithm details: the BPE merge loop

Both GPT and Claude tokenisers are Byte Pair Encoding variants. The training-time procedure (Sennrich et al., 2016) starts from a base vocabulary of single bytes and repeatedly applies the merge:find the most-frequent adjacent pair (a, b) in the corpus, add a new token “ab” to the vocabulary, replace every (a, b) occurrence with it. The procedure halts when the vocabulary reaches the target size — 100,277 for GPT-4o’s cl100k_base, ~128k for Llama 3, ~256k for Gemini. At inference time the tokeniser applies the saved merge list greedily to the input.

Our character-ratio heuristic skips the merge loop entirely. For a piece of text with N characters and observed mean tokens-per-character r: tokens ≈ ⌈N × r⌉. The constants we use:

Model familyr (tokens/char)1/r (chars/token)Source
GPT-4o / 4.10.254.0OpenAI docs & tiktoken benchmark
Claude 3.5 / 40.2863.5Anthropic docs
Gemini 1.5+0.254.0Google AI Studio docs
Llama 3 / 40.254.0Meta model card

Cost derivation: given input tokens T_in, output tokens T_out, and the vendor’s per-million-token rates p_in and p_out, total USD cost = (T_in × p_in + T_out × p_out) / 1,000,000. We round to six decimal places to preserve sub-cent precision for short prompts.

Sources & references

Heuristics in this page are calibrated against OpenAI’s own tiktoken reference tokeniser on a 100k-sample English Wikipedia corpus. The BPE algorithm is documented in Sennrich, Haddow & Birch (2016); SentencePiece, used by Gemini and Llama, in Kudo & Richardson (2018). See the Sources & references block below for the primary citations and the vendor pricing pages we mirror.

Assumptions & limitations

  • English-prose calibration only. The tokens-per-character constants are fitted to English Wikipedia text. Code, JSON, Chinese, Japanese, Arabic, and other non-Latin scripts can deviate by 30-300% (Chinese typically uses 2-3× more tokens per character).
  • No support for cached input pricing. OpenAI and Anthropic both offer 50-90% discounts on re-used prefix tokens. The cost estimate uses full uncached pricing.
  • No batch-API discount. Async batch endpoints typically halve the per-token cost; not reflected here.
  • Output length is user-provided.We can’t predict response length; ±50% on T_out is typical depending on the prompt.
  • Vision and audio inputs not modelled. Each vendor counts non-text tokens differently (image tiles for GPT-4o, audio seconds for Gemini, etc.).
  • Pricing is a snapshot. The registry updates monthly; mid-month vendor price changes are not reflected until the next deploy.
  • Fine-tuned and reserved-capacity pricing differs. The estimate uses standard on-demand rates only.

How accurate is the estimate really?

For typical English prose at modest length (50-5000 characters), our token count is within 10% of the true count and our cost estimate is within 10-15% of the actual API bill. That’s plenty for back-of-envelope sizing — “is this prompt 1 cent or 1 dollar?” — and inadequate for cents-precise billing. For the latter, use the vendor’s official tokeniser; for everything else, ours is a useful gut check.

Frequently asked questions

How does Convertitive estimate token counts?
Token counts are heuristic estimates, not exact values. The approximation follows the widely observed ~4 characters per token ratio for English prose, which aligns with the Byte Pair Encoding (BPE) algorithm described in Sennrich et al. (2016). For code, multilingual text, or emoji the ratio differs — code averages ~3 chars/token, and many Unicode code points outside the Basic Multilingual Plane cost 1–3 tokens each in GPT-4o's cl100k_base vocabulary.
What tokenization algorithm do OpenAI models use?
GPT-3.5, GPT-4, and GPT-4o use Byte Pair Encoding (BPE) with the cl100k_base vocabulary (100,000 tokens). BPE merges frequent byte pairs iteratively until reaching the vocabulary size. The tiktoken library (openai/tiktoken on GitHub) is the canonical open-source implementation. Claude and Gemini use SentencePiece-based tokenizers with overlapping but distinct vocabularies — exact token counts differ between providers.
How accurate is the LLM cost estimate?
The pricing component is exact at the time of the last manual update; cost estimates are only as fresh as the embedded price table. Token count is heuristic (±10–30% depending on content type), so the final cost estimate carries the same variance. For production billing forecasting, use the provider's own tokenizer and live pricing API.
What are the assumptions behind the token cost calculation?
We assume: (1) all tokens are billed at standard input/output rates with no prompt-caching discount; (2) the full input is sent on every request (no context truncation); (3) output length is either user-supplied or set to a vendor-published default. Batch API discounts (e.g. 50% off for OpenAI Batch API) and context-caching credits (e.g. Anthropic's prompt caching) are not reflected.
Where does the pricing data come from?
Prices are manually sourced from each provider's public pricing page: openai.com/pricing, anthropic.com/pricing, ai.google.dev/pricing, together.ai, and replicate.com. They are updated on a best-effort basis and may lag provider-announced changes by days to weeks. Always verify current rates on the provider's pricing page before committing to a production budget.

Sources & references

Authoritative references cited by this piece. Verified by Buğra Sözeri on the dates shown and re-checked at every deploy.

Related

Published May 14, 2026 · Last reviewed May 31, 2026