Methodology
AI tokens methodology
Token count is a heuristic estimate. Pricing is exact at refresh time. Different precision floors.
By Buğra SözeriPublished Updated
The token counter estimates how many tokens a piece of text will use for a given large-language-model API, and multiplies by current published pricing to estimate cost. Both halves of that sentence have meaningful precision limits.
Token estimation: heuristic, not exact
Every modern LLM uses a tokeniser — typically BPE (Byte Pair Encoding) for GPT and Claude, SentencePiece for Gemini and Llama — that converts text to a sequence of integer token IDs. The exact mapping is model-specific and proprietary; running the actual tokeniser requires the tokeniser model file (typically 1-5 MB) bundled into the client.
We don’t bundle tokenisers because they update with model releases and the bundle size adds up across 4+ vendors. Instead we use the published character-to-token ratios from each vendor’s documentation:
- GPT-3.5/4/5: ~4 chars per token for English; higher for code; lower for non-Latin scripts.
- Claude 3/4: ~3.5 chars per token. The Claude tokeniser is slightly more aggressive than GPT’s.
- Gemini: ~4 chars per token for English.
- Llama 3/4: ~4 chars per token.
These ratios are within ~10% of the true token count for typical English prose. They drift further for code (which tokenises into more pieces because of identifier splits), non-Latin scripts (Chinese, Japanese, Arabic — sometimes 2-3× more tokens per character), and structured data (JSON, XML — somewhere between English and code).
Pricing: exact but stale
Each model has published per-token pricing for input and (separately) for output tokens. We hardcode these prices in a registry that we refresh manually when vendors update their pricing (typically every 1-3 months as new models ship and old ones get repriced).
Pricing in the registry is correct as of the most recent deploy. For real production cost forecasting, double-check against the vendor’s pricing page — and budget for 15-30% headroom because the actual cost depends on output length, which is non-deterministic.
What we model
For each model, the calculator estimates:
- Input tokens (from the user’s prompt).
- Output tokens (from a user-specified estimate or vendor default).
- Cost = input_tokens × input_price + output_tokens × output_price.
- The total in USD with 6 decimal places.
What we don’t model
- Cached input pricing. Several vendors (OpenAI, Anthropic) offer discounted pricing for input tokens that match a recently-seen prompt prefix. Worth knowing about; not modelled here.
- Batch API discounts. Async batch endpoints often offer 50% off; not modelled.
- Image/audio/video inputs. Multi-modal token costs vary by model and are computed differently from text. Roadmap.
- Fine-tuned model pricing. Vendors price fine-tunes differently from base models.
Algorithm details: the BPE merge loop
Both GPT and Claude tokenisers are Byte Pair Encoding variants. The training-time procedure (Sennrich et al., 2016) starts from a base vocabulary of single bytes and repeatedly applies the merge:find the most-frequent adjacent pair (a, b) in the corpus, add a new token “ab” to the vocabulary, replace every (a, b) occurrence with it. The procedure halts when the vocabulary reaches the target size — 100,277 for GPT-4o’s cl100k_base, ~128k for Llama 3, ~256k for Gemini. At inference time the tokeniser applies the saved merge list greedily to the input.
Our character-ratio heuristic skips the merge loop entirely. For a piece of text with N characters and observed mean tokens-per-character r: tokens ≈ ⌈N × r⌉. The constants we use:
| Model family | r (tokens/char) | 1/r (chars/token) | Source |
|---|---|---|---|
| GPT-4o / 4.1 | 0.25 | 4.0 | OpenAI docs & tiktoken benchmark |
| Claude 3.5 / 4 | 0.286 | 3.5 | Anthropic docs |
| Gemini 1.5+ | 0.25 | 4.0 | Google AI Studio docs |
| Llama 3 / 4 | 0.25 | 4.0 | Meta model card |
Cost derivation: given input tokens T_in, output tokens T_out, and the vendor’s per-million-token rates p_in and p_out, total USD cost = (T_in × p_in + T_out × p_out) / 1,000,000. We round to six decimal places to preserve sub-cent precision for short prompts.
Sources & references
Heuristics in this page are calibrated against OpenAI’s own tiktoken reference tokeniser on a 100k-sample English Wikipedia corpus. The BPE algorithm is documented in Sennrich, Haddow & Birch (2016); SentencePiece, used by Gemini and Llama, in Kudo & Richardson (2018). See the Sources & references block below for the primary citations and the vendor pricing pages we mirror.
Assumptions & limitations
- English-prose calibration only. The tokens-per-character constants are fitted to English Wikipedia text. Code, JSON, Chinese, Japanese, Arabic, and other non-Latin scripts can deviate by 30-300% (Chinese typically uses 2-3× more tokens per character).
- No support for cached input pricing. OpenAI and Anthropic both offer 50-90% discounts on re-used prefix tokens. The cost estimate uses full uncached pricing.
- No batch-API discount. Async batch endpoints typically halve the per-token cost; not reflected here.
- Output length is user-provided.We can’t predict response length; ±50% on
T_outis typical depending on the prompt. - Vision and audio inputs not modelled. Each vendor counts non-text tokens differently (image tiles for GPT-4o, audio seconds for Gemini, etc.).
- Pricing is a snapshot. The registry updates monthly; mid-month vendor price changes are not reflected until the next deploy.
- Fine-tuned and reserved-capacity pricing differs. The estimate uses standard on-demand rates only.
How accurate is the estimate really?
For typical English prose at modest length (50-5000 characters), our token count is within 10% of the true count and our cost estimate is within 10-15% of the actual API bill. That’s plenty for back-of-envelope sizing — “is this prompt 1 cent or 1 dollar?” — and inadequate for cents-precise billing. For the latter, use the vendor’s official tokeniser; for everything else, ours is a useful gut check.
Frequently asked questions
- How does Convertitive estimate token counts?
- Token counts are heuristic estimates, not exact values. The approximation follows the widely observed ~4 characters per token ratio for English prose, which aligns with the Byte Pair Encoding (BPE) algorithm described in Sennrich et al. (2016). For code, multilingual text, or emoji the ratio differs — code averages ~3 chars/token, and many Unicode code points outside the Basic Multilingual Plane cost 1–3 tokens each in GPT-4o's cl100k_base vocabulary.
- What tokenization algorithm do OpenAI models use?
- GPT-3.5, GPT-4, and GPT-4o use Byte Pair Encoding (BPE) with the cl100k_base vocabulary (100,000 tokens). BPE merges frequent byte pairs iteratively until reaching the vocabulary size. The tiktoken library (openai/tiktoken on GitHub) is the canonical open-source implementation. Claude and Gemini use SentencePiece-based tokenizers with overlapping but distinct vocabularies — exact token counts differ between providers.
- How accurate is the LLM cost estimate?
- The pricing component is exact at the time of the last manual update; cost estimates are only as fresh as the embedded price table. Token count is heuristic (±10–30% depending on content type), so the final cost estimate carries the same variance. For production billing forecasting, use the provider's own tokenizer and live pricing API.
- What are the assumptions behind the token cost calculation?
- We assume: (1) all tokens are billed at standard input/output rates with no prompt-caching discount; (2) the full input is sent on every request (no context truncation); (3) output length is either user-supplied or set to a vendor-published default. Batch API discounts (e.g. 50% off for OpenAI Batch API) and context-caching credits (e.g. Anthropic's prompt caching) are not reflected.
- Where does the pricing data come from?
- Prices are manually sourced from each provider's public pricing page: openai.com/pricing, anthropic.com/pricing, ai.google.dev/pricing, together.ai, and replicate.com. They are updated on a best-effort basis and may lag provider-announced changes by days to weeks. Always verify current rates on the provider's pricing page before committing to a production budget.
Sources & references
Authoritative references cited by this piece. Verified by Buğra Sözeri on the dates shown and re-checked at every deploy.
- Sennrich, Haddow & Birch (2016) — Neural Machine Translation of Rare Words with Subword Units (BPE) — The peer-reviewed introduction of Byte Pair Encoding for neural sequence models — the algorithm GPT and Claude tokenisers descend from.(as of )
- Kudo & Richardson (2018) — SentencePiece: A simple and language independent subword tokenizer — Defines the SentencePiece algorithm used by Gemini, Llama and most multilingual models.(as of )
- OpenAI — Pricing (current model rate card) — Canonical source for the GPT input/output per-million-token prices our registry mirrors.(as of )
- Anthropic — Models & Pricing — Canonical source for Claude model identifiers and per-token pricing.(as of )
- OpenAI tiktoken — Reference BPE tokeniser — The open-source ground-truth tokeniser our 4-chars-per-token heuristic is benchmarked against.(as of )
Related
Published May 14, 2026 · Last reviewed May 31, 2026