- Why isn't the count exact?
- Exact tokenization requires the model's own BPE / SentencePiece table. OpenAI ships tiktoken; Anthropic and Google ship their own SDKs. Loading those tables in a browser would add ~10 MB of JavaScript per model, which isn't worth it for a quick estimate. The heuristic stays within 10% for English text and code.
- How is the style detected?
- If more than 6% of characters are symbols typical of code or JSON ({ } [ ] < > ; : = ( ) | ", '), the style is classified as 'code' and the chars-per-token ratio drops from 4 to 3.5. Everything else is treated as prose.
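The detection rule above can be sketched in a few lines. This is a minimal illustration, not the widget's actual source; the function and constant names are assumptions, but the thresholds (6% symbols, 4 vs. 3.5 chars per token) match the description.

```typescript
// Symbols the heuristic treats as indicative of code or JSON.
const CODE_SYMBOLS = new Set([..."{}[]<>;:=()|\"'"]);

function estimateTokens(text: string): { style: "code" | "prose"; tokens: number } {
  if (text.length === 0) return { style: "prose", tokens: 0 };
  let symbols = 0;
  for (const ch of text) {
    if (CODE_SYMBOLS.has(ch)) symbols++;
  }
  // More than 6% code/JSON symbols → 'code' at 3.5 chars/token; otherwise prose at 4.
  const style = symbols / text.length > 0.06 ? "code" : "prose";
  const charsPerToken = style === "code" ? 3.5 : 4;
  return { style, tokens: Math.ceil(text.length / charsPerToken) };
}
```

A JSON snippet easily clears the 6% bar, while ordinary sentences stay well under it, so the classifier rarely needs a tiebreaker in practice.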
- Are the prices current?
- Prices are updated periodically and reflect each provider's public list price for direct API access. Discounts (batch API, prompt caching, enterprise contracts) aren't applied. Cross-check the vendor's pricing page before signing a contract.
- What does 'output ratio' mean?
- An estimate of how long the model's reply will be relative to your prompt. A 1× ratio means the output is roughly the same length as the input. Classification tasks run ~0.05×; code generation 2–5×; long-form rewriting 1.5–3×.
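The ratio feeds directly into a cost estimate: output tokens are input tokens times the ratio, and each side is billed at its own rate. A minimal sketch, with placeholder prices rather than any vendor's real rates:

```typescript
interface Pricing {
  inputPerMTok: number;  // USD per million input tokens
  outputPerMTok: number; // USD per million output tokens
}

function estimateCost(inputTokens: number, outputRatio: number, p: Pricing): number {
  const outputTokens = inputTokens * outputRatio;
  return (inputTokens * p.inputPerMTok + outputTokens * p.outputPerMTok) / 1_000_000;
}

// Example: a 2,000-token prompt driving code generation at a 3x output ratio,
// with hypothetical rates of $3 / $15 per million input / output tokens.
const cost = estimateCost(2000, 3, { inputPerMTok: 3, outputPerMTok: 15 });
```

Because output rates are typically several times input rates, the ratio usually dominates the bill: the same prompt costs far more driving code generation than classification.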
- Does the cost include the context window?
- Yes. Modern API billing charges for every token in the conversation, including any system prompt, prior turns, and tool definitions. Run your full assembled prompt through the widget for the most accurate estimate.
- What about prompt caching?
- Most major vendors now offer a discounted rate (50–90% off) for repeated parts of a prompt. This calculator does not apply caching discounts because they depend on hit rate; for a production system, model the cached portion separately at the vendor's cached-input price.
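Modeling the cached portion separately comes down to blending two input rates by the expected hit rate. A sketch under assumed numbers (the discount level, hit rate, and function name are all illustrative, not a vendor's terms):

```typescript
function inputCostWithCaching(
  cachedTokens: number,   // stable prefix: system prompt, tool definitions, etc.
  freshTokens: number,    // per-request suffix that can never hit the cache
  hitRate: number,        // fraction of requests whose prefix hits the cache (0..1)
  inputPerMTok: number,   // USD per million input tokens, standard rate
  cachedPerMTok: number,  // USD per million input tokens, discounted cached rate
): number {
  // The prefix is billed at the cached rate when it hits, full rate when it misses.
  const prefixRate = hitRate * cachedPerMTok + (1 - hitRate) * inputPerMTok;
  return (cachedTokens * prefixRate + freshTokens * inputPerMTok) / 1_000_000;
}

// Example: a 10,000-token system prompt, 500 fresh tokens per request,
// a 90% hit rate, and a hypothetical 90%-off cached rate ($0.30 vs $3).
const perRequest = inputCostWithCaching(10_000, 500, 0.9, 3, 0.3);
```

Note that some vendors also charge a premium to *write* the cache on a miss; if yours does, add that surcharge to the miss branch of `prefixRate`.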
- Is non-English text more expensive?
- Yes, materially. The tokenizers were trained predominantly on English; non-English Latin scripts pay a 10–20% token premium, and CJK scripts can pay 2–4× the per-character rate. Until we ship a real tokenizer, treat the heuristic as a lower bound for non-English content.