Glossary
GPT token
The atomic unit of LLM input and output
By Buğra Sözeri
A GPT token (more generally, a token) is the unit a large language model processes. Models don’t see characters or words directly — text is first tokenised into a sequence of integer IDs from a fixed vocabulary, typically 50,000-200,000 tokens.
OpenAI’s GPT-3, GPT-4, and GPT-5 use BPE (Byte Pair Encoding) tokenisers. Common English words are usually one token (“the” → 1, “and” → 1); longer or rarer words split into multiple tokens (“tokenization” → 2-3, depending on the vocabulary); code splits much more heavily, because identifiers, brackets, and indentation often become their own tokens.
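To make the BPE idea concrete, here is a minimal toy sketch of how merge rules can be learned and applied. It works on characters rather than bytes and uses a tiny made-up corpus, so it illustrates the technique only; it is not how tiktoken or any production tokeniser is implemented, and the function names are hypothetical.

```python
from collections import Counter


def apply_merge(symbols, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe_merges(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy corpus)."""
    # Each distinct word starts as a tuple of single characters, with a count.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often the word occurs.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair becomes a new symbol
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[apply_merge(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Tokenise a word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return list(symbols)


corpus = ["the"] * 5 + ["then"] * 2 + ["token"]
merges = learn_bpe_merges(corpus, num_merges=2)
print(encode("the", merges))    # ['the']
print(encode("token", merges))  # ['t', 'o', 'k', 'e', 'n']
```

The frequent word collapses into a single symbol after two merges, while the rare word stays split into characters, which mirrors the common-versus-rare behaviour described above.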
Practical ratios:
- English prose: ~4 characters per token, ~0.75 words per token
- Code: ~2-3 characters per token (heavier splitting)
- Non-Latin scripts (Chinese, Japanese, Arabic): can be 1 character per token or worse
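The ratios above can be turned into a quick back-of-envelope estimator. This is a sketch only: the characters-per-token values are assumed midpoints of the ranges listed, the `estimate_tokens` helper is hypothetical, and a real tokeniser should be used whenever exact counts matter.

```python
import math

# Assumed characters-per-token ratios, taken from the rough figures above.
CHARS_PER_TOKEN = {
    "english": 4.0,  # ~4 characters per token
    "code": 2.5,     # midpoint of ~2-3 characters per token
    "cjk": 1.0,      # ~1 character per token (can be worse in practice)
}


def estimate_tokens(text: str, kind: str = "english") -> int:
    """Estimate a token count from character length; not a real tokeniser."""
    if not text:
        return 0
    return math.ceil(len(text) / CHARS_PER_TOKEN[kind])


print(estimate_tokens("a" * 400))              # 100
print(estimate_tokens("x = foo(1)", "code"))   # 4
```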
Both input and output tokens are billed. Output tokens typically cost 3-5× input. Use our token counter for live estimation across GPT, Claude, Gemini, and Llama models.
The non-English tax: a Turkish, Greek, or Russian paragraph of the same semantic content as English typically costs 2-3× more tokens, because the tokeniser’s vocabulary was learned predominantly from English text and falls back to much shorter pieces for languages and scripts it saw less often. A Chinese paragraph can be 4-6× more tokens. This translates directly to cost: running the same chatbot in Japanese vs English can easily double the per-conversation bill. The 2024-vintage tokenisers (OpenAI o200k_base, Claude’s newer tokeniser) added many more non-Latin tokens and narrowed the gap, but English remains the cheapest language to operate an LLM in.
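One contributing factor can be seen without any tokeniser at all. Assuming a byte-level BPE (tiktoken's encodings operate on UTF-8 bytes), a character with no learned merges can fall back to one token per byte, and non-Latin characters take more bytes. The sketch below only measures UTF-8 width; it does not count real tokens, and the helper name is hypothetical.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per character in `text`."""
    if not text:
        return 0.0
    return len(text.encode("utf-8")) / len(text)


samples = {"English": "hello", "Russian": "привет", "Chinese": "你好"}
for name, sample in samples.items():
    print(name, utf8_bytes_per_char(sample))
# English 1.0
# Russian 2.0
# Chinese 3.0
```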
Prompt caching changes the math: OpenAI, Anthropic, and Google all offer prompt caching as of 2024-25. After first use, a repeated input prefix is billed at a steep discount, roughly 10-50% of the regular input rate depending on provider and model. For chatbot workloads with stable system prompts and long contexts, this reduces effective input cost dramatically. The cache is per-prefix (hash of the leading tokens), so reordering breaks it; structure prompts with stable content first, dynamic content last. Reference: OpenAI tiktoken, the reference BPE tokeniser.
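A small sketch of why ordering matters and what a cache hit is worth. Strings stand in for token sequences, the prices and the 0.5 cached-rate multiplier are hypothetical placeholders, and the cost function ignores any cache-write surcharge or minimum prefix length a provider may apply.

```python
def shared_prefix_len(a, b):
    """Length of the common leading run of two sequences (strings stand in for tokens)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def input_cost_usd(prefix_tokens, dynamic_tokens, price_per_million,
                   cached_multiplier, cache_hit):
    """Input cost of one call; on a hit the prefix is billed at the reduced rate."""
    prefix_price = price_per_million * (cached_multiplier if cache_hit else 1.0)
    total = prefix_tokens * prefix_price + dynamic_tokens * price_per_million
    return total / 1_000_000


system = "You are a support bot. " * 3
q1, q2 = "Where is my order?", "Reset my password."

# Stable content first: the whole system prompt is a reusable prefix.
print(shared_prefix_len(system + q1, system + q2))  # 69
# Dynamic content first: the two prompts diverge at the first character.
print(shared_prefix_len(q1 + system, q2 + system))  # 0

# Hypothetical numbers: 5,000-token prefix, 500 dynamic tokens, USD 2.50/M.
miss = input_cost_usd(5_000, 500, 2.50, cached_multiplier=0.5, cache_hit=False)
hit = input_cost_usd(5_000, 500, 2.50, cached_multiplier=0.5, cache_hit=True)
print(round(miss, 5), round(hit, 5))
```

With these placeholder figures the call costs about 0.0138 USD on a miss and 0.0075 USD on a hit.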
Worked example: counting tokens in a real prompt
Prompt: “Summarise the following meeting transcript in three bullet points.” followed by 2,000 words of English transcript and a request for a 200-word summary. Using the o200k_base tokeniser (GPT-4o family): the instruction is about 11 tokens; 2,000 English words tokenise to roughly 2,700 tokens; the model returns 200 words ≈ 270 output tokens. At illustrative 2026 GPT-4o pricing (USD 2.50/M input, USD 10/M output): input cost 2,711 / 1,000,000 × 2.50 ≈ 0.0068 USD; output cost 270 / 1,000,000 × 10 ≈ 0.0027 USD. Total ≈ 0.95 cents per call. Translate the same transcript to Japanese (~6,500 tokens at the same tokeniser) and the input alone rises to about 1.6 cents, so the per-call cost at least doubles without the model doing any more reasoning.
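The arithmetic for the English case can be reproduced in a few lines. The token counts and prices are the illustrative figures from the example above, not measured values or a live price list, and `call_cost_usd` is a hypothetical helper.

```python
INPUT_PRICE_PER_M = 2.50    # USD per million input tokens (illustrative)
OUTPUT_PRICE_PER_M = 10.00  # USD per million output tokens (illustrative)


def call_cost_usd(input_tokens, output_tokens,
                  input_price=INPUT_PRICE_PER_M, output_price=OUTPUT_PRICE_PER_M):
    """Cost of one call in USD: input and output tokens are billed separately."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Instruction (~11 tokens) + transcript (~2,700 tokens) in, ~270 tokens out.
english = call_cost_usd(11 + 2_700, 270)
print(f"{english * 100:.2f} cents")  # 0.95 cents
```

Swapping in a different input token count is enough to compare the same call across languages.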
When token counts matter operationally
Beyond pricing, tokens determine context-window fit. By the ratios above, a 128,000-token model can hold roughly 96,000 English words, about 256,000-384,000 characters of code (on the order of 10,000 lines of Python), or at most about 128,000 Japanese characters, before older content has to be dropped or truncated. RAG (retrieval-augmented generation) pipelines should chunk source documents at 200-1,000 token windows for embedding quality; chunks too large blur the embedding, chunks too small fragment semantically related content. Tools: tiktoken (OpenAI), @anthropic-ai/tokenizer (Anthropic, legacy models; newer Claude models are counted through Anthropic’s token-counting API), Hugging Face’s AutoTokenizer (open models). Related: context window, LLM. Background: Hugging Face, Tokenizer summary.
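A minimal sketch of token-window chunking and a context-fit check. It assumes the text has already been tokenised into a list (integers stand in for token IDs here); the 50-token overlap is an added assumption rather than something the guidance above prescribes, and both helper names are hypothetical.

```python
def chunk_tokens(tokens, window=500, overlap=50):
    """Split a token list into windows of at most `window` tokens.

    Consecutive chunks share `overlap` tokens so content that straddles a
    boundary appears in both (an assumed, commonly used refinement).
    """
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    step = window - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks


def fits_context(prompt_tokens, max_output_tokens, context_window=128_000):
    """True if the prompt plus the reserved output budget fits the window."""
    return prompt_tokens + max_output_tokens <= context_window


doc = list(range(1_200))  # stand-in for 1,200 token IDs
print([len(c) for c in chunk_tokens(doc)])  # [500, 500, 300]
print(fits_context(120_000, 10_000))        # False
```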
Frequently asked questions
- What is a GPT token?
- A token is the smallest unit an LLM processes — roughly 4 English characters or 0.75 words. Text is split into tokens using a tokeniser (e.g. BPE) before the model ever sees it.
- How many tokens does a typical paragraph use?
- An average English paragraph of 100 words tokenises to around 130–140 tokens. The same paragraph in Japanese or Chinese can cost roughly 2–6× more tokens, depending on the tokeniser, because its vocabulary was learned predominantly from English text.
- What is the difference between input and output tokens?
- Input tokens are the prompt fed to the model; output tokens are the generated response. Output tokens typically cost 3–5× more than input tokens in most commercial pricing tiers.
- Why does prompt caching matter for token costs?
- Cached input prefixes are billed at roughly 10–50% of the normal input rate on repeat calls, depending on provider and model. Structuring prompts with stable system instructions first and dynamic content last maximises cache hit rate and can cut per-call costs dramatically.
Published May 14, 2026 · Last reviewed May 31, 2026