Tokens
The atomic units that LLMs read and write — sub-word pieces produced by a tokenizer. Pricing and context limits are measured in tokens, not words.
A token is the basic unit of text an LLM processes. Tokenizers break text into sub-word pieces, using algorithms such as byte-pair encoding (BPE) and libraries such as SentencePiece or tiktoken. As a rough rule, 1 token ≈ 0.75 English words or about 4 characters. Numbers, code and non-English languages often use more tokens per character.
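The rule of thumb above can be turned into a quick estimator. This is a minimal sketch, not a real tokenizer: the function name `estimate_tokens` is hypothetical, and the two constants are simply the ratios quoted in the text. Actual counts depend on the model's tokenizer and will differ, especially for code, numbers and non-English text.

```python
import math

# Ratios taken from the rule of thumb in the text (approximate, English only).
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate for English text.

    Computes one estimate from the character count and one from the word
    count, then takes the larger of the two so the result errs on the high side.
    """
    by_chars = len(text) / CHARS_PER_TOKEN
    by_words = len(text.split()) / WORDS_PER_TOKEN
    return math.ceil(max(by_chars, by_words))
```

For exact numbers, count with the tokenizer that matches the model you are calling; the estimate is only useful for quick budgeting.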
Every input and output token is billed, and prices are almost always quoted per million tokens (Mtok). Input tokens are typically 4–8× cheaper than output tokens, though the exact ratio varies by provider and model. Cached or batched input is cheaper still on most providers.
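Because prices are quoted per million tokens, the cost of a single call is a short calculation. The sketch below assumes a hypothetical function name `request_cost_usd`, and the prices in the usage note are illustrative placeholders, not any provider's actual rates.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Return the cost of one request in USD.

    Prices are given per million tokens (Mtok), so the weighted token
    total is divided by 1,000,000.
    """
    total = (
        input_tokens * input_price_per_mtok
        + output_tokens * output_price_per_mtok
    )
    return total / 1_000_000
```

With illustrative prices of 3.00 USD per Mtok for input and 15.00 USD per Mtok for output, a call with 2,000 input tokens and 500 output tokens costs about 0.0135 USD. Discounts for cached or batched input are not modelled here.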
Knowing your token counts matters: they determine cost, whether a request fits within the context window, and how much retrieved or attached content you can include in a single call.
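One way to apply this is to treat the context window as a budget: subtract the prompt and the space reserved for the reply, then add retrieved chunks until the remainder is used up. The following is a minimal sketch under those assumptions; the function names `remaining_budget` and `select_chunks` are hypothetical, and the greedy in-order strategy is only one possible choice.

```python
def remaining_budget(
    context_window: int, prompt_tokens: int, reserved_output_tokens: int
) -> int:
    """Tokens left for retrieved or attached content (never negative)."""
    return max(0, context_window - prompt_tokens - reserved_output_tokens)


def select_chunks(chunk_token_counts: list[int], budget: int) -> list[int]:
    """Return indices of the chunks that fit, taken in order.

    Stops at the first chunk that would exceed the budget, so the
    original ordering (for example, by relevance) is preserved.
    """
    selected = []
    used = 0
    for index, count in enumerate(chunk_token_counts):
        if used + count > budget:
            break
        selected.append(index)
        used += count
    return selected
```

For example, with an 8,000-token window, a 1,500-token prompt and 1,000 tokens reserved for the reply, 5,500 tokens remain, which is enough for two 2,000-token chunks but not a third.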
Related terms
- Large Language Model (LLM)
A neural network trained on massive text corpora to predict the next token, used for chat, coding, reasoning and as the brain inside AI agents.
- Context Window
The maximum number of tokens an LLM can read in a single request, including the prompt, retrieved documents and the model's own reply.
More to explore
Other wiki entries that touch on Tokens.
- Prompt Engineering
The practice of designing inputs to LLMs to reliably produce useful outputs — through structure, examples, role-setting and constraints.
- Model Context Protocol (MCP)
An open standard from Anthropic for connecting LLMs to external tools and data sources through a uniform server interface.