Tokens

The atomic units that LLMs read and write — sub-word pieces produced by a tokenizer. Pricing and context limits are measured in tokens, not words.

A token is the basic unit of text an LLM processes. Tokenizers break text into sub-word pieces, using algorithms such as byte-pair encoding (BPE) and libraries such as SentencePiece or tiktoken. As a rough rule, 1 token ≈ 0.75 English words or 4 characters. Numbers, code and non-English languages often use more tokens per character.
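The rule of thumb above can be turned into a quick estimator. This is only a sketch for English prose: the function name `estimate_tokens` is hypothetical, and it takes the larger of the character-based and word-based guesses to stay on the conservative side. For exact counts, use the tokenizer that belongs to your model.

```python
import math

CHARS_PER_TOKEN = 4       # rough rule: 1 token is about 4 characters
WORDS_PER_TOKEN = 0.75    # rough rule: 1 token is about 0.75 English words


def estimate_tokens(text: str) -> int:
    """Return a rough, conservative token estimate for English text."""
    by_chars = len(text) / CHARS_PER_TOKEN
    by_words = len(text.split()) / WORDS_PER_TOKEN
    # Take the larger guess so the estimate errs toward over-counting.
    return math.ceil(max(by_chars, by_words))


if __name__ == "__main__":
    sample = "Pricing and context limits are measured in tokens, not words."
    print(estimate_tokens(sample))
```

Expect the estimate to run low for code, numbers and non-English text, which tend to use more tokens per character.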

Every input and output is billed in tokens, almost always quoted per million (Mtok). Input tokens are typically 4–8× cheaper than output tokens. Cached or batched input is cheaper still on most providers.
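As an illustration of per-million pricing, the sketch below computes the cost of one request. The prices in the example are made-up placeholders, not any provider's actual rates, and the function name `request_cost_usd` is hypothetical.

```python
TOKENS_PER_MTOK = 1_000_000


def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Cost of one call, given prices quoted per million tokens."""
    total = (
        input_tokens * input_price_per_mtok
        + output_tokens * output_price_per_mtok
    )
    return total / TOKENS_PER_MTOK


if __name__ == "__main__":
    # Placeholder prices: 3.00 per Mtok input, 15.00 per Mtok output.
    cost = request_cost_usd(12_000, 800, 3.00, 15.00)
    print(f"{cost:.4f} USD")
```

With these placeholder prices, the 800 output tokens account for a quarter of the bill even though they are a small fraction of the total token count.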

Knowing your token counts matters: they determine what a call costs, whether it fits within the context window, and how much retrieved or attached content you can include in a single call.
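One way to apply this is to budget tokens before a call. The sketch below assumes the context window must hold the prompt, any retrieved content and the model's reply, so it reserves room for the reply first. The helper names and the greedy, in-order selection are illustrative choices, not a standard algorithm.

```python
def remaining_budget(
    context_window: int, prompt_tokens: int, reserved_output_tokens: int
) -> int:
    """Tokens left for retrieved or attached content, never negative."""
    return max(0, context_window - prompt_tokens - reserved_output_tokens)


def select_chunks(chunk_token_counts: list[int], budget: int) -> list[int]:
    """Return indices of chunks, taken in order, until the next one would not fit."""
    selected = []
    used = 0
    for index, count in enumerate(chunk_token_counts):
        if used + count > budget:
            break
        selected.append(index)
        used += count
    return selected


if __name__ == "__main__":
    # Hypothetical numbers: an 8,000-token window, 1,500-token prompt,
    # and 1,000 tokens reserved for the reply.
    budget = remaining_budget(8_000, 1_500, 1_000)
    print(budget, select_chunks([2_000, 2_500, 1_500, 300], budget))
```

Stopping at the first chunk that does not fit keeps the chunks in their original order. A variant could skip oversized chunks and keep looking for smaller ones.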

Related terms

Large Language Model (LLM)
A neural network trained on massive text corpora to predict the next token, used for chat, coding, reasoning and as the brain inside AI agents.
Context Window
The maximum number of tokens an LLM can read in a single request, including the prompt, retrieved documents and the model's own reply.

More to explore

Other wiki entries that touch on Tokens.

Prompt Engineering
The practice of designing inputs to LLMs to reliably produce useful outputs — through structure, examples, role-setting and constraints.
Model Context Protocol (MCP)
An open standard from Anthropic for connecting LLMs to external tools and data sources through a uniform server interface.