# Prompt caching

Up to 90% off repeated context. Long system prompts, big documents, reused tool definitions — price them once, reuse them cheap.
## Cache pricing per model
| Model | Base in | 5m write | 1h write | Read | Savings vs base |
|---|---|---|---|---|---|
*Included*: the provider has no explicit cache-write fee; writes are covered by the base input price. Only Claude exposes separate 5-minute and 1-hour write tiers; Gemini and GPT caches are free to write by design.
All prices in USD per 1M tokens · Alloneia's final rate, with official discounts already applied.
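To see where the "up to 90% off" comes from, here is a small cost sketch. The rates are illustrative only — a $3/M base price with the common 1.25× write and 0.1× read multipliers — not any row from the table above:

```python
def cached_cost(prefix_tokens, suffix_tokens, requests,
                base=3.00, write_mult=1.25, read_mult=0.10):
    """USD cost of `requests` calls sharing one cached prefix.

    The first call pays the cache-write rate on the prefix; every
    later call pays only the cache-read rate on it. Rates are USD
    per 1M tokens; the 1.25x / 0.1x multipliers are illustrative,
    not any provider's actual figures.
    """
    per_m = 1_000_000
    first_write = prefix_tokens * base * write_mult / per_m
    later_reads = (requests - 1) * prefix_tokens * base * read_mult / per_m
    fresh_input = requests * suffix_tokens * base / per_m
    return first_write + later_reads + fresh_input

# A 50k-token system prompt reused across 100 requests,
# with 500 new tokens each time:
with_cache = cached_cost(50_000, 500, 100)
no_cache = 100 * (50_000 + 500) * 3.00 / 1_000_000
# Caching makes the run roughly 88% cheaper at these rates.
```

The dominant term is the repeated prefix, which is why the discount approaches the read multiplier (here 10% of base) as the prefix grows relative to the per-request suffix.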
## Anthropic: `cache_control`
Mark a content block with `cache_control` and pick a TTL. The next request that starts with the same prefix is charged at the cache read rate, typically 10% of the base input price.
```json
"messages": [
  {
    "role": "system",
    "content": [{
      "type": "text",
      "text": "You are a senior dev reviewer…",
      "cache_control": { "type": "ephemeral", "ttl": "5m" }
    }]
  },
  { "role": "user", "content": "Review this PR" }
]
```
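The same request can be sketched from Python. The endpoint URL, model id, and environment variable below are placeholders for illustration, not Alloneia's actual values; the key point is that the cached prefix must be byte-identical across requests:

```python
import json
import os
import urllib.request

API_URL = "https://api.alloneia.example/v1/messages"  # hypothetical endpoint

payload = {
    "model": "claude-sonnet",  # illustrative model id
    "max_tokens": 1024,
    "messages": [
        {
            "role": "system",
            "content": [{
                "type": "text",
                "text": "You are a senior dev reviewer…",
                # The cached prefix must match exactly on the next
                # request: any edit to this block misses the cache.
                "cache_control": {"type": "ephemeral", "ttl": "5m"},
            }],
        },
        {"role": "user", "content": "Review this PR"},
    ],
}

req = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ.get('ALLONEIA_API_KEY', '')}",
    },
)
# urllib.request.urlopen(req)  # uncomment to actually send
```

Keeping the stable part of the prompt (system text, tool definitions, documents) first and the variable part last maximizes the shared prefix and therefore the cache hit.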
## OpenAI: `prompt_cache_key`
OpenAI's cache is keyed on the prompt prefix plus an optional `prompt_cache_key`. Same prefix + same key = cache hit at the discounted input rate; the key helps route repeated requests to the same cache entry.
```json
{
  "model": "gpt-5.4",
  "messages": [...],
  "prompt_cache_key": "assistant-v3"
}
```