Three tiers · same models

Pay less as you ship more.

Every tier uses the same OpenAI-compatible API. You just pay 1×, 0.8× or 0.6× of the provider's list price as your monthly volume grows.

Starter

30%off provider list price

You pay 0.70× of what Anthropic / OpenAI / Google charge

Unlocks at $0 / mo

Drop-in replacement. Every model, every cache tier, zero commitment.

  • Volume rebate
  • PDF receipts
  • Auto-retry on 5xx
  • Recommended concurrency: 32
  • Requests / min: 120
  • Target uptime: 99.9%
Start free
Most popular

Growth

44%off provider list price

You pay 0.56× of what Anthropic / OpenAI / Google charge

Unlocks at $500 / mo

20% off everything once you cross $500 / mo. Auto-applied.

  • Volume rebate
  • PDF receipts
  • Auto-retry on 5xx
  • Recommended concurrency: 128
  • Requests / min: 600
  • Target uptime: 99.9%
Start free

Scale

58%off provider list price

You pay 0.42× of what Anthropic / OpenAI / Google charge

Unlocks at $1,000 / mo

40% off plus invoices and dedicated concurrency once you cross $1k / mo.

  • Volume rebate
  • PDF receipts
  • Auto-retry on 5xx
  • Recommended concurrency: 512
  • Requests / min: up to 2,000
  • Target uptime: 99.9%
Contact sales

Compare tiers

1.0×0.8×0.6×
Volume rebate
PDF receipts
Auto-retry on 5xx
Recommended concurrency32128512
Requests / min120600up to 2,000
Target uptime99.9%99.9%99.9%
Activation threshold$0$500$1,000

Tiers are evaluated on a rolling 30-day window. Top-ups never expire.

Tier is recomputed on every request from your rolling 30-day spend. Discount applied automatically — no promo codes, no invoices to edit.

Prompt caching

Up to 90% off repeated context.

Long system prompts, big documents, reused tool definitions — price them once, reuse them cheap.

Cache pricing per model

ModelBase in5m write1h writeReadSavings vs base

Includedthe provider has no explicit cache-write fee — writes are covered by the base input price. Only Claude exposes separate 5-min / 1-hour write tiers; Gemini and GPT cache reads are free to write by design.

All prices USD per 1M tokens · Alloneia's final rate, official already discounted.

Anthropiccache_control header

Mark a message block with cache_control and pick a TTL. The next request that starts with the same prefix gets charged at the cache read rate — typically 10% of the base input price.

"messages": [
  {
    "role": "system",
    "content": [{
      "type": "text",
      "text": "You are a senior dev reviewer…",
      "cache_control": { "type": "ephemeral", "ttl": "5m" }
    }]
  },
  { "role": "user", "content": "Review this PR" }
]

OpenAIprompt_cache_key

OpenAI's cache is keyed on the prefix plus an optional prompt_cache_key. Same prefix + same key = cache hit at the discounted input rate.

{
  "model": "gpt-5.4",
  "messages": [...],
  "prompt_cache_key": "assistant-v3"
}

5 min vs 1 hour — which to pick?

  • 5 min·Good default for chat sessions, IDE agents, and anything that gets re-queried every few seconds. Cheapest write price.
  • 1 hour·Use for long-running batch jobs, RAG prompts, or when your traffic has slow bursts. More expensive to write, but the hit keeps paying off for up to 60 minutes.

Savings calculator

No cache-enabled models configured yet.

FAQ

Do I need a different endpoint to use caching?

No. Same /v1/chat/completions. Add cache_control (Anthropic) or prompt_cache_key (OpenAI) to your existing request body.

What's the difference between 5m and 1h?

The TTL of the cached entry. 5-minute cache is cheaper to write, perfect for chat sessions. 1-hour cache costs more to write but is worth it for long-lived prompts reused across many requests.

How much of my prompt can I cache?

Anthropic supports up to 4 cache breakpoints per request, any contiguous prefix. OpenAI caches automatically on prefix matches of ≥1024 tokens.

Is caching available on every model?

No. Claude models expose full 5m + 1h writes; GPT-5 and Gemini 2.5 support cached reads only. The table above shows what's available per model.

Does Alloneia charge extra to use caching?

No. We pass through the provider's cache pricing — same 0.7× discount applies to both base and cache tiers.

What happens if the cache misses?

You get billed at the base input price — exactly what you'd pay without caching. No penalty, no surprise.