Rate Limiting

Vaikora's rate limiting protects your LLM applications from abuse and unexpected cost spikes. You can set per-key, per-user, and global limits on both requests and tokens, and handle overages with retry strategies such as exponential backoff.

Default Limits

Vaikora applies sensible defaults: 100 requests/minute per API key and 1M tokens/day per user. Adjust these via the Control Plane to match your application's traffic patterns.

Rate Limit Headers

Every response includes X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers so your application can track quota consumption in real time.
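
As a minimal sketch of reading these headers on the client side, the Python below extracts the three values from a response's header mapping. The helper name `parse_rate_limit_headers` and the `RateLimitStatus` type are illustrative, not part of any Vaikora SDK. The format of X-RateLimit-Reset is not specified on this page, so the sketch keeps it as raw text rather than guessing whether it is a timestamp or a number of seconds.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitStatus:
    """Quota snapshot read from one response (hypothetical helper type)."""
    limit: Optional[int]
    remaining: Optional[int]
    reset: Optional[str]  # raw header text; its format is an open assumption


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    """Extract the X-RateLimit-* headers, tolerating missing or odd values."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    def as_int(name: str) -> Optional[int]:
        value = lowered.get(name)
        if value is None:
            return None
        try:
            return int(value)
        except ValueError:
            return None

    return RateLimitStatus(
        limit=as_int("x-ratelimit-limit"),
        remaining=as_int("x-ratelimit-remaining"),
        reset=lowered.get("x-ratelimit-reset"),
    )
```

A caller might check `status.remaining` after each response and slow down before the quota reaches zero, rather than waiting for a rejection.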

Handling Rate Limits

When a limit is exceeded, Vaikora returns HTTP 429 Too Many Requests. Implement exponential backoff with jitter in your retry logic: the growing delay gives the limit time to reset, and the jitter keeps many clients from retrying at the same instant.
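
The retry pattern described above can be sketched as follows. This is one common variant ("full jitter"), not a prescribed Vaikora client: the function names, the `send` callable, and the default `base`, `cap`, and `max_attempts` values are all assumptions chosen for illustration. `send` stands in for whatever HTTP call your application makes and is expected to return a `(status_code, body)` pair.

```python
import random
import time
from typing import Any, Callable, Tuple


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full-jitter delay: a random value in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_retry(send: Callable[[], Tuple[int, Any]],
                    max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> Tuple[int, Any]:
    """Call send() until it stops returning 429 or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        # Only wait if another attempt will actually follow.
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt))
    # Still rate limited after the final attempt: hand the 429 to the caller.
    return status, body
```

Passing `sleep` and `rng` as parameters keeps the logic easy to test without real waiting. The cap bounds the worst-case delay so a long run of rejections does not stall a caller indefinitely.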

Burst Handling

Configure burst allowances to absorb traffic spikes without hard rejections. A burst draws down your per-minute budget faster than steady traffic does; the budget then refills on a rolling window.
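
To make the rolling-window idea concrete, here is a small client-side model of a budget that a burst can spend all at once and that recovers as older requests age out of the window. It is only an illustration of the concept: Vaikora's actual algorithm and burst settings are not described on this page, and the class name `RollingWindowBudget` and its parameters are hypothetical.

```python
from collections import deque


class RollingWindowBudget:
    """Allow at most `limit` requests in any rolling `window_seconds` span."""

    def __init__(self, limit: int, window_seconds: float = 60.0) -> None:
        self.limit = limit
        self.window = window_seconds
        self._stamps = deque()  # times of accepted requests, oldest first

    def _evict(self, now: float) -> None:
        # Requests that are a full window old no longer count against the budget.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()

    def remaining(self, now: float) -> int:
        """Budget still available at time `now`."""
        self._evict(now)
        return self.limit - len(self._stamps)

    def try_acquire(self, now: float) -> bool:
        """Record a request at `now` if budget remains; otherwise reject it."""
        self._evict(now)
        if len(self._stamps) < self.limit:
            self._stamps.append(now)
            return True
        return False
```

With `limit=3`, three requests in the first two seconds exhaust the budget, a fourth is rejected, and capacity returns one request at a time as each earlier request passes the 60-second mark.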
