PromptCache Documentation

A smart semantic cache for high-scale GenAI workloads.

What is PromptCache?

PromptCache is a lightweight middleware that sits between your application and your LLM provider. It uses semantic understanding to detect when a new prompt has the same intent as a previous one, and returns the cached result instantly.
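Because PromptCache exposes an OpenAI-compatible API, existing clients only need a new base URL. A minimal sketch, assuming the proxy listens on localhost:8080 (the host, port, and path here are illustrative, not documented defaults):

```shell
# Send a standard OpenAI-style chat request to PromptCache instead of the
# provider. On a semantic cache hit, the stored response is returned directly.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "What is semantic caching?"}]
      }'
```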

Key Benefits

  • Reduce Costs: Save up to 80% on LLM API costs
  • Improve Latency: ~300ms vs ~1.5s average response time
  • Better Scale: Unlimited throughput without API rate limits
  • Smart Matching: Semantic understanding prevents incorrect cache hits

What’s New in v0.4.0

  • πŸ” API Authentication - Bearer-token middleware gating all management endpoints (API_AUTH_TOKEN)
  • 🌊 Streaming (SSE) - Full stream: true support across OpenAI, Mistral, and Claude β€” including streamed cache hits
  • βš™οΈ Runtime Config API - GET/PATCH /v1/config for live threshold and gray-zone updates
  • πŸ”₯ Cache Warming - POST /v1/cache/warm to bulk pre-populate from historical prompt/response pairs
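The new endpoints can be exercised from the shell. Only the endpoint paths, methods, and the bearer-token requirement come from this document; the host, port, and JSON field names below are illustrative assumptions:

```shell
# Update thresholds at runtime (field names are assumptions):
curl -X PATCH http://localhost:8080/v1/config \
  -H "Authorization: Bearer $API_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"hit_threshold": 0.75, "skip_threshold": 0.25}'

# Bulk pre-populate the cache from historical pairs (payload shape is an assumption):
curl -X POST http://localhost:8080/v1/cache/warm \
  -H "Authorization: Bearer $API_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"pairs": [{"prompt": "What is semantic caching?", "response": "A cache keyed on meaning, not exact text."}]}'
```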

Previously in v0.3.0

  • 📊 Prometheus metrics, structured logging, health checks
  • 🗃️ Cache management API, LRU eviction
  • ⚡ ANN index for 5x faster similarity search
  • 🔄 Graceful shutdown, retry logic

Architecture

PromptCache uses a three-stage verification strategy:

  1. High similarity (≥70%) → Direct cache hit
  2. Low similarity (<30%) → Bypass the cache entirely
  3. Gray zone (30-70%) → LLM verification for accuracy
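The three-stage strategy above can be sketched as a simple routing function. This is an illustrative reconstruction, not the project's actual source; the function and constant names are assumptions, and the thresholds match the documented defaults (adjustable via env vars or PATCH /v1/config):

```go
package main

import "fmt"

// Thresholds matching the documented defaults.
const (
	hitThreshold  = 0.70 // >= 70% similarity: direct cache hit
	skipThreshold = 0.30 // < 30% similarity: bypass the cache
)

// route decides how to handle a prompt given its best cosine similarity
// against previously cached prompts.
func route(similarity float64) string {
	switch {
	case similarity >= hitThreshold:
		return "cache_hit"
	case similarity < skipThreshold:
		return "cache_miss"
	default:
		// Gray zone: ask the LLM to confirm the two prompts share intent
		// before serving the cached response.
		return "verify_with_llm"
	}
}

func main() {
	fmt.Println(route(0.85)) // cache_hit
	fmt.Println(route(0.10)) // cache_miss
	fmt.Println(route(0.50)) // verify_with_llm
}
```

The gray-zone step is the design choice that separates this from a naive embedding cache: near-matches are confirmed by the verifier model instead of being served on similarity alone.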

This ensures cached responses are semantically correct, not just "close enough".

Supported Providers

  • OpenAI: text-embedding-3-small + gpt-4o-mini
  • Mistral AI: mistral-embed + mistral-small-latest
  • Claude (Anthropic): voyage-3 + claude-3-haiku

Features

  • ✅ Multiple provider support (OpenAI, Mistral, Claude)
  • ✅ Dynamic provider switching via API
  • ✅ Configurable similarity thresholds (env vars + runtime PATCH)
  • ✅ Gray zone verification control
  • ✅ OpenAI-compatible API (including SSE streaming)
  • ✅ Bearer-token authentication for management endpoints
  • ✅ Cache warming from historical data
  • ✅ Docker support
  • ✅ Thread-safe operations
  • ✅ BadgerDB persistence
  • ✅ Prometheus metrics export
  • ✅ Health check endpoints
  • ✅ Cache management API
  • ✅ Structured JSON logging
  • ✅ LRU cache eviction
  • ✅ Request tracing (X-Request-ID)
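Docker support and the API_AUTH_TOKEN variable are documented above; a hypothetical invocation might look like the following, where the image name, port, and data path are illustrative assumptions:

```shell
# Run PromptCache in Docker. API_AUTH_TOKEN is the documented variable gating
# the management endpoints; the volume mount is an assumed location for the
# BadgerDB persistence directory.
docker run -d -p 8080:8080 \
  -e API_AUTH_TOKEN="$(openssl rand -hex 16)" \
  -v promptcache-data:/data \
  promptcache:latest
```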

License

MIT License - see LICENSE for details.


Copyright © 2025 PromptCache. Distributed under the MIT License.