PromptCache Documentation

A smart semantic cache for high-scale GenAI workloads.

What is PromptCache?

PromptCache is a lightweight middleware that sits between your application and your LLM provider. It uses semantic understanding to detect when a new prompt has the same intent as a previous one, and then returns the cached result instantly.

Key Benefits

  • Reduce Costs: Save up to 80% on LLM API costs
  • Improve Latency: ~300ms for a cached response vs ~1.5s average for a direct LLM call
  • Better Scale: Cache hits are served without calling the provider, so they do not count against API rate limits
  • Smart Matching: Semantic understanding prevents incorrect cache hits

What’s New in v0.3.0

  • 📊 Prometheus Metrics - Export hit rates, latency, and request counts
  • 🏥 Health Checks - Kubernetes-ready liveness/readiness probes
  • 🗃️ Cache Management API - View stats, clear cache, delete entries
  • 📝 Structured Logging - JSON logs for easy aggregation
  • ⚡ ANN Index - 5x faster similarity search
  • 🔄 Graceful Shutdown - Clean request draining
  • 🔁 Retry Logic - Automatic retries with exponential backoff
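
To make the retry behavior concrete, here is a minimal sketch of retries with exponential backoff. It is an illustration, not PromptCache's actual implementation: the function names (`backoff_delays`, `call_with_retries`) and the defaults (0.5s base delay, doubling factor, 8s cap, 3 retries) are assumptions chosen for the example.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(max_retries: int, base: float = 0.5,
                   factor: float = 2.0, cap: float = 8.0) -> List[float]:
    """Return the wait (in seconds) before each retry: base, base*factor, ... up to cap."""
    return [min(cap, base * factor ** attempt) for attempt in range(max_retries)]


def call_with_retries(fn: Callable[[], T], max_retries: int = 3,
                      base: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on any exception with exponentially growing waits.

    Hypothetical helper: a real client would usually retry only on
    transient errors (timeouts, HTTP 429/5xx) rather than every exception.
    """
    delays = backoff_delays(max_retries, base)
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted, surface the last error
            sleep(delays[attempt])
    raise RuntimeError("unreachable")  # loop always returns or raises
```

With these assumed defaults, the waits between attempts are 0.5s, 1s, and 2s. Passing `sleep` as a parameter keeps the helper easy to test without real delays.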

Architecture

PromptCache uses a three-stage verification strategy:

  1. High similarity (≥70%) → Direct cache hit
  2. Low similarity (<30%) → Skip cache directly
  3. Gray zone (30-70%) → LLM verification for accuracy

This ensures cached responses are semantically correct, not just “close enough”.
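
The three tiers above can be sketched as a small routing function. This is a hedged illustration rather than PromptCache's own code: the names `cosine_similarity` and `route` are hypothetical, and cosine similarity over embedding vectors is an assumption, since the documentation only speaks of "similarity". The 0.70 and 0.30 defaults mirror the thresholds listed above.

```python
import math
from typing import Sequence

HIGH_THRESHOLD = 0.70  # at or above this: direct cache hit
LOW_THRESHOLD = 0.30   # below this: skip the cache


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def route(similarity: float, high: float = HIGH_THRESHOLD,
          low: float = LOW_THRESHOLD) -> str:
    """Map a similarity score to one of the three tiers.

    Returns "hit" (serve the cached response), "miss" (call the LLM),
    or "verify" (gray zone: ask an LLM whether the prompts match).
    """
    if similarity >= high:
        return "hit"
    if similarity < low:
        return "miss"
    return "verify"


if __name__ == "__main__":
    for score in (0.85, 0.50, 0.10):
        print(score, route(score))
```

In this sketch the boundaries follow the list literally: a score of exactly 0.70 counts as a hit, and exactly 0.30 falls into the gray zone. Only gray-zone requests pay for the extra verification call.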

Supported Providers

  • OpenAI: text-embedding-3-small + gpt-4o-mini
  • Mistral AI: mistral-embed + mistral-small-latest
  • Claude (Anthropic): voyage-3 + claude-3-haiku

Features

  • ✅ Multiple provider support (OpenAI, Mistral, Claude)
  • ✅ Dynamic provider switching via API
  • ✅ Configurable similarity thresholds
  • ✅ Gray zone verification control
  • ✅ OpenAI-compatible API
  • ✅ Docker support
  • ✅ Thread-safe operations
  • ✅ BadgerDB persistence
  • ✅ Prometheus metrics export
  • ✅ Health check endpoints
  • ✅ Cache management API
  • ✅ Structured JSON logging
  • ✅ LRU cache eviction
  • ✅ Request tracing (X-Request-ID)

Community

License

MIT License - see LICENSE for details.


Copyright © 2025 PromptCache. Distributed under the MIT License.