API Reference
Complete reference for PromptCache REST API endpoints.
Base URL
http://localhost:8080
Health Checks
Kubernetes-ready health check endpoints.
GET /health
General health status.
Response (200 OK)
{
"status": "healthy",
"time": "2026-01-19T12:00:00Z"
}
GET /health/ready
Readiness probe - verifies storage is accessible.
Response (200 OK)
{
"status": "ready"
}
Response (503 Service Unavailable)
{
"status": "not ready",
"error": "storage not accessible"
}
GET /health/live
Liveness probe - simple alive check.
Response (200 OK)
{
"status": "alive"
}
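Example - Python
A minimal sketch (assuming the requests package and the default base URL) that polls the readiness probe before sending traffic, e.g. during a deployment:
import time
import requests

def wait_until_ready(base_url="http://localhost:8080", timeout_s=30):
    """Poll /health/ready until the service reports ready or the timeout expires."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            if requests.get(f"{base_url}/health/ready", timeout=2).status_code == 200:
                return True
        except requests.RequestException:
            pass  # service not reachable yet; retry
        time.sleep(1)
    return False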
Metrics & Statistics
Endpoints for monitoring and observability.
GET /metrics
Prometheus-compatible metrics export.
Response (200 OK)
# HELP promptcache_cache_hits_total Total number of cache hits
# TYPE promptcache_cache_hits_total counter
promptcache_cache_hits_total 1234
# HELP promptcache_cache_misses_total Total number of cache misses
# TYPE promptcache_cache_misses_total counter
promptcache_cache_misses_total 567
# HELP promptcache_requests_total Total number of requests
# TYPE promptcache_requests_total counter
promptcache_requests_total 1801
# HELP promptcache_request_latency_seconds Request latency histogram
# TYPE promptcache_request_latency_seconds histogram
promptcache_request_latency_seconds_sum 45.2
promptcache_request_latency_seconds_count 1801
Example - cURL
curl http://localhost:8080/metrics
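Example - Python
If you want the hit rate without a full Prometheus stack, a small sketch (assuming the requests package) that parses the plain-text exposition format shown above:
import requests

def parse_counters(text):
    # Collect "name value" sample lines, skipping comments and blanks.
    counters = {}
    for line in text.splitlines():
        if line.startswith("#") or not line.strip():
            continue
        name, _, value = line.partition(" ")
        try:
            counters[name] = float(value)
        except ValueError:
            pass
    return counters

metrics = parse_counters(requests.get("http://localhost:8080/metrics").text)
hits = metrics.get("promptcache_cache_hits_total", 0.0)
misses = metrics.get("promptcache_cache_misses_total", 0.0)
print(f"hit rate: {hits / ((hits + misses) or 1):.3f}")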
GET /v1/stats
JSON statistics for dashboards.
Response (200 OK)
{
"cache_hits": 1234,
"cache_misses": 567,
"cache_hit_rate": 0.685,
"gray_zone_checks": 89,
"total_requests": 1801,
"failed_requests": 2,
"avg_latency_ms": 25.1,
"stored_vectors": 892,
"provider_calls": 567,
"provider_errors": 1
}
Example - cURL
curl http://localhost:8080/v1/stats
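Example - Python
A quick sketch (assuming the requests package) that pulls the documented fields for a dashboard or cron check:
import requests

stats = requests.get("http://localhost:8080/v1/stats").json()
print(f"hit rate:        {stats['cache_hit_rate']:.1%}")
print(f"avg latency:     {stats['avg_latency_ms']} ms")
print(f"provider errors: {stats['provider_errors']} of {stats['provider_calls']} calls")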
Cache Management
Endpoints for managing cached entries.
GET /v1/cache/stats
Get cache statistics.
Response (200 OK)
{
"entry_count": 892,
"max_entries": 100000,
"ttl_hours": 24
}
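Example - Python
For illustration, a sketch (assuming the requests package) that reports how full the cache is, using the fields above:
import requests

cache = requests.get("http://localhost:8080/v1/cache/stats").json()
fill = cache["entry_count"] / cache["max_entries"]
print(f"cache {fill:.1%} full ({cache['entry_count']} of {cache['max_entries']} entries, TTL {cache['ttl_hours']}h)")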
DELETE /v1/cache
Clear the entire cache.
Response (200 OK)
{
"message": "Cache cleared successfully",
"deleted_count": 892
}
Example - cURL
curl -X DELETE http://localhost:8080/v1/cache
DELETE /v1/cache/:key
Delete a specific cache entry.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| key | string | Yes | The cache key to delete (URL path parameter) |
Response (200 OK)
{
"message": "Entry deleted successfully",
"key": "abc123..."
}
Response (404 Not Found)
{
"error": "Entry not found"
}
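Example - Python
A sketch that deletes one entry and handles the documented 404; the key shown is a hypothetical placeholder:
import requests

key = "abc123"  # hypothetical key; use a real key from your own bookkeeping
r = requests.delete(f"http://localhost:8080/v1/cache/{key}")
if r.status_code == 404:
    print("entry not found")
else:
    print(r.json()["message"])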
Chat Completions
OpenAI-compatible endpoint for chat completions with semantic caching.
POST /v1/chat/completions
Create a chat completion with automatic caching.
Request Headers
Content-Type: application/json
Request Body
{
"model": "gpt-4",
"messages": [
{
"role": "user",
"content": "What is quantum computing?"
}
]
}
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model name (passed to provider) |
| messages | array | Yes | Array of message objects |
| messages[].role | string | Yes | Message role (system, user, assistant) |
| messages[].content | string | Yes | Message content |
Response (200 OK)
{
"id": "chatcmpl-...",
"object": "chat.completion",
"created": 1703721600,
"model": "gpt-4",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Quantum computing is..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 50,
"total_tokens": 60
}
}
Cache Behavior
- Cache Hit: Returns the cached response without calling the provider (~300ms)
- Cache Miss: Forwards the request to the provider, caches the response, and returns it (~1.5s)
- Semantic Match: Uses embeddings so that semantically similar prompts can hit the same cache entry (see the timing sketch after the examples below)
Example - Python
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/v1",
api_key="your-api-key"
)
response = client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": "Explain AI"}]
)
Example - cURL
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4",
"messages": [{"role": "user", "content": "Explain AI"}]
}'
Example - JavaScript
const response = await fetch('http://localhost:8080/v1/chat/completions', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
model: 'gpt-4',
messages: [{ role: 'user', content: 'Explain AI' }]
})
});
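Example - Python
To observe the cache behavior described above, a sketch that times a repeated and a paraphrased prompt; whether the paraphrase hits depends on your similarity threshold, and the timings are illustrative:
import time
import requests

def timed_completion(content):
    start = time.perf_counter()
    r = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={"model": "gpt-4", "messages": [{"role": "user", "content": content}]},
    )
    return time.perf_counter() - start, r.json()

miss_s, _ = timed_completion("Explain AI")       # first call: cache miss
hit_s, _ = timed_completion("Explain AI to me")  # similar prompt: likely a semantic hit
print(f"first call: {miss_s:.2f}s, similar prompt: {hit_s:.2f}s")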
Provider Management
Endpoints for managing embedding providers at runtime.
GET /v1/config/provider
Get the current provider and available options.
Response (200 OK)
{
"provider": "openai",
"available_providers": ["openai", "mistral", "claude"]
}
Example - cURL
curl http://localhost:8080/v1/config/provider
Example - Python
import requests
response = requests.get('http://localhost:8080/v1/config/provider')
print(response.json())
POST /v1/config/provider
Switch the embedding provider at runtime.
Request Headers
Content-Type: application/json
Request Body
{
"provider": "mistral"
}
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | string | Yes | Provider name (openai, mistral, claude) |
Response (200 OK)
{
"message": "Provider updated successfully",
"provider": "mistral"
}
Response (400 Bad Request)
{
"error": "unsupported provider: invalid (supported: openai, mistral, claude)"
}
Example - cURL
curl -X POST http://localhost:8080/v1/config/provider \
-H "Content-Type: application/json" \
-d '{"provider": "mistral"}'
Example - Python
import requests
response = requests.post(
'http://localhost:8080/v1/config/provider',
json={'provider': 'mistral'}
)
print(response.json())
Example - JavaScript
const response = await fetch('http://localhost:8080/v1/config/provider', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ provider: 'mistral' })
});
Use Cases
- A/B testing different providers
- Failover during provider outages
- Cost optimization based on load
- Performance testing
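Example - Python
For the failover use case, a minimal sketch that switches to the next provider when the current one is unhealthy; the fallback order here is an assumption, not part of the API:
import requests

BASE = "http://localhost:8080"
FALLBACK_ORDER = ["openai", "mistral", "claude"]  # assumed preference order

def failover(current):
    # Try each other provider in order until the switch succeeds.
    for candidate in FALLBACK_ORDER:
        if candidate == current:
            continue
        r = requests.post(f"{BASE}/v1/config/provider", json={"provider": candidate})
        if r.status_code == 200:
            return r.json()["provider"]
    raise RuntimeError("no alternative provider accepted the switch")

current = requests.get(f"{BASE}/v1/config/provider").json()["provider"]
print("switched to", failover(current))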
Error Responses
All endpoints may return these error responses:
400 Bad Request
{
"error": "Invalid JSON"
}
500 Internal Server Error
{
"error": "Failed to call OpenAI: connection timeout"
}
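Example - Python
A sketch of the client-side handling pattern; both error responses carry a JSON body with an error field, as documented above:
import requests

r = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={"model": "gpt-4", "messages": [{"role": "user", "content": "Explain AI"}]},
)
if r.status_code == 200:
    print(r.json()["choices"][0]["message"]["content"])
else:
    # 400 and 500 responses both include an "error" field.
    print(f"request failed ({r.status_code}): {r.json().get('error')}")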
Rate Limiting
PromptCache does not implement rate limiting. Rate limits are inherited from your provider’s API.
Authentication
PromptCache uses your provider’s API key. Configure it via environment variables:
export OPENAI_API_KEY=your-key # For OpenAI
export MISTRAL_API_KEY=your-key # For Mistral
export ANTHROPIC_API_KEY=your-key # For Claude
export VOYAGE_API_KEY=your-key # For Claude embeddings
SDK Support
PromptCache is compatible with any OpenAI SDK:
- Python: openai package
- Node.js: openai package
- Go: go-openai package
- Ruby: ruby-openai gem
- Java: OpenAI Java client
Just change the base_url to point to PromptCache.