Configuration Guide

Configure PromptCache behavior through environment variables.

Quick Reference

Variable                   Description                      Default
PORT                       Server port                      8080
STORAGE_PATH               BadgerDB data directory          ./badger_data
CACHE_TTL_HOURS            Cache entry TTL (hours)          24
CACHE_MAX_ENTRIES          Maximum cache entries            100000
EMBEDDING_PROVIDER         Embedding provider               openai
CACHE_HIGH_THRESHOLD       Direct cache hit threshold       0.70
CACHE_LOW_THRESHOLD        Clear miss threshold             0.30
ENABLE_GRAY_ZONE_VERIFIER  LLM verification toggle          true
LOG_LEVEL                  Logging level                    info
REQUEST_MAX_BYTES          Max request body size (bytes)    1048576
HTTP_TIMEOUT_SECONDS       HTTP client timeout (seconds)    30
HTTP_MAX_RETRIES           Retry attempts                   3
HTTP_RETRY_BASE_WAIT_MS    Base retry wait (ms)             500
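
All settings are read from the process environment at startup. As a minimal, hedged sketch (the helper name getenvDefault is illustrative, not part of PromptCache's code), reading a variable with a fallback default in Go looks like this:

package main

import (
    "fmt"
    "os"
)

// getenvDefault returns the value of the environment variable named by key,
// or fallback when the variable is unset or empty.
func getenvDefault(key, fallback string) string {
    if v := os.Getenv(key); v != "" {
        return v
    }
    return fallback
}

func main() {
    port := getenvDefault("PORT", "8080")
    storagePath := getenvDefault("STORAGE_PATH", "./badger_data")
    fmt.Printf("listening on :%s, storing data in %s\n", port, storagePath)
}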

Server Settings

Port

export PORT=8080  # Default: 8080

Storage Path

export STORAGE_PATH=./badger_data  # Default: ./badger_data

Request Size Limit

Maximum request body size in bytes.

export REQUEST_MAX_BYTES=1048576  # Default: 1MB
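
PromptCache's handler internals are not shown in this guide, but a common way to enforce such a limit in Go is http.MaxBytesReader, which fails the read once the body exceeds the configured size. A sketch under that assumption (the route and handler are placeholders, not PromptCache's actual API):

package main

import (
    "io"
    "net/http"
    "os"
    "strconv"
)

func main() {
    maxBytes, err := strconv.ParseInt(os.Getenv("REQUEST_MAX_BYTES"), 10, 64)
    if err != nil || maxBytes <= 0 {
        maxBytes = 1 << 20 // fall back to the 1 MiB default
    }

    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        // MaxBytesReader returns an error once more than maxBytes have been read.
        r.Body = http.MaxBytesReader(w, r.Body, maxBytes)
        if _, err := io.ReadAll(r.Body); err != nil {
            http.Error(w, "request body too large", http.StatusRequestEntityTooLarge)
            return
        }
        w.WriteHeader(http.StatusNoContent)
    })
    http.ListenAndServe(":8080", nil)
}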

Log Level

export LOG_LEVEL=info  # Options: debug, info, warn, error

Cache Settings

TTL (Time-To-Live)

Cache entry lifetime in hours.

export CACHE_TTL_HOURS=24  # Default: 24 hours
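
Internally the TTL maps naturally onto BadgerDB's per-entry expiry. The snippet below is a hedged illustration of that mechanism (it assumes the badger/v4 module; PromptCache's actual storage code and key layout are not documented here):

package main

import (
    "time"

    badger "github.com/dgraph-io/badger/v4"
)

// writeWithTTL stores a key/value pair that BadgerDB expires automatically
// after ttlHours hours.
func writeWithTTL(db *badger.DB, key, value []byte, ttlHours int) error {
    return db.Update(func(txn *badger.Txn) error {
        entry := badger.NewEntry(key, value).WithTTL(time.Duration(ttlHours) * time.Hour)
        return txn.SetEntry(entry)
    })
}

func main() {
    db, err := badger.Open(badger.DefaultOptions("./badger_data"))
    if err != nil {
        panic(err)
    }
    defer db.Close()

    // With CACHE_TTL_HOURS=24 this illustrative entry disappears after one day.
    if err := writeWithTTL(db, []byte("prompt-key"), []byte("cached response"), 24); err != nil {
        panic(err)
    }
}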

Maximum Entries

Maximum number of cached entries. When the limit is exceeded, the least recently used (LRU) entries are evicted first.

export CACHE_MAX_ENTRIES=100000  # Default: 100000
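
PromptCache's eviction internals are not documented here; the toy cache below only illustrates the general LRU idea of dropping the least recently used entry once the count passes the cap:

package main

import "container/list"

// lruCache is a toy LRU cache illustrating eviction once maxEntries is exceeded.
type lruCache struct {
    maxEntries int
    order      *list.List               // front = most recently used
    items      map[string]*list.Element // key -> element in order
}

type entry struct{ key, value string }

func newLRUCache(maxEntries int) *lruCache {
    return &lruCache{maxEntries: maxEntries, order: list.New(), items: map[string]*list.Element{}}
}

func (c *lruCache) Put(key, value string) {
    if el, ok := c.items[key]; ok {
        c.order.MoveToFront(el)
        el.Value.(*entry).value = value
        return
    }
    c.items[key] = c.order.PushFront(&entry{key, value})
    if c.order.Len() > c.maxEntries {
        oldest := c.order.Back() // least recently used
        c.order.Remove(oldest)
        delete(c.items, oldest.Value.(*entry).key)
    }
}

func main() {
    c := newLRUCache(2)
    c.Put("a", "1")
    c.Put("b", "2")
    c.Put("c", "3") // evicts "a", the least recently used entry
}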

HTTP Client Settings

Timeout

HTTP client timeout for API calls.

export HTTP_TIMEOUT_SECONDS=30  # Default: 30 seconds

Retry Configuration

export HTTP_MAX_RETRIES=3           # Default: 3 retries
export HTTP_RETRY_BASE_WAIT_MS=500  # Default: 500ms base wait

Retries use exponential backoff with jitter.
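
The exact schedule is an implementation detail; the sketch below assumes the common pattern of doubling the base wait on each attempt and adding a random jitter of up to one base interval:

package main

import (
    "fmt"
    "math/rand"
    "time"
)

// backoffWait returns the wait before retry number attempt (0-based),
// assuming wait = base * 2^attempt plus up to one base interval of jitter.
func backoffWait(base time.Duration, attempt int) time.Duration {
    wait := base << attempt // 500ms, 1s, 2s, ...
    jitter := time.Duration(rand.Int63n(int64(base)))
    return wait + jitter
}

func main() {
    base := 500 * time.Millisecond // HTTP_RETRY_BASE_WAIT_MS=500
    for attempt := 0; attempt < 3; attempt++ { // HTTP_MAX_RETRIES=3
        fmt.Printf("retry %d after ~%v\n", attempt+1, backoffWait(base, attempt))
    }
}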


Provider Selection

Choose your embedding provider:

export EMBEDDING_PROVIDER=openai  # Options: openai, mistral, claude

Default: openai

Supported Providers:

  • openai - OpenAI embeddings and verification
  • mistral - Mistral AI embeddings and verification
  • claude - Voyage AI embeddings + Anthropic verification

Similarity Thresholds

Control when prompts are considered similar enough to return cached results.

High Threshold

Minimum similarity score for direct cache hits.

export CACHE_HIGH_THRESHOLD=0.70  # Range: 0.0 to 1.0

Default: 0.70 (70% similarity)

Recommendations:

  • 0.85-0.95: Very strict matching, fewer cache hits but higher accuracy
  • 0.70-0.85: Balanced approach (recommended)
  • 0.50-0.70: Aggressive caching, more hits but potential for false positives

Low Threshold

Maximum similarity score still treated as a clear miss: at or below this value, no cached result is returned and no verification is attempted.

export CACHE_LOW_THRESHOLD=0.30  # Range: 0.0 to 1.0

Default: 0.30 (30% similarity)

Recommendations:

  • 0.40-0.60: Narrow gray zone, more verification calls
  • 0.25-0.40: Balanced approach (recommended)
  • 0.10-0.25: Wide gray zone, fewer clear misses

Always ensure CACHE_HIGH_THRESHOLD > CACHE_LOW_THRESHOLD; an inverted pair is reset to the defaults (0.70/0.30) at startup (see Configuration Validation).
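
Taken together, the two thresholds split the similarity range into three zones. The sketch below is illustrative (not PromptCache's source) and shows how a similarity score maps onto a direct hit, a clear miss, or the gray zone:

package main

import "fmt"

type decision string

const (
    cacheHit  decision = "hit"       // score >= high: return the cached response
    cacheMiss decision = "miss"      // score <= low: forward to the upstream model
    grayZone  decision = "gray_zone" // in between: optionally verify with an LLM
)

// classify maps a similarity score onto the zones defined by
// CACHE_LOW_THRESHOLD and CACHE_HIGH_THRESHOLD.
func classify(score, low, high float64) decision {
    switch {
    case score >= high:
        return cacheHit
    case score <= low:
        return cacheMiss
    default:
        return grayZone
    }
}

func main() {
    low, high := 0.30, 0.70 // the defaults
    for _, s := range []float64{0.92, 0.55, 0.12} {
        fmt.Printf("similarity %.2f -> %s\n", s, classify(s, low, high))
    }
}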


Gray Zone Verification

Enable or disable LLM-based verification for prompts in the gray zone (between low and high thresholds).

export ENABLE_GRAY_ZONE_VERIFIER=true  # Options: true, false, 1, 0, yes, no

Default: true (enabled)
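
Because the flag accepts yes/no alongside true/false and 1/0, Go's strconv.ParseBool alone would not cover every listed form; a tolerant parser along these lines could be used (a sketch with an illustrative helper name, not PromptCache's code):

package main

import (
    "fmt"
    "os"
    "strings"
)

// parseBoolEnv interprets true/false, 1/0, and yes/no case-insensitively,
// falling back to the given default for anything else.
func parseBoolEnv(key string, fallback bool) bool {
    switch strings.ToLower(strings.TrimSpace(os.Getenv(key))) {
    case "true", "1", "yes":
        return true
    case "false", "0", "no":
        return false
    default:
        return fallback
    }
}

func main() {
    enabled := parseBoolEnv("ENABLE_GRAY_ZONE_VERIFIER", true)
    fmt.Println("gray zone verifier enabled:", enabled)
}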

When to Enable

  • Production environments requiring high accuracy
  • Varied prompt patterns
  • Critical applications where wrong answers are costly
  • When you can afford the extra verification API calls

When to Disable

  • Cost optimization (skip verification API calls)
  • Speed priority (accept slightly lower accuracy)
  • Highly standardized prompts
  • Development/testing environments

Cost Impact:

  • Enabled: Extra API call for each gray zone match (~$0.0001 per call with gpt-4o-mini)
  • Disabled: No verification cost, but potential for incorrect cache hits

Model Overrides

Override default models for each provider:

OpenAI

export OPENAI_EMBED_MODEL=text-embedding-3-small   # Default
export OPENAI_VERIFY_MODEL=gpt-4o-mini             # Default

Mistral

export MISTRAL_EMBED_MODEL=mistral-embed           # Default
export MISTRAL_VERIFY_MODEL=mistral-small-latest   # Default

Claude / Voyage

export VOYAGE_EMBED_MODEL=voyage-3                        # Default
export CLAUDE_VERIFY_MODEL=claude-3-haiku-20240307        # Default

Provider API Keys

OpenAI

export OPENAI_API_KEY=your-openai-api-key

Required when EMBEDDING_PROVIDER=openai (default).

Mistral AI

export MISTRAL_API_KEY=your-mistral-api-key

Required when EMBEDDING_PROVIDER=mistral.

Claude (Anthropic)

export ANTHROPIC_API_KEY=your-anthropic-api-key
export VOYAGE_API_KEY=your-voyage-api-key

Both keys required when EMBEDDING_PROVIDER=claude.

Anthropic does not offer an embeddings API, so the claude provider pairs Voyage AI embeddings with Anthropic verification; that is why both keys are needed.


Example Configurations

Strict Accuracy (Production)

export EMBEDDING_PROVIDER=openai
export OPENAI_API_KEY=your-key
export CACHE_HIGH_THRESHOLD=0.90
export CACHE_LOW_THRESHOLD=0.35
export ENABLE_GRAY_ZONE_VERIFIER=true

Profile: High accuracy, moderate cache hit rate

Balanced (Default)

export EMBEDDING_PROVIDER=openai
export OPENAI_API_KEY=your-key
export CACHE_HIGH_THRESHOLD=0.70
export CACHE_LOW_THRESHOLD=0.30
export ENABLE_GRAY_ZONE_VERIFIER=true

Profile: Good balance of accuracy and performance

Aggressive Caching (Cost Optimization)

export EMBEDDING_PROVIDER=mistral
export MISTRAL_API_KEY=your-key
export CACHE_HIGH_THRESHOLD=0.60
export CACHE_LOW_THRESHOLD=0.25
export ENABLE_GRAY_ZONE_VERIFIER=false

Profile: Maximum cache hits, lower accuracy, minimum cost

Development

export EMBEDDING_PROVIDER=openai
export OPENAI_API_KEY=your-key
export CACHE_HIGH_THRESHOLD=0.70
export CACHE_LOW_THRESHOLD=0.30
export ENABLE_GRAY_ZONE_VERIFIER=false

Profile: Fast responses, no verification overhead


Docker Compose Configuration

Edit docker-compose.yml; the ${VAR:-default} form falls back to the shown default whenever the variable is unset in your shell or .env file:

services:
  prompt-cache:
    environment:
      - EMBEDDING_PROVIDER=${EMBEDDING_PROVIDER:-openai}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - MISTRAL_API_KEY=${MISTRAL_API_KEY}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - VOYAGE_API_KEY=${VOYAGE_API_KEY}
      - CACHE_HIGH_THRESHOLD=${CACHE_HIGH_THRESHOLD:-0.70}
      - CACHE_LOW_THRESHOLD=${CACHE_LOW_THRESHOLD:-0.30}
      - ENABLE_GRAY_ZONE_VERIFIER=${ENABLE_GRAY_ZONE_VERIFIER:-true}

Configuration Validation

PromptCache validates its configuration on startup and logs the effective values:

Cache Configuration: HighThreshold=0.70, LowThreshold=0.30, GrayZoneVerifier=true

Invalid configurations are corrected or rejected:

  • If CACHE_HIGH_THRESHOLD <= CACHE_LOW_THRESHOLD, both are reset to the defaults (0.70/0.30)
  • Threshold values outside the 0.0 to 1.0 range are ignored
  • Unknown provider names return an error
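
A sketch of that correction logic, matching the behavior described above rather than quoting the actual source:

package main

import "fmt"

const (
    defaultHigh = 0.70
    defaultLow  = 0.30
)

// sanitizeThresholds applies the corrections described above: out-of-range
// values fall back to the defaults, and an inverted pair is reset entirely.
func sanitizeThresholds(high, low float64) (float64, float64) {
    if high < 0 || high > 1 {
        high = defaultHigh
    }
    if low < 0 || low > 1 {
        low = defaultLow
    }
    if high <= low {
        high, low = defaultHigh, defaultLow
    }
    return high, low
}

func main() {
    high, low := sanitizeThresholds(0.20, 0.60) // inverted pair: reset to defaults
    fmt.Printf("Cache Configuration: HighThreshold=%.2f, LowThreshold=%.2f\n", high, low)
}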

Dynamic Configuration

Some settings can be changed at runtime:

Provider Switching

curl -X POST http://localhost:8080/v1/config/provider \
  -H "Content-Type: application/json" \
  -d '{"provider": "mistral"}'

Threshold Updates

Threshold updates require a service restart. Dynamic threshold updates via API are planned for v0.3.0.


Performance Tuning

For Maximum Cache Hits

CACHE_HIGH_THRESHOLD=0.60
CACHE_LOW_THRESHOLD=0.20
ENABLE_GRAY_ZONE_VERIFIER=true

For Minimum Latency

CACHE_HIGH_THRESHOLD=0.70
CACHE_LOW_THRESHOLD=0.30
ENABLE_GRAY_ZONE_VERIFIER=false

For Maximum Accuracy

CACHE_HIGH_THRESHOLD=0.95
CACHE_LOW_THRESHOLD=0.40
ENABLE_GRAY_ZONE_VERIFIER=true

Troubleshooting

Cache hit rate too low

  • Lower CACHE_HIGH_THRESHOLD (e.g., 0.60)
  • Widen the gray zone by lowering CACHE_LOW_THRESHOLD so more borderline prompts are verified
  • Enable the gray zone verifier

Too many false positives

  • Raise CACHE_HIGH_THRESHOLD (e.g., 0.85)
  • Enable the gray zone verifier
  • Narrow the gray zone by raising CACHE_LOW_THRESHOLD

High API costs

  • Disable gray zone verifier
  • Use cheaper provider (Mistral)
  • Raise CACHE_LOW_THRESHOLD to reduce gray zone

Slow responses

  • Disable gray zone verifier
  • Use faster provider
  • Ensure adequate hardware resources

