# Provider Guide
PromptCache supports multiple AI providers for embeddings and semantic verification.

## Supported Providers

| Provider | Embedding Model | Verification Model | Cost per 1M tokens (embedding / verification) |
|---|---|---|---|
| OpenAI | text-embedding-3-small | gpt-4o-mini | $0.02 / $0.15 |
| Mistral AI | mistral-embed | mistral-small-latest | $0.10 / $0.20 |
| Claude | voyage-3 (Voyage AI) | claude-3-haiku | $0.10 / $0.25 |

## OpenAI (Default)

### Setup

```bash
export EMBEDDING_PROVIDER=openai
export OPENAI_API_KEY=your-openai-api-key
```
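
Before starting PromptCache, you can optionally sanity-check the key by requesting one embedding directly from OpenAI's public API. A minimal sketch (the `requests` usage and test string are our own, not part of PromptCache):

```python
import os
import requests

# Request a single embedding; a 200 response confirms the key is
# valid and has access to the embeddings endpoint.
resp = requests.post(
    'https://api.openai.com/v1/embeddings',
    headers={'Authorization': f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={'model': 'text-embedding-3-small', 'input': 'ping'},
)
resp.raise_for_status()
vec = resp.json()['data'][0]['embedding']
print(f"OK: received a {len(vec)}-dimensional embedding")  # expect 1536
```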

### Models Used

- **Embeddings:** `text-embedding-3-small` (1536 dimensions)
- **Verification:** `gpt-4o-mini`

### Characteristics

**Pros:**
- Fast response times
- High-quality embeddings
- Most cost-effective
- Excellent documentation

**Cons:**
- Requires an OpenAI account
- API rate limits on the free tier

### Best For

- General purpose use
- Cost-conscious deployments
- Production environments
- High-volume workloads

## Mistral AI

### Setup

```bash
export EMBEDDING_PROVIDER=mistral
export MISTRAL_API_KEY=your-mistral-api-key
```

Get your API key from: console.mistral.ai

### Models Used

- **Embeddings:** `mistral-embed` (1024 dimensions)
- **Verification:** `mistral-small-latest`

### Characteristics

**Pros:**
- European data residency
- Strong multilingual support
- Privacy-focused
- Open source models available

**Cons:**
- Slightly higher cost than OpenAI
- Smaller ecosystem
- Newer provider

### Best For

- European deployments (GDPR compliance)
- Multilingual applications
- Privacy-sensitive workloads
- Supporting open source AI

## Claude (Anthropic)

### Setup

```bash
export EMBEDDING_PROVIDER=claude
export ANTHROPIC_API_KEY=your-anthropic-api-key
export VOYAGE_API_KEY=your-voyage-api-key
```

Get your API keys:

- Anthropic: console.anthropic.com
- Voyage AI: dashboard.voyageai.com

Anthropic does not offer its own embeddings API, so the Claude provider uses Voyage AI for embeddings, as Anthropic recommends. Both API keys are required.
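
Because two vendors are involved, it can save debugging time to verify both keys before starting PromptCache. A minimal sketch hitting each vendor's public API directly (the test prompts and `requests` usage are our own, not part of PromptCache):

```python
import os
import requests

# Verify the Voyage AI key with a one-word embedding request
voyage = requests.post(
    'https://api.voyageai.com/v1/embeddings',
    headers={'Authorization': f"Bearer {os.environ['VOYAGE_API_KEY']}"},
    json={'model': 'voyage-3', 'input': ['ping']},
)

# Verify the Anthropic key with a minimal messages call
anthropic = requests.post(
    'https://api.anthropic.com/v1/messages',
    headers={
        'x-api-key': os.environ['ANTHROPIC_API_KEY'],
        'anthropic-version': '2023-06-01',
    },
    json={
        'model': 'claude-3-haiku-20240307',
        'max_tokens': 8,
        'messages': [{'role': 'user', 'content': 'ping'}],
    },
)

print('Voyage:', voyage.status_code, '| Anthropic:', anthropic.status_code)  # expect 200, 200
```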

### Models Used

- **Embeddings:** `voyage-3` via Voyage AI (1024 dimensions)
- **Verification:** `claude-3-haiku-20240307`

### Characteristics

**Pros:**
- Constitutional AI (safer outputs)
- Excellent reasoning capabilities
- Strong context understanding
- Longer context windows

**Cons:**
- Requires two API keys
- Highest cost
- More complex setup

### Best For

- High-quality reasoning requirements
- Safety-critical applications
- Complex prompt understanding
- Long-context scenarios

## Comparing Providers

### Performance

Based on benchmarks with 1000 requests:
| Provider | Avg Latency (Cache Miss) | Avg Latency (Cache Hit) | Embedding Quality |
|---|---|---|---|
| OpenAI | ~1.2s | ~280ms | Excellent |
| Mistral | ~1.4s | ~290ms | Very Good |
| Claude | ~1.6s | ~300ms | Excellent |

### Cost Comparison

For 1M cached requests (assuming 50% cache hit rate):
| Provider | Embedding Cost | Verification Cost | Total Cost |
|---|---|---|---|
| OpenAI | $10 | $7.50 | $17.50 |
| Mistral | $50 | $10 | $60 |
| Claude | $50 + $50 | $12.50 | $112.50 |

OpenAI is the most cost-effective choice for high-volume workloads.
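
As a sanity check, the OpenAI and Mistral rows above are reproduced by assuming roughly 500 embedded tokens per request and 100 verification tokens per cache miss. Both token counts are illustrative assumptions of ours, not benchmark data:

```python
# Back-of-the-envelope cost model. Assumptions (ours, not measured):
# every request embeds ~500 tokens, and each miss triggers one
# ~100-token verification call at a 50% cache hit rate.
REQUESTS = 1_000_000
EMBED_TOKENS, VERIFY_TOKENS, HIT_RATE = 500, 100, 0.5

# (embedding $/1M tokens, verification $/1M tokens) from the pricing table
PRICES = {'openai': (0.02, 0.15), 'mistral': (0.10, 0.20)}

for provider, (embed_price, verify_price) in PRICES.items():
    embed_cost = REQUESTS * EMBED_TOKENS / 1e6 * embed_price
    verify_cost = REQUESTS * (1 - HIT_RATE) * VERIFY_TOKENS / 1e6 * verify_price
    print(f"{provider}: embeddings ${embed_cost:.2f}, verification ${verify_cost:.2f}")
# openai: embeddings $10.00, verification $7.50
# mistral: embeddings $50.00, verification $10.00
```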

## Switching Providers

### At Startup

Set the `EMBEDDING_PROVIDER` environment variable before starting:

```bash
export EMBEDDING_PROVIDER=mistral
export MISTRAL_API_KEY=your-key
./prompt-cache
```

### At Runtime (Dynamic Switching)

Use the provider management API:

```bash
curl -X POST http://localhost:8080/v1/config/provider \
  -H "Content-Type: application/json" \
  -d '{"provider": "mistral"}'
```

Response:

```json
{
  "message": "Provider updated successfully",
  "provider": "mistral"
}
```

### Check Current Provider

```bash
curl http://localhost:8080/v1/config/provider
```

Response:

```json
{
  "provider": "openai",
  "available_providers": ["openai", "mistral", "claude"]
}
```
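
You can query the same endpoint from code before deciding whether to switch. A small helper, assuming only the endpoint and response shape shown above:

```python
import requests

def get_provider(base_url='http://localhost:8080'):
    """Return the provider PromptCache is currently using."""
    resp = requests.get(f'{base_url}/v1/config/provider')
    resp.raise_for_status()
    return resp.json()['provider']

print(get_provider())  # e.g. "openai"
```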

## Multi-Provider Strategy

### Failover Strategy

Start with OpenAI and fall back to Mistral if it becomes unreachable. The health check below is a crude sketch; substitute whatever monitoring you already have:

```python
import requests

def switch_provider(provider_name):
    """Switch PromptCache to a different provider at runtime."""
    response = requests.post(
        'http://localhost:8080/v1/config/provider',
        json={'provider': provider_name},
    )
    return response.status_code == 200

def openai_api_down():
    """Crude reachability check: even a 401 proves the API is up."""
    try:
        return requests.get('https://api.openai.com/v1/models', timeout=5).status_code >= 500
    except requests.RequestException:
        return True

# Monitor OpenAI health and fail over to Mistral when it degrades
if openai_api_down():
    switch_provider('mistral')
```

### A/B Testing

Compare providers by running the same workload against each and timing it:

```python
import time

providers = ['openai', 'mistral', 'claude']
results = {}

for provider in providers:
    switch_provider(provider)  # defined in the failover example above
    start = time.time()
    # ... run your test queries against PromptCache here ...
    results[provider] = time.time() - start

print(f"Fastest: {min(results, key=results.get)}")
```

### Cost Optimization

Fall back to the cheapest provider when the host is under heavy load:

```python
import psutil

# Sample CPU utilization over one second
cpu_usage = psutil.cpu_percent(interval=1)

if cpu_usage > 80:
    # Under heavy load: switch to the most cost-effective provider
    switch_provider('openai')
else:
    # Normal load: use the preferred provider
    switch_provider('claude')
```

## Provider-Specific Notes

### OpenAI

- Rate Limits: 3,500 RPM (free tier), 10,000+ RPM (paid)
- Region: US-based
- Compliance: SOC 2, GDPR-compliant

### Mistral AI

- Rate Limits: Varies by plan
- Region: Europe-based (Paris)
- Compliance: GDPR-native, ISO 27001

### Claude

- Rate Limits: 4,000 RPM (default)
- Region: US-based
- Compliance: SOC 2, HIPAA-eligible

## Troubleshooting

### Provider Not Found

Error: `unsupported provider: xxx`

Solution: Use `openai`, `mistral`, or `claude` (case-insensitive).

### API Key Missing

Error: `OPENAI_API_KEY is not set`

Solution: Export the required API key before starting.

### Rate Limit Exceeded

Solution:

- Upgrade your provider plan
- Switch to another provider
- Implement request throttling (see the sketch below)
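
For the throttling option, a small client-side limiter can space requests to stay under a provider's RPM ceiling. A sketch (the 3,500 RPM value matches OpenAI's free-tier figure noted above; adjust for your plan):

```python
import threading
import time

class RateLimiter:
    """Simple client-side throttle: block until a request slot is free."""

    def __init__(self, max_rpm: int):
        self.min_interval = 60.0 / max_rpm  # seconds between requests
        self.lock = threading.Lock()
        self.next_slot = 0.0

    def wait(self):
        with self.lock:
            now = time.monotonic()
            self.next_slot = max(self.next_slot, now) + self.min_interval
            delay = self.next_slot - self.min_interval - now
        if delay > 0:
            time.sleep(delay)

limiter = RateLimiter(max_rpm=3500)

def throttled_request(send):
    limiter.wait()   # blocks so we never exceed max_rpm
    return send()    # e.g. a requests.post(...) call to PromptCache
```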

### Poor Cache Hit Rate

Solution:

- Try different providers - embedding quality varies
- Adjust similarity thresholds (illustrated below)
- Enable gray zone verification
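
Where the thresholds are set depends on your PromptCache configuration. Conceptually, though, the hit/verify/miss decision works like the sketch below; the two threshold values are purely illustrative:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Illustrative values - the real thresholds live in PromptCache's configuration
HIT_THRESHOLD = 0.92   # above this: serve from cache directly
GRAY_THRESHOLD = 0.85  # between the two: gray zone, ask the verifier

def decide(query_emb, cached_emb):
    score = cosine(query_emb, cached_emb)
    if score >= HIT_THRESHOLD:
        return 'hit'
    if score >= GRAY_THRESHOLD:
        return 'verify'  # cheap LLM call decides hit vs miss
    return 'miss'
```

Lowering the hit threshold yields more cache hits at the cost of occasionally serving a mismatched answer; widening the gray zone shifts that risk onto the (cheaper) verification model instead.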

## Best Practices

- Development: Use OpenAI (fastest, cheapest)
- Production: Use OpenAI or Mistral depending on requirements
- Compliance: Use Mistral for EU/GDPR, Claude for HIPAA
- Experimentation: Use dynamic switching to compare providers
- Failover: Have API keys for multiple providers ready

## Future Providers

Planned support for:
- Local models (Ollama)
- Azure OpenAI
- Google Gemini (successor to PaLM)
- Cohere
Want to add a provider? Open an issue or submit a PR!