# Pinecone RAG Integration with Swarms

## Overview

Pinecone is a fully managed vector database service designed specifically for high-performance AI applications. It provides a serverless, auto-scaling platform for vector similarity search that's optimized for production workloads. Pinecone offers enterprise-grade features including global distribution, real-time updates, metadata filtering, and comprehensive monitoring, making it ideal for production RAG systems that require reliability and scale.

## Key Features

- **Serverless Architecture**: Automatic scaling with pay-per-use pricing
- **Real-time Updates**: Live index updates without rebuilding
- **Global Distribution**: Multi-region deployment with low latency
- **Advanced Filtering**: Rich metadata filtering with complex queries
- **High Availability**: 99.9% uptime SLA with built-in redundancy
- **Performance Optimization**: Single-digit-millisecond median query latency
- **Enterprise Security**: SOC 2 compliance with end-to-end encryption
- **Monitoring & Analytics**: Built-in observability and performance insights

## Architecture

Pinecone integrates with Swarms agents as a cloud-native vector database service:

```
[Agent] -> [Pinecone Memory] -> [Serverless Vector DB] -> [Global Search] -> [Retrieved Context]
```

The system leverages Pinecone's distributed infrastructure to provide consistent, high-performance vector operations across global regions.
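
To make this flow concrete, here is a minimal, hypothetical sketch of the retrieval step that sits between the agent and the LLM call. It assumes a `PineconeMemory`-style wrapper whose `search` method returns a dict containing a `"documents"` list; the exact method names and return shape depend on your `swarms_memory` version, so adjust accordingly:

```python
# Hypothetical sketch of the diagram above: retrieve, then augment the prompt.
# Assumes `memory.search(query, top_k=...)` returns {"documents": [...], ...}.

def retrieve_context(memory, query: str, top_k: int = 3) -> str:
    """Fetch the top-k most similar documents and join them into one context block."""
    results = memory.search(query, top_k=top_k)
    return "\n".join(results.get("documents", []))

def build_augmented_prompt(memory, query: str) -> str:
    """Prepend retrieved context to the user question before the LLM call."""
    context = retrieve_context(memory, query)
    return f"Context:\n{context}\n\nQuestion: {query}"
```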

## Setup & Configuration

### Installation

```bash
pip install pinecone-client
pip install swarms
pip install swarms-memory
pip install litellm
```

### Environment Variables

```bash
# Pinecone credentials
export PINECONE_API_KEY="your-pinecone-api-key"
export PINECONE_ENVIRONMENT="your-environment"  # e.g., "us-east1-gcp"

# Optional: Index configuration
export PINECONE_INDEX_NAME="swarms-knowledge-base"

# OpenAI API key for LLM
export OPENAI_API_KEY="your-openai-api-key"
```
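
Before initializing the client, it can help to fail fast when required credentials are missing. A minimal sketch (variable names match the exports above):

```python
import os

# Fail fast if required credentials are not set.
required = ["PINECONE_API_KEY", "PINECONE_ENVIRONMENT", "OPENAI_API_KEY"]
missing = [name for name in required if not os.getenv(name)]
if missing:
    raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
```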

### Dependencies

- `pinecone-client>=2.2.0`
- `swarms`
- `swarms-memory`
- `litellm`
- `numpy`

## Code Example

```python
"""
Agent with Pinecone RAG (Retrieval-Augmented Generation)

This example demonstrates using Pinecone as a vector database for RAG operations,
allowing agents to store and retrieve documents for enhanced context.
"""

import os
import time

from swarms import Agent
from swarms_memory import PineconeMemory

# Initialize Pinecone wrapper for RAG operations
rag_db = PineconeMemory(
    api_key=os.getenv("PINECONE_API_KEY", "your-pinecone-api-key"),
    index_name="knowledge-base",
    embedding_model="text-embedding-3-small",
    namespace="examples"
)

# Add documents to the knowledge base
documents = [
    "Pinecone is a vector database that makes it easy to add semantic search to applications.",
    "RAG combines retrieval and generation for more accurate AI responses.",
    "Vector embeddings enable semantic search across documents.",
    "The swarms framework supports multiple memory backends including Pinecone.",
    "Swarms is the first and most reliable multi-agent production-grade framework.",
    "Kye Gomez is Founder and CEO of Swarms."
]

# Add documents individually
for doc in documents:
    rag_db.add(doc)

# Wait for Pinecone's eventual consistency to ensure documents are indexed
print("Waiting for documents to be indexed...")
time.sleep(2)

# Create agent with RAG capabilities
agent = Agent(
    agent_name="RAG-Agent",
    agent_description="Swarms Agent with Pinecone-powered RAG for enhanced knowledge retrieval",
    model_name="gpt-4o",
    max_loops=1,
    dynamic_temperature_enabled=True,
    long_term_memory=rag_db
)

# Query with RAG
response = agent.run("What is Pinecone and how does it relate to RAG? Who is the founder of Swarms?")
print(response)
```

## Use Cases

### 1. Production AI Applications

- **Scenario**: Customer-facing AI products requiring 99.9% uptime
- **Benefits**: Serverless scaling, global distribution, enterprise SLA
- **Best For**: SaaS platforms, mobile apps, web services

### 2. Real-time Recommendation Systems

- **Scenario**: E-commerce, content, or product recommendations
- **Benefits**: Single-digit-millisecond queries, real-time updates, global edge presence
- **Best For**: E-commerce platforms, streaming services, social media

### 3. Enterprise Knowledge Management

- **Scenario**: Large-scale corporate knowledge bases with global teams
- **Benefits**: Multi-region deployment, advanced security, comprehensive monitoring
- **Best For**: Fortune 500 companies, consulting firms, research organizations

### 4. Multi-tenant AI Platforms

- **Scenario**: AI platform providers serving multiple customers
- **Benefits**: Namespace isolation, flexible scaling, usage-based pricing
- **Best For**: AI service providers, B2B platforms, managed AI solutions

## Performance Characteristics

### Scaling

- **Serverless**: Automatic scaling based on traffic patterns
- **Global**: Multi-region deployment for worldwide low latency
- **Elastic**: Pay-per-use pricing model with no minimum commitments
- **High Availability**: 99.9% uptime SLA with built-in redundancy

### Performance Metrics

- **Query Latency**: < 10ms median, < 100ms 99th percentile
- **Throughput**: 10,000+ QPS per replica
- **Global Latency**: < 50ms from major worldwide regions
- **Update Latency**: Near real-time; writes are eventually consistent, with new vectors typically searchable within seconds

### Pod Types and Performance

| Pod Type | Use Case | Performance | Cost | Best For |
|----------|----------|-------------|------|----------|
| **p1.x1** | Development, small apps | Good | Low | Prototypes, testing |
| **p1.x2** | Medium applications | Better | Medium | Production apps |
| **p1.x4** | High-performance apps | Best | High | Enterprise, high-traffic |
| **p2.x1** | Cost-optimized large scale | Good | Medium | Large datasets, batch processing |

## Cloud Deployment

### Production Configuration

```python
# High-performance production setup
memory = PineconeMemory(
    index_name="production-knowledge-base",
    embedding_model="text-embedding-3-small",
    pod_type="p1.x2",  # Higher performance
    replicas=2,        # High availability
    metric="cosine"
)
```

### Multi-region Setup

```python
# Configure for global deployment.
# Available regions/environments are listed in the Pinecone console and docs;
# choose the one closest to your user base and expose it via PINECONE_ENVIRONMENT.
import os

print("Using Pinecone environment:", os.getenv("PINECONE_ENVIRONMENT"))

memory = PineconeMemory(
    index_name="global-knowledge-base",
    embedding_model="text-embedding-3-small",
    pod_type="p1.x2"
    # Environment/region is read from the PINECONE_ENVIRONMENT variable
)
```

### Cost Optimization

```python
# Cost-optimized configuration
memory = PineconeMemory(
    index_name="cost-optimized-kb",
    embedding_model="text-embedding-3-small",
    pod_type="p2.x1",  # Cost-optimized for large datasets
    replicas=1,        # Single replica for cost savings
    shards=1           # Single shard for simplicity
)
```

## Advanced Features

### Namespace Management

```python
# Organize data with namespaces
medical_docs = ["Medical knowledge documents..."]
legal_docs = ["Legal knowledge documents..."]

# Add to different namespaces
memory.add_documents(medical_docs, namespace="medical")
memory.add_documents(legal_docs, namespace="legal")

# Query specific namespace
medical_results = memory.search("medical query", namespace="medical")
legal_results = memory.search("legal query", namespace="legal")
```

### Complex Filtering

```python
# Advanced metadata filtering
complex_filter = {
    "$and": [
        {"category": {"$in": ["ai", "ml"]}},
        {"difficulty": {"$ne": "beginner"}},
        {"$or": [
            {"type": "concept"},
            {"type": "implementation"}
        ]}
    ]
}

results = memory.search(
    "advanced AI concepts",
    filter_dict=complex_filter,
    top_k=5
)
```

### Batch Operations

```python
# Efficient batch processing
large_dataset = load_large_document_collection()  # Your data loading logic

# Process in batches
batch_size = 100
for i in range(0, len(large_dataset), batch_size):
    batch = large_dataset[i:i + batch_size]
    documents = [item['text'] for item in batch]
    metadata = [item['metadata'] for item in batch]

    memory.add_documents(
        documents=documents,
        metadata=metadata,
        batch_size=batch_size
    )
```

### Real-time Updates

```python
# Dynamic knowledge base updates
def update_knowledge_base(new_documents, updated_documents, deleted_ids):
    """Update knowledge base in real-time"""
    # Add new documents
    if new_documents:
        memory.add_documents(new_documents)

    # Update existing documents
    for doc_id, content in updated_documents.items():
        memory.update_document(doc_id, content)

    # Remove outdated documents
    if deleted_ids:
        memory.delete_documents(ids=deleted_ids)

    print("Knowledge base updated in real-time")
```

## Monitoring and Analytics

### Built-in Metrics

```python
# Monitor index performance
stats = memory.get_index_stats()
print(f"Total vectors: {stats['total_vector_count']}")
print(f"Index fullness: {stats['index_fullness']}")

# Namespace statistics
for namespace, ns_stats in stats.get('namespaces', {}).items():
    print(f"Namespace '{namespace}': {ns_stats['vector_count']} vectors")
```

### Custom Monitoring

```python
import time
from datetime import datetime

class MonitoredPineconeMemory(PineconeMemory):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.query_metrics = []

    def search(self, *args, **kwargs):
        start_time = time.time()
        results = super().search(*args, **kwargs)
        duration = time.time() - start_time

        # Log metrics
        self.query_metrics.append({
            'timestamp': datetime.now(),
            'duration': duration,
            'results_count': len(results['documents'])
        })

        return results

    def get_performance_stats(self):
        if not self.query_metrics:
            return {}

        durations = [m['duration'] for m in self.query_metrics]
        return {
            'avg_latency': sum(durations) / len(durations),
            'min_latency': min(durations),
            'max_latency': max(durations),
            'total_queries': len(self.query_metrics)
        }
```

## Best Practices

1. **Index Design**: Choose appropriate pod type based on performance requirements
2. **Metadata Strategy**: Design rich metadata schema for effective filtering
3. **Namespace Organization**: Use namespaces for logical data separation
4. **Batch Processing**: Use batch operations for better throughput and cost efficiency
5. **Error Handling**: Implement robust error handling with exponential backoff (see the retry sketch after this list)
6. **Monitoring**: Set up comprehensive monitoring and alerting
7. **Cost Management**: Monitor usage and optimize pod configuration
8. **Security**: Use API key rotation and access controls
9. **Regional Selection**: Choose regions closest to your users
10. **Version Management**: Track schema changes and implement migration strategies
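
For practice 5, here is a minimal retry sketch with exponential backoff and jitter. The `memory.search` call is the same wrapper used throughout this guide; the attempt count and delays are illustrative and should be tuned to your workload:

```python
import random
import time

def search_with_retry(memory, query, top_k=3, max_attempts=4, base_delay=0.5):
    """Retry transient search failures with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return memory.search(query, top_k=top_k)
        except Exception as e:
            if attempt == max_attempts - 1:
                raise  # Give up after the final attempt
            # Sleep 0.5s, 1s, 2s, ... plus up to 250ms of jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.25)
            print(f"Search failed ({e}); retrying in {delay:.2f}s")
            time.sleep(delay)
```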

## Troubleshooting

### Common Issues

1. **API Quota Exceeded**: Monitor API usage and implement rate limiting. Consider upgrading plan or optimizing query patterns. Use batch operations to reduce API calls.

2. **High Latency**: Check pod type and consider upgrading. Verify regional configuration. Optimize query complexity and top_k values.

3. **Index Capacity Issues**: Monitor index fullness metrics. Consider scaling up pod type or adding shards. Implement data archival strategies.

4. **Connection Errors**: Verify API key and environment configuration. Check network connectivity and firewall settings. Implement retry logic with exponential backoff.

### Performance Tuning

```python
# Optimize query performance
def optimized_search(memory, query, top_k=3):
    """Optimized search with bounded top_k and error handling"""
    try:
        results = memory.search(
            query=query,
            top_k=min(top_k, 10),  # Limit top_k for performance
            include_metadata=True,
            include_values=False  # Don't return vectors unless needed
        )
        return results
    except Exception as e:
        print(f"Search failed: {e}")
        # Fallback: return an empty result set
        return {"documents": [], "metadata": [], "scores": [], "ids": []}
```

This comprehensive guide provides everything needed to integrate Pinecone with Swarms agents for production-scale RAG applications using the unified LiteLLM embeddings approach.