# Pinecone RAG Integration with Swarms

## Overview

Pinecone is a fully managed vector database service designed for high-performance AI applications. It provides a serverless, auto-scaling platform for vector similarity search, optimized for production workloads. Pinecone offers enterprise-grade features including global distribution, real-time updates, metadata filtering, and comprehensive monitoring, making it well suited to production RAG systems that require reliability and scale.

## Key Features
- Serverless Architecture: Automatic scaling with pay-per-use pricing
- Real-time Updates: Live index updates without rebuilding
- Global Distribution: Multi-region deployment with low latency
- Advanced Filtering: Rich metadata filtering with complex queries
- High Availability: 99.9% uptime SLA with built-in redundancy
- Performance Optimization: Low-latency queries, with single-digit-millisecond medians typical
- Enterprise Security: SOC 2 compliance with end-to-end encryption
- Monitoring & Analytics: Built-in observability and performance insights
## Architecture

Pinecone integrates with Swarms agents as a cloud-native vector database service:

```text
[Agent] -> [Pinecone Memory] -> [Serverless Vector DB] -> [Global Search] -> [Retrieved Context]
```

The system leverages Pinecone's distributed infrastructure to provide consistent, high-performance vector operations across global regions.
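Conceptually, the memory layer handles the first two hops of this pipeline on the agent's behalf. A minimal sketch of that retrieval step is shown below; the `search` method name and dict-style results mirror the search API used later in this guide and should be treated as assumptions, not a fixed contract:

```python
def retrieve_context(memory, user_query: str, top_k: int = 3) -> str:
    """Fetch the top-k most similar documents and join them into a context string."""
    results = memory.search(query=user_query, top_k=top_k)
    return "\n".join(results.get("documents", []))

# The agent then prepends this context to the LLM prompt, e.g.:
# prompt = f"Context:\n{context}\n\nQuestion: {user_query}"
```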
## Setup & Configuration

### Installation

```bash
pip install pinecone-client
pip install swarms
pip install litellm
```

Note: newer releases of the Pinecone SDK are published as `pinecone` (the `pinecone-client` package name is deprecated); install whichever your `swarms-memory` version expects.
### Environment Variables

```bash
# Pinecone credentials
export PINECONE_API_KEY="your-pinecone-api-key"
export PINECONE_ENVIRONMENT="your-environment"  # e.g., "us-east1-gcp"

# Optional: Index configuration
export PINECONE_INDEX_NAME="swarms-knowledge-base"

# OpenAI API key for LLM
export OPENAI_API_KEY="your-openai-api-key"
```
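A small fail-fast check at startup catches missing credentials before any index calls are made. This sketch only reads the variables exported above:

```python
import os

# Fail fast if required credentials are missing
required = ["PINECONE_API_KEY", "OPENAI_API_KEY"]
missing = [name for name in required if not os.getenv(name)]
if missing:
    raise EnvironmentError(f"Missing environment variables: {missing}")

# Optional settings fall back to sensible defaults
index_name = os.getenv("PINECONE_INDEX_NAME", "swarms-knowledge-base")
```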
### Dependencies

```text
pinecone-client>=2.2.0
swarms
litellm
numpy
```
## Code Example

```python
"""
Agent with Pinecone RAG (Retrieval-Augmented Generation)

This example demonstrates using Pinecone as a vector database for RAG operations,
allowing agents to store and retrieve documents for enhanced context.
"""

import os
import time

from swarms import Agent
from swarms_memory import PineconeMemory

# Initialize Pinecone wrapper for RAG operations
rag_db = PineconeMemory(
    api_key=os.getenv("PINECONE_API_KEY", "your-pinecone-api-key"),
    index_name="knowledge-base",
    embedding_model="text-embedding-3-small",
    namespace="examples",
)

# Add documents to the knowledge base
documents = [
    "Pinecone is a vector database that makes it easy to add semantic search to applications.",
    "RAG combines retrieval and generation for more accurate AI responses.",
    "Vector embeddings enable semantic search across documents.",
    "The swarms framework supports multiple memory backends including Pinecone.",
    "Swarms is the first and most reliable multi-agent production-grade framework.",
    "Kye Gomez is Founder and CEO of Swarms.",
]

# Add documents individually
for doc in documents:
    rag_db.add(doc)

# Pinecone indexing is eventually consistent; give new vectors a moment
# to become queryable before running the agent
print("Waiting for documents to be indexed...")
time.sleep(2)

# Create agent with RAG capabilities
agent = Agent(
    agent_name="RAG-Agent",
    agent_description="Swarms Agent with Pinecone-powered RAG for enhanced knowledge retrieval",
    model_name="gpt-4o",
    max_loops=1,
    dynamic_temperature_enabled=True,
    long_term_memory=rag_db,
)

# Query with RAG
response = agent.run(
    "What is Pinecone and how does it relate to RAG? Who is the founder of Swarms?"
)
print(response)
```
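Before routing queries through the agent, it can be useful to query the store directly and inspect what would be retrieved. This sketch assumes the dict-style results used elsewhere in this guide (keys such as `documents` and `scores`):

```python
# Sanity-check retrieval directly against the memory backend
hits = rag_db.search(query="Who founded Swarms?", top_k=3)
for doc, score in zip(hits.get("documents", []), hits.get("scores", [])):
    print(f"{score:.3f}  {doc}")
```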
## Use Cases

### 1. Production AI Applications

- Scenario: Customer-facing AI products requiring 99.9% uptime
- Benefits: Serverless scaling, global distribution, enterprise SLA
- Best For: SaaS platforms, mobile apps, web services

### 2. Real-time Recommendation Systems

- Scenario: E-commerce, content, or product recommendations
- Benefits: Low-latency queries, real-time updates, global edge
- Best For: E-commerce platforms, streaming services, social media

### 3. Enterprise Knowledge Management

- Scenario: Large-scale corporate knowledge bases with global teams
- Benefits: Multi-region deployment, advanced security, comprehensive monitoring
- Best For: Fortune 500 companies, consulting firms, research organizations

### 4. Multi-tenant AI Platforms

- Scenario: AI platform providers serving multiple customers
- Benefits: Namespace isolation, flexible scaling, usage-based pricing
- Best For: AI service providers, B2B platforms, managed AI solutions
## Performance Characteristics

### Scaling

- Serverless: Automatic scaling based on traffic patterns
- Global: Multi-region deployment for worldwide low latency
- Elastic: Pay-per-use pricing model with no minimum commitments
- High Availability: 99.9% uptime SLA with built-in redundancy

### Performance Metrics

Typical figures; actual numbers depend on index configuration, region, and workload:

- Query Latency: < 10ms median, < 100ms at the 99th percentile
- Throughput: 10,000+ QPS per replica
- Global Latency: < 50ms from major worldwide regions
- Update Latency: Near real-time; writes are eventually consistent, and new vectors typically become queryable within seconds
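Because writes are eventually consistent, polling index stats is more reliable than the fixed sleep used in the code example above. This is a sketch built on the `get_index_stats()` call shown later in the Monitoring section:

```python
import time

def wait_for_vector_count(memory, expected_count: int, timeout: float = 30.0) -> bool:
    """Poll index stats until at least `expected_count` vectors are queryable."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        stats = memory.get_index_stats()
        if stats.get("total_vector_count", 0) >= expected_count:
            return True
        time.sleep(1.0)
    return False

# Usage: wait_for_vector_count(rag_db, expected_count=len(documents))
```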
### Pod Types and Performance

Pod types apply to pod-based (non-serverless) indexes:

| Pod Type | Use Case | Performance | Cost | Best For |
|---|---|---|---|---|
| p1.x1 | Development, small apps | Good | Low | Prototypes, testing |
| p1.x2 | Medium applications | Better | Medium | Production apps |
| p1.x4 | High-performance apps | Best | High | Enterprise, high-traffic |
| p2.x1 | Cost-optimized large scale | Good | Medium | Large datasets, batch processing |
## Cloud Deployment

### Production Configuration

```python
# High-performance production setup
memory = PineconeMemory(
    index_name="production-knowledge-base",
    embedding_model="text-embedding-3-small",
    pod_type="p1.x2",  # Higher performance
    replicas=2,        # High availability
    metric="cosine",
)
```
### Multi-region Setup

```python
# Configure for global deployment. Choose the region closest to your user
# base and select it via the PINECONE_ENVIRONMENT variable
# (e.g., "us-east1-gcp", "eu-west1-gcp"); consult Pinecone's documentation
# for the currently available regions.
memory = PineconeMemory(
    index_name="global-knowledge-base",
    embedding_model="text-embedding-3-small",
    pod_type="p1.x2",
    # Environment is read from PINECONE_ENVIRONMENT
)
```
### Cost Optimization

```python
# Cost-optimized configuration
memory = PineconeMemory(
    index_name="cost-optimized-kb",
    embedding_model="text-embedding-3-small",
    pod_type="p2.x1",  # Cost-optimized for large datasets
    replicas=1,        # Single replica for cost savings
    shards=1,          # Single shard for simplicity
)
```
## Advanced Features

### Namespace Management

```python
# Organize data with namespaces
medical_docs = ["Medical knowledge documents..."]
legal_docs = ["Legal knowledge documents..."]

# Add to different namespaces
memory.add_documents(medical_docs, namespace="medical")
memory.add_documents(legal_docs, namespace="legal")

# Query a specific namespace
medical_results = memory.search("medical query", namespace="medical")
legal_results = memory.search("legal query", namespace="legal")
```
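The same pattern supports the multi-tenant isolation described in Use Case 4: give each tenant its own namespace so queries never cross tenant boundaries. The helper names below are hypothetical, built on the namespace-scoped `search` shown above:

```python
def tenant_namespace(tenant_id: str) -> str:
    """Derive a deterministic namespace from a tenant identifier."""
    return f"tenant-{tenant_id.lower()}"

def search_for_tenant(memory, tenant_id: str, query: str, top_k: int = 5):
    """Search only within the calling tenant's namespace."""
    return memory.search(query, namespace=tenant_namespace(tenant_id), top_k=top_k)
```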
### Complex Filtering

```python
# Advanced metadata filtering
complex_filter = {
    "$and": [
        {"category": {"$in": ["ai", "ml"]}},
        {"difficulty": {"$ne": "beginner"}},
        {"$or": [
            {"type": "concept"},
            {"type": "implementation"},
        ]},
    ]
}

results = memory.search(
    "advanced AI concepts",
    filter_dict=complex_filter,
    top_k=5,
)
```
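Filters like this are plain dictionaries, so they can be composed programmatically. A minimal sketch using only the operators shown above (`$and`, `$or`, `$in`, `$ne`); this is dict construction, not a Pinecone API:

```python
def build_filter(categories=None, exclude_difficulty=None, types=None):
    """Compose a metadata filter from optional criteria."""
    clauses = []
    if categories:
        clauses.append({"category": {"$in": categories}})
    if exclude_difficulty:
        clauses.append({"difficulty": {"$ne": exclude_difficulty}})
    if types:
        clauses.append({"$or": [{"type": t} for t in types]})
    if not clauses:
        return {}
    return {"$and": clauses} if len(clauses) > 1 else clauses[0]

# Reproduces the filter above
complex_filter = build_filter(
    categories=["ai", "ml"],
    exclude_difficulty="beginner",
    types=["concept", "implementation"],
)
```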
### Batch Operations

```python
# Efficient batch processing
large_dataset = load_large_document_collection()  # Your data loading logic

# Process in batches
batch_size = 100
for i in range(0, len(large_dataset), batch_size):
    batch = large_dataset[i:i + batch_size]
    documents = [item["text"] for item in batch]
    metadata = [item["metadata"] for item in batch]
    memory.add_documents(
        documents=documents,
        metadata=metadata,
        batch_size=batch_size,
    )
```
### Real-time Updates

```python
# Dynamic knowledge base updates
def update_knowledge_base(new_documents, updated_documents, deleted_ids):
    """Update the knowledge base in real time."""
    # Add new documents
    if new_documents:
        memory.add_documents(new_documents)

    # Update existing documents
    for doc_id, content in updated_documents.items():
        memory.update_document(doc_id, content)

    # Remove outdated documents
    if deleted_ids:
        memory.delete_documents(ids=deleted_ids)

    print("Knowledge base updated in real-time")
```
## Monitoring and Analytics

### Built-in Metrics

```python
# Monitor index performance
stats = memory.get_index_stats()
print(f"Total vectors: {stats['total_vector_count']}")
print(f"Index fullness: {stats['index_fullness']}")

# Namespace statistics
for namespace, ns_stats in stats.get("namespaces", {}).items():
    print(f"Namespace '{namespace}': {ns_stats['vector_count']} vectors")
```
### Custom Monitoring

```python
import time
from datetime import datetime


class MonitoredPineconeMemory(PineconeMemory):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.query_metrics = []

    def search(self, *args, **kwargs):
        start_time = time.time()
        results = super().search(*args, **kwargs)
        duration = time.time() - start_time

        # Log metrics
        self.query_metrics.append({
            "timestamp": datetime.now(),
            "duration": duration,
            "results_count": len(results["documents"]),
        })
        return results

    def get_performance_stats(self):
        if not self.query_metrics:
            return {}
        durations = [m["duration"] for m in self.query_metrics]
        return {
            "avg_latency": sum(durations) / len(durations),
            "min_latency": min(durations),
            "max_latency": max(durations),
            "total_queries": len(self.query_metrics),
        }
```
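The subclass is a drop-in replacement for `PineconeMemory`, so it takes the same constructor arguments used earlier in this guide:

```python
import os

monitored = MonitoredPineconeMemory(
    api_key=os.getenv("PINECONE_API_KEY"),
    index_name="knowledge-base",
    embedding_model="text-embedding-3-small",
)
monitored.search("What is RAG?", top_k=3)
print(monitored.get_performance_stats())
```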
## Best Practices

- Index Design: Choose appropriate pod type based on performance requirements
- Metadata Strategy: Design rich metadata schema for effective filtering
- Namespace Organization: Use namespaces for logical data separation
- Batch Processing: Use batch operations for better throughput and cost efficiency
- Error Handling: Implement robust error handling with exponential backoff (see the retry sketch after this list)
- Monitoring: Set up comprehensive monitoring and alerting
- Cost Management: Monitor usage and optimize pod configuration
- Security: Use API key rotation and access controls
- Regional Selection: Choose regions closest to your users
- Version Management: Track schema changes and implement migration strategies
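A minimal sketch of the exponential-backoff practice above. It is a generic retry wrapper, not a Pinecone feature; wrap any call that may hit transient errors or rate limits:

```python
import random
import time

def with_backoff(fn, max_retries: int = 5, base_delay: float = 0.5):
    """Retry a zero-argument callable with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as exc:
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            print(f"Retry {attempt + 1}/{max_retries} after error: {exc}")
            time.sleep(delay)

# Example: results = with_backoff(lambda: memory.search("query", top_k=5))
```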
## Troubleshooting

### Common Issues

- API Quota Exceeded: Monitor API usage and implement rate limiting. Consider upgrading your plan or optimizing query patterns. Use batch operations to reduce API calls.
- High Latency: Check the pod type and consider upgrading. Verify the regional configuration. Optimize query complexity and top_k values.
- Index Capacity Issues: Monitor index fullness metrics. Consider scaling up the pod type or adding shards. Implement data archival strategies.
- Connection Errors: Verify the API key and environment configuration. Check network connectivity and firewall settings. Implement retry logic with exponential backoff.
### Performance Tuning

```python
# Optimize query performance
def optimized_search(memory, query, top_k=3):
    """Optimized search with bounded top_k and error handling."""
    try:
        results = memory.search(
            query=query,
            top_k=min(top_k, 10),   # Limit top_k for performance
            include_metadata=True,
            include_values=False,   # Don't return vectors unless needed
        )
        return results
    except Exception as e:
        print(f"Search failed: {e}")
        # Fallback: return an empty result set rather than raising
        return {"documents": [], "metadata": [], "scores": [], "ids": []}
```
This comprehensive guide provides everything needed to integrate Pinecone with Swarms agents for production-scale RAG applications using the unified LiteLLM embeddings approach.