Merge c08d329fb8
into a762c79d6e
commit
1dddc49513
@ -0,0 +1,243 @@
|
||||
# ChromaDB RAG Integration with Swarms
|
||||
|
||||
## Overview
|
||||
|
||||
ChromaDB is an open-source embedding database designed to make it easy to build AI applications with embeddings. It provides a simple, fast, and scalable solution for storing and retrieving vector embeddings. ChromaDB is particularly well-suited for RAG (Retrieval-Augmented Generation) applications where you need to store document embeddings and perform similarity searches to enhance AI agent responses with relevant context.
|
||||
|
||||
## Key Features
|
||||
|
||||
- **Simple API**: Easy-to-use Python API for storing and querying embeddings
|
||||
- **Multiple Storage Backends**: Supports in-memory, persistent local storage, and client-server modes
|
||||
- **Metadata Filtering**: Advanced filtering capabilities with metadata
|
||||
- **Multiple Distance Metrics**: Cosine, L2, and IP distance functions
|
||||
- **Built-in Embedding Functions**: Support for various embedding models
|
||||
- **Collection Management**: Organize embeddings into logical collections
|
||||
- **Auto-embedding**: Automatic text embedding generation
|
||||
|
||||
## Architecture
|
||||
|
||||
ChromaDB integrates with Swarms agents by serving as the long-term memory backend. The architecture follows this pattern:
|
||||
|
||||
```
|
||||
[Agent] -> [ChromaDB Memory] -> [Vector Store] -> [Similarity Search] -> [Retrieved Context]
|
||||
```
|
||||
|
||||
The agent queries ChromaDB when it needs relevant context, and ChromaDB returns the most similar documents based on vector similarity.
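
The retrieval step in this pipeline can be sketched without any external service. The following is a minimal, self-contained illustration, not ChromaDB's or Swarms' actual implementation: `ToyVectorStore` and the bag-of-words `embed` function are hypothetical stand-ins for the vector store and for a real embedding model.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory store mirroring the add/query flow of a RAG memory."""

    def __init__(self) -> None:
        self._docs: list[tuple[str, Counter]] = []

    def add(self, doc: str) -> None:
        # Store the document together with its embedding
        self._docs.append((doc, embed(doc)))

    def query(self, text: str, n_results: int = 3) -> list[str]:
        # Rank stored documents by similarity to the query embedding
        q = embed(text)
        ranked = sorted(self._docs, key=lambda d: cosine(q, d[1]), reverse=True)
        return [doc for doc, _ in ranked[:n_results]]


store = ToyVectorStore()
store.add("ChromaDB stores vector embeddings")
store.add("Bananas are rich in potassium")

# The retrieved context is what the agent would prepend to its prompt
context = store.query("which database stores embeddings", n_results=1)
print(context)
```

A real deployment replaces `embed` with a learned embedding model and the linear scan with an approximate nearest-neighbor index, but the add, embed, rank, return sequence stays the same.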
|
||||
|
||||
## Setup & Configuration
|
||||
|
||||
### Installation
|
||||
|
||||
```bash
pip install chromadb
pip install swarms
pip install swarms-memory  # provides the `swarms_memory` import used in the example below
pip install litellm
```
|
||||
|
||||
### Environment Variables
|
||||
|
||||
```bash
|
||||
# Optional: For remote ChromaDB server
|
||||
export CHROMA_HOST="localhost"
|
||||
export CHROMA_PORT="8000"
|
||||
|
||||
# OpenAI API key for LLM (if using OpenAI models)
|
||||
export OPENAI_API_KEY="your-openai-api-key"
|
||||
```
|
||||
|
||||
### Dependencies
|
||||
|
||||
- `chromadb>=0.4.0`
|
||||
- `swarms`
|
||||
- `litellm`
|
||||
- `numpy`
|
||||
|
||||
## Code Example
|
||||
|
||||
```python
"""
Agent with ChromaDB RAG (Retrieval-Augmented Generation)

This example demonstrates using ChromaDB as a vector database for RAG operations,
allowing agents to store and retrieve documents for enhanced context.
"""

from swarms import Agent
from swarms_memory import ChromaDB

# Initialize ChromaDB wrapper for RAG operations
rag_db = ChromaDB(
    metric="cosine",  # Distance metric for similarity search
    output_dir="knowledge_base_new",  # Collection name
    limit_tokens=1000,  # Token limit for queries
    n_results=3,  # Number of results to retrieve
    verbose=False,
)

# Add documents to the knowledge base
documents = [
    "ChromaDB is an open-source embedding database designed to store and query vector embeddings efficiently.",
    "ChromaDB provides a simple Python API for adding, querying, and managing vector embeddings with metadata.",
    "ChromaDB supports multiple embedding functions including OpenAI, Sentence Transformers, and custom models.",
    "ChromaDB can run locally or in distributed mode, making it suitable for both development and production.",
    "ChromaDB offers filtering capabilities allowing queries based on both vector similarity and metadata conditions.",
    "ChromaDB provides persistent storage and can handle large-scale embedding collections with fast retrieval.",
    "Kye Gomez is the founder of Swarms.",
]

# Add documents individually
for doc in documents:
    rag_db.add(doc)

# Create agent with RAG capabilities
agent = Agent(
    agent_name="RAG-Agent",
    agent_description="Swarms Agent with ChromaDB-powered RAG for enhanced knowledge retrieval",
    model_name="gpt-4o",
    max_loops=1,
    dynamic_temperature_enabled=True,
    long_term_memory=rag_db,
)

# Query with RAG
response = agent.run("What is ChromaDB and who is the founder of Swarms?")
print(response)
```
|
||||
|
||||
## Use Cases
|
||||
|
||||
### 1. Knowledge Base RAG
|
||||
- **Scenario**: Building a knowledge base for customer support
|
||||
- **Benefits**: Fast semantic search, automatic embedding generation
|
||||
- **Best For**: Small to medium-sized document collections
|
||||
|
||||
### 2. Development Documentation
|
||||
- **Scenario**: Creating searchable documentation for development teams
|
||||
- **Benefits**: Easy setup, local persistence, version control friendly
|
||||
- **Best For**: Technical documentation, API references
|
||||
|
||||
### 3. Content Recommendations
|
||||
- **Scenario**: Recommending relevant content based on user queries
|
||||
- **Benefits**: Metadata filtering, multiple collections support
|
||||
- **Best For**: Content management systems, educational platforms
|
||||
|
||||
### 4. Research Assistant
|
||||
- **Scenario**: Building AI research assistants with paper databases
|
||||
- **Benefits**: Complex metadata queries, collection organization
|
||||
- **Best For**: Academic research, scientific literature review
|
||||
|
||||
## Performance Characteristics
|
||||
|
||||
### Scaling
|
||||
- **Small Scale** (< 1M vectors): Excellent performance with in-memory storage
|
||||
- **Medium Scale** (1M - 10M vectors): Good performance with persistent storage
|
||||
- **Large Scale** (> 10M vectors): Consider distributed deployment or sharding
|
||||
|
||||
### Speed
|
||||
- **Query Latency**: < 100ms for most queries
|
||||
- **Insertion Speed**: ~1000 documents/second
|
||||
- **Memory Usage**: Efficient with configurable caching
|
||||
|
||||
### Optimization Tips
|
||||
1. **Batch Operations**: Use batch insert for better performance
|
||||
2. **Metadata Indexing**: Design metadata schema for efficient filtering
|
||||
3. **Collection Partitioning**: Use multiple collections for better organization
|
||||
4. **Embedding Caching**: Cache embeddings for frequently accessed documents
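
The batching tip can be sketched as a small helper. This is an illustrative sketch only: `batched` is a hypothetical helper name, and the commented-out `collection.add(...)` line merely marks where a batched insert into a ChromaDB collection would go.

```python
from typing import Iterator, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


documents = [f"document {i}" for i in range(7)]
batches = list(batched(documents, batch_size=3))

for batch in batches:
    # One call per batch instead of one call per document, for example:
    # collection.add(documents=batch, ids=[...])
    print(len(batch), batch[0])
```

The right batch size depends on document length and on the embedding provider's request limits, so treat the value `3` as a placeholder.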
|
||||
|
||||
## Cloud vs Local Deployment
|
||||
|
||||
### Local Deployment
|
||||
```python
import chromadb

# In-memory (fastest, no persistence)
client = chromadb.Client()

# Persistent local (recommended for development)
client = chromadb.PersistentClient(path="./chroma_db")
```
|
||||
|
||||
**Advantages:**
|
||||
- Fast development iteration
|
||||
- No network latency
|
||||
- Full control over data
|
||||
- Cost-effective for small applications
|
||||
|
||||
**Disadvantages:**
|
||||
- Limited scalability
|
||||
- Single point of failure
|
||||
- Manual backup required
|
||||
|
||||
### Cloud/Server Deployment
|
||||
```python
import chromadb

# Remote ChromaDB server
client = chromadb.HttpClient(host="your-server.com", port=8000)
```
|
||||
|
||||
**Advantages:**
|
||||
- Scalable architecture
|
||||
- Centralized data management
|
||||
- Professional backup solutions
|
||||
- Multi-user access
|
||||
|
||||
**Disadvantages:**
|
||||
- Network latency
|
||||
- Additional infrastructure costs
|
||||
- More complex deployment
|
||||
|
||||
## Configuration Options
|
||||
|
||||
### Distance Metrics
|
||||
- **Cosine**: Best for normalized embeddings (the metric used in the code example above; ChromaDB's own default for `hnsw:space` is `l2`)
|
||||
- **L2**: Euclidean distance for absolute similarity
|
||||
- **IP**: Inner product for specific use cases
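
To make the three metrics concrete, here is a small pure-Python sketch (the function names are illustrative and not part of ChromaDB). It also shows why the choice matters less once embeddings are normalized: for unit vectors the inner product equals the cosine similarity, and the squared L2 distance equals `2 - 2 * cosine`, so all three produce the same ranking.

```python
import math


def dot(a, b):
    """Inner product (IP) of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between the vectors; ignores their lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


a, b = [3.0, 4.0], [4.0, 3.0]
cos_ab = cosine_similarity(a, b)

# After normalization, IP and cosine coincide and L2 is a function of cosine
ua, ub = normalize(a), normalize(b)
ip_unit = dot(ua, ub)
l2_unit_squared = l2_distance(ua, ub) ** 2

print(cos_ab, ip_unit, l2_unit_squared)
```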
|
||||
|
||||
### Collection Settings
|
||||
```python
import chromadb

client = chromadb.Client()

# HNSW settings are passed as collection metadata.
# Key names follow ChromaDB 0.4.x/0.5.x; check the docs for your installed version.
collection = client.create_collection(
    name="my_collection",
    metadata={
        "hnsw:space": "cosine",       # Distance metric
        "hnsw:M": 16,                 # HNSW graph connectivity
        "hnsw:construction_ef": 200,  # Build-time accuracy
        "hnsw:search_ef": 100,        # Query-time accuracy
    },
)
```
|
||||
|
||||
### Memory Management
|
||||
```python
import chromadb
from chromadb.config import Settings

# Configure client behavior; `path` already sets the storage directory,
# so no separate persist_directory setting is needed
client = chromadb.PersistentClient(
    path="./chroma_db",
    settings=Settings(
        anonymized_telemetry=False,  # Disable usage telemetry
        allow_reset=True,            # Permit client.reset(), which wipes all data
    ),
)
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Collection Naming**: Use descriptive, consistent naming conventions
|
||||
2. **Metadata Design**: Plan metadata schema for efficient filtering
|
||||
3. **Batch Processing**: Use batch operations for better performance
|
||||
4. **Error Handling**: Implement proper error handling and retry logic
|
||||
5. **Monitoring**: Monitor collection sizes and query performance
|
||||
6. **Backup Strategy**: Regular backups for persistent storage
|
||||
7. **Version Management**: Track schema changes and migrations
|
||||
8. **Security**: Implement proper authentication for production deployments
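
The error-handling item can be illustrated with a retry wrapper that uses exponential backoff. This is a minimal sketch under stated assumptions: `with_retries` and `flaky_query` are hypothetical names, and the broad `except Exception` should be narrowed to the transient errors your client actually raises.

```python
import time


def with_retries(operation, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call `operation()`; after a failure wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))


# Stand-in operation that fails twice before succeeding
calls = {"count": 0}


def flaky_query():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return ["result"]


# Record the delays instead of sleeping so the example runs instantly
delays = []
result = with_retries(flaky_query, sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the helper testable; in production the default `time.sleep` applies.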
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Common Issues
|
||||
|
||||
1. **Connection Errors**: Check ChromaDB server status. Verify network connectivity. Confirm correct host and port settings.
|
||||
|
||||
2. **Performance Issues**: Monitor collection size and query complexity. Consider collection partitioning. Optimize metadata queries.
|
||||
|
||||
3. **Memory Issues**: Adjust HNSW parameters. Use persistent storage instead of in-memory. Implement proper cleanup procedures.
|
||||
|
||||
4. **Embedding Errors**: Verify LiteLLM configuration. Check API keys and quotas. Handle rate limiting properly.
|
||||
|
||||
This comprehensive guide provides everything needed to integrate ChromaDB with Swarms agents for powerful RAG applications using the unified LiteLLM embeddings approach.
|
@ -0,0 +1,267 @@
|
||||
# FAISS RAG Integration with Swarms
|
||||
|
||||
## Overview
|
||||
|
||||
FAISS (Facebook AI Similarity Search) is a library developed by Meta for efficient similarity search and clustering of dense vectors. It provides highly optimized algorithms for large-scale vector operations and is particularly well-suited for production RAG applications requiring high performance and scalability. FAISS excels at handling millions to billions of vectors with sub-linear search times.
|
||||
|
||||
## Key Features
|
||||
|
||||
- **High Performance**: Optimized C++ implementation with Python bindings
|
||||
- **Multiple Index Types**: Support for various indexing algorithms (Flat, IVF, HNSW, PQ)
|
||||
- **GPU Acceleration**: Optional GPU support for extreme performance
|
||||
- **Memory Efficiency**: Compressed indexing options for large datasets
|
||||
- **Exact and Approximate Search**: Configurable trade-offs between speed and accuracy
|
||||
- **Batch Operations**: Efficient batch search and update operations
|
||||
- **Clustering Support**: Built-in K-means clustering algorithms
|
||||
|
||||
## Architecture
|
||||
|
||||
FAISS integrates with Swarms agents as a high-performance vector store for RAG operations:
|
||||
|
||||
```
|
||||
[Agent] -> [FAISS Memory] -> [Vector Index] -> [Similarity Search] -> [Retrieved Context]
|
||||
```
|
||||
|
||||
The system stores document embeddings in optimized FAISS indices and performs ultra-fast similarity searches to provide relevant context to agents.
|
||||
|
||||
## Setup & Configuration
|
||||
|
||||
### Installation
|
||||
|
||||
```bash
pip install faiss-cpu  # CPU version
# OR
pip install faiss-gpu  # GPU version (requires CUDA)
pip install swarms
pip install swarms-memory  # provides the `swarms_memory` import used in the example below
pip install litellm
pip install numpy
```
|
||||
|
||||
### Environment Variables
|
||||
|
||||
```bash
|
||||
# OpenAI API key for LLM (if using OpenAI models)
|
||||
export OPENAI_API_KEY="your-openai-api-key"
|
||||
|
||||
# Optional: Configure GPU usage
|
||||
export CUDA_VISIBLE_DEVICES="0"
|
||||
```
|
||||
|
||||
### Dependencies
|
||||
|
||||
- `faiss-cpu>=1.7.0` or `faiss-gpu>=1.7.0`
|
||||
- `swarms`
|
||||
- `litellm`
|
||||
- `numpy`
|
||||
- `pickle` (Python standard library, used for persistence; no installation needed)
|
||||
|
||||
## Code Example
|
||||
|
||||
```python
"""
Agent with FAISS RAG (Retrieval-Augmented Generation)

This example demonstrates using FAISS as a vector database for RAG operations,
allowing agents to store and retrieve documents for enhanced context.
"""

from swarms import Agent
from swarms_memory import FAISSDB

# Initialize FAISS wrapper for RAG operations
rag_db = FAISSDB(
    embedding_model="text-embedding-3-small",
    metric="cosine",
    index_file="knowledge_base.faiss",
)

# Add documents to the knowledge base
documents = [
    "FAISS is a library for efficient similarity search and clustering of dense vectors.",
    "RAG combines retrieval and generation for more accurate AI responses.",
    "Vector embeddings enable semantic search across documents.",
    "The swarms framework supports multiple memory backends including FAISS.",
    "Swarms is the first and most reliable multi-agent production-grade framework.",
    "Kye Gomez is Founder and CEO of Swarms.",
]

# Add documents individually
for doc in documents:
    rag_db.add(doc)

# Create agent with RAG capabilities
agent = Agent(
    agent_name="RAG-Agent",
    agent_description="Swarms Agent with FAISS-powered RAG for enhanced knowledge retrieval",
    model_name="gpt-4o",
    max_loops=1,
    dynamic_temperature_enabled=True,
    long_term_memory=rag_db,
)

# Query with RAG
response = agent.run("What is FAISS and how does it relate to RAG? Who is the founder of Swarms?")
print(response)
```
|
||||
|
||||
## Use Cases
|
||||
|
||||
### 1. Large-Scale Document Search
|
||||
- **Scenario**: Searching through millions of documents or papers
|
||||
- **Benefits**: Sub-linear search time, memory efficiency
|
||||
- **Best For**: Academic databases, legal document search, news archives
|
||||
|
||||
### 2. Real-time Recommendation Systems
|
||||
- **Scenario**: Product or content recommendations with low latency requirements
|
||||
- **Benefits**: Ultra-fast query response, batch processing support
|
||||
- **Best For**: E-commerce, streaming platforms, social media
|
||||
|
||||
### 3. High-Performance RAG Applications
|
||||
- **Scenario**: Production RAG systems requiring fast response times
|
||||
- **Benefits**: Optimized C++ implementation, GPU acceleration
|
||||
- **Best For**: Customer support bots, technical documentation systems
|
||||
|
||||
### 4. Scientific Research Tools
|
||||
- **Scenario**: Similarity search in scientific datasets or embeddings
|
||||
- **Benefits**: Clustering support, exact and approximate search options
|
||||
- **Best For**: Bioinformatics, materials science, computer vision research
|
||||
|
||||
## Performance Characteristics
|
||||
|
||||
### Index Types Performance Comparison
|
||||
|
||||
| Index Type | Search Speed | Memory Usage | Accuracy | Best Use Case |
|------------|--------------|--------------|----------|---------------|
| **Flat** | Fast | High | 100% | Small datasets (< 1M vectors) |
| **IVF** | Very Fast | Medium | 95-99% | Large datasets (1M-100M vectors) |
| **HNSW** | Ultra Fast | Medium-High | 95-98% | Real-time applications |
| **PQ** | Fast | Low | 90-95% | Memory-constrained environments |
|
||||
|
||||
### Scaling Characteristics
|
||||
- **Small Scale** (< 1M vectors): Use Flat index for exact search
|
||||
- **Medium Scale** (1M - 10M vectors): Use IVF with appropriate nlist
|
||||
- **Large Scale** (10M - 1B vectors): Use IVF with PQ compression
|
||||
- **Ultra Large Scale** (> 1B vectors): Use sharded indices across multiple machines
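
The scale tiers above can be turned into a small selection helper. This is a sketch, not an official FAISS recommendation: `suggest_index` is a hypothetical function, the thresholds are the ones listed in this section, and `nlist` uses the `4 * sqrt(num_vectors)` rule of thumb that appears later in this guide. The returned strings follow the FAISS `index_factory` description format.

```python
import math


def suggest_index(num_vectors: int, pq_subquantizers: int = 8) -> str:
    """Return a FAISS index_factory description for the given collection size."""
    if num_vectors < 1_000_000:
        return "Flat"  # Exact search is affordable at this scale
    nlist = int(4 * math.sqrt(num_vectors))  # Rule-of-thumb number of IVF cells
    if num_vectors <= 10_000_000:
        return f"IVF{nlist},Flat"
    # Larger collections: add PQ compression to reduce memory
    return f"IVF{nlist},PQ{pq_subquantizers}"


for n in (500_000, 4_000_000, 100_000_000):
    print(n, suggest_index(n))

# With FAISS installed, the description can be passed to the factory, e.g.:
# index = faiss.index_factory(dimension, suggest_index(n))
```

Collections beyond a billion vectors need sharding across machines, which a single factory string does not cover.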
|
||||
|
||||
### Performance Optimization
|
||||
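```python
import faiss
import numpy as np

dimension = 1536  # Embedding size (e.g. text-embedding-3-small)
cpu_index = faiss.IndexFlatIP(dimension)

# GPU acceleration (if available)
if faiss.get_num_gpus() > 0:
    gpu_index = faiss.index_cpu_to_gpu(faiss.StandardGpuResources(), 0, cpu_index)

# Batch search for better throughput: pass all queries as one (n, dimension) float32 array.
# This calls FAISS directly; a wrapper-level batch method is not assumed here.
queries = np.random.rand(4, dimension).astype("float32")  # Placeholder query embeddings
distances, ids = cpu_index.search(queries, 10)

# Memory mapping for very large indices (the file must already exist)
index = faiss.read_index("large_index.faiss", faiss.IO_FLAG_MMAP)
```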
|
||||
|
||||
## Cloud vs Local Deployment
|
||||
|
||||
### Local Deployment
|
||||
```python
|
||||
# Local FAISS with persistence
|
||||
memory = FAISSMemory(index_type="Flat")
|
||||
memory.save_index("./local_faiss_index")
|
||||
```
|
||||
|
||||
**Advantages:**
|
||||
- No network latency
|
||||
- Full control over hardware
|
||||
- Cost-effective for development
|
||||
- Easy debugging and profiling
|
||||
|
||||
**Disadvantages:**
|
||||
- Limited by single machine resources
|
||||
- Manual scaling required
|
||||
- No built-in redundancy
|
||||
|
||||
### Cloud Deployment
|
||||
```python
|
||||
# Cloud deployment with distributed storage
|
||||
# Use cloud storage for index persistence
|
||||
import boto3
|
||||
s3 = boto3.client('s3')
|
||||
|
||||
# Save to cloud storage
|
||||
memory.save_index("/tmp/faiss_index")
|
||||
s3.upload_file("/tmp/faiss_index.index", "bucket", "indices/faiss_index.index")
|
||||
```
|
||||
|
||||
**Advantages:**
|
||||
- Horizontal scaling with multiple instances
|
||||
- Managed infrastructure
|
||||
- Automatic backups and redundancy
|
||||
- Global distribution
|
||||
|
||||
**Disadvantages:**
|
||||
- Network latency for large indices
|
||||
- Higher operational costs
|
||||
- More complex deployment pipeline
|
||||
|
||||
## Advanced Configuration
|
||||
|
||||
### GPU Configuration
|
||||
```python
import faiss

dimension = 1536  # Embedding size (e.g. text-embedding-3-small)

# Check GPU availability
print(f"GPUs available: {faiss.get_num_gpus()}")

# GPU-accelerated index
if faiss.get_num_gpus() > 0:
    cpu_index = faiss.IndexFlatIP(dimension)
    gpu_resources = faiss.StandardGpuResources()
    gpu_index = faiss.index_cpu_to_gpu(gpu_resources, 0, cpu_index)
```
|
||||
|
||||
### Index Optimization
|
||||
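```python
import faiss
import numpy as np

dimension = 1536      # Embedding size
num_vectors = 10_000  # Expected number of stored vectors
# Placeholder training data; use a representative sample of real embeddings
training_vectors = np.random.rand(num_vectors, dimension).astype("float32")

# IVF index with optimized parameters
nlist = int(4 * np.sqrt(num_vectors))     # Rule of thumb
quantizer = faiss.IndexFlatL2(dimension)  # Coarse quantizer that assigns vectors to cells
index = faiss.IndexIVFFlat(quantizer, dimension, nlist)
index.train(training_vectors)
index.nprobe = min(nlist, 10)  # Cells scanned per query: higher = more accurate, slower

# HNSW index optimization
hnsw_index = faiss.IndexHNSWFlat(dimension, 32)  # M=32 connections
hnsw_index.hnsw.efConstruction = 200             # Build-time parameter
hnsw_index.hnsw.efSearch = 100                   # Query-time parameter
```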
|
||||
|
||||
### Memory Management
|
||||
```python
import faiss

dimension = 1536  # Embedding size; must be divisible by m
nlist = 1024      # Number of IVF cells

# Product Quantization for memory efficiency
m = 8      # Number of subquantizers
nbits = 8  # Bits per subquantizer
pq = faiss.IndexPQ(dimension, m, nbits)

# Composite index (IVF + PQ)
quantizer = faiss.IndexFlatL2(dimension)
index = faiss.IndexIVFPQ(quantizer, dimension, nlist, m, nbits)
```
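
To see why product quantization helps, it is useful to estimate the storage per vector. The sketch below is a rough back-of-the-envelope calculation with hypothetical helper names; it counts only the stored vector data and ignores codebooks, vector IDs, and IVF list overhead.

```python
def flat_bytes_per_vector(dimension: int) -> int:
    """Uncompressed float32 storage: 4 bytes per component."""
    return 4 * dimension


def pq_bytes_per_vector(m: int, nbits: int) -> int:
    """PQ code size: m subquantizers with nbits each, rounded up to whole bytes."""
    return (m * nbits + 7) // 8


dimension, m, nbits = 1536, 8, 8
num_vectors = 10_000_000

flat_total_gb = flat_bytes_per_vector(dimension) * num_vectors / 1e9
pq_total_gb = pq_bytes_per_vector(m, nbits) * num_vectors / 1e9

print(f"Flat: {flat_total_gb:.2f} GB, PQ codes: {pq_total_gb:.2f} GB")
```

The compression comes at the cost of accuracy, so the values of `m` and `nbits` should be validated against recall on your own data.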
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Index Selection**: Choose appropriate index type based on dataset size and latency requirements
|
||||
2. **Memory Management**: Use product quantization for large datasets with memory constraints
|
||||
3. **Batch Processing**: Process documents and queries in batches for better throughput
|
||||
4. **Normalization**: Normalize embeddings for cosine similarity using inner product indices
|
||||
5. **Training Data**: Use representative data for training IVF indices
|
||||
6. **Parameter Tuning**: Optimize nlist, nprobe, and other parameters for your specific use case
|
||||
7. **Monitoring**: Track index size, query latency, and memory usage in production
|
||||
8. **Persistence**: Regularly save indices and implement proper backup strategies
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Common Issues
|
||||
|
||||
1. **Memory Errors**: Reduce batch sizes or use product quantization. Consider using memory mapping for large indices. Monitor system memory usage.
|
||||
|
||||
2. **Slow Search Performance**: Check if IVF index is properly trained. Adjust nprobe parameter (higher = slower but more accurate). Consider using GPU acceleration.
|
||||
|
||||
3. **Low Search Accuracy**: Increase nprobe for IVF indices so that more cells are scanned per query. Adjust efSearch for HNSW indices. Verify embedding normalization.
|
||||
|
||||
4. **Index Loading Issues**: Check file permissions and disk space. Verify FAISS version compatibility. Ensure consistent data types (float32).
|
||||
|
||||
This comprehensive guide provides everything needed to integrate FAISS with Swarms agents for high-performance RAG applications using the unified LiteLLM embeddings approach.
|
@ -0,0 +1,286 @@
|
||||
# Milvus Cloud RAG Integration with Swarms
|
||||
|
||||
## Overview
|
||||
|
||||
Milvus Cloud (also known as Zilliz Cloud) is a fully managed cloud service for Milvus, an open-source vector database. It provides enterprise-grade vector database capabilities with automatic scaling, high availability, and security features. Milvus Cloud is designed for production-scale RAG applications that require robust performance, reliability, and minimal operational overhead.
|
||||
|
||||
## Key Features
|
||||
|
||||
- **Fully Managed Service**: No infrastructure management required
|
||||
- **Auto-scaling**: Automatic scaling based on workload demands
|
||||
- **High Availability**: Built-in redundancy and disaster recovery
|
||||
- **Multiple Index Types**: Support for various indexing algorithms (IVF, HNSW, etc.; ANNOY is only available on Milvus releases before 2.3)
|
||||
- **Rich Metadata Filtering**: Advanced filtering capabilities with complex expressions
|
||||
- **Multi-tenancy**: Secure isolation between different applications
|
||||
- **Global Distribution**: Available in multiple cloud regions worldwide
|
||||
- **Enterprise Security**: End-to-end encryption and compliance certifications
|
||||
|
||||
## Architecture
|
||||
|
||||
Milvus Cloud integrates with Swarms agents as a scalable, managed vector database solution:
|
||||
|
||||
```
|
||||
[Agent] -> [Milvus Cloud Memory] -> [Managed Vector DB] -> [Similarity Search] -> [Retrieved Context]
|
||||
```
|
||||
|
||||
The system leverages Milvus Cloud's distributed architecture to provide high-performance vector operations with enterprise-grade reliability.
|
||||
|
||||
## Setup & Configuration
|
||||
|
||||
### Installation
|
||||
|
||||
```bash
pip install pymilvus
pip install swarms
pip install swarms-memory  # provides the `swarms_memory` import used in the example below
pip install litellm
```
|
||||
|
||||
### Environment Variables
|
||||
|
||||
```bash
|
||||
# Milvus Cloud credentials
|
||||
export MILVUS_CLOUD_URI="https://your-cluster.api.milvuscloud.com"
|
||||
export MILVUS_CLOUD_TOKEN="your-api-token"
|
||||
|
||||
# Optional: Database name (default: "default")
|
||||
export MILVUS_DATABASE="your-database"
|
||||
|
||||
# OpenAI API key for LLM
|
||||
export OPENAI_API_KEY="your-openai-api-key"
|
||||
```
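
A small loader can validate these variables before any connection is attempted. This is a sketch under stated assumptions: `MilvusCloudConfig` and `load_config` are hypothetical names, and the variable names are the ones exported in the block above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MilvusCloudConfig:
    uri: str
    token: str
    database: str = "default"


def load_config(env: Optional[Mapping[str, str]] = None) -> MilvusCloudConfig:
    """Read connection settings from the environment and fail early if any are missing."""
    env = os.environ if env is None else env
    required = ("MILVUS_CLOUD_URI", "MILVUS_CLOUD_TOKEN")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return MilvusCloudConfig(
        uri=env["MILVUS_CLOUD_URI"],
        token=env["MILVUS_CLOUD_TOKEN"],
        database=env.get("MILVUS_DATABASE") or "default",
    )


# Example with an explicit mapping instead of the real process environment
config = load_config({
    "MILVUS_CLOUD_URI": "https://your-cluster.api.milvuscloud.com",
    "MILVUS_CLOUD_TOKEN": "your-api-token",
})
print(config.database)
```

Accepting a mapping as an argument keeps the loader easy to test; calling `load_config()` with no argument reads `os.environ`.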
|
||||
|
||||
### Dependencies
|
||||
|
||||
- `pymilvus>=2.3.0`
|
||||
- `swarms`
|
||||
- `litellm`
|
||||
- `numpy`
|
||||
|
||||
## Code Example
|
||||
|
||||
```python
"""
Agent with Milvus Cloud RAG (Retrieval-Augmented Generation)

This example demonstrates using Milvus Cloud (Zilliz) as a vector database for RAG operations,
allowing agents to store and retrieve documents from your cloud-hosted Milvus account.
"""

import os

from swarms import Agent
from swarms_memory import MilvusDB

# Get Milvus Cloud credentials (same names as in the Environment Variables section)
milvus_uri = os.getenv("MILVUS_CLOUD_URI")
milvus_token = os.getenv("MILVUS_CLOUD_TOKEN")

if not milvus_uri or not milvus_token:
    print("❌ Missing Milvus Cloud credentials!")
    print("Please set MILVUS_CLOUD_URI and MILVUS_CLOUD_TOKEN in your environment")
    raise SystemExit(1)

# Initialize Milvus Cloud wrapper for RAG operations
rag_db = MilvusDB(
    embedding_model="text-embedding-3-small",  # OpenAI embedding model
    collection_name="swarms_cloud_knowledge",  # Cloud collection name
    uri=milvus_uri,  # Your Zilliz Cloud URI
    token=milvus_token,  # Your Zilliz Cloud token
    metric="COSINE",  # Distance metric for similarity search
)

# Add documents to the knowledge base
documents = [
    "Milvus Cloud is a fully managed vector database service provided by Zilliz.",
    "RAG combines retrieval and generation for more accurate AI responses.",
    "Vector embeddings enable semantic search across documents.",
    "The swarms framework supports multiple memory backends including Milvus Cloud.",
    "Swarms is the first and most reliable multi-agent production-grade framework.",
    "Kye Gomez is Founder and CEO of Swarms.",
]

# Add documents individually
for doc in documents:
    rag_db.add(doc)

# Create agent with RAG capabilities
agent = Agent(
    agent_name="Cloud-RAG-Agent",
    agent_description="Swarms Agent with Milvus Cloud-powered RAG for scalable knowledge retrieval",
    model_name="gpt-4o",
    max_loops=1,
    dynamic_temperature_enabled=True,
    long_term_memory=rag_db,
)

# Query with RAG
response = agent.run("What is Milvus Cloud and how does it relate to RAG? Who is the founder of Swarms?")
print(response)
```
|
||||
|
||||
## Use Cases
|
||||
|
||||
### 1. Enterprise Knowledge Management
|
||||
- **Scenario**: Large-scale corporate knowledge bases with millions of documents
|
||||
- **Benefits**: Auto-scaling, high availability, enterprise security
|
||||
- **Best For**: Fortune 500 companies, global organizations
|
||||
|
||||
### 2. Production RAG Applications
|
||||
- **Scenario**: Customer-facing AI applications requiring 99.9% uptime
|
||||
- **Benefits**: Managed infrastructure, automatic scaling, disaster recovery
|
||||
- **Best For**: SaaS platforms, customer support systems
|
||||
|
||||
### 3. Multi-tenant Applications
|
||||
- **Scenario**: Serving multiple customers with isolated data
|
||||
- **Benefits**: Built-in multi-tenancy, secure data isolation
|
||||
- **Best For**: AI platform providers, B2B SaaS solutions
|
||||
|
||||
### 4. Global AI Applications
|
||||
- **Scenario**: Applications serving users worldwide
|
||||
- **Benefits**: Global distribution, edge optimization
|
||||
- **Best For**: International companies, global services
|
||||
|
||||
## Performance Characteristics
|
||||
|
||||
### Scaling
|
||||
- **Auto-scaling**: Automatic compute and storage scaling based on workload
|
||||
- **Horizontal Scaling**: Support for billions of vectors across multiple nodes
|
||||
- **Vertical Scaling**: On-demand resource allocation for compute-intensive tasks
|
||||
|
||||
### Performance Metrics
|
||||
- **Query Latency**: < 10ms for 95th percentile queries
|
||||
- **Throughput**: 10,000+ QPS depending on configuration
|
||||
- **Availability**: 99.9% uptime SLA
|
||||
- **Consistency**: Tunable consistency levels
|
||||
|
||||
### Index Types Performance
|
||||
|
||||
| Index Type | Use Case | Performance | Memory | Accuracy |
|------------|----------|-------------|--------|----------|
| **HNSW** | High-performance similarity search | Ultra-fast | Medium | Very High |
| **IVF_FLAT** | Large datasets with uncompressed vectors | Fast | High | High (approximate; improves with nprobe) |
| **IVF_SQ8** | Memory-efficient large datasets | Fast | Low | High |
| **ANNOY** | Read-heavy workloads | Very Fast | Low | High |
|
||||
|
||||
## Cloud vs Local Deployment
|
||||
|
||||
### Milvus Cloud Advantages
|
||||
- **Fully Managed**: Zero infrastructure management
|
||||
- **Enterprise Features**: Advanced security, compliance, monitoring
|
||||
- **Global Scale**: Multi-region deployment capabilities
|
||||
- **Cost Optimization**: Pay-per-use pricing model
|
||||
- **Professional Support**: 24/7 technical support
|
||||
|
||||
### Configuration Options
|
||||
```python
|
||||
# Production configuration with advanced features
|
||||
memory = MilvusCloudMemory(
|
||||
collection_name="production_knowledge_base",
|
||||
embedding_model="text-embedding-3-small",
|
||||
dimension=1536,
|
||||
index_type="HNSW", # Best for similarity search
|
||||
metric_type="COSINE"
|
||||
)
|
||||
|
||||
# Development configuration
|
||||
memory = MilvusCloudMemory(
|
||||
collection_name="dev_knowledge_base",
|
||||
embedding_model="text-embedding-3-small",
|
||||
dimension=1536,
|
||||
index_type="IVF_FLAT", # Balanced performance
|
||||
metric_type="L2"
|
||||
)
|
||||
```
|
||||
|
||||
## Advanced Features
|
||||
|
||||
### Rich Metadata Filtering
|
||||
```python
|
||||
# Complex filter expressions
|
||||
filter_expr = '''
|
||||
(metadata["category"] == "ai" and metadata["difficulty"] == "advanced")
|
||||
or (metadata["topic"] == "embeddings" and metadata["type"] == "concept")
|
||||
'''
|
||||
|
||||
results = memory.search(
|
||||
query="advanced AI concepts",
|
||||
limit=5,
|
||||
filter_expr=filter_expr
|
||||
)
|
||||
```
|
||||
|
||||
### Hybrid Search
|
||||
```python
|
||||
# Combine vector similarity with metadata filtering
|
||||
results = memory.search(
|
||||
query="machine learning algorithms",
|
||||
limit=10,
|
||||
filter_expr='metadata["category"] in ["ai", "ml"] and metadata["difficulty"] != "beginner"'
|
||||
)
|
||||
```
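
Filter strings like the ones above are easy to get wrong when written by hand. The helpers below sketch one way to compose them; they are hypothetical (not part of `pymilvus` or `swarms_memory`) and assume the `metadata["key"]` syntax used in the examples above. Values are quoted with `json.dumps`, while field names are inserted as-is and should come from trusted code.

```python
import json


def eq(field: str, value) -> str:
    """metadata["field"] == value"""
    return f'metadata["{field}"] == {json.dumps(value)}'


def ne(field: str, value) -> str:
    """metadata["field"] != value"""
    return f'metadata["{field}"] != {json.dumps(value)}'


def is_in(field: str, values) -> str:
    """metadata["field"] in [v1, v2, ...]"""
    return f'metadata["{field}"] in {json.dumps(list(values))}'


def all_of(*clauses: str) -> str:
    """Join clauses with `and`."""
    return " and ".join(clauses)


filter_expr = all_of(
    is_in("category", ["ai", "ml"]),
    ne("difficulty", "beginner"),
)
print(filter_expr)
```

The resulting string matches the hand-written expression in the hybrid search example and can be passed wherever that example passes `filter_expr`.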
|
||||
|
||||
### Collection Management
|
||||
```python
|
||||
# Create multiple collections for different domains
|
||||
medical_memory = MilvusCloudMemory(
|
||||
collection_name="medical_knowledge",
|
||||
embedding_model="text-embedding-3-small"
|
||||
)
|
||||
|
||||
legal_memory = MilvusCloudMemory(
|
||||
collection_name="legal_documents",
|
||||
embedding_model="text-embedding-3-small"
|
||||
)
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Index Selection**: Choose HNSW for similarity search, IVF for large datasets
|
||||
2. **Metadata Design**: Design rich metadata schema for effective filtering
|
||||
3. **Batch Operations**: Use batch operations for better throughput
|
||||
4. **Connection Pooling**: Implement connection pooling for production applications
|
||||
5. **Error Handling**: Implement robust error handling and retry logic
|
||||
6. **Monitoring**: Set up monitoring and alerting for performance metrics
|
||||
7. **Cost Optimization**: Monitor usage and optimize collection configurations
|
||||
8. **Security**: Follow security best practices for authentication and data access
|
||||
|
||||
## Monitoring and Observability
|
||||
|
||||
### Key Metrics to Monitor
|
||||
- Query latency percentiles (p50, p95, p99)
|
||||
- Query throughput (QPS)
|
||||
- Error rates and types
|
||||
- Collection size and growth
|
||||
- Resource utilization
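
Latency percentiles can be computed from recorded query durations with a few lines of Python. This sketch uses the nearest-rank definition; the sample latencies are made-up illustration values, not measurements of Milvus Cloud.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Illustrative latencies in milliseconds (two slow outliers)
latencies_ms = [12, 15, 11, 14, 90, 13, 16, 12, 250, 14]
summary = {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
print(summary)
```

With only ten samples the upper percentiles collapse onto the largest value, which is why tail-latency monitoring needs a large window of observations.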
|
||||
|
||||
### Alerting Setup
|
||||
```python
# Example monitoring integration
import logging
import time

logger = logging.getLogger("milvus_rag")


def monitored_search(memory, query, **kwargs):
    """Run a search, log its duration and result count, and re-raise any failure."""
    start_time = time.time()
    try:
        results = memory.search(query, **kwargs)
        duration = time.time() - start_time
        logger.info(f"Search completed in {duration:.3f}s, found {len(results['documents'])} results")
        return results
    except Exception as e:
        logger.error(f"Search failed: {e}")
        raise
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Common Issues
|
||||
|
||||
1. **Connection Errors**: Verify MILVUS_CLOUD_URI and MILVUS_CLOUD_TOKEN. Check network connectivity and firewall settings. Confirm cloud region accessibility.
|
||||
|
||||
2. **Performance Issues**: Monitor collection size and index type appropriateness. Check query complexity and filter expressions. Review auto-scaling configuration.
|
||||
|
||||
3. **Search Accuracy Issues**: Verify embedding model consistency. Check vector normalization if using cosine similarity. Review index parameters and search parameters.
|
||||
|
||||
4. **Quota and Billing Issues**: Monitor usage against plan limits. Review auto-scaling settings. Check billing alerts and notifications.
|
||||
|
||||
This comprehensive guide provides everything needed to integrate Milvus Cloud with Swarms agents for enterprise-scale RAG applications using the unified LiteLLM embeddings approach.
|
@ -0,0 +1,306 @@
|
||||
# Milvus Local/Lite RAG Integration with Swarms
|
||||
|
||||
## Overview
|
||||
|
||||
Milvus Lite is a lightweight, standalone version of Milvus that runs locally without requiring a full Milvus server deployment. It provides the core vector database functionality of Milvus in a simplified package that's perfect for development, testing, prototyping, and small-scale applications. Milvus Lite maintains compatibility with the full Milvus ecosystem while offering easier setup and deployment.
|
||||
|
||||
## Key Features
|
||||
|
||||
- **Zero Configuration**: No server setup or configuration required
|
||||
- **Lightweight**: Minimal resource footprint for local development
|
||||
- **Full Compatibility**: Same API as full Milvus for easy migration
|
||||
- **Embedded Database**: Runs as a library within your application
|
||||
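- **Index Configuration**: Accepts the standard Milvus index API, though Milvus Lite supports fewer index types than a full Milvus server (check the Milvus Lite documentation for your version)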
|
||||
- **Persistent Storage**: Local file-based storage for data persistence
|
||||
- **Python Native**: Installs with `pip` as part of `pymilvus`, with no separate server process
|
||||
- **Cross-platform**: Works on macOS and Linux; check the Milvus Lite documentation for current Windows support
|
||||
|
||||
## Architecture
|
||||
|
||||
Milvus Lite integrates with Swarms agents as an embedded vector database solution:
|
||||
|
||||
```
|
||||
[Agent] -> [Milvus Lite Memory] -> [Local Vector Store] -> [Similarity Search] -> [Retrieved Context]
|
||||
```
|
||||
|
||||
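The vector store runs entirely locally, providing fast vector operations without network overhead or an external database service. Embedding generation and LLM calls still go over the network unless you configure local models.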
|
||||
|
||||
## Setup & Configuration
|
||||
|
||||
### Installation
|
||||
|
||||
```bash
|
||||
pip install pymilvus[lite] # Install with Milvus Lite support
|
||||
pip install swarms
|
||||
pip install litellm
|
||||
```
|
||||
|
||||
### Environment Variables
|
||||
|
||||
```bash
|
||||
# Optional: Specify database path
|
||||
export MILVUS_LITE_DB_PATH="./milvus_lite.db"
|
||||
|
||||
# OpenAI API key for LLM
|
||||
export OPENAI_API_KEY="your-openai-api-key"
|
||||
```
|
||||
|
||||
### Dependencies
|
||||
|
||||
- `pymilvus>=2.4.2` (the first release line that bundles Milvus Lite)
|
||||
- `swarms`
|
||||
- `litellm`
|
||||
- `numpy`
|
||||
|
||||
## Code Example
|
||||
|
||||
```python
|
||||
"""
|
||||
Agent with Milvus RAG (Retrieval-Augmented Generation)
|
||||
|
||||
This example demonstrates using Milvus as a vector database for RAG operations,
|
||||
allowing agents to store and retrieve documents for enhanced context.
|
||||
"""
|
||||
|
||||
from swarms import Agent
|
||||
from swarms_memory import MilvusDB
|
||||
|
||||
|
||||
# Initialize Milvus wrapper for RAG operations
|
||||
rag_db = MilvusDB(
|
||||
embedding_model="text-embedding-3-small", # OpenAI embedding model
|
||||
collection_name="swarms_knowledge", # Collection name
|
||||
db_file="swarms_milvus.db", # Local Milvus Lite database
|
||||
metric="COSINE", # Distance metric for similarity search
|
||||
)
|
||||
|
||||
# Add documents to the knowledge base
|
||||
documents = [
|
||||
"Milvus is an open-source vector database built for scalable similarity search and AI applications.",
|
||||
"RAG combines retrieval and generation for more accurate AI responses.",
|
||||
"Vector embeddings enable semantic search across documents.",
|
||||
"The swarms framework supports multiple memory backends including Milvus.",
|
||||
"Swarms is the first and most reliable multi-agent production-grade framework.",
|
||||
"Kye Gomez is Founder and CEO of Swarms."
|
||||
]
|
||||
|
||||
# Add documents individually
|
||||
for doc in documents:
|
||||
    rag_db.add(doc)
|
||||
|
||||
# Create agent with RAG capabilities
|
||||
agent = Agent(
|
||||
agent_name="RAG-Agent",
|
||||
agent_description="Swarms Agent with Milvus-powered RAG for enhanced knowledge retrieval and semantic search",
|
||||
model_name="gpt-4o",
|
||||
max_loops=1,
|
||||
dynamic_temperature_enabled=True,
|
||||
long_term_memory=rag_db
|
||||
)
|
||||
|
||||
# Query with RAG
|
||||
response = agent.run("What is Milvus and how does it relate to RAG? Who is the founder of Swarms?")
|
||||
print(response)
|
||||
|
||||
```
|
||||
|
||||
## Use Cases
|
||||
|
||||
### 1. Local Development and Testing
|
||||
- **Scenario**: Developing RAG applications without external dependencies
|
||||
- **Benefits**: Zero setup, fast iteration, offline capability
|
||||
- **Best For**: Prototype development, unit testing, local demos
|
||||
|
||||
### 2. Edge AI Applications
|
||||
- **Scenario**: AI applications running on edge devices or offline environments
|
||||
- **Benefits**: No internet required, low latency, privacy-first
|
||||
- **Best For**: IoT devices, mobile apps, air-gapped systems
|
||||
|
||||
### 3. Desktop AI Applications
|
||||
- **Scenario**: Personal AI assistants or productivity tools
|
||||
- **Benefits**: Private data storage, instant startup, single-file deployment
|
||||
- **Best For**: Personal knowledge management, desktop utilities
|
||||
|
||||
### 4. Small-Scale Production
|
||||
- **Scenario**: Applications with limited data and users
|
||||
- **Benefits**: Simple deployment, low resource usage, cost-effective
|
||||
- **Best For**: MVPs, small businesses, specialized tools
|
||||
|
||||
## Performance Characteristics
|
||||
|
||||
### Resource Usage
|
||||
- **Memory**: Low baseline usage (~50MB), scales with data size
|
||||
- **Storage**: Dominated by the vectors themselves, so it grows with document count and embedding dimension and usually exceeds the size of the raw text
|
||||
- **CPU**: Optimized algorithms, good performance on consumer hardware
|
||||
- **Startup**: Fast initialization, typically < 1 second
|
||||
|
||||
### Scaling Limits
|
||||
- **Vectors**: Recommended limit ~1M vectors for optimal performance
|
||||
- **Memory**: Depends on available system RAM
|
||||
- **Query Speed**: Sub-second response for most queries
|
||||
- **Concurrent Access**: Single-process access (file locking)
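
To see why memory depends on available RAM, it helps to estimate the raw size of the vectors alone. The sketch below assumes float32 storage (4 bytes per value) and the 1536-dimension default output of `text-embedding-3-small`; index structures and metadata add overhead on top, and the helper name `raw_vector_bytes` is purely illustrative.

```python
def raw_vector_bytes(num_vectors, dim, bytes_per_value=4):
    """Estimate the raw size of the stored vectors, excluding index overhead."""
    return num_vectors * dim * bytes_per_value


# One million 1536-dimension float32 vectors.
size = raw_vector_bytes(1_000_000, 1536)
print(f"{size:,} bytes, about {size / 1e9:.1f} GB before index overhead")
```

At the recommended limit of roughly 1M vectors this is already several gigabytes, so smaller embedding dimensions or fewer documents may be needed on constrained machines.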
|
||||
|
||||
### Performance Optimization
|
||||
```python
# Note: MilvusLiteMemory is not defined in this guide; the Code Example above
# uses swarms_memory.MilvusDB, whose parameter names may differ.

# Optimize for small datasets
memory = MilvusLiteMemory(
    index_type="HNSW",
    metric_type="COSINE"
)

# Optimize for memory usage
memory = MilvusLiteMemory(
    index_type="IVF_FLAT",
    metric_type="L2"
)

# Batch operations for better performance
# (documents and metadata are prepared by the caller)
doc_ids = memory.add_documents(documents, metadata)
```
|
||||
|
||||
## Local vs Cloud Deployment
|
||||
|
||||
### Milvus Lite Advantages
|
||||
- **No External Dependencies**: Runs completely offline
|
||||
- **Privacy**: All data stays on local machine
|
||||
- **Cost**: No cloud service fees
|
||||
- **Simplicity**: Single file deployment
|
||||
- **Development**: Fast iteration and debugging
|
||||
|
||||
### Limitations Compared to Full Milvus
|
||||
- **Scalability**: Limited to single machine resources
|
||||
- **Concurrency**: No multi-client support
|
||||
- **Clustering**: No distributed deployment
|
||||
- **Enterprise Features**: Limited monitoring and management tools
|
||||
|
||||
### Migration Path
|
||||
```python
|
||||
# Development with Milvus Lite
|
||||
dev_memory = MilvusLiteMemory(
|
||||
db_path="./dev_database.db",
|
||||
collection_name="dev_collection"
|
||||
)
|
||||
|
||||
# Production with full Milvus (same API)
|
||||
# from pymilvus import Collection, connections
|
||||
# connections.connect(host="prod-server", port="19530")
|
||||
# prod_collection = Collection("prod_collection")
|
||||
```
|
||||
|
||||
## File Management and Persistence
|
||||
|
||||
### Database Files
|
||||
```python
|
||||
# Default location
|
||||
db_path = "./milvus_lite.db"
|
||||
|
||||
# Custom location with directory structure
|
||||
db_path = "./data/vector_db/knowledge_base.db"
|
||||
|
||||
# Multiple databases for different domains
|
||||
medical_memory = MilvusLiteMemory(db_path="./data/medical.db")
|
||||
legal_memory = MilvusLiteMemory(db_path="./data/legal.db")
|
||||
```
|
||||
|
||||
### Backup Strategies
|
||||
```python
from datetime import datetime

# Manual backup
backup_name = f"backup_{datetime.now().strftime('%Y%m%d_%H%M%S')}.db"
memory.backup_database(f"./backups/{backup_name}")


# Automated backup function
def create_scheduled_backup():
    """Write a timestamped backup and return its path."""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    backup_path = f"./backups/auto_backup_{timestamp}.db"
    memory.backup_database(backup_path)
    return backup_path
```
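
The snippets above rely on a `backup_database` method on the memory object. If your wrapper does not provide one, a plain file copy is a possible alternative. This is a minimal sketch, assuming the database is a single local file and that no process is writing to it during the copy; the helper name `backup_db_file` is hypothetical.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def backup_db_file(db_path, backup_dir="./backups"):
    """Copy the database file to a timestamped backup and return the new path."""
    source = Path(db_path)
    if not source.is_file():
        raise FileNotFoundError(f"database file not found: {source}")
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    target = target_dir / f"{source.stem}_{timestamp}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    return target


# Demonstration with a throwaway file standing in for a real database.
workdir = Path(tempfile.mkdtemp())
fake_db = workdir / "milvus_lite.db"
fake_db.write_bytes(b"placeholder contents")
print(backup_db_file(fake_db, workdir / "backups"))
```

Close the database client before copying so the file on disk is in a consistent state.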
|
||||
|
||||
### Data Migration
|
||||
```python
# Export data for migration
def export_collection_data(memory):
    """Export all data from collection for migration"""
    # This would involve querying all documents and their metadata
    # Implementation depends on specific migration needs
    pass


# Import data from backup
def import_from_backup(source_path, target_memory):
    """Import data from another Milvus Lite database"""
    # Implementation for data transfer between databases
    pass
```
|
||||
|
||||
## Development Workflow
|
||||
|
||||
### Testing Setup
|
||||
```python
import tempfile
import os


def create_test_memory():
    """Create temporary memory for testing"""
    temp_dir = tempfile.mkdtemp()
    test_db_path = os.path.join(temp_dir, "test.db")

    return MilvusLiteMemory(
        db_path=test_db_path,
        collection_name="test_collection"
    )


# Use in tests
def test_rag_functionality():
    memory = create_test_memory()
    # Add test documents and run tests
    memory.add_document("Test document", {"category": "test"})
    results = memory.search("test", limit=1)
    assert len(results["documents"]) == 1
```
|
||||
|
||||
### Debug Configuration
|
||||
```python
|
||||
# Enable debug logging
|
||||
import logging
|
||||
logging.basicConfig(level=logging.DEBUG)
|
||||
|
||||
# Create memory with debug info
|
||||
memory = MilvusLiteMemory(
|
||||
db_path="./debug.db",
|
||||
collection_name="debug_collection",
|
||||
index_type="HNSW" # Good for debugging
|
||||
)
|
||||
|
||||
# Monitor database growth
|
||||
print(f"Database size: {memory.get_database_size()} bytes")
|
||||
stats = memory.get_collection_stats()
|
||||
print(f"Document count: {stats['row_count']}")
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Database Location**: Store databases in a dedicated data directory
|
||||
2. **Backup Strategy**: Implement regular backups for important data
|
||||
3. **Resource Management**: Monitor database size and system resources
|
||||
4. **Error Handling**: Handle file I/O errors and database corruption
|
||||
5. **Testing**: Use temporary databases for unit tests
|
||||
6. **Version Control**: Don't commit database files to version control
|
||||
7. **Documentation**: Document schema and metadata conventions
|
||||
8. **Migration Planning**: Plan for eventual migration to full Milvus if needed
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Common Issues
|
||||
|
||||
1. **Database File Errors**: Check file permissions and disk space. Ensure directory exists before creating database. Handle concurrent access properly.
|
||||
|
||||
2. **Performance Issues**: Monitor database size relative to available memory. Consider index type optimization for dataset size. Batch operations for better throughput.
|
||||
|
||||
3. **Memory Usage**: Use appropriate index parameters for available RAM. Monitor system memory usage. Consider data compression techniques.
|
||||
|
||||
4. **Data Corruption**: Implement proper backup and recovery procedures. Handle application crashes gracefully. Use database validation tools.
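
The first item above (missing directories, low disk space) can be checked before the database is opened. The following sketch uses only the standard library; `prepare_db_location` and the 100 MB default threshold are illustrative choices, not part of Milvus Lite.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_db_location(db_path, min_free_bytes=100 * 1024 * 1024):
    """Create the parent directory for db_path and verify free disk space."""
    parent = Path(db_path).expanduser().resolve().parent
    parent.mkdir(parents=True, exist_ok=True)
    free = shutil.disk_usage(parent).free
    if free < min_free_bytes:
        raise OSError(f"only {free} bytes free in {parent}, need {min_free_bytes}")
    return parent


# Demonstration inside a temporary directory so nothing is left behind in the project.
base = Path(tempfile.mkdtemp())
print(prepare_db_location(base / "data" / "vector_db" / "knowledge_base.db", min_free_bytes=0))
```

Calling this once at startup turns a confusing database error into an explicit message about the filesystem.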
|
||||
|
||||
This comprehensive guide provides everything needed to integrate Milvus Lite with Swarms agents for local, lightweight RAG applications using the unified LiteLLM embeddings approach.
|
@ -0,0 +1,78 @@
|
||||
# RAG Vector Databases
|
||||
|
||||
## Overview
|
||||
|
||||
This section provides comprehensive guides for integrating various vector databases with Swarms agents for Retrieval-Augmented Generation (RAG) operations. Each guide demonstrates how to use unified LiteLLM embeddings with different vector database systems to create powerful, context-aware AI agents.
|
||||
|
||||
## Available Vector Database Integrations
|
||||
|
||||
### Cloud-Based Solutions
|
||||
|
||||
- **[Pinecone](pinecone.md)** - Serverless vector database with auto-scaling and high availability
|
||||
- **[Weaviate Cloud](weaviate-cloud.md)** - Multi-modal vector database with GraphQL API
|
||||
- **[Milvus Cloud](milvus-cloud.md)** - Enterprise-grade managed vector database service
|
||||
|
||||
### Self-Hosted Solutions
|
||||
|
||||
- **[Qdrant](qdrant.md)** - High-performance vector similarity search engine
|
||||
- **[ChromaDB](chromadb.md)** - Simple, fast vector database for AI applications
|
||||
- **[FAISS](faiss.md)** - Facebook's efficient similarity search library
|
||||
- **[Weaviate Local](weaviate-local.md)** - Self-hosted Weaviate with full control
|
||||
- **[Milvus Local](milvus-local.md)** - Local Milvus deployment for development
|
||||
|
||||
### Specialized Solutions
|
||||
|
||||
- **[SingleStore](singlestore.md)** - SQL + Vector hybrid database for complex queries
|
||||
- **[Zyphra RAG](zyphra-rag.md)** - Specialized RAG system with advanced features
|
||||
|
||||
## Key Features Across All Integrations
|
||||
|
||||
### Unified LiteLLM Embeddings
|
||||
All guides use the standardized LiteLLM approach with `text-embedding-3-small` for consistent embedding generation across different vector databases.
|
||||
|
||||
### Swarms Agent Integration
|
||||
Each integration demonstrates how to:
|
||||
- Initialize vector database connections
|
||||
- Add documents with rich metadata
|
||||
- Perform semantic search queries
|
||||
- Integrate with Swarms agents for RAG operations
|
||||
|
||||
### Common Capabilities
|
||||
- **Semantic Search**: Vector similarity matching for relevant document retrieval
|
||||
- **Metadata Filtering**: Advanced filtering based on document properties
|
||||
- **Batch Operations**: Efficient bulk document processing
|
||||
- **Real-time Updates**: Dynamic knowledge base management
|
||||
- **Scalability**: Solutions for different scale requirements
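
Semantic search in every integration listed here reduces to comparing a query vector with stored document vectors. The sketch below illustrates the idea with cosine similarity on tiny hand-written vectors; real embeddings come from the embedding model and have many more dimensions, and production databases use approximate indexes rather than this brute-force scan. The function names are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the k (doc_id, score) pairs most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:k]]


# Toy 3-dimensional "embeddings" chosen by hand for illustration.
docs = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 0.0, 1.0],
}
for doc_id, score in top_k([1.0, 0.0, 0.0], docs, k=2):
    print(doc_id, round(score, 3))
```

The documents returned by such a search are what the agent receives as retrieved context.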
|
||||
|
||||
## Choosing the Right Vector Database
|
||||
|
||||
### For Development & Prototyping
|
||||
- **ChromaDB**: Simple setup, good for experimentation
|
||||
- **FAISS**: High performance, good for research
|
||||
- **Milvus Local**: Feature-rich local development
|
||||
|
||||
### For Production Cloud Deployments
|
||||
- **Pinecone**: Serverless, auto-scaling, managed
|
||||
- **Weaviate Cloud**: Multi-modal, GraphQL API
|
||||
- **Milvus Cloud**: Enterprise features, high availability
|
||||
|
||||
### For Self-Hosted Production
|
||||
- **Qdrant**: High performance, clustering support
|
||||
- **Weaviate Local**: Full control, custom configurations
|
||||
- **SingleStore**: SQL + Vector hybrid capabilities
|
||||
|
||||
### For Specialized Use Cases
|
||||
- **SingleStore**: When you need both SQL and vector operations
|
||||
- **Zyphra RAG**: For advanced RAG-specific features
|
||||
- **FAISS**: When maximum search performance is critical
|
||||
|
||||
## Getting Started
|
||||
|
||||
1. Choose a vector database based on your requirements
|
||||
2. Follow the specific integration guide
|
||||
3. Install required dependencies
|
||||
4. Configure embeddings with LiteLLM
|
||||
5. Initialize your Swarms agent with the vector database memory
|
||||
6. Add your documents and start querying
|
||||
|
||||
Each guide provides complete code examples, setup instructions, and best practices for production deployment.
|
@ -0,0 +1,376 @@
|
||||
# Pinecone RAG Integration with Swarms
|
||||
|
||||
## Overview
|
||||
|
||||
Pinecone is a fully managed vector database service designed specifically for high-performance AI applications. It provides a serverless, auto-scaling platform for vector similarity search that's optimized for production workloads. Pinecone offers enterprise-grade features including global distribution, real-time updates, metadata filtering, and comprehensive monitoring, making it ideal for production RAG systems that require reliability and scale.
|
||||
|
||||
## Key Features
|
||||
|
||||
- **Serverless Architecture**: Automatic scaling with pay-per-use pricing
|
||||
- **Real-time Updates**: Live index updates without rebuilding
|
||||
- **Global Distribution**: Multi-region deployment with low latency
|
||||
- **Advanced Filtering**: Rich metadata filtering with complex queries
|
||||
- **High Availability**: 99.9% uptime SLA with built-in redundancy
|
||||
- **Performance Optimization**: Low-latency queries, typically in the millisecond range
|
||||
- **Enterprise Security**: SOC 2 compliance with end-to-end encryption
|
||||
- **Monitoring & Analytics**: Built-in observability and performance insights
|
||||
|
||||
## Architecture
|
||||
|
||||
Pinecone integrates with Swarms agents as a cloud-native vector database service:
|
||||
|
||||
```
|
||||
[Agent] -> [Pinecone Memory] -> [Serverless Vector DB] -> [Global Search] -> [Retrieved Context]
|
||||
```
|
||||
|
||||
The system leverages Pinecone's distributed infrastructure to provide consistent, high-performance vector operations across global regions.
|
||||
|
||||
## Setup & Configuration
|
||||
|
||||
### Installation
|
||||
|
||||
```bash
|
||||
pip install pinecone-client
|
||||
pip install swarms
|
||||
pip install litellm
|
||||
```
|
||||
|
||||
### Environment Variables
|
||||
|
||||
```bash
|
||||
# Pinecone credentials
|
||||
export PINECONE_API_KEY="your-pinecone-api-key"
|
||||
export PINECONE_ENVIRONMENT="your-environment" # e.g., "us-east1-gcp"
|
||||
|
||||
# Optional: Index configuration
|
||||
export PINECONE_INDEX_NAME="swarms-knowledge-base"
|
||||
|
||||
# OpenAI API key for LLM
|
||||
export OPENAI_API_KEY="your-openai-api-key"
|
||||
```
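
Because a missing credential otherwise surfaces only as a failed API call, it can help to validate the variables above at startup. This is a minimal sketch; `load_required_env` is a hypothetical helper, and the `environ` parameter exists only so the function can be exercised without touching the real environment.

```python
import os


def load_required_env(names, environ=None):
    """Return the named variables as a dict, or raise if any is unset or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}


# Demonstration with an explicit mapping instead of the real environment.
fake_env = {"PINECONE_API_KEY": "pc-example", "OPENAI_API_KEY": "sk-example"}
config = load_required_env(["PINECONE_API_KEY", "OPENAI_API_KEY"], environ=fake_env)
print(sorted(config))
```

In an application, call `load_required_env(["PINECONE_API_KEY", "OPENAI_API_KEY"])` without the `environ` argument before constructing the memory object.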
|
||||
|
||||
### Dependencies
|
||||
|
||||
- `pinecone-client>=2.2.0`
|
||||
- `swarms`
|
||||
- `litellm`
|
||||
- `numpy`
|
||||
|
||||
## Code Example
|
||||
|
||||
```python
|
||||
"""
|
||||
Agent with Pinecone RAG (Retrieval-Augmented Generation)
|
||||
|
||||
This example demonstrates using Pinecone as a vector database for RAG operations,
|
||||
allowing agents to store and retrieve documents for enhanced context.
|
||||
"""
|
||||
|
||||
import os
|
||||
import time
|
||||
from swarms import Agent
|
||||
from swarms_memory import PineconeMemory
|
||||
|
||||
# Initialize Pinecone wrapper for RAG operations
|
||||
rag_db = PineconeMemory(
|
||||
api_key=os.getenv("PINECONE_API_KEY", "your-pinecone-api-key"),
|
||||
index_name="knowledge-base",
|
||||
embedding_model="text-embedding-3-small",
|
||||
namespace="examples"
|
||||
)
|
||||
|
||||
# Add documents to the knowledge base
|
||||
documents = [
|
||||
"Pinecone is a vector database that makes it easy to add semantic search to applications.",
|
||||
"RAG combines retrieval and generation for more accurate AI responses.",
|
||||
"Vector embeddings enable semantic search across documents.",
|
||||
"The swarms framework supports multiple memory backends including Pinecone.",
|
||||
"Swarms is the first and most reliable multi-agent production-grade framework.",
|
||||
"Kye Gomez is Founder and CEO of Swarms."
|
||||
]
|
||||
|
||||
# Add documents individually
|
||||
for doc in documents:
|
||||
    rag_db.add(doc)
|
||||
|
||||
# Wait for Pinecone's eventual consistency to ensure documents are indexed
|
||||
print("Waiting for documents to be indexed...")
|
||||
time.sleep(2)
|
||||
|
||||
# Create agent with RAG capabilities
|
||||
agent = Agent(
|
||||
agent_name="RAG-Agent",
|
||||
agent_description="Swarms Agent with Pinecone-powered RAG for enhanced knowledge retrieval",
|
||||
model_name="gpt-4o",
|
||||
max_loops=1,
|
||||
dynamic_temperature_enabled=True,
|
||||
long_term_memory=rag_db
|
||||
)
|
||||
|
||||
# Query with RAG
|
||||
response = agent.run("What is Pinecone and how does it relate to RAG? Who is the founder of Swarms?")
|
||||
print(response)
|
||||
```
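
The fixed `time.sleep(2)` in the example above may be too short under load or longer than necessary otherwise. One alternative is to poll until a readiness check passes. The sketch below is generic: `wait_until` is a hypothetical helper, and the `check` callable is supplied by the caller (for instance, a function that compares the index's reported vector count with the number of documents added, assuming your wrapper exposes such a count).

```python
import time


def wait_until(check, timeout_s=30.0, interval_s=0.5):
    """Poll check() until it returns True or timeout_s elapses; return the outcome."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Demonstration: a stand-in check that succeeds on the third poll.
state = {"polls": 0}


def documents_indexed():
    state["polls"] += 1
    return state["polls"] >= 3


print(wait_until(documents_indexed, timeout_s=5.0, interval_s=0.01))
```

Returning `False` on timeout lets the caller decide whether to proceed with possibly incomplete results or to abort.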
|
||||
|
||||
## Use Cases
|
||||
|
||||
### 1. Production AI Applications
|
||||
- **Scenario**: Customer-facing AI products requiring 99.9% uptime
|
||||
- **Benefits**: Serverless scaling, global distribution, enterprise SLA
|
||||
- **Best For**: SaaS platforms, mobile apps, web services
|
||||
|
||||
### 2. Real-time Recommendation Systems
|
||||
- **Scenario**: E-commerce, content, or product recommendations
|
||||
- **Benefits**: Low-latency queries, real-time updates, global edge
|
||||
- **Best For**: E-commerce platforms, streaming services, social media
|
||||
|
||||
### 3. Enterprise Knowledge Management
|
||||
- **Scenario**: Large-scale corporate knowledge bases with global teams
|
||||
- **Benefits**: Multi-region deployment, advanced security, comprehensive monitoring
|
||||
- **Best For**: Fortune 500 companies, consulting firms, research organizations
|
||||
|
||||
### 4. Multi-tenant AI Platforms
|
||||
- **Scenario**: AI platform providers serving multiple customers
|
||||
- **Benefits**: Namespace isolation, flexible scaling, usage-based pricing
|
||||
- **Best For**: AI service providers, B2B platforms, managed AI solutions
|
||||
|
||||
## Performance Characteristics
|
||||
|
||||
### Scaling
|
||||
- **Serverless**: Automatic scaling based on traffic patterns
|
||||
- **Global**: Multi-region deployment for worldwide low latency
|
||||
- **Elastic**: Pay-per-use pricing model with no minimum commitments
|
||||
- **High Availability**: 99.9% uptime SLA with built-in redundancy
|
||||
|
||||
### Performance Metrics
|
||||
- **Query Latency**: < 10ms median, < 100ms 99th percentile
|
||||
- **Throughput**: 10,000+ QPS per replica
|
||||
- **Global Latency**: < 50ms from major worldwide regions
|
||||
- **Update Latency**: Near-real-time updates; writes are eventually consistent, so newly added vectors may take a moment to become searchable
|
||||
|
||||
### Pod Types and Performance
|
||||
|
||||
| Pod Type | Use Case | Performance | Cost | Best For |
|----------|----------|-------------|------|----------|
| **p1.x1** | Development, small apps | Good | Low | Prototypes, testing |
| **p1.x2** | Medium applications | Better | Medium | Production apps |
| **p1.x4** | High-performance apps | Best | High | Enterprise, high-traffic |
| **p2.x1** | Cost-optimized large scale | Good | Medium | Large datasets, batch processing |
|
||||
|
||||
## Cloud Deployment
|
||||
|
||||
### Production Configuration
|
||||
```python
|
||||
# High-performance production setup
|
||||
memory = PineconeMemory(
|
||||
index_name="production-knowledge-base",
|
||||
embedding_model="text-embedding-3-small",
|
||||
pod_type="p1.x2", # Higher performance
|
||||
replicas=2, # High availability
|
||||
metric="cosine"
|
||||
)
|
||||
```
|
||||
|
||||
### Multi-region Setup
|
||||
```python
# Configure for global deployment
import os

# The target region comes from PINECONE_ENVIRONMENT (see Environment Variables)
environment = os.getenv("PINECONE_ENVIRONMENT")
print("Configured environment:", environment)

# Choose optimal region based on user base
memory = PineconeMemory(
    index_name="global-knowledge-base",
    embedding_model="text-embedding-3-small",
    pod_type="p1.x2",
    # Environment set via PINECONE_ENVIRONMENT
)
```
|
||||
|
||||
### Cost Optimization
|
||||
```python
|
||||
# Cost-optimized configuration
|
||||
memory = PineconeMemory(
|
||||
index_name="cost-optimized-kb",
|
||||
embedding_model="text-embedding-3-small",
|
||||
pod_type="p2.x1", # Cost-optimized for large datasets
|
||||
replicas=1, # Single replica for cost savings
|
||||
shards=1 # Single shard for simplicity
|
||||
)
|
||||
```
|
||||
|
||||
## Advanced Features
|
||||
|
||||
### Namespace Management
|
||||
```python
|
||||
# Organize data with namespaces
|
||||
medical_docs = ["Medical knowledge documents..."]
|
||||
legal_docs = ["Legal knowledge documents..."]
|
||||
|
||||
# Add to different namespaces
|
||||
memory.add_documents(medical_docs, namespace="medical")
|
||||
memory.add_documents(legal_docs, namespace="legal")
|
||||
|
||||
# Query specific namespace
|
||||
medical_results = memory.search("medical query", namespace="medical")
|
||||
legal_results = memory.search("legal query", namespace="legal")
|
||||
```
|
||||
|
||||
### Complex Filtering
|
||||
```python
|
||||
# Advanced metadata filtering
|
||||
complex_filter = {
|
||||
"$and": [
|
||||
{"category": {"$in": ["ai", "ml"]}},
|
||||
{"difficulty": {"$ne": "beginner"}},
|
||||
{"$or": [
|
||||
{"type": "concept"},
|
||||
{"type": "implementation"}
|
||||
]}
|
||||
]
|
||||
}
|
||||
|
||||
results = memory.search(
|
||||
"advanced AI concepts",
|
||||
filter_dict=complex_filter,
|
||||
top_k=5
|
||||
)
|
||||
```
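
To make the semantics of the filter above concrete, the following sketch evaluates the same kind of filter dictionary against a single metadata record in plain Python. It is an illustration only, not Pinecone's implementation: it handles just `$and`, `$or`, `$in`, `$ne`, `$eq`, and bare equality, and its treatment of missing fields is an assumption.

```python
def matches(metadata, flt):
    """Return True if the metadata dict satisfies a Mongo-style filter dict."""
    for key, cond in flt.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in cond):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in cond):
                return False
        elif isinstance(cond, dict):
            value = metadata.get(key)
            for op, operand in cond.items():
                if op == "$in":
                    ok = value in operand
                elif op == "$ne":
                    ok = value != operand
                elif op == "$eq":
                    ok = value == operand
                else:
                    raise ValueError(f"unsupported operator: {op}")
                if not ok:
                    return False
        elif metadata.get(key) != cond:
            # A bare value means equality, as in {"type": "concept"}.
            return False
    return True


example_filter = {
    "$and": [
        {"category": {"$in": ["ai", "ml"]}},
        {"difficulty": {"$ne": "beginner"}},
        {"$or": [{"type": "concept"}, {"type": "implementation"}]},
    ]
}
record = {"category": "ai", "difficulty": "advanced", "type": "concept"}
print(matches(record, example_filter))
```

Running candidate filters through a small evaluator like this against sample metadata is a cheap way to catch logic mistakes before querying the real index.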
|
||||
|
||||
### Batch Operations
|
||||
```python
# Efficient batch processing
large_dataset = load_large_document_collection()  # Your data loading logic

# Process in batches
batch_size = 100
for i in range(0, len(large_dataset), batch_size):
    batch = large_dataset[i:i + batch_size]
    documents = [item['text'] for item in batch]
    metadata = [item['metadata'] for item in batch]

    memory.add_documents(
        documents=documents,
        metadata=metadata,
        batch_size=batch_size
    )
```
|
||||
|
||||
### Real-time Updates
|
||||
```python
# Dynamic knowledge base updates
def update_knowledge_base(new_documents, updated_documents, deleted_ids):
    """Update knowledge base in real-time"""
    # Add new documents
    if new_documents:
        memory.add_documents(new_documents)

    # Update existing documents
    for doc_id, content in updated_documents.items():
        memory.update_document(doc_id, content)

    # Remove outdated documents
    if deleted_ids:
        memory.delete_documents(ids=deleted_ids)

    print("Knowledge base updated in real-time")
```
|
||||
|
||||
## Monitoring and Analytics
|
||||
|
||||
### Built-in Metrics
|
||||
```python
|
||||
# Monitor index performance
|
||||
stats = memory.get_index_stats()
|
||||
print(f"Total vectors: {stats['total_vector_count']}")
|
||||
print(f"Index fullness: {stats['index_fullness']}")
|
||||
|
||||
# Namespace statistics
|
||||
for namespace, ns_stats in stats.get('namespaces', {}).items():
|
||||
    print(f"Namespace '{namespace}': {ns_stats['vector_count']} vectors")
|
||||
```
|
||||
|
||||
### Custom Monitoring
|
||||
```python
import time
from datetime import datetime


class MonitoredPineconeMemory(PineconeMemory):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.query_metrics = []

    def search(self, *args, **kwargs):
        start_time = time.time()
        results = super().search(*args, **kwargs)
        duration = time.time() - start_time

        # Log metrics
        self.query_metrics.append({
            'timestamp': datetime.now(),
            'duration': duration,
            'results_count': len(results['documents'])
        })

        return results

    def get_performance_stats(self):
        if not self.query_metrics:
            return {}

        durations = [m['duration'] for m in self.query_metrics]
        return {
            'avg_latency': sum(durations) / len(durations),
            'min_latency': min(durations),
            'max_latency': max(durations),
            'total_queries': len(self.query_metrics)
        }
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Index Design**: Choose appropriate pod type based on performance requirements
|
||||
2. **Metadata Strategy**: Design rich metadata schema for effective filtering
|
||||
3. **Namespace Organization**: Use namespaces for logical data separation
|
||||
4. **Batch Processing**: Use batch operations for better throughput and cost efficiency
|
||||
5. **Error Handling**: Implement robust error handling with exponential backoff
|
||||
6. **Monitoring**: Set up comprehensive monitoring and alerting
|
||||
7. **Cost Management**: Monitor usage and optimize pod configuration
|
||||
8. **Security**: Use API key rotation and access controls
|
||||
9. **Regional Selection**: Choose regions closest to your users
|
||||
10. **Version Management**: Track schema changes and implement migration strategies
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Common Issues
|
||||
|
||||
1. **API Quota Exceeded**: Monitor API usage and implement rate limiting. Consider upgrading plan or optimizing query patterns. Use batch operations to reduce API calls.
|
||||
|
||||
2. **High Latency**: Check pod type and consider upgrading. Verify regional configuration. Optimize query complexity and top_k values.
|
||||
|
||||
3. **Index Capacity Issues**: Monitor index fullness metrics. Consider scaling up pod type or adding shards. Implement data archival strategies.
|
||||
|
||||
4. **Connection Errors**: Verify API key and environment configuration. Check network connectivity and firewall settings. Implement retry logic with exponential backoff.
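
Several of the items above recommend retry logic with exponential backoff. A minimal sketch of that pattern follows; `retry_with_backoff` is a hypothetical helper, the delay values are arbitrary defaults, and in real code `retry_on` should be narrowed to the transient exception types raised by your client rather than all exceptions.

```python
import random
import time


def retry_with_backoff(func, max_attempts=5, base_delay_s=0.5, max_delay_s=8.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call func(), retrying on failure with exponentially growing, jittered delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            delay = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            # Random jitter spreads out retries from many clients.
            sleep(delay + random.uniform(0, delay / 2))


# Demonstration: a stand-in operation that fails twice before succeeding.
attempts = {"count": 0}


def flaky_operation():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


print(retry_with_backoff(flaky_operation, base_delay_s=0.01))
```

A search or upsert call can be wrapped as `retry_with_backoff(lambda: memory.search(query))`, assuming `memory` and `query` are defined as in the examples above.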
|
||||
|
||||
### Performance Tuning
|
||||
```python
# Optimize query performance
def optimized_search(memory, query, top_k=3):
    """Search with a capped top_k and an empty-result fallback on errors"""
    try:
        results = memory.search(
            query=query,
            top_k=min(top_k, 10),  # Limit top_k for performance
            include_metadata=True,
            include_values=False  # Don't return vectors unless needed
        )
        return results
    except Exception as e:
        print(f"Search failed: {e}")
        # Fallback: return an empty result set with the expected keys
        return {"documents": [], "metadata": [], "scores": [], "ids": []}
```
|
||||
|
||||
This comprehensive guide provides everything needed to integrate Pinecone with Swarms agents for production-scale RAG applications using the unified LiteLLM embeddings approach.
|
@ -0,0 +1,164 @@
|
||||
# SingleStore RAG Integration with Swarms
|
||||
|
||||
## Overview
|
||||
|
||||
SingleStore is a distributed SQL database with native vector capabilities, combining the power of traditional relational operations with modern vector search functionality. It offers a unique approach to RAG by enabling complex queries that combine structured data, full-text search, and vector similarity in a single, high-performance system. SingleStore is ideal for applications requiring real-time analytics, complex data relationships, and high-throughput vector operations within a familiar SQL interface.
|
||||
|
||||
## Key Features
|
||||
|
||||
- **Unified SQL + Vector**: Combine relational queries with vector similarity search
|
||||
- **Real-time Analytics**: Millisecond query performance on streaming data
|
||||
- **Distributed Architecture**: Horizontal scaling across multiple nodes
|
||||
- **HTAP Capabilities**: Hybrid transactional and analytical processing
|
||||
- **Full-text Search**: Built-in text search with ranking and filtering
|
||||
- **JSON Support**: Native JSON operations and indexing
|
||||
- **High Throughput**: Handle millions of operations per second
|
||||
- **Standard SQL**: Familiar SQL interface with vector extensions
|
||||
|
||||
## Architecture
|
||||
|
||||
SingleStore integrates with Swarms agents as a unified data platform combining vectors with structured data:
|
||||
|
||||
```
[Agent] -> [SingleStore Memory] -> [SQL + Vector Engine] -> [Hybrid Results] -> [Enriched Context]
```
|
||||
|
||||
The system enables complex queries combining vector similarity with traditional SQL operations for comprehensive data retrieval.
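
To make the "SQL + vector" idea concrete, here is a minimal sketch of a hybrid query: a structured filter in the `WHERE` clause and a similarity ranking in the `ORDER BY`. `DOT_PRODUCT` and `JSON_ARRAY_PACK` are SingleStore SQL functions; the column names (`id`, `content`, `embedding`, `category`) and the helper `build_hybrid_query` are assumptions for illustration, and the table layout that `SingleStoreDB` from `swarms_memory` actually creates may differ.

```python
import json
from typing import List, Sequence, Tuple


def build_hybrid_query(
    query_embedding: Sequence[float],
    category: str,
    top_k: int = 5,
    table: str = "documents",
) -> Tuple[str, List[object]]:
    """Return a parameterized SQL string and its parameters.

    Hypothetical schema: documents(id, content, embedding, category).
    """
    # Table names cannot be bound as parameters, so validate before interpolating.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")

    sql = (
        "SELECT id, content, "
        "DOT_PRODUCT(embedding, JSON_ARRAY_PACK(%s)) AS score "
        f"FROM {table} "
        "WHERE category = %s "   # ordinary relational filter
        "ORDER BY score DESC "   # vector similarity ranking
        "LIMIT %s"
    )
    # The query vector is sent as a JSON array string and packed server-side.
    params = [json.dumps(list(query_embedding)), category, int(top_k)]
    return sql, params


sql, params = build_hybrid_query([0.1, 0.2, 0.3], category="finance", top_k=3)
print(sql)
print(params)
```

The pair `(sql, params)` can be passed to any DB-API cursor's `execute`. For unit-length embeddings, ranking by dot product gives the same order as ranking by cosine similarity.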
|
||||
|
||||
## Setup & Configuration
|
||||
|
||||
### Installation
|
||||
|
||||
```bash
pip install singlestoredb
pip install swarms
pip install litellm
```
|
||||
|
||||
### Environment Variables
|
||||
|
||||
```bash
# SingleStore connection
export SINGLESTORE_HOST="your-cluster.singlestore.com"
export SINGLESTORE_PORT="3306"
export SINGLESTORE_USER="your-username"
export SINGLESTORE_PASSWORD="your-password"
export SINGLESTORE_DATABASE="rag_database"

# Optional: SSL configuration
export SINGLESTORE_SSL_DISABLED="false"

# OpenAI API key (used for the LLM and for text-embedding-3-small embeddings)
export OPENAI_API_KEY="your-openai-api-key"
```
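
Missing connection settings tend to surface later as opaque connection errors. As a sketch, the variables above can be validated once at startup. The helper name `load_singlestore_config` and the shape of the returned dictionary are assumptions for illustration, not part of `swarms_memory` or `singlestoredb`.

```python
import os
from typing import Dict, Mapping, Optional

REQUIRED = (
    "SINGLESTORE_HOST",
    "SINGLESTORE_USER",
    "SINGLESTORE_PASSWORD",
    "SINGLESTORE_DATABASE",
)


def load_singlestore_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Collect connection settings, failing early if a required one is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "host": env["SINGLESTORE_HOST"],
        "port": int(env.get("SINGLESTORE_PORT", "3306")),  # default MySQL-protocol port
        "user": env["SINGLESTORE_USER"],
        "password": env["SINGLESTORE_PASSWORD"],
        "database": env["SINGLESTORE_DATABASE"],
        "ssl_disabled": env.get("SINGLESTORE_SSL_DISABLED", "false").lower() == "true",
    }


# Example with an explicit mapping instead of the real process environment.
example_env = {
    "SINGLESTORE_HOST": "your-cluster.singlestore.com",
    "SINGLESTORE_USER": "your-username",
    "SINGLESTORE_PASSWORD": "your-password",
    "SINGLESTORE_DATABASE": "rag_database",
}
config = load_singlestore_config(example_env)
print({key: value for key, value in config.items() if key != "password"})
```

Calling `load_singlestore_config()` with no argument reads from `os.environ`.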
|
||||
|
||||
### Dependencies
|
||||
|
||||
- `singlestoredb>=1.0.0`
|
||||
- `swarms`
|
||||
- `litellm`
|
||||
- `numpy`
|
||||
- `pandas` (for data manipulation)
|
||||
|
||||
## Code Example
|
||||
|
||||
```python
"""
Agent with SingleStore RAG (Retrieval-Augmented Generation)

This example demonstrates using SingleStore as a vector database for RAG operations,
allowing agents to store and retrieve documents for enhanced context.
"""

import os

from swarms import Agent
from swarms_memory import SingleStoreDB

# Initialize SingleStore wrapper for RAG operations
rag_db = SingleStoreDB(
    host=os.getenv("SINGLESTORE_HOST", "localhost"),
    port=int(os.getenv("SINGLESTORE_PORT", "3306")),
    user=os.getenv("SINGLESTORE_USER", "root"),
    password=os.getenv("SINGLESTORE_PASSWORD", "your-password"),
    # Default matches the SINGLESTORE_DATABASE value shown under Environment Variables
    database=os.getenv("SINGLESTORE_DATABASE", "rag_database"),
    table_name="documents",
    embedding_model="text-embedding-3-small",
)

# Add documents to the knowledge base
documents = [
    "SingleStore is a distributed SQL database designed for data-intensive applications.",
    "RAG combines retrieval and generation for more accurate AI responses.",
    "Vector embeddings enable semantic search across documents.",
    "The Swarms framework supports multiple memory backends including SingleStore.",
    "Swarms is the first and most reliable multi-agent production-grade framework.",
    "Kye Gomez is Founder and CEO of Swarms.",
]

# Add documents individually
for doc in documents:
    rag_db.add(doc)

# Create agent with RAG capabilities
agent = Agent(
    agent_name="RAG-Agent",
    agent_description="Swarms Agent with SingleStore-powered RAG for enhanced knowledge retrieval",
    model_name="gpt-4o",
    max_loops=1,
    dynamic_temperature_enabled=True,
    long_term_memory=rag_db,
)

# Query with RAG
response = agent.run("What is SingleStore and how does it relate to RAG? Who is the founder of Swarms?")
print(response)
```
|
||||
|
||||
## Use Cases
|
||||
|
||||
### 1. **Enterprise Data Platforms**
|
||||
- Combining operational data with knowledge bases
|
||||
- Real-time analytics with contextual information
|
||||
- Customer 360 views with vector similarity
|
||||
|
||||
### 2. **Financial Services**
|
||||
- Risk analysis with document similarity
|
||||
- Regulatory compliance with structured queries
|
||||
- Fraud detection combining patterns and text
|
||||
|
||||
### 3. **E-commerce Platforms**
|
||||
- Product recommendations with inventory data
|
||||
- Customer support with order history
|
||||
- Content personalization with user behavior
|
||||
|
||||
### 4. **Healthcare Systems**
|
||||
- Patient records with research literature
|
||||
- Drug discovery with clinical trial data
|
||||
- Medical imaging with diagnostic text
|
||||
|
||||
## Performance Characteristics
|
||||
|
||||
### Query Performance
|
||||
- **Vector Search**: < 10ms for millions of vectors
|
||||
- **Hybrid Queries**: < 50ms combining SQL + vectors
|
||||
- **Complex Joins**: Sub-second for structured + vector data
|
||||
- **Real-time Ingestion**: 100K+ inserts per second
|
||||
|
||||
### Scaling Capabilities
|
||||
- **Distributed**: Linear scaling across cluster nodes
|
||||
- **Memory**: In-memory processing for hot data
|
||||
- **Storage**: Tiered storage for cost optimization
|
||||
- **Concurrency**: Thousands of concurrent queries
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Schema Design**: Optimize table structure for query patterns
|
||||
2. **Index Strategy**: Create appropriate indexes for filters and joins
|
||||
3. **Vector Dimensions**: Choose optimal embedding dimensions for your use case
|
||||
4. **Batch Processing**: Use batch operations for bulk data operations
|
||||
5. **Query Optimization**: Leverage SQL query optimization techniques
|
||||
6. **Memory Management**: Configure memory settings for optimal performance
|
||||
7. **Monitoring**: Use SingleStore's built-in monitoring and metrics
|
||||
8. **Security**: Implement proper authentication and access controls
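
As an illustration of the batch processing item above, the sketch below groups documents into fixed-size batches so that embedding and insert work can be done per batch rather than per row. The helper `batched` is a hypothetical utility, and whether `SingleStoreDB` offers a bulk insert method is not stated in this guide, so the loop only shows the grouping.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter batch
        yield batch


documents = [f"document {i}" for i in range(5)]
for batch in batched(documents, size=2):
    # In a real pipeline: embed the whole batch in one call, then insert it
    # inside a single transaction instead of one round trip per document.
    print(len(batch), batch)
```

Batch size is a trade-off: larger batches mean fewer round trips but more memory per request and more work to retry on failure.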
|
||||
|
||||
This guide covers integrating SingleStore with Swarms agents for hybrid SQL + vector RAG applications, with embeddings generated through LiteLLM.
|
@ -0,0 +1,174 @@
|
||||
# Weaviate Cloud RAG Integration with Swarms
|
||||
|
||||
## Overview
|
||||
|
||||
Weaviate Cloud is a fully managed vector database service offering enterprise-grade vector search capabilities with built-in AI integrations. It combines GraphQL APIs with vector search, automatic schema inference, and native ML model integrations. Weaviate Cloud excels in multi-modal search, semantic understanding, and complex relationship modeling, making it ideal for sophisticated RAG applications requiring both vector similarity and graph-like data relationships.
|
||||
|
||||
## Key Features
|
||||
|
||||
- **GraphQL API**: Flexible query language for complex data retrieval
|
||||
- **Multi-modal Search**: Support for text, images, and other data types
|
||||
- **Built-in Vectorization**: Automatic embedding generation with various models
|
||||
- **Schema Flexibility**: Dynamic schema with automatic type inference
|
||||
- **Hybrid Search**: Combine vector similarity with keyword search
|
||||
- **Graph Relationships**: Model complex data relationships
|
||||
- **Enterprise Security**: SOC 2 compliance with role-based access control
|
||||
- **Global Distribution**: Multi-region deployment with low latency
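
To illustrate the hybrid search item above: in Weaviate's hybrid search, an `alpha` parameter weights the two signals, where 1 is pure vector search and 0 is pure keyword search. The sketch below shows the general idea of fusing two score lists after normalizing each to the 0-1 range. It is a simplified illustration, not Weaviate's implementation, and the function names are hypothetical.

```python
from typing import Dict


def _normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max scale scores to [0, 1] so the two signals are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (score - lo) / (hi - lo) for doc_id, score in scores.items()}


def hybrid_scores(
    vector_scores: Dict[str, float],
    keyword_scores: Dict[str, float],
    alpha: float = 0.5,
) -> Dict[str, float]:
    """Weighted sum of normalized scores; a missing document counts as 0."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    vec = _normalize(vector_scores)
    kw = _normalize(keyword_scores)
    return {
        doc_id: alpha * vec.get(doc_id, 0.0) + (1.0 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in set(vec) | set(kw)
    }


vector_scores = {"a": 0.9, "b": 0.5, "c": 0.1}      # e.g. cosine similarities
keyword_scores = {"b": 12.0, "c": 3.0, "d": 7.5}    # e.g. BM25 scores
fused = hybrid_scores(vector_scores, keyword_scores, alpha=0.5)
ranked = sorted(fused.items(), key=lambda pair: pair[1], reverse=True)
print(ranked)
```

Document `b` ranks first here because it scores reasonably on both signals, which is the behavior hybrid search is meant to reward.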
|
||||
|
||||
## Architecture
|
||||
|
||||
Weaviate Cloud integrates with Swarms agents as an intelligent, multi-modal vector database:
|
||||
|
||||
```
[Agent] -> [Weaviate Cloud Memory] -> [GraphQL + Vector Search] -> [Multi-modal Results] -> [Retrieved Context]
```
|
||||
|
||||
The system leverages Weaviate's GraphQL interface and built-in AI capabilities for sophisticated semantic search and relationship queries.
|
||||
|
||||
## Setup & Configuration
|
||||
|
||||
### Installation
|
||||
|
||||
```bash
pip install weaviate-client
pip install swarms
pip install litellm
```
|
||||
|
||||
### Environment Variables
|
||||
|
||||
```bash
# Weaviate Cloud credentials
export WEAVIATE_URL="https://your-cluster.weaviate.network"
export WEAVIATE_API_KEY="your-api-key"

# OpenAI API key (needed by the code example for gpt-4o and text-embedding-3-small;
# also usable for built-in vectorization)
export OPENAI_API_KEY="your-openai-api-key"

# Optional: Additional model API keys
export COHERE_API_KEY="your-cohere-key"
export HUGGINGFACE_API_KEY="your-hf-key"
```
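
The `WEAVIATE_URL` value above includes the `https://` scheme, while some setups export only the hostname. A small helper can accept both forms so the scheme is never duplicated or omitted. This is a sketch; `normalize_cluster_url` is a hypothetical helper, not part of `weaviate-client` or `swarms_memory`, and it deliberately drops any path component.

```python
from urllib.parse import urlparse


def normalize_cluster_url(raw: str) -> str:
    """Return scheme://host[:port], adding https:// only when no scheme is present."""
    raw = raw.strip()
    if not raw:
        raise ValueError("cluster URL is empty")
    if "://" not in raw:
        raw = "https://" + raw
    parsed = urlparse(raw)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a valid cluster URL: {raw!r}")
    return f"{parsed.scheme}://{parsed.netloc}"


print(normalize_cluster_url("https://your-cluster.weaviate.network"))
print(normalize_cluster_url("your-cluster.weaviate.network/"))
```

Both calls print the same URL, so the result can be passed to the client regardless of how the variable was exported.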
|
||||
|
||||
### Dependencies
|
||||
|
||||
- `weaviate-client>=4.4.0`
|
||||
- `swarms`
|
||||
- `litellm`
|
||||
- `numpy`
|
||||
|
||||
## Code Example
|
||||
|
||||
```python
"""
Agent with Weaviate Cloud RAG

This example demonstrates using Weaviate Cloud as a vector database for RAG operations,
allowing agents to store and retrieve documents from cloud-hosted Weaviate.
"""

import os
import sys

from swarms import Agent
from swarms_memory import WeaviateDB

# Get Weaviate Cloud credentials
weaviate_url = os.getenv("WEAVIATE_URL")
weaviate_key = os.getenv("WEAVIATE_API_KEY")

if not weaviate_url or not weaviate_key:
    print("Missing Weaviate Cloud credentials!")
    print("Please set WEAVIATE_URL and WEAVIATE_API_KEY environment variables")
    sys.exit(1)

# Create WeaviateDB wrapper for cloud RAG operations
rag_db = WeaviateDB(
    embedding_model="text-embedding-3-small",
    collection_name="swarms_cloud_knowledge",
    # WEAVIATE_URL already includes the https:// scheme, so pass it through unchanged
    cluster_url=weaviate_url,
    auth_client_secret=weaviate_key,
    distance_metric="cosine",
)

# Add documents to the cloud knowledge base
documents = [
    "Weaviate Cloud Service provides managed vector database hosting with enterprise features.",
    "Cloud-hosted vector databases offer scalability, reliability, and managed infrastructure.",
    "RAG combines retrieval and generation for more accurate AI responses.",
    "The Swarms framework supports multiple cloud memory backends including Weaviate Cloud.",
    "Swarms is the first and most reliable multi-agent production-grade framework.",
    "Kye Gomez is Founder and CEO of Swarms Corporation.",
]

print("Adding documents to Weaviate Cloud...")
for doc in documents:
    rag_db.add(doc)

# Create agent with cloud RAG capabilities
agent = Agent(
    agent_name="Weaviate-Cloud-RAG-Agent",
    agent_description="Swarms Agent with Weaviate Cloud-powered RAG for scalable knowledge retrieval",
    model_name="gpt-4o",
    max_loops=1,
    dynamic_temperature_enabled=True,
    long_term_memory=rag_db,
)

print("Testing agent with cloud RAG...")

# Query with cloud RAG
response = agent.run("What is Weaviate Cloud and how does it relate to RAG? Who founded Swarms?")
print(response)
```
|
||||
|
||||
## Use Cases
|
||||
|
||||
### 1. Multi-Modal Knowledge Systems
|
||||
- **Scenario**: Applications requiring search across text, images, and other media
|
||||
- **Benefits**: Native multi-modal support, unified search interface
|
||||
- **Best For**: Content management, media libraries, educational platforms
|
||||
|
||||
### 2. Complex Relationship Modeling
|
||||
- **Scenario**: Knowledge graphs with interconnected entities and relationships
|
||||
- **Benefits**: GraphQL queries, relationship traversal, graph analytics
|
||||
- **Best For**: Enterprise knowledge bases, research databases, social networks
|
||||
|
||||
### 3. Flexible Schema Applications
|
||||
- **Scenario**: Rapidly evolving data structures and content types
|
||||
- **Benefits**: Dynamic schema inference, automatic property addition
|
||||
- **Best For**: Startups, experimental platforms, content aggregation systems
|
||||
|
||||
### 4. Enterprise Search Platforms
|
||||
- **Scenario**: Large-scale enterprise search with complex filtering requirements
|
||||
- **Benefits**: Advanced filtering, role-based access, enterprise security
|
||||
- **Best For**: Corporate intranets, document management, compliance systems
|
||||
|
||||
## Performance Characteristics
|
||||
|
||||
### Search Types Performance
|
||||
|
||||
| Search Type | Use Case | Speed | Flexibility | Accuracy |
|-------------|----------|-------|-------------|----------|
| **Vector** | Semantic similarity | Fast | Medium | High |
| **Hybrid** | Combined semantic + keyword | Medium | High | Very High |
| **GraphQL** | Complex relationships | Variable | Very High | Exact (filter-based matching) |
| **Multi-modal** | Cross-media search | Medium | Very High | High |
|
||||
|
||||
### Scaling and Deployment
|
||||
- **Serverless**: Automatic scaling based on query load
|
||||
- **Global**: Multi-region deployment for low latency
|
||||
- **Multi-tenant**: Namespace isolation and access control
|
||||
- **Performance**: Sub-100ms queries with proper indexing
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Schema Design**: Plan class structure and property types upfront
|
||||
2. **Vectorization Strategy**: Choose between built-in and external embeddings
|
||||
3. **Query Optimization**: Use appropriate search types for different use cases
|
||||
4. **Filtering Strategy**: Create indexed properties for frequent filters
|
||||
5. **Batch Operations**: Use batch import for large datasets
|
||||
6. **Monitoring**: Implement query performance monitoring
|
||||
7. **Security**: Configure proper authentication and authorization
|
||||
8. **Multi-modal**: Leverage native multi-modal capabilities when applicable
|
||||
|
||||
This comprehensive guide provides the foundation for integrating Weaviate Cloud with Swarms agents for sophisticated, multi-modal RAG applications using both built-in and LiteLLM embeddings approaches.
|
@ -0,0 +1,225 @@
|
||||
# Weaviate Local RAG Integration with Swarms
|
||||
|
||||
## Overview
|
||||
|
||||
Weaviate Local is a self-hosted version of the Weaviate vector database that runs on your own infrastructure. It provides the same powerful GraphQL API, multi-modal capabilities, and AI integrations as Weaviate Cloud, but with full control over data, deployment, and customization. Weaviate Local is ideal for organizations requiring data sovereignty, custom configurations, or air-gapped deployments while maintaining enterprise-grade vector search capabilities.
|
||||
|
||||
## Key Features
|
||||
|
||||
- **Self-Hosted Control**: Full ownership of data and infrastructure
|
||||
- **GraphQL API**: Flexible query language for complex data operations
|
||||
- **Multi-Modal Support**: Built-in support for text, images, and other data types
|
||||
- **Custom Modules**: Extensible architecture with custom vectorization modules
|
||||
- **Docker Deployment**: Easy containerized deployment and scaling
|
||||
- **Schema Flexibility**: Dynamic schema with automatic type inference
|
||||
- **Hybrid Search**: Combine vector similarity with keyword search
|
||||
- **Real-time Updates**: Live data updates without service interruption
|
||||
|
||||
## Architecture
|
||||
|
||||
Weaviate Local integrates with Swarms agents as a self-hosted, customizable vector database:
|
||||
|
||||
```
[Agent] -> [Weaviate Local Memory] -> [Local GraphQL + Vector Engine] -> [Custom Results] -> [Retrieved Context]
```
|
||||
|
||||
The system provides full control over the deployment environment while maintaining Weaviate's advanced search capabilities.
|
||||
|
||||
## Setup & Configuration
|
||||
|
||||
### Installation
|
||||
|
||||
```bash
# Docker installation (recommended); version pinned to match the compose file below
docker pull semitechnologies/weaviate:1.22.4

# Python client
pip install weaviate-client
pip install swarms
pip install litellm
```
|
||||
|
||||
### Docker Deployment
|
||||
|
||||
```yaml
# docker-compose.yml
version: '3.4'
services:
  weaviate:
    command:
      - --host
      - 0.0.0.0
      - --port
      - '8080'
      - --scheme
      - http
    image: semitechnologies/weaviate:1.22.4
    ports:
      - "8080:8080"
    restart: on-failure:0
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'none'
      ENABLE_MODULES: 'text2vec-openai,text2vec-cohere,text2vec-huggingface'
      CLUSTER_HOSTNAME: 'node1'
    volumes:
      - weaviate_data:/var/lib/weaviate
volumes:
  weaviate_data:
```
|
||||
|
||||
### Environment Variables
|
||||
|
||||
```bash
# Local Weaviate connection
export WEAVIATE_URL="http://localhost:8080"

# Optional: Authentication (if enabled)
export WEAVIATE_USERNAME="admin"
export WEAVIATE_PASSWORD="password"

# API keys for built-in modules
export OPENAI_API_KEY="your-openai-key"
export COHERE_API_KEY="your-cohere-key"
export HUGGINGFACE_API_KEY="your-hf-key"
```
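
A freshly started container can take a few seconds before it accepts requests. Weaviate exposes a readiness endpoint at `/v1/.well-known/ready`, so a client can poll it before connecting. The sketch below uses only the standard library; the function names, retry count, and delays are assumptions for illustration.

```python
import time
import urllib.error
import urllib.request


def readiness_url(base_url: str) -> str:
    """Build the readiness endpoint URL for a Weaviate base URL."""
    return base_url.rstrip("/") + "/v1/.well-known/ready"


def is_ready(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the readiness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(readiness_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_ready(base_url: str, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll the readiness endpoint up to `attempts` times."""
    for _ in range(attempts):
        if is_ready(base_url):
            return True
        time.sleep(delay)
    return False
```

A call such as `wait_until_ready("http://localhost:8080")` can run before the agent's memory backend is created, for example in a startup script or a CI job that launches the container.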
|
||||
|
||||
## Code Example
|
||||
|
||||
```python
"""
Agent with Weaviate Local RAG

This example demonstrates using local Weaviate as a vector database for RAG operations,
allowing agents to store and retrieve documents for enhanced context.
"""

from swarms import Agent
from swarms_memory import WeaviateDB

# Create WeaviateDB wrapper for RAG operations
rag_db = WeaviateDB(
    embedding_model="text-embedding-3-small",
    collection_name="swarms_knowledge",
    cluster_url="http://localhost:8080",  # Local Weaviate instance
    distance_metric="cosine",
)

# Add documents to the knowledge base
documents = [
    "Weaviate is an open-source vector database optimized for similarity search and AI applications.",
    "RAG combines retrieval and generation for more accurate AI responses.",
    "Vector embeddings enable semantic search across documents.",
    "The Swarms framework supports multiple memory backends including Weaviate.",
    "Swarms is the first and most reliable multi-agent production-grade framework.",
    "Kye Gomez is Founder and CEO of Swarms Corporation.",
]

# Add documents individually
for doc in documents:
    rag_db.add(doc)

# Create agent with RAG capabilities
agent = Agent(
    agent_name="Weaviate-RAG-Agent",
    agent_description="Swarms Agent with Weaviate-powered RAG for enhanced knowledge retrieval",
    model_name="gpt-4o",
    max_loops=1,
    dynamic_temperature_enabled=True,
    long_term_memory=rag_db,
)

# Query with RAG
response = agent.run("What is Weaviate and how does it relate to RAG? Who is the founder of Swarms?")
print(response)
```
|
||||
|
||||
## Use Cases
|
||||
|
||||
### 1. **Data Sovereignty & Compliance**
|
||||
- Government and healthcare organizations
|
||||
- GDPR/HIPAA compliance requirements
|
||||
- Sensitive data processing
|
||||
|
||||
### 2. **Air-Gapped Environments**
|
||||
- Military and defense applications
|
||||
- High-security research facilities
|
||||
- Offline AI systems
|
||||
|
||||
### 3. **Custom Infrastructure**
|
||||
- Specific hardware requirements
|
||||
- Custom networking configurations
|
||||
- Specialized security measures
|
||||
|
||||
### 4. **Development & Testing**
|
||||
- Local development environments
|
||||
- CI/CD integration
|
||||
- Performance testing
|
||||
|
||||
## Deployment Options
|
||||
|
||||
### Docker Compose
|
||||
```yaml
version: '3.4'
services:
  weaviate:
    image: semitechnologies/weaviate:1.22.4
    restart: on-failure:0
    ports:
      - "8080:8080"
    environment:
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'none'
      ENABLE_MODULES: 'text2vec-openai,backup-filesystem'
    volumes:
      - ./weaviate_data:/var/lib/weaviate
```
|
||||
|
||||
### Kubernetes
|
||||
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: weaviate
spec:
  replicas: 1
  selector:
    matchLabels:
      app: weaviate
  template:
    metadata:
      labels:
        app: weaviate
    spec:
      containers:
        - name: weaviate
          image: semitechnologies/weaviate:1.22.4
          ports:
            - containerPort: 8080
          env:
            - name: PERSISTENCE_DATA_PATH
              value: '/var/lib/weaviate'
          volumeMounts:
            - name: weaviate-storage
              mountPath: /var/lib/weaviate
      volumes:
        - name: weaviate-storage
          persistentVolumeClaim:
            claimName: weaviate-pvc
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Resource Planning**: Allocate sufficient memory and storage for your dataset
|
||||
2. **Backup Strategy**: Implement regular backups using Weaviate's backup modules
|
||||
3. **Monitoring**: Set up health checks and performance monitoring
|
||||
4. **Security**: Configure authentication and network security appropriately
|
||||
5. **Scaling**: Plan for horizontal scaling with clustering if needed
|
||||
6. **Updates**: Establish update procedures for Weaviate versions
|
||||
7. **Data Migration**: Plan migration strategies for schema changes
|
||||
|
||||
This guide covers the essentials of deploying and integrating Weaviate Local with Swarms agents for private, self-controlled RAG applications.
|
@ -0,0 +1,214 @@
|
||||
# Zyphra RAG Integration with Swarms
|
||||
|
||||
## Overview
|
||||
|
||||
Zyphra RAG is a specialized vector database and retrieval system designed specifically for high-performance RAG applications. It offers optimized indexing algorithms, intelligent chunk management, and advanced retrieval strategies tailored for language model integration. Zyphra RAG focuses on maximizing retrieval quality and relevance while maintaining fast query performance, making it ideal for applications requiring precise context retrieval and minimal latency.
|
||||
|
||||
## Key Features
|
||||
|
||||
- **RAG-Optimized Architecture**: Purpose-built for retrieval-augmented generation workflows
|
||||
- **Intelligent Chunking**: Automatic document segmentation with context preservation
|
||||
- **Multi-Strategy Retrieval**: Hybrid search combining semantic, lexical, and contextual signals
|
||||
- **Query Enhancement**: Automatic query expansion and refinement for better retrieval
|
||||
- **Relevance Scoring**: Advanced scoring algorithms optimized for LLM context selection
|
||||
- **Context Management**: Intelligent context window optimization and token management
|
||||
- **Real-time Indexing**: Dynamic index updates with minimal performance impact
|
||||
- **Retrieval Analytics**: Built-in metrics and analysis for retrieval quality assessment
|
||||
|
||||
## Architecture
|
||||
|
||||
Zyphra RAG integrates with Swarms agents as a specialized RAG-first vector system:
|
||||
|
||||
```
[Agent] -> [Zyphra RAG Memory] -> [RAG-Optimized Engine] -> [Enhanced Retrieval] -> [Contextual Response]
```
|
||||
|
||||
The system optimizes every step of the retrieval process specifically for language model consumption and response quality.
|
||||
|
||||
## Setup & Configuration
|
||||
|
||||
### Installation
|
||||
|
||||
```bash
pip install zyphra-rag  # Note: This is a conceptual package
pip install swarms
pip install litellm
```
|
||||
|
||||
### Environment Variables
|
||||
|
||||
```bash
# Zyphra RAG configuration (conceptual: the code example below constructs
# RAGSystem locally and does not pass these values)
export ZYPHRA_RAG_URL="https://api.zyphra.com/rag/v1"
export ZYPHRA_RAG_API_KEY="your-zyphra-api-key"

# Optional: Custom embedding service
export ZYPHRA_EMBEDDING_MODEL="text-embedding-3-small"

# OpenAI API key for LLM
export OPENAI_API_KEY="your-openai-api-key"
```
|
||||
|
||||
### Dependencies
|
||||
|
||||
- `zyphra-rag` (conceptual)
|
||||
- `swarms`
|
||||
- `litellm`
|
||||
- `numpy`
|
||||
- `tiktoken` (for token counting)
|
||||
|
||||
## Code Example
|
||||
|
||||
```python
"""
Agent with Zyphra RAG (Retrieval-Augmented Generation)

This example demonstrates using Zyphra RAG system for RAG operations,
allowing agents to store and retrieve documents for enhanced context.
Note: Zyphra RAG is a complete RAG system with graph-based retrieval.
"""

import torch
from swarms import Agent
from swarms_memory.vector_dbs.zyphra_rag import RAGSystem


# Simple LLM wrapper that uses the agent's model
class AgentLLMWrapper(torch.nn.Module):
    """
    LLM wrapper that integrates with the Swarms Agent's model.
    """

    def __init__(self):
        super().__init__()
        self.agent = None

    def set_agent(self, agent):
        """Set the agent reference for LLM calls"""
        self.agent = agent

    def forward(self, prompt: str) -> str:
        if self.agent:
            return self.agent.llm(prompt)
        return f"Generated response for: {prompt[:100]}..."

    def __call__(self, prompt: str) -> str:
        return self.forward(prompt)


# Create a wrapper class to make Zyphra RAG compatible with Swarms Agent
class ZyphraRAGWrapper:
    """
    Wrapper to make Zyphra RAG system compatible with Swarms Agent memory interface.
    """

    def __init__(self, rag_system, chunks, embeddings, graph):
        self.rag_system = rag_system
        self.chunks = chunks
        self.embeddings = embeddings
        self.graph = graph

    def add(self, doc: str):
        """Add method for compatibility - Zyphra processes entire documents at once"""
        print(f"Note: Zyphra RAG processes entire documents. Document already processed: {doc[:50]}...")

    def query(self, query_text: str, **kwargs) -> str:
        """Query the RAG system"""
        return self.rag_system.answer_query(query_text, self.chunks, self.embeddings, self.graph)


if __name__ == '__main__':
    # Create LLM wrapper
    llm = AgentLLMWrapper()

    # Initialize Zyphra RAG System
    rag_db = RAGSystem(
        llm=llm,
        vocab_size=10000  # Vocabulary size for sparse embeddings
    )

    # Add documents to the knowledge base
    documents = [
        "Zyphra RAG is an advanced retrieval system that combines sparse embeddings with graph-based retrieval algorithms.",
        "Zyphra RAG uses Personalized PageRank (PPR) to identify the most relevant document chunks for a given query.",
        "The system builds a graph representation of document chunks based on embedding similarities between text segments.",
        "Zyphra RAG employs sparse embeddings using word count methods for fast, CPU-friendly text representation.",
        "The graph builder creates adjacency matrices representing similarity relationships between document chunks.",
        "Zyphra RAG excels at context-aware document retrieval through its graph-based approach to semantic search.",
        "Kye Gomez is the founder of Swarms."
    ]

    document_text = " ".join(documents)

    # Process the document (creates chunks, embeddings, and graph)
    chunks, embeddings, graph = rag_db.process_document(document_text, chunk_size=100)

    # Create the wrapper
    rag_wrapper = ZyphraRAGWrapper(rag_db, chunks, embeddings, graph)

    # Create agent with RAG capabilities
    agent = Agent(
        agent_name="RAG-Agent",
        agent_description="Swarms Agent with Zyphra RAG-powered graph-based retrieval for enhanced knowledge retrieval",
        model_name="gpt-4o",
        max_loops=1,
        dynamic_temperature_enabled=True,
        long_term_memory=rag_wrapper
    )

    # Connect the LLM wrapper to the agent
    llm.set_agent(agent)

    # Query with RAG
    response = agent.run("What is Zyphra RAG and who is the founder of Swarms?")
    print(response)
```
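
The sample documents in the example describe the retrieval approach as word-count sparse embeddings, a chunk-similarity graph, and Personalized PageRank (PPR). The sketch below shows that general technique in plain Python so the moving parts are visible. It is an illustration under those stated assumptions, not the `RAGSystem` implementation from `swarms_memory`; all function names here are hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import List


def sparse_embedding(text: str) -> Counter:
    """Word-count vector: a very simple sparse embedding."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def personalized_pagerank(
    adjacency: List[List[float]],
    personalization: List[float],
    damping: float = 0.85,
    iterations: int = 50,
) -> List[float]:
    """Power iteration where random restarts jump to query-relevant nodes."""
    n = len(adjacency)
    total = sum(personalization)
    restart = [p / total for p in personalization] if total else [1.0 / n] * n
    rank = restart[:]
    for _ in range(iterations):
        new_rank = [(1.0 - damping) * r for r in restart]
        for i, row in enumerate(adjacency):
            row_sum = sum(row)
            for j in range(n):
                # Nodes without edges return their mass to the restart distribution.
                share = row[j] / row_sum if row_sum else restart[j]
                new_rank[j] += damping * rank[i] * share
        rank = new_rank
    return rank


def retrieve(chunks: List[str], query: str, top_k: int = 2) -> List[str]:
    embeddings = [sparse_embedding(chunk) for chunk in chunks]
    n = len(chunks)
    # Edge weight = similarity between two chunks; no self-loops.
    adjacency = [
        [cosine(embeddings[i], embeddings[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    # Restart distribution = similarity of each chunk to the query.
    query_embedding = sparse_embedding(query)
    personalization = [cosine(query_embedding, emb) for emb in embeddings]
    scores = personalized_pagerank(adjacency, personalization)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [chunks[i] for i in order[:top_k]]


chunks = [
    "graph based retrieval uses pagerank",
    "pagerank ranks graph nodes",
    "bananas are yellow fruit",
]
top = retrieve(chunks, "pagerank graph", top_k=2)
print(top)
```

Compared with plain nearest-neighbor search, the graph walk lets a chunk score well when it is strongly connected to query-relevant chunks, even if its own overlap with the query is small.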
|
||||
|
||||
## Use Cases
|
||||
|
||||
### 1. **High-Quality RAG Applications**
|
||||
- Applications requiring maximum retrieval precision
|
||||
- Scientific and technical documentation systems
|
||||
- Legal and compliance information retrieval
|
||||
|
||||
### 2. **Token-Constrained Environments**
|
||||
- Applications with strict context window limits
|
||||
- Cost-sensitive deployments with token-based pricing
|
||||
- Real-time applications requiring fast inference
|
||||
|
||||
### 3. **Multi-Modal Content Retrieval**
|
||||
- Documents with mixed content types
|
||||
- Technical manuals with code, text, and diagrams
|
||||
- Research papers with equations and figures
|
||||
|
||||
### 4. **Enterprise Knowledge Systems**
|
||||
- Large-scale corporate knowledge bases
|
||||
- Customer support systems requiring high accuracy
|
||||
- Training and educational platforms
|
||||
|
||||
## Performance Characteristics
|
||||
|
||||
### Retrieval Quality Metrics
|
||||
- **Relevance Precision**: 95%+ for domain-specific queries
|
||||
- **Context Coherence**: Maintained across chunk boundaries
|
||||
- **Diversity Score**: Optimized to avoid redundant information
|
||||
- **Token Efficiency**: Maximum information density per token
|
||||
|
||||
### Optimization Strategies
|
||||
|
||||
| Strategy | Use Case | Token Efficiency | Quality | Speed |
|----------|----------|------------------|---------|-------|
| **Relevance First** | High-accuracy applications | Medium | Very High | Fast |
| **Token Efficient** | Cost-sensitive deployments | Very High | High | Very Fast |
| **Diversity Optimized** | Comprehensive coverage | Medium | High | Medium |
| **Contextual** | Complex reasoning tasks | Low | Very High | Medium |
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Chunk Strategy Selection**: Choose chunking strategy based on document type and query patterns
|
||||
2. **Token Budget Management**: Set appropriate context window limits for your use case
|
||||
3. **Quality Monitoring**: Regularly assess retrieval quality metrics
|
||||
4. **Query Enhancement**: Enable query enhancement for complex or ambiguous queries
|
||||
5. **Context Diversity**: Balance relevance with information diversity
|
||||
6. **Performance Tuning**: Optimize retrieval strategies for your specific domain
|
||||
7. **Continuous Learning**: Monitor and improve retrieval quality over time
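
As a sketch of the token budget item above, retrieved chunks can be taken in relevance order until a context budget is used up. The token count below is a whitespace approximation chosen to keep the example self-contained; a real deployment would swap in a tokenizer such as `tiktoken`. The function names are hypothetical.

```python
from typing import List, Tuple


def approx_token_count(text: str) -> int:
    """Rough stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def select_within_budget(scored_chunks: List[Tuple[float, str]], max_tokens: int) -> List[str]:
    """Greedily keep the highest-scoring chunks that still fit in the budget."""
    selected: List[str] = []
    used = 0
    for _score, chunk in sorted(scored_chunks, key=lambda pair: pair[0], reverse=True):
        cost = approx_token_count(chunk)
        if used + cost > max_tokens:
            continue  # too large for the remaining budget; a smaller chunk may still fit
        selected.append(chunk)
        used += cost
    return selected


scored = [
    (0.9, "alpha beta gamma delta"),
    (0.8, "one two three four five six"),
    (0.5, "short note"),
]
context = select_within_budget(scored, max_tokens=6)
print(context)
```

Greedy selection is simple and predictable. It does not account for redundancy between chunks, which is where a diversity-aware strategy would differ.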
|
||||
|
||||
This guide provides a conceptual framework for integrating specialized RAG-optimized vector databases like Zyphra RAG with Swarms agents, focusing on maximum retrieval quality and LLM-optimized context delivery.
|