You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
286 lines
9.8 KiB
286 lines
9.8 KiB
# Milvus Cloud RAG Integration with Swarms
|
|
|
|
## Overview
|
|
|
|
Milvus Cloud (also known as Zilliz Cloud) is a fully managed cloud service for Milvus, the world's most advanced open-source vector database. It provides enterprise-grade vector database capabilities with automatic scaling, high availability, and comprehensive security features. Milvus Cloud is designed for production-scale RAG applications that require robust performance, reliability, and minimal operational overhead.
|
|
|
|
## Key Features
|
|
|
|
- **Fully Managed Service**: No infrastructure management required
|
|
- **Auto-scaling**: Automatic scaling based on workload demands
|
|
- **High Availability**: Built-in redundancy and disaster recovery
|
|
- **Multiple Index Types**: Support for various indexing algorithms (IVF, HNSW, ANNOY, etc.)
|
|
- **Rich Metadata Filtering**: Advanced filtering capabilities with complex expressions
|
|
- **Multi-tenancy**: Secure isolation between different applications
|
|
- **Global Distribution**: Available in multiple cloud regions worldwide
|
|
- **Enterprise Security**: End-to-end encryption and compliance certifications
|
|
|
|
## Architecture
|
|
|
|
Milvus Cloud integrates with Swarms agents as a scalable, managed vector database solution:
|
|
|
|
```
|
|
[Agent] -> [Milvus Cloud Memory] -> [Managed Vector DB] -> [Similarity Search] -> [Retrieved Context]
|
|
```
|
|
|
|
The system leverages Milvus Cloud's distributed architecture to provide high-performance vector operations with enterprise-grade reliability.
|
|
|
|
## Setup & Configuration
|
|
|
|
### Installation
|
|
|
|
```bash
|
|
pip install pymilvus[cloud]
|
|
pip install swarms
|
|
pip install litellm
|
|
```
|
|
|
|
### Environment Variables
|
|
|
|
```bash
|
|
# Milvus Cloud credentials
|
|
export MILVUS_CLOUD_URI="https://your-cluster.api.milvuscloud.com"
|
|
export MILVUS_CLOUD_TOKEN="your-api-token"
|
|
|
|
# Optional: Database name (default: "default")
|
|
export MILVUS_DATABASE="your-database"
|
|
|
|
# OpenAI API key for LLM
|
|
export OPENAI_API_KEY="your-openai-api-key"
|
|
```
|
|
|
|
### Dependencies
|
|
|
|
- `pymilvus>=2.3.0`
|
|
- `swarms`
|
|
- `litellm`
|
|
- `numpy`
|
|
|
|
## Code Example
|
|
|
|
```python
|
|
"""
|
|
Agent with Milvus Cloud RAG (Retrieval-Augmented Generation)
|
|
|
|
This example demonstrates using Milvus Cloud (Zilliz) as a vector database for RAG operations,
|
|
allowing agents to store and retrieve documents from your cloud-hosted Milvus account.
|
|
"""
|
|
|
|
import os
|
|
from swarms import Agent
|
|
from swarms_memory import MilvusDB
|
|
|
|
# Get Milvus Cloud credentials
|
|
milvus_uri = os.getenv("MILVUS_URI")
|
|
milvus_token = os.getenv("MILVUS_TOKEN")
|
|
|
|
if not milvus_uri or not milvus_token:
|
|
print("❌ Missing Milvus Cloud credentials!")
|
|
print("Please set MILVUS_URI and MILVUS_TOKEN in your .env file")
|
|
exit(1)
|
|
|
|
# Initialize Milvus Cloud wrapper for RAG operations
|
|
rag_db = MilvusDB(
|
|
embedding_model="text-embedding-3-small", # OpenAI embedding model
|
|
collection_name="swarms_cloud_knowledge", # Cloud collection name
|
|
uri=milvus_uri, # Your Zilliz Cloud URI
|
|
token=milvus_token, # Your Zilliz Cloud token
|
|
metric="COSINE", # Distance metric for similarity search
|
|
)
|
|
|
|
# Add documents to the knowledge base
|
|
documents = [
|
|
"Milvus Cloud is a fully managed vector database service provided by Zilliz.",
|
|
"RAG combines retrieval and generation for more accurate AI responses.",
|
|
"Vector embeddings enable semantic search across documents.",
|
|
"The swarms framework supports multiple memory backends including Milvus Cloud.",
|
|
"Swarms is the first and most reliable multi-agent production-grade framework.",
|
|
"Kye Gomez is Founder and CEO of Swarms."
|
|
]
|
|
|
|
# Add documents individually
|
|
for doc in documents:
|
|
rag_db.add(doc)
|
|
|
|
# Create agent with RAG capabilities
|
|
agent = Agent(
|
|
agent_name="Cloud-RAG-Agent",
|
|
agent_description="Swarms Agent with Milvus Cloud-powered RAG for scalable knowledge retrieval",
|
|
model_name="gpt-4o",
|
|
max_loops=1,
|
|
dynamic_temperature_enabled=True,
|
|
long_term_memory=rag_db
|
|
)
|
|
|
|
# Query with RAG
|
|
response = agent.run("What is Milvus Cloud and how does it relate to RAG? Who is the founder of Swarms?")
|
|
print(response)
|
|
```
|
|
|
|
## Use Cases
|
|
|
|
### 1. Enterprise Knowledge Management
|
|
- **Scenario**: Large-scale corporate knowledge bases with millions of documents
|
|
- **Benefits**: Auto-scaling, high availability, enterprise security
|
|
- **Best For**: Fortune 500 companies, global organizations
|
|
|
|
### 2. Production RAG Applications
|
|
- **Scenario**: Customer-facing AI applications requiring 99.9% uptime
|
|
- **Benefits**: Managed infrastructure, automatic scaling, disaster recovery
|
|
- **Best For**: SaaS platforms, customer support systems
|
|
|
|
### 3. Multi-tenant Applications
|
|
- **Scenario**: Serving multiple customers with isolated data
|
|
- **Benefits**: Built-in multi-tenancy, secure data isolation
|
|
- **Best For**: AI platform providers, B2B SaaS solutions
|
|
|
|
### 4. Global AI Applications
|
|
- **Scenario**: Applications serving users worldwide
|
|
- **Benefits**: Global distribution, edge optimization
|
|
- **Best For**: International companies, global services
|
|
|
|
## Performance Characteristics
|
|
|
|
### Scaling
|
|
- **Auto-scaling**: Automatic compute and storage scaling based on workload
|
|
- **Horizontal Scaling**: Support for billions of vectors across multiple nodes
|
|
- **Vertical Scaling**: On-demand resource allocation for compute-intensive tasks
|
|
|
|
### Performance Metrics
|
|
- **Query Latency**: < 10ms for 95th percentile queries
|
|
- **Throughput**: 10,000+ QPS depending on configuration
|
|
- **Availability**: 99.9% uptime SLA
|
|
- **Consistency**: Tunable consistency levels
|
|
|
|
### Index Types Performance
|
|
|
|
| Index Type | Use Case | Performance | Memory | Accuracy |
|
|
|------------|----------|-------------|---------|----------|
|
|
| **HNSW** | High-performance similarity search | Ultra-fast | Medium | Very High |
|
|
| **IVF_FLAT** | Large datasets with exact results | Fast | High | Perfect |
|
|
| **IVF_SQ8** | Memory-efficient large datasets | Fast | Low | High |
|
|
| **ANNOY** | Read-heavy workloads | Very Fast | Low | High |
|
|
|
|
## Cloud vs Local Deployment
|
|
|
|
### Milvus Cloud Advantages
|
|
- **Fully Managed**: Zero infrastructure management
|
|
- **Enterprise Features**: Advanced security, compliance, monitoring
|
|
- **Global Scale**: Multi-region deployment capabilities
|
|
- **Cost Optimization**: Pay-per-use pricing model
|
|
- **Professional Support**: 24/7 technical support
|
|
|
|
### Configuration Options
|
|
```python
|
|
# Production configuration with advanced features
|
|
memory = MilvusCloudMemory(
|
|
collection_name="production_knowledge_base",
|
|
embedding_model="text-embedding-3-small",
|
|
dimension=1536,
|
|
index_type="HNSW", # Best for similarity search
|
|
metric_type="COSINE"
|
|
)
|
|
|
|
# Development configuration
|
|
memory = MilvusCloudMemory(
|
|
collection_name="dev_knowledge_base",
|
|
embedding_model="text-embedding-3-small",
|
|
dimension=1536,
|
|
index_type="IVF_FLAT", # Balanced performance
|
|
metric_type="L2"
|
|
)
|
|
```
|
|
|
|
## Advanced Features
|
|
|
|
### Rich Metadata Filtering
|
|
```python
|
|
# Complex filter expressions
|
|
filter_expr = '''
|
|
(metadata["category"] == "ai" and metadata["difficulty"] == "advanced")
|
|
or (metadata["topic"] == "embeddings" and metadata["type"] == "concept")
|
|
'''
|
|
|
|
results = memory.search(
|
|
query="advanced AI concepts",
|
|
limit=5,
|
|
filter_expr=filter_expr
|
|
)
|
|
```
|
|
|
|
### Hybrid Search
|
|
```python
|
|
# Combine vector similarity with metadata filtering
|
|
results = memory.search(
|
|
query="machine learning algorithms",
|
|
limit=10,
|
|
filter_expr='metadata["category"] in ["ai", "ml"] and metadata["difficulty"] != "beginner"'
|
|
)
|
|
```
|
|
|
|
### Collection Management
|
|
```python
|
|
# Create multiple collections for different domains
|
|
medical_memory = MilvusCloudMemory(
|
|
collection_name="medical_knowledge",
|
|
embedding_model="text-embedding-3-small"
|
|
)
|
|
|
|
legal_memory = MilvusCloudMemory(
|
|
collection_name="legal_documents",
|
|
embedding_model="text-embedding-3-small"
|
|
)
|
|
```
|
|
|
|
## Best Practices
|
|
|
|
1. **Index Selection**: Choose HNSW for similarity search, IVF for large datasets
|
|
2. **Metadata Design**: Design rich metadata schema for effective filtering
|
|
3. **Batch Operations**: Use batch operations for better throughput
|
|
4. **Connection Pooling**: Implement connection pooling for production applications
|
|
5. **Error Handling**: Implement robust error handling and retry logic
|
|
6. **Monitoring**: Set up monitoring and alerting for performance metrics
|
|
7. **Cost Optimization**: Monitor usage and optimize collection configurations
|
|
8. **Security**: Follow security best practices for authentication and data access
|
|
|
|
## Monitoring and Observability
|
|
|
|
### Key Metrics to Monitor
|
|
- Query latency percentiles (p50, p95, p99)
|
|
- Query throughput (QPS)
|
|
- Error rates and types
|
|
- Collection size and growth
|
|
- Resource utilization
|
|
|
|
### Alerting Setup
|
|
```python
|
|
# Example monitoring integration
|
|
import logging
|
|
|
|
logger = logging.getLogger("milvus_rag")
|
|
|
|
def monitored_search(memory, query, **kwargs):
|
|
start_time = time.time()
|
|
try:
|
|
results = memory.search(query, **kwargs)
|
|
duration = time.time() - start_time
|
|
logger.info(f"Search completed in {duration:.3f}s, found {len(results['documents'])} results")
|
|
return results
|
|
except Exception as e:
|
|
logger.error(f"Search failed: {e}")
|
|
raise
|
|
```
|
|
|
|
## Troubleshooting
|
|
|
|
### Common Issues
|
|
|
|
1. **Connection Errors**: Verify MILVUS_CLOUD_URI and MILVUS_CLOUD_TOKEN. Check network connectivity and firewall settings. Confirm cloud region accessibility.
|
|
|
|
2. **Performance Issues**: Monitor collection size and index type appropriateness. Check query complexity and filter expressions. Review auto-scaling configuration.
|
|
|
|
3. **Search Accuracy Issues**: Verify embedding model consistency. Check vector normalization if using cosine similarity. Review index parameters and search parameters.
|
|
|
|
4. **Quota and Billing Issues**: Monitor usage against plan limits. Review auto-scaling settings. Check billing alerts and notifications.
|
|
|
|
This comprehensive guide provides everything needed to integrate Milvus Cloud with Swarms agents for enterprise-scale RAG applications using the unified LiteLLM embeddings approach. |