You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
swarms/docs/rag/milvus-cloud.md

9.8 KiB

Milvus Cloud RAG Integration with Swarms

Overview

Milvus Cloud (also known as Zilliz Cloud) is a fully managed cloud service for Milvus, the world's most advanced open-source vector database. It provides enterprise-grade vector database capabilities with automatic scaling, high availability, and comprehensive security features. Milvus Cloud is designed for production-scale RAG applications that require robust performance, reliability, and minimal operational overhead.

Key Features

  • Fully Managed Service: No infrastructure management required
  • Auto-scaling: Automatic scaling based on workload demands
  • High Availability: Built-in redundancy and disaster recovery
  • Multiple Index Types: Support for various indexing algorithms (IVF, HNSW, ANNOY, etc.)
  • Rich Metadata Filtering: Advanced filtering capabilities with complex expressions
  • Multi-tenancy: Secure isolation between different applications
  • Global Distribution: Available in multiple cloud regions worldwide
  • Enterprise Security: End-to-end encryption and compliance certifications

Architecture

Milvus Cloud integrates with Swarms agents as a scalable, managed vector database solution:

[Agent] -> [Milvus Cloud Memory] -> [Managed Vector DB] -> [Similarity Search] -> [Retrieved Context]

The system leverages Milvus Cloud's distributed architecture to provide high-performance vector operations with enterprise-grade reliability.

Setup & Configuration

Installation

pip install pymilvus[cloud]
pip install swarms
pip install litellm

Environment Variables

# Milvus Cloud credentials
export MILVUS_CLOUD_URI="https://your-cluster.api.milvuscloud.com"
export MILVUS_CLOUD_TOKEN="your-api-token"

# Optional: Database name (default: "default")
export MILVUS_DATABASE="your-database"

# OpenAI API key for LLM
export OPENAI_API_KEY="your-openai-api-key"

Dependencies

  • pymilvus>=2.3.0
  • swarms
  • litellm
  • numpy

Code Example

"""
Agent with Milvus Cloud RAG (Retrieval-Augmented Generation)

This example demonstrates using Milvus Cloud (Zilliz) as a vector database for RAG operations,
allowing agents to store and retrieve documents from your cloud-hosted Milvus account.
"""

import os
from swarms import Agent
from swarms_memory import MilvusDB

# Get Milvus Cloud credentials
milvus_uri = os.getenv("MILVUS_URI")
milvus_token = os.getenv("MILVUS_TOKEN")

if not milvus_uri or not milvus_token:
    print("❌ Missing Milvus Cloud credentials!")
    print("Please set MILVUS_URI and MILVUS_TOKEN in your .env file")
    exit(1)

# Initialize Milvus Cloud wrapper for RAG operations
rag_db = MilvusDB(
    embedding_model="text-embedding-3-small",  # OpenAI embedding model
    collection_name="swarms_cloud_knowledge",  # Cloud collection name
    uri=milvus_uri,                           # Your Zilliz Cloud URI
    token=milvus_token,                       # Your Zilliz Cloud token
    metric="COSINE",                          # Distance metric for similarity search
)

# Add documents to the knowledge base
documents = [
    "Milvus Cloud is a fully managed vector database service provided by Zilliz.",
    "RAG combines retrieval and generation for more accurate AI responses.",
    "Vector embeddings enable semantic search across documents.",
    "The swarms framework supports multiple memory backends including Milvus Cloud.",
    "Swarms is the first and most reliable multi-agent production-grade framework.",
    "Kye Gomez is Founder and CEO of Swarms."
]

# Add documents individually
for doc in documents:
    rag_db.add(doc)

# Create agent with RAG capabilities
agent = Agent(
    agent_name="Cloud-RAG-Agent",
    agent_description="Swarms Agent with Milvus Cloud-powered RAG for scalable knowledge retrieval",
    model_name="gpt-4o",
    max_loops=1,
    dynamic_temperature_enabled=True,
    long_term_memory=rag_db
)

# Query with RAG
response = agent.run("What is Milvus Cloud and how does it relate to RAG? Who is the founder of Swarms?")
print(response)

Use Cases

1. Enterprise Knowledge Management

  • Scenario: Large-scale corporate knowledge bases with millions of documents
  • Benefits: Auto-scaling, high availability, enterprise security
  • Best For: Fortune 500 companies, global organizations

2. Production RAG Applications

  • Scenario: Customer-facing AI applications requiring 99.9% uptime
  • Benefits: Managed infrastructure, automatic scaling, disaster recovery
  • Best For: SaaS platforms, customer support systems

3. Multi-tenant Applications

  • Scenario: Serving multiple customers with isolated data
  • Benefits: Built-in multi-tenancy, secure data isolation
  • Best For: AI platform providers, B2B SaaS solutions

4. Global AI Applications

  • Scenario: Applications serving users worldwide
  • Benefits: Global distribution, edge optimization
  • Best For: International companies, global services

Performance Characteristics

Scaling

  • Auto-scaling: Automatic compute and storage scaling based on workload
  • Horizontal Scaling: Support for billions of vectors across multiple nodes
  • Vertical Scaling: On-demand resource allocation for compute-intensive tasks

Performance Metrics

  • Query Latency: < 10ms for 95th percentile queries
  • Throughput: 10,000+ QPS depending on configuration
  • Availability: 99.9% uptime SLA
  • Consistency: Tunable consistency levels

Index Types Performance

Index Type Use Case Performance Memory Accuracy
HNSW High-performance similarity search Ultra-fast Medium Very High
IVF_FLAT Large datasets with exact results Fast High Perfect
IVF_SQ8 Memory-efficient large datasets Fast Low High
ANNOY Read-heavy workloads Very Fast Low High

Cloud vs Local Deployment

Milvus Cloud Advantages

  • Fully Managed: Zero infrastructure management
  • Enterprise Features: Advanced security, compliance, monitoring
  • Global Scale: Multi-region deployment capabilities
  • Cost Optimization: Pay-per-use pricing model
  • Professional Support: 24/7 technical support

Configuration Options

# Production configuration with advanced features
memory = MilvusCloudMemory(
    collection_name="production_knowledge_base",
    embedding_model="text-embedding-3-small",
    dimension=1536,
    index_type="HNSW",  # Best for similarity search
    metric_type="COSINE"
)

# Development configuration
memory = MilvusCloudMemory(
    collection_name="dev_knowledge_base",
    embedding_model="text-embedding-3-small", 
    dimension=1536,
    index_type="IVF_FLAT",  # Balanced performance
    metric_type="L2"
)

Advanced Features

Rich Metadata Filtering

# Complex filter expressions
filter_expr = '''
(metadata["category"] == "ai" and metadata["difficulty"] == "advanced") 
or (metadata["topic"] == "embeddings" and metadata["type"] == "concept")
'''

results = memory.search(
    query="advanced AI concepts",
    limit=5,
    filter_expr=filter_expr
)
# Combine vector similarity with metadata filtering
results = memory.search(
    query="machine learning algorithms",
    limit=10,
    filter_expr='metadata["category"] in ["ai", "ml"] and metadata["difficulty"] != "beginner"'
)

Collection Management

# Create multiple collections for different domains
medical_memory = MilvusCloudMemory(
    collection_name="medical_knowledge",
    embedding_model="text-embedding-3-small"
)

legal_memory = MilvusCloudMemory(
    collection_name="legal_documents", 
    embedding_model="text-embedding-3-small"
)

Best Practices

  1. Index Selection: Choose HNSW for similarity search, IVF for large datasets
  2. Metadata Design: Design rich metadata schema for effective filtering
  3. Batch Operations: Use batch operations for better throughput
  4. Connection Pooling: Implement connection pooling for production applications
  5. Error Handling: Implement robust error handling and retry logic
  6. Monitoring: Set up monitoring and alerting for performance metrics
  7. Cost Optimization: Monitor usage and optimize collection configurations
  8. Security: Follow security best practices for authentication and data access

Monitoring and Observability

Key Metrics to Monitor

  • Query latency percentiles (p50, p95, p99)
  • Query throughput (QPS)
  • Error rates and types
  • Collection size and growth
  • Resource utilization

Alerting Setup

# Example monitoring integration
import logging

logger = logging.getLogger("milvus_rag")

def monitored_search(memory, query, **kwargs):
    start_time = time.time()
    try:
        results = memory.search(query, **kwargs)
        duration = time.time() - start_time
        logger.info(f"Search completed in {duration:.3f}s, found {len(results['documents'])} results")
        return results
    except Exception as e:
        logger.error(f"Search failed: {e}")
        raise

Troubleshooting

Common Issues

  1. Connection Errors: Verify MILVUS_CLOUD_URI and MILVUS_CLOUD_TOKEN. Check network connectivity and firewall settings. Confirm cloud region accessibility.

  2. Performance Issues: Monitor collection size and index type appropriateness. Check query complexity and filter expressions. Review auto-scaling configuration.

  3. Search Accuracy Issues: Verify embedding model consistency. Check vector normalization if using cosine similarity. Review index parameters and search parameters.

  4. Quota and Billing Issues: Monitor usage against plan limits. Review auto-scaling settings. Check billing alerts and notifications.

This comprehensive guide provides everything needed to integrate Milvus Cloud with Swarms agents for enterprise-scale RAG applications using the unified LiteLLM embeddings approach.