# Qdrant RAG Integration
This example demonstrates how to integrate Qdrant vector database with Swarms agents for Retrieval-Augmented Generation (RAG). Qdrant is a high-performance vector database that enables agents to store, index, and retrieve documents using semantic similarity search for enhanced context and more accurate responses.
## Prerequisites
- Python 3.7+
- OpenAI API key
- Swarms library
- Qdrant client and swarms-memory
## Installation

```bash
pip install qdrant-client fastembed swarms-memory litellm
```

Note: The `litellm` package is required for using LiteLLM provider models such as OpenAI, Azure, and Cohere.
## Tutorial Steps

### Step 1: Install Swarms

First, install the latest version of Swarms:

```bash
pip3 install -U swarms
```
### Step 2: Environment Setup

Set up your environment variables in a `.env` file:

```bash
OPENAI_API_KEY="your-api-key-here"
QDRANT_URL="https://your-cluster.qdrant.io"
QDRANT_API_KEY="your-api-key"
WORKSPACE_DIR="agent_workspace"
```
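Variables in a `.env` file are not exported automatically. One common approach (an assumption here, not something Swarms requires) is to load them at startup with `python-dotenv`:

```python
# Optional: load variables from .env (assumes `pip install python-dotenv`)
import os

from dotenv import load_dotenv

load_dotenv()  # reads the .env file in the current working directory

# Fail fast if a required key is missing
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"
```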
### Step 3: Choose Deployment

Select your Qdrant deployment option:

- **In-memory**: For testing and development (data is not persisted)
- **Local server**: For production deployments with persistent storage
- **Qdrant Cloud**: Managed cloud service (recommended for production)
### Step 4: Configure Database

Set up the vector database wrapper with your preferred embedding model and collection settings.

### Step 5: Add Documents

Load documents using individual or batch processing methods.

### Step 6: Create Agent

Initialize your agent with RAG capabilities and start querying.
## Code

### Basic Setup with Individual Document Processing
```python
from qdrant_client import QdrantClient, models
from swarms import Agent
from swarms_memory import QdrantDB
import os

# Client Configuration Options

# Option 1: In-memory (testing only - data is NOT persisted)
# ":memory:" creates a temporary in-memory database that's lost when the program ends
client = QdrantClient(":memory:")

# Option 2: Local Qdrant Server
# Requires: docker run -p 6333:6333 qdrant/qdrant
# client = QdrantClient(host="localhost", port=6333)

# Option 3: Qdrant Cloud (recommended for production)
# Get credentials from https://cloud.qdrant.io
# client = QdrantClient(
#     url=os.getenv("QDRANT_URL"),         # e.g., "https://xyz-abc.eu-central.aws.cloud.qdrant.io"
#     api_key=os.getenv("QDRANT_API_KEY")  # Your Qdrant Cloud API key
# )

# Create vector database wrapper
rag_db = QdrantDB(
    client=client,
    embedding_model="text-embedding-3-small",
    collection_name="knowledge_base",
    distance=models.Distance.COSINE,
    n_results=3,
)

# Add documents to the knowledge base
documents = [
    "Qdrant is a vector database optimized for similarity search and AI applications.",
    "RAG combines retrieval and generation for more accurate AI responses.",
    "Vector embeddings enable semantic search across documents.",
    "The swarms framework supports multiple memory backends including Qdrant.",
]

# Method 1: Add documents individually
for doc in documents:
    rag_db.add(doc)

# Create agent with RAG capabilities
agent = Agent(
    agent_name="RAG-Agent",
    agent_description="Agent with Qdrant-powered RAG for enhanced knowledge retrieval",
    model_name="gpt-4.1",
    max_loops=1,
    dynamic_temperature_enabled=True,
    long_term_memory=rag_db,
)

# Query with RAG
try:
    response = agent.run("What is Qdrant and how does it relate to RAG?")
    print(response)
except Exception as e:
    print(f"Error during query: {e}")
    # Handle the error appropriately for your application
```
### Advanced Setup with Batch Processing and Metadata

```python
from qdrant_client import QdrantClient, models
from swarms import Agent
from swarms_memory import QdrantDB

# Initialize client (using in-memory for this example)
client = QdrantClient(":memory:")

# Create vector database wrapper
rag_db = QdrantDB(
    client=client,
    embedding_model="text-embedding-3-small",
    collection_name="advanced_knowledge_base",
    distance=models.Distance.COSINE,
    n_results=3,
)

# Method 2: Batch add documents (more efficient for large datasets)
# Example with metadata
documents_with_metadata = [
    "Machine learning is a subset of artificial intelligence.",
    "Deep learning uses neural networks with multiple layers.",
    "Natural language processing enables computers to understand human language.",
    "Computer vision allows machines to interpret visual information.",
    "Reinforcement learning learns through interaction with an environment.",
]

metadata = [
    {"category": "AI", "difficulty": "beginner", "topic": "overview"},
    {"category": "ML", "difficulty": "intermediate", "topic": "neural_networks"},
    {"category": "NLP", "difficulty": "intermediate", "topic": "language"},
    {"category": "CV", "difficulty": "advanced", "topic": "vision"},
    {"category": "RL", "difficulty": "advanced", "topic": "learning"},
]

# Batch add with metadata
doc_ids = rag_db.batch_add(documents_with_metadata, metadata=metadata, batch_size=3)
print(f"Added {len(doc_ids)} documents in batch")

# Query with metadata return
results_with_metadata = rag_db.query(
    "What is artificial intelligence?",
    n_results=3,
    return_metadata=True,
)

for i, result in enumerate(results_with_metadata):
    print(f"\nResult {i + 1}:")
    print(f"  Document: {result['document']}")
    print(f"  Category: {result['category']}")
    print(f"  Difficulty: {result['difficulty']}")
    print(f"  Topic: {result['topic']}")
    print(f"  Score: {result['score']:.4f}")

# Create agent with RAG capabilities
agent = Agent(
    agent_name="Advanced-RAG-Agent",
    agent_description="Advanced agent with metadata-enhanced RAG capabilities",
    model_name="gpt-4.1",
    max_loops=1,
    dynamic_temperature_enabled=True,
    long_term_memory=rag_db,
)

# Query with enhanced context
response = agent.run("Explain the relationship between machine learning and artificial intelligence")
print(response)
```
## Production Setup

### Setting up Qdrant Cloud

1. Sign up at [cloud.qdrant.io](https://cloud.qdrant.io)
2. Create a cluster
3. Get your cluster URL and API key
4. Set environment variables:

    ```bash
    export QDRANT_URL="https://your-cluster.eu-central.aws.cloud.qdrant.io"
    export QDRANT_API_KEY="your-api-key-here"
    ```
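As a quick sanity check that the credentials work, you can list the cluster's collections before wiring Qdrant into an agent (a minimal sketch, assuming the environment variables above are set):

```python
import os

from qdrant_client import QdrantClient

client = QdrantClient(
    url=os.getenv("QDRANT_URL"),
    api_key=os.getenv("QDRANT_API_KEY"),
)

# A successful call confirms connectivity and authentication
print(client.get_collections())
```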
### Running Local Qdrant Server

```bash
# Docker
docker run -p 6333:6333 qdrant/qdrant
```

```yaml
# Docker Compose
version: '3.7'
services:
  qdrant:
    image: qdrant/qdrant
    ports:
      - "6333:6333"
    volumes:
      - ./qdrant_storage:/qdrant/storage
```
### Production Configuration Example

```python
from qdrant_client import QdrantClient, models
from swarms_memory import QdrantDB
import os
import logging

# Set up logging for production monitoring
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

try:
    # Connect to the Qdrant server with proper error handling
    client = QdrantClient(
        host=os.getenv("QDRANT_HOST", "localhost"),
        port=int(os.getenv("QDRANT_PORT", "6333")),
        api_key=os.getenv("QDRANT_API_KEY"),  # Use environment variable
        timeout=30,  # 30-second timeout
    )

    # Production RAG configuration with enhanced settings
    rag_db = QdrantDB(
        client=client,
        embedding_model="text-embedding-3-large",  # Higher-quality embeddings
        collection_name="production_knowledge",
        distance=models.Distance.COSINE,
        n_results=10,
        api_key=os.getenv("OPENAI_API_KEY"),  # Secure API key handling
    )

    logger.info("Successfully initialized production RAG database")
except Exception as e:
    logger.error(f"Failed to initialize RAG database: {e}")
    raise
```
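Against a remote cluster, transient network failures are worth retrying. The helper below, continuing from the snippet above, is an illustrative sketch (the `query_with_retry` name and the linear backoff policy are not part of `swarms-memory`):

```python
import time

def query_with_retry(rag_db, question: str, retries: int = 3, backoff: float = 1.0):
    """Retry a RAG query with linear backoff on transient errors (illustrative)."""
    for attempt in range(1, retries + 1):
        try:
            return rag_db.query(question)
        except Exception as e:
            if attempt == retries:
                raise  # out of retries; surface the error
            logger.warning(f"Query attempt {attempt} failed: {e}; retrying")
            time.sleep(backoff * attempt)
```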
## Configuration Options

### Distance Metrics

| Metric | Description | Best For |
|--------|-------------|----------|
| `COSINE` | Cosine similarity (default) | Normalized embeddings, text similarity |
| `EUCLID` | Euclidean distance | Absolute distance measurements |
| `DOT` | Dot product | Maximum inner product search |
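Switching metrics is just a matter of passing a different `models.Distance` value when constructing the wrapper. For example (the variable and collection names here are illustrative):

```python
# Dot-product distance for maximum inner product search
dot_db = QdrantDB(
    client=client,
    embedding_model="text-embedding-3-small",
    collection_name="dot_product_collection",  # illustrative name
    distance=models.Distance.DOT,
)
```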
### Embedding Model Options

#### LiteLLM Provider Models (Recommended)

| Model | Provider | Dimensions | Description |
|-------|----------|------------|-------------|
| `text-embedding-3-small` | OpenAI | 1536 | Efficient, cost-effective |
| `text-embedding-3-large` | OpenAI | 3072 | Best quality |
| `azure/your-deployment` | Azure | Variable | Azure OpenAI embeddings |
| `cohere/embed-english-v3.0` | Cohere | 1024 | Advanced language understanding |
| `voyage/voyage-3-large` | Voyage AI | 1024 | High-quality embeddings |
#### SentenceTransformer Models

| Model | Dimensions | Description |
|-------|------------|-------------|
| `all-MiniLM-L6-v2` | 384 | Fast, general-purpose |
| `all-mpnet-base-v2` | 768 | Higher quality |
| `all-roberta-large-v1` | 1024 | Best quality |
#### Usage Example

```python
# OpenAI embeddings (default example)
rag_db = QdrantDB(
    client=client,
    embedding_model="text-embedding-3-small",
    collection_name="openai_collection",
)
```
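For local development without embedding API calls, a SentenceTransformer model name can be passed instead (a sketch; assumes the `sentence-transformers` package is installed and that `QdrantDB` resolves the name locally, per the note below):

```python
# Local SentenceTransformer embeddings (no external API required)
local_db = QdrantDB(
    client=client,
    embedding_model="all-MiniLM-L6-v2",
    collection_name="local_collection",  # illustrative name
)
```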
Note: `QdrantDB` supports all LiteLLM provider models (Azure, Cohere, Voyage AI, etc.), SentenceTransformer models, and custom embedding functions. See the embedding model options tables above for the complete list.
## Use Cases

### Document Q&A System

Create an intelligent document question-answering system:

```python
# Load company documents into Qdrant
company_documents = [
    "Company policy on remote work allows flexible scheduling with core hours 10 AM - 3 PM.",
    "API documentation: Use POST /api/v1/users to create new user accounts.",
    "Product specifications: Our software supports Windows, Mac, and Linux platforms.",
]

for doc in company_documents:
    rag_db.add(doc)

# The agent can now answer questions using the documents
agent = Agent(
    agent_name="Company-DocQA-Agent",
    agent_description="Intelligent document Q&A system for company information",
    model_name="gpt-4.1",
    long_term_memory=rag_db,
)

answer = agent.run("What is the company policy on remote work?")
print(answer)
```
### Knowledge Base Management

Build a comprehensive knowledge management system:

```python
from qdrant_client import QdrantClient
from swarms import Agent
from swarms_memory import QdrantDB


class KnowledgeBaseAgent:
    def __init__(self):
        self.client = QdrantClient(":memory:")
        self.rag_db = QdrantDB(
            client=self.client,
            embedding_model="text-embedding-3-small",
            collection_name="knowledge_base",
            n_results=5,
        )
        self.agent = Agent(
            agent_name="KB-Management-Agent",
            agent_description="Knowledge base management and retrieval system",
            model_name="gpt-4.1",
            long_term_memory=self.rag_db,
        )

    def add_knowledge(self, text: str, metadata: dict = None):
        """Add new knowledge to the base."""
        if metadata:
            return self.rag_db.batch_add([text], metadata=[metadata])
        return self.rag_db.add(text)

    def query(self, question: str):
        """Query the knowledge base."""
        return self.agent.run(question)

    def bulk_import(self, documents: list, metadata_list: list = None):
        """Import multiple documents efficiently."""
        return self.rag_db.batch_add(documents, metadata=metadata_list, batch_size=50)


# Usage
kb = KnowledgeBaseAgent()
kb.add_knowledge("Python is a high-level programming language.", {"category": "programming"})
kb.add_knowledge("Qdrant is optimized for vector similarity search.", {"category": "databases"})
result = kb.query("What programming languages are mentioned?")
print(result)
```
## Best Practices

### Document Processing Strategy

| Practice | Recommendation | Details |
|----------|----------------|---------|
| Chunking | 200-500 tokens | Split large documents into optimal chunks for retrieval (see the sketch below) |
| Overlap | 20-50 tokens | Maintain context between consecutive chunks |
| Preprocessing | Clean & normalize | Remove noise and standardize text format |
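A minimal chunker in the spirit of these numbers might look like the following (a sketch; it approximates tokens with whitespace-separated words, which is usually close enough for sizing, and `long_document` stands in for your raw text):

```python
def chunk_text(text: str, chunk_size: int = 300, overlap: int = 30) -> list:
    """Split text into overlapping word-based chunks (illustrative helper)."""
    words = text.split()
    step = chunk_size - overlap
    return [
        " ".join(words[start:start + chunk_size])
        for start in range(0, len(words), step)
    ]

# Each chunk becomes its own retrievable document
for chunk in chunk_text(long_document):  # long_document: your raw text
    rag_db.add(chunk)
```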
### Collection Organization

| Practice | Recommendation | Details |
|----------|----------------|---------|
| Separation | Type-based collections | Use separate collections for docs, policies, code, etc. (see below) |
| Naming | Consistent conventions | Follow clear, descriptive naming patterns |
| Lifecycle | Update strategies | Plan for document versioning and updates |
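In practice, type-based separation can be as simple as instantiating one `QdrantDB` wrapper per collection on a shared client (the collection names below are illustrative):

```python
# One wrapper per document type, sharing a single client
docs_db = QdrantDB(
    client=client,
    embedding_model="text-embedding-3-small",
    collection_name="company_docs",  # illustrative name
)
policies_db = QdrantDB(
    client=client,
    embedding_model="text-embedding-3-small",
    collection_name="company_policies",  # illustrative name
)
```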
### Embedding Model Selection

| Environment | Recommended Model | Use Case |
|-------------|-------------------|----------|
| Development | `all-MiniLM-L6-v2` | Fast iteration and testing |
| Production | `text-embedding-3-small` / `text-embedding-3-large` | High-quality production deployment |
| Specialized | Domain-specific models | Industry or domain-focused applications |
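One way to wire this up is to pick the model from an environment flag, so development and production differ only in configuration (a sketch; the `APP_ENV` variable is an assumption, not a Swarms convention):

```python
import os

# Cheap local model for development, managed embeddings in production
EMBEDDING_MODEL = (
    "all-MiniLM-L6-v2"
    if os.getenv("APP_ENV", "development") == "development"
    else "text-embedding-3-small"
)

rag_db = QdrantDB(
    client=client,
    embedding_model=EMBEDDING_MODEL,
    collection_name="knowledge_base",
)
```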
### Performance Optimization

| Setting | Recommendation | Rationale |
|---------|----------------|-----------|
| Retrieval Count | Start with 3-5 results | Balance relevance with performance |
| Batch Operations | Use `batch_add()` | Efficient bulk document processing |
| Metadata | Strategic storage | Enable filtering and enhanced context |
### Production Deployment

| Component | Best Practice | Implementation |
|-----------|---------------|----------------|
| Storage | Persistent server | Use Qdrant Cloud or a self-hosted server |
| Error Handling | Robust mechanisms | Implement retry logic and graceful failures |
| Monitoring | Performance tracking | Monitor metrics and embedding quality |
## Performance Tips

- **Development**: Use in-memory mode for rapid prototyping and testing
- **Production**: Deploy a dedicated Qdrant server with appropriate resource allocation
- **Scalability**: Use batch operations for adding multiple documents efficiently
- **Memory Management**: Monitor memory usage with large document collections
- **API Usage**: Consider rate limits when using cloud-based embedding services
- **Caching**: Implement caching strategies for frequently accessed documents (see the sketch below)
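For the caching tip, a minimal option is in-process memoization of repeated questions, reusing the `agent` from the examples above (a sketch; `functools.lru_cache` only helps when identical queries recur, and cached answers can go stale as documents change):

```python
from functools import lru_cache

@lru_cache(maxsize=256)
def cached_answer(question: str) -> str:
    """Memoize answers for repeated questions (illustrative; no invalidation)."""
    return agent.run(question)

print(cached_answer("What is Qdrant?"))
print(cached_answer("What is Qdrant?"))  # served from the cache
```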
## Customization

You can modify the system configuration to create specialized RAG agents for different use cases:

| Use Case | Configuration | Description |
|----------|---------------|-------------|
| Technical Documentation | High `n_results` (10-15), precise embeddings | Comprehensive technical Q&A |
| Customer Support | Fast embeddings, metadata filtering | Quick responses with categorization |
| Research Assistant | Large embedding model, broad retrieval | Deep analysis and synthesis |
| Code Documentation | Code-specific embeddings, semantic chunking | Programming-focused assistance |
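For instance, the "Technical Documentation" row might translate into a configuration like this (a sketch; the collection name and exact `n_results` value are illustrative):

```python
# Comprehensive technical Q&A: broad retrieval, high-quality embeddings
tech_docs_db = QdrantDB(
    client=client,
    embedding_model="text-embedding-3-large",  # precise embeddings
    collection_name="technical_docs",          # illustrative name
    n_results=12,                              # high retrieval count (10-15)
)
```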