Qdrant RAG Example with Document Ingestion
This example demonstrates how to use the agent structure from example.py
with Qdrant RAG to ingest large collections of PDF, TXT, and Markdown files for quantitative trading analysis.
🚀 Features
- Document Ingestion: Process PDF, TXT, and Markdown files automatically
- Qdrant Vector Database: High-performance vector storage with similarity search
- Sentence Transformer Embeddings: Local embedding generation using state-of-the-art models
- Intelligent Chunking: Smart text chunking with overlap for better retrieval
- Concurrent Processing: Multi-threaded document processing for large collections
- RAG Integration: Seamless integration with Swarms Agent framework
- Financial Analysis: Specialized for quantitative trading and financial research
📋 Prerequisites
- Python 3.10+
- Qdrant client (local or cloud)
- Sentence transformers for embeddings
- Swarms framework
🛠️ Installation
- Install dependencies:
  pip install -r requirements.txt
- Set up environment variables (optional, for cloud deployment):
  export QDRANT_URL="your_qdrant_url"
  export QDRANT_API_KEY="your_qdrant_api_key"
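One way the optional variables above could be consumed is sketched below. The helper name `resolve_qdrant_settings` and the fallback to `http://localhost:6333` (Qdrant's default HTTP port) are assumptions made for illustration; the example code may resolve its connection differently.

```python
import os


def resolve_qdrant_settings(env=None):
    """Build connection settings from QDRANT_URL and QDRANT_API_KEY.

    Hypothetical helper: falls back to a local Qdrant instance when
    QDRANT_URL is unset or empty (the fallback URL is an assumption).
    """
    env = os.environ if env is None else env
    url = env.get("QDRANT_URL") or "http://localhost:6333"
    # A local instance typically runs without an API key.
    api_key = env.get("QDRANT_API_KEY") or None
    return {"qdrant_url": url, "qdrant_api_key": api_key}
```

The returned keys mirror the `qdrant_url` and `qdrant_api_key` arguments shown under Cloud Deployment, so the dictionary could be unpacked into the agent constructor if it accepts them as keyword arguments.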
🏗️ Architecture
The example consists of three main components:
1. DocumentProcessor
- Handles file discovery and text extraction
- Supports PDF, TXT, and Markdown formats
- Concurrent processing for large document collections
- Error handling and validation
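The file discovery and concurrent processing responsibilities listed above can be illustrated with a small standard-library sketch. The function names here are hypothetical, PDF extraction is deliberately left out (it needs a PDF library), and the real DocumentProcessor may be organized differently.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

SUPPORTED_EXTENSIONS = (".pdf", ".txt", ".md")


def discover_files(root, extensions=SUPPORTED_EXTENSIONS):
    """Recursively collect files under `root` whose suffix is supported."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )


def read_text_file(path):
    """Read a TXT or Markdown file as UTF-8, replacing undecodable bytes."""
    return path.read_text(encoding="utf-8", errors="replace")


def process_directory(root, max_workers=4):
    """Read all TXT/MD files under `root` using a thread pool.

    PDFs are discovered but skipped here; a real extractor would be
    plugged in for them. Returns a dict mapping file path to text.
    """
    files = [p for p in discover_files(root) if p.suffix.lower() != ".pdf"]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        texts = list(pool.map(read_text_file, files))  # map keeps input order
    return {str(p): text for p, text in zip(files, texts)}
```

Threads suit this workload because reading files is I/O-bound; CPU-heavy extraction steps might benefit from processes instead.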
2. QdrantRAGMemory
- Vector database management with Qdrant
- Intelligent text chunking with overlap
- Semantic search capabilities
- Metadata storage and retrieval
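To make "chunking with overlap" concrete, here is a minimal sketch using a plain fixed-size sliding window over characters. This is an assumption about the general technique, not a copy of QdrantRAGMemory, which may split on sentence or paragraph boundaries instead. The function name `chunk_text` is hypothetical; the defaults follow the documented `chunk_size` and `chunk_overlap` values.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split `text` into character windows of at most `chunk_size`.

    Consecutive chunks share `chunk_overlap` characters so that a
    sentence cut at a boundary still appears whole in one of them.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks
```

For example, `chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)` yields three chunks in which each chunk begins with the last character of the previous one.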
3. QuantitativeTradingRAGAgent
- Combines Swarms Agent with RAG capabilities
- Financial analysis specialization
- Document context enhancement
- Query processing and response generation
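"Document context enhancement" presumably means placing retrieved chunks in front of the task before it reaches the LLM. A hedged sketch of that step follows. The function `build_context_prompt` and the prompt layout are assumptions; the result keys (`document_name`, `similarity_score`, `chunk_text`) are the ones shown in the Querying Documents example.

```python
def build_context_prompt(task, results, max_chars_per_chunk=500):
    """Prepend retrieved excerpts to `task` so the answer can cite them.

    `results` is a list of dicts with the keys 'document_name',
    'similarity_score', and 'chunk_text'. With no results the task is
    returned unchanged.
    """
    if not results:
        return task

    lines = ["Relevant document excerpts:"]
    for index, result in enumerate(results, start=1):
        excerpt = result["chunk_text"][:max_chars_per_chunk]
        lines.append(
            f"[{index}] {result['document_name']} "
            f"(score {result['similarity_score']:.3f}): {excerpt}"
        )
    lines.append("")  # blank line between context and task
    lines.append(f"Task: {task}")
    return "\n".join(lines)
```

Truncating each excerpt keeps the prompt within a predictable size when many chunks are retrieved.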
📖 Usage
Basic Setup
from qdrant_rag_example import QuantitativeTradingRAGAgent
# Initialize the agent
agent = QuantitativeTradingRAGAgent(
    agent_name="Financial-Analysis-Agent",
    collection_name="financial_documents",
    model_name="claude-sonnet-4-20250514"
)
Document Ingestion
# Ingest documents from a directory
documents_path = "./financial_documents"
num_ingested = agent.ingest_documents(documents_path)
print(f"Ingested {num_ingested} documents")
Querying Documents
# Search for relevant information
results = agent.query_documents("gold ETFs investment strategies", limit=5)
for result in results:
    print(f"Document: {result['document_name']}")
    print(f"Relevance: {result['similarity_score']:.3f}")
    print(f"Content: {result['chunk_text'][:200]}...")
Running Analysis
# Run financial analysis with RAG context
task = "What are the top 3 ETFs for gold coverage?"
response = agent.run_analysis(task)
print(response)
📁 Directory Structure
financial_documents/
├── research_papers/
│   ├── gold_etf_analysis.pdf
│   ├── market_research.pdf
│   └── portfolio_strategies.pdf
├── company_reports/
│   ├── annual_reports.txt
│   └── quarterly_updates.md
└── market_data/
    ├── historical_prices.csv
    └── volatility_analysis.txt
⚙️ Configuration Options
Agent Configuration
- agent_name: Name of the agent
- collection_name: Qdrant collection name
- model_name: LLM model to use
- max_loops: Maximum agent execution loops
- chunk_size: Text chunk size (default: 1000)
- chunk_overlap: Overlap between chunks (default: 200)
Document Processing
- supported_extensions: File types to process
- max_workers: Concurrent processing threads
- score_threshold: Similarity search threshold
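The effect of `score_threshold` can be shown with a tiny in-memory search. This sketch assumes cosine similarity as the scoring metric and uses hypothetical helper names; in the example itself the scoring and filtering are delegated to Qdrant.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def search(query_vector, indexed, limit=5, score_threshold=0.0):
    """Return up to `limit` (score, text) pairs scoring at or above the threshold.

    `indexed` is a list of (text, vector) pairs standing in for a
    vector collection.
    """
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in indexed]
    kept = [pair for pair in scored if pair[0] >= score_threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:limit]
```

Raising the threshold trades recall for precision: fewer chunks reach the agent, but each is more likely to be on topic.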
🔍 Advanced Features
Custom Embedding Models
# Use different sentence transformer models
from sentence_transformers import SentenceTransformer
custom_model = SentenceTransformer("all-mpnet-base-v2")
# Update the embedding model in QdrantRAGMemory
Cloud Deployment
# Connect to Qdrant cloud
agent = QuantitativeTradingRAGAgent(
    qdrant_url="https://your-instance.qdrant.io",
    qdrant_api_key="your_api_key"
)
Batch Processing
# Process multiple directories
directories = ["./docs1", "./docs2", "./docs3"]
for directory in directories:
    agent.ingest_documents(directory)
📊 Performance Considerations
- Chunk Size: Larger chunks (1000-2000 chars) for detailed analysis, smaller (500-1000) for precise retrieval
- Overlap: 10-20% overlap between chunks for better context continuity
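- Concurrency: Adjust max_workers based on your system capabilities
- Vector Size: Must match the embedding model's output dimension, e.g. 768 for the sentence-transformers model all-mpnet-base-v2 (384 for all-MiniLM-L6-v2) and 1536 for OpenAI's text-embedding-ada-002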
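The cost of overlap can be estimated with a little arithmetic. Assuming a fixed-size sliding-window chunker (an assumption about the chunking strategy, with a hypothetical function name), each new chunk advances by `chunk_size - chunk_overlap` characters, so the chunk count for a document of a given length is:

```python
def estimate_chunk_count(doc_chars, chunk_size=1000, chunk_overlap=200):
    """Number of sliding-window chunks needed to cover `doc_chars` characters."""
    if doc_chars <= 0:
        return 0
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    # Characters left to cover after the first full chunk.
    remaining = max(0, doc_chars - chunk_size)
    # Ceiling division without floats: -(-a // b) == ceil(a / b).
    return 1 + -(-remaining // step)


# A 100,000-character document with the default settings:
with_overlap = estimate_chunk_count(100_000, 1000, 200)   # 125 chunks
without_overlap = estimate_chunk_count(100_000, 1000, 0)  # 100 chunks
```

Under these assumptions, a 20% overlap produces 25% more chunks, and therefore 25% more vectors to embed and store, which is worth weighing against the retrieval benefit.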
🚨 Error Handling
The system includes comprehensive error handling for:
- File not found errors
- Unsupported file types
- Processing failures
- Network connectivity issues
- Invalid document content
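A sketch of how several of the failure modes above might be separated so that one bad file does not abort a whole ingestion run. The exception classes and function names are hypothetical, and only text formats are read here; the example's own error handling may be structured differently.

```python
from pathlib import Path


class UnsupportedFileTypeError(ValueError):
    """Raised when a file's extension is not in the supported set."""


class InvalidDocumentError(ValueError):
    """Raised when a file contains no usable text."""


def load_document(path, supported_extensions=(".txt", ".md")):
    """Validate and read a single text document."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"No such file: {p}")
    if p.suffix.lower() not in supported_extensions:
        raise UnsupportedFileTypeError(f"Unsupported file type: {p.suffix or '(none)'}")
    text = p.read_text(encoding="utf-8", errors="replace")
    if not text.strip():
        raise InvalidDocumentError(f"Document is empty: {p}")
    return text


def ingest_safely(paths):
    """Load every path, collecting failures instead of stopping at the first."""
    loaded, failures = {}, {}
    for path in paths:
        try:
            loaded[str(path)] = load_document(path)
        except (OSError, ValueError) as exc:
            failures[str(path)] = f"{type(exc).__name__}: {exc}"
    return loaded, failures
```

Returning the failures alongside the successes lets the caller log or retry problem files after the batch completes.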
🔧 Troubleshooting
Common Issues
- Import Errors: Ensure all dependencies are installed
  pip install -r requirements.txt
- Memory Issues: Reduce chunk size or use cloud Qdrant
  agent = QuantitativeTradingRAGAgent(chunk_size=500)
- Processing Failures: Check file permissions and formats
  # Verify supported formats
  processor = DocumentProcessor(supported_extensions=['.pdf', '.txt'])
Performance Optimization
- Use SSD storage for document processing
- Increase max_workers for multi-core systems
- Consider cloud Qdrant for large document collections
- Implement document caching for frequently accessed files
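The caching suggestion could take many forms. One minimal sketch, assuming an in-memory cache that is invalidated when a file's modification time or size changes (the class `DocumentCache` is hypothetical and not part of the example):

```python
from pathlib import Path


class DocumentCache:
    """In-memory cache of file contents, refreshed when the file changes."""

    def __init__(self):
        self._entries = {}  # resolved path -> (fingerprint, text)
        self.hits = 0
        self.misses = 0

    def read(self, path):
        p = Path(path).resolve()
        stat = p.stat()
        fingerprint = (stat.st_mtime_ns, stat.st_size)
        cached = self._entries.get(p)
        if cached is not None and cached[0] == fingerprint:
            self.hits += 1
            return cached[1]
        # Cache miss or stale entry: re-read and replace.
        text = p.read_text(encoding="utf-8", errors="replace")
        self._entries[p] = (fingerprint, text)
        self.misses += 1
        return text
```

This avoids re-reading unchanged files across repeated ingestion runs within one process; a persistent cache or a content hash would be needed to survive restarts or to detect edits that preserve both size and timestamp.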
📈 Use Cases
- Financial Research: Analyze market reports, earnings calls, and research papers
- Legal Document Review: Process contracts, regulations, and case law
- Academic Research: Index research papers and academic literature
- Compliance Monitoring: Track regulatory changes and compliance requirements
- Risk Assessment: Analyze risk reports and market analysis
🤝 Contributing
To extend this example:
- Add support for additional file formats
- Implement custom embedding strategies
- Add document versioning and change tracking
- Integrate with other vector databases
- Add document summarization capabilities
📄 License
This example is part of the Swarms framework and follows the same licensing terms.
🆘 Support
For issues and questions:
- Check the Swarms documentation
- Review the example code and error messages
- Ensure all dependencies are properly installed
- Verify Qdrant connection and configuration