Qdrant RAG Example with Document Ingestion
This example demonstrates how to use the agent structure from example.py
with Qdrant RAG to ingest large collections of PDF documents and text files for quantitative trading analysis.
🚀 Features
- Document Ingestion: Process PDF, TXT, and Markdown files automatically
- Qdrant Vector Database: High-performance vector storage with similarity search
- Sentence Transformer Embeddings: Local embedding generation using state-of-the-art models
- Intelligent Chunking: Smart text chunking with overlap for better retrieval
- Concurrent Processing: Multi-threaded document processing for large collections
- RAG Integration: Seamless integration with Swarms Agent framework
- Financial Analysis: Specialized for quantitative trading and financial research
📋 Prerequisites
- Python 3.10+
- Qdrant client (local or cloud)
- Sentence transformers for embeddings
- Swarms framework
🛠️ Installation
- Install dependencies:
pip install -r requirements.txt
- Set up environment variables (optional, for cloud deployment):
export QDRANT_URL="your_qdrant_url"
export QDRANT_API_KEY="your_qdrant_api_key"
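As a minimal sketch of how these variables might be consumed, the snippet below reads them from the environment and falls back to a local instance when they are unset. The helper name `load_qdrant_settings` is hypothetical and not part of this example's code; the fallback URL assumes Qdrant's default HTTP port, 6333.

```python
import os


def load_qdrant_settings(env=None):
    """Return (url, api_key), defaulting to a local Qdrant instance.

    `env` is any mapping; it defaults to os.environ. This helper is
    illustrative only and is not defined in qdrant_rag_example.py.
    """
    env = os.environ if env is None else env
    # Empty strings are treated the same as unset variables.
    url = env.get("QDRANT_URL") or "http://localhost:6333"
    api_key = env.get("QDRANT_API_KEY") or None
    return url, api_key
```

The returned pair could then be passed as `qdrant_url` and `qdrant_api_key` when constructing the agent for a cloud deployment.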
🏗️ Architecture
The example consists of three main components:
1. DocumentProcessor
- Handles file discovery and text extraction
- Supports PDF, TXT, and Markdown formats
- Concurrent processing for large document collections
- Error handling and validation
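The responsibilities above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the actual DocumentProcessor: the function names are hypothetical, and PDF extraction is left out because it needs a third-party parser.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Assumption: only plain-text formats are handled in this sketch.
SUPPORTED_EXTENSIONS = {".txt", ".md"}


def discover_files(root, extensions=SUPPORTED_EXTENSIONS):
    """Recursively list files under root whose suffix is supported."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )


def read_text_file(path):
    """Return (path, text), or (path, None) if the file cannot be read."""
    try:
        return path, path.read_text(encoding="utf-8")
    except (OSError, UnicodeDecodeError):
        return path, None


def load_documents(root, max_workers=4):
    """Read supported files concurrently, skipping unreadable or blank ones."""
    root = Path(root)
    files = discover_files(root)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(read_text_file, files))
    # Key by path relative to root so equal file names in different
    # subdirectories do not overwrite each other.
    return {
        path.relative_to(root).as_posix(): text
        for path, text in results
        if text and text.strip()
    }
```

Threads suit this step because reading files is I/O-bound; CPU-heavy extraction may benefit from processes instead.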
2. QdrantRAGMemory
- Vector database management with Qdrant
- Intelligent text chunking with overlap
- Semantic search capabilities
- Metadata storage and retrieval
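To make chunking with overlap concrete, here is a minimal character-based sliding window. It is a sketch only: QdrantRAGMemory may split on sentence or paragraph boundaries instead, and the function name `chunk_text` is an assumption. The defaults mirror the documented `chunk_size` and `chunk_overlap` values.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters, so a sentence cut
    at one chunk boundary still appears whole in the neighbouring chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks
```

Each chunk would then be embedded and stored in Qdrant together with metadata such as the source document name.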
3. QuantitativeTradingRAGAgent
- Combines Swarms Agent with RAG capabilities
- Financial analysis specialization
- Document context enhancement
- Query processing and response generation
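One way to picture document context enhancement is to prepend retrieved chunks to the task before it reaches the LLM. The sketch below assumes results shaped like those returned by `query_documents` (keys `document_name`, `similarity_score`, `chunk_text`); the function name `build_rag_prompt` and the prompt wording are hypothetical.

```python
def build_rag_prompt(task, results, max_chars=500):
    """Prepend retrieved chunks to the task so the answer can cite them."""
    if not results:
        return task  # nothing retrieved: pass the task through unchanged
    context_lines = []
    for i, r in enumerate(results, start=1):
        snippet = r["chunk_text"][:max_chars]  # cap each chunk's length
        context_lines.append(
            f"[{i}] {r['document_name']} "
            f"(score {r['similarity_score']:.3f}): {snippet}"
        )
    context = "\n".join(context_lines)
    return f"Context from ingested documents:\n{context}\n\nTask: {task}"
```

Numbering the snippets lets the model refer back to a specific source, and capping snippet length keeps the prompt within the model's context budget.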
📖 Usage
Basic Setup
from qdrant_rag_example import QuantitativeTradingRAGAgent
# Initialize the agent
agent = QuantitativeTradingRAGAgent(
    agent_name="Financial-Analysis-Agent",
    collection_name="financial_documents",
    model_name="claude-sonnet-4-20250514"
)
Document Ingestion
# Ingest documents from a directory
documents_path = "./financial_documents"
num_ingested = agent.ingest_documents(documents_path)
print(f"Ingested {num_ingested} documents")
Querying Documents
# Search for relevant information
results = agent.query_documents("gold ETFs investment strategies", limit=5)
for result in results:
    print(f"Document: {result['document_name']}")
    print(f"Relevance: {result['similarity_score']:.3f}")
    print(f"Content: {result['chunk_text'][:200]}...")
Running Analysis
# Run financial analysis with RAG context
task = "What are the top 3 ETFs for gold coverage?"
response = agent.run_analysis(task)
print(response)
📁 Directory Structure
financial_documents/
├── research_papers/
│   ├── gold_etf_analysis.pdf
│   ├── market_research.pdf
│   └── portfolio_strategies.pdf
├── company_reports/
│   ├── annual_reports.txt
│   └── quarterly_updates.md
└── market_data/
    ├── historical_prices.csv
    └── volatility_analysis.txt
⚙️ Configuration Options
Agent Configuration
- agent_name: Name of the agent
- collection_name: Qdrant collection name
- model_name: LLM model to use
- max_loops: Maximum agent execution loops
- chunk_size: Text chunk size (default: 1000)
- chunk_overlap: Overlap between chunks (default: 200)
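These options could be grouped and validated before an agent is built. The dataclass below is a hypothetical container, not a class from this example; the `max_loops` default of 1 is an assumption, since no default is documented for it.

```python
from dataclasses import dataclass


@dataclass
class RAGAgentConfig:
    """Hypothetical container for the agent options listed above."""

    agent_name: str = "Financial-Analysis-Agent"
    collection_name: str = "financial_documents"
    model_name: str = "claude-sonnet-4-20250514"
    max_loops: int = 1  # assumed default; not stated in this README
    chunk_size: int = 1000
    chunk_overlap: int = 200

    def __post_init__(self):
        # Fail early on combinations that cannot produce valid chunks.
        if self.max_loops < 1:
            raise ValueError("max_loops must be at least 1")
        if self.chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("chunk_overlap must be in [0, chunk_size)")
```

Validating at construction time surfaces a bad `chunk_overlap` before any documents are ingested.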
Document Processing
- supported_extensions: File types to process
- max_workers: Concurrent processing threads
- score_threshold: Similarity search threshold
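To illustrate what a similarity threshold does, the sketch below filters results client-side. The function name `filter_by_score` and the 0.7 default are assumptions; the example's own code may apply the threshold inside the Qdrant query instead.

```python
def filter_by_score(results, score_threshold=0.7, limit=5):
    """Keep results at or above the threshold, best first, at most `limit`."""
    kept = [r for r in results if r["similarity_score"] >= score_threshold]
    kept.sort(key=lambda r: r["similarity_score"], reverse=True)
    return kept[:limit]
```

A higher threshold returns fewer but more relevant chunks; a lower one trades precision for recall.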
🔍 Advanced Features
Custom Embedding Models
# Use different sentence transformer models
from sentence_transformers import SentenceTransformer
custom_model = SentenceTransformer("all-mpnet-base-v2")
# Update the embedding model in QdrantRAGMemory
Cloud Deployment
# Connect to Qdrant cloud
agent = QuantitativeTradingRAGAgent(
    qdrant_url="https://your-instance.qdrant.io",
    qdrant_api_key="your_api_key"
)
Batch Processing
# Process multiple directories
directories = ["./docs1", "./docs2", "./docs3"]
for directory in directories:
    agent.ingest_documents(directory)
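When processing many directories, it can help to keep going after one of them fails. The helper below is a hypothetical sketch: it accepts any callable that takes a directory and returns a count, which matches how `agent.ingest_documents` is used above.

```python
def ingest_many(ingest, directories):
    """Call ingest(directory) for each directory.

    Returns (counts, failures): documents ingested per directory, and the
    error message for each directory that raised.
    """
    counts, failures = {}, {}
    for directory in directories:
        try:
            counts[directory] = ingest(directory)
        except Exception as exc:
            # Broad on purpose: one bad directory should not stop the batch.
            failures[directory] = str(exc)
    return counts, failures
```

Usage would look like `counts, failures = ingest_many(agent.ingest_documents, directories)`, after which `failures` can be logged or retried.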
📊 Performance Considerations
- Chunk Size: Larger chunks (1000-2000 chars) for detailed analysis, smaller (500-1000) for precise retrieval
- Overlap: 10-20% overlap between chunks for better context continuity
- Concurrency: Adjust max_workers based on your system capabilities
- Vector Size: Must match the embedding model's output dimension, for example 768 for all-mpnet-base-v2 and 1536 for OpenAI's text-embedding-ada-002
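The trade-off between chunk size and overlap can be estimated with simple arithmetic. Assuming a fixed-step sliding window over characters (an assumption about the chunker, not a confirmed detail), a text of n characters yields 1 + ceil((n - chunk_size) / (chunk_size - chunk_overlap)) chunks once n exceeds chunk_size. The function name is hypothetical.

```python
import math


def estimate_num_chunks(num_chars, chunk_size=1000, chunk_overlap=200):
    """Estimate how many chunks a sliding window produces for num_chars."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    if num_chars <= 0:
        return 0
    if num_chars <= chunk_size:
        return 1
    step = chunk_size - chunk_overlap  # new characters covered per chunk
    return 1 + math.ceil((num_chars - chunk_size) / step)
```

For a 10,000-character document, the defaults (1000 characters, 200 overlap) give 13 chunks, while zero overlap gives 10, so a 20% overlap increases the number of vectors to embed and store by roughly a quarter.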
🚨 Error Handling
The system includes comprehensive error handling for:
- File not found errors
- Unsupported file types
- Processing failures
- Network connectivity issues
- Invalid document content
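A few of these checks can be sketched as a validation step that runs before text extraction. The function and exception names below are hypothetical and only illustrate the categories listed; they are not the example's actual error handling.

```python
from pathlib import Path


class UnsupportedFileTypeError(ValueError):
    """Raised when a file's extension is not in the supported set."""


def validate_document(path, supported_extensions=(".pdf", ".txt", ".md")):
    """Return the path as a Path if it looks ingestible, else raise."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"No such file: {p}")
    if p.suffix.lower() not in supported_extensions:
        raise UnsupportedFileTypeError(f"Unsupported file type: {p.suffix}")
    if p.stat().st_size == 0:
        raise ValueError(f"Empty document: {p}")
    return p
```

Raising distinct exception types lets the caller decide whether to skip the file, log it, or abort the run.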
🔧 Troubleshooting
Common Issues
- Import Errors: Ensure all dependencies are installed
pip install -r requirements.txt
- Memory Issues: Reduce chunk size or use cloud Qdrant
agent = QuantitativeTradingRAGAgent(chunk_size=500)
- Processing Failures: Check file permissions and formats
# Verify supported formats
processor = DocumentProcessor(supported_extensions=['.pdf', '.txt'])
Performance Optimization
- Use SSD storage for document processing
- Increase max_workers for multi-core systems
- Consider cloud Qdrant for large document collections
- Implement document caching for frequently accessed files
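One simple form of caching is to skip files whose content has already been processed. The sketch below hashes file contents with SHA-256 and tracks the digests in a set; the function names are hypothetical, and persisting the set between runs is left out.

```python
import hashlib


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in 64 KiB blocks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def files_to_process(paths, seen_digests):
    """Return paths with unseen content, recording their digests in the set."""
    pending = []
    for path in paths:
        digest = file_digest(path)
        if digest not in seen_digests:
            seen_digests.add(digest)
            pending.append(path)
    return pending
```

Hashing content rather than file names means renamed or duplicated files are not embedded twice.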
📈 Use Cases
- Financial Research: Analyze market reports, earnings calls, and research papers
- Legal Document Review: Process contracts, regulations, and case law
- Academic Research: Index research papers and academic literature
- Compliance Monitoring: Track regulatory changes and compliance requirements
- Risk Assessment: Analyze risk reports and market analysis
🤝 Contributing
To extend this example:
- Add support for additional file formats
- Implement custom embedding strategies
- Add document versioning and change tracking
- Integrate with other vector databases
- Add document summarization capabilities
📄 License
This example is part of the Swarms framework and follows the same licensing terms.
🆘 Support
For issues and questions:
- Check the Swarms documentation
- Review the example code and error messages
- Ensure all dependencies are properly installed
- Verify Qdrant connection and configuration