swarms/examples/single_agent/rag/README.md

# Qdrant RAG Example with Document Ingestion

This example demonstrates how to use the agent structure from `example.py` with Qdrant RAG to ingest a vast array of PDF documents and text files for advanced quantitative trading analysis.

## 🚀 Features

- **Document Ingestion**: Process PDF, TXT, and Markdown files automatically
- **Qdrant Vector Database**: High-performance vector storage with similarity search
- **Sentence Transformer Embeddings**: Local embedding generation using state-of-the-art models
- **Intelligent Chunking**: Smart text chunking with overlap for better retrieval
- **Concurrent Processing**: Multi-threaded document processing for large collections
- **RAG Integration**: Seamless integration with Swarms Agent framework
- **Financial Analysis**: Specialized for quantitative trading and financial research

## 📋 Prerequisites

- Python 3.10+
- Qdrant client (local or cloud)
- Sentence transformers for embeddings
- Swarms framework

## 🛠️ Installation

1. **Install dependencies**:
   ```bash
   pip install -r requirements.txt
   ```

2. **Set up environment variables** (optional, for cloud deployment):
   ```bash
   export QDRANT_URL="your_qdrant_url"
   export QDRANT_API_KEY="your_qdrant_api_key"
   ```

## 🏗️ Architecture

The example consists of three main components:

### 1. DocumentProcessor
- Handles file discovery and text extraction
- Supports PDF, TXT, and Markdown formats
- Concurrent processing for large document collections
- Error handling and validation

### 2. QdrantRAGMemory
- Vector database management with Qdrant
- Intelligent text chunking with overlap
- Semantic search capabilities
- Metadata storage and retrieval

### 3. QuantitativeTradingRAGAgent
- Combines Swarms Agent with RAG capabilities
- Financial analysis specialization
- Document context enhancement
- Query processing and response generation

## 📖 Usage

### Basic Setup

```python
from qdrant_rag_example import QuantitativeTradingRAGAgent

# Initialize the agent
agent = QuantitativeTradingRAGAgent(
    agent_name="Financial-Analysis-Agent",
    collection_name="financial_documents",
    model_name="claude-sonnet-4-20250514"
)
```

### Document Ingestion

```python
# Ingest documents from a directory
documents_path = "./financial_documents"
num_ingested = agent.ingest_documents(documents_path)
print(f"Ingested {num_ingested} documents")
```

### Querying Documents

```python
# Search for relevant information
results = agent.query_documents("gold ETFs investment strategies", limit=5)
for result in results:
    print(f"Document: {result['document_name']}")
    print(f"Relevance: {result['similarity_score']:.3f}")
    print(f"Content: {result['chunk_text'][:200]}...")
```

### Running Analysis

```python
# Run financial analysis with RAG context
task = "What are the best top 3 ETFs for gold coverage?"
response = agent.run_analysis(task)
print(response)
```

## 📁 Directory Structure

```
financial_documents/
├── research_papers/
│   ├── gold_etf_analysis.pdf
│   ├── market_research.pdf
│   └── portfolio_strategies.pdf
├── company_reports/
│   ├── annual_reports.txt
│   └── quarterly_updates.md
└── market_data/
    ├── historical_prices.csv
    └── volatility_analysis.txt
```

## ⚙️ Configuration Options

### Agent Configuration
- `agent_name`: Name of the agent
- `collection_name`: Qdrant collection name
- `model_name`: LLM model to use
- `max_loops`: Maximum agent execution loops
- `chunk_size`: Text chunk size (default: 1000)
- `chunk_overlap`: Overlap between chunks (default: 200)

### Document Processing
- `supported_extensions`: File types to process
- `max_workers`: Concurrent processing threads
- `score_threshold`: Similarity search threshold

## 🔍 Advanced Features

### Custom Embedding Models
```python
# Use different sentence transformer models
from sentence_transformers import SentenceTransformer

custom_model = SentenceTransformer("all-mpnet-base-v2")
# Update the embedding model in QdrantRAGMemory
```

### Cloud Deployment
```python
# Connect to Qdrant cloud
agent = QuantitativeTradingRAGAgent(
    qdrant_url="https://your-instance.qdrant.io",
    qdrant_api_key="your_api_key"
)
```

### Batch Processing
```python
# Process multiple directories
directories = ["./docs1", "./docs2", "./docs3"]
for directory in directories:
    agent.ingest_documents(directory)
```

## 📊 Performance Considerations

- **Chunk Size**: Larger chunks (1000-2000 chars) for detailed analysis, smaller (500-1000) for precise retrieval
- **Overlap**: 10-20% overlap between chunks for better context continuity
- **Concurrency**: Adjust `max_workers` based on your system capabilities
- **Vector Size**: 768 dimensions for sentence-transformers, 1536 for OpenAI embeddings

## 🚨 Error Handling

The system includes comprehensive error handling for:
- File not found errors
- Unsupported file types
- Processing failures
- Network connectivity issues
- Invalid document content

## 🔧 Troubleshooting

### Common Issues

1. **Import Errors**: Ensure all dependencies are installed
   ```bash
   pip install -r requirements.txt
   ```

2. **Memory Issues**: Reduce chunk size or use cloud Qdrant
   ```python
   agent = QuantitativeTradingRAGAgent(chunk_size=500)
   ```

3. **Processing Failures**: Check file permissions and formats
   ```python
   # Verify supported formats
   processor = DocumentProcessor(supported_extensions=['.pdf', '.txt'])
   ```

### Performance Optimization

- Use SSD storage for document processing
- Increase `max_workers` for multi-core systems
- Consider cloud Qdrant for large document collections
- Implement document caching for frequently accessed files

## 📈 Use Cases

- **Financial Research**: Analyze market reports, earnings calls, and research papers
- **Legal Document Review**: Process contracts, regulations, and case law
- **Academic Research**: Index research papers and academic literature
- **Compliance Monitoring**: Track regulatory changes and compliance requirements
- **Risk Assessment**: Analyze risk reports and market analysis

## 🤝 Contributing

To extend this example:
1. Add support for additional file formats
2. Implement custom embedding strategies
3. Add document versioning and change tracking
4. Integrate with other vector databases
5. Add document summarization capabilities

## 📄 License

This example is part of the Swarms framework and follows the same licensing terms.

## 🆘 Support

For issues and questions:
- Check the Swarms documentation
- Review the example code and error messages
- Ensure all dependencies are properly installed
- Verify Qdrant connection and configuration