
Qdrant RAG Example with Document Ingestion

This example demonstrates how to combine the agent structure from example.py with Qdrant-backed RAG to ingest large collections of PDF, TXT, and Markdown files for quantitative trading analysis.

🚀 Features

  • Document Ingestion: Process PDF, TXT, and Markdown files automatically
  • Qdrant Vector Database: High-performance vector storage with similarity search
  • Sentence Transformer Embeddings: Local embedding generation using state-of-the-art models
  • Intelligent Chunking: Smart text chunking with overlap for better retrieval
  • Concurrent Processing: Multi-threaded document processing for large collections
  • RAG Integration: Seamless integration with Swarms Agent framework
  • Financial Analysis: Specialized for quantitative trading and financial research

📋 Prerequisites

  • Python 3.10+
  • Qdrant client (local or cloud)
  • Sentence transformers for embeddings
  • Swarms framework

🛠️ Installation

  1. Install dependencies:

    pip install -r requirements.txt
    
  2. Set up environment variables (optional, for cloud deployment):

    export QDRANT_URL="your_qdrant_url"
    export QDRANT_API_KEY="your_qdrant_api_key"
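
On the Python side, these variables could be read and passed to the agent roughly as follows. This is a minimal sketch: the helper name `qdrant_connection_settings` is hypothetical, and the keyword names `qdrant_url` and `qdrant_api_key` are taken from the Cloud Deployment example in this README rather than from a verified signature.

```python
import os


def qdrant_connection_settings(env=os.environ):
    """Return connection kwargs from the environment, or {} for a local setup.

    Hypothetical helper; the kwarg names mirror the Cloud Deployment example.
    """
    url = env.get("QDRANT_URL")
    api_key = env.get("QDRANT_API_KEY")
    if url:
        # Remote or cloud deployment: pass the URL and, if set, the API key.
        return {"qdrant_url": url, "qdrant_api_key": api_key}
    # No URL configured: fall back to whatever local default the agent uses.
    return {}


settings = qdrant_connection_settings()
# Assumed usage: agent = QuantitativeTradingRAGAgent(**settings)
```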
    

🏗️ Architecture

The example consists of three main components:

1. DocumentProcessor

  • Handles file discovery and text extraction
  • Supports PDF, TXT, and Markdown formats
  • Concurrent processing for large document collections
  • Error handling and validation
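
To illustrate the discovery and concurrent-processing steps, here is a standard-library sketch. It is not the actual DocumentProcessor code: the function names are hypothetical, and PDF extraction is left out because it needs a third-party parser.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# PDF is omitted in this sketch: extracting its text needs a third-party parser.
SUPPORTED_EXTENSIONS = {".txt", ".md"}


def discover_files(root, extensions=SUPPORTED_EXTENSIONS):
    """Recursively collect files whose suffix is in `extensions`."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )


def read_text(path):
    """Return (path, text), or (path, None) if the file cannot be read."""
    try:
        return path, path.read_text(encoding="utf-8")
    except (OSError, UnicodeDecodeError):
        return path, None


def process_directory(root, max_workers=4):
    """Read all supported files concurrently, skipping failures and blank files."""
    files = discover_files(root)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(read_text, files))
    return {str(p): text for p, text in results if text and text.strip()}
```

Threads suit this workload because reading files is I/O-bound; CPU-heavy parsing might benefit from processes instead.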

2. QdrantRAGMemory

  • Vector database management with Qdrant
  • Intelligent text chunking with overlap
  • Semantic search capabilities
  • Metadata storage and retrieval
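
Chunking with overlap can be pictured as a sliding window over the text. The sketch below is a simple character-based version under that assumption; the chunking in QdrantRAGMemory may differ (for example, by splitting on sentence boundaries), and `chunk_text` is a hypothetical name.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split `text` into windows of `chunk_size` chars that share `chunk_overlap` chars."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    stride = chunk_size - chunk_overlap  # how far the window advances each step
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += stride
    return chunks
```

With the defaults, each chunk repeats the last 200 characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk as long as it is shorter than the overlap.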

3. QuantitativeTradingRAGAgent

  • Combines Swarms Agent with RAG capabilities
  • Financial analysis specialization
  • Document context enhancement
  • Query processing and response generation
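
One way to picture "document context enhancement" is prepending the retrieved chunks to the task before it reaches the LLM. The sketch below assumes that approach; `build_rag_prompt` is a hypothetical helper, and the result keys (`document_name`, `similarity_score`, `chunk_text`) are the ones shown in the Querying Documents example.

```python
def build_rag_prompt(task, results, max_chars_per_chunk=500):
    """Prefix `task` with numbered excerpts from retrieved chunks (hypothetical helper)."""
    if not results:
        return task  # nothing retrieved: send the task unchanged

    sections = []
    for i, r in enumerate(results, start=1):
        snippet = r["chunk_text"][:max_chars_per_chunk]
        header = f"[{i}] {r['document_name']} (score {r['similarity_score']:.3f})"
        sections.append(f"{header}\n{snippet}")

    context = "\n\n".join(sections)
    return f"Context from ingested documents:\n{context}\n\nTask: {task}"
```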

📖 Usage

Basic Setup

from qdrant_rag_example import QuantitativeTradingRAGAgent

# Initialize the agent
agent = QuantitativeTradingRAGAgent(
    agent_name="Financial-Analysis-Agent",
    collection_name="financial_documents",
    model_name="claude-sonnet-4-20250514"
)

Document Ingestion

# Ingest documents from a directory
documents_path = "./financial_documents"
num_ingested = agent.ingest_documents(documents_path)
print(f"Ingested {num_ingested} documents")

Querying Documents

# Search for relevant information
results = agent.query_documents("gold ETFs investment strategies", limit=5)
for result in results:
    print(f"Document: {result['document_name']}")
    print(f"Relevance: {result['similarity_score']:.3f}")
    print(f"Content: {result['chunk_text'][:200]}...")

Running Analysis

# Run financial analysis with RAG context
task = "What are the top 3 ETFs for gold coverage?"
response = agent.run_analysis(task)
print(response)

📁 Directory Structure

financial_documents/
├── research_papers/
│   ├── gold_etf_analysis.pdf
│   ├── market_research.pdf
│   └── portfolio_strategies.pdf
├── company_reports/
│   ├── annual_reports.txt
│   └── quarterly_updates.md
└── market_data/
    ├── historical_prices.csv   (not ingested unless .csv is added to supported_extensions)
    └── volatility_analysis.txt

⚙️ Configuration Options

Agent Configuration

  • agent_name: Name of the agent
  • collection_name: Qdrant collection name
  • model_name: LLM model to use
  • max_loops: Maximum agent execution loops
  • chunk_size: Text chunk size (default: 1000)
  • chunk_overlap: Overlap between chunks (default: 200)

Document Processing

  • supported_extensions: File types to process
  • max_workers: Concurrent processing threads
  • score_threshold: Similarity search threshold
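
The effect of `score_threshold` together with a result `limit` can be illustrated in pure Python. Qdrant applies the threshold itself during search; this sketch only mimics the semantics for cosine similarity, and the function names and the 0.3 default are illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, indexed, limit=5, score_threshold=0.3):
    """Return up to `limit` (score, name) pairs scoring at least `score_threshold`."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in indexed]
    kept = [(score, name) for score, name in scored if score >= score_threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)  # best match first
    return kept[:limit]
```

Raising the threshold trades recall for precision: fewer, more relevant chunks reach the agent.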

🔍 Advanced Features

Custom Embedding Models

# Use different sentence transformer models
from sentence_transformers import SentenceTransformer

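custom_model = SentenceTransformer("all-mpnet-base-v2")  # produces 768-dimensional embeddings
# Update the embedding model in QdrantRAGMemory; the collection's vector size must match the model's output dimension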

Cloud Deployment

# Connect to Qdrant cloud
agent = QuantitativeTradingRAGAgent(
    qdrant_url="https://your-instance.qdrant.io",
    qdrant_api_key="your_api_key"
)

Batch Processing

# Process multiple directories
directories = ["./docs1", "./docs2", "./docs3"]
for directory in directories:
    agent.ingest_documents(directory)

📊 Performance Considerations

  • Chunk Size: Larger chunks (1000-2000 chars) for detailed analysis, smaller (500-1000) for precise retrieval
  • Overlap: 10-20% overlap between chunks for better context continuity
  • Concurrency: Adjust max_workers based on your system capabilities
  • Vector Size: Must match the embedding model's output dimension, e.g. 384 for all-MiniLM-L6-v2, 768 for all-mpnet-base-v2, 1536 for OpenAI text-embedding-3-small
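
To gauge how chunk size and overlap affect index size, the number of chunks per document can be estimated up front. The sketch assumes a plain sliding window that advances by `chunk_size - chunk_overlap` characters; the helper names are hypothetical.

```python
import math


def estimate_chunks(text_length, chunk_size=1000, chunk_overlap=200):
    """Estimate the chunk count for a sliding window over `text_length` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    if text_length <= 0:
        return 0
    if text_length <= chunk_size:
        return 1
    stride = chunk_size - chunk_overlap
    # One full first window, then one more window per stride still needed.
    return 1 + math.ceil((text_length - chunk_size) / stride)


def overlap_ratio(chunk_size=1000, chunk_overlap=200):
    """Fraction of each chunk shared with its predecessor (0.2 with the defaults)."""
    return chunk_overlap / chunk_size
```

The defaults (1000 and 200) give a 20% overlap, at the upper end of the range suggested above.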

🚨 Error Handling

The system includes comprehensive error handling for:

  • File not found errors
  • Unsupported file types
  • Processing failures
  • Network connectivity issues
  • Invalid document content
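
The sketch below shows how these failure categories might be separated so that one bad file does not abort a whole ingestion run. It is an illustration only: `validate_document` and `ingest_safely` are hypothetical names, and `ingest_one` stands in for whatever callable performs the real ingestion.

```python
from pathlib import Path


def validate_document(path, supported_extensions=(".pdf", ".txt", ".md")):
    """Raise a descriptive error if `path` is missing, unsupported, or empty."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"No such file: {p}")
    if p.suffix.lower() not in supported_extensions:
        raise ValueError(f"Unsupported file type: {p.suffix or '(none)'}")
    if p.stat().st_size == 0:
        raise ValueError(f"Empty document: {p}")
    return p


def ingest_safely(paths, ingest_one):
    """Ingest each path, collecting failures instead of stopping at the first one."""
    ingested, failures = [], {}
    for path in paths:
        try:
            ingest_one(validate_document(path))
            ingested.append(str(path))
        except (OSError, ValueError) as exc:
            # OSError also covers FileNotFoundError and ConnectionError.
            failures[str(path)] = f"{type(exc).__name__}: {exc}"
    return ingested, failures
```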

🔧 Troubleshooting

Common Issues

  1. Import Errors: Ensure all dependencies are installed

    pip install -r requirements.txt
    
  2. Memory Issues: Reduce chunk size or use cloud Qdrant

    agent = QuantitativeTradingRAGAgent(chunk_size=500)
    
  3. Processing Failures: Check file permissions and formats

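    # Verify supported formats
    from qdrant_rag_example import DocumentProcessor  # assumes the class is importable from the example module
    processor = DocumentProcessor(supported_extensions=['.pdf', '.txt'])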
    

Performance Optimization

  • Use SSD storage for document processing
  • Increase max_workers for multi-core systems
  • Consider cloud Qdrant for large document collections
  • Implement document caching for frequently accessed files
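
One possible reading of the caching suggestion is to skip files whose content has not changed since the last ingestion. The sketch below keeps a JSON manifest of SHA-256 digests for that purpose; the manifest format and all function names are assumptions, not part of the example code.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in 64 KiB blocks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def files_needing_ingestion(paths, manifest_path):
    """Return (changed_paths, manifest) by comparing digests with the manifest."""
    manifest_file = Path(manifest_path)
    seen = json.loads(manifest_file.read_text()) if manifest_file.exists() else {}
    changed = [str(p) for p in paths if seen.get(str(p)) != file_digest(p)]
    return changed, seen


def record_ingested(paths, seen, manifest_path):
    """Store current digests for `paths` so unchanged files are skipped next time."""
    for p in paths:
        seen[str(p)] = file_digest(p)
    Path(manifest_path).write_text(json.dumps(seen, indent=2))
```

Call `files_needing_ingestion` before ingesting and `record_ingested` after a successful run; a file whose content changes gets a new digest and is picked up again.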

📈 Use Cases

  • Financial Research: Analyze market reports, earnings calls, and research papers
  • Legal Document Review: Process contracts, regulations, and case law
  • Academic Research: Index research papers and academic literature
  • Compliance Monitoring: Track regulatory changes and compliance requirements
  • Risk Assessment: Analyze risk reports and market analysis

🤝 Contributing

To extend this example:

  1. Add support for additional file formats
  2. Implement custom embedding strategies
  3. Add document versioning and change tracking
  4. Integrate with other vector databases
  5. Add document summarization capabilities

📄 License

This example is part of the Swarms framework and follows the same licensing terms.

🆘 Support

For issues and questions:

  • Check the Swarms documentation
  • Review the example code and error messages
  • Ensure all dependencies are properly installed
  • Verify Qdrant connection and configuration