FAISS RAG Integration with Swarms

Overview

FAISS (Facebook AI Similarity Search) is a library developed by Meta for efficient similarity search and clustering of dense vectors. It provides highly optimized algorithms for large-scale vector operations and is well suited to production RAG applications that require high performance and scalability. Its approximate index types (such as IVF and HNSW) can search collections of millions to billions of vectors in sub-linear time, while its exact Flat index compares the query against every stored vector.

Key Features

High Performance: Optimized C++ implementation with Python bindings
Multiple Index Types: Support for various indexing algorithms (Flat, IVF, HNSW, PQ)
GPU Acceleration: Optional GPU support for substantially faster indexing and search
Memory Efficiency: Compressed indexing options for large datasets
Exact and Approximate Search: Configurable trade-offs between speed and accuracy
Batch Operations: Efficient batch search and update operations
Clustering Support: Built-in K-means clustering algorithms

Architecture

FAISS integrates with Swarms agents as a high-performance vector store for RAG operations:

[Agent] -> [FAISS Memory] -> [Vector Index] -> [Similarity Search] -> [Retrieved Context]

The system stores document embeddings in optimized FAISS indices and performs fast similarity searches to supply relevant context to the agent.

Setup & Configuration

Installation

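pip install faiss-cpu  # CPU version
# OR
pip install faiss-gpu  # GPU version (requires CUDA; FAISS officially recommends conda for GPU builds)
pip install swarms
pip install swarms-memory  # provides the FAISSDB class used in the code example
pip install litellm
pip install numpy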

Environment Variables

# OpenAI API key for LLM (if using OpenAI models)
export OPENAI_API_KEY="your-openai-api-key"

# Optional: Configure GPU usage
export CUDA_VISIBLE_DEVICES="0"

Dependencies

faiss-cpu>=1.7.0 or faiss-gpu>=1.7.0
swarms
swarms-memory
litellm
numpy
pickle (for persistence; part of the Python standard library, no installation needed)

Code Example

"""
Agent with FAISS RAG (Retrieval-Augmented Generation)

This example demonstrates using FAISS as a vector database for RAG operations,
allowing agents to store and retrieve documents for enhanced context.
"""

from swarms import Agent
from swarms_memory import FAISSDB

# Initialize FAISS wrapper for RAG operations
rag_db = FAISSDB(
    embedding_model="text-embedding-3-small",
    metric="cosine",
    index_file="knowledge_base.faiss"
)

# Add documents to the knowledge base
documents = [
    "FAISS is a library for efficient similarity search and clustering of dense vectors.",
    "RAG combines retrieval and generation for more accurate AI responses.",
    "Vector embeddings enable semantic search across documents.",
    "The swarms framework supports multiple memory backends including FAISS.",
    "Swarms is the first and most reliable multi-agent production-grade framework.",
    "Kye Gomez is Founder and CEO of Swarms."
]

# Add documents individually
for doc in documents:
    rag_db.add(doc)

# Create agent with RAG capabilities
agent = Agent(
    agent_name="RAG-Agent",
    agent_description="Swarms Agent with FAISS-powered RAG for enhanced knowledge retrieval",
    model_name="gpt-4o",
    max_loops=1,
    dynamic_temperature_enabled=True,
    long_term_memory=rag_db
)

# Query with RAG
response = agent.run("What is FAISS and how does it relate to RAG? Who is the founder of Swarms?")
print(response)

Use Cases

1. Large-Scale Document Search

Scenario: Searching through millions of documents or papers
Benefits: Sub-linear search time, memory efficiency
Best For: Academic databases, legal document search, news archives

2. Real-time Recommendation Systems

Scenario: Product or content recommendations with low latency requirements
Benefits: Ultra-fast query response, batch processing support
Best For: E-commerce, streaming platforms, social media

3. High-Performance RAG Applications

Scenario: Production RAG systems requiring fast response times
Benefits: Optimized C++ implementation, GPU acceleration
Best For: Customer support bots, technical documentation systems

4. Scientific Research Tools

Scenario: Similarity search in scientific datasets or embeddings
Benefits: Clustering support, exact and approximate search options
Best For: Bioinformatics, materials science, computer vision research

Performance Characteristics

Index Types Performance Comparison

Index Type	Search Speed	Memory Usage	Typical Recall (parameter-dependent)	Best Use Case
Flat	Fast on small sets (exhaustive scan)	High	100% (exact)	Small datasets (< 1M vectors)
IVF	Very Fast	Medium	95-99%	Large datasets (1M-100M vectors)
HNSW	Ultra Fast	Medium-High	95-98%	Real-time applications
PQ	Fast	Low	90-95%	Memory-constrained environments

Scaling Characteristics

Small Scale (< 1M vectors): Use Flat index for exact search
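Medium Scale (1M - 10M vectors): Use IVF with nlist (the number of clusters) scaled to the dataset size; a common rule of thumb is on the order of sqrt(N) clusters for N vectors, with nprobe tuned for the recall you need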
Large Scale (10M - 1B vectors): Use IVF with PQ compression
Ultra Large Scale (> 1B vectors): Use sharded indices across multiple machines

Performance Optimization

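# GPU acceleration (if available; requires a GPU build of FAISS)
import faiss
import numpy as np

cpu_index = faiss.read_index("index.faiss")  # illustrative path to a previously saved index
index = cpu_index
if faiss.get_num_gpus() > 0:
    res = faiss.StandardGpuResources()  # keep alive for as long as the GPU index is used
    index = faiss.index_cpu_to_gpu(res, 0, cpu_index)

# Batch search for better throughput: one call with an (n_queries, d) float32 array
queries = np.random.rand(32, cpu_index.d).astype("float32")  # stand-in for query embeddings
distances, ids = index.search(queries, 10)

# Memory mapping for very large indices (support depends on the index type)
mmap_index = faiss.read_index("large_index.faiss", faiss.IO_FLAG_MMAP)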

Cloud vs Local Deployment

Local Deployment

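# Local FAISS with persistence (native FAISS calls)
import faiss

index = faiss.IndexFlatL2(1536)  # dimension must match the embedding model (1536 for text-embedding-3-small)
faiss.write_index(index, "./local_faiss_index.faiss")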

Advantages:

No network latency
Full control over hardware
Cost-effective for development
Easy debugging and profiling

Disadvantages:

Limited by single machine resources
Manual scaling required
No built-in redundancy

Cloud Deployment

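# Cloud deployment: persist the index to object storage (Amazon S3 shown)
import boto3
import faiss

index = faiss.IndexFlatL2(1536)  # in practice, the index your application has populated
s3 = boto3.client("s3")

# Write the index to local disk, then upload it to cloud storage
faiss.write_index(index, "/tmp/faiss_index.index")
s3.upload_file("/tmp/faiss_index.index", "your-bucket-name", "indices/faiss_index.index")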