# FAISS RAG Integration with Swarms ## Overview FAISS (Facebook AI Similarity Search) is a library developed by Meta for efficient similarity search and clustering of dense vectors. It provides highly optimized algorithms for large-scale vector operations and is particularly well-suited for production RAG applications requiring high performance and scalability. FAISS excels at handling millions to billions of vectors with sub-linear search times. ## Key Features - **High Performance**: Optimized C++ implementation with Python bindings - **Multiple Index Types**: Support for various indexing algorithms (Flat, IVF, HNSW, PQ) - **GPU Acceleration**: Optional GPU support for extreme performance - **Memory Efficiency**: Compressed indexing options for large datasets - **Exact and Approximate Search**: Configurable trade-offs between speed and accuracy - **Batch Operations**: Efficient batch search and update operations - **Clustering Support**: Built-in K-means clustering algorithms ## Architecture FAISS integrates with Swarms agents as a high-performance vector store for RAG operations: ``` [Agent] -> [FAISS Memory] -> [Vector Index] -> [Similarity Search] -> [Retrieved Context] ``` The system stores document embeddings in optimized FAISS indices and performs ultra-fast similarity searches to provide relevant context to agents. ## Setup & Configuration ### Installation ```bash pip install faiss-cpu # CPU version # OR pip install faiss-gpu # GPU version (requires CUDA) pip install swarms pip install litellm pip install numpy ``` ### Environment Variables ```bash # OpenAI API key for LLM (if using OpenAI models) export OPENAI_API_KEY="your-openai-api-key" # Optional: Configure GPU usage export CUDA_VISIBLE_DEVICES="0" ``` ### Dependencies - `faiss-cpu>=1.7.0` or `faiss-gpu>=1.7.0` - `swarms` - `litellm` - `numpy` - `pickle` (for persistence) ## Code Example ```python """ FAISS RAG Integration with Swarms Agent This example demonstrates how to integrate FAISS as a high-performance vector database for RAG operations with Swarms agents using LiteLLM embeddings. """ import faiss import numpy as np import pickle import os from typing import List, Dict, Any from swarms import Agent from litellm import embedding class FAISSMemory: """FAISS-based memory system for RAG operations""" def __init__(self, dimension: int = 1536, # text-embedding-3-small dimension embedding_model: str = "text-embedding-3-small", index_type: str = "Flat", nlist: int = 100): """ Initialize FAISS memory system Args: dimension: Vector dimension (1536 for text-embedding-3-small) embedding_model: LiteLLM embedding model name index_type: FAISS index type ('Flat', 'IVF', 'HNSW') nlist: Number of clusters for IVF index """ self.dimension = dimension self.embedding_model = embedding_model self.index_type = index_type self.nlist = nlist # Initialize FAISS index self.index = self._create_index() # Storage for documents and metadata self.documents = [] self.metadata = [] self.id_to_index = {} self.next_id = 0 def _create_index(self): """Create appropriate FAISS index based on type""" if self.index_type == "Flat": # Exact search, good for small to medium datasets return faiss.IndexFlatIP(self.dimension) # Inner product elif self.index_type == "IVF": # Approximate search with inverted file index quantizer = faiss.IndexFlatIP(self.dimension) index = faiss.IndexIVFFlat(quantizer, self.dimension, self.nlist) return index elif self.index_type == "HNSW": # Hierarchical Navigable Small World graphs index = faiss.IndexHNSWFlat(self.dimension, 32) index.hnsw.efConstruction = 200 index.hnsw.efSearch = 100 return index else: raise ValueError(f"Unsupported index type: {self.index_type}") def _get_embeddings(self, texts: List[str]) -> np.ndarray: """Generate embeddings using LiteLLM""" response = embedding( model=self.embedding_model, input=texts ) embeddings = np.array([item["embedding"] for item in response["data"]]) # Normalize for cosine similarity (convert to inner product) faiss.normalize_L2(embeddings) return embeddings.astype('float32') def add_documents(self, documents: List[str], metadata: List[Dict] = None): """Add multiple documents to the index""" if metadata is None: metadata = [{}] * len(documents) # Generate embeddings embeddings = self._get_embeddings(documents) # Train index if necessary (for IVF) if self.index_type == "IVF" and not self.index.is_trained: self.index.train(embeddings) # Add to index start_id = len(self.documents) self.index.add(embeddings) # Store documents and metadata self.documents.extend(documents) self.metadata.extend(metadata) # Update ID mapping for i, doc in enumerate(documents): self.id_to_index[self.next_id] = start_id + i self.next_id += 1 return list(range(start_id, start_id + len(documents))) def add_document(self, document: str, metadata: Dict = None): """Add a single document to the index""" return self.add_documents([document], [metadata or {}])[0] def search(self, query: str, k: int = 3, score_threshold: float = None) -> Dict[str, Any]: """Search for similar documents""" if len(self.documents) == 0: return {"documents": [], "metadata": [], "scores": [], "ids": []} # Generate query embedding query_embedding = self._get_embeddings([query]) # Search scores, indices = self.index.search(query_embedding, k) scores = scores[0] # Get first (and only) query result indices = indices[0] # Filter by score threshold if provided if score_threshold is not None: valid_indices = scores >= score_threshold scores = scores[valid_indices] indices = indices[valid_indices] # Prepare results results = { "documents": [self.documents[idx] for idx in indices if idx < len(self.documents)], "metadata": [self.metadata[idx] for idx in indices if idx < len(self.metadata)], "scores": scores.tolist(), "ids": indices.tolist() } return results def save_index(self, filepath: str): """Save FAISS index and metadata to disk""" # Save FAISS index faiss.write_index(self.index, f"{filepath}.index") # Save metadata and documents with open(f"{filepath}.pkl", 'wb') as f: pickle.dump({ 'documents': self.documents, 'metadata': self.metadata, 'id_to_index': self.id_to_index, 'next_id': self.next_id, 'dimension': self.dimension, 'embedding_model': self.embedding_model, 'index_type': self.index_type, 'nlist': self.nlist }, f) def load_index(self, filepath: str): """Load FAISS index and metadata from disk""" # Load FAISS index self.index = faiss.read_index(f"{filepath}.index") # Load metadata and documents with open(f"{filepath}.pkl", 'rb') as f: data = pickle.load(f) self.documents = data['documents'] self.metadata = data['metadata'] self.id_to_index = data['id_to_index'] self.next_id = data['next_id'] self.dimension = data['dimension'] self.embedding_model = data['embedding_model'] self.index_type = data['index_type'] self.nlist = data.get('nlist', 100) # Initialize FAISS memory system # Option 1: Flat index (exact search, good for small datasets) memory = FAISSMemory( dimension=1536, # text-embedding-3-small dimension embedding_model="text-embedding-3-small", index_type="Flat" ) # Option 2: IVF index (approximate search, good for large datasets) # memory = FAISSMemory( # dimension=1536, # embedding_model="text-embedding-3-small", # index_type="IVF", # nlist=100 # ) # Option 3: HNSW index (very fast approximate search) # memory = FAISSMemory( # dimension=1536, # embedding_model="text-embedding-3-small", # index_type="HNSW" # ) # Sample documents for the knowledge base documents = [ "FAISS is a library for efficient similarity search and clustering of dense vectors.", "RAG combines retrieval and generation for more accurate AI responses with relevant context.", "Vector embeddings enable semantic search across large document collections.", "The Swarms framework supports multiple memory backends including FAISS for high performance.", "LiteLLM provides a unified interface for different embedding models and providers.", "FAISS supports both exact and approximate search algorithms for different use cases.", "GPU acceleration in FAISS can provide significant speedups for large-scale applications.", "Index types in FAISS include Flat, IVF, HNSW, and PQ for different performance characteristics.", ] # Document metadata metadatas = [ {"category": "library", "topic": "faiss", "difficulty": "intermediate"}, {"category": "ai", "topic": "rag", "difficulty": "intermediate"}, {"category": "ai", "topic": "embeddings", "difficulty": "beginner"}, {"category": "framework", "topic": "swarms", "difficulty": "beginner"}, {"category": "library", "topic": "litellm", "difficulty": "beginner"}, {"category": "search", "topic": "algorithms", "difficulty": "advanced"}, {"category": "performance", "topic": "gpu", "difficulty": "advanced"}, {"category": "indexing", "topic": "algorithms", "difficulty": "advanced"}, ] # Add documents to FAISS memory print("Adding documents to FAISS index...") doc_ids = memory.add_documents(documents, metadatas) print(f"Added {len(doc_ids)} documents to the index") # Create Swarms agent with FAISS-powered RAG agent = Agent( agent_name="FAISS-RAG-Agent", agent_description="High-performance agent with FAISS-powered RAG for fast knowledge retrieval", model_name="gpt-4o", max_loops=1, dynamic_temperature_enabled=True, ) def query_with_faiss_rag(query_text: str, k: int = 3): """Query with RAG using FAISS for high-performance retrieval""" print(f"\nQuerying: {query_text}") # Retrieve relevant documents using FAISS results = memory.search(query_text, k=k) if not results["documents"]: return agent.run(query_text) # Prepare context from retrieved documents context = "\n".join([ f"Document {i+1}: {doc}" for i, doc in enumerate(results["documents"]) ]) # Show retrieved documents and scores print("Retrieved documents:") for i, (doc, score) in enumerate(zip(results["documents"], results["scores"])): print(f" {i+1}. (Score: {score:.4f}) {doc[:100]}...") # Enhanced prompt with context enhanced_prompt = f""" Based on the following retrieved context, please answer the question: Context: {context} Question: {query_text} Please provide a comprehensive answer based primarily on the context provided. """ # Run agent with enhanced prompt response = agent.run(enhanced_prompt) return response # Example usage and testing if __name__ == "__main__": # Test different queries queries = [ "What is FAISS and what are its key features?", "How does RAG work and why is it useful?", "What are the different FAISS index types?", "How can GPU acceleration improve performance?", ] for query in queries: response = query_with_faiss_rag(query, k=3) print(f"Answer: {response}\n") print("-" * 80) # Demonstrate adding new documents dynamically print("\nAdding new document...") new_doc = "FAISS supports product quantization (PQ) for memory-efficient storage of large vector datasets." new_metadata = {"category": "compression", "topic": "pq", "difficulty": "advanced"} memory.add_document(new_doc, new_metadata) # Query about the new document response = query_with_faiss_rag("What is product quantization in FAISS?") print(f"Answer about PQ: {response}") # Save the index for future use print("\nSaving FAISS index...") memory.save_index("./faiss_knowledge_base") print("Index saved successfully!") # Demonstrate loading (in a real application, you'd do this separately) print("\nTesting index loading...") new_memory = FAISSMemory() new_memory.load_index("./faiss_knowledge_base") test_results = new_memory.search("What is FAISS?", k=2) print(f"Loaded index test - found {len(test_results['documents'])} documents") ``` ## Use Cases ### 1. Large-Scale Document Search - **Scenario**: Searching through millions of documents or papers - **Benefits**: Sub-linear search time, memory efficiency - **Best For**: Academic databases, legal document search, news archives ### 2. Real-time Recommendation Systems - **Scenario**: Product or content recommendations with low latency requirements - **Benefits**: Ultra-fast query response, batch processing support - **Best For**: E-commerce, streaming platforms, social media ### 3. High-Performance RAG Applications - **Scenario**: Production RAG systems requiring fast response times - **Benefits**: Optimized C++ implementation, GPU acceleration - **Best For**: Customer support bots, technical documentation systems ### 4. Scientific Research Tools - **Scenario**: Similarity search in scientific datasets or embeddings - **Benefits**: Clustering support, exact and approximate search options - **Best For**: Bioinformatics, materials science, computer vision research ## Performance Characteristics ### Index Types Performance Comparison | Index Type | Search Speed | Memory Usage | Accuracy | Best Use Case | |------------|--------------|--------------|----------|---------------| | **Flat** | Fast | High | 100% | Small datasets (< 1M vectors) | | **IVF** | Very Fast | Medium | 95-99% | Large datasets (1M-100M vectors) | | **HNSW** | Ultra Fast | Medium-High | 95-98% | Real-time applications | | **PQ** | Fast | Low | 90-95% | Memory-constrained environments | ### Scaling Characteristics - **Small Scale** (< 1M vectors): Use Flat index for exact search - **Medium Scale** (1M - 10M vectors): Use IVF with appropriate nlist - **Large Scale** (10M - 1B vectors): Use IVF with PQ compression - **Ultra Large Scale** (> 1B vectors): Use sharded indices across multiple machines ### Performance Optimization ```python # GPU acceleration (if available) import faiss if faiss.get_num_gpus() > 0: gpu_index = faiss.index_cpu_to_gpu(faiss.StandardGpuResources(), 0, cpu_index) # Batch search for better throughput results = memory.search_batch(queries, k=10) # Memory mapping for very large indices index = faiss.read_index("large_index.faiss", faiss.IO_FLAG_MMAP) ``` ## Cloud vs Local Deployment ### Local Deployment ```python # Local FAISS with persistence memory = FAISSMemory(index_type="Flat") memory.save_index("./local_faiss_index") ``` **Advantages:** - No network latency - Full control over hardware - Cost-effective for development - Easy debugging and profiling **Disadvantages:** - Limited by single machine resources - Manual scaling required - No built-in redundancy ### Cloud Deployment ```python # Cloud deployment with distributed storage # Use cloud storage for index persistence import boto3 s3 = boto3.client('s3') # Save to cloud storage memory.save_index("/tmp/faiss_index") s3.upload_file("/tmp/faiss_index.index", "bucket", "indices/faiss_index.index") ``` **Advantages:** - Horizontal scaling with multiple instances - Managed infrastructure - Automatic backups and redundancy - Global distribution **Disadvantages:** - Network latency for large indices - Higher operational costs - More complex deployment pipeline ## Advanced Configuration ### GPU Configuration ```python import faiss # Check GPU availability print(f"GPUs available: {faiss.get_num_gpus()}") # GPU-accelerated index if faiss.get_num_gpus() > 0: cpu_index = faiss.IndexFlatIP(dimension) gpu_resources = faiss.StandardGpuResources() gpu_index = faiss.index_cpu_to_gpu(gpu_resources, 0, cpu_index) ``` ### Index Optimization ```python # IVF index with optimized parameters nlist = int(4 * np.sqrt(num_vectors)) # Rule of thumb index = faiss.IndexIVFFlat(quantizer, dimension, nlist) index.train(training_vectors) index.nprobe = min(nlist, 10) # Search parameter # HNSW index optimization index = faiss.IndexHNSWFlat(dimension, 32) # M=32 connections index.hnsw.efConstruction = 200 # Build-time parameter index.hnsw.efSearch = 100 # Query-time parameter ``` ### Memory Management ```python # Product Quantization for memory efficiency m = 8 # Number of subquantizers nbits = 8 # Bits per subquantizer pq = faiss.IndexPQ(dimension, m, nbits) # Composite index (IVF + PQ) index = faiss.IndexIVFPQ(quantizer, dimension, nlist, m, nbits) ``` ## Best Practices 1. **Index Selection**: Choose appropriate index type based on dataset size and latency requirements 2. **Memory Management**: Use product quantization for large datasets with memory constraints 3. **Batch Processing**: Process documents and queries in batches for better throughput 4. **Normalization**: Normalize embeddings for cosine similarity using inner product indices 5. **Training Data**: Use representative data for training IVF indices 6. **Parameter Tuning**: Optimize nlist, nprobe, and other parameters for your specific use case 7. **Monitoring**: Track index size, query latency, and memory usage in production 8. **Persistence**: Regularly save indices and implement proper backup strategies ## Troubleshooting ### Common Issues 1. **Memory Errors** - Reduce batch sizes or use product quantization - Consider using memory mapping for large indices - Monitor system memory usage 2. **Slow Search Performance** - Check if IVF index is properly trained - Adjust nprobe parameter (higher = slower but more accurate) - Consider using GPU acceleration 3. **Low Search Accuracy** - Increase nlist for IVF indices - Adjust efSearch for HNSW indices - Verify embedding normalization 4. **Index Loading Issues** - Check file permissions and disk space - Verify FAISS version compatibility - Ensure consistent data types (float32) This comprehensive guide provides everything needed to integrate FAISS with Swarms agents for high-performance RAG applications using the unified LiteLLM embeddings approach.