swarms/docs/swarms_memory/faiss.md

# FAISSDB: Documentation

The `FAISSDB` class is a highly customizable wrapper for the FAISS (Facebook AI Similarity Search) library, designed for efficient similarity search and clustering of dense vectors. This class facilitates the creation of a Retrieval-Augmented Generation (RAG) system by providing methods to add documents to a FAISS index and query the index for similar documents. It supports custom embedding models, preprocessing functions, and other customizations to fit various use cases.


### Parameters

| Parameter              | Type                                             | Default                       | Description                                                                 |
|------------------------|--------------------------------------------------|-------------------------------|-----------------------------------------------------------------------------|
| `dimension`            | `int`                                            | `768`                         | Dimension of the document embeddings.                                       |
| `index_type`           | `str`                                            | `'Flat'`                      | Type of FAISS index to use (`'Flat'` or `'IVF'`).                           |
| `embedding_model`      | `Optional[Any]`                                  | `None`                        | Custom embedding model.                                                     |
| `embedding_function`   | `Optional[Callable[[str], List[float]]]`         | `None`                        | Custom function to generate embeddings from text.                           |
| `preprocess_function`  | `Optional[Callable[[str], str]]`                 | `None`                        | Custom function to preprocess text before embedding.                        |
| `postprocess_function` | `Optional[Callable[[List[Dict[str, Any]]], List[Dict[str, Any]]]]` | `None` | Custom function to postprocess the results.                                  |
| `metric`               | `str`                                            | `'cosine'`                    | Distance metric for FAISS index (`'cosine'` or `'l2'`).                     |
| `logger_config`        | `Optional[Dict[str, Any]]`                       | `None`                        | Configuration for the logger.                                               |

## Methods

### `__init__`

Initializes the FAISSDB instance, setting up the logger, creating the FAISS index, and configuring custom functions if provided.

### `add`

Adds a document to the FAISS index.

#### Parameters

| Parameter | Type                    | Default | Description                                     |
|-----------|-------------------------|---------|-------------------------------------------------|
| `doc`     | `str`                   | None    | The document to be added.                       |
| `metadata`| `Optional[Dict[str, Any]]` | None    | Additional metadata for the document.            |

#### Example Usage

```python
db = FAISSDB(dimension=768)
db.add("This is a sample document.", {"category": "sample"})
```

### `query`

Queries the FAISS index for similar documents.

#### Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `query`   | `str` | None    | The query string. |
| `top_k`   | `int` | `5`     | The number of top results to return. |

#### Returns

| Type | Description |
|------|-------------|
| `List[Dict[str, Any]]` | A list of dictionaries containing the top_k most similar documents. |

#### Example Usage

```python
results = db.query("What is artificial intelligence?")
for result in results:
    print(f"Score: {result['score']}, Text: {result['metadata']['text']}")
```

## Internal Methods

### `_setup_logger`

Sets up the logger with the given configuration.

#### Parameters

| Parameter | Type                    | Default | Description                              |
|-----------|-------------------------|---------|------------------------------------------|
| `config`  | `Optional[Dict[str, Any]]` | None    | Configuration for the logger.             |

### `_create_index`

Creates and returns a FAISS index based on the specified type and metric.

#### Parameters

| Parameter | Type  | Default | Description                                  |
|-----------|-------|---------|----------------------------------------------|
| `index_type` | `str`  | 'Flat'   | Type of FAISS index to use.                  |
| `metric`    | `str`  | 'cosine' | Distance metric for FAISS index.             |

#### Returns

| Type | Description      |
|------|------------------|
| `faiss.Index` | FAISS index instance. |

### `_default_embedding_function`

Default embedding function using the SentenceTransformer model.

#### Parameters

| Parameter | Type | Default | Description          |
|-----------|------|---------|----------------------|
| `text`    | `str` | None    | The input text to embed. |

#### Returns

| Type | Description       |
|------|-------------------|
| `List[float]` | Embedding vector for the input text. |

### `_default_preprocess_function`

Default preprocessing function.

#### Parameters

| Parameter | Type | Default | Description        |
|-----------|------|---------|--------------------|
| `text`    | `str` | None    | The input text to preprocess. |

#### Returns

| Type | Description      |
|------|------------------|
| `str` | Preprocessed text. |

### `_default_postprocess_function`

Default postprocessing function.

#### Parameters

| Parameter | Type | Default | Description                    |
|-----------|------|---------|--------------------------------|
| `results` | `List[Dict[str, Any]]` | None    | The results to postprocess.   |

#### Returns

| Type | Description              |
|------|--------------------------|
| `List[Dict[str, Any]]` | Postprocessed results. |

## Usage Examples

### Example 1: Basic Usage

```python
# Initialize the FAISSDB instance
db = FAISSDB(dimension=768, index_type="Flat")

# Add documents to the FAISS index
db.add("This is a document about AI.", {"category": "AI"})
db.add("Python is great for data science.", {"category": "Programming"})

# Query the FAISS index
results = db.query("Tell me about AI")
for result in results:
    print(f"Score: {result['score']}, Text: {result['metadata']['text']}")
```

### Example 2: Custom Functions

```python
from transformers import AutoTokenizer, AutoModel
import torch

# Custom embedding function using a HuggingFace model
def custom_embedding_function(text: str) -> List[float]:
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    embeddings = outputs.last_hidden_state.mean(dim=1).squeeze().tolist()
    return embeddings

# Custom preprocessing function
def custom_preprocess(text: str) -> str:
    return text.lower().strip()

# Custom postprocessing function
def custom_postprocess(results: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    for result in results:
        result["custom_score"] = result["score"] * 2  # Example modification
    return results

# Initialize the FAISSDB instance with custom functions
db = FAISSDB(
    dimension=768,
    index_type="Flat",
    embedding_function=custom_embedding_function,
    preprocess_function=custom_preprocess,
    postprocess_function=custom_postprocess,
    metric="cosine",
    logger_config={
        "handlers": [
            {"sink": "custom_faiss_rag_wrapper.log", "rotation": "1 GB"},
            {"sink": lambda msg: print(f"Custom log: {msg}", end="")}
        ],
    },
)

# Add documents to the FAISS index
db.add("This is a document about machine learning.", {"category": "ML"})
db.add("Python is a versatile programming language.", {"category": "Programming"})

# Query the FAISS index
results = db.query("Explain machine learning")
for result in results:
    print(f"Score: {result['score']}, Custom Score: {result['custom_score']}, Text: {result['metadata']['text']}")
```

## Additional Information and Tips

- Ensure that the dimension of the document embeddings matches the dimension specified during the initialization of the FAISSDB instance.
- Use custom embedding functions to leverage domain-specific models for generating embeddings.
- Custom preprocessing and postprocessing functions can help tailor the text processing and

 result formatting to specific needs.
- FAISS supports various types of indices; choose the one that best fits the application requirements (e.g., `Flat` for brute-force search, `IVF` for faster search with some accuracy trade-off).
- Properly configure the logger to monitor and debug the operations of the FAISSDB instance.

## References and Resources

- [FAISS GitHub Repository](https://github.com/facebookresearch/faiss)
- [Sentence Transformers Documentation](https://www.sbert.net/)
- [Loguru Documentation](https://loguru.readthedocs.io/en/stable/)
- [HuggingFace Transformers](https://huggingface.co/transformers/)

By following this documentation, users can effectively utilize the `FAISSDB` class for various similarity search and document retrieval tasks, customizing it to their specific needs through the provided hooks and functions.
[5.4.8] 5 months ago			`# FAISSDB: Documentation`

			The `FAISSDB` class is a highly customizable wrapper for the FAISS (Facebook AI Similarity Search) library, designed for efficient similarity search and clustering of dense vectors. This class facilitates the creation of a Retrieval-Augmented Generation (RAG) system by providing methods to add documents to a FAISS index and query the index for similar documents. It supports custom embedding models, preprocessing functions, and other customizations to fit various use cases.


			`### Parameters`

			`\| Parameter \| Type \| Default \| Description \|`
			`\|------------------------\|--------------------------------------------------\|-------------------------------\|-----------------------------------------------------------------------------\|`
			\| `dimension` \| `int` \| `768` \| Dimension of the document embeddings. \|
			\| `index_type` \| `str` \| `'Flat'` \| Type of FAISS index to use (`'Flat'` or `'IVF'`). \|
			\| `embedding_model` \| `Optional[Any]` \| `None` \| Custom embedding model. \|
			\| `embedding_function` \| `Optional[Callable[[str], List[float]]]` \| `None` \| Custom function to generate embeddings from text. \|
			\| `preprocess_function` \| `Optional[Callable[[str], str]]` \| `None` \| Custom function to preprocess text before embedding. \|
			\| `postprocess_function` \| `Optional[Callable[[List[Dict[str, Any]]], List[Dict[str, Any]]]]` \| `None` \| Custom function to postprocess the results. \|
			\| `metric` \| `str` \| `'cosine'` \| Distance metric for FAISS index (`'cosine'` or `'l2'`). \|
			\| `logger_config` \| `Optional[Dict[str, Any]]` \| `None` \| Configuration for the logger. \|

			`## Methods`

			### `__init__`

			`Initializes the FAISSDB instance, setting up the logger, creating the FAISS index, and configuring custom functions if provided.`

			### `add`

			`Adds a document to the FAISS index.`

			`#### Parameters`

			`\| Parameter \| Type \| Default \| Description \|`
			`\|-----------\|-------------------------\|---------\|-------------------------------------------------\|`
			\| `doc` \| `str` \| None \| The document to be added. \|
			\| `metadata`\| `Optional[Dict[str, Any]]` \| None \| Additional metadata for the document. \|

			`#### Example Usage`

			```python
			`db = FAISSDB(dimension=768)`
			`db.add("This is a sample document.", {"category": "sample"})`
			```

			### `query`

			`Queries the FAISS index for similar documents.`

			`#### Parameters`

			`\| Parameter \| Type \| Default \| Description \|`
			`\|-----------\|------\|---------\|-------------\|`
			\| `query` \| `str` \| None \| The query string. \|
			\| `top_k` \| `int` \| `5` \| The number of top results to return. \|

			`#### Returns`

			`\| Type \| Description \|`
			`\|------\|-------------\|`
			\| `List[Dict[str, Any]]` \| A list of dictionaries containing the top_k most similar documents. \|

			`#### Example Usage`

			```python
			`results = db.query("What is artificial intelligence?")`
			`for result in results:`
			`print(f"Score: {result['score']}, Text: {result['metadata']['text']}")`
			```

			`## Internal Methods`

			### `_setup_logger`

			`Sets up the logger with the given configuration.`

			`#### Parameters`

			`\| Parameter \| Type \| Default \| Description \|`
			`\|-----------\|-------------------------\|---------\|------------------------------------------\|`
			\| `config` \| `Optional[Dict[str, Any]]` \| None \| Configuration for the logger. \|

			### `_create_index`

			`Creates and returns a FAISS index based on the specified type and metric.`

			`#### Parameters`

			`\| Parameter \| Type \| Default \| Description \|`
			`\|-----------\|-------\|---------\|----------------------------------------------\|`
			\| `index_type` \| `str` \| 'Flat' \| Type of FAISS index to use. \|
			\| `metric` \| `str` \| 'cosine' \| Distance metric for FAISS index. \|

			`#### Returns`

			`\| Type \| Description \|`
			`\|------\|------------------\|`
			\| `faiss.Index` \| FAISS index instance. \|

			### `_default_embedding_function`

			`Default embedding function using the SentenceTransformer model.`

			`#### Parameters`

			`\| Parameter \| Type \| Default \| Description \|`
			`\|-----------\|------\|---------\|----------------------\|`
			\| `text` \| `str` \| None \| The input text to embed. \|

			`#### Returns`

			`\| Type \| Description \|`
			`\|------\|-------------------\|`
			\| `List[float]` \| Embedding vector for the input text. \|

			### `_default_preprocess_function`

			`Default preprocessing function.`

			`#### Parameters`

			`\| Parameter \| Type \| Default \| Description \|`
			`\|-----------\|------\|---------\|--------------------\|`
			\| `text` \| `str` \| None \| The input text to preprocess. \|

			`#### Returns`

			`\| Type \| Description \|`
			`\|------\|------------------\|`
			\| `str` \| Preprocessed text. \|

			### `_default_postprocess_function`

			`Default postprocessing function.`

			`#### Parameters`

			`\| Parameter \| Type \| Default \| Description \|`
			`\|-----------\|------\|---------\|--------------------------------\|`
			\| `results` \| `List[Dict[str, Any]]` \| None \| The results to postprocess. \|

			`#### Returns`

			`\| Type \| Description \|`
			`\|------\|--------------------------\|`
			\| `List[Dict[str, Any]]` \| Postprocessed results. \|

			`## Usage Examples`

			`### Example 1: Basic Usage`

			```python
			`# Initialize the FAISSDB instance`
			`db = FAISSDB(dimension=768, index_type="Flat")`

			`# Add documents to the FAISS index`
			`db.add("This is a document about AI.", {"category": "AI"})`
			`db.add("Python is great for data science.", {"category": "Programming"})`

			`# Query the FAISS index`
			`results = db.query("Tell me about AI")`
			`for result in results:`
			`print(f"Score: {result['score']}, Text: {result['metadata']['text']}")`
			```

			`### Example 2: Custom Functions`

			```python
			`from transformers import AutoTokenizer, AutoModel`
			`import torch`

			`# Custom embedding function using a HuggingFace model`
			`def custom_embedding_function(text: str) -> List[float]:`
			`tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")`
			`model = AutoModel.from_pretrained("bert-base-uncased")`
			`inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)`
			`with torch.no_grad():`
			`outputs = model(**inputs)`
			`embeddings = outputs.last_hidden_state.mean(dim=1).squeeze().tolist()`
			`return embeddings`

			`# Custom preprocessing function`
			`def custom_preprocess(text: str) -> str:`
			`return text.lower().strip()`

			`# Custom postprocessing function`
			`def custom_postprocess(results: List[Dict[str, Any]]) -> List[Dict[str, Any]]:`
			`for result in results:`
			`result["custom_score"] = result["score"] * 2 # Example modification`
			`return results`

			`# Initialize the FAISSDB instance with custom functions`
			`db = FAISSDB(`
			`dimension=768,`
			`index_type="Flat",`
			`embedding_function=custom_embedding_function,`
			`preprocess_function=custom_preprocess,`
			`postprocess_function=custom_postprocess,`
			`metric="cosine",`
			`logger_config={`
			`"handlers": [`
			`{"sink": "custom_faiss_rag_wrapper.log", "rotation": "1 GB"},`
			`{"sink": lambda msg: print(f"Custom log: {msg}", end="")}`
			`],`
			`},`
			`)`

			`# Add documents to the FAISS index`
			`db.add("This is a document about machine learning.", {"category": "ML"})`
			`db.add("Python is a versatile programming language.", {"category": "Programming"})`

			`# Query the FAISS index`
			`results = db.query("Explain machine learning")`
			`for result in results:`
			`print(f"Score: {result['score']}, Custom Score: {result['custom_score']}, Text: {result['metadata']['text']}")`
			```

			`## Additional Information and Tips`

			`- Ensure that the dimension of the document embeddings matches the dimension specified during the initialization of the FAISSDB instance.`
			`- Use custom embedding functions to leverage domain-specific models for generating embeddings.`
			`- Custom preprocessing and postprocessing functions can help tailor the text processing and`

			`result formatting to specific needs.`
			- FAISS supports various types of indices; choose the one that best fits the application requirements (e.g., `Flat` for brute-force search, `IVF` for faster search with some accuracy trade-off).
			`- Properly configure the logger to monitor and debug the operations of the FAISSDB instance.`

			`## References and Resources`

			`- [FAISS GitHub Repository](https://github.com/facebookresearch/faiss)`
			`- [Sentence Transformers Documentation](https://www.sbert.net/)`
			`- [Loguru Documentation](https://loguru.readthedocs.io/en/stable/)`
			`- [HuggingFace Transformers](https://huggingface.co/transformers/)`

			By following this documentation, users can effectively utilize the `FAISSDB` class for various similarity search and document retrieval tasks, customizing it to their specific needs through the provided hooks and functions.