# PineconeMemory Documentation
The `PineconeMemory` class provides a robust interface for integrating Pinecone-based Retrieval-Augmented Generation (RAG) systems. It allows for adding documents to a Pinecone index and querying the index for similar documents. The class supports custom embedding models, preprocessing functions, and other customizations to suit different use cases.
### Parameters
| Parameter | Type | Default | Description |
|----------------------|-----------------------------------------------|-----------------------------------|------------------------------------------------------------------------------------------------------|
| `api_key` | `str` | - | Pinecone API key. |
| `environment` | `str` | - | Pinecone environment. |
| `index_name` | `str` | - | Name of the Pinecone index to use. |
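| `dimension` | `int` | `768` | Dimension of the document embeddings. Must equal the length of the vectors produced by the embedding model or function; `all-MiniLM-L6-v2` produces 384-dimensional vectors. |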
| `embedding_model` | `Optional[Any]` | `None` | Custom embedding model. Defaults to `SentenceTransformer('all-MiniLM-L6-v2')`. |
| `embedding_function` | `Optional[Callable[[str], List[float]]]` | `None` | Custom embedding function. Defaults to `_default_embedding_function`. |
| `preprocess_function`| `Optional[Callable[[str], str]]` | `None` | Custom preprocessing function. Defaults to `_default_preprocess_function`. |
| `postprocess_function`| `Optional[Callable[[List[Dict[str, Any]]], List[Dict[str, Any]]]]`| `None` | Custom postprocessing function. Defaults to `_default_postprocess_function`. |
| `metric` | `str` | `'cosine'` | Distance metric for the Pinecone index: `'cosine'`, `'euclidean'`, or `'dotproduct'`. |
| `pod_type` | `str` | `'p1'` | Pinecone pod type. |
| `namespace` | `str` | `''` | Pinecone namespace. |
| `logger_config` | `Optional[Dict[str, Any]]` | `None` | Configuration for the logger. Defaults to logging to `rag_wrapper.log` and console output. |
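Every vector written to the index must have exactly `dimension` components, so it can help to check an embedding function's output length before creating the index. The sketch below uses a hypothetical helper, `check_embedding_dimension` (not part of `swarms_memory`), that works with any callable of type `Callable[[str], List[float]]`; `toy_embed` is likewise an illustrative stand-in.

```python
from typing import Callable, List


def check_embedding_dimension(
    embed: Callable[[str], List[float]],
    dimension: int,
    probe_text: str = "dimension probe",
) -> int:
    """Embed a probe string and verify the vector length matches `dimension`.

    Hypothetical helper for illustration; not part of swarms_memory.
    Returns the observed length, or raises ValueError on a mismatch.
    """
    vector = embed(probe_text)
    if len(vector) != dimension:
        raise ValueError(
            f"embedding has {len(vector)} components, but the index expects {dimension}"
        )
    return len(vector)


# Toy embedding function that always returns a 3-component vector.
def toy_embed(text: str) -> List[float]:
    return [float(len(text)), 0.0, 1.0]


print(check_embedding_dimension(toy_embed, 3))  # 3
try:
    check_embedding_dimension(toy_embed, 768)
except ValueError as err:
    print(err)  # embedding has 3 components, but the index expects 768
```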
### Methods
#### `_setup_logger`
```python
def _setup_logger(self, config: Optional[Dict[str, Any]] = None)
```
Sets up the logger with the given configuration.
#### `_default_embedding_function`
```python
def _default_embedding_function(self, text: str) -> List[float]
```
Generates embeddings using the default SentenceTransformer model.
#### `_default_preprocess_function`
```python
def _default_preprocess_function(self, text: str) -> str
```
Preprocesses the input text by stripping whitespace.
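A custom `preprocess_function` only needs the signature `Callable[[str], str]`. As a sketch, the hypothetical function below goes a step beyond the documented default, which strips whitespace: it also applies Unicode NFKC normalization and collapses internal runs of whitespace to a single space, so near-identical strings produce the same embedding input.

```python
import re
import unicodedata


def normalize_whitespace_preprocess(text: str) -> str:
    """Hypothetical preprocess_function: NFKC-normalize, then collapse whitespace.

    Strips leading/trailing whitespace and replaces every internal run of
    whitespace (spaces, tabs, newlines) with a single space.
    """
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


print(normalize_whitespace_preprocess("  Pinecone\n\tstores   vectors  "))
# Pinecone stores vectors
```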
#### `_default_postprocess_function`
```python
def _default_postprocess_function(self, results: List[Dict[str, Any]]) -> List[Dict[str, Any]]
```
Postprocesses the query results.
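A custom `postprocess_function` receives and returns a `List[Dict[str, Any]]`. The sketch below builds one that drops weak matches and sorts the rest. It assumes each result dict carries a numeric `"score"` key; that key name is an assumption, so adjust it to the shape the results actually have. `make_score_filter` is a hypothetical helper.

```python
from typing import Any, Callable, Dict, List

Results = List[Dict[str, Any]]


def make_score_filter(min_score: float) -> Callable[[Results], Results]:
    """Build a postprocess_function that keeps results scoring >= min_score.

    Assumes a numeric "score" key on each result (an assumption, not a
    documented field). Results without a score are dropped.
    """
    def postprocess(results: Results) -> Results:
        kept = [r for r in results if r.get("score", float("-inf")) >= min_score]
        # Highest-scoring results first.
        return sorted(kept, key=lambda r: r["score"], reverse=True)

    return postprocess


results = [
    {"id": "a", "score": 0.42},
    {"id": "b", "score": 0.91},
    {"id": "c", "score": 0.77},
]
postprocess = make_score_filter(0.5)
print([r["id"] for r in postprocess(results)])  # ['b', 'c']
```

A "higher is better" threshold suits the `'cosine'` and `'dotproduct'` metrics; with `'euclidean'`, smaller distances mean closer matches, so the comparison and sort order would need to be flipped.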
#### `add`
Adds a document to the Pinecone index.
| Parameter | Type | Default | Description |
|-----------|-----------------------|---------|-----------------------------------------------|
| `doc` | `str` | - | The document to be added. |
| `metadata`| `Optional[Dict[str, Any]]` | `None` | Additional metadata for the document. |
#### `query`
Queries the Pinecone index for similar documents.
| Parameter | Type | Default | Description |
|-----------|-------------------------|---------|-----------------------------------------------|
| `query` | `str` | - | The query string. |
| `top_k` | `int` | `5` | The number of top results to return. |
| `filter` | `Optional[Dict[str, Any]]` | `None` | Metadata filter for the query. |
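Assuming the `filter` dict is passed through to Pinecone, it follows Pinecone's metadata filter language, which uses MongoDB-style operators such as `$eq`, `$in`, `$gte`, `$and`, and `$or`. To make the matching rules concrete, the sketch below evaluates a subset of that language locally against a single metadata dict. `matches_filter` is hypothetical and is not how Pinecone evaluates filters; it only handles scalar metadata values, and missing keys never match.

```python
from typing import Any, Dict

# Comparison operators covered by this sketch (a subset of Pinecone's filter language).
_OPS = {
    "$eq": lambda v, arg: v == arg,
    "$ne": lambda v, arg: v != arg,
    "$gt": lambda v, arg: v > arg,
    "$gte": lambda v, arg: v >= arg,
    "$lt": lambda v, arg: v < arg,
    "$lte": lambda v, arg: v <= arg,
    "$in": lambda v, arg: v in arg,
    "$nin": lambda v, arg: v not in arg,
}


def matches_filter(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Local, illustrative evaluator for a Pinecone-style metadata filter.

    A bare value such as {"author": "John Doe"} is treated as $eq, and
    top-level keys are combined with AND.
    """
    for key, cond in flt.items():
        if key == "$and":
            if not all(matches_filter(metadata, sub) for sub in cond):
                return False
        elif key == "$or":
            if not any(matches_filter(metadata, sub) for sub in cond):
                return False
        else:
            if key not in metadata:
                return False
            ops = cond if isinstance(cond, dict) else {"$eq": cond}
            if not all(_OPS[op](metadata[key], arg) for op, arg in ops.items()):
                return False
    return True


meta = {"author": "John Doe", "year": 2024}
print(matches_filter(meta, {"author": "John Doe"}))    # True
print(matches_filter(meta, {"year": {"$gte": 2025}}))  # False
print(matches_filter(meta, {"$or": [{"year": {"$lt": 2020}},
                                    {"author": {"$in": ["John Doe", "Jane Roe"]}}]}))  # True
```

Range operators such as `$gte` are meant for numeric fields, which is why the sketch filters on a numeric `year` rather than a date string.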
## Usage
The `PineconeMemory` class is initialized with the necessary parameters to configure Pinecone and the embedding model. It supports a variety of custom configurations to suit different needs.
#### Example
```python
from swarms_memory import PineconeMemory

# Initialize PineconeMemory
memory = PineconeMemory(
    api_key="your-api-key",
    environment="us-west1-gcp",
    index_name="example-index",
    dimension=384,  # output size of the default all-MiniLM-L6-v2 embedding model
)
```
### Adding Documents
Documents can be added to the Pinecone index using the `add` method. The method accepts a document string and optional metadata.
#### Example
```python
doc = "This is a sample document to be added to the Pinecone index."
metadata = {"author": "John Doe", "date": "2024-07-08"}
memory.add(doc, metadata)
```
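Pinecone documents a restricted set of metadata value types: strings, numbers, booleans, and lists of strings. Nested objects and null values are not among them. The sketch below uses a hypothetical helper, `validate_metadata`, to catch unsupported values before calling `memory.add`; it checks only value types, not Pinecone's size limits.

```python
from numbers import Real
from typing import Any, Dict, List


def validate_metadata(metadata: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a metadata dict (empty if none).

    Hypothetical helper. Accepts strings, numbers, booleans, and lists of
    strings; everything else is reported.
    """
    problems = []
    for key, value in metadata.items():
        if not isinstance(key, str):
            problems.append(f"key {key!r} is not a string")
        elif isinstance(value, (str, bool, Real)):
            continue
        elif isinstance(value, list) and all(isinstance(v, str) for v in value):
            continue
        else:
            problems.append(f"{key!r} has unsupported type {type(value).__name__}")
    return problems


print(validate_metadata({"author": "John Doe", "date": "2024-07-08", "tags": ["rag", "demo"]}))
# []
print(validate_metadata({"author": None, "extra": {"nested": 1}}))
# ["'author' has unsupported type NoneType", "'extra' has unsupported type dict"]
```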
### Querying Documents
The `query` method allows for querying the Pinecone index for similar documents based on a query string. It returns the top `k` most similar documents.
#### Example
```python
query = "Sample query to find similar documents."
results = memory.query(query, top_k=5)
for result in results:
    print(result)
```
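Conceptually, with the default `metric='cosine'`, a query embeds the query string and returns the `top_k` stored vectors with the highest cosine similarity to it. Pinecone performs this search server-side with approximate nearest-neighbor indexing, so the exact brute-force sketch below illustrates the ranking only, not how Pinecone computes it. The `store` dict and its document IDs are made up for the example.

```python
import heapq
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: List[float], vectors: Dict[str, List[float]],
          k: int = 5) -> List[Tuple[str, float]]:
    """Exact brute-force top-k by cosine similarity, highest score first."""
    scored = ((doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in vectors.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


store = {
    "doc-1": [1.0, 0.0, 0.0],
    "doc-2": [0.7, 0.7, 0.0],
    "doc-3": [0.0, 0.0, 1.0],
}
print([doc_id for doc_id, _ in top_k([1.0, 0.1, 0.0], store, k=2)])  # ['doc-1', 'doc-2']
```

`heapq.nlargest` avoids sorting every score when only `k` results are needed.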
## Additional Information and Tips
### Custom Embedding and Preprocessing Functions
Custom embedding and preprocessing functions can be provided during initialization to tailor the document processing to specific requirements.
#### Example
```python
from typing import List

from swarms_memory import PineconeMemory


def custom_embedding_function(text: str) -> List[float]:
    # Custom embedding logic; the returned length must equal `dimension`
    return [0.1, 0.2, 0.3]


def custom_preprocess_function(text: str) -> str:
    # Custom preprocessing logic
    return text.lower()


memory = PineconeMemory(
    api_key="your-api-key",
    environment="us-west1-gcp",
    index_name="example-index",
    dimension=3,  # matches the 3-element vectors returned above
    embedding_function=custom_embedding_function,
    preprocess_function=custom_preprocess_function,
)
```
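The `custom_embedding_function` above returns a fixed three-element placeholder. A slightly more realistic stand-in that needs no model download is feature hashing: each word is hashed into one of `dimension` buckets, and the vector is L2-normalized. This is a swapped-in technique for illustration only and is far weaker than a SentenceTransformer model; `make_hashing_embedding` is hypothetical. Its output could be passed as `embedding_function` together with a matching `dimension`.

```python
import hashlib
import math
import re
from typing import Callable, List


def make_hashing_embedding(dimension: int = 768) -> Callable[[str], List[float]]:
    """Build an embedding function using the hashing trick (hypothetical helper).

    Each lowercase word token is hashed to a bucket in [0, dimension); one
    extra hash bit picks the sign, which reduces bias from bucket collisions.
    The vector is L2-normalized so cosine similarity behaves sensibly.
    """
    def embed(text: str) -> List[float]:
        vec = [0.0] * dimension
        for token in re.findall(r"\w+", text.lower()):
            # blake2b is deterministic across runs, unlike Python's built-in hash().
            digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
            bucket = int.from_bytes(digest[:4], "little") % dimension
            sign = 1.0 if digest[4] & 1 else -1.0
            vec[bucket] += sign
        norm = math.sqrt(sum(v * v for v in vec))
        return [v / norm for v in vec] if norm else vec

    return embed


embed = make_hashing_embedding(768)
vector = embed("Sample query to find similar documents.")
print(len(vector))  # 768
```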
### Logger Configuration
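The logger is configured through `logger_config`. By default, it logs to `rag_wrapper.log` and to the console. The example below supplies Loguru-style handler definitions instead: one writes to `custom_log.log` and rotates it at 1 MB, and the other prints each message to standard output.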
#### Example
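```python
logger_config = {
    "handlers": [
        # Write to a file that rotates once it reaches 1 MB
        {"sink": "custom_log.log", "rotation": "1 MB"},
        # Print each formatted message to standard output
        {"sink": lambda msg: print(msg, end="")},
    ]
}

memory = PineconeMemory(
    api_key="your-api-key",
    environment="us-west1-gcp",
    index_name="example-index",
    dimension=384,  # output size of the default all-MiniLM-L6-v2 embedding model
    logger_config=logger_config,
)
```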
## References and Resources
- [Pinecone Documentation](https://docs.pinecone.io/)
- [SentenceTransformers Documentation](https://www.sbert.net/)
- [Loguru Documentation](https://loguru.readthedocs.io/en/stable/)
For further exploration and examples, refer to the official documentation and resources provided by Pinecone, SentenceTransformers, and Loguru.
`PineconeMemory` provides a flexible interface to Pinecone for retrieval-augmented generation. Because it accepts custom embedding, preprocessing, and postprocessing functions, it can be adapted to a wide range of applications.