# PineconeMemory Documentation

The `PineconeMemory` class provides an interface for building Pinecone-backed Retrieval-Augmented Generation (RAG) systems. It adds documents to a Pinecone index and queries that index for similar documents. The class accepts custom embedding models, preprocessing and postprocessing functions, and other options to suit different use cases.

#### Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `api_key` | `str` | - | Pinecone API key. |
| `environment` | `str` | - | Pinecone environment. |
| `index_name` | `str` | - | Name of the Pinecone index to use. |
| `dimension` | `int` | `768` | Dimension of the document embeddings. Should match the output size of the embedding model or function in use. |
| `embedding_model` | `Optional[Any]` | `None` | Custom embedding model. Defaults to `SentenceTransformer('all-MiniLM-L6-v2')`. |
| `embedding_function` | `Optional[Callable[[str], List[float]]]` | `None` | Custom embedding function. Defaults to `_default_embedding_function`. |
| `preprocess_function` | `Optional[Callable[[str], str]]` | `None` | Custom preprocessing function. Defaults to `_default_preprocess_function`. |
| `postprocess_function` | `Optional[Callable[[List[Dict[str, Any]]], List[Dict[str, Any]]]]` | `None` | Custom postprocessing function. Defaults to `_default_postprocess_function`. |
| `metric` | `str` | `'cosine'` | Distance metric for the Pinecone index. |
| `pod_type` | `str` | `'p1'` | Pinecone pod type. |
| `namespace` | `str` | `''` | Pinecone namespace. |
| `logger_config` | `Optional[Dict[str, Any]]` | `None` | Configuration for the logger. Defaults to logging to `rag_wrapper.log` and to the console. |

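Note that `all-MiniLM-L6-v2` produces 384-dimensional vectors, so `dimension` generally needs to be set to match whichever model or function actually produces the embeddings. The following is a minimal sketch of a sanity check that could be run before creating an index. `check_embedding_dimension` is a hypothetical helper (not part of `swarms_memory`), and `fake_embed` is a stand-in for a real embedding function.

```python
from typing import Callable, List


def check_embedding_dimension(
    embed: Callable[[str], List[float]], dimension: int
) -> None:
    """Raise ValueError if `embed` does not return `dimension` values."""
    vector = embed("dimension probe")
    if len(vector) != dimension:
        raise ValueError(
            f"embedding has {len(vector)} values, but index expects {dimension}"
        )


def fake_embed(text: str) -> List[float]:
    # Stand-in for a real model (assumption); always returns 384 values.
    return [0.0] * 384


check_embedding_dimension(fake_embed, 384)  # passes silently
```

Calling the helper with a mismatched `dimension` (for example `768` with the stand-in above) raises a `ValueError` instead of failing later at upsert time.
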
### Methods

#### `_setup_logger`

```python
def _setup_logger(self, config: Optional[Dict[str, Any]] = None)
```

Sets up the logger with the given configuration.

#### `_default_embedding_function`

```python
def _default_embedding_function(self, text: str) -> List[float]
```

Generates embeddings using the default SentenceTransformer model.

#### `_default_preprocess_function`

```python
def _default_preprocess_function(self, text: str) -> str
```

Preprocesses the input text by stripping whitespace.

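The default only trims whitespace. Where documents come from scraped or pasted text, a replacement passed as `preprocess_function` might normalize more aggressively. A minimal sketch, with `normalize_text` as a hypothetical name chosen for illustration:

```python
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Normalize Unicode, collapse runs of whitespace, and strip the ends."""
    # NFKC folds compatibility characters (e.g. non-breaking spaces).
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


print(normalize_text("  Hello\u00a0  world\n"))  # → Hello world
```
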
#### `_default_postprocess_function`

```python
def _default_postprocess_function(self, results: List[Dict[str, Any]]) -> List[Dict[str, Any]]
```

Postprocesses the query results.

#### `add`

Adds a document to the Pinecone index.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `doc` | `str` | - | The document to be added. |
| `metadata` | `Optional[Dict[str, Any]]` | `None` | Additional metadata for the document. |

#### `query`

Queries the Pinecone index for similar documents.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `query` | `str` | - | The query string. |
| `top_k` | `int` | `5` | The number of top results to return. |
| `filter` | `Optional[Dict[str, Any]]` | `None` | Metadata filter for the query. |

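The `filter` argument presumably follows Pinecone's metadata filter syntax, assuming `PineconeMemory` forwards it to the index unchanged (worth confirming against the source). A sketch of building such a filter follows. `author_filter` is a hypothetical helper; `$eq` and `$in` are Pinecone filter operators.

```python
from typing import Any, Dict, List


def author_filter(authors: List[str]) -> Dict[str, Any]:
    """Build a filter matching documents whose `author` is in `authors`."""
    if not authors:
        raise ValueError("at least one author is required")
    if len(authors) == 1:
        return {"author": {"$eq": authors[0]}}
    return {"author": {"$in": authors}}


metadata_filter = author_filter(["John Doe", "Jane Roe"])
# With an initialized `memory` object this would be used as:
# results = memory.query("sample query", top_k=5, filter=metadata_filter)
```
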
## Usage

`PineconeMemory` is initialized with the parameters needed to configure Pinecone and the embedding model. All other options are optional and fall back to the defaults listed above.

#### Example

```python
from swarms_memory import PineconeMemory

# Initialize PineconeMemory
memory = PineconeMemory(
    api_key="your-api-key",
    environment="us-west1-gcp",
    index_name="example-index",
    # The default model, all-MiniLM-L6-v2, outputs 384-dimensional vectors;
    # set this to match the embedding model actually in use.
    dimension=384,
)
```

### Adding Documents

Documents can be added to the Pinecone index using the `add` method. The method accepts a document string and optional metadata.

#### Example

```python
doc = "This is a sample document to be added to the Pinecone index."
metadata = {"author": "John Doe", "date": "2024-07-08"}

memory.add(doc, metadata)
```

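Since `add` takes one document at a time, loading a collection means calling it in a loop. A minimal sketch, assuming only the `add(doc, metadata)` signature documented above; `add_many` is a hypothetical helper and not part of `swarms_memory`:

```python
from typing import Any, Dict, Iterable, Optional, Tuple


def add_many(
    memory: Any, docs: Iterable[Tuple[str, Optional[Dict[str, Any]]]]
) -> int:
    """Add each (document, metadata) pair via `memory.add`; return the count."""
    count = 0
    for doc, metadata in docs:
        if not doc.strip():
            continue  # skip empty documents rather than indexing them
        memory.add(doc, metadata)
        count += 1
    return count


# With an initialized `memory` object:
# add_many(memory, [("First document.", {"author": "John Doe"}),
#                   ("Second document.", None)])
```
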
### Querying Documents

The `query` method searches the Pinecone index for documents similar to a query string. It returns the `top_k` most similar documents.

#### Example

```python
query = "Sample query to find similar documents."
results = memory.query(query, top_k=5)

for result in results:
    print(result)
```

## Additional Information and Tips

### Custom Embedding and Preprocessing Functions

Custom embedding and preprocessing functions can be provided during initialization to tailor document processing to specific requirements.

#### Example

```python
from typing import List

from swarms_memory import PineconeMemory


def custom_embedding_function(text: str) -> List[float]:
    # Custom embedding logic (placeholder returning a fixed 3-value vector)
    return [0.1, 0.2, 0.3]


def custom_preprocess_function(text: str) -> str:
    # Custom preprocessing logic
    return text.lower()


memory = PineconeMemory(
    api_key="your-api-key",
    environment="us-west1-gcp",
    index_name="example-index",
    dimension=3,  # must match the length of the vectors returned above
    embedding_function=custom_embedding_function,
    preprocess_function=custom_preprocess_function,
)
```

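A custom `postprocess_function` can be supplied the same way. The sketch below assumes each result dictionary carries a numeric `score` key, which is an assumption about the result shape rather than something this document specifies. `keep_confident_results` is a hypothetical name.

```python
from typing import Any, Dict, List


def keep_confident_results(
    results: List[Dict[str, Any]], min_score: float = 0.5
) -> List[Dict[str, Any]]:
    """Drop results below `min_score` and sort the rest by descending score."""
    # Results without a "score" key are treated as score 0.0 (assumption).
    kept = [r for r in results if r.get("score", 0.0) >= min_score]
    return sorted(kept, key=lambda r: r.get("score", 0.0), reverse=True)


sample = [
    {"id": "a", "score": 0.9},
    {"id": "b", "score": 0.2},
    {"id": "c", "score": 0.95},
]
print([r["id"] for r in keep_confident_results(sample)])  # → ['c', 'a']
```

It would then be passed as `postprocess_function=keep_confident_results` when constructing `PineconeMemory`.
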
### Logger Configuration

The logger can be configured through the `logger_config` parameter. The default configuration logs to a file (`rag_wrapper.log`) and to the console.

#### Example

```python
logger_config = {
    "handlers": [
        {"sink": "custom_log.log", "rotation": "1 MB"},
        {"sink": lambda msg: print(msg, end="")},
    ]
}

memory = PineconeMemory(
    api_key="your-api-key",
    environment="us-west1-gcp",
    index_name="example-index",
    logger_config=logger_config,
)
```

## References and Resources

- [Pinecone Documentation](https://docs.pinecone.io/)
- [SentenceTransformers Documentation](https://www.sbert.net/)
- [Loguru Documentation](https://loguru.readthedocs.io/en/stable/)

For further exploration and examples, refer to the official documentation provided by Pinecone, SentenceTransformers, and Loguru.

`PineconeMemory` offers a flexible interface for using Pinecone in retrieval-augmented generation systems. With support for custom embedding, preprocessing, and postprocessing functions, it can be adapted to a wide range of applications.