PineconeMemory Documentation
The PineconeMemory class provides an interface for integrating Pinecone-based Retrieval-Augmented Generation (RAG) systems. It adds documents to a Pinecone index and queries the index for similar documents. The class supports custom embedding models as well as custom embedding, preprocessing, and postprocessing functions.
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
api_key | str | - | Pinecone API key. |
environment | str | - | Pinecone environment. |
index_name | str | - | Name of the Pinecone index to use. |
dimension | int | 768 | Dimension of the document embeddings. |
embedding_model | Optional[Any] | None | Custom embedding model. Defaults to SentenceTransformer('all-MiniLM-L6-v2'). |
embedding_function | Optional[Callable[[str], List[float]]] | None | Custom embedding function. Defaults to _default_embedding_function. |
preprocess_function | Optional[Callable[[str], str]] | None | Custom preprocessing function. Defaults to _default_preprocess_function. |
postprocess_function | Optional[Callable[[List[Dict[str, Any]]], List[Dict[str, Any]]]] | None | Custom postprocessing function. Defaults to _default_postprocess_function. |
metric | str | 'cosine' | Distance metric for Pinecone index. |
pod_type | str | 'p1' | Pinecone pod type. |
namespace | str | '' | Pinecone namespace. |
logger_config | Optional[Dict[str, Any]] | None | Configuration for the logger. Defaults to logging to rag_wrapper.log and console output. |
Methods
_setup_logger
def _setup_logger(self, config: Optional[Dict[str, Any]] = None)
Sets up the logger with the given configuration.
_default_embedding_function
def _default_embedding_function(self, text: str) -> List[float]
Generates embeddings using the default SentenceTransformer model.
_default_preprocess_function
def _default_preprocess_function(self, text: str) -> str
Preprocesses the input text by stripping whitespace.
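Since the default only strips whitespace, a custom function with the same str -> str signature can do more. The following is a minimal sketch, not part of the library: normalize_text is a hypothetical name, and the choice of NFKC normalization plus whitespace collapsing is an assumption about what a typical pipeline might want.

```python
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Normalize Unicode (NFKC), collapse whitespace runs, and strip the ends."""
    # NFKC folds compatibility characters (e.g. ligatures) into plain forms.
    text = unicodedata.normalize("NFKC", text)
    # Replace any run of spaces, tabs, or newlines with a single space.
    return re.sub(r"\s+", " ", text).strip()
```

A function like this could be passed as preprocess_function when constructing PineconeMemory.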
_default_postprocess_function
def _default_postprocess_function(self, results: List[Dict[str, Any]]) -> List[Dict[str, Any]]
Postprocesses the query results.
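What the default postprocessing does is not specified here. To illustrate the postprocess_function signature, the sketch below drops low-scoring matches and returns the rest best-first. It assumes each result dictionary carries a numeric "score" key, which is an assumption about the result shape rather than something documented on this page; drop_low_scores and min_score are hypothetical names.

```python
from typing import Any, Dict, List


def drop_low_scores(
    results: List[Dict[str, Any]], min_score: float = 0.5
) -> List[Dict[str, Any]]:
    """Keep results whose (assumed) "score" is at least min_score, best first."""
    # Results without a "score" key are treated as scoring 0.0.
    kept = [r for r in results if r.get("score", 0.0) >= min_score]
    return sorted(kept, key=lambda r: r.get("score", 0.0), reverse=True)
```

Because min_score has a default, the function still matches the single-argument callable type expected by postprocess_function.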
add
Adds a document to the Pinecone index.
Parameter | Type | Default | Description |
---|---|---|---|
doc | str | - | The document to be added. |
metadata | Optional[Dict[str, Any]] | None | Additional metadata for the document. |
query
Queries the Pinecone index for similar documents.
Parameter | Type | Default | Description |
---|---|---|---|
query | str | - | The query string. |
top_k | int | 5 | The number of top results to return. |
filter | Optional[Dict[str, Any]] | None | Metadata filter for the query. |
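Pinecone metadata filters are dictionaries that map a field name to an operator expression such as $eq, $ne, $in, or $gte. The sketch below builds one such filter and evaluates it locally to show the intended semantics. The matches helper is hypothetical and only illustrates the logic, since real filtering happens inside Pinecone; the numeric year field is also an assumption, and it is assumed that PineconeMemory forwards the filter argument to Pinecone unchanged.

```python
from typing import Any, Dict

# A small subset of Pinecone's filter operators, re-implemented locally
# purely for illustration.
OPERATORS = {
    "$eq": lambda value, target: value == target,
    "$ne": lambda value, target: value != target,
    "$in": lambda value, target: value in target,
    "$gte": lambda value, target: value is not None and value >= target,
}


def matches(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Return True if every field condition in flt holds for metadata."""
    for field, condition in flt.items():
        value = metadata.get(field)
        for op, target in condition.items():
            if not OPERATORS[op](value, target):
                return False
    return True


# Author must match exactly and the (hypothetical) year must be >= 2024.
metadata_filter = {"author": {"$eq": "John Doe"}, "year": {"$gte": 2024}}
```

Under the forwarding assumption above, the same dictionary would be passed as memory.query(query, top_k=5, filter=metadata_filter).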
Usage
The PineconeMemory class is initialized with the parameters needed to configure Pinecone and the embedding model. Only api_key, environment, and index_name have no default; every other parameter is optional.
Example
from swarms_memory import PineconeMemory

# Initialize PineconeMemory
memory = PineconeMemory(
    api_key="your-api-key",
    environment="us-west1-gcp",
    index_name="example-index",
    dimension=768,  # must match the embedding model's output size
)
Adding Documents
Documents can be added to the Pinecone index using the add method. The method accepts a document string and optional metadata.
Example
doc = "This is a sample document to be added to the Pinecone index."
metadata = {"author": "John Doe", "date": "2024-07-08"}
memory.add(doc, metadata)
Querying Documents
The query method searches the Pinecone index for documents similar to a query string. It returns the top_k most similar documents.
Example
query = "Sample query to find similar documents."
results = memory.query(query, top_k=5)
for result in results:
    print(result)
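To make the add/query flow concrete without a Pinecone account, here is a minimal in-memory stand-in. It is not how PineconeMemory is implemented: InMemoryMemory, toy_embed, and the result keys ("score", "text", "metadata") are all hypothetical, and the letter-count embedding is only a placeholder for a real model. It does show the general pattern of preprocess, embed, store, and rank by cosine similarity.

```python
import math
from typing import Any, Dict, List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def toy_embed(text: str) -> List[float]:
    """Toy embedding: counts of the letters a-z (stand-in for a real model)."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


class InMemoryMemory:
    """Hypothetical stand-in mimicking the add/query flow without Pinecone."""

    def __init__(self) -> None:
        self._items: List[Tuple[List[float], str, Dict[str, Any]]] = []

    def add(self, doc: str, metadata: Optional[Dict[str, Any]] = None) -> None:
        # Preprocess (strip), embed, and store alongside the metadata.
        self._items.append((toy_embed(doc.strip()), doc, metadata or {}))

    def query(self, query: str, top_k: int = 5) -> List[Dict[str, Any]]:
        q = toy_embed(query.strip())
        scored = [
            {"score": cosine_similarity(q, vec), "text": doc, "metadata": meta}
            for vec, doc, meta in self._items
        ]
        # Highest similarity first, then keep only the top_k entries.
        scored.sort(key=lambda r: r["score"], reverse=True)
        return scored[:top_k]


store = InMemoryMemory()
store.add("apple banana", {"kind": "fruit"})
store.add("zzz qqq xxx", {"kind": "noise"})
top = store.query("banana apple", top_k=1)
```

Here top holds a single entry for "apple banana", because its letter counts are identical to those of the query.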
Additional Information and Tips
Custom Embedding and Preprocessing Functions
Custom embedding and preprocessing functions can be provided during initialization to tailor the document processing to specific requirements.
Example
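from typing import List

from swarms_memory import PineconeMemory


def custom_embedding_function(text: str) -> List[float]:
    # Custom embedding logic. Placeholder values only: a real function
    # must return a vector with exactly `dimension` entries.
    return [0.1, 0.2, 0.3]


def custom_preprocess_function(text: str) -> str:
    # Custom preprocessing logic
    return text.lower()


memory = PineconeMemory(
    api_key="your-api-key",
    environment="us-west1-gcp",
    index_name="example-index",
    dimension=3,  # matches the length of the placeholder embedding
    embedding_function=custom_embedding_function,
    preprocess_function=custom_preprocess_function,
)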
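A custom embedding function has to return vectors whose length equals the index dimension. As a dependency-free sketch of that contract, the function below uses feature hashing: each token is hashed into one of dimension buckets and the result is L2-normalized. This is a stand-in technique, not a semantic embedding and not the library's default; hashed_embedding and DIMENSION are hypothetical names.

```python
import hashlib
import math
from typing import List

DIMENSION = 768  # assumed to equal the `dimension` passed to PineconeMemory


def hashed_embedding(text: str, dimension: int = DIMENSION) -> List[float]:
    """Feature-hashing embedding: one bucket per token hash, L2-normalized."""
    vector = [0.0] * dimension
    for token in text.lower().split():
        # A stable hash keeps the mapping identical across processes.
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dimension
        vector[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    # Empty input yields the all-zero vector instead of dividing by zero.
    return [v / norm for v in vector] if norm else vector
```

With cosine as the metric, normalizing to unit length means the dot product and the cosine similarity coincide.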
Logger Configuration
By default, the logger writes to rag_wrapper.log and to the console. Pass logger_config to override this behavior.
Example
logger_config = {
    "handlers": [
        {"sink": "custom_log.log", "rotation": "1 MB"},
        {"sink": lambda msg: print(msg, end="")},
    ]
}

memory = PineconeMemory(
    api_key="your-api-key",
    environment="us-west1-gcp",
    index_name="example-index",
    logger_config=logger_config,
)
References and Resources
For further exploration and examples, refer to the official documentation and resources provided by Pinecone, SentenceTransformers, and Loguru.
In summary, the PineconeMemory class offers a flexible interface for using Pinecone in retrieval-augmented generation systems. Because the embedding, preprocessing, and postprocessing functions can each be replaced, it can be adapted to a wide range of applications.