
PineconeMemory Documentation

The PineconeMemory class provides an interface to a Pinecone index for Retrieval-Augmented Generation (RAG) systems. It adds documents to the index and queries the index for documents similar to a query string. You can supply custom embedding models, preprocessing and postprocessing functions, and logger settings to adapt it to different use cases.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `api_key` | `str` | required | Pinecone API key. |
| `environment` | `str` | required | Pinecone environment. |
| `index_name` | `str` | required | Name of the Pinecone index to use. |
| `dimension` | `int` | `768` | Dimension of the document embeddings. It must equal the length of the vectors produced by the embedding model. The default model, all-MiniLM-L6-v2, produces 384-dimensional vectors. |
| `embedding_model` | `Optional[Any]` | `None` | Custom embedding model. Defaults to `SentenceTransformer('all-MiniLM-L6-v2')`. |
| `embedding_function` | `Optional[Callable[[str], List[float]]]` | `None` | Custom embedding function. Defaults to `_default_embedding_function`. |
| `preprocess_function` | `Optional[Callable[[str], str]]` | `None` | Custom preprocessing function. Defaults to `_default_preprocess_function`. |
| `postprocess_function` | `Optional[Callable[[List[Dict[str, Any]]], List[Dict[str, Any]]]]` | `None` | Custom postprocessing function. Defaults to `_default_postprocess_function`. |
| `metric` | `str` | `'cosine'` | Distance metric for the Pinecone index. |
| `pod_type` | `str` | `'p1'` | Pinecone pod type. |
| `namespace` | `str` | `''` | Pinecone namespace. |
| `logger_config` | `Optional[Dict[str, Any]]` | `None` | Logger configuration. Defaults to logging to `rag_wrapper.log` and to the console. |
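The documented default for `dimension` is 768, but all-MiniLM-L6-v2 produces 384-dimensional vectors, and Pinecone rejects vectors whose length differs from the index dimension. Below is a minimal sketch of a pre-flight check. It assumes only that the embedding function returns a list of floats. `check_embedding_dimension` and `toy_embedding` are hypothetical helpers and are not part of swarms_memory.

```python
from typing import Callable, List


def check_embedding_dimension(
    embedding_function: Callable[[str], List[float]],
    dimension: int,
    probe_text: str = "dimension probe",
) -> None:
    """Raise ValueError if the embedding length differs from the index dimension."""
    vector = embedding_function(probe_text)
    if len(vector) != dimension:
        raise ValueError(
            f"Embedding has {len(vector)} values, but the index dimension is {dimension}."
        )


def toy_embedding(text: str) -> List[float]:
    # Hypothetical stand-in for all-MiniLM-L6-v2: always returns 384 values.
    return [float(len(text))] * 384


check_embedding_dimension(toy_embedding, 384)  # passes silently
```

Calling `check_embedding_dimension(toy_embedding, 768)` raises `ValueError`, which is the kind of mismatch you would hit by pairing a 384-dimensional model with a 768-dimensional index.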

Methods

_setup_logger

def _setup_logger(self, config: Optional[Dict[str, Any]] = None)

Sets up the logger with the given configuration.

_default_embedding_function

def _default_embedding_function(self, text: str) -> List[float]

Generates embeddings using the default SentenceTransformer model.

_default_preprocess_function

def _default_preprocess_function(self, text: str) -> str

Preprocesses the input text by stripping whitespace.
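A custom `preprocess_function` only needs to match `Callable[[str], str]`. The sketch below is a hypothetical example and not the library's default. It goes further than stripping whitespace: it applies Unicode NFKC normalization (which, for example, expands the "ﬁ" ligature to "fi") and collapses internal runs of whitespace, so near-identical texts embed the same way.

```python
import re
import unicodedata


def normalize_whitespace_preprocess(text: str) -> str:
    """Hypothetical preprocess_function: NFKC-normalize, collapse whitespace, trim."""
    text = unicodedata.normalize("NFKC", text)
    # Replace any run of whitespace (spaces, tabs, newlines) with a single space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


print(normalize_whitespace_preprocess("  Pinecone\tstores\n\nvectors.  "))
# → Pinecone stores vectors.
```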

_default_postprocess_function

def _default_postprocess_function(self, results: List[Dict[str, Any]]) -> List[Dict[str, Any]]

Postprocesses the query results.
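A custom `postprocess_function` receives the list of result dicts and returns a list of the same shape. The sketch below makes two assumptions. First, each result carries a numeric `"score"` key, as Pinecone matches do; whether PineconeMemory's results use that exact key is an assumption. Second, the metric is one where higher scores mean more similar, as with the default `'cosine'`. `threshold_postprocess` is a hypothetical name.

```python
from typing import Any, Dict, List


def threshold_postprocess(
    results: List[Dict[str, Any]], min_score: float = 0.5
) -> List[Dict[str, Any]]:
    """Hypothetical postprocess_function: drop weak matches, best match first.

    Assumes each result dict carries a numeric "score" key.
    """
    kept = [r for r in results if r.get("score", 0.0) >= min_score]
    return sorted(kept, key=lambda r: r["score"], reverse=True)


sample = [
    {"id": "a", "score": 0.42},
    {"id": "b", "score": 0.91},
    {"id": "c", "score": 0.77},
]
print([r["id"] for r in threshold_postprocess(sample)])
# → ['b', 'c']
```

The postprocessing hook receives only the results list, so bind a different threshold with `functools.partial(threshold_postprocess, min_score=0.8)` before passing it in.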

add

Adds a document to the Pinecone index.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `doc` | `str` | required | The document to be added. |
| `metadata` | `Optional[Dict[str, Any]]` | `None` | Additional metadata for the document. |
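Conceptually, adding a document means three steps: preprocess it, embed it, and upsert a vector record into the index. Pinecone records take an `id`, the vector `values`, and optional `metadata`. The sketch below builds such a record. How PineconeMemory actually generates IDs, and whether it stores the raw text under a `"text"` metadata key, are assumptions made for illustration. `build_record` is a hypothetical helper.

```python
import uuid
from typing import Any, Callable, Dict, List, Optional


def build_record(
    doc: str,
    embed: Callable[[str], List[float]],
    preprocess: Callable[[str], str] = str.strip,
    metadata: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical sketch of the record an add() call could upsert."""
    text = preprocess(doc)
    # Copy the caller's metadata so it is not mutated, then keep the text for retrieval.
    record_metadata = dict(metadata or {})
    record_metadata["text"] = text
    return {
        "id": str(uuid.uuid4()),
        "values": embed(text),
        "metadata": record_metadata,
    }


record = build_record("  Hello  ", embed=lambda t: [float(len(t))], metadata={"author": "John Doe"})
print(record["metadata"])  # → {'author': 'John Doe', 'text': 'Hello'}
```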

query

Queries the Pinecone index for similar documents.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `query` | `str` | required | The query string. |
| `top_k` | `int` | `5` | The number of top results to return. |
| `filter` | `Optional[Dict[str, Any]]` | `None` | Metadata filter for the query. |
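The `filter` dict is presumably forwarded to Pinecone, which uses a MongoDB-style operator syntax such as `$eq`, `$in`, and `$gte`, with an implicit AND across fields. Pinecone's range operators apply to numbers, so the `"date": "2024-07-08"` string stored in the add example cannot be range-filtered. Store a numeric field such as `year` if you need that. The matcher below is only a local illustration of a subset of those operators. It is not Pinecone's implementation, and its handling of missing fields is an assumption.

```python
import operator
from typing import Any, Dict

# Subset of Pinecone's metadata filter operators, mapped to Python comparisons.
_OPS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches_filter(metadata: Dict[str, Any], flt: Dict[str, Dict[str, Any]]) -> bool:
    """Local illustration: every field condition must hold (implicit AND)."""
    for field, conditions in flt.items():
        if field not in metadata:
            return False  # assumption: a missing field never matches
        for op, expected in conditions.items():
            if not _OPS[op](metadata[field], expected):
                return False
    return True


flt = {"author": {"$eq": "John Doe"}, "year": {"$gte": 2024}}
print(matches_filter({"author": "John Doe", "year": 2024}, flt))  # → True
print(matches_filter({"author": "Jane Roe", "year": 2024}, flt))  # → False
```

Against the real index, you would pass the same dict as `memory.query(query, top_k=5, filter=flt)`.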

Usage

You initialize the PineconeMemory class with the Pinecone connection settings (API key, environment, and index name) and the embedding dimension. Optional arguments replace the embedding, preprocessing, and postprocessing steps or change the logger configuration.

Example

from swarms_memory import PineconeMemory

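# Initialize PineconeMemory.
# The default embedding model, all-MiniLM-L6-v2, produces 384-dimensional vectors,
# so the index dimension is set to match.
memory = PineconeMemory(
    api_key="your-api-key",
    environment="us-west1-gcp",
    index_name="example-index",
    dimension=384,
)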
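To avoid hardcoding credentials, you can read the constructor arguments from environment variables. This is a minimal sketch. `PINECONE_API_KEY` is the variable name Pinecone's own tooling conventionally uses. `PINECONE_ENVIRONMENT`, `PINECONE_INDEX`, `PINECONE_DIMENSION`, and the helper `load_pinecone_settings` are hypothetical names chosen for illustration.

```python
import os
from typing import Any, Dict


def load_pinecone_settings() -> Dict[str, Any]:
    """Hypothetical helper: read PineconeMemory arguments from environment variables."""
    api_key = os.environ.get("PINECONE_API_KEY")
    if not api_key:
        raise RuntimeError("Set PINECONE_API_KEY before creating PineconeMemory.")
    return {
        "api_key": api_key,
        "environment": os.environ.get("PINECONE_ENVIRONMENT", "us-west1-gcp"),
        "index_name": os.environ.get("PINECONE_INDEX", "example-index"),
        # 384 matches all-MiniLM-L6-v2; change it if you swap embedding models.
        "dimension": int(os.environ.get("PINECONE_DIMENSION", "384")),
    }
```

The returned dict can then be unpacked into the constructor: `memory = PineconeMemory(**load_pinecone_settings())`.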

Adding Documents

Documents can be added to the Pinecone index using the add method. The method accepts a document string and optional metadata.

Example

doc = "This is a sample document to be added to the Pinecone index."
metadata = {"author": "John Doe", "date": "2024-07-08"}

memory.add(doc, metadata)

Querying Documents

The query method searches the Pinecone index for documents similar to a query string and returns the `top_k` most similar results (5 by default). An optional `filter` restricts the search by document metadata.
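Pinecone performs this ranking server-side. The sketch below is only an exact, brute-force illustration of what ranking by the default `'cosine'` metric means: score every stored vector against the query vector and keep the `k` highest. The function names and toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_cosine(
    query_vec: List[float], vectors: Dict[str, List[float]], k: int = 5
) -> List[Tuple[str, float]]:
    """Rank stored vectors by cosine similarity to query_vec, highest first."""
    scored = [(vid, cosine_similarity(query_vec, vec)) for vid, vec in vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


index = {"doc-a": [1.0, 0.0], "doc-b": [0.6, 0.8], "doc-c": [0.0, 1.0]}
print([vid for vid, _ in top_k_cosine([1.0, 0.1], index, k=2)])  # → ['doc-a', 'doc-b']
```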

Example

query = "Sample query to find similar documents."
results = memory.query(query, top_k=5)

for result in results:
    print(result)
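In a RAG pipeline, retrieved results are usually joined into a context block for the language model prompt. Below is a minimal sketch, assuming each result is a dict whose `"metadata"` holds the passage text under a configurable key. The result shape, the `"text"` key, and `build_context` itself are assumptions and are not confirmed by PineconeMemory.

```python
from typing import Any, Dict, List


def build_context(
    results: List[Dict[str, Any]], text_key: str = "text", max_chars: int = 2000
) -> str:
    """Hypothetical helper: join retrieved passages into one prompt context string.

    Assumes each result has a "metadata" dict holding the passage under text_key.
    """
    passages: List[str] = []
    for result in results:
        text = (result.get("metadata") or {}).get(text_key, "")
        if not text:
            continue  # skip results without stored text
        entry = f"[{len(passages) + 1}] {text}"
        # Stop before the joined context would exceed the character budget.
        if len("\n".join(passages + [entry])) > max_chars:
            break
        passages.append(entry)
    return "\n".join(passages)


sample = [
    {"id": "a", "score": 0.9, "metadata": {"text": "Pinecone stores vectors."}},
    {"id": "b", "score": 0.8, "metadata": {}},
    {"id": "c", "score": 0.7, "metadata": {"text": "RAG adds retrieved text to prompts."}},
]
print(build_context(sample))
```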

Additional Information and Tips

Custom Embedding and Preprocessing Functions

Custom embedding and preprocessing functions can be provided during initialization to tailor the document processing to specific requirements.

Example

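from typing import List

from swarms_memory import PineconeMemory

EMBEDDING_DIM = 768  # must match the index dimension

def custom_embedding_function(text: str) -> List[float]:
    # Custom embedding logic goes here. This placeholder returns a constant vector
    # with exactly EMBEDDING_DIM values so that it matches the index dimension.
    return [0.1] * EMBEDDING_DIM

def custom_preprocess_function(text: str) -> str:
    # Custom preprocessing logic: lowercase the text before embedding
    return text.lower()

memory = PineconeMemory(
    api_key="your-api-key",
    environment="us-west1-gcp",
    index_name="example-index",
    dimension=EMBEDDING_DIM,
    embedding_function=custom_embedding_function,
    preprocess_function=custom_preprocess_function,
)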
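For tests or offline experiments, a deterministic embedding function without model downloads can be useful. The sketch below is a toy bag-of-words embedding built with feature hashing and L2 normalization. It is not semantically meaningful the way SentenceTransformer embeddings are, and `hashing_embedding_function` and `DIMENSION` are hypothetical names. Empty input yields an all-zero vector, which a cosine index cannot rank meaningfully, so guard against empty documents.

```python
import hashlib
import math
import re
from typing import List

DIMENSION = 384  # must equal the dimension passed to PineconeMemory


def hashing_embedding_function(text: str) -> List[float]:
    """Toy bag-of-words embedding via feature hashing, L2-normalized."""
    vector = [0.0] * DIMENSION
    for token in re.findall(r"\w+", text.lower()):
        # Stable hash (unlike Python's built-in hash, which is salted per process).
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        bucket = int.from_bytes(digest[:4], "little") % DIMENSION
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vector[bucket] += sign
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector
```

To use it, pass `embedding_function=hashing_embedding_function` together with `dimension=DIMENSION` to the constructor.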

Logger Configuration

You can configure the logger through `logger_config`. By default, it logs to `rag_wrapper.log` and to the console. The handler entries in the example below use Loguru sink options such as `sink` and `rotation`.

Example

logger_config = {
    "handlers": [
        {"sink": "custom_log.log", "rotation": "1 MB"},
        {"sink": lambda msg: print(msg, end="")},
    ]
}

memory = PineconeMemory(
    api_key="your-api-key",
    environment="us-west1-gcp",
    index_name="example-index",
    logger_config=logger_config
)

References and Resources

For further exploration and examples, refer to the official documentation and resources provided by Pinecone, SentenceTransformers, and Loguru.

PineconeMemory wraps a Pinecone index for retrieval-augmented generation. Its custom embedding, preprocessing, and postprocessing hooks let you adapt it to different models and data.