# ChromaDB Documentation
The `ChromaDB` module stores and retrieves documents through the ChromaDB system. It can add documents to a local ChromaDB collection and query that collection with a text string. It wraps the ChromaDB client to create and manage collections, and exposes a few configuration options that control storage and retrieval.
#### Parameters
| Parameter | Type | Default | Description |
|----------------|-------------------|----------|-------------------------------------------------------------|
| `metric` | `str` | `"cosine"`| The similarity metric to use for the collection. |
| `output_dir` | `str` | `"swarms"`| The name of the collection to store the results in. |
| `limit_tokens` | `Optional[int]` | `1000` | The maximum number of tokens to use for the query. |
| `n_results` | `int` | `1` | The number of results to retrieve. |
| `docs_folder` | `Optional[str]` | `None` | The folder containing documents to be added to the collection.|
| `verbose` | `bool` | `False` | Flag to enable verbose logging for debugging. |
| `*args` | `tuple` | `()` | Additional positional arguments. |
| `**kwargs` | `dict` | `{}` | Additional keyword arguments. |
#### Methods
| Method | Description |
|-----------------------|----------------------------------------------------------|
| `__init__` | Initializes the ChromaDB instance with specified parameters. |
| `add` | Adds a document to the ChromaDB collection. |
| `query` | Queries documents from the ChromaDB collection based on the query text. |
| `traverse_directory` | Traverses the specified directory to add documents to the collection. |
## Usage
```python
from swarms_memory import ChromaDB

chromadb = ChromaDB(
    metric="cosine",
    output_dir="results",
    limit_tokens=1000,
    n_results=2,
    docs_folder="path/to/docs",
    verbose=True,
)
```
### Adding Documents
The `add` method allows you to add a document to the ChromaDB collection. It generates a unique ID for each document and adds it to the collection.
#### Parameters
| Parameter | Type | Default | Description |
|---------------|--------|---------|---------------------------------------------|
| `document` | `str` | - | The document to be added to the collection. |
| `*args` | `tuple`| `()` | Additional positional arguments. |
| `**kwargs` | `dict` | `{}` | Additional keyword arguments. |
#### Returns
| Type | Description |
|-------|--------------------------------------|
| `str` | The ID of the added document. |
#### Example
```python
# `chromadb` is the instance created in the Usage section
result_id = chromadb.add(document="This is a sample document.")
print(f"Document ID: {result_id}")
```
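To illustrate what "generates a unique ID for each document" can look like, here is a minimal sketch that uses Python's `uuid` module, which the references list mentions. The in-memory `collection` dictionary and the `add_document` helper are hypothetical stand-ins, not the module's actual implementation.

```python
import uuid

# Hypothetical in-memory stand-in for a collection (not the module's real storage).
collection = {}

def add_document(document: str) -> str:
    """Store the document under a freshly generated ID and return that ID."""
    doc_id = str(uuid.uuid4())  # assumption: random UUID4 strings serve as IDs
    collection[doc_id] = document
    return doc_id

first_id = add_document("This is a sample document.")
second_id = add_document("This is a sample document.")
print(first_id != second_id)  # identical text still receives distinct IDs
```

Because the ID is generated rather than derived from the content, adding the same text twice in this sketch yields two separate entries.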
### Querying Documents
The `query` method allows you to retrieve documents from the ChromaDB collection based on the provided query text.
#### Parameters
| Parameter | Type | Default | Description |
|-------------|--------|---------|----------------------------------------|
| `query_text`| `str` | - | The query string to search for. |
| `*args` | `tuple`| `()` | Additional positional arguments. |
| `**kwargs` | `dict` | `{}` | Additional keyword arguments. |
#### Returns
| Type | Description |
|-------|--------------------------------------|
| `str` | The retrieved documents as a string. |
#### Example
```python
query_text = "search term"
results = chromadb.query(query_text=query_text)
print(f"Retrieved Documents: {results}")
```
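The following toy sketch shows how `n_results` and a string return value can fit together. It ranks documents by shared words instead of embeddings, so it is only an illustration under that assumption; `toy_query` and the sample documents are hypothetical and not part of the module.

```python
def toy_query(documents, query_text, n_results=1):
    """Rank documents by the number of words shared with the query
    and return the top matches joined into one string."""
    query_words = set(query_text.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return "\n".join(ranked[:n_results])

docs = [
    "Swarms coordinate many agents",
    "Vector search finds similar text",
    "Embeddings map text to a vector",
]
answer = toy_query(docs, "vector search", n_results=2)
print(answer)
```

A real vector store compares embedding vectors rather than word sets, but the shape of the call is the same: one query string in, the best `n_results` matches out.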
### Traversing Directory
The `traverse_directory` method visits every file in the specified directory and its subdirectories and adds the contents of each file to the ChromaDB collection.
#### Example
```python
chromadb.traverse_directory()
```
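A recursive walk of this kind can be sketched with `os.walk` from the standard library. The helper name `read_all_files`, the UTF-8 decoding, and the choice to ignore undecodable bytes are assumptions made for illustration; the module's own traversal may differ.

```python
import os
import tempfile

def read_all_files(root: str) -> dict:
    """Return {relative_path: text} for every file under root, recursively."""
    contents = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Assumption: files are text; undecodable bytes are dropped.
            with open(path, "r", encoding="utf-8", errors="ignore") as handle:
                contents[os.path.relpath(path, root)] = handle.read()
    return contents

# Build a small throwaway directory tree to walk.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "nested"))
    with open(os.path.join(root, "a.txt"), "w", encoding="utf-8") as f:
        f.write("top-level file")
    with open(os.path.join(root, "nested", "b.txt"), "w", encoding="utf-8") as f:
        f.write("nested file")
    found = read_all_files(root)

print(sorted(found.values()))
```

Each value in `found` is the kind of string that could then be passed to `add` as a document.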
## Additional Information and Tips
### Verbose Logging
Enable the `verbose` flag during initialization to get detailed logs of the operations, which is useful for debugging.
```python
chromadb = ChromaDB(verbose=True)
```
### Handling Large Documents
When dealing with large documents, consider using the `limit_tokens` parameter to restrict the number of tokens processed in a single query.
```python
chromadb = ChromaDB(limit_tokens=500)
```
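As a rough picture of what a token limit does, the sketch below truncates text to a fixed number of tokens. It assumes whitespace-separated words count as tokens, which is a simplification; the module's actual tokenizer and truncation behavior are not specified here, and `truncate_to_tokens` is a hypothetical helper.

```python
def truncate_to_tokens(text: str, limit_tokens: int) -> str:
    """Keep at most limit_tokens whitespace-separated tokens."""
    tokens = text.split()  # assumption: one word equals one token
    return " ".join(tokens[:limit_tokens])

sample = "one two three four five six"
shortened = truncate_to_tokens(sample, 4)
print(shortened)  # one two three four
```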
### Optimizing Query Performance
Choose the similarity metric (`metric` parameter) that fits your embeddings and use case, since the metric determines how candidate documents are ranked against a query.
```python
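# "l2" is ChromaDB's name for (squared) Euclidean distance; this assumes the
# metric string is passed through to the collection unchanged.
chromadb = ChromaDB(metric="l2")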
```
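To see why the choice of metric matters, the sketch below computes cosine distance and Euclidean (L2) distance by hand for two candidate vectors. The vectors and function names are made up for illustration and are independent of the module.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity: depends on direction only, not on length."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

def euclidean_distance(a, b):
    """Straight-line distance: sensitive to vector length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query = [1.0, 0.0]
same_direction = [10.0, 0.0]  # points the same way as the query, but much longer
nearby = [1.0, 1.0]           # close in space, but rotated 45 degrees

print(cosine_distance(query, same_direction), cosine_distance(query, nearby))
print(euclidean_distance(query, same_direction), euclidean_distance(query, nearby))
```

Cosine distance ranks `same_direction` first, while Euclidean distance ranks `nearby` first, so the same query can return different top results depending on the metric.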
## References and Resources
- [ChromaDB Documentation](https://chromadb.io/docs)
- [Python UUID Module](https://docs.python.org/3/library/uuid.html)
- [Python os Module](https://docs.python.org/3/library/os.html)
- [Python logging Module](https://docs.python.org/3/library/logging.html)
- [dotenv Package](https://pypi.org/project/python-dotenv/)

By following this documentation, users can effectively utilize the ChromaDB module for managing document storage and retrieval in their applications.