# ChromaDB Documentation

`ChromaDB` is a memory module in `swarms_memory` that stores and retrieves documents using a local ChromaDB collection. It can add documents one at a time or load every file in a folder, and it can query the collection with free-text query strings. The module uses the ChromaDB client to create and manage the collection. Constructor options set the similarity metric, the collection name, and the number of results each query returns.

#### Parameters

| Parameter      | Type            | Default    | Description                                                     |
|----------------|-----------------|------------|-----------------------------------------------------------------|
| `metric`       | `str`           | `"cosine"` | The similarity metric to use for the collection.                |
| `output_dir`   | `str`           | `"swarms"` | The name of the collection to store the results in.             |
| `limit_tokens` | `Optional[int]` | `1000`     | The maximum number of tokens to use for the query.              |
| `n_results`    | `int`           | `1`        | The number of results to retrieve.                              |
| `docs_folder`  | `Optional[str]` | `None`     | The folder containing documents to be added to the collection. |
| `verbose`      | `bool`          | `False`    | Flag to enable verbose logging for debugging.                   |
| `*args`        | `tuple`         | `()`       | Additional positional arguments.                                |
| `**kwargs`     | `dict`          | `{}`       | Additional keyword arguments.                                   |

#### Methods

| Method               | Description                                                              |
|----------------------|--------------------------------------------------------------------------|
| `__init__`           | Initializes the ChromaDB instance with specified parameters.             |
| `add`                | Adds a document to the ChromaDB collection.                              |
| `query`              | Queries documents from the ChromaDB collection based on the query text. |
| `traverse_directory` | Traverses the specified directory to add documents to the collection.   |

## Usage

```python
from swarms_memory import ChromaDB

chromadb = ChromaDB(
    metric="cosine",
    output_dir="results",
    limit_tokens=1000,
    n_results=2,
    docs_folder="path/to/docs",
    verbose=True,
)
```

### Adding Documents

The `add` method adds a single document to the ChromaDB collection. It generates a unique ID for the document, stores the document under that ID, and returns the ID.

#### Parameters

| Parameter  | Type    | Default | Description                                 |
|------------|---------|---------|---------------------------------------------|
| `document` | `str`   | -       | The document to be added to the collection. |
| `*args`    | `tuple` | `()`    | Additional positional arguments.            |
| `**kwargs` | `dict`  | `{}`    | Additional keyword arguments.               |

#### Returns

| Type  | Description                   |
|-------|-------------------------------|
| `str` | The ID of the added document. |

#### Example

```python
# Assumes `chromadb` was created as shown in the Usage section.
result_id = chromadb.add(document="This is a sample document.")
print(f"Document ID: {result_id}")
```

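To make the returned ID less abstract, the sketch below mimics the ID-generation step in a plain dictionary. It assumes the IDs are random UUID strings (the references list Python's `uuid` module); `ToyCollection` is a hypothetical stand-in written for illustration and is not part of `swarms_memory` or ChromaDB.

```python
import uuid


class ToyCollection:
    """Hypothetical in-memory stand-in for a ChromaDB collection (illustration only)."""

    def __init__(self) -> None:
        # Maps document ID -> document text.
        self.documents: dict[str, str] = {}

    def add(self, document: str) -> str:
        # Generate a random (version 4) UUID and use its string form as the ID.
        doc_id = str(uuid.uuid4())
        self.documents[doc_id] = document
        # Return the ID so the caller can refer to this document later.
        return doc_id


collection = ToyCollection()
first_id = collection.add("This is a sample document.")
second_id = collection.add("This is another document.")
print(first_id != second_id)  # → True
```

Keeping the returned IDs is what lets you trace a stored entry back to the document you added.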
### Querying Documents

The `query` method allows you to retrieve documents from the ChromaDB collection based on the provided query text.

#### Parameters

| Parameter    | Type    | Default | Description                      |
|--------------|---------|---------|----------------------------------|
| `query_text` | `str`   | -       | The query string to search for.  |
| `*args`      | `tuple` | `()`    | Additional positional arguments. |
| `**kwargs`   | `dict`  | `{}`    | Additional keyword arguments.    |

#### Returns

| Type  | Description                          |
|-------|--------------------------------------|
| `str` | The retrieved documents as a string. |

#### Example

```python
query_text = "search term"
results = chromadb.query(query_text=query_text)
print(f"Retrieved Documents: {results}")
```

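Conceptually, ChromaDB embeds `query_text`, finds the stored documents whose embeddings are closest under the collection's metric, and returns up to `n_results` of them. The toy sketch below reproduces only the ranking idea with bag-of-words vectors and cosine similarity from the standard library. It is an illustration under that simplification: ChromaDB uses a learned embedding model rather than word counts, and `toy_query` is a hypothetical helper, not part of the API.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    # Lowercase, split on whitespace, and count each word.
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    # Dot product over shared words, divided by the product of the vector norms.
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def toy_query(documents: list[str], query_text: str, n_results: int = 1) -> list[str]:
    # Hypothetical helper: rank every document by similarity to the query
    # and keep the n_results most similar ones.
    query_vec = bag_of_words(query_text)
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vec, bag_of_words(doc)),
        reverse=True,
    )
    return ranked[:n_results]


docs = [
    "chromadb stores document embeddings",
    "the weather is sunny today",
    "query the chromadb collection for similar documents",
]
print(toy_query(docs, "similar chromadb documents", n_results=2))
# → ['query the chromadb collection for similar documents', 'chromadb stores document embeddings']
```

Raising `n_results` simply keeps more of the ranked list, which is why `n_results=2` in the Usage section returns up to two documents per query.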
### Traversing Directory

The `traverse_directory` method walks through every file in the folder given by `docs_folder`, including its subdirectories, and adds the contents of each file to the ChromaDB collection. It takes no arguments, so set `docs_folder` when you create the instance.

#### Example

```python
chromadb.traverse_directory()
```

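Because the traversal includes subdirectories, it can pick up more files than expected. Before pointing `docs_folder` at a large tree, you can preview which files it covers. The sketch below assumes the traversal visits files the way the standard library's `os.walk` does; `list_candidate_files` is a hypothetical helper that only lists paths and does not read or convert file contents.

```python
import os


def list_candidate_files(docs_folder: str) -> list[str]:
    # Hypothetical helper: collect every file path under docs_folder,
    # including files in nested subdirectories, sorted for stable output.
    paths = []
    for root, _dirs, files in os.walk(docs_folder):
        for name in files:
            paths.append(os.path.join(root, name))
    return sorted(paths)


if __name__ == "__main__":
    # Prints nothing if the folder does not exist (os.walk yields no entries).
    for path in list_candidate_files("path/to/docs"):
        print(path)
```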
## Additional Information and Tips

### Verbose Logging

Enable the `verbose` flag during initialization to get detailed logs of the operations, which is useful for debugging.

```python
chromadb = ChromaDB(verbose=True)
```

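Independently of `verbose`, you can configure Python's standard `logging` module to surface messages from the underlying `chromadb` package. This sketch assumes `chromadb` logs through loggers named under `"chromadb"` (the usual `logging.getLogger(__name__)` convention); adjust the logger name if your installed version differs.

```python
import logging

# Send log records to stderr with timestamps and logger names.
# The root logger stays at INFO, so other libraries are not flooded with DEBUG output.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s: %(message)s",
)

# Assumption: the chromadb package logs under the "chromadb" logger hierarchy.
# Setting this logger to DEBUG lets its most detailed messages propagate
# to the root handler configured above.
chromadb_logger = logging.getLogger("chromadb")
chromadb_logger.setLevel(logging.DEBUG)
```

Run this before creating the `ChromaDB` instance so that messages emitted during initialization are captured.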
### Handling Large Documents

When dealing with large documents, consider using the `limit_tokens` parameter to restrict the number of tokens processed in a single query.

```python
chromadb = ChromaDB(limit_tokens=500)
```

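Another way to keep individual entries small is to split a long document into chunks before calling `add`, so each chunk is stored and retrieved on its own. The sketch below is a hypothetical `chunk_text` helper that counts whitespace-separated words as a rough stand-in for tokens; real tokenizers count differently, so treat `max_words` as an approximation of a token budget.

```python
def chunk_text(text: str, max_words: int = 200) -> list[str]:
    # Hypothetical helper: split text into consecutive chunks of at most
    # max_words whitespace-separated words (a rough proxy for tokens).
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]


long_document = "word " * 450
chunks = chunk_text(long_document, max_words=200)
print([len(chunk.split()) for chunk in chunks])  # → [200, 200, 50]
```

Each chunk can then be passed to `chromadb.add(document=chunk)`; keep the returned IDs if you need to trace chunks back to their source document.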
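### Choosing a Similarity Metric

Use the similarity metric (`metric` parameter) that suits your embeddings and use case. The metric decides which stored documents count as closest to a query, so it changes which results come back. ChromaDB collections support `"l2"` (squared Euclidean distance), `"ip"` (inner product), and `"cosine"`.
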
```python
# "l2" is ChromaDB's (squared) Euclidean distance option.
chromadb = ChromaDB(metric="l2")
```

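To see why the metric matters, the sketch below compares ChromaDB's three distance functions on made-up two-dimensional vectors, using the formulas from ChromaDB's documentation: squared L2 (`"l2"`), inner product (`"ip"`, computed as 1 minus the dot product), and cosine (`"cosine"`, 1 minus the cosine similarity). Vector `b` points in the same direction as the query but is three times longer, so cosine treats it as identical, L2 treats it as far away, and inner product even rewards its extra length.

```python
import math


def squared_l2(a: list[float], b: list[float]) -> float:
    # ChromaDB "l2": sum of squared coordinate differences.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product_distance(a: list[float], b: list[float]) -> float:
    # ChromaDB "ip": 1 minus the dot product.
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a: list[float], b: list[float]) -> float:
    # ChromaDB "cosine": 1 minus the cosine of the angle between the vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


query = [1.0, 0.0]
a = [0.8, 0.6]  # near the query, slightly different direction
b = [3.0, 0.0]  # same direction as the query, three times longer

for name, distance in [
    ("l2", squared_l2),
    ("ip", inner_product_distance),
    ("cosine", cosine_distance),
]:
    closer = "a" if distance(query, a) < distance(query, b) else "b"
    print(f"{name}: closer document is {closer}")
# → l2: closer document is a
# → ip: closer document is b
# → cosine: closer document is b
```

If your embedding model produces vectors whose length carries no meaning, cosine is usually the safer default; `"ip"` and `"l2"` make sense when magnitudes are meaningful or the vectors are already normalized.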
## References and Resources

- [ChromaDB Documentation](https://chromadb.io/docs)
- [Python UUID Module](https://docs.python.org/3/library/uuid.html)
- [Python os Module](https://docs.python.org/3/library/os.html)
- [Python logging Module](https://docs.python.org/3/library/logging.html)
- [dotenv Package](https://pypi.org/project/python-dotenv/)

By following this documentation, users can effectively utilize the ChromaDB module for managing document storage and retrieval in their applications.