# ImageAgentBatchProcessor Documentation
## Overview
The `ImageAgentBatchProcessor` is a high-performance parallel image processing system designed for running AI agents on multiple images concurrently. It provides robust error handling, logging, and flexible configuration options.
## Installation
```bash
pip install swarms
```
## Class Arguments
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| agents | Union[Agent, List[Agent], Callable, List[Callable]] | Required | Single agent or list of agents to process images |
| max_workers | int | None | Maximum number of parallel workers (defaults to 95% of CPU cores) |
| supported_formats | List[str] | ['.jpg', '.jpeg', '.png'] | List of supported image file extensions |
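The `max_workers` default described in the table can be pictured with a short sketch. The helper name `default_max_workers` is hypothetical and is not part of the library; it only illustrates one plausible way to derive "95% of CPU cores" from the standard library.

```python
import os


def default_max_workers(fraction: float = 0.95) -> int:
    """Hypothetical helper: choose a worker count as a fraction of CPU cores."""
    # os.cpu_count() may return None when the count cannot be determined.
    cpu_count = os.cpu_count() or 1
    # Always keep at least one worker, even on single-core machines.
    return max(1, int(cpu_count * fraction))


print(default_max_workers())
```

Passing an explicit `max_workers` to the constructor overrides this kind of default, which is useful when each task holds a large image in memory.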
## Methods
### run()
**Description**: Main method for processing multiple images in parallel with configured agents. Can handle single images, multiple images, or entire directories.
**Arguments**:
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| image_paths | Union[str, List[str], Path] | Yes | Single image path, list of paths, or directory path |
| tasks | Union[str, List[str]] | Yes | Single task or list of tasks to perform on each image |
**Returns**: List[Dict[str, Any]] - List of processing results for each image
**Example**:
```python
from pathlib import Path

from swarms import Agent
from swarms.structs import ImageAgentBatchProcessor

# Initialize agent and processor
agent = Agent(api_key="your-api-key", model="gpt-4-vision")
processor = ImageAgentBatchProcessor(agents=agent)

# Example 1: Process single image
results = processor.run(
    image_paths="path/to/image.jpg",
    tasks="Describe this image",
)

# Example 2: Process multiple images
results = processor.run(
    image_paths=["image1.jpg", "image2.jpg"],
    tasks=["Describe objects", "Identify colors"],
)

# Example 3: Process directory
results = processor.run(
    image_paths=Path("./images"),
    tasks="Analyze image content",
)
```
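Since `run()` accepts a single path, a list of paths, or a directory, the input has to be normalized into a list of files at some point. The following is a minimal sketch of that normalization, not the library's implementation; `collect_image_paths` and `PathLike` are hypothetical names introduced here for illustration.

```python
import tempfile
from pathlib import Path
from typing import List, Sequence, Union

PathLike = Union[str, Path]


def collect_image_paths(
    image_paths: Union[PathLike, List[PathLike]],
    supported_formats: Sequence[str] = (".jpg", ".jpeg", ".png"),
) -> List[Path]:
    """Hypothetical helper: turn a file, a directory, or a list into a list of Paths."""
    if isinstance(image_paths, (str, Path)):
        path = Path(image_paths)
        if path.is_dir():
            # Keep only files whose extension is in supported_formats.
            return sorted(
                p
                for p in path.iterdir()
                if p.is_file() and p.suffix.lower() in supported_formats
            )
        return [path]
    return [Path(p) for p in image_paths]


# Demo on a throwaway directory containing two images and one text file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.jpg", "b.txt", "c.png"):
        (Path(tmp) / name).touch()
    found = [p.name for p in collect_image_paths(tmp)]

print(found)
```

In this sketch, files with unsupported extensions are silently skipped when scanning a directory; whether the library skips or reports them is not stated in this document.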
### _validate_image_path()
**Description**: Internal method that validates if an image path exists and has a supported format.
**Arguments**:
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| image_path | Union[str, Path] | Yes | Path to the image file to validate |
**Returns**: Path - Validated Path object
**Example**:
```python
from swarms import Agent
from swarms.structs import ImageAgentBatchProcessor, ImageProcessingError

# Create the agent the processor will use
agent = Agent(api_key="your-api-key", model="gpt-4-vision")
processor = ImageAgentBatchProcessor(agents=agent)

try:
    validated_path = processor._validate_image_path("image.jpg")
    print(f"Valid image path: {validated_path}")
except ImageProcessingError as e:
    print(f"Invalid image path: {e}")
```
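The two checks named in the description (the path exists, and its extension is supported) can be sketched as a standalone function. This is an illustration under assumptions, not the library's code: `validate_image_path` and `ImagePathError` are hypothetical stand-ins for `_validate_image_path()` and `ImageProcessingError`.

```python
import tempfile
from pathlib import Path
from typing import Sequence, Union


class ImagePathError(Exception):
    """Hypothetical stand-in for the library's ImageProcessingError."""


def validate_image_path(
    image_path: Union[str, Path],
    supported_formats: Sequence[str] = (".jpg", ".jpeg", ".png"),
) -> Path:
    """Return the path as a Path object, or raise if it is missing or unsupported."""
    path = Path(image_path)
    if not path.exists():
        raise ImagePathError(f"Image path does not exist: {path}")
    # Compare extensions case-insensitively so "PHOTO.JPG" is accepted.
    if path.suffix.lower() not in supported_formats:
        raise ImagePathError(f"Unsupported image format: {path.suffix}")
    return path


# Demo: one supported file and one unsupported file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "photo.jpg"
    bad = Path(tmp) / "notes.txt"
    good.touch()
    bad.touch()
    validated = validate_image_path(good)
    try:
        validate_image_path(bad)
        rejected = False
    except ImagePathError:
        rejected = True

print(validated.name, rejected)
```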
### _process_single_image()
**Description**: Internal method that processes a single image with one agent and one or more tasks.
**Arguments**:
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| image_path | Path | Yes | Path to the image to process |
| tasks | Union[str, List[str]] | Yes | Tasks to perform on the image |
| agent | Agent | Yes | Agent to use for processing |
**Returns**: Dict[str, Any] - Processing results for the image
**Example**:
```python
from pathlib import Path

from swarms import Agent
from swarms.structs import ImageAgentBatchProcessor

agent = Agent(api_key="your-api-key", model="gpt-4-vision")
processor = ImageAgentBatchProcessor(agents=agent)

try:
    result = processor._process_single_image(
        image_path=Path("image.jpg"),
        tasks=["Describe image", "Identify objects"],
        agent=agent,
    )
    print(f"Processing results: {result}")
except Exception as e:
    print(f"Processing failed: {e}")
```
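To make the per-image step concrete, here is a minimal sketch that runs every task against one image with a plain callable and times the whole thing. It assumes the agent can be called as `agent(task, image)`; that calling convention, the `process_single_image` function, and the `echo_agent` stub are all illustrative assumptions rather than the library's actual interface.

```python
import time
from pathlib import Path
from typing import Any, Callable, Dict, List, Union


def process_single_image(
    image_path: Path,
    tasks: Union[str, List[str]],
    agent: Callable[[str, str], Any],
) -> Dict[str, Any]:
    """Run each task on one image and return a result dict with timing."""
    # Accept a single task string as well as a list of tasks.
    task_list = [tasks] if isinstance(tasks, str) else list(tasks)
    start = time.perf_counter()
    results = {task: agent(task, str(image_path)) for task in task_list}
    return {
        "image_path": str(image_path),
        "results": results,
        "processing_time": time.perf_counter() - start,
    }


def echo_agent(task: str, image: str) -> str:
    """Stub agent (assumption): echoes its inputs instead of calling a model."""
    return f"{task} -> {image}"


outcome = process_single_image(Path("image.jpg"), "Describe image", echo_agent)
print(outcome["results"])
```

The returned dictionary mirrors the shape described under Return Format, keyed by task text.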
### __call__()
**Description**: Makes an `ImageAgentBatchProcessor` instance callable like a function. All arguments are forwarded to the `run()` method.
**Arguments**:
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| *args | Any | No | Variable length argument list passed to run() |
| **kwargs | Any | No | Keyword arguments passed to run() |
**Returns**: List[Dict[str, Any]] - Same as run() method
**Example**:
```python
from swarms import Agent
from swarms.structs import ImageAgentBatchProcessor

# Initialize
agent = Agent(api_key="your-api-key", model="gpt-4-vision")
processor = ImageAgentBatchProcessor(agents=agent)

# Using __call__
results = processor(
    image_paths=["image1.jpg", "image2.jpg"],
    tasks="Describe the image",
)

# This is equivalent to:
results = processor.run(
    image_paths=["image1.jpg", "image2.jpg"],
    tasks="Describe the image",
)
```
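The forwarding pattern itself is small enough to show in isolation. The class below is a toy written for this illustration, not the library's class; its `run()` body is a placeholder that only echoes its inputs.

```python
from typing import Any, Dict, List, Union


class CallableProcessor:
    """Toy class (not the library's) showing __call__ delegating to run()."""

    def run(
        self, image_paths: Union[str, List[str]], tasks: Union[str, List[str]]
    ) -> List[Dict[str, Any]]:
        # Placeholder logic: one entry per image, no real processing.
        paths = [image_paths] if isinstance(image_paths, str) else list(image_paths)
        return [{"image_path": p, "tasks": tasks} for p in paths]

    def __call__(self, *args: Any, **kwargs: Any) -> List[Dict[str, Any]]:
        # Forward positional and keyword arguments unchanged.
        return self.run(*args, **kwargs)


toy = CallableProcessor()
via_call = toy(["image1.jpg", "image2.jpg"], tasks="Describe the image")
via_run = toy.run(["image1.jpg", "image2.jpg"], tasks="Describe the image")
print(via_call == via_run)
```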
## Return Format
The processor returns a list of dictionaries with the following structure:
```python
{
    "image_path": str,          # Path to the processed image
    "results": {                # Results for each task
        "task_name": result,    # Task-specific results
    },
    "processing_time": float    # Processing time in seconds
}
```
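A list in this format is easy to post-process. The sketch below totals the processing time and regroups outputs by task; the `summarize` helper and the `sample` data are made up for illustration and do not come from a real run.

```python
from typing import Any, Dict, List, Tuple


def summarize(
    results: List[Dict[str, Any]],
) -> Tuple[float, Dict[str, List[Tuple[str, Any]]]]:
    """Return total processing time and, per task, (image_path, output) pairs."""
    total_time = sum(r["processing_time"] for r in results)
    per_task: Dict[str, List[Tuple[str, Any]]] = {}
    for r in results:
        for task, output in r["results"].items():
            per_task.setdefault(task, []).append((r["image_path"], output))
    return total_time, per_task


# Illustrative data in the documented structure (not real model output).
sample = [
    {
        "image_path": "image1.jpg",
        "results": {"Describe objects": "a cat"},
        "processing_time": 1.5,
    },
    {
        "image_path": "image2.jpg",
        "results": {"Describe objects": "a dog"},
        "processing_time": 2.0,
    },
]

total_time, per_task = summarize(sample)
print(total_time, per_task)
```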
## Complete Usage Examples
### 1. Basic Usage with Single Agent
```python
from swarms import Agent
from swarms.structs import ImageAgentBatchProcessor

# Initialize an agent
agent = Agent(
    api_key="your-api-key",
    model="gpt-4-vision",
)

# Create processor
processor = ImageAgentBatchProcessor(agents=agent)

# Process single image
results = processor.run(
    image_paths="path/to/image.jpg",
    tasks="Describe this image in detail",
)
```
### 2. Processing Multiple Images with Multiple Tasks
```python
from swarms import Agent
from swarms.structs import ImageAgentBatchProcessor

# Initialize with multiple agents
agent1 = Agent(api_key="key1", model="gpt-4-vision")
agent2 = Agent(api_key="key2", model="claude-3")
processor = ImageAgentBatchProcessor(
    agents=[agent1, agent2],
    supported_formats=['.jpg', '.png', '.webp'],
)

# Define multiple tasks
tasks = [
    "Describe the main objects in the image",
    "What is the dominant color?",
    "Identify any text in the image",
]

# Process a directory of images
results = processor.run(
    image_paths="path/to/image/directory",
    tasks=tasks,
)

# Process results
for result in results:
    print(f"Image: {result['image_path']}")
    for task, output in result['results'].items():
        print(f"Task: {task}")
        print(f"Result: {output}")
    print(f"Processing time: {result['processing_time']:.2f} seconds")
```
### 3. Custom Error Handling
```python
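# InvalidAgentError is assumed to be importable from the same place as
# ImageProcessingError; adjust the import if your version differs.
from swarms.structs import (
    ImageAgentBatchProcessor,
    ImageProcessingError,
    InvalidAgentError,
)

# `agent` is an Agent created as in the earlier examples
try:
    processor = ImageAgentBatchProcessor(agents=agent)
    results = processor.run(
        image_paths=["image1.jpg", "image2.png", "invalid.txt"],
        tasks="Analyze the image",
    )
except ImageProcessingError as e:
    print(f"Image processing failed: {e}")
except InvalidAgentError as e:
    print(f"Agent configuration error: {e}")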
```
## Best Practices
| Best Practice | Description |
|--------------|-------------|
| Resource Management | • The processor automatically uses 95% of available CPU cores<br>• For memory-intensive operations, consider reducing `max_workers` |
| Error Handling | • Always wrap processor calls in try-except blocks<br>• Check the results for any error keys |
| Task Design | • Keep tasks focused and specific<br>• Group related tasks together for efficiency |
| Performance Optimization | • Process images in batches for better throughput<br>• Use multiple agents for different types of analysis |
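Two of the practices above, batching inputs and checking results for error keys, can be sketched with plain Python. The helper names `chunked` and `split_by_error` are hypothetical, and the key name `"error"` is an assumption: this document says to check for error keys but does not name them.

```python
from typing import Any, Dict, Iterator, List, Tuple


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def split_by_error(
    results: List[Dict[str, Any]], error_key: str = "error"
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Separate results that carry `error_key` (assumed name) from the rest."""
    ok = [r for r in results if error_key not in r]
    failed = [r for r in results if error_key in r]
    return ok, failed


batches = list(chunked(["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"], 2))
ok, failed = split_by_error(
    [
        {"image_path": "a.jpg", "results": {}, "processing_time": 0.1},
        {"image_path": "b.jpg", "error": "could not open file"},
    ]
)
print(batches, len(ok), len(failed))
```

Each batch could then be passed to `processor.run()` in turn, with failed entries collected for a retry or a report.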
## Limitations
| Limitation | Description |
|------------|-------------|
| File Format Support | Only supports image file formats specified in `supported_formats` |
| Agent Requirements | Requires valid agent configurations |
| Resource Scaling | Memory usage scales with number of concurrent processes |
This documentation provides a comprehensive guide to using the `ImageAgentBatchProcessor`. The class is designed to be both powerful and flexible, allowing for various use cases from simple image analysis to complex multi-agent processing pipelines.