You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
95 lines
3.8 KiB
95 lines
3.8 KiB
# `OmniModalAgent` Documentation
|
|
|
|
## Overview & Architectural Analysis
|
|
The `OmniModalAgent` class is at the core of an architecture designed to facilitate dynamic interactions using various tools, through a seamless integration of planning, task execution, and response generation mechanisms. It encompasses multiple modalities including natural language processing, image processing, and more, aiming to provide comprehensive and intelligent responses.
|
|
|
|
### Architectural Components:
|
|
1. **LLM (Language Model)**: It acts as the foundation, underpinning the understanding and generation of language-based interactions.
|
|
2. **Chat Planner**: This component drafts a blueprint for the steps necessary based on the user's input.
|
|
3. **Task Executor**: As the name suggests, it's responsible for executing the formulated tasks.
|
|
4. **Tools**: A collection of tools and utilities used to process different types of tasks. They span across areas like image captioning, translation, and more.
|
|
|
|
|
|
## Structure & Organization
|
|
|
|
### Table of Contents:
|
|
1. Class Introduction and Architecture
|
|
2. Constructor (`__init__`)
|
|
3. Core Methods
|
|
- `run`
|
|
- `chat`
|
|
- `_stream_response`
|
|
4. Example Usage
|
|
5. Error Messages & Exception Handling
|
|
6. Summary & Further Reading
|
|
|
|
### Constructor (`__init__`):
|
|
The agent is initialized with a language model (`llm`). During initialization, the agent loads a myriad of tools to facilitate a broad spectrum of tasks, from document querying to image transformations.
|
|
|
|
### Core Methods:
|
|
#### 1. `run(self, input: str) -> str`:
|
|
Executes the OmniAgent. The agent plans its actions based on the user's input, executes those actions, and then uses a response generator to construct its reply.
|
|
|
|
#### 2. `chat(self, msg: str, streaming: bool) -> str`:
|
|
Facilitates an interactive chat with the agent. It processes user messages, handles exceptions, and returns a response, either in streaming format or as a whole string.
|
|
|
|
#### 3. `_stream_response(self, response: str)`:
|
|
For streaming mode, this function yields the response token by token, ensuring a smooth output agent.
|
|
|
|
## Examples & Use Cases
|
|
Initialize the `OmniModalAgent` and communicate with it:
|
|
```python
|
|
import os
|
|
|
|
from dotenv import load_dotenv
|
|
|
|
from swarms.agents.omni_modal_agent import OmniModalAgent, OpenAIChat
|
|
from swarms.models import OpenAIChat
|
|
|
|
# Load the environment variables
|
|
load_dotenv()
|
|
|
|
# Get the API key from the environment
|
|
api_key = os.environ.get("OPENAI_API_KEY")
|
|
|
|
# Initialize the language model
|
|
llm = OpenAIChat(
|
|
temperature=0.5,
|
|
model_name="gpt-4",
|
|
openai_api_key=api_key,
|
|
)
|
|
|
|
|
|
agent = OmniModalAgent(llm)
|
|
response = agent.run("Translate 'Hello' to French.")
|
|
print(response)
|
|
```
|
|
|
|
For a chat-based interaction:
|
|
```python
|
|
agent = OmniModalAgent(llm_instance)
|
|
print(agent.chat("How are you doing today?"))
|
|
```
|
|
|
|
## Error Messages & Exception Handling
|
|
The `chat` method in `OmniModalAgent` incorporates exception handling. When an error arises during message processing, it returns a formatted error message detailing the exception. This approach ensures that users receive informative feedback in case of unexpected situations.
|
|
|
|
For example, if there's an internal processing error, the chat function would return:
|
|
```
|
|
Error processing message: [Specific error details]
|
|
```
|
|
|
|
## Summary
|
|
`OmniModalAgent` epitomizes the fusion of various AI tools, planners, and executors into one cohesive unit, providing a comprehensive interface for diverse tasks and modalities. The versatility and robustness of this agent make it indispensable for applications desiring to bridge multiple AI functionalities in a unified manner.
|
|
|
|
For more extensive documentation, API references, and advanced use-cases, users are advised to refer to the primary documentation repository associated with the parent project. Regular updates, community feedback, and patches can also be found there.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|