parent
96e9cfd496
commit
7f577acca3
File diff suppressed because it is too large
Load Diff
@ -0,0 +1,150 @@
|
||||
# Documentation for GPT4o Module
|
||||
|
||||
## Overview and Introduction
|
||||
|
||||
The `GPT4o` module is a multi-modal conversational model based on OpenAI's GPT-4 architecture. It extends the functionality of the `BaseMultiModalModel` class, enabling it to handle both text and image inputs for generating diverse and contextually rich responses. This module leverages the power of the GPT-4 model to enhance interactions by integrating visual information with textual prompts, making it highly relevant for applications requiring multi-modal understanding and response generation.
|
||||
|
||||
### Key Concepts
|
||||
- **Multi-Modal Model**: A model that can process and generate responses based on multiple types of inputs, such as text and images.
|
||||
- **System Prompt**: A predefined prompt to guide the conversation flow.
|
||||
- **Temperature**: A parameter that controls the randomness of the response generation.
|
||||
- **Max Tokens**: The maximum number of tokens (words or word pieces) in the generated response.
|
||||
|
||||
## Class Definition
|
||||
|
||||
### `GPT4o` Class
|
||||
|
||||
|
||||
### Parameters
|
||||
|
||||
| Parameter | Type | Description |
|
||||
|-----------------|--------|--------------------------------------------------------------------------------------|
|
||||
| `system_prompt` | `str` | The system prompt to be used in the conversation. |
|
||||
| `temperature` | `float`| The temperature parameter for generating diverse responses. Default is `0.1`. |
|
||||
| `max_tokens` | `int` | The maximum number of tokens in the generated response. Default is `300`. |
|
||||
| `openai_api_key`| `str` | The API key for accessing the OpenAI GPT-4 API. |
|
||||
| `*args` | | Additional positional arguments. |
|
||||
| `**kwargs` | | Additional keyword arguments. |
|
||||
|
||||
## Functionality and Usage
|
||||
|
||||
### `encode_image` Function
|
||||
|
||||
The `encode_image` function is used to encode an image file into a base64 string format, which can then be included in the request to the GPT-4 API.
|
||||
|
||||
#### Parameters
|
||||
|
||||
| Parameter | Type | Description |
|
||||
|---------------|--------|----------------------------------------------|
|
||||
| `image_path` | `str` | The local path to the image file to be encoded. |
|
||||
|
||||
#### Returns
|
||||
|
||||
| Return Type | Description |
|
||||
|-------------|---------------------------------|
|
||||
| `str` | The base64 encoded string of the image. |
|
||||
|
||||
### `GPT4o.__init__` Method
|
||||
|
||||
The constructor for the `GPT4o` class initializes the model with the specified parameters and sets up the OpenAI client.
|
||||
|
||||
### `GPT4o.run` Method
|
||||
|
||||
The `run` method executes the GPT-4o model to generate a response based on the provided task and optional image.
|
||||
|
||||
#### Parameters
|
||||
|
||||
| Parameter | Type | Description |
|
||||
|---------------|--------|----------------------------------------------------|
|
||||
| `task` | `str` | The task or user prompt for the conversation. |
|
||||
| `local_img` | `str` | The local path to the image file. |
|
||||
| `img` | `str` | The URL of the image. |
|
||||
| `*args` | | Additional positional arguments. |
|
||||
| `**kwargs` | | Additional keyword arguments. |
|
||||
|
||||
#### Returns
|
||||
|
||||
| Return Type | Description |
|
||||
|-------------|--------------------------------------------------|
|
||||
| `str` | The generated response from the GPT-4o model. |
|
||||
|
||||
## Usage Examples
|
||||
|
||||
### Example 1: Basic Text Prompt
|
||||
|
||||
```python
|
||||
from swarms import GPT4o
|
||||
|
||||
# Initialize the model
|
||||
model = GPT4o(
|
||||
system_prompt="You are a helpful assistant.",
|
||||
temperature=0.7,
|
||||
max_tokens=150,
|
||||
openai_api_key="your_openai_api_key"
|
||||
)
|
||||
|
||||
# Define the task
|
||||
task = "What is the capital of France?"
|
||||
|
||||
# Generate response
|
||||
response = model.run(task)
|
||||
print(response)
|
||||
```
|
||||
|
||||
### Example 2: Text Prompt with Local Image
|
||||
|
||||
```python
|
||||
from swarms import GPT4o
|
||||
|
||||
# Initialize the model
|
||||
model = GPT4o(
|
||||
system_prompt="Describe the image content.",
|
||||
temperature=0.5,
|
||||
max_tokens=200,
|
||||
openai_api_key="your_openai_api_key"
|
||||
)
|
||||
|
||||
# Define the task and image path
|
||||
task = "Describe the content of this image."
|
||||
local_img = "path/to/your/image.jpg"
|
||||
|
||||
# Generate response
|
||||
response = model.run(task, local_img=local_img)
|
||||
print(response)
|
||||
```
|
||||
|
||||
### Example 3: Text Prompt with Image URL
|
||||
|
||||
```python
|
||||
from swarms import GPT4o
|
||||
|
||||
# Initialize the model
|
||||
model = GPT4o(
|
||||
system_prompt="You are a visual assistant.",
|
||||
temperature=0.6,
|
||||
max_tokens=250,
|
||||
openai_api_key="your_openai_api_key"
|
||||
)
|
||||
|
||||
# Define the task and image URL
|
||||
task = "What can you tell about the scenery in this image?"
|
||||
img_url = "http://example.com/image.jpg"
|
||||
|
||||
# Generate response
|
||||
response = model.run(task, img=img_url)
|
||||
print(response)
|
||||
```
|
||||
|
||||
## Additional Information and Tips
|
||||
|
||||
- **API Key Management**: Ensure that your OpenAI API key is securely stored and managed. Do not hard-code it in your scripts. Use environment variables or secure storage solutions.
|
||||
- **Image Encoding**: The `encode_image` function is crucial for converting images to a base64 format suitable for API requests. Ensure that the images are accessible and properly formatted.
|
||||
- **Temperature Parameter**: Adjust the `temperature` parameter to control the creativity of the model's responses. Lower values make the output more deterministic, while higher values increase randomness.
|
||||
- **Token Limit**: Be mindful of the `max_tokens` parameter to avoid exceeding the API's token limits. This parameter controls the length of the generated responses.
|
||||
|
||||
## References and Resources
|
||||
|
||||
- [OpenAI API Documentation](https://beta.openai.com/docs/)
|
||||
- [Python Base64 Encoding](https://docs.python.org/3/library/base64.html)
|
||||
- [dotenv Documentation](https://saurabh-kumar.com/python-dotenv/)
|
||||
- [BaseMultiModalModel Documentation](https://swarms.apac.ai)
|
Loading…
Reference in new issue