[GPT4o]Docs]

pull/488/head
Kye Gomez 7 months ago
parent 96e9cfd496
commit 7f577acca3

@ -1,6 +1,6 @@
# Swarms Documentation
Cutting-edge framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, Swarms empowers agents to work together seamlessly, tackling complex tasks.
Orchestrate enterprise-grade agents for multi-agent collaboration and orchestration to automate real-world problems.
<div style="display:flex; margin:0 auto; justify-content: center;">
<div style="width:25%">
@ -92,37 +92,37 @@ Cutting-edge framework for orchestrating role-playing, autonomous AI agents. By
<h2>Examples</h2>
<ul>
<li>
<a target='_blank' href="https://github.com/joaomdmoura/Swarms-examples/tree/main/prep-for-a-meeting">
<a target='_blank' href="https://github.com/kyegomez/Swarms-examples/tree/main/prep-for-a-meeting">
Prepare for meetings
</a>
</li>
<li>
<a target='_blank' href="https://github.com/joaomdmoura/Swarms-examples/tree/main/trip_planner">
<a target='_blank' href="https://github.com/kyegomez/Swarms-examples/tree/main/trip_planner">
Trip Planner Crew
</a>
</li>
<li>
<a target='_blank' href="https://github.com/joaomdmoura/Swarms-examples/tree/main/instagram_post">
<a target='_blank' href="https://github.com/kyegomez/Swarms-examples/tree/main/instagram_post">
Create Instagram Post
</a>
</li>
<li>
<a target='_blank' href="https://github.com/joaomdmoura/Swarms-examples/tree/main/stock_analysis">
<a target='_blank' href="https://github.com/kyegomez/Swarms-examples/tree/main/stock_analysis">
Stock Analysis
</a>
</li>
<li>
<a target='_blank' href="https://github.com/joaomdmoura/Swarms-examples/tree/main/game-builder-crew">
<a target='_blank' href="https://github.com/kyegomez/Swarms-examples/tree/main/game-builder-crew">
Game Generator
</a>
</li>
<li>
<a target='_blank' href="https://github.com/joaomdmoura/Swarms-examples/tree/main/Swarms-LangGraph">
<a target='_blank' href="https://github.com/kyegomez/Swarms-examples/tree/main/Swarms-LangGraph">
Drafting emails with LangGraph
</a>
</li>
<li>
<a target='_blank' href="https://github.com/joaomdmoura/Swarms-examples/tree/main/landing_page_generator">
<a target='_blank' href="https://github.com/kyegomez/Swarms-examples/tree/main/landing_page_generator">
Landing Page Generator
</a>
</li>

@ -126,16 +126,17 @@ nav:
- Contributors:
- Contributing: "contributing.md"
- Swarms Framework Reference:
- Overview: "swarms/index.md"
- swarms.models:
- How to Create A Custom Language Model: "swarms/models/custom_model.md"
- Deploying Azure OpenAI in Production A Comprehensive Guide: "swarms/models/azure_openai.md"
- Language Models Available:
- Language Models:
- BaseLLM: "swarms/models/base_llm.md"
- Overview: "swarms/models/index.md"
- HuggingFaceLLM: "swarms/models/huggingface.md"
- Anthropic: "swarms/models/anthropic.md"
- OpenAIChat: "swarms/models/openai.md"
- MultiModal Models Available:
- MultiModal Models :
- BaseMultiModalModel: "swarms/models/base_multimodal_model.md"
- Fuyu: "swarms/models/fuyu.md"
- Vilt: "swarms/models/vilt.md"
@ -144,6 +145,7 @@ nav:
- Nougat: "swarms/models/nougat.md"
- Dalle3: "swarms/models/dalle3.md"
- GPT4VisionAPI: "swarms/models/gpt4v.md"
- GPT4o: "swarms/models/gpt4o.md"
- swarms.structs:
- Foundational Structures:
- Agent: "swarms/structs/agent.md"

File diff suppressed because it is too large Load Diff

@ -0,0 +1,150 @@
# Documentation for GPT4o Module
## Overview and Introduction
The `GPT4o` module is a multi-modal conversational model based on OpenAI's GPT-4 architecture. It extends the functionality of the `BaseMultiModalModel` class, enabling it to handle both text and image inputs for generating diverse and contextually rich responses. This module leverages the power of the GPT-4 model to enhance interactions by integrating visual information with textual prompts, making it highly relevant for applications requiring multi-modal understanding and response generation.
### Key Concepts
- **Multi-Modal Model**: A model that can process and generate responses based on multiple types of inputs, such as text and images.
- **System Prompt**: A predefined prompt to guide the conversation flow.
- **Temperature**: A parameter that controls the randomness of the response generation.
- **Max Tokens**: The maximum number of tokens (words or word pieces) in the generated response.
## Class Definition
### `GPT4o` Class
### Parameters
| Parameter | Type | Description |
|-----------------|--------|--------------------------------------------------------------------------------------|
| `system_prompt` | `str` | The system prompt to be used in the conversation. |
| `temperature` | `float`| The temperature parameter for generating diverse responses. Default is `0.1`. |
| `max_tokens` | `int` | The maximum number of tokens in the generated response. Default is `300`. |
| `openai_api_key`| `str` | The API key for accessing the OpenAI GPT-4 API. |
| `*args` | | Additional positional arguments. |
| `**kwargs` | | Additional keyword arguments. |
## Functionality and Usage
### `encode_image` Function
The `encode_image` function is used to encode an image file into a base64 string format, which can then be included in the request to the GPT-4 API.
#### Parameters
| Parameter | Type | Description |
|---------------|--------|----------------------------------------------|
| `image_path` | `str` | The local path to the image file to be encoded. |
#### Returns
| Return Type | Description |
|-------------|---------------------------------|
| `str` | The base64 encoded string of the image. |
### `GPT4o.__init__` Method
The constructor for the `GPT4o` class initializes the model with the specified parameters and sets up the OpenAI client.
### `GPT4o.run` Method
The `run` method executes the GPT-4o model to generate a response based on the provided task and optional image.
#### Parameters
| Parameter | Type | Description |
|---------------|--------|----------------------------------------------------|
| `task` | `str` | The task or user prompt for the conversation. |
| `local_img` | `str` | The local path to the image file. |
| `img` | `str` | The URL of the image. |
| `*args` | | Additional positional arguments. |
| `**kwargs` | | Additional keyword arguments. |
#### Returns
| Return Type | Description |
|-------------|--------------------------------------------------|
| `str` | The generated response from the GPT-4o model. |
## Usage Examples
### Example 1: Basic Text Prompt
```python
from swarms import GPT4o
# Initialize the model
model = GPT4o(
system_prompt="You are a helpful assistant.",
temperature=0.7,
max_tokens=150,
openai_api_key="your_openai_api_key"
)
# Define the task
task = "What is the capital of France?"
# Generate response
response = model.run(task)
print(response)
```
### Example 2: Text Prompt with Local Image
```python
from swarms import GPT4o
# Initialize the model
model = GPT4o(
system_prompt="Describe the image content.",
temperature=0.5,
max_tokens=200,
openai_api_key="your_openai_api_key"
)
# Define the task and image path
task = "Describe the content of this image."
local_img = "path/to/your/image.jpg"
# Generate response
response = model.run(task, local_img=local_img)
print(response)
```
### Example 3: Text Prompt with Image URL
```python
from swarms import GPT4o
# Initialize the model
model = GPT4o(
system_prompt="You are a visual assistant.",
temperature=0.6,
max_tokens=250,
openai_api_key="your_openai_api_key"
)
# Define the task and image URL
task = "What can you tell about the scenery in this image?"
img_url = "http://example.com/image.jpg"
# Generate response
response = model.run(task, img=img_url)
print(response)
```
## Additional Information and Tips
- **API Key Management**: Ensure that your OpenAI API key is securely stored and managed. Do not hard-code it in your scripts. Use environment variables or secure storage solutions.
- **Image Encoding**: The `encode_image` function is crucial for converting images to a base64 format suitable for API requests. Ensure that the images are accessible and properly formatted.
- **Temperature Parameter**: Adjust the `temperature` parameter to control the creativity of the model's responses. Lower values make the output more deterministic, while higher values increase randomness.
- **Token Limit**: Be mindful of the `max_tokens` parameter to avoid exceeding the API's token limits. This parameter controls the length of the generated responses.
## References and Resources
- [OpenAI API Documentation](https://beta.openai.com/docs/)
- [Python Base64 Encoding](https://docs.python.org/3/library/base64.html)
- [dotenv Documentation](https://saurabh-kumar.com/python-dotenv/)
- [BaseMultiModalModel Documentation](https://swarms.apac.ai)
Loading…
Cancel
Save