[GPT4o]Docs]

1 year ago · 7f577acca3
parent 96e9cfd496
commit 7f577acca3
4 changed files with 507 additions and 569 deletions
--- a/docs/index.md
+++ b/docs/index.md
@ -1,6 +1,6 @@
 # Swarms Documentation

-Cutting-edge framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, Swarms empowers agents to work together seamlessly, tackling complex tasks.
+Orchestrate enterprise-grade agents for multi-agent collaboration and orchestration to automate real-world problems.

 <div style="display:flex; margin:0 auto; justify-content: center;">
    <div style="width:25%">
@ -92,37 +92,37 @@ Cutting-edge framework for orchestrating role-playing, autonomous AI agents. By
        <h2>Examples</h2>
        <ul>
            <li>
-                <a target='_blank' href="https://github.com/joaomdmoura/Swarms-examples/tree/main/prep-for-a-meeting">
+                <a target='_blank' href="https://github.com/kyegomez/Swarms-examples/tree/main/prep-for-a-meeting">
                    Prepare for meetings
                </a>
            </li>
            <li>
-                <a target='_blank' href="https://github.com/joaomdmoura/Swarms-examples/tree/main/trip_planner">
+                <a target='_blank' href="https://github.com/kyegomez/Swarms-examples/tree/main/trip_planner">
                    Trip Planner Crew
                </a>
            </li>
            <li>
-                <a target='_blank' href="https://github.com/joaomdmoura/Swarms-examples/tree/main/instagram_post">
+                <a target='_blank' href="https://github.com/kyegomez/Swarms-examples/tree/main/instagram_post">
                    Create Instagram Post
                </a>
            </li>
            <li>
-                <a target='_blank' href="https://github.com/joaomdmoura/Swarms-examples/tree/main/stock_analysis">
+                <a target='_blank' href="https://github.com/kyegomez/Swarms-examples/tree/main/stock_analysis">
                    Stock Analysis
                </a>
            </li>
            <li>
-                <a target='_blank' href="https://github.com/joaomdmoura/Swarms-examples/tree/main/game-builder-crew">
+                <a target='_blank' href="https://github.com/kyegomez/Swarms-examples/tree/main/game-builder-crew">
                    Game Generator
                </a>
            </li>
            <li>
-                <a target='_blank' href="https://github.com/joaomdmoura/Swarms-examples/tree/main/Swarms-LangGraph">
+                <a target='_blank' href="https://github.com/kyegomez/Swarms-examples/tree/main/Swarms-LangGraph">
                    Drafting emails with LangGraph
                </a>
            </li>
            <li>
-                <a target='_blank' href="https://github.com/joaomdmoura/Swarms-examples/tree/main/landing_page_generator">
+                <a target='_blank' href="https://github.com/kyegomez/Swarms-examples/tree/main/landing_page_generator">
                    Landing Page Generator
                </a>
            </li>
--- a/docs/mkdocs.yml
+++ b/docs/mkdocs.yml
@ -126,16 +126,17 @@ nav:
  - Contributors:
    - Contributing: "contributing.md"
 - Swarms Framework Reference:
+  - Overview: "swarms/index.md"
  - swarms.models:
    - How to Create A Custom Language Model: "swarms/models/custom_model.md"
    - Deploying Azure OpenAI in Production A Comprehensive Guide: "swarms/models/azure_openai.md"
-    - Language Models Available:
+    - Language Models:
      - BaseLLM: "swarms/models/base_llm.md"
      - Overview: "swarms/models/index.md"
      - HuggingFaceLLM: "swarms/models/huggingface.md"
      - Anthropic: "swarms/models/anthropic.md"
      - OpenAIChat: "swarms/models/openai.md"
-    - MultiModal Models Available:
+    - MultiModal Models :
      - BaseMultiModalModel: "swarms/models/base_multimodal_model.md"
      - Fuyu: "swarms/models/fuyu.md"
      - Vilt: "swarms/models/vilt.md"
@ -144,6 +145,7 @@ nav:
      - Nougat: "swarms/models/nougat.md"
      - Dalle3: "swarms/models/dalle3.md"
      - GPT4VisionAPI: "swarms/models/gpt4v.md"
+      - GPT4o: "swarms/models/gpt4o.md"
  - swarms.structs:
      - Foundational Structures:
        - Agent: "swarms/structs/agent.md"
--- a/docs/swarms/index.md
+++ b/docs/swarms/index.md
--- a/docs/swarms/models/gpt4o.md
+++ b/docs/swarms/models/gpt4o.md
@ -0,0 +1,150 @@
+# Documentation for GPT4o Module
+
+## Overview and Introduction
+
+The `GPT4o` module is a multi-modal conversational model based on OpenAI's GPT-4 architecture. It extends the functionality of the `BaseMultiModalModel` class, enabling it to handle both text and image inputs for generating diverse and contextually rich responses. This module leverages the power of the GPT-4 model to enhance interactions by integrating visual information with textual prompts, making it highly relevant for applications requiring multi-modal understanding and response generation.
+
+### Key Concepts
+- **Multi-Modal Model**: A model that can process and generate responses based on multiple types of inputs, such as text and images.
+- **System Prompt**: A predefined prompt to guide the conversation flow.
+- **Temperature**: A parameter that controls the randomness of the response generation.
+- **Max Tokens**: The maximum number of tokens (words or word pieces) in the generated response.
+
+## Class Definition
+
+### `GPT4o` Class
+
+
+### Parameters
+
+| Parameter       | Type   | Description                                                                          |
+|-----------------|--------|--------------------------------------------------------------------------------------|
+| `system_prompt` | `str`  | The system prompt to be used in the conversation.                                     |
+| `temperature`   | `float`| The temperature parameter for generating diverse responses. Default is `0.1`.        |
+| `max_tokens`    | `int`  | The maximum number of tokens in the generated response. Default is `300`.            |
+| `openai_api_key`| `str`  | The API key for accessing the OpenAI GPT-4 API.                                       |
+| `*args`         |        | Additional positional arguments.                                                     |
+| `**kwargs`      |        | Additional keyword arguments.                                                        |
+
+## Functionality and Usage
+
+### `encode_image` Function
+
+The `encode_image` function is used to encode an image file into a base64 string format, which can then be included in the request to the GPT-4 API.
+
+#### Parameters
+
+| Parameter     | Type   | Description                                  |
+|---------------|--------|----------------------------------------------|
+| `image_path`  | `str`  | The local path to the image file to be encoded. |
+
+#### Returns
+
+| Return Type | Description                     |
+|-------------|---------------------------------|
+| `str`       | The base64 encoded string of the image. |
+
+### `GPT4o.__init__` Method
+
+The constructor for the `GPT4o` class initializes the model with the specified parameters and sets up the OpenAI client.
+
+### `GPT4o.run` Method
+
+The `run` method executes the GPT-4o model to generate a response based on the provided task and optional image.
+
+#### Parameters
+
+| Parameter     | Type   | Description                                        |
+|---------------|--------|----------------------------------------------------|
+| `task`        | `str`  | The task or user prompt for the conversation.      |
+| `local_img`   | `str`  | The local path to the image file.                  |
+| `img`         | `str`  | The URL of the image.                              |
+| `*args`       |        | Additional positional arguments.                   |
+| `**kwargs`    |        | Additional keyword arguments.                      |
+
+#### Returns
+
+| Return Type | Description                                      |
+|-------------|--------------------------------------------------|
+| `str`       | The generated response from the GPT-4o model.    |
+
+## Usage Examples
+
+### Example 1: Basic Text Prompt
+
+```python
+from swarms import GPT4o
+
+# Initialize the model
+model = GPT4o(
+    system_prompt="You are a helpful assistant.",
+    temperature=0.7,
+    max_tokens=150,
+    openai_api_key="your_openai_api_key"
+)
+
+# Define the task
+task = "What is the capital of France?"
+
+# Generate response
+response = model.run(task)
+print(response)
+```
+
+### Example 2: Text Prompt with Local Image
+
+```python
+from swarms import GPT4o
+
+# Initialize the model
+model = GPT4o(
+    system_prompt="Describe the image content.",
+    temperature=0.5,
+    max_tokens=200,
+    openai_api_key="your_openai_api_key"
+)
+
+# Define the task and image path
+task = "Describe the content of this image."
+local_img = "path/to/your/image.jpg"
+
+# Generate response
+response = model.run(task, local_img=local_img)
+print(response)
+```
+
+### Example 3: Text Prompt with Image URL
+
+```python
+from swarms import GPT4o
+
+# Initialize the model
+model = GPT4o(
+    system_prompt="You are a visual assistant.",
+    temperature=0.6,
+    max_tokens=250,
+    openai_api_key="your_openai_api_key"
+)
+
+# Define the task and image URL
+task = "What can you tell about the scenery in this image?"
+img_url = "http://example.com/image.jpg"
+
+# Generate response
+response = model.run(task, img=img_url)
+print(response)
+```
+
+## Additional Information and Tips
+
+- **API Key Management**: Ensure that your OpenAI API key is securely stored and managed. Do not hard-code it in your scripts. Use environment variables or secure storage solutions.
+- **Image Encoding**: The `encode_image` function is crucial for converting images to a base64 format suitable for API requests. Ensure that the images are accessible and properly formatted.
+- **Temperature Parameter**: Adjust the `temperature` parameter to control the creativity of the model's responses. Lower values make the output more deterministic, while higher values increase randomness.
+- **Token Limit**: Be mindful of the `max_tokens` parameter to avoid exceeding the API's token limits. This parameter controls the length of the generated responses.
+
+## References and Resources
+
+- [OpenAI API Documentation](https://beta.openai.com/docs/)
+- [Python Base64 Encoding](https://docs.python.org/3/library/base64.html)
+- [dotenv Documentation](https://saurabh-kumar.com/python-dotenv/)
+- [BaseMultiModalModel Documentation](https://swarms.apac.ai)