Merge branch 'kyegomez:master' into master

pull/187/head
evelynmitchell 1 year ago committed by GitHub
commit e922045cf0

@ -117,6 +117,32 @@ workflow.run()
for task in workflow.tasks:
print(f"Task: {task.description}, Result: {task.result}")
```
## `Multi Modal Autonomous Agents`
- Run the flow with multiple modalities, which is useful for real-world tasks in manufacturing, logistics, and healthcare.
```python
from swarms.structs import Flow
from swarms.models.gpt4_vision_api import GPT4VisionAPI
# Initialize the llm
llm = GPT4VisionAPI()
task = "Analyze this image of an assembly line and identify any issues such as misaligned parts, defects, or deviations from the standard assembly process. IF there is anything unsafe in the image, explain why it is unsafe and how it could be improved."
img = "assembly_line.jpg"
## Initialize the workflow
flow = Flow(
llm=llm,
max_loops=1,
dashboard=True,
)
# Run the flow
flow.run(task=task, img=img)
```
---

@ -1,251 +1,201 @@
# `GPT4VisionAPI` Documentation

**Table of Contents**
- [Introduction](#introduction)
- [Installation](#installation)
- [Module Overview](#module-overview)
- [Class: GPT4VisionAPI](#class-gpt4visionapi)
  - [Initialization](#initialization)
  - [Methods](#methods)
    - [encode_image](#encode_image)
    - [run](#run)
    - [arun](#arun)
    - [__call__](#__call__)
- [Examples](#examples)
  - [Example 1: Basic Usage](#example-1-basic-usage)
  - [Example 2: Custom API Key](#example-2-custom-api-key)
  - [Example 3: Adjusting Maximum Tokens](#example-3-adjusting-maximum-tokens)
- [Model Usage and Image Understanding](#model-usage-and-image-understanding)
- [Image Detail Control](#image-detail-control)
- [Managing Images](#managing-images)
- [Limitations](#limitations)
- [Calculating Costs](#calculating-costs)
- [FAQ](#faq)
- [Additional Information](#additional-information)
- [References](#references)

---
## Introduction<a name="introduction"></a>

Welcome to the documentation for the `GPT4VisionAPI` module! This module is a wrapper for the OpenAI GPT-4 Vision model. It allows you to interact with the model to generate descriptions or answers related to images. This documentation provides comprehensive information on how to use the module effectively.

## Installation<a name="installation"></a>

Before you start using the `GPT4VisionAPI` module, make sure you have the required dependencies installed:

```bash
pip3 install --upgrade swarms
```

You also need an OpenAI API key, which you can obtain by signing up on the [OpenAI platform](https://beta.openai.com/signup/). Set it as the `OPENAI_API_KEY` environment variable, or pass it directly when initializing the class.

## Module Overview<a name="module-overview"></a>

The `GPT4VisionAPI` module serves as a bridge between your application and the OpenAI GPT-4 Vision model. It allows you to send requests to the model and retrieve responses related to images. Key features include:

- Encoding images to base64 format.
- Running the GPT-4 Vision model with specified tasks and images.
- Customization options such as setting the OpenAI API key and maximum token limit.

## Class: GPT4VisionAPI<a name="class-gpt4visionapi"></a>

The `GPT4VisionAPI` class is the core component of this module. It encapsulates the functionality required to interact with the GPT-4 Vision model. Below, we'll dive into the class in detail.

### Initialization<a name="initialization"></a>

When initializing the `GPT4VisionAPI` class, you can provide the OpenAI API key and set the maximum token limit. The parameters are:

| Parameter      | Type | Default Value                                        | Description                                                                                     |
|----------------|------|------------------------------------------------------|-------------------------------------------------------------------------------------------------|
| openai_api_key | str  | `OPENAI_API_KEY` environment variable (if available) | The OpenAI API key. If not provided, it defaults to the `OPENAI_API_KEY` environment variable.  |
| max_tokens     | int  | 300                                                  | The maximum number of tokens to generate in the model's response.                              |

Here's how you can initialize the `GPT4VisionAPI` class:
```python
from swarms.models import GPT4VisionAPI

# Initialize with default API key and max_tokens
api = GPT4VisionAPI()

# Initialize with custom API key and max_tokens
custom_api_key = "your_custom_api_key"
api = GPT4VisionAPI(openai_api_key=custom_api_key, max_tokens=500)
```
### Methods<a name="methods"></a>

#### encode_image<a name="encode_image"></a>

This method encodes an image file to base64 format. It is a utility function used internally by the module.

```python
def encode_image(img: str) -> str:
    """
    Encode an image to base64.

    Parameters:
    - img (str): Path to the image file to encode.

    Returns:
    str: Base64-encoded image.
    """
```
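As a quick sketch of direct usage (the instance name and local image path below are illustrative assumptions, not part of the module's own examples):

```python
from swarms.models import GPT4VisionAPI

api = GPT4VisionAPI()

# Encode a local image file; the path is a placeholder for illustration
b64 = api.encode_image("images/swarms.jpeg")
print(b64[:32], "...")
```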
#### run<a name="run"></a>

The `run` method is the primary way to interact with the GPT-4 Vision model. It sends a request to the model with a task and an image, and returns the model's response.

```python
def run(task: str, img: str) -> str:
    """
    Run the GPT-4 Vision model.

    Parameters:
    - task (str): The task or question related to the image.
    - img (str): URL of the image to analyze.

    Returns:
    str: The model's response.
    """
```

#### arun<a name="arun"></a>

The `arun` method is an asynchronous version of `run`. It lets you await API requests, which is useful when processing several images concurrently.

```python
import asyncio

async def main():
    response = await api.arun(task, img)
    print(response)

asyncio.run(main())
```

#### __call__<a name="__call__"></a>

The `__call__` method is a convenient way to run the GPT-4 Vision model. It has the same functionality as the `run` method.
```python
def __call__(task: str, img: str) -> str:
    """
    Run the GPT-4 Vision model (callable).

    Parameters:
    - task (str): The task or question related to the image.
    - img (str): URL of the image to analyze.

    Returns:
    str: The model's response.
    """
```
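A minimal sketch of the callable form, assuming a locally available image file (the path below is a placeholder):

```python
from swarms.models import GPT4VisionAPI

api = GPT4VisionAPI()

# Calling the instance is interchangeable with api.run(task, img)
print(api("Describe this image.", "images/swarms.jpeg"))
```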
## Examples<a name="examples"></a>

Let's explore some usage examples of the `GPT4VisionAPI` module to better understand how to use it effectively.

### Example 1: Basic Usage<a name="example-1-basic-usage"></a>

In this example, we'll use the module with the default API key and maximum tokens to analyze an image.

```python
from swarms.models import GPT4VisionAPI

# Initialize with default API key and max_tokens
api = GPT4VisionAPI()

# Define the task and image URL
task = "What is the color of the object?"
img = "https://i.imgur.com/2M2ZGwC.jpeg"

# Run the GPT-4 Vision model
response = api.run(task, img)

# Print the model's response
print(response)
```

### Example 2: Custom API Key<a name="example-2-custom-api-key"></a>

If you have a custom API key, you can initialize the module with it as shown in this example.

```python
from swarms.models import GPT4VisionAPI

# Initialize with custom API key and max_tokens
custom_api_key = "your_custom_api_key"
api = GPT4VisionAPI(openai_api_key=custom_api_key, max_tokens=500)

# Define the task and image URL
task = "What is the object in the image?"
img = "https://i.imgur.com/3T3ZHwD.jpeg"

# Run the GPT-4 Vision model
response = api.run(task, img)

# Print the model's response
print(response)
```
### Example 3: Adjusting Maximum Tokens<a name="example-3-adjusting-maximum-tokens"></a>

You can also customize the maximum token limit when initializing the module. In this example, we set it to 1000 tokens.

```python
from swarms.models import GPT4VisionAPI

# Initialize with default API key and custom max_tokens
api = GPT4VisionAPI(max_tokens=1000)

# Define the task and image URL
task = "Describe the scene in the image."
img = "https://i.imgur.com/4P4ZRxU.jpeg"

# Run the GPT-4 Vision model
response = api.run(task, img)

# Print the model's response
print(response)
```

## Model Usage and Image Understanding<a name="model-usage-and-image-understanding"></a>

The GPT-4 Vision model processes images in a unique way, allowing it to answer questions about both or each of the images independently. Here's an overview:

| Purpose             | Description                                                                                                               |
| ------------------- | ------------------------------------------------------------------------------------------------------------------------- |
| Image Understanding | The model is shown two copies of the same image and can answer questions about both or each of the images independently.  |

## Image Detail Control<a name="image-detail-control"></a>

You have control over how the model processes the image and generates textual understanding by using the `detail` parameter, which has two options: `low` and `high`.

| Detail | Description                                                                                                                                                                                      |
| ------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| low    | Disables the "high-res" model. The model receives a low-res 512 x 512 version of the image and represents the image with a budget of 65 tokens. Ideal for use cases not requiring high detail.  |
| high   | Enables "high-res" mode. The model first sees the low-res image and then creates detailed crops of the input image as 512 px squares based on the input image size. Uses a total of 129 tokens. |

## Managing Images<a name="managing-images"></a>

To use the Chat Completions API effectively, you must manage the images you pass to the model. Here are some key considerations:

| Management Aspect       | Description                                                                                               |
| ----------------------- | ----------------------------------------------------------------------------------------------------------- |
| Image Reuse             | To pass the same image multiple times, include the image with each API request.                          |
| Image Size Optimization | Improve latency by downsizing images to meet the expected size requirements.                             |
| Image Deletion          | After processing, images are deleted from OpenAI servers and not retained. No data is used for training. |

## Limitations<a name="limitations"></a>

While GPT-4 with Vision is powerful, it has some limitations:

| Limitation                         | Description                                                                                           |
| ---------------------------------- | ------------------------------------------------------------------------------------------------------- |
| Medical Images                     | Not suitable for interpreting specialized medical images like CT scans.                               |
| Non-English Text                   | May not perform optimally when handling non-Latin alphabets, such as Japanese or Korean.              |
| Large Text in Images               | Enlarge text within images for readability, but avoid cropping important details.                     |
| Rotated or Upside-Down Text/Images | May misinterpret rotated or upside-down text or images.                                               |
| Complex Visual Elements            | May struggle to understand complex graphs or text with varying colors or styles.                      |
| Spatial Reasoning                  | Struggles with tasks requiring precise spatial localization, such as identifying chess positions.     |
| Accuracy                           | May generate incorrect descriptions or captions in certain scenarios.                                 |
| Panoramic and Fisheye Images       | Struggles with panoramic and fisheye images.                                                          |

## Calculating Costs<a name="calculating-costs"></a>

Image inputs are metered and charged in tokens. The token cost depends on the image size and the detail option.

| Example                                       | Token Cost  |
| --------------------------------------------- | ----------- |
| 1024 x 1024 square image in detail: high mode | 765 tokens  |
| 2048 x 4096 image in detail: high mode        | 1105 tokens |
| 4096 x 8192 image in detail: low mode         | 85 tokens   |

## FAQ<a name="faq"></a>

Here are some frequently asked questions about GPT-4 with Vision:

| Question                       | Answer                                                                                                                  |
| ------------------------------ | ------------------------------------------------------------------------------------------------------------------------ |
| Fine-Tuning Image Capabilities | No, fine-tuning the image capabilities of GPT-4 is not supported at this time.                                         |
| Generating Images              | GPT-4 is used for understanding images, not generating them.                                                           |
| Supported Image File Types     | Supported image file types include PNG (.png), JPEG (.jpeg and .jpg), WEBP (.webp), and non-animated GIF (.gif).       |
| Image Size Limitations         | Image uploads are restricted to 20 MB per image.                                                                        |
| Image Deletion                 | Uploaded images are automatically deleted after processing by the model.                                               |
| Learning More                  | For more details about GPT-4 with Vision, refer to the GPT-4 with Vision system card.                                  |
| CAPTCHA Submission             | CAPTCHAs are blocked for safety reasons.                                                                               |
| Rate Limits                    | Image processing counts toward your tokens per minute (TPM) limit. Refer to the calculating costs section for details. |
| Image Metadata                 | The model does not receive image metadata.                                                                             |
| Handling Unclear Images        | If an image is unclear, the model will do its best to interpret it, but results may be less accurate.                  |

## Additional Information<a name="additional-information"></a>

- If you encounter any errors or issues with the module, check your API key and internet connectivity first.
- It's recommended to handle exceptions when using the module so that errors are handled gracefully.
- You can further customize the module to fit your specific use case by modifying the code as needed.

## References<a name="references"></a>

- [OpenAI API Documentation](https://beta.openai.com/docs/)
- [OpenAI API Reference (Chat Completions)](https://platform.openai.com/docs/api-reference/chat/create): Official API documentation for the GPT-4 Vision model.
- [OpenAI Platform](https://beta.openai.com/signup/): Sign up for an OpenAI API key.

This documentation provides a comprehensive guide on how to use the `GPT4VisionAPI` module effectively. It covers initialization, methods, usage examples, and additional information to ensure a smooth experience when working with the GPT-4 Vision model.

@ -0,0 +1,16 @@
from swarms.structs import Flow
from swarms.models.gpt4_vision_api import GPT4VisionAPI
llm = GPT4VisionAPI()
task = "What is the color of the object?"
img = "images/swarms.jpeg"
## Initialize the workflow
flow = Flow(
llm=llm,
max_loops="auto",
)
flow.run(task=task, img=img)

@ -1,28 +0,0 @@
import os
from dotenv import load_dotenv
from swarms.models.revgptV4 import RevChatGPTModel
from swarms.workers.worker import Worker
load_dotenv()
config = {
"model": os.getenv("REVGPT_MODEL"),
"plugin_ids": [os.getenv("REVGPT_PLUGIN_IDS")],
"disable_history": os.getenv("REVGPT_DISABLE_HISTORY") == "True",
"PUID": os.getenv("REVGPT_PUID"),
"unverified_plugin_domains": [
os.getenv("REVGPT_UNVERIFIED_PLUGIN_DOMAINS")
],
}
llm = RevChatGPTModel(access_token=os.getenv("ACCESS_TOKEN"), **config)
worker = Worker(ai_name="Optimus Prime", llm=llm)
task = (
"What were the winning boston marathon times for the past 5 years (ending"
" in 2022)? Generate a table of the year, name, country of origin, and"
" times."
)
response = worker.run(task)
print(response)

@ -0,0 +1,22 @@
from swarms.structs import Flow
from swarms.models.gpt4_vision_api import GPT4VisionAPI
llm = GPT4VisionAPI()
task = (
"Analyze this image of an assembly line and identify any issues such as"
" misaligned parts, defects, or deviations from the standard assembly"
" process. IF there is anything unsafe in the image, explain why it is"
" unsafe and how it could be improved."
)
img = "assembly_line.jpg"
## Initialize the workflow
flow = Flow(
llm=llm,
max_loops=1,
dashboard=True,
)
flow.run(task=task, img=img)


@ -0,0 +1,20 @@
from swarms.structs import Flow
from swarms.models.gpt4_vision_api import GPT4VisionAPI
from swarms.prompts.multi_modal_autonomous_instruction_prompt import (
MULTI_MODAL_AUTO_AGENT_SYSTEM_PROMPT_1,
)
llm = GPT4VisionAPI()
task = "What is the color of the object?"
img = "images/swarms.jpeg"
## Initialize the workflow
flow = Flow(
llm=llm,
sop=MULTI_MODAL_AUTO_AGENT_SYSTEM_PROMPT_1,
max_loops="auto",
)
flow.run(task=task, img=img)

@ -1,33 +1,17 @@
from swarms.structs import Flow
from swarms.models.gpt4_vision_api import GPT4VisionAPI
# Multi Modality Auto Agent
llm = GPT4VisionAPI()
task = "What is the color of the object?"
img = "images/swarms.jpeg"
## Initialize the workflow
flow = Flow(
llm=llm,
max_loops="auto",
dashboard=True,
)
flow.run(task=task, img=img)

@ -53,8 +53,8 @@ topic_selection_task = (
"Generate 10 topics on gaining mental clarity using ancient practices"
)
topics = llm(
f"Your System Instructions: {TOPIC_GENERATOR_SYSTEM_PROMPT}, Your current task:"
f" {topic_selection_task}"
f"Your System Instructions: {TOPIC_GENERATOR_SYSTEM_PROMPT}, Your current"
f" task: {topic_selection_task}"
)
dashboard = print(

@ -4,7 +4,7 @@ build-backend = "poetry.core.masonry.api"
[tool.poetry]
name = "swarms"
version = "2.4.0"
version = "2.4.1"
description = "Swarms - Pytorch"
license = "MIT"
authors = ["Kye Gomez <kye@apac.ai>"]

@ -20,15 +20,16 @@ from swarms.models.mpt import MPT7B # noqa: E402
# MultiModal Models
from swarms.models.idefics import Idefics # noqa: E402
# from swarms.models.kosmos_two import Kosmos # noqa: E402
from swarms.models.vilt import Vilt # noqa: E402
from swarms.models.nougat import Nougat # noqa: E402
from swarms.models.layoutlm_document_qa import LayoutLMDocumentQA # noqa: E402
from swarms.models.gpt4_vision_api import GPT4VisionAPI # noqa: E402
# from swarms.models.gpt4v import GPT4Vision
# from swarms.models.dalle3 import Dalle3
# from swarms.models.distilled_whisperx import DistilWhisperModel # noqa: E402
# from swarms.models.whisperx_model import WhisperX # noqa: E402
# from swarms.models.kosmos_two import Kosmos # noqa: E402
__all__ = [
"Anthropic",
@ -49,4 +50,6 @@ __all__ = [
"WizardLLMStoryTeller",
# "GPT4Vision",
# "Dalle3",
# "DistilWhisperModel",
"GPT4VisionAPI",
]

@ -185,11 +185,9 @@ def build_extra_kwargs(
if field_name in extra_kwargs:
raise ValueError(f"Found {field_name} supplied twice.")
if field_name not in all_required_field_names:
warnings.warn(
f"""WARNING! {field_name} is not default parameter.
warnings.warn(f"""WARNING! {field_name} is not default parameter.
{field_name} was transferred to model_kwargs.
Please confirm that {field_name} is what you intended."""
)
Please confirm that {field_name} is what you intended.""")
extra_kwargs[field_name] = values.pop(field_name)
invalid_model_kwargs = all_required_field_names.intersection(

@ -0,0 +1,209 @@
import asyncio
import base64
import concurrent.futures
import time
from concurrent.futures import ThreadPoolExecutor
from io import BytesIO
from typing import List, Optional, Tuple
import requests
from abc import abstractmethod
from PIL import Image
class BaseMultiModalModel:
def __init__(
self,
model_name: Optional[str],
temperature: Optional[int] = 0.5,
max_tokens: Optional[int] = 500,
max_workers: Optional[int] = 10,
top_p: Optional[int] = 1,
top_k: Optional[int] = 50,
device: Optional[str] = "cuda",
max_new_tokens: Optional[int] = 500,
retries: Optional[int] = 3,
):
self.model_name = model_name
self.temperature = temperature
self.max_tokens = max_tokens
self.max_workers = max_workers
self.top_p = top_p
self.top_k = top_k
self.device = device
self.max_new_tokens = max_new_tokens
self.retries = retries
self.chat_history = []
@abstractmethod
def __call__(self, text: str, img: str):
"""Run the model"""
pass
def run(self, task: str, img: str):
"""Run the model"""
pass
async def arun(self, task: str, img: str):
"""Run the model asynchronously"""
pass
def get_img_from_web(self, img: str):
"""Get the image from the web"""
try:
response = requests.get(img)
response.raise_for_status()
image_pil = Image.open(BytesIO(response.content))
return image_pil
except requests.RequestException as error:
print(f"Error fetching image from {img} and error: {error}")
return None
def encode_img(self, img: str):
"""Encode the image to base64"""
with open(img, "rb") as image_file:
return base64.b64encode(image_file.read()).decode("utf-8")
def get_img(self, img: str):
"""Get the image from the path"""
image_pil = Image.open(img)
return image_pil
def clear_chat_history(self):
"""Clear the chat history"""
self.chat_history = []
def run_many(
self,
tasks: List[str],
imgs: List[str],
):
"""
Run the model on multiple tasks and images all at once using concurrent
Args:
tasks (List[str]): List of tasks
imgs (List[str]): List of image paths
Returns:
List[str]: List of responses
"""
# Instantiate the thread pool executor
with ThreadPoolExecutor(max_workers=self.max_workers) as executor:
results = executor.map(self.run, tasks, imgs)
# Print the results for debugging
for result in results:
print(result)
def run_batch(self, tasks_images: List[Tuple[str, str]]) -> List[str]:
"""Process a batch of tasks and images"""
with concurrent.futures.ThreadPoolExecutor() as executor:
futures = [
executor.submit(self.run, task, img)
for task, img in tasks_images
]
results = [future.result() for future in futures]
return results
async def run_batch_async(
self, tasks_images: List[Tuple[str, str]]
) -> List[str]:
"""Process a batch of tasks and images asynchronously"""
loop = asyncio.get_event_loop()
futures = [
loop.run_in_executor(None, self.run, task, img)
for task, img in tasks_images
]
return await asyncio.gather(*futures)
async def run_batch_async_with_retries(
self, tasks_images: List[Tuple[str, str]]
) -> List[str]:
"""Process a batch of tasks and images asynchronously with retries"""
loop = asyncio.get_event_loop()
futures = [
loop.run_in_executor(None, self.run_with_retries, task, img)
for task, img in tasks_images
]
return await asyncio.gather(*futures)
def unique_chat_history(self):
"""Get the unique chat history"""
return list(set(self.chat_history))
def run_with_retries(self, task: str, img: str):
"""Run the model with retries"""
for i in range(self.retries):
try:
return self.run(task, img)
except Exception as error:
print(f"Error with the request {error}")
continue
def run_batch_with_retries(self, tasks_images: List[Tuple[str, str]]):
"""Run the model with retries"""
for i in range(self.retries):
try:
return self.run_batch(tasks_images)
except Exception as error:
print(f"Error with the request {error}")
continue
def _tokens_per_second(self) -> float:
"""Tokens per second"""
elapsed_time = self.end_time - self.start_time
if elapsed_time == 0:
return float("inf")
return self._num_tokens() / elapsed_time
def _time_for_generation(self, task: str) -> float:
"""Time for Generation"""
self.start_time = time.time()
self.run(task)
self.end_time = time.time()
return self.end_time - self.start_time
@abstractmethod
def generate_summary(self, text: str) -> str:
"""Generate Summary"""
pass
def set_temperature(self, value: float):
"""Set Temperature"""
self.temperature = value
def set_max_tokens(self, value: int):
"""Set new max tokens"""
self.max_tokens = value
def get_generation_time(self) -> float:
"""Get generation time"""
if self.start_time and self.end_time:
return self.end_time - self.start_time
return 0
def get_chat_history(self):
"""Get the chat history"""
return self.chat_history
def get_unique_chat_history(self):
"""Get the unique chat history"""
return list(set(self.chat_history))
def get_chat_history_length(self):
"""Get the chat history length"""
return len(self.chat_history)
def get_unique_chat_history_length(self):
"""Get the unique chat history length"""
return len(list(set(self.chat_history)))
def get_chat_history_tokens(self):
"""Get the chat history tokens"""
return self._num_tokens()
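The class above defines the shared multimodal interface rather than a concrete model. The following is a minimal illustrative sketch of how a subclass might fill it in; `EchoModel`, its fake response, and the image path are assumptions for demonstration only, not part of this commit.

```python
# Illustrative sketch only: a hypothetical subclass of BaseMultiModalModel
class EchoModel(BaseMultiModalModel):
    def __init__(self):
        super().__init__(model_name="echo-model")

    def __call__(self, text: str, img: str):
        return self.run(text, img)

    def run(self, task: str, img: str):
        # A real model would encode the image and call an inference backend here
        encoded = self.encode_img(img)
        return f"{task} (received {len(encoded)} base64 characters)"


model = EchoModel()
print(model.run("Describe the image.", "images/swarms.jpeg"))
```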

@ -168,8 +168,10 @@ class Dalle3:
# Handling exceptions and printing the errors details
print(
colored(
f"Error running Dalle3: {error} try optimizing your api"
" key and or try again",
(
f"Error running Dalle3: {error} try optimizing your api"
" key and or try again"
),
"red",
)
)
@ -231,8 +233,10 @@ class Dalle3:
except (Exception, openai.OpenAIError) as error:
print(
colored(
f"Error running Dalle3: {error} try optimizing your api"
" key and or try again",
(
f"Error running Dalle3: {error} try optimizing your api"
" key and or try again"
),
"red",
)
)
@ -306,8 +310,10 @@ class Dalle3:
except Exception as error:
print(
colored(
f"Error running Dalle3: {error} try optimizing"
" your api key and or try again",
(
f"Error running Dalle3: {error} try optimizing"
" your api key and or try again"
),
"red",
)
)

File diff suppressed because it is too large.

@ -63,9 +63,9 @@ class Fuyu:
def __call__(self, text: str, img: str):
"""Call the model with text and img paths"""
image_pil = Image.open(img)
img = self.get_img(img)
model_inputs = self.processor(
text=text, images=[image_pil], device=self.device_map
text=text, images=[img], device=self.device_map
)
for k, v in model_inputs.items():
@ -79,13 +79,13 @@ class Fuyu:
)
return print(str(text))
def get_img_from_web(self, img_url: str):
def get_img_from_web(self, img: str):
"""Get the image from the web"""
try:
response = requests.get(img_url)
response = requests.get(img)
response.raise_for_status()
image_pil = Image.open(BytesIO(response.content))
return image_pil
except requests.RequestException as error:
print(f"Error fetching image from {img_url} and error: {error}")
print(f"Error fetching image from {img} and error: {error}")
return None

@ -0,0 +1,291 @@
import asyncio
import base64
import concurrent.futures
from termcolor import colored
import json
import os
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple
import aiohttp
import requests
from dotenv import load_dotenv
# Load environment variables
load_dotenv()
openai_api_key = os.getenv("OPENAI_API_KEY")
class GPT4VisionAPI:
"""
GPT-4 Vision API
This class is a wrapper for the OpenAI API. It is used to run the GPT-4 Vision model.
Parameters
----------
openai_api_key : str
The OpenAI API key. Defaults to the OPENAI_API_KEY environment variable.
max_tokens : int
The maximum number of tokens to generate. Defaults to 300.
Methods
-------
encode_image(img: str)
Encode image to base64.
run(task: str, img: str)
Run the model.
__call__(task: str, img: str)
Run the model.
Examples:
---------
>>> from swarms.models import GPT4VisionAPI
>>> llm = GPT4VisionAPI()
>>> task = "What is the color of the object?"
>>> img = "https://i.imgur.com/2M2ZGwC.jpeg"
>>> llm.run(task, img)
"""
def __init__(
self,
openai_api_key: str = openai_api_key,
model_name: str = "gpt-4-vision-preview",
max_workers: int = 10,
max_tokens: int = 300,
openai_proxy: str = "https://api.openai.com/v1/chat/completions",
):
super().__init__()
self.openai_api_key = openai_api_key
self.model_name = model_name
self.max_workers = max_workers
self.max_tokens = max_tokens
self.openai_proxy = openai_proxy
def encode_image(self, img: str):
"""Encode image to base64."""
with open(img, "rb") as image_file:
return base64.b64encode(image_file.read()).decode("utf-8")
def download_img_then_encode(self, img: str):
"""Download image from URL then encode image to base64 using requests"""
# Function to handle vision tasks
def run(self, task: str, img: str):
"""Run the model."""
try:
base64_image = self.encode_image(img)
headers = {
"Content-Type": "application/json",
"Authorization": f"Bearer {openai_api_key}",
}
payload = {
"model": self.model_name,
"messages": [
{
"role": "user",
"content": [
{"type": "text", "text": task},
{
"type": "image_url",
"image_url": {
"url": (
f"data:image/jpeg;base64,{base64_image}"
)
},
},
],
}
],
"max_tokens": self.max_tokens,
}
response = requests.post(
"https://api.openai.com/v1/chat/completions",
headers=headers,
json=payload,
)
out = response.json()
content = out["choices"][0]["message"]["content"]
print(content)
return content
except Exception as error:
print(f"Error with the request: {error}")
raise error
def __call__(self, task: str, img: str):
"""Run the model."""
try:
base64_image = self.encode_image(img)
headers = {
"Content-Type": "application/json",
"Authorization": f"Bearer {openai_api_key}",
}
payload = {
"model": "gpt-4-vision-preview",
"messages": [
{
"role": "user",
"content": [
{"type": "text", "text": task},
{
"type": "image_url",
"image_url": {
"url": (
f"data:image/jpeg;base64,{base64_image}"
)
},
},
],
}
],
"max_tokens": self.max_tokens,
}
response = requests.post(
self.openai_proxy,
headers=headers,
json=payload,
)
out = response.json()
content = out["choices"][0]["message"]["content"]
print(content)
return content
except Exception as error:
print(f"Error with the request: {error}")
raise error
# Function to handle vision tasks
def run_many(
self,
tasks: List[str],
imgs: List[str],
):
"""
Run the model on multiple tasks and images all at once using concurrent
"""
# Instantiate the thread pool executor
with ThreadPoolExecutor(max_workers=self.max_workers) as executor:
results = executor.map(self.run, tasks, imgs)
# Print the results for debugging
for result in results:
print(result)
return list(results)
async def arun(
self,
task: str,
img: str,
):
"""
Asynchronously run the model
Overview:
---------
This method is used to asynchronously run the model. It is used to run the model
on a single task and image.
Parameters:
----------
task : str
The task to run the model on.
img : str
The image to run the task on
"""
try:
base64_image = self.encode_image(img)
headers = {
"Content-Type": "application/json",
"Authorization": f"Bearer {openai_api_key}",
}
payload = {
"model": "gpt-4-vision-preview",
"messages": [
{
"role": "user",
"content": [
{"type": "text", "text": task},
{
"type": "image_url",
"image_url": {
"url": (
f"data:image/jpeg;base64,{base64_image}"
)
},
},
],
}
],
"max_tokens": self.max_tokens,
}
async with aiohttp.ClientSession() as session:
async with session.post(
self.openai_proxy, headers=headers, data=json.dumps(payload)
) as response:
out = await response.json()
content = out["choices"][0]["message"]["content"]
print(content)
return content
except Exception as error:
print(f"Error with the request {error}")
raise error
def run_batch(self, tasks_images: List[Tuple[str, str]]) -> List[str]:
"""Process a batch of tasks and images"""
with concurrent.futures.ThreadPoolExecutor() as executor:
futures = [
executor.submit(self.run, task, img)
for task, img in tasks_images
]
results = [future.result() for future in futures]
return results
async def run_batch_async(
self, tasks_images: List[Tuple[str, str]]
) -> List[str]:
"""Process a batch of tasks and images asynchronously"""
loop = asyncio.get_event_loop()
futures = [
loop.run_in_executor(None, self.run, task, img)
for task, img in tasks_images
]
return await asyncio.gather(*futures)
async def run_batch_async_with_retries(
self, tasks_images: List[Tuple[str, str]]
) -> List[str]:
"""Process a batch of tasks and images asynchronously with retries"""
loop = asyncio.get_event_loop()
futures = [
loop.run_in_executor(None, self.run_with_retries, task, img)
for task, img in tasks_images
]
return await asyncio.gather(*futures)
def health_check(self):
"""Health check for the GPT4Vision model"""
try:
response = requests.get("https://api.openai.com/v1/engines")
return response.status_code == 200
except requests.RequestException as error:
print(f"Health check failed: {error}")
return False
def print_dashboard(self):
dashboard = print(
colored(
f"""
GPT4Vision Dashboard
-------------------
Model: {self.model_name}
Max Workers: {self.max_workers}
OpenAIProxy: {self.openai_proxy}
""",
"green",
)
)
return dashboard
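The batch helpers (`run_many`, `run_batch`, and the async variants) are easiest to see with a short usage sketch; the image path below is a placeholder and assumes a local file exists.

```python
from swarms.models import GPT4VisionAPI

api = GPT4VisionAPI()

# Each tuple is (task, image path); both paths are placeholders for illustration
pairs = [
    ("What is the color of the object?", "images/swarms.jpeg"),
    ("How many objects are in the image?", "images/swarms.jpeg"),
]

# run_batch submits each pair to a thread pool and collects the responses in order
results = api.run_batch(pairs)
for answer in results:
    print(answer)
```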

@ -291,8 +291,10 @@ class HuggingfaceLLM:
except Exception as e:
print(
colored(
"HuggingfaceLLM could not generate text because of"
f" error: {e}, try optimizing your arguments",
(
"HuggingfaceLLM could not generate text because of"
f" error: {e}, try optimizing your arguments"
),
"red",
)
)

@ -18,38 +18,31 @@ def is_overlapping(rect1, rect2):
class Kosmos:
"""
Kosmos model by Yen-Chun Shieh
Parameters
----------
model_name : str
Path to the pretrained model
Examples
--------
>>> kosmos = Kosmos()
>>> kosmos("Hello, my name is", "path/to/image.png")
Args:
# Initialize Kosmos
kosmos = Kosmos()
# Perform multimodal grounding
kosmos.multimodal_grounding("Find the red apple in the image.", "https://example.com/apple.jpg")
# Perform referring expression comprehension
kosmos.referring_expression_comprehension("Show me the green bottle.", "https://example.com/bottle.jpg")
# Generate referring expressions
kosmos.referring_expression_generation("It is on the table.", "https://example.com/table.jpg")
# Perform grounded visual question answering
kosmos.grounded_vqa("What is the color of the car?", "https://example.com/car.jpg")
# Generate grounded image caption
kosmos.grounded_image_captioning("https://example.com/beach.jpg")
"""
def __init__(
self,
model_name="ydshieh/kosmos-2-patch14-224",
*args,
**kwargs,
):
self.model = AutoModelForVision2Seq.from_pretrained(
model_name, trust_remote_code=True
model_name, trust_remote_code=True, *args, **kwargs
)
self.processor = AutoProcessor.from_pretrained(
model_name, trust_remote_code=True
model_name, trust_remote_code=True, *args, **kwargs
)
def get_image(self, url):

@ -140,8 +140,10 @@ class SSD1B:
# Handling exceptions and printing the errors details
print(
colored(
f"Error running SSD1B: {error} try optimizing your api"
" key and or try again",
(
f"Error running SSD1B: {error} try optimizing your api"
" key and or try again"
),
"red",
)
)
@ -226,8 +228,10 @@ class SSD1B:
except Exception as error:
print(
colored(
f"Error running SSD1B: {error} try optimizing"
" your api key and or try again",
(
f"Error running SSD1B: {error} try optimizing"
" your api key and or try again"
),
"red",
)
)

@ -2,7 +2,7 @@ import os
import subprocess
try:
import whisperx
import swarms.models.whisperx_model as whisperx_model
from pydub import AudioSegment
from pytube import YouTube
except Exception as error:
@ -66,17 +66,17 @@ class WhisperX:
compute_type = "float16"
# 1. Transcribe with original Whisper (batched) 🗣️
model = whisperx.load_model(
model = whisperx_model.load_model(
"large-v2", device, compute_type=compute_type
)
audio = whisperx.load_audio(audio_file)
audio = whisperx_model.load_audio(audio_file)
result = model.transcribe(audio, batch_size=batch_size)
# 2. Align Whisper output 🔍
model_a, metadata = whisperx.load_align_model(
model_a, metadata = whisperx_model.load_align_model(
language_code=result["language"], device=device
)
result = whisperx.align(
result = whisperx_model.align(
result["segments"],
model_a,
metadata,
@ -86,7 +86,7 @@ class WhisperX:
)
# 3. Assign speaker labels 🏷️
diarize_model = whisperx.DiarizationPipeline(
diarize_model = whisperx_model.DiarizationPipeline(
use_auth_token=self.hf_api_key, device=device
)
diarize_model(audio_file)
@ -99,16 +99,16 @@ class WhisperX:
print("The key 'segments' is not found in the result.")
def transcribe(self, audio_file):
model = whisperx.load_model("large-v2", self.device, self.compute_type)
audio = whisperx.load_audio(audio_file)
model = whisperx_model.load_model("large-v2", self.device, self.compute_type)
audio = whisperx_model.load_audio(audio_file)
result = model.transcribe(audio, batch_size=self.batch_size)
# 2. Align Whisper output 🔍
model_a, metadata = whisperx.load_align_model(
model_a, metadata = whisperx_model.load_align_model(
language_code=result["language"], device=self.device
)
result = whisperx.align(
result = whisperx_model.align(
result["segments"],
model_a,
metadata,
@ -118,7 +118,7 @@ class WhisperX:
)
# 3. Assign speaker labels 🏷️
diarize_model = whisperx.DiarizationPipeline(
diarize_model = whisperx_model.DiarizationPipeline(
use_auth_token=self.hf_api_key, device=self.device
)

@ -274,5 +274,3 @@ Check Accuracy:
- Flag any bold claims that lack credible evidence for fact-checker review.
"""

@ -489,14 +489,16 @@ class Flow:
except Exception as error:
print(
colored(
"Error activating autonomous agent. Try optimizing your"
" parameters...",
(
"Error activating autonomous agent. Try optimizing your"
" parameters..."
),
"red",
)
)
print(error)
def run(self, task: str, **kwargs):
def run(self, task: str, img: Optional[str], **kwargs):
"""
Run the autonomous agent loop
@ -550,10 +552,17 @@ class Flow:
attempt = 0
while attempt < self.retry_attempts:
try:
response = self.llm(
task,
**kwargs,
)
if img:
response = self.llm(
task,
img,
**kwargs,
)
else:
response = self.llm(
task,
**kwargs,
)
# If code interpreter is enabled then run the code
if self.code_interpreter:
@ -650,7 +659,7 @@ class Flow:
while attempt < self.retry_attempts:
try:
response = self.llm(
task ** kwargs,
task**kwargs,
)
if self.interactive:
print(f"AI: {response}")

@ -385,9 +385,11 @@ class SequentialWorkflow:
except Exception as e:
print(
colored(
f"Error initializing the Sequential workflow: {e} try"
" optimizing your inputs like the flow class and task"
" description",
(
f"Error initializing the Sequential workflow: {e} try"
" optimizing your inputs like the flow class and task"
" description"
),
"red",
attrs=["bold", "underline"],
)

@ -0,0 +1,238 @@
import asyncio
import os
from unittest.mock import AsyncMock, Mock, mock_open, patch
from aiohttp import ClientResponseError
import pytest
from dotenv import load_dotenv
from requests.exceptions import RequestException
from swarms.models.gpt4_vision_api import GPT4VisionAPI
load_dotenv()
custom_api_key = os.environ.get("OPENAI_API_KEY")
img = "images/swarms.jpeg"
@pytest.fixture
def vision_api():
return GPT4VisionAPI(openai_api_key="test_api_key")
def test_init(vision_api):
assert vision_api.openai_api_key == "test_api_key"
def test_encode_image(vision_api):
with patch(
"builtins.open", mock_open(read_data=b"test_image_data"), create=True
):
encoded_image = vision_api.encode_image(img)
assert encoded_image == "dGVzdF9pbWFnZV9kYXRh"
def test_run_success(vision_api):
expected_response = {"choices": [{"text": "This is the model's response."}]}
with patch(
"requests.post", return_value=Mock(json=lambda: expected_response)
) as mock_post:
result = vision_api.run("What is this?", img)
mock_post.assert_called_once()
assert result == "This is the model's response."
def test_run_request_error(vision_api):
with patch(
"requests.post", side_effect=RequestException("Request Error")
) as mock_post:
with pytest.raises(RequestException):
vision_api.run("What is this?", img)
def test_run_response_error(vision_api):
expected_response = {"error": "Model Error"}
with patch(
"requests.post", return_value=Mock(json=lambda: expected_response)
) as mock_post:
with pytest.raises(RuntimeError):
vision_api.run("What is this?", img)
def test_call(vision_api):
expected_response = {"choices": [{"text": "This is the model's response."}]}
with patch(
"requests.post", return_value=Mock(json=lambda: expected_response)
) as mock_post:
result = vision_api("What is this?", img)
mock_post.assert_called_once()
assert result == "This is the model's response."
@pytest.fixture
def gpt_api():
return GPT4VisionAPI()
def test_initialization_with_default_key():
api = GPT4VisionAPI()
assert api.openai_api_key == custom_api_key
def test_initialization_with_custom_key():
custom_key = custom_api_key
api = GPT4VisionAPI(openai_api_key=custom_key)
assert api.openai_api_key == custom_key
def test_run_successful_response(gpt_api):
task = "What is in the image?"
img_url = img
response_json = {"choices": [{"text": "Answer from GPT-4 Vision"}]}
mock_response = Mock()
mock_response.json.return_value = response_json
with patch("requests.post", return_value=mock_response) as mock_post:
result = gpt_api.run(task, img_url)
mock_post.assert_called_once()
assert result == response_json["choices"][0]["text"]
def test_run_with_exception(gpt_api):
task = "What is in the image?"
img_url = img
with patch("requests.post", side_effect=Exception("Test Exception")):
with pytest.raises(Exception):
gpt_api.run(task, img_url)
def test_call_method_successful_response(gpt_api):
task = "What is in the image?"
img_url = img
response_json = {"choices": [{"text": "Answer from GPT-4 Vision"}]}
mock_response = Mock()
mock_response.json.return_value = response_json
with patch("requests.post", return_value=mock_response) as mock_post:
result = gpt_api(task, img_url)
mock_post.assert_called_once()
assert result == response_json
def test_call_method_with_exception(gpt_api):
task = "What is in the image?"
img_url = img
with patch("requests.post", side_effect=Exception("Test Exception")):
with pytest.raises(Exception):
gpt_api(task, img_url)
@pytest.mark.asyncio
async def test_arun_success(vision_api):
expected_response = {
"choices": [{"message": {"content": "This is the model's response."}}]
}
with patch(
"aiohttp.ClientSession.post",
new_callable=AsyncMock,
return_value=AsyncMock(json=AsyncMock(return_value=expected_response)),
) as mock_post:
result = await vision_api.arun("What is this?", img)
mock_post.assert_called_once()
assert result == "This is the model's response."
@pytest.mark.asyncio
async def test_arun_request_error(vision_api):
with patch(
"aiohttp.ClientSession.post",
new_callable=AsyncMock,
side_effect=Exception("Request Error"),
) as mock_post:
with pytest.raises(Exception):
await vision_api.arun("What is this?", img)
def test_run_many_success(vision_api):
expected_response = {
"choices": [{"message": {"content": "This is the model's response."}}]
}
with patch(
"requests.post", return_value=Mock(json=lambda: expected_response)
) as mock_post:
tasks = ["What is this?", "What is that?"]
imgs = [img, img]
results = vision_api.run_many(tasks, imgs)
assert mock_post.call_count == 2
assert results == [
"This is the model's response.",
"This is the model's response.",
]
def test_run_many_request_error(vision_api):
with patch(
"requests.post", side_effect=RequestException("Request Error")
) as mock_post:
tasks = ["What is this?", "What is that?"]
imgs = [img, img]
with pytest.raises(RequestException):
vision_api.run_many(tasks, imgs)
@pytest.mark.asyncio
async def test_arun_json_decode_error(vision_api):
with patch(
"aiohttp.ClientSession.post",
new_callable=AsyncMock,
return_value=AsyncMock(json=AsyncMock(side_effect=ValueError)),
) as mock_post:
with pytest.raises(ValueError):
await vision_api.arun("What is this?", img)
@pytest.mark.asyncio
async def test_arun_api_error(vision_api):
error_response = {"error": {"message": "API Error"}}
with patch(
"aiohttp.ClientSession.post",
new_callable=AsyncMock,
return_value=AsyncMock(json=AsyncMock(return_value=error_response)),
) as mock_post:
with pytest.raises(Exception, match="API Error"):
await vision_api.arun("What is this?", img)
@pytest.mark.asyncio
async def test_arun_unexpected_response(vision_api):
unexpected_response = {"unexpected": "response"}
with patch(
"aiohttp.ClientSession.post",
new_callable=AsyncMock,
return_value=AsyncMock(
json=AsyncMock(return_value=unexpected_response)
),
) as mock_post:
with pytest.raises(Exception, match="Unexpected response"):
await vision_api.arun("What is this?", img)
@pytest.mark.asyncio
async def test_arun_retries(vision_api):
with patch(
"aiohttp.ClientSession.post",
new_callable=AsyncMock,
side_effect=ClientResponseError(None, None),
) as mock_post:
with pytest.raises(ClientResponseError):
await vision_api.arun("What is this?", img)
assert mock_post.call_count == vision_api.retries + 1
@pytest.mark.asyncio
async def test_arun_timeout(vision_api):
with patch(
"aiohttp.ClientSession.post",
new_callable=AsyncMock,
side_effect=asyncio.TimeoutError,
) as mock_post:
with pytest.raises(asyncio.TimeoutError):
await vision_api.arun("What is this?", img)

@ -1,93 +0,0 @@
import unittest
from unittest.mock import patch
from RevChatGPTModelv4 import RevChatGPTModelv4
class TestRevChatGPT(unittest.TestCase):
def setUp(self):
self.access_token = "123"
self.model = RevChatGPTModelv4(access_token=self.access_token)
def test_run(self):
prompt = "What is the capital of France?"
self.model.start_time = 10
self.model.end_time = 20
response = self.model.run(prompt)
self.assertEqual(response, "The capital of France is Paris.")
self.assertEqual(self.model.start_time, 10)
self.assertEqual(self.model.end_time, 20)
def test_generate_summary(self):
text = "Hello world. This is some text. It has multiple sentences."
summary = self.model.generate_summary(text)
self.assertEqual(summary, "")
@patch("RevChatGPTModelv4.Chatbot.install_plugin")
def test_enable_plugin(self, mock_install_plugin):
plugin_id = "plugin123"
self.model.enable_plugin(plugin_id)
mock_install_plugin.assert_called_with(plugin_id=plugin_id)
@patch("RevChatGPTModelv4.Chatbot.get_plugins")
def test_list_plugins(self, mock_get_plugins):
mock_get_plugins.return_value = [{"id": "123", "name": "Test Plugin"}]
plugins = self.model.list_plugins()
self.assertEqual(len(plugins), 1)
self.assertEqual(plugins[0]["id"], "123")
self.assertEqual(plugins[0]["name"], "Test Plugin")
@patch("RevChatGPTModelv4.Chatbot.get_conversations")
def test_get_conversations(self, mock_get_conversations):
self.model.chatbot.get_conversations()
mock_get_conversations.assert_called()
@patch("RevChatGPTModelv4.Chatbot.get_msg_history")
def test_get_msg_history(self, mock_get_msg_history):
convo_id = "123"
self.model.chatbot.get_msg_history(convo_id)
mock_get_msg_history.assert_called_with(convo_id)
@patch("RevChatGPTModelv4.Chatbot.share_conversation")
def test_share_conversation(self, mock_share_conversation):
self.model.chatbot.share_conversation()
mock_share_conversation.assert_called()
@patch("RevChatGPTModelv4.Chatbot.gen_title")
def test_gen_title(self, mock_gen_title):
convo_id = "123"
message_id = "456"
self.model.chatbot.gen_title(convo_id, message_id)
mock_gen_title.assert_called_with(convo_id, message_id)
@patch("RevChatGPTModelv4.Chatbot.change_title")
def test_change_title(self, mock_change_title):
convo_id = "123"
title = "New Title"
self.model.chatbot.change_title(convo_id, title)
mock_change_title.assert_called_with(convo_id, title)
@patch("RevChatGPTModelv4.Chatbot.delete_conversation")
def test_delete_conversation(self, mock_delete_conversation):
convo_id = "123"
self.model.chatbot.delete_conversation(convo_id)
mock_delete_conversation.assert_called_with(convo_id)
@patch("RevChatGPTModelv4.Chatbot.clear_conversations")
def test_clear_conversations(self, mock_clear_conversations):
self.model.chatbot.clear_conversations()
mock_clear_conversations.assert_called()
@patch("RevChatGPTModelv4.Chatbot.rollback_conversation")
def test_rollback_conversation(self, mock_rollback_conversation):
num = 2
self.model.chatbot.rollback_conversation(num)
mock_rollback_conversation.assert_called_with(num)
@patch("RevChatGPTModelv4.Chatbot.reset_chat")
def test_reset_chat(self, mock_reset_chat):
self.model.chatbot.reset_chat()
mock_reset_chat.assert_called()
if __name__ == "__main__":
unittest.main()

@ -7,7 +7,7 @@ import pytest
import whisperx
from pydub import AudioSegment
from pytube import YouTube
from swarms.models.whisperx import WhisperX
from swarms.models.whisperx_model import WhisperX
# Fixture to create a temporary directory for testing
