From 7f577acca35e5db75e95cb4cb07a64489677f3f7 Mon Sep 17 00:00:00 2001 From: Kye Gomez Date: Tue, 4 Jun 2024 15:44:12 -0700 Subject: [PATCH] [GPT4o][Docs] --- docs/index.md | 16 +- docs/mkdocs.yml | 6 +- docs/swarms/index.md | 904 ++++++++++++++---------------- docs/swarms/models/gpt4o.md | 150 ++++++ 4 files changed, 507 insertions(+), 569 deletions(-) create mode 100644 docs/swarms/models/gpt4o.md diff --git a/docs/index.md b/docs/index.md index 47be0eb9..24d36eac 100644 --- a/docs/index.md +++ b/docs/index.md @@ -1,6 +1,6 @@ # Swarms Documentation -Cutting-edge framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, Swarms empowers agents to work together seamlessly, tackling complex tasks. +Orchestrate enterprise-grade agents for multi-agent collaboration and orchestration to automate real-world problems.
@@ -92,37 +92,37 @@ Cutting-edge framework for orchestrating role-playing, autonomous AI agents. By

Examples

  • - + Prepare for meetings
  • - + Trip Planner Crew
  • - + Create Instagram Post
  • - + Stock Analysis
  • - + Game Generator
  • - + Drafting emails with LangGraph
  • - + Landing Page Generator
  • diff --git a/docs/mkdocs.yml b/docs/mkdocs.yml index 3f6ddc15..79176ce6 100644 --- a/docs/mkdocs.yml +++ b/docs/mkdocs.yml @@ -126,16 +126,17 @@ nav: - Contributors: - Contributing: "contributing.md" - Swarms Framework Reference: + - Overview: "swarms/index.md" - swarms.models: - How to Create A Custom Language Model: "swarms/models/custom_model.md" - Deploying Azure OpenAI in Production A Comprehensive Guide: "swarms/models/azure_openai.md" - - Language Models Available: + - Language Models: - BaseLLM: "swarms/models/base_llm.md" - Overview: "swarms/models/index.md" - HuggingFaceLLM: "swarms/models/huggingface.md" - Anthropic: "swarms/models/anthropic.md" - OpenAIChat: "swarms/models/openai.md" - - MultiModal Models Available: + - MultiModal Models : - BaseMultiModalModel: "swarms/models/base_multimodal_model.md" - Fuyu: "swarms/models/fuyu.md" - Vilt: "swarms/models/vilt.md" @@ -144,6 +145,7 @@ nav: - Nougat: "swarms/models/nougat.md" - Dalle3: "swarms/models/dalle3.md" - GPT4VisionAPI: "swarms/models/gpt4v.md" + - GPT4o: "swarms/models/gpt4o.md" - swarms.structs: - Foundational Structures: - Agent: "swarms/structs/agent.md" diff --git a/docs/swarms/index.md b/docs/swarms/index.md index c3d45d86..ac91ecdf 100644 --- a/docs/swarms/index.md +++ b/docs/swarms/index.md @@ -1,14 +1,22 @@ -# Swarms +
    Orchestrate swarms of agents for production-grade applications. -Individual agents face five significant challenges that hinder their deployment in production: short memory, single-task threading, hallucinations, high cost, and lack of collaboration. Multi-agent collaboration offers a solution to all these issues. Swarms provides simple, reliable, and agile tools to create your own Swarm tailored to your specific needs. Currently, Swarms is being used in production by RBC, John Deere, and many AI startups. For more information on the unparalleled benefits of multi-agent collaboration, check out this GitHub repository for research papers or schedule a call with me! +[![GitHub issues](https://img.shields.io/github/issues/kyegomez/swarms)](https://github.com/kyegomez/swarms/issues) [![GitHub forks](https://img.shields.io/github/forks/kyegomez/swarms)](https://github.com/kyegomez/swarms/network) [![GitHub stars](https://img.shields.io/github/stars/kyegomez/swarms)](https://github.com/kyegomez/swarms/stargazers) [![GitHub license](https://img.shields.io/github/license/kyegomez/swarms)](https://github.com/kyegomez/swarms/blob/main/LICENSE)[![GitHub star chart](https://img.shields.io/github/stars/kyegomez/swarms?style=social)](https://star-history.com/#kyegomez/swarms)[![Dependency Status](https://img.shields.io/librariesio/github/kyegomez/swarms)](https://libraries.io/github/kyegomez/swarms) [![Downloads](https://static.pepy.tech/badge/swarms/month)](https://pepy.tech/project/swarms) +[![Join the Agora discord](https://img.shields.io/discord/1110910277110743103?label=Discord&logo=discord&logoColor=white&style=plastic&color=d7b023)![Share on Twitter](https://img.shields.io/twitter/url/https/twitter.com/cloudposse.svg?style=social&label=Share%20%40kyegomez/swarms)](https://twitter.com/intent/tweet?text=Check%20out%20this%20amazing%20AI%20project:%20&url=https%3A%2F%2Fgithub.com%2Fkyegomez%2Fswarms) [![Share on 
Facebook](https://img.shields.io/badge/Share-%20facebook-blue)](https://www.facebook.com/sharer/sharer.php?u=https%3A%2F%2Fgithub.com%2Fkyegomez%2Fswarms) [![Share on LinkedIn](https://img.shields.io/badge/Share-%20linkedin-blue)](https://www.linkedin.com/shareArticle?mini=true&url=https%3A%2F%2Fgithub.com%2Fkyegomez%2Fswarms&title=&summary=&source=) + +[![Share on Reddit](https://img.shields.io/badge/-Share%20on%20Reddit-orange)](https://www.reddit.com/submit?url=https%3A%2F%2Fgithub.com%2Fkyegomez%2Fswarms&title=Swarms%20-%20the%20future%20of%20AI) [![Share on Hacker News](https://img.shields.io/badge/-Share%20on%20Hacker%20News-orange)](https://news.ycombinator.com/submitlink?u=https%3A%2F%2Fgithub.com%2Fkyegomez%2Fswarms&t=Swarms%20-%20the%20future%20of%20AI) [![Share on Pinterest](https://img.shields.io/badge/-Share%20on%20Pinterest-red)](https://pinterest.com/pin/create/button/?url=https%3A%2F%2Fgithub.com%2Fkyegomez%2Fswarms&media=https%3A%2F%2Fexample.com%2Fimage.jpg&description=Swarms%20-%20the%20future%20of%20AI) [![Share on WhatsApp](https://img.shields.io/badge/-Share%20on%20WhatsApp-green)](https://api.whatsapp.com/send?text=Check%20out%20Swarms%20-%20the%20future%20of%20AI%20%23swarms%20%23AI%0A%0Ahttps%3A%2F%2Fgithub.com%2Fkyegomez%2Fswarms) + +
    + + +Individual agents face 5 significant challenges that hinder their deployment in production: short memory, single-task threading, hallucinations, high cost, and lack of collaboration. Multi-agent collaboration offers a solution to all these issues. Swarms provides simple, reliable, and agile tools to create your own Swarm tailored to your specific needs. Currently, Swarms is being used in production by RBC, John Deere, and many AI startups. ---- ## Install -`pip3 install -U swarms` +`$ pip3 install -U swarms` --- @@ -58,119 +66,12 @@ agent.run("Generate a 10,000 word blog on health and wellness.") ``` -### `ToolAgent` -ToolAgent is an agent that can use tools through JSON function calling. It intakes any open source model from huggingface and is extremely modular and plug in and play. We need help adding general support to all models soon. - - -```python -from pydantic import BaseModel, Field -from transformers import AutoModelForCausalLM, AutoTokenizer - -from swarms import ToolAgent -from swarms.utils.json_utils import base_model_to_json - -# Load the pre-trained model and tokenizer -model = AutoModelForCausalLM.from_pretrained( - "databricks/dolly-v2-12b", - load_in_4bit=True, - device_map="auto", -) -tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-12b") - - -# Initialize the schema for the person's information -class Schema(BaseModel): - name: str = Field(..., title="Name of the person") - agent: int = Field(..., title="Age of the person") - is_student: bool = Field( - ..., title="Whether the person is a student" - ) - courses: list[str] = Field( - ..., title="List of courses the person is taking" - ) - - -# Convert the schema to a JSON string -tool_schema = base_model_to_json(Schema) - -# Define the task to generate a person's information -task = ( - "Generate a person's information based on the following schema:" -) - -# Create an instance of the ToolAgent class -agent = ToolAgent( - name="dolly-function-agent", - description="Ana gent 
to create a child data", - model=model, - tokenizer=tokenizer, - json_schema=tool_schema, -) - -# Run the agent to generate the person's information -generated_data = agent.run(task) - -# Print the generated data -print(f"Generated data: {generated_data}") - -``` - - -### `Worker` -The `Worker` is a simple all-in-one agent equipped with an LLM, tools, and RAG for low level tasks. - -✅ Plug in and Play LLM. Utilize any LLM from anywhere and any framework - -✅ Reliable RAG: Utilizes FAISS for efficient RAG but it's modular so you can use any DB. - -✅ Multi-Step Parallel Function Calling: Use any tool - -```python -# Importing necessary modules -import os - -from dotenv import load_dotenv - -from swarms import OpenAIChat, Worker, tool - -# Loading environment variables from .env file -load_dotenv() - -# Retrieving the OpenAI API key from environment variables -api_key = os.getenv("OPENAI_API_KEY") - - -# Create a tool -@tool -def search_api(query: str): - pass - - -# Creating a Worker instance -worker = Worker( - name="My Worker", - role="Worker", - human_in_the_loop=False, - tools=[search_api], - temperature=0.5, - llm=OpenAIChat(openai_api_key=api_key), -) - -# Running the worker with a prompt -out = worker.run("Hello, how are you? Create an image of how your are doing!") - -# Printing the output -print(out) -``` - ------- - - # `Agent` with Long Term Memory `Agent` equipped with quasi-infinite long term memory. Great for long document understanding, analysis, and retrieval. ```python -from swarms import Agent, ChromaDB, OpenAIChat +from swarms import Agent, OpenAIChat +from playground.memory.chromadb_example import ChromaDB # Copy and paste the code and put it in your own local directory. # Making an instance of the ChromaDB class memory = ChromaDB( @@ -208,7 +109,7 @@ print(out) An LLM equipped with long term memory and tools, a full stack agent capable of automating all and any digital tasks given a good prompt. 
```python -from swarms import Agent, ChromaDB, OpenAIChat, tool +from swarms import Agent, ChromaDB, OpenAIChat # Making an instance of the ChromaDB class memory = ChromaDB( @@ -219,7 +120,6 @@ memory = ChromaDB( ) # Initialize a tool -@tool def search_api(query: str): # Add your logic here return query @@ -248,189 +148,6 @@ print(out) ``` - - - - - - - ----- - -### `SequentialWorkflow` -Sequential Workflow enables you to sequentially execute tasks with `Agent` and then pass the output into the next agent and onwards until you have specified your max loops. `SequentialWorkflow` is wonderful for real-world business tasks like sending emails, summarizing documents, and analyzing data. - - -✅ Save and Restore Workflow states! - -✅ Multi-Modal Support for Visual Chaining - -✅ Utilizes Agent class - -```python -import os - -from dotenv import load_dotenv - -from swarms import Agent, OpenAIChat, SequentialWorkflow - -load_dotenv() - -# Load the environment variables -api_key = os.getenv("OPENAI_API_KEY") - - -# Initialize the language agent -llm = OpenAIChat( - temperature=0.5, model_name="gpt-4", openai_api_key=api_key, max_tokens=4000 -) - - -# Initialize the agent with the language agent -agent1 = Agent(llm=llm, max_loops=1) - -# Create another agent for a different task -agent2 = Agent(llm=llm, max_loops=1) - -# Create another agent for a different task -agent3 = Agent(llm=llm, max_loops=1) - -# Create the workflow -workflow = SequentialWorkflow(max_loops=1) - -# Add tasks to the workflow -workflow.add( - agent1, - "Generate a 10,000 word blog on health and wellness.", -) - -# Suppose the next task takes the output of the first task as input -workflow.add( - agent2, - "Summarize the generated blog", -) - -# Run the workflow -workflow.run() - -# Output the results -for task in workflow.tasks: - print(f"Task: {task.description}, Result: {task.result}") -``` - - - -### `ConcurrentWorkflow` -`ConcurrentWorkflow` runs all the tasks all at the same time with the inputs you 
give it! - - -```python -import os - -from dotenv import load_dotenv - -from swarms import Agent, ConcurrentWorkflow, OpenAIChat, Task - -# Load environment variables from .env file -load_dotenv() - -# Load environment variables -llm = OpenAIChat(openai_api_key=os.getenv("OPENAI_API_KEY")) -agent = Agent(llm=llm, max_loops=1) - -# Create a workflow -workflow = ConcurrentWorkflow(max_workers=5) - -# Create tasks -task1 = Task(agent, "What's the weather in miami") -task2 = Task(agent, "What's the weather in new york") -task3 = Task(agent, "What's the weather in london") - -# Add tasks to the workflow -workflow.add(tasks=[task1, task2, task3]) - -# Run the workflow -workflow.run() -``` - -### `RecursiveWorkflow` -`RecursiveWorkflow` will keep executing the tasks until a specific token like is located inside the text! - -```python -import os - -from dotenv import load_dotenv - -from swarms import Agent, OpenAIChat, RecursiveWorkflow, Task - -# Load environment variables from .env file -load_dotenv() - -# Load environment variables -llm = OpenAIChat(openai_api_key=os.getenv("OPENAI_API_KEY")) -agent = Agent(llm=llm, max_loops=1) - -# Create a workflow -workflow = RecursiveWorkflow(stop_token="") - -# Create tasks -task1 = Task(agent, "What's the weather in miami") -task2 = Task(agent, "What's the weather in new york") -task3 = Task(agent, "What's the weather in london") - -# Add tasks to the workflow -workflow.add(task1) -workflow.add(task2) -workflow.add(task3) - -# Run the workflow -workflow.run() -``` - - - -### `ModelParallelizer` -The ModelParallelizer allows you to run multiple models concurrently, comparing their outputs. This feature enables you to easily compare the performance and results of different models, helping you make informed decisions about which model to use for your specific task. - -Plug-and-Play Integration: The structure provides a seamless integration with various models, including OpenAIChat, Anthropic, Mixtral, and Gemini. 
You can easily plug in any of these models and start using them without the need for extensive modifications or setup. - - -```python -import os - -from dotenv import load_dotenv - -from swarms import Anthropic, Gemini, Mixtral, ModelParallelizer, OpenAIChat - -load_dotenv() - -# API Keys -anthropic_api_key = os.getenv("ANTHROPIC_API_KEY") -openai_api_key = os.getenv("OPENAI_API_KEY") -gemini_api_key = os.getenv("GEMINI_API_KEY") - -# Initialize the models -llm = OpenAIChat(openai_api_key=openai_api_key) -anthropic = Anthropic(anthropic_api_key=anthropic_api_key) -mixtral = Mixtral() -gemini = Gemini(gemini_api_key=gemini_api_key) - -# Initialize the parallelizer -llms = [llm, anthropic, mixtral, gemini] -parallelizer = ModelParallelizer(llms) - -# Set the task -task = "Generate a 10,000 word blog on health and wellness." - -# Run the task -out = parallelizer.run(task) - -# Print the responses 1 by 1 -for i in range(len(out)): - print(f"Response from LLM {i}: {out[i]}") -``` - - ### Simple Conversational Agent A Plug in and play conversational agent with `GPT4`, `Mixytral`, or any of our models @@ -463,11 +180,11 @@ agent("Generate a transcript for a youtube video on what swarms are!") ``` ## Devin -Implementation of Devil in less than 90 lines of code with several tools: +Implementation of Devin in less than 90 lines of code with several tools: terminal, browser, and edit files! ```python -from swarms import Agent, Anthropic, tool +from swarms import Agent, Anthropic import subprocess # Model @@ -476,7 +193,6 @@ llm = Anthropic( ) # Tools -@tool def terminal( code: str, ): @@ -494,8 +210,6 @@ def terminal( ).stdout return str(out) - -@tool def browser(query: str): """ Search the query in the browser with the `browser` tool. @@ -512,7 +226,6 @@ def browser(query: str): webbrowser.open(url) return f"Searching for {query} in the browser." -@tool def create_file(file_path: str, content: str): """ Create a file using the file editor tool. 
@@ -528,7 +241,6 @@ def create_file(file_path: str, content: str): file.write(content) return f"File {file_path} created successfully." -@tool def file_editor(file_path: str, mode: str, content: str): """ Edit a file using the file editor tool. @@ -558,85 +270,267 @@ agent = Agent( max_loops="auto", autosave=True, dashboard=False, - streaming_on=True, - verbose=True, - stopping_token="", - interactive=True, - tools=[terminal, browser, file_editor, create_file], - code_interpreter=True, - # streaming=True, + streaming_on=True, + verbose=True, + stopping_token="", + interactive=True, + tools=[terminal, browser, file_editor, create_file], + code_interpreter=True, + # streaming=True, +) + +# Run the agent +out = agent("Create a new file for a plan to take over the world.") +print(out) +``` + + +## `Agent` with Pydantic BaseModel as Output Type +The following is an example of an agent that takes a Pydantic BaseModel as its input schema and returns its output in that same schema: + +```python +from pydantic import BaseModel, Field +from swarms import Anthropic, Agent + + +# Initialize the schema for the person's information +class Schema(BaseModel): + name: str = Field(..., title="Name of the person") + agent: int = Field(..., title="Age of the person") + is_student: bool = Field(..., title="Whether the person is a student") + courses: list[str] = Field( + ..., title="List of courses the person is taking" + ) + + +# Convert the schema to a JSON string +tool_schema = Schema( + name="Tool Name", + agent=1, + is_student=True, + courses=["Course1", "Course2"], +) + +# Define the task to generate a person's information +task = "Generate a person's information based on the following schema:" + +# Initialize the agent +agent = Agent( + agent_name="Person Information Generator", + system_prompt=( + "Generate a person's information based on the following schema:" + ), + # Set the tool schema to the JSON string -- this is the key difference + tool_schema=tool_schema, + llm=Anthropic(), + max_loops=3, +
autosave=True, + dashboard=False, + streaming_on=True, + verbose=True, + interactive=True, + # Set the output type to the tool schema which is a BaseModel + output_type=tool_schema, # or dict, or str + metadata_output_type="json", + # List of schemas that the agent can handle + list_tool_schemas=[tool_schema], + function_calling_format_type="OpenAI", + function_calling_type="json", # or soon yaml +) + +# Run the agent to generate the person's information +generated_data = agent.run(task) + +# Print the generated data +print(f"Generated data: {generated_data}") + + +``` + + +### `ToolAgent` +ToolAgent is an agent that can use tools through JSON function calling. It accepts any open-source model from Hugging Face and is extremely modular and plug-and-play. Help with adding general support for all models is welcome. + + +```python +from pydantic import BaseModel, Field +from transformers import AutoModelForCausalLM, AutoTokenizer + +from swarms import ToolAgent +from swarms.utils.json_utils import base_model_to_json + +# Load the pre-trained model and tokenizer +model = AutoModelForCausalLM.from_pretrained( + "databricks/dolly-v2-12b", + load_in_4bit=True, + device_map="auto", +) +tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-12b") + + +# Initialize the schema for the person's information +class Schema(BaseModel): + name: str = Field(..., title="Name of the person") + agent: int = Field(..., title="Age of the person") + is_student: bool = Field( + ..., title="Whether the person is a student" + ) + courses: list[str] = Field( + ..., title="List of courses the person is taking" + ) + + +# Convert the schema to a JSON string +tool_schema = base_model_to_json(Schema) + +# Define the task to generate a person's information +task = ( + "Generate a person's information based on the following schema:" +) + +# Create an instance of the ToolAgent class +agent = ToolAgent( + name="dolly-function-agent", + description="An agent to create child data", + 
model=model, + tokenizer=tokenizer, + json_schema=tool_schema, +) + +# Run the agent to generate the person's information +generated_data = agent.run(task) + +# Print the generated data +print(f"Generated data: {generated_data}") + +``` + + + + + + + +---- + +### `SequentialWorkflow` +`SequentialWorkflow` enables you to execute tasks sequentially with `Agent`, passing each output on to the next agent until your specified max loops are reached. `SequentialWorkflow` is wonderful for real-world business tasks like sending emails, summarizing documents, and analyzing data. + + +✅ Save and Restore Workflow states! + +✅ Multi-Modal Support for Visual Chaining + +✅ Utilizes Agent class + +```python +from swarms import Agent, SequentialWorkflow, Anthropic + + +# Initialize the language model +llm = Anthropic() + +# Initialize agents for individual tasks +agent1 = Agent( + agent_name="Blog generator", + system_prompt="Generate a blog post like Stephen King", + llm=llm, + max_loops=1, + dashboard=False, + tools=[], +) +agent2 = Agent( + agent_name="summarizer", + system_prompt="Summarize the blog post", + llm=llm, + max_loops=1, + dashboard=False, + tools=[], +) + +# Create the Sequential workflow +workflow = SequentialWorkflow( + agents=[agent1, agent2], max_loops=1, verbose=False +) + +# Run the workflow +workflow.run( + "Generate a blog post on how swarms of agents can help businesses grow." ) -# Run the agent -out = agent("Create a new file for a plan to take over the world.") -print(out) ``` -## `Agent`with Pydantic BaseModel as Output Type -The following is an example of an agent that intakes a pydantic basemodel and outputs it at the same time: + +### `ConcurrentWorkflow` +`ConcurrentWorkflow` runs all the tasks at the same time with the inputs you give it! 
+ ```python -from pydantic import BaseModel, Field -from swarms import Anthropic -from swarms import Agent +import os +from dotenv import load_dotenv -# Initialize the schema for the person's information -class Schema(BaseModel): - name: str = Field(..., title="Name of the person") - agent: int = Field(..., title="Age of the person") - is_student: bool = Field(..., title="Whether the person is a student") - courses: list[str] = Field( - ..., title="List of courses the person is taking" - ) +from swarms import Agent, ConcurrentWorkflow, OpenAIChat, Task +# Load environment variables from .env file +load_dotenv() -# Convert the schema to a JSON string -tool_schema = Schema( - name="Tool Name", - agent=1, - is_student=True, - courses=["Course1", "Course2"], -) +# Load environment variables +llm = OpenAIChat(openai_api_key=os.getenv("OPENAI_API_KEY")) +agent = Agent(llm=llm, max_loops=1) -# Define the task to generate a person's information -task = "Generate a person's information based on the following schema:" +# Create a workflow +workflow = ConcurrentWorkflow(max_workers=5) -# Initialize the agent -agent = Agent( - agent_name="Person Information Generator", - system_prompt=( - "Generate a person's information based on the following schema:" - ), - # Set the tool schema to the JSON string -- this is the key difference - tool_schema=tool_schema, - llm=Anthropic(), - max_loops=3, - autosave=True, - dashboard=False, - streaming_on=True, - verbose=True, - interactive=True, - # Set the output type to the tool schema which is a BaseModel - output_type=tool_schema, # or dict, or str - metadata_output_type="json", - # List of schemas that the agent can handle - list_tool_schemas=[tool_schema], - function_calling_format_type="OpenAI", - function_calling_type="json", # or soon yaml -) +# Create tasks +task1 = Task(agent, "What's the weather in miami") +task2 = Task(agent, "What's the weather in new york") +task3 = Task(agent, "What's the weather in london") -# Run the agent 
to generate the person's information -generated_data = agent.run(task) +# Add tasks to the workflow +workflow.add(tasks=[task1, task2, task3]) -# Print the generated data -print(f"Generated data: {generated_data}") +# Run the workflow +workflow.run() +``` + +### `RecursiveWorkflow` +`RecursiveWorkflow` will keep executing the tasks until a specific stop token is located inside the text! + +```python +import os + +from dotenv import load_dotenv + +from swarms import Agent, OpenAIChat, RecursiveWorkflow, Task + +# Load environment variables from .env file +load_dotenv() + +# Load environment variables +llm = OpenAIChat(openai_api_key=os.getenv("OPENAI_API_KEY")) +agent = Agent(llm=llm, max_loops=1) + +# Create a workflow +workflow = RecursiveWorkflow(stop_token="") + +# Create tasks +task1 = Task(agent, "What's the weather in miami") +task2 = Task(agent, "What's the weather in new york") +task3 = Task(agent, "What's the weather in london") +# Add tasks to the workflow +workflow.add(task1) +workflow.add(task2) +workflow.add(task3) +# Run the workflow +workflow.run() ``` + ### `SwarmNetwork` `SwarmNetwork` provides the infrasturcture for building extremely dense and complex multi-agent applications that span across various types of agents. @@ -763,106 +657,6 @@ print(f"Task result: {task.result}") -### `BlockList` -- Modularity and Flexibility: BlocksList allows users to create custom swarms by adding or removing different classes or functions as blocks. This means users can easily tailor the functionality of their swarm to suit their specific needs. - -- Ease of Management: With methods to add, remove, update, and retrieve blocks, BlocksList provides a straightforward way to manage the components of a swarm. This makes it easier to maintain and update the swarm over time. - -- Enhanced Searchability: BlocksList offers methods to get blocks by various attributes such as name, type, ID, and parent-related properties. 
This makes it easier for users to find and work with specific blocks in a large and complex swarm. - -```python -import os - -from dotenv import load_dotenv -from transformers import AutoModelForCausalLM, AutoTokenizer -from pydantic import BaseModel -from swarms import BlocksList, Gemini, GPT4VisionAPI, Mixtral, OpenAI, ToolAgent - -# Load the environment variables -load_dotenv() - -# Get the environment variables -openai_api_key = os.getenv("OPENAI_API_KEY") -gemini_api_key = os.getenv("GEMINI_API_KEY") - -# Tool Agent -model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-12b") -tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-12b") - -# Initialize the schema for the person's information -class Schema(BaseModel): - name: str = Field(..., title="Name of the person") - agent: int = Field(..., title="Age of the person") - is_student: bool = Field( - ..., title="Whether the person is a student" - ) - courses: list[str] = Field( - ..., title="List of courses the person is taking" - ) - -# Convert the schema to a JSON string -json_schema = base_model_to_json(Schema) - - -toolagent = ToolAgent(model=model, tokenizer=tokenizer, json_schema=json_schema) - -# Blocks List which enables you to build custom swarms by adding classes or functions -swarm = BlocksList( - "SocialMediaSwarm", - "A swarm of social media agents", - [ - OpenAI(openai_api_key=openai_api_key), - Mixtral(), - GPT4VisionAPI(openai_api_key=openai_api_key), - Gemini(gemini_api_key=gemini_api_key), - ], -) - - -# Add the new block to the swarm -swarm.add(toolagent) - -# Remove a block from the swarm -swarm.remove(toolagent) - -# Update a block in the swarm -swarm.update(toolagent) - -# Get a block at a specific index -block_at_index = swarm.get(0) - -# Get all blocks in the swarm -all_blocks = swarm.get_all() - -# Get blocks by name -openai_blocks = swarm.get_by_name("OpenAI") - -# Get blocks by type -gpt4_blocks = swarm.get_by_type("GPT4VisionAPI") - -# Get blocks by ID 
-block_by_id = swarm.get_by_id(toolagent.id) - -# Get blocks by parent -blocks_by_parent = swarm.get_by_parent(swarm) - -# Get blocks by parent ID -blocks_by_parent_id = swarm.get_by_parent_id(swarm.id) - -# Get blocks by parent name -blocks_by_parent_name = swarm.get_by_parent_name(swarm.name) - -# Get blocks by parent type -blocks_by_parent_type = swarm.get_by_parent_type(type(swarm).__name__) - -# Get blocks by parent description -blocks_by_parent_description = swarm.get_by_parent_description(swarm.description) - -# Run the block in the swarm -inference = swarm.run_block(toolagent, "Hello World") -print(inference) -``` - ## Majority Voting Multiple-agents will evaluate an idea based off of an parsing or evaluation function. From papers like "[More agents is all you need](https://arxiv.org/pdf/2402.05120.pdf) @@ -1177,88 +971,95 @@ autoswarm.run("Analyze these financial data and give me a summary") ``` ## `AgentRearrange` -Inspired by Einops and einsum, this orchestration techniques enables you to map out the relationships between various agents. For example you specify linear and sequential relationships like `a -> a1 -> a2 -> a3` or concurrent relationships where the first agent will send a message to 3 agents all at once: `a -> a1, a2, a3`. You can customize your workflow to mix sequential and concurrent relationships +Inspired by Einops and einsum, this orchestration technique enables you to map out the relationships between various agents. For example, you specify linear and sequential relationships like `a -> a1 -> a2 -> a3` or concurrent relationships where the first agent will send a message to 3 agents all at once: `a -> a1, a2, a3`. You can customize your workflow to mix sequential and concurrent relationships. 
[Docs Available:](https://swarms.apac.ai/en/latest/swarms/structs/agent_rearrange/) ```python -from swarms import Agent, Anthropic, AgentRearrange, +from swarms import Agent, AgentRearrange, rearrange, Anthropic -## Initialize the workflow -agent = Agent( - agent_name="t", - agent_description=( - "Generate a transcript for a youtube video on what swarms" - " are!" - ), - system_prompt=( - "Generate a transcript for a youtube video on what swarms" - " are!" - ), + +# Initialize the director agent + +director = Agent( + agent_name="Director", + system_prompt="Directs the tasks for the workers", llm=Anthropic(), max_loops=1, - autosave=True, dashboard=False, streaming_on=True, verbose=True, stopping_token="", + state_save_file_type="json", + saved_state_path="director.json", ) -agent2 = Agent( - agent_name="t1", - agent_description=( - "Generate a transcript for a youtube video on what swarms" - " are!" - ), + +# Initialize worker 1 + +worker1 = Agent( + agent_name="Worker1", + system_prompt="Generates a transcript for a youtube video on what swarms are", llm=Anthropic(), max_loops=1, - system_prompt="Summarize the transcript", - autosave=True, dashboard=False, streaming_on=True, verbose=True, stopping_token="", + state_save_file_type="json", + saved_state_path="worker1.json", ) -agent3 = Agent( - agent_name="t2", - agent_description=( - "Generate a transcript for a youtube video on what swarms" - " are!" 
- ), + +# Initialize worker 2 +worker2 = Agent( + agent_name="Worker2", + system_prompt="Summarizes the transcript generated by Worker1", llm=Anthropic(), max_loops=1, - system_prompt="Finalize the transcript", - autosave=True, dashboard=False, streaming_on=True, verbose=True, stopping_token="", + state_save_file_type="json", + saved_state_path="worker2.json", ) -# Rearrange the agents -rearrange = AgentRearrange( - agents=[agent, agent2, agent3], - verbose=True, - # custom_prompt="Summarize the transcript", +# Create a list of agents +agents = [director, worker1, worker2] + +# Define the flow pattern +flow = "Director -> Worker1 -> Worker2" + +# Using AgentRearrange class +agent_system = AgentRearrange(agents=agents, flow=flow) +output = agent_system.run( + "Create a format to express and communicate swarms of llms in a structured manner for youtube" ) +print(output) -# Run the workflow on a task -results = rearrange( - # pattern="t -> t1, t2 -> t2", - pattern="t -> t1 -> t2", - default_task=( - "Generate a transcript for a YouTube video on what swarms" - " are!" - ), - t="Generate a transcript for a YouTube video on what swarms are!", - # t2="Summarize the transcript", - # t3="Finalize the transcript", + +# Using rearrange function +output = rearrange( + agents, + flow, + "Create a format to express and communicate swarms of llms in a structured manner for youtube", ) -# print(results) +print(output) ``` +## `HierarchicalSwarm` +Coming soon... + + +## `AgentLoadBalancer` +Coming soon... + + +## `GraphSwarm` +Coming soon... + --- @@ -1267,6 +1068,26 @@ Documentation is located here at: [swarms.apac.ai](https://swarms.apac.ai) ---- +## Folder Structure +The swarms package has been meticulously crafted for extreme usability and understanding; it is split into various modules such as `swarms.agents`, which holds pre-built agents, and `swarms.structs`, which holds a vast array of structures like `Agent` and multi-agent structures. 
The three most important are `structs`, `models`, and `agents`. + +```sh +├── __init__.py +├── agents +├── artifacts +├── memory +├── schemas +├── models +├── prompts +├── structs +├── telemetry +├── tools +├── utils +└── workers +``` + +---- + ## 🫶 Contributions: The easiest way to contribute is to pick any issue with the `good first issue` tag 💪. Read the Contributing guidelines [here](/CONTRIBUTING.md). Bug Report? [File here](https://github.com/swarms/gateway/issues) | Feature Request? [File here](https://github.com/swarms/gateway/issues) @@ -1297,68 +1118,33 @@ Join our growing community around the world, for real-time support, ideas, and d Book a discovery call to learn how Swarms can lower your operating costs by 40% with swarms of autonomous agents in lightspeed. [Click here to book a time that works for you!](https://calendly.com/swarm-corp/30min?month=2023-11) - ## Accelerate Backlog -Help us accelerate our backlog by supporting us financially! Note, we're an open source corporation and so all the revenue we generate is through donations at the moment ;) +Help accelerate bug fixes, features, and demos by supporting us here: -## File Structure -The swarms package has been meticlously crafted for extreme use-ability and understanding, the swarms package is split up into various modules such as `swarms.agents` that holds pre-built agents, `swarms.structs` that holds a vast array of structures like `Agent` and multi agent structures. The 3 most important are `structs`, `models`, and `agents`. - -```sh -├── __init__.py -├── agents -├── artifacts -├── chunkers -├── cli -├── loaders -├── memory -├── models -├── prompts -├── structs -├── telemetry -├── tokenizers -├── tools -├── utils -└── workers -``` - ## Docker Instructions - -This application uses Docker with CUDA support. To build and run the Docker container, follow these steps: - -### Prerequisites - -- Make sure you have [Docker installed](https://docs.docker.com/get-docker/) on your machine.
-- Ensure your machine has an NVIDIA GPU and [NVIDIA Docker support](https://github.com/NVIDIA/nvidia-docker) installed. - -### Building the Docker Image - -To build the Docker image, navigate to the root directory containing the `Dockerfile` and run the following command: - -```bash -docker build --gpus all -t swarms -``` -### Running the Docker Container -To run the Docker container, use the following command: - -`docker run --gpus all -p 4000:80 swarms` - -Replace swarms with the name of your Docker image, and replace 4000:80 with your actual port mapping. The format is hostPort:containerPort. - -Now, your application should be running with CUDA support! +- [Learn More Here About Deployments In Docker](https://swarms.apac.ai/en/latest/docker_setup/) ## Swarm Newsletter 🤖 🤖 🤖 📧 Sign up to the Swarm newsletter to receive updates on the latest Autonomous agent research papers, step by step guides on creating multi-agent app, and much more Swarmie goodiness 😊 - [CLICK HERE TO SIGNUP](https://docs.google.com/forms/d/e/1FAIpQLSfqxI2ktPR9jkcIwzvHL0VY6tEIuVPd-P2fOWKnd6skT9j1EQ/viewform?usp=sf_link) # License Apache License - - +# Citation +Please cite Swarms in your paper or your project if you found it beneficial in any way! Appreciate you. + +```bibtex +@misc{swarms, + author = {Gomez, Kye}, + title = {{Swarms: The Multi-Agent Collaboration Framework}}, + howpublished = {\url{https://github.com/kyegomez/swarms}}, + year = {2023}, + note = {Accessed: Date} +} +``` diff --git a/docs/swarms/models/gpt4o.md b/docs/swarms/models/gpt4o.md new file mode 100644 index 00000000..7b53a742 --- /dev/null +++ b/docs/swarms/models/gpt4o.md @@ -0,0 +1,150 @@ +# Documentation for GPT4o Module + +## Overview and Introduction + +The `GPT4o` module is a multi-modal conversational model based on OpenAI's GPT-4 architecture. 
It extends the functionality of the `BaseMultiModalModel` class, enabling it to handle both text and image inputs for generating diverse and contextually rich responses. This module leverages the power of the GPT-4 model to enhance interactions by integrating visual information with textual prompts, making it highly relevant for applications requiring multi-modal understanding and response generation. + +### Key Concepts +- **Multi-Modal Model**: A model that can process and generate responses based on multiple types of inputs, such as text and images. +- **System Prompt**: A predefined prompt to guide the conversation flow. +- **Temperature**: A parameter that controls the randomness of the response generation. +- **Max Tokens**: The maximum number of tokens (words or word pieces) in the generated response. + +## Class Definition + +### `GPT4o` Class + + +### Parameters + +| Parameter | Type | Description | +|-----------------|--------|--------------------------------------------------------------------------------------| +| `system_prompt` | `str` | The system prompt to be used in the conversation. | +| `temperature` | `float`| The temperature parameter for generating diverse responses. Default is `0.1`. | +| `max_tokens` | `int` | The maximum number of tokens in the generated response. Default is `300`. | +| `openai_api_key`| `str` | The API key for accessing the OpenAI GPT-4 API. | +| `*args` | | Additional positional arguments. | +| `**kwargs` | | Additional keyword arguments. | + +## Functionality and Usage + +### `encode_image` Function + +The `encode_image` function is used to encode an image file into a base64 string format, which can then be included in the request to the GPT-4 API. + +#### Parameters + +| Parameter | Type | Description | +|---------------|--------|----------------------------------------------| +| `image_path` | `str` | The local path to the image file to be encoded. 
| + +#### Returns + +| Return Type | Description | +|-------------|---------------------------------| +| `str` | The base64 encoded string of the image. | + +### `GPT4o.__init__` Method + +The constructor for the `GPT4o` class initializes the model with the specified parameters and sets up the OpenAI client. + +### `GPT4o.run` Method + +The `run` method executes the GPT-4o model to generate a response based on the provided task and optional image. + +#### Parameters + +| Parameter | Type | Description | +|---------------|--------|----------------------------------------------------| +| `task` | `str` | The task or user prompt for the conversation. | +| `local_img` | `str` | The local path to the image file. | +| `img` | `str` | The URL of the image. | +| `*args` | | Additional positional arguments. | +| `**kwargs` | | Additional keyword arguments. | + +#### Returns + +| Return Type | Description | +|-------------|--------------------------------------------------| +| `str` | The generated response from the GPT-4o model. | + +## Usage Examples + +### Example 1: Basic Text Prompt + +```python +from swarms import GPT4o + +# Initialize the model +model = GPT4o( + system_prompt="You are a helpful assistant.", + temperature=0.7, + max_tokens=150, + openai_api_key="your_openai_api_key" +) + +# Define the task +task = "What is the capital of France?" + +# Generate response +response = model.run(task) +print(response) +``` + +### Example 2: Text Prompt with Local Image + +```python +from swarms import GPT4o + +# Initialize the model +model = GPT4o( + system_prompt="Describe the image content.", + temperature=0.5, + max_tokens=200, + openai_api_key="your_openai_api_key" +) + +# Define the task and image path +task = "Describe the content of this image." 
+local_img = "path/to/your/image.jpg" + +# Generate response +response = model.run(task, local_img=local_img) +print(response) +``` + +### Example 3: Text Prompt with Image URL + +```python +from swarms import GPT4o + +# Initialize the model +model = GPT4o( + system_prompt="You are a visual assistant.", + temperature=0.6, + max_tokens=250, + openai_api_key="your_openai_api_key" +) + +# Define the task and image URL +task = "What can you tell about the scenery in this image?" +img_url = "http://example.com/image.jpg" + +# Generate response +response = model.run(task, img=img_url) +print(response) +``` + +## Additional Information and Tips + +- **API Key Management**: Ensure that your OpenAI API key is securely stored and managed. Do not hard-code it in your scripts. Use environment variables or secure storage solutions. +- **Image Encoding**: The `encode_image` function is crucial for converting images to a base64 format suitable for API requests. Ensure that the images are accessible and properly formatted. +- **Temperature Parameter**: Adjust the `temperature` parameter to control the creativity of the model's responses. Lower values make the output more deterministic, while higher values increase randomness. +- **Token Limit**: Be mindful of the `max_tokens` parameter to avoid exceeding the API's token limits. This parameter controls the length of the generated responses. + +## References and Resources + +- [OpenAI API Documentation](https://beta.openai.com/docs/) +- [Python Base64 Encoding](https://docs.python.org/3/library/base64.html) +- [dotenv Documentation](https://saurabh-kumar.com/python-dotenv/) +- [BaseMultiModalModel Documentation](https://swarms.apac.ai) \ No newline at end of file
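+
+### Appendix: Sketch of Base64 Image Encoding
+
+As a rough illustration of what an `encode_image`-style helper does (a minimal sketch using only the Python standard library; the actual implementation in the swarms package may differ), encoding a local image file to a base64 string looks like:
+
+```python
+import base64
+
+
+def encode_image(image_path: str) -> str:
+    """Read a local image file and return it as a base64-encoded string."""
+    with open(image_path, "rb") as image_file:
+        # b64encode returns bytes; decode to get a plain str for the API payload
+        return base64.b64encode(image_file.read()).decode("utf-8")
+```
+
+The resulting string is typically embedded in a request as a data URL, e.g. `f"data:image/jpeg;base64,{encoded}"`.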
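+
+### Appendix: Intuition for the `temperature` Parameter
+
+To build intuition for the `temperature` parameter described above (illustrative only — the actual sampling happens server-side in the OpenAI API), here is how temperature reshapes a probability distribution over candidate tokens:
+
+```python
+import math
+
+
+def softmax_with_temperature(logits, temperature):
+    """Convert raw scores to probabilities; lower temperature sharpens the distribution."""
+    scaled = [score / temperature for score in logits]
+    peak = max(scaled)  # subtract the max for numerical stability
+    exps = [math.exp(s - peak) for s in scaled]
+    total = sum(exps)
+    return [e / total for e in exps]
+
+
+logits = [2.0, 1.0, 0.5]
+print(softmax_with_temperature(logits, 0.1))  # low temperature: the top score dominates
+print(softmax_with_temperature(logits, 1.0))  # higher temperature: mass spreads out
+```
+
+Lower temperatures concentrate probability on the highest-scoring token, which is why the docs describe low values as more deterministic.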