From 403240fecc186e5f4306a2895ac3200622de2771 Mon Sep 17 00:00:00 2001
From: Kye
Date: Wed, 11 Oct 2023 14:26:34 -0400
Subject: [PATCH] worker guide

Former-commit-id: 917ad28b63f9c5efab2f4642c3d595802f8dcf76
---
 docs/examples/omni_agent.md          | 174 ++++++++++++++++-----------
 docs/examples/worker.md              |  98 +++++++++++++++
 mkdocs.yml                           |   1 +
 swarms/structs/nonlinear_workflow.py |  10 ++
 swarms/structs/workflow.py           |   1 -
 5 files changed, 213 insertions(+), 71 deletions(-)
 create mode 100644 docs/examples/worker.md

diff --git a/docs/examples/omni_agent.md b/docs/examples/omni_agent.md
index de5985e2..f3a9b3cf 100644
--- a/docs/examples/omni_agent.md
+++ b/docs/examples/omni_agent.md
@@ -1,71 +1,105 @@
-# A Comprehensive Guide to Setting Up OmniWorker: Your Passport to Multimodal Tasks**
-
-**Introduction**
-- Introduction to OmniWorker
-- Explanation of its use-cases and importance in multimodal tasks
-- Mention of prerequisites: Git, Python 3.x, Terminal or Command Prompt access
-
-**Chapter 1: Cloning the Necessary Repository**
-- Explanation of Git and its use in version control
-- Step-by-step guide on how to clone the OmniWorker repository
-  ```bash
-  !git clone https://github.com/kyegomez/swarms
-  ```
-
-**Chapter 2: Navigating to the Cloned Directory**
-- Explanation of directory navigation in the terminal
-  ```bash
-  %cd /swarms
-  ```
-
-**Chapter 3: Installing the Required Dependencies**
-- Explanation of Python dependencies and the purpose of `requirements.txt` file
-- Step-by-step installation of dependencies
-  ```bash
-  !pip install -r requirements.txt
-  ```
-
-**Chapter 4: Installing Additional Dependencies**
-- Discussion on the additional dependencies and their roles in OmniWorker
-  ```bash
-  !pip install git+https://github.com/IDEA-Research/GroundingDINO.git
-  !pip install git+https://github.com/facebookresearch/segment-anything.git
-  !pip install faiss-gpu
-  !pip install langchain-experimental
-  ```
-
-**Chapter 5: Setting Up Your OpenAI API Key**
-- Explanation of OpenAI API and its key
-- Guide on how to obtain and set up the OpenAI API key
-  ```bash
-  !export OPENAI_API_KEY="your-api-key"
-  ```
-
-**Chapter 6: Running the OmniModal Agent Script**
-- Discussion on the OmniModal Agent script and its functionality
-- Guide on how to run the script
-  ```bash
-  !python3 omnimodal_agent.py
-  ```
-
-**Chapter 7: Importing the Necessary Modules**
-- Discussion on Python modules and their importance
-- Step-by-step guide on importing necessary modules for OmniWorker
-  ```python
-  from langchain.llms import OpenAIChat
-  from swarms.agents import OmniModalAgent
-  ```
-
-**Chapter 8: Creating and Running OmniModalAgent Instance**
-- Explanation of OmniModalAgent instance and its role
-- Guide on how to create and run OmniModalAgent instance
-  ```python
-  llm = OpenAIChat()
-  agent = OmniModalAgent(llm)
-  agent.run("Create a video of a swarm of fish")
-  ```
-
-**Conclusion**
-- Recap of the steps taken to set up OmniWorker
-- Encouragement to explore more functionalities and apply OmniWorker to various multimodal tasks
+# **OmniModalAgent from Swarms: A Comprehensive Starting Guide**
 
+---
+
+**Table of Contents**
+
+1. Introduction: The OmniModal Magic
+2. The Mechanics: Unraveling the Underpinnings
+3. The Installation Adventure: Setting the Stage
+4. Practical Examples: Let’s Get Our Hands Dirty!
+5. Error Handling: Because Bumps on the Road are Inevitable
+6. Dive Deeper: Advanced Features and Usage
+7. Wrapping Up: The Road Ahead
+
+---
+
+**1. Introduction: The OmniModal Magic**
+
+Imagine a world where you could communicate seamlessly across any modality, be it text, image, speech, or even video. Now, stop imagining because OmniModalAgent is here to turn that dream into reality. By leveraging advanced architecture and state-of-the-art tools, it can understand and generate any modality you can think of!
+
+---
+
+**2. The Mechanics: Unraveling the Underpinnings**
+
+Dive into the world of OmniModalAgent and let’s decipher how it works:
+
+- **LLM (Language Model)**: It’s the brain behind understanding and generating language-based interactions.
+- **Chat Planner**: Think of it as the strategist. It lays out the plan for the user's input.
+- **Task Executor**: The doer. Once the plan is ready, this component takes charge to execute tasks.
+- **Tools**: A treasure chest full of tools, from image captioning to translation.
+
+---
+
+**3. The Installation Adventure: Setting the Stage**
+
+Getting OmniModalAgent up and running is as easy as pie. Ready to bake?
+
+```bash
+pip install swarms
+```
+
+And voilà, your oven (system) is now equipped to bake any modality cake you desire!
+
+---
+
+**4. Practical Examples: Let’s Get Our Hands Dirty!**
+
+Let’s embark on an exciting journey with OmniModalAgent:
+
+**i. Basic Interaction**:
+
+```python
+from swarms import OmniModalAgent, OpenAIChat
+
+llm = OpenAIChat()
+agent = OmniModalAgent(llm)
+response = agent.run("Hello, how are you? Create an image of how you are doing!")
+print(response)
+```
+
+**ii. Dive into a Conversation**:
+
+```python
+agent = OmniModalAgent(llm)
+print(agent.chat("What's the weather like?"))
+```
+
+---
+
+**5. Error Handling: Because Bumps on the Road are Inevitable**
+
+Errors are like rain, unpredictable but inevitable. Luckily, OmniModalAgent comes with an umbrella. If there's a hiccup during message processing, it’s gracious enough to let you know.
+
+For instance, if there's a bump, you’ll receive:
+
+```python
+Error processing message: [Details of the error]
+```
+
+---
+
+**6. Dive Deeper: Advanced Features and Usage**
+
+The power of OmniModalAgent isn’t just limited to basic interactions. Here’s a sneak peek into its advanced capabilities:
+
+**Streaming Responses**:
+
+Imagine receiving responses as a gentle stream rather than a sudden splash. With the `_stream_response` method, you can achieve just that.
+
+```python
+for token in agent._stream_response(response):
+    print(token)
+```
+
+**The Treasure Chest: Tools**:
+
+OmniModalAgent boasts a plethora of tools, from image captioning to speech-to-text. When you initialize the agent, it equips itself with these tools, ready to tackle any challenge you throw its way.
+
+---
+
+**7. Wrapping Up: The Road Ahead**
+
+You've just scratched the surface of what OmniModalAgent can do. As you explore deeper, you'll discover more of its magic. The world of multi-modality is vast, and with OmniModalAgent as your companion, there's no limit to where you can go.
+
+**Happy Exploring and Coding!** 🚀🎉
diff --git a/docs/examples/worker.md b/docs/examples/worker.md
new file mode 100644
index 00000000..bcfaacdc
--- /dev/null
+++ b/docs/examples/worker.md
@@ -0,0 +1,98 @@
+# **The Ultimate Guide to Mastering the `Worker` Class from Swarms**
+
+---
+
+**Table of Contents**
+
+1. Introduction: Welcome to the World of the Worker
+2. The Basics: What Does the Worker Do?
+3. Installation: Setting the Stage
+4. Dive Deep: Understanding the Architecture
+5. Practical Usage: Let's Get Rolling!
+6. Advanced Tips and Tricks
+7. Handling Errors: Because We All Slip Up Sometimes
+8. Beyond the Basics: Advanced Features and Customization
+9. Conclusion: Taking Your Knowledge Forward
+
+---
+
+**1. Introduction: Welcome to the World of the Worker**
+
+Greetings, future master of the `Worker`! Step into a universe where you can command an AI worker to perform intricate tasks, be it searching the vast expanse of the internet or crafting multi-modality masterpieces. Ready to embark on this thrilling journey? Let’s go!
+
+---
+
+**2. The Basics: What Does the Worker Do?**
+
+The `Worker` is your personal AI assistant. Think of it as a diligent bee in a swarm, ready to handle complex tasks across various modalities, from text and images to audio and beyond.
+
+---
+
+**3. Installation: Setting the Stage**
+
+Before we can call upon our Worker, we need to set the stage:
+
+```bash
+pip install swarms
+```
+
+Voila! You’re now ready to summon your Worker.
+
+---
+
+**4. Dive Deep: Understanding the Architecture**
+
+- **Language Model (LLM)**: The brain of our Worker. It understands and crafts intricate language-based responses.
+- **Tools**: Think of these as the Worker's toolkit. They range from file tools, website querying, to even complex tasks like image captioning.
+- **Memory**: No, our Worker doesn’t forget. It employs a sophisticated memory mechanism to remember past interactions and learn from them.
+
+---
+
+**5. Practical Usage: Let's Get Rolling!**
+
+Here’s a simple way to invoke the Worker and give it a task:
+
+```python
+from swarms import Worker
+
+node = Worker(ai_name="Optimus Prime")
+task = "What were the winning boston marathon times for the past 5 years (ending in 2022)? Generate a table of the year, name, country of origin, and times."
+response = node.run(task)
+print(response)
+```
+
+The result? An agent with elegantly integrated tools and long term memories
+
+---
+
+**6. Advanced Tips and Tricks**
+
+- **Streaming Responses**: Want your Worker to respond in a more dynamic fashion? Use the `_stream_response` method to get results token by token.
+- **Human-in-the-Loop**: By setting `human_in_the_loop` to `True`, you can involve a human in the decision-making process, ensuring the best results.
+
+---
+
+**7. Handling Errors: Because We All Slip Up Sometimes**
+
+Your Worker is designed to be robust. But if it ever encounters a hiccup, it's equipped to let you know. Error messages are crafted to be informative, guiding you on the next steps.
+
+---
+
+**8. Beyond the Basics: Advanced Features and Customization**
+
+- **Custom Tools**: Want to expand the Worker's toolkit? Use the `external_tools` parameter to integrate your custom tools.
+- **Memory Customization**: You can tweak the Worker's memory settings, ensuring it remembers what's crucial for your tasks.
+
+---
+
+**9. Conclusion: Taking Your Knowledge Forward**
+
+Congratulations! You’re now well-equipped to harness the power of the `Worker` from Swarms. As you venture further, remember: the possibilities are endless, and with the Worker by your side, there’s no task too big!
+
+**Happy Coding and Exploring!** 🚀🎉
+
+---
+
+*Note*: This guide provides a stepping stone to the vast capabilities of the `Worker`. Dive into the official documentation for a deeper understanding and stay updated with the latest features.
+
+---
\ No newline at end of file
diff --git a/mkdocs.yml b/mkdocs.yml
index 8ab159a3..c27f86a1 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -97,6 +97,7 @@ nav:
     - Overview: "examples/index.md"
     - Agents:
       - OmniAgent: "examples/omni_agent.md"
+      - Worker: "examples/worker.md"
   - Applications:
     - CustomerSupport:
       - Overview: "applications/customer_support.md"
diff --git a/swarms/structs/nonlinear_workflow.py b/swarms/structs/nonlinear_workflow.py
index 33c9d5d3..c0cea1fc 100644
--- a/swarms/structs/nonlinear_workflow.py
+++ b/swarms/structs/nonlinear_workflow.py
@@ -4,6 +4,9 @@ from typing import Dict, List
 
 
 class Task:
+    """
+    Task is a unit of work that can be executed by an agent
+    """
     def __init__(
         self, id: str, parents: List["Task"] = None, children: List["Task"] = None
     ):
@@ -12,9 +15,16 @@ class Task:
         self.children = children
 
     def can_execute(self):
+        """
+        can_execute returns True if the task can be executed
+        """
         raise NotImplementedError
 
     def execute(self):
+        """
+        Execute the task
+
+        """
         raise NotImplementedError
 
 
diff --git a/swarms/structs/workflow.py b/swarms/structs/workflow.py
index f6753748..8aa3751e 100644
--- a/swarms/structs/workflow.py
+++ b/swarms/structs/workflow.py
@@ -22,7 +22,6 @@ class Workflow:
 
         workflow.run()
 
-
     """
 
     class Task: