Merge pull request #2 from kyegomez/master

Catching up 20240218
1 year ago · 41928d8f6b
parent abe3f65b40 c17b1cf54b
commit 41928d8f6b
11 changed files with 123 additions and 71 deletions
--- a/docs/swarms/structs/majorityvoting.md
+++ b/docs/swarms/structs/majorityvoting.md
@@ -1,6 +1,4 @@
-Due to the limitations of this platform and the scope of your request, I am unable to create a full 10,000-word documentation here. However, I can provide a structured outline for a comprehensive documentation guide that you could expand upon offline.
-
-# swarms.structs Documentation
+# `MajorityVoting` Documentation

 ## Overview

@@ -43,15 +41,6 @@ The `MajorityVoting` class is a high-level abstraction used to coordinate a grou

 ### Class Definition

-```python
-class MajorityVoting:
-    def __init__(self, agents, concurrent=False, multithreaded=False, multiprocess=False, asynchronous=False, output_parser=None, autosave=False, verbose=False, *args, **kwargs):
-        pass
-
-    def run(self, task, *args, **kwargs):
-        pass
-```
-
 ### Parameters

 | Parameter       | Type       | Default  | Description                                                          |
@@ -118,21 +107,3 @@ result = majority_voting.run("What is the square root of 16?")
 print(result)  # Output: 4
 ```

-## Advanced Features
-
-Detailed instructions on how to use multithreading, multiprocessing, asynchronous execution, and how to parse the output with custom functions would be included in this section.
-
-## Troubleshooting and FAQ
-
-This section would cover common problems and questions related to the `swarms.structs` library.
-
-## Conclusion
-
-A summary of the `swarms.structs` library's capabilities and potential applications in various domains.
-
-## References
-
-Links to external documentation, source code repository, and any further reading regarding swarms or collective decision-making algorithms.
-
----
-**Note:** Expand on each section by incorporating explanations, additional code examples, and in-depth descriptions of how the underlying mechanisms work for each method and functionality provided by the `MajorityVoting` class. Consider adding visual aids such as flowcharts or diagrams where appropriate.
--- a/docs/swarms/structs/taskqueuebase.md
+++ b/docs/swarms/structs/taskqueuebase.md
@@ -1,8 +1,5 @@
-Due to the limitations of the platform, it's not possible to create documentation as long and detailed as 10,000 words within a single response. However, I can provide you with an outline and the starting point for a comprehensive and professional documentation in markdown format for the `TaskQueueBase` class according to the steps provided. 

-Here is the template you can follow to expand upon:
-
-# swarms.structs Documentation
+# `TaskQueueBase`

 ## Introduction
 The `swarms.structs` library is a key component of a multi-agent system's task management infrastructure. It provides the necessary classes and methods to create and manage queues of tasks that can be distributed among a swarm of agents. The purpose of this documentation is to guide users through the proper use of the `TaskQueueBase` class, which serves as an abstract base class for implementing task queues.
@@ -128,8 +125,3 @@ This section would provide insights on thread safety, error handling, and best p

 Links to further resources and any academic papers or external documentation related to task queues and multi-agent systems would be included here.

----
-
-Please note that this is just an outline of the structure and beginning of the documentation. For a full documentation, expand each section to include detail_sy examples, considerations for thread safety, performance implications, and subtleties of the implementation. You can also add a FAQ section, troubleshooting guide, and any benchmarks if available. 
-
-Remember, each method should be thoroughly explained with explicit examples that include handling successes and failures, as well as edge cases that might be encountered. The documentation should also consider various environments where the `TaskQueueBase` class may be used, such as different operating systems, and Python environments (i.e. CPython vs. PyPy).
--- a/docs/swarms/utils/pdf_to_text.md
+++ b/docs/swarms/utils/pdf_to_text.md
@@ -1,7 +1,7 @@
 # pdf_to_text

 ## Introduction
-The function `pdf_to_text` is a Python utility for converting a PDF file into a string of text content. It leverages the `PyPDF2` library, an excellent Python library for processing PDF files. The function takes in a PDF file's path and reads its content, subsequently returning the extracted textual data. 
+The function `pdf_to_text` is a Python utility for converting a PDF file into a string of text content. It leverages the `pypdf` library, an excellent Python library for processing PDF files. The function takes in a PDF file's path and reads its content, subsequently returning the extracted textual data.

 This function can be very useful when you want to extract textual information from PDF files automatically. For instance, when processing a large number of documents, performing textual analysis, or when you're dealing with text data that is only available in PDF format.

@@ -34,14 +34,14 @@ def pdf_to_text(pdf_path: str) -> str:

 ## Function Description 

-`pdf_to_text` utilises the `PdfReader` function from the `PyPDF2` library to read the PDF file. If the PDF file does not exist at the specified path or there was an error while reading the file, appropriate exceptions will be raised. It then iterates through each page in the PDF and uses the `extract_text` function to extract the text content from each page. These contents are then concatenated into a single variable and returned as the result.
+`pdf_to_text` utilises the `PdfReader` function from the `pypdf` library to read the PDF file. If the PDF file does not exist at the specified path or there was an error while reading the file, appropriate exceptions will be raised. It then iterates through each page in the PDF and uses the `extract_text` function to extract the text content from each page. These contents are then concatenated into a single variable and returned as the result.

 ## Usage Examples

-To use this function, you first need to install the `PyPDF2` library. It can be installed via pip:
+To use this function, you first need to install the `pypdf` library. It can be installed via pip:

 ```python
-!pip install pypdf2
+!pip install pypdf
 ```

 Then, you should import the `pdf_to_text` function:
@@ -68,4 +68,4 @@ print(text)
 - This function reads the text from the PDF. It does not handle images, graphical elements, or any non-text content.
 - If the PDF contains scanned images rather than textual data, the `extract_text` function may not be able to extract any text. In such cases, you would require OCR (Optical Character Recognition) tools to extract the text. 
 - Be aware of the possibility that the output string might contain special characters or escape sequences because they were part of the PDF's content. You might need to clean the resulting text according to your requirements.
-- The function uses the PyPDF2 library to facilitate the PDF reading and text extraction. For any issues related to PDF manipulation, consult the [PyPDF2 library documentation](https://pythonhosted.org/PyPDF2/).
+- The function uses the pypdf library to facilitate the PDF reading and text extraction. For any issues related to PDF manipulation, consult the [pypdf library documentation](https://pypdf.readthedocs.io/en/stable/).
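For reference, the flow the updated docs describe (open the file, build a `PdfReader`, call `extract_text` on each page, concatenate the results) can be sketched against `pypdf` roughly as follows. This is a minimal sketch, not the code in `swarms/utils/pdf_to_text.py`: `join_page_text` and `pdf_to_text_sketch` are hypothetical names, and the `or ""` guard for pages without extractable text is an assumption.

```python
def join_page_text(reader) -> str:
    """Concatenate the text of every page exposed by a PdfReader-like object."""
    # `reader.pages` and `page.extract_text()` are the pypdf names the docs refer to.
    # The `or ""` guard (an assumption) keeps a page with no text from breaking the join.
    return "".join((page.extract_text() or "") for page in reader.pages)


def pdf_to_text_sketch(pdf_path: str) -> str:
    """Read a PDF from disk and return its text (illustrative only)."""
    # Imported lazily so join_page_text can be exercised without pypdf installed.
    from pypdf import PdfReader

    with open(pdf_path, "rb") as file:
        return join_page_text(PdfReader(file))
```

Splitting the page loop from the file handling keeps the concatenation logic testable with a stub reader, independent of which PDF library is installed.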
--- a/playground/agents/multion_agent.py
+++ b/playground/agents/multion_agent.py
@@ -64,9 +64,7 @@ class MultiOnAgent(AbstractLLM):


 # model
-model = MultiOnAgent(
-    multion_api_key=""
-)
+model = MultiOnAgent(multion_api_key="")

 # out = model.run("search for a recipe")
 agent = Agent(
--- a/playground/agents/perimeter_defense_agent.py
+++ b/playground/agents/perimeter_defense_agent.py
@@ -0,0 +1,70 @@
+import os
+from dotenv import load_dotenv
+from swarms.models import GPT4VisionAPI
+from swarms.structs import Agent
+import swarms.prompts.security_team as stsp
+
+# Load environment variables and initialize the Vision API
+load_dotenv()
+api_key = os.getenv("OPENAI_API_KEY")
+
+llm = GPT4VisionAPI(openai_api_key=api_key)
+
+# Image for analysis
+img = "bank_robbery.jpg"
+
+# Initialize agents with respective prompts for security tasks
+crowd_analysis_agent = Agent(
+    llm=llm,
+    sop=stsp.CROWD_ANALYSIS_AGENT_PROMPT,
+    max_loops=1,
+    multi_modal=True,
+)
+
+weapon_detection_agent = Agent(
+    llm=llm,
+    sop=stsp.WEAPON_DETECTION_AGENT_PROMPT,
+    max_loops=1,
+    multi_modal=True,
+)
+
+surveillance_monitoring_agent = Agent(
+    llm=llm,
+    sop=stsp.SURVEILLANCE_MONITORING_AGENT_PROMPT,
+    max_loops=1,
+    multi_modal=True,
+)
+
+emergency_response_coordinator = Agent(
+    llm=llm,
+    sop=stsp.EMERGENCY_RESPONSE_COORDINATOR_PROMPT,
+    max_loops=1,
+    multi_modal=True,
+)
+
+# Run agents with respective tasks on the same image
+crowd_analysis = crowd_analysis_agent.run(
+    "Analyze the crowd dynamics in the scene", img
+)
+
+weapon_detection_analysis = weapon_detection_agent.run(
+    "Inspect the scene for any potential threats", img
+)
+
+surveillance_monitoring_analysis = surveillance_monitoring_agent.run(
+    "Monitor the overall scene for unusual activities", img
+)
+
+emergency_response_analysis = emergency_response_coordinator.run(
+    "Develop a response plan based on the scene analysis", img
+)
+
+# Process and output results for each task
+# Example output:
+print(f"Crowd Analysis: {crowd_analysis}")
+print(f"Weapon Detection Analysis: {weapon_detection_analysis}")
+print(
+    "Surveillance Monitoring Analysis:"
+    f" {surveillance_monitoring_analysis}"
+)
+print(f"Emergency Response Analysis: {emergency_response_analysis}")
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -45,7 +45,7 @@ datasets = "*"
 optimum = "1.15.0"
 diffusers = "*"
 toml = "*"
-PyPDF2 = "3.0.1"
+pypdf = "4.0.1"
 accelerate = "*"
 anthropic = "*"
 sentencepiece = "0.1.98"
--- a/requirements.txt
+++ b/requirements.txt
@@ -16,7 +16,7 @@ huggingface-hub
 google-generativeai==0.3.1
 sentencepiece==0.1.98
 requests_mock
-PyPDF2==3.0.1
+pypdf==4.0.1
 accelerate==0.22.0
 chromadb
 tensorflow
--- a/swarms/loaders/pdf_loader.py
+++ b/swarms/loaders/pdf_loader.py
@@ -4,7 +4,7 @@ from dataclasses import dataclass
 from pathlib import Path
 from typing import IO, Dict, List, Optional

-from PyPDF2 import PdfReader
+from pypdf import PdfReader

 from swarms.utils.hash import str_to_hash

--- a/swarms/structs/agent.py
+++ b/swarms/structs/agent.py
@@ -29,6 +29,7 @@ from swarms.utils.pdf_to_text import pdf_to_text
 from swarms.utils.token_count_tiktoken import limit_tokens_from_string
 from swarms.tools.exec_tool import execute_tool_by_name
 from swarms.prompts.worker_prompt import worker_tools_sop_promp
+from swarms.structs.schemas import Step


 # Utils
@@ -50,6 +51,14 @@ def agent_id():
    return str(uuid.uuid4())


+def task_id():
+    return str(uuid.uuid4())
+
+
+def step_id():
+    return str(uuid.uuid1())
+
+
 class Agent:
    """
    Agent is the backbone to connect LLMs with tools and long term memory. Agent also provides the ability to
@@ -296,6 +305,9 @@ class Agent:
        # Initialize the llm with the conditional variables
        # self.llm = llm(*args, **kwargs)

+        # Step cache
+        self.step_cache = []
+
    def set_system_prompt(self, system_prompt: str):
        """Set the system prompt"""
        self.system_prompt = system_prompt
@@ -522,7 +534,7 @@ class Agent:
            # Activate Autonomous agent message
            self.activate_autonomous_agent()

-            response = task  # or combined_prompt
+            # response = task  # or combined_prompt
            history = self._history(self.user_name, task)

            # If dashboard = True then print the dashboard
@@ -541,20 +553,13 @@ class Agent:
                self.loop_count_print(loop_count, self.max_loops)
                print("\n")

-                # Check to see if stopping token is in the output to stop the loop
-                if self.stopping_token:
-                    if self._check_stopping_condition(
-                        response
-                    ) or parse_done_token(response):
-                        break
-
                # Adjust temperature, comment if no work
                if self.dynamic_temperature_enabled:
                    print(colored("Adjusting temperature...", "blue"))
                    self.dynamic_temperature()

                # Preparing the prompt
-                task = self.agent_history_prompt(history=response)
+                task = self.agent_history_prompt(history=task)

                attempt = 0
                while attempt < self.retry_attempts:
@@ -573,6 +578,24 @@ class Agent:
                            )
                            print(response)

+                        # Log each step
+                        step = Step(
+                            input=task,
+                            task_id=task_id(),
+                            step_id=step_id(),
+                            output=response,
+                        )
+
+                        # Check to see if stopping token is in the output to stop the loop
+                        if self.stopping_token:
+                            if self._check_stopping_condition(
+                                response
+                            ) or parse_done_token(response):
+                                break
+
+                        self.step_cache.append(step)
+                        logging.info(f"Step: {step}")
+
                        # If parser exists then parse the response
                        if self.parser:
                            response = self.parser(response)
@@ -692,10 +715,8 @@ class Agent:
        else:
            system_prompt = self.system_prompt
            agent_history_prompt = f"""
-                SYSTEM_PROMPT: {system_prompt}
-
+                System : {system_prompt}
                
-                ################ CHAT HISTORY ####################
                {history}
            """
            return agent_history_prompt
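The step-logging hunk above builds a `Step` for each attempt and appends it to `self.step_cache` unless a stopping condition fires first. A minimal, self-contained sketch of that bookkeeping follows. `StepRecord` is a hypothetical stand-in for `swarms.structs.schemas.Step` (its real fields are not shown in this diff), and `record_step` and the `"<DONE>"` token are illustrative assumptions. The sketch calls the ID helpers so that each record stores a string ID.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional


def task_id() -> str:
    return str(uuid.uuid4())


def step_id() -> str:
    return str(uuid.uuid1())


@dataclass
class StepRecord:
    """Hypothetical stand-in for swarms.structs.schemas.Step."""

    input: str
    task_id: str
    step_id: str
    output: str


def record_step(
    cache: List[StepRecord],
    task: str,
    response: str,
    stopping_token: Optional[str] = None,
) -> bool:
    """Cache one step; return False if the stopping token was seen instead."""
    step = StepRecord(
        input=task,
        task_id=task_id(),  # call the helpers so the fields hold strings
        step_id=step_id(),
        output=response,
    )
    # Same ordering as the diff: the stop check runs before the step is cached.
    if stopping_token and stopping_token in response:
        return False
    cache.append(step)
    return True


cache: List[StepRecord] = []
record_step(cache, "What is 2 + 2?", "4")
record_step(cache, "Anything else?", "<DONE>", stopping_token="<DONE>")
print(len(cache))
```

With this ordering, the response that triggers the stop is never cached, which matches the order of statements in the hunk.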
--- a/swarms/utils/pdf_to_text.py
+++ b/swarms/utils/pdf_to_text.py
@@ -1,11 +1,11 @@
 import sys

 try:
-    import PyPDF2
+    import pypdf
 except ImportError:
    print(
-        "PyPDF2 not installed. Please install it using: pip install"
-        " PyPDF2"
+        "pypdf not installed. Please install it using: pip install"
+        " pypdf"
    )
    sys.exit(1)

@@ -27,7 +27,7 @@ def pdf_to_text(pdf_path):
    try:
        # Open the PDF file
        with open(pdf_path, "rb") as file:
-            pdf_reader = PyPDF2.PdfReader(file)
+            pdf_reader = pypdf.PdfReader(file)
            text = ""

            # Iterate through each page and extract text
--- a/tests/utils/test_pdf_to_text.py
+++ b/tests/utils/test_pdf_to_text.py
@@ -1,12 +1,12 @@
 import pytest
-import PyPDF2
+import pypdf
 from swarms.utils import pdf_to_text


@pytest.fixture
 def pdf_file(tmpdir):
-    pdf_writer = PyPDF2.PdfWriter()
-    pdf_page = PyPDF2.pdf.PageObject.createBlankPage(None, 200, 200)
+    pdf_writer = pypdf.PdfWriter()
+    pdf_page = pypdf.PageObject.create_blank_page(None, 200, 200)
    pdf_writer.add_page(pdf_page)
    pdf_file = tmpdir.join("temp.pdf")
    with open(pdf_file, "wb") as output: