Merge branch 'kyegomez:master' into feat/testing_suite

pull/948/head
harshalmore31 committed 3 days ago via GitHub
commit bf22003941

@@ -14,7 +14,7 @@ jobs:
           python-version: '3.10'
       - name: Cache pip dependencies
-        uses: actions/cache@v3
+        uses: actions/cache@v4
         with:
           path: ~/.cache/pip
           key: ${{ runner.os }}-pip-${{ hashFiles('**/pyproject.toml') }}

@@ -34,7 +34,7 @@ jobs:
           docker build -t docker.io/my-organization/my-app:${{ github.sha }} .
       - name: Run Trivy vulnerability scanner
-        uses: aquasecurity/trivy-action@76071ef0d7ec797419534a183b498b4d6366cf37
+        uses: aquasecurity/trivy-action@dc5a429b52fcf669ce959baa2c2dd26090d2a6c4
         with:
          image-ref: 'docker.io/my-organization/my-app:${{ github.sha }}'
          format: 'template'

@@ -0,0 +1,170 @@
# Bug Report: Swarms Codebase Issues
## Bug 1: Error Handling in Daemon Thread (Critical)
**Location:** `swarms/structs/agent.py` lines 1446-1453
**Description:** The `_handle_run_error` method spawns a daemon thread to handle errors, so the exception re-raised inside `__handle_run_error` never propagates back to the main thread. Errors are logged, but the caller observes a silent failure.
**Type:** Concurrency/Error Handling Bug
**Severity:** Critical - Can lead to silent failures
**Current Code:**
```python
def _handle_run_error(self, error: any):
    process_thread = threading.Thread(
        target=self.__handle_run_error,
        args=(error,),
        daemon=True,
    )
    process_thread.start()
```
**Problem:**
- The daemon thread will exit when the main thread exits
- The `raise error` at the end of `__handle_run_error` occurs in the daemon thread, not the main thread
- This means exceptions are lost and not properly handled by the calling code
**Fix:** Remove the threading wrapper and call the error handler directly, or use proper exception propagation.
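A standalone sketch of the failure mode (not the Swarms code; `run_and_propagate` and `boom` are illustrative names). An exception raised inside a worker thread dies with that thread; the caller only sees it if the worker captures it and the caller re-raises:

```python
import threading

def run_and_propagate(fn, *args):
    """Run fn in a worker thread, then re-raise any exception in the caller."""
    captured = []

    def wrapper():
        try:
            fn(*args)
        except Exception as exc:
            # A bare `raise` here would die with the worker thread and
            # never reach the caller -- exactly the failure mode of Bug 1.
            captured.append(exc)

    worker = threading.Thread(target=wrapper)
    worker.start()
    worker.join()  # a daemon thread that is never joined may not even finish
    if captured:
        raise captured[0]

def boom():
    raise ValueError("boom")
```

Calling `run_and_propagate(boom)` raises `ValueError` in the caller's thread, which `thread.start()` on a daemon thread alone never does.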
---
## Bug 2: Method Name Typo (Logic Error)
**Location:** `swarms/structs/agent.py` lines 2128 and 2122
**Description:** There are two related typos in the response filtering functionality:
1. The method `apply_reponse_filters` has a typo in the name - it should be `apply_response_filters`
2. The `add_response_filter` method accesses `self.reponse_filters` instead of `self.response_filters`
**Type:** Naming/Logic Error
**Severity:** Medium - Can cause AttributeError when called
**Current Code:**
```python
def add_response_filter(self, filter_word: str) -> None:
    logger.info(f"Adding response filter: {filter_word}")
    self.reponse_filters.append(filter_word)  # TYPO: reponse_filters

def apply_reponse_filters(self, response: str) -> str:  # TYPO: apply_reponse_filters
    """
    Apply the response filters to the response
    """
    logger.info(
        f"Applying response filters to response: {response}"
    )
    for word in self.response_filters:
        response = response.replace(word, "[FILTERED]")
    return response
```
**Problem:**
- Method name is misspelled: `apply_reponse_filters` instead of `apply_response_filters`
- Attribute access is misspelled: `self.reponse_filters` instead of `self.response_filters`
- The method is invoked with the correct spelling in `filtered_run`, confirming these are typos rather than intentional names
**Fix:** Fix both typos to use correct spelling.
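A hypothetical `TypoAgent` (not the Swarms `Agent`) reproduces why the typo survives until runtime: Python resolves attribute names at access time, so the misspelling only surfaces as an `AttributeError` when the method is first called, not at import:

```python
class TypoAgent:
    """Reproduces the Bug 2 failure mode in isolation."""

    def __init__(self):
        self.response_filters = []  # correct spelling at init...

    def add_response_filter(self, word: str) -> None:
        self.reponse_filters.append(word)  # ...misspelled at use

agent = TypoAgent()  # constructing works fine
try:
    agent.add_response_filter("secret")
except AttributeError as e:
    print(e)  # no attribute 'reponse_filters' -- the typo surfaces only here
```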
---
## Bug 3: Document Ingestion Logic Error (Data Loss)
**Location:** `swarms/structs/agent.py` lines 2193-2212
**Description:** The `ingest_docs` method has a logic error where it processes all documents in a loop but only retains the data from the last document. All previous documents are processed but their data is overwritten and lost.
**Type:** Logic Error
**Severity:** High - Causes data loss
**Current Code:**
```python
def ingest_docs(self, docs: List[str], *args, **kwargs):
    """Ingest the docs into the memory

    Args:
        docs (List[str]): Documents of pdfs, text, csvs

    Returns:
        None
    """
    try:
        for doc in docs:
            data = data_to_text(doc)

        return self.short_memory.add(
            role=self.user_name, content=data
        )
    except Exception as error:
        logger.info(f"Error ingesting docs: {error}", "red")
```
**Problem:**
- The `data` variable is overwritten on each iteration
- Only the last document's data is actually added to memory
- All previous documents are processed but their data is lost
- The method should either process documents individually or combine all data
**Fix:** Accumulate all document data or process each document individually.
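The difference can be shown with plain strings standing in for `data_to_text` and memory (hypothetical helpers, not the Swarms API):

```python
def ingest_last_only(docs):
    """Mirrors the bug: `data` is overwritten, only the last doc survives."""
    for doc in docs:
        data = doc.upper()  # stand-in for data_to_text(doc)
    return data

def ingest_all(docs):
    """Mirrors the fix: accumulate every document, then combine."""
    all_data = [doc.upper() for doc in docs]
    return "\n\n".join(all_data)

docs = ["alpha", "beta", "gamma"]
print(ingest_last_only(docs))  # GAMMA -- alpha and beta are lost
print(ingest_all(docs))        # "ALPHA\n\nBETA\n\nGAMMA" -- all retained
```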
---
## Impact Assessment
1. **Bug 1 (Critical):** Can cause silent failures in production, making debugging difficult
2. **Bug 2 (Medium):** Raises `AttributeError` as soon as the correctly spelled method name is called
3. **Bug 3 (High):** Causes data loss when ingesting multiple documents
## Fixes Applied
### Bug 1 Fix - Error Handling
**Status:** ✅ FIXED
Changed the `_handle_run_error` method to call `__handle_run_error` directly instead of using a daemon thread:
```python
def _handle_run_error(self, error: any):
    # Handle error directly instead of using daemon thread
    # to ensure proper exception propagation
    self.__handle_run_error(error)
```
### Bug 2 Fix - Method Name Typos
**Status:** ✅ FIXED
Fixed both typos in the response filtering functionality:
1. Renamed `apply_reponse_filters` to `apply_response_filters`
2. Fixed `self.reponse_filters` to `self.response_filters`
### Bug 3 Fix - Document Ingestion Logic
**Status:** ✅ FIXED
Modified the `ingest_docs` method to process all documents and combine their content:
```python
def ingest_docs(self, docs: List[str], *args, **kwargs):
    try:
        # Process all documents and combine their content
        all_data = []
        for doc in docs:
            data = data_to_text(doc)
            all_data.append(f"Document: {doc}\n{data}")

        # Combine all document content
        combined_data = "\n\n".join(all_data)

        return self.short_memory.add(
            role=self.user_name, content=combined_data
        )
    except Exception as error:
        logger.info(f"Error ingesting docs: {error}", "red")
```
## Recommendations
1. ✅ Fixed the error handling to properly propagate exceptions
2. ✅ Corrected the method name typos
3. ✅ Fixed the document ingestion logic to process all documents
4. Add unit tests to prevent similar issues in the future
5. Consider adding linting rules to catch method name typos
6. Consider code review processes to catch similar issues
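Toward recommendation 4, a minimal regression-test sketch (pytest-style functions; `FilterBox` is a stand-in so the example runs without the swarms package -- real tests would target `swarms.structs.agent.Agent`):

```python
class FilterBox:
    """Stand-in for the agent's corrected response-filtering feature."""

    def __init__(self):
        self.response_filters = []

    def add_response_filter(self, word: str) -> None:
        self.response_filters.append(word)

    def apply_response_filters(self, text: str) -> str:
        for word in self.response_filters:
            text = text.replace(word, "[FILTERED]")
        return text

def test_filters_are_applied():
    box = FilterBox()
    box.add_response_filter("secret")
    assert box.apply_response_filters("a secret plan") == "a [FILTERED] plan"

def test_typo_is_gone():
    # Guards against the Bug 2 misspellings reappearing.
    assert hasattr(FilterBox, "apply_response_filters")
    assert not hasattr(FilterBox, "apply_reponse_filters")

test_filters_are_applied()
test_typo_is_gone()
```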

@@ -86,7 +86,7 @@ swarms = "swarms.cli.main:main"
 [tool.poetry.group.lint.dependencies]
 black = ">=23.1,<26.0"
-ruff = ">=0.5.1,<0.11.14"
+ruff = ">=0.5.1,<0.12.3"
 types-toml = "^0.10.8.1"
 types-pytz = ">=2023.3,<2026.0"
 types-chardet = "^5.0.4.6"

@@ -1444,12 +1444,9 @@ class Agent:
         raise error

     def _handle_run_error(self, error: any):
-        process_thread = threading.Thread(
-            target=self.__handle_run_error,
-            args=(error,),
-            daemon=True,
-        )
-        process_thread.start()
+        # Handle error directly instead of using daemon thread
+        # to ensure proper exception propagation
+        self.__handle_run_error(error)

     async def arun(
         self,
@@ -2123,9 +2120,9 @@ class Agent:
         """
         logger.info(f"Adding response filter: {filter_word}")
-        self.reponse_filters.append(filter_word)
+        self.response_filters.append(filter_word)

-    def apply_reponse_filters(self, response: str) -> str:
+    def apply_response_filters(self, response: str) -> str:
         """
         Apply the response filters to the response
@@ -2200,11 +2197,17 @@ class Agent:
             None
         """
         try:
+            # Process all documents and combine their content
+            all_data = []
             for doc in docs:
                 data = data_to_text(doc)
+                all_data.append(f"Document: {doc}\n{data}")
+
+            # Combine all document content
+            combined_data = "\n\n".join(all_data)
             return self.short_memory.add(
-                role=self.user_name, content=data
+                role=self.user_name, content=combined_data
             )
         except Exception as error:
             logger.info(f"Error ingesting docs: {error}", "red")

@@ -212,44 +212,62 @@ class LiteLLM:
         Process vision input specifically for Anthropic models.
         Handles Anthropic's specific image format requirements.
         """
-        # Get base64 encoded image
-        image_url = get_image_base64(image)
-
-        # Extract mime type from the data URI or use default
-        mime_type = "image/jpeg"  # default
-        if "data:" in image_url and ";base64," in image_url:
-            mime_type = image_url.split(";base64,")[0].split("data:")[
-                1
-            ]
-
-        # Ensure mime type is one of the supported formats
-        supported_formats = [
-            "image/jpeg",
-            "image/png",
-            "image/gif",
-            "image/webp",
-        ]
-        if mime_type not in supported_formats:
-            mime_type = (
-                "image/jpeg"  # fallback to jpeg if unsupported
-            )
-
-        # Construct Anthropic vision message
-        messages.append(
-            {
-                "role": "user",
-                "content": [
-                    {"type": "text", "text": task},
-                    {
-                        "type": "image_url",
-                        "image_url": {
-                            "url": image_url,
-                            "format": mime_type,
-                        },
-                    },
-                ],
-            }
-        )
+        # Check if we can use direct URL
+        if self._should_use_direct_url(image):
+            # Use direct URL without base64 conversion
+            messages.append(
+                {
+                    "role": "user",
+                    "content": [
+                        {"type": "text", "text": task},
+                        {
+                            "type": "image_url",
+                            "image_url": {
+                                "url": image,
+                            },
+                        },
+                    ],
+                }
+            )
+        else:
+            # Fall back to base64 conversion for local files
+            image_url = get_image_base64(image)
+
+            # Extract mime type from the data URI or use default
+            mime_type = "image/jpeg"  # default
+            if "data:" in image_url and ";base64," in image_url:
+                mime_type = image_url.split(";base64,")[0].split("data:")[
+                    1
+                ]
+
+            # Ensure mime type is one of the supported formats
+            supported_formats = [
+                "image/jpeg",
+                "image/png",
+                "image/gif",
+                "image/webp",
+            ]
+            if mime_type not in supported_formats:
+                mime_type = (
+                    "image/jpeg"  # fallback to jpeg if unsupported
+                )
+
+            # Construct Anthropic vision message with base64
+            messages.append(
+                {
+                    "role": "user",
+                    "content": [
+                        {"type": "text", "text": task},
+                        {
+                            "type": "image_url",
+                            "image_url": {
+                                "url": image_url,
+                                "format": mime_type,
+                            },
+                        },
+                    ],
+                }
+            )

         return messages
@@ -260,21 +278,29 @@ class LiteLLM:
         Process vision input specifically for OpenAI models.
         Handles OpenAI's specific image format requirements.
         """
-        # Get base64 encoded image with proper format
-        image_url = get_image_base64(image)
-
-        # Prepare vision message
-        vision_message = {
-            "type": "image_url",
-            "image_url": {"url": image_url},
-        }
-
-        # Add format for specific models
-        extension = Path(image).suffix.lower()
-        mime_type = (
-            f"image/{extension[1:]}" if extension else "image/jpeg"
-        )
-        vision_message["image_url"]["format"] = mime_type
+        # Check if we can use direct URL
+        if self._should_use_direct_url(image):
+            # Use direct URL without base64 conversion
+            vision_message = {
+                "type": "image_url",
+                "image_url": {"url": image},
+            }
+        else:
+            # Fall back to base64 conversion for local files
+            image_url = get_image_base64(image)
+
+            # Prepare vision message with base64
+            vision_message = {
+                "type": "image_url",
+                "image_url": {"url": image_url},
+            }
+
+            # Add format for specific models
+            extension = Path(image).suffix.lower()
+            mime_type = (
+                f"image/{extension[1:]}" if extension else "image/jpeg"
+            )
+            vision_message["image_url"]["format"] = mime_type

         # Append vision message
         messages.append(
@@ -289,44 +315,61 @@ class LiteLLM:
         return messages

+    def _should_use_direct_url(self, image: str) -> bool:
+        """
+        Determine if we should use direct URL passing instead of base64 conversion.
+
+        Args:
+            image (str): The image source (URL or file path)
+
+        Returns:
+            bool: True if we should use direct URL, False if we need base64 conversion
+        """
+        # Only use direct URL for HTTP/HTTPS URLs
+        if not image.startswith(("http://", "https://")):
+            return False
+
+        # Check for local/custom models that might not support direct URLs
+        model_lower = self.model_name.lower()
+        local_indicators = ["localhost", "127.0.0.1", "local", "custom", "ollama", "llama-cpp"]
+        is_local = any(indicator in model_lower for indicator in local_indicators) or \
+                   (self.base_url is not None and any(indicator in self.base_url.lower() for indicator in local_indicators))
+        if is_local:
+            return False
+
+        # Use LiteLLM's supports_vision to check if model supports vision and direct URLs
+        try:
+            return supports_vision(model=self.model_name)
+        except Exception:
+            return False
+
     def vision_processing(
         self, task: str, image: str, messages: Optional[list] = None
     ):
         """
         Process the image for the given task.
         Handles different image formats and model requirements.
+
+        This method now intelligently chooses between:
+        1. Direct URL passing (when model supports it and image is a URL)
+        2. Base64 conversion (for local files or unsupported models)
+
+        This approach reduces server load and improves performance by avoiding
+        unnecessary image downloads and base64 conversions when possible.
         """
-        # # # Handle Anthropic models separately
-        # # if "anthropic" in self.model_name.lower() or "claude" in self.model_name.lower():
-        # #     messages = self.anthropic_vision_processing(task, image, messages)
-        # #     return messages
-
-        # # Get base64 encoded image with proper format
-        # image_url = get_image_base64(image)
-
-        # # Prepare vision message
-        # vision_message = {
-        #     "type": "image_url",
-        #     "image_url": {"url": image_url},
-        # }
-
-        # # Add format for specific models
-        # extension = Path(image).suffix.lower()
-        # mime_type = f"image/{extension[1:]}" if extension else "image/jpeg"
-        # vision_message["image_url"]["format"] = mime_type
-
-        # # Append vision message
-        # messages.append(
-        #     {
-        #         "role": "user",
-        #         "content": [
-        #             {"type": "text", "text": task},
-        #             vision_message,
-        #         ],
-        #     }
-        # )
-
-        # return messages
-
         logger.info(f"Processing image for model: {self.model_name}")

+        # Log whether we're using direct URL or base64 conversion
+        if self._should_use_direct_url(image):
+            logger.info(f"Using direct URL passing for image: {image[:100]}...")
+        else:
+            if image.startswith(("http://", "https://")):
+                logger.info("Converting URL image to base64 (model doesn't support direct URLs)")
+            else:
+                logger.info("Converting local file to base64")
+
         if (
             "anthropic" in self.model_name.lower()
             or "claude" in self.model_name.lower()
@@ -370,7 +413,16 @@ class LiteLLM:
     def check_if_model_supports_vision(self, img: str = None):
         """
-        Check if the model supports vision.
+        Check if the model supports vision capabilities.
+
+        This method uses LiteLLM's built-in supports_vision function to verify
+        that the model can handle image inputs before processing.
+
+        Args:
+            img (str, optional): Image path/URL to validate against model capabilities
+
+        Raises:
+            ValueError: If the model doesn't support vision and an image is provided
         """
         if img is not None:
             out = supports_vision(model=self.model_name)

@@ -201,6 +201,119 @@ def run_test_suite():
     except Exception as e:
         log_test_result("Batched Run", False, str(e))

+    # Test 8: Vision Support Check
+    try:
+        logger.info("Testing vision support check")
+        llm = LiteLLM(model_name="gpt-4o")
+        # This should not raise an error for vision-capable models
+        llm.check_if_model_supports_vision(img="test.jpg")
+        log_test_result("Vision Support Check", True)
+    except Exception as e:
+        log_test_result("Vision Support Check", False, str(e))
+
+    # Test 9: Direct URL Processing
+    try:
+        logger.info("Testing direct URL processing")
+        llm = LiteLLM(model_name="gpt-4o")
+        test_url = "https://github.com/kyegomez/swarms/blob/master/swarms_logo_new.png?raw=true"
+        should_use_direct = llm._should_use_direct_url(test_url)
+        assert isinstance(should_use_direct, bool)
+        log_test_result("Direct URL Processing", True)
+    except Exception as e:
+        log_test_result("Direct URL Processing", False, str(e))
+
+    # Test 10: Message Preparation with Image
+    try:
+        logger.info("Testing message preparation with image")
+        llm = LiteLLM(model_name="gpt-4o")
+        # Mock image URL to test message structure
+        test_img = "https://github.com/kyegomez/swarms/blob/master/swarms_logo_new.png?raw=true"
+        messages = llm._prepare_messages("Describe this image", img=test_img)
+        assert isinstance(messages, list)
+        assert len(messages) >= 1
+        # Check if image content is properly structured
+        user_message = next((msg for msg in messages if msg["role"] == "user"), None)
+        assert user_message is not None
+        log_test_result("Message Preparation with Image", True)
+    except Exception as e:
+        log_test_result("Message Preparation with Image", False, str(e))
+
+    # Test 11: Vision Processing Methods
+    try:
+        logger.info("Testing vision processing methods")
+        llm = LiteLLM(model_name="gpt-4o")
+        messages = []
+        # Test OpenAI vision processing
+        processed_messages = llm.openai_vision_processing(
+            "Describe this image",
+            "https://github.com/kyegomez/swarms/blob/master/swarms_logo_new.png?raw=true",
+            messages.copy()
+        )
+        assert isinstance(processed_messages, list)
+        assert len(processed_messages) > 0
+
+        # Test Anthropic vision processing
+        llm_anthropic = LiteLLM(model_name="claude-3-5-sonnet-20241022")
+        processed_messages_anthropic = llm_anthropic.anthropic_vision_processing(
+            "Describe this image",
+            "https://github.com/kyegomez/swarms/blob/master/swarms_logo_new.png?raw=true",
+            messages.copy()
+        )
+        assert isinstance(processed_messages_anthropic, list)
+        assert len(processed_messages_anthropic) > 0
+        log_test_result("Vision Processing Methods", True)
+    except Exception as e:
+        log_test_result("Vision Processing Methods", False, str(e))
+
+    # Test 12: Local vs URL Detection
+    try:
+        logger.info("Testing local vs URL detection")
+        llm = LiteLLM(model_name="gpt-4o")
+        # Test URL detection
+        url_test = "https://github.com/kyegomez/swarms/blob/master/swarms_logo_new.png?raw=true"
+        is_url_direct = llm._should_use_direct_url(url_test)
+        # Test local file detection
+        local_test = "/path/to/local/image.jpg"
+        is_local_direct = llm._should_use_direct_url(local_test)
+        # URLs should potentially use direct, local files should not
+        assert isinstance(is_url_direct, bool)
+        assert isinstance(is_local_direct, bool)
+        assert is_local_direct == False  # Local files should never use direct URL
+        log_test_result("Local vs URL Detection", True)
+    except Exception as e:
+        log_test_result("Local vs URL Detection", False, str(e))
+
+    # Test 13: Vision Message Structure
+    try:
+        logger.info("Testing vision message structure")
+        llm = LiteLLM(model_name="gpt-4o")
+        messages = []
+        # Test message structure for image input
+        result = llm.vision_processing(
+            task="What do you see?",
+            image="https://github.com/kyegomez/swarms/blob/master/swarms_logo_new.png?raw=true",
+            messages=messages
+        )
+        assert isinstance(result, list)
+        assert len(result) > 0
+        # Verify the message contains both text and image components
+        user_msg = result[-1]  # Last message should be user message
+        assert user_msg["role"] == "user"
+        assert "content" in user_msg
+        log_test_result("Vision Message Structure", True)
+    except Exception as e:
+        log_test_result("Vision Message Structure", False, str(e))
+
     # Generate test report
     success_rate = (passed_tests / total_tests) * 100
     logger.info("\n=== Test Suite Report ===")
