commit
5bd51107fb
@ -0,0 +1,37 @@
|
||||
name: "Init Environment"
|
||||
description: "Initialize environment for tests"
|
||||
runs:
|
||||
using: "composite"
|
||||
steps:
|
||||
- name: Checkout actions
|
||||
uses: actions/checkout@v3
|
||||
|
||||
- name: Set up Python ${{ matrix.python-version }}
|
||||
uses: actions/setup-python@v4
|
||||
with:
|
||||
python-version: ${{ matrix.python-version }}
|
||||
|
||||
- name: Install and configure Poetry
|
||||
uses: snok/install-poetry@v1
|
||||
with:
|
||||
virtualenvs-create: true
|
||||
virtualenvs-in-project: true
|
||||
installer-parallel: true
|
||||
|
||||
- name: Load cached venv
|
||||
id: cached-poetry-dependencies
|
||||
uses: actions/cache@v3
|
||||
with:
|
||||
path: .venv
|
||||
key: venv-${{ runner.os }}-${{ steps.setup-python.outputs.python-version }}-${{ hashFiles('**/poetry.lock') }}
|
||||
|
||||
- name: Install dependencies
|
||||
if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true'
|
||||
run: poetry install --no-interaction --no-root --with test --with dev --all-extras
|
||||
shell: bash
|
||||
|
||||
- name: Activate venv
|
||||
run: |
|
||||
source .venv/bin/activate
|
||||
echo PATH=$PATH >> $GITHUB_ENV
|
||||
shell: bash
|
@ -0,0 +1,76 @@
|
||||
# Module Name: Mixtral
|
||||
|
||||
## Introduction
|
||||
The Mixtral module is a powerful language model designed for text generation tasks. It leverages the MistralAI Mixtral-8x7B pre-trained model to generate high-quality text based on user-defined tasks or prompts. In this documentation, we will provide a comprehensive overview of the Mixtral module, including its architecture, purpose, arguments, and detailed usage examples.
|
||||
|
||||
## Purpose
|
||||
The Mixtral module is designed to facilitate text generation tasks using state-of-the-art language models. Whether you need to generate creative content, draft text for various applications, or simply explore the capabilities of Mixtral, this module serves as a versatile and efficient solution. With its easy-to-use interface, you can quickly generate text for a wide range of applications.
|
||||
|
||||
## Architecture
|
||||
The Mixtral module is built on top of the MistralAI Mixtral-8x7B pre-trained model. It utilizes a deep neural network architecture with 8 layers and 7 attention heads to generate coherent and contextually relevant text. The model is capable of handling a variety of text generation tasks, from simple prompts to more complex content generation.
|
||||
|
||||
## Class Definition
|
||||
### `Mixtral(model_name: str = "mistralai/Mixtral-8x7B-v0.1", max_new_tokens: int = 500)`
|
||||
|
||||
#### Parameters
|
||||
- `model_name` (str, optional): The name or path of the pre-trained Mixtral model. Default is "mistralai/Mixtral-8x7B-v0.1".
|
||||
- `max_new_tokens` (int, optional): The maximum number of new tokens to generate. Default is 500.
|
||||
|
||||
## Functionality and Usage
|
||||
The Mixtral module offers a straightforward interface for text generation. It accepts a task or prompt as input and returns generated text based on the provided input.
|
||||
|
||||
### `run(task: Optional[str] = None, **kwargs) -> str`
|
||||
|
||||
#### Parameters
|
||||
- `task` (str, optional): The task or prompt for text generation.
|
||||
|
||||
#### Returns
|
||||
- `str`: The generated text.
|
||||
|
||||
## Usage Examples
|
||||
### Example 1: Basic Usage
|
||||
|
||||
```python
|
||||
from swarms.models import Mixtral
|
||||
|
||||
# Initialize the Mixtral model
|
||||
mixtral = Mixtral()
|
||||
|
||||
# Generate text for a simple task
|
||||
generated_text = mixtral.run("Generate a creative story.")
|
||||
print(generated_text)
|
||||
```
|
||||
|
||||
### Example 2: Custom Model
|
||||
|
||||
You can specify a custom pre-trained model by providing the `model_name` parameter.
|
||||
|
||||
```python
|
||||
custom_model_name = "model_name"
|
||||
mixtral_custom = Mixtral(model_name=custom_model_name)
|
||||
|
||||
generated_text = mixtral_custom.run("Generate text with a custom model.")
|
||||
print(generated_text)
|
||||
```
|
||||
|
||||
### Example 3: Controlling Output Length
|
||||
|
||||
You can control the length of the generated text by adjusting the `max_new_tokens` parameter.
|
||||
|
||||
```python
|
||||
mixtral_length = Mixtral(max_new_tokens=100)
|
||||
|
||||
generated_text = mixtral_length.run("Generate a short text.")
|
||||
print(generated_text)
|
||||
```
|
||||
|
||||
## Additional Information and Tips
|
||||
- It's recommended to use a descriptive task or prompt to guide the text generation process.
|
||||
- Experiment with different prompt styles and lengths to achieve the desired output.
|
||||
- You can fine-tune Mixtral on specific tasks if needed, although pre-trained models often work well out of the box.
|
||||
- Monitor the `max_new_tokens` parameter to control the length of the generated text.
|
||||
|
||||
## Conclusion
|
||||
The Mixtral module is a versatile tool for text generation tasks, powered by the MistralAI Mixtral-8x7B pre-trained model. Whether you need creative writing, content generation, or assistance with text-based tasks, Mixtral can help you achieve your goals. With a simple interface and flexible parameters, it's a valuable addition to your text generation toolkit.
|
||||
|
||||
If you encounter any issues or have questions about using Mixtral, please refer to the MistralAI documentation or reach out to their support team for further assistance. Happy text generation with Mixtral!
|
@ -0,0 +1,265 @@
|
||||
# Module/Class Name: Conversation
|
||||
|
||||
## Introduction
|
||||
|
||||
The `Conversation` class is a powerful tool for managing and structuring conversation data in a Python program. It enables you to create, manipulate, and analyze conversations easily. This documentation will provide you with a comprehensive understanding of the `Conversation` class, its attributes, methods, and how to effectively use it.
|
||||
|
||||
## Table of Contents
|
||||
|
||||
1. **Class Definition**
|
||||
- Overview
|
||||
- Attributes
|
||||
|
||||
2. **Methods**
|
||||
- `__init__(self, time_enabled: bool = False, *args, **kwargs)`
|
||||
- `add(self, role: str, content: str, *args, **kwargs)`
|
||||
- `delete(self, index: str)`
|
||||
- `update(self, index: str, role, content)`
|
||||
- `query(self, index: str)`
|
||||
- `search(self, keyword: str)`
|
||||
- `display_conversation(self, detailed: bool = False)`
|
||||
- `export_conversation(self, filename: str)`
|
||||
- `import_conversation(self, filename: str)`
|
||||
- `count_messages_by_role(self)`
|
||||
- `return_history_as_string(self)`
|
||||
- `save_as_json(self, filename: str)`
|
||||
- `load_from_json(self, filename: str)`
|
||||
- `search_keyword_in_conversation(self, keyword: str)`
|
||||
- `pretty_print_conversation(self, messages)`
|
||||
|
||||
---
|
||||
|
||||
### 1. Class Definition
|
||||
|
||||
#### Overview
|
||||
|
||||
The `Conversation` class is designed to manage conversations by keeping track of messages and their attributes. It offers methods for adding, deleting, updating, querying, and displaying messages within the conversation. Additionally, it supports exporting and importing conversations, searching for specific keywords, and more.
|
||||
|
||||
#### Attributes
|
||||
|
||||
- `time_enabled (bool)`: A flag indicating whether to enable timestamp recording for messages.
|
||||
- `conversation_history (list)`: A list that stores messages in the conversation.
|
||||
|
||||
### 2. Methods
|
||||
|
||||
#### `__init__(self, time_enabled: bool = False, *args, **kwargs)`
|
||||
|
||||
- **Description**: Initializes a new Conversation object.
|
||||
- **Parameters**:
|
||||
- `time_enabled (bool)`: If `True`, timestamps will be recorded for each message. Default is `False`.
|
||||
|
||||
#### `add(self, role: str, content: str, *args, **kwargs)`
|
||||
|
||||
- **Description**: Adds a message to the conversation history.
|
||||
- **Parameters**:
|
||||
- `role (str)`: The role of the speaker (e.g., "user," "assistant").
|
||||
- `content (str)`: The content of the message.
|
||||
|
||||
#### `delete(self, index: str)`
|
||||
|
||||
- **Description**: Deletes a message from the conversation history.
|
||||
- **Parameters**:
|
||||
- `index (str)`: The index of the message to delete.
|
||||
|
||||
#### `update(self, index: str, role, content)`
|
||||
|
||||
- **Description**: Updates a message in the conversation history.
|
||||
- **Parameters**:
|
||||
- `index (str)`: The index of the message to update.
|
||||
- `role (_type_)`: The new role of the speaker.
|
||||
- `content (_type_)`: The new content of the message.
|
||||
|
||||
#### `query(self, index: str)`
|
||||
|
||||
- **Description**: Retrieves a message from the conversation history.
|
||||
- **Parameters**:
|
||||
- `index (str)`: The index of the message to query.
|
||||
- **Returns**: The message as a string.
|
||||
|
||||
#### `search(self, keyword: str)`
|
||||
|
||||
- **Description**: Searches for messages containing a specific keyword in the conversation history.
|
||||
- **Parameters**:
|
||||
- `keyword (str)`: The keyword to search for.
|
||||
- **Returns**: A list of messages that contain the keyword.
|
||||
|
||||
#### `display_conversation(self, detailed: bool = False)`
|
||||
|
||||
- **Description**: Displays the conversation history.
|
||||
- **Parameters**:
|
||||
- `detailed (bool, optional)`: If `True`, provides detailed information about each message. Default is `False`.
|
||||
|
||||
#### `export_conversation(self, filename: str)`
|
||||
|
||||
- **Description**: Exports the conversation history to a text file.
|
||||
- **Parameters**:
|
||||
- `filename (str)`: The name of the file to export to.
|
||||
|
||||
#### `import_conversation(self, filename: str)`
|
||||
|
||||
- **Description**: Imports a conversation history from a text file.
|
||||
- **Parameters**:
|
||||
- `filename (str)`: The name of the file to import from.
|
||||
|
||||
#### `count_messages_by_role(self)`
|
||||
|
||||
- **Description**: Counts the number of messages by role in the conversation.
|
||||
- **Returns**: A dictionary containing the count of messages for each role.
|
||||
|
||||
#### `return_history_as_string(self)`
|
||||
|
||||
- **Description**: Returns the entire conversation history as a single string.
|
||||
- **Returns**: The conversation history as a string.
|
||||
|
||||
#### `save_as_json(self, filename: str)`
|
||||
|
||||
- **Description**: Saves the conversation history as a JSON file.
|
||||
- **Parameters**:
|
||||
- `filename (str)`: The name of the JSON file to save.
|
||||
|
||||
#### `load_from_json(self, filename: str)`
|
||||
|
||||
- **Description**: Loads a conversation history from a JSON file.
|
||||
- **Parameters**:
|
||||
- `filename (str)`: The name of the JSON file to load.
|
||||
|
||||
#### `search_keyword_in_conversation(self, keyword: str)`
|
||||
|
||||
- **Description**: Searches for a keyword in the conversation history and returns matching messages.
|
||||
- **Parameters**:
|
||||
- `keyword (str)`: The keyword to search for.
|
||||
- **Returns**: A list of messages containing the keyword.
|
||||
|
||||
#### `pretty_print_conversation(self, messages)`
|
||||
|
||||
- **Description**: Pretty prints a list of messages with colored role indicators.
|
||||
- **Parameters**:
|
||||
- `messages (list)`: A list of messages to print.
|
||||
|
||||
## Examples
|
||||
|
||||
Here are some usage examples of the `Conversation` class:
|
||||
|
||||
### Creating a Conversation
|
||||
|
||||
```python
|
||||
from swarms.structs import Conversation
|
||||
|
||||
conv = Conversation()
|
||||
```
|
||||
|
||||
### Adding Messages
|
||||
|
||||
```python
|
||||
conv.add("user", "Hello, world!")
|
||||
conv.add("assistant", "Hello, user!")
|
||||
```
|
||||
|
||||
### Displaying the Conversation
|
||||
|
||||
```python
|
||||
conv.display_conversation()
|
||||
```
|
||||
|
||||
### Searching for Messages
|
||||
|
||||
```python
|
||||
result = conv.search("Hello")
|
||||
```
|
||||
|
||||
### Exporting and Importing Conversations
|
||||
|
||||
```python
|
||||
conv.export_conversation("conversation.txt")
|
||||
conv.import_conversation("conversation.txt")
|
||||
```
|
||||
|
||||
### Counting Messages by Role
|
||||
|
||||
```python
|
||||
counts = conv.count_messages_by_role()
|
||||
```
|
||||
|
||||
### Loading and Saving as JSON
|
||||
|
||||
```python
|
||||
conv.save_as_json("conversation.json")
|
||||
conv.load_from_json("conversation.json")
|
||||
```
|
||||
|
||||
Certainly! Let's continue with more examples and additional information about the `Conversation` class.
|
||||
|
||||
### Querying a Specific Message
|
||||
|
||||
You can retrieve a specific message from the conversation by its index:
|
||||
|
||||
```python
|
||||
message = conv.query(0) # Retrieves the first message
|
||||
```
|
||||
|
||||
### Updating a Message
|
||||
|
||||
You can update a message's content or role within the conversation:
|
||||
|
||||
```python
|
||||
conv.update(0, "user", "Hi there!") # Updates the first message
|
||||
```
|
||||
|
||||
### Deleting a Message
|
||||
|
||||
If you want to remove a message from the conversation, you can use the `delete` method:
|
||||
|
||||
```python
|
||||
conv.delete(0) # Deletes the first message
|
||||
```
|
||||
|
||||
### Counting Messages by Role
|
||||
|
||||
You can count the number of messages by role in the conversation:
|
||||
|
||||
```python
|
||||
counts = conv.count_messages_by_role()
|
||||
# Example result: {'user': 2, 'assistant': 2}
|
||||
```
|
||||
|
||||
### Exporting and Importing as Text
|
||||
|
||||
You can export the conversation to a text file and later import it:
|
||||
|
||||
```python
|
||||
conv.export_conversation("conversation.txt") # Export
|
||||
conv.import_conversation("conversation.txt") # Import
|
||||
```
|
||||
|
||||
### Exporting and Importing as JSON
|
||||
|
||||
Conversations can also be saved and loaded as JSON files:
|
||||
|
||||
```python
|
||||
conv.save_as_json("conversation.json") # Save as JSON
|
||||
conv.load_from_json("conversation.json") # Load from JSON
|
||||
```
|
||||
|
||||
### Searching for a Keyword
|
||||
|
||||
You can search for messages containing a specific keyword within the conversation:
|
||||
|
||||
```python
|
||||
results = conv.search_keyword_in_conversation("Hello")
|
||||
```
|
||||
|
||||
### Pretty Printing
|
||||
|
||||
The `pretty_print_conversation` method provides a visually appealing way to display messages with colored role indicators:
|
||||
|
||||
```python
|
||||
conv.pretty_print_conversation(conv.conversation_history)
|
||||
```
|
||||
|
||||
These examples demonstrate the versatility of the `Conversation` class in managing and interacting with conversation data. Whether you're building a chatbot, conducting analysis, or simply organizing dialogues, this class offers a robust set of tools to help you accomplish your goals.
|
||||
|
||||
## Conclusion
|
||||
|
||||
The `Conversation` class is a valuable utility for handling conversation data in Python. With its ability to add, update, delete, search, export, and import messages, you have the flexibility to work with conversations in various ways. Feel free to explore its features and adapt them to your specific projects and applications.
|
||||
|
||||
If you have any further questions or need additional assistance, please don't hesitate to ask!
|
@ -0,0 +1,21 @@
|
||||
import os
|
||||
|
||||
from dotenv import load_dotenv
|
||||
|
||||
# Import the OpenAIChat model and the Agent struct
|
||||
from swarms.models import OpenAIChat
|
||||
from swarms.structs import Agent
|
||||
|
||||
# Load the environment variables
|
||||
load_dotenv()
|
||||
|
||||
# Get the API key from the environment
|
||||
api_key = os.environ.get("OPENAI_API_KEY")
|
||||
|
||||
# Initialize the language model
|
||||
llm = OpenAIChat(
|
||||
temperature=0.5,
|
||||
model_name="gpt-4",
|
||||
openai_api_key=api_key,
|
||||
max_tokens=1000,
|
||||
)
|
@ -0,0 +1,96 @@
|
||||
import time
|
||||
import os
|
||||
|
||||
import pygame
|
||||
import speech_recognition as sr
|
||||
from dotenv import load_dotenv
|
||||
from playsound import playsound
|
||||
|
||||
from swarms import OpenAIChat, OpenAITTS
|
||||
|
||||
# Load the environment variables
|
||||
load_dotenv()
|
||||
|
||||
# Get the API key from the environment
|
||||
openai_api_key = os.environ.get("OPENAI_API_KEY")
|
||||
|
||||
# Initialize the language model
|
||||
llm = OpenAIChat(
|
||||
openai_api_key=openai_api_key,
|
||||
)
|
||||
|
||||
# Initialize the text-to-speech model
|
||||
tts = OpenAITTS(
|
||||
model_name="tts-1-1106",
|
||||
voice="onyx",
|
||||
openai_api_key=openai_api_key,
|
||||
saved_filepath="runs/tts_speech.wav",
|
||||
)
|
||||
|
||||
# Initialize the speech recognition model
|
||||
r = sr.Recognizer()
|
||||
|
||||
|
||||
def play_audio(file_path):
|
||||
# Check if the file exists
|
||||
if not os.path.isfile(file_path):
|
||||
print(f"Audio file {file_path} not found.")
|
||||
return
|
||||
|
||||
# Initialize the mixer module
|
||||
pygame.mixer.init()
|
||||
|
||||
try:
|
||||
# Load the mp3 file
|
||||
pygame.mixer.music.load(file_path)
|
||||
|
||||
# Play the mp3 file
|
||||
pygame.mixer.music.play()
|
||||
|
||||
# Wait for the audio to finish playing
|
||||
while pygame.mixer.music.get_busy():
|
||||
pygame.time.Clock().tick(10)
|
||||
except pygame.error as e:
|
||||
print(f"Couldn't play {file_path}: {e}")
|
||||
finally:
|
||||
# Stop the mixer module and free resources
|
||||
pygame.mixer.quit()
|
||||
|
||||
|
||||
while True:
|
||||
# Listen for user speech
|
||||
with sr.Microphone() as source:
|
||||
print("Listening...")
|
||||
audio = r.listen(source)
|
||||
|
||||
# Convert speech to text
|
||||
try:
|
||||
print("Recognizing...")
|
||||
task = r.recognize_google(audio)
|
||||
print(f"User said: {task}")
|
||||
except sr.UnknownValueError:
|
||||
print("Could not understand audio")
|
||||
continue
|
||||
except Exception as e:
|
||||
print(f"Error: {e}")
|
||||
continue
|
||||
|
||||
# Run the Gemini model on the task
|
||||
print("Running GPT4 model...")
|
||||
out = llm(task)
|
||||
print(f"Gemini output: {out}")
|
||||
|
||||
# Convert the Gemini output to speech
|
||||
print("Running text-to-speech model...")
|
||||
out = tts.run_and_save(out)
|
||||
print(f"Text-to-speech output: {out}")
|
||||
|
||||
# Ask the user if they want to play the audio
|
||||
# play_audio = input("Do you want to play the audio? (yes/no): ")
|
||||
# if play_audio.lower() == "yes":
|
||||
# Initialize the mixer module
|
||||
# Play the audio file
|
||||
|
||||
time.sleep(5)
|
||||
|
||||
playsound("runs/tts_speech.wav")
|
@ -0,0 +1,4 @@
|
||||
#!/bin/bash
|
||||
|
||||
# Find all __pycache__ directories and delete them
|
||||
find . -type d -name "__pycache__" -exec rm -rf {} +
|
@ -1,4 +1,11 @@
|
||||
from swarms.memory.base_vectordb import VectorDatabase
|
||||
from swarms.memory.short_term_memory import ShortTermMemory
|
||||
from swarms.memory.sqlite import SQLiteDB
|
||||
from swarms.memory.weaviate_db import WeaviateDB
|
||||
|
||||
__all__ = ["VectorDatabase", "ShortTermMemory"]
|
||||
__all__ = [
|
||||
"VectorDatabase",
|
||||
"ShortTermMemory",
|
||||
"SQLiteDB",
|
||||
"WeaviateDB",
|
||||
]
|
||||
|
@ -0,0 +1,159 @@
|
||||
from abc import ABC, abstractmethod
|
||||
|
||||
|
||||
class AbstractDatabase(ABC):
|
||||
"""
|
||||
Abstract base class for a database.
|
||||
|
||||
This class defines the interface for interacting with a database.
|
||||
Subclasses must implement the abstract methods to provide the
|
||||
specific implementation details for connecting to a database,
|
||||
executing queries, and performing CRUD operations.
|
||||
|
||||
"""
|
||||
|
||||
@abstractmethod
|
||||
def connect(self):
|
||||
"""
|
||||
Connect to the database.
|
||||
|
||||
This method establishes a connection to the database.
|
||||
|
||||
"""
|
||||
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def close(self):
|
||||
"""
|
||||
Close the database connection.
|
||||
|
||||
This method closes the connection to the database.
|
||||
|
||||
"""
|
||||
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def execute_query(self, query):
|
||||
"""
|
||||
Execute a database query.
|
||||
|
||||
This method executes the given query on the database.
|
||||
|
||||
Parameters:
|
||||
query (str): The query to be executed.
|
||||
|
||||
"""
|
||||
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def fetch_all(self):
|
||||
"""
|
||||
Fetch all rows from the result set.
|
||||
|
||||
This method retrieves all rows from the result set of a query.
|
||||
|
||||
Returns:
|
||||
list: A list of dictionaries representing the rows.
|
||||
|
||||
"""
|
||||
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def fetch_one(self):
|
||||
"""
|
||||
Fetch one row from the result set.
|
||||
|
||||
This method retrieves one row from the result set of a query.
|
||||
|
||||
Returns:
|
||||
dict: A dictionary representing the row.
|
||||
|
||||
"""
|
||||
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def add(self, table, data):
|
||||
"""
|
||||
Add a new record to the database.
|
||||
|
||||
This method adds a new record to the specified table in the database.
|
||||
|
||||
Parameters:
|
||||
table (str): The name of the table.
|
||||
data (dict): A dictionary representing the data to be added.
|
||||
|
||||
"""
|
||||
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def query(self, table, condition):
|
||||
"""
|
||||
Query the database.
|
||||
|
||||
This method queries the specified table in the database based on the given condition.
|
||||
|
||||
Parameters:
|
||||
table (str): The name of the table.
|
||||
condition (str): The condition to be applied in the query.
|
||||
|
||||
Returns:
|
||||
list: A list of dictionaries representing the query results.
|
||||
|
||||
"""
|
||||
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def get(self, table, id):
|
||||
"""
|
||||
Get a record from the database.
|
||||
|
||||
This method retrieves a record from the specified table in the database based on the given ID.
|
||||
|
||||
Parameters:
|
||||
table (str): The name of the table.
|
||||
id (int): The ID of the record to be retrieved.
|
||||
|
||||
Returns:
|
||||
dict: A dictionary representing the retrieved record.
|
||||
|
||||
"""
|
||||
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def update(self, table, id, data):
|
||||
"""
|
||||
Update a record in the database.
|
||||
|
||||
This method updates a record in the specified table in the database based on the given ID.
|
||||
|
||||
Parameters:
|
||||
table (str): The name of the table.
|
||||
id (int): The ID of the record to be updated.
|
||||
data (dict): A dictionary representing the updated data.
|
||||
|
||||
"""
|
||||
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def delete(self, table, id):
|
||||
"""
|
||||
Delete a record from the database.
|
||||
|
||||
This method deletes a record from the specified table in the database based on the given ID.
|
||||
|
||||
Parameters:
|
||||
table (str): The name of the table.
|
||||
id (int): The ID of the record to be deleted.
|
||||
|
||||
"""
|
||||
|
||||
pass
|
@ -1,302 +1,140 @@
|
||||
import subprocess
|
||||
import uuid
|
||||
from typing import Optional
|
||||
from attr import define, field, Factory
|
||||
from dataclasses import dataclass
|
||||
from swarms.memory.base import BaseVectorStore
|
||||
|
||||
try:
|
||||
from sqlalchemy.engine import Engine
|
||||
from sqlalchemy import create_engine, Column, String, JSON
|
||||
from sqlalchemy.ext.declarative import declarative_base
|
||||
from typing import Any, List, Optional
|
||||
from sqlalchemy import JSON, Column, String, create_engine
|
||||
from sqlalchemy.dialects.postgresql import UUID
|
||||
from sqlalchemy.ext.declarative import declarative_base
|
||||
from sqlalchemy.orm import Session
|
||||
except ImportError:
|
||||
print(
|
||||
"The PgVectorVectorStore requires sqlalchemy to be installed"
|
||||
)
|
||||
print("pip install sqlalchemy")
|
||||
subprocess.run(["pip", "install", "sqlalchemy"])
|
||||
|
||||
try:
|
||||
from pgvector.sqlalchemy import Vector
|
||||
except ImportError:
|
||||
print("The PgVectorVectorStore requires pgvector to be installed")
|
||||
print("pip install pgvector")
|
||||
subprocess.run(["pip", "install", "pgvector"])
|
||||
|
||||
class PostgresDB:
|
||||
"""
|
||||
A class representing a Postgres database.
|
||||
|
||||
@define
|
||||
class PgVectorVectorStore(BaseVectorStore):
|
||||
"""A vector store driver to Postgres using the PGVector extension.
|
||||
Args:
|
||||
connection_string (str): The connection string for the Postgres database.
|
||||
table_name (str): The name of the table in the database.
|
||||
|
||||
Attributes:
|
||||
connection_string: An optional string describing the target Postgres database instance.
|
||||
create_engine_params: Additional configuration params passed when creating the database connection.
|
||||
engine: An optional sqlalchemy Postgres engine to use.
|
||||
table_name: Optionally specify the name of the table to used to store vectors.
|
||||
|
||||
Methods:
|
||||
upsert_vector(vector: list[float], vector_id: Optional[str] = None, namespace: Optional[str] = None, meta: Optional[dict] = None, **kwargs) -> str:
|
||||
Upserts a vector into the index.
|
||||
load_entry(vector_id: str, namespace: Optional[str] = None) -> Optional[BaseVector.Entry]:
|
||||
Loads a single vector from the index.
|
||||
load_entries(namespace: Optional[str] = None) -> list[BaseVector.Entry]:
|
||||
Loads all vectors from the index.
|
||||
query(query: str, count: Optional[int] = None, namespace: Optional[str] = None, include_vectors: bool = False, include_metadata=True, **kwargs) -> list[BaseVector.QueryResult]:
|
||||
Queries the index for vectors similar to the given query string.
|
||||
setup(create_schema: bool = True, install_uuid_extension: bool = True, install_vector_extension: bool = True) -> None:
|
||||
Provides a mechanism to initialize the database schema and extensions.
|
||||
|
||||
Usage:
|
||||
>>> from swarms.memory.vector_stores.pgvector import PgVectorVectorStore
|
||||
>>> from swarms.utils.embeddings import USEEmbedding
|
||||
>>> from swarms.utils.hash import str_to_hash
|
||||
>>> from swarms.utils.dataframe import dataframe_to_hash
|
||||
>>> import pandas as pd
|
||||
>>>
|
||||
>>> # Create a new PgVectorVectorStore instance:
|
||||
>>> pv = PgVectorVectorStore(
|
||||
>>> connection_string="postgresql://postgres:password@localhost:5432/postgres",
|
||||
>>> table_name="your-table-name"
|
||||
>>> )
|
||||
>>> # Create a new index:
|
||||
>>> pv.setup()
|
||||
>>> # Create a new USEEmbedding instance:
|
||||
>>> use = USEEmbedding()
|
||||
>>> # Create a new dataframe:
|
||||
>>> df = pd.DataFrame({
|
||||
>>> "text": [
|
||||
>>> "This is a test",
|
||||
>>> "This is another test",
|
||||
>>> "This is a third test"
|
||||
>>> ]
|
||||
>>> })
|
||||
>>> # Embed the dataframe:
|
||||
>>> df["embedding"] = df["text"].apply(use.embed_string)
|
||||
>>> # Upsert the dataframe into the index:
|
||||
>>> pv.upsert_vector(
|
||||
>>> vector=df["embedding"].tolist(),
|
||||
>>> vector_id=dataframe_to_hash(df),
|
||||
>>> namespace="your-namespace"
|
||||
>>> )
|
||||
>>> # Query the index:
|
||||
>>> pv.query(
|
||||
>>> query="This is a test",
|
||||
>>> count=10,
|
||||
>>> namespace="your-namespace"
|
||||
>>> )
|
||||
>>> # Load a single entry from the index:
|
||||
>>> pv.load_entry(
|
||||
>>> vector_id=dataframe_to_hash(df),
|
||||
>>> namespace="your-namespace"
|
||||
>>> )
|
||||
>>> # Load all entries from the index:
|
||||
>>> pv.load_entries(
|
||||
>>> namespace="your-namespace"
|
||||
>>> )
|
||||
|
||||
engine: The SQLAlchemy engine for connecting to the database.
|
||||
table_name (str): The name of the table in the database.
|
||||
VectorModel: The SQLAlchemy model representing the vector table.
|
||||
|
||||
"""
|
||||
|
||||
connection_string: Optional[str] = field(
|
||||
default=None, kw_only=True
|
||||
)
|
||||
create_engine_params: dict = field(factory=dict, kw_only=True)
|
||||
engine: Optional[Engine] = field(default=None, kw_only=True)
|
||||
table_name: str = field(kw_only=True)
|
||||
_model: any = field(
|
||||
default=Factory(
|
||||
lambda self: self.default_vector_model(), takes_self=True
|
||||
)
|
||||
)
|
||||
|
||||
@connection_string.validator
|
||||
def validate_connection_string(
|
||||
self, _, connection_string: Optional[str]
|
||||
) -> None:
|
||||
# If an engine is provided, the connection string is not used.
|
||||
if self.engine is not None:
|
||||
return
|
||||
def __init__(
|
||||
self, connection_string: str, table_name: str, *args, **kwargs
|
||||
):
|
||||
"""
|
||||
Initializes a new instance of the PostgresDB class.
|
||||
|
||||
# If an engine is not provided, a connection string is required.
|
||||
if connection_string is None:
|
||||
raise ValueError(
|
||||
"An engine or connection string is required"
|
||||
)
|
||||
Args:
|
||||
connection_string (str): The connection string for the Postgres database.
|
||||
table_name (str): The name of the table in the database.
|
||||
|
||||
if not connection_string.startswith("postgresql://"):
|
||||
raise ValueError(
|
||||
"The connection string must describe a Postgres"
|
||||
" database connection"
|
||||
"""
|
||||
self.engine = create_engine(
|
||||
connection_string, *args, **kwargs
|
||||
)
|
||||
self.table_name = table_name
|
||||
self.VectorModel = self._create_vector_model()
|
||||
|
||||
@engine.validator
|
||||
def validate_engine(self, _, engine: Optional[Engine]) -> None:
|
||||
# If a connection string is provided, an engine does not need to be provided.
|
||||
if self.connection_string is not None:
|
||||
return
|
||||
def _create_vector_model(self):
|
||||
"""
|
||||
Creates the SQLAlchemy model for the vector table.
|
||||
|
||||
# If a connection string is not provided, an engine is required.
|
||||
if engine is None:
|
||||
raise ValueError(
|
||||
"An engine or connection string is required"
|
||||
)
|
||||
Returns:
|
||||
The SQLAlchemy model representing the vector table.
|
||||
|
||||
def __attrs_post_init__(self) -> None:
|
||||
"""If a an engine is provided, it will be used to connect to the database.
|
||||
If not, a connection string is used to create a new database connection here.
|
||||
"""
|
||||
if self.engine is None:
|
||||
self.engine = create_engine(
|
||||
self.connection_string, **self.create_engine_params
|
||||
)
|
||||
Base = declarative_base()
|
||||
|
||||
def setup(
|
||||
self,
|
||||
create_schema: bool = True,
|
||||
install_uuid_extension: bool = True,
|
||||
install_vector_extension: bool = True,
|
||||
) -> None:
|
||||
"""Provides a mechanism to initialize the database schema and extensions."""
|
||||
if install_uuid_extension:
|
||||
self.engine.execute(
|
||||
'CREATE EXTENSION IF NOT EXISTS "uuid-ossp";'
|
||||
)
|
||||
class VectorModel(Base):
|
||||
__tablename__ = self.table_name
|
||||
|
||||
if install_vector_extension:
|
||||
self.engine.execute(
|
||||
'CREATE EXTENSION IF NOT EXISTS "vector";'
|
||||
id = Column(
|
||||
UUID(as_uuid=True),
|
||||
primary_key=True,
|
||||
default=uuid.uuid4,
|
||||
unique=True,
|
||||
nullable=False,
|
||||
)
|
||||
vector = Column(
|
||||
String
|
||||
) # Assuming vector is stored as a string
|
||||
namespace = Column(String)
|
||||
meta = Column(JSON)
|
||||
|
||||
if create_schema:
|
||||
self._model.metadata.create_all(self.engine)
|
||||
return VectorModel
|
||||
|
||||
def upsert_vector(
|
||||
def add_or_update_vector(
|
||||
self,
|
||||
vector: list[float],
|
||||
vector: str,
|
||||
vector_id: Optional[str] = None,
|
||||
namespace: Optional[str] = None,
|
||||
meta: Optional[dict] = None,
|
||||
**kwargs,
|
||||
) -> str:
|
||||
"""Inserts or updates a vector in the collection."""
|
||||
) -> None:
|
||||
"""
|
||||
Adds or updates a vector in the database.
|
||||
|
||||
Args:
|
||||
vector (str): The vector to be added or updated.
|
||||
vector_id (str, optional): The ID of the vector. If not provided, a new ID will be generated.
|
||||
namespace (str, optional): The namespace of the vector.
|
||||
meta (dict, optional): Additional metadata associated with the vector.
|
||||
|
||||
"""
|
||||
try:
|
||||
with Session(self.engine) as session:
|
||||
obj = self._model(
|
||||
obj = self.VectorModel(
|
||||
id=vector_id,
|
||||
vector=vector,
|
||||
namespace=namespace,
|
||||
meta=meta,
|
||||
)
|
||||
|
||||
obj = session.merge(obj)
|
||||
session.merge(obj)
|
||||
session.commit()
|
||||
except Exception as e:
|
||||
print(f"Error adding or updating vector: {e}")
|
||||
|
||||
return str(obj.id)
|
||||
def query_vectors(
|
||||
self, query: Any, namespace: Optional[str] = None
|
||||
) -> List[Any]:
|
||||
"""
|
||||
Queries vectors from the database based on the given query and namespace.
|
||||
|
||||
def load_entry(
|
||||
self, vector_id: str, namespace: Optional[str] = None
|
||||
) -> BaseVectorStore.Entry:
|
||||
"""Retrieves a specific vector entry from the collection based on its identifier and optional namespace."""
|
||||
with Session(self.engine) as session:
|
||||
result = session.get(self._model, vector_id)
|
||||
Args:
|
||||
query (Any): The query or condition to filter the vectors.
|
||||
namespace (str, optional): The namespace of the vectors to be queried.
|
||||
|
||||
return BaseVectorStore.Entry(
|
||||
id=result.id,
|
||||
vector=result.vector,
|
||||
namespace=result.namespace,
|
||||
meta=result.meta,
|
||||
)
|
||||
Returns:
|
||||
List[Any]: A list of vectors that match the query and namespace.
|
||||
|
||||
def load_entries(
|
||||
self, namespace: Optional[str] = None
|
||||
) -> list[BaseVectorStore.Entry]:
|
||||
"""Retrieves all vector entries from the collection, optionally filtering to only
|
||||
those that match the provided namespace.
|
||||
"""
|
||||
try:
|
||||
with Session(self.engine) as session:
|
||||
query = session.query(self._model)
|
||||
q = session.query(self.VectorModel)
|
||||
if namespace:
|
||||
query = query.filter_by(namespace=namespace)
|
||||
|
||||
results = query.all()
|
||||
|
||||
return [
|
||||
BaseVectorStore.Entry(
|
||||
id=str(result.id),
|
||||
vector=result.vector,
|
||||
namespace=result.namespace,
|
||||
meta=result.meta,
|
||||
)
|
||||
for result in results
|
||||
]
|
||||
|
||||
def query(
|
||||
self,
|
||||
query: str,
|
||||
count: Optional[int] = BaseVectorStore.DEFAULT_QUERY_COUNT,
|
||||
namespace: Optional[str] = None,
|
||||
include_vectors: bool = False,
|
||||
distance_metric: str = "cosine_distance",
|
||||
**kwargs,
|
||||
) -> list[BaseVectorStore.QueryResult]:
|
||||
"""Performs a search on the collection to find vectors similar to the provided input vector,
|
||||
optionally filtering to only those that match the provided namespace.
|
||||
q = q.filter_by(namespace=namespace)
|
||||
# Assuming 'query' is a condition or filter
|
||||
q = q.filter(query)
|
||||
return q.all()
|
||||
except Exception as e:
|
||||
print(f"Error querying vectors: {e}")
|
||||
return []
|
||||
|
||||
def delete_vector(self, vector_id):
|
||||
"""
|
||||
distance_metrics = {
|
||||
"cosine_distance": self._model.vector.cosine_distance,
|
||||
"l2_distance": self._model.vector.l2_distance,
|
||||
"inner_product": self._model.vector.max_inner_product,
|
||||
}
|
||||
Deletes a vector from the database based on the given vector ID.
|
||||
|
||||
if distance_metric not in distance_metrics:
|
||||
raise ValueError("Invalid distance metric provided")
|
||||
|
||||
op = distance_metrics[distance_metric]
|
||||
Args:
|
||||
vector_id: The ID of the vector to be deleted.
|
||||
|
||||
"""
|
||||
try:
|
||||
with Session(self.engine) as session:
|
||||
vector = self.embedding_driver.embed_string(query)
|
||||
|
||||
# The query should return both the vector and the distance metric score.
|
||||
query = session.query(
|
||||
self._model,
|
||||
op(vector).label("score"),
|
||||
).order_by(op(vector))
|
||||
|
||||
if namespace:
|
||||
query = query.filter_by(namespace=namespace)
|
||||
|
||||
results = query.limit(count).all()
|
||||
|
||||
return [
|
||||
BaseVectorStore.QueryResult(
|
||||
id=str(result[0].id),
|
||||
vector=(
|
||||
result[0].vector if include_vectors else None
|
||||
),
|
||||
score=result[1],
|
||||
meta=result[0].meta,
|
||||
namespace=result[0].namespace,
|
||||
)
|
||||
for result in results
|
||||
]
|
||||
|
||||
def default_vector_model(self) -> any:
|
||||
Base = declarative_base()
|
||||
|
||||
@dataclass
|
||||
class VectorModel(Base):
|
||||
__tablename__ = self.table_name
|
||||
|
||||
id = Column(
|
||||
UUID(as_uuid=True),
|
||||
primary_key=True,
|
||||
default=uuid.uuid4,
|
||||
unique=True,
|
||||
nullable=False,
|
||||
)
|
||||
vector = Column(Vector())
|
||||
namespace = Column(String)
|
||||
meta = Column(JSON)
|
||||
|
||||
return VectorModel
|
||||
obj = session.get(self.VectorModel, vector_id)
|
||||
if obj:
|
||||
session.delete(obj)
|
||||
session.commit()
|
||||
except Exception as e:
|
||||
print(f"Error deleting vector: {e}")
|
||||
|
@ -0,0 +1,120 @@
|
||||
from typing import List, Tuple, Any, Optional
|
||||
from swarms.memory.base_vectordb import VectorDatabase
|
||||
|
||||
try:
|
||||
import sqlite3
|
||||
except ImportError:
|
||||
raise ImportError(
|
||||
"Please install sqlite3 to use the SQLiteDB class."
|
||||
)
|
||||
|
||||
|
||||
class SQLiteDB(VectorDatabase):
|
||||
"""
|
||||
A reusable class for SQLite database operations with methods for adding,
|
||||
deleting, updating, and querying data.
|
||||
|
||||
Attributes:
|
||||
db_path (str): The file path to the SQLite database.
|
||||
"""
|
||||
|
||||
def __init__(self, db_path: str):
|
||||
"""
|
||||
Initializes the SQLiteDB class with the given database path.
|
||||
|
||||
Args:
|
||||
db_path (str): The file path to the SQLite database.
|
||||
"""
|
||||
self.db_path = db_path
|
||||
|
||||
def execute_query(
|
||||
self, query: str, params: Optional[Tuple[Any, ...]] = None
|
||||
) -> List[Tuple]:
|
||||
"""
|
||||
Executes a SQL query and returns fetched results.
|
||||
|
||||
Args:
|
||||
query (str): The SQL query to execute.
|
||||
params (Tuple[Any, ...], optional): The parameters to substitute into the query.
|
||||
|
||||
Returns:
|
||||
List[Tuple]: The results fetched from the database.
|
||||
"""
|
||||
try:
|
||||
with sqlite3.connect(self.db_path) as conn:
|
||||
cursor = conn.cursor()
|
||||
cursor.execute(query, params or ())
|
||||
return cursor.fetchall()
|
||||
except Exception as error:
|
||||
print(f"Error executing query: {error}")
|
||||
raise error
|
||||
|
||||
def add(self, query: str, params: Tuple[Any, ...]) -> None:
|
||||
"""
|
||||
Adds a new entry to the database.
|
||||
|
||||
Args:
|
||||
query (str): The SQL query for insertion.
|
||||
params (Tuple[Any, ...]): The parameters to substitute into the query.
|
||||
"""
|
||||
try:
|
||||
with sqlite3.connect(self.db_path) as conn:
|
||||
cursor = conn.cursor()
|
||||
cursor.execute(query, params)
|
||||
conn.commit()
|
||||
except Exception as error:
|
||||
print(f"Error adding new entry: {error}")
|
||||
raise error
|
||||
|
||||
def delete(self, query: str, params: Tuple[Any, ...]) -> None:
|
||||
"""
|
||||
Deletes an entry from the database.
|
||||
|
||||
Args:
|
||||
query (str): The SQL query for deletion.
|
||||
params (Tuple[Any, ...]): The parameters to substitute into the query.
|
||||
"""
|
||||
try:
|
||||
with sqlite3.connect(self.db_path) as conn:
|
||||
cursor = conn.cursor()
|
||||
cursor.execute(query, params)
|
||||
conn.commit()
|
||||
except Exception as error:
|
||||
print(f"Error deleting entry: {error}")
|
||||
raise error
|
||||
|
||||
def update(self, query: str, params: Tuple[Any, ...]) -> None:
|
||||
"""
|
||||
Updates an entry in the database.
|
||||
|
||||
Args:
|
||||
query (str): The SQL query for updating.
|
||||
params (Tuple[Any, ...]): The parameters to substitute into the query.
|
||||
"""
|
||||
try:
|
||||
with sqlite3.connect(self.db_path) as conn:
|
||||
cursor = conn.cursor()
|
||||
cursor.execute(query, params)
|
||||
conn.commit()
|
||||
except Exception as error:
|
||||
print(f"Error updating entry: {error}")
|
||||
raise error
|
||||
|
||||
def query(
|
||||
self, query: str, params: Optional[Tuple[Any, ...]] = None
|
||||
) -> List[Tuple]:
|
||||
"""
|
||||
Fetches data from the database based on a query.
|
||||
|
||||
Args:
|
||||
query (str): The SQL query to execute.
|
||||
params (Tuple[Any, ...], optional): The parameters to substitute into the query.
|
||||
|
||||
Returns:
|
||||
List[Tuple]: The results fetched from the database.
|
||||
"""
|
||||
try:
|
||||
return self.execute_query(query, params)
|
||||
except Exception as error:
|
||||
print(f"Error querying database: {error}")
|
||||
raise error
|
@ -0,0 +1,182 @@
|
||||
"""
|
||||
Weaviate API Client
|
||||
"""
|
||||
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
from swarms.memory.base_vectordb import VectorDatabase
|
||||
|
||||
try:
|
||||
import weaviate
|
||||
except ImportError:
|
||||
print("pip install weaviate-client")
|
||||
|
||||
|
||||
class WeaviateDB(VectorDatabase):
|
||||
"""
|
||||
|
||||
Weaviate API Client
|
||||
Interface to Weaviate, a vector database with a GraphQL API.
|
||||
|
||||
Args:
|
||||
http_host (str): The HTTP host of the Weaviate server.
|
||||
http_port (str): The HTTP port of the Weaviate server.
|
||||
http_secure (bool): Whether to use HTTPS.
|
||||
grpc_host (Optional[str]): The gRPC host of the Weaviate server.
|
||||
grpc_port (Optional[str]): The gRPC port of the Weaviate server.
|
||||
grpc_secure (Optional[bool]): Whether to use gRPC over TLS.
|
||||
auth_client_secret (Optional[Any]): The authentication client secret.
|
||||
additional_headers (Optional[Dict[str, str]]): Additional headers to send with requests.
|
||||
additional_config (Optional[weaviate.AdditionalConfig]): Additional configuration for the client.
|
||||
|
||||
Methods:
|
||||
create_collection: Create a new collection in Weaviate.
|
||||
add: Add an object to a specified collection.
|
||||
query: Query objects from a specified collection.
|
||||
update: Update an object in a specified collection.
|
||||
delete: Delete an object from a specified collection.
|
||||
|
||||
Examples:
|
||||
>>> from swarms.memory import WeaviateDB
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
http_host: str,
|
||||
http_port: str,
|
||||
http_secure: bool,
|
||||
grpc_host: Optional[str] = None,
|
||||
grpc_port: Optional[str] = None,
|
||||
grpc_secure: Optional[bool] = None,
|
||||
auth_client_secret: Optional[Any] = None,
|
||||
additional_headers: Optional[Dict[str, str]] = None,
|
||||
additional_config: Optional[Any] = None,
|
||||
connection_params: Dict[str, Any] = None,
|
||||
*args,
|
||||
**kwargs,
|
||||
):
|
||||
super().__init__(*args, **kwargs)
|
||||
self.http_host = http_host
|
||||
self.http_port = http_port
|
||||
self.http_secure = http_secure
|
||||
self.grpc_host = grpc_host
|
||||
self.grpc_port = grpc_port
|
||||
self.grpc_secure = grpc_secure
|
||||
self.auth_client_secret = auth_client_secret
|
||||
self.additional_headers = additional_headers
|
||||
self.additional_config = additional_config
|
||||
self.connection_params = connection_params
|
||||
|
||||
# If connection_params are provided, use them to initialize the client.
|
||||
connection_params = weaviate.ConnectionParams.from_params(
|
||||
http_host=http_host,
|
||||
http_port=http_port,
|
||||
http_secure=http_secure,
|
||||
grpc_host=grpc_host,
|
||||
grpc_port=grpc_port,
|
||||
grpc_secure=grpc_secure,
|
||||
)
|
||||
|
||||
# If additional headers are provided, add them to the connection params.
|
||||
self.client = weaviate.WeaviateDB(
|
||||
connection_params=connection_params,
|
||||
auth_client_secret=auth_client_secret,
|
||||
additional_headers=additional_headers,
|
||||
additional_config=additional_config,
|
||||
)
|
||||
|
||||
def create_collection(
|
||||
self,
|
||||
name: str,
|
||||
properties: List[Dict[str, Any]],
|
||||
vectorizer_config: Any = None,
|
||||
):
|
||||
"""Create a new collection in Weaviate.
|
||||
|
||||
Args:
|
||||
name (str): _description_
|
||||
properties (List[Dict[str, Any]]): _description_
|
||||
vectorizer_config (Any, optional): _description_. Defaults to None.
|
||||
"""
|
||||
try:
|
||||
out = self.client.collections.create(
|
||||
name=name,
|
||||
vectorizer_config=vectorizer_config,
|
||||
properties=properties,
|
||||
)
|
||||
print(out)
|
||||
except Exception as error:
|
||||
print(f"Error creating collection: {error}")
|
||||
raise
|
||||
|
||||
def add(self, collection_name: str, properties: Dict[str, Any]):
|
||||
"""Add an object to a specified collection.
|
||||
|
||||
Args:
|
||||
collection_name (str): _description_
|
||||
properties (Dict[str, Any]): _description_
|
||||
|
||||
Returns:
|
||||
_type_: _description_
|
||||
"""
|
||||
try:
|
||||
collection = self.client.collections.get(collection_name)
|
||||
return collection.data.insert(properties)
|
||||
except Exception as error:
|
||||
print(f"Error adding object: {error}")
|
||||
raise
|
||||
|
||||
def query(
|
||||
self, collection_name: str, query: str, limit: int = 10
|
||||
):
|
||||
"""Query objects from a specified collection.
|
||||
|
||||
Args:
|
||||
collection_name (str): _description_
|
||||
query (str): _description_
|
||||
limit (int, optional): _description_. Defaults to 10.
|
||||
|
||||
Returns:
|
||||
_type_: _description_
|
||||
"""
|
||||
try:
|
||||
collection = self.client.collections.get(collection_name)
|
||||
response = collection.query.bm25(query=query, limit=limit)
|
||||
return [o.properties for o in response.objects]
|
||||
except Exception as error:
|
||||
print(f"Error querying objects: {error}")
|
||||
raise
|
||||
|
||||
def update(
|
||||
self,
|
||||
collection_name: str,
|
||||
object_id: str,
|
||||
properties: Dict[str, Any],
|
||||
):
|
||||
"""UPdate an object in a specified collection.
|
||||
|
||||
Args:
|
||||
collection_name (str): _description_
|
||||
object_id (str): _description_
|
||||
properties (Dict[str, Any]): _description_
|
||||
"""
|
||||
try:
|
||||
collection = self.client.collections.get(collection_name)
|
||||
collection.data.update(object_id, properties)
|
||||
except Exception as error:
|
||||
print(f"Error updating object: {error}")
|
||||
raise
|
||||
|
||||
def delete(self, collection_name: str, object_id: str):
|
||||
"""Delete an object from a specified collection.
|
||||
|
||||
Args:
|
||||
collection_name (str): _description_
|
||||
object_id (str): _description_
|
||||
"""
|
||||
try:
|
||||
collection = self.client.collections.get(collection_name)
|
||||
collection.data.delete_by_id(object_id)
|
||||
except Exception as error:
|
||||
print(f"Error deleting object: {error}")
|
||||
raise
|
@ -1,183 +0,0 @@
|
||||
"""
|
||||
|
||||
|
||||
BiomedCLIP-PubMedBERT_256-vit_base_patch16_224
|
||||
https://huggingface.co/microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224
|
||||
BiomedCLIP is a biomedical vision-language foundation model that is pretrained on PMC-15M,
|
||||
a dataset of 15 million figure-caption pairs extracted from biomedical research articles in PubMed Central, using contrastive learning. It uses PubMedBERT as the text encoder and Vision Transformer as the image encoder, with domain-specific adaptations. It can perform various vision-language processing (VLP) tasks such as cross-modal retrieval, image classification, and visual question answering. BiomedCLIP establishes new state of the art in a wide range of standard datasets, and substantially outperforms prior VLP approaches:
|
||||
|
||||
|
||||
|
||||
Citation
|
||||
@misc{https://doi.org/10.48550/arXiv.2303.00915,
|
||||
doi = {10.48550/ARXIV.2303.00915},
|
||||
url = {https://arxiv.org/abs/2303.00915},
|
||||
author = {Zhang, Sheng and Xu, Yanbo and Usuyama, Naoto and Bagga, Jaspreet and Tinn, Robert and Preston, Sam and Rao, Rajesh and Wei, Mu and Valluri, Naveen and Wong, Cliff and Lungren, Matthew and Naumann, Tristan and Poon, Hoifung},
|
||||
title = {Large-Scale Domain-Specific Pretraining for Biomedical Vision-Language Processing},
|
||||
publisher = {arXiv},
|
||||
year = {2023},
|
||||
}
|
||||
|
||||
Model Use
|
||||
How to use
|
||||
Please refer to this example notebook.
|
||||
|
||||
Intended Use
|
||||
This model is intended to be used solely for (I) future research on visual-language processing and (II) reproducibility of the experimental results reported in the reference paper.
|
||||
|
||||
Primary Intended Use
|
||||
The primary intended use is to support AI researchers building on top of this work. BiomedCLIP and its associated models should be helpful for exploring various biomedical VLP research questions, especially in the radiology domain.
|
||||
|
||||
Out-of-Scope Use
|
||||
Any deployed use case of the model --- commercial or otherwise --- is currently out of scope. Although we evaluated the models using a broad set of publicly-available research benchmarks, the models and evaluations are not intended for deployed use cases. Please refer to the associated paper for more details.
|
||||
|
||||
Data
|
||||
This model builds upon PMC-15M dataset, which is a large-scale parallel image-text dataset for biomedical vision-language processing. It contains 15 million figure-caption pairs extracted from biomedical research articles in PubMed Central. It covers a diverse range of biomedical image types, such as microscopy, radiography, histology, and more.
|
||||
|
||||
Limitations
|
||||
This model was developed using English corpora, and thus can be considered English-only.
|
||||
|
||||
Further information
|
||||
Please refer to the corresponding paper, "Large-Scale Domain-Specific Pretraining for Biomedical Vision-Language Processing" for additional details on the model training and evaluation.
|
||||
"""
|
||||
|
||||
import open_clip
|
||||
import torch
|
||||
from PIL import Image
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
|
||||
class BioClip:
|
||||
"""
|
||||
BioClip
|
||||
|
||||
Args:
|
||||
model_path (str): path to the model
|
||||
|
||||
Attributes:
|
||||
model_path (str): path to the model
|
||||
model (torch.nn.Module): the model
|
||||
preprocess_train (torchvision.transforms.Compose): the preprocessing pipeline for training
|
||||
preprocess_val (torchvision.transforms.Compose): the preprocessing pipeline for validation
|
||||
tokenizer (open_clip.Tokenizer): the tokenizer
|
||||
device (torch.device): the device to run the model on
|
||||
|
||||
Methods:
|
||||
__call__(self, img_path: str, labels: list, template: str = 'this is a photo of ', context_length: int = 256):
|
||||
returns a dictionary of labels and their probabilities
|
||||
plot_image_with_metadata(img_path: str, metadata: dict): plots the image with the metadata
|
||||
|
||||
Usage:
|
||||
clip = BioClip('hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224')
|
||||
|
||||
labels = [
|
||||
'adenocarcinoma histopathology',
|
||||
'brain MRI',
|
||||
'covid line chart',
|
||||
'squamous cell carcinoma histopathology',
|
||||
'immunohistochemistry histopathology',
|
||||
'bone X-ray',
|
||||
'chest X-ray',
|
||||
'pie chart',
|
||||
'hematoxylin and eosin histopathology'
|
||||
]
|
||||
|
||||
result = clip("your_image_path.jpg", labels)
|
||||
metadata = {'filename': "your_image_path.jpg".split('/')[-1], 'top_probs': result}
|
||||
clip.plot_image_with_metadata("your_image_path.jpg", metadata)
|
||||
|
||||
|
||||
"""
|
||||
|
||||
def __init__(self, model_path: str):
|
||||
self.model_path = model_path
|
||||
(
|
||||
self.model,
|
||||
self.preprocess_train,
|
||||
self.preprocess_val,
|
||||
) = open_clip.create_model_and_transforms(model_path)
|
||||
self.tokenizer = open_clip.get_tokenizer(model_path)
|
||||
self.device = (
|
||||
torch.device("cuda")
|
||||
if torch.cuda.is_available()
|
||||
else torch.device("cpu")
|
||||
)
|
||||
self.model.to(self.device)
|
||||
self.model.eval()
|
||||
|
||||
def __call__(
|
||||
self,
|
||||
img_path: str,
|
||||
labels: list,
|
||||
template: str = "this is a photo of ",
|
||||
context_length: int = 256,
|
||||
):
|
||||
image = torch.stack(
|
||||
[self.preprocess_val(Image.open(img_path))]
|
||||
).to(self.device)
|
||||
texts = self.tokenizer(
|
||||
[template + l for l in labels],
|
||||
context_length=context_length,
|
||||
).to(self.device)
|
||||
|
||||
with torch.no_grad():
|
||||
image_features, text_features, logit_scale = self.model(
|
||||
image, texts
|
||||
)
|
||||
logits = (
|
||||
(logit_scale * image_features @ text_features.t())
|
||||
.detach()
|
||||
.softmax(dim=-1)
|
||||
)
|
||||
sorted_indices = torch.argsort(
|
||||
logits, dim=-1, descending=True
|
||||
)
|
||||
logits = logits.cpu().numpy()
|
||||
sorted_indices = sorted_indices.cpu().numpy()
|
||||
|
||||
results = {}
|
||||
for idx in sorted_indices[0]:
|
||||
label = labels[idx]
|
||||
prob = logits[0][idx]
|
||||
results[label] = prob
|
||||
return results
|
||||
|
||||
@staticmethod
|
||||
def plot_image_with_metadata(img_path: str, metadata: dict):
|
||||
img = Image.open(img_path)
|
||||
fig, ax = plt.subplots(figsize=(5, 5))
|
||||
ax.imshow(img)
|
||||
ax.axis("off")
|
||||
title = (
|
||||
metadata["filename"]
|
||||
+ "\n"
|
||||
+ "\n".join(
|
||||
[
|
||||
f"{k}: {v*100:.1f}"
|
||||
for k, v in metadata["top_probs"].items()
|
||||
]
|
||||
)
|
||||
)
|
||||
ax.set_title(title, fontsize=14)
|
||||
plt.tight_layout()
|
||||
plt.show()
|
||||
|
||||
|
||||
# Usage
|
||||
# clip = BioClip('hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224')
|
||||
|
||||
# labels = [
|
||||
# 'adenocarcinoma histopathology',
|
||||
# 'brain MRI',
|
||||
# 'covid line chart',
|
||||
# 'squamous cell carcinoma histopathology',
|
||||
# 'immunohistochemistry histopathology',
|
||||
# 'bone X-ray',
|
||||
# 'chest X-ray',
|
||||
# 'pie chart',
|
||||
# 'hematoxylin and eosin histopathology'
|
||||
# ]
|
||||
|
||||
# result = clip("your_image_path.jpg", labels)
|
||||
# metadata = {'filename': "your_image_path.jpg".split('/')[-1], 'top_probs': result}
|
||||
# clip.plot_image_with_metadata("your_image_path.jpg", metadata)
|
@ -1,131 +0,0 @@
|
||||
from typing import List, Tuple
|
||||
|
||||
from PIL import Image
|
||||
from pydantic import BaseModel, model_validator, validator
|
||||
from transformers import AutoModelForVision2Seq, AutoProcessor
|
||||
|
||||
|
||||
# Assuming the Detections class represents the output of the model prediction
|
||||
class Detections(BaseModel):
|
||||
xyxy: List[Tuple[float, float, float, float]]
|
||||
class_id: List[int]
|
||||
confidence: List[float]
|
||||
|
||||
@model_validator
|
||||
def check_length(cls, values):
|
||||
assert (
|
||||
len(values.get("xyxy"))
|
||||
== len(values.get("class_id"))
|
||||
== len(values.get("confidence"))
|
||||
), "All fields must have the same length."
|
||||
return values
|
||||
|
||||
@validator(
|
||||
"xyxy", "class_id", "confidence", pre=True, each_item=True
|
||||
)
|
||||
def check_not_empty(cls, v):
|
||||
if isinstance(v, list) and len(v) == 0:
|
||||
raise ValueError("List must not be empty")
|
||||
return v
|
||||
|
||||
@classmethod
|
||||
def empty(cls):
|
||||
return cls(xyxy=[], class_id=[], confidence=[])
|
||||
|
||||
|
||||
class Kosmos2(BaseModel):
|
||||
"""
|
||||
Kosmos2
|
||||
|
||||
Args:
|
||||
------
|
||||
model: AutoModelForVision2Seq
|
||||
processor: AutoProcessor
|
||||
|
||||
Usage:
|
||||
------
|
||||
>>> from swarms import Kosmos2
|
||||
>>> from swarms.models.kosmos2 import Detections
|
||||
>>> from PIL import Image
|
||||
>>> model = Kosmos2.initialize()
|
||||
>>> image = Image.open("path_to_image.jpg")
|
||||
>>> detections = model(image)
|
||||
>>> print(detections)
|
||||
|
||||
"""
|
||||
|
||||
model: AutoModelForVision2Seq
|
||||
processor: AutoProcessor
|
||||
|
||||
@classmethod
|
||||
def initialize(cls):
|
||||
model = AutoModelForVision2Seq.from_pretrained(
|
||||
"ydshieh/kosmos-2-patch14-224", trust_remote_code=True
|
||||
)
|
||||
processor = AutoProcessor.from_pretrained(
|
||||
"ydshieh/kosmos-2-patch14-224", trust_remote_code=True
|
||||
)
|
||||
return cls(model=model, processor=processor)
|
||||
|
||||
def __call__(self, img: str) -> Detections:
|
||||
image = Image.open(img)
|
||||
prompt = "<grounding>An image of"
|
||||
|
||||
inputs = self.processor(
|
||||
text=prompt, images=image, return_tensors="pt"
|
||||
)
|
||||
outputs = self.model.generate(
|
||||
**inputs, use_cache=True, max_new_tokens=64
|
||||
)
|
||||
|
||||
generated_text = self.processor.batch_decode(
|
||||
outputs, skip_special_tokens=True
|
||||
)[0]
|
||||
|
||||
# The actual processing of generated_text to entities would go here
|
||||
# For the purpose of this example, assume a mock function 'extract_entities' exists:
|
||||
entities = self.extract_entities(generated_text)
|
||||
|
||||
# Convert entities to detections format
|
||||
detections = self.process_entities_to_detections(
|
||||
entities, image
|
||||
)
|
||||
return detections
|
||||
|
||||
def extract_entities(
|
||||
self, text: str
|
||||
) -> List[Tuple[str, Tuple[float, float, float, float]]]:
|
||||
# Placeholder function for entity extraction
|
||||
# This should be replaced with the actual method of extracting entities
|
||||
return []
|
||||
|
||||
def process_entities_to_detections(
|
||||
self,
|
||||
entities: List[Tuple[str, Tuple[float, float, float, float]]],
|
||||
image: Image.Image,
|
||||
) -> Detections:
|
||||
if not entities:
|
||||
return Detections.empty()
|
||||
|
||||
class_ids = [0] * len(
|
||||
entities
|
||||
) # Replace with actual class ID extraction logic
|
||||
xyxys = [
|
||||
(
|
||||
e[1][0] * image.width,
|
||||
e[1][1] * image.height,
|
||||
e[1][2] * image.width,
|
||||
e[1][3] * image.height,
|
||||
)
|
||||
for e in entities
|
||||
]
|
||||
confidences = [1.0] * len(entities) # Placeholder confidence
|
||||
|
||||
return Detections(
|
||||
xyxy=xyxys, class_id=class_ids, confidence=confidences
|
||||
)
|
||||
|
||||
|
||||
# Usage:
|
||||
# kosmos2 = Kosmos2.initialize()
|
||||
# detections = kosmos2(img="path_to_image.jpg")
|
@ -0,0 +1,73 @@
|
||||
from typing import Optional
|
||||
from transformers import AutoModelForCausalLM, AutoTokenizer
|
||||
from swarms.models.base_llm import AbstractLLM
|
||||
|
||||
|
||||
class Mixtral(AbstractLLM):
|
||||
"""Mixtral model.
|
||||
|
||||
Args:
|
||||
model_name (str): The name or path of the pre-trained Mixtral model.
|
||||
max_new_tokens (int): The maximum number of new tokens to generate.
|
||||
*args: Variable length argument list.
|
||||
|
||||
|
||||
Examples:
|
||||
>>> from swarms.models import Mixtral
|
||||
>>> mixtral = Mixtral()
|
||||
>>> mixtral.run("Test task")
|
||||
'Generated text'
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
model_name: str = "mistralai/Mixtral-8x7B-v0.1",
|
||||
max_new_tokens: int = 500,
|
||||
*args,
|
||||
**kwargs,
|
||||
):
|
||||
"""
|
||||
Initializes a Mixtral model.
|
||||
|
||||
Args:
|
||||
model_name (str): The name or path of the pre-trained Mixtral model.
|
||||
max_new_tokens (int): The maximum number of new tokens to generate.
|
||||
*args: Variable length argument list.
|
||||
**kwargs: Arbitrary keyword arguments.
|
||||
"""
|
||||
super().__init__(*args, **kwargs)
|
||||
self.model_name = model_name
|
||||
self.max_new_tokens = max_new_tokens
|
||||
self.tokenizer = AutoTokenizer.from_pretrained(model_name)
|
||||
self.model = AutoModelForCausalLM.from_pretrained(
|
||||
model_name, *args, **kwargs
|
||||
)
|
||||
|
||||
def run(self, task: Optional[str] = None, **kwargs):
|
||||
"""
|
||||
Generates text based on the given task.
|
||||
|
||||
Args:
|
||||
task (str, optional): The task or prompt for text generation.
|
||||
|
||||
Returns:
|
||||
str: The generated text.
|
||||
"""
|
||||
try:
|
||||
inputs = self.tokenizer(task, return_tensors="pt")
|
||||
|
||||
outputs = self.model.generate(
|
||||
**inputs,
|
||||
max_new_tokens=self.max_new_tokens,
|
||||
**kwargs,
|
||||
)
|
||||
|
||||
out = self.tokenizer.decode(
|
||||
outputs[0],
|
||||
skip_special_tokens=True,
|
||||
)
|
||||
|
||||
return out
|
||||
except Exception as error:
|
||||
print(f"There is an error: {error} in Mixtral model.")
|
||||
raise error
|
@ -1,74 +0,0 @@
|
||||
from typing import Dict, List, Optional
|
||||
from dataclass import dataclass
|
||||
|
||||
from swarms.models import OpenAI
|
||||
|
||||
|
||||
@dataclass
|
||||
class OpenAIAssistant:
|
||||
name: str = "OpenAI Assistant"
|
||||
instructions: str = None
|
||||
tools: List[Dict] = None
|
||||
model: str = None
|
||||
openai_api_key: str = None
|
||||
temperature: float = 0.5
|
||||
max_tokens: int = 100
|
||||
stop: List[str] = None
|
||||
echo: bool = False
|
||||
stream: bool = False
|
||||
log: bool = False
|
||||
presence: bool = False
|
||||
dashboard: bool = False
|
||||
debug: bool = False
|
||||
max_loops: int = 5
|
||||
stopping_condition: Optional[str] = None
|
||||
loop_interval: int = 1
|
||||
retry_attempts: int = 3
|
||||
retry_interval: int = 1
|
||||
interactive: bool = False
|
||||
dynamic_temperature: bool = False
|
||||
state: Dict = None
|
||||
response_filters: List = None
|
||||
response_filter: Dict = None
|
||||
response_filter_name: str = None
|
||||
response_filter_value: str = None
|
||||
response_filter_type: str = None
|
||||
response_filter_action: str = None
|
||||
response_filter_action_value: str = None
|
||||
response_filter_action_type: str = None
|
||||
response_filter_action_name: str = None
|
||||
client = OpenAI()
|
||||
role: str = "user"
|
||||
instructions: str = None
|
||||
|
||||
def create_assistant(self, task: str):
|
||||
assistant = self.client.create_assistant(
|
||||
name=self.name,
|
||||
instructions=self.instructions,
|
||||
tools=self.tools,
|
||||
model=self.model,
|
||||
)
|
||||
return assistant
|
||||
|
||||
def create_thread(self):
|
||||
thread = self.client.beta.threads.create()
|
||||
return thread
|
||||
|
||||
def add_message_to_thread(self, thread_id: str, message: str):
|
||||
message = self.client.beta.threads.add_message(
|
||||
thread_id=thread_id, role=self.user, content=message
|
||||
)
|
||||
return message
|
||||
|
||||
def run(self, task: str):
|
||||
run = self.client.beta.threads.runs.create(
|
||||
thread_id=self.create_thread().id,
|
||||
assistant_id=self.create_assistant().id,
|
||||
instructions=self.instructions,
|
||||
)
|
||||
|
||||
out = self.client.beta.threads.runs.retrieve(
|
||||
thread_id=run.thread_id, run_id=run.id
|
||||
)
|
||||
|
||||
return out
|
@ -1,23 +0,0 @@
|
||||
import os
|
||||
from openai import OpenAI
|
||||
|
||||
client = OpenAI()
|
||||
|
||||
|
||||
def get_ada_embeddings(
|
||||
text: str, model: str = "text-embedding-ada-002"
|
||||
):
|
||||
"""
|
||||
Simple function to get embeddings from ada
|
||||
|
||||
Usage:
|
||||
>>> get_ada_embeddings("Hello World")
|
||||
>>> get_ada_embeddings("Hello World", model="text-embedding-ada-001")
|
||||
|
||||
"""
|
||||
|
||||
text = text.replace("\n", " ")
|
||||
|
||||
return client.embeddings.create(input=[text], model=model)[
|
||||
"data"
|
||||
][0]["embedding"]
|
@ -1,138 +0,0 @@
|
||||
import os
|
||||
import subprocess
|
||||
|
||||
try:
|
||||
import whisperx
|
||||
from pydub import AudioSegment
|
||||
from pytube import YouTube
|
||||
except Exception as error:
|
||||
print("Error importing pytube. Please install pytube manually.")
|
||||
print("pip install pytube")
|
||||
print("pip install pydub")
|
||||
print("pip install whisperx")
|
||||
print(f"Pytube error: {error}")
|
||||
|
||||
|
||||
class WhisperX:
|
||||
def __init__(
|
||||
self,
|
||||
video_url,
|
||||
audio_format="mp3",
|
||||
device="cuda",
|
||||
batch_size=16,
|
||||
compute_type="float16",
|
||||
hf_api_key=None,
|
||||
):
|
||||
"""
|
||||
# Example usage
|
||||
video_url = "url"
|
||||
speech_to_text = WhisperX(video_url)
|
||||
transcription = speech_to_text.transcribe_youtube_video()
|
||||
print(transcription)
|
||||
|
||||
"""
|
||||
self.video_url = video_url
|
||||
self.audio_format = audio_format
|
||||
self.device = device
|
||||
self.batch_size = batch_size
|
||||
self.compute_type = compute_type
|
||||
self.hf_api_key = hf_api_key
|
||||
|
||||
def install(self):
|
||||
subprocess.run(["pip", "install", "whisperx"])
|
||||
subprocess.run(["pip", "install", "pytube"])
|
||||
subprocess.run(["pip", "install", "pydub"])
|
||||
|
||||
def download_youtube_video(self):
|
||||
audio_file = f"video.{self.audio_format}"
|
||||
|
||||
# Download video 📥
|
||||
yt = YouTube(self.video_url)
|
||||
yt_stream = yt.streams.filter(only_audio=True).first()
|
||||
yt_stream.download(filename="video.mp4")
|
||||
|
||||
# Convert video to audio 🎧
|
||||
video = AudioSegment.from_file("video.mp4", format="mp4")
|
||||
video.export(audio_file, format=self.audio_format)
|
||||
os.remove("video.mp4")
|
||||
|
||||
return audio_file
|
||||
|
||||
def transcribe_youtube_video(self):
|
||||
audio_file = self.download_youtube_video()
|
||||
|
||||
device = "cuda"
|
||||
batch_size = 16
|
||||
compute_type = "float16"
|
||||
|
||||
# 1. Transcribe with original Whisper (batched) 🗣️
|
||||
model = whisperx.load_model(
|
||||
"large-v2", device, compute_type=compute_type
|
||||
)
|
||||
audio = whisperx.load_audio(audio_file)
|
||||
result = model.transcribe(audio, batch_size=batch_size)
|
||||
|
||||
# 2. Align Whisper output 🔍
|
||||
model_a, metadata = whisperx.load_align_model(
|
||||
language_code=result["language"], device=device
|
||||
)
|
||||
result = whisperx.align(
|
||||
result["segments"],
|
||||
model_a,
|
||||
metadata,
|
||||
audio,
|
||||
device,
|
||||
return_char_alignments=False,
|
||||
)
|
||||
|
||||
# 3. Assign speaker labels 🏷️
|
||||
diarize_model = whisperx.DiarizationPipeline(
|
||||
use_auth_token=self.hf_api_key, device=device
|
||||
)
|
||||
diarize_model(audio_file)
|
||||
|
||||
try:
|
||||
segments = result["segments"]
|
||||
transcription = " ".join(
|
||||
segment["text"] for segment in segments
|
||||
)
|
||||
return transcription
|
||||
except KeyError:
|
||||
print("The key 'segments' is not found in the result.")
|
||||
|
||||
def transcribe(self, audio_file):
|
||||
model = whisperx.load_model(
|
||||
"large-v2", self.device, self.compute_type
|
||||
)
|
||||
audio = whisperx.load_audio(audio_file)
|
||||
result = model.transcribe(audio, batch_size=self.batch_size)
|
||||
|
||||
# 2. Align Whisper output 🔍
|
||||
model_a, metadata = whisperx.load_align_model(
|
||||
language_code=result["language"], device=self.device
|
||||
)
|
||||
|
||||
result = whisperx.align(
|
||||
result["segments"],
|
||||
model_a,
|
||||
metadata,
|
||||
audio,
|
||||
self.device,
|
||||
return_char_alignments=False,
|
||||
)
|
||||
|
||||
# 3. Assign speaker labels 🏷️
|
||||
diarize_model = whisperx.DiarizationPipeline(
|
||||
use_auth_token=self.hf_api_key, device=self.device
|
||||
)
|
||||
|
||||
diarize_model(audio_file)
|
||||
|
||||
try:
|
||||
segments = result["segments"]
|
||||
transcription = " ".join(
|
||||
segment["text"] for segment in segments
|
||||
)
|
||||
return transcription
|
||||
except KeyError:
|
||||
print("The key 'segments' is not found in the result.")
|
@ -1,5 +1,21 @@
|
||||
from swarms.structs.agent import Agent
|
||||
from swarms.structs.sequential_workflow import SequentialWorkflow
|
||||
from swarms.structs.autoscaler import AutoScaler
|
||||
from swarms.structs.conversation import Conversation
|
||||
from swarms.structs.schemas import (
|
||||
TaskInput,
|
||||
Artifact,
|
||||
ArtifactUpload,
|
||||
StepInput,
|
||||
)
|
||||
|
||||
__all__ = ["Agent", "SequentialWorkflow", "AutoScaler"]
|
||||
__all__ = [
|
||||
"Agent",
|
||||
"SequentialWorkflow",
|
||||
"AutoScaler",
|
||||
"Conversation",
|
||||
"TaskInput",
|
||||
"Artifact",
|
||||
"ArtifactUpload",
|
||||
"StepInput",
|
||||
]
|
||||
|
@ -0,0 +1,309 @@
|
||||
import datetime
|
||||
import json
|
||||
|
||||
from termcolor import colored
|
||||
|
||||
from swarms.memory.base_db import AbstractDatabase
|
||||
from swarms.structs.base import BaseStructure
|
||||
|
||||
|
||||
class Conversation(BaseStructure):
|
||||
"""
|
||||
Conversation class
|
||||
|
||||
|
||||
Attributes:
|
||||
time_enabled (bool): whether to enable time
|
||||
conversation_history (list): list of messages in the conversation
|
||||
|
||||
|
||||
Examples:
|
||||
>>> conv = Conversation()
|
||||
>>> conv.add("user", "Hello, world!")
|
||||
>>> conv.add("assistant", "Hello, user!")
|
||||
>>> conv.display_conversation()
|
||||
user: Hello, world!
|
||||
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
time_enabled: bool = False,
|
||||
database: AbstractDatabase = None,
|
||||
autosave: bool = True,
|
||||
save_filepath: str = "/runs/conversation.json",
|
||||
*args,
|
||||
**kwargs,
|
||||
):
|
||||
super().__init__()
|
||||
self.time_enabled = time_enabled
|
||||
self.database = database
|
||||
self.autosave = autosave
|
||||
self.save_filepath = save_filepath
|
||||
self.conversation_history = []
|
||||
|
||||
def add(self, role: str, content: str, *args, **kwargs):
|
||||
"""Add a message to the conversation history
|
||||
|
||||
Args:
|
||||
role (str): The role of the speaker
|
||||
content (str): The content of the message
|
||||
|
||||
"""
|
||||
if self.time_enabled:
|
||||
now = datetime.datetime.now()
|
||||
timestamp = now.strftime("%Y-%m-%d %H:%M:%S")
|
||||
message = {
|
||||
"role": role,
|
||||
"content": content,
|
||||
"timestamp": timestamp,
|
||||
}
|
||||
else:
|
||||
message = {
|
||||
"role": role,
|
||||
"content": content,
|
||||
}
|
||||
|
||||
self.conversation_history.append(message)
|
||||
|
||||
if self.autosave:
|
||||
self.save_as_json(self.save_filepath)
|
||||
|
||||
def delete(self, index: str):
|
||||
"""Delete a message from the conversation history
|
||||
|
||||
Args:
|
||||
index (str): index of the message to delete
|
||||
"""
|
||||
self.conversation_history.pop(index)
|
||||
|
||||
def update(self, index: str, role, content):
|
||||
"""Update a message in the conversation history
|
||||
|
||||
Args:
|
||||
index (str): index of the message to update
|
||||
role (_type_): role of the speaker
|
||||
content (_type_): content of the message
|
||||
"""
|
||||
self.conversation_history[index] = {
|
||||
"role": role,
|
||||
"content": content,
|
||||
}
|
||||
|
||||
def query(self, index: str):
|
||||
"""Query a message in the conversation history
|
||||
|
||||
Args:
|
||||
index (str): index of the message to query
|
||||
|
||||
Returns:
|
||||
str: the message
|
||||
"""
|
||||
return self.conversation_history[index]
|
||||
|
||||
def search(self, keyword: str):
|
||||
"""Search for a message in the conversation history
|
||||
|
||||
Args:
|
||||
keyword (str): Keyword to search for
|
||||
|
||||
Returns:
|
||||
str: description
|
||||
"""
|
||||
return [
|
||||
msg
|
||||
for msg in self.conversation_history
|
||||
if keyword in msg["content"]
|
||||
]
|
||||
|
||||
def display_conversation(self, detailed: bool = False):
|
||||
"""Display the conversation history
|
||||
|
||||
Args:
|
||||
detailed (bool, optional): detailed. Defaults to False.
|
||||
"""
|
||||
role_to_color = {
|
||||
"system": "red",
|
||||
"user": "green",
|
||||
"assistant": "blue",
|
||||
"function": "magenta",
|
||||
}
|
||||
for message in self.conversation_history:
|
||||
print(
|
||||
colored(
|
||||
f"{message['role']}: {message['content']}\n\n",
|
||||
role_to_color[message["role"]],
|
||||
)
|
||||
)
|
||||
|
||||
def export_conversation(self, filename: str, *args, **kwargs):
|
||||
"""Export the conversation history to a file
|
||||
|
||||
Args:
|
||||
filename (str): filename to export to
|
||||
"""
|
||||
with open(filename, "w") as f:
|
||||
for message in self.conversation_history:
|
||||
f.write(f"{message['role']}: {message['content']}\n")
|
||||
|
||||
def import_conversation(self, filename: str):
|
||||
"""Import a conversation history from a file
|
||||
|
||||
Args:
|
||||
filename (str): filename to import from
|
||||
"""
|
||||
with open(filename, "r") as f:
|
||||
for line in f:
|
||||
role, content = line.split(": ", 1)
|
||||
self.add(role, content.strip())
|
||||
|
||||
def count_messages_by_role(self):
|
||||
"""Count the number of messages by role"""
|
||||
counts = {
|
||||
"system": 0,
|
||||
"user": 0,
|
||||
"assistant": 0,
|
||||
"function": 0,
|
||||
}
|
||||
for message in self.conversation_history:
|
||||
counts[message["role"]] += 1
|
||||
return counts
|
||||
|
||||
def return_history_as_string(self):
|
||||
"""Return the conversation history as a string
|
||||
|
||||
Returns:
|
||||
str: the conversation history
|
||||
"""
|
||||
return "\n".join(
|
||||
[
|
||||
f"{message['role']}: {message['content']}\n\n"
|
||||
for message in self.conversation_history
|
||||
]
|
||||
)
|
||||
|
||||
def save_as_json(self, filename: str):
|
||||
"""Save the conversation history as a JSON file
|
||||
|
||||
Args:
|
||||
filename (str): Save the conversation history as a JSON file
|
||||
"""
|
||||
# Save the conversation history as a JSON file
|
||||
with open(filename, "w") as f:
|
||||
json.dump(self.conversation_history, f)
|
||||
|
||||
def load_from_json(self, filename: str):
|
||||
"""Load the conversation history from a JSON file
|
||||
|
||||
Args:
|
||||
filename (str): filename to load from
|
||||
"""
|
||||
# Load the conversation history from a JSON file
|
||||
with open(filename, "r") as f:
|
||||
self.conversation_history = json.load(f)
|
||||
|
||||
def search_keyword_in_conversation(self, keyword: str):
|
||||
"""Search for a keyword in the conversation history
|
||||
|
||||
Args:
|
||||
keyword (str): keyword to search for
|
||||
|
||||
Returns:
|
||||
str: description
|
||||
"""
|
||||
return [
|
||||
msg
|
||||
for msg in self.conversation_history
|
||||
if keyword in msg["content"]
|
||||
]
|
||||
|
||||
def pretty_print_conversation(self, messages):
|
||||
"""Pretty print the conversation history
|
||||
|
||||
Args:
|
||||
messages (str): messages to print
|
||||
"""
|
||||
role_to_color = {
|
||||
"system": "red",
|
||||
"user": "green",
|
||||
"assistant": "blue",
|
||||
"tool": "magenta",
|
||||
}
|
||||
|
||||
for message in messages:
|
||||
if message["role"] == "system":
|
||||
print(
|
||||
colored(
|
||||
f"system: {message['content']}\n",
|
||||
role_to_color[message["role"]],
|
||||
)
|
||||
)
|
||||
elif message["role"] == "user":
|
||||
print(
|
||||
colored(
|
||||
f"user: {message['content']}\n",
|
||||
role_to_color[message["role"]],
|
||||
)
|
||||
)
|
||||
elif message["role"] == "assistant" and message.get(
|
||||
"function_call"
|
||||
):
|
||||
print(
|
||||
colored(
|
||||
f"assistant: {message['function_call']}\n",
|
||||
role_to_color[message["role"]],
|
||||
)
|
||||
)
|
||||
elif message["role"] == "assistant" and not message.get(
|
||||
"function_call"
|
||||
):
|
||||
print(
|
||||
colored(
|
||||
f"assistant: {message['content']}\n",
|
||||
role_to_color[message["role"]],
|
||||
)
|
||||
)
|
||||
elif message["role"] == "tool":
|
||||
print(
|
||||
colored(
|
||||
(
|
||||
f"function ({message['name']}):"
|
||||
f" {message['content']}\n"
|
||||
),
|
||||
role_to_color[message["role"]],
|
||||
)
|
||||
)
|
||||
|
||||
def add_to_database(self, *args, **kwargs):
|
||||
"""Add the conversation history to the database"""
|
||||
self.database.add("conversation", self.conversation_history)
|
||||
|
||||
def query_from_database(self, query, *args, **kwargs):
|
||||
"""Query the conversation history from the database"""
|
||||
return self.database.query("conversation", query)
|
||||
|
||||
def delete_from_database(self, *args, **kwargs):
|
||||
"""Delete the conversation history from the database"""
|
||||
self.database.delete("conversation")
|
||||
|
||||
def update_from_database(self, *args, **kwargs):
|
||||
"""Update the conversation history from the database"""
|
||||
self.database.update(
|
||||
"conversation", self.conversation_history
|
||||
)
|
||||
|
||||
def get_from_database(self, *args, **kwargs):
|
||||
"""Get the conversation history from the database"""
|
||||
return self.database.get("conversation")
|
||||
|
||||
def execute_query_from_database(self, query, *args, **kwargs):
|
||||
"""Execute a query on the database"""
|
||||
return self.database.execute_query(query)
|
||||
|
||||
def fetch_all_from_database(self, *args, **kwargs):
|
||||
"""Fetch all from the database"""
|
||||
return self.database.fetch_all()
|
||||
|
||||
def fetch_one_from_database(self, *args, **kwargs):
|
||||
"""Fetch one from the database"""
|
||||
return self.database.fetch_one()
|
@ -0,0 +1,36 @@
|
||||
import inspect
|
||||
|
||||
|
||||
def print_class_parameters(cls, api_format: bool = False):
|
||||
"""
|
||||
Print the parameters of a class constructor.
|
||||
|
||||
Parameters:
|
||||
cls (type): The class to inspect.
|
||||
|
||||
Example:
|
||||
>>> print_class_parameters(Agent)
|
||||
Parameter: x, Type: <class 'int'>
|
||||
Parameter: y, Type: <class 'int'>
|
||||
"""
|
||||
try:
|
||||
# Get the parameters of the class constructor
|
||||
sig = inspect.signature(cls.__init__)
|
||||
params = sig.parameters
|
||||
|
||||
if api_format:
|
||||
param_dict = {}
|
||||
for name, param in params.items():
|
||||
if name == "self":
|
||||
continue
|
||||
param_dict[name] = str(param.annotation)
|
||||
return param_dict
|
||||
|
||||
# Print the parameters
|
||||
for name, param in params.items():
|
||||
if name == "self":
|
||||
continue
|
||||
print(f"Parameter: {name}, Type: {param.annotation}")
|
||||
|
||||
except Exception as e:
|
||||
print(f"An error occurred while inspecting the class: {e}")
|
@ -0,0 +1,70 @@
|
||||
import torch
|
||||
import logging
|
||||
from typing import Union, List, Any
|
||||
from torch.cuda import memory_allocated, memory_reserved
|
||||
|
||||
|
||||
def check_device(
|
||||
log_level: Any = logging.INFO,
|
||||
memory_threshold: float = 0.8,
|
||||
capability_threshold: float = 3.5,
|
||||
return_type: str = "list",
|
||||
) -> Union[torch.device, List[torch.device]]:
|
||||
"""
|
||||
Checks for the availability of CUDA and returns the appropriate device(s).
|
||||
If CUDA is not available, returns a CPU device.
|
||||
If CUDA is available, returns a list of all available GPU devices.
|
||||
"""
|
||||
logging.basicConfig(level=log_level)
|
||||
|
||||
# Check for CUDA availability
|
||||
try:
|
||||
if not torch.cuda.is_available():
|
||||
logging.info("CUDA is not available. Using CPU...")
|
||||
return torch.device("cpu")
|
||||
except Exception as e:
|
||||
logging.error("Error checking for CUDA availability: ", e)
|
||||
return torch.device("cpu")
|
||||
|
||||
logging.info("CUDA is available.")
|
||||
|
||||
# Check for multiple GPUs
|
||||
num_gpus = torch.cuda.device_count()
|
||||
devices = []
|
||||
if num_gpus > 1:
|
||||
logging.info(f"Multiple GPUs available: {num_gpus}")
|
||||
devices = [torch.device(f"cuda:{i}") for i in range(num_gpus)]
|
||||
else:
|
||||
logging.info("Only one GPU is available.")
|
||||
devices = [torch.device("cuda")]
|
||||
|
||||
# Check additional properties for each device
|
||||
for device in devices:
|
||||
try:
|
||||
torch.cuda.set_device(device)
|
||||
capability = torch.cuda.get_device_capability(device)
|
||||
total_memory = torch.cuda.get_device_properties(
|
||||
device
|
||||
).total_memory
|
||||
allocated_memory = memory_allocated(device)
|
||||
reserved_memory = memory_reserved(device)
|
||||
device_name = torch.cuda.get_device_name(device)
|
||||
|
||||
logging.info(
|
||||
f"Device: {device}, Name: {device_name}, Compute"
|
||||
f" Capability: {capability}, Total Memory:"
|
||||
f" {total_memory}, Allocated Memory:"
|
||||
f" {allocated_memory}, Reserved Memory:"
|
||||
f" {reserved_memory}"
|
||||
)
|
||||
except Exception as e:
|
||||
logging.error(
|
||||
f"Error retrieving properties for device {device}: ",
|
||||
e,
|
||||
)
|
||||
|
||||
return devices
|
||||
|
||||
|
||||
# devices = check_device()
|
||||
# logging.info(f"Using device(s): {devices}")
|
@ -0,0 +1,39 @@
|
||||
import time
|
||||
from functools import wraps
|
||||
from typing import Callable
|
||||
|
||||
|
||||
def metrics_decorator(func: Callable):
|
||||
"""Metrics decorator for LLM
|
||||
|
||||
Args:
|
||||
func (Callable): The function to decorate
|
||||
|
||||
Example:
|
||||
>>> @metrics_decorator
|
||||
>>> def my_function():
|
||||
>>> return "Hello, world!"
|
||||
>>> my_function()
|
||||
|
||||
"""
|
||||
|
||||
@wraps(func)
|
||||
def wrapper(self, *args, **kwargs):
|
||||
# Time to First Token
|
||||
start_time = time.time()
|
||||
result = func(self, *args, **kwargs)
|
||||
first_token_time = time.time()
|
||||
|
||||
# Generation Latency
|
||||
end_time = time.time()
|
||||
|
||||
# Throughput (assuming the function returns a list of tokens)
|
||||
throughput = len(result) / (end_time - start_time)
|
||||
|
||||
return f"""
|
||||
Time to First Token: {first_token_time - start_time}
|
||||
Generation Latency: {end_time - start_time}
|
||||
Throughput: {throughput}
|
||||
"""
|
||||
|
||||
return wrapper
|
@ -0,0 +1,57 @@
|
||||
import torch
|
||||
from torch import nn
|
||||
|
||||
|
||||
def load_model_torch(
|
||||
model_path: str = None,
|
||||
device: torch.device = None,
|
||||
model: nn.Module = None,
|
||||
strict: bool = True,
|
||||
map_location=None,
|
||||
*args,
|
||||
**kwargs,
|
||||
) -> nn.Module:
|
||||
"""
|
||||
Load a PyTorch model from a given path and move it to the specified device.
|
||||
|
||||
Args:
|
||||
model_path (str): Path to the saved model file.
|
||||
device (torch.device): Device to move the model to.
|
||||
model (nn.Module): The model architecture, if the model file only contains the state dictionary.
|
||||
strict (bool): Whether to strictly enforce that the keys in the state dictionary match the keys returned by the model's `state_dict()` function.
|
||||
map_location (callable): A function to remap the storage locations of the loaded model.
|
||||
*args: Additional arguments to pass to `torch.load`.
|
||||
**kwargs: Additional keyword arguments to pass to `torch.load`.
|
||||
|
||||
Returns:
|
||||
nn.Module: The loaded model.
|
||||
|
||||
Raises:
|
||||
FileNotFoundError: If the model file is not found.
|
||||
RuntimeError: If there is an error while loading the model.
|
||||
"""
|
||||
if device is None:
|
||||
device = torch.device(
|
||||
"cuda" if torch.cuda.is_available() else "cpu"
|
||||
)
|
||||
|
||||
try:
|
||||
if model is None:
|
||||
model = torch.load(
|
||||
model_path, map_location=map_location, *args, **kwargs
|
||||
)
|
||||
else:
|
||||
model.load_state_dict(
|
||||
torch.load(
|
||||
model_path,
|
||||
map_location=map_location,
|
||||
*args,
|
||||
**kwargs,
|
||||
),
|
||||
strict=strict,
|
||||
)
|
||||
return model.to(device)
|
||||
except FileNotFoundError:
|
||||
raise FileNotFoundError(f"Model file not found: {model_path}")
|
||||
except RuntimeError as e:
|
||||
raise RuntimeError(f"Error loading model: {str(e)}")
|
@ -0,0 +1,30 @@
|
||||
import torch
|
||||
from swarms.utils.load_model_torch import load_model_torch
|
||||
|
||||
|
||||
def prep_torch_inference(
|
||||
model_path: str = None,
|
||||
device: torch.device = None,
|
||||
*args,
|
||||
**kwargs,
|
||||
):
|
||||
"""
|
||||
Prepare a Torch model for inference.
|
||||
|
||||
Args:
|
||||
model_path (str): Path to the model file.
|
||||
device (torch.device): Device to run the model on.
|
||||
*args: Additional positional arguments.
|
||||
**kwargs: Additional keyword arguments.
|
||||
|
||||
Returns:
|
||||
torch.nn.Module: The prepared model.
|
||||
"""
|
||||
try:
|
||||
model = load_model_torch(model_path, device)
|
||||
model.eval()
|
||||
return model
|
||||
except Exception as e:
|
||||
# Add error handling code here
|
||||
print(f"Error occurred while preparing Torch model: {e}")
|
||||
return None
|
@ -0,0 +1,104 @@
|
||||
import pytest
|
||||
import sqlite3
|
||||
from swarms.memory.sqlite import SQLiteDB
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def db():
|
||||
conn = sqlite3.connect(":memory:")
|
||||
conn.execute(
|
||||
"CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT)"
|
||||
)
|
||||
conn.commit()
|
||||
return SQLiteDB(":memory:")
|
||||
|
||||
|
||||
def test_add(db):
|
||||
db.add("INSERT INTO test (name) VALUES (?)", ("test",))
|
||||
result = db.query("SELECT * FROM test")
|
||||
assert result == [(1, "test")]
|
||||
|
||||
|
||||
def test_delete(db):
|
||||
db.add("INSERT INTO test (name) VALUES (?)", ("test",))
|
||||
db.delete("DELETE FROM test WHERE name = ?", ("test",))
|
||||
result = db.query("SELECT * FROM test")
|
||||
assert result == []
|
||||
|
||||
|
||||
def test_update(db):
|
||||
db.add("INSERT INTO test (name) VALUES (?)", ("test",))
|
||||
db.update(
|
||||
"UPDATE test SET name = ? WHERE name = ?", ("new", "test")
|
||||
)
|
||||
result = db.query("SELECT * FROM test")
|
||||
assert result == [(1, "new")]
|
||||
|
||||
|
||||
def test_query(db):
|
||||
db.add("INSERT INTO test (name) VALUES (?)", ("test",))
|
||||
result = db.query("SELECT * FROM test WHERE name = ?", ("test",))
|
||||
assert result == [(1, "test")]
|
||||
|
||||
|
||||
def test_execute_query(db):
|
||||
db.add("INSERT INTO test (name) VALUES (?)", ("test",))
|
||||
result = db.execute_query(
|
||||
"SELECT * FROM test WHERE name = ?", ("test",)
|
||||
)
|
||||
assert result == [(1, "test")]
|
||||
|
||||
|
||||
def test_add_without_params(db):
|
||||
with pytest.raises(sqlite3.ProgrammingError):
|
||||
db.add("INSERT INTO test (name) VALUES (?)")
|
||||
|
||||
|
||||
def test_delete_without_params(db):
|
||||
with pytest.raises(sqlite3.ProgrammingError):
|
||||
db.delete("DELETE FROM test WHERE name = ?")
|
||||
|
||||
|
||||
def test_update_without_params(db):
|
||||
with pytest.raises(sqlite3.ProgrammingError):
|
||||
db.update("UPDATE test SET name = ? WHERE name = ?")
|
||||
|
||||
|
||||
def test_query_without_params(db):
|
||||
with pytest.raises(sqlite3.ProgrammingError):
|
||||
db.query("SELECT * FROM test WHERE name = ?")
|
||||
|
||||
|
||||
def test_execute_query_without_params(db):
|
||||
with pytest.raises(sqlite3.ProgrammingError):
|
||||
db.execute_query("SELECT * FROM test WHERE name = ?")
|
||||
|
||||
|
||||
def test_add_with_wrong_query(db):
|
||||
with pytest.raises(sqlite3.OperationalError):
|
||||
db.add("INSERT INTO wrong (name) VALUES (?)", ("test",))
|
||||
|
||||
|
||||
def test_delete_with_wrong_query(db):
|
||||
with pytest.raises(sqlite3.OperationalError):
|
||||
db.delete("DELETE FROM wrong WHERE name = ?", ("test",))
|
||||
|
||||
|
||||
def test_update_with_wrong_query(db):
|
||||
with pytest.raises(sqlite3.OperationalError):
|
||||
db.update(
|
||||
"UPDATE wrong SET name = ? WHERE name = ?",
|
||||
("new", "test"),
|
||||
)
|
||||
|
||||
|
||||
def test_query_with_wrong_query(db):
|
||||
with pytest.raises(sqlite3.OperationalError):
|
||||
db.query("SELECT * FROM wrong WHERE name = ?", ("test",))
|
||||
|
||||
|
||||
def test_execute_query_with_wrong_query(db):
|
||||
with pytest.raises(sqlite3.OperationalError):
|
||||
db.execute_query(
|
||||
"SELECT * FROM wrong WHERE name = ?", ("test",)
|
||||
)
|
@ -1,56 +0,0 @@
|
||||
import unittest
|
||||
import os
|
||||
from unittest.mock import patch
|
||||
from langchain import HuggingFaceHub
|
||||
from langchain.chat_models import ChatOpenAI
|
||||
|
||||
from swarms.models.llm import LLM
|
||||
|
||||
|
||||
class TestLLM(unittest.TestCase):
|
||||
@patch.object(HuggingFaceHub, "__init__", return_value=None)
|
||||
@patch.object(ChatOpenAI, "__init__", return_value=None)
|
||||
def setUp(self, mock_hf_init, mock_openai_init):
|
||||
self.llm_openai = LLM(openai_api_key="mock_openai_key")
|
||||
self.llm_hf = LLM(
|
||||
hf_repo_id="mock_repo_id", hf_api_token="mock_hf_token"
|
||||
)
|
||||
self.prompt = "Who won the FIFA World Cup in 1998?"
|
||||
|
||||
def test_init(self):
|
||||
self.assertEqual(
|
||||
self.llm_openai.openai_api_key, "mock_openai_key"
|
||||
)
|
||||
self.assertEqual(self.llm_hf.hf_repo_id, "mock_repo_id")
|
||||
self.assertEqual(self.llm_hf.hf_api_token, "mock_hf_token")
|
||||
|
||||
@patch.object(HuggingFaceHub, "run", return_value="France")
|
||||
@patch.object(ChatOpenAI, "run", return_value="France")
|
||||
def test_run(self, mock_hf_run, mock_openai_run):
|
||||
result_openai = self.llm_openai.run(self.prompt)
|
||||
mock_openai_run.assert_called_once()
|
||||
self.assertEqual(result_openai, "France")
|
||||
|
||||
result_hf = self.llm_hf.run(self.prompt)
|
||||
mock_hf_run.assert_called_once()
|
||||
self.assertEqual(result_hf, "France")
|
||||
|
||||
def test_error_on_no_keys(self):
|
||||
with self.assertRaises(ValueError):
|
||||
LLM()
|
||||
|
||||
@patch.object(os, "environ", {})
|
||||
def test_error_on_missing_hf_token(self):
|
||||
with self.assertRaises(ValueError):
|
||||
LLM(hf_repo_id="mock_repo_id")
|
||||
|
||||
@patch.dict(
|
||||
os.environ, {"HUGGINGFACEHUB_API_TOKEN": "mock_hf_token"}
|
||||
)
|
||||
def test_hf_token_from_env(self):
|
||||
llm = LLM(hf_repo_id="mock_repo_id")
|
||||
self.assertEqual(llm.hf_api_token, "mock_hf_token")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
unittest.main()
|
@ -1,91 +0,0 @@
|
||||
# test_embeddings.py
|
||||
|
||||
import pytest
|
||||
import openai
|
||||
from unittest.mock import patch
|
||||
from swarms.models.simple_ada import (
|
||||
get_ada_embeddings,
|
||||
) # Adjust this import path to your project structure
|
||||
from os import getenv
|
||||
from dotenv import load_dotenv
|
||||
|
||||
load_dotenv()
|
||||
|
||||
|
||||
# Fixture for test texts
|
||||
@pytest.fixture
|
||||
def test_texts():
|
||||
return [
|
||||
"Hello World",
|
||||
"This is a test string with newline\ncharacters",
|
||||
"A quick brown fox jumps over the lazy dog",
|
||||
]
|
||||
|
||||
|
||||
# Basic Test
|
||||
def test_get_ada_embeddings_basic(test_texts):
|
||||
with patch("openai.resources.Embeddings.create") as mock_create:
|
||||
# Mocking the OpenAI API call
|
||||
mock_create.return_value = {
|
||||
"data": [{"embedding": [0.1, 0.2, 0.3]}]
|
||||
}
|
||||
|
||||
for text in test_texts:
|
||||
embedding = get_ada_embeddings(text)
|
||||
assert embedding == [
|
||||
0.1,
|
||||
0.2,
|
||||
0.3,
|
||||
], "Embedding does not match expected output"
|
||||
mock_create.assert_called_with(
|
||||
input=[text.replace("\n", " ")],
|
||||
model="text-embedding-ada-002",
|
||||
)
|
||||
|
||||
|
||||
# Parameterized Test
|
||||
@pytest.mark.parametrize(
|
||||
"text, model, expected_call_model",
|
||||
[
|
||||
(
|
||||
"Hello World",
|
||||
"text-embedding-ada-002",
|
||||
"text-embedding-ada-002",
|
||||
),
|
||||
(
|
||||
"Hello World",
|
||||
"text-embedding-ada-001",
|
||||
"text-embedding-ada-001",
|
||||
),
|
||||
],
|
||||
)
|
||||
def test_get_ada_embeddings_models(text, model, expected_call_model):
|
||||
with patch("openai.resources.Embeddings.create") as mock_create:
|
||||
mock_create.return_value = {
|
||||
"data": [{"embedding": [0.1, 0.2, 0.3]}]
|
||||
}
|
||||
|
||||
_ = get_ada_embeddings(text, model=model)
|
||||
mock_create.assert_called_with(
|
||||
input=[text], model=expected_call_model
|
||||
)
|
||||
|
||||
|
||||
# Exception Test
|
||||
def test_get_ada_embeddings_exception():
|
||||
with patch("openai.resources.Embeddings.create") as mock_create:
|
||||
mock_create.side_effect = openai.OpenAIError("Test error")
|
||||
with pytest.raises(openai.OpenAIError):
|
||||
get_ada_embeddings("Some text")
|
||||
|
||||
|
||||
# Tests for environment variable loading
|
||||
def test_env_var_loading(monkeypatch):
|
||||
monkeypatch.setenv("OPENAI_API_KEY", "testkey123")
|
||||
with patch("openai.resources.Embeddings.create"):
|
||||
assert (
|
||||
getenv("OPENAI_API_KEY") == "testkey123"
|
||||
), "Environment variable for API key is not set correctly"
|
||||
|
||||
|
||||
# ... more tests to cover other aspects such as different input types, large inputs, invalid inputs, etc.
|
@ -1,83 +0,0 @@
|
||||
import os
|
||||
from concurrent.futures import ThreadPoolExecutor
|
||||
from unittest.mock import patch
|
||||
|
||||
import pytest
|
||||
from dotenv import load_dotenv
|
||||
|
||||
from swarms.models.autotemp import AutoTempAgent
|
||||
|
||||
api_key = os.getenv("OPENAI_API_KEY")
|
||||
|
||||
load_dotenv()
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def auto_temp_agent():
|
||||
return AutoTempAgent(api_key=api_key)
|
||||
|
||||
|
||||
def test_initialization(auto_temp_agent):
|
||||
assert isinstance(auto_temp_agent, AutoTempAgent)
|
||||
assert auto_temp_agent.auto_select is True
|
||||
assert auto_temp_agent.max_workers == 6
|
||||
assert auto_temp_agent.temperature == 0.5
|
||||
assert auto_temp_agent.alt_temps == [0.4, 0.6, 0.8, 1.0, 1.2, 1.4]
|
||||
|
||||
|
||||
def test_evaluate_output(auto_temp_agent):
|
||||
output = "This is a test output."
|
||||
with patch("swarms.models.OpenAIChat") as MockOpenAIChat:
|
||||
mock_instance = MockOpenAIChat.return_value
|
||||
mock_instance.return_value = "Score: 95.5"
|
||||
score = auto_temp_agent.evaluate_output(output)
|
||||
assert score == 95.5
|
||||
mock_instance.assert_called_once()
|
||||
|
||||
|
||||
def test_run_auto_select(auto_temp_agent):
|
||||
task = "Generate a blog post."
|
||||
temperature_string = "0.4,0.6,0.8,1.0,1.2,1.4"
|
||||
result = auto_temp_agent.run(task, temperature_string)
|
||||
assert "Best AutoTemp Output" in result
|
||||
assert "Temp" in result
|
||||
assert "Score" in result
|
||||
|
||||
|
||||
def test_run_no_scores(auto_temp_agent):
|
||||
task = "Invalid task."
|
||||
temperature_string = "0.4,0.6,0.8,1.0,1.2,1.4"
|
||||
with ThreadPoolExecutor(
|
||||
max_workers=auto_temp_agent.max_workers
|
||||
) as executor:
|
||||
with patch.object(
|
||||
executor,
|
||||
"submit",
|
||||
side_effect=[None, None, None, None, None, None],
|
||||
):
|
||||
result = auto_temp_agent.run(task, temperature_string)
|
||||
assert result == "No valid outputs generated."
|
||||
|
||||
|
||||
def test_run_manual_select(auto_temp_agent):
|
||||
auto_temp_agent.auto_select = False
|
||||
task = "Generate a blog post."
|
||||
temperature_string = "0.4,0.6,0.8,1.0,1.2,1.4"
|
||||
result = auto_temp_agent.run(task, temperature_string)
|
||||
assert "Best AutoTemp Output" not in result
|
||||
assert "Temp" in result
|
||||
assert "Score" in result
|
||||
|
||||
|
||||
def test_failed_initialization():
|
||||
with pytest.raises(Exception):
|
||||
AutoTempAgent()
|
||||
|
||||
|
||||
def test_failed_evaluate_output(auto_temp_agent):
|
||||
output = "This is a test output."
|
||||
with patch("swarms.models.OpenAIChat") as MockOpenAIChat:
|
||||
mock_instance = MockOpenAIChat.return_value
|
||||
mock_instance.return_value = "Invalid score text"
|
||||
score = auto_temp_agent.evaluate_output(output)
|
||||
assert score == 0.0
|
@ -1,171 +0,0 @@
|
||||
# Import necessary modules and define fixtures if needed
|
||||
import os
|
||||
import pytest
|
||||
import torch
|
||||
from PIL import Image
|
||||
from swarms.models.bioclip import BioClip
|
||||
|
||||
|
||||
# Define fixtures if needed
|
||||
@pytest.fixture
|
||||
def sample_image_path():
|
||||
return "path_to_sample_image.jpg"
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def clip_instance():
|
||||
return BioClip(
|
||||
"microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224"
|
||||
)
|
||||
|
||||
|
||||
# Basic tests for the BioClip class
|
||||
def test_clip_initialization(clip_instance):
|
||||
assert isinstance(clip_instance.model, torch.nn.Module)
|
||||
assert hasattr(clip_instance, "model_path")
|
||||
assert hasattr(clip_instance, "preprocess_train")
|
||||
assert hasattr(clip_instance, "preprocess_val")
|
||||
assert hasattr(clip_instance, "tokenizer")
|
||||
assert hasattr(clip_instance, "device")
|
||||
|
||||
|
||||
def test_clip_call_method(clip_instance, sample_image_path):
|
||||
labels = [
|
||||
"adenocarcinoma histopathology",
|
||||
"brain MRI",
|
||||
"covid line chart",
|
||||
"squamous cell carcinoma histopathology",
|
||||
"immunohistochemistry histopathology",
|
||||
"bone X-ray",
|
||||
"chest X-ray",
|
||||
"pie chart",
|
||||
"hematoxylin and eosin histopathology",
|
||||
]
|
||||
result = clip_instance(sample_image_path, labels)
|
||||
assert isinstance(result, dict)
|
||||
assert len(result) == len(labels)
|
||||
|
||||
|
||||
def test_clip_plot_image_with_metadata(
|
||||
clip_instance, sample_image_path
|
||||
):
|
||||
metadata = {
|
||||
"filename": "sample_image.jpg",
|
||||
"top_probs": {"label1": 0.75, "label2": 0.65},
|
||||
}
|
||||
clip_instance.plot_image_with_metadata(
|
||||
sample_image_path, metadata
|
||||
)
|
||||
|
||||
|
||||
# More test cases can be added to cover additional functionality and edge cases
|
||||
|
||||
|
||||
# Parameterized tests for different image and label combinations
|
||||
@pytest.mark.parametrize(
|
||||
"image_path, labels",
|
||||
[
|
||||
("image1.jpg", ["label1", "label2"]),
|
||||
("image2.jpg", ["label3", "label4"]),
|
||||
# Add more image and label combinations
|
||||
],
|
||||
)
|
||||
def test_clip_parameterized_calls(clip_instance, image_path, labels):
|
||||
result = clip_instance(image_path, labels)
|
||||
assert isinstance(result, dict)
|
||||
assert len(result) == len(labels)
|
||||
|
||||
|
||||
# Test image preprocessing
|
||||
def test_clip_image_preprocessing(clip_instance, sample_image_path):
|
||||
image = Image.open(sample_image_path)
|
||||
processed_image = clip_instance.preprocess_val(image)
|
||||
assert isinstance(processed_image, torch.Tensor)
|
||||
|
||||
|
||||
# Test label tokenization
|
||||
def test_clip_label_tokenization(clip_instance):
|
||||
labels = ["label1", "label2"]
|
||||
tokenized_labels = clip_instance.tokenizer(labels)
|
||||
assert isinstance(tokenized_labels, torch.Tensor)
|
||||
assert tokenized_labels.shape[0] == len(labels)
|
||||
|
||||
|
||||
# More tests can be added to cover other methods and edge cases
|
||||
|
||||
|
||||
# End-to-end tests with actual images and labels
|
||||
def test_clip_end_to_end(clip_instance, sample_image_path):
|
||||
labels = [
|
||||
"adenocarcinoma histopathology",
|
||||
"brain MRI",
|
||||
"covid line chart",
|
||||
"squamous cell carcinoma histopathology",
|
||||
"immunohistochemistry histopathology",
|
||||
"bone X-ray",
|
||||
"chest X-ray",
|
||||
"pie chart",
|
||||
"hematoxylin and eosin histopathology",
|
||||
]
|
||||
result = clip_instance(sample_image_path, labels)
|
||||
assert isinstance(result, dict)
|
||||
assert len(result) == len(labels)
|
||||
|
||||
|
||||
# Test label tokenization with long labels
|
||||
def test_clip_long_labels(clip_instance):
|
||||
labels = ["label" + str(i) for i in range(100)]
|
||||
tokenized_labels = clip_instance.tokenizer(labels)
|
||||
assert isinstance(tokenized_labels, torch.Tensor)
|
||||
assert tokenized_labels.shape[0] == len(labels)
|
||||
|
||||
|
||||
# Test handling of multiple image files
|
||||
def test_clip_multiple_images(clip_instance, sample_image_path):
|
||||
labels = ["label1", "label2"]
|
||||
image_paths = [sample_image_path, "image2.jpg"]
|
||||
results = clip_instance(image_paths, labels)
|
||||
assert isinstance(results, list)
|
||||
assert len(results) == len(image_paths)
|
||||
for result in results:
|
||||
assert isinstance(result, dict)
|
||||
assert len(result) == len(labels)
|
||||
|
||||
|
||||
# Test model inference performance
|
||||
def test_clip_inference_performance(
|
||||
clip_instance, sample_image_path, benchmark
|
||||
):
|
||||
labels = [
|
||||
"adenocarcinoma histopathology",
|
||||
"brain MRI",
|
||||
"covid line chart",
|
||||
"squamous cell carcinoma histopathology",
|
||||
"immunohistochemistry histopathology",
|
||||
"bone X-ray",
|
||||
"chest X-ray",
|
||||
"pie chart",
|
||||
"hematoxylin and eosin histopathology",
|
||||
]
|
||||
result = benchmark(clip_instance, sample_image_path, labels)
|
||||
assert isinstance(result, dict)
|
||||
assert len(result) == len(labels)
|
||||
|
||||
|
||||
# Test different preprocessing pipelines
|
||||
def test_clip_preprocessing_pipelines(
|
||||
clip_instance, sample_image_path
|
||||
):
|
||||
labels = ["label1", "label2"]
|
||||
image = Image.open(sample_image_path)
|
||||
|
||||
# Test preprocessing for training
|
||||
processed_image_train = clip_instance.preprocess_train(image)
|
||||
assert isinstance(processed_image_train, torch.Tensor)
|
||||
|
||||
# Test preprocessing for validation
|
||||
processed_image_val = clip_instance.preprocess_val(image)
|
||||
assert isinstance(processed_image_val, torch.Tensor)
|
||||
|
||||
|
||||
# ...
|
@ -1,454 +0,0 @@
|
||||
import os
|
||||
from unittest.mock import Mock
|
||||
|
||||
import pytest
|
||||
from openai import OpenAIError
|
||||
from PIL import Image
|
||||
from termcolor import colored
|
||||
|
||||
from swarms.models.dalle3 import Dalle3
|
||||
|
||||
|
||||
# Mocking the OpenAI client to avoid making actual API calls during testing
|
||||
@pytest.fixture
|
||||
def mock_openai_client():
|
||||
return Mock()
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def dalle3(mock_openai_client):
|
||||
return Dalle3(client=mock_openai_client)
|
||||
|
||||
|
||||
def test_dalle3_call_success(dalle3, mock_openai_client):
|
||||
# Arrange
|
||||
task = "A painting of a dog"
|
||||
expected_img_url = "https://cdn.openai.com/dall-e/encoded/feats/feats_01J9J5ZKJZJY9.png"
|
||||
mock_openai_client.images.generate.return_value = Mock(
|
||||
data=[Mock(url=expected_img_url)]
|
||||
)
|
||||
|
||||
# Act
|
||||
img_url = dalle3(task)
|
||||
|
||||
# Assert
|
||||
assert img_url == expected_img_url
|
||||
mock_openai_client.images.generate.assert_called_once_with(
|
||||
prompt=task, n=4
|
||||
)
|
||||
|
||||
|
||||
def test_dalle3_call_failure(dalle3, mock_openai_client, capsys):
|
||||
# Arrange
|
||||
task = "Invalid task"
|
||||
expected_error_message = "Error running Dalle3: API Error"
|
||||
|
||||
# Mocking OpenAIError
|
||||
mock_openai_client.images.generate.side_effect = OpenAIError(
|
||||
expected_error_message,
|
||||
http_status=500,
|
||||
error="Internal Server Error",
|
||||
)
|
||||
|
||||
# Act and assert
|
||||
with pytest.raises(OpenAIError) as excinfo:
|
||||
dalle3(task)
|
||||
|
||||
assert str(excinfo.value) == expected_error_message
|
||||
mock_openai_client.images.generate.assert_called_once_with(
|
||||
prompt=task, n=4
|
||||
)
|
||||
|
||||
# Ensure the error message is printed in red
|
||||
captured = capsys.readouterr()
|
||||
assert colored(expected_error_message, "red") in captured.out
|
||||
|
||||
|
||||
def test_dalle3_create_variations_success(dalle3, mock_openai_client):
|
||||
# Arrange
|
||||
img_url = "https://cdn.openai.com/dall-e/encoded/feats/feats_01J9J5ZKJZJY9.png"
|
||||
expected_variation_url = "https://cdn.openai.com/dall-e/encoded/feats/feats_02ABCDE.png"
|
||||
mock_openai_client.images.create_variation.return_value = Mock(
|
||||
data=[Mock(url=expected_variation_url)]
|
||||
)
|
||||
|
||||
# Act
|
||||
variation_img_url = dalle3.create_variations(img_url)
|
||||
|
||||
# Assert
|
||||
assert variation_img_url == expected_variation_url
|
||||
mock_openai_client.images.create_variation.assert_called_once()
|
||||
_, kwargs = mock_openai_client.images.create_variation.call_args
|
||||
assert kwargs["img"] is not None
|
||||
assert kwargs["n"] == 4
|
||||
assert kwargs["size"] == "1024x1024"
|
||||
|
||||
|
||||
def test_dalle3_create_variations_failure(
|
||||
dalle3, mock_openai_client, capsys
|
||||
):
|
||||
# Arrange
|
||||
img_url = "https://cdn.openai.com/dall-e/encoded/feats/feats_01J9J5ZKJZJY9.png"
|
||||
expected_error_message = "Error running Dalle3: API Error"
|
||||
|
||||
# Mocking OpenAIError
|
||||
mock_openai_client.images.create_variation.side_effect = (
|
||||
OpenAIError(
|
||||
expected_error_message,
|
||||
http_status=500,
|
||||
error="Internal Server Error",
|
||||
)
|
||||
)
|
||||
|
||||
# Act and assert
|
||||
with pytest.raises(OpenAIError) as excinfo:
|
||||
dalle3.create_variations(img_url)
|
||||
|
||||
assert str(excinfo.value) == expected_error_message
|
||||
mock_openai_client.images.create_variation.assert_called_once()
|
||||
|
||||
# Ensure the error message is printed in red
|
||||
captured = capsys.readouterr()
|
||||
assert colored(expected_error_message, "red") in captured.out
|
||||
|
||||
|
||||
def test_dalle3_read_img():
|
||||
# Arrange
|
||||
img_path = "test_image.png"
|
||||
img = Image.new("RGB", (512, 512))
|
||||
|
||||
# Save the image temporarily
|
||||
img.save(img_path)
|
||||
|
||||
# Act
|
||||
dalle3 = Dalle3()
|
||||
img_loaded = dalle3.read_img(img_path)
|
||||
|
||||
# Assert
|
||||
assert isinstance(img_loaded, Image.Image)
|
||||
|
||||
# Clean up
|
||||
os.remove(img_path)
|
||||
|
||||
|
||||
def test_dalle3_set_width_height():
|
||||
# Arrange
|
||||
img = Image.new("RGB", (512, 512))
|
||||
width = 256
|
||||
height = 256
|
||||
|
||||
# Act
|
||||
dalle3 = Dalle3()
|
||||
img_resized = dalle3.set_width_height(img, width, height)
|
||||
|
||||
# Assert
|
||||
assert img_resized.size == (width, height)
|
||||
|
||||
|
||||
def test_dalle3_convert_to_bytesio():
|
||||
# Arrange
|
||||
img = Image.new("RGB", (512, 512))
|
||||
expected_format = "PNG"
|
||||
|
||||
# Act
|
||||
dalle3 = Dalle3()
|
||||
img_bytes = dalle3.convert_to_bytesio(img, format=expected_format)
|
||||
|
||||
# Assert
|
||||
assert isinstance(img_bytes, bytes)
|
||||
assert img_bytes.startswith(b"\x89PNG")
|
||||
|
||||
|
||||
def test_dalle3_call_multiple_times(dalle3, mock_openai_client):
|
||||
# Arrange
|
||||
task = "A painting of a dog"
|
||||
expected_img_url = "https://cdn.openai.com/dall-e/encoded/feats/feats_01J9J5ZKJZJY9.png"
|
||||
mock_openai_client.images.generate.return_value = Mock(
|
||||
data=[Mock(url=expected_img_url)]
|
||||
)
|
||||
|
||||
# Act
|
||||
img_url1 = dalle3(task)
|
||||
img_url2 = dalle3(task)
|
||||
|
||||
# Assert
|
||||
assert img_url1 == expected_img_url
|
||||
assert img_url2 == expected_img_url
|
||||
assert mock_openai_client.images.generate.call_count == 2
|
||||
|
||||
|
||||
def test_dalle3_call_with_large_input(dalle3, mock_openai_client):
|
||||
# Arrange
|
||||
task = "A" * 2048 # Input longer than API's limit
|
||||
expected_error_message = "Error running Dalle3: API Error"
|
||||
mock_openai_client.images.generate.side_effect = OpenAIError(
|
||||
expected_error_message,
|
||||
http_status=500,
|
||||
error="Internal Server Error",
|
||||
)
|
||||
|
||||
# Act and assert
|
||||
with pytest.raises(OpenAIError) as excinfo:
|
||||
dalle3(task)
|
||||
|
||||
assert str(excinfo.value) == expected_error_message
|
||||
|
||||
|
||||
def test_dalle3_create_variations_with_invalid_image_url(
|
||||
dalle3, mock_openai_client
|
||||
):
|
||||
# Arrange
|
||||
img_url = "https://invalid-image-url.com"
|
||||
expected_error_message = "Error running Dalle3: Invalid image URL"
|
||||
|
||||
# Act and assert
|
||||
with pytest.raises(ValueError) as excinfo:
|
||||
dalle3.create_variations(img_url)
|
||||
|
||||
assert str(excinfo.value) == expected_error_message
|
||||
|
||||
|
||||
def test_dalle3_set_width_height_invalid_dimensions(dalle3):
|
||||
# Arrange
|
||||
img = dalle3.read_img("test_image.png")
|
||||
width = 0
|
||||
height = -1
|
||||
|
||||
# Act and assert
|
||||
with pytest.raises(ValueError):
|
||||
dalle3.set_width_height(img, width, height)
|
||||
|
||||
|
||||
def test_dalle3_convert_to_bytesio_invalid_format(dalle3):
|
||||
# Arrange
|
||||
img = dalle3.read_img("test_image.png")
|
||||
invalid_format = "invalid_format"
|
||||
|
||||
# Act and assert
|
||||
with pytest.raises(ValueError):
|
||||
dalle3.convert_to_bytesio(img, format=invalid_format)
|
||||
|
||||
|
||||
def test_dalle3_call_with_retry(dalle3, mock_openai_client):
|
||||
# Arrange
|
||||
task = "A painting of a dog"
|
||||
expected_img_url = "https://cdn.openai.com/dall-e/encoded/feats/feats_01J9J5ZKJZJY9.png"
|
||||
|
||||
# Simulate a retry scenario
|
||||
mock_openai_client.images.generate.side_effect = [
|
||||
OpenAIError(
|
||||
"Temporary error",
|
||||
http_status=500,
|
||||
error="Internal Server Error",
|
||||
),
|
||||
Mock(data=[Mock(url=expected_img_url)]),
|
||||
]
|
||||
|
||||
# Act
|
||||
img_url = dalle3(task)
|
||||
|
||||
# Assert
|
||||
assert img_url == expected_img_url
|
||||
assert mock_openai_client.images.generate.call_count == 2
|
||||
|
||||
|
||||
def test_dalle3_create_variations_with_retry(
|
||||
dalle3, mock_openai_client
|
||||
):
|
||||
# Arrange
|
||||
img_url = "https://cdn.openai.com/dall-e/encoded/feats/feats_01J9J5ZKJZJY9.png"
|
||||
expected_variation_url = "https://cdn.openai.com/dall-e/encoded/feats/feats_02ABCDE.png"
|
||||
|
||||
# Simulate a retry scenario
|
||||
mock_openai_client.images.create_variation.side_effect = [
|
||||
OpenAIError(
|
||||
"Temporary error",
|
||||
http_status=500,
|
||||
error="Internal Server Error",
|
||||
),
|
||||
Mock(data=[Mock(url=expected_variation_url)]),
|
||||
]
|
||||
|
||||
# Act
|
||||
variation_img_url = dalle3.create_variations(img_url)
|
||||
|
||||
# Assert
|
||||
assert variation_img_url == expected_variation_url
|
||||
assert mock_openai_client.images.create_variation.call_count == 2
|
||||
|
||||
|
||||
def test_dalle3_call_exception_logging(
|
||||
dalle3, mock_openai_client, capsys
|
||||
):
|
||||
# Arrange
|
||||
task = "A painting of a dog"
|
||||
expected_error_message = "Error running Dalle3: API Error"
|
||||
|
||||
# Mocking OpenAIError
|
||||
mock_openai_client.images.generate.side_effect = OpenAIError(
|
||||
expected_error_message,
|
||||
http_status=500,
|
||||
error="Internal Server Error",
|
||||
)
|
||||
|
||||
# Act
|
||||
with pytest.raises(OpenAIError):
|
||||
dalle3(task)
|
||||
|
||||
# Assert that the error message is logged
|
||||
captured = capsys.readouterr()
|
||||
assert expected_error_message in captured.err
|
||||
|
||||
|
||||
def test_dalle3_create_variations_exception_logging(
|
||||
dalle3, mock_openai_client, capsys
|
||||
):
|
||||
# Arrange
|
||||
img_url = "https://cdn.openai.com/dall-e/encoded/feats/feats_01J9J5ZKJZJY9.png"
|
||||
expected_error_message = "Error running Dalle3: API Error"
|
||||
|
||||
# Mocking OpenAIError
|
||||
mock_openai_client.images.create_variation.side_effect = (
|
||||
OpenAIError(
|
||||
expected_error_message,
|
||||
http_status=500,
|
||||
error="Internal Server Error",
|
||||
)
|
||||
)
|
||||
|
||||
# Act
|
||||
with pytest.raises(OpenAIError):
|
||||
dalle3.create_variations(img_url)
|
||||
|
||||
# Assert that the error message is logged
|
||||
captured = capsys.readouterr()
|
||||
assert expected_error_message in captured.err
|
||||
|
||||
|
||||
def test_dalle3_read_img_invalid_path(dalle3):
|
||||
# Arrange
|
||||
invalid_img_path = "invalid_image_path.png"
|
||||
|
||||
# Act and assert
|
||||
with pytest.raises(FileNotFoundError):
|
||||
dalle3.read_img(invalid_img_path)
|
||||
|
||||
|
||||
def test_dalle3_call_no_api_key():
|
||||
# Arrange
|
||||
task = "A painting of a dog"
|
||||
dalle3 = Dalle3(api_key=None)
|
||||
expected_error_message = (
|
||||
"Error running Dalle3: API Key is missing"
|
||||
)
|
||||
|
||||
# Act and assert
|
||||
with pytest.raises(ValueError) as excinfo:
|
||||
dalle3(task)
|
||||
|
||||
assert str(excinfo.value) == expected_error_message
|
||||
|
||||
|
||||
def test_dalle3_create_variations_no_api_key():
|
||||
# Arrange
|
||||
img_url = "https://cdn.openai.com/dall-e/encoded/feats/feats_01J9J5ZKJZJY9.png"
|
||||
dalle3 = Dalle3(api_key=None)
|
||||
expected_error_message = (
|
||||
"Error running Dalle3: API Key is missing"
|
||||
)
|
||||
|
||||
# Act and assert
|
||||
with pytest.raises(ValueError) as excinfo:
|
||||
dalle3.create_variations(img_url)
|
||||
|
||||
assert str(excinfo.value) == expected_error_message
|
||||
|
||||
|
||||
def test_dalle3_call_with_retry_max_retries_exceeded(
|
||||
dalle3, mock_openai_client
|
||||
):
|
||||
# Arrange
|
||||
task = "A painting of a dog"
|
||||
|
||||
# Simulate max retries exceeded
|
||||
mock_openai_client.images.generate.side_effect = OpenAIError(
|
||||
"Temporary error",
|
||||
http_status=500,
|
||||
error="Internal Server Error",
|
||||
)
|
||||
|
||||
# Act and assert
|
||||
with pytest.raises(OpenAIError) as excinfo:
|
||||
dalle3(task)
|
||||
|
||||
assert "Retry limit exceeded" in str(excinfo.value)
|
||||
|
||||
|
||||
def test_dalle3_create_variations_with_retry_max_retries_exceeded(
|
||||
dalle3, mock_openai_client
|
||||
):
|
||||
# Arrange
|
||||
img_url = "https://cdn.openai.com/dall-e/encoded/feats/feats_01J9J5ZKJZJY9.png"
|
||||
|
||||
# Simulate max retries exceeded
|
||||
mock_openai_client.images.create_variation.side_effect = (
|
||||
OpenAIError(
|
||||
"Temporary error",
|
||||
http_status=500,
|
||||
error="Internal Server Error",
|
||||
)
|
||||
)
|
||||
|
||||
# Act and assert
|
||||
with pytest.raises(OpenAIError) as excinfo:
|
||||
dalle3.create_variations(img_url)
|
||||
|
||||
assert "Retry limit exceeded" in str(excinfo.value)
|
||||
|
||||
|
||||
def test_dalle3_call_retry_with_success(dalle3, mock_openai_client):
|
||||
# Arrange
|
||||
task = "A painting of a dog"
|
||||
expected_img_url = "https://cdn.openai.com/dall-e/encoded/feats/feats_01J9J5ZKJZJY9.png"
|
||||
|
||||
# Simulate success after a retry
|
||||
mock_openai_client.images.generate.side_effect = [
|
||||
OpenAIError(
|
||||
"Temporary error",
|
||||
http_status=500,
|
||||
error="Internal Server Error",
|
||||
),
|
||||
Mock(data=[Mock(url=expected_img_url)]),
|
||||
]
|
||||
|
||||
# Act
|
||||
img_url = dalle3(task)
|
||||
|
||||
# Assert
|
||||
assert img_url == expected_img_url
|
||||
assert mock_openai_client.images.generate.call_count == 2
|
||||
|
||||
|
||||
def test_dalle3_create_variations_retry_with_success(
|
||||
dalle3, mock_openai_client
|
||||
):
|
||||
# Arrange
|
||||
img_url = "https://cdn.openai.com/dall-e/encoded/feats/feats_01J9J5ZKJZJY9.png"
|
||||
expected_variation_url = "https://cdn.openai.com/dall-e/encoded/feats/feats_02ABCDE.png"
|
||||
|
||||
# Simulate success after a retry
|
||||
mock_openai_client.images.create_variation.side_effect = [
|
||||
OpenAIError(
|
||||
"Temporary error",
|
||||
http_status=500,
|
||||
error="Internal Server Error",
|
||||
),
|
||||
Mock(data=[Mock(url=expected_variation_url)]),
|
||||
]
|
||||
|
||||
# Act
|
||||
variation_img_url = dalle3.create_variations(img_url)
|
||||
|
||||
# Assert
|
||||
assert variation_img_url == expected_variation_url
|
||||
assert mock_openai_client.images.create_variation.call_count == 2
|
@ -1,336 +0,0 @@
|
||||
import os
|
||||
import tempfile
|
||||
from functools import wraps
|
||||
from unittest.mock import AsyncMock, MagicMock, patch
|
||||
|
||||
import numpy as np
|
||||
import pytest
|
||||
import torch
|
||||
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor
|
||||
|
||||
from swarms.models.distilled_whisperx import (
|
||||
DistilWhisperModel,
|
||||
async_retry,
|
||||
)
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def distil_whisper_model():
|
||||
return DistilWhisperModel()
|
||||
|
||||
|
||||
def create_audio_file(
|
||||
data: np.ndarray, sample_rate: int, file_path: str
|
||||
):
|
||||
data.tofile(file_path)
|
||||
return file_path
|
||||
|
||||
|
||||
def test_initialization(distil_whisper_model):
|
||||
assert isinstance(distil_whisper_model, DistilWhisperModel)
|
||||
assert isinstance(distil_whisper_model.model, torch.nn.Module)
|
||||
assert isinstance(distil_whisper_model.processor, torch.nn.Module)
|
||||
assert distil_whisper_model.device in ["cpu", "cuda:0"]
|
||||
|
||||
|
||||
def test_transcribe_audio_file(distil_whisper_model):
|
||||
test_data = np.random.rand(
|
||||
16000
|
||||
) # Simulated audio data (1 second)
|
||||
with tempfile.NamedTemporaryFile(
|
||||
suffix=".wav", delete=False
|
||||
) as audio_file:
|
||||
audio_file_path = create_audio_file(
|
||||
test_data, 16000, audio_file.name
|
||||
)
|
||||
transcription = distil_whisper_model.transcribe(
|
||||
audio_file_path
|
||||
)
|
||||
os.remove(audio_file_path)
|
||||
|
||||
assert isinstance(transcription, str)
|
||||
assert transcription.strip() != ""
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_async_transcribe_audio_file(distil_whisper_model):
|
||||
test_data = np.random.rand(
|
||||
16000
|
||||
) # Simulated audio data (1 second)
|
||||
with tempfile.NamedTemporaryFile(
|
||||
suffix=".wav", delete=False
|
||||
) as audio_file:
|
||||
audio_file_path = create_audio_file(
|
||||
test_data, 16000, audio_file.name
|
||||
)
|
||||
transcription = await distil_whisper_model.async_transcribe(
|
||||
audio_file_path
|
||||
)
|
||||
os.remove(audio_file_path)
|
||||
|
||||
assert isinstance(transcription, str)
|
||||
assert transcription.strip() != ""
|
||||
|
||||
|
||||
def test_transcribe_audio_data(distil_whisper_model):
|
||||
test_data = np.random.rand(
|
||||
16000
|
||||
) # Simulated audio data (1 second)
|
||||
transcription = distil_whisper_model.transcribe(
|
||||
test_data.tobytes()
|
||||
)
|
||||
|
||||
assert isinstance(transcription, str)
|
||||
assert transcription.strip() != ""
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_async_transcribe_audio_data(distil_whisper_model):
|
||||
test_data = np.random.rand(
|
||||
16000
|
||||
) # Simulated audio data (1 second)
|
||||
transcription = await distil_whisper_model.async_transcribe(
|
||||
test_data.tobytes()
|
||||
)
|
||||
|
||||
assert isinstance(transcription, str)
|
||||
assert transcription.strip() != ""
|
||||
|
||||
|
||||
def test_real_time_transcribe(distil_whisper_model, capsys):
|
||||
test_data = np.random.rand(
|
||||
16000 * 5
|
||||
) # Simulated audio data (5 seconds)
|
||||
with tempfile.NamedTemporaryFile(
|
||||
suffix=".wav", delete=False
|
||||
) as audio_file:
|
||||
audio_file_path = create_audio_file(
|
||||
test_data, 16000, audio_file.name
|
||||
)
|
||||
|
||||
distil_whisper_model.real_time_transcribe(
|
||||
audio_file_path, chunk_duration=1
|
||||
)
|
||||
|
||||
os.remove(audio_file_path)
|
||||
|
||||
captured = capsys.readouterr()
|
||||
assert "Starting real-time transcription..." in captured.out
|
||||
assert "Chunk" in captured.out
|
||||
|
||||
|
||||
def test_real_time_transcribe_audio_file_not_found(
|
||||
distil_whisper_model, capsys
|
||||
):
|
||||
audio_file_path = "non_existent_audio.wav"
|
||||
distil_whisper_model.real_time_transcribe(
|
||||
audio_file_path, chunk_duration=1
|
||||
)
|
||||
|
||||
captured = capsys.readouterr()
|
||||
assert "The audio file was not found." in captured.out
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def mock_async_retry():
|
||||
def _mock_async_retry(
|
||||
retries=3, exceptions=(Exception,), delay=1
|
||||
):
|
||||
def decorator(func):
|
||||
@wraps(func)
|
||||
async def wrapper(*args, **kwargs):
|
||||
return await func(*args, **kwargs)
|
||||
|
||||
return wrapper
|
||||
|
||||
return decorator
|
||||
|
||||
with patch(
|
||||
"distil_whisper_model.async_retry", new=_mock_async_retry()
|
||||
):
|
||||
yield
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_async_retry_decorator_success():
|
||||
async def mock_async_function():
|
||||
return "Success"
|
||||
|
||||
decorated_function = async_retry()(mock_async_function)
|
||||
result = await decorated_function()
|
||||
assert result == "Success"
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_async_retry_decorator_failure():
|
||||
async def mock_async_function():
|
||||
raise Exception("Error")
|
||||
|
||||
decorated_function = async_retry()(mock_async_function)
|
||||
with pytest.raises(Exception, match="Error"):
|
||||
await decorated_function()
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_async_retry_decorator_multiple_attempts():
|
||||
async def mock_async_function():
|
||||
if mock_async_function.attempts == 0:
|
||||
mock_async_function.attempts += 1
|
||||
raise Exception("Error")
|
||||
else:
|
||||
return "Success"
|
||||
|
||||
mock_async_function.attempts = 0
|
||||
decorated_function = async_retry(max_retries=2)(
|
||||
mock_async_function
|
||||
)
|
||||
result = await decorated_function()
|
||||
assert result == "Success"
|
||||
|
||||
|
||||
def test_create_audio_file():
|
||||
test_data = np.random.rand(
|
||||
16000
|
||||
) # Simulated audio data (1 second)
|
||||
sample_rate = 16000
|
||||
with tempfile.NamedTemporaryFile(
|
||||
suffix=".wav", delete=False
|
||||
) as audio_file:
|
||||
audio_file_path = create_audio_file(
|
||||
test_data, sample_rate, audio_file.name
|
||||
)
|
||||
|
||||
assert os.path.exists(audio_file_path)
|
||||
os.remove(audio_file_path)
|
||||
|
||||
|
||||
# test_distilled_whisperx.py
|
||||
|
||||
|
||||
# Fixtures for setting up model, processor, and audio files
|
||||
@pytest.fixture(scope="module")
|
||||
def model_id():
|
||||
return "distil-whisper/distil-large-v2"
|
||||
|
||||
|
||||
@pytest.fixture(scope="module")
|
||||
def whisper_model(model_id):
|
||||
return DistilWhisperModel(model_id)
|
||||
|
||||
|
||||
@pytest.fixture(scope="session")
|
||||
def audio_file_path(tmp_path_factory):
|
||||
# You would create a small temporary MP3 file here for testing
|
||||
# or use a public domain MP3 file's path
|
||||
return "path/to/valid_audio.mp3"
|
||||
|
||||
|
||||
@pytest.fixture(scope="session")
|
||||
def invalid_audio_file_path():
|
||||
return "path/to/invalid_audio.mp3"
|
||||
|
||||
|
||||
@pytest.fixture(scope="session")
|
||||
def audio_dict():
|
||||
# This should represent a valid audio dictionary as expected by the model
|
||||
return {"array": torch.randn(1, 16000), "sampling_rate": 16000}
|
||||
|
||||
|
||||
# Test initialization
|
||||
def test_initialization(whisper_model):
|
||||
assert whisper_model.model is not None
|
||||
assert whisper_model.processor is not None
|
||||
|
||||
|
||||
# Test successful transcription with file path
|
||||
def test_transcribe_with_file_path(whisper_model, audio_file_path):
|
||||
transcription = whisper_model.transcribe(audio_file_path)
|
||||
assert isinstance(transcription, str)
|
||||
|
||||
|
||||
# Test successful transcription with audio dict
|
||||
def test_transcribe_with_audio_dict(whisper_model, audio_dict):
|
||||
transcription = whisper_model.transcribe(audio_dict)
|
||||
assert isinstance(transcription, str)
|
||||
|
||||
|
||||
# Test for file not found error
|
||||
def test_file_not_found(whisper_model, invalid_audio_file_path):
|
||||
with pytest.raises(Exception):
|
||||
whisper_model.transcribe(invalid_audio_file_path)
|
||||
|
||||
|
||||
# Asynchronous tests
|
||||
@pytest.mark.asyncio
|
||||
async def test_async_transcription_success(
|
||||
whisper_model, audio_file_path
|
||||
):
|
||||
transcription = await whisper_model.async_transcribe(
|
||||
audio_file_path
|
||||
)
|
||||
assert isinstance(transcription, str)
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_async_transcription_failure(
|
||||
whisper_model, invalid_audio_file_path
|
||||
):
|
||||
with pytest.raises(Exception):
|
||||
await whisper_model.async_transcribe(invalid_audio_file_path)
|
||||
|
||||
|
||||
# Testing real-time transcription simulation
|
||||
def test_real_time_transcription(
|
||||
whisper_model, audio_file_path, capsys
|
||||
):
|
||||
whisper_model.real_time_transcribe(
|
||||
audio_file_path, chunk_duration=1
|
||||
)
|
||||
captured = capsys.readouterr()
|
||||
assert "Starting real-time transcription..." in captured.out
|
||||
|
||||
|
||||
# Testing retry decorator for asynchronous function
|
||||
@pytest.mark.asyncio
|
||||
async def test_async_retry():
|
||||
@async_retry(max_retries=2, exceptions=(ValueError,), delay=0)
|
||||
async def failing_func():
|
||||
raise ValueError("Test")
|
||||
|
||||
with pytest.raises(ValueError):
|
||||
await failing_func()
|
||||
|
||||
|
||||
# Mocking the actual model to avoid GPU/CPU intensive operations during test
|
||||
@pytest.fixture
|
||||
def mocked_model(monkeypatch):
|
||||
model_mock = AsyncMock(AutoModelForSpeechSeq2Seq)
|
||||
processor_mock = MagicMock(AutoProcessor)
|
||||
monkeypatch.setattr(
|
||||
"swarms.models.distilled_whisperx.AutoModelForSpeechSeq2Seq.from_pretrained",
|
||||
model_mock,
|
||||
)
|
||||
monkeypatch.setattr(
|
||||
"swarms.models.distilled_whisperx.AutoProcessor.from_pretrained",
|
||||
processor_mock,
|
||||
)
|
||||
return model_mock, processor_mock
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_async_transcribe_with_mocked_model(
|
||||
mocked_model, audio_file_path
|
||||
):
|
||||
model_mock, processor_mock = mocked_model
|
||||
# Set up what the mock should return when it's called
|
||||
model_mock.return_value.generate.return_value = torch.tensor(
|
||||
[[0]]
|
||||
)
|
||||
processor_mock.return_value.batch_decode.return_value = [
|
||||
"mocked transcription"
|
||||
]
|
||||
model_wrapper = DistilWhisperModel()
|
||||
transcription = await model_wrapper.async_transcribe(
|
||||
audio_file_path
|
||||
)
|
||||
assert transcription == "mocked transcription"
|
@ -1,90 +1,237 @@
|
||||
import pytest
|
||||
import torch
|
||||
from unittest.mock import Mock
|
||||
from swarms.models.huggingface import HuggingFaceLLM
|
||||
import logging
|
||||
from unittest.mock import patch
|
||||
|
||||
import pytest
|
||||
|
||||
@pytest.fixture
|
||||
def mock_torch():
|
||||
return Mock()
|
||||
from swarms.models.huggingface import HuggingfaceLLM
|
||||
|
||||
|
||||
# Mock some functions and objects for testing
|
||||
@pytest.fixture
|
||||
def mock_autotokenizer():
|
||||
return Mock()
|
||||
def mock_huggingface_llm(monkeypatch):
|
||||
# Mock the model and tokenizer creation
|
||||
def mock_init(
|
||||
self,
|
||||
model_id,
|
||||
device="cpu",
|
||||
max_length=500,
|
||||
quantize=False,
|
||||
quantization_config=None,
|
||||
verbose=False,
|
||||
distributed=False,
|
||||
decoding=False,
|
||||
max_workers=5,
|
||||
repitition_penalty=1.3,
|
||||
no_repeat_ngram_size=5,
|
||||
temperature=0.7,
|
||||
top_k=40,
|
||||
top_p=0.8,
|
||||
):
|
||||
pass
|
||||
|
||||
# Mock the model loading
|
||||
def mock_load_model(self):
|
||||
pass
|
||||
|
||||
# Mock the model generation
|
||||
def mock_run(self, task):
|
||||
pass
|
||||
|
||||
monkeypatch.setattr(HuggingfaceLLM, "__init__", mock_init)
|
||||
monkeypatch.setattr(HuggingfaceLLM, "load_model", mock_load_model)
|
||||
monkeypatch.setattr(HuggingfaceLLM, "run", mock_run)
|
||||
|
||||
|
||||
# Basic tests for initialization and attribute settings
|
||||
def test_init_huggingface_llm():
|
||||
llm = HuggingfaceLLM(
|
||||
model_id="test_model",
|
||||
device="cuda",
|
||||
max_length=1000,
|
||||
quantize=True,
|
||||
quantization_config={"config_key": "config_value"},
|
||||
verbose=True,
|
||||
distributed=True,
|
||||
decoding=True,
|
||||
max_workers=3,
|
||||
repitition_penalty=1.5,
|
||||
no_repeat_ngram_size=4,
|
||||
temperature=0.8,
|
||||
top_k=50,
|
||||
top_p=0.7,
|
||||
)
|
||||
|
||||
assert llm.model_id == "test_model"
|
||||
assert llm.device == "cuda"
|
||||
assert llm.max_length == 1000
|
||||
assert llm.quantize is True
|
||||
assert llm.quantization_config == {"config_key": "config_value"}
|
||||
assert llm.verbose is True
|
||||
assert llm.distributed is True
|
||||
assert llm.decoding is True
|
||||
assert llm.max_workers == 3
|
||||
assert llm.repitition_penalty == 1.5
|
||||
assert llm.no_repeat_ngram_size == 4
|
||||
assert llm.temperature == 0.8
|
||||
assert llm.top_k == 50
|
||||
assert llm.top_p == 0.7
|
||||
|
||||
@pytest.fixture
|
||||
def mock_automodelforcausallm():
|
||||
return Mock()
|
||||
|
||||
# Test loading the model
|
||||
def test_load_model(mock_huggingface_llm):
|
||||
llm = HuggingfaceLLM(model_id="test_model")
|
||||
llm.load_model()
|
||||
|
||||
@pytest.fixture
|
||||
def mock_bitsandbytesconfig():
|
||||
return Mock()
|
||||
# Ensure that the load_model function is called
|
||||
assert True
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def hugging_face_llm(
|
||||
mock_torch,
|
||||
mock_autotokenizer,
|
||||
mock_automodelforcausallm,
|
||||
mock_bitsandbytesconfig,
|
||||
):
|
||||
HuggingFaceLLM.torch = mock_torch
|
||||
HuggingFaceLLM.AutoTokenizer = mock_autotokenizer
|
||||
HuggingFaceLLM.AutoModelForCausalLM = mock_automodelforcausallm
|
||||
HuggingFaceLLM.BitsAndBytesConfig = mock_bitsandbytesconfig
|
||||
# Test running the model
|
||||
def test_run(mock_huggingface_llm):
|
||||
llm = HuggingfaceLLM(model_id="test_model")
|
||||
llm.run("Test prompt")
|
||||
|
||||
return HuggingFaceLLM(model_id="test")
|
||||
# Ensure that the run function is called
|
||||
assert True
|
||||
|
||||
|
||||
def test_init(
|
||||
hugging_face_llm, mock_autotokenizer, mock_automodelforcausallm
|
||||
):
|
||||
assert hugging_face_llm.model_id == "test"
|
||||
mock_autotokenizer.from_pretrained.assert_called_once_with("test")
|
||||
mock_automodelforcausallm.from_pretrained.assert_called_once_with(
|
||||
"test", quantization_config=None
|
||||
# Test for setting max_length
|
||||
def test_llm_set_max_length(llm_instance):
|
||||
new_max_length = 1000
|
||||
llm_instance.set_max_length(new_max_length)
|
||||
assert llm_instance.max_length == new_max_length
|
||||
|
||||
|
||||
# Test for setting verbose
|
||||
def test_llm_set_verbose(llm_instance):
|
||||
llm_instance.set_verbose(True)
|
||||
assert llm_instance.verbose is True
|
||||
|
||||
|
||||
# Test for setting distributed
|
||||
def test_llm_set_distributed(llm_instance):
|
||||
llm_instance.set_distributed(True)
|
||||
assert llm_instance.distributed is True
|
||||
|
||||
|
||||
# Test for setting decoding
|
||||
def test_llm_set_decoding(llm_instance):
|
||||
llm_instance.set_decoding(True)
|
||||
assert llm_instance.decoding is True
|
||||
|
||||
|
||||
# Test for setting max_workers
|
||||
def test_llm_set_max_workers(llm_instance):
|
||||
new_max_workers = 10
|
||||
llm_instance.set_max_workers(new_max_workers)
|
||||
assert llm_instance.max_workers == new_max_workers
|
||||
|
||||
|
||||
# Test for setting repitition_penalty
|
||||
def test_llm_set_repitition_penalty(llm_instance):
|
||||
new_repitition_penalty = 1.5
|
||||
llm_instance.set_repitition_penalty(new_repitition_penalty)
|
||||
assert llm_instance.repitition_penalty == new_repitition_penalty
|
||||
|
||||
|
||||
# Test for setting no_repeat_ngram_size
|
||||
def test_llm_set_no_repeat_ngram_size(llm_instance):
|
||||
new_no_repeat_ngram_size = 6
|
||||
llm_instance.set_no_repeat_ngram_size(new_no_repeat_ngram_size)
|
||||
assert (
|
||||
llm_instance.no_repeat_ngram_size == new_no_repeat_ngram_size
|
||||
)
|
||||
|
||||
|
||||
def test_init_with_quantize(
|
||||
hugging_face_llm,
|
||||
mock_autotokenizer,
|
||||
mock_automodelforcausallm,
|
||||
mock_bitsandbytesconfig,
|
||||
):
|
||||
quantization_config = {
|
||||
"load_in_4bit": True,
|
||||
"bnb_4bit_use_double_quant": True,
|
||||
# Test for setting temperature
|
||||
def test_llm_set_temperature(llm_instance):
|
||||
new_temperature = 0.8
|
||||
llm_instance.set_temperature(new_temperature)
|
||||
assert llm_instance.temperature == new_temperature
|
||||
|
||||
|
||||
# Test for setting top_k
|
||||
def test_llm_set_top_k(llm_instance):
|
||||
new_top_k = 50
|
||||
llm_instance.set_top_k(new_top_k)
|
||||
assert llm_instance.top_k == new_top_k
|
||||
|
||||
|
||||
# Test for setting top_p
|
||||
def test_llm_set_top_p(llm_instance):
|
||||
new_top_p = 0.9
|
||||
llm_instance.set_top_p(new_top_p)
|
||||
assert llm_instance.top_p == new_top_p
|
||||
|
||||
|
||||
# Test for setting quantize
|
||||
def test_llm_set_quantize(llm_instance):
|
||||
llm_instance.set_quantize(True)
|
||||
assert llm_instance.quantize is True
|
||||
|
||||
|
||||
# Test for setting quantization_config
|
||||
def test_llm_set_quantization_config(llm_instance):
|
||||
new_quantization_config = {
|
||||
"load_in_4bit": False,
|
||||
"bnb_4bit_use_double_quant": False,
|
||||
"bnb_4bit_quant_type": "nf4",
|
||||
"bnb_4bit_compute_dtype": torch.bfloat16,
|
||||
}
|
||||
mock_bitsandbytesconfig.return_value = quantization_config
|
||||
llm_instance.set_quantization_config(new_quantization_config)
|
||||
assert llm_instance.quantization_config == new_quantization_config
|
||||
|
||||
HuggingFaceLLM(model_id="test", quantize=True)
|
||||
|
||||
mock_bitsandbytesconfig.assert_called_once_with(
|
||||
**quantization_config
|
||||
)
|
||||
mock_autotokenizer.from_pretrained.assert_called_once_with("test")
|
||||
mock_automodelforcausallm.from_pretrained.assert_called_once_with(
|
||||
"test", quantization_config=quantization_config
|
||||
# Test for setting model_id
|
||||
def test_llm_set_model_id(llm_instance):
|
||||
new_model_id = "EleutherAI/gpt-neo-2.7B"
|
||||
llm_instance.set_model_id(new_model_id)
|
||||
assert llm_instance.model_id == new_model_id
|
||||
|
||||
|
||||
# Test for setting model
|
||||
@patch(
|
||||
"swarms.models.huggingface.AutoModelForCausalLM.from_pretrained"
|
||||
)
|
||||
def test_llm_set_model(mock_model, llm_instance):
|
||||
mock_model.return_value = "mocked model"
|
||||
llm_instance.set_model(mock_model)
|
||||
assert llm_instance.model == "mocked model"
|
||||
|
||||
|
||||
# Test for setting tokenizer
|
||||
@patch("swarms.models.huggingface.AutoTokenizer.from_pretrained")
|
||||
def test_llm_set_tokenizer(mock_tokenizer, llm_instance):
|
||||
mock_tokenizer.return_value = "mocked tokenizer"
|
||||
llm_instance.set_tokenizer(mock_tokenizer)
|
||||
assert llm_instance.tokenizer == "mocked tokenizer"
|
||||
|
||||
|
||||
# Test for setting logger
|
||||
def test_llm_set_logger(llm_instance):
|
||||
new_logger = logging.getLogger("test_logger")
|
||||
llm_instance.set_logger(new_logger)
|
||||
assert llm_instance.logger == new_logger
|
||||
|
||||
|
||||
# Test for saving model
|
||||
@patch("torch.save")
|
||||
def test_llm_save_model(mock_save, llm_instance):
|
||||
llm_instance.save_model("path/to/save")
|
||||
mock_save.assert_called_once()
|
||||
|
||||
|
||||
def test_generate_text(hugging_face_llm):
|
||||
prompt_text = "test prompt"
|
||||
expected_output = "test output"
|
||||
hugging_face_llm.tokenizer.encode.return_value = torch.tensor(
|
||||
[0]
|
||||
) # Mock tensor
|
||||
hugging_face_llm.model.generate.return_value = torch.tensor(
|
||||
[0]
|
||||
) # Mock tensor
|
||||
hugging_face_llm.tokenizer.decode.return_value = expected_output
|
||||
# Test for print_dashboard
|
||||
@patch("builtins.print")
|
||||
def test_llm_print_dashboard(mock_print, llm_instance):
|
||||
llm_instance.print_dashboard("test task")
|
||||
mock_print.assert_called()
|
||||
|
||||
output = hugging_face_llm.generate_text(prompt_text)
|
||||
|
||||
assert output == expected_output
|
||||
# Test for __call__ method
|
||||
@patch("swarms.models.huggingface.HuggingfaceLLM.run")
|
||||
def test_llm_call(mock_run, llm_instance):
|
||||
mock_run.return_value = "mocked output"
|
||||
result = llm_instance("test task")
|
||||
assert result == "mocked output"
|
||||
|
@ -1,394 +0,0 @@
|
||||
import pytest
|
||||
import os
|
||||
from PIL import Image
|
||||
from swarms.models.kosmos2 import Kosmos2, Detections
|
||||
|
||||
|
||||
# Fixture for a sample image
|
||||
@pytest.fixture
|
||||
def sample_image():
|
||||
image = Image.new("RGB", (224, 224))
|
||||
return image
|
||||
|
||||
|
||||
# Fixture for initializing Kosmos2
|
||||
@pytest.fixture
|
||||
def kosmos2():
|
||||
return Kosmos2.initialize()
|
||||
|
||||
|
||||
# Test Kosmos2 initialization
|
||||
def test_kosmos2_initialization(kosmos2):
|
||||
assert kosmos2 is not None
|
||||
|
||||
|
||||
# Test Kosmos2 with a sample image
|
||||
def test_kosmos2_with_sample_image(kosmos2, sample_image):
|
||||
detections = kosmos2(img=sample_image)
|
||||
assert isinstance(detections, Detections)
|
||||
assert (
|
||||
len(detections.xyxy)
|
||||
== len(detections.class_id)
|
||||
== len(detections.confidence)
|
||||
== 0
|
||||
)
|
||||
|
||||
|
||||
# Mocked extract_entities function for testing
|
||||
def mock_extract_entities(text):
|
||||
return [
|
||||
("entity1", (0.1, 0.2, 0.3, 0.4)),
|
||||
("entity2", (0.5, 0.6, 0.7, 0.8)),
|
||||
]
|
||||
|
||||
|
||||
# Mocked process_entities_to_detections function for testing
|
||||
def mock_process_entities_to_detections(entities, image):
|
||||
return Detections(
|
||||
xyxy=[(10, 20, 30, 40), (50, 60, 70, 80)],
|
||||
class_id=[0, 0],
|
||||
confidence=[1.0, 1.0],
|
||||
)
|
||||
|
||||
|
||||
# Test Kosmos2 with mocked entity extraction and detection
|
||||
def test_kosmos2_with_mocked_extraction_and_detection(
|
||||
kosmos2, sample_image, monkeypatch
|
||||
):
|
||||
monkeypatch.setattr(
|
||||
kosmos2, "extract_entities", mock_extract_entities
|
||||
)
|
||||
monkeypatch.setattr(
|
||||
kosmos2,
|
||||
"process_entities_to_detections",
|
||||
mock_process_entities_to_detections,
|
||||
)
|
||||
|
||||
detections = kosmos2(img=sample_image)
|
||||
assert isinstance(detections, Detections)
|
||||
assert (
|
||||
len(detections.xyxy)
|
||||
== len(detections.class_id)
|
||||
== len(detections.confidence)
|
||||
== 2
|
||||
)
|
||||
|
||||
|
||||
# Test Kosmos2 with empty entity extraction
|
||||
def test_kosmos2_with_empty_extraction(
|
||||
kosmos2, sample_image, monkeypatch
|
||||
):
|
||||
monkeypatch.setattr(kosmos2, "extract_entities", lambda x: [])
|
||||
detections = kosmos2(img=sample_image)
|
||||
assert isinstance(detections, Detections)
|
||||
assert (
|
||||
len(detections.xyxy)
|
||||
== len(detections.class_id)
|
||||
== len(detections.confidence)
|
||||
== 0
|
||||
)
|
||||
|
||||
|
||||
# Test Kosmos2 with invalid image path
|
||||
def test_kosmos2_with_invalid_image_path(kosmos2):
|
||||
with pytest.raises(Exception):
|
||||
kosmos2(img="invalid_image_path.jpg")
|
||||
|
||||
|
||||
# Additional tests can be added for various scenarios and edge cases
|
||||
|
||||
|
||||
# Test Kosmos2 with a larger image
|
||||
def test_kosmos2_with_large_image(kosmos2):
|
||||
large_image = Image.new("RGB", (1024, 768))
|
||||
detections = kosmos2(img=large_image)
|
||||
assert isinstance(detections, Detections)
|
||||
assert (
|
||||
len(detections.xyxy)
|
||||
== len(detections.class_id)
|
||||
== len(detections.confidence)
|
||||
== 0
|
||||
)
|
||||
|
||||
|
||||
# Test Kosmos2 with different image formats
|
||||
def test_kosmos2_with_different_image_formats(kosmos2, tmp_path):
|
||||
# Create a temporary directory
|
||||
temp_dir = tmp_path / "images"
|
||||
temp_dir.mkdir()
|
||||
|
||||
# Create sample images in different formats
|
||||
image_formats = ["jpeg", "png", "gif", "bmp"]
|
||||
for format in image_formats:
|
||||
image_path = temp_dir / f"sample_image.{format}"
|
||||
Image.new("RGB", (224, 224)).save(image_path)
|
||||
|
||||
# Test Kosmos2 with each image format
|
||||
for format in image_formats:
|
||||
image_path = temp_dir / f"sample_image.{format}"
|
||||
detections = kosmos2(img=image_path)
|
||||
assert isinstance(detections, Detections)
|
||||
assert (
|
||||
len(detections.xyxy)
|
||||
== len(detections.class_id)
|
||||
== len(detections.confidence)
|
||||
== 0
|
||||
)
|
||||
|
||||
|
||||
# Test Kosmos2 with a non-existent model
|
||||
def test_kosmos2_with_non_existent_model(kosmos2):
|
||||
with pytest.raises(Exception):
|
||||
kosmos2.model = None
|
||||
kosmos2(img="sample_image.jpg")
|
||||
|
||||
|
||||
# Test Kosmos2 with a non-existent processor
|
||||
def test_kosmos2_with_non_existent_processor(kosmos2):
|
||||
with pytest.raises(Exception):
|
||||
kosmos2.processor = None
|
||||
kosmos2(img="sample_image.jpg")
|
||||
|
||||
|
||||
# Test Kosmos2 with missing image
|
||||
def test_kosmos2_with_missing_image(kosmos2):
|
||||
with pytest.raises(Exception):
|
||||
kosmos2(img="non_existent_image.jpg")
|
||||
|
||||
|
||||
# ... (previous tests)
|
||||
|
||||
|
||||
# Test Kosmos2 with a non-existent model and processor
|
||||
def test_kosmos2_with_non_existent_model_and_processor(kosmos2):
|
||||
with pytest.raises(Exception):
|
||||
kosmos2.model = None
|
||||
kosmos2.processor = None
|
||||
kosmos2(img="sample_image.jpg")
|
||||
|
||||
|
||||
# Test Kosmos2 with a corrupted image
|
||||
def test_kosmos2_with_corrupted_image(kosmos2, tmp_path):
|
||||
# Create a temporary directory
|
||||
temp_dir = tmp_path / "images"
|
||||
temp_dir.mkdir()
|
||||
|
||||
# Create a corrupted image
|
||||
corrupted_image_path = temp_dir / "corrupted_image.jpg"
|
||||
with open(corrupted_image_path, "wb") as f:
|
||||
f.write(b"corrupted data")
|
||||
|
||||
with pytest.raises(Exception):
|
||||
kosmos2(img=corrupted_image_path)
|
||||
|
||||
|
||||
# Test Kosmos2 with a large batch size
|
||||
def test_kosmos2_with_large_batch_size(kosmos2, sample_image):
|
||||
kosmos2.batch_size = 32
|
||||
detections = kosmos2(img=sample_image)
|
||||
assert isinstance(detections, Detections)
|
||||
assert (
|
||||
len(detections.xyxy)
|
||||
== len(detections.class_id)
|
||||
== len(detections.confidence)
|
||||
== 0
|
||||
)
|
||||
|
||||
|
||||
# Test Kosmos2 with an invalid compute type
|
||||
def test_kosmos2_with_invalid_compute_type(kosmos2, sample_image):
|
||||
kosmos2.compute_type = "invalid_compute_type"
|
||||
with pytest.raises(Exception):
|
||||
kosmos2(img=sample_image)
|
||||
|
||||
|
||||
# Test Kosmos2 with a valid HF API key
|
||||
def test_kosmos2_with_valid_hf_api_key(kosmos2, sample_image):
|
||||
kosmos2.hf_api_key = "valid_api_key"
|
||||
detections = kosmos2(img=sample_image)
|
||||
assert isinstance(detections, Detections)
|
||||
assert (
|
||||
len(detections.xyxy)
|
||||
== len(detections.class_id)
|
||||
== len(detections.confidence)
|
||||
== 2
|
||||
)
|
||||
|
||||
|
||||
# Test Kosmos2 with an invalid HF API key
|
||||
def test_kosmos2_with_invalid_hf_api_key(kosmos2, sample_image):
|
||||
kosmos2.hf_api_key = "invalid_api_key"
|
||||
with pytest.raises(Exception):
|
||||
kosmos2(img=sample_image)
|
||||
|
||||
|
||||
# Test Kosmos2 with a very long generated text
|
||||
def test_kosmos2_with_long_generated_text(
|
||||
kosmos2, sample_image, monkeypatch
|
||||
):
|
||||
def mock_generate_text(*args, **kwargs):
|
||||
return "A" * 10000
|
||||
|
||||
monkeypatch.setattr(kosmos2.model, "generate", mock_generate_text)
|
||||
detections = kosmos2(img=sample_image)
|
||||
assert isinstance(detections, Detections)
|
||||
assert (
|
||||
len(detections.xyxy)
|
||||
== len(detections.class_id)
|
||||
== len(detections.confidence)
|
||||
== 0
|
||||
)
|
||||
|
||||
|
||||
# Test Kosmos2 with entities containing special characters
|
||||
def test_kosmos2_with_entities_containing_special_characters(
|
||||
kosmos2, sample_image, monkeypatch
|
||||
):
|
||||
def mock_extract_entities(text):
|
||||
return [
|
||||
(
|
||||
"entity1 with special characters (ü, ö, etc.)",
|
||||
(0.1, 0.2, 0.3, 0.4),
|
||||
)
|
||||
]
|
||||
|
||||
monkeypatch.setattr(
|
||||
kosmos2, "extract_entities", mock_extract_entities
|
||||
)
|
||||
detections = kosmos2(img=sample_image)
|
||||
assert isinstance(detections, Detections)
|
||||
assert (
|
||||
len(detections.xyxy)
|
||||
== len(detections.class_id)
|
||||
== len(detections.confidence)
|
||||
== 1
|
||||
)
|
||||
|
||||
|
||||
# Test Kosmos2 with image containing multiple objects
|
||||
def test_kosmos2_with_image_containing_multiple_objects(
|
||||
kosmos2, sample_image, monkeypatch
|
||||
):
|
||||
def mock_extract_entities(text):
|
||||
return [
|
||||
("entity1", (0.1, 0.2, 0.3, 0.4)),
|
||||
("entity2", (0.5, 0.6, 0.7, 0.8)),
|
||||
]
|
||||
|
||||
monkeypatch.setattr(
|
||||
kosmos2, "extract_entities", mock_extract_entities
|
||||
)
|
||||
detections = kosmos2(img=sample_image)
|
||||
assert isinstance(detections, Detections)
|
||||
assert (
|
||||
len(detections.xyxy)
|
||||
== len(detections.class_id)
|
||||
== len(detections.confidence)
|
||||
== 2
|
||||
)
|
||||
|
||||
|
||||
# Test Kosmos2 with image containing no objects
|
||||
def test_kosmos2_with_image_containing_no_objects(
|
||||
kosmos2, sample_image, monkeypatch
|
||||
):
|
||||
def mock_extract_entities(text):
|
||||
return []
|
||||
|
||||
monkeypatch.setattr(
|
||||
kosmos2, "extract_entities", mock_extract_entities
|
||||
)
|
||||
detections = kosmos2(img=sample_image)
|
||||
assert isinstance(detections, Detections)
|
||||
assert (
|
||||
len(detections.xyxy)
|
||||
== len(detections.class_id)
|
||||
== len(detections.confidence)
|
||||
== 0
|
||||
)
|
||||
|
||||
|
||||
# Test Kosmos2 with a valid YouTube video URL
|
||||
def test_kosmos2_with_valid_youtube_video_url(kosmos2):
|
||||
youtube_video_url = "https://www.youtube.com/watch?v=VIDEO_ID"
|
||||
detections = kosmos2(video_url=youtube_video_url)
|
||||
assert isinstance(detections, Detections)
|
||||
assert (
|
||||
len(detections.xyxy)
|
||||
== len(detections.class_id)
|
||||
== len(detections.confidence)
|
||||
== 2
|
||||
)
|
||||
|
||||
|
||||
# Test Kosmos2 with an invalid YouTube video URL
|
||||
def test_kosmos2_with_invalid_youtube_video_url(kosmos2):
|
||||
invalid_youtube_video_url = (
|
||||
"https://www.youtube.com/invalid_video"
|
||||
)
|
||||
with pytest.raises(Exception):
|
||||
kosmos2(video_url=invalid_youtube_video_url)
|
||||
|
||||
|
||||
# Test Kosmos2 with no YouTube video URL provided
|
||||
def test_kosmos2_with_no_youtube_video_url(kosmos2):
|
||||
with pytest.raises(Exception):
|
||||
kosmos2(video_url=None)
|
||||
|
||||
|
||||
# Test Kosmos2 installation
|
||||
def test_kosmos2_installation():
|
||||
kosmos2 = Kosmos2()
|
||||
kosmos2.install()
|
||||
assert os.path.exists("video.mp4")
|
||||
assert os.path.exists("video.mp3")
|
||||
os.remove("video.mp4")
|
||||
os.remove("video.mp3")
|
||||
|
||||
|
||||
# Test Kosmos2 termination
|
||||
def test_kosmos2_termination(kosmos2):
|
||||
kosmos2.terminate()
|
||||
assert kosmos2.process is None
|
||||
|
||||
|
||||
# Test Kosmos2 start_process method
|
||||
def test_kosmos2_start_process(kosmos2):
|
||||
kosmos2.start_process()
|
||||
assert kosmos2.process is not None
|
||||
|
||||
|
||||
# Test Kosmos2 preprocess_code method
|
||||
def test_kosmos2_preprocess_code(kosmos2):
|
||||
code = "print('Hello, World!')"
|
||||
preprocessed_code = kosmos2.preprocess_code(code)
|
||||
assert isinstance(preprocessed_code, str)
|
||||
assert "end_of_execution" in preprocessed_code
|
||||
|
||||
|
||||
# Test Kosmos2 run method with debug mode
|
||||
def test_kosmos2_run_with_debug_mode(kosmos2, sample_image):
|
||||
kosmos2.debug_mode = True
|
||||
detections = kosmos2(img=sample_image)
|
||||
assert isinstance(detections, Detections)
|
||||
|
||||
|
||||
# Test Kosmos2 handle_stream_output method
|
||||
def test_kosmos2_handle_stream_output(kosmos2):
|
||||
stream_output = "Sample output"
|
||||
kosmos2.handle_stream_output(stream_output, is_error=False)
|
||||
|
||||
|
||||
# Test Kosmos2 run method with invalid image path
|
||||
def test_kosmos2_run_with_invalid_image_path(kosmos2):
|
||||
with pytest.raises(Exception):
|
||||
kosmos2.run(img="invalid_image_path.jpg")
|
||||
|
||||
|
||||
# Test Kosmos2 run method with invalid video URL
|
||||
def test_kosmos2_run_with_invalid_video_url(kosmos2):
|
||||
with pytest.raises(Exception):
|
||||
kosmos2.run(video_url="invalid_video_url")
|
||||
|
||||
|
||||
# ... (more tests)
|
@ -0,0 +1,53 @@
|
||||
import pytest
|
||||
from unittest.mock import patch, MagicMock
|
||||
from swarms.models.mixtral import Mixtral
|
||||
|
||||
|
||||
@patch("swarms.models.mixtral.AutoTokenizer")
|
||||
@patch("swarms.models.mixtral.AutoModelForCausalLM")
|
||||
def test_mixtral_init(mock_model, mock_tokenizer):
|
||||
mixtral = Mixtral()
|
||||
mock_tokenizer.from_pretrained.assert_called_once()
|
||||
mock_model.from_pretrained.assert_called_once()
|
||||
assert mixtral.model_name == "mistralai/Mixtral-8x7B-v0.1"
|
||||
assert mixtral.max_new_tokens == 20
|
||||
|
||||
|
||||
@patch("swarms.models.mixtral.AutoTokenizer")
|
||||
@patch("swarms.models.mixtral.AutoModelForCausalLM")
|
||||
def test_mixtral_run(mock_model, mock_tokenizer):
|
||||
mixtral = Mixtral()
|
||||
mock_tokenizer_instance = MagicMock()
|
||||
mock_model_instance = MagicMock()
|
||||
mock_tokenizer.from_pretrained.return_value = (
|
||||
mock_tokenizer_instance
|
||||
)
|
||||
mock_model.from_pretrained.return_value = mock_model_instance
|
||||
mock_tokenizer_instance.return_tensors = "pt"
|
||||
mock_model_instance.generate.return_value = [101, 102, 103]
|
||||
mock_tokenizer_instance.decode.return_value = "Generated text"
|
||||
result = mixtral.run("Test task")
|
||||
assert result == "Generated text"
|
||||
mock_tokenizer_instance.assert_called_once_with(
|
||||
"Test task", return_tensors="pt"
|
||||
)
|
||||
mock_model_instance.generate.assert_called_once()
|
||||
mock_tokenizer_instance.decode.assert_called_once_with(
|
||||
[101, 102, 103], skip_special_tokens=True
|
||||
)
|
||||
|
||||
|
||||
@patch("swarms.models.mixtral.AutoTokenizer")
|
||||
@patch("swarms.models.mixtral.AutoModelForCausalLM")
|
||||
def test_mixtral_run_error(mock_model, mock_tokenizer):
|
||||
mixtral = Mixtral()
|
||||
mock_tokenizer_instance = MagicMock()
|
||||
mock_model_instance = MagicMock()
|
||||
mock_tokenizer.from_pretrained.return_value = (
|
||||
mock_tokenizer_instance
|
||||
)
|
||||
mock_model.from_pretrained.return_value = mock_model_instance
|
||||
mock_tokenizer_instance.return_tensors = "pt"
|
||||
mock_model_instance.generate.side_effect = Exception("Test error")
|
||||
with pytest.raises(Exception, match="Test error"):
|
||||
mixtral.run("Test task")
|
@ -1,222 +0,0 @@
|
||||
import os
|
||||
import subprocess
|
||||
import tempfile
|
||||
from unittest.mock import patch
|
||||
|
||||
import pytest
|
||||
import whisperx
|
||||
from pydub import AudioSegment
|
||||
from pytube import YouTube
|
||||
from swarms.models.whisperx_model import WhisperX
|
||||
|
||||
|
||||
# Fixture to create a temporary directory for testing
|
||||
@pytest.fixture
|
||||
def temp_dir():
|
||||
with tempfile.TemporaryDirectory() as tempdir:
|
||||
yield tempdir
|
||||
|
||||
|
||||
# Mock subprocess.run to prevent actual installation during tests
|
||||
@patch.object(subprocess, "run")
|
||||
def test_speech_to_text_install(mock_run):
|
||||
stt = WhisperX("https://www.youtube.com/watch?v=MJd6pr16LRM")
|
||||
stt.install()
|
||||
mock_run.assert_called_with(["pip", "install", "whisperx"])
|
||||
|
||||
|
||||
# Mock pytube.YouTube and pytube.Streams for download tests
|
||||
@patch("pytube.YouTube")
|
||||
@patch.object(YouTube, "streams")
|
||||
def test_speech_to_text_download_youtube_video(
|
||||
mock_streams, mock_youtube, temp_dir
|
||||
):
|
||||
# Mock YouTube and streams
|
||||
video_url = "https://www.youtube.com/watch?v=MJd6pr16LRM"
|
||||
mock_stream = mock_streams().filter().first()
|
||||
mock_stream.download.return_value = os.path.join(
|
||||
temp_dir, "video.mp4"
|
||||
)
|
||||
mock_youtube.return_value = mock_youtube
|
||||
mock_youtube.streams = mock_streams
|
||||
|
||||
stt = WhisperX(video_url)
|
||||
audio_file = stt.download_youtube_video()
|
||||
|
||||
assert os.path.exists(audio_file)
|
||||
assert audio_file.endswith(".mp3")
|
||||
|
||||
|
||||
# Mock whisperx.load_model and whisperx.load_audio for transcribe tests
|
||||
@patch("whisperx.load_model")
|
||||
@patch("whisperx.load_audio")
|
||||
@patch("whisperx.load_align_model")
|
||||
@patch("whisperx.align")
|
||||
@patch.object(whisperx.DiarizationPipeline, "__call__")
|
||||
def test_speech_to_text_transcribe_youtube_video(
|
||||
mock_diarization,
|
||||
mock_align,
|
||||
mock_align_model,
|
||||
mock_load_audio,
|
||||
mock_load_model,
|
||||
temp_dir,
|
||||
):
|
||||
# Mock whisperx functions
|
||||
mock_load_model.return_value = mock_load_model
|
||||
mock_load_model.transcribe.return_value = {
|
||||
"language": "en",
|
||||
"segments": [{"text": "Hello, World!"}],
|
||||
}
|
||||
|
||||
mock_load_audio.return_value = "audio_path"
|
||||
mock_align_model.return_value = (mock_align_model, "metadata")
|
||||
mock_align.return_value = {
|
||||
"segments": [{"text": "Hello, World!"}]
|
||||
}
|
||||
|
||||
# Mock diarization pipeline
|
||||
mock_diarization.return_value = None
|
||||
|
||||
video_url = "https://www.youtube.com/watch?v=MJd6pr16LRM/video"
|
||||
stt = WhisperX(video_url)
|
||||
transcription = stt.transcribe_youtube_video()
|
||||
|
||||
assert transcription == "Hello, World!"
|
||||
|
||||
|
||||
# More tests for different scenarios and edge cases can be added here.
|
||||
|
||||
|
||||
# Test transcribe method with provided audio file
|
||||
def test_speech_to_text_transcribe_audio_file(temp_dir):
|
||||
# Create a temporary audio file
|
||||
audio_file = os.path.join(temp_dir, "test_audio.mp3")
|
||||
AudioSegment.silent(duration=500).export(audio_file, format="mp3")
|
||||
|
||||
stt = WhisperX("https://www.youtube.com/watch?v=MJd6pr16LRM")
|
||||
transcription = stt.transcribe(audio_file)
|
||||
|
||||
assert transcription == ""
|
||||
|
||||
|
||||
# Test transcribe method when Whisperx fails
|
||||
@patch("whisperx.load_model")
|
||||
@patch("whisperx.load_audio")
|
||||
def test_speech_to_text_transcribe_whisperx_failure(
|
||||
mock_load_audio, mock_load_model, temp_dir
|
||||
):
|
||||
# Mock whisperx functions to raise an exception
|
||||
mock_load_model.side_effect = Exception("Whisperx failed")
|
||||
mock_load_audio.return_value = "audio_path"
|
||||
|
||||
stt = WhisperX("https://www.youtube.com/watch?v=MJd6pr16LRM")
|
||||
transcription = stt.transcribe("audio_path")
|
||||
|
||||
assert transcription == "Whisperx failed"
|
||||
|
||||
|
||||
# Test transcribe method with missing 'segments' key in Whisperx output
|
||||
@patch("whisperx.load_model")
|
||||
@patch("whisperx.load_audio")
|
||||
@patch("whisperx.load_align_model")
|
||||
@patch("whisperx.align")
|
||||
@patch.object(whisperx.DiarizationPipeline, "__call__")
|
||||
def test_speech_to_text_transcribe_missing_segments(
|
||||
mock_diarization,
|
||||
mock_align,
|
||||
mock_align_model,
|
||||
mock_load_audio,
|
||||
mock_load_model,
|
||||
):
|
||||
# Mock whisperx functions to return incomplete output
|
||||
mock_load_model.return_value = mock_load_model
|
||||
mock_load_model.transcribe.return_value = {"language": "en"}
|
||||
|
||||
mock_load_audio.return_value = "audio_path"
|
||||
mock_align_model.return_value = (mock_align_model, "metadata")
|
||||
mock_align.return_value = {}
|
||||
|
||||
# Mock diarization pipeline
|
||||
mock_diarization.return_value = None
|
||||
|
||||
stt = WhisperX("https://www.youtube.com/watch?v=MJd6pr16LRM")
|
||||
transcription = stt.transcribe("audio_path")
|
||||
|
||||
assert transcription == ""
|
||||
|
||||
|
||||
# Test transcribe method with Whisperx align failure
|
||||
@patch("whisperx.load_model")
|
||||
@patch("whisperx.load_audio")
|
||||
@patch("whisperx.load_align_model")
|
||||
@patch("whisperx.align")
|
||||
@patch.object(whisperx.DiarizationPipeline, "__call__")
|
||||
def test_speech_to_text_transcribe_align_failure(
|
||||
mock_diarization,
|
||||
mock_align,
|
||||
mock_align_model,
|
||||
mock_load_audio,
|
||||
mock_load_model,
|
||||
):
|
||||
# Mock whisperx functions to raise an exception during align
|
||||
mock_load_model.return_value = mock_load_model
|
||||
mock_load_model.transcribe.return_value = {
|
||||
"language": "en",
|
||||
"segments": [{"text": "Hello, World!"}],
|
||||
}
|
||||
|
||||
mock_load_audio.return_value = "audio_path"
|
||||
mock_align_model.return_value = (mock_align_model, "metadata")
|
||||
mock_align.side_effect = Exception("Align failed")
|
||||
|
||||
# Mock diarization pipeline
|
||||
mock_diarization.return_value = None
|
||||
|
||||
stt = WhisperX("https://www.youtube.com/watch?v=MJd6pr16LRM")
|
||||
transcription = stt.transcribe("audio_path")
|
||||
|
||||
assert transcription == "Align failed"
|
||||
|
||||
|
||||
# Test transcribe_youtube_video when Whisperx diarization fails
|
||||
@patch("pytube.YouTube")
|
||||
@patch.object(YouTube, "streams")
|
||||
@patch("whisperx.DiarizationPipeline")
|
||||
@patch("whisperx.load_audio")
|
||||
@patch("whisperx.load_align_model")
|
||||
@patch("whisperx.align")
|
||||
def test_speech_to_text_transcribe_diarization_failure(
|
||||
mock_align,
|
||||
mock_align_model,
|
||||
mock_load_audio,
|
||||
mock_diarization,
|
||||
mock_streams,
|
||||
mock_youtube,
|
||||
temp_dir,
|
||||
):
|
||||
# Mock YouTube and streams
|
||||
video_url = "https://www.youtube.com/watch?v=MJd6pr16LRM"
|
||||
mock_stream = mock_streams().filter().first()
|
||||
mock_stream.download.return_value = os.path.join(
|
||||
temp_dir, "video.mp4"
|
||||
)
|
||||
mock_youtube.return_value = mock_youtube
|
||||
mock_youtube.streams = mock_streams
|
||||
|
||||
# Mock whisperx functions
|
||||
mock_load_audio.return_value = "audio_path"
|
||||
mock_align_model.return_value = (mock_align_model, "metadata")
|
||||
mock_align.return_value = {
|
||||
"segments": [{"text": "Hello, World!"}]
|
||||
}
|
||||
|
||||
# Mock diarization pipeline to raise an exception
|
||||
mock_diarization.side_effect = Exception("Diarization failed")
|
||||
|
||||
stt = WhisperX(video_url)
|
||||
transcription = stt.transcribe_youtube_video()
|
||||
|
||||
assert transcription == "Diarization failed"
|
||||
|
||||
|
||||
# Add more tests for other scenarios and edge cases as needed.
|
@ -1,7 +1,7 @@
|
||||
import pytest
|
||||
import os
|
||||
from datetime import datetime
|
||||
from swarms.swarms.base import BaseStructure
|
||||
from swarms.structs.base import BaseStructure
|
||||
|
||||
|
||||
class TestBaseStructure:
|
@ -0,0 +1,241 @@
|
||||
import pytest
|
||||
from swarms.structs.conversation import Conversation
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def conversation():
|
||||
conv = Conversation()
|
||||
conv.add("user", "Hello, world!")
|
||||
conv.add("assistant", "Hello, user!")
|
||||
return conv
|
||||
|
||||
|
||||
def test_add_message():
|
||||
conv = Conversation()
|
||||
conv.add("user", "Hello, world!")
|
||||
assert len(conv.conversation_history) == 1
|
||||
assert conv.conversation_history[0]["role"] == "user"
|
||||
assert conv.conversation_history[0]["content"] == "Hello, world!"
|
||||
|
||||
|
||||
def test_add_message_with_time():
|
||||
conv = Conversation(time_enabled=True)
|
||||
conv.add("user", "Hello, world!")
|
||||
assert len(conv.conversation_history) == 1
|
||||
assert conv.conversation_history[0]["role"] == "user"
|
||||
assert conv.conversation_history[0]["content"] == "Hello, world!"
|
||||
assert "timestamp" in conv.conversation_history[0]
|
||||
|
||||
|
||||
def test_delete_message():
|
||||
conv = Conversation()
|
||||
conv.add("user", "Hello, world!")
|
||||
conv.delete(0)
|
||||
assert len(conv.conversation_history) == 0
|
||||
|
||||
|
||||
def test_delete_message_out_of_bounds():
|
||||
conv = Conversation()
|
||||
conv.add("user", "Hello, world!")
|
||||
with pytest.raises(IndexError):
|
||||
conv.delete(1)
|
||||
|
||||
|
||||
def test_update_message():
|
||||
conv = Conversation()
|
||||
conv.add("user", "Hello, world!")
|
||||
conv.update(0, "assistant", "Hello, user!")
|
||||
assert len(conv.conversation_history) == 1
|
||||
assert conv.conversation_history[0]["role"] == "assistant"
|
||||
assert conv.conversation_history[0]["content"] == "Hello, user!"
|
||||
|
||||
|
||||
def test_update_message_out_of_bounds():
|
||||
conv = Conversation()
|
||||
conv.add("user", "Hello, world!")
|
||||
with pytest.raises(IndexError):
|
||||
conv.update(1, "assistant", "Hello, user!")
|
||||
|
||||
|
||||
def test_return_history_as_string_with_messages(conversation):
|
||||
result = conversation.return_history_as_string()
|
||||
assert result is not None
|
||||
|
||||
|
||||
def test_return_history_as_string_with_no_messages():
|
||||
conv = Conversation()
|
||||
result = conv.return_history_as_string()
|
||||
assert result == ""
|
||||
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
"role, content",
|
||||
[
|
||||
("user", "Hello, world!"),
|
||||
("assistant", "Hello, user!"),
|
||||
("system", "System message"),
|
||||
("function", "Function message"),
|
||||
],
|
||||
)
|
||||
def test_return_history_as_string_with_different_roles(role, content):
|
||||
conv = Conversation()
|
||||
conv.add(role, content)
|
||||
result = conv.return_history_as_string()
|
||||
expected = f"{role}: {content}\n\n"
|
||||
assert result == expected
|
||||
|
||||
|
||||
@pytest.mark.parametrize("message_count", range(1, 11))
|
||||
def test_return_history_as_string_with_multiple_messages(
|
||||
message_count,
|
||||
):
|
||||
conv = Conversation()
|
||||
for i in range(message_count):
|
||||
conv.add("user", f"Message {i + 1}")
|
||||
result = conv.return_history_as_string()
|
||||
expected = "".join(
|
||||
[f"user: Message {i + 1}\n\n" for i in range(message_count)]
|
||||
)
|
||||
assert result == expected
|
||||
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
"content",
|
||||
[
|
||||
"Hello, world!",
|
||||
"This is a longer message with multiple words.",
|
||||
"This message\nhas multiple\nlines.",
|
||||
"This message has special characters: !@#$%^&*()",
|
||||
"This message has unicode characters: 你好,世界!",
|
||||
],
|
||||
)
|
||||
def test_return_history_as_string_with_different_contents(content):
|
||||
conv = Conversation()
|
||||
conv.add("user", content)
|
||||
result = conv.return_history_as_string()
|
||||
expected = f"user: {content}\n\n"
|
||||
assert result == expected
|
||||
|
||||
|
||||
def test_return_history_as_string_with_large_message(conversation):
|
||||
large_message = "Hello, world! " * 10000 # 10,000 repetitions
|
||||
conversation.add("user", large_message)
|
||||
result = conversation.return_history_as_string()
|
||||
expected = (
|
||||
"user: Hello, world!\n\nassistant: Hello, user!\n\nuser:"
|
||||
f" {large_message}\n\n"
|
||||
)
|
||||
assert result == expected
|
||||
|
||||
|
||||
def test_search_keyword_in_conversation(conversation):
|
||||
result = conversation.search_keyword_in_conversation("Hello")
|
||||
assert len(result) == 2
|
||||
assert result[0]["content"] == "Hello, world!"
|
||||
assert result[1]["content"] == "Hello, user!"
|
||||
|
||||
|
||||
def test_export_import_conversation(conversation, tmp_path):
|
||||
filename = tmp_path / "conversation.txt"
|
||||
conversation.export_conversation(filename)
|
||||
new_conversation = Conversation()
|
||||
new_conversation.import_conversation(filename)
|
||||
assert (
|
||||
new_conversation.return_history_as_string()
|
||||
== conversation.return_history_as_string()
|
||||
)
|
||||
|
||||
|
||||
def test_count_messages_by_role(conversation):
|
||||
counts = conversation.count_messages_by_role()
|
||||
assert counts["user"] == 1
|
||||
assert counts["assistant"] == 1
|
||||
|
||||
|
||||
def test_display_conversation(capsys, conversation):
|
||||
conversation.display_conversation()
|
||||
captured = capsys.readouterr()
|
||||
assert "user: Hello, world!\n\n" in captured.out
|
||||
assert "assistant: Hello, user!\n\n" in captured.out
|
||||
|
||||
|
||||
def test_display_conversation_detailed(capsys, conversation):
|
||||
conversation.display_conversation(detailed=True)
|
||||
captured = capsys.readouterr()
|
||||
assert "user: Hello, world!\n\n" in captured.out
|
||||
assert "assistant: Hello, user!\n\n" in captured.out
|
||||
|
||||
|
||||
def test_search():
|
||||
conv = Conversation()
|
||||
conv.add("user", "Hello, world!")
|
||||
conv.add("assistant", "Hello, user!")
|
||||
results = conv.search("Hello")
|
||||
assert len(results) == 2
|
||||
assert results[0]["content"] == "Hello, world!"
|
||||
assert results[1]["content"] == "Hello, user!"
|
||||
|
||||
|
||||
def test_return_history_as_string():
|
||||
conv = Conversation()
|
||||
conv.add("user", "Hello, world!")
|
||||
conv.add("assistant", "Hello, user!")
|
||||
result = conv.return_history_as_string()
|
||||
expected = "user: Hello, world!\n\nassistant: Hello, user!\n\n"
|
||||
assert result == expected
|
||||
|
||||
|
||||
def test_search_no_results():
|
||||
conv = Conversation()
|
||||
conv.add("user", "Hello, world!")
|
||||
conv.add("assistant", "Hello, user!")
|
||||
results = conv.search("Goodbye")
|
||||
assert len(results) == 0
|
||||
|
||||
|
||||
def test_search_case_insensitive():
|
||||
conv = Conversation()
|
||||
conv.add("user", "Hello, world!")
|
||||
conv.add("assistant", "Hello, user!")
|
||||
results = conv.search("hello")
|
||||
assert len(results) == 2
|
||||
assert results[0]["content"] == "Hello, world!"
|
||||
assert results[1]["content"] == "Hello, user!"
|
||||
|
||||
|
||||
def test_search_multiple_occurrences():
|
||||
conv = Conversation()
|
||||
conv.add("user", "Hello, world! Hello, world!")
|
||||
conv.add("assistant", "Hello, user!")
|
||||
results = conv.search("Hello")
|
||||
assert len(results) == 2
|
||||
assert results[0]["content"] == "Hello, world! Hello, world!"
|
||||
assert results[1]["content"] == "Hello, user!"
|
||||
|
||||
|
||||
def test_query_no_results():
|
||||
conv = Conversation()
|
||||
conv.add("user", "Hello, world!")
|
||||
conv.add("assistant", "Hello, user!")
|
||||
results = conv.query("Goodbye")
|
||||
assert len(results) == 0
|
||||
|
||||
|
||||
def test_query_case_insensitive():
|
||||
conv = Conversation()
|
||||
conv.add("user", "Hello, world!")
|
||||
conv.add("assistant", "Hello, user!")
|
||||
results = conv.query("hello")
|
||||
assert len(results) == 2
|
||||
assert results[0]["content"] == "Hello, world!"
|
||||
assert results[1]["content"] == "Hello, user!"
|
||||
|
||||
|
||||
def test_query_multiple_occurrences():
|
||||
conv = Conversation()
|
||||
conv.add("user", "Hello, world! Hello, world!")
|
||||
conv.add("assistant", "Hello, user!")
|
||||
results = conv.query("Hello")
|
||||
assert len(results) == 2
|
||||
assert results[0]["content"] == "Hello, world! Hello, world!"
|
||||
assert results[1]["content"] == "Hello, user!"
|
@ -1,36 +0,0 @@
|
||||
from unittest.mock import patch
|
||||
from swarms.swarms.god_mode import GodMode, LLM
|
||||
|
||||
|
||||
def test_godmode_initialization():
|
||||
godmode = GodMode(llms=[LLM] * 5)
|
||||
assert isinstance(godmode, GodMode)
|
||||
assert len(godmode.llms) == 5
|
||||
|
||||
|
||||
def test_godmode_run(monkeypatch):
|
||||
def mock_llm_run(self, task):
|
||||
return "response"
|
||||
|
||||
monkeypatch.setattr(LLM, "run", mock_llm_run)
|
||||
godmode = GodMode(llms=[LLM] * 5)
|
||||
responses = godmode.run("task1")
|
||||
assert len(responses) == 5
|
||||
assert responses == [
|
||||
"response",
|
||||
"response",
|
||||
"response",
|
||||
"response",
|
||||
"response",
|
||||
]
|
||||
|
||||
|
||||
@patch("builtins.print")
|
||||
def test_godmode_print_responses(mock_print, monkeypatch):
|
||||
def mock_llm_run(self, task):
|
||||
return "response"
|
||||
|
||||
monkeypatch.setattr(LLM, "run", mock_llm_run)
|
||||
godmode = GodMode(llms=[LLM] * 5)
|
||||
godmode.print_responses("task1")
|
||||
assert mock_print.call_count == 1
|
@ -1,7 +1,5 @@
|
||||
import os
|
||||
import subprocess
|
||||
import json
|
||||
import re
|
||||
import requests
|
||||
from dotenv import load_dotenv
|
||||
|
Some files were not shown because too many files have changed in this diff Show More
Loading…
Reference in new issue