Merge pull request #3 from kyegomez/master

sync with kye 

Former-commit-id: f536e178fb
discord-bot-framework
Zack Bradshaw 1 year ago committed by GitHub
commit e08c452d58

@ -23,7 +23,7 @@ At Swarms, we're transforming the landscape of AI from siloed AI agents to a uni
 -----
 # 🤝 Schedule a 1-on-1 Session
-Book a [1-on-1 Session with Kye](https://calendly.com/apacai/agora), the Creator, to discuss any issues, provide feedback, or explore how we can improve Swarms for you.
+Book a [1-on-1 Session with Kye](https://calendly.com/swarm-corp/30min), the Creator, to discuss any issues, provide feedback, or explore how we can improve Swarms for you.
 ----------

@ -0,0 +1,187 @@
# Swarm Alpha: Data Cruncher
**Overview**: Processes large datasets.
**Strengths**: Efficient data handling.
**Weaknesses**: Requires structured data.
**Pseudo Code**:
```text
aggregated_results = EMPTY_LIST
FOR each data_entry IN dataset:
    result = PROCESS(data_entry)
    STORE(result)
    APPEND result TO aggregated_results
END FOR
RETURN aggregated_results
```
# Swarm Beta: Artistic Ally
**Overview**: Generates art pieces.
**Strengths**: Creativity.
**Weaknesses**: Somewhat unpredictable.
**Pseudo Code**:
```text
INITIATE canvas_parameters
SELECT art_style
DRAW(canvas_parameters, art_style)
RETURN finished_artwork
```
# Swarm Gamma: Sound Sculptor
**Overview**: Crafts audio sequences.
**Strengths**: Diverse audio outputs.
**Weaknesses**: Complexity in refining outputs.
**Pseudo Code**:
```text
DEFINE sound_parameters
SELECT audio_style
GENERATE_AUDIO(sound_parameters, audio_style)
RETURN audio_sequence
```
# Swarm Delta: Web Weaver
**Overview**: Constructs web designs.
**Strengths**: Modern design sensibility.
**Weaknesses**: Limited to web interfaces.
**Pseudo Code**:
```text
SELECT template
APPLY user_preferences(template)
DESIGN_web(template, user_preferences)
RETURN web_design
```
# Swarm Epsilon: Code Compiler
**Overview**: Writes and compiles code snippets.
**Strengths**: Quick code generation.
**Weaknesses**: Limited to certain programming languages.
**Pseudo Code**:
```text
DEFINE coding_task
code = WRITE_CODE(coding_task)
executable = COMPILE(code)
RETURN executable
```
# Swarm Zeta: Security Shield
**Overview**: Detects system vulnerabilities.
**Strengths**: High threat detection rate.
**Weaknesses**: Potential false positives.
**Pseudo Code**:
```text
MONITOR system_activity
IF suspicious_activity_detected:
    ANALYZE threat_level
    INITIATE mitigation_protocol
END IF
RETURN system_status
```
# Swarm Eta: Researcher Relay
**Overview**: Gathers and synthesizes research data.
**Strengths**: Access to vast databases.
**Weaknesses**: Depth of research can vary.
**Pseudo Code**:
```text
DEFINE research_topic
SEARCH research_sources(research_topic)
SYNTHESIZE findings
RETURN research_summary
```
---
# Swarm Theta: Sentiment Scanner
**Overview**: Analyzes text for sentiment and emotional tone.
**Strengths**: Accurate sentiment detection.
**Weaknesses**: Contextual nuances might be missed.
**Pseudo Code**:
```text
INPUT text_data
ANALYZE text_data FOR emotional_tone
DETERMINE sentiment_value
RETURN sentiment_value
```
# Swarm Iota: Image Interpreter
**Overview**: Processes and categorizes images.
**Strengths**: High image recognition accuracy.
**Weaknesses**: Can struggle with abstract visuals.
**Pseudo Code**:
```text
LOAD image_data
PROCESS image_data FOR features
CATEGORIZE image_based_on_features
RETURN image_category
```
# Swarm Kappa: Language Learner
**Overview**: Translates and interprets multiple languages.
**Strengths**: Supports multiple languages.
**Weaknesses**: Nuances in dialects might pose challenges.
**Pseudo Code**:
```text
RECEIVE input_text, target_language
TRANSLATE input_text TO target_language
RETURN translated_text
```
# Swarm Lambda: Trend Tracker
**Overview**: Monitors and predicts trends based on data.
**Strengths**: Proactive trend identification.
**Weaknesses**: Requires continuous data stream.
**Pseudo Code**:
```text
COLLECT data_over_time
ANALYZE data_trends
PREDICT upcoming_trends
RETURN trend_forecast
```
# Swarm Mu: Financial Forecaster
**Overview**: Analyzes financial data to predict market movements.
**Strengths**: In-depth financial analytics.
**Weaknesses**: Market volatility can affect predictions.
**Pseudo Code**:
```text
GATHER financial_data
COMPUTE statistical_analysis
FORECAST market_movements
RETURN financial_projections
```
# Swarm Nu: Network Navigator
**Overview**: Optimizes and manages network traffic.
**Strengths**: Efficient traffic management.
**Weaknesses**: Depends on network infrastructure.
**Pseudo Code**:
```text
MONITOR network_traffic
IDENTIFY congestion_points
OPTIMIZE traffic_flow
RETURN network_status
```
# Swarm Xi: Content Curator
**Overview**: Gathers and presents content based on user preferences.
**Strengths**: Personalized content delivery.
**Weaknesses**: Limited by available content sources.
**Pseudo Code**:
```text
DEFINE user_preferences
SEARCH content_sources
FILTER content_matching_preferences
DISPLAY curated_content
```

@ -0,0 +1,50 @@
# Swarms Multi-Agent Permissions System (SMAPS)
## Description
SMAPS is a permissions management system designed to integrate with the Swarms multi-agent AI framework. Drawing inspiration from Amazon's IAM, SMAPS provides secure, granular control over agent actions while allowing for collaborative human-in-the-loop interventions.
## Technical Specification
### 1. Components
- **User Management**: Handle user registrations, roles, and profiles.
- **Agent Management**: Register, monitor, and manage AI agents.
- **Permissions Engine**: Define and enforce permissions based on roles.
- **Multiplayer Interface**: Allows multiple human users to intervene, guide, or collaborate on tasks being executed by AI agents.
### 2. Features
- **Role-Based Access Control (RBAC)**:
    - Users can be assigned predefined roles (e.g., Admin, Agent Supervisor, Collaborator).
    - Each role has specific permissions associated with it, defining what actions can be performed on AI agents or tasks.
- **Dynamic Permissions**:
    - Create custom roles with specific permissions.
    - Permissions granularity: From broad (e.g., view all tasks) to specific (e.g., modify parameters of a particular agent).
- **Multiplayer Collaboration**:
    - Multiple users can join a task in real-time.
    - Collaborators can provide real-time feedback or guidance to AI agents.
    - A voting system for decision-making when human intervention is required.
- **Agent Supervision**:
    - Monitor agent actions in real-time.
    - Intervene, if necessary, to guide agent actions based on permissions.
- **Audit Trail**:
    - All actions, whether performed by humans or AI agents, are logged.
    - Review historical actions, decisions, and interventions for accountability and improvement.
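The role-based checks and the audit trail described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SMAPS implementation: the `Role` and `PermissionsEngine` names, the `resource:action` permission strings, and the in-memory audit list are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Role:
    """A named set of permission strings (hypothetical 'resource:action' format)."""
    name: str
    permissions: frozenset


@dataclass
class PermissionsEngine:
    """Toy in-memory engine: maps users to roles and logs every decision."""
    user_roles: dict = field(default_factory=dict)
    audit_trail: list = field(default_factory=list)

    def assign(self, user: str, role: Role) -> None:
        """Grant a role to a user; a user may hold several roles."""
        self.user_roles.setdefault(user, set()).add(role)

    def is_allowed(self, user: str, permission: str) -> bool:
        """Return True if any of the user's roles carries the permission."""
        roles = self.user_roles.get(user, set())
        allowed = any(permission in role.permissions for role in roles)
        # Audit trail: record every check, whether granted or denied.
        self.audit_trail.append((user, permission, allowed))
        return allowed


# Hypothetical roles, loosely mirroring the examples named above.
ADMIN = Role("Admin", frozenset({"tasks:view", "agents:modify", "roles:create"}))
COLLABORATOR = Role("Collaborator", frozenset({"tasks:view", "tasks:comment"}))

engine = PermissionsEngine()
engine.assign("alice", ADMIN)
engine.assign("bob", COLLABORATOR)

print(engine.is_allowed("alice", "agents:modify"))  # True
print(engine.is_allowed("bob", "agents:modify"))    # False
```

Keeping roles as immutable sets of permission strings makes custom roles cheap to define: a new role is just another `Role` value, and the check itself does not change.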
### 3. Security
- **Authentication**: Secure login mechanisms with multi-factor authentication options.
- **Authorization**: Ensure users and agents can only perform actions they are permitted to.
- **Data Encryption**: All data, whether at rest or in transit, is encrypted using industry-standard protocols.
### 4. Integration
- **APIs**: Expose APIs for integrating SMAPS with other systems or for extending its capabilities.
- **SDK**: Provide software development kits for popular programming languages to facilitate integration and extension.
## Documentation Description
Swarms Multi-Agent Permissions System (SMAPS) offers a sophisticated permissions management mechanism tailored for multi-agent AI frameworks. It combines the robustness of Amazon IAM-like permissions with a unique "multiplayer" feature, allowing multiple humans to collaboratively guide AI agents in real-time. This ensures not only that tasks are executed efficiently but also that they uphold the highest standards of accuracy and ethics. With SMAPS, businesses can harness the power of swarms with confidence, knowing that they have full control and transparency over their AI operations.

@ -0,0 +1,73 @@
# AgentArchive Documentation
## Swarms Multi-Agent Framework
**AgentArchive is an advanced feature crafted to archive, bookmark, and harness the transcripts of agent runs. It promotes the storing and leveraging of successful agent interactions, offering a powerful means for users to derive "recipes" for future agents. Furthermore, with its public archive feature, users can contribute to and benefit from the collective wisdom of the community.**
---
## Overview:
AgentArchive empowers users to:
1. Preserve complete transcripts of agent instances.
2. Bookmark and annotate significant runs.
3. Categorize runs using various tags.
4. Transform successful runs into actionable "recipes".
5. Publish and access a shared knowledge base via a public archive.
---
## Features:
### 1. Archiving:
- **Save Transcripts**: Retain the full narrative of an agent's interaction and choices.
- **Searchable Database**: Dive into archives using specific keywords, timestamps, or tags.
### 2. Bookmarking:
- **Highlight Essential Runs**: Designate specific agent runs for future reference.
- **Annotations**: Attach notes or remarks to bookmarked runs for clearer understanding.
### 3. Tagging:
Organize and classify agent runs via:
- **Prompt**: The originating instruction that triggered the agent run.
- **Tasks**: Distinct tasks or operations executed by the agent.
- **Model**: The specific AI model or iteration used during the interaction.
- **Temperature (Temp)**: The sampling temperature set for the agent, which controls how random its output is.
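As a rough illustration of how tagged runs might be stored and retrieved, the sketch below models one archived run as a dataclass and filters a list of runs by keyword, model, and temperature. Every name here (`ArchivedRun`, `search`, the field names, the model labels) is an assumption made for the example; the document does not define AgentArchive's actual data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ArchivedRun:
    """One archived agent run; all field names are illustrative."""
    prompt: str
    tasks: list
    model: str
    temperature: float
    transcript: str
    bookmarked: bool = False
    notes: list = field(default_factory=list)
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def search(runs, keyword=None, model=None, max_temperature=None):
    """Return the runs that match every filter that was supplied."""
    results = []
    for run in runs:
        if keyword and keyword.lower() not in run.transcript.lower():
            continue
        if model and run.model != model:
            continue
        if max_temperature is not None and run.temperature > max_temperature:
            continue
        results.append(run)
    return results


archive = [
    ArchivedRun("Summarize the report", ["summarize"], "model-a", 0.2,
                "Agent produced a summary."),
    ArchivedRun("Write a poem", ["generate"], "model-b", 0.9,
                "Agent wrote a poem."),
]

# Bookmark and annotate a significant run.
archive[0].bookmarked = True
archive[0].notes.append("Good recipe candidate")

matches = search(archive, keyword="summary", max_temperature=0.5)
print([run.prompt for run in matches])  # ['Summarize the report']
```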
### 4. Recipe Generation:
- **Standardization**: Convert successful run transcripts into replicable "recipes".
- **Guidance**: Offer subsequent agents a structured approach, rooted in prior successes.
- **Evolution**: Periodically refine recipes based on newer, enhanced runs.
### 5. Public Archive & Sharing:
- **Publish Successful Runs**: Users can choose to share their successful agent runs.
- **Collaborative Knowledge Base**: Access a shared repository of successful agent interactions from the community.
- **Ratings & Reviews**: Users can rate and review shared runs, highlighting particularly effective "recipes."
- **Privacy & Redaction**: Ensure that any sensitive information is automatically redacted before publishing.
---
## Benefits:
1. **Efficiency**: Revisit past agent activities to inform and guide future decisions.
2. **Consistency**: Guarantee a uniform approach to recurring challenges, leading to predictable and trustworthy outcomes.
3. **Collaborative Learning**: Tap into a reservoir of shared experiences, fostering community-driven learning and growth.
4. **Transparency**: By sharing successful runs, users can build trust and contribute to the broader community's success.
---
## Usage:
1. **Access AgentArchive**: Navigate to the dedicated section within the Swarms Multi-Agent Framework dashboard.
2. **Search, Filter & Organize**: Utilize the search bar and tagging system for precise retrieval.
3. **Bookmark, Annotate & Share**: Pin important runs, add notes, and consider sharing with the broader community.
4. **Engage with Public Archive**: Explore, rate, and apply shared knowledge to enhance agent performance.
---
With AgentArchive, users not only benefit from their past interactions but can also leverage the collective expertise of the Swarms community, ensuring continuous improvement and shared success.

@ -0,0 +1,67 @@
# Swarms Multi-Agent Framework Documentation
## Table of Contents
- Agent Failure Protocol
- Swarm Failure Protocol
---
## Agent Failure Protocol
### 1. Overview
Agent failures may arise from bugs, unexpected inputs, or external system changes. This protocol aims to diagnose, address, and prevent such failures.
### 2. Root Cause Analysis
- **Data Collection**: Record the task, inputs, and environmental variables present during the failure.
- **Diagnostic Tests**: Run the agent in a controlled environment replicating the failure scenario.
- **Error Logging**: Analyze error logs to identify patterns or anomalies.
### 3. Solution Brainstorming
- **Code Review**: Examine the code sections linked to the failure for bugs or inefficiencies.
- **External Dependencies**: Check if external systems or data sources have changed.
- **Algorithmic Analysis**: Evaluate if the agent's algorithms were overwhelmed or faced an unhandled scenario.
### 4. Risk Analysis & Solution Ranking
- Assess the potential risks associated with each solution.
- Rank solutions based on:
    - Implementation complexity
    - Potential negative side effects
    - Resource requirements
- Assign a success probability score (0.0 to 1.0) based on the above factors.
### 5. Solution Implementation
- Implement the top 3 solutions sequentially, starting with the highest success probability.
- If all three solutions fail, trigger the "Human-in-the-Loop" protocol.
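The ranking and implementation steps above can be sketched as follows. This is only an outline under stated assumptions: `CandidateSolution`, `resolve_failure`, and the `escalate` callback are hypothetical names, and how the success probability is actually derived from the risk analysis is left open by the protocol.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CandidateSolution:
    name: str
    success_probability: float  # score in [0.0, 1.0] from the risk analysis
    apply: Callable[[], bool]   # returns True if the fix resolved the failure


def resolve_failure(candidates, escalate, max_attempts=3):
    """Try the highest-scoring solutions in order; escalate if all of them fail."""
    ranked = sorted(candidates, key=lambda c: c.success_probability, reverse=True)
    for solution in ranked[:max_attempts]:
        if solution.apply():
            return solution.name
    escalate()  # hand over to the "Human-in-the-Loop" protocol
    return None


escalations = []
candidates = [
    CandidateSolution("restart agent", 0.4, lambda: False),
    CandidateSolution("patch input parser", 0.7, lambda: True),
    CandidateSolution("roll back dependency", 0.6, lambda: False),
]

# The 0.7 candidate is tried first and succeeds, so nothing is escalated.
print(resolve_failure(candidates, escalate=lambda: escalations.append("human")))
```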
---
## Swarm Failure Protocol
### 1. Overview
Swarm failures are more complex, often resulting from inter-agent conflicts, systemic bugs, or large-scale environmental changes. This protocol delves deep into such failures to ensure the swarm operates optimally.
### 2. Root Cause Analysis
- **Inter-Agent Analysis**: Examine if agents were in conflict or if there was a breakdown in collaboration.
- **System Health Checks**: Ensure all system components supporting the swarm are operational.
- **Environment Analysis**: Investigate if external factors or systems impacted the swarm's operation.
### 3. Solution Brainstorming
- **Collaboration Protocols**: Review and refine how agents collaborate.
- **Resource Allocation**: Check if the swarm had adequate computational and memory resources.
- **Feedback Loops**: Ensure agents are effectively learning from each other.
### 4. Risk Analysis & Solution Ranking
- Assess the potential systemic risks posed by each solution.
- Rank solutions considering:
    - Scalability implications
    - Impact on individual agents
    - Overall swarm performance potential
- Assign a success probability score (0.0 to 1.0) based on the above considerations.
### 5. Solution Implementation
- Implement the top 3 solutions sequentially, prioritizing the one with the highest success probability.
- If all three solutions are unsuccessful, invoke the "Human-in-the-Loop" protocol for expert intervention.
---
By following these protocols, the Swarms Multi-Agent Framework can systematically address and prevent failures, ensuring a high degree of reliability and efficiency.

@ -0,0 +1,49 @@
# Human-in-the-Loop Task Handling Protocol
## Overview
The Swarms Multi-Agent Framework recognizes the invaluable contributions humans can make, especially in complex scenarios where nuanced judgment is required. The "Human-in-the-Loop Task Handling Protocol" ensures that when agents encounter challenges they cannot handle autonomously, the most capable human collaborator is engaged to provide guidance, based on their skills and expertise.
## Protocol Steps
### 1. Task Initiation & Analysis
- When a task is initiated, agents first analyze the task's requirements.
- The system maintains an understanding of each task's complexity, requirements, and potential challenges.
### 2. Automated Resolution Attempt
- Agents first attempt to resolve the task autonomously using their algorithms and data.
- If the task can be completed without issues, it progresses normally.
### 3. Challenge Detection
- If agents encounter challenges or uncertainties they cannot resolve, the "Human-in-the-Loop" protocol is triggered.
### 4. Human Collaborator Identification
- The system maintains a dynamic profile of each human collaborator, cataloging their skills, expertise, and past performance on related tasks.
- Using this profile data, the system identifies the most capable human collaborator to assist with the current challenge.
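One way to picture the identification step is a simple scoring function over collaborator profiles. The sketch below is an assumption-laden illustration: the `CollaboratorProfile` fields, the scoring formula (skill overlap multiplied by past success rate), and the `max_open_requests` cap are invented for the example and are not specified by the protocol.

```python
from dataclasses import dataclass


@dataclass
class CollaboratorProfile:
    name: str
    skills: set
    past_success_rate: float  # fraction of related tasks resolved, 0.0 to 1.0
    open_requests: int = 0    # used to avoid overloading one person


def pick_collaborator(profiles, required_skills, max_open_requests=3):
    """Return the best-matching available collaborator, or None if nobody fits."""
    if not required_skills:
        return None

    def score(profile):
        overlap = len(profile.skills & required_skills) / len(required_skills)
        return overlap * profile.past_success_rate

    # Skip collaborators who already have too many pending requests.
    available = [p for p in profiles if p.open_requests < max_open_requests]
    best = max(available, key=score, default=None)
    if best is None or score(best) == 0:
        return None
    return best


profiles = [
    CollaboratorProfile("ana", {"python", "security"}, 0.9),
    CollaboratorProfile("ben", {"design"}, 0.95),
    CollaboratorProfile("chen", {"python", "security", "networking"}, 0.8, open_requests=3),
]

chosen = pick_collaborator(profiles, {"python", "security"})
print(chosen.name)  # ana
```

Filtering on open requests before scoring is one simple way to honor the goal of not overwhelming collaborators with interventions.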
### 5. Real-time Collaboration
- The identified human collaborator is notified and provided with all the relevant information about the task and the challenge.
- Collaborators can provide guidance, make decisions, or even take over specific portions of the task.
### 6. Task Completion & Feedback Loop
- Once the challenge is resolved, agents continue with the task until completion.
- Feedback from human collaborators is used to update agent algorithms, ensuring continuous learning and improvement.
## Best Practices
1. **Maintain Up-to-date Human Profiles**: Ensure that the skillsets, expertise, and performance metrics of human collaborators are updated regularly.
2. **Limit Interruptions**: Implement mechanisms to limit the frequency of human interventions, ensuring collaborators are not overwhelmed with requests.
3. **Provide Context**: When seeking human intervention, provide collaborators with comprehensive context to ensure they can make informed decisions.
4. **Continuous Training**: Regularly update and train agents based on feedback from human collaborators.
5. **Measure & Optimize**: Monitor the efficiency of the "Human-in-the-Loop" protocol, aiming to reduce the frequency of interventions while maximizing the value of each intervention.
6. **Skill Enhancement**: Encourage human collaborators to continuously enhance their skills, ensuring that the collective expertise of the group grows over time.
## Conclusion
The integration of human expertise with AI capabilities is a cornerstone of the Swarms Multi-Agent Framework. This "Human-in-the-Loop Task Handling Protocol" ensures that tasks are executed efficiently, leveraging the best of both human judgment and AI automation. Through collaborative synergy, we can tackle challenges more effectively and drive innovation.

@ -0,0 +1,48 @@
# Secure Communication Protocols
## Overview
The Swarms Multi-Agent Framework prioritizes the security and integrity of data, especially personal and sensitive information. Our Secure Communication Protocols ensure that all communications between agents are encrypted, authenticated, and resistant to tampering or unauthorized access.
## Features
### 1. End-to-End Encryption
- All inter-agent communications are encrypted using state-of-the-art cryptographic algorithms.
- This ensures that data remains confidential and can only be read by the intended recipient agent.
### 2. Authentication
- Before initiating communication, agents authenticate each other using digital certificates.
- This prevents impersonation attacks and ensures that agents are communicating with legitimate counterparts.
### 3. Forward Secrecy
- Key exchange mechanisms employ forward secrecy: each session uses ephemeral keys, so even if a malicious actor later obtains a long-term key, they cannot decrypt previously recorded communications.
### 4. Data Integrity
- Cryptographic hashes ensure that the data has not been altered in transit.
- Any discrepancies in data integrity result in the communication being rejected.
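A minimal sketch of such an integrity check is shown below. It assumes a keyed hash (HMAC-SHA256 from Python's standard library) rather than a bare hash, since a plain digest can be recomputed by anyone who alters the message; the shared key, the message contents, and the `sign`/`verify` helper names are illustrative only and do not describe the framework's actual wire format.

```python
import hashlib
import hmac
import secrets


def sign(key: bytes, message: bytes) -> bytes:
    """Return an HMAC-SHA256 tag for the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Constant-time check that the tag matches the message."""
    return hmac.compare_digest(sign(key, message), tag)


shared_key = secrets.token_bytes(32)  # assumed to be agreed on beforehand
message = b"task update from agent A"
tag = sign(shared_key, message)

print(verify(shared_key, message, tag))                        # True: untouched
print(verify(shared_key, b"task update from agent B", tag))    # False: altered
```

A receiver that gets `False` from `verify` would reject the communication, as described above.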
### 5. Zero-Knowledge Protocols
- When handling especially sensitive data, agents use zero-knowledge proofs to validate information without revealing the actual data.
### 6. Periodic Key Rotation
- To mitigate the risk of long-term key exposure, encryption keys are periodically rotated.
- Old keys are securely discarded after rotation, which limits how much traffic any single compromised key can expose.
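Time-based rotation can be sketched with a small wrapper that replaces its key once a lifetime has elapsed. The `RotatingKey` class, its parameters, and the injectable clock are assumptions for illustration; note also that dropping a reference in Python does not guarantee the old key bytes are wiped from memory, so this shows the rotation logic only.

```python
import secrets
import time


class RotatingKey:
    """Holds one symmetric key and replaces it after a fixed lifetime."""

    def __init__(self, lifetime_seconds, clock=time.monotonic):
        self._lifetime = lifetime_seconds
        self._clock = clock
        self._rotate()

    def _rotate(self):
        # Generating a new key drops the only reference to the old one.
        self._key = secrets.token_bytes(32)
        self._created = self._clock()

    def current(self) -> bytes:
        """Return the active key, rotating first if it has expired."""
        if self._clock() - self._created >= self._lifetime:
            self._rotate()
        return self._key


# Demonstration with a fake clock so that no real waiting is needed.
fake_now = [0.0]
key_holder = RotatingKey(lifetime_seconds=60, clock=lambda: fake_now[0])

first = key_holder.current()
fake_now[0] = 30.0
print(key_holder.current() == first)  # True: still within the lifetime
fake_now[0] = 60.0
print(key_holder.current() == first)  # False: the key was rotated
```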
## Best Practices for Handling Personal and Sensitive Information
1. **Data Minimization**: Agents should only request and process the minimum amount of personal data necessary for the task.
2. **Anonymization**: Whenever possible, agents should anonymize personal data, stripping away identifying details.
3. **Data Retention Policies**: Personal data should be retained only for the period necessary to complete the task, after which it should be securely deleted.
4. **Access Controls**: Ensure that only authorized agents have access to personal and sensitive information. Implement strict access control mechanisms.
5. **Regular Audits**: Conduct regular security audits to ensure compliance with privacy regulations and to detect any potential vulnerabilities.
6. **Training**: All agents should be regularly updated and trained on the latest security protocols and best practices for handling sensitive data.
## Conclusion
Secure communication is paramount in the Swarms Multi-Agent Framework, especially when dealing with personal and sensitive information. Adhering to these protocols and best practices ensures the safety, privacy, and trust of all stakeholders involved.

@ -0,0 +1,68 @@
# Promptimizer Documentation
## Swarms Multi-Agent Framework
**The Promptimizer Tool stands as a cornerstone innovation within the Swarms Multi-Agent Framework, meticulously engineered to refine and supercharge prompts across diverse categories. Capitalizing on extensive libraries of best-practice prompting techniques, this tool ensures your prompts are razor-sharp, tailored, and primed for optimal outcomes.**
---
## Overview:
The Promptimizer Tool is crafted to:
1. Rigorously analyze and elevate the quality of provided prompts.
2. Furnish best-in-class recommendations rooted in proven prompting strategies.
3. Serve a spectrum of categories, from technical operations to expansive creative ventures.
---
## Core Features:
### 1. Deep Prompt Analysis:
- **Clarity Matrix**: A proprietary algorithm assessing prompt clarity, removing ambiguities and sharpening focus.
- **Efficiency Gauge**: Evaluates the prompt's structure so that it yields the desired results quickly and precisely.
### 2. Adaptive Recommendations:
- **Technique Engine**: Suggests techniques aligned with the gold standard for the chosen category.
- **Exemplar Database**: Offers an extensive array of high-quality prompt examples for comparison and inspiration.
### 3. Versatile Category Framework:
- **Tech Suite**: Optimizes prompts for technical tasks, ensuring actionable clarity.
- **Narrative Craft**: Hones prompts to elicit vivid and coherent stories.
- **Visual Visionary**: Shapes prompts for precise and dynamic visual generation.
- **Sonic Sculptor**: Orchestrates prompts for audio creation, tuning into desired tones and moods.
### 4. Machine Learning Integration:
- **Feedback Dynamo**: Harnesses user feedback, continually refining the tool's recommendation capabilities.
- **Live Library Updates**: Periodic syncing with the latest in prompting techniques, ensuring the tool remains at the cutting edge.
### 5. Collaboration & Sharing:
- **TeamSync**: Allows teams to collaborate on prompt optimization in real-time.
- **ShareSpace**: Share and access a community-driven repository of optimized prompts, fostering collective growth.
---
## Benefits:
1. **Precision Engineering**: Harness the power of refined prompts, ensuring desired outcomes are achieved with surgical precision.
2. **Learning Hub**: Immerse yourself in a tool that not only refines but educates, enhancing the user's prompting acumen.
3. **Versatile Mastery**: Navigate seamlessly across categories, ensuring top-tier prompt quality regardless of the domain.
4. **Community-driven Excellence**: Dive into a world of shared knowledge, elevating the collective expertise of the Swarms community.
---
## Usage Workflow:
1. **Launch the Promptimizer**: Access the tool directly from the Swarms Multi-Agent Framework dashboard.
2. **Prompt Entry**: Input the initial prompt for refinement.
3. **Category Selection**: Pinpoint the desired category for specialized optimization.
4. **Receive & Review**: Engage with the tool's recommendations, comparing original and optimized prompts.
5. **Collaborate, Implement & Share**: Work in tandem with team members, deploy the refined prompt, and consider contributing to the community repository.
---
By integrating the Promptimizer Tool into their workflow, Swarms users stand poised to redefine the boundaries of what's possible, turning each prompt into a beacon of excellence and efficiency.

@ -0,0 +1,68 @@
# Shorthand Communication System
## Swarms Multi-Agent Framework
**The Enhanced Shorthand Communication System is designed to streamline agent-agent communication within the Swarms Multi-Agent Framework. This system employs concise alphanumeric notations to relay task-specific details to agents efficiently.**
---
## Format:
The shorthand format is structured as `[AgentType]-[TaskLayer].[TaskNumber]-[Priority]-[Status]`.
---
## Components:
### 1. Agent Type:
- Denotes the specific agent role, such as:
    * `C`: Code agent
    * `D`: Data processing agent
    * `M`: Monitoring agent
    * `N`: Network agent
    * `R`: Resource management agent
    * `I`: Interface agent
    * `S`: Security agent
### 2. Task Layer & Number:
- Represents the task's category.
    * Example: `1.8` signifies Task layer 1, task number 8.
### 3. Priority:
- Indicates task urgency.
    * `H`: High
    * `M`: Medium
    * `L`: Low
### 4. Status:
- Gives a snapshot of the task's progress.
    * `I`: Initialized
    * `P`: In-progress
    * `C`: Completed
    * `F`: Failed
    * `W`: Waiting
---
## Extended Features:
### 1. Error Codes (for failures):
- `E01`: Resource issues
- `E02`: Data inconsistency
- `E03`: Dependency malfunction
... and more as needed.
### 2. Collaboration Flag:
- `+`: Denotes required collaboration.
---
## Example Codes:
- `C-1.8-H-I`: A high-priority coding task that's initializing.
- `D-2.3-M-P`: A medium-priority data task currently in-progress.
- `M-3.5-L-P+`: A low-priority monitoring task in progress needing collaboration.
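To make the format concrete, here is a small parser sketch for codes like the ones above. It covers the base format plus the trailing `+` collaboration flag; how error codes such as `E01` attach to a code is not specified here, so they are left out. The `TaskCode` dataclass and `parse_task_code` function are hypothetical names introduced for the example.

```python
import re
from dataclasses import dataclass

AGENT_TYPES = {
    "C": "Code", "D": "Data processing", "M": "Monitoring", "N": "Network",
    "R": "Resource management", "I": "Interface", "S": "Security",
}
PRIORITIES = {"H": "High", "M": "Medium", "L": "Low"}
STATUSES = {
    "I": "Initialized", "P": "In-progress", "C": "Completed",
    "F": "Failed", "W": "Waiting",
}

# [AgentType]-[TaskLayer].[TaskNumber]-[Priority]-[Status], optional "+" flag
_PATTERN = re.compile(r"([CDMNRIS])-(\d+)\.(\d+)-([HML])-([IPCFW])(\+?)")


@dataclass(frozen=True)
class TaskCode:
    agent_type: str
    task_layer: int
    task_number: int
    priority: str
    status: str
    needs_collaboration: bool


def parse_task_code(code: str) -> TaskCode:
    """Parse a shorthand code, raising ValueError if it is malformed."""
    match = _PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"not a valid shorthand code: {code!r}")
    agent, layer, number, priority, status, flag = match.groups()
    return TaskCode(
        agent_type=AGENT_TYPES[agent],
        task_layer=int(layer),
        task_number=int(number),
        priority=PRIORITIES[priority],
        status=STATUSES[status],
        needs_collaboration=(flag == "+"),
    )


print(parse_task_code("M-3.5-L-P+"))
```

Using `fullmatch` rejects codes with trailing characters, so a typo such as `C-1.8-H-IX` fails loudly instead of being silently truncated.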
---
By leveraging the Enhanced Shorthand Communication System, the Swarms Multi-Agent Framework can ensure swift interactions, concise communications, and effective task management.

@ -0,0 +1,146 @@
# BaseChunker Documentation
## Table of Contents
1. [Introduction](#introduction)
2. [Overview](#overview)
3. [Installation](#installation)
4. [Usage](#usage)
    1. [BaseChunker Class](#basechunker-class)
    2. [Examples](#examples)
5. [Additional Information](#additional-information)
6. [Conclusion](#conclusion)
---
## 1. Introduction <a name="introduction"></a>
The `BaseChunker` module is a tool for splitting text into smaller chunks that can be processed by a language model. It is a fundamental component in natural language processing tasks that require handling long or complex text inputs.
This documentation provides an extensive guide on using the `BaseChunker` module, explaining its purpose, parameters, and usage.
---
## 2. Overview <a name="overview"></a>
The `BaseChunker` module is designed to address the challenge of processing lengthy text inputs that exceed the maximum token limit of language models. By breaking such text into smaller, manageable chunks, it enables efficient and accurate processing.
Key features and parameters of the `BaseChunker` module include:
- `separators`: Specifies a list of `ChunkSeparator` objects used to split the text into chunks.
- `tokenizer`: Defines the tokenizer to be used for counting tokens in the text.
- `max_tokens`: Sets the maximum token limit for each chunk.
The `BaseChunker` module facilitates the chunking process and ensures that the generated chunks are within the token limit.
---
## 3. Installation <a name="installation"></a>
Before using the `BaseChunker` module, ensure you have the required dependencies installed. The module relies on the `griptape` and `swarms` libraries, which you can install using pip:
```bash
pip install griptape swarms
```
---
## 4. Usage <a name="usage"></a>
This section covers how to use the `BaseChunker` module effectively. It describes the `BaseChunker` class and provides examples that demonstrate its usage.
### 4.1. `BaseChunker` Class <a name="basechunker-class"></a>
The `BaseChunker` class is the core component of the `BaseChunker` module. It is used to create a `BaseChunker` instance, which can split text into chunks efficiently.
#### Parameters:
- `separators` (list[ChunkSeparator]): Specifies a list of `ChunkSeparator` objects used to split the text into chunks.
- `tokenizer` (OpenAiTokenizer): Defines the tokenizer to be used for counting tokens in the text.
- `max_tokens` (int): Sets the maximum token limit for each chunk.
### 4.2. Examples <a name="examples"></a>
Let's explore how to use the `BaseChunker` class with different scenarios and applications.
#### Example 1: Basic Chunking
```python
from basechunker import BaseChunker

# Initialize the BaseChunker
chunker = BaseChunker()

# Text to be chunked
input_text = "This is a long text that needs to be split into smaller chunks for processing."

# Chunk the text
chunks = chunker.chunk(input_text)

# Print the generated chunks
for idx, chunk in enumerate(chunks, start=1):
    print(f"Chunk {idx}: {chunk.value}")
```
#### Example 2: Custom Separators
```python
from basechunker import BaseChunker, ChunkSeparator

# Define custom separators
custom_separators = [ChunkSeparator(","), ChunkSeparator(";")]

# Initialize the BaseChunker with custom separators
chunker = BaseChunker(separators=custom_separators)

# Text with custom separators
input_text = "This text, separated by commas; should be split accordingly."

# Chunk the text
chunks = chunker.chunk(input_text)

# Print the generated chunks
for idx, chunk in enumerate(chunks, start=1):
    print(f"Chunk {idx}: {chunk.value}")
```
#### Example 3: Adjusting Maximum Tokens
```python
from basechunker import BaseChunker

# Initialize the BaseChunker with a custom maximum token limit
chunker = BaseChunker(max_tokens=50)

# Long text input
input_text = "This is an exceptionally long text that should be broken into smaller chunks based on token count."

# Chunk the text
chunks = chunker.chunk(input_text)

# Print the generated chunks
for idx, chunk in enumerate(chunks, start=1):
    print(f"Chunk {idx}: {chunk.value}")
```
### 4.3. Additional Features
The `BaseChunker` class also provides additional features:
#### Recursive Chunking
The `_chunk_recursively` method handles the recursive chunking of text, ensuring that each chunk stays within the token limit.
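The general idea of recursive chunking can be illustrated with a standalone sketch. This is not the library's `_chunk_recursively` method, whose signature is not shown in this document: the `chunk_recursively` function below, the plain-string separators, and the whitespace-based `count_tokens` stand-in for a real tokenizer are all assumptions made for the example.

```python
def count_tokens(text: str) -> int:
    """Stand-in tokenizer: counts whitespace-separated words."""
    return len(text.split())


def chunk_recursively(text, separators, max_tokens):
    """Split text until every chunk has at most max_tokens tokens."""
    text = text.strip()
    if not text:
        return []
    if count_tokens(text) <= max_tokens:
        return [text]

    # Try each separator in order of preference.
    for separator in separators:
        parts = [part for part in text.split(separator) if part.strip()]
        if len(parts) < 2:
            continue
        # Split near the middle so both halves shrink, then recurse on each.
        # The separator at the split point itself is dropped.
        middle = len(parts) // 2
        left = separator.join(parts[:middle])
        right = separator.join(parts[middle:])
        return (chunk_recursively(left, separators, max_tokens)
                + chunk_recursively(right, separators, max_tokens))

    # No separator applies: fall back to a hard split on words.
    words = text.split()
    return [" ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)]


sample = ("First sentence here. Second sentence follows. "
          "Third one, with a comma, ends it.")
for idx, chunk in enumerate(chunk_recursively(sample, ["\n\n", ". ", ", "], 5), start=1):
    print(f"Chunk {idx}: {chunk}")
```

Ordering the separators from coarse (paragraph breaks) to fine (commas) keeps chunks aligned with natural boundaries and only falls back to a hard word split when nothing else fits.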
---
## 5. Additional Information <a name="additional-information"></a>
- **Text Chunking**: The `BaseChunker` module is a fundamental tool for text chunking, a crucial step in preprocessing text data for various natural language processing tasks.
- **Custom Separators**: You can customize the separators used to split the text, allowing flexibility in how text is chunked.
- **Token Count**: The module accurately counts tokens using the specified tokenizer, ensuring that chunks do not exceed token limits.
---
## 6. Conclusion <a name="conclusion"></a>
The `BaseChunker` module is an essential tool for text preprocessing and handling long or complex text inputs in natural language processing tasks. This documentation has provided a comprehensive guide on its usage, parameters, and examples, enabling you to efficiently manage and process text data by splitting it into manageable chunks.
By using the `BaseChunker`, you can ensure that your text data remains within token limits and is ready for further analysis and processing.
*Please check the official `BaseChunker` repository and documentation for any updates.*

@ -0,0 +1,147 @@
# PdfChunker Documentation
## Table of Contents
1. [Introduction](#introduction)
2. [Overview](#overview)
3. [Installation](#installation)
4. [Usage](#usage)
    1. [PdfChunker Class](#pdfchunker-class)
    2. [Examples](#examples)
5. [Additional Information](#additional-information)
6. [Conclusion](#conclusion)
---
## 1. Introduction <a name="introduction"></a>
The `PdfChunker` module is a specialized tool designed to split PDF text content into smaller, more manageable chunks. It is a valuable asset for processing PDF documents in natural language processing and text analysis tasks.
This documentation provides a comprehensive guide on how to use the `PdfChunker` module. It covers its purpose, parameters, and usage, ensuring that you can effectively process PDF text content.
---
## 2. Overview <a name="overview"></a>
The `PdfChunker` module serves a critical role in handling PDF text content, which is often lengthy and complex. Key features and parameters of the `PdfChunker` module include:
- `separators`: Specifies a list of `ChunkSeparator` objects used to split the PDF text content into chunks.
- `tokenizer`: Defines the tokenizer used for counting tokens in the text.
- `max_tokens`: Sets the maximum token limit for each chunk.
By using the `PdfChunker`, you can efficiently prepare PDF text content for further analysis and processing.
---
## 3. Installation <a name="installation"></a>
Before using the `PdfChunker` module, ensure you have the required dependencies installed. The module relies on the `swarms` library. You can install this dependency using pip:
```bash
pip install swarms
```
---
## 4. Usage <a name="usage"></a>
In this section, we'll explore how to use the `PdfChunker` module effectively. It consists of the `PdfChunker` class and provides examples to demonstrate its usage.
### 4.1. `PdfChunker` Class <a name="pdfchunker-class"></a>
The `PdfChunker` class is the core component of the `PdfChunker` module. It is used to create a `PdfChunker` instance, which can split PDF text content into manageable chunks.
#### Parameters:
- `separators` (list[ChunkSeparator]): Specifies a list of `ChunkSeparator` objects used to split the PDF text content into chunks.
- `tokenizer` (OpenAiTokenizer): Defines the tokenizer used for counting tokens in the text.
- `max_tokens` (int): Sets the maximum token limit for each chunk.
### 4.2. Examples <a name="examples"></a>
Let's explore how to use the `PdfChunker` class with different scenarios and applications.
#### Example 1: Basic Chunking
```python
from swarms.chunkers.pdf_chunker import PdfChunker
from swarms.chunkers.chunk_seperator import ChunkSeparator
# Initialize the PdfChunker
pdf_chunker = PdfChunker()
# PDF text content to be chunked
pdf_text = "This is a PDF document with multiple paragraphs and sentences. It should be split into smaller chunks for analysis."
# Chunk the PDF text content
chunks = pdf_chunker.chunk(pdf_text)
# Print the generated chunks
for idx, chunk in enumerate(chunks, start=1):
    print(f"Chunk {idx}:\n{chunk.value}")
```
#### Example 2: Custom Separators
```python
from swarms.chunkers.pdf_chunker import PdfChunker
from swarms.chunkers.chunk_seperator import ChunkSeparator
# Define custom separators for PDF chunking
custom_separators = [ChunkSeparator("\n\n"), ChunkSeparator(". ")]
# Initialize the PdfChunker with custom separators
pdf_chunker = PdfChunker(separators=custom_separators)
# PDF text content with custom separators
pdf_text = "This PDF document has custom paragraph separators.\n\nIt also uses period-based sentence separators. Split accordingly."
# Chunk the PDF text content
chunks = pdf_chunker.chunk(pdf_text)
# Print the generated chunks
for idx, chunk in enumerate(chunks, start=1):
    print(f"Chunk {idx}:\n{chunk.value}")
```
#### Example 3: Adjusting Maximum Tokens
```python
from swarms.chunkers.pdf_chunker import PdfChunker
# Initialize the PdfChunker with a custom maximum token limit
pdf_chunker = PdfChunker(max_tokens=50)
# Lengthy PDF text content
pdf_text = "This is an exceptionally long PDF document that should be broken into smaller chunks based on token count."
# Chunk the PDF text content
chunks = pdf_chunker.chunk(pdf_text)
# Print the generated chunks
for idx, chunk in enumerate(chunks, start=1):
    print(f"Chunk {idx}:\n{chunk.value}")
```
### 4.3. Additional Features
The `PdfChunker` class also provides additional features:
#### Recursive Chunking
The `_chunk_recursively` method handles the recursive chunking of PDF text content, ensuring that each chunk stays within the token limit.
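To make the idea concrete, here is a minimal sketch of recursive chunking under a token budget. It is not the `PdfChunker` implementation: the `chunk_recursively` helper, the whitespace-based `count_tokens`, and the default separator list are all assumptions made for this illustration.

```python
from typing import List, Sequence


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for real tokenizer tokens.
    return len(text.split())


def chunk_recursively(
    text: str,
    max_tokens: int,
    separators: Sequence[str] = ("\n\n", ". ", " "),
) -> List[str]:
    """Split text into chunks of at most max_tokens tokens (hypothetical helper)."""
    text = text.strip()
    if not text:
        return []
    if count_tokens(text) <= max_tokens:
        return [text]
    if not separators:
        # No separator left: fall back to a hard split on words.
        words = text.split()
        return [
            " ".join(words[i : i + max_tokens])
            for i in range(0, len(words), max_tokens)
        ]

    separator, remaining = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if count_tokens(candidate) <= max_tokens:
            # Keep merging pieces while the running chunk fits the budget.
            current = candidate
            continue
        if current:
            chunks.append(current.strip())
        if count_tokens(piece) <= max_tokens:
            current = piece
        else:
            # The piece alone is too large: recurse with finer separators.
            chunks.extend(chunk_recursively(piece, max_tokens, remaining))
            current = ""
    if current.strip():
        chunks.append(current.strip())
    return chunks


sample = "First paragraph here.\n\nSecond paragraph is a bit longer. It has two sentences."
for chunk in chunk_recursively(sample, max_tokens=5):
    print(repr(chunk))
```

In this sketch a separator that falls on a chunk boundary is dropped, so a real chunker would need to decide whether to keep it with the preceding or following chunk.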
---
## 5. Additional Information <a name="additional-information"></a>
- **PDF Text Chunking**: The `PdfChunker` module is a specialized tool for splitting PDF text content into manageable chunks, making it suitable for natural language processing tasks involving PDF documents.
- **Custom Separators**: You can customize separators to adapt the PDF text content chunking process to specific document structures.
- **Token Count**: The module accurately counts tokens using the specified tokenizer, ensuring that chunks do not exceed token limits.
---
## 6. Conclusion <a name="conclusion"></a>
The `PdfChunker` module is a valuable asset for processing PDF text content in various natural language processing and text analysis tasks. This documentation has provided a comprehensive guide on its usage, parameters, and examples, ensuring that you can effectively prepare PDF documents for further analysis and processing.
By using the `PdfChunker`, you can efficiently break down lengthy and complex PDF text content into manageable chunks, making it ready for in-depth analysis.
*Please check the official `PdfChunker` repository and documentation for the latest updates.*

@ -0,0 +1,157 @@
# `BioGPT` Documentation
## Table of Contents
1. [Introduction](#introduction)
2. [Overview](#overview)
3. [Installation](#installation)
4. [Usage](#usage)
   1. [BioGPT Class](#biogpt-class)
   2. [Examples](#examples)
5. [Additional Information](#additional-information)
6. [Conclusion](#conclusion)
---
## 1. Introduction <a name="introduction"></a>
The `BioGPT` module is a domain-specific generative language model designed for the biomedical domain. It is built upon the powerful Transformer architecture and pretrained on a large corpus of biomedical literature. This documentation provides an extensive guide on using the `BioGPT` module, explaining its purpose, parameters, and usage.
---
## 2. Overview <a name="overview"></a>
The `BioGPT` module addresses the need for a language model specialized in the biomedical domain. Unlike general-purpose language models, `BioGPT` excels in generating coherent and contextually relevant text specific to biomedical terms and concepts. It has been evaluated on various biomedical natural language processing tasks and has demonstrated superior performance.
Key features and parameters of the `BioGPT` module include:
- `model_name`: Name of the pretrained model.
- `max_length`: Maximum length of generated text.
- `num_return_sequences`: Number of sequences to return.
- `do_sample`: Whether to use sampling in generation.
- `min_length`: Minimum length of generated text.
The `BioGPT` module is equipped with features for generating text, extracting features, and more.
---
## 3. Installation <a name="installation"></a>
Before using the `BioGPT` module, ensure you have the required dependencies installed, including the Transformers library and Torch. You can install these dependencies using pip:
```bash
pip install transformers
pip install torch
```
---
## 4. Usage <a name="usage"></a>
In this section, we'll cover how to use the `BioGPT` module effectively. It consists of the `BioGPT` class and provides examples to demonstrate its usage.
### 4.1. `BioGPT` Class <a name="biogpt-class"></a>
The `BioGPT` class is the core component of the `BioGPT` module. It is used to create a `BioGPT` instance, which can generate text, extract features, and more.
#### Parameters:
- `model_name` (str): Name of the pretrained model.
- `max_length` (int): Maximum length of generated text.
- `num_return_sequences` (int): Number of sequences to return.
- `do_sample` (bool): Whether or not to use sampling in generation.
- `min_length` (int): Minimum length of generated text.
### 4.2. Examples <a name="examples"></a>
Let's explore how to use the `BioGPT` class with different scenarios and applications.
#### Example 1: Generating Biomedical Text
```python
from swarms.models import BioGPT
# Initialize the BioGPT model
biogpt = BioGPT()
# Generate biomedical text
input_text = "The patient has a fever"
generated_text = biogpt(input_text)
print(generated_text)
```
#### Example 2: Extracting Features
```python
from swarms.models import BioGPT
# Initialize the BioGPT model
biogpt = BioGPT()
# Extract features from a biomedical text
input_text = "The patient has a fever"
features = biogpt.get_features(input_text)
print(features)
```
#### Example 3: Using Beam Search Decoding
```python
from swarms.models import BioGPT
# Initialize the BioGPT model
biogpt = BioGPT()
# Generate biomedical text using beam search decoding
input_text = "The patient has a fever"
generated_text = biogpt.beam_search_decoding(input_text)
print(generated_text)
```
### 4.3. Additional Features
The `BioGPT` class also provides additional features:
#### Set a New Pretrained Model
```python
biogpt.set_pretrained_model("new_pretrained_model")
```
#### Get the Model's Configuration
```python
config = biogpt.get_config()
print(config)
```
#### Save and Load the Model
```python
# Save the model and tokenizer to a directory
biogpt.save_model("saved_model")
# Load a model and tokenizer from a directory
biogpt.load_from_path("saved_model")
```
#### Print the Model's Architecture
```python
biogpt.print_model()
```
---
## 5. Additional Information <a name="additional-information"></a>
- **Biomedical Text Generation**: The `BioGPT` module is designed specifically for generating biomedical text, making it a valuable tool for various biomedical natural language processing tasks.
- **Feature Extraction**: It also provides the capability to extract features from biomedical text.
- **Beam Search Decoding**: Beam search decoding is available for generating text with improved quality.
- **Customization**: You can set a new pretrained model and save/load models for customization.
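
For readers unfamiliar with beam search, the following toy sketch shows the general technique: at each step it keeps only the `beam_width` highest-scoring partial sequences. It does not reflect how `BioGPT.beam_search_decoding` is implemented. The `beam_search` function, the `toy_model` probability table, and its tokens are hypothetical and exist only for this example.

```python
import math
from typing import Callable, Dict, List, Tuple

Sequence = Tuple[str, ...]


def beam_search(
    next_token_probs: Callable[[Sequence], Dict[str, float]],
    beam_width: int = 2,
    max_steps: int = 3,
    end_token: str = "<eos>",
) -> List[Tuple[Sequence, float]]:
    """Return the best sequences with their summed log-probabilities."""
    beams: List[Tuple[Sequence, float]] = [((), 0.0)]
    for _ in range(max_steps):
        candidates: List[Tuple[Sequence, float]] = []
        for tokens, score in beams:
            if tokens and tokens[-1] == end_token:
                # Finished sequences are carried over unchanged.
                candidates.append((tokens, score))
                continue
            for token, prob in next_token_probs(tokens).items():
                if prob > 0:
                    candidates.append((tokens + (token,), score + math.log(prob)))
        # Keep only the beam_width highest-scoring candidates.
        beams = sorted(candidates, key=lambda item: item[1], reverse=True)[:beam_width]
        if all(tokens and tokens[-1] == end_token for tokens, _ in beams):
            break
    return beams


def toy_model(prefix: Sequence) -> Dict[str, float]:
    # Hypothetical next-token distribution; a real model would compute this.
    table = {
        (): {"fever": 0.6, "cough": 0.4},
        ("fever",): {"persists": 0.5, "subsides": 0.5},
        ("cough",): {"worsens": 0.9, "<eos>": 0.1},
    }
    return table.get(prefix, {"<eos>": 1.0})


for tokens, score in beam_search(toy_model, beam_width=2):
    print(" ".join(tokens), round(math.exp(score), 2))
```

With `beam_width=1` the search becomes greedy and commits to "fever" at the first step, even though the sequence starting with "cough" has the higher overall probability in this toy table. Keeping several candidates alive is what lets beam search recover from such locally attractive choices.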
---
## 6. Conclusion <a name="conclusion"></a>
The `BioGPT` module is a powerful and specialized tool for generating and working with biomedical text. This documentation has provided a comprehensive guide on its usage, parameters, and examples, enabling you to effectively leverage it for various biomedical natural language processing tasks.
By using `BioGPT`, you can enhance your biomedical text generation and analysis tasks with contextually relevant and coherent text.
*Please check the official `BioGPT` repository and documentation for the latest updates.*

@ -1,4 +1,4 @@
# Kosmos Documentation # `Kosmos` Documentation
## Introduction ## Introduction

@ -0,0 +1,88 @@
# `LayoutLMDocumentQA` Documentation
## Introduction
Welcome to the documentation for LayoutLMDocumentQA, a multimodal model designed for visual question answering (QA) on real-world documents, such as invoices, PDFs, and more. This comprehensive documentation will provide you with a deep understanding of the LayoutLMDocumentQA class, its architecture, usage, and examples.
## Overview
LayoutLMDocumentQA is a versatile model that combines layout-based understanding of documents with natural language processing to answer questions about the content of documents. It is particularly useful for automating tasks like invoice processing, extracting information from PDFs, and handling various document-based QA scenarios.
## Class Definition
```python
class LayoutLMDocumentQA(AbstractModel):
    def __init__(
        self,
        model_name: str = "impira/layoutlm-document-qa",
        task: str = "document-question-answering",
    ):
        ...  # constructor body omitted in this documentation
## Purpose
The LayoutLMDocumentQA class serves the following primary purposes:
1. **Document QA**: LayoutLMDocumentQA is specifically designed for document-based question answering. It can process both the textual content and the layout of a document to answer questions.
2. **Multimodal Understanding**: It combines natural language understanding with document layout analysis, making it suitable for documents with complex structures.
## Parameters
- `model_name` (str): The name or path of the pretrained LayoutLMDocumentQA model. Default: "impira/layoutlm-document-qa".
- `task` (str): The specific task for which the model will be used. Default: "document-question-answering".
## Usage
To use LayoutLMDocumentQA, follow these steps:
1. Initialize the LayoutLMDocumentQA instance:
```python
from swarms.models import LayoutLMDocumentQA
layout_lm_doc_qa = LayoutLMDocumentQA()
```
### Example 1 - Initialization
```python
layout_lm_doc_qa = LayoutLMDocumentQA()
```
2. Ask a question about a document and provide the document's image path:
```python
question = "What is the total amount?"
image_path = "path/to/document_image.png"
answer = layout_lm_doc_qa(question, image_path)
```
### Example 2 - Document QA
```python
layout_lm_doc_qa = LayoutLMDocumentQA()
question = "What is the total amount?"
image_path = "path/to/document_image.png"
answer = layout_lm_doc_qa(question, image_path)
```
## How LayoutLMDocumentQA Works
LayoutLMDocumentQA employs a multimodal approach to document QA. Here's how it works:
1. **Initialization**: When you create a LayoutLMDocumentQA instance, you can specify the model to use and the task, which is "document-question-answering" by default.
2. **Question and Document**: You provide a question about the document and the image path of the document to the LayoutLMDocumentQA instance.
3. **Multimodal Processing**: LayoutLMDocumentQA processes both the question and the document image. It combines layout-based analysis with natural language understanding.
4. **Answer Generation**: The model generates an answer to the question based on its analysis of the document layout and content.
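
The four steps above can be sketched as a thin wrapper around the Hugging Face `pipeline` function. This is an assumption about how such a class might be built, not the actual `swarms` source: `DocumentQASketch` and its `pipeline_factory` parameter are hypothetical names introduced here so that the flow can be exercised without downloading a model. Running it against the real model would also likely require an OCR backend such as `pytesseract`.

```python
from typing import Any, Callable, Optional


class DocumentQASketch:
    """Hypothetical stand-in for LayoutLMDocumentQA; not the swarms implementation."""

    def __init__(
        self,
        model_name: str = "impira/layoutlm-document-qa",
        task: str = "document-question-answering",
        pipeline_factory: Optional[Callable[..., Any]] = None,
    ):
        if pipeline_factory is None:
            # Imported lazily so the sketch can be tried without transformers installed.
            from transformers import pipeline as pipeline_factory
        self.model_name = model_name
        self.task = task
        # Step 1: build the pipeline for the chosen model and task.
        self.pipeline = pipeline_factory(task, model=model_name)

    def __call__(self, question: str, image_path: str) -> str:
        # Steps 2-3: pass the question and the document image to the pipeline.
        result = self.pipeline(image=image_path, question=question)
        # Step 4: the pipeline may return one dict or a list of candidate answers.
        if isinstance(result, list):
            result = result[0] if result else {}
        return result.get("answer", "")


def fake_pipeline_factory(task: str, model: str) -> Callable[..., Any]:
    # Stub used only for demonstration; it returns a canned answer.
    def run(image: str, question: str) -> list:
        return [{"score": 0.9, "answer": "$42.00"}]

    return run


qa = DocumentQASketch(pipeline_factory=fake_pipeline_factory)
print(qa("What is the total amount?", "path/to/document_image.png"))
```

Omitting `pipeline_factory` makes the sketch fall back to `transformers.pipeline`, which downloads the model on first use.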
## Additional Information
- LayoutLMDocumentQA uses the "impira/layoutlm-document-qa" pretrained model, which is specifically designed for document-based question answering.
- You can adapt this model to various document QA scenarios by changing the task and providing relevant questions and documents.
- This model is particularly useful for automating document-based tasks and extracting valuable information from structured documents.
That concludes the documentation for LayoutLMDocumentQA. We hope you find this tool valuable for your document-based question answering needs. If you have any questions or encounter any issues, please refer to the LayoutLMDocumentQA documentation for further assistance. Enjoy using LayoutLMDocumentQA!

@ -0,0 +1,118 @@
# `Nougat` Documentation
## Introduction
Welcome to the documentation for Nougat, a versatile model designed by Meta for transcribing scientific PDFs into user-friendly Markdown format, extracting information from PDFs, and extracting metadata from PDF documents. This documentation will provide you with a deep understanding of the Nougat class, its architecture, usage, and examples.
## Overview
Nougat is a powerful tool that combines language modeling and image processing capabilities to convert scientific PDF documents into Markdown format. It is particularly useful for researchers, students, and professionals who need to extract valuable information from PDFs quickly. With Nougat, you can simplify complex PDFs, making their content more accessible and easy to work with.
## Class Definition
```python
class Nougat:
    def __init__(
        self,
        model_name_or_path="facebook/nougat-base",
        min_length: int = 1,
        max_new_tokens: int = 30,
    ):
        ...  # constructor body omitted in this documentation
## Purpose
The Nougat class serves the following primary purposes:
1. **PDF Transcription**: Nougat is designed to transcribe scientific PDFs into Markdown format. It helps convert complex PDF documents into a more readable and structured format, making it easier to extract information.
2. **Information Extraction**: It allows users to extract valuable information and content from PDFs efficiently. This can be particularly useful for researchers and professionals who need to extract data, figures, or text from scientific papers.
3. **Metadata Extraction**: Nougat can also extract metadata from PDF documents, providing essential details about the document, such as title, author, and publication date.
## Parameters
- `model_name_or_path` (str): The name or path of the pretrained Nougat model. Default: "facebook/nougat-base".
- `min_length` (int): The minimum length of the generated transcription. Default: 1.
- `max_new_tokens` (int): The maximum number of new tokens to generate in the Markdown transcription. Default: 30.
## Usage
To use Nougat, follow these steps:
1. Initialize the Nougat instance:
```python
from swarms.models import Nougat
nougat = Nougat()
```
### Example 1 - Initialization
```python
nougat = Nougat()
```
2. Transcribe a PDF image using Nougat:
```python
markdown_transcription = nougat("path/to/pdf_file.png")
```
### Example 2 - PDF Transcription
```python
nougat = Nougat()
markdown_transcription = nougat("path/to/pdf_file.png")
```
3. Extract information from a PDF:
```python
information = nougat.extract_information("path/to/pdf_file.png")
```
### Example 3 - Information Extraction
```python
nougat = Nougat()
information = nougat.extract_information("path/to/pdf_file.png")
```
4. Extract metadata from a PDF:
```python
metadata = nougat.extract_metadata("path/to/pdf_file.png")
```
### Example 4 - Metadata Extraction
```python
nougat = Nougat()
metadata = nougat.extract_metadata("path/to/pdf_file.png")
```
## How Nougat Works
Nougat employs a vision encoder-decoder model, along with a dedicated processor, to transcribe PDFs into Markdown format and perform information and metadata extraction. Here's how it works:
1. **Initialization**: When you create a Nougat instance, you can specify the model to use, the minimum transcription length, and the maximum number of new tokens to generate.
2. **Processing PDFs**: Nougat can process PDFs as input. You can provide the path to a PDF document.
3. **Image Processing**: The processor converts PDF pages into images, which are then encoded by the model.
4. **Transcription**: Nougat generates Markdown transcriptions of PDF content, ensuring a minimum length and respecting the token limit.
5. **Information Extraction**: Information extraction involves parsing the Markdown transcription to identify key details or content of interest.
6. **Metadata Extraction**: Metadata extraction involves identifying and extracting document metadata, such as title, author, and publication date.
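
As a rough illustration of steps 5 and 6, the sketch below parses a Markdown transcription for its title and section headings. It is only an assumption about what such post-processing could look like: `extract_outline` is a hypothetical helper, not the logic behind `extract_information` or `extract_metadata`, and it does not attempt to recover authors or dates.

```python
import re
from typing import Dict, List, Optional, Union

# Matches Markdown ATX headings such as "# Title" or "## Section".
HEADING_PATTERN = re.compile(r"^(#{1,6})[ \t]+(.+?)[ \t]*$", flags=re.MULTILINE)


def extract_outline(markdown: str) -> Dict[str, Union[Optional[str], List[str]]]:
    """Return the first level-1 heading as the title plus all deeper headings."""
    headings = HEADING_PATTERN.findall(markdown)
    title = next((text for hashes, text in headings if len(hashes) == 1), None)
    sections = [text for hashes, text in headings if len(hashes) > 1]
    return {"title": title, "sections": sections}


transcription = (
    "# Sample Paper Title\n\n"
    "## Abstract\nSome text.\n\n"
    "## 1 Introduction\nMore text.\n"
)
print(extract_outline(transcription))
```

Because the transcription is plain Markdown, simple text processing like this can run entirely after the model call, independently of the vision encoder-decoder.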
## Additional Information
- Nougat leverages the "facebook/nougat-base" pretrained model, which is specifically designed for document transcription and extraction tasks.
- You can adjust the minimum transcription length and the maximum number of new tokens to control the output's length and quality.
- Nougat can be run on both CPU and GPU devices.
That concludes the documentation for Nougat. We hope you find this tool valuable for your PDF transcription, information extraction, and metadata extraction needs. If you have any questions or encounter any issues, please refer to the Nougat documentation for further assistance. Enjoy using Nougat!

@ -1,4 +1,4 @@
# `OpenAIChat`` Documentation # `OpenAIChat` Documentation
## Table of Contents ## Table of Contents

@ -90,22 +90,30 @@ nav:
- OmniModalAgent: "swarms/agents/omni_agent.md" - OmniModalAgent: "swarms/agents/omni_agent.md"
- Idea2Image: "swarms/agents/idea_to_image.md" - Idea2Image: "swarms/agents/idea_to_image.md"
- swarms.models: - swarms.models:
- Overview: "swarms/models/index.md" - Language:
- HuggingFaceLLM: "swarms/models/hf.md" - Overview: "swarms/models/index.md"
- Anthropic: "swarms/models/anthropic.md" - HuggingFaceLLM: "swarms/models/hf.md"
- OpenAI: "swarms/models/openai.md" - Anthropic: "swarms/models/anthropic.md"
- Fuyu: "swarms/models/fuyu.md" - OpenAI: "swarms/models/openai.md"
- Zephyr: "swarms/models/zephyr.md" - Zephyr: "swarms/models/zephyr.md"
- Vilt: "swarms/models/vilt.md" - BioGPT: "swarms/models/biogpt.md"
- Idefics: "swarms/models/idefics.md" - MultiModal:
- BingChat: "swarms/models/bingchat.md" - Fuyu: "swarms/models/fuyu.md"
- Kosmos: "swarms/models/kosmos.md" - Vilt: "swarms/models/vilt.md"
- Idefics: "swarms/models/idefics.md"
- BingChat: "swarms/models/bingchat.md"
- Kosmos: "swarms/models/kosmos.md"
- Nougat: "swarms/models/nougat.md"
- LayoutLMDocumentQA: "swarms/models/layoutlm_document_qa.md"
- swarms.structs: - swarms.structs:
- Overview: "swarms/structs/overview.md" - Overview: "swarms/structs/overview.md"
- Workflow: "swarms/structs/workflow.md" - Workflow: "swarms/structs/workflow.md"
- swarms.memory: - swarms.memory:
- PineconeVectorStoreStore: "swarms/memory/pinecone.md" - PineconeVectorStoreStore: "swarms/memory/pinecone.md"
- PGVectorStore: "swarms/memory/pg.md" - PGVectorStore: "swarms/memory/pg.md"
- swarms.chunkers:
- BaseChunker: "swarms/chunkers/basechunker.md"
- PdfChunker: "swarms/chunkers/pdf_chunker.md"
- Examples: - Examples:
- Overview: "examples/index.md" - Overview: "examples/index.md"
- Agents: - Agents:

@ -37,12 +37,16 @@ langchain-experimental = "*"
playwright = "*" playwright = "*"
duckduckgo-search = "*" duckduckgo-search = "*"
faiss-cpu = "*" faiss-cpu = "*"
datasets = "*"
diffusers = "*" diffusers = "*"
sentencepiece = "*"
wget = "*" wget = "*"
griptape = "*" griptape = "*"
httpx = "*" httpx = "*"
attrs = "*"
ggl = "*" ggl = "*"
beautifulsoup4 = "*" beautifulsoup4 = "*"
huggingface-hub = "*"
pydantic = "*" pydantic = "*"
tenacity = "*" tenacity = "*"
redis = "*" redis = "*"
@ -53,7 +57,9 @@ open-interpreter = "*"
tabulate = "*" tabulate = "*"
termcolor = "*" termcolor = "*"
black = "*" black = "*"
open_clip_torch = "*"
dalle3 = "*" dalle3 = "*"
soundfile = "*"
torchvision = "*" torchvision = "*"
rich = "*" rich = "*"
EdgeGPT = "*" EdgeGPT = "*"

@ -13,13 +13,19 @@ wget==3.2
simpleaichat simpleaichat
httpx httpx
torch torch
open_clip_torch
ggl ggl
beautifulsoup4 beautifulsoup4
google-search-results==2.4.2 google-search-results==2.4.2
Pillow Pillow
faiss-cpu faiss-cpu
openai openai
attrs
datasets
soundfile
huggingface-hub
google-generativeai google-generativeai
sentencepiece
duckduckgo-search duckduckgo-search
agent-protocol agent-protocol
chromadb chromadb

@ -8,14 +8,14 @@ warnings.filterwarnings("ignore", category=UserWarning)
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2" os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"
from swarms import workers from swarms import workers
from swarms.workers.worker import Worker from swarms.workers.worker import Worker
# from swarms import chunkers # from swarms import chunkers
from swarms import models from swarms.models import * # import * exports only the names listed in __all__ in __init__.py
from swarms import structs from swarms import structs
from swarms import swarms from swarms import swarms
from swarms import agents from swarms import agents
from swarms.logo import logo from swarms.logo import logo
print(logo) print(logo)

@ -1,8 +0,0 @@
attrs==21.2.0
griptape==0.18.2
oceandb==0.1.0
pgvector==0.2.3
pydantic==1.10.8
SQLAlchemy==1.4.49
SQLAlchemy==2.0.20
swarms==1.8.2

@ -4,10 +4,30 @@ from swarms.models.petals import Petals
from swarms.models.mistral import Mistral from swarms.models.mistral import Mistral
from swarms.models.openai_models import OpenAI, AzureOpenAI, OpenAIChat from swarms.models.openai_models import OpenAI, AzureOpenAI, OpenAIChat
from swarms.models.zephyr import Zephyr from swarms.models.zephyr import Zephyr
from swarms.models.biogpt import BioGPT
# MultiModal Models # MultiModal Models
from swarms.models.idefics import Idefics from swarms.models.idefics import Idefics
from swarms.models.kosmos_two import Kosmos from swarms.models.kosmos_two import Kosmos
from swarms.models.vilt import Vilt from swarms.models.vilt import Vilt
# from swarms.models.fuyu import Fuyu from swarms.models.nougat import Nougat
from swarms.models.layoutlm_document_qa import LayoutLMDocumentQA
# from swarms.models.fuyu import Fuyu # Not working, wait until they update
__all__ = [
"Anthropic",
"Petals",
"Mistral",
"OpenAI",
"AzureOpenAI",
"OpenAIChat",
"Zephyr",
"Idefics",
"Kosmos",
"Vilt",
"Nougat",
"LayoutLMDocumentQA",
"BioGPT",
]

@ -1,14 +1,17 @@
import time import time
from abc import ABC, abstractmethod from abc import ABC, abstractmethod
def count_tokens(text: str) -> int: def count_tokens(text: str) -> int:
return len(text.split()) return len(text.split())
class AbstractModel(ABC): class AbstractModel(ABC):
""" """
AbstractModel AbstractModel
""" """
# abstract base class for language models # abstract base class for language models
def __init__(self): def __init__(self):
self.start_time = None self.start_time = None
@ -75,3 +78,16 @@ class AbstractModel(ABC):
if self.start_time and self.end_time: if self.start_time and self.end_time:
return self.end_time - self.start_time return self.end_time - self.start_time
return 0 return 0
def metrics(self) -> str:
_sec_to_first_token = self._sec_to_first_token()
_tokens_per_second = self._tokens_per_second()
_num_tokens = self._num_tokens(self.history)
_time_for_generation = self._time_for_generation(self.history)
return f"""
SEC TO FIRST TOKEN: {_sec_to_first_token}
TOKENS/SEC: {_tokens_per_second}
TOKENS: {_num_tokens}
TIME FOR GENERATION: {_time_for_generation}
"""

@ -29,14 +29,22 @@ class BingChat:
self.cookies = json.loads(open(cookies_path, encoding="utf-8").read()) self.cookies = json.loads(open(cookies_path, encoding="utf-8").read())
self.bot = asyncio.run(Chatbot.create(cookies=self.cookies)) self.bot = asyncio.run(Chatbot.create(cookies=self.cookies))
def __call__(self, prompt: str, style: ConversationStyle = ConversationStyle.creative) -> str: def __call__(
self, prompt: str, style: ConversationStyle = ConversationStyle.creative
) -> str:
""" """
Get a text response using the EdgeGPT model based on the provided prompt. Get a text response using the EdgeGPT model based on the provided prompt.
""" """
response = asyncio.run(self.bot.ask(prompt=prompt, conversation_style=style, simplify_response=True)) response = asyncio.run(
return response['text'] self.bot.ask(
prompt=prompt, conversation_style=style, simplify_response=True
)
)
return response["text"]
def create_img(self, prompt: str, output_dir: str = "./output", auth_cookie: str = None) -> str: def create_img(
self, prompt: str, output_dir: str = "./output", auth_cookie: str = None
) -> str:
""" """
Generate an image based on the provided prompt and save it in the given output directory. Generate an image based on the provided prompt and save it in the given output directory.
Returns the path of the generated image. Returns the path of the generated image.
@ -48,7 +56,7 @@ class BingChat:
images = image_generator.get_images(prompt) images = image_generator.get_images(prompt)
image_generator.save_images(images, output_dir=output_dir) image_generator.save_images(images, output_dir=output_dir)
return Path(output_dir) / images[0]['path'] return Path(output_dir) / images[0]["path"]
@staticmethod @staticmethod
def set_cookie_dir_path(path: str): def set_cookie_dir_path(path: str):

@ -0,0 +1,170 @@
"""
BiomedCLIP-PubMedBERT_256-vit_base_patch16_224
https://huggingface.co/microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224
BiomedCLIP is a biomedical vision-language foundation model that is pretrained on PMC-15M,
a dataset of 15 million figure-caption pairs extracted from biomedical research articles in PubMed Central, using contrastive learning. It uses PubMedBERT as the text encoder and Vision Transformer as the image encoder, with domain-specific adaptations. It can perform various vision-language processing (VLP) tasks such as cross-modal retrieval, image classification, and visual question answering. BiomedCLIP establishes new state of the art in a wide range of standard datasets, and substantially outperforms prior VLP approaches:
Citation
@misc{https://doi.org/10.48550/arXiv.2303.00915,
doi = {10.48550/ARXIV.2303.00915},
url = {https://arxiv.org/abs/2303.00915},
author = {Zhang, Sheng and Xu, Yanbo and Usuyama, Naoto and Bagga, Jaspreet and Tinn, Robert and Preston, Sam and Rao, Rajesh and Wei, Mu and Valluri, Naveen and Wong, Cliff and Lungren, Matthew and Naumann, Tristan and Poon, Hoifung},
title = {Large-Scale Domain-Specific Pretraining for Biomedical Vision-Language Processing},
publisher = {arXiv},
year = {2023},
}
Model Use
How to use
Please refer to this example notebook.
Intended Use
This model is intended to be used solely for (I) future research on visual-language processing and (II) reproducibility of the experimental results reported in the reference paper.
Primary Intended Use
The primary intended use is to support AI researchers building on top of this work. BiomedCLIP and its associated models should be helpful for exploring various biomedical VLP research questions, especially in the radiology domain.
Out-of-Scope Use
Any deployed use case of the model --- commercial or otherwise --- is currently out of scope. Although we evaluated the models using a broad set of publicly-available research benchmarks, the models and evaluations are not intended for deployed use cases. Please refer to the associated paper for more details.
Data
This model builds upon PMC-15M dataset, which is a large-scale parallel image-text dataset for biomedical vision-language processing. It contains 15 million figure-caption pairs extracted from biomedical research articles in PubMed Central. It covers a diverse range of biomedical image types, such as microscopy, radiography, histology, and more.
Limitations
This model was developed using English corpora, and thus can be considered English-only.
Further information
Please refer to the corresponding paper, "Large-Scale Domain-Specific Pretraining for Biomedical Vision-Language Processing" for additional details on the model training and evaluation.
"""
import open_clip
import glob
import torch
from PIL import Image
import matplotlib.pyplot as plt
class BioClip:
    """
    BioClip

    Args:
        model_path (str): path to the model

    Attributes:
        model_path (str): path to the model
        model (torch.nn.Module): the model
        preprocess_train (torchvision.transforms.Compose): the preprocessing pipeline for training
        preprocess_val (torchvision.transforms.Compose): the preprocessing pipeline for validation
        tokenizer (open_clip.Tokenizer): the tokenizer
        device (torch.device): the device to run the model on

    Methods:
        __call__(self, img_path: str, labels: list, template: str = 'this is a photo of ', context_length: int = 256):
            returns a dictionary of labels and their probabilities
        plot_image_with_metadata(img_path: str, metadata: dict): plots the image with the metadata

    Usage:
        clip = BioClip('hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224')
        labels = [
            'adenocarcinoma histopathology',
            'brain MRI',
            'covid line chart',
            'squamous cell carcinoma histopathology',
            'immunohistochemistry histopathology',
            'bone X-ray',
            'chest X-ray',
            'pie chart',
            'hematoxylin and eosin histopathology'
        ]
        result = clip("your_image_path.jpg", labels)
        metadata = {'filename': "your_image_path.jpg".split('/')[-1], 'top_probs': result}
        clip.plot_image_with_metadata("your_image_path.jpg", metadata)
    """

    def __init__(self, model_path: str):
        self.model_path = model_path
        (
            self.model,
            self.preprocess_train,
            self.preprocess_val,
        ) = open_clip.create_model_and_transforms(model_path)
        self.tokenizer = open_clip.get_tokenizer(model_path)
        self.device = (
            torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
        )
        self.model.to(self.device)
        self.model.eval()

    def __call__(
        self,
        img_path: str,
        labels: list,
        template: str = "this is a photo of ",
        context_length: int = 256,
    ):
        image = torch.stack([self.preprocess_val(Image.open(img_path))]).to(self.device)
        texts = self.tokenizer(
            [template + label for label in labels], context_length=context_length
        ).to(self.device)

        with torch.no_grad():
            image_features, text_features, logit_scale = self.model(image, texts)
            logits = (
                (logit_scale * image_features @ text_features.t())
                .detach()
                .softmax(dim=-1)
            )
            sorted_indices = torch.argsort(logits, dim=-1, descending=True)
            logits = logits.cpu().numpy()
            sorted_indices = sorted_indices.cpu().numpy()

        results = {}
        for idx in sorted_indices[0]:
            label = labels[idx]
            prob = logits[0][idx]
            results[label] = prob
        return results

    @staticmethod
    def plot_image_with_metadata(img_path: str, metadata: dict):
        img = Image.open(img_path)
        fig, ax = plt.subplots(figsize=(5, 5))
        ax.imshow(img)
        ax.axis("off")
        title = (
            metadata["filename"]
            + "\n"
            + "\n".join([f"{k}: {v*100:.1f}" for k, v in metadata["top_probs"].items()])
        )
        ax.set_title(title, fontsize=14)
        plt.tight_layout()
        plt.show()
# Usage
# clip = BioClip('hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224')
# labels = [
# 'adenocarcinoma histopathology',
# 'brain MRI',
# 'covid line chart',
# 'squamous cell carcinoma histopathology',
# 'immunohistochemistry histopathology',
# 'bone X-ray',
# 'chest X-ray',
# 'pie chart',
# 'hematoxylin and eosin histopathology'
# ]
# result = clip("your_image_path.jpg", labels)
# metadata = {'filename': "your_image_path.jpg".split('/')[-1], 'top_probs': result}
# clip.plot_image_with_metadata("your_image_path.jpg", metadata)
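The scoring step in `BioClip.__call__` (scaled image/text similarity followed by a softmax, then sorting by probability) can be sketched without the model. This is a minimal illustration, not the BiomedCLIP implementation: `zero_shot_probs`, the toy two-dimensional vectors, and the `logit_scale` value of 10.0 are all assumptions made for the example.

```python
import math


def zero_shot_probs(image_vec, text_vecs, logit_scale=100.0):
    """Softmax over scaled dot products between one image vector and each text vector."""
    logits = [
        logit_scale * sum(a * b for a, b in zip(image_vec, text_vec))
        for text_vec in text_vecs
    ]
    peak = max(logits)  # subtract the max so exp() stays numerically stable
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Toy embeddings (hypothetical); a real model would produce these vectors.
labels = ["chest X-ray", "pie chart"]
image_vec = [1.0, 0.0]
text_vecs = [[0.9, 0.1], [0.1, 0.9]]

probs = zero_shot_probs(image_vec, text_vecs, logit_scale=10.0)
# Rank labels from most to least likely, as the wrapper does with argsort.
ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
```

The label whose text vector points in the same direction as the image vector ends up first in `ranked`.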

@ -0,0 +1,208 @@
"""
BioGPT
Pre-trained language models have attracted increasing attention in the biomedical domain,
inspired by their great success in the general natural language domain.
Among the two main branches of pre-trained language models in the general language domain, i.e. BERT (and its variants) and GPT (and its variants),
the first one has been extensively studied in the biomedical domain, such as BioBERT and PubMedBERT.
While they have achieved great success on a variety of discriminative downstream biomedical tasks,
the lack of generation ability constrains their application scope.
In this paper, we propose BioGPT, a domain-specific generative Transformer language model
pre-trained on large-scale biomedical literature.
We evaluate BioGPT on six biomedical natural language processing tasks
and demonstrate that our model outperforms previous models on most tasks.
Especially, we get 44.98%, 38.42% and 40.76% F1 score on BC5CDR, KD-DTI and DDI
end-to-end relation extraction tasks, respectively, and 78.2% accuracy on PubMedQA,
creating a new record. Our case study on text generation further demonstrates the
advantage of BioGPT on biomedical literature to generate fluent descriptions for biomedical terms.
@article{10.1093/bib/bbac409,
author = {Luo, Renqian and Sun, Liai and Xia, Yingce and Qin, Tao and Zhang, Sheng and Poon, Hoifung and Liu, Tie-Yan},
title = "{BioGPT: generative pre-trained transformer for biomedical text generation and mining}",
journal = {Briefings in Bioinformatics},
volume = {23},
number = {6},
year = {2022},
month = {09},
abstract = "{Pre-trained language models have attracted increasing attention in the biomedical domain, inspired by their great success in the general natural language domain. Among the two main branches of pre-trained language models in the general language domain, i.e. BERT (and its variants) and GPT (and its variants), the first one has been extensively studied in the biomedical domain, such as BioBERT and PubMedBERT. While they have achieved great success on a variety of discriminative downstream biomedical tasks, the lack of generation ability constrains their application scope. In this paper, we propose BioGPT, a domain-specific generative Transformer language model pre-trained on large-scale biomedical literature. We evaluate BioGPT on six biomedical natural language processing tasks and demonstrate that our model outperforms previous models on most tasks. Especially, we get 44.98\%, 38.42\% and 40.76\% F1 score on BC5CDR, KD-DTI and DDI end-to-end relation extraction tasks, respectively, and 78.2\% accuracy on PubMedQA, creating a new record. Our case study on text generation further demonstrates the advantage of BioGPT on biomedical literature to generate fluent descriptions for biomedical terms.}",
issn = {1477-4054},
doi = {10.1093/bib/bbac409},
url = {https://doi.org/10.1093/bib/bbac409},
note = {bbac409},
eprint = {https://academic.oup.com/bib/article-pdf/23/6/bbac409/47144271/bbac409.pdf},
}
"""
import torch
from transformers import pipeline, set_seed, BioGptTokenizer, BioGptForCausalLM
class BioGPT:
"""
A wrapper class for the BioGptForCausalLM model from the transformers library.
Attributes:
model_name (str): Name of the pretrained model.
model (BioGptForCausalLM): The pretrained BioGptForCausalLM model.
tokenizer (BioGptTokenizer): The tokenizer for the BioGptForCausalLM model.
Methods:
__call__: Generate text based on the given input.
get_features: Get the features of a given text.
beam_search_decoding: Generate text using beam search decoding.
set_pretrained_model: Set a new tokenizer and model.
get_config: Get the model's configuration.
save_model: Save the model and tokenizer to a directory.
load_from_path: Load a model and tokenizer from a directory.
print_model: Print the model's architecture.
Usage:
>>> from swarms.models.biogpt import BioGPT
>>> model = BioGPT()
>>> out = model("The patient has a fever")
>>> print(out)
"""
def __init__(
self,
model_name: str = "microsoft/biogpt",
max_length: int = 500,
num_return_sequences: int = 5,
do_sample: bool = True,
min_length: int = 100,
):
"""
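Initialize the wrapper class.
Args:
model_name (str): Name of the pretrained model. Default is "microsoft/biogpt".
max_length (int): Maximum length of the generated text. Default is 500.
num_return_sequences (int): Number of sequences to return. Default is 5.
do_sample (bool): Whether to use sampling in generation. Default is True.
min_length (int): Minimum length for beam search decoding. Default is 100.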
"""
self.model_name = model_name
self.max_length = max_length
self.num_return_sequences = num_return_sequences
self.do_sample = do_sample
self.min_length = min_length
self.model = BioGptForCausalLM.from_pretrained(self.model_name)
self.tokenizer = BioGptTokenizer.from_pretrained(self.model_name)
def __call__(self, text: str):
"""
Generate text based on the given input.
Args:
text (str): The input text to generate from.
Generation settings (max_length, num_return_sequences, do_sample) are taken
from the values passed to __init__.
Returns:
list[dict]: One dict per generated sequence, each with a "generated_text" key.
"""
set_seed(42)
generator = pipeline(
"text-generation", model=self.model, tokenizer=self.tokenizer
)
return generator(
text,
max_length=self.max_length,
num_return_sequences=self.num_return_sequences,
do_sample=self.do_sample,
)
def get_features(self, text):
"""
Get the features of a given text.
Args:
text (str): Input text.
Returns:
BaseModelOutputWithPastAndCrossAttentions: Model output.
"""
encoded_input = self.tokenizer(text, return_tensors="pt")
return self.model(**encoded_input)
def beam_search_decoding(
self,
sentence,
num_beams=5,
early_stopping=True,
):
"""
Generate text using beam search decoding.
Args:
sentence (str): The input sentence to generate from.
num_beams (int): Number of beams for beam search.
early_stopping (bool): Whether to stop early during beam search.
min_length and max_length are taken from the values passed to __init__.
Returns:
str: The generated text.
"""
inputs = self.tokenizer(sentence, return_tensors="pt")
set_seed(42)
with torch.no_grad():
beam_output = self.model.generate(
**inputs,
min_length=self.min_length,
max_length=self.max_length,
num_beams=num_beams,
early_stopping=early_stopping
)
return self.tokenizer.decode(beam_output[0], skip_special_tokens=True)
# Feature 1: Set a new tokenizer and model
def set_pretrained_model(self, model_name):
"""
Set a new tokenizer and model.
Args:
model_name (str): Name of the pretrained model.
"""
self.model_name = model_name
self.model = BioGptForCausalLM.from_pretrained(self.model_name)
self.tokenizer = BioGptTokenizer.from_pretrained(self.model_name)
# Feature 2: Get the model's config details
def get_config(self):
"""
Get the model's configuration.
Returns:
PretrainedConfig: The configuration of the model.
"""
return self.model.config
# Feature 3: Save the model and tokenizer to disk
def save_model(self, path):
"""
Save the model and tokenizer to a directory.
Args:
path (str): Path to the directory.
"""
self.model.save_pretrained(path)
self.tokenizer.save_pretrained(path)
# Feature 4: Load a model from a custom path
def load_from_path(self, path):
"""
Load a model and tokenizer from a directory.
Args:
path (str): Path to the directory.
"""
self.model = BioGptForCausalLM.from_pretrained(path)
self.tokenizer = BioGptTokenizer.from_pretrained(path)
# Feature 5: Print the model's architecture
def print_model(self):
"""
Print the model's architecture.
"""
print(self.model)
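`BioGPT.__call__` returns the raw `text-generation` pipeline result, which is a list of dicts rather than a string. A minimal sketch of pulling the plain strings out of that structure follows; `extract_texts` and the `sample` data are hypothetical and only mimic the shape of a pipeline result.

```python
def extract_texts(generations):
    """Return the 'generated_text' string from each pipeline result dict."""
    return [item["generated_text"] for item in generations]


# Hypothetical sample shaped like a text-generation pipeline result.
sample = [
    {"generated_text": "The patient has a fever and a cough."},
    {"generated_text": "The patient has a fever of unknown origin."},
]

texts = extract_texts(sample)
```

With the wrapper above, the call would look like `extract_texts(model("The patient has a fever"))`.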

@ -38,21 +38,22 @@ class Idefics:
# Usage # Usage
``` ```
from exa import idefics from swarms.models import idefics
mmi = idefics()
model = idefics()
user_input = "User: What is in this image? https://upload.wikimedia.org/wikipedia/commons/8/86/Id%C3%A9fix.JPG" user_input = "User: What is in this image? https://upload.wikimedia.org/wikipedia/commons/8/86/Id%C3%A9fix.JPG"
response = mmi.chat(user_input) response = model.chat(user_input)
print(response) print(response)
user_input = "User: And who is that? https://static.wikia.nocookie.net/asterix/images/2/25/R22b.gif/revision/latest?cb=20110815073052" user_input = "User: And who is that? https://static.wikia.nocookie.net/asterix/images/2/25/R22b.gif/revision/latest?cb=20110815073052"
response = mmi.chat(user_input) response = model.chat(user_input)
print(response) print(response)
mmi.set_checkpoint("new_checkpoint") model.set_checkpoint("new_checkpoint")
mmi.set_device("cpu") model.set_device("cpu")
mmi.set_max_length(200) model.set_max_length(200)
mmi.clear_chat_history() model.clear_chat_history()
``` ```
""" """

@ -0,0 +1,36 @@
"""
LayoutLMDocumentQA is a multimodal model suited to
visual question answering on real-world documents such as invoices, PDFs, etc.
"""
from transformers import pipeline
from swarms.models.base import AbstractModel
class LayoutLMDocumentQA(AbstractModel):
"""
LayoutLMDocumentQA for document question answering:
Args:
model_name (str, optional): Hugging Face model name or path. Defaults to "impira/layoutlm-document-qa".
task (str, optional): Pipeline task name. Defaults to "document-question-answering".
Usage:
>>> from swarms.models import LayoutLMDocumentQA
>>> model = LayoutLMDocumentQA()
>>> out = model("What is the total amount?", "path/to/img.png")
>>> print(out)
"""
def __init__(
self,
model_name: str = "impira/layoutlm-document-qa",
task: str = "document-question-answering",
):
self.model_name = model_name
self.task = task
self.pipeline = pipeline(self.task, model=self.model_name)
def __call__(self, task: str, img_path: str):
"""Answer the question given in `task` about the document image at `img_path`."""
out = self.pipeline(img_path, task)
out = str(out)
return out
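`LayoutLMDocumentQA.__call__` stringifies whatever the pipeline returns. Before that conversion, a `document-question-answering` pipeline result is a dict or a list of dicts carrying `score` and `answer` fields. A minimal sketch of selecting the highest-scoring answer is below; `best_answer` and the `sample` predictions are hypothetical, not part of the class above.

```python
def best_answer(predictions):
    """Return the answer with the highest score, or None if there are no predictions."""
    if isinstance(predictions, dict):
        predictions = [predictions]  # a single prediction may arrive as a bare dict
    if not predictions:
        return None
    return max(predictions, key=lambda prediction: prediction["score"])["answer"]


# Hypothetical sample shaped like a document-question-answering result.
sample = [
    {"score": 0.42, "answer": "$1,250.00", "start": 10, "end": 11},
    {"score": 0.91, "answer": "$1,520.00", "start": 34, "end": 35},
]

top = best_answer(sample)
```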

@ -0,0 +1,69 @@
"""
Nougat by Meta
Good for:
- Transcribing scientific PDFs into easy-to-use Markdown
- Extracting information from PDFs
- Extracting metadata from pdfs
"""
import torch
from PIL import Image
from transformers import NougatProcessor, VisionEncoderDecoderModel
class Nougat:
"""
Nougat
Args:
model_name_or_path: str, default="facebook/nougat-base"
min_length: int, default=1
max_new_tokens: int, default=30
Usage:
>>> from swarms.models.nougat import Nougat
>>> nougat = Nougat()
>>> nougat("path/to/image.png")
"""
def __init__(
self,
model_name_or_path="facebook/nougat-base",
min_length: int = 1,
max_new_tokens: int = 30,
):
self.model_name_or_path = model_name_or_path
self.min_length = min_length
self.max_new_tokens = max_new_tokens
self.processor = NougatProcessor.from_pretrained(self.model_name_or_path)
self.model = VisionEncoderDecoderModel.from_pretrained(self.model_name_or_path)
self.device = "cuda" if torch.cuda.is_available() else "cpu"
self.model.to(self.device)
def get_image(self, img_path: str):
"""Get an image from a path"""
image = Image.open(img_path)
return image
def __call__(self, img_path: str):
"""Call the model with an image_path str as an input"""
image = Image.open(img_path)
pixel_values = self.processor(image, return_tensors="pt").pixel_values
# Generate the transcription; output length is capped at self.max_new_tokens (30 by default)
outputs = self.model.generate(
pixel_values.to(self.device),
min_length=self.min_length,
max_new_tokens=self.max_new_tokens,
# Block the unknown token so it never appears in the output
bad_words_ids=[[self.processor.tokenizer.unk_token_id]],
)
sequence = self.processor.batch_decode(outputs, skip_special_tokens=True)[0]
sequence = self.processor.post_process_generation(sequence, fix_markdown=False)
return sequence

@ -0,0 +1 @@
"""Phi by Microsoft written by Kye"""

@ -0,0 +1,159 @@
"""
SpeechT5 (TTS task)
SpeechT5 model fine-tuned for speech synthesis (text-to-speech) on LibriTTS.
This model was introduced in SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing by Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei.
SpeechT5 was first released in this repository, original weights. The license used is MIT.
Model Description
Motivated by the success of T5 (Text-To-Text Transfer Transformer) in pre-trained natural language processing models, we propose a unified-modal SpeechT5 framework that explores the encoder-decoder pre-training for self-supervised speech/text representation learning. The SpeechT5 framework consists of a shared encoder-decoder network and six modal-specific (speech/text) pre/post-nets. After preprocessing the input speech/text through the pre-nets, the shared encoder-decoder network models the sequence-to-sequence transformation, and then the post-nets generate the output in the speech/text modality based on the output of the decoder.
Leveraging large-scale unlabeled speech and text data, we pre-train SpeechT5 to learn a unified-modal representation, hoping to improve the modeling capability for both speech and text. To align the textual and speech information into this unified semantic space, we propose a cross-modal vector quantization approach that randomly mixes up speech/text states with latent units as the interface between encoder and decoder.
Extensive evaluations show the superiority of the proposed SpeechT5 framework on a wide variety of spoken language processing tasks, including automatic speech recognition, speech synthesis, speech translation, voice conversion, speech enhancement, and speaker identification.
Developed by: Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei.
Shared by [optional]: Matthijs Hollemans
Model type: text-to-speech
Language(s) (NLP): [More Information Needed]
License: MIT
Finetuned from model [optional]: [More Information Needed]
Model Sources [optional]
Repository: [https://github.com/microsoft/SpeechT5/]
Paper: [https://arxiv.org/pdf/2110.07205.pdf]
Blog Post: [https://huggingface.co/blog/speecht5]
Demo: [https://huggingface.co/spaces/Matthijs/speecht5-tts-demo]
"""
import torch
import soundfile as sf
from transformers import (
pipeline,
SpeechT5Processor,
SpeechT5ForTextToSpeech,
SpeechT5HifiGan,
)
from datasets import load_dataset
class SpeechT5:
"""
SpeechT5: a wrapper around the SpeechT5 text-to-speech model.
Args:
model_name (str, optional): Model name or path. Defaults to "microsoft/speecht5_tts".
vocoder_name (str, optional): Vocoder name or path. Defaults to "microsoft/speecht5_hifigan".
dataset_name (str, optional): Dataset name or path. Defaults to "Matthijs/cmu-arctic-xvectors".
Attributes:
model_name (str): Model name or path.
vocoder_name (str): Vocoder name or path.
dataset_name (str): Dataset name or path.
processor (SpeechT5Processor): Processor for the SpeechT5 model.
model (SpeechT5ForTextToSpeech): SpeechT5 model.
vocoder (SpeechT5HifiGan): SpeechT5 vocoder.
embeddings_dataset (datasets.Dataset): Dataset containing speaker embeddings.
Methods
__call__: Synthesize speech from text.
save_speech: Save speech to a file.
set_model: Change the model.
set_vocoder: Change the vocoder.
set_embeddings_dataset: Change the embeddings dataset.
get_sampling_rate: Get the sampling rate of the model.
print_model_details: Print details of the model.
quick_synthesize: Customize pipeline method for quick synthesis.
change_dataset_split: Change dataset split (train, validation, test).
load_custom_embedding: Load a custom speaker embedding (xvector) for the text.
Usage:
>>> speechT5 = SpeechT5()
>>> result = speechT5("Hello, how are you?")
>>> speechT5.save_speech(result)
>>> print("Speech saved successfully!")
"""
def __init__(
self,
model_name="microsoft/speecht5_tts",
vocoder_name="microsoft/speecht5_hifigan",
dataset_name="Matthijs/cmu-arctic-xvectors",
):
self.model_name = model_name
self.vocoder_name = vocoder_name
self.dataset_name = dataset_name
self.processor = SpeechT5Processor.from_pretrained(self.model_name)
self.model = SpeechT5ForTextToSpeech.from_pretrained(self.model_name)
self.vocoder = SpeechT5HifiGan.from_pretrained(self.vocoder_name)
self.embeddings_dataset = load_dataset(self.dataset_name, split="validation")
def __call__(self, text: str, speaker_id: int = 7306):
"""Call the model on some text and return the speech."""
speaker_embedding = torch.tensor(
self.embeddings_dataset[speaker_id]["xvector"]
).unsqueeze(0)
inputs = self.processor(text=text, return_tensors="pt")
speech = self.model.generate_speech(
inputs["input_ids"], speaker_embedding, vocoder=self.vocoder
)
return speech
def save_speech(self, speech, filename="speech.wav"):
"""Save Speech to a file."""
sf.write(filename, speech.numpy(), samplerate=16000)
def set_model(self, model_name: str):
"""Set the model to a new model."""
self.model_name = model_name
self.processor = SpeechT5Processor.from_pretrained(self.model_name)
self.model = SpeechT5ForTextToSpeech.from_pretrained(self.model_name)
def set_vocoder(self, vocoder_name):
"""Set the vocoder to a new vocoder."""
self.vocoder_name = vocoder_name
self.vocoder = SpeechT5HifiGan.from_pretrained(self.vocoder_name)
def set_embeddings_dataset(self, dataset_name):
"""Set the embeddings dataset to a new dataset."""
self.dataset_name = dataset_name
self.embeddings_dataset = load_dataset(self.dataset_name, split="validation")
# Feature 1: Get sampling rate
def get_sampling_rate(self):
"""Get sampling rate of the model."""
return 16000
# Feature 2: Print details of the model
def print_model_details(self):
"""Print details of the model."""
print(f"Model Name: {self.model_name}")
print(f"Vocoder Name: {self.vocoder_name}")
# Feature 3: Customize pipeline method for quick synthesis
def quick_synthesize(self, text):
"""Customize pipeline method for quick synthesis."""
synthesiser = pipeline("text-to-speech", self.model_name)
speech = synthesiser(text)
return speech
# Feature 4: Change dataset split (train, validation, test)
def change_dataset_split(self, split="train"):
"""Change dataset split (train, validation, test)."""
self.embeddings_dataset = load_dataset(self.dataset_name, split=split)
# Feature 5: Load a custom speaker embedding (xvector) for the text
def load_custom_embedding(self, xvector):
"""Load a custom speaker embedding (xvector) for the text."""
return torch.tensor(xvector).unsqueeze(0)
# if __name__ == "__main__":
# speechT5 = SpeechT5()
# result = speechT5("Hello, how are you?")
# speechT5.save_speech(result)
# print("Speech saved successfully!")
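`save_speech` and `get_sampling_rate` both rely on a fixed 16 kHz sample rate, which determines how many samples make up a second of audio. The sketch below shows that relationship using only the standard library `wave` module instead of `soundfile`. `write_wav` and the generated 440 Hz test tone are assumptions for illustration, not part of the wrapper.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 16000  # same rate the wrapper passes to soundfile


def write_wav(samples, target, sample_rate=SAMPLE_RATE):
    """Write floats in [-1.0, 1.0] as 16-bit mono PCM to a path or file-like object."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, sample)) * 32767))
        for sample in samples
    )
    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(frames)


# Half a second of a 440 Hz tone stands in for synthesized speech.
tone = [
    0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
    for n in range(SAMPLE_RATE // 2)
]

buffer = io.BytesIO()
write_wav(tone, buffer)

# Read the header back to confirm the duration implied by the sample rate.
buffer.seek(0)
with wave.open(buffer, "rb") as wav_file:
    duration_seconds = wav_file.getnframes() / wav_file.getframerate()
```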

@ -0,0 +1,19 @@
"""
TROCR for Multi-Modal OCR tasks
"""
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
import requests
class TrOCR:
def __init__(
self,
):
pass
def __call__(self):
pass

@ -3,7 +3,6 @@ import torch
from transformers import pipeline from transformers import pipeline
class Zephyr: class Zephyr:
""" """
Zephyr model from HF Zephyr model from HF
@ -23,6 +22,7 @@ class Zephyr:
""" """
def __init__( def __init__(
self, self,
max_new_tokens: int = 300, max_new_tokens: int = 300,
@ -40,18 +40,23 @@ class Zephyr:
"text-generation", "text-generation",
model="HuggingFaceH4/zephyr-7b-alpha", model="HuggingFaceH4/zephyr-7b-alpha",
torch_dtype=torch.bfloat16, torch_dtype=torch.bfloat16,
device_map="auto" device_map="auto",
) )
self.messages = [ self.messages = [
{ {
"role": "system", "role": "system",
"content": "You are a friendly chatbot who always responds in the style of a pirate", "content": "You are a friendly chatbot who always responds in the style of a pirate",
}, },
{"role": "user", "content": "How many helicopters can a human eat in one sitting?"}, {
"role": "user",
"content": "How many helicopters can a human eat in one sitting?",
},
] ]
def __call__(self, text: str): def __call__(self, text: str):
"""Call the model""" """Call the model"""
prompt = self.pipe.tokenizer.apply_chat_template(self.messages, tokenize=False, add_generation_prompt=True) prompt = self.pipe.tokenizer.apply_chat_template(
self.messages, tokenize=False, add_generation_prompt=True
)
outputs = self.pipe(prompt, max_new_tokens=self.max_new_tokens) outputs = self.pipe(prompt, max_new_tokens=self.max_new_tokens)
print(outputs[0]["generated_text"]) print(outputs[0]["generated_text"])
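The `__call__` method above formats the fixed `self.messages` list and does not use its `text` argument. One way the caller's text could be placed into the chat structure is sketched here; `build_messages` is a hypothetical helper and not part of the Zephyr class.

```python
def build_messages(system_prompt, user_text):
    """Build the role/content message list that a chat template expects."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_text},
    ]


messages = build_messages(
    "You are a friendly chatbot who always responds in the style of a pirate",
    "What is the tallest mountain on Earth?",
)
```

The resulting list has the same shape as `self.messages`, so it could be passed to `apply_chat_template` in its place.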

@ -0,0 +1,83 @@
import pytest
from swarms.chunkers.base import (
BaseChunker,
TextArtifact,
ChunkSeparator,
OpenAiTokenizer,
) # adjust the import paths accordingly
# 1. Test Initialization
def test_chunker_initialization():
chunker = BaseChunker()
assert isinstance(chunker, BaseChunker)
assert chunker.max_tokens == chunker.tokenizer.max_tokens
def test_default_separators():
chunker = BaseChunker()
assert chunker.separators == BaseChunker.DEFAULT_SEPARATORS
def test_default_tokenizer():
chunker = BaseChunker()
assert isinstance(chunker.tokenizer, OpenAiTokenizer)
# 2. Test Basic Chunking
@pytest.mark.parametrize(
"input_text, expected_output",
[
("This is a test.", [TextArtifact("This is a test.")]),
("Hello World!", [TextArtifact("Hello World!")]),
# Add more simple cases
],
)
def test_basic_chunk(input_text, expected_output):
chunker = BaseChunker()
result = chunker.chunk(input_text)
assert result == expected_output
# 3. Test Chunking with Different Separators
def test_custom_separators():
custom_separator = ChunkSeparator(";")
chunker = BaseChunker(separators=[custom_separator])
input_text = "Hello;World!"
expected_output = [TextArtifact("Hello;"), TextArtifact("World!")]
result = chunker.chunk(input_text)
assert result == expected_output
# 4. Test Recursive Chunking
def test_recursive_chunking():
chunker = BaseChunker(max_tokens=5)
input_text = "This is a more complex text."
expected_output = [
TextArtifact("This"),
TextArtifact("is a"),
TextArtifact("more"),
TextArtifact("complex"),
TextArtifact("text."),
]
result = chunker.chunk(input_text)
assert result == expected_output
# 5. Test Edge Cases and Special Scenarios
def test_empty_text():
chunker = BaseChunker()
result = chunker.chunk("")
assert result == []
def test_whitespace_text():
chunker = BaseChunker()
result = chunker.chunk(" ")
assert result == [TextArtifact(" ")]
def test_single_word():
chunker = BaseChunker()
result = chunker.chunk("Hello")
assert result == [TextArtifact("Hello")]
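The expectations in these tests (the separator stays attached to the preceding chunk, empty input yields no chunks) can be illustrated with a toy splitter. This is only a sketch of that behaviour on plain strings. `split_keep_separator` is hypothetical and is not how `BaseChunker` is implemented.

```python
def split_keep_separator(text, separator):
    """Split text on separator, keeping the separator at the end of each chunk."""
    if not text:
        return []
    parts = text.split(separator)
    # Every part except the last was followed by a separator in the input.
    chunks = [part + separator for part in parts[:-1]]
    if parts[-1]:
        chunks.append(parts[-1])
    return chunks


custom = split_keep_separator("Hello;World!", ";")
single = split_keep_separator("Hello", ";")
empty = split_keep_separator("", ";")
```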

@ -0,0 +1,207 @@
from unittest.mock import patch
# Import necessary modules
import pytest
import torch
from transformers import BioGptForCausalLM, BioGptTokenizer
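# Fixture for BioGPT instance
@pytest.fixture
def biogpt_instance():
from swarms.models import (
BioGPT,
)
return BioGPT()
# Alias fixture: several tests below request the instance as `bio_gpt`
@pytest.fixture
def bio_gpt(biogpt_instance):
return biogpt_instance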
# 36. Test if BioGPT provides a response for a simple biomedical question
def test_biomedical_response_1(biogpt_instance):
question = "What are the functions of the mitochondria?"
response = biogpt_instance(question)
assert response and isinstance(response, list)
# 37. Test for a genetics-based question
def test_genetics_response(biogpt_instance):
question = "Can you explain the Mendelian inheritance?"
response = biogpt_instance(question)
assert response and isinstance(response, list)
# 38. Test for a question about viruses
def test_virus_response(biogpt_instance):
question = "How do RNA viruses replicate?"
response = biogpt_instance(question)
assert response and isinstance(response, list)
# 39. Test for a cell biology related question
def test_cell_biology_response(biogpt_instance):
question = "Describe the cell cycle and its phases."
response = biogpt_instance(question)
assert response and isinstance(response, list)
# 40. Test for a question about protein structure
def test_protein_structure_response(biogpt_instance):
question = "What's the difference between alpha helix and beta sheet structures in proteins?"
response = biogpt_instance(question)
assert response and isinstance(response, list)
# 41. Test for a pharmacology question
def test_pharmacology_response(biogpt_instance):
question = "How do beta blockers work?"
response = biogpt_instance(question)
assert response and isinstance(response, list)
# 42. Test for an anatomy-based question
def test_anatomy_response(biogpt_instance):
question = "Describe the structure of the human heart."
response = biogpt_instance(question)
assert response and isinstance(response, list)
# 43. Test for a question about bioinformatics
def test_bioinformatics_response(biogpt_instance):
question = "What is a BLAST search?"
response = biogpt_instance(question)
assert response and isinstance(response, list)
# 44. Test for a neuroscience question
def test_neuroscience_response(biogpt_instance):
question = "Explain the function of synapses in the nervous system."
response = biogpt_instance(question)
assert response and isinstance(response, list)
# 45. Test for an immunology question
def test_immunology_response(biogpt_instance):
question = "What is the role of T cells in the immune response?"
response = biogpt_instance(question)
assert response and isinstance(response, list)
def test_init(bio_gpt):
assert bio_gpt.model_name == "microsoft/biogpt"
assert bio_gpt.max_length == 500
assert bio_gpt.num_return_sequences == 5
assert bio_gpt.do_sample is True
assert bio_gpt.min_length == 100
def test_call(bio_gpt, monkeypatch):
def mock_pipeline(*args, **kwargs):
class MockGenerator:
def __call__(self, text, **kwargs):
return ["Generated text"]
return MockGenerator()
# Patch the name where it is looked up (the wrapper module), not where it is defined
monkeypatch.setattr("swarms.models.biogpt.pipeline", mock_pipeline)
result = bio_gpt("Input text")
assert result == ["Generated text"]
def test_get_features(bio_gpt):
features = bio_gpt.get_features("Input text")
assert "logits" in features
def test_beam_search_decoding(bio_gpt):
generated_text = bio_gpt.beam_search_decoding("Input text")
assert isinstance(generated_text, str)
def test_set_pretrained_model(bio_gpt):
bio_gpt.set_pretrained_model("new_model")
assert bio_gpt.model_name == "new_model"
def test_get_config(bio_gpt):
config = bio_gpt.get_config()
assert hasattr(config, "vocab_size")
def test_save_load_model(tmp_path, bio_gpt):
bio_gpt.save_model(tmp_path)
bio_gpt.load_from_path(tmp_path)
assert bio_gpt.model_name == "microsoft/biogpt"
def test_print_model(capsys, bio_gpt):
bio_gpt.print_model()
captured = capsys.readouterr()
assert "BioGptForCausalLM" in captured.out
# 26. Test if set_pretrained_model changes the model_name
def test_set_pretrained_model_name_change(biogpt_instance):
biogpt_instance.set_pretrained_model("new_model_name")
assert biogpt_instance.model_name == "new_model_name"
# 27. Test get_config return type
def test_get_config_return_type(biogpt_instance):
config = biogpt_instance.get_config()
assert isinstance(config, type(biogpt_instance.model.config))
# 28. Test saving model functionality by checking if files are created
@patch.object(BioGptForCausalLM, "save_pretrained")
@patch.object(BioGptTokenizer, "save_pretrained")
def test_save_model(mock_save_tokenizer, mock_save_model, biogpt_instance):
path = "test_path"
biogpt_instance.save_model(path)
mock_save_model.assert_called_once_with(path)
mock_save_tokenizer.assert_called_once_with(path)
# 29. Test loading model from path
@patch.object(BioGptForCausalLM, "from_pretrained")
@patch.object(BioGptTokenizer, "from_pretrained")
def test_load_from_path(mock_load_tokenizer, mock_load_model, biogpt_instance):
path = "test_path"
biogpt_instance.load_from_path(path)
mock_load_model.assert_called_once_with(path)
mock_load_tokenizer.assert_called_once_with(path)
# 30. Test print_model doesn't raise any error
def test_print_model_metadata(biogpt_instance):
try:
biogpt_instance.print_model()
except Exception as e:
pytest.fail(f"print_model() raised an exception: {e}")
# 31. Test that beam_search_decoding uses the correct number of beams
@patch.object(BioGptForCausalLM, "generate")
def test_beam_search_decoding_num_beams(mock_generate, biogpt_instance):
biogpt_instance.beam_search_decoding("test_sentence", num_beams=7)
_, kwargs = mock_generate.call_args
assert kwargs["num_beams"] == 7
# 32. Test if beam_search_decoding handles early_stopping
@patch.object(BioGptForCausalLM, "generate")
def test_beam_search_decoding_early_stopping(mock_generate, biogpt_instance):
biogpt_instance.beam_search_decoding("test_sentence", early_stopping=False)
_, kwargs = mock_generate.call_args
assert kwargs["early_stopping"] is False
# 33. Test get_features return type
def test_get_features_return_type(biogpt_instance):
result = biogpt_instance.get_features("This is a sample text.")
assert hasattr(result, "logits")
# 34. Test if default model is set correctly during initialization
def test_default_model_name(biogpt_instance):
assert biogpt_instance.model_name == "microsoft/biogpt"
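Several tests above stack two `patch.object` decorators. Stacked decorators are applied bottom-up, so the decorator closest to the function supplies the first mock argument. A minimal sketch with stand-in classes is below; `Model`, `Tokenizer`, and `check_order` are hypothetical and unrelated to the transformers classes.

```python
from unittest.mock import patch


class Model:
    def save(self, path):
        return f"model:{path}"


class Tokenizer:
    def save(self, path):
        return f"tokenizer:{path}"


@patch.object(Model, "save")      # outermost decorator -> last mock argument
@patch.object(Tokenizer, "save")  # innermost decorator -> first mock argument
def check_order(mock_tokenizer_save, mock_model_save):
    # Only the model's save is called, so only its mock should record a call.
    Model().save("out")
    return mock_tokenizer_save.called, mock_model_save.called


order_result = check_order()
```

Naming the mock parameters in this bottom-up order keeps each assertion pointed at the object it is meant to check.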

@ -0,0 +1,117 @@
# tests/test_fuyu.py
import pytest
from swarms.models import Fuyu
from transformers import FuyuProcessor, FuyuImageProcessor
from PIL import Image
# Basic test to ensure instantiation of class.
def test_fuyu_initialization():
fuyu_instance = Fuyu()
assert isinstance(fuyu_instance, Fuyu)
# Using parameterized testing for different init parameters.
@pytest.mark.parametrize(
"pretrained_path, device_map, max_new_tokens",
[
("adept/fuyu-8b", "cuda:0", 7),
("adept/fuyu-8b", "cpu", 10),
],
)
def test_fuyu_parameters(pretrained_path, device_map, max_new_tokens):
fuyu_instance = Fuyu(pretrained_path, device_map, max_new_tokens)
assert fuyu_instance.pretrained_path == pretrained_path
assert fuyu_instance.device_map == device_map
assert fuyu_instance.max_new_tokens == max_new_tokens
# Fixture for creating a Fuyu instance.
@pytest.fixture
def fuyu_instance():
return Fuyu()
# Test using the fixture.
def test_fuyu_processor_initialization(fuyu_instance):
assert isinstance(fuyu_instance.processor, FuyuProcessor)
assert isinstance(fuyu_instance.image_processor, FuyuImageProcessor)
# Test exception when providing an invalid image path.
def test_invalid_image_path(fuyu_instance):
with pytest.raises(FileNotFoundError):
fuyu_instance("Hello", "invalid/path/to/image.png")
# Using monkeypatch to replace the Image.open method to simulate a failure.
def test_image_open_failure(fuyu_instance, monkeypatch):
def mock_open(*args, **kwargs):
raise Exception("Mocked failure")
monkeypatch.setattr(Image, "open", mock_open)
with pytest.raises(Exception, match="Mocked failure"):
fuyu_instance(
"Hello",
"https://plus.unsplash.com/premium_photo-1687149699194-0207c04bc6e8?auto=format&fit=crop&q=80&w=1378&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D",
)
# Marking a slow test.
@pytest.mark.slow
def test_fuyu_model_output(fuyu_instance):
# This is a dummy test and may not be functional without real data.
output = fuyu_instance(
"Hello, my name is",
"https://plus.unsplash.com/premium_photo-1687149699194-0207c04bc6e8?auto=format&fit=crop&q=80&w=1378&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D",
)
assert isinstance(output, str)
def test_tokenizer_type(fuyu_instance):
assert "tokenizer" in dir(fuyu_instance)
def test_processor_has_image_processor_and_tokenizer(fuyu_instance):
assert fuyu_instance.processor.image_processor == fuyu_instance.image_processor
assert fuyu_instance.processor.tokenizer == fuyu_instance.tokenizer
def test_model_device_map(fuyu_instance):
assert fuyu_instance.model.device_map == fuyu_instance.device_map
# Testing maximum tokens setting
def test_max_new_tokens_setting(fuyu_instance):
assert fuyu_instance.max_new_tokens == 7
# Test if an exception is raised when invalid text is provided.
def test_invalid_text_input(fuyu_instance):
with pytest.raises(Exception):
fuyu_instance(None, "path/to/image.png")
# Test if an exception is raised when empty text is provided.
def test_empty_text_input(fuyu_instance):
with pytest.raises(Exception):
fuyu_instance("", "path/to/image.png")
# Test if an exception is raised when a very long text is provided.
def test_very_long_text_input(fuyu_instance):
with pytest.raises(Exception):
fuyu_instance("A" * 10000, "path/to/image.png")
# Check model's default device map
def test_default_device_map():
fuyu_instance = Fuyu()
assert fuyu_instance.device_map == "cuda:0"
# Testing if processor is correctly initialized
def test_processor_initialization(fuyu_instance):
assert isinstance(fuyu_instance.processor, FuyuProcessor)

@ -0,0 +1,119 @@
import pytest
from unittest.mock import patch

import torch

from swarms.models.idefics import Idefics, IdeficsForVisionText2Text, AutoProcessor


@pytest.fixture
def idefics_instance():
    with patch(
        "torch.cuda.is_available", return_value=False
    ):  # Assuming tests are run on CPU for simplicity
        instance = Idefics()
    return instance


# Basic Tests
def test_init_default(idefics_instance):
    assert idefics_instance.device == "cpu"
    assert idefics_instance.max_length == 100
    assert not idefics_instance.chat_history


@pytest.mark.parametrize(
    "device,expected",
    [
        (None, "cpu"),
        ("cuda", "cuda"),
        ("cpu", "cpu"),
    ],
)
def test_init_device(device, expected):
    # CUDA is reported as available only for the case that expects it.
    with patch("torch.cuda.is_available", return_value=(expected == "cuda")):
        instance = Idefics(device=device)
    assert instance.device == expected
# Test `run` method
def test_run(idefics_instance):
    prompts = [["User: Test"]]
    with patch.object(idefics_instance, "processor") as mock_processor, patch.object(
        idefics_instance, "model"
    ) as mock_model:
        mock_processor.return_value = {"input_ids": torch.tensor([1, 2, 3])}
        mock_model.generate.return_value = torch.tensor([1, 2, 3])
        mock_processor.batch_decode.return_value = ["Test"]

        result = idefics_instance.run(prompts)

    assert result == ["Test"]


# Test `__call__` method (using the same logic as run for simplicity)
def test_call(idefics_instance):
    prompts = [["User: Test"]]
    with patch.object(idefics_instance, "processor") as mock_processor, patch.object(
        idefics_instance, "model"
    ) as mock_model:
        mock_processor.return_value = {"input_ids": torch.tensor([1, 2, 3])}
        mock_model.generate.return_value = torch.tensor([1, 2, 3])
        mock_processor.batch_decode.return_value = ["Test"]

        result = idefics_instance(prompts)

    assert result == ["Test"]


# Test `chat` method
def test_chat(idefics_instance):
    user_input = "User: Hello"
    response = "Model: Hi there!"
    with patch.object(idefics_instance, "run", return_value=[response]):
        result = idefics_instance.chat(user_input)

    assert result == response
    assert idefics_instance.chat_history == [user_input, response]
# Test `set_checkpoint` method
def test_set_checkpoint(idefics_instance):
    new_checkpoint = "new_checkpoint"
    with patch.object(
        IdeficsForVisionText2Text, "from_pretrained"
    ) as mock_from_pretrained, patch.object(AutoProcessor, "from_pretrained"):
        idefics_instance.set_checkpoint(new_checkpoint)

        mock_from_pretrained.assert_called_with(new_checkpoint, torch_dtype=torch.bfloat16)


# Test `set_device` method
def test_set_device(idefics_instance):
    new_device = "cuda"
    with patch.object(idefics_instance.model, "to"):
        idefics_instance.set_device(new_device)

    assert idefics_instance.device == new_device


# Test `set_max_length` method
def test_set_max_length(idefics_instance):
    new_length = 150
    idefics_instance.set_max_length(new_length)
    assert idefics_instance.max_length == new_length


# Test `clear_chat_history` method
def test_clear_chat_history(idefics_instance):
    idefics_instance.chat_history = ["User: Test", "Model: Response"]
    idefics_instance.clear_chat_history()
    assert not idefics_instance.chat_history


# Exception Tests
def test_run_with_empty_prompts(idefics_instance):
    # Replace Exception with the actual exception that may arise for an empty prompt.
    with pytest.raises(Exception):
        idefics_instance.run([])

@ -0,0 +1,206 @@
import os
from unittest.mock import MagicMock, Mock, patch

import pytest
import torch
from PIL import Image
from transformers import NougatProcessor, VisionEncoderDecoderModel

from swarms.models.nougat import Nougat


@pytest.fixture
def setup_nougat():
    return Nougat()


def test_nougat_default_initialization(setup_nougat):
    assert setup_nougat.model_name_or_path == "facebook/nougat-base"
    assert setup_nougat.min_length == 1
    assert setup_nougat.max_new_tokens == 30


def test_nougat_custom_initialization():
    nougat = Nougat(model_name_or_path="custom_path", min_length=10, max_new_tokens=50)
    assert nougat.model_name_or_path == "custom_path"
    assert nougat.min_length == 10
    assert nougat.max_new_tokens == 50


def test_processor_initialization(setup_nougat):
    assert isinstance(setup_nougat.processor, NougatProcessor)


def test_model_initialization(setup_nougat):
    assert isinstance(setup_nougat.model, VisionEncoderDecoderModel)


@pytest.mark.parametrize(
    "cuda_available, expected_device", [(True, "cuda"), (False, "cpu")]
)
def test_device_initialization(cuda_available, expected_device, monkeypatch):
    monkeypatch.setattr(
        torch, "cuda", Mock(is_available=Mock(return_value=cuda_available))
    )
    nougat = Nougat()
    assert nougat.device == expected_device
def test_get_image_valid_path(setup_nougat):
    with patch("PIL.Image.open") as mock_open:
        mock_open.return_value = Mock(spec=Image.Image)
        assert setup_nougat.get_image("valid_path") is not None


def test_get_image_invalid_path(setup_nougat):
    with pytest.raises(FileNotFoundError):
        setup_nougat.get_image("invalid_path")


@pytest.mark.parametrize(
    "min_len, max_tokens",
    [
        (1, 30),
        (5, 40),
        (10, 50),
    ],
)
def test_model_call_with_diff_params(setup_nougat, min_len, max_tokens):
    setup_nougat.min_length = min_len
    setup_nougat.max_new_tokens = max_tokens

    with patch("PIL.Image.open") as mock_open:
        mock_open.return_value = Mock(spec=Image.Image)
        # Here, mocking other required methods or adding more complex logic would be necessary.
        result = setup_nougat("valid_path")
        assert isinstance(result, str)


def test_model_call_invalid_image_path(setup_nougat):
    with pytest.raises(FileNotFoundError):
        setup_nougat("invalid_path")


def test_model_call_mocked_output(setup_nougat):
    with patch("PIL.Image.open") as mock_open:
        mock_open.return_value = Mock(spec=Image.Image)
        mock_model = MagicMock()
        mock_model.generate.return_value = "mocked_output"
        setup_nougat.model = mock_model

        result = setup_nougat("valid_path")
        assert result == "mocked_output"
@pytest.fixture
def mock_processor_and_model():
    """Mock the NougatProcessor and VisionEncoderDecoderModel to simulate their behavior."""
    with patch(
        "transformers.NougatProcessor.from_pretrained", return_value=Mock()
    ), patch(
        "transformers.VisionEncoderDecoderModel.from_pretrained", return_value=Mock()
    ):
        yield


@pytest.mark.usefixtures("mock_processor_and_model")
def test_nougat_with_sample_image_1(setup_nougat):
    result = setup_nougat(
        os.path.join(
            "sample_images",
            "https://plus.unsplash.com/premium_photo-1687149699194-0207c04bc6e8?auto=format&fit=crop&q=80&w=1378&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D",
        )
    )
    assert isinstance(result, str)


@pytest.mark.usefixtures("mock_processor_and_model")
def test_nougat_with_sample_image_2(setup_nougat):
    result = setup_nougat(os.path.join("sample_images", "test2.png"))
    assert isinstance(result, str)


@pytest.mark.usefixtures("mock_processor_and_model")
def test_nougat_min_length_param(setup_nougat):
    setup_nougat.min_length = 10
    result = setup_nougat(
        os.path.join(
            "sample_images",
            "https://plus.unsplash.com/premium_photo-1687149699194-0207c04bc6e8?auto=format&fit=crop&q=80&w=1378&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D",
        )
    )
    assert isinstance(result, str)


@pytest.mark.usefixtures("mock_processor_and_model")
def test_nougat_max_new_tokens_param(setup_nougat):
    setup_nougat.max_new_tokens = 50
    result = setup_nougat(
        os.path.join(
            "sample_images",
            "https://plus.unsplash.com/premium_photo-1687149699194-0207c04bc6e8?auto=format&fit=crop&q=80&w=1378&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D",
        )
    )
    assert isinstance(result, str)


@pytest.mark.usefixtures("mock_processor_and_model")
def test_nougat_different_model_path(setup_nougat):
    setup_nougat.model_name_or_path = "different/path"
    result = setup_nougat(
        os.path.join(
            "sample_images",
            "https://plus.unsplash.com/premium_photo-1687149699194-0207c04bc6e8?auto=format&fit=crop&q=80&w=1378&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D",
        )
    )
    assert isinstance(result, str)


@pytest.mark.usefixtures("mock_processor_and_model")
def test_nougat_bad_image_path(setup_nougat):
    with pytest.raises(Exception):  # Adjust the exception type accordingly.
        setup_nougat("bad_image_path.png")
@pytest.mark.usefixtures("mock_processor_and_model")
def test_nougat_image_large_size(setup_nougat):
    result = setup_nougat(
        os.path.join(
            "sample_images",
            "https://images.unsplash.com/photo-1697641039266-bfa00367f7cb?auto=format&fit=crop&q=60&w=400&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHx0b3BpYy1mZWVkfDJ8SnBnNktpZGwtSGt8fGVufDB8fHx8fA%3D%3D",
        )
    )
    assert isinstance(result, str)


@pytest.mark.usefixtures("mock_processor_and_model")
def test_nougat_image_small_size(setup_nougat):
    result = setup_nougat(
        os.path.join(
            "sample_images",
            "https://images.unsplash.com/photo-1697638626987-aa865b769276?auto=format&fit=crop&q=60&w=400&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHx0b3BpYy1mZWVkfDd8SnBnNktpZGwtSGt8fGVufDB8fHx8fA%3D%3D",
        )
    )
    assert isinstance(result, str)


@pytest.mark.usefixtures("mock_processor_and_model")
def test_nougat_image_varied_content(setup_nougat):
    result = setup_nougat(
        os.path.join(
            "sample_images",
            "https://images.unsplash.com/photo-1697469994783-b12bbd9c4cff?auto=format&fit=crop&q=60&w=400&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHx0b3BpYy1mZWVkfDE0fEpwZzZLaWRsLUhrfHxlbnwwfHx8fHw%3D",
        )
    )
    assert isinstance(result, str)


@pytest.mark.usefixtures("mock_processor_and_model")
def test_nougat_image_with_metadata(setup_nougat):
    result = setup_nougat(
        os.path.join(
            "sample_images",
            "https://images.unsplash.com/photo-1697273300766-5bbaa53ec2f0?auto=format&fit=crop&q=60&w=400&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHx0b3BpYy1mZWVkfDE5fEpwZzZLaWRsLUhrfHxlbnwwfHx8fHw%3D",
        )
    )
    assert isinstance(result, str)
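
Review note: the comment in `test_model_call_with_diff_params` says that further mocking would be needed for the call to return a string. The sketch below shows one way that could look, using only `unittest.mock`. `TinyOCR` and `build_mocked_ocr` are hypothetical stand-ins invented for illustration; they are not the `Nougat` class, whose internals are not shown in this diff.

```python
from unittest.mock import MagicMock


class TinyOCR:
    """Hypothetical processor + model wrapper, used only for this sketch."""

    def __init__(self, processor, model, max_new_tokens=30):
        self.processor = processor
        self.model = model
        self.max_new_tokens = max_new_tokens

    def __call__(self, image):
        # Preprocess, generate token ids, then decode the first sequence.
        pixel_values = self.processor(image, return_tensors="pt").pixel_values
        outputs = self.model.generate(pixel_values, max_new_tokens=self.max_new_tokens)
        return self.processor.batch_decode(outputs, skip_special_tokens=True)[0]


def build_mocked_ocr():
    # Mock every step of the pipeline so no weights or image files are needed.
    processor = MagicMock()
    processor.batch_decode.return_value = ["mocked_output"]
    model = MagicMock()
    model.generate.return_value = [[1, 2, 3]]
    return TinyOCR(processor, model), processor, model


ocr, processor, model = build_mocked_ocr()
result = ocr("any_image")
```

With the decode step mocked as well as `generate`, the assertion on the return type no longer depends on what the real model would produce.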

@ -0,0 +1,95 @@
import pytest
from unittest.mock import patch, Mock

from swarms.models.vilt import Vilt, Image, requests


# Fixture for Vilt instance
@pytest.fixture
def vilt_instance():
    return Vilt()


# 1. Test Initialization
def test_vilt_initialization(vilt_instance):
    assert isinstance(vilt_instance, Vilt)
    assert vilt_instance.processor is not None
    assert vilt_instance.model is not None


# 2. Test Model Predictions
@patch.object(requests, "get")
@patch.object(Image, "open")
def test_vilt_prediction(mock_image_open, mock_requests_get, vilt_instance):
    mock_image = Mock()
    mock_image_open.return_value = mock_image
    mock_requests_get.return_value.raw = Mock()

    # It's a mock response, so no real answer expected
    with pytest.raises(Exception):  # Ensure exception is more specific
        vilt_instance(
            "What is this image",
            "https://images.unsplash.com/photo-1582538885592-e70a5d7ab3d3?ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&auto=format&fit=crop&w=1770&q=80",
        )
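# 3. Test Exception Handling for network
@patch.object(requests, "get", side_effect=requests.RequestException("Network error"))
def test_vilt_network_exception(mock_requests_get, vilt_instance):
    # patch.object injects the mock as the first positional argument, so the
    # test must accept it ahead of the fixture; otherwise `vilt_instance`
    # would be bound to the mock and no exception would be raised.
    with pytest.raises(requests.RequestException):
        vilt_instance(
            "What is this image",
            "https://images.unsplash.com/photo-1582538885592-e70a5d7ab3d3?ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&auto=format&fit=crop&w=1770&q=80",
        )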
# Parameterized test cases for different inputs
@pytest.mark.parametrize(
    "text,image_url",
    [
        ("What is this?", "http://example.com/image1.jpg"),
        ("Who is in the image?", "http://example.com/image2.jpg"),
        ("Where was this picture taken?", "http://example.com/image3.jpg"),
        # ... Add more scenarios
    ],
)
def test_vilt_various_inputs(text, image_url, vilt_instance):
    with pytest.raises(Exception):  # Again, ensure exception is more specific
        vilt_instance(text, image_url)


# Test with invalid or empty text
@pytest.mark.parametrize(
    "text,image_url",
    [
        (
            "",
            "https://images.unsplash.com/photo-1582538885592-e70a5d7ab3d3?ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&auto=format&fit=crop&w=1770&q=80",
        ),
        (
            None,
            "https://images.unsplash.com/photo-1582538885592-e70a5d7ab3d3?ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&auto=format&fit=crop&w=1770&q=80",
        ),
        (
            " ",
            "https://images.unsplash.com/photo-1582538885592-e70a5d7ab3d3?ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&auto=format&fit=crop&w=1770&q=80",
        ),
        # ... Add more scenarios
    ],
)
def test_vilt_invalid_text(text, image_url, vilt_instance):
    with pytest.raises(ValueError):
        vilt_instance(text, image_url)


# Test with invalid or empty image_url
@pytest.mark.parametrize(
    "text,image_url",
    [
        ("What is this?", ""),
        ("Who is in the image?", None),
        ("Where was this picture taken?", " "),
    ],
)
def test_vilt_invalid_image_url(text, image_url, vilt_instance):
    with pytest.raises(ValueError):
        vilt_instance(text, image_url)
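
Review note: the Vilt tests rely on `patch.object` decorators handing mocks to the test function. As a minimal sketch of that mechanism, the example below patches two standard-library functions rather than anything from `swarms`. The names `demo_order` and `demo_side_effect` are made up for illustration.

```python
import os
from unittest.mock import patch


# Decorators are applied bottom-up, so the mock created by the decorator
# closest to the function is passed first.
@patch.object(os, "getcwd")
@patch.object(os.path, "exists")
def demo_order(mock_exists, mock_getcwd):
    mock_exists.return_value = True
    mock_getcwd.return_value = "/fake"
    return os.path.exists("/no/such/path"), os.getcwd()


# Passing side_effect (without an explicit replacement object) still creates
# a mock and injects it, so the function has to accept that argument.
@patch.object(os, "getcwd", side_effect=OSError("boom"))
def demo_side_effect(mock_getcwd):
    try:
        os.getcwd()
    except OSError as exc:
        return str(exc), mock_getcwd.call_count
    return None


order_result = demo_order()
side_effect_result = demo_side_effect()
```

The same rule applies when pytest fixtures are mixed in: injected mocks come first in the signature, and fixture names follow.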