pull/64/head
Kye 1 year ago
parent b61aa635df
commit 40b4c9efd9

@ -23,7 +23,7 @@ At Swarms, we're transforming the landscape of AI from siloed AI agents to a uni
-----
# 🤝 Schedule a 1-on-1 Session
Book a [1-on-1 Session with Kye](https://calendly.com/apacai/agora), the Creator, to discuss any issues, provide feedback, or explore how we can improve Swarms for you.
Book a [1-on-1 Session with Kye](https://calendly.com/swarm-corp/30min), the Creator, to discuss any issues, provide feedback, or explore how we can improve Swarms for you.
----------

@ -0,0 +1,118 @@
# Nougat Documentation
## Introduction
Welcome to the documentation for Nougat, a versatile model designed by Meta for transcribing scientific PDFs into user-friendly Markdown format, extracting information from PDFs, and extracting metadata from PDF documents. This documentation will provide you with a deep understanding of the Nougat class, its architecture, usage, and examples.
## Overview
Nougat is a powerful tool that combines language modeling and image processing capabilities to convert scientific PDF documents into Markdown format. It is particularly useful for researchers, students, and professionals who need to extract valuable information from PDFs quickly. With Nougat, you can simplify complex PDFs, making their content more accessible and easy to work with.
## Class Definition
```python
class Nougat:
    def __init__(
        self,
        model_name_or_path="facebook/nougat-base",
        min_length: int = 1,
        max_new_tokens: int = 30,
    ):
        ...
```
## Purpose
The Nougat class serves the following primary purposes:
1. **PDF Transcription**: Nougat is designed to transcribe scientific PDFs into Markdown format. It helps convert complex PDF documents into a more readable and structured format, making it easier to extract information.
2. **Information Extraction**: It allows users to extract valuable information and content from PDFs efficiently. This can be particularly useful for researchers and professionals who need to extract data, figures, or text from scientific papers.
3. **Metadata Extraction**: Nougat can also extract metadata from PDF documents, providing essential details about the document, such as title, author, and publication date.
## Parameters
- `model_name_or_path` (str): The name or path of the pretrained Nougat model. Default: "facebook/nougat-base".
- `min_length` (int): The minimum length of the generated transcription. Default: 1.
- `max_new_tokens` (int): The maximum number of new tokens to generate in the Markdown transcription. Default: 30.
## Usage
To use Nougat, follow these steps:
1. Initialize the Nougat instance:
```python
from swarms.models import Nougat
nougat = Nougat()
```
### Example 1 - Initialization
```python
nougat = Nougat()
```
2. Transcribe an image of a PDF page using Nougat:
```python
markdown_transcription = nougat("path/to/pdf_file.png")
```
### Example 2 - PDF Transcription
```python
nougat = Nougat()
markdown_transcription = nougat("path/to/pdf_file.png")
```
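3. Extract information from a PDF page image. Note that `extract_information` is not defined in the `Nougat` class added in this commit, which implements only `__init__`, `get_image`, and `__call__`: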
```python
information = nougat.extract_information("path/to/pdf_file.png")
```
### Example 3 - Information Extraction
```python
nougat = Nougat()
information = nougat.extract_information("path/to/pdf_file.png")
```
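4. Extract metadata from a PDF page image. Note that `extract_metadata` is likewise not defined in the `Nougat` class added in this commit: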
```python
metadata = nougat.extract_metadata("path/to/pdf_file.png")
```
### Example 4 - Metadata Extraction
```python
nougat = Nougat()
metadata = nougat.extract_metadata("path/to/pdf_file.png")
```
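Because `__call__` takes a single image path, a multi-page document would need to be transcribed one page image at a time. The sketch below shows one way this might be done; `transcribe_pages` and `fake_model` are hypothetical names introduced here for illustration and are not part of swarms. The stand-in model lets the example run without downloading any weights.

```python
from typing import Callable, Iterable


def transcribe_pages(model: Callable[[str], str], image_paths: Iterable[str]) -> str:
    """Transcribe page images one at a time and join the results.

    `model` is any callable that maps an image path to Markdown text,
    such as a Nougat instance. This helper is hypothetical.
    """
    pages = [model(path) for path in image_paths]
    # Separate pages with a blank line so Markdown blocks do not run together.
    return "\n\n".join(page.strip() for page in pages)


def fake_model(path: str) -> str:
    """Stand-in for a real model, used only to make the sketch runnable."""
    return f"# Transcription of {path}\n"


combined = transcribe_pages(fake_model, ["page_1.png", "page_2.png"])
print(combined)
```

With a real instance, `transcribe_pages(Nougat(), paths)` would follow the same pattern, assuming each path points to a rendered page image.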
## How Nougat Works
Nougat employs a vision encoder-decoder model, along with a dedicated processor, to transcribe PDFs into Markdown format and perform information and metadata extraction. Here's how it works:
1. **Initialization**: When you create a Nougat instance, you can specify the model to use, the minimum transcription length, and the maximum number of new tokens to generate.
2. **Processing PDFs**: Nougat takes an image of a PDF page as input. You provide the path to the image file (for example, a PNG), which is opened with PIL.
3. **Image Processing**: The processor converts the image into pixel values (a tensor), which are then encoded by the model.
4. **Transcription**: Nougat generates Markdown transcriptions of PDF content, ensuring a minimum length and respecting the token limit.
5. **Information Extraction**: Information extraction involves parsing the Markdown transcription to identify key details or content of interest.
6. **Metadata Extraction**: Metadata extraction involves identifying and extracting document metadata, such as title, author, and publication date.
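Steps 5 and 6 describe parsing the Markdown transcription. As a rough sketch of what such parsing could look like, the helpers below pull the title and section headings out of a Markdown string with regular expressions. The function names `extract_title` and `extract_headings` are assumptions made for this example; they are not methods of the `Nougat` class.

```python
import re
from typing import List, Optional


def extract_title(markdown: str) -> Optional[str]:
    """Return the text of the first level-1 heading, or None if there is none."""
    match = re.search(r"^#[ \t]+(.+)$", markdown, flags=re.MULTILINE)
    return match.group(1).strip() if match else None


def extract_headings(markdown: str) -> List[str]:
    """Return the text of every heading (levels 1 to 6) in document order."""
    pattern = re.compile(r"^(#{1,6})[ \t]+(.+)$", flags=re.MULTILINE)
    return [m.group(2).strip() for m in pattern.finditer(markdown)]


# A made-up transcription used only to exercise the helpers.
sample = "# A Sample Paper\n\n## Abstract\nText.\n\n## 1 Introduction\nMore text.\n"
title = extract_title(sample)
headings = extract_headings(sample)
print(title)
print(headings)
```

Fields such as author or publication date have no fixed Markdown syntax, so extracting them would need heuristics specific to the documents being processed.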
## Additional Information
- Nougat leverages the "facebook/nougat-base" pretrained model, which is specifically designed for document transcription and extraction tasks.
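- You can adjust the minimum transcription length and the maximum number of new tokens to control the output's length. The default `max_new_tokens` of 30 caps the transcription at 30 new tokens, so full pages need a larger value.
- Nougat runs on GPU when CUDA is available and falls back to CPU otherwise.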
That concludes the documentation for Nougat. We hope you find this tool valuable for your PDF transcription, information extraction, and metadata extraction needs. If you have any questions or encounter any issues, please refer to the Nougat documentation for further assistance. Enjoy using Nougat!

@ -100,6 +100,7 @@ nav:
- Idefics: "swarms/models/idefics.md"
- BingChat: "swarms/models/bingchat.md"
- Kosmos: "swarms/models/kosmos.md"
- Nougat: "swarms/models/nougat.md"
- swarms.structs:
- Overview: "swarms/structs/overview.md"
- Workflow: "swarms/structs/workflow.md"

@ -36,12 +36,15 @@ langchain-experimental = "*"
playwright = "*"
duckduckgo-search = "*"
faiss-cpu = "*"
datasets = "*"
diffusers = "*"
sentencepiece = "*"
wget = "*"
griptape = "*"
httpx = "*"
ggl = "*"
beautifulsoup4 = "*"
huggingface-hub = "*"
pydantic = "*"
tenacity = "*"
redis = "*"

@ -18,7 +18,10 @@ google-search-results==2.4.2
Pillow
faiss-cpu
openai
datasets
huggingface-hub
google-generativeai
sentencepiece
duckduckgo-search
agent-protocol
chromadb

@ -8,14 +8,14 @@ warnings.filterwarnings("ignore", category=UserWarning)
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"
from swarms import workers
from swarms.workers.worker import Worker
# from swarms import chunkers
from swarms import models
from swarms.models import * # imports only the names listed in __all__ in swarms/models/__init__.py
from swarms import structs
from swarms import swarms
from swarms import agents
from swarms.logo import logo
print(logo)

@ -10,4 +10,23 @@ from swarms.models.zephyr import Zephyr
from swarms.models.idefics import Idefics
from swarms.models.kosmos_two import Kosmos
from swarms.models.vilt import Vilt
# from swarms.models.fuyu import Fuyu
from swarms.models.nougat import Nougat
# from swarms.models.fuyu import Fuyu # Not working, wait until they update
__all__ = [
"Anthropic",
"Petals",
"Mistral",
"OpenAI",
"AzureOpenAI",
"OpenAIChat",
"Zephyr",
"Idefics",
"Kosmos",
"Vilt",
"Nougat",
]
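The effect of `__all__` on a wildcard import can be checked in isolation. The sketch below builds a throwaway in-memory module (`demo_models` is a hypothetical name, not part of swarms) that defines two classes but lists only one of them in `__all__`.

```python
import sys
import types

# Build a throwaway module in memory; "demo_models" is a hypothetical name.
module = types.ModuleType("demo_models")
exec(
    "class Nougat: pass\n"
    "class Hidden: pass\n"
    "__all__ = ['Nougat']\n",
    module.__dict__,
)
sys.modules["demo_models"] = module

# Run the wildcard import inside a fresh namespace so the result can be inspected.
namespace = {}
exec("from demo_models import *", namespace)

print("Nougat" in namespace)  # listed in __all__, so the wildcard import binds it
print("Hidden" in namespace)  # not listed, so the wildcard import skips it
```

Without `__all__`, a wildcard import would instead bind every public name, meaning every name that does not start with an underscore.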

@ -4,13 +4,13 @@ import os
class Anthropic:
"""
Anthropic large language models.
Args:
"""
def __init__(

@ -1,14 +1,17 @@
import time
from abc import ABC, abstractmethod
def count_tokens(text: str) -> int:
return len(text.split())
class AbstractModel(ABC):
"""
AbstractModel
"""
# abstract base class for language models
def __init__(self):
self.start_time = None
@ -41,7 +44,7 @@ class AbstractModel(ABC):
if elapsed_time == 0:
return float("inf")
return self._num_tokens() / elapsed_time
def _num_tokens(self, text: str) -> int:
"""Number of tokens"""
return count_tokens(text)
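As a standalone illustration of the throughput calculation in this hunk, the sketch below combines the whitespace-based `count_tokens` with the zero-elapsed-time guard. `tokens_per_second` is a hypothetical free function written for this example; it is not the method from `AbstractModel`.

```python
def count_tokens(text: str) -> int:
    # Whitespace split, as in the diff above: an approximation, not a real tokenizer.
    return len(text.split())


def tokens_per_second(text: str, elapsed_time: float) -> float:
    """Return approximate tokens generated per second for `text`."""
    if elapsed_time == 0:
        # Avoid dividing by zero when no measurable time has passed.
        return float("inf")
    return count_tokens(text) / elapsed_time


rate = tokens_per_second("one two three four", 2.0)
print(rate)
```

Note that a whitespace split generally undercounts relative to subword tokenizers, so the figure is only a rough indicator.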

@ -29,14 +29,22 @@ class BingChat:
self.cookies = json.loads(open(cookies_path, encoding="utf-8").read())
self.bot = asyncio.run(Chatbot.create(cookies=self.cookies))
def __call__(self, prompt: str, style: ConversationStyle = ConversationStyle.creative) -> str:
def __call__(
self, prompt: str, style: ConversationStyle = ConversationStyle.creative
) -> str:
"""
Get a text response using the EdgeGPT model based on the provided prompt.
"""
response = asyncio.run(self.bot.ask(prompt=prompt, conversation_style=style, simplify_response=True))
return response['text']
response = asyncio.run(
self.bot.ask(
prompt=prompt, conversation_style=style, simplify_response=True
)
)
return response["text"]
def create_img(self, prompt: str, output_dir: str = "./output", auth_cookie: str = None) -> str:
def create_img(
self, prompt: str, output_dir: str = "./output", auth_cookie: str = None
) -> str:
"""
Generate an image based on the provided prompt and save it in the given output directory.
Returns the path of the generated image.
@ -48,7 +56,7 @@ class BingChat:
images = image_generator.get_images(prompt)
image_generator.save_images(images, output_dir=output_dir)
return Path(output_dir) / images[0]['path']
return Path(output_dir) / images[0]["path"]
@staticmethod
def set_cookie_dir_path(path: str):

@ -87,7 +87,7 @@ class Idefics:
prompts : list
A list of prompts. Each prompt is a list of text strings and images.
batched_mode : bool, optional
Whether to process the prompts in batched mode. If True, all prompts are
processed together. If False, only the first prompt is processed (default is True).
Returns
@ -131,8 +131,8 @@ class Idefics:
prompts : list
A list of prompts. Each prompt is a list of text strings and images.
batched_mode : bool, optional
Whether to process the prompts in batched mode.
If True, all prompts are processed together.
If False, only the first prompt is processed (default is True).
Returns

@ -20,7 +20,7 @@ class Kosmos:
"""
Args:
# Initialize Kosmos

@ -0,0 +1,68 @@
"""
Nougat by Meta

Good for:
- Transcribing scientific PDFs into easy-to-use Markdown
- Extracting information from PDFs
- Extracting metadata from PDFs
"""
import torch
from PIL import Image
from transformers import NougatProcessor, VisionEncoderDecoderModel


class Nougat:
    """
    Nougat

    Args:
        model_name_or_path: str, default="facebook/nougat-base"
        min_length: int, default=1
        max_new_tokens: int, default=30

    Usage:
    >>> from swarms.models.nougat import Nougat
    >>> nougat = Nougat()
    >>> nougat("path/to/image.png")
    """

    def __init__(
        self,
        model_name_or_path="facebook/nougat-base",
        min_length: int = 1,
        max_new_tokens: int = 30,
    ):
        self.model_name_or_path = model_name_or_path
        self.min_length = min_length
        self.max_new_tokens = max_new_tokens
        self.processor = NougatProcessor.from_pretrained(self.model_name_or_path)
        self.model = VisionEncoderDecoderModel.from_pretrained(self.model_name_or_path)
        # Use the GPU when CUDA is available, otherwise fall back to CPU
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model.to(self.device)

    def get_image(self, img_path: str):
        """Get an image from a path"""
        image = Image.open(img_path)
        return image

    def __call__(self, img_path: str):
        """Call the model with an image_path str as an input"""
        image = Image.open(img_path)
        pixel_values = self.processor(image, return_tensors="pt").pixel_values
        # Generate the transcription; max_new_tokens caps its length (30 by default)
        outputs = self.model.generate(
            pixel_values.to(self.device),
            min_length=self.min_length,
            max_new_tokens=self.max_new_tokens,
            bad_words_ids=[[self.processor.tokenizer.unk_token_id]],
        )
        sequence = self.processor.batch_decode(outputs, skip_special_tokens=True)[0]
        sequence = self.processor.post_process_generation(sequence, fix_markdown=False)
        return sequence

@ -0,0 +1 @@
"""Phi by Microsoft written by Kye"""

@ -0,0 +1,19 @@
"""
TrOCR for multi-modal OCR tasks
"""
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
import requests


class TrOCR:
    def __init__(
        self,
    ):
        pass

    def __call__(self):
        pass

@ -1,16 +1,15 @@
"""Zephyr by HF"""
import torch
from transformers import pipeline
class Zephyr:
"""
Zephyr model from HF
Args:
max_new_tokens(int) = Number of max new tokens
temperature(float) = temperature of the LLM
top_k(float) = top k of the model set to 50
top_p(float) = top_p of the model set to 0.95
@ -23,6 +22,7 @@ class Zephyr:
"""
def __init__(
self,
max_new_tokens: int = 300,
@ -40,18 +40,23 @@ class Zephyr:
"text-generation",
model="HuggingFaceH4/zephyr-7b-alpha",
torch_dtype=torch.bfloat16,
device_map="auto"
device_map="auto",
)
self.messages = [
{
"role": "system",
"content": "You are a friendly chatbot who always responds in the style of a pirate",
},
{"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
{
"role": "user",
"content": "How many helicopters can a human eat in one sitting?",
},
]
def __call__(self, text: str):
"""Call the model"""
prompt = self.pipe.tokenizer.apply_chat_template(self.messages, tokenize=False, add_generation_prompt=True)
prompt = self.pipe.tokenizer.apply_chat_template(
self.messages, tokenize=False, add_generation_prompt=True
)
outputs = self.pipe(prompt, max_new_tokens=self.max_new_tokens)
print(outputs[0]["generated_text"])
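In the `__call__` shown above, the `text` argument is not used: the prompt is always built from the fixed `self.messages`. The sketch below shows one way the argument might be turned into a chat message list before the chat template is applied. `build_messages` and `SYSTEM_PROMPT` are hypothetical names introduced for this example, and the system prompt is copied from the diff.

```python
from typing import Dict, List

# System prompt taken from the diff above.
SYSTEM_PROMPT = "You are a friendly chatbot who always responds in the style of a pirate"


def build_messages(text: str, system_prompt: str = SYSTEM_PROMPT) -> List[Dict[str, str]]:
    """Return a chat message list with the caller's text as the user turn."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": text},
    ]


messages = build_messages("Hello")
print(messages)
```

The resulting list has the same shape as `self.messages`, so it could be passed to `apply_chat_template` in its place.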

@ -1,9 +1,15 @@
from swarms.tools.tool import BaseTool
class EdgeGPTTool(BaseTool):
def __init__(self, model, name="EdgeGPTTool", description="Tool that uses EdgeGPTModel to generate responses"):
def __init__(
self,
model,
name="EdgeGPTTool",
description="Tool that uses EdgeGPTModel to generate responses",
):
super().__init__(name=name, description=description)
self.model = model
def _run(self, prompt):
return self.model.__call__(prompt)
