docs for DistilWhisperModel

2 years ago · 6e6fe8dc52
parent 7e1d486a02
commit 6e6fe8dc52
2 changed files with 124 additions and 0 deletions
--- a/docs/swarms/models/distilled_whisperx.md
+++ b/docs/swarms/models/distilled_whisperx.md
@@ -0,0 +1,123 @@
+# DistilWhisperModel Documentation
+
+## Overview
+
+`DistilWhisperModel` is a Python class for English speech recognition. It wraps a Distil-Whisper checkpoint (by default `distil-whisper/distil-large-v2`), a distilled variant of the Whisper speech-to-text model. It supports both synchronous and asynchronous transcription of audio inputs, so it can serve batch processing as well as near-real-time applications.
+
+## Installation
+
+Before you can use `DistilWhisperModel`, ensure you have the required libraries installed:
+
+```sh
+pip3 install --upgrade swarms
+```
+
+## Initialization
+
+The `DistilWhisperModel` class is initialized with the following parameters:
+
+| Parameter | Type | Description | Default |
+|-----------|------|-------------|---------|
+| `model_id` | `str` | The identifier for the pre-trained Whisper model | `"distil-whisper/distil-large-v2"` |
+
+Example of initialization:
+
+```python
+from swarms.models import DistilWhisperModel
+
+# Initialize with default model
+model_wrapper = DistilWhisperModel()
+
+# Initialize with an explicit model ID (here, the same ID as the default)
+model_wrapper = DistilWhisperModel(model_id='distil-whisper/distil-large-v2')
+```
+
+## Attributes
+
+After initialization, the `DistilWhisperModel` has several attributes:
+
+| Attribute | Type | Description |
+|-----------|------|-------------|
+| `device` | `str` | The device used for computation (`"cuda:0"` for GPU or `"cpu"`). |
+| `torch_dtype` | `torch.dtype` | The data type used for the Torch tensors. |
+| `model_id` | `str` | The model identifier string. |
+| `model` | `torch.nn.Module` | The actual Whisper model loaded from the identifier. |
+| `processor` | `transformers.AutoProcessor` | The processor for handling input data. |
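Reviewer sketch (not part of this commit): the table above does not say how `device` and `torch_dtype` are chosen. A common convention in the Hugging Face ecosystem, shown here only as an assumption about this class, is to use the first GPU with half precision when CUDA is available and to fall back to the CPU with `float32` otherwise. The helper name `pick_device_and_dtype` is hypothetical, and it returns the dtype as a name so the snippet also runs where `torch` is not installed.

```python
import importlib.util


def pick_device_and_dtype():
    """Return (device, dtype_name) using the CUDA-if-available convention.

    Assumption: float16 on GPU, float32 on CPU. This is a sketch of a common
    pattern, not a confirmed description of DistilWhisperModel's internals.
    """
    if importlib.util.find_spec("torch") is not None:
        import torch

        if torch.cuda.is_available():
            return "cuda:0", "float16"
    # No torch installed, or no CUDA device visible: stay on the CPU.
    return "cpu", "float32"


device, dtype_name = pick_device_and_dtype()
print(device, dtype_name)
```

With `torch` installed, `getattr(torch, dtype_name)` converts the returned name into the corresponding `torch.dtype`.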
+
+## Methods
+
+### `transcribe`
+
+Transcribes audio input synchronously.
+
+**Arguments**:
+
+| Argument | Type | Description |
+|----------|------|-------------|
+| `inputs` | `Union[str, dict]` | File path or audio data dictionary. |
+
+**Returns**: `str` - The transcribed text.
+
+**Usage Example**:
+
+```python
+# Synchronous transcription
+transcription = model_wrapper.transcribe('path/to/audio.mp3')
+print(transcription)
+```
+
+### `async_transcribe`
+
+Transcribes audio input asynchronously.
+
+**Arguments**:
+
+| Argument | Type | Description |
+|----------|------|-------------|
+| `inputs` | `Union[str, dict]` | File path or audio data dictionary. |
+
+**Returns**: `Coroutine` - A coroutine that, when awaited, returns the transcribed text.
+
+**Usage Example**:
+
+```python
+import asyncio
+
+# Asynchronous transcription
+transcription = asyncio.run(model_wrapper.async_transcribe('path/to/audio.mp3'))
+print(transcription)
+```
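Reviewer sketch (not part of this commit): because `async_transcribe` returns a coroutine, several files can be awaited together with `asyncio.gather`. The `FakeModel` class below is a hypothetical stand-in that only mimics the documented signature, so the snippet runs without downloading a model. Whether the real class actually overlaps work across calls depends on how it implements the coroutine, which this page does not describe.

```python
import asyncio


class FakeModel:
    """Hypothetical stand-in mimicking the documented async_transcribe signature."""

    async def async_transcribe(self, inputs):
        # Simulate non-blocking work; a real model would run inference here.
        await asyncio.sleep(0.01)
        return f"transcript of {inputs}"


async def transcribe_many(model, paths):
    """Await one async_transcribe call per path; results keep the input order."""
    return await asyncio.gather(*(model.async_transcribe(p) for p in paths))


results = asyncio.run(transcribe_many(FakeModel(), ["a.mp3", "b.mp3"]))
print(results)
```

With the real class, `model_wrapper` would take the place of `FakeModel()`.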
+
+### `real_time_transcribe`
+
+Simulates real-time transcription of an audio file by processing it in chunks of `chunk_duration` seconds.
+
+**Arguments**:
+
+| Argument | Type | Description |
+|----------|------|-------------|
+| `audio_file_path` | `str` | Path to the audio file. |
+| `chunk_duration` | `int` | Duration of audio chunks in seconds. |
+
+**Usage Example**:
+
+```python
+# Real-time transcription simulation
+model_wrapper.real_time_transcribe('path/to/audio.mp3', chunk_duration=5)
+```
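Reviewer sketch (not part of this commit): the section above does not say how the file is split into chunks. One plausible scheme, shown purely as an assumption, is to cut the decoded samples into consecutive windows of `chunk_duration` seconds, with a shorter final window. `chunk_bounds` and its parameters are hypothetical names, not part of `DistilWhisperModel`.

```python
def chunk_bounds(total_samples, sample_rate, chunk_duration):
    """Return (start, end) sample indices for consecutive fixed-length chunks.

    The final chunk may be shorter than chunk_duration seconds.
    """
    if sample_rate <= 0 or chunk_duration <= 0:
        raise ValueError("sample_rate and chunk_duration must be positive")
    step = sample_rate * chunk_duration  # samples per chunk
    return [
        (start, min(start + step, total_samples))
        for start in range(0, total_samples, step)
    ]


# 12 seconds of 16 kHz audio split into 5-second chunks
bounds = chunk_bounds(total_samples=12 * 16000, sample_rate=16000, chunk_duration=5)
print(bounds)
```

Each `(start, end)` pair can then be used to slice the sample array before passing the chunk to the model.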
+
+## Error Handling
+
+The `DistilWhisperModel` class handles file-not-found errors and generic exceptions raised during transcription. A non-recoverable exception is printed to the console in red to indicate failure.
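Reviewer sketch (not part of this commit): the paragraph above says errors are printed in red, but it does not say what a caller receives afterwards. A caller that needs an explicit signal could wrap the call in its own guard. `safe_transcribe` is a hypothetical helper that accepts any callable with the `transcribe` signature, and the ANSI escape codes are an assumption about how the red output could be produced.

```python
import os

RED = "\033[91m"   # ANSI escape: bright red foreground
RESET = "\033[0m"  # ANSI escape: reset terminal attributes


def safe_transcribe(transcribe_fn, path):
    """Call transcribe_fn(path); return the text, or None after printing the error in red."""
    try:
        if not os.path.isfile(path):
            raise FileNotFoundError(path)
        return transcribe_fn(path)
    except FileNotFoundError as err:
        print(f"{RED}Audio file not found: {err}{RESET}")
    except Exception as err:  # generic fallback for any other failure
        print(f"{RED}Transcription failed: {err}{RESET}")
    return None


# Stand-in callable; with the real class, pass model_wrapper.transcribe instead.
result = safe_transcribe(lambda p: "hello", "no/such/file.mp3")
print(result)
```

Returning `None` on failure lets the caller tell a failed transcription apart from an empty transcript.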
+
+## Conclusion
+
+`DistilWhisperModel` offers a convenient interface to the Whisper model for speech recognition. Its design supports both batch and real-time transcription, covering different application needs. The class's error handling and retry logic are intended to make it robust in real-world use.
+
+## Additional Notes
+
+- Ensure you have appropriate permissions to read audio files when using file paths.
+- Transcription quality depends on the audio quality and the Whisper model's performance on your dataset.
+- For real-time transcription, adjust `chunk_duration` to match the processing power of your system.
+
+For a full list of models supported by `transformers.AutoModelForSpeechSeq2Seq`, visit the [Hugging Face Model Hub](https://huggingface.co/models).
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -106,6 +106,7 @@ nav:
      - Kosmos: "swarms/models/kosmos.md"
      - Nougat: "swarms/models/nougat.md"
      - LayoutLMDocumentQA: "swarms/models/layoutlm_document_qa.md"
+      - DistilWhisperModel: "swarms/models/distilled_whisperx.md"
  - swarms.structs:
    - Overview: "swarms/structs/overview.md"
    - Workflow: "swarms/structs/workflow.md"