diff --git a/docs/swarms/models/openai_tts.md b/docs/swarms/models/openai_tts.md
new file mode 100644
index 00000000..b2996312
--- /dev/null
+++ b/docs/swarms/models/openai_tts.md
@@ -0,0 +1,135 @@
+# `OpenAITTS` Documentation
+
+## Table of Contents
+1. [Overview](#overview)
+2. [Installation](#installation)
+3. [Usage](#usage)
+    - [Initialization](#initialization)
+    - [Running TTS](#running-tts)
+    - [Running TTS and Saving](#running-tts-and-saving)
+4. [Examples](#examples)
+    - [Basic Usage](#basic-usage)
+    - [Saving the Output](#saving-the-output)
+5. [Advanced Options](#advanced-options)
+6. [Troubleshooting](#troubleshooting)
+7. [References](#references)
+
+## 1. Overview
+
+The `OpenAITTS` module provides a Python interface for converting text to speech (TTS) with the OpenAI TTS API. It generates high-quality speech from text input, making it suitable for applications such as voice assistants and other systems that need synthesized speech.
+
+### Features:
+- Convert text to speech using OpenAI's TTS models.
+- Specify the model name, voice, and other generation parameters.
+- Optionally save the generated speech to a WAV file.
+
+## 2. Installation
+
+To use the `OpenAITTS` model, install the required dependencies with `pip`. The `wave` module used for audio output is part of the Python standard library and does not need to be installed separately:
+
+```bash
+pip install swarms requests
+```
+
+## 3. Usage
+
+### Initialization
+
+To use the `OpenAITTS` module, initialize an instance of the `OpenAITTS` class:
+
+```python
+import os
+
+from swarms.models.openai_tts import OpenAITTS
+
+# Read the OpenAI API key from the environment
+openai_api_key_env = os.getenv("OPENAI_API_KEY")
+
+# Initialize the OpenAITTS instance
+tts = OpenAITTS(
+    model_name="tts-1-1106",
+    proxy_url="https://api.openai.com/v1/audio/speech",
+    openai_api_key=openai_api_key_env,
+    voice="onyx",
+)
+```
+
+#### Parameters:
+- `model_name` (str): The name of the TTS model to use (default is "tts-1-1106").
+- `proxy_url` (str): The URL for the OpenAI TTS API (default is "https://api.openai.com/v1/audio/speech").
+- `openai_api_key` (str): Your OpenAI API key. It can be obtained from the OpenAI website.
+- `voice` (str): The voice to use for generating speech (default is "onyx").
+- `chunk_size` (int): The size of data chunks when fetching audio (default is 1024 * 1024 bytes).
+- `autosave` (bool): Whether to automatically save the generated speech to a file (default is False).
+- `saved_filepath` (str): The path to the file where the speech will be saved (default is "runs/tts_speech.wav").
+
+### Running TTS
+
+Once the `OpenAITTS` instance is initialized, convert text to speech with the `run` method:
+
+```python
+# Generate speech from text
+speech_data = tts.run("Hello, world!")
+```
+
+#### Parameters:
+- `task` (str): The text you want to convert to speech.
+
+#### Returns:
+- `speech_data` (bytes): The generated speech data.
+
+### Running TTS and Saving
+
+You can also use the `run_and_save` method to generate speech from text and save it to a file:
+
+```python
+# Generate speech from text and save it to a file
+speech_data = tts.run_and_save("Hello, world!")
+```
+
+#### Parameters:
+- `task` (str): The text you want to convert to speech.
+
+#### Returns:
+- `speech_data` (bytes): The generated speech data.
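+
+If you prefer to manage files yourself rather than relying on `run_and_save`, you can also write the bytes returned by `run` to disk directly. The following is a minimal sketch: it assumes the returned bytes are a complete, encoded audio stream that can be written as-is, and the filename `speech_output.wav` is only an illustrative placeholder (match the extension to the audio format the API actually returns):
+
+```python
+# Generate speech and keep the raw audio bytes
+speech_data = tts.run("Hello, world!")
+
+# Persist the bytes manually; assumes the data is already encoded audio
+with open("speech_output.wav", "wb") as audio_file:
+    audio_file.write(speech_data)
+```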
+
+## 4. Examples
+
+### Basic Usage
+
+Here's a basic example of how to use the `OpenAITTS` module to generate speech from text:
+
+```python
+import os
+
+from swarms.models.openai_tts import OpenAITTS
+
+# Read the OpenAI API key from the environment
+openai_api_key_env = os.getenv("OPENAI_API_KEY")
+
+# Initialize the OpenAITTS instance
+tts = OpenAITTS(
+    model_name="tts-1-1106",
+    proxy_url="https://api.openai.com/v1/audio/speech",
+    openai_api_key=openai_api_key_env,
+    voice="onyx",
+)
+
+# Generate speech from text
+speech_data = tts.run("Hello, world!")
+```
+
+### Saving the Output
+
+You can save the generated speech to a WAV file using the `run_and_save` method:
+
+```python
+# Generate speech from text and save it to a file
+speech_data = tts.run_and_save("Hello, world!")
+```
+
+## 5. Advanced Options
+
+The `OpenAITTS` module supports several options for customizing speech generation. You can set the model name, voice, and other parameters at initialization, configure the chunk size used when fetching audio data, and choose whether the generated speech is automatically saved to a file.
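+
+For instance, the sketch below combines these options using the parameters documented in the Initialization section. The specific values are illustrative only, and it assumes the API key is available in the `OPENAI_API_KEY` environment variable:
+
+```python
+import os
+
+from swarms.models.openai_tts import OpenAITTS
+
+# Fetch audio in 512 KB chunks and automatically save each
+# generated clip to a custom path (values are examples only)
+tts = OpenAITTS(
+    model_name="tts-1-1106",
+    proxy_url="https://api.openai.com/v1/audio/speech",
+    openai_api_key=os.getenv("OPENAI_API_KEY"),
+    voice="onyx",
+    chunk_size=512 * 1024,
+    autosave=True,
+    saved_filepath="runs/greeting.wav",
+)
+
+speech_data = tts.run("Welcome to the swarm.")
+```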
+ """ pass def load(self, filepath: Optional[str] = None): @@ -25,7 +51,23 @@ class BaseTTSModel(AbstractLLM): @abstractmethod def run(self, task: str, *args, **kwargs): + """Run the model on the given task. + + Args: + task (str): _description_ + """ pass + + def __call__(self, task: str, *args, **kwargs): + """Call the model on the given task. + + Args: + task (str): _description_ + + Returns: + _type_: _description_ + """ + return self.run(task, *args, **kwargs) def save_to_file(self, speech_data, filename): """Save the speech data to a file.