From 84e05db366329f721a9f54bf1838b2cbf3adc1fb Mon Sep 17 00:00:00 2001
From: Ben Xu
Date: Mon, 30 Dec 2024 15:11:39 -0500
Subject: [PATCH] add local setup docs

---
 docs/server/configure.mdx | 23 ++++++++++++++++++-----
 docs/server/livekit.mdx   |  8 +++-----
 2 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/docs/server/configure.mdx b/docs/server/configure.mdx
index 6319374..d681b38 100644
--- a/docs/server/configure.mdx
+++ b/docs/server/configure.mdx
@@ -27,7 +27,7 @@ poetry run 01 --profile
 
 `fast.py` uses Cartesia for TTS and Cerebras Llama3.1-8b, which are the fastest providers.
 
-`local.py` uses coqui TTS and runs the --local explorer from Open Interpreter.
+`local.py` requires additional setup to be used with LiveKit. Uses faster-whisper for STT, ollama/codestral for LLM (default), and piper for TTS (default).
 
 ### Custom Profiles
 
@@ -123,11 +123,24 @@ interpreter.local_setup()
 interpreter.tts = "elevenlabs"
 ```
 
-### Local TTS
+### Local TTS and STT with LiveKit
 
-For local TTS, Coqui is used.
+We recommend having Docker installed for the easiest setup. Local TTS and STT relies on the [openedai-speech](https://github.com/matatonic/openedai-speech?tab=readme-ov-file) and [faster-whisper-server](https://github.com/fedirz/faster-whisper-server) repositories respectively.
+#### Local TTS
+1. Clone the [openedai-speech](https://github.com/matatonic/openedai-speech?tab=readme-ov-file) repository
+2. Set `base_url = os.environ.get("OPENAI_BASE_URL", "http://localhost:9000/v1")` to point to localhost at port 9000 in `say.py`
+3. Follow the Docker Image instructions for your system. Default run `docker compose -f docker-compose.min.yml up` in the root.
+4. Set your profile with local TTS service
 
 ```python
-# Set your profile with a local TTS service
-interpreter.tts = "coqui"
+interpreter.stt = "local"
+```
+
+#### Local STT
+1. Clone the [faster-whisper-server](https://github.com/fedirz/faster-whisper-server) repository
+2. Follow the Docker Compose Quick Start instructions for your respective system.
+3. Run `docker run --publish 8001:8000 --volume ~/.cache/huggingface:/root/.cache/huggingface --env WHISPER__MODEL=Systran/faster-whisper-small --detach fedirz/faster-whisper-server:latest-cpu` to publish to port 8001 instead of the default 8000 (since our TTS uses this port).
+4. Set your profile with local STT service
+```python
+interpreter.stt = "local"
 ```
\ No newline at end of file
diff --git a/docs/server/livekit.mdx b/docs/server/livekit.mdx
index 2c9d423..7849e36 100644
--- a/docs/server/livekit.mdx
+++ b/docs/server/livekit.mdx
@@ -100,11 +100,9 @@ poetry run 01 --server livekit --expose
 ```
 
- Currently, our Livekit server only works with Deepgram and Eleven Labs. We are
- working to introduce all-local functionality as soon as possible. By setting
- your profile (see [Configure Your Profile](/software/configure)), you can
- still change your LLM to be a local LLM, but the `interpreter.tts` value will
- be ignored for the Livekit server.
+ Livekit server now supports Local STT and TTS for fully local pipeline.
+ Setup instructions are provided in the [configuring your 01](/server/configure#local-tts-and-stt-with-livekit) section.
+
 
 ## Livekit vs. Light Server