Integrating openai-edge-tts 🗣️ with HridaAI
What is openai-edge-tts?
OpenAI Edge TTS is a text-to-speech API that mimics the OpenAI API endpoint, allowing for a direct substitute in scenarios where you can define the endpoint URL, like with HridaAI.
It uses the edge-tts package, which leverages the Edge browser's free "Read Aloud" feature to emulate a request to Microsoft / Azure in order to receive very high quality text-to-speech for free.
How is it different from 'openedai-speech'?
Similar to openedai-speech, openai-edge-tts is a text-to-speech API endpoint that mimics the OpenAI API endpoint, allowing for a direct substitute in scenarios where the OpenAI Speech endpoint is callable and the server endpoint URL can be configured.
openedai-speech is a more comprehensive option that allows for entirely offline generation of speech with many modalities to choose from.
openai-edge-tts is a simpler option that uses a Python package called edge-tts to generate the audio.
Requirements
- Docker installed on your system
- HridaAI running
⚡️ Quick start
The simplest way to get started without having to configure anything is to run the command below:
docker run -d -p 5050:5050 travisvn/openai-edge-tts:latest
This will run the service at port 5050 with all the default configs.
Setting up HridaAI to use openai-edge-tts
- Open the Admin Panel and go to Settings -> Audio
- Set your TTS Settings to match the screenshot below
- Note: you can specify the TTS Voice here
The default API key is the string your_api_key_here. You do not have to change that value if you do not need the added security.
And that's it! You can end here
🐍 Running with Python
If you prefer to run this project directly with Python, follow these steps to set up a virtual environment, install dependencies, and start the server.
1. Clone the Repository
git clone
cd openai-edge-tts
2. Set Up a Virtual Environment
Create and activate a virtual environment to isolate dependencies:
# For macOS/Linux
python3 -m venv venv
source venv/bin/activate
# For Windows
python -m venv venv
venv\Scripts\activate
3. Install Dependencies
Use pip to install the required packages listed in requirements.txt:
pip install -r requirements.txt
4. Configure Environment Variables
Create a .env file in the root directory and set the following variables:
API_KEY=your_api_key_here
PORT=5050
DEFAULT_VOICE=en-US-AvaNeural
DEFAULT_RESPONSE_FORMAT=mp3
DEFAULT_SPEED=1.0
DEFAULT_LANGUAGE=en-US
REQUIRE_API_KEY=True
REMOVE_FILTER=False
EXPAND_API=True
5. Run the Server
Once configured, start the server with:
python app/server.py
The server will start running at http://localhost:5050.
6. Test the API
You can now interact with the API at http://localhost:5050/v1/audio/speech and other available endpoints. See the Usage section for request examples.
Usage details
Endpoint: /v1/audio/speech (aliased with /audio/speech)
Generates audio from the input text. Available parameters:
Required Parameter:
- input (string): The text to be converted to audio (up to 4096 characters).
Optional Parameters:
- model (string): Set to "tts-1" or "tts-1-hd" (default: "tts-1").
- voice (string): One of the OpenAI-compatible voices (alloy, echo, fable, onyx, nova, shimmer) or any valid edge-tts voice (default: "en-US-AvaNeural").
- response_format (string): Audio format. Options: mp3, opus, aac, flac, wav, pcm (default: mp3).
- speed (number): Playback speed (0.25 to 4.0). Default is 1.0.
You can browse available voices and listen to sample previews at tts.travisvn.com
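The parameters above map directly onto a JSON request body. The following is a minimal sketch of a Python client that uses only the standard library. It assumes the service is listening at http://localhost:5050 with the default API key, and the helper names build_speech_request and synthesize are illustrative; they are not part of openai-edge-tts.

```python
import json
import urllib.request

# Assumed values: the defaults described in this guide.
BASE_URL = "http://localhost:5050/v1"
API_KEY = "your_api_key_here"
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def build_speech_request(text, voice="en-US-AvaNeural", response_format="mp3",
                         speed=1.0, model="tts-1"):
    """Check the documented limits and build a POST request (not yet sent)."""
    if not text or len(text) > 4096:
        raise ValueError("input must be 1 to 4096 characters")
    if response_format not in FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    payload = {"model": model, "input": text, "voice": voice,
               "response_format": response_format, "speed": speed}
    return urllib.request.Request(
        f"{BASE_URL}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {API_KEY}"},
        method="POST",
    )


def synthesize(text, out_path="speech.mp3", **options):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, **options)
    with urllib.request.urlopen(request, timeout=30) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

With the service running, synthesize("Hello there", voice="echo") should write speech.mp3 to the current directory. The checks run on the client so that out-of-range values fail before any request is sent.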
Example request with curl and saving the output to an mp3 file:
curl -X POST http://localhost:5050/v1/audio/speech \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your_api_key_here" \
-d '{
"input": "Hello, I am your AI assistant! Just let me know how I can help bring your ideas to life.",
"voice": "echo",
"response_format": "mp3",
"speed": 1.0
}' \
--output speech.mp3
Or, to be in line with the OpenAI API endpoint parameters:
curl -X POST http://localhost:5050/v1/audio/speech \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your_api_key_here" \
-d '{
"model": "tts-1",
"input": "Hello, I am your AI assistant! Just let me know how I can help bring your ideas to life.",
"voice": "alloy"
}' \
--output speech.mp3
And an example of a language other than English:
curl -X POST http://localhost:5050/v1/audio/speech \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your_api_key_here" \
-d '{
"model": "tts-1",
"input": "じゃあ、行く。電車の時間を調べておくよ。",
"voice": "ja-JP-KeitaNeural"
}' \
--output speech.mp3
Additional Endpoints
- POST/GET /v1/models: Lists available TTS models.
- POST/GET /v1/voices: Lists edge-tts voices for a given language / locale.
- POST/GET /v1/voices/all: Lists all edge-tts voices, with language support information.
The /v1 prefix is now optional.
Additionally, there are endpoints for Azure AI Speech and ElevenLabs for potential future support if custom API endpoints are allowed for these options in HridaAI.
These can be disabled by setting the environment variable EXPAND_API=False.
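Because the /v1 prefix is optional, a client can build the listing URLs either way. The sketch below shows one way to do that in Python. The helper names endpoint_url and build_list_request are hypothetical, and sending the Authorization header on these endpoints is an assumption based on REQUIRE_API_KEY=True.

```python
import urllib.request


def endpoint_url(host, path, use_v1_prefix=True):
    """Join host and path, optionally inserting the /v1 prefix."""
    prefix = "/v1" if use_v1_prefix else ""
    return f"{host.rstrip('/')}{prefix}/{path.lstrip('/')}"


def build_list_request(host, path, api_key, use_v1_prefix=True):
    """Build (but do not send) a GET request for one of the listing endpoints."""
    return urllib.request.Request(
        endpoint_url(host, path, use_v1_prefix),
        # Assumption: listing endpoints accept the same Bearer key as /audio/speech.
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )
```

For example, build_list_request("http://localhost:5050", "voices/all", "your_api_key_here") targets /v1/voices/all. Passing the result to urllib.request.urlopen sends it. The response layout is not described in this guide, so inspect it before relying on specific fields.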
🐳 Quick Config for Docker
You can configure the environment variables in the command used to run the project
docker run -d -p 5050:5050 \
-e API_KEY=your_api_key_here \
-e PORT=5050 \
-e DEFAULT_VOICE=en-US-AvaNeural \
-e DEFAULT_RESPONSE_FORMAT=mp3 \
-e DEFAULT_SPEED=1.0 \
-e DEFAULT_LANGUAGE=en-US \
-e REQUIRE_API_KEY=True \
-e REMOVE_FILTER=False \
-e EXPAND_API=True \
travisvn/openai-edge-tts:latest
The markdown text is now put through a filter for enhanced readability and support.
You can disable this by setting the environment variable REMOVE_FILTER=True.
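If you launch the container from a script, the same command can be assembled programmatically. The sketch below takes the defaults from the docker run command above and applies overrides before producing the argument list. The function name docker_run_args is illustrative, and mapping the host port to the container's PORT value assumes the server listens on PORT inside the container.

```python
import shlex

# Defaults copied from the docker run command in this section.
DEFAULTS = {
    "API_KEY": "your_api_key_here",
    "PORT": "5050",
    "DEFAULT_VOICE": "en-US-AvaNeural",
    "DEFAULT_RESPONSE_FORMAT": "mp3",
    "DEFAULT_SPEED": "1.0",
    "DEFAULT_LANGUAGE": "en-US",
    "REQUIRE_API_KEY": "True",
    "REMOVE_FILTER": "False",
    "EXPAND_API": "True",
}
IMAGE = "travisvn/openai-edge-tts:latest"


def docker_run_args(overrides=None, host_port=5050):
    """Return the docker run argument list with overrides applied to DEFAULTS."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown settings: {sorted(unknown)}")
    settings = {**DEFAULTS, **overrides}
    # Assumption: the container listens on PORT, so publish host_port to it.
    args = ["docker", "run", "-d", "-p", f"{host_port}:{settings['PORT']}"]
    for name, value in settings.items():
        args += ["-e", f"{name}={value}"]
    args.append(IMAGE)
    return args


if __name__ == "__main__":
    # Print a copy-pasteable command with the markdown filter disabled.
    print(shlex.join(docker_run_args({"REMOVE_FILTER": "True"})))
```

The list form can be passed directly to subprocess.run, which avoids shell quoting problems if the API key contains special characters.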
Additional Resources
🎙️ Voice Samples
Play voice samples and see all available Edge TTS voices
Troubleshooting
Connection Issues
"localhost" Not Working from Docker
If HridaAI runs in Docker and can't reach the TTS service at localhost:5050:
Solutions:
- Use host.docker.internal:5050 instead of localhost:5050 (Docker Desktop on Windows/Mac)
- On Linux, use the host's IP address, or add --network host to your Docker run command
- If both services are in Docker Compose, use the container name: http://openai-edge-tts:5050/v1
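To find out which of these hostnames is reachable, you can probe the port directly. The sketch below is a rough diagnostic that uses only Python's standard library. It assumes Python is available in the environment where HridaAI runs, and the function names are illustrative. It checks only that a TCP connection can be opened, not that the TTS API itself responds.

```python
import socket

# Candidate hostnames taken from the list above; which one works depends on
# how HridaAI and the TTS service are deployed.
CANDIDATE_HOSTS = ["localhost", "host.docker.internal", "openai-edge-tts"]


def can_connect(host, port=5050, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hostnames.
        return False


def probe(hosts=CANDIDATE_HOSTS, port=5050):
    """Map each candidate host to whether its port accepts TCP connections."""
    return {host: can_connect(host, port) for host in hosts}
```

Calling probe() from the same place HridaAI runs returns a dictionary such as {"localhost": False, ...}. The hostname that maps to True is the one to use in the API Base URL.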
Example Docker Compose for both services on the same network:
services:
  hrida-ai:
    image: ghcr.io/hrida-ai/hrida-ai-studio:main
    environment:
      - AUDIO_TTS_ENGINE=openai
      - AUDIO_TTS_OPENAI_API_BASE_URL=http://openai-edge-tts:5050/v1
      - AUDIO_TTS_OPENAI_API_KEY=your_api_key_here
    networks:
      - hridaai-network
  openai-edge-tts:
    image: travisvn/openai-edge-tts:latest
    ports:
      - "5050:5050"
    environment:
      - API_KEY=your_api_key_here
    networks:
      - hridaai-network
networks:
  hridaai-network:
    driver: bridge
Testing the TTS Service
Verify the TTS service is working independently:
curl -X POST http://localhost:5050/v1/audio/speech \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your_api_key_here" \
-d '{"input": "Test message", "voice": "alloy"}' \
--output test.mp3
If this works but HridaAI still can't connect, the issue is network-related between containers.
No Audio Output in HridaAI
- Check that the API Base URL ends with /v1
- Verify the API key matches between both services (or remove the requirement)
- Check HridaAI container logs: docker logs hrida-ai
- Check openai-edge-tts logs: docker logs openai-edge-tts (or your container name)
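The first two checks can be automated before you dig through logs. The sketch below is a hypothetical helper, not part of HridaAI or openai-edge-tts. You supply the values configured on each side and it reports the mismatches described above.

```python
from urllib.parse import urlparse


def check_tts_settings(base_url, client_api_key, server_api_key,
                       require_api_key=True, hridaai_in_docker=False):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    parsed = urlparse(base_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("API Base URL is not a valid http(s) URL")
    if not parsed.path.rstrip("/").endswith("/v1"):
        problems.append("API Base URL should end with /v1")
    if hridaai_in_docker and parsed.hostname in ("localhost", "127.0.0.1"):
        problems.append("localhost does not reach the TTS service from "
                        "inside the HridaAI container")
    if require_api_key and client_api_key != server_api_key:
        problems.append("API keys differ between HridaAI and openai-edge-tts")
    return problems
```

For example, check_tts_settings("http://localhost:5050", "a", "b", hridaai_in_docker=True) reports three problems: the missing /v1 suffix, the localhost hostname, and the mismatched keys.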
For more troubleshooting tips, see the Audio Troubleshooting Guide.