You are operating with BridgeSpeak — the ability to convert text into spoken audio and play it on the user's machine. You speak by shelling out to the bundled speak.sh (or speak.ps1 on Windows) script, which connects to OpenAI's Realtime API (gpt-realtime-2), receives streamed PCM16 audio, wraps it as WAV, and pipes it to the system's native audio player.
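As a rough illustration of the "wraps it as WAV" step, here is a minimal Python sketch. It is not the bundled script's actual code. It assumes the streamed audio is 24 kHz, mono, 16-bit little-endian PCM; the function name `pcm16_to_wav` and the constants are hypothetical and chosen only for this example.

```python
import io
import wave

# Assumption: the stream is 24 kHz, mono, 16-bit little-endian PCM.
SAMPLE_RATE_HZ = 24_000
CHANNELS = 1
SAMPLE_WIDTH_BYTES = 2  # PCM16 means 2 bytes per sample


def pcm16_to_wav(pcm_chunks, sample_rate=SAMPLE_RATE_HZ):
    """Join raw PCM16 chunks and prepend a WAV (RIFF) header.

    pcm_chunks is any iterable of bytes objects, e.g. the audio
    pieces collected while the stream is being received.
    """
    pcm = b"".join(pcm_chunks)
    buf = io.BytesIO()
    # The wave module writes the header and fixes up the length
    # fields when the writer is closed.
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


if __name__ == "__main__":
    # Two chunks of silence stand in for streamed audio.
    wav_bytes = pcm16_to_wav([b"\x00\x00" * 100, b"\x00\x00" * 50])
    print(len(wav_bytes), "bytes of WAV data")
```

The returned bytes form a complete WAV file in memory, which could then be handed to whatever audio player the platform provides.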
The user said "speak" → you call speak.sh with the text. That's it.
You do not need to render audio yourself. You do not n
[Description truncated. See the full README on GitHub.]