claude-speech
This skill bootstraps a self-contained language-learning project inside the user's current Claude Code workspace. It works with two languages: a target language (the one being learned — spoken aloud, with IPA pronunciation help) and a common language (the learner's native tongue — used for notes, corrections, and free chat, never spoken). It installs:
- a teacher persona (`CLAUDE.md`) that speaks the target language and writes all notes in the common language,
- a `Stop` hook + `scripts/speak_lang.py` that uses `edge-tts` to speak only the target-language portion of Claude's replies aloud,
- a `UserPromptSubmit` hook + `scripts/push_to_talk.py` + `scripts/inject_transcript.py` for two-key push-to-talk voice input: hold F9 to speak the target language or F10 to speak the common language. The held key forces the transcription language (there is no auto-detection, so mixed-language speech isn't misread). The daemon transcribes via local Whisper, adds an IPA line (espeak-ng) only for target-language speech, and pastes the result into the chat as your message.
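The key-to-language routing described above can be sketched as follows. This is a minimal illustration, assuming F9 maps to the target language and F10 to the common language. `forced_language`, `compose_message`, and the `[IPA]` line format are hypothetical; the real `push_to_talk.py` may structure this differently. The IPA converter is passed in as a function rather than shelling out to espeak-ng.

```python
from typing import Callable

# Hypothetical key map: F9 = target language, F10 = common language.
KEY_ROLES = {"F9": "target", "F10": "common"}


def forced_language(key: str, target: str, common: str) -> str:
    """Return the language code Whisper should be forced to use for this key."""
    role = KEY_ROLES.get(key.upper())
    if role is None:
        raise ValueError(f"not a push-to-talk key: {key!r}")
    return target if role == "target" else common


def compose_message(text: str, language: str, target: str,
                    ipa: Callable[[str], str]) -> str:
    """Append an IPA line only when the speech was in the target language."""
    if language != target:
        return text  # common-language speech is pasted as-is
    return f"{text}\n[IPA] {ipa(text)}"
```

Forcing the language per key, rather than letting Whisper auto-detect, is what keeps a target-language sentence with a few native-language words from being transcribed in the wrong language.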
When to use
Trigger when the user says any of:
- "let's practice {language}"
- "teach me {language}"
- "set up a {language} tutor"
- "I want Claude to speak {language} responses"
- "scaffold claude-speech for {language}"
How to invoke
0. Handle control arguments first. Before any of the steps below, inspect the argument the user passed:
   - If the argument is `off`, `stop`, or `kill` (case-insensitive), skip all install and scaffold steps and take these actions in order:
     1. Find and terminate every running `push_to_talk.py` daemon. On Windows the reliable command is the one below. Note the single-quoted `-Command` argument: it lets the command also work when invoked from bash/zsh, whereas double quotes would let the shell interpolate `$_` and `$(...)` before PowerShell sees them.
        ```shell
        powershell -NoProfile -Command 'Get-CimInstance Win32_Process | Where-Object { $_.Name -in @("py.exe","python.exe","pythonw.exe") -and $_.CommandLine -like "*push_to_talk.py*" } | ForEach-Object { Write-Host "killed PID $($_.ProcessId)"; Stop-Process -Id $_.ProcessId -Force }'
        ```
        Crucially, the filter MUST require the process Name to be one of `py.exe`/`python.exe`/`pythonw.exe`. Without that restriction, the filter also matches shell wrappers that happen to have `push_to_talk.py` literally in their command line (e.g. the very PowerShell invocation you're running), polluting the result list.
     2. Delete `<project_root>/recordings/latest_transcript.txt` if it exists, so the UserPromptSubmit hook doesn't keep re-injecting the last transcript on subsequent manual Enters now that the daemon isn't writing fresh ones.
     3. Report the PIDs killed (or "no daemons were running") and confirm that the stale transcript was cleared.
   - Any other argument is treated as a language name or ISO code; proceed with step 1 onward.
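The control-argument check and the two cleanup actions can be sketched in Python. This is illustrative only: the function names are hypothetical, and `daemon_pids` takes process rows as plain `(pid, image name, command line)` tuples instead of querying the OS, so it mirrors the PowerShell filter's logic rather than replacing the command.

```python
from pathlib import Path
from typing import Iterable, List, Optional, Tuple

STOP_ARGS = {"off", "stop", "kill"}
# Same image-name allow-list as the PowerShell filter.
PYTHON_IMAGES = {"py.exe", "python.exe", "pythonw.exe"}


def is_stop_request(arg: str) -> bool:
    """True for off/stop/kill in any case; anything else is a language."""
    return arg.strip().lower() in STOP_ARGS


def daemon_pids(procs: Iterable[Tuple[int, str, Optional[str]]]) -> List[int]:
    """Pick push_to_talk.py daemons from (pid, image name, command line) rows.

    Requiring a Python image name keeps shell wrappers whose command line
    merely mentions push_to_talk.py out of the result. Win32_Process can
    report an empty CommandLine, so None is treated as "".
    """
    return [
        pid
        for pid, name, cmdline in procs
        if name.lower() in PYTHON_IMAGES
        and "push_to_talk.py" in (cmdline or "").lower()
    ]


def clear_stale_transcript(project_root: Path) -> bool:
    """Delete recordings/latest_transcript.txt; return True if it existed."""
    transcript = project_root / "recordings" / "latest_transcript.txt"
    if transcript.exists():
        transcript.unlink()
        return True
    return False
```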
1. Resolve the two languages. The skill takes two positional arguments: `/claude-speech <target> <common>`.
   - Arg 1 = target language (the one being learned, spoken aloud + IPA).
   - Arg 2 = common language (the learner's native language, used for notes/corrections, never spoken).
   - Accept names ("Dutch", "Russian") or ISO 639-1 codes ("nl", "ru"). Both must exist in `voices.json` next to this skill; open it if the user asks what's available.
   - If the target (arg 1) is missing, ask which language to teach.
   - If the common language (arg 2) is missing, ask for it before proceeding; do not assume English.
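One way to implement the name-or-code lookup, assuming a hypothetical `voices.json` layout keyed by ISO 639-1 code with a `name` field. The real file's layout isn't shown in this document, and the voice IDs in `SAMPLE_VOICES` are illustrative only.

```python
import json
from pathlib import Path
from typing import Dict, Optional

# Hypothetical voices.json content; the real file may be shaped differently.
SAMPLE_VOICES: Dict[str, dict] = {
    "nl": {"name": "Dutch", "voice": "nl-NL-FennaNeural"},
    "ru": {"name": "Russian", "voice": "ru-RU-SvetlanaNeural"},
    "en": {"name": "English", "voice": "en-US-AriaNeural"},
}


def load_voices(path: Path) -> Dict[str, dict]:
    """Read voices.json from disk (schema assumed as above)."""
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def resolve_language(arg: Optional[str], voices: Dict[str, dict]) -> Optional[str]:
    """Map a name ("Dutch") or ISO 639-1 code ("nl") to its code.

    Returns None when the argument is missing so the caller can ask the
    user; raises ValueError when the language is not in voices.json.
    """
    if not arg or not arg.strip():
        return None
    key = arg.strip().lower()
    if key in voices:
        return key
    for code, entry in voices.items():
        if entry.get("name", "").lower() == key:
            return code
    raise ValueError(f"{arg!r} is not listed in voices.json")
```

Because a missing argument yields `None` instead of a default, the "do not assume English" rule falls out naturally: the caller has to ask.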
2. Ask whether they want a non-default voice. Each language has a recommended `edge-tts` voice; if the user wants something different (different gender, accent, or a specific neural voice), they can pass it as `--voice <voice-id>`.
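When the user asks for a different voice, a small filter helps present the alternatives. This sketch assumes voice entries shaped like the dictionaries edge-tts returns from `list_voices()` (with `ShortName`, `Locale`, and `Gender` keys); `candidate_voices` is a hypothetical helper, not part of this skill.

```python
from typing import Iterable, List, Optional


def candidate_voices(voices: Iterable[dict], lang: str,
                     gender: Optional[str] = None) -> List[str]:
    """Return sorted ShortNames for one language, optionally one gender.

    A voice matches when its Locale starts with "<lang>-" (so "nl"
    covers both nl-NL and nl-BE).
    """
    prefix = lang.lower() + "-"
    out = []
    for v in voices:
        if not v["Locale"].lower().startswith(prefix):
            continue
        if gender and v["Gender"].lower() != gender.lower():
            continue
        out.append(v["ShortName"])
    return sorted(out)
```

The chosen `ShortName` is what would then be passed to the installer as `--voice <voice-id>`.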
3. Select audio devices (REQUIRED, before anything is installed or launched). Device selection is mandatory: do not run the installer, do not write files, and do not spawn any background process until the user has explicitly chosen both an input and an output device. Do this as two separate, ordered choices:
   - List the devices. Run `py templates/scripts/push_to_talk.py --list-devices` (this works from the skill directory and needs only the voice-in Python deps). It prints input devices first, then output devices, each with an index, name, and host API.
   - Microphone (input): required. Show the user the input-device list and ask which microphone to use. Wait for an explicit answer. If the user declines or gives no usable choice, stop here and report that a microphone is required and that nothing was installed or started. (Exception: if the user explicitly asked for a TTS-only setup with `--no-voice-in`, there is no push-to-talk, so skip the microphone step.)
   - Speaker (output): required. Then show the output-device list and ask which speaker or headphones to use for spoken replies. Wait for an explicit answer. If the user declines or gives no usable choice, stop here and report that an output device is required and that nothing was installed or started.
   - Prefer a name substring (e.g. `"USB PnP"`, `"OnePlus"`) over a raw index when recording the choice: indices are reassigned across reboots and replugs, while names are stable. Pick a substring specific enough to identify the device the user named.
   - Carry the input choice into the daemon spawn (`--input-device`) in step 7 and the output choice into the installer (`--output-device`) in step 5.
   - To turn everything off later, the user runs `/claude-speech off` (or `stop`/`kill`); see step 0.
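A sketch of resolving a recorded device choice, preferring a name substring and falling back to a raw index. It operates on dictionaries shaped like `sounddevice.query_devices()` entries (`name`, `max_input_channels`, `max_output_channels`); `resolve_device` is hypothetical and not necessarily how `push_to_talk.py` matches `--input-device`.

```python
from typing import Sequence, Union


def resolve_device(spec: Union[str, int], devices: Sequence[dict], kind: str) -> int:
    """Resolve a device spec (name substring or index) to a device index.

    Only devices with channels in the requested direction are considered,
    so a headphone name cannot be picked as a microphone.
    """
    if kind not in ("input", "output"):
        raise ValueError(f"kind must be 'input' or 'output', got {kind!r}")
    channels = f"max_{kind}_channels"
    usable = [i for i, dev in enumerate(devices) if dev.get(channels, 0) > 0]

    # Raw index (int or digit string): accepted, but fragile across replugs.
    if isinstance(spec, int) or spec.strip().isdigit():
        index = int(spec)
        if index in usable:
            return index
        raise ValueError(f"no {kind} device at index {index}")

    # Name substring, case-insensitive; the first match wins.
    needle = spec.strip().lower()
    for i in usable:
        if needle in devices[i]["name"].lower():
            return i
    raise ValueError(f"no {kind} device name contains {spec!r}")
```

On Windows the same physical device often appears once per host API (MME, DirectSound, WASAPI), so a name substring can legitimately match several entries. This sketch simply takes the first.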
3b. Select CPU or GPU for voice-in (before install). After device selection and before running the installer, detect the GPU and let the user choose:
   - Run `py provision_whisper.py --project-dir <dir> --gpu auto --detect-only`. (`<dir>` is resolved the same way as in the project-dir step: the `$CLAUDE_PROJECT_DIR` env var if set, otherwise the CWD, so it is available here before the formal project-dir step.) It prints the detected GPU, the recommended backend (NVIDIA→CUDA, AMD/Intel→Vulkan, none→CPU), and a plan with sizes, rough time, and what is already installed.
   - Show that plan and ask: CPU or GPU?
     - CPU → pass `--gpu cpu` to the installer in step 5.
     - GPU → show the full plan, get explicit consent, then pass `--gpu auto` (or `cuda`/`vulkan`). Without consent, do not provision.
   - The Vulkan path (AMD/Intel) installs VS Build Tools and the Vulkan SDK via winget and compiles from source, which is why explicit consent is required. The NVIDIA and CPU paths are plain downloads. Already-installed dependencies are skipped. Any failure stops the run and rolls back in-project artifacts (system SDKs are kept).
   - Skip this step entirely with `--no-voice-in` (TTS-only).
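The backend recommendation and the consent gate can be sketched as below, assuming the detected GPU is available as a display-name string. The vendor keyword matching and both function names are hypothetical; the actual detection logic in `provision_whisper.py` is not shown in this document.

```python
from typing import Optional

# Vendor keyword -> backend, following NVIDIA→CUDA, AMD/Intel→Vulkan.
# Checked in order, so NVIDIA is tested first.
_VENDOR_BACKENDS = (
    ("nvidia", "cuda"),
    ("amd", "vulkan"),
    ("radeon", "vulkan"),
    ("intel", "vulkan"),
)


def recommend_backend(gpu_name: Optional[str]) -> str:
    """Recommend a Whisper backend from a GPU name; no GPU means CPU."""
    if not gpu_name:
        return "cpu"
    name = gpu_name.lower()
    for keyword, backend in _VENDOR_BACKENDS:
        if keyword in name:
            return backend
    return "cpu"


def gpu_flag(choice: str, consented: bool, backend: str = "auto") -> str:
    """Value for the installer's --gpu flag; GPU requires explicit consent."""
    if choice.strip().lower() == "cpu":
        return "cpu"
    if not consented:
        raise PermissionError("GPU provisioning requires explicit consent")
    if backend not in ("auto", "cuda", "vulkan"):
        raise ValueError(f"unknown GPU backend: {backend!r}")
    return backend
```

Raising instead of silently falling back to CPU keeps the "without consent, do not provision" rule visible to the caller, which then has to ask again or choose CPU explicitly.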
4. Resolve the project directory. Use `$CLAUDE_PROJECT_DIR` (the current Claude Code project root). If that env var is missing, fall back to the current working directory. Confirm the directory with the user before writing files.
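The fallback rule above fits in a few lines. This minimal sketch takes the environment as an optional mapping so it can be exercised without touching the real process environment; `resolve_project_dir` is a hypothetical name, and treating an empty or blank `CLAUDE_PROJECT_DIR` as missing is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_project_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """$CLAUDE_PROJECT_DIR if set and non-blank, else the current working directory."""
    source = os.environ if env is None else env
    raw = source.get("CLAUDE_PROJECT_DIR", "").strip()
    return Path(raw).resolve() if raw else Path.cwd().resolve()
```

The resolved path is only a proposal: it should be shown to the user (e.g. `print(f"Scaffold into {resolve_project_dir()}?")`) and confirmed before any file is written.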
5. Run the installer (from this skill's directory), passing the output device chosen in step 3:
   ```shell
   py install.py --target <target> --common <common> --output-device "<name|index>" --gpu <auto|cpu|cuda|vulkan> [--voice <voice-id>] [--project-dir <dir>] [--force] [--no-voice-in]
   ```
   Note: `--target` is the target language (the same name the daemon uses) and `--common` is the common (communication) language; the scaffold destination is `--project-dir`, not `--target`. (`--lang` is still accepted as a hidden alias for `--target`.) `--output-device` is the speaker chosen in step 3 (required by this skill's flow) and is baked into the Stop hook in `.claude/settings.json`. This writes `CLAUDE.md`, `.claude/settings.json`, `scripts/speak_lang.py`, `scripts/push_to_talk.py`, and `scripts/inject_transcript.py` into the project dir. It also pip-installs `edge-tts`, the voice-in deps (`numpy sounddevice scipy pynput pywinauto pyper