claude-speech

This skill bootstraps a self-contained language-learning project inside the user's current Claude Code workspace. It works with two languages: a target language (the one being learned — spoken aloud, with IPA pronunciation help) and a common language (the learner's native tongue — used for notes, corrections, and free chat, never spoken). It installs:

- a teacher persona (CLAUDE.md) that speaks the target language and writes all notes in the common language,
- a Stop hook + scripts/speak_lang.py that uses edge-tts to speak only the target-language portion of Claude's replies aloud,
- a UserPromptSubmit hook + scripts/push_to_talk.py + scripts/inject_transcript.py for two-key push-to-talk voice input: hold F9 to speak the target language or F10 to speak the common language. The held key forces the transcription language (no auto-detection, so mixed-language speech isn't misread). The recording is transcribed via local Whisper, an IPA line (espeak-ng) is added only for target-language speech, and the result is pasted into the chat as the user's message.
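The two-key rule can be sketched as follows. This is a minimal illustration, not the code in scripts/push_to_talk.py: the names `KEY_ROLES`, `language_for_key`, and `build_message` are hypothetical, and the `IPA:` line layout is an assumption.

```python
from typing import Optional

# Hypothetical mapping: which language role each push-to-talk key forces.
KEY_ROLES = {"f9": "target", "f10": "common"}


def language_for_key(key: str, target: str, common: str) -> str:
    """Return the language code the held key forces (no auto-detection)."""
    role = KEY_ROLES[key.lower()]
    return target if role == "target" else common


def build_message(key: str, transcript: str, ipa: Optional[str] = None) -> str:
    """Attach an IPA line only when the target-language key was held."""
    if KEY_ROLES[key.lower()] == "target" and ipa:
        # Assumed layout: transcript first, IPA on its own line.
        return f"{transcript}\nIPA: {ipa}"
    return transcript
```

Because the language comes from the key rather than from the audio, a sentence that mixes both languages is still transcribed in the language the learner intended.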

When to use

Trigger when the user says any of:

- "let's practice {language}"
- "teach me {language}"
- "set up a {language} tutor"
- "I want Claude to speak {language} responses"
- "scaffold claude-speech for {language}"

How to invoke

0. Handle control arguments first. Before any of the steps below, inspect the argument the user passed:
- If the argument is off, stop, or kill (case-insensitive): skip all install and scaffold steps. Take these actions in order:
  1. Find and terminate every running push_to_talk.py daemon. On Windows the reliable command is (note the single-quoted -Command argument so this also works when invoked from bash/zsh — double quotes would let the shell interpolate $_ and $(...) before PowerShell sees them):
```
powershell -NoProfile -Command 'Get-CimInstance Win32_Process | Where-Object { $_.Name -in @("py.exe","python.exe","pythonw.exe") -and $_.CommandLine -like "*push_to_talk.py*" } | ForEach-Object { Write-Host "killed PID $($_.ProcessId)"; Stop-Process -Id $_.ProcessId -Force }'
```
    Crucially: the filter MUST require the process Name to be one of py.exe / python.exe / pythonw.exe. Without that restriction, the filter also matches shell wrappers that happen to have push_to_talk.py literally in their command line (e.g. the very PowerShell invocation you're running) and will pollute the result list.
  2. Delete <project_root>/recordings/latest_transcript.txt if it exists, so the UserPromptSubmit hook doesn't keep re-injecting the last transcript on subsequent manual Enters now that the daemon isn't writing fresh ones.
  3. Report the PIDs killed (or "no daemons were running") and confirm the stale transcript was cleared.
- Any other argument is treated as a language name or ISO code — proceed with steps 1+ below.
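The control-argument check and the transcript cleanup can be sketched in Python. This is a minimal illustration under assumed names (`CONTROL_ARGS`, `is_control_arg`, `clear_stale_transcript` are hypothetical); terminating the daemon itself is left to the PowerShell command above.

```python
from pathlib import Path

# The three control words named in step 0, compared case-insensitively.
CONTROL_ARGS = {"off", "stop", "kill"}


def is_control_arg(arg: str) -> bool:
    """True when the argument asks to shut everything down."""
    return arg.strip().lower() in CONTROL_ARGS


def clear_stale_transcript(project_root: Path) -> bool:
    """Delete recordings/latest_transcript.txt if present.

    Returns True if a file was removed, False if there was nothing to remove.
    """
    transcript = project_root / "recordings" / "latest_transcript.txt"
    if transcript.exists():
        transcript.unlink()
        return True
    return False
```

Returning a boolean from the cleanup makes it easy to report accurately whether a stale transcript was actually cleared.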
1. Resolve the two languages. The skill takes two positional arguments: /claude-speech <target> <common>.
- Arg 1 = target language (the one being learned, spoken aloud + IPA).
- Arg 2 = common language (the learner's native language, used for notes/corrections, never spoken).
- Accept names ("Dutch", "Russian") or ISO 639-1 codes ("nl", "ru"). Both must exist in voices.json next to this skill — open it if the user asks what's available.
- If the target (arg 1) is missing, ask which language to teach.
- If the common language (arg 2) is missing, ask for it before proceeding — do not assume English.
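A minimal sketch of the name-or-code lookup, assuming a simplified shape for voices.json (ISO code mapped to an entry with a `name` field). The real file's schema is not shown here, so `VOICES` and `resolve_language` are hypothetical stand-ins.

```python
from typing import Optional

# Hypothetical stand-in for voices.json; the real file's schema may differ.
VOICES = {
    "nl": {"name": "Dutch"},
    "ru": {"name": "Russian"},
}


def resolve_language(arg: Optional[str], voices: dict = VOICES) -> Optional[str]:
    """Map a language name or ISO 639-1 code to its code.

    Returns None when the argument is missing or unknown, so the caller
    can ask the user instead of assuming a default such as English.
    """
    if not arg:
        return None
    wanted = arg.strip().lower()
    if wanted in voices:  # already an ISO code
        return wanted
    for code, entry in voices.items():
        if entry["name"].lower() == wanted:  # full language name
            return code
    return None
```

For example, `resolve_language("Dutch")` and `resolve_language("nl")` both yield `"nl"`, while a missing second argument yields `None` and should trigger a question to the user.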
2. Ask if they want a non-default voice. Each language has a recommended edge-tts voice; if the user wants something different (a different gender, accent, or specific neural voice), it can be passed as --voice <voice-id>.
3. Select audio devices (REQUIRED, before anything is installed or launched). Device selection is mandatory: do not run the installer, do not write files, and do not spawn any background process until the user has explicitly chosen both an input and an output device. Do this as two separate, ordered choices:
1. List the devices. Run py templates/scripts/push_to_talk.py --list-devices (works from the skill directory; it needs only the voice-in Python deps). It prints input devices first, then output devices, each with an index, name, and host API.
2. Microphone (input) — required. Show the user the input-device list and ask which microphone to use. Wait for an explicit answer. If the user declines or gives no usable choice, stop here — report that a microphone is required and that nothing was installed or started. (Exception: if the user explicitly asked for a TTS-only setup with --no-voice-in, there is no push-to-talk, so skip the microphone step.)
3. Speaker (output) — required. Then show the output-device list and ask which speaker/headphone to use for spoken replies. Wait for an explicit answer. If the user declines or gives no usable choice, stop here — report that an output device is required and that nothing was installed or started.
- Prefer a name substring (e.g. "USB PnP", "OnePlus") over a raw index when recording the choice: indices can be reassigned across reboots and replugs, whereas names are stable. Pick a substring that is specific enough to identify exactly the device the user named.
- Carry the input choice into the daemon spawn (--input-device) in step 7 and the output choice into the installer (--output-device) in step 5.
- To turn everything off later, the user runs /claude-speech off (or stop / kill) — see step 0.
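The name-substring preference can be sketched as below. This is an illustration only: `match_device` is a hypothetical helper working on a plain list of device names, not the lookup that push_to_talk.py actually performs.

```python
from typing import List, Union


def match_device(choice: Union[str, int], names: List[str]) -> int:
    """Return the index of the single device selected by index or name substring."""
    if isinstance(choice, int):
        if 0 <= choice < len(names):
            return choice
        raise ValueError(f"no device with index {choice}")
    needle = choice.strip().lower()
    hits = [i for i, name in enumerate(names) if needle in name.lower()]
    if len(hits) != 1:
        # Zero hits: the device is missing. Several hits: the substring is too vague.
        raise ValueError(f"{choice!r} matched {len(hits)} devices; need exactly one")
    return hits[0]
```

Rejecting ambiguous substrings is a deliberate choice in this sketch: a substring like "Microphone" that matches several devices should prompt a more specific answer rather than silently picking the first match.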

3b. Select CPU or GPU for voice-in — before install. After devices and before running the installer, detect the GPU and let the user choose:

Run py provision_whisper.py --project-dir <dir> --gpu auto --detect-only. (<dir> is resolved the same way as the project-dir step: $CLAUDE_PROJECT_DIR env var if set, otherwise CWD — so it is available here before the formal project-dir step.) It prints the detected GPU, the recommended backend (NVIDIA→CUDA, AMD/Intel→Vulkan, none→CPU), and a plan with sizes, rough time, and what is already installed.
Show that plan and ask the user: CPU or GPU?
- CPU → pass --gpu cpu to the installer in step 5.
- GPU → show the full plan, get explicit consent, then pass --gpu auto (or cuda/vulkan). Without consent, do not provision.

The Vulkan path (AMD/Intel) installs VS Build Tools + Vulkan SDK via winget and compiles from source — that is why explicit consent is required. NVIDIA/CPU are plain downloads. Already-installed dependencies are skipped. Any failure stops and rolls back in-project artifacts (system SDKs are kept).
Skip this step entirely with --no-voice-in (TTS-only).
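The backend recommendation and the consent rule can be sketched as follows. The function names and the vendor strings are assumptions for illustration; the actual detection lives in provision_whisper.py and may work differently.

```python
from typing import Optional


def recommend_backend(gpu_vendor: Optional[str]) -> str:
    """Map a detected GPU vendor to a backend: NVIDIA→cuda, AMD/Intel→vulkan, none→cpu."""
    vendor = (gpu_vendor or "").strip().lower()
    if vendor == "nvidia":
        return "cuda"
    if vendor in {"amd", "intel"}:
        return "vulkan"
    return "cpu"


def gpu_flag(user_choice: str, consent: bool, gpu_vendor: Optional[str]) -> str:
    """Return the value to pass as --gpu, refusing GPU provisioning without consent."""
    if user_choice.strip().lower() == "cpu":
        return "cpu"
    if not consent:
        # Without explicit consent, nothing is provisioned.
        raise PermissionError("GPU provisioning requires explicit user consent")
    return recommend_backend(gpu_vendor)
```

Keeping the consent check in one place means no GPU path, including the heavier Vulkan build, can be reached by accident.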

4. Resolve the project directory. Use $CLAUDE_PROJECT_DIR (the current Claude Code project root). If that env var is missing, fall back to the current working directory. Confirm the directory with the user before writing files.
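The fallback order can be expressed in a few lines. `resolve_project_dir` is a hypothetical helper shown only to make the rule concrete; treating an empty variable the same as a missing one is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_project_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Prefer $CLAUDE_PROJECT_DIR; otherwise use the current working directory."""
    env = os.environ if env is None else env
    root = env.get("CLAUDE_PROJECT_DIR")
    # An empty value is treated like a missing variable (assumption).
    return Path(root) if root else Path.cwd()
```

The returned path should still be confirmed with the user before any files are written.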
5. Run the installer (from this skill's directory), passing the output device chosen in step 3:
```
py install.py --target <target> --common <common> --output-device "<name|index>" --gpu <auto|cpu|cuda|vulkan> [--voice <voice-id>] [--project-dir <dir>] [--force] [--no-voice-in]
```
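For illustration, the command line above could be assembled from the earlier choices like this. `installer_argv` is a hypothetical helper; it uses only the flags shown in the command and does not run anything.

```python
from typing import List, Optional

GPU_CHOICES = {"auto", "cpu", "cuda", "vulkan"}


def installer_argv(
    target: str,
    common: str,
    output_device: str,
    gpu: str,
    voice: Optional[str] = None,
    project_dir: Optional[str] = None,
    force: bool = False,
    no_voice_in: bool = False,
) -> List[str]:
    """Build the argument list for install.py from the user's choices."""
    if gpu not in GPU_CHOICES:
        raise ValueError(f"--gpu must be one of {sorted(GPU_CHOICES)}")
    argv = [
        "py", "install.py",
        "--target", target,
        "--common", common,
        "--output-device", output_device,
        "--gpu", gpu,
    ]
    if voice:
        argv += ["--voice", voice]
    if project_dir:
        argv += ["--project-dir", project_dir]
    if force:
        argv.append("--force")
    if no_voice_in:
        argv.append("--no-voice-in")
    return argv
```

Passing the result as a list to a process launcher avoids shell quoting problems with device names that contain spaces.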
Note: --target is the target language (same name the daemon uses) and --common is the communication language; the scaffold destination is --project-dir, not --target. (--lang is still accepted as a hidden alias for --target.) --output-device is the speaker chosen in step 3 (required by this skill's flow) and is baked into the Stop hook in .claude/settings.json. This writes CLAUDE.md, .claude/settings.json, scripts/speak_lang.py, scripts/push_to_talk.py, and scripts/inject_transcript.py into the project dir. It also pip-installs edge-tts, the voice-in deps (`numpy sounddevice scipy pynput pywinauto pyper
