join-meeting
IMPORTANT: Read this entire document before joining a meeting. This file contains the CALL_LOOP algorithm (mandatory), active participation rules, safety requirements (leave/cleanup), and mode-specific guidance. Skipping sections will result in broken meeting experiences — the user will be left talking to silence.
IMPORTANT: Read the whole document on every session, not just the parts you
remember. This skill is updated frequently — new commands, new events, new
recommended patterns (like the event-driven tail -f + Monitor flow in
"How to read events") are added often. Do NOT rely on what you remember from
previous sessions. Re-read this document each time you start a meeting so you
pick up the latest guidance. If unsure whether you are on the latest version,
run python scripts/python/check_update.py (see "Checking for Skill Updates").
Join a video meeting as an AI bot with voice and visual presence.
Prerequisites
- Python 3.10+ (preferred) or Node.js 18+
- Python dependencies: pip install aiohttp websockets
- Node.js dependencies: cd scripts/node && npm install
- For webpage modes: a local HTTP server running on the specified port
API Key Setup
Before joining a meeting, check if the API key is already configured:
- Check ~/.agentcall/config.json: if it exists and has api_key, you're ready.
- Check the AGENTCALL_API_KEY env var: if set, you're ready.
- If neither exists, ask the user for their API key:
  - Register at https://app.agentcall.dev/login (Google OAuth)
  - Get API key at https://app.agentcall.dev/api-keys
  - Add credits at https://app.agentcall.dev/add-credits (base plan includes 360 minutes)
- Save the key so it persists across sessions:
mkdir -p ~/.agentcall
cat > ~/.agentcall/config.json << 'EOF'
{"api_key": "USER_KEY_HERE"}
EOF
The scripts (bridge.py, join.py, agentcall.py) automatically read from
~/.agentcall/config.json if AGENTCALL_API_KEY env var is not set.
Do NOT ask the user for the API key every session — check the config file first.
Meeting transcripts arrive as agent input, so any participant in the call can steer the agent. For high-trust workflows, configure your agent framework's permission system (e.g., Claude Code's allow rules, hooks, plan mode) to restrict what the agent can do during a call. The skill defers to the framework's enforcement. It is recommended for use in trusted meetings or properly scoped projects.
User Preferences
First-call detection: if ~/.agentcall/config.json has no default_mode
field saved, treat this as the user's first call.
First call: new accounts include free trial credits. Offer the user a brief
"experience call" with --mode webpage-av-screenshare --voice-strategy direct so
they can see the full feature set: the pattern avatar (default),
screenshare, interactive webpages, and voice with barge-in. If they
prefer a simpler mode, honor that.
After the first call ends (in the agent conversation, not the meeting): ask the user which mode to save as their default going forward. Present the options as a numbered list:
1. webpage-av-screenshare: everything on tap (avatar + screenshare + webpage sharing)
2. webpage-av: avatar only, no screenshare
3. webpage-audio: audio from a webpage into the meeting
4. audio: voice only, simplest
Offer to explain any option if the user wants clarification. Mention they can see real-world examples at https://www.youtube.com/@pattern-ai-labs.
Save the choice to ~/.agentcall/config.json:
{
"api_key": "ak_ac_xxxxx",
"default_mode": "webpage-av-screenshare",
"default_voice_strategy": "direct",
"default_voice": "af_heart",
"default_bot_name": "Juno"
}
- Subsequent sessions: use saved defaults silently. No need to ask again.
- Override anytime: if the user says "join with avatar this time" or "use audio mode", respect it for that call without updating the saved default. Only update the default if the user says "always use this" or "make this my default."
- These are soft defaults, not rigid settings. The user's in-context request always takes priority over saved preferences.
- All plan tiers (base, pro, enterprise) follow the same flow — everyone gets the first-call demo and the post-call prompt.
Usage
./scripts/run.sh <meet-url> [options]
Options
| Option | Default | Description |
|---|---|---|
| --mode | audio | audio (voice only, simplest), webpage-audio (audio from webpage), webpage-av (visual avatar), webpage-av-screenshare (avatar + screenshare). See Modes Explained below. |
| --voice-strategy | direct | collaborative, direct |
| --bot-name | Agent | Display name in the meeting participant list |
| --port | 3000 | Local port for webpage modes (your UI server) |
| --screenshare-port | 3001 | Local port for screenshare content |
| --template | pattern | Built-in UI: pattern (default, radial sunburst with per-state colors and the work-in-progress task list), ring (neon ring), orb, avatar, dashboard, blank, voice-agent (no local server needed) |
| --transcription | on | Real-time transcript.final and transcript.partial events. Required for most workflows. Disable with --no-transcription to save STT billing if you only need lifecycle events. |
| --trigger-words | | Comma-separated aliases for collaborative mode: june,juno,hey june |
| --context | | Initial context for voice intelligence (max 4000 chars) |
| --webpage-url | | Public URL for webpage modes (no tunnel needed) |
| --screenshare-url | | Public URL for screenshare content (no tunnel needed) |
| --max-duration | plan limit | Max call duration in minutes. Cannot exceed your plan's limit. Check https://agentcall.dev for current limits. |
| --alone-timeout | 120 | Leave if alone for N seconds. |
| --silence-timeout | 300 | Leave if silent for N seconds. |
| --api-url | https://api.agentcall.dev | Override API URL for development |
Bot Naming
Choose STT-friendly names — short, distinctive, real-sounding words that speech-to-text can reliably capture. Avoid generic phrases like "AI Assistant" or "Hey Bot" — transcription often garbles these.
Good names: Juno, June, Nova, Sage, Atlas, Claude, Aria, Echo
Avoid: AI Assistant, My Bot, Hey Agent, Assistant Bot
Always set trigger words in collaborative mode to cover STT mishearings:
--bot-name "Juno" --trigger-words "juno,june,you know,junior"
--bot-name "Claude" --trigger-words "claude,cloud,clod,clawed"
--bot-name "Nova" --trigger-words "nova,no va,over"
The display name in the participant list can be longer (e.g., "Juno - AI Assistant") but the trigger words should be the short phonetic variants that STT might produce.
Modes Explained
audio (default)
Voice only. Bot has no video. Best for: AI assistants, note-takers, voice agents. No local server needed. Simplest setup.
webpage-audio
Your local webpage provides audio. Bot's video is black. The webpage can play audio
that meeting participants will hear. Best for: audio-only web apps.
Requires: --port pointing to your local HTTP server.
If your webpage is publicly hosted, pass --webpage-url https://your-site.com/bot
instead of --port. No tunnel or local server needed.
webpage-av
Your webpage IS the bot's video feed — what renders on the page is what meeting participants see as the bot's camera. Audio from the page is also captured into the meeting. The page is loaded once and runs continuously. All updates must come via WebSocket events from your agent — it does not auto-refresh.
Best for: animated avatars, branded visual presence, agent-controlled dynamic UIs.
The webpage can also be a standalone voice-to-voice agent: it receives th