Podcast Workflow — AI Podcast Generator

Give it a topic domain → get publish-ready podcast audio / 输入选题范围 → 自动产出可发布的播客音频

Quick Start / 快速开始

English: Give Claude a topic (e.g., "Shanghai real estate"), and it will research hot topics, write a podcast script, run compliance checks, and synthesize speech — all in one flow.

中文： 给 Claude 一个选题范围（如"上海房产"），它会自动搜索热点、撰写播客文稿、合规审查、语音合成，一站完成。

Trigger Words / 触发词

Say any of these to activate the workflow:

/podcast-workflow
"做一期播客" / "Make a podcast"
"帮我生成播客" / "Generate a podcast"
"播客工作流" / "Podcast workflow"
"用我的声音做播客" / "Clone my voice for podcast"
"podcast about {topic}"
"关于{主题}的播客"

API Configuration / API 配置

Platform:     Xiaomi MiMo API Open Platform
Endpoint:     https://token-plan-cn.xiaomimimo.com/v1
Auth Header:  api-key: {MIMO_API_KEY}
SSL Note:     Python HTTP clients need certificate verification disabled (ssl.CERT_NONE)
Web Search:   Activate at https://platform.xiaomimimo.com/#/console/plugin
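The configuration above could be wired into a small client like the sketch below. It uses only the standard library. The `/chat/completions` route is an assumption (the source gives only the base URL), and the helper names `build_request`, `insecure_context`, and `call_mimo` are hypothetical.

```python
import json
import os
import ssl
import urllib.request

BASE_URL = "https://token-plan-cn.xiaomimimo.com/v1"
# Assumption: an OpenAI-style route; the source only documents the base URL.
CHAT_PATH = "/chat/completions"


def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the api-key header."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + CHAT_PATH,
        data=body,
        headers={"api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def insecure_context() -> ssl.SSLContext:
    """SSL context with certificate verification off, per the SSL note."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def call_mimo(payload: dict, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON response body."""
    req = build_request(payload, os.environ["MIMO_API_KEY"])
    with urllib.request.urlopen(req, context=insecure_context(), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the header and body easy to inspect without network access.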

Models Used

Purpose	Model ID	Notes
Script Writing	`mimo-v2.5-pro`	Generates podcast scripts with emotion tags
Compliance Check	`mimo-v2.5-pro`	Content safety review
TTS (Built-in)	`mimo-v2.5-tts`	Pre-built high-quality voices
TTS (Clone)	`mimo-v2.5-tts-voiceclone`	Clone from 30-60s audio sample
TTS (Design)	`mimo-v2.5-tts-voicedesign`	Create new voice from text description
Web Search	`mimo-v2.5-pro` + web_search tool	Real-time topic research

Two Modes / 两种模式

Mode / 模式	Description / 说明	Best For / 适用
Built-in Voice / 内置音色	Use MiMo's pre-built voices	Quick testing / 快速测试
Voice Clone / 声音克隆	Clone your voice from a sample	Personal brand / 个人品牌

Complete Workflow / 完整流程

[User provides topic domain / 用户给定选题范围]
                    ↓
[Step 1: Topic Research / 热点搜索]
  → MiMo Web Search (if activated) or model knowledge
  → Output: 3-5 candidates with title, angle, heat score
                    ↓
[User Selects / 用户选题]
  → Display candidates, user picks one
                    ↓
[Step 2: Script Writing / 文稿撰写]
  → MiMo-V2.5-Pro generates conversational script
  → Each paragraph tagged with [Emotion: xxx]
                    ↓
[Step 3: Compliance Check / 合规审查]
  → Sensitive words, misinformation, format, legal risk, marketing bias
  → Output: pass/fail report with specific suggestions
                    ↓
[Step 4: User Review / 用户确认]
  → Save script to Desktop, show report
  → User approves or requests changes
                    ↓
[Step 5: TTS Synthesis / TTS语音合成]
  → Split by emotion tags → Generate per-chunk → Merge
  → Output: WAV file on Desktop
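The flow above can be sketched as a single function that takes each stage as a callable. This is only an illustrative skeleton: `run_workflow` and its parameter names are hypothetical, and a rejected script simply stops the run here, whereas the real flow lets the user request changes.

```python
from typing import Any, Callable, List, Optional


def run_workflow(
    domain: str,
    research: Callable[[str], List[Any]],
    choose: Callable[[List[Any]], Any],
    write_script: Callable[[Any], str],
    check: Callable[[str], Any],
    approve: Callable[[str, Any], bool],
    synthesize: Callable[[str], str],
) -> Optional[str]:
    """Run the stages in order; return the audio path, or None if not approved."""
    candidates = research(domain)   # topic research
    topic = choose(candidates)      # user picks one candidate
    script = write_script(topic)    # script with emotion tags
    report = check(script)          # compliance report
    if not approve(script, report):
        return None                 # synthesis never runs without approval
    return synthesize(script)       # TTS synthesis
```

The point of the sketch is the approval gate: speech synthesis is only reached after the user has seen the script and the compliance report.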

Step 1: Topic Research / 热点搜索

With Web Search Plugin / 使用 Web Search

{
  "model": "mimo-v2.5-pro",
  "messages": [
    {"role": "user", "content": "搜索{领域}的最新热点，列出5个最热门的播客选题..."}
  ],
  "tools": [{"type": "web_search", "max_keyword": 5, "force_search": true}],
  "max_completion_tokens": 2048,
  "temperature": 1.0,
  "stream": false,
  "thinking": {"type": "disabled"}
}

Without Web Search / 无 Web Search

Fall back to the model's own knowledge. It still generates 3-5 candidates, each with:

Title / 标题
Narrative angle / 切入角度
Heat score (High/Medium/Low) / 预估热度
One-line rationale / 一句话理由
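Both variants can share one payload builder, sketched below. The request fields are copied from the JSON above. The prompt wording after the first clause is a hypothetical completion (the original prompt is truncated), and `build_research_payload` is an illustrative name.

```python
def build_research_payload(domain: str, use_web_search: bool, count: int = 5) -> dict:
    """Build the topic-research request body, with or without the web_search tool."""
    # Hypothetical prompt: the first clause follows the source, the rest asks for
    # the four fields listed above (title, angle, heat score, rationale).
    prompt = (
        f"搜索{domain}的最新热点，列出{count}个最热门的播客选题，"
        "每个选题给出标题、切入角度、预估热度（高/中/低）和一句话理由。"
    )
    payload = {
        "model": "mimo-v2.5-pro",
        "messages": [{"role": "user", "content": prompt}],
        "max_completion_tokens": 2048,
        "temperature": 1.0,
        "stream": False,
        "thinking": {"type": "disabled"},
    }
    if use_web_search:
        # Only attach the tool when the Web Search plugin has been activated.
        payload["tools"] = [
            {"type": "web_search", "max_keyword": 5, "force_search": True}
        ]
    return payload
```

Omitting the `tools` key entirely, rather than sending an empty list, keeps the fallback request identical to a plain chat call.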

Step 2: Script Writing / 文稿撰写

System Prompt

You are a podcast scriptwriter who writes as if telling a friend about something interesting you just discovered.

Tone: Like chatting at a café with a good friend — sharing an interesting finding, not giving a lecture.

Rules:
1. Short sentences, occasional long ones for rhythm variation
2. Natural conversational language (你知道吗, 说白了, 其实, 关键是)
3. Emotion should be restrained but genuine — curious, slightly surprised, thoughtful, NOT excited or loud
4. Tag each paragraph with emotion: [情绪：xxx] / [Emotion: xxx]
5. Use …… for natural pauses, ！ for mild emphasis, ？ for curiosity
6. No markdown, no sound effect cues in brackets
7. No prompt words or technical markers

User Prompt Template

Write a podcast script for:

Topic: {selected title}

Requirements:
- Like sharing a discovery with a good friend — natural, restrained, genuine
- Not too excited, not too flat
- Duration: ~{N} minutes (~{word count} words)
- Each paragraph starts with emotion tag, e.g., [情绪：好奇] / [Emotion: Curious]
- Natural opening, no pleasantries
- Thoughtful ending with lingering resonance

Output the script directly, no preamble or postscript.
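Filling the template might look like the sketch below. The template text is adapted from the one above; the speaking rate of 250 words per minute is purely an assumption used to derive the word count, and `build_user_prompt` is a hypothetical helper.

```python
USER_PROMPT_TEMPLATE = """Write a podcast script for:

Topic: {title}

Requirements:
- Like sharing a discovery with a good friend: natural, restrained, genuine
- Not too excited, not too flat
- Duration: ~{minutes} minutes (~{words} words)
- Each paragraph starts with emotion tag, e.g., [情绪：好奇] / [Emotion: Curious]
- Natural opening, no pleasantries
- Thoughtful ending with lingering resonance

Output the script directly, no preamble or postscript."""


def build_user_prompt(title: str, minutes: int, words_per_minute: int = 250) -> str:
    """Fill the template; words_per_minute is an assumed speaking rate."""
    return USER_PROMPT_TEMPLATE.format(
        title=title,
        minutes=minutes,
        words=minutes * words_per_minute,
    )
```

Tune `words_per_minute` to the host's actual pace; the source does not specify one.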

Emotion Tags / 情绪标注

Every paragraph MUST start with an emotion tag, because the tag selects the TTS style instruction used to synthesize that paragraph.

Tag (CN)	Tag (EN)	TTS Style Instruction
好奇开场	Curious Opening	Excited but restrained, like sharing an interesting discovery
展开分析	Analysis	Calm, organized, with a sense of realization
若有所思	Thoughtful	Steady, contemplative, pondering deeper meaning
略带调侃	Playful	Light-hearted teasing, but fundamentally serious
认真探讨	Serious Discussion	Earnest but not heavy
延伸思考	Reflective	Measured, insightful, forward-looking
回到现实	Back to Reality	Pragmatic, slightly wistful
轻松收尾	Light Closing	Relaxed, with a sense of anticipation
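One way to use the table is a lookup keyed by either the Chinese or the English tag, as sketched below. The fallback for unknown tags (such as the 好奇 / Curious example in the prompt template) is an assumption: it reuses the generic style instruction from the built-in voice request. `style_for` is a hypothetical helper.

```python
import re
from typing import Tuple

# (Chinese tag, English tag, TTS style instruction), copied from the table above.
EMOTION_TABLE = [
    ("好奇开场", "Curious Opening", "Excited but restrained, like sharing an interesting discovery"),
    ("展开分析", "Analysis", "Calm, organized, with a sense of realization"),
    ("若有所思", "Thoughtful", "Steady, contemplative, pondering deeper meaning"),
    ("略带调侃", "Playful", "Light-hearted teasing, but fundamentally serious"),
    ("认真探讨", "Serious Discussion", "Earnest but not heavy"),
    ("延伸思考", "Reflective", "Measured, insightful, forward-looking"),
    ("回到现实", "Back to Reality", "Pragmatic, slightly wistful"),
    ("轻松收尾", "Light Closing", "Relaxed, with a sense of anticipation"),
]
# Assumed fallback: the generic instruction used in the built-in voice request.
DEFAULT_STYLE = "用自然、亲切、有节奏感的播客风格朗读以下内容。"

STYLE_BY_TAG = {}
for cn, en, style in EMOTION_TABLE:
    STYLE_BY_TAG[cn] = style
    STYLE_BY_TAG[en.lower()] = style  # English tags are matched case-insensitively

TAG_RE = re.compile(r"^\s*\[(?:情绪|Emotion)[：:]\s*([^\]]+?)\s*\]\s*")


def style_for(paragraph: str) -> Tuple[str, str]:
    """Return (style instruction, paragraph text with the tag removed)."""
    m = TAG_RE.match(paragraph)
    if m is None:
        raise ValueError("paragraph does not start with an emotion tag")
    style = STYLE_BY_TAG.get(m.group(1).lower(), DEFAULT_STYLE)
    return style, paragraph[m.end():]
```

Stripping the tag before synthesis matters: otherwise the bracketed text would be read aloud.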

Step 3: Compliance Check / 合规审查

Review dimensions:

Sensitive words / 敏感词 — Political, discriminatory, violent content
Misinformation / 虚假信息 — False data, misleading claims
Format issues / 格式问题 — Prompt residue, technical markers
Legal risk / 法律风险 — Privacy, defamation, copyright
Marketing bias / 广告嫌疑 — Undisclosed commercial promotion

Output: Structured report with pass/fail per dimension and specific fix suggestions.
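The report shape and one locally checkable dimension could be sketched as follows. The source does not define a schema, so the class names, the `format_issues` label, and the residue patterns (markdown headings, bold markers, code fences) are all assumptions; the other dimensions would still come from the model's review.

```python
import re
from dataclasses import dataclass, field
from typing import List

TAG_START = re.compile(r"^\[(?:情绪|Emotion)[：:][^\]]+\]")
# Hypothetical residue patterns: markdown heading, bold marker, code fence.
RESIDUE = re.compile(r"^#{1,6}\s|\*\*|`{3}")


@dataclass
class DimensionResult:
    dimension: str
    passed: bool
    suggestions: List[str] = field(default_factory=list)


@dataclass
class ComplianceReport:
    results: List[DimensionResult]

    @property
    def passed(self) -> bool:
        """The script passes only if every dimension passes."""
        return all(r.passed for r in self.results)


def check_format(script: str) -> DimensionResult:
    """Flag markdown residue and paragraphs that lack a leading emotion tag."""
    suggestions = []
    paragraphs = [p.strip() for p in script.split("\n") if p.strip()]
    for n, para in enumerate(paragraphs, start=1):
        if RESIDUE.search(para):
            suggestions.append(f"Paragraph {n}: remove markdown residue")
        elif not TAG_START.match(para):
            suggestions.append(f"Paragraph {n}: add a leading emotion tag")
    return DimensionResult("format_issues", not suggestions, suggestions)
```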

Step 4: User Review / 用户确认

Save script to Desktop: 播客文稿_{主题}.txt
Display compliance report
Wait for user approval
User may request: delete content, adjust tone, modify specific wording
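Saving the script could be done along these lines. The filename pattern comes from the step above; replacing whitespace and path-unsafe characters in the topic with underscores is an assumption, and `script_path` / `save_script` are hypothetical helpers.

```python
import re
from pathlib import Path
from typing import Optional


def script_path(topic: str, directory: Path) -> Path:
    """Build the target path, replacing characters that are unsafe in filenames."""
    safe = re.sub(r'[\\/:*?"<>|\s]+', "_", topic).strip("_")
    return directory / f"播客文稿_{safe}.txt"


def save_script(script: str, topic: str, directory: Optional[Path] = None) -> Path:
    """Write the script as UTF-8 text; defaults to the user's Desktop folder."""
    directory = directory or Path.home() / "Desktop"
    directory.mkdir(parents=True, exist_ok=True)
    path = script_path(topic, directory)
    path.write_text(script, encoding="utf-8")
    return path
```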

Step 5: TTS Synthesis / TTS语音合成

Audio Format

Output: WAV (24 kHz, 16-bit, mono)
Delivery: Can convert to MP3 for smaller size
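A quick calculation shows why MP3 conversion can be worthwhile. Uncompressed PCM size follows directly from the format above; the sketch counts audio data only and ignores the small WAV header. `pcm_bytes` is an illustrative name.

```python
SAMPLE_RATE_HZ = 24_000   # 24 kHz
SAMPLE_WIDTH_BYTES = 2    # 16-bit samples
CHANNELS = 1              # mono


def pcm_bytes(duration_seconds: float) -> int:
    """Bytes of raw audio data for a clip of the given length."""
    return int(duration_seconds * SAMPLE_RATE_HZ * SAMPLE_WIDTH_BYTES * CHANNELS)
```

At 48,000 bytes per second, a 10-minute episode holds 28,800,000 bytes (about 28.8 MB) of audio data.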

Built-in Voice Mode / 内置音色

{
  "model": "mimo-v2.5-tts",
  "messages": [
    {"role": "user", "content": "用自然、亲切、有节奏感的播客风格朗读以下内容。"},
    {"role": "assistant", "content": "{script content}"}
  ],
  "audio": {"format": "wav", "voice": "mimo_default"}
}

Available voices: mimo_default, default_zh, default_en, and more at MiMo Studio
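The request above can be produced by a small builder, sketched here. It only constructs the body; how the audio is returned in the response is not documented in this file, so decoding is deliberately left out. `build_tts_payload` is a hypothetical name.

```python
# Generic style instruction taken from the built-in voice request above.
DEFAULT_STYLE = "用自然、亲切、有节奏感的播客风格朗读以下内容。"


def build_tts_payload(text: str, style: str = DEFAULT_STYLE, voice: str = "mimo_default") -> dict:
    """Build a built-in-voice TTS request body."""
    return {
        "model": "mimo-v2.5-tts",
        "messages": [
            {"role": "user", "content": style},      # how to read
            {"role": "assistant", "content": text},  # what to read
        ],
        "audio": {"format": "wav", "voice": voice},
    }
```

Note the message roles: the style instruction goes in the user turn and the text to be spoken goes in the assistant turn.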

Voice Clone Mode / 声音克隆

{
  "model": "mimo-v2.5-tts-voiceclone",
  "messages": [
    {"role": "user", "content": "{emotion-specific style instruction}"},
    {"role": "assistant", "content": "{script content}"}
  ],
  "audio": {
    "format": "wav",
    "voice": "data:audio/wav;base64,{base64-encoded voice sample}"
  }
}
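The `voice` field is a data URI holding the base64-encoded sample. A minimal sketch of building it follows, assuming the sample is already WAV; `encode_voice_sample` and `build_clone_payload` are hypothetical helpers.

```python
import base64


def encode_voice_sample(wav_bytes: bytes) -> str:
    """Wrap raw WAV bytes in the data URI format used by the voice field."""
    return "data:audio/wav;base64," + base64.b64encode(wav_bytes).decode("ascii")


def build_clone_payload(text: str, style: str, wav_bytes: bytes) -> dict:
    """Build a voice-clone TTS request body."""
    return {
        "model": "mimo-v2.5-tts-voiceclone",
        "messages": [
            {"role": "user", "content": style},
            {"role": "assistant", "content": text},
        ],
        "audio": {"format": "wav", "voice": encode_voice_sample(wav_bytes)},
    }
```

In practice the bytes would come from something like `Path("sample.wav").read_bytes()`. Encoding the sample once and reusing the string across chunks avoids repeating the work for every request.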

Chunking Strategy / 分段策略

Split by [情绪：xxx] / [Emotion: xxx] tags
Sub-split each section into chunks of at most 300-400 characters
Match each chunk with its emotion's style instruction
Generate TTS per chunk, then merge with Python wave module
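The first two steps of this strategy can be sketched as below. Two choices are assumptions rather than part of the source: sub-splitting prefers sentence boundaries (。！？!?) over cutting at an arbitrary character, and the default limit is the upper bound of 400 characters. Function names are illustrative.

```python
import re
from typing import List, Tuple

TAG_RE = re.compile(r"\[(?:情绪|Emotion)[：:]\s*([^\]]+?)\s*\]")


def split_by_emotion(script: str) -> List[Tuple[str, str]]:
    """Split a script into (tag, text) sections at each emotion tag."""
    parts = TAG_RE.split(script)  # [preamble, tag1, text1, tag2, text2, ...]
    sections = []
    for i in range(1, len(parts), 2):
        text = parts[i + 1].strip()
        if text:
            sections.append((parts[i], text))
    return sections


def sub_split(text: str, max_chars: int = 400) -> List[str]:
    """Pack whole sentences into chunks of at most max_chars characters."""
    sentences = [s for s in re.split(r"(?<=[。！？!?])", text) if s]
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) > max_chars:
            chunks.append(current)
            current = ""
        while len(s) > max_chars:  # hard-split a single over-long sentence
            chunks.append(s[:max_chars])
            s = s[max_chars:]
        current += s
    if current:
        chunks.append(current)
    return [c.strip() for c in chunks if c.strip()]


def chunk_script(script: str, max_chars: int = 400) -> List[Tuple[str, str]]:
    """Return (tag, chunk) pairs in reading order, ready for per-chunk TTS."""
    return [
        (tag, chunk)
        for tag, text in split_by_emotion(script)
        for chunk in sub_split(text, max_chars)
    ]
```

Each chunk keeps its section's tag, so the matching style instruction can be looked up per chunk before synthesis.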

Voice Recording Requirements / 录音要求

Quiet environment, no background noise
Normal pace, conversational tone
Cover various sounds and tones (for Chinese: zh/ch/sh, b/p/m/f, ü, nasal sounds)
Duration: 30-60 seconds
Format: WAV or M4A (M4A needs conversion)

M4A to WAV Conversion (macOS afconvert) / M4A 转 WAV

afconvert -f WAVE -d LEI16@24000 -c 1 input.m4a output.wav

Audio Merge Script / 合并音频

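import os
import wave

chunk_dir = '/tmp/tts_chunks'
# Sorted by name: zero-pad chunk filenames (e.g. 001.wav) so this matches playback order.
chunk_files = sorted(f for f in os.listdir(chunk_dir) if f.endswith('.wav'))
if not chunk_files:
    raise SystemExit(f'No .wav chunks found in {chunk_dir}')

# Take the audio parameters (channels, sample width, frame rate) from the first chunk.
with wave.open(os.path.join(chunk_dir, chunk_files[0]), 'rb') as w:
    params = w.getparams()

with wave.open('output.wav', 'wb') as out:
    out.setparams(params)
    for cf in chunk_files:
        with wave.open(os.path.join(chunk_dir, cf), 'rb') as w:
            # Concatenating raw frames is only valid if every chunk shares one format.
            if w.getparams()[:3] != params[:3]:
                raise ValueError(f'{cf} does not match the format of {chunk_files[0]}')
            out.writeframes(w.readframes(w.getnframes()))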
