Anything-to-MD Skill

Convert any file to clean, LLM-ready Markdown. Supports 50+ file formats including documents, images (OCR), audio/video, and YouTube URLs.

Quick Start

# Convert single file
anything-to-md file document.pdf -o ./output

# Convert entire directory
anything-to-md dir ./my-docs ./my-mds --report

# Extract YouTube transcript
anything-to-md youtube "https://youtube.com/watch?v=xxx"

Capabilities

File Types Supported

Category	Formats
Documents	PDF, DOCX, XLSX, PPTX, ODT, ODS, ODP
Web	HTML, HTM, XHTML
eBooks	EPUB, MOBI
Data	CSV, TSV, JSON, XML
Images	PNG, JPG, GIF, BMP, TIFF, WEBP (with OCR)
Audio	MP3, WAV, M4A, FLAC, OGG, AAC
Video	MP4, MKV, AVI, MOV, WEBM
URLs	YouTube, Wikipedia, RSS feeds

Video Intelligent Routing (NEW)

Videos are processed through a smart 4-phase pipeline:

PROBE → DECIDE → EXTRACT → FUSE

Phase 1: PROBE (< 5 seconds)

ffprobe: Detect embedded subtitle tracks, audio streams
Sidecar detection: Check for .srt/.vtt files
Sample frame OCR: Extract 5 frames, run quick OCR to detect on-screen text
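
The first two probe checks could look roughly like the sketch below. It is illustrative only, not the tool's actual code: the function names `summarize_streams`, `find_sidecars`, and `probe` are hypothetical, and it assumes `ffprobe` is available on PATH. The sample-frame OCR step is left out.

```python
import json
import subprocess
from pathlib import Path


def summarize_streams(ffprobe_json: dict) -> dict:
    """Count subtitle and audio streams in ffprobe's JSON output."""
    streams = ffprobe_json.get("streams", [])
    return {
        "subtitle_tracks": sum(1 for s in streams if s.get("codec_type") == "subtitle"),
        "audio_streams": sum(1 for s in streams if s.get("codec_type") == "audio"),
    }


def find_sidecars(video: Path) -> list[Path]:
    """Return existing .srt/.vtt files that share the video's base name."""
    candidates = (video.with_suffix(ext) for ext in (".srt", ".vtt"))
    return [p for p in candidates if p.exists()]


def probe(video: Path) -> dict:
    """Run ffprobe (assumed to be on PATH) and merge stream and sidecar checks."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-print_format", "json", "-show_streams", str(video)],
        capture_output=True, text=True, check=True,
    ).stdout
    info = summarize_streams(json.loads(out))
    info["sidecars"] = [str(p) for p in find_sidecars(video)]
    return info
```

Keeping the JSON parsing separate from the subprocess call makes the stream check testable without a real video file.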

Phase 2: DECIDE - Choose optimal strategy

Video Type	Strategy	Description
Has embedded subtitles	`embedded_subtitle`	Extract with ffmpeg, parse SRT
Has sidecar .srt/.vtt	`sidecar_subtitle`	Parse external subtitle file
Audio + on-screen text	`hybrid`	faster-whisper + frame OCR
Pure audio (podcast)	`audio_transcribe`	faster-whisper transcription
PPT recording / tutorial	`visual_ocr`	Scene detection + keyframe OCR
Unknown / mixed	`full_pipeline`	Run all extraction methods
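
The table above can be read as a simple decision function. The following is a minimal sketch under stated assumptions: the `ProbeResult` fields and the order in which the checks run (embedded subtitles first, then sidecar files) are hypothetical; only the strategy names come from the table.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    """Hypothetical summary of what the PROBE phase found."""
    has_embedded_subs: bool = False
    has_sidecar: bool = False
    has_audio: bool = False
    has_onscreen_text: bool = False


def choose_strategy(p: ProbeResult) -> str:
    """Map probe findings to one of the strategy names in the table."""
    if p.has_embedded_subs:
        return "embedded_subtitle"
    if p.has_sidecar:
        return "sidecar_subtitle"
    if p.has_audio and p.has_onscreen_text:
        return "hybrid"
    if p.has_audio:
        return "audio_transcribe"
    if p.has_onscreen_text:
        return "visual_ocr"
    # Nothing conclusive was detected, so fall back to running everything.
    return "full_pipeline"
```

For example, `choose_strategy(ProbeResult(has_audio=True, has_onscreen_text=True))` returns `"hybrid"`.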

Phase 3: EXTRACT

Subtitles: ffmpeg extraction → SRT parsing
Audio: faster-whisper (up to 4x faster than OpenAI Whisper)
Frames: PySceneDetect for scene changes + perceptual hash deduplication
OCR: RapidOCR (PaddleOCR models + ONNX runtime)
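
As an illustration of the SRT parsing step, here is a small standard-library sketch. The `parse_srt` function and its return shape are assumptions made for this example, not the project's actual parser.

```python
import re

# Matches HH:MM:SS,mmm (SRT) and also HH:MM:SS.mmm (WebVTT-style separator).
_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _seconds(stamp: str) -> float:
    """Convert a subtitle timestamp into seconds."""
    m = _TIME.fullmatch(stamp.strip())
    if m is None:
        raise ValueError(f"bad timestamp: {stamp!r}")
    hours, minutes, secs, millis = map(int, m.groups())
    return hours * 3600 + minutes * 60 + secs + millis / 1000


def parse_srt(text: str) -> list[tuple[float, float, str]]:
    """Return (start_seconds, end_seconds, text) for each cue in an SRT string."""
    cues = []
    for block in re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        # The timing line is the first one containing "-->"; skip blocks without it.
        idx = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if idx is None:
            continue
        start, end = lines[idx].split("-->", 1)
        body = " ".join(line.strip() for line in lines[idx + 1:]).strip()
        cues.append((_seconds(start), _seconds(end), body))
    return cues
```

Multi-line cues are joined with a space so each cue becomes a single transcript line.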

Phase 4: FUSE

Timeline alignment across sources
Content deduplication (subtitle vs audio vs OCR)
Priority: subtitle > audio > OCR for overlapping content
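
One way to picture the fuse step is the sketch below. It assumes that "overlapping content" means segments that overlap in time and carry near-identical text; the `Segment` type, the `fuse` function, and the 0.8 similarity threshold are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

# Lower number wins: subtitle > audio > OCR, as stated above.
PRIORITY = {"subtitle": 0, "audio": 1, "ocr": 2}
SIMILARITY_THRESHOLD = 0.8  # assumed cut-off for "same content"


@dataclass(frozen=True)
class Segment:
    start: float
    end: float
    text: str
    source: str  # "subtitle", "audio", or "ocr"


def _overlaps(a: Segment, b: Segment) -> bool:
    """True when the two time ranges intersect."""
    return a.start < b.end and b.start < a.end


def _similar(a: Segment, b: Segment) -> bool:
    """True when the two texts are near-identical, ignoring case."""
    ratio = SequenceMatcher(None, a.text.lower(), b.text.lower()).ratio()
    return ratio >= SIMILARITY_THRESHOLD


def fuse(segments: list[Segment]) -> list[Segment]:
    """Drop lower-priority duplicates, then return segments in timeline order."""
    kept: list[Segment] = []
    for seg in sorted(segments, key=lambda s: PRIORITY[s.source]):
        if not any(_overlaps(seg, k) and _similar(seg, k) for k in kept):
            kept.append(seg)
    return sorted(kept, key=lambda s: s.start)
```

Because segments are visited in priority order, a subtitle line always survives over an audio or OCR line that repeats it, while on-screen text with different content is kept alongside the speech.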

Key Insight: Don't OCR every frame. Scene detection cuts a 1-hour video at 30 fps from ~108,000 frames to 50-200 keyframes.
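
The ~108,000 figure corresponds to 3,600 seconds at 30 frames per second. After scene detection, near-duplicate keyframes can be dropped by comparing perceptual hashes. The sketch below assumes each frame has already been reduced to an integer hash (for example by a library such as imagehash); `dedupe_frames` and the `max_distance` default are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def dedupe_frames(frames: list[tuple[float, int]], max_distance: int = 5) -> list[float]:
    """Keep timestamps of frames whose hash differs enough from every kept frame.

    `frames` holds (timestamp_seconds, perceptual_hash) pairs in timeline order.
    """
    kept: list[tuple[float, int]] = []
    for timestamp, frame_hash in frames:
        if all(hamming(frame_hash, k) > max_distance for _, k in kept):
            kept.append((timestamp, frame_hash))
    return [timestamp for timestamp, _ in kept]


# Frame count for a 1-hour video at 30 fps.
FRAMES_PER_HOUR_AT_30FPS = 3600 * 30
```

A smaller `max_distance` keeps more frames; a larger one treats more frames as duplicates.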

MCP Tools Available

convert_file_to_markdown - Convert a single file
convert_directory_to_markdown - Batch convert entire directories
convert_youtube_to_markdown - Extract YouTube transcripts
get_supported_formats - List all supported formats

Usage Examples

Example 1: Convert Single File

# Using MCP tool
result = await convert_file_to_markdown(
    file_path="/path/to/document.pdf",
    output_dir="/path/to/output",
    output_name="document.md"
)

Example 2: Batch Convert Directory

# Using MCP tool
result = await convert_directory_to_markdown(
    source_dir="/path/to/documents",
    target_dir="/path/to/markdown-output",
    preserve_structure=True,
    skip_patterns=["*.tmp", "draft_*"]
)

Example 3: YouTube Transcript

# Using MCP tool
result = await convert_youtube_to_markdown(
    url="https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    output_path="./transcript.md"
)

Example 4: Video with On-Screen Text

# Video intelligent routing is automatic
result = await convert_file_to_markdown(
    file_path="/path/to/tutorial.mp4",
    output_dir="/path/to/output"
)
# Output includes:
# - Audio transcription (if speech detected)
# - On-screen text via OCR (if detected)
# - Extracted keyframes saved to <video>_frames/

Output Format

The converter produces clean, structured Markdown:

# Document Title

## Section 1

Content with preserved structure...

### Subsection

- Lists are preserved
- Tables converted to Markdown tables
- Headers maintain hierarchy

| Column A | Column B |
|----------|----------|
| Data     | Data     |

Video Output Format

# Video Transcript: tutorial

## Metadata

- **Source**: `tutorial.mp4`
- **Duration**: 00:15:32
- **Strategy**: hybrid
- **Audio transcribed**: true
- **Subtitles extracted**: false
- **Frames analyzed**: 47
- **Frames with text**: 32

## Transcript

### Audio Transcription

- [00:00:01] Welcome to this tutorial on...
- [00:00:15] Let's start by opening the editor...

### On-Screen Text (OCR)

- [00:01:23] npm install anything-to-md
- [00:02:45] const converter = new AnythingToMD()

## Extracted Frames

Frames saved to: `tutorial_frames/`
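
The timestamped lines in the sample above could be produced by a helper along these lines. This is a sketch only; `format_timestamp` and `render_lines` are hypothetical names, and fractional seconds are truncated by assumption.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as HH:MM:SS, truncating fractions."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_lines(segments: list[tuple[float, str]]) -> str:
    """Render (start_seconds, text) pairs as Markdown list items."""
    return "\n".join(f"- [{format_timestamp(start)}] {text}" for start, text in segments)
```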

Configuration

Environment Variables

ANYTHING_TO_MD_ENABLE_PLUGINS=true - Enable MarkItDown plugins

Skip Patterns

Default patterns to skip:

.git/, __pycache__/, node_modules/
.venv/, venv/
*.pyc, *.pyo
.DS_Store, Thumbs.db

Add custom patterns via CLI:

anything-to-md dir ./src ./output --skip "*.test.*" --skip "temp_*"
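
How skip patterns are matched is not specified here. A plausible reading, sketched below with the standard library, is that directory names are matched exactly and file patterns are shell-style globs applied to the file name. `should_skip` is a hypothetical helper, not part of the tool's API.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Defaults taken from the list above, split into directory and file patterns.
DEFAULT_SKIP_DIRS = {".git", "__pycache__", "node_modules", ".venv", "venv"}
DEFAULT_SKIP_FILES = ["*.pyc", "*.pyo", ".DS_Store", "Thumbs.db"]


def should_skip(rel_path: str, extra_patterns: tuple[str, ...] = ()) -> bool:
    """Return True if a relative POSIX-style path matches any skip rule."""
    path = PurePosixPath(rel_path)
    # Skip anything located inside (or named as) a default-skipped directory.
    if any(part in DEFAULT_SKIP_DIRS for part in path.parts):
        return True
    patterns = [*DEFAULT_SKIP_FILES, *extra_patterns]
    return any(fnmatchcase(path.name, pattern) for pattern in patterns)
```

With the custom patterns from the CLI example, `should_skip("src/app.test.js", ("*.test.*", "temp_*"))` returns `True`.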

Dependencies

Required

markitdown - Microsoft's document converter
pypdf - PDF fallback

PDF Enhancement

mineru (CLI) - High-quality OCR for PDFs

Video Processing

ffmpeg / ffprobe - Video analysis and extraction
rapidocr-onnxruntime - OCR engine
faster-whisper - Audio transcription
scenedetect + opencv-python - Scene detection
imagehash + Pillow - Frame deduplication

Integration

With Claude Code

The MCP server can be configured in Claude Code settings:

{
  "mcpServers": {
    "anything-to-md": {
      "command": "uvx",
      "args": ["anything-to-md-mcp"]
    }
  }
}

With AionUI

Add to AionUI MCP configuration:

mcp_servers:
  - name: anything-to-md
    command: python -m anything_to_md.mcp_server
    enabled: true

Troubleshooting

File not converting?

Check if format is supported: anything-to-md formats
Ensure file isn't corrupted
Check error message in conversion report

YouTube extraction failing?

Ensure yt-dlp is installed: pip install yt-dlp
Video may not have subtitles/transcript available
Check if video is age-restricted

Video OCR missing text?

Ensure rapidocr-onnxruntime is installed: pip install rapidocr-onnxruntime
Hard-coded (burned-in) subtitles may be too small for OCR to read reliably
Try increasing scene detection sensitivity in code so that more keyframes reach OCR

Large directory taking too long?

Use --skip patterns to exclude unnecessary files
Consider processing in smaller batches
