Tree Formatting & Visualization
Conventions for rendering phylogenetic trees using ggtree (R/Bioconductor) or iTOL (Interactive Tree of Life, web-based).
Step 0: Choose Rendering Backend
Ask the user which backend to use based on their needs:
| Backend | Best for | Output | Language |
|---|---|---|---|
| ggtree | Publication figures, full programmatic control, offline use | PDF/PNG/SVG | R (.qmd script) |
| iTOL | Interactive exploration, quick iteration, web sharing, UI tweaking | Web + PDF/SVG/PNG exports | R (.qmd annotations) + Python (.qmd upload) |
Backend comparison
| Feature | ggtree | iTOL |
|---|---|---|
| Interactive exploration | No | Yes (web UI) |
| Label alignment control | Full (programmatic) | Limited (UI toggle only, not via API) |
| Collapse triangle labels | Manual geom_text() | Built-in LABELS for internal nodes |
| Circular label positioning | Complex (manual angle computation) | Automatic |
| Branch length display | Yes (phylogram/cladogram toggle) | Yes (via UI) |
| Offline/reproducible | Fully offline | Requires iTOL API + internet |
| Two-script workflow | No (single .qmd) | Yes (R .qmd annotations + Python .qmd upload) |
Step 1: Choose the Tree Type
Help the user select the right visualization. Ask about purpose and tree size, then recommend from the options below.
Tree type options
| Type | Best for | Tip count | Key features |
|---|---|---|---|
| Collapsed rectangular phylogram | Large family trees; showing branch-length variation and gene family structure | 250-2000+ | Collapsed pure clades, branch lengths, selective labels |
| Collapsed rectangular cladogram | Large family trees; topology focus, cleaner labels | 250-2000+ | Same as phylogram but no branch lengths, narrower page |
| Collapsed circular | Large trees; compact overview showing overall structure | 250-2000+ | Circular layout, collapsed clades, optional selective labels |
| Simple rectangular phylogram | Small-medium trees where all tips are readable | < 250 | All tips labeled, no collapsing needed |
| Unrooted | Networks, showing relationships without root assumption | Any | No directionality implied |
Decision flow
- How many tips?
- < 250: Simple rectangular (all tips labeled)
- 250+: Collapsed rectangular or circular — ask user preference
- Branch lengths meaningful?
- Yes -> phylogram option available
- No / topology-only -> cladogram
- Layout: Rectangular or circular? Often useful to produce both.
- Both phylogram and cladogram? Often useful to produce both for rectangular trees.
- Which species to highlight? -> Focal species list (see Step 2)
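The decision flow above can be sketched as a small helper. This is an illustrative Python sketch, not part of the templates (which are R): the function name `recommend_tree_types` and the returned dictionary shape are assumptions made for this example, and the 250-tip cutoff is taken from the table above.

```python
def recommend_tree_types(n_tips, branch_lengths_meaningful, large_tree_threshold=250):
    """Return candidate layouts and branch modes for a tree (hypothetical helper).

    n_tips: number of tips in the tree.
    branch_lengths_meaningful: False for topology-only trees.
    large_tree_threshold: tip count at which collapsing is recommended.
    """
    if n_tips < large_tree_threshold:
        # Small-medium trees: every tip can be labeled, no collapsing needed.
        layouts = ["simple rectangular"]
    else:
        # Large trees: offer both collapsed layouts and let the user choose.
        layouts = ["collapsed rectangular", "collapsed circular"]

    if branch_lengths_meaningful:
        # Phylogram is available; producing both is often useful.
        branch_modes = ["phylogram", "cladogram"]
    else:
        branch_modes = ["cladogram"]

    return {"layouts": layouts, "branch_modes": branch_modes}


if __name__ == "__main__":
    print(recommend_tree_types(1200, branch_lengths_meaningful=True))
```

The helper only narrows the options; the final choice (and the focal species list) still comes from the user prompts in Step 2.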
Step 2: User Prompts (Ask Before Building)
Gather these decisions before writing any code:
- Rendering backend: ggtree or iTOL? (see Step 0)
- Tree type: Offer the relevant options from the table above based on tip count
- Collapsing strategy: "Should pure clades be collapsed? (Recommended for trees with >100 tips.)"
  - Which groups to collapse? The collapse_groups parameter controls which taxonomic groups are eligible. Common choices:
    - c("Bilateria"): only collapse bilaterians (keeps sponges/cnidarians expanded)
    - c("Bilateria", "Protostomia", "Deuterostomia"): collapse specific groups
    - NULL: all groups eligible for collapsing
  - Purity threshold: 100% pure (strict) or 90%+ (relaxed)?
  - Model species on triangles: Collapsed triangles automatically list gene names of model species (human, mouse, fly, worm) inside them, e.g., "Bilateria (36 tips: LAMA1, LAMA2, LAMB1)". This ensures key gene family members remain visible even when the clade is collapsed.
  - Never collapse by gene family unless eggNOG orthogroup data is available
- Labeling level: "What level of tip labeling do you want?"
- No labels — branch colors only (good for overview figures)
- All tips labeled — every visible tip gets a label (good for small trees)
- Selective — model species + focal species only (recommended for large trees)
- Focal species list (if selective labeling): "Which non-model species should be individually labeled? Typically sponges + species with single-cell data (e.g., Hydra, Nematostella). Provide full species names."
- Rooting strategy: "Midpoint root, or specify an outgroup?"
- Gene name resolution: "Do tips include model species from non-Swiss-Prot sources (e.g., tr| entries, Ensembl, FlyBase, WormBase)? If so, we need to look up gene symbols." -> See gene-lookup skill for database-specific workflows.
- iTOL project (if iTOL backend): "Which iTOL project should the tree go in? Name an existing project, or create a new one in the iTOL web UI (My Trees > New Project) and tell me the name." Set as the ITOL_PROJECT env var or hardcode in the upload script.
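The collapsing choices gathered above (eligible groups and purity threshold) combine into one per-clade decision. The sketch below shows one plausible way to express it in Python; the templates themselves are R and their exact purity computation is not shown here, so the function name `collapse_decision` and the "fraction of tips in the dominant group" definition of purity are assumptions.

```python
from collections import Counter


def collapse_decision(tip_groups, collapse_groups=None, purity_threshold=1.0):
    """Return the group a clade would collapse under, or None to keep it expanded.

    tip_groups: taxonomic group of every tip in the clade.
    collapse_groups: eligible groups; None means every group is eligible
        (mirrors NULL in the R parameter).
    purity_threshold: 1.0 for strict purity, 0.9 for the relaxed option.
    """
    if not tip_groups:
        return None

    # Assumed purity definition: share of tips in the most common group.
    group, count = Counter(tip_groups).most_common(1)[0]
    purity = count / len(tip_groups)

    if purity < purity_threshold:
        return None  # too mixed to collapse
    if collapse_groups is not None and group not in collapse_groups:
        return None  # pure enough, but this group is not eligible
    return group


if __name__ == "__main__":
    clade = ["Bilateria"] * 9 + ["Porifera"]
    print(collapse_decision(clade, ["Bilateria"], purity_threshold=0.9))
```

With the strict default, the same ten-tip clade stays expanded because one sponge tip breaks purity; passing `collapse_groups=None` makes every group eligible.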
Step 3a: Build with ggtree
All ggtree templates are Quarto .qmd documents following the project's data
science conventions (YAML frontmatter with status field, git hash, BUILD_INFO.txt).
Collapsed rectangular (phylogram / cladogram)
Reference template: ~/.claude/skills/tree-formatting/templates/ggtree/collapsed_rectangular.qmd
This template is a complete, runnable .qmd with all tuned style parameters. Copy it
into the project's scripts/ directory and adapt the sections marked PROJECT-SPECIFIC:
- File paths
- Tip label parsing functions (must match actual label formats in the tree)
- Taxonomy mapping (species -> group)
- Model and focal species lists
- collapse_groups parameter (which taxonomic groups to collapse)
The template handles: tree loading, midpoint rooting, pure-clade collapsing by taxonomic group, branch coloring by taxonomy, all visible tips labeled, model species gene names on collapsed triangle labels, formula-based page sizing, and PDF output.
Key features:
- No branch capping — branch lengths are never manipulated (this is a hard rule)
- Formula-based page sizing: INCHES_PER_TIP = 0.12, height = max(8, n_visible * INCHES_PER_TIP)
- collapse_groups parameter: controls which taxonomic groups are eligible for collapsing (e.g., c("Bilateria") to only collapse bilaterians, or NULL for all)
- Model species gene names on triangles: collapsed labels show "Group (N tips: GENE1, GENE2, ...)" so key gene family members remain visible
- Collapse label positioning: labels at max(pre_data$x[tip_ids]) (triangle tip), not at internal node x (triangle base)
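Two of the features above reduce to simple arithmetic and string formatting. The following Python sketch restates them outside the R template for illustration only. The function names are hypothetical, `n_visible` is taken as given (how the template counts visible rows is not shown here), and the gene-less label fallback is an assumption.

```python
INCHES_PER_TIP = 0.12      # from the template's named constants
MIN_PAGE_HEIGHT_IN = 8     # lower bound used in the sizing formula


def page_height_inches(n_visible):
    """Page height = max(8, n_visible * INCHES_PER_TIP)."""
    return max(MIN_PAGE_HEIGHT_IN, n_visible * INCHES_PER_TIP)


def triangle_label(group, n_tips, model_genes):
    """Format a collapsed-triangle label as "Group (N tips: GENE1, GENE2, ...)".

    When no model species genes fall inside the clade, this sketch drops the
    gene list (an assumption; the template's behavior is not documented here).
    """
    if not model_genes:
        return f"{group} ({n_tips} tips)"
    return f"{group} ({n_tips} tips: {', '.join(model_genes)})"


if __name__ == "__main__":
    print(page_height_inches(40))    # small tree: the 8-inch floor applies
    print(page_height_inches(500))   # large tree: height scales with rows
    print(triangle_label("Bilateria", 36, ["LAMA1", "LAMA2", "LAMB1"]))
```

Keeping both values as named constants matches the rule that style parameters live at the top of each template.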
Collapsed circular (overview and/or labeled)
Reference template: ~/.claude/skills/tree-formatting/templates/ggtree/collapsed_circular.qmd
Same structure as rectangular — adapt PROJECT-SPECIFIC sections. Produces:
- Circular overview (no labels): 20" square page, branch colors only
- Circular labeled (selective labels): 28" square page, manually positioned labels
Critical circular gotcha: Labels must be positioned BEFORE collapse() is called.
The template handles this by computing angles from y-position (y / max_y * 360),
flipping text on the left half of the circle, and using geom_text() with explicit
angle/hjust values instead of geom_tiplab2().
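The angle computation described above is easy to get subtly wrong, so here is a minimal sketch of the arithmetic in Python (the template does this in R). The function name, the exact 90/270 degree boundaries for the "left half", and the hjust values of 0 and 1 are assumptions chosen to illustrate the idea, not a copy of the template.

```python
def circular_label_placement(y, max_y):
    """Return (text_angle, hjust) for a label on a circular tree.

    y: the tip's y-position, captured before any clade is collapsed.
    max_y: the largest y-position in the tree.
    """
    # Map the y-position onto the circle, as in y / max_y * 360.
    angle = y / max_y * 360.0

    if 90.0 < angle < 270.0:
        # Left half of the circle: rotate the text by 180 degrees so it is
        # not upside down, and right-align it so it still extends outward.
        return (angle + 180.0) % 360.0, 1.0

    # Right half: text reads outward as-is, left-aligned.
    return angle, 0.0


if __name__ == "__main__":
    for y in (25, 50, 75):
        print(y, circular_label_placement(y, max_y=100))
```

The resulting angle and hjust pairs are what would be passed as explicit aesthetics to geom_text() in place of geom_tiplab2().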
Other tree types
For simple rectangular or unrooted trees, no template exists yet. Build from ggtree basics:
# Assumes `tree` is a phylo object already loaded (e.g., via ape::read.tree())
library(ggtree)
# Simple rectangular (all tips labeled)
p <- ggtree(tree) + geom_tiplab(size = 2)
# Unrooted
p <- ggtree(tree, layout = "unrooted")
All style parameters are defined as named constants at the top of each template
(e.g., BRANCH_LINE_WIDTH, LABEL_SIZE, INCHES_PER_TIP). Do not scatter
magic numbers through the code.
Step 3b: Build with iTOL
Two-script workflow
iTOL requires separate R and Python steps (do not mix in one .qmd):
- R script — generates annotation fi