scientific-visualization
Overview
Effective scientific visualization communicates data clearly, honestly, and accessibly. Poor chart choices, misleading axes, or inaccessible color palettes can obscure findings or introduce bias. This guide covers the full workflow of scientific figure preparation, from choosing a chart type that fits your data structure to color theory, accessibility, and journal formatting requirements.
Key Concepts
Chart Type and Data Type Alignment
Every chart type is optimized for a specific data structure. Mismatches (e.g., pie charts for continuous distributions, bar charts for time series) hide structure and distort perception.
| Data Type | Recommended Chart | Avoid |
|---|---|---|
| Continuous distribution (1 group) | Histogram, violin plot, ridge plot | Bar chart with mean only |
| Continuous distribution (2–5 groups) | Violin + boxplot overlay, beeswarm | Grouped bar chart |
| Two continuous variables, correlation | Scatter plot, hexbin (large N) | Line chart without temporal order |
| Categorical counts / proportions | Bar chart (horizontal for long labels) | Pie chart (>4 categories) |
| Change over time (continuous) | Line chart | Bar chart |
| Change over time (sparse events) | Step chart, event raster | Connected scatter |
| Part-to-whole (≤5 parts) | Stacked bar, waffle chart | 3D pie chart |
| High-dimensional (>5 variables) | Heatmap (clustered), parallel coordinates | 3D scatter |
| Spatial data | Map, spatial heatmap | Bubble chart |
| Survival / time-to-event | Kaplan-Meier curve | Bar chart of median survival |
Color Theory for Science
Color encodes information. Misused color introduces artifacts and fails readers with color vision deficiency (CVD; ~8% of males).
Sequential palettes encode ordered numeric data from low to high (e.g., expression level, concentration). Use perceptually uniform palettes: viridis, magma, cividis. Their lightness increases monotonically, so they stay readable when printed in grayscale.
Diverging palettes encode data with a meaningful midpoint (e.g., fold-change centered at 0, correlation from -1 to +1). Use RdBu, coolwarm, or vlag. Always ensure the midpoint maps to white/neutral.
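The midpoint rule matters most when the data range is asymmetric, because default normalization then maps the neutral color to the middle of the range rather than to zero. The sketch below is one way to pin the midpoint in matplotlib using `TwoSlopeNorm`. The data are synthetic, and the range (-1.5 to 3.0) and figure size are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import TwoSlopeNorm

# Synthetic log2 fold-change matrix, deliberately asymmetric around 0.
rng = np.random.default_rng(0)
log2fc = rng.uniform(-1.5, 3.0, size=(10, 10))

# Pin 0 to the center of the colormap so the neutral color means "no change".
norm = TwoSlopeNorm(vmin=-1.5, vcenter=0.0, vmax=3.0)

fig, ax = plt.subplots(figsize=(3.35, 2.8))
im = ax.imshow(log2fc, cmap="RdBu_r", norm=norm)
fig.colorbar(im, ax=ax, label="log2 fold change")
```

`TwoSlopeNorm` scales the two sides of the midpoint independently, so equal color steps no longer mean equal data steps on both sides. If that matters for your figure, an alternative is to set symmetric limits (`vmin=-v`, `vmax=v`) with the default normalization.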
Qualitative palettes encode unordered categories. Use Okabe-Ito (CVD-safe), tab10 (matplotlib default), or ColorBrewer qualitative palettes. Limit to ≤8 distinguishable colors; use shape or pattern as redundant encoding beyond that.
Color don'ts:
- Rainbow/jet colormap: not perceptually uniform; creates false contours
- Red vs. green encoding: hard to distinguish for readers with deutan (red-green) deficiencies (~6% of males)
- Saturated color for background or large areas
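As a minimal sketch of the qualitative-palette advice, the snippet below installs the Okabe-Ito colors as matplotlib's default color cycle, so every subsequent plot uses them without per-call color arguments. The hex values are the commonly published Okabe-Ito set. The name `OKABE_ITO` and the ordering (black last) are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from cycler import cycler

# Okabe-Ito palette: orange, sky blue, bluish green, yellow,
# blue, vermillion, reddish purple, black.
OKABE_ITO = [
    "#E69F00", "#56B4E9", "#009E73", "#F0E442",
    "#0072B2", "#D55E00", "#CC79A7", "#000000",
]

# Make the palette the default cycle for all new axes.
plt.rcParams["axes.prop_cycle"] = cycler(color=OKABE_ITO)

fig, ax = plt.subplots()
for i in range(3):
    ax.plot([0, 1], [i, i + 1], label=f"group {i}")
ax.legend()

# Colors actually assigned to the three lines, in plotting order.
line_colors = [line.get_color() for line in ax.get_lines()]
```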
Figure Composition and Layout
Scientific figures are typically multi-panel. Panel layout and labeling affect how readers parse information.
- Panel labels: Bold uppercase letters (A, B, C) in the top-left corner; use 8–12 pt in the figure, larger in the caption reference.
- Alignment: Align panel edges on a grid. Unaligned panels signal lack of attention to detail.
- White space: Leave adequate margins; crowded panels reduce readability.
- Figure size: Design for the target column width. Cell Press journals, for example, use single column (~85 mm / 3.35 in), 1.5 column (~114 mm / 4.5 in), or double column (~170 mm / 6.7 in); Nature-family journals use 89 mm (single) and 183 mm (double).
- Font: Sans-serif (Arial, Helvetica) at 6–8 pt, measured at final publication size rather than on screen; confirm the minimum against the target journal's guidelines.
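The layout points above can be sketched in matplotlib as follows. This is a minimal example under stated assumptions: an 85 mm single-column width, a 2:1 aspect ratio, 7 pt body text, and 9 pt panel labels. The helper `mm_to_in` and the label offsets are hypothetical and will need tuning per figure.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

MM_PER_INCH = 25.4

def mm_to_in(mm):
    """Convert millimetres to inches, the unit matplotlib's figsize expects."""
    return mm / MM_PER_INCH

# Sans-serif text at a size that is legible at final print size.
plt.rcParams.update({
    "font.family": "sans-serif",
    "font.sans-serif": ["Arial", "Helvetica", "DejaVu Sans"],
    "font.size": 7,
})

# Create the canvas at the final column width (assumed: 85 mm single column).
width_in = mm_to_in(85)
fig, axes = plt.subplots(
    1, 2, figsize=(width_in, width_in * 0.5), constrained_layout=True
)

# Bold uppercase panel labels placed above the top-left corner of each panel.
for ax, label in zip(axes, "AB"):
    ax.plot([0, 1], [0, 1])
    ax.text(-0.25, 1.05, label, transform=ax.transAxes,
            fontsize=9, fontweight="bold", va="bottom", ha="left")
```

Because the canvas is created at print size, the point sizes set here are the sizes the reader will see, with no rescaling step in between.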
Journal Formatting Requirements
Major journals specify exact figure requirements for submission. Figures that violate them are often returned for correction before review, which delays the submission.
| Journal/Style | Max Width | Resolution | Color Mode | Font | File Format |
|---|---|---|---|---|---|
| Nature family | 89 mm (1-col), 183 mm (2-col) | 300 dpi (photos), 600 dpi (line art) | RGB or CMYK | Arial 5–7 pt | PDF, TIFF, EPS |
| Cell/iScience | 85 mm (1-col), 170 mm (2-col) | 300 dpi raster, 600 dpi halftone | RGB | Helvetica 6–8 pt | PDF, EPS, TIFF |
| ACS journals | 3.25 in (1-col), 7 in (2-col) | 600 dpi (color), 1200 dpi (b&w line art) | RGB (screen), CMYK (print) | Arial/Helvetica 4.5–7 pt | TIFF, EPS, PDF |
| PLOS ONE | No strict width | 300 dpi (raster), 600–1200 dpi (line art) | RGB | Any | TIFF, EPS, PDF |
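
When a journal requires raster output, the width and resolution columns together fix the minimum pixel width of the exported image. The sketch below does that arithmetic. The helper `required_pixels` is hypothetical, and the example values are copied from the table above; verify them against the journal's current author guidelines.

```python
import math

MM_PER_INCH = 25.4

def required_pixels(width_mm, dpi):
    """Minimum pixel width for a raster figure printed at width_mm and dpi."""
    return math.ceil(width_mm / MM_PER_INCH * dpi)

# Example values taken from the table above (Nature family).
single_col_photo = required_pixels(89, 300)     # 1-column photo
single_col_lineart = required_pixels(89, 600)   # 1-column line art
double_col_photo = required_pixels(183, 300)    # 2-column photo

print(single_col_photo, single_col_lineart, double_col_photo)
# prints: 1052 2103 2162
```

Rounding up rather than to the nearest pixel ensures the image never falls below the stated resolution at the printed width.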
Decision Framework
Use this tree to select the right visualization for your analysis goal:
What is the primary message of this figure?
|
+-- Show a distribution or spread of values
| +-- One group --> Histogram or violin plot
| +-- 2-5 groups --> Violin + jitter (show all points if N < 100)
| +-- Many groups --> Ridge plot (joy plot)
|
+-- Compare quantities between categories
| +-- Few categories (2-5) --> Bar chart with error bars + individual points
| +-- Many categories (>8) --> Lollipop chart or dot plot (horizontal)
| +-- Paired measurements --> Slopegraph or paired dot plot
|
+-- Show a relationship between two continuous variables
| +-- N <= 1000 --> Scatter plot
| +-- N > 1000 --> Hexbin or 2D density plot
| +-- Time ordered --> Line chart
|
+-- Show composition or part-to-whole
| +-- 2-4 parts --> Stacked bar or waffle chart
| +-- Over time --> Stacked area chart
| +-- Avoid pie chart unless <= 3 parts and proportions are obvious
|
+-- Show high-dimensional data
| +-- Genes x samples --> Clustered heatmap (seaborn.clustermap)
| +-- Embeddings (UMAP, PCA) --> Scatter colored by metadata
| +-- Feature importance --> Horizontal bar chart (sorted)
|
+-- Show spatial or geographic data
| +-- Microscopy --> Image overlay with colorbar
| +-- Geographic --> Choropleth map
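
One branch of the tree, the relationship between two continuous variables, can be sketched as code. This is an illustration only: the function names are hypothetical, the data are synthetic, and the 1000-point threshold is taken from the tree as a rule of thumb rather than a hard limit.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

def choose_relationship_chart(n_points, time_ordered=False):
    """Return the chart kind the tree suggests for two continuous variables."""
    if time_ordered:
        return "line"
    return "hexbin" if n_points > 1000 else "scatter"

def plot_relationship(ax, x, y, time_ordered=False):
    """Draw x vs. y on ax using the suggested chart kind and return that kind."""
    kind = choose_relationship_chart(len(x), time_ordered)
    if kind == "line":
        ax.plot(x, y, lw=1)
    elif kind == "hexbin":
        # Bin counts replace overplotted markers when N is large.
        hb = ax.hexbin(x, y, gridsize=40, cmap="viridis", mincnt=1)
        ax.figure.colorbar(hb, ax=ax, label="count")
    else:
        ax.scatter(x, y, s=6)
    return kind

# Synthetic correlated data with N = 5000.
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = 0.6 * x + rng.normal(scale=0.8, size=5000)

fig, ax = plt.subplots(figsize=(3.35, 2.8))
kind = plot_relationship(ax, x, y)
```
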
| Analysis Goal | Chart Type | Library | Key Consideration |
|---|---|---|---|
| Gene expression across groups | Violin + jitter | seaborn, plotnine | Show all points if N < 50; never bar+SEM only |
| Differential expression | Volcano plot | matplotlib | Log2FC on x-axis, -log10(p) on y-axis |
| Clustering results | UMAP scatter | scanpy, matplotlib | One plot per annotation variable |
| Correlation matrix | Clustered heatmap | seaborn.clustermap | Use diverging palette centered at 0 |
| Protein structure | Ribbon diagram | PyMOL, ChimeraX | Not covered here — use dedicated molecular graphics tools |
| Survival analysis | Kaplan-Meier | lifelines | Include confidence bands and at-risk table |
| Time course | Line chart with CI | matplotlib | Show uncertainty; connect group means, not individual points |
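
As one worked row from the table, here is a minimal volcano plot sketch in matplotlib, with log2 fold change on the x-axis and -log10(p) on the y-axis. The data are synthetic, the function name `volcano_plot` is hypothetical, and the cutoffs (|log2FC| >= 1, p < 0.05) are illustrative assumptions, not recommendations from this guide.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

def volcano_plot(log2fc, pvals, fc_cut=1.0, p_cut=0.05):
    """Scatter log2FC against -log10(p) and highlight points passing both cutoffs."""
    neg_log10_p = -np.log10(pvals)
    sig = (np.abs(log2fc) >= fc_cut) & (pvals < p_cut)

    fig, ax = plt.subplots(figsize=(3.35, 3.0))
    ax.scatter(log2fc[~sig], neg_log10_p[~sig], s=4, color="#999999",
               label="not significant")
    ax.scatter(log2fc[sig], neg_log10_p[sig], s=4, color="#D55E00",
               label="significant")

    # Dashed guide lines at the cutoffs.
    ax.axhline(-np.log10(p_cut), ls="--", lw=0.5, color="black")
    ax.axvline(-fc_cut, ls="--", lw=0.5, color="black")
    ax.axvline(fc_cut, ls="--", lw=0.5, color="black")

    ax.set_xlabel("log2 fold change")
    ax.set_ylabel("-log10(p-value)")
    ax.legend(frameon=False, fontsize=6)
    return fig, ax, sig

# Synthetic inputs standing in for real differential-expression results.
rng = np.random.default_rng(0)
log2fc = rng.normal(0.0, 1.5, size=2000)
pvals = rng.uniform(1e-6, 1.0, size=2000)

fig, ax, sig = volcano_plot(log2fc, pvals)
```

In a real analysis the p-values would normally be adjusted for multiple testing before thresholding, and the caption should state which adjustment was used.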
Best Practices
- Show the data, not just summaries: For N < 100, overlay individual data points on violin or box plots using jitter or beeswarm. Bar charts with only mean ± SEM conceal distribution shape, outliers, and bimodality.
- Choose CVD-safe color palettes by default: Use Okabe-Ito for categories and viridis or cividis for sequential data. Test your figure with a CVD simulator (e.g., Coblis) before submission.
- Design at final publication size from the start: Set your figure canvas to the exact column width of the target journal (e.g., 89 mm for Nature single-column). Rescaling after the fact makes fonts too small or too large, and changes aspect ratios.
- Label axes with units and use descriptive titles: Every axis must have a label with units in parentheses (e.g., "Expression level (log2 CPM)"). Avoid cryptic abbreviations without legend entries.
- Use vector formats for line art and text: Save figures as PDF or SVG when they contain text and lines. Rasterize only when submitting to a journal that requires TIFF. Vector figures scale without pixelation and remain editable.
- Match statistical annotations to the test performed: If you annotate significance stars (*), state in the caption which test was used, the exact p-value, and the sample size. "n.s." should still report t