Seaborn — Statistical Plots
Overview
Seaborn is a Python library for statistical data visualization built on top of matplotlib. It works directly with pandas DataFrames, automatically handles grouping by categorical variables, computes confidence intervals and kernel density estimates, and produces attractive publication-ready figures with minimal configuration. Seaborn separates axes-level functions (embeddable in custom layouts) from figure-level functions (with built-in faceting), enabling both quick exploratory analysis and structured multi-panel figures.
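The axes-level versus figure-level split can be sketched as follows. The toy DataFrame and its column names (condition, batch, value) are made up for illustration; boxplot stands in for any axes-level function and catplot for any figure-level one.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical toy data: two conditions measured in two batches
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "condition": np.tile(["control", "treated"], 40),
    "batch": np.repeat(["batch1", "batch2"], 40),
    "value": rng.normal(5.0, 1.0, 80),
})

# Axes-level: draws onto an Axes you created, so it can sit next to other plots
fig, axes = plt.subplots(1, 2, figsize=(8, 3))
returned_ax = sns.boxplot(data=toy, x="condition", y="value", ax=axes[0])
axes[1].hist(toy["value"], bins=15)  # plain matplotlib panel in the same figure

# Figure-level: creates its own figure and facets by the `col` variable
grid = sns.catplot(data=toy, x="condition", y="value", col="batch",
                   kind="box", height=3, aspect=1)
```

The axes-level call returns the Axes it drew on, while the figure-level call returns a FacetGrid that owns a separate figure, which is why figure-level functions do not take an ax= argument.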
When to Use
- Comparing gene expression, protein abundance, or measurement distributions across experimental conditions (treatment vs. control, cell lines, time points)
- Generating grouped box plots, violin plots, or strip plots to show both summary statistics and individual data points simultaneously
- Visualizing pairwise correlations in multi-gene or multi-feature datasets as annotated heatmaps
- Plotting regression fits with confidence bands between continuous variables (e.g., cell viability vs. drug concentration)
- Faceting a single plot type across multiple sample subsets, tissue types, or experimental batches in one call
- Rapid exploratory analysis of a new dataset using pairplot to survey all pairwise relationships at once
- Use matplotlib directly when you need pixel-level control over figure elements, complex mixed-type layouts, or non-statistical custom plots
- Use plotly when the output must be interactive (hover tooltips, zoom, pan) or embedded in a web application
Prerequisites
- Python packages: seaborn>=0.13, matplotlib, pandas, numpy
- Data requirements: pandas DataFrame in long-form (tidy) format; each observation is a row, each variable is a column
- Environment: Standard Python environment; no GPU or special hardware required
pip install "seaborn>=0.13" matplotlib pandas numpy scipy
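If the data arrives in wide form (one column per gene, say), it can usually be reshaped into the long form described above with pandas melt. A minimal sketch, assuming a hypothetical table with sample, condition, and two gene columns:

```python
import pandas as pd

# Hypothetical wide-form table: one row per sample, one column per gene
wide_df = pd.DataFrame({
    "sample": ["s1", "s2", "s3"],
    "condition": ["control", "treated", "treated"],
    "BRCA1": [5.1, 6.0, 6.3],
    "TP53": [4.8, 5.9, 6.1],
})

# melt() stacks the gene columns into (gene, log2_expr) pairs,
# repeating the id columns for every gene
long_df = wide_df.melt(id_vars=["sample", "condition"],
                       var_name="gene", value_name="log2_expr")
print(long_df)
```

After reshaping, gene and condition can be passed directly as x= or hue= and log2_expr as y=.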
Quick Start
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Simulate gene expression across conditions
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "gene": ["BRCA1"] * 60 + ["TP53"] * 60,
    "condition": ["control", "treated"] * 60,
    "log2_expr": np.concatenate([
        rng.normal(5.2, 0.8, 60),
        rng.normal(6.1, 0.9, 60),
    ]),
})
sns.set_theme(style="ticks", context="notebook")
sns.boxplot(data=df, x="gene", y="log2_expr", hue="condition", palette="Set2")
plt.ylabel("log2 Expression")
plt.title("Gene Expression by Condition")
plt.tight_layout()
plt.savefig("quickstart_boxplot.png", dpi=150)
print("Saved quickstart_boxplot.png")
Core API
1. Distribution Plots
Visualize univariate distributions and compare them across groups. histplot bins data; kdeplot fits a smooth density estimate; displot is the figure-level wrapper that adds faceting.
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "log2_tpm": np.concatenate([rng.normal(4.5, 1.1, n), rng.normal(6.0, 1.3, n)]),
    "sample": ["tumor"] * n + ["normal"] * n,
})
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
# Histogram with density normalization and stacked hue groups
sns.histplot(data=df, x="log2_tpm", hue="sample", stat="density",
             multiple="stack", bins=30, ax=axes[0])
axes[0].set_title("Histogram (stacked)")
# KDE with fill: bandwidth controlled by bw_adjust
sns.kdeplot(data=df, x="log2_tpm", hue="sample", fill=True,
            bw_adjust=0.8, alpha=0.4, ax=axes[1])
axes[1].set_title("KDE (filled)")
# ECDF — useful for comparing cumulative distributions
sns.ecdfplot(data=df, x="log2_tpm", hue="sample", ax=axes[2])
axes[2].set_title("ECDF")
plt.tight_layout()
plt.savefig("distributions.png", dpi=150)
print("Saved distributions.png")
# Bivariate KDE: joint distribution of two continuous variables
rng = np.random.default_rng(1)
log2_rna = rng.normal(5.5, 1.2, 300)
df2 = pd.DataFrame({
    "log2_rna": log2_rna,
    # Reuse the same RNA values so protein is actually correlated with RNA
    "log2_prot": rng.normal(4.8, 1.0, 300) + 0.6 * log2_rna,
})
sns.kdeplot(data=df2, x="log2_rna", y="log2_prot",
            fill=True, levels=8, thresh=0.05, cmap="Blues")
plt.xlabel("log2 RNA (TPM)")
plt.ylabel("log2 Protein (iBAQ)")
plt.title("RNA–Protein Correlation Density")
plt.tight_layout()
plt.savefig("bivariate_kde.png", dpi=150)
print("Saved bivariate_kde.png")
2. Categorical Plots
Compare distributions or aggregated statistics across categorical groups. Axes-level functions (boxplot, violinplot, stripplot, swarmplot, barplot) accept an ax= parameter for embedding in custom layouts.
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
rng = np.random.default_rng(2)
conditions = ["DMSO", "Drug A 1uM", "Drug A 10uM", "Drug B 1uM", "Drug B 10uM"]
df = pd.DataFrame({
    "condition": np.repeat(conditions, 30),
    "viability": np.concatenate([
        rng.normal(100, 5, 30),
        rng.normal(92, 7, 30),
        rng.normal(65, 10, 30),
        rng.normal(88, 8, 30),
        rng.normal(45, 12, 30),
    ]),
})
fig, axes = plt.subplots(1, 3, figsize=(18, 5))
# Box plot: shows quartiles and outliers
# (seaborn >= 0.13 deprecates palette without hue, so hue mirrors x and the legend is hidden)
sns.boxplot(data=df, x="condition", y="viability", hue="condition",
            palette="husl", width=0.5, legend=False, ax=axes[0])
plt.setp(axes[0].get_xticklabels(), rotation=30, ha="right")
axes[0].set_title("Box Plot")
# Violin: KDE shape + inner quartile lines
sns.violinplot(data=df, x="condition", y="viability", hue="condition",
               inner="quart", palette="muted", legend=False, ax=axes[1])
plt.setp(axes[1].get_xticklabels(), rotation=30, ha="right")
axes[1].set_title("Violin Plot")
# Strip plot overlaid on box: shows all individual points
sns.boxplot(data=df, x="condition", y="viability", hue="condition",
            palette="pastel", width=0.5, legend=False, ax=axes[2])
sns.stripplot(data=df, x="condition", y="viability",
              color="black", alpha=0.4, size=3, jitter=True, ax=axes[2])
plt.setp(axes[2].get_xticklabels(), rotation=30, ha="right")
axes[2].set_title("Box + Strip")
plt.tight_layout()
plt.savefig("categorical.png", dpi=150)
print("Saved categorical.png")
# Bar plot with mean ± 95% CI and individual points (swarm)
fig, ax = plt.subplots(figsize=(8, 5))
# hue mirrors x because seaborn >= 0.13 deprecates palette without hue
sns.barplot(data=df, x="condition", y="viability", hue="condition",
            estimator="mean", errorbar="ci", palette="Set3", legend=False, ax=ax)
sns.swarmplot(data=df, x="condition", y="viability",
              color="black", size=3, alpha=0.5, ax=ax)
ax.set_ylabel("Cell Viability (%)")
plt.setp(ax.get_xticklabels(), rotation=30, ha="right")
plt.tight_layout()
plt.savefig("barswarm.png", dpi=150)
print("Saved barswarm.png")
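The error bars drawn by errorbar="ci" are bootstrap confidence intervals (95% by default). The sketch below approximates that idea in plain numpy for one hypothetical group. It illustrates the technique rather than reproducing seaborn's internal code, so the numbers will not match the plot exactly.

```python
import numpy as np

rng = np.random.default_rng(2)
viability = rng.normal(65, 10, 30)  # hypothetical single treatment group

# Resample the group with replacement and record the mean of each resample
n_boot = 1000
boot_means = np.array([
    rng.choice(viability, size=viability.size, replace=True).mean()
    for _ in range(n_boot)
])

# The 2.5th and 97.5th percentiles of the resampled means bound a 95% interval
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {viability.mean():.1f}, 95% CI = [{ci_low:.1f}, {ci_high:.1f}]")
```

Because the interval comes from random resampling, bar plots can differ slightly between runs unless a seed is passed to barplot through its seed parameter.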
3. Relational Plots
Visualize relationships between continuous variables. scatterplot and lineplot are axes-level; relplot is the figure-level wrapper that supports col and row faceting.
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
rng = np.random.default_rng(3)
n = 150
df = pd.DataFrame({
    "molecular_weight": rng.uniform(200, 800, n),
    "logP": rng.uniform(-2, 6, n),
    "pIC50": rng.normal(6.5, 1.2, n),
    "target_class": rng.choice(["kinase", "GPCR", "protease"], n),
    "pass_lipinski": rng.choice(["yes", "no"], n, p=[0.7, 0.3]),
})
# Scatter with hue (categorical color) + size (continuous) + style (marker)
sns.scatterplot(data=df, x="molecular_weight", y="pIC50",
                hue="target_class", size="logP", style="pass_lipinski",
                sizes=(30, 120), alpha=0.7)
plt.xlabel("Molecular Weight (Da)")
plt.ylabel("pIC50")
plt.title("Compound Bioactivity by Target Class")
plt.tight_layout()
plt.savefig("relational_scatter.png", dpi=150)
print("Saved relational_scatter.png")
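The figure-level relplot wrapper can facet the same kind of scatter by a categorical column. A minimal sketch, assuming the simulated compound table above with one panel per target class (the height, aspect, and title template are illustrative choices):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(3)
n = 150
df = pd.DataFrame({
    "molecular_weight": rng.uniform(200, 800, n),
    "pIC50": rng.normal(6.5, 1.2, n),
    "target_class": rng.choice(["kinase", "GPCR", "protease"], n),
    "pass_lipinski": rng.choice(["yes", "no"], n, p=[0.7, 0.3]),
})

# col= splits the data into one panel per target class; hue still colors points
g = sns.relplot(data=df, x="molecular_weight", y="pIC50",
                col="target_class", hue="pass_lipinski",
                kind="scatter", height=3, aspect=1, alpha=0.7)
g.set_axis_labels("Molecular Weight (Da)", "pIC50")
g.set_titles("{col_name}")
```

Adding row= with a second categorical column would produce a grid of panels; all panels share axes by default, which keeps the classes directly comparable.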
# Line plot with automatic mean aggregation and SD error band