Seaborn — Statistical Visualization
Overview
Seaborn is a Python visualization library for creating publication-quality statistical graphics with minimal code. It works directly with pandas DataFrames, computes statistical estimates automatically (means, confidence intervals, kernel density estimates), and ships with attractive default themes. Because it is built on matplotlib, every seaborn plot can be further customized through the matplotlib API.
When to Use
- Creating distribution plots (histograms, KDE, violin plots, box plots) for data exploration
- Visualizing relationships between variables with automatic trend fitting and confidence intervals
- Comparing distributions across categorical groups (treatment vs control, tissue types)
- Generating correlation heatmaps and clustered heatmaps
- Quick exploratory data analysis with pairplot for all pairwise relationships
- Multi-panel figures with automatic faceting by categorical variables
- For interactive plots with hover/zoom, use plotly instead
- For low-level figure control or custom layouts, use matplotlib directly
Prerequisites
pip install seaborn matplotlib pandas scipy  # scipy is required for clustermap
Quick Start
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
df = sns.load_dataset("tips")
sns.scatterplot(data=df, x="total_bill", y="tip", hue="day", style="time")
plt.title("Tips by Day and Time")
plt.tight_layout()
plt.savefig("scatter.png", dpi=150)
print("Saved scatter.png")
Core API
1. Distribution Plots
Visualize univariate and bivariate distributions.
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset("tips")
# Histogram with density normalization
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
sns.histplot(data=df, x="total_bill", hue="time", stat="density",
multiple="stack", ax=axes[0])
axes[0].set_title("Histogram")
# KDE (smooth density estimate)
sns.kdeplot(data=df, x="total_bill", hue="time", fill=True,
bw_adjust=0.8, ax=axes[1])
axes[1].set_title("KDE")
# ECDF (empirical cumulative distribution)
sns.ecdfplot(data=df, x="total_bill", hue="time", ax=axes[2])
axes[2].set_title("ECDF")
plt.tight_layout()
plt.savefig("distributions.png", dpi=150)
print("Saved distributions.png")
# Bivariate KDE with contours (new figure, so it does not draw over the ECDF panel)
plt.figure(figsize=(6, 5))
sns.kdeplot(data=df, x="total_bill", y="tip", fill=True,
levels=5, thresh=0.1, cmap="mako")
plt.title("Bivariate KDE")
plt.savefig("bivariate_kde.png", dpi=150)
2. Categorical Plots
Compare distributions or estimates across discrete categories.
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset("tips")
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
# Box plot — quartiles and outliers
sns.boxplot(data=df, x="day", y="total_bill", hue="sex",
dodge=True, ax=axes[0])
axes[0].set_title("Box Plot")
# Violin plot — KDE + quartiles
sns.violinplot(data=df, x="day", y="total_bill", hue="sex",
split=True, inner="quart", ax=axes[1])
axes[1].set_title("Violin Plot")
# Bar plot — mean with CI
sns.barplot(data=df, x="day", y="total_bill", hue="sex",
estimator="mean", errorbar="ci", ax=axes[2])
axes[2].set_title("Bar Plot (mean ± 95% CI)")
plt.tight_layout()
plt.savefig("categorical.png", dpi=150)
print("Saved categorical.png")
# Swarm plot: every observation, nudged so points do not overlap
plt.figure(figsize=(8, 5))  # fresh figure; otherwise this draws over the bar plot
sns.swarmplot(data=df, x="day", y="total_bill", hue="sex", dodge=True)
plt.title("Swarm Plot")
plt.savefig("swarm.png", dpi=150)
3. Relational Plots
Explore relationships between continuous variables.
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset("tips")
# Scatter with multiple semantic mappings
sns.scatterplot(data=df, x="total_bill", y="tip",
hue="day", size="size", style="time")
plt.title("Scatter with Multi-Encoding")
plt.savefig("relational.png", dpi=150)
# Line plot with automatic aggregation and error band (new figure, so it does not draw over the scatter)
plt.figure()
fmri = sns.load_dataset("fmri")
sns.lineplot(data=fmri, x="timepoint", y="signal",
hue="region", style="event", errorbar="sd")
plt.title("Line Plot (mean ± SD)")
plt.savefig("lineplot.png", dpi=150)
4. Regression Plots
Fit and visualize linear models.
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset("tips")
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
# Linear regression with CI band
sns.regplot(data=df, x="total_bill", y="tip", ci=95, ax=axes[0])
axes[0].set_title("Linear Regression")
# Residual plot (check model assumptions)
sns.residplot(data=df, x="total_bill", y="tip", ax=axes[1])
axes[1].set_title("Residuals")
plt.tight_layout()
plt.savefig("regression.png", dpi=150)
print("Saved regression.png")
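regplot draws the fitted line but does not return its coefficients. One option is to estimate them separately with an ordinary least-squares fit. This sketch uses `np.polyfit` on synthetic data (a hypothetical `y = 2x + 1` plus noise) and then checks that the line seaborn drew agrees with that estimate. With `ci=None`, the fit is assumed to be the only `Line2D` on the Axes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for saving files
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: y = 2x + 1 with Gaussian noise
rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 100)
df = pd.DataFrame({"x": x, "y": 2.0 * x + 1.0 + rng.normal(0, 1.5, x.size)})

# Estimate slope and intercept with an ordinary least-squares line fit
slope, intercept = np.polyfit(df["x"], df["y"], deg=1)

fig, ax = plt.subplots(figsize=(6, 4))
sns.regplot(data=df, x="x", y="y", ci=None, ax=ax)
ax.set_title(f"y = {slope:.2f}x + {intercept:.2f}")
fig.savefig("regplot_coefficients.png", dpi=150)

# The regression line seaborn drew should agree with the polyfit estimate
line = ax.lines[0]
print(np.allclose(line.get_ydata(), intercept + slope * line.get_xdata()))
```

For standard errors or p-values, a dedicated modeling library such as statsmodels is a better fit than the plotting layer.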
5. Matrix Plots
Visualize rectangular (matrix-shaped) data such as correlation matrices and pivot tables.
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
# Correlation heatmap
df = sns.load_dataset("tips")
corr = df.select_dtypes(include=[np.number]).corr()
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm",
center=0, square=True, linewidths=0.5)
plt.title("Correlation Heatmap")
plt.tight_layout()
plt.savefig("heatmap.png", dpi=150)
print("Saved heatmap.png")
# Clustered heatmap with hierarchical clustering
flights = sns.load_dataset("flights").pivot(index="month", columns="year", values="passengers")
g = sns.clustermap(flights, cmap="viridis", standard_scale=1,
figsize=(10, 8), linewidths=0.5)
g.savefig("clustermap.png", dpi=150)  # clustermap builds its own figure
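A correlation matrix is symmetric, so the full heatmap shows every value twice. A common refinement passes a boolean `mask` to `heatmap`, where True hides a cell. The sketch below hides the upper triangle and the diagonal. The four-column DataFrame is synthetic and hypothetical, with one deliberately correlated pair.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for saving files
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: four columns, with "b" built to correlate with "a"
rng = np.random.default_rng(5)
data = pd.DataFrame(rng.normal(size=(50, 4)), columns=["a", "b", "c", "d"])
data["b"] += data["a"]

corr = data.corr()
# True cells are hidden: the upper triangle plus the diagonal (always 1.0)
mask = np.triu(np.ones_like(corr, dtype=bool))

fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(corr, mask=mask, annot=True, fmt=".2f", cmap="coolwarm",
            vmin=-1, vmax=1, square=True, linewidths=0.5, ax=ax)
fig.savefig("heatmap_lower.png", dpi=150)
print(int(mask.sum()), "of", mask.size, "cells masked")
```

Fixing `vmin=-1, vmax=1` keeps the color scale comparable across different correlation matrices.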
6. Figure-Level Functions and Faceting
Create multi-panel figures with automatic faceting.
import seaborn as sns
df = sns.load_dataset("tips")
# relplot — faceted scatter/line plots
g = sns.relplot(data=df, x="total_bill", y="tip",
col="time", row="sex", hue="smoker",
kind="scatter", height=3, aspect=1.2)
g.set_axis_labels("Total Bill ($)", "Tip ($)")
g.savefig("faceted_scatter.png", dpi=150)
print("Saved faceted_scatter.png")
# catplot — faceted categorical plots
g = sns.catplot(data=df, x="day", y="total_bill",
col="time", kind="box", height=4, aspect=1)
g.set_titles("{col_name}")
g.savefig("faceted_boxplot.png", dpi=150)
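Figure-level functions return a grid object, and each of its panels is still an ordinary matplotlib Axes. `FacetGrid.axes_dict` maps each facet value to its Axes, so you can add per-panel annotations. The sketch below adds a dashed line at each group's mean response. The dose/response data are synthetic and hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for saving files
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical dose-response data for two groups
rng = np.random.default_rng(6)
df = pd.DataFrame({
    "dose": np.tile(np.linspace(0, 10, 25), 2),
    "group": np.repeat(["control", "treated"], 25),
})
slope = np.where(df["group"] == "treated", 0.8, 0.3)
df["response"] = slope * df["dose"] + rng.normal(0, 1, len(df))

g = sns.relplot(data=df, x="dose", y="response", col="group",
                kind="scatter", height=3, aspect=1.2)

# axes_dict maps each facet's column value to its matplotlib Axes
for name, ax in g.axes_dict.items():
    group_mean = df.loc[df["group"] == name, "response"].mean()
    ax.axhline(group_mean, color="gray", linestyle="--")
g.set_titles("{col_name}")
g.savefig("facet_annotations.png", dpi=150)
print(sorted(g.axes_dict))
```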
7. Exploratory Grids (pairplot, jointplot)
Quickly explore pairwise relationships and marginal distributions.
import seaborn as sns
iris = sns.load_dataset("iris")
# Pairplot — matrix of pairwise relationships
g = sns.pairplot(iris, hue="species", corner=True,
diag_kind="kde", plot_kws={"alpha": 0.6})
g.savefig("pairplot.png", dpi=150)
print("Saved pairplot.png")
# Joint plot — bivariate + marginal distributions
g = sns.jointplot(data=iris, x="sepal_length", y="petal_length",
hue="species", kind="scatter")
g.savefig("jointplot.png", dpi=150)
Key Concepts
Figure-Level vs Axes-Level Functions
Understanding this distinction is critical for composing seaborn with matplotlib:
| Feature | Axes-Level | Figure-Level |
|---|---|---|
| Examples | scatterplot, histplot, boxplot, heatmap | relplot, displot, catplot, lmplot |
| Returns | matplotlib.axes.Axes | FacetGrid / JointGrid / PairGrid |
| Faceting | Manual (create subplots yourself) | Built-in (col, row params) |
| Sizing | figsize on parent figure | height + aspect per subplot |
| Placement | ax= parameter | Cannot be placed in existing figure |
| Use when | Combining with other plot types, custom layouts | Quick faceted views, exploratory analysis |
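import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset("tips")
# Axes-level: embed in custom layout
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
sns.boxplot(data=df, x="day", y="tip", ax=axes[0])
sns.scatterplot(data=df, x="total_bill", y="tip", ax=axes[1])
plt.tight_layout()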
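The "Returns" and "Placement" rows of the table can be checked directly. The sketch below assumes synthetic data with a hypothetical `panel` column. An axes-level call hands back the same Axes it drew into, while a figure-level call creates a new Figure and returns a FacetGrid.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for saving files
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib.axes import Axes

# Hypothetical data with a column to facet on
rng = np.random.default_rng(8)
df = pd.DataFrame({
    "x": rng.normal(size=40),
    "y": rng.normal(size=40),
    "panel": np.repeat(["left", "right"], 20),
})

# Axes-level: draws into an existing Axes and returns that same Axes
fig, ax = plt.subplots()
returned = sns.scatterplot(data=df, x="x", y="y", ax=ax)
print(returned is ax, isinstance(returned, Axes))

# Figure-level: builds its own Figure and returns a grid object
g = sns.relplot(data=df, x="x", y="y", col="panel", height=3)
print(type(g).__name__, g.figure is not fig)
```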
Data Format: Long vs Wide
Seaborn strongly prefers long-form (tidy) data where each variable is a column:
# Long-form (preferred) — works with all functi