Seaborn Statistical Visualization
Overview
Seaborn is a Python visualization library for creating publication-quality statistical graphics. Use this skill for dataset-oriented plotting, multivariate analysis, automatic statistical estimation, and complex multi-panel figures with minimal code.
Design Philosophy
Seaborn follows these core principles:
- Dataset-oriented: Work directly with DataFrames and named variables rather than abstract coordinates
- Semantic mapping: Automatically translate data values into visual properties (colors, sizes, styles)
- Statistical awareness: Built-in aggregation, error estimation, and confidence intervals
- Aesthetic defaults: Publication-ready themes and color palettes out of the box
- Matplotlib integration: Full compatibility with matplotlib customization when needed
Quick Start
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Load example dataset
df = sns.load_dataset('tips')
# Create a simple visualization
sns.scatterplot(data=df, x='total_bill', y='tip', hue='day')
plt.show()
Core Plotting Interfaces
Function Interface (Traditional)
The function interface provides specialized plotting functions organized by visualization type. Each category offers axes-level functions, which draw onto a single matplotlib Axes, and figure-level functions, which manage an entire figure and support faceting.
When to use:
- Quick exploratory analysis
- Single-purpose visualizations
- When you need a specific plot type
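The axes-level versus figure-level distinction can be sketched as follows. This is a minimal illustration on synthetic data; the `demo` DataFrame and its column names are assumptions made for this example, not part of any seaborn dataset.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: two numeric columns and one three-level category
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "x": rng.normal(size=60),
    "y": rng.normal(size=60),
    "group": np.tile(["a", "b", "c"], 20),
})

# Axes-level: draws onto the Axes you pass in and returns that same Axes
fig, ax = plt.subplots()
returned_ax = sns.scatterplot(data=demo, x="x", y="y", ax=ax)

# Figure-level: creates its own figure and returns a FacetGrid,
# here with one panel per level of "group"
grid = sns.relplot(data=demo, x="x", y="y", col="group")
n_panels = grid.axes.size
```

Because axes-level functions accept `ax=`, they slot into an existing matplotlib layout, whereas figure-level functions own their figure and are customized through the returned grid object.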
Objects Interface (Modern)
The seaborn.objects interface provides a declarative, composable API similar to ggplot2. Build visualizations by chaining methods to specify data mappings, marks, transformations, and scales.
When to use:
- Complex layered visualizations
- When you need fine-grained control over transformations
- Building custom plot types
- Programmatic plot generation
from seaborn import objects as so
# Declarative syntax
(
    so.Plot(data=df, x='total_bill', y='tip')
    .add(so.Dot(), color='day')
    .add(so.Line(), so.PolyFit())
)
Plotting Functions by Category
Relational Plots (Relationships Between Variables)
Use for: Exploring how two or more variables relate to each other
- scatterplot() - Display individual observations as points
- lineplot() - Show trends and changes (automatically aggregates and computes CI)
- relplot() - Figure-level interface with automatic faceting
Key parameters:
- x, y - Primary variables
- hue - Color encoding for additional categorical/continuous variable
- size - Point/line size encoding
- style - Marker/line style encoding
- col, row - Facet into multiple subplots (figure-level only)
# Scatter with multiple semantic mappings
sns.scatterplot(data=df, x='total_bill', y='tip',
                hue='time', size='size', style='sex')
# Line plot with confidence intervals
# (`timeseries` is a placeholder: a long-form DataFrame with 'date', 'value', and 'category' columns)
sns.lineplot(data=timeseries, x='date', y='value', hue='category')
# Faceted relational plot
sns.relplot(data=df, x='total_bill', y='tip',
            col='time', row='sex', hue='smoker', kind='scatter')
Distribution Plots (Single and Bivariate Distributions)
Use for: Understanding data spread, shape, and probability density
- histplot() - Bar-based frequency distributions with flexible binning
- kdeplot() - Smooth density estimates using Gaussian kernels
- ecdfplot() - Empirical cumulative distribution (no parameters to tune)
- rugplot() - Individual observation tick marks
- displot() - Figure-level interface for univariate and bivariate distributions
- jointplot() - Bivariate plot with marginal distributions
- pairplot() - Matrix of pairwise relationships across dataset
Key parameters:
- x, y - Variables (y optional for univariate)
- hue - Separate distributions by category
- stat - Normalization: "count", "frequency", "probability", "percent", "density"
- bins / binwidth - Histogram binning control
- bw_adjust - KDE bandwidth multiplier (higher = smoother)
- fill - Fill area under curve
- multiple - How to handle hue: "layer", "stack", "dodge", "fill"
# Histogram with density normalization
sns.histplot(data=df, x='total_bill', hue='time',
             stat='density', multiple='stack')
# Bivariate KDE with contours
sns.kdeplot(data=df, x='total_bill', y='tip',
            fill=True, levels=5, thresh=0.1)
# Joint plot with marginals
sns.jointplot(data=df, x='total_bill', y='tip',
              kind='scatter', hue='time')
# Pairwise relationships (tips has no 'species' column, so this uses the iris dataset)
iris = sns.load_dataset('iris')
sns.pairplot(data=iris, hue='species', corner=True)
Categorical Plots (Comparisons Across Categories)
Use for: Comparing distributions or statistics across discrete categories
Categorical scatterplots:
- stripplot() - Points with jitter to show all observations
- swarmplot() - Non-overlapping points (beeswarm algorithm)
Distribution comparisons:
- boxplot() - Quartiles and outliers
- violinplot() - KDE + quartile information
- boxenplot() - Enhanced boxplot for larger datasets
Statistical estimates:
- barplot() - Mean/aggregate with confidence intervals
- pointplot() - Point estimates with connecting lines
- countplot() - Count of observations per category
Figure-level:
- catplot() - Faceted categorical plots (set the kind parameter)
Key parameters:
- x, y - Variables (one typically categorical)
- hue - Additional categorical grouping
- order, hue_order - Control category ordering
- dodge - Separate hue levels side-by-side
- orient - "v" (vertical) or "h" (horizontal)
- kind - Plot type for catplot: "strip", "swarm", "box", "violin", "boxen", "bar", "point", "count"
# Swarm plot showing all points
sns.swarmplot(data=df, x='day', y='total_bill', hue='sex')
# Violin plot with split for comparison
sns.violinplot(data=df, x='day', y='total_bill',
               hue='sex', split=True)
# Bar plot with error bars
sns.barplot(data=df, x='day', y='total_bill',
            hue='sex', estimator='mean', errorbar='ci')
# Faceted categorical plot
sns.catplot(data=df, x='day', y='total_bill',
            col='time', kind='box')
Regression Plots (Linear Relationships)
Use for: Visualizing linear regressions and residuals
- regplot() - Axes-level regression plot with scatter + fit line
- lmplot() - Figure-level with faceting support
- residplot() - Residual plot for assessing model fit
Key parameters:
- x, y - Variables to regress
- order - Polynomial regression order
- logistic - Fit logistic regression (requires statsmodels)
- robust - Use robust regression, which is less sensitive to outliers (requires statsmodels)
- ci - Confidence interval width (default 95)
- scatter_kws, line_kws - Customize scatter and line properties
# Simple linear regression
sns.regplot(data=df, x='total_bill', y='tip')
# Polynomial regression with faceting
sns.lmplot(data=df, x='total_bill', y='tip',
           col='time', order=2, ci=95)
# Check residuals
sns.residplot(data=df, x='total_bill', y='tip')
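The default fit drawn by regplot() is an ordinary least-squares line, which the sketch below compares against `np.polyfit` on synthetic data. The data and variable names are assumptions for illustration, and `ci=None` skips the bootstrap band.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Hypothetical data: y = 3x + 2 plus Gaussian noise
rng = np.random.default_rng(3)
x = np.linspace(0, 10, 50)
y = 3.0 * x + 2.0 + rng.normal(scale=1.0, size=50)

fig, ax = plt.subplots()
sns.regplot(x=x, y=y, ci=None, ax=ax)

# The scatter points are a collection, so the first Line2D is the fitted line
fit_x, fit_y = ax.lines[0].get_data()
plotted_slope = (fit_y[-1] - fit_y[0]) / (fit_x[-1] - fit_x[0])

# Reference least-squares fit for comparison
ols_slope, ols_intercept = np.polyfit(x, y, deg=1)
```

seaborn does not expose the fitted coefficients directly, so fit the model separately (as with `np.polyfit` here) when you need the numbers rather than the picture.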
Matrix Plots (Rectangular Data)
Use for: Visualizing matrices, correlations, and grid-structured data
- heatmap() - Color-encoded matrix with annotations
- clustermap() - Hierarchically-clustered heatmap
Key parameters:
- data - 2D rectangular dataset (DataFrame or array)
- annot - Display values in cells
- fmt - Format string for annotations (e.g., ".2f")
- cmap - Colormap name
- center - Value at colormap center (for diverging colormaps)
- vmin, vmax - Color scale limits
- square - Force square cells
- linewidths - Gap between cells
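# Correlation heatmap (numeric columns only; tips also has categorical columns)
corr = df.corr(numeric_only=True)
sns.heatmap(corr, annot=True, fmt='.2f',
            cmap='coolwarm', center=0, square=True)
# Clustered heatmap (`data` is a placeholder for a numeric 2D DataFrame or array)
sns.clustermap(data, cmap='viridis',
               standard_scale=1, figsize=(10, 10))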
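A self-contained sketch of the correlation heatmap workflow on synthetic data follows. The `measurements` DataFrame and its column names are assumptions; `numeric_only=True` drops the text column before correlating and assumes pandas 1.5 or later.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical mixed-type data: three numeric columns and one label column
rng = np.random.default_rng(4)
a = rng.normal(size=100)
measurements = pd.DataFrame({
    "a": a,
    "b": a + rng.normal(scale=0.5, size=100),  # correlated with "a"
    "c": rng.normal(size=100),                  # independent noise
    "label": np.tile(["u", "v"], 50),
})

# Correlate only the numeric columns
corr = measurements.corr(numeric_only=True)

fig, ax = plt.subplots()
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm",
            center=0, vmin=-1, vmax=1, square=True, ax=ax)

# One text annotation per cell, formatted according to fmt
annotations = [t.get_text() for t in ax.texts]
```

Fixing `vmin=-1` and `vmax=1` alongside `center=0` keeps the color scale comparable across different correlation matrices.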
Multi-Plot Grids
Seaborn provides grid objects for creating complex multi-panel figures:
FacetGrid
C