Seaborn — Statistical Plots

Overview

Seaborn is a Python library for statistical data visualization built on top of matplotlib. It works directly with pandas DataFrames, automatically handles grouping by categorical variables, computes confidence intervals and kernel density estimates, and produces attractive publication-ready figures with minimal configuration. Seaborn separates axes-level functions (embeddable in custom layouts) from figure-level functions (with built-in faceting), enabling both quick exploratory analysis and structured multi-panel figures.

When to Use

Comparing gene expression, protein abundance, or measurement distributions across experimental conditions (treatment vs. control, cell lines, time points)
Generating grouped box plots, violin plots, or strip plots to show both summary statistics and individual data points simultaneously
Visualizing pairwise correlations in multi-gene or multi-feature datasets as annotated heatmaps
Plotting regression fits with confidence bands between continuous variables (e.g., cell viability vs. drug concentration)
Faceting a single plot type across multiple sample subsets, tissue types, or experimental batches in one call
Rapid exploratory analysis of a new dataset using pairplot to survey all pairwise relationships at once
Use matplotlib directly when you need pixel-level control over figure elements, complex mixed-type layouts, or non-statistical custom plots
Use plotly when the output must be interactive (hover tooltips, zoom, pan) or embedded in a web application

Prerequisites

Python packages: seaborn>=0.13, matplotlib, pandas, numpy; scipy (optional seaborn dependency, included in the install command below)
Data requirements: Pandas DataFrame in long-form (tidy) format; each observation is a row, each variable is a column
Environment: Standard Python environment; no GPU or special hardware required

pip install "seaborn>=0.13" matplotlib pandas numpy scipy
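If your data arrives in wide form (one column per sample), it usually needs reshaping before it matches the long-form requirement above. A minimal sketch using pandas melt, assuming a hypothetical table whose sample columns are named condition_replicate:

```python
import pandas as pd

# Hypothetical wide-form table: one row per gene, one column per sample
wide = pd.DataFrame({
    "gene": ["BRCA1", "TP53"],
    "control_1": [5.1, 6.0],
    "control_2": [5.3, 6.2],
    "treated_1": [5.9, 6.8],
})

# melt() stacks the sample columns into (sample, log2_expr) rows
long = wide.melt(id_vars="gene", var_name="sample", value_name="log2_expr")

# Derive the condition from the sample name (this naming scheme is an assumption)
long["condition"] = long["sample"].str.split("_").str[0]

print(long.shape)  # (6, 4)
```

The resulting frame has one row per gene and sample, so it can be passed directly as data= with x, y, and hue naming its columns.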

Quick Start

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Simulate gene expression across conditions
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "gene":      ["BRCA1"] * 60 + ["TP53"] * 60,
    "condition": ["control", "treated"] * 60,
    "log2_expr": np.concatenate([
        rng.normal(5.2, 0.8, 60),
        rng.normal(6.1, 0.9, 60),
    ])
})

sns.set_theme(style="ticks", context="notebook")
sns.boxplot(data=df, x="gene", y="log2_expr", hue="condition", palette="Set2")
plt.ylabel("log2 Expression")
plt.title("Gene Expression by Condition")
plt.tight_layout()
plt.savefig("quickstart_boxplot.png", dpi=150)
print("Saved quickstart_boxplot.png")

Core API

1. Distribution Plots

Visualize univariate distributions and compare them across groups. histplot bins data; kdeplot fits a smooth density estimate; displot is the figure-level wrapper that adds faceting.

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "log2_tpm":  np.concatenate([rng.normal(4.5, 1.1, n), rng.normal(6.0, 1.3, n)]),
    "sample":    ["tumor"] * n + ["normal"] * n,
})

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Histogram with density normalization and stacked hue groups
sns.histplot(data=df, x="log2_tpm", hue="sample", stat="density",
             multiple="stack", bins=30, ax=axes[0])
axes[0].set_title("Histogram (stacked)")

# KDE with fill — bandwidth controlled by bw_adjust
sns.kdeplot(data=df, x="log2_tpm", hue="sample", fill=True,
            bw_adjust=0.8, alpha=0.4, ax=axes[1])
axes[1].set_title("KDE (filled)")

# ECDF — useful for comparing cumulative distributions
sns.ecdfplot(data=df, x="log2_tpm", hue="sample", ax=axes[2])
axes[2].set_title("ECDF")

plt.tight_layout()
plt.savefig("distributions.png", dpi=150)
print("Saved distributions.png")

# Bivariate KDE: joint distribution of two continuous variables
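rng = np.random.default_rng(1)
log2_rna = rng.normal(5.5, 1.2, 300)
df2 = pd.DataFrame({
    "log2_rna": log2_rna,
    # Protein tracks RNA (slope 0.6) plus independent noise, so the two correlate
    "log2_prot": rng.normal(4.8, 1.0, 300) + 0.6 * log2_rna,
})
plt.figure(figsize=(6, 5))  # new figure so the KDE is not drawn on the ECDF panel
sns.kdeplot(data=df2, x="log2_rna", y="log2_prot",
            fill=True, levels=8, thresh=0.05, cmap="Blues")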
plt.xlabel("log2 RNA (TPM)")
plt.ylabel("log2 Protein (iBAQ)")
plt.title("RNA–Protein Correlation Density")
plt.tight_layout()
plt.savefig("bivariate_kde.png", dpi=150)
print("Saved bivariate_kde.png")

2. Categorical Plots

Compare distributions or aggregated statistics across categorical groups. Axes-level functions (boxplot, violinplot, stripplot, swarmplot, barplot) accept an ax= parameter for embedding in custom layouts.

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
conditions = ["DMSO", "Drug A 1uM", "Drug A 10uM", "Drug B 1uM", "Drug B 10uM"]
df = pd.DataFrame({
    "condition": np.repeat(conditions, 30),
    "viability": np.concatenate([
        rng.normal(100, 5, 30),
        rng.normal(92, 7, 30),
        rng.normal(65, 10, 30),
        rng.normal(88, 8, 30),
        rng.normal(45, 12, 30),
    ])
})

fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# Box plot — shows quartiles and outliers
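# seaborn>=0.13 deprecates palette without hue, so map hue to x and hide the legend
sns.boxplot(data=df, x="condition", y="viability", hue="condition",
            palette="husl", legend=False, width=0.5, ax=axes[0])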
axes[0].set_xticklabels(axes[0].get_xticklabels(), rotation=30, ha="right")
axes[0].set_title("Box Plot")

# Violin — KDE shape + inner quartile lines
sns.violinplot(data=df, x="condition", y="viability", hue="condition",
               inner="quart", palette="muted", legend=False, ax=axes[1])
axes[1].set_xticklabels(axes[1].get_xticklabels(), rotation=30, ha="right")
axes[1].set_title("Violin Plot")

# Strip plot overlaid on box — shows all individual points
sns.boxplot(data=df, x="condition", y="viability", hue="condition",
            palette="pastel", legend=False, width=0.5, ax=axes[2])
sns.stripplot(data=df, x="condition", y="viability",
              color="black", alpha=0.4, size=3, jitter=True, ax=axes[2])
axes[2].set_xticklabels(axes[2].get_xticklabels(), rotation=30, ha="right")
axes[2].set_title("Box + Strip")

plt.tight_layout()
plt.savefig("categorical.png", dpi=150)
print("Saved categorical.png")

# Bar plot with mean ± 95% CI and individual points (swarm)
fig, ax = plt.subplots(figsize=(8, 5))
sns.barplot(data=df, x="condition", y="viability", hue="condition",
            estimator="mean", errorbar="ci", palette="Set3", legend=False, ax=ax)
sns.swarmplot(data=df, x="condition", y="viability",
              color="black", size=3, alpha=0.5, ax=ax)
ax.set_ylabel("Cell Viability (%)")
ax.set_xticklabels(ax.get_xticklabels(), rotation=30, ha="right")
plt.tight_layout()
plt.savefig("barswarm.png", dpi=150)
print("Saved barswarm.png")
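For faceting a categorical plot across subsets in one call, seaborn also provides the figure-level catplot, which this section does not otherwise show. A minimal sketch, assuming a hypothetical dataset with two cell lines and three doses (all names and values invented for illustration):

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical two-cell-line viability data, invented for illustration
rng = np.random.default_rng(5)
doses = ["DMSO", "1uM", "10uM"]
frames = []
for cell_line, shift in [("HeLa", 0.0), ("A549", -10.0)]:
    for dose, mean in zip(doses, [100.0, 90.0, 60.0]):
        frames.append(pd.DataFrame({
            "cell_line": cell_line,
            "dose": dose,
            "viability": rng.normal(mean + shift, 6.0, 20),
        }))
df_cat = pd.concat(frames, ignore_index=True)

# kind= selects the underlying plot type; col= makes one panel per cell line
g = sns.catplot(data=df_cat, x="dose", y="viability", col="cell_line",
                kind="box", height=4, aspect=0.9)
g.set_axis_labels("Dose", "Cell Viability (%)")
g.savefig("catplot_facets.png", dpi=150)
print("Saved catplot_facets.png")
```

Switching kind to "violin", "strip", or "bar" reuses the same faceting with a different plot type.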

3. Relational Plots

Visualize relationships between continuous variables. scatterplot and lineplot are axes-level; relplot is the figure-level wrapper that supports col and row faceting.

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 150
df = pd.DataFrame({
    "molecular_weight": rng.uniform(200, 800, n),
    "logP":             rng.uniform(-2, 6, n),
    "pIC50":            rng.normal(6.5, 1.2, n),
    "target_class":     rng.choice(["kinase", "GPCR", "protease"], n),
    "pass_lipinski":    rng.choice(["yes", "no"], n, p=[0.7, 0.3]),
})

# Scatter with hue (categorical color) + size (continuous) + style (marker)
sns.scatterplot(data=df, x="molecular_weight", y="pIC50",
                hue="target_class", size="logP", style="pass_lipinski",
                sizes=(30, 120), alpha=0.7)
plt.xlabel("Molecular Weight (Da)")
plt.ylabel("pIC50")
plt.title("Compound Bioactivity by Target Class")
plt.tight_layout()
plt.savefig("relational_scatter.png", dpi=150)
print("Saved relational_scatter.png")

# Line plot with automatic mean aggregation and SD error band
