Histolab
Overview
Histolab is a Python library for processing whole slide images (WSI) in digital pathology. It automates tissue detection, extracts informative tiles from gigapixel images, and prepares datasets for deep learning pipelines. The library handles multiple WSI formats, implements filter-based tissue segmentation, and provides three tile extraction strategies (random, grid, and score-based).
Installation
Install the OpenSlide system libraries first (see the OpenSlide download page), then install histolab:
uv pip install histolab
For built-in TCGA sample slides via histolab.data, also install pooch:
uv pip install pooch
Histolab 0.7.0 (latest stable) supports Python 3.8–3.11 on Linux and macOS. Windows is not supported as of 0.7.0.
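A minimal pre-flight sketch for the support matrix stated above. The version bounds and platform names are copied from this section, not queried from histolab itself, and check_environment and histolab_installed are hypothetical helper names:

```python
import importlib.util
import platform
import sys

# Support matrix as stated in this document for histolab 0.7.0 (an assumption
# copied from the text above, not read from the package metadata).
MIN_PY, MAX_PY = (3, 8), (3, 11)
SUPPORTED_SYSTEMS = {"Linux", "Darwin"}  # platform.system() reports macOS as "Darwin"


def check_environment(version=None, system=None):
    """Return a list of human-readable problems; an empty list means none found."""
    version = tuple(version or sys.version_info[:2])
    system = system or platform.system()
    problems = []
    if not (MIN_PY <= version <= MAX_PY):
        problems.append(f"Python {version[0]}.{version[1]} is outside 3.8-3.11")
    if system not in SUPPORTED_SYSTEMS:
        problems.append(f"{system} is not listed as supported")
    return problems


def histolab_installed():
    """True if the histolab package can be found on the import path."""
    return importlib.util.find_spec("histolab") is not None


for problem in check_environment():
    print("warning:", problem)
print("histolab importable:", histolab_installed())
```

Passing explicit values, for example check_environment((3, 12), "Windows"), makes it easy to check a target machine's configuration from elsewhere.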
Quick Start
Basic workflow for extracting tiles from a whole slide image:
from histolab.slide import Slide
from histolab.tiler import RandomTiler
# Load slide
slide = Slide("slide.svs", processed_path="output/")
# Configure tiler
tiler = RandomTiler(
    tile_size=(512, 512),
    n_tiles=100,
    level=0,
    seed=42
)
# Preview tile locations on the slide thumbnail
# (the number of tiles comes from the tiler's own n_tiles setting)
tiler.locate_tiles(slide)
# Extract tiles
tiler.extract(slide)
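The seed argument is what makes a random extraction repeatable. The idea can be sketched without histolab: a seeded generator yields the same tile origins on every run. This is an illustration of the concept only, not RandomTiler's implementation, and sample_tile_origins is a hypothetical helper:

```python
import random


def sample_tile_origins(slide_size, tile_size, n_tiles, seed):
    """Pick n_tiles random top-left corners so that each tile fits inside the slide."""
    rng = random.Random(seed)  # private generator: same seed, same sequence
    width, height = slide_size
    tile_w, tile_h = tile_size
    return [
        (rng.randint(0, width - tile_w), rng.randint(0, height - tile_h))
        for _ in range(n_tiles)
    ]


first = sample_tile_origins((10_000, 8_000), (512, 512), n_tiles=5, seed=42)
second = sample_tile_origins((10_000, 8_000), (512, 512), n_tiles=5, seed=42)
print(first == second)  # True: identical seeds reproduce the same origins
```

Fixing the seed is useful when a tile set must be regenerated later, for example to reproduce a training split.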
Core Capabilities
1. Slide Management
Load, inspect, and work with whole slide images in various formats.
Common operations:
- Loading WSI files (SVS, TIFF, NDPI, etc.)
- Accessing slide metadata (dimensions, magnification, properties)
- Generating thumbnails for visualization
- Working with pyramidal image structures
- Extracting regions at specific coordinates
Key classes: Slide
Reference: references/slide_management.md contains comprehensive documentation on:
- Slide initialization and configuration
- Built-in sample datasets (prostate, ovarian, breast, heart, kidney tissues)
- Accessing slide properties and metadata
- Thumbnail generation and visualization
- Working with pyramid levels
- Multi-slide processing workflows
Example workflow:
from pathlib import Path

from histolab.data import prostate_tissue
from histolab.slide import Slide

# Load sample data
prostate_svs, prostate_path = prostate_tissue()
# Initialize slide
slide = Slide(prostate_path, processed_path="output/")
# Inspect properties
print(f"Dimensions: {slide.dimensions}")
print(f"Levels: {slide.levels}")
print(f"Magnification: {slide.properties.get('openslide.objective-power')}")
# Save thumbnail to processed_path
Path(slide.processed_path).mkdir(parents=True, exist_ok=True)
slide.thumbnail.save(Path(slide.processed_path) / f"{slide.name}_thumbnail.png")
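To make the pyramid idea concrete, here is a small sketch of how level dimensions relate to the full-resolution size. The downsample factors (1, 4, 16, 64) are an assumed example, since real slides report their own factors, and both helper functions are hypothetical:

```python
def level_dimensions(base_size, downsamples):
    """Width and height of each pyramid level, given per-level downsample factors."""
    width, height = base_size
    return [(int(width // d), int(height // d)) for d in downsamples]


def best_level_for(base_size, downsamples, max_side):
    """Index of the highest-resolution level whose longest side fits within max_side."""
    for level, (w, h) in enumerate(level_dimensions(base_size, downsamples)):
        if max(w, h) <= max_side:
            return level
    return len(downsamples) - 1  # fall back to the smallest level


# Assumed example: a 40000 x 30000 slide with four pyramid levels.
dims = level_dimensions((40_000, 30_000), [1, 4, 16, 64])
print(dims)  # [(40000, 30000), (10000, 7500), (2500, 1875), (625, 468)]
print(best_level_for((40_000, 30_000), [1, 4, 16, 64], max_side=3000))  # 2
```

Level 0 is the full-resolution image; higher levels trade detail for speed, which is why previews and masks are usually computed on a small level.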
2. Tissue Detection and Masks
Automatically identify tissue regions and filter background/artifacts.
Common operations:
- Creating binary tissue masks
- Detecting largest tissue region
- Excluding background and artifacts
- Custom tissue segmentation
- Removing pen annotations
Key classes: TissueMask, BiggestTissueBoxMask, BinaryMask
Reference: references/tissue_masks.md contains comprehensive documentation on:
- TissueMask: Segments all tissue regions using automated filters
- BiggestTissueBoxMask: Returns bounding box of largest tissue region (default)
- BinaryMask: Base class for custom mask implementations
- Visualizing masks with locate_mask()
- Creating custom rectangular and annotation-exclusion masks
- Mask integration with tile extraction
- Best practices and troubleshooting
Example workflow:
from histolab.masks import TissueMask, BiggestTissueBoxMask
# Create tissue mask for all tissue regions
tissue_mask = TissueMask()
# Visualize mask on slide
slide.locate_mask(tissue_mask)
# Get mask array
mask_array = tissue_mask(slide)
# Use largest tissue region (default for most extractors)
biggest_mask = BiggestTissueBoxMask()
When to use each mask:
- TissueMask: Multiple tissue sections, comprehensive analysis
- BiggestTissueBoxMask: Single main tissue section, exclude artifacts (default)
- Custom BinaryMask: Specific ROI, exclude annotations, custom segmentation
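The difference between keeping all tissue and keeping only the biggest region can be illustrated on a toy binary mask. The sketch below finds the largest 4-connected region and returns its bounding box. It is a conceptual illustration in plain Python, not histolab's implementation, and largest_region_bbox is a hypothetical helper:

```python
from collections import deque


def largest_region_bbox(mask):
    """Bounding box (row_min, col_min, row_max, col_max) of the largest
    4-connected region of 1s, or None if the mask contains no 1s."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search collects one connected region.
            region, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) > len(best):
                best = region
    if not best:
        return None
    ys = [y for y, _ in best]
    xs = [x for _, x in best]
    return (min(ys), min(xs), max(ys), max(xs))


toy_mask = [
    [0, 1, 0, 0, 0, 0],  # isolated pixel, e.g. a small artifact
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 0],
]
print(largest_region_bbox(toy_mask))  # (1, 3, 3, 5)
```

The isolated pixel in the first row is ignored, which mirrors why a biggest-region mask helps exclude small artifacts while an all-tissue mask would keep them.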
3. Tile Extraction
Extract smaller regions from large WSI using different strategies.
Three extraction strategies:
RandomTiler: Extract a fixed number of randomly positioned tiles
- Best for: Sampling diverse regions, exploratory analysis, training data
- Key parameters: n_tiles, seed for reproducibility
GridTiler: Systematically extract tiles across tissue in a grid pattern
- Best for: Complete coverage, spatial analysis, reconstruction
- Key parameters: pixel_overlap for sliding windows
ScoreTiler: Extract top-ranked tiles based on scoring functions
- Best for: Most informative regions, quality-driven selection
- Key parameters: scorer (NucleiScorer, CellularityScorer, custom)
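The effect of pixel_overlap on a grid can be sketched in a few lines: the step between adjacent tiles is the tile size minus the overlap. This is a simplified illustration that ignores tissue masks, not GridTiler's actual code, and grid_origins is a hypothetical helper:

```python
def grid_origins(region_size, tile_size, pixel_overlap=0):
    """Top-left corners of tiles laid out on a regular grid inside a region.

    Adjacent tiles share pixel_overlap pixels along both axes, so the
    stride is the tile size minus the overlap.
    """
    width, height = region_size
    tile_w, tile_h = tile_size
    step_x, step_y = tile_w - pixel_overlap, tile_h - pixel_overlap
    if step_x <= 0 or step_y <= 0:
        raise ValueError("pixel_overlap must be smaller than the tile size")
    return [
        (x, y)
        for y in range(0, height - tile_h + 1, step_y)
        for x in range(0, width - tile_w + 1, step_x)
    ]


no_overlap = grid_origins((2048, 1024), (512, 512))
half_overlap = grid_origins((2048, 1024), (512, 512), pixel_overlap=256)
print(len(no_overlap), len(half_overlap))  # 8 21
```

A 50% overlap more than doubles the tile count here, which is worth keeping in mind when estimating disk usage for sliding-window extraction.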
Common parameters:
- tile_size: Tile dimensions (e.g., (512, 512))
- level: Pyramid level for extraction (0 = highest resolution)
- check_tissue: Filter tiles by tissue content
- tissue_percent: Minimum tissue coverage (default 80%)
- extraction_mask: Mask defining extraction region
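The interplay of check_tissue and tissue_percent amounts to measuring how much of a tile is covered by the tissue mask and comparing that with a threshold. A minimal sketch on a toy mask follows; the helper names are hypothetical, and treating the comparison as inclusive is an assumption rather than a statement about histolab's behavior:

```python
def tissue_percent_of(mask, origin, tile_size):
    """Percentage of a tile's pixels that are marked as tissue (1) in the mask."""
    x0, y0 = origin
    tile_w, tile_h = tile_size
    covered = sum(
        mask[y][x]
        for y in range(y0, y0 + tile_h)
        for x in range(x0, x0 + tile_w)
    )
    return 100.0 * covered / (tile_w * tile_h)


def has_enough_tissue(mask, origin, tile_size, tissue_percent=80.0):
    """True if the tile meets the minimum coverage (inclusive: an assumption)."""
    return tissue_percent_of(mask, origin, tile_size) >= tissue_percent


toy_mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
]
print(tissue_percent_of(toy_mask, (0, 2), (2, 2)))  # 75.0
print(has_enough_tissue(toy_mask, (0, 2), (2, 2)))  # False at the 80% default
print(has_enough_tissue(toy_mask, (0, 2), (2, 2), tissue_percent=70.0))  # True
```

Lowering the threshold keeps more tiles near tissue borders; raising it favors tiles that lie fully inside the tissue.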
Reference: references/tile_extraction.md contains comprehensive documentation on:
- Detailed explanation of each tiler strategy
- Available scorers (NucleiScorer, CellularityScorer, custom)
- Tile preview with locate_tiles()
- Extraction workflows and reporting
- Advanced patterns (multi-level, hierarchical extraction)
- Performance optimization and troubleshooting
Example workflows:
from histolab.tiler import RandomTiler, GridTiler, ScoreTiler
from histolab.scorer import NucleiScorer
# Random sampling (fast, diverse)
random_tiler = RandomTiler(
    tile_size=(512, 512),
    n_tiles=100,
    level=0,
    seed=42,
    check_tissue=True,
    tissue_percent=80.0
)
random_tiler.extract(slide)
# Grid coverage (comprehensive)
grid_tiler = GridTiler(
    tile_size=(512, 512),
    level=0,
    pixel_overlap=0,
    check_tissue=True
)
grid_tiler.extract(slide)
# Score-based selection (most informative)
score_tiler = ScoreTiler(
    tile_size=(512, 512),
    n_tiles=50,
    scorer=NucleiScorer(),
    level=0
)
score_tiler.extract(slide, report_path="tiles_report.csv")
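Score-based selection boils down to computing a score per candidate tile and keeping the n best. The sketch below shows that ranking step on toy data. The dark_fraction scorer is a hypothetical stand-in (fraction of dark pixels), not NucleiScorer, and top_scored is not part of histolab:

```python
import heapq


def dark_fraction(tile):
    """Hypothetical scorer: fraction of pixels with intensity below 128."""
    return sum(1 for value in tile if value < 128) / len(tile)


def top_scored(tiles, scorer, n_tiles):
    """Return the n_tiles highest (score, name) pairs, best first."""
    scored = [(scorer(tile), name) for name, tile in tiles.items()]
    return heapq.nlargest(n_tiles, scored)


# Toy "tiles": flat lists of grayscale intensities.
tiles = {
    "a": [200, 210, 50, 220],   # one dark pixel out of four
    "b": [40, 60, 30, 200],     # three dark pixels out of four
    "c": [250, 250, 250, 250],  # background only
}
print(top_scored(tiles, dark_fraction, n_tiles=2))  # [(0.75, 'b'), (0.25, 'a')]
```

Any callable that maps a tile to a number can play the scorer role in this sketch, which is the same extension point the custom-scorer option refers to.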
Always preview before extracting:
# Preview tile locations on the thumbnail
# (the tiles drawn follow the tiler's own configuration, e.g. n_tiles)
random_tiler.locate_tiles(slide)
4. Filters and Preprocessing
Apply image processing filters for tissue detection, quality control, and preprocessing.
Filter categories:
Image Filters: Color space conversions, thresholding, contrast enhancement
- RgbToGrayscale, RgbToHsv, RgbToHed
- OtsuThreshold, AdaptiveThreshold
- StretchContrast, HistogramEqualization
Morphological Filters: Structural operations on binary images
- BinaryDilation, BinaryErosion
- BinaryOpening, BinaryClosing
- RemoveSmallObjects, RemoveSmallHoles
Composition: Chain multiple filters together
- Compose: Create filter pipelines
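Filter composition means feeding each filter's output into the next one, in order. The following sketch shows the pattern with plain callables on a tiny nested-list image. Pipeline, to_grayscale, and threshold_below are hypothetical names for illustration and are not histolab classes:

```python
from functools import reduce


class Pipeline:
    """Apply a sequence of callables in order, each receiving the previous output."""

    def __init__(self, filters):
        self.filters = list(filters)

    def __call__(self, image):
        return reduce(lambda current, f: f(current), self.filters, image)


def to_grayscale(rgb_rows):
    """Average the three channels of every pixel (integer division)."""
    return [[(r + g + b) // 3 for r, g, b in row] for row in rgb_rows]


def threshold_below(cutoff):
    """Build a filter that marks pixels darker than cutoff as 1 (tissue-like)."""
    return lambda gray_rows: [[1 if v < cutoff else 0 for v in row] for row in gray_rows]


detect_dark = Pipeline([to_grayscale, threshold_below(200)])
image = [
    [(255, 255, 255), (120, 40, 130)],  # white background, stained pixel
    [(90, 30, 100), (250, 250, 250)],   # stained pixel, near-white background
]
print(detect_dark(image))  # [[0, 1], [1, 0]]
```

Order matters in such a chain: the threshold step expects grayscale input, so swapping the two filters would fail.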
Reference: references/filters_preprocessing.md contains comprehensive documentation on:
- Detailed explanation of each filter type
- Filter composition and chaining
- Common preprocessing pipelines (tissue detection, pen removal, nuclei enhancement)
- Applying filters to tiles
- Custom mask filters
- Quality control filters (blur detection, tissue coverage)
- Best practices and troubleshooting
Example workflows:
from histolab.filters.compositions import Compose
from histolab.filters.image_filters import RgbToGrayscale, OtsuThreshold
from histolab.filters.morphological_filters import (
    BinaryDilation, RemoveSmallHoles, RemoveSmallObjects
)
# Standard tissue detection pipeline
tissue_detection = Compose([
    RgbToGrayscale(),
    OtsuThreshold(),
    BinaryDilation(disk_size=5),
    RemoveSmallHoles(area_threshold=1000),
    # RemoveSmallObjects takes min_size rather than area_threshold
    RemoveSmallObjects(min_size=500)
])
# Use with custom mask (TissueMask accepts filters as positional arguments)
from histolab.masks import TissueMask
custom_mask = TissueMask(tissue_detection)
# Apply filters to tile
fr