PyTorch Geometric (PyG)

PyG is the standard library for Graph Neural Networks built on PyTorch. It provides data structures for graphs, 60+ GNN layer implementations, scalable mini-batch training, and support for heterogeneous graphs.

Installation

Tested against torch-geometric 2.7.x (Oct 2025). Requires Python 3.10+ and PyTorch 2.6+.

# 1. Install PyTorch first (match your CUDA/CPU setup — see https://pytorch.org/get-started/locally/)
uv pip install torch

# 2. Core PyG (no extension wheels required for basic usage)
uv pip install torch_geometric

Optional accelerated ops (pyg-lib, torch-scatter, torch-sparse, torch-cluster) are not required for basic PyG usage (since PyG 2.3). Install version-matched wheels from the PyG wheel index after checking your PyTorch and CUDA versions:

python -c "import torch; print(torch.__version__, torch.version.cuda)"
# Then install wheels for your torch+CUDA combo, e.g.:
uv pip install pyg-lib torch-scatter torch-sparse torch-cluster \
  -f https://data.pyg.org/whl/torch-2.8.0+cu128.html

Check your version:

import torch_geometric
print(torch_geometric.__version__)
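
To see which of the optional extension packages are importable in the current environment, a small standard-library check can help. This is only a sketch: the helper name `optional_package_status` is made up for illustration, and the list holds the import names (underscores) that correspond to the pip names above.

```python
from importlib.util import find_spec

# Import names of the optional extension packages listed above
# (pip names use hyphens, import names use underscores).
OPTIONAL_PACKAGES = ["pyg_lib", "torch_scatter", "torch_sparse", "torch_cluster"]


def optional_package_status(names=OPTIONAL_PACKAGES):
    """Hypothetical helper: map each import name to True/False without importing it."""
    return {name: find_spec(name) is not None for name in names}


status = optional_package_status()
for name, installed in status.items():
    print(f"{name:15s} {'installed' if installed else 'not installed'}")
```

A package reported as not installed is not an error for basic usage, since core PyG works without these extensions.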

Conda: the pyg conda channel is no longer maintained for PyTorch >2.5 — use uv pip install and the wheel index above instead.

PyG 2.7 notes

PyG 2.7 dropped Python 3.9 and PyTorch ≤2.5. See the 2.7.0 release notes for PyTorch 2.6–2.8 compatibility tables. torch_geometric.distributed is deprecated — use standard torch.distributed DDP (see references/scaling.md).

Core Concepts

Graph Data: `Data` and `HeteroData`

A graph lives in a Data object. The key attributes:

from torch_geometric.data import Data

data = Data(
    x=node_features,          # [num_nodes, num_node_features]
    edge_index=edge_index,     # [2, num_edges] — COO format, dtype=torch.long
    edge_attr=edge_features,   # [num_edges, num_edge_features]
    y=labels,                  # node-level [num_nodes, *] or graph-level [1, *]
    pos=positions,             # [num_nodes, num_dimensions] (for point clouds/spatial)
)

edge_index format is critical: it's a [2, num_edges] tensor where edge_index[0] = source nodes, edge_index[1] = target nodes. It is NOT a list of tuples. If you have edge pairs as rows, transpose and call .contiguous():

# If edges are [[src1, dst1], [src2, dst2], ...] — transpose first:
edge_index = edge_pairs.t().contiguous()

For undirected graphs, include both directions: edge (0,1) needs both [0,1] and [1,0] in edge_index.

For heterogeneous graphs, use HeteroData — see the Heterogeneous Graphs section below.

Datasets

PyG bundles many standard datasets that auto-download and preprocess:

from torch_geometric.datasets import Planetoid, TUDataset

# Single-graph node classification (Cora, Citeseer, Pubmed)
dataset = Planetoid(root='./data', name='Cora')
data = dataset[0]  # single graph with train/val/test masks

# Multi-graph classification (ENZYMES, MUTAG, IMDB-BINARY, etc.)
dataset = TUDataset(root='./data', name='ENZYMES')
# dataset[0], dataset[1], ... are individual graphs

Common datasets by task:

Node classification: Planetoid (Cora/Citeseer/Pubmed), OGB (ogbn-arxiv, ogbn-products, ogbn-mag)
Graph classification: TUDataset (MUTAG, ENZYMES, PROTEINS, IMDB-BINARY), OGB (ogbg-molhiv)
Link prediction: OGB (ogbl-collab, ogbl-citation2)
Molecular: QM7, QM9, MoleculeNet
Point cloud/mesh: ShapeNet, ModelNet10/40, FAUST

Transforms

Transforms preprocess or augment graph data, analogous to torchvision transforms:

import torch_geometric.transforms as T

# Common transforms
T.NormalizeFeatures()    # Row-normalize node features to sum to 1
T.ToUndirected()         # Add reverse edges to make graph undirected
T.AddSelfLoops()         # Add self-loop edges
T.KNNGraph(k=6)          # Build k-NN graph from point cloud positions
T.RandomJitter(0.01)     # Random noise augmentation on positions
T.Compose([...])         # Chain multiple transforms

# Apply as pre_transform (once, saved to disk) or transform (every access)
from torch_geometric.datasets import ShapeNet

dataset = ShapeNet(root='./data', pre_transform=T.KNNGraph(k=6),
                   transform=T.RandomJitter(0.01))

Building GNN Models

Quick Start: Using Built-in Layers

The fastest way to build a GNN — stack conv layers from torch_geometric.nn:

import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super().__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, out_channels)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index).relu()
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.conv2(x, edge_index)
        return x

Important: PyG conv layers do NOT include activation functions — apply them yourself after each layer. This is by design for flexibility.

Choosing a Conv Layer

Pick based on your task and graph structure:

Layer	Best for	Key idea
`GCNConv`	Homogeneous, semi-supervised node classification	Spectral-inspired, degree-normalized aggregation
`GATConv` / `GATv2Conv`	When neighbor importance varies	Attention-weighted messages
`SAGEConv`	Large graphs, inductive settings	Sampling-friendly, learnable aggregation
`GINConv`	Graph classification, maximizing expressiveness	As powerful as the 1-WL isomorphism test
`TransformerConv`	Rich edge features, complex interactions	Multi-head attention with edge features
`EdgeConv`	Point clouds, dynamic graphs	MLP on edge features (x_i, x_j - x_i)
`RGCNConv`	Heterogeneous with many relation types	Relation-specific weight matrices
`HGTConv`	Heterogeneous graphs	Type-specific attention

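Most conv layers accept (x, edge_index) at minimum, and many also accept edge_attr for edge features. The heterogeneous layers are the exception: RGCNConv also requires an edge_type tensor, and HGTConv takes x_dict and edge_index_dict instead.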

Lazy Initialization

Use -1 for input channels to let PyG infer dimensions automatically — especially useful for heterogeneous models:

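import torch
from torch_geometric.nn import SAGEConv

conv = SAGEConv((-1, -1), 64)  # Input dims inferred on first forward pass
# Initialize lazy parameters with one forward pass before training.
# For a full model built from lazy layers, call model(...) the same way.
with torch.no_grad():
    out = conv(data.x, data.edge_index)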

High-Level Model APIs

For common architectures, PyG provides ready-made model classes:

from torch_geometric.nn import GraphSAGE, GCN, GAT, GIN

model = GraphSAGE(
    in_channels=dataset.num_features,
    hidden_channels=64,
    out_channels=dataset.num_classes,
    num_layers=2,
)

Custom Layers via MessagePassing

To implement a novel GNN layer, subclass MessagePassing. The framework is:

propagate() orchestrates the message passing
message() defines what info flows along each edge (the phi function)
aggregate() combines messages at each node (sum/mean/max)
update() transforms the aggregated result (the gamma function)

import torch
from torch_geometric.nn import MessagePassing

class MyConv(MessagePassing):
    def __init__(self, in_channels, out_channels):
        super().__init__(aggr='add')  # "add", "mean", or "max"
        self.lin = torch.nn.Linear(in_channels, out_channels)

    def forward(self, x, edge_index):
        # Pre-processing before message passing
        x = self.lin(x)
        # Start message passing
        return self.propagate(edge_index, x=x)

    def message(self, x_j):
        # x_j: features of source nodes for each edge [num_edges, features]
        # The _j suffix auto-indexes source nodes, _i indexes target nodes
        return x_j

The _i / _j convention: any tensor passed to propagate() can be auto-indexed by ap
