Graphsignal SDK

Name: graphsignal-sdk
Rating: 5 (205 reviews)
Author: graphsignal

Graphsignal captures performance profiles, traces, metrics, and errors for inference workloads. Auto-instrumentation covers vLLM, SGLang, and PyTorch out of the box.

Install

pip install -U graphsignal

For CUPTI-based GPU profiling on Linux, install the extra matching the CUDA version:

pip install graphsignal[cu12]   # CUDA 12.x
pip install graphsignal[cu13]   # CUDA 13.x

Configure

Option A: In Python code

import graphsignal

graphsignal.configure(api_key='my-api-key')

All configure() args can be set via env vars: GRAPHSIGNAL_API_KEY, GRAPHSIGNAL_DEBUG_MODE, etc.

configure() parameters:

Arg	Env var	Purpose
`api_key`	`GRAPHSIGNAL_API_KEY`	API key (required)
`api_base`	`GRAPHSIGNAL_API_BASE`	On-premise server URL
`tags`	`GRAPHSIGNAL_TAG_{KEY}`	Process-level tags
`auto_instrument`	`GRAPHSIGNAL_AUTO_INSTRUMENT`	Auto-instrument libraries (default `True`)
`debug_mode`	`GRAPHSIGNAL_DEBUG_MODE`	Enable debug logging

Option B: graphsignal-run CLI

Wrap any command — no code changes needed:

export GRAPHSIGNAL_API_KEY="..."
graphsignal-run <my-app>

Integrate with vLLM

Graphsignal automatically instruments vLLM (engine, scheduler, KV cache, attention, output processing, Prometheus metrics).

In a Python app

import graphsignal
graphsignal.configure(api_key='my-api-key')
# then use vLLM normally

vLLM server via graphsignal-run

export GRAPHSIGNAL_API_KEY="..."
graphsignal-run vllm serve Qwen/Qwen1.5-7B-Chat --port 8000

vLLM Docker

docker run --gpus all \
  -p 8000:8000 --ipc=host \
  -e GRAPHSIGNAL_API_KEY=YOUR_API_KEY \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --entrypoint bash \
  vllm/vllm-openai:latest \
  -lc 'pip install --no-cache-dir graphsignal[cu12] \
       && exec graphsignal-run vllm serve \
           --model Qwen/Qwen2-VL-7B-Instruct \
           --trust-remote-code'

Integrate with SGLang

Graphsignal automatically instruments SGLang (operations, OTEL spans, Prometheus metrics).

In a Python app

import graphsignal
graphsignal.configure(api_key='my-api-key')
# then use SGLang normally

SGLang server via graphsignal-run

export GRAPHSIGNAL_API_KEY="..."
graphsignal-run sglang serve \
  --model-path Qwen/Qwen1.5-7B-Chat \
  --port 8000

SGLang Docker

docker run --gpus all \
  -p 8000:8000 --ipc=host \
  -e GRAPHSIGNAL_API_KEY=YOUR_API_KEY \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --entrypoint bash \
  your-sglang-image:latest \
  -lc 'pip install --no-cache-dir graphsignal[cu12] \
       && exec graphsignal-run sglang serve \
           --model-path Qwen/Qwen2.5-1.5B-Instruct \
           --port 8000'

Integrate via dstack

dstack runs inference as services. Use graphsignal-run around the server command, same as bare-metal.

dstack service config (SGLang example)

type: service
name: deepseek-r1

image: lmsysorg/sglang:latest
env:
  - MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-8B
  - GRAPHSIGNAL_API_KEY

commands:
  - |
    pip install --no-cache-dir 'graphsignal[cu12]' && \
    graphsignal-run python3 -m sglang.launch_server \
      --model-path $MODEL_ID \
      --port 8000 \
      --trust-remote-code

port: 8000
model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B

resources:
  gpu: 24GB

Deploy:

dstack apply -f service.dstack.yml

If your Docker image already includes Graphsignal, skip the pip install step. If using the sglang CLI instead of launch_server, use graphsignal-run sglang serve ....

Manual Tracing

For code not auto-instrumented, use trace() or @trace_function:

with graphsignal.trace('inference') as span:
    span.set_counter('prompt_tokens', prompt_tokens)
    span.set_counter('completion_tokens', completion_tokens)

    with span.trace('model-cold-boot') as sub_span:
        ...

@graphsignal.trace_function
def my_function():
    ...

Record counter metrics within a trace:

with graphsignal.trace('my-function') as span:
    span.inc_counter_metric('call_cost', price_per_call)

Manual Profiling

Profile specific Python functions (3.12+):

graphsignal.profile_function(func=slow_transform, category='transform', op_name='data-transform')

Profile by import path (avoids direct import):

graphsignal.profile_function_path(path='myapp.tasks.prepare_data', category='preprocessing')

Profile CUDA kernels by pattern (Linux, CUPTI required):

graphsignal.profile_cuda_kernel(kernel_pattern="cublas", op_name="matmul_gemm")

Troubleshooting

Enable debug logging: graphsignal.configure(debug_mode=True) or GRAPHSIGNAL_DEBUG_MODE=true.

Ensure outgoing connections to https://api.graphsignal.com are allowed.

Reference

Full Python API: https://graphsignal.com/docs/reference/python-api/
vLLM integration: https://graphsignal.com/docs/integrations/vllm/
SGLang integration: https://graphsignal.com/docs/integrations/sglang/
dstack integration: https://graphsignal.com/docs/integrations/dstack/

graphsignal-sdk

How to add

Drop this on your repo README

Related skills

internal-comms

babysit

do

smart-explore

Get new DevOps e Infra skills every Monday

Graphsignal SDK

Install

Configure

Option A: In Python code

Option B: graphsignal-run CLI

Integrate with vLLM

In a Python app

vLLM server via graphsignal-run

vLLM Docker

Integrate with SGLang

In a Python app

SGLang server via graphsignal-run

SGLang Docker

Integrate via dstack

dstack service config (SGLang example)

Manual Tracing

Manual Profiling

Tags

Troubleshooting

Reference

Comments · No comments