Keldron Agent — GPU Monitoring with Risk Intelligence
1. Overview
Keldron Agent is a vendor-neutral GPU monitoring agent that runs locally and exposes real-time telemetry and risk scores via a Prometheus endpoint. It supports Apple Silicon (M1–M5), NVIDIA consumer GPUs (RTX 3090/4090/5090), NVIDIA datacenter (H100/B200), AMD GPUs, and any Linux machine.
Dual mode: Use the local agent on localhost:9100 for fast, real-time, single-device queries (works offline). Use Keldron Cloud (https://api.keldron.ai) for fleet overview, historical telemetry, analytics, and proactive fleet monitoring when an API key is configured.
No sudo required on any platform for the agent binary. On Linux, Docker may require sudo or membership in the docker group — see Docker post-install or rootless Docker if you hit permission errors.
Use this skill when the user wants to:
- Monitor GPU temperature, power, utilization, or memory
- Get risk assessments for their GPU
- Track power costs
- Set up alerts or watch a fleet (via cloud polling)
- Open the real dashboard (local or cloud)
- Ask fleet, history, or analytics questions
Mode detection (run at the start of an interaction)
Field names differ by endpoint — see Cloud API field names before writing jq filters.
# Check 1: Is the local agent running?
LOCAL_AGENT=$(curl -sf localhost:9100/healthz 2>/dev/null | jq -r '.status' 2>/dev/null)
# Check 2: Is cloud configured? (env → ~/.keldron/credentials from login → YAML)
CLOUD_KEY="${KELDRON_CLOUD_API_KEY:-}"
CLOUD_ENDPOINT=""
if [ -z "$CLOUD_KEY" ] && [ -f ~/.keldron/credentials ] && command -v jq &>/dev/null; then
CLOUD_KEY=$(jq -r '.api_key // ""' ~/.keldron/credentials 2>/dev/null)
CLOUD_ENDPOINT=$(jq -r '.endpoint // ""' ~/.keldron/credentials 2>/dev/null)
fi
if [ -z "$CLOUD_KEY" ]; then
if command -v yq &>/dev/null; then
CLOUD_KEY=$(yq '.cloud.api_key // ""' ~/.config/keldron/keldron-agent.yaml 2>/dev/null)
CLOUD_ENDPOINT=$(yq '.cloud.endpoint // ""' ~/.config/keldron/keldron-agent.yaml 2>/dev/null)
else
CLOUD_KEY=$(grep -A3 'cloud:' ~/.config/keldron/keldron-agent.yaml 2>/dev/null \
| grep 'api_key:' | awk '{print $2}' | tr -d "\"'" | xargs 2>/dev/null)
CLOUD_ENDPOINT=$(grep -A3 'cloud:' ~/.config/keldron/keldron-agent.yaml 2>/dev/null \
| grep 'endpoint:' | awk '{print $2}' | tr -d "\"'" | xargs 2>/dev/null)
fi
fi
CLOUD_ENDPOINT="${CLOUD_ENDPOINT:-https://api.keldron.ai}"
if [ -z "$CLOUD_KEY" ]; then
echo "No cloud API key found. Run keldron-agent login, set KELDRON_CLOUD_API_KEY (non-interactive login or streaming), or add cloud.api_key to ~/.config/keldron/keldron-agent.yaml. Sign up at https://app.keldron.ai"
fi
# Check 3: Does cloud respond? (only if we have a key)
CLOUD_OK=""
if [ -n "$CLOUD_KEY" ]; then
CLOUD_OK=$(curl -sf "${CLOUD_ENDPOINT}/health" 2>/dev/null | jq -r '.status' 2>/dev/null)
fi
Mode priority:
- Both (healthy local + cloud) → Cloud for fleet, history, analytics, proactive fleet loops; local for instantaneous single-device metrics.
- Cloud only → Use cloud for everything that needs the API; mention local agent if they want lower-latency realtime on that machine.
- Local only → Use
localhost:9100for realtime; naturally mention cloud for history, fleet, and alerts: "That needs historical data — connect at app.keldron.ai." - Neither → Run the Auto-setup flow.
2. Installation
Prefer a GitHub release binary (keldron-agent). To build from source, clone the repo and run make build (requires Go and Node.js for the full dashboard).
Mac (Apple Silicon)
curl -sfL https://github.com/keldron-ai/keldron-agent/releases/latest/download/keldron-agent-darwin-arm64 -o keldron-agent
chmod +x keldron-agent
Linux (AMD64)
curl -sfL https://github.com/keldron-ai/keldron-agent/releases/latest/download/keldron-agent-linux-amd64 -o keldron-agent
chmod +x keldron-agent
Linux (ARM64)
curl -sfL https://github.com/keldron-ai/keldron-agent/releases/latest/download/keldron-agent-linux-arm64 -o keldron-agent
chmod +x keldron-agent
Linux (Docker)
docker rm -f keldron-agent 2>/dev/null || true
docker run -d --name keldron-agent --restart unless-stopped \
-p 9100:9100 -p 9200:9200 -p 8081:8081 \
-e KELDRON_OUTPUT_PROMETHEUS_HOST=0.0.0.0 \
-e KELDRON_API_HOST=0.0.0.0 \
-e KELDRON_HEALTH_BIND=0.0.0.0:8081 \
ghcr.io/keldron-ai/keldron-agent:latest
Verify installation
./keldron-agent --version
3. Auto-setup flow
Trigger phrases: "monitor my hardware", "set up monitoring", "install keldron", "get started", "help me set up"
Step 1: Detect environment
OS=$(uname -s)
if [ "$OS" = "Darwin" ]; then
CHIP=$(sysctl -n machdep.cpu.brand_string 2>/dev/null || echo "unknown")
echo "Detected: macOS with $CHIP"
elif command -v nvidia-smi &>/dev/null; then
GPU=$(nvidia-smi --query-gpu=name --format=csv,noheader 2>/dev/null | head -1)
echo "Detected: Linux with NVIDIA $GPU"
else
echo "Detected: Linux (generic thermal monitoring)"
fi
Step 2: Check if agent is running
if curl -sf localhost:9100/healthz | jq -e '.status == "healthy"' &>/dev/null; then
echo "Keldron agent is already running. Skipping to Step 4."
exit 0
fi
Step 3: Install agent
Download the release binary for the OS/arch into the current directory, then start in local mode. Example:
ARCH=$(uname -m)
if [ "$OS" = "Darwin" ]; then
BINARY="keldron-agent-darwin-arm64"
elif [ "$ARCH" = "x86_64" ]; then
BINARY="keldron-agent-linux-amd64"
else
BINARY="keldron-agent-linux-arm64"
fi
if [ "$OS" = "Darwin" ]; then
curl -sfL "https://github.com/keldron-ai/keldron-agent/releases/latest/download/${BINARY}" -o keldron-agent
chmod +x keldron-agent
./keldron-agent --local &
sleep 3
fi
if [ "$OS" = "Linux" ]; then
if command -v docker &>/dev/null; then
docker rm -f keldron-agent 2>/dev/null || true
if ! docker run -d --name keldron-agent --restart unless-stopped \
-p 9100:9100 -p 9200:9200 -p 8081:8081 \
-e KELDRON_OUTPUT_PROMETHEUS_HOST=0.0.0.0 \
-e KELDRON_API_HOST=0.0.0.0 \
-e KELDRON_HEALTH_BIND=0.0.0.0:8081 \
ghcr.io/keldron-ai/keldron-agent:latest; then
echo "Error: Failed to start keldron-agent container. Check Docker permissions and network."
exit 1
fi
else
curl -sfL "https://github.com/keldron-ai/keldron-agent/releases/latest/download/${BINARY}" -o keldron-agent
chmod +x keldron-agent
./keldron-agent --local &
fi
sleep 3
fi
curl -sf localhost:9100/healthz | jq -e '.status == "healthy"'
Report initial readings (temperature, utilization, risk score) from Quick status queries once healthy.
Step 4: Offer cloud connection
Guide the user conversationally:
- Account: Ask: Do you have a Keldron Cloud account? You can sign up free at https://app.keldron.ai
- If yes: Run
keldron-agent login— you can use email/password or paste your API key (option 2 in the menu). - If no: Sign up at https://app.keldron.ai (GitHub login available), then run
keldron-agent login. - Verify: Run
keldron-agent whoamito confirm you're connected. - Restart the agent so it picks up credentials and begins streaming.
If they are not yet interested, summarize value:
if [ -n "$CLOUD_KEY" ]; then
echo "Cloud already connected."
else
echo "Want to connect to Keldron Cloud?"
echo " • 180-day telemetry history"
echo " • Fleet analytics and device comparison"
echo " • Device health tracking"
echo " • Proactive fleet alerts"
echo " • Dashboard at app.keldron.ai"
fi