
keldron-agent

Data and Analytics

GPU monitoring with risk intelligence. Local + cloud fleet monitoring, health tracking, proactive alerts, and AI-powered fleet analytics.

Author: keldron-ai · License: Apache-2.0

Keldron Agent — GPU Monitoring with Risk Intelligence

1. Overview

Keldron Agent is a vendor-neutral GPU monitoring agent that runs locally and exposes real-time telemetry and risk scores via a Prometheus endpoint. It supports Apple Silicon (M1–M5), NVIDIA consumer GPUs (RTX 3090/4090/5090), NVIDIA datacenter (H100/B200), AMD GPUs, and any Linux machine.
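Because the agent exposes telemetry in Prometheus format, a small filter can pull a single value out of a scrape. The sketch below is illustrative only: the /metrics path is the usual Prometheus convention and is assumed here, and the metric name in the usage note is a hypothetical placeholder, not a name confirmed by this document.

```shell
# prom_value NAME: read Prometheus text exposition on stdin and print the
# value of every sample whose metric name equals NAME.
# Assumes samples carry no trailing timestamp (the value is the last field).
prom_value() {
  awk -v name="$1" '
    /^#/ { next }                      # skip HELP/TYPE comment lines
    {
      metric = $1
      sub(/[{].*/, "", metric)         # drop the label set, if any
      if (metric == name) print $NF    # last field is the sample value
    }'
}
```

Possible usage, with an assumed path and a made-up metric name: `curl -sf localhost:9100/metrics | prom_value example_gpu_temperature_celsius`. Inspect the raw scrape output first to find the names the agent actually emits.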

Dual mode: Use the local agent on localhost:9100 for fast, real-time, single-device queries (works offline). Use Keldron Cloud (https://api.keldron.ai) for fleet overview, historical telemetry, analytics, and proactive fleet monitoring when an API key is configured.

No sudo required on any platform for the agent binary. On Linux, Docker may require sudo or membership in the docker group — see Docker post-install or rootless Docker if you hit permission errors.
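A quick way to tell which Docker permission case applies before starting the container is sketched below. It relies only on the standard `docker info` and `id -nG` commands; the function names and messages are illustrative and not part of keldron-agent.

```shell
# has_group "<space-separated group list>" GROUP: succeed if GROUP is listed.
has_group() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *)        return 1 ;;
  esac
}

# docker_access_hint: print a one-line diagnosis of Docker access.
docker_access_hint() {
  if docker info >/dev/null 2>&1; then
    echo "docker: ok"
  elif has_group "$(id -nG)" docker; then
    echo "docker: user is in the docker group, but the daemon is unreachable"
  else
    echo "docker: not usable as this user; use sudo, join the docker group, or try rootless Docker"
  fi
}
```

Running `docker_access_hint` before the `docker run` command shows whether a permission error is likely.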

Use this skill when the user wants to:

  • Monitor GPU temperature, power, utilization, or memory
  • Get risk assessments for their GPU
  • Track power costs
  • Set up alerts or watch a fleet (via cloud polling)
  • Open the real dashboard (local or cloud)
  • Ask fleet, history, or analytics questions

Mode detection (run at the start of an interaction)

Field names differ by endpoint — see Cloud API field names before writing jq filters.

# Check 1: Is the local agent running?
LOCAL_AGENT=$(curl -sf localhost:9100/healthz 2>/dev/null | jq -r '.status' 2>/dev/null)

# Check 2: Is cloud configured? (env → ~/.keldron/credentials from login → YAML)
CLOUD_KEY="${KELDRON_CLOUD_API_KEY:-}"
CLOUD_ENDPOINT=""
if [ -z "$CLOUD_KEY" ] && [ -f ~/.keldron/credentials ] && command -v jq &>/dev/null; then
  CLOUD_KEY=$(jq -r '.api_key // ""' ~/.keldron/credentials 2>/dev/null)
  CLOUD_ENDPOINT=$(jq -r '.endpoint // ""' ~/.keldron/credentials 2>/dev/null)
fi
if [ -z "$CLOUD_KEY" ]; then
  if command -v yq &>/dev/null; then
    CLOUD_KEY=$(yq '.cloud.api_key // ""' ~/.config/keldron/keldron-agent.yaml 2>/dev/null)
    CLOUD_ENDPOINT=$(yq '.cloud.endpoint // ""' ~/.config/keldron/keldron-agent.yaml 2>/dev/null)
  else
    CLOUD_KEY=$(grep -A3 'cloud:' ~/.config/keldron/keldron-agent.yaml 2>/dev/null \
      | grep 'api_key:' | awk '{print $2}' | tr -d "\"'" | xargs 2>/dev/null)
    CLOUD_ENDPOINT=$(grep -A3 'cloud:' ~/.config/keldron/keldron-agent.yaml 2>/dev/null \
      | grep 'endpoint:' | awk '{print $2}' | tr -d "\"'" | xargs 2>/dev/null)
  fi
fi
CLOUD_ENDPOINT="${CLOUD_ENDPOINT:-https://api.keldron.ai}"
if [ -z "$CLOUD_KEY" ]; then
  echo "No cloud API key found. Run keldron-agent login, set KELDRON_CLOUD_API_KEY (non-interactive login or streaming), or add cloud.api_key to ~/.config/keldron/keldron-agent.yaml. Sign up at https://app.keldron.ai"
fi

# Check 3: Does cloud respond? (only if we have a key)
CLOUD_OK=""
if [ -n "$CLOUD_KEY" ]; then
  CLOUD_OK=$(curl -sf "${CLOUD_ENDPOINT}/health" 2>/dev/null | jq -r '.status' 2>/dev/null)
fi

Mode priority:

  • Both (healthy local + cloud) → Cloud for fleet, history, analytics, proactive fleet loops; local for instantaneous single-device metrics.
  • Cloud only → Use cloud for everything that needs the API; mention local agent if they want lower-latency realtime on that machine.
  • Local only → Use localhost:9100 for realtime; naturally mention cloud for history, fleet, and alerts: "That needs historical data — connect at app.keldron.ai."
  • Neither → Run the Auto-setup flow.
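The priority rules above can be folded into one helper that turns the two probe results into a mode name. This is a minimal sketch: `select_mode` is a hypothetical name, and since the document does not state which status string the cloud health endpoint returns, any non-empty value other than `null` is assumed to mean reachable.

```shell
# select_mode LOCAL_STATUS CLOUD_STATUS -> prints both | cloud | local | neither
# LOCAL_STATUS is the .status from localhost:9100/healthz ("healthy" when up).
# CLOUD_STATUS is the .status from the cloud health check (assumption: any
# non-empty, non-"null" value counts as reachable).
select_mode() {
  local local_status="$1" cloud_status="$2"
  # jq -r prints the literal string "null" when the field is missing
  [ "$cloud_status" = "null" ] && cloud_status=""
  if [ "$local_status" = "healthy" ] && [ -n "$cloud_status" ]; then
    echo "both"
  elif [ -n "$cloud_status" ]; then
    echo "cloud"
  elif [ "$local_status" = "healthy" ]; then
    echo "local"
  else
    echo "neither"
  fi
}
```

After running the three checks, `MODE=$(select_mode "$LOCAL_AGENT" "$CLOUD_OK")` gives a single value to branch on; `neither` leads into the Auto-setup flow.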

2. Installation

Prefer a GitHub release binary (keldron-agent). To build from source, clone the repo and run make build (requires Go and Node.js for the full dashboard).

Mac (Apple Silicon)

curl -sfL https://github.com/keldron-ai/keldron-agent/releases/latest/download/keldron-agent-darwin-arm64 -o keldron-agent
chmod +x keldron-agent

Linux (AMD64)

curl -sfL https://github.com/keldron-ai/keldron-agent/releases/latest/download/keldron-agent-linux-amd64 -o keldron-agent
chmod +x keldron-agent

Linux (ARM64)

curl -sfL https://github.com/keldron-ai/keldron-agent/releases/latest/download/keldron-agent-linux-arm64 -o keldron-agent
chmod +x keldron-agent

Linux (Docker)

docker rm -f keldron-agent 2>/dev/null || true
docker run -d --name keldron-agent --restart unless-stopped \
  -p 9100:9100 -p 9200:9200 -p 8081:8081 \
  -e KELDRON_OUTPUT_PROMETHEUS_HOST=0.0.0.0 \
  -e KELDRON_API_HOST=0.0.0.0 \
  -e KELDRON_HEALTH_BIND=0.0.0.0:8081 \
  ghcr.io/keldron-ai/keldron-agent:latest

Verify installation

./keldron-agent --version

3. Auto-setup flow

Trigger phrases: "monitor my hardware", "set up monitoring", "install keldron", "get started", "help me set up"

Step 1: Detect environment

OS=$(uname -s)
if [ "$OS" = "Darwin" ]; then
  CHIP=$(sysctl -n machdep.cpu.brand_string 2>/dev/null || echo "unknown")
  echo "Detected: macOS with $CHIP"
elif command -v nvidia-smi &>/dev/null; then
  GPU=$(nvidia-smi --query-gpu=name --format=csv,noheader 2>/dev/null | head -1)
  echo "Detected: Linux with NVIDIA $GPU"
else
  echo "Detected: Linux (generic thermal monitoring)"
fi

Step 2: Check if agent is running

if curl -sf localhost:9100/healthz | jq -e '.status == "healthy"' &>/dev/null; then
  echo "Keldron agent is already running. Skipping to Step 4."
  exit 0
fi

Step 3: Install agent

Download the release binary for the OS/arch into the current directory, then start in local mode. Example:

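OS=$(uname -s)    # repeated from Step 1 so this step can run on its own
ARCH=$(uname -m)

# Pick the release asset for this OS/arch.
# The only macOS asset listed in Installation is the Apple Silicon build.
if [ "$OS" = "Darwin" ]; then
  BINARY="keldron-agent-darwin-arm64"
elif [ "$ARCH" = "x86_64" ]; then
  BINARY="keldron-agent-linux-amd64"
else
  # Any other architecture (aarch64, arm64) falls back to the ARM64 build.
  BINARY="keldron-agent-linux-arm64"
fi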

if [ "$OS" = "Darwin" ]; then
  curl -sfL "https://github.com/keldron-ai/keldron-agent/releases/latest/download/${BINARY}" -o keldron-agent
  chmod +x keldron-agent
  ./keldron-agent --local &
  sleep 3
fi

if [ "$OS" = "Linux" ]; then
  if command -v docker &>/dev/null; then
    docker rm -f keldron-agent 2>/dev/null || true
    if ! docker run -d --name keldron-agent --restart unless-stopped \
      -p 9100:9100 -p 9200:9200 -p 8081:8081 \
      -e KELDRON_OUTPUT_PROMETHEUS_HOST=0.0.0.0 \
      -e KELDRON_API_HOST=0.0.0.0 \
      -e KELDRON_HEALTH_BIND=0.0.0.0:8081 \
      ghcr.io/keldron-ai/keldron-agent:latest; then
      echo "Error: Failed to start keldron-agent container. Check Docker permissions and network."
      exit 1
    fi
  else
    curl -sfL "https://github.com/keldron-ai/keldron-agent/releases/latest/download/${BINARY}" -o keldron-agent
    chmod +x keldron-agent
    ./keldron-agent --local &
  fi
  sleep 3
fi

curl -sf localhost:9100/healthz | jq -e '.status == "healthy"'

Report initial readings (temperature, utilization, risk score) from Quick status queries once healthy.
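The fixed `sleep 3` in Step 3 may be too short on a slow machine or during a first Docker image pull. A polling loop is one alternative. The sketch below assumes nothing beyond the health check already used in this step; `wait_for_healthy` and `agent_healthy` are illustrative names.

```shell
# wait_for_healthy ATTEMPTS DELAY CMD [ARGS...]
# Run CMD up to ATTEMPTS times, sleeping DELAY seconds after each failure.
# Returns 0 as soon as CMD succeeds, 1 if every attempt fails.
wait_for_healthy() {
  local attempts="$1" delay="$2" i=0
  shift 2
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Probe taken from the health check used in this step.
agent_healthy() {
  curl -sf localhost:9100/healthz | jq -e '.status == "healthy"'
}
```

For example, `wait_for_healthy 10 1 agent_healthy || echo "Agent did not become healthy"` waits up to roughly ten seconds before giving up.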

Step 4: Offer cloud connection

Guide the user conversationally:

  1. Account: Ask whether the user has a Keldron Cloud account. Sign-up is free at https://app.keldron.ai
  2. If yes: Run keldron-agent login and either sign in with email/password or paste an API key (option 2 in the menu).
  3. If no: Sign up at https://app.keldron.ai (GitHub login available), then run keldron-agent login.
  4. Verify: Run keldron-agent whoami to confirm the connection.
  5. Restart: Restart the agent so it picks up the credentials and begins streaming.

If they are not yet interested, summarize value:

if [ -n "$CLOUD_KEY" ]; then
  echo "Cloud already connected."
else
  echo "Want to connect to Keldron Cloud?"
  echo "  • 180-day telemetry history"
  echo "  • Fleet analytics and device comparison"
  echo "  • Device health tracking"
  echo "  • Proactive fleet alerts"
  echo "  • Dashboard at app.keldron.ai"
fi

Step 5 optional: env o

How to add

/plugin marketplace add keldron-ai/keldron-agent

The exact command may vary by repository. Check the README on GitHub.
