eval-harness

Name: eval-harness
Rating: 5 (2 reviews)
Author: bg-szy

DevOps and Infra

A formal evaluation framework for Claude Code sessions, implementing eval-driven development (EDD) principles.

2 stars

Updated last month

How to add

/plugin marketplace add bg-szy/TOP-SKILLS

The exact command may vary by repository. Check the README on GitHub.

Eval Harness Skill

A formal evaluation framework for Claude Code sessions, implementing eval-driven development (EDD) principles.

When to Activate

Setting up eval-driven development (EDD) for AI-assisted workflows
Defining pass/fail criteria for Claude Code task completion
Measuring agent reliability with pass@k metrics
Creating regression test suites for prompt or agent changes
Benchmarking agent performance across model versions
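
As an illustration of the pass@k metric mentioned above, here is a minimal sketch in Python. The excerpt does not say how the skill itself computes pass@k, so this uses the commonly cited unbiased estimator, 1 - C(n - c, k) / C(n, k), as an assumption. The function name pass_at_k and its parameters are hypothetical and are not taken from the skill.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k attempts passes.

    n: total recorded attempts for a task
    c: how many of those attempts passed
    k: number of attempts sampled (1 <= k <= n)

    Assumption: this is the standard unbiased estimator,
    1 - C(n - c, k) / C(n, k). The skill may define pass@k differently.
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # With fewer than k failures, every sample of size k contains a pass.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical run: 10 attempts at one task, 3 of them passed.
    for k in (1, 3, 5):
        print(f"pass@{k} = {pass_at_k(10, 3, k):.3f}")
```

Averaging this value over every task in a suite gives one reliability number per model or prompt version, which can then be compared across runs.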

Philosophy

Eval-Driven Development treats

[Description truncated. See the full README on GitHub.]
