Agent Evaluation Framework Builder

Name: Agent Evaluation Framework Builder
Rating: 5 (8 reviews)
Author: Notysoty

Designs an eval suite for an LLM agent or pipeline including success metrics, trajectory scoring, LLM-as-judge setup, and regression test cases.
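As one illustration of trajectory scoring, here is a minimal sketch. It is not necessarily how this skill scores trajectories. It assumes each agent run is logged as an ordered list of tool-call names and compares that list to a hand-written reference trajectory using the longest common subsequence (LCS). The names `score_trajectory`, `lcs_length`, and `TrajectoryScore`, and the LCS-based rule itself, are hypothetical choices made for this example.

```python
from dataclasses import dataclass


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two step lists (classic DP table)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


@dataclass
class TrajectoryScore:
    recall: float     # share of expected steps the agent performed, in order
    precision: float  # share of the agent's steps that were expected, in order
    f1: float         # harmonic mean of precision and recall


def score_trajectory(expected: list[str], actual: list[str]) -> TrajectoryScore:
    """Compare an agent's tool-call sequence against a reference trajectory.

    Hypothetical scoring rule: order-aware overlap via LCS, so skipped steps
    lower recall and extra or repeated steps lower precision.
    """
    if not expected and not actual:
        return TrajectoryScore(1.0, 1.0, 1.0)
    matched = lcs_length(expected, actual)
    recall = matched / len(expected) if expected else 0.0
    precision = matched / len(actual) if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return TrajectoryScore(recall, precision, f1)
```

For example, compare an expected trajectory of `["search_docs", "read_file", "answer"]` with an actual run that calls `search_docs` twice before the other two steps. The result is recall 1.0 and precision 0.75: every required step happened, but one of the four calls was redundant.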

8 stars

Updated 3 months ago

License: MIT

How to add

/plugin marketplace add Notysoty/openagentskills

The exact command may vary by repository. Check the README on GitHub.

#llm #ai #test

Related skills

webapp-testing

143.8k

Toolkit for interacting with and testing local web applications using Playwright. Supports verifying frontend functionality, debugging UI behavior, capturing browser screenshots, and viewing browser logs.

Design and Frontend · #test · by anthropics

brand-guidelines

143.8k

Applies Anthropic's official brand colors and typography to any artifact that may benefit from its look-and-feel. Use it when brand colors, style guidelines, visual formatting, or company design standards apply.

Design and Frontend · by anthropics

frontend-design

143.8k

Creates distinctive, production-grade frontend interfaces with high design quality, generating creative, polished code and UI design that avoids generic AI aesthetics. Use for building web components, pages, and applications, or for styling/beautifying web UIs.

Design and Frontend · #css #ai · by anthropics

web-artifacts-builder

143.8k

Suite of tools for creating elaborate, multi-component claude.ai HTML artifacts using modern frontend web technologies (React, Tailwind CSS, shadcn/ui). Use for complex artifacts requiring state management, routing, or shadcn/ui components - not for simple single-file HTML/JSX artifacts.

Design and Frontend · #css #ai · by anthropics

Agent Evaluation Framework Builder

What this skill does

This skill designs an evaluation framework for an LLM agent or pipeline. Most teams skip evals until something breaks in production. This skill helps you build them before launch, so you have a baseline to compare against, can catch regressions, and can measure quality improvements objectively. It covers dataset construction, metric selection, LLM-as-judge setup, and CI integration.
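For the regression and CI side, here is a minimal sketch that compares an eval run against a stored baseline and returns a nonzero exit code a CI job can fail on. It assumes each run writes a JSON file mapping case IDs to scores in [0, 1]. That file format, the function names `find_regressions`, `gate`, and `load_scores`, and the 0.05 default tolerance are all hypothetical; none of them come from the skill itself.

```python
# Hypothetical CI regression gate; the score-file format and tolerance are assumptions.
import json
import sys
from statistics import mean
from typing import Optional


def find_regressions(
    baseline: dict[str, float], current: dict[str, float], tolerance: float = 0.05
) -> list[str]:
    """IDs of cases whose score fell by more than `tolerance`, or that vanished from the run."""
    regressed = []
    for case_id, base_score in sorted(baseline.items()):
        new_score = current.get(case_id)
        if new_score is None or base_score - new_score > tolerance:
            regressed.append(case_id)
    return regressed


def gate(
    baseline: dict[str, float],
    current: dict[str, float],
    tolerance: float = 0.05,
    min_mean: Optional[float] = None,
) -> int:
    """Return a process exit code: 0 if the run passes, 1 if CI should fail."""
    failed = False
    for case_id in find_regressions(baseline, current, tolerance):
        new_score = current.get(case_id)
        shown = "missing" if new_score is None else f"{new_score:.2f}"
        print(f"REGRESSION {case_id}: {baseline[case_id]:.2f} -> {shown}")
        failed = True
    # Optional absolute floor on the average score, independent of the baseline.
    if min_mean is not None and current:
        avg = mean(current.values())
        if avg < min_mean:
            print(f"MEAN BELOW FLOOR: {avg:.2f} < {min_mean:.2f}")
            failed = True
    return 1 if failed else 0


def load_scores(path: str) -> dict[str, float]:
    """Read a JSON object mapping case ID -> score (assumed file format)."""
    with open(path, encoding="utf-8") as f:
        return {str(k): float(v) for k, v in json.load(f).items()}


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python eval_gate.py baseline.json current.json
    sys.exit(gate(load_scores(sys.argv[1]), load_scores(sys.argv[2])))
```

The gate fails on per-case drops, not only on the mean, because a large regression on one important case can hide inside a stable average. It also treats a missing case as a regression, so a run cannot pass by silently shrinking the dataset.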

How to use

Claude Code / Cline

Copy this file to `.agents/skills/ag

[Description truncated. See the full README on GitHub.]
