deepeval

Name: deepeval
Rating: 5 (1 review)
Author: EvXata

A BCG-calibrated evaluation framework for LLM agent outputs, featuring a Codex-native judge and a 4-tier stack with an 8-dimension BCG rubric and a 10-signal novelty stack. It includes an adversarial Skeptic Agent for sycophancy and ambiguity probes, supports day/week/30-day cadences, and integrates into any Codex project without API keys.

1 star

Updated 21 days ago

License: MIT

How to add

/plugin marketplace add EvXata/deepeval-bcg

The exact command may vary by repository. Check the README on GitHub.

#llm #api

DeepEval — Codex-native MBB-Grade Quality Framework

This skill scores any LLM-generated artifact against an MBB-grade rubric (BCG-calibrated). It works in any Codex project. No external API keys, no vendor SDKs. Codex itself is the judge.

Three core promises

Tier stack: deterministic → heuristic → Codex judge → human, by cost/latency budget.
BCG-calibrated rubric — 8 dimensions, 1–3 scale, verbatim BCG anchor language.
Day/Week/30-day cadence only. NO

[Description truncated. See the full README on GitHub.]
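
The tier stack described above (deterministic → heuristic → Codex judge → human) can be read as a cost-ordered cascade: cheap checks run first, and an artifact escalates only when a tier cannot decide or the budget runs out. The following is a minimal sketch of that idea, not the skill's actual implementation. Every name here (`Verdict`, `TIERS`, `evaluate`), the cost units, and the rules inside each check are hypothetical, and the judge tier is a stub rather than a real Codex call.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Verdict:
    tier: str                # which tier produced the decision
    passed: Optional[bool]   # None means "undecided, needs a human"
    note: str


def deterministic_check(text: str) -> Optional[Verdict]:
    # Hypothetical rule: an empty artifact fails outright.
    if not text.strip():
        return Verdict("deterministic", False, "empty output")
    return None  # cannot decide, escalate


def heuristic_check(text: str) -> Optional[Verdict]:
    # Hypothetical rule: very short artifacts are rejected without a judge call.
    if len(text.split()) < 5:
        return Verdict("heuristic", False, "too short to assess")
    return None  # cannot decide, escalate


def judge_stub(text: str) -> Optional[Verdict]:
    # Placeholder for the model-as-judge tier; it always passes in this sketch.
    return Verdict("judge", True, "stubbed judge verdict")


# (name, assumed cost in arbitrary units, check function), cheapest first.
TIERS: List[Tuple[str, float, Callable[[str], Optional[Verdict]]]] = [
    ("deterministic", 0.0, deterministic_check),
    ("heuristic", 1.0, heuristic_check),
    ("judge", 10.0, judge_stub),
]


def evaluate(text: str, budget: float) -> Verdict:
    """Run tiers in cost order; stop at the first tier that decides."""
    spent = 0.0
    for name, cost, check in TIERS:
        if spent + cost > budget:
            break  # the next tier is unaffordable, fall through to human review
        spent += cost
        verdict = check(text)
        if verdict is not None:
            return verdict
    return Verdict("human", None, "escalated for human review")
```

With a generous budget, `evaluate("", 100.0)` is decided by the deterministic tier, while a longer artifact reaches the stubbed judge. With `budget=1.0` the judge tier is unaffordable, so the same artifact is escalated to the human tier.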
