pydantic-evals

Name: pydantic-evals
Rating: 5 (9 reviews)
Author: Fuenfgeld

Test and evaluate AI agents and LLM outputs using a code-first evaluation framework with strong typing. Use when the user wants to: (1) Create evaluation datasets with test cases for AI agents, (2) Define evaluators (deterministic, LLM-as-Judge, custom, or span-based), (3) Run evaluations and generate reports, (4) Compare model performance across experiments, (5) Integrate evaluations with Pydantic

9 stars

Updated 6 months ago

License: MIT

How to add

/plugin marketplace add Fuenfgeld/pydantic-ai-skills

The exact command may vary by repository. Check the README on GitHub.

#llm #ai #test

Pydantic Evals

Overview

Pydantic Evals provides rigorous testing and evaluation for AI agents and LLM outputs using a code-first approach built on Pydantic models. It enables "Evaluation-Driven Development" (EDD), in which evaluation suites live alongside application code and are subject to the same version control and CI/CD.

Core Concepts

Understand these key primitives:

Case

A single test scenario with inputs, optional expected output, and metadata.

from pydantic_evals import Case

case = 

[Description truncated. See the full README on GitHub.]
