Pydantic Evals
Overview
Pydantic Evals provides rigorous testing and evaluation for AI agents and LLM outputs through a code-first approach built on Pydantic models. It enables "Evaluation-Driven Development" (EDD), in which evaluation suites live alongside application code and fall under the same version control and CI/CD as that code.
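The EDD idea, where an evaluation suite is ordinary code that CI runs on every change, can be illustrated with a library-free sketch. Everything here is hypothetical and is not the Pydantic Evals API: `answer` is a deterministic stand-in for an LLM-backed agent, cases are plain `(input, expected)` tuples, and `pass_rate` and the 0.6 threshold are illustrative choices.

```python
# Minimal sketch of an evaluation suite that lives in the repo and runs in CI.
# Hypothetical names throughout; this does NOT use pydantic_evals.


def answer(question: str) -> str:
    # Hypothetical deterministic stand-in for an AI agent / LLM call.
    canned = {
        "What is the capital of France?": "Paris",
        "What is 2 + 2?": "4",
    }
    return canned.get(question, "I don't know")


# The suite is plain code, so it is versioned together with the application.
CASES: list[tuple[str, str]] = [
    ("What is the capital of France?", "Paris"),
    ("What is 2 + 2?", "4"),
    ("Who wrote Hamlet?", "William Shakespeare"),
]


def pass_rate(task, cases) -> float:
    """Return the fraction of cases whose output exactly matches the expected value."""
    passed = sum(1 for inputs, expected in cases if task(inputs) == expected)
    return passed / len(cases)


def test_eval_suite() -> None:
    # A test runner such as pytest fails the CI job if quality drops below the threshold.
    assert pass_rate(answer, CASES) >= 0.6
```

With these canned answers two of the three cases pass, so the suite clears the threshold; regressing either known answer would fail the build.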
Core Concepts
The key primitives are:
Case
A single test scenario: its inputs, plus an optional expected output and optional metadata.
from pydantic_evals import Case
case =
[Description truncated. See the full README on GitHub.]
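Because the `Case` example above is cut off, here is a library-free sketch of the data a case carries: inputs, an optional expected output, and free-form metadata. `MiniCase` and `exact_match` are hypothetical names for illustration only; they are not the `pydantic_evals.Case` class or its evaluators, whose real signatures are in the full README.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class MiniCase:
    """Hypothetical stand-in for a test case (NOT pydantic_evals.Case)."""

    name: str
    inputs: Any
    # Optional: open-ended cases may have no single reference answer.
    expected_output: Optional[Any] = None
    # Free-form tags, e.g. difficulty; each case gets its own empty dict by default.
    metadata: dict[str, Any] = field(default_factory=dict)


def exact_match(case: MiniCase, output: Any) -> Optional[bool]:
    """Compare output to expected_output; return None when there is nothing to compare."""
    if case.expected_output is None:
        return None
    return output == case.expected_output


cases = [
    MiniCase(
        name="capital",
        inputs="What is the capital of France?",
        expected_output="Paris",
        metadata={"difficulty": "easy"},
    ),
    MiniCase(name="open_ended", inputs="Write a haiku about autumn."),
]

for c in cases:
    print(c.name, exact_match(c, "Paris"))
# prints:
# capital True
# open_ended None
```

Returning `None` rather than `False` for cases without an expected output keeps "not checkable by exact match" distinct from "failed", so such cases can be scored by other criteria instead.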