Eval-Driven Development

Build AI features test-first using evals — define success criteria before writing prompts or model code.

4.7 (38 reviews)

14,200 installs

by Community

About

Brings test-driven discipline to AI feature development. Writes evals first (input/expected output pairs), runs them against multiple model/prompt variants, and reports pass rates. Helps you avoid the 'it worked once' fallacy when building with LLMs.

Skill Instructions Preview

# Eval-Driven Development

Build AI features test-first using evals.

## Process
1. Define the task in one sentence
2. Write 5-20 input/expected-output pairs
3. Define a grading function (exact match, fuzzy match, or LLM-as-judge)
4. Implement v1 of the prompt or chain
5. Run evals → record pass rate
6. Iterate on prompt → re-run
7. Track pass rate over time
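
The process above can be sketched as a small harness. This is a minimal illustration, not the skill's actual implementation: `EvalCase`, `exact_match`, `fuzzy_match`, `run_evals`, and `toy_candidate` are hypothetical names, the 0.8 fuzzy threshold is an arbitrary assumption, and `toy_candidate` stands in for a real prompt plus model call.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Callable


@dataclass
class EvalCase:
    """One input/expected-output pair (hypothetical container)."""
    input: str
    expected: str
    tags: list[str] = field(default_factory=list)


def exact_match(output: str, expected: str) -> bool:
    # Strict grader: equal after trimming surrounding whitespace.
    return output.strip() == expected.strip()


def fuzzy_match(output: str, expected: str, threshold: float = 0.8) -> bool:
    # Lenient grader: case-insensitive similarity ratio.
    # The 0.8 threshold is an assumption; tune it per task.
    a, b = output.strip().lower(), expected.strip().lower()
    return SequenceMatcher(None, a, b).ratio() >= threshold


def run_evals(
    cases: list[EvalCase],
    candidate: Callable[[str], str],
    grade: Callable[[str, str], bool] = exact_match,
) -> float:
    """Run every case through the candidate and return the pass rate (0.0 to 1.0)."""
    if not cases:
        raise ValueError("eval suite is empty")
    passed = sum(grade(candidate(case.input), case.expected) for case in cases)
    return passed / len(cases)


def toy_candidate(text: str) -> str:
    # Hypothetical stand-in for "prompt + model"; a real one would call an LLM.
    return text.upper()


CASES = [
    EvalCase("hello", "HELLO"),
    EvalCase("Eval", "EVAL"),
    EvalCase("straße", "STRASSE", tags=["edge-case"]),
    EvalCase("ok", "Ok"),  # fails exact match, passes fuzzy match
]

if __name__ == "__main__":
    print("exact:", run_evals(CASES, toy_candidate))
    print("fuzzy:", run_evals(CASES, toy_candidate, fuzzy_match))
```

Passing the grader as a parameter lets the same suite be scored strictly or leniently without touching the cases, which makes it easier to compare prompt variants under one fixed standard.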

## Eval Format
```json
{
  "input": "...",
  "expected": "...",
  "tags": ["edge-case"]
}
```
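
A sketch of how cases in this format might be loaded and checked before a run. Storing one JSON object per line (JSONL) is an assumption, since the format above does not say how multiple cases are stored; `parse_cases` and `filter_by_tag` are hypothetical helpers, and treating `tags` as optional is also an assumption.

```python
import json

# Keys every case must carry; "tags" is treated as optional (an assumption).
REQUIRED_KEYS = {"input", "expected"}


def parse_cases(jsonl_text: str) -> list[dict]:
    """Parse one eval case per non-blank line and validate required keys."""
    cases = []
    for line_no, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        case = json.loads(line)
        if not isinstance(case, dict):
            raise ValueError(f"line {line_no}: expected a JSON object")
        missing = REQUIRED_KEYS - case.keys()
        if missing:
            raise ValueError(f"line {line_no}: missing keys {sorted(missing)}")
        case.setdefault("tags", [])
        cases.append(case)
    return cases


def filter_by_tag(cases: list[dict], tag: str) -> list[dict]:
    """Select the cases that carry a given tag, e.g. "edge-case"."""
    return [case for case in cases if tag in case["tags"]]


SAMPLE = """
{"input": "2+2", "expected": "4"}
{"input": "", "expected": "error", "tags": ["edge-case"]}
"""

if __name__ == "__main__":
    loaded = parse_cases(SAMPLE)
    print(len(loaded), "cases;", len(filter_by_tag(loaded, "edge-case")), "edge cases")
```

Validating up front means a malformed case fails loudly at load time instead of silently lowering the pass rate.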

Never ship a prompt change without running the eval suite.
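
One possible way to enforce this rule is a regression gate that compares a new variant's pass rate against the recorded history. The sketch below is illustrative only: `RunRecord`, `best_pass_rate`, and `may_ship` are hypothetical names, and "at least as good as the best previous run, minus a tolerance" is an assumed policy rather than one the skill prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    """Pass rate recorded for one prompt variant (hypothetical record type)."""
    variant: str
    pass_rate: float  # fraction of eval cases passed, 0.0 to 1.0


def best_pass_rate(history: list[RunRecord]) -> float:
    # With no history yet, any result counts as a new baseline.
    return max((record.pass_rate for record in history), default=0.0)


def may_ship(history: list[RunRecord], candidate: RunRecord, tolerance: float = 0.0) -> bool:
    """Allow the change only if it does not regress beyond the tolerance."""
    return candidate.pass_rate >= best_pass_rate(history) - tolerance


HISTORY = [RunRecord("v1", 0.6), RunRecord("v2", 0.8)]

if __name__ == "__main__":
    for record in (RunRecord("v3-better", 0.85), RunRecord("v3-worse", 0.7)):
        verdict = "ship" if may_ship(HISTORY, record) else "block"
        print(record.variant, verdict)
```

A nonzero `tolerance` can absorb run-to-run noise from a nondeterministic model, at the cost of letting small regressions through.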

Install

# Add as a Claude Code slash command (create the commands directory first if it is missing):
mkdir -p ~/.claude/commands
curl -fsSL "https://raw.githubusercontent.com/github/awesome-copilot/main/skills/eval-driven-dev/SKILL.md" \
  -o ~/.claude/commands/eval-driven-dev.md

Compatible with

Claude Code

Trigger phrase

/eval