Eval-Driven Development

Build AI features test-first using evals: define success criteria before writing prompts or model code.

4.7 (38 reviews)
14,200 installs
by Community

About

Brings test-driven discipline to AI feature development. Writes evals first (input/expected output pairs), runs them against multiple model/prompt variants, and reports pass rates. Helps you avoid the 'it worked once' fallacy when building with LLMs.

Tags

ai, evals, tdd, testing, llm

Skill Instructions Preview

# Eval-Driven Development

Build AI features test-first using evals.

## Process
1. Define the task in one sentence
2. Write 5-20 input/expected-output pairs
3. Define a grading function (exact match, fuzzy match, or LLM-as-judge)
4. Implement v1 of the prompt or chain
5. Run evals → record pass rate
6. Iterate on prompt → re-run
7. Track pass rate over time
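
The steps above can be sketched as a small harness. This is a minimal sketch under stated assumptions, not the skill's own implementation: `EvalCase`, `exact_match`, `fuzzy_match`, `run_evals`, and the stub `classify_v1` are hypothetical names, and the stub stands in for a real prompt or model call. The one-sentence task assumed here is "classify a sentence as positive or negative".

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Callable


@dataclass
class EvalCase:
    """One input/expected-output pair (hypothetical container)."""
    input: str
    expected: str
    tags: list[str] = field(default_factory=list)


def exact_match(output: str, expected: str) -> bool:
    """Grade by strict equality after trimming whitespace."""
    return output.strip() == expected.strip()


def fuzzy_match(output: str, expected: str, threshold: float = 0.8) -> bool:
    """Grade by case-insensitive string similarity; the threshold is an assumption."""
    ratio = SequenceMatcher(None, output.lower(), expected.lower()).ratio()
    return ratio >= threshold


def run_evals(
    cases: list[EvalCase],
    system: Callable[[str], str],
    grade: Callable[[str, str], bool],
) -> float:
    """Run every case through the system and return the fraction that pass."""
    if not cases:
        raise ValueError("eval suite is empty")
    passed = sum(grade(system(case.input), case.expected) for case in cases)
    return passed / len(cases)


def classify_v1(text: str) -> str:
    """Hypothetical v1: a keyword stub standing in for a prompt or model call."""
    return "positive" if "good" in text.lower() else "negative"


cases = [
    EvalCase("This is good", "positive"),
    EvalCase("Really bad", "negative"),
    EvalCase("Not good at all", "negative", tags=["edge-case"]),
    EvalCase("Great stuff", "positive"),
]

pass_rate = run_evals(cases, classify_v1, exact_match)
print(f"pass rate: {pass_rate:.0%}")  # prints "pass rate: 50%"
```

The stub fails the negation case and the synonym case, which is the kind of gap a single manual spot check tends to miss. Swapping `classify_v1` for a second variant and re-running gives a like-for-like comparison.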

## Eval Format
```json
{
  "input": "...",
  "expected": "...",
  "tags": ["edge-case"]
}
```
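
A minimal sketch of loading cases in this format follows. It assumes the cases are stored one JSON object per line (JSONL) and that `tags` is optional; neither is stated by the skill itself. `parse_cases` and `with_tag` are hypothetical helper names.

```python
import json

# Keys every case must carry; "tags" is treated as optional (an assumption).
REQUIRED_KEYS = {"input", "expected"}


def parse_cases(text: str) -> list[dict]:
    """Parse one JSON object per line, validating the required keys."""
    cases = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        case = json.loads(line)
        missing = REQUIRED_KEYS - case.keys()
        if missing:
            raise ValueError(f"line {line_no}: missing keys {sorted(missing)}")
        case.setdefault("tags", [])
        cases.append(case)
    return cases


def with_tag(cases: list[dict], tag: str) -> list[dict]:
    """Select the cases that carry a given tag, e.g. to report edge cases separately."""
    return [case for case in cases if tag in case["tags"]]


raw = """
{"input": "2 + 2", "expected": "4"}
{"input": "", "expected": "error", "tags": ["edge-case"]}
"""

cases = parse_cases(raw)
edge_cases = with_tag(cases, "edge-case")
print(len(cases), len(edge_cases))  # prints "2 1"
```

Validating at load time means a malformed case fails loudly before any model call is made, rather than silently counting as a failed eval.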

Never ship a prompt change without running the eval suite.
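
One way to enforce this rule is a gate that compares the new pass rate against a recorded baseline. The sketch below is an assumption about how such a gate might look: `passes_gate`, `gate_exit_code`, and the baseline and candidate numbers are hypothetical and illustrative only.

```python
def passes_gate(pass_rate: float, baseline: float, tolerance: float = 0.0) -> bool:
    """Return True if the new pass rate has not regressed beyond the tolerance."""
    return pass_rate >= baseline - tolerance


def gate_exit_code(pass_rate: float, baseline: float, tolerance: float = 0.0) -> int:
    """Map the gate result to a process exit code (0 = ship, 1 = block)."""
    if passes_gate(pass_rate, baseline, tolerance):
        print(f"OK: pass rate {pass_rate:.0%} vs baseline {baseline:.0%}")
        return 0
    print(f"BLOCKED: pass rate {pass_rate:.0%} is below baseline {baseline:.0%}")
    return 1


# Illustrative numbers only: a prompt change that drops the pass rate.
exit_code = gate_exit_code(pass_rate=0.75, baseline=0.80)
```

A CI script could pass `exit_code` to `sys.exit` so that a regression fails the build.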

Install
# Add as Claude Code slash command:
curl -fsSL --create-dirs "https://raw.githubusercontent.com/github/awesome-copilot/main/skills/eval-driven-dev/SKILL.md" \
  -o ~/.claude/commands/eval-driven-dev.md

Compatible with

Claude Code

Trigger phrase

/eval

Community

@awesome-copilot-community