Skill
testing
✓ Verified
NEW
Free
Eval-Driven Development
Build AI features test-first using evals — define success criteria before writing prompts or model code.
4.7 (38 reviews)
14,200 installs
by Community
About
Brings test-driven discipline to AI feature development. Writes evals first (input/expected output pairs), runs them against multiple model/prompt variants, and reports pass rates. Helps you avoid the 'it worked once' fallacy when building with LLMs.
Tags
ai, evals, tdd, testing, llm
Skill Instructions Preview
# Eval-Driven Development
Build AI features test-first using evals.
## Process
1. Define the task in one sentence
2. Write 5-20 input/expected-output pairs
3. Define a grading function (exact match, fuzzy match, or LLM-as-judge)
4. Implement v1 of the prompt or chain
5. Run evals → record pass rate
6. Iterate on prompt → re-run
7. Track pass rate over time
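The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's own implementation: the cases, the `exact_match` and `fuzzy_match` graders, the `run_evals` helper, and the two stubbed variants are all hypothetical names, and the stubs stand in for real model calls so the example runs offline.

```python
from difflib import SequenceMatcher
from typing import Callable

# Hypothetical eval cases using the input/expected shape from this skill.
CASES = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
    {"input": "opposite of hot", "expected": "cold"},
]


def exact_match(output: str, expected: str) -> bool:
    """Pass only when the whitespace-stripped strings are identical."""
    return output.strip() == expected.strip()


def fuzzy_match(output: str, expected: str, threshold: float = 0.8) -> bool:
    """Pass when the case-insensitive similarity ratio reaches the threshold."""
    a, b = output.strip().lower(), expected.strip().lower()
    return SequenceMatcher(None, a, b).ratio() >= threshold


def run_evals(
    variant: Callable[[str], str],
    cases: list,
    grade: Callable[[str, str], bool],
) -> float:
    """Run every case through one variant and return the pass rate (0.0 to 1.0)."""
    results = [grade(variant(case["input"]), case["expected"]) for case in cases]
    return sum(results) / len(results)


# Stubs standing in for real prompt/model variants (assumption: a real
# variant would call a model API here instead of a lookup table).
def variant_a(prompt: str) -> str:
    answers = {"2 + 2": "4", "capital of France": "paris", "opposite of hot": "warm"}
    return answers.get(prompt, "")


def variant_b(prompt: str) -> str:
    answers = {"2 + 2": "4", "capital of France": "Paris", "opposite of hot": "cold"}
    return answers.get(prompt, "")


if __name__ == "__main__":
    for name, variant in [("variant_a", variant_a), ("variant_b", variant_b)]:
        exact = run_evals(variant, CASES, exact_match)
        fuzzy = run_evals(variant, CASES, fuzzy_match)
        print(f"{name}: exact={exact:.0%} fuzzy={fuzzy:.0%}")
```

Recording the printed pass rates for each prompt revision (for example, appended to a log file alongside the date and prompt version) is one simple way to cover step 7.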
## Eval Format
```json
{
  "input": "...",
  "expected": "...",
  "tags": ["edge-case"]
}
```
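As a rough sketch of how cases in this format might be consumed, the snippet below loads a JSON array of cases, checks that each one has `input` and `expected`, and breaks the pass rate down by tag. Storing the cases as a JSON array, treating `tags` as optional, and the helper names `load_cases` and `pass_rate_by_tag` are assumptions made for illustration; the skill itself only specifies the per-case shape.

```python
import json
from collections import defaultdict

# Hypothetical suite: a JSON array of cases in the format shown above.
RAW = """
[
  {"input": "2 + 2", "expected": "4", "tags": ["arithmetic"]},
  {"input": "", "expected": "", "tags": ["edge-case"]},
  {"input": "10 / 4", "expected": "2.5", "tags": ["arithmetic", "edge-case"]}
]
"""

REQUIRED_KEYS = {"input", "expected"}


def load_cases(raw: str) -> list:
    """Parse the suite and fail fast on cases missing required keys."""
    cases = json.loads(raw)
    for index, case in enumerate(cases):
        missing = REQUIRED_KEYS - set(case)
        if missing:
            raise ValueError(f"case {index} is missing keys: {sorted(missing)}")
        case.setdefault("tags", [])  # assumption: tags are optional
    return cases


def pass_rate_by_tag(cases: list, passed: list) -> dict:
    """Return the pass rate per tag, given one boolean result per case."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for case, ok in zip(cases, passed):
        for tag in case["tags"]:
            totals[tag] += 1
            passes[tag] += int(ok)
    return {tag: passes[tag] / totals[tag] for tag in totals}


if __name__ == "__main__":
    cases = load_cases(RAW)
    # Illustrative results only: pretend the third case failed.
    print(pass_rate_by_tag(cases, [True, True, False]))
```

Per-tag rates make it easier to see whether a prompt change helped the common path while regressing cases tagged `edge-case`.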
Never ship a prompt change without running the eval suite.
Install
# Add as Claude Code slash command:
curl -fsSL "https://raw.githubusercontent.com/github/awesome-copilot/main/skills/eval-driven-dev/SKILL.md" \
  -o ~/.claude/commands/eval-driven-dev.md
Compatible with
Claude Code
Trigger phrase
/eval
Community
@awesome-copilot-community