Improving Claude's Frontend Design Skill

Teaching Claude to Design Better

Description

Anthropic open-sources the skill prompts that tell Claude how to write frontend code. I thought the original was leaving performance on the table, so I rewrote it from scratch. The core insight: prompts fail when they assume the model knows things it doesn't, or tell it to do things that contradict how it actually generates code. I stripped those contradictions out and rebuilt the architecture around what the model can actually do. Then I tested it with blind A/B comparisons: 50+ pairs, scored by an evaluator that couldn't see which version produced which output. The rewrite won 75% of those comparisons against Anthropic's original (p = 0.0063).
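For readers who want to see the shape of a blind comparison like this, here is a minimal sketch. It is an illustration under stated assumptions, not the exact harness behind the numbers above: the function names `blind_pair` and `sign_test_p_value` are hypothetical, and the two-sided exact sign test is just one reasonable way to turn a win count into a p-value.

```python
import random
from math import comb


def blind_pair(original_output, rewrite_output, rng):
    """Randomly assign the two outputs to slots "A" and "B".

    Returns (shown, key): `shown` is what the evaluator sees, `key` maps
    each slot back to its source and stays hidden until scoring is done.
    (Hypothetical helper, for illustration only.)
    """
    if rng.random() < 0.5:
        return ({"A": original_output, "B": rewrite_output},
                {"A": "original", "B": "rewrite"})
    return ({"A": rewrite_output, "B": original_output},
            {"A": "rewrite", "B": "original"})


def sign_test_p_value(wins, n):
    """Two-sided exact binomial (sign) test against a 50% win rate.

    `wins` is how many of `n` non-tied comparisons one version won.
    Assumption: ties are dropped before counting.
    """
    k = max(wins, n - wins)
    # Probability of a result at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    rng = random.Random(0)
    shown, key = blind_pair("<original html>", "<rewrite html>", rng)
    print(sorted(shown))  # the evaluator only ever sees slots A and B

    # Hypothetical counts, not the project's real data.
    print(sign_test_p_value(wins=30, n=40))
```

Randomizing the slot per pair matters because an evaluator that always sees the rewrite in the same position can pick up a position bias that looks like a quality difference.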

More context

This project began with one sentence in Anthropic's frontend design skill: "Never converge on common design patterns." The intention made sense. The instruction did not. Claude cannot remember every design it has generated across sessions, so the prompt was asking for a kind of self-awareness the model does not have.

The rewrite focused on instructions Claude can actually execute inside one generation. Instead of asking for impossible global memory, it asks for visible exploration, stronger defaults, concrete layout decisions, and less generic frontend taste.

SkillEval grew out of the same problem. If you rewrite a skill, you need a way to test whether the rewrite helps. The frontend skill became the first test case for a more general question: how do you improve an agent skill without lying to yourself?
