Crafter Station benchmark

Which agent writes better modern CSS?

Blind A/B voting across 60 current CSS challenges. Compare vanilla Codex outputs against Codex with css-bash context, then inspect the examples and evidence behind the result.

62 votes / 2 voters / last vote 39d ago

Benchmark console

60 tasks, 120 outputs

css-bash vs vanilla

Popover toolbar with viewport fallbacks (anchor-positioning): Ready

Native details accordion to auto height (interpolate-size): Ready

Progress bar and chapter tracker (scroll-driven): Ready

Examples

Inspect both outputs

Browse rounds, prompts, judge verdicts and rendered HTML without signing in.

Community

See aggregate votes

Track css-bash, vanilla and tie preferences across every challenge.

Evidence

Read the model runs

Check the AI Gateway batches and long-run evidence from the experiment.