AI Gateway batch · 2026-05-25

Does css-bash actually improve agent CSS?

The strongest evidence is not universal output improvement. It is better discovery when a natural UI task needs a recent or long-tail CSS primitive.

cheap clean: openai/gpt-5.1-instant (2026-05-25T16-24-48Z)

natural 0.54 | native 0.77 | css-bash 1.77

css-bash wins 9/13 cases. Native hint wins 3/13. Empty outputs: 0.

frontier clean: anthropic/claude-sonnet-4.6 (2026-05-25T17-38-57Z)

natural 0.38 | native 0.46 | css-bash 1.62

css-bash wins 9/13 cases. Native hint wins 2/13. Empty outputs: 0.

default partial: openai/gpt-5.1-codex-mini (2026-05-25T18-18-48Z)

natural 0.00 | native 0.46 | css-bash 1.00

css-bash wins 9/13 cases. Native hint wins 4/13. Empty outputs: natural 7, native 5, css-bash 3.

Frontier model case deltas

Sonnet through AI Gateway

Scored by expected primitive hits minus forbidden workaround hits.
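The scoring rule above can be sketched in a few lines. This is a minimal illustration, not the harness's actual implementation: the function name `score_output`, plain substring matching, and counting each pattern at most once are all assumptions.

```python
def score_output(css: str, expected: list[str], forbidden: list[str]) -> int:
    """Return expected primitive hits minus forbidden workaround hits.

    Assumption: a pattern counts once if it appears anywhere in the output,
    and matching is a plain substring test.
    """
    expected_hits = sum(1 for pattern in expected if pattern in css)
    forbidden_hits = sum(1 for pattern in forbidden if pattern in css)
    return expected_hits - forbidden_hits


# Hypothetical signal set modeled on the details-native-open-selector case.
expected = [":open"]
forbidden = ["[open]"]

print(score_output("details:open > summary { font-weight: 600; }", expected, forbidden))
print(score_output("details[open] > summary { font-weight: 600; }", expected, forbidden))
```

Under these assumptions, an output that uses only the forbidden attribute selector scores below zero, which is how a case can show a negative value for an arm.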

9/13 css-bash wins

scoped-widget-boundary (@scope): win

Sonnet missed scoped CSS in the natural and native-hint prompts. css-bash found both @scope and the "to (" scope-limit clause.

details-native-open-selector (:open): win (natural -1, native -1)

Natural and native-hint used the forbidden [open] selector. css-bash found :open cleanly.

typed-property-progress-ring (@property): tie

Control tie. Sonnet already knew the typed custom property pattern.

intrinsic-details-accordion (interpolate-size): win

Natural missed the intrinsic accordion stack. css-bash hit interpolate-size, allow-keywords, and ::details-content.

textarea-native-autogrow (field-sizing): win

All arms found the primitive, but css-bash avoided forbidden JS measurement patterns.

auto-contrast-runtime-badges (contrast-color()): win

css-bash found contrast-color() and avoided hardcoded white/black text fallbacks.

top-layer-discrete-dialog (@starting-style): tie

Control tie. Sonnet already used current dialog animation primitives.

view-transition-class-groups (view-transition-class): win

Natural and native-hint missed class grouping. css-bash found view-transition-class.

sibling-index-stagger (sibling-index()): tie

Mostly a tie. Sonnet often knows sibling-index() once the task is explicit.

scroll-marker-carousel (::scroll-marker): win

Strong win. css-bash found ::scroll-marker, ::scroll-marker-group, and :target-current.

if-function-density-card (if()): tie

Tie. The model partially found CSS if(), but did not hit the full expected signal set.

randomized-note-wall (random()): win (natural -2, native -2)

css-bash removed forbidden nth-child patterns but still did not get CSS random().

reading-flow-dashboard (reading-flow): win (natural -1)

css-bash found reading-flow but also included the forbidden tabindex, so its net score tied native-hint while still beating natural.
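The win and tie labels above can be tallied mechanically from per-arm scores. The sketch below assumes a case counts as a css-bash win when its score strictly exceeds the natural arm's score; the report does not state the exact rule, so `classify`, `tally`, and the example scores are hypothetical.

```python
from collections import Counter


def classify(scores: dict[str, int]) -> str:
    """Label one case by comparing css-bash against the natural arm (assumed rule)."""
    delta = scores["css-bash"] - scores["natural"]
    if delta > 0:
        return "win"
    if delta < 0:
        return "regression"
    return "tie"


def tally(cases: dict[str, dict[str, int]]) -> Counter:
    """Count how many cases fall under each label."""
    return Counter(classify(arms) for arms in cases.values())


# Illustrative scores only, not the report's data.
example_cases = {
    "case-a": {"natural": -1, "native": -1, "css-bash": 1},
    "case-b": {"natural": 1, "native": 1, "css-bash": 1},
    "case-c": {"natural": -1, "native": 0, "css-bash": 0},
}

print(tally(example_cases))
```

Under this rule a case like case-c still counts as a win even though css-bash only ties the native-hint arm, which would be consistent with how the reading-flow case is labeled.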

Codex 5 hour run

125 paired trials, 250 variants

Outcome categories: strict improved, mixed, ties, regressions

Verdict

Position it as a retrieval and eval harness.

css-bash is strongest when the prompt describes desired behavior and the agent has to discover the CSS primitive. It is weaker as a universal output improver when the prompt already names the exact feature.