openai/gpt-5.1-instant
css-bash wins 9/13 cases. Native hint wins 3/13. Empty outputs: 0.
The strongest evidence is not universal output improvement. It is better discovery when a natural UI task needs a recent or long-tail CSS primitive.
css-bash wins 9/13 cases. Native hint wins 3/13. Empty outputs: 0.
css-bash wins 9/13 cases. Native hint wins 2/13. Empty outputs: 0.
css-bash wins 9/13 cases. Native hint wins 4/13. Empty outputs: natural 7, native 5, css-bash 3.
Scored by expected primitive hits minus forbidden workaround hits.
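The scoring rule above can be sketched in a few lines. This is a minimal illustration, not the benchmark's actual harness: the `Case` type, the case-insensitive substring matching, the `winner` helper, and the signal lists are all assumptions made for the example. The source states only that the score is expected hits minus forbidden hits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    """Hypothetical case definition: signals are plain substrings (an assumption)."""
    expected: tuple[str, ...]
    forbidden: tuple[str, ...]


def score(output: str, case: Case) -> int:
    """Count expected signals present in the output, minus forbidden signals present."""
    text = output.lower()
    hits = sum(1 for signal in case.expected if signal.lower() in text)
    penalties = sum(1 for signal in case.forbidden if signal.lower() in text)
    return hits - penalties


def winner(scores: dict[str, int]) -> str:
    """Return the single top-scoring arm, or "tie" if the top score is shared."""
    best = max(scores.values())
    leaders = [arm for arm, value in scores.items() if value == best]
    return leaders[0] if len(leaders) == 1 else "tie"


# Illustrative signal lists only; the real lists are not given in the source.
OPEN_CASE = Case(expected=(":open",), forbidden=("[open]",))
READING_FLOW_CASE = Case(expected=("reading-flow",), forbidden=("tabindex",))
```

Under these assumptions, an output containing `details[open]` scores -1 on the first case and one containing `details:open` scores 1. An output that uses `reading-flow` but also `tabindex` scores 0, which is how a found primitive can still end in a tie.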
Sonnet missed scoped CSS in both the natural and native-hint prompts. css-bash found both @scope and its "to (" scope-limit clause.
Natural and native-hint used the forbidden [open] selector. css-bash found :open cleanly.
Control tie. Sonnet already knew the typed custom property pattern.
Natural missed the intrinsic accordion stack. css-bash hit interpolate-size, allow-keywords, and ::details-content.
All arms found the primitive, but css-bash avoided forbidden JS measurement patterns.
css-bash found contrast-color() and avoided hardcoded white/black text fallbacks.
Control tie. Sonnet already used current dialog animation primitives.
Natural and native-hint missed class grouping. css-bash found view-transition-class.
Mostly tie. Sonnet often knows sibling-index() once the task is explicit.
Strong win. css-bash found ::scroll-marker, ::scroll-marker-group, and :target-current.
Tie. The model partially found CSS if(), but did not hit the full expected signal set.
css-bash removed forbidden nth-child patterns but still did not get CSS random().
css-bash found reading-flow but also included the forbidden tabindex, so it tied with native-hint.
css-bash is strongest when the prompt describes desired behavior and the agent has to discover the CSS primitive. It is weaker as a universal output improver when the prompt already names the exact feature.