The hidden cost of Tailwind arbitrary values

Here is a class string from a pull request that shipped last week at a company you have probably heard of. Read it the way your code reviewer did — quickly, on a phone, between meetings.

className="flex items-center gap-3 rounded-lg bg-[#1A5276] px-4 py-2 text-sm font-medium text-white hover:opacity-90 focus:outline-none focus:ring-2 focus:ring-offset-2"

Nothing wrong, on the face of it. Thirteen standard Tailwind utilities, one arbitrary colour value. The colour even looks brand-correct. It passed review. It merged.

Seven weeks later the brand team shifts the primary colour to #173F62. The token updates, and the change cascades through every component that references it. This button does not move. Neither do the 134 others like it. Nobody can explain why the brand looks inconsistent across the product: the tokens are right, the design is right, and the code is what lies.

What arbitrary values actually are

Tailwind ships a finite scale of utilities — bg-blue-500, p-4, rounded-lg. When a design calls for a value the scale does not cover, Tailwind gives you the escape hatch: arbitrary values in square brackets.

bg-[#1a5276]          /* any hex */
p-[17px]              /* any length */
grid-cols-[auto_1fr]  /* any CSS */
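To make the syntax concrete, here is a minimal TypeScript sketch of how such a class breaks down into variant prefixes, a utility name, and the bracketed value. It is not Tailwind's own parser: the hypothetical `parseArbitrary` helper assumes the simple `utility-[value]` shape, no colons inside the brackets, and no modifiers such as `/50`. It does mimic one real Tailwind behaviour: an underscore inside the brackets stands for a space.

```typescript
// A parsed arbitrary-value class such as "hover:bg-[#1a5276]".
interface ArbitraryClass {
  variants: string[]; // e.g. ["hover"]
  utility: string; // e.g. "bg"
  value: string; // e.g. "#1a5276"
}

// "<utility>-[<value>]" once variants are stripped.
// Simplified: assumes no nested brackets and no colons inside the brackets.
const ARBITRARY = /^([a-z0-9-]+)-\[([^\]]+)\]$/;

function parseArbitrary(cls: string): ArbitraryClass | null {
  // Variants are colon-separated prefixes: "md:hover:p-[17px]".
  const parts = cls.split(":");
  const base = parts.pop() ?? "";
  const match = ARBITRARY.exec(base);
  if (!match) return null; // a scale utility like "p-4"
  return {
    variants: parts,
    utility: match[1],
    // Tailwind reads "_" inside brackets as a space: auto_1fr -> "auto 1fr".
    value: match[2].replace(/_/g, " "),
  };
}

for (const cls of ["bg-[#1a5276]", "md:p-[17px]", "grid-cols-[auto_1fr]", "p-4"]) {
  console.log(cls, parseArbitrary(cls));
}
```

`p-4` returns `null` because its value comes from the scale; the other three carry their own value in brackets.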

The feature exists for good reasons. Real designs have edge cases. Some CSS properties do not have a corresponding utility. Iterating quickly is worth a bypass now and then. Tailwind's own docs recommend arbitrary values for exactly these moments.

The problem is not the escape hatch. The problem is the proportion. When arbitrary values are 2% of your class strings, your design system is intact. When they are 30%, you no longer have a design system — you have a colour-coordinated accident. Most teams never find out which side of that line they are on.
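Finding out which side of that line you are on starts with a ratio. A rough sketch, assuming class strings have already been pulled out of your JSX (the hypothetical `arbitraryShare` helper does no parsing of its own) and treating any class that contains `-[` as arbitrary. Fed the button from the opening, it counts 1 arbitrary class out of 14.

```typescript
// Share of classes, across some class strings, that use an arbitrary value.
// Heuristic: a class is arbitrary if it contains "-[" (e.g. "bg-[#1A5276]").
function arbitraryShare(classStrings: string[]): { arbitrary: number; total: number; share: number } {
  const classes = classStrings.join(" ").split(/\s+/).filter(Boolean);
  const arbitrary = classes.filter((c) => c.includes("-[")).length;
  const total = classes.length;
  return { arbitrary, total, share: total === 0 ? 0 : arbitrary / total };
}

const button =
  "flex items-center gap-3 rounded-lg bg-[#1A5276] px-4 py-2 text-sm font-medium " +
  "text-white hover:opacity-90 focus:outline-none focus:ring-2 focus:ring-offset-2";

const { arbitrary, total, share } = arbitraryShare([button]);
console.log(`${arbitrary}/${total} classes arbitrary (${(share * 100).toFixed(1)}%)`);
// prints "1/14 classes arbitrary (7.1%)"
```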

Three archetypes of drift

Arbitrary values do not enter a codebase uniformly. Three patterns account for almost all real-world drift. Learn to recognise them in a diff and you will catch 90% of the damage before it merges.

The one-off exception that sticks around

A designer eyeballed 17 pixels in Figma. 16 looked a touch tight, 20 a touch loose. The engineer — correctly, under a deadline — typed p-[17px]. The card shipped. Everyone forgot. The component library now quietly contains one location where spacing does not come from the scale — and every component built by copy-paste from it inherits the deviation.

Cost: almost nothing the day it ships. Real cost: discovered the day someone asks why padding looks different on the settings page.

The shadow scale

You have text-sm (14px), text-base (16px), and text-lg (18px) as tokens. Your codebase has 13, 15, and 17 hiding below and between them. Each one was the path of least resistance on some Tuesday. Together they form a parallel typographic scale nobody wrote down, nobody owns, and nobody can refactor, because removing any single instance might remove the one the design team actually wanted.

Cost: your type ramp is now six values, not three. Your vertical rhythm is an illusion.
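The shadow scale can be surfaced mechanically: collect every arbitrary font size and subtract the tokens. A sketch, assuming the Tailwind default sizes quoted above and only pixel-valued `text-[Npx]` classes; rem values and `text-[...]` colours are skipped, `shadowTypeScale` is a hypothetical helper, and the `used` list is invented for illustration.

```typescript
// Token type scale in pixels (Tailwind defaults for text-sm, text-base, text-lg).
const TYPE_SCALE: Record<string, number> = { sm: 14, base: 16, lg: 18 };

// Distinct pixel font sizes set via "text-[Npx]" that are not tokens, ascending.
function shadowTypeScale(classes: string[]): number[] {
  const tokenSizes = new Set(Object.values(TYPE_SCALE));
  const offScale = new Set<number>();
  for (const cls of classes) {
    const m = /^text-\[(\d+(?:\.\d+)?)px\]$/.exec(cls);
    if (!m) continue; // tokens, colours, rem values: out of scope for this sketch
    const px = Number(m[1]);
    if (!tokenSizes.has(px)) offScale.add(px);
  }
  return Array.from(offScale).sort((a, b) => a - b);
}

// Hypothetical usage gathered from a codebase.
const used = ["text-sm", "text-[13px]", "text-[15px]", "text-base", "text-[17px]", "text-[15px]"];
const shadow = shadowTypeScale(used);
const ramp = Object.values(TYPE_SCALE).concat(shadow).sort((a, b) => a - b);
console.log(shadow); // [ 13, 15, 17 ]
console.log(ramp); // [ 13, 14, 15, 16, 17, 18 ]
```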

The hex outside the palette (most damaging)

// in tokens:
--brand-primary: #1A5276;

// in the button that shipped:
className="... bg-[#1A5276] ..."

The value is correct. The mechanism is wrong. The token exists, but the engineer typed the hex anyway, probably because their editor offered to autocomplete from the style guide rather than from the Tailwind config. When the brand team shifts primary to #173F62, every component using bg-brand-primary updates. This one does not. The design system quietly breaks its only real promise: change the token, change the product.

Cost: your design system is now suggestions, not infrastructure. Nobody notices until the rebrand.
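Catching this archetype is a reverse lookup: normalise the hex, then ask whether any token already has that value. A minimal sketch, assuming a flat name-to-hex map seeded with the `--brand-primary` token above and only `bg-`, `text-`, and `border-` prefixes; `PALETTE` and `suggestToken` are hypothetical names, not part of any tool.

```typescript
// Flat palette of token name -> hex. An assumed shape, not a real config format.
const PALETTE: Record<string, string> = { "brand-primary": "#1A5276" };

// Normalise "#ABC" and "#AABBCC" to lowercase six-digit form.
function normaliseHex(hex: string): string {
  let h = hex.trim().toLowerCase();
  if (/^#[0-9a-f]{3}$/.test(h)) {
    h = "#" + h.slice(1).split("").map((c) => c + c).join("");
  }
  return h;
}

// Reverse index, built once: normalised hex -> token name.
const BY_HEX = new Map<string, string>();
for (const token of Object.keys(PALETTE)) BY_HEX.set(normaliseHex(PALETTE[token]), token);

// Suggest the token class for an arbitrary colour class, or null when the hex is off-palette.
function suggestToken(cls: string): string | null {
  const m = /^(bg|text|border)-\[(#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3}))\]$/.exec(cls);
  if (!m) return null;
  const token = BY_HEX.get(normaliseHex(m[2]));
  return token ? `${m[1]}-${token}` : null;
}

console.log(suggestToken("bg-[#1A5276]")); // bg-brand-primary
console.log(suggestToken("text-[#1a5276]")); // text-brand-primary
console.log(suggestToken("bg-[#1A5277]")); // null: off-palette, needs a human
```

Case differences are the usual reason a matching hex slips past a naive string compare, which is why the lookup normalises first.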

The design system you have is whatever survives your last thirty AI-generated pull requests.
An observation that keeps getting more true

Why AI coding agents amplify this tenfold

Before AI wrote your UI, arbitrary values accumulated at human speed — one per sprint, one per refactor, one per tired Friday. Code review caught most of them because the diff was small enough for a human to read, and because the engineer writing the component knew the token scale by heart.

That constraint is gone. Claude Code, Cursor, Windsurf, and Codex know Tailwind's syntax perfectly and know your design system not at all. Ask any of them to "add a card component with a subtle accent border" and you get back fluent, runnable code with an arbitrary colour, an arbitrary padding, and a rounded corner that does not match any of your four token radii.

This is not a model-quality problem. It is a context problem. The agent does not know what your tokens are, so it defaults to the most specific value it can generate — a hex, a pixel count, a raw CSS string. Every one of them is technically correct. Every one of them adds a line to your shadow scale.

The math, rough but directional

20% of drift is caught by design review: only the parts that look obviously wrong in Figma diffs.
10% is caught by code review: when the PR is small enough to read and the reviewer happens to know the scale.
70% ships, compounds, and becomes the shadow scale nobody owns.

The only durable fix is a deterministic check that runs on every diff, knows your token scale, and surfaces the three archetypes before a human ever looks at the PR — ideally before the agent even finishes writing.
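Running "on every diff" can be as simple as looking only at the lines a change adds. A minimal sketch, assuming unified-diff text like `git diff` prints is passed in as a string; the hypothetical `scanAddedLines` reports each arbitrary-value class on an added line together with its line number in the new file. The diff below is invented for illustration.

```typescript
interface Finding {
  file: string;
  line: number; // line number in the new version of the file
  cls: string;
}

// An arbitrary-value class, optionally behind variants: "p-[17px]", "hover:bg-[#1A5276]".
const ARBITRARY_CLASS = /[\w:-]*-\[[^\]\s]+\]/g;

// Report arbitrary-value classes that appear on lines added by a unified diff.
function scanAddedLines(diff: string): Finding[] {
  const findings: Finding[] = [];
  let file = "";
  let line = 0;
  for (const raw of diff.split("\n")) {
    if (raw.startsWith("+++ ")) {
      file = raw.slice(4).replace(/^b\//, ""); // "+++ b/src/Card.tsx" -> "src/Card.tsx"
    } else if (raw.startsWith("@@")) {
      // Hunk header "@@ -10,2 +10,3 @@": the new-file hunk starts at line 10.
      const m = /\+(\d+)/.exec(raw);
      line = m ? Number(m[1]) : 0;
    } else if (raw.startsWith("+")) {
      for (const m of raw.slice(1).matchAll(ARBITRARY_CLASS)) {
        findings.push({ file, line, cls: m[0] });
      }
      line++;
    } else if (raw.startsWith(" ")) {
      line++; // context lines exist in both versions
    }
    // Lines starting with "-" were removed and do not advance the new-file counter.
  }
  return findings;
}

const diff = [
  "diff --git a/src/Card.tsx b/src/Card.tsx",
  "--- a/src/Card.tsx",
  "+++ b/src/Card.tsx",
  "@@ -10,2 +10,3 @@",
  " export function Card() {",
  '+  return <div className="rounded-lg p-[17px] hover:bg-[#1A5276]" />;',
  " }",
].join("\n");

console.log(scanAddedLines(diff));
```

For that diff it reports `p-[17px]` and `hover:bg-[#1A5276]`, both on line 11 of `src/Card.tsx`. Scoping the check to added lines keeps the gate focused on new drift rather than on the existing backlog.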

What the existing tools actually catch

Most teams already run a linter. The honest answer to "do we need another one?" is: the tools you have are excellent at what they were built for. None of them were built for this.

| Tool | What it does | Hex outside palette | Shadow scale | One-off exception | Autofix |
|---|---|---|---|---|---|
| stylelint | CSS-level linting. Reads authored CSS; never sees Tailwind utility classes. | None | None | None | None |
| eslint-plugin-tailwindcss | Class order and duplicates. Sorts classes and flags duplicates; does not evaluate whether a class exists in your scale. | None | None | None | Partial |
| Prettier (Tailwind plugin) | Formatting only. Reorders classes; no semantic analysis. | None | None | None | None |
| Code review | Humans on small diffs. Catches the obvious cases, misses the drift that compounds. | Partial | None | Partial | None |
| Deslint | no-arbitrary-* rule family. Evaluates every className against your imported token scale and flags deterministically. | Full | Full | Full | Partial |

Legend: Full = full coverage, Partial = partial coverage, None = not in scope.

Keep stylelint. Keep Prettier. They do their jobs well. They just do not see what an agent is generating into your JSX at commit time. That is the layer that needs its own check.

How to lint arbitrary values deterministically

Deslint approaches the problem in three moves. Import the token scale. Flag anything outside it. Leave the semantic choice of which token is right to a human.

1. Import your token scale as configuration

Deslint reads a .deslintrc.json that knows your colours, spacing, type, and radii. You can write it by hand, but the point of deslint import-tokens is that you do not have to.

# Figma Variables
npx deslint import-tokens --figma <file-id> --format deslintrc

# Style Dictionary
npx deslint import-tokens --style-dictionary ./tokens --format deslintrc

The command prints a per-bucket summary — how many colours, radii, spacing values, and font families it found, and which rules each bucket unlocks. Merge the emitted fragment into .deslintrc.json and you are done with configuration.
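Conceptually, that summary is a count of leaf tokens per top-level group. A sketch, assuming a Style Dictionary-style tree where each leaf carries a `value` field (the newer DTCG format uses `$value` instead) and where top-level keys such as `color`, `spacing`, `radius`, and `fontFamily` name the buckets. This is not deslint's import logic; `bucketSummary` is a hypothetical helper, and every token except `brand.primary` is invented.

```typescript
// A token tree: leaves carry a value; groups nest further tokens.
interface TokenLeaf {
  value: string | number;
}
interface TokenGroup {
  [name: string]: TokenNode;
}
type TokenNode = TokenLeaf | TokenGroup;

// A leaf has a primitive "value"; a group may even contain a child named "value".
function isLeaf(node: TokenNode): node is TokenLeaf {
  const v = (node as { value?: unknown }).value;
  return typeof v === "string" || typeof v === "number";
}

function countLeaves(node: TokenNode): number {
  if (isLeaf(node)) return 1;
  let n = 0;
  for (const child of Object.values(node)) n += countLeaves(child);
  return n;
}

// Leaf count under each top-level bucket.
function bucketSummary(tree: TokenGroup): Record<string, number> {
  const summary: Record<string, number> = {};
  for (const [bucket, node] of Object.entries(tree)) summary[bucket] = countLeaves(node);
  return summary;
}

const tokens: TokenGroup = {
  color: { brand: { primary: { value: "#1A5276" } }, neutral: { 900: { value: "#111827" } } },
  spacing: { 4: { value: "16px" }, 5: { value: "20px" } },
  radius: { lg: { value: "8px" } },
  fontFamily: { sans: { value: "Inter, sans-serif" } },
};

console.log(bucketSummary(tokens)); // { color: 2, spacing: 2, radius: 1, fontFamily: 1 }
```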

2. Four rules, one per archetype axis

no-arbitrary-colors catches: bg-[#1A5276], text-[hsl(210,40%,35%)]

no-arbitrary-spacing catches: p-[17px], mt-[22px], gap-[13px]

no-arbitrary-typography catches: text-[13px], text-[1.05rem], leading-[27px]

no-arbitrary-border-radius catches: rounded-[11px], rounded-[0.375rem]

Each rule reads its allowed set from the imported tokens plus Tailwind's default scale. Anything else is a drift violation with a rule ID, a file, a line, and a column. No judgement, no heuristics — deterministic enough to gate a merge on.
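To see why this can be deterministic, here is a sketch that routes each arbitrary class to one of the four rule IDs by its utility prefix and records a 1-based line and column. The prefix lists are simplified assumptions (for instance, `text-[...]` counts as typography when the value is a length and as colour otherwise), and `lint` and `ruleFor` are hypothetical helpers that only mimic the shape of a lint report, not deslint's implementation.

```typescript
type RuleId =
  | "no-arbitrary-colors"
  | "no-arbitrary-spacing"
  | "no-arbitrary-typography"
  | "no-arbitrary-border-radius";

interface Violation {
  ruleId: RuleId;
  cls: string;
  line: number; // 1-based
  column: number; // 1-based
}

const LENGTH = /^-?\d*\.?\d+(px|rem|em)$/;

// Route an arbitrary class to a rule by utility prefix (and value, for "text").
function ruleFor(utility: string, value: string): RuleId | null {
  if (/^(p[xytrbl]?|m[xytrbl]?|gap(-[xy])?)$/.test(utility)) return "no-arbitrary-spacing";
  if (/^rounded(-[trbl]{1,2})?$/.test(utility)) return "no-arbitrary-border-radius";
  if (utility === "leading") return "no-arbitrary-typography";
  if (utility === "text") return LENGTH.test(value) ? "no-arbitrary-typography" : "no-arbitrary-colors";
  if (utility === "bg") return "no-arbitrary-colors";
  return null; // e.g. grid-cols-[auto_1fr]: outside these four rules in this sketch
}

// Scan source text line by line; no heuristics beyond the fixed tables above.
function lint(source: string): Violation[] {
  const out: Violation[] = [];
  source.split("\n").forEach((text, i) => {
    for (const m of text.matchAll(/(?:[\w-]+:)*([\w-]+)-\[([^\]\s]+)\]/g)) {
      const ruleId = ruleFor(m[1], m[2]);
      if (ruleId) out.push({ ruleId, cls: m[0], line: i + 1, column: (m.index ?? 0) + 1 });
    }
  });
  return out;
}

const src = [
  "export function Card() {",
  '  return <div className="rounded-[11px] p-[17px] text-[15px] text-[#1A5276]" />;',
  "}",
].join("\n");

for (const v of lint(src)) console.log(`${v.line}:${v.column}  ${v.cls}  ${v.ruleId}`);
```

The same input always produces the same report, which is the property that makes it safe to gate a merge on.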

3. What a real run looks like

$ npx deslint scan ./src/components/Button.tsx

src/components/Button.tsx
  12:21  error    'bg-[#1A5276]' matches existing token 'brand-primary'.
                  Use 'bg-brand-primary' instead.            no-arbitrary-colors
  12:48  warning  'p-[17px]' is not on the spacing scale.
                  Nearest tokens: p-4 (16px), p-5 (20px).     no-arbitrary-spacing
  14:10  warning  'text-[15px]' is not on the type scale.
                  Nearest tokens: text-sm (14px), text-base (16px).
                                                              no-arbitrary-typography

3 problems (1 error, 2 warnings)
1 error and 0 warnings auto-fixable with `--fix`.

Design Health Score: 88/100

Notice what deslint does and does not do. The hex that matches an existing token gets an error and a safe autofix: the token exists, the mechanism is wrong, and the swap is unambiguous. The off-scale spacing and typography get warnings that name the nearest legal values, but no autofix, because choosing between 16px and 20px is a design decision, not a linter's call.
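The fix-or-warn split fits in one function. A sketch, assuming a subset of Tailwind's default spacing scale (step 4 is 16px, step 5 is 20px) and only `p-`, `px-`, and `py-` classes with pixel values; `checkPadding` and its message text are hypothetical, not deslint's output format. An exact match returns an error with a fix; anything else returns a warning naming the nearest token on each side and no fix.

```typescript
// Subset of Tailwind's default spacing scale: step name -> pixels.
const SPACING: Record<string, number> = { "2": 8, "3": 12, "4": 16, "5": 20, "6": 24, "8": 32 };

interface PaddingCheck {
  severity: "error" | "warning";
  message: string;
  fix?: string; // only set when exactly one token matches
}

// Check an arbitrary padding class such as "p-[17px]" against the scale.
function checkPadding(cls: string): PaddingCheck | null {
  const m = /^(p|px|py)-\[(\d+)px\]$/.exec(cls);
  if (!m) return null; // not an arbitrary pixel padding class
  const prefix = m[1];
  const px = Number(m[2]);
  const steps = Object.entries(SPACING).sort((a, b) => a[1] - b[1]);

  // On the scale: the token exists, so the swap cannot be wrong.
  const exact = steps.find(([, v]) => v === px);
  if (exact) {
    const fix = `${prefix}-${exact[0]}`;
    return { severity: "error", message: `'${cls}' matches token ${fix}.`, fix };
  }

  // Off the scale: name the neighbours, leave the choice to a human.
  const below = steps.filter(([, v]) => v < px).pop();
  const above = steps.find(([, v]) => v > px);
  const nearest = [below, above]
    .filter((s): s is [string, number] => s !== undefined)
    .map(([k, v]) => `${prefix}-${k} (${v}px)`);
  return {
    severity: "warning",
    message: `'${cls}' is not on the spacing scale. Nearest tokens: ${nearest.join(", ")}.`,
  };
}

console.log(checkPadding("p-[16px]"));
console.log(checkPadding("p-[17px]"));
```

Note that `fix` is simply absent on the warning path, so an autofix runner has nothing to apply there.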

This is the principle: surface the drift loudly, automate only the fixes that cannot be wrong. The linter that tries to be clever about the rest is the linter your team turns off in month three.

Three commands to measure your own drift

Before you decide whether this matters, measure. Point deslint at your repo and look at the number. A score above 90 means your design system is infrastructure. Below 70, it is suggestions.

# 1. install
npm install --save-dev @deslint/cli

# 2. import your tokens (Figma / Style Dictionary / Stitch)
npx deslint import-tokens --figma <file-id> --format deslintrc

# 3. measure
npx deslint coverage

Want deslint inside the AI loop, not after it?

The CLI tells you what drifted. The MCP server tells the agent before it writes. Claude Code, Cursor, Codex, and Windsurf can call deslint as a tool — the same deterministic rules, a single stdio subprocess, zero cloud.


One last honest note

Arbitrary values are not the enemy. The best Tailwind codebases have a small, deliberate population of them — the truly exceptional cases a token scale cannot anticipate. What those codebases have in common is that someone decided, on each one, that the exception was earned.

A deterministic linter is how you get that conversation to happen at the pull request, when it is cheap, instead of at the rebrand, when it is expensive. Everything else — the archetypes, the tooling, the 62 rules — is mechanics in service of that one idea.