Here is a class string from a pull request that shipped last week at a company you have probably heard of. Read it the way your code reviewer did — quickly, on a phone, between meetings.
Nothing wrong, on the face of it. Eleven Tailwind utilities, one arbitrary color value. The color even looks brand-correct. It passed review. It merged.
Seven weeks later the brand team shifts primary to #173F62. The token updates. The tokens cascade. This button does not move. Neither do the 134 others like it. Nobody can explain why the brand looks inconsistent across the product — the tokens are right, the design is right, the code is what lies.
What arbitrary values actually are
Tailwind ships a finite scale of utilities — bg-blue-500, p-4, rounded-lg. When a design calls for a value the scale does not cover, Tailwind gives you the escape hatch: arbitrary values in square brackets.
The feature exists for good reasons. Real designs have edge cases. Some CSS properties do not have a corresponding utility. Iterating quickly is worth a bypass now and then. Tailwind's own docs recommend arbitrary values for exactly these moments.
The problem is not the escape hatch. The problem is the proportion. When arbitrary values are 2% of your class strings, your design system is intact. When they are 30%, you no longer have a design system — you have a colour-coordinated accident. Most teams never find out which side of that line they are on.
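One rough way to find out which side of the line you are on is to count. The sketch below is a hypothetical helper, not part of any tool named in this article: it assumes you have already collected your class strings into an array, and it treats any utility containing `-[...]` as an arbitrary value. Arbitrary properties written as bare `[prop:value]` are deliberately ignored to keep the pattern simple.

```typescript
// Hypothetical helper: the share of utilities across a set of class strings
// that use Tailwind's square-bracket arbitrary-value syntax.
function arbitraryShare(classNames: string[]): number {
  let total = 0;
  let arbitrary = 0;
  for (const className of classNames) {
    // Utilities are whitespace-separated; arbitrary values cannot contain spaces.
    for (const utility of className.split(/\s+/).filter(Boolean)) {
      total += 1;
      // Matches e.g. bg-[#1A5276], p-[17px], hover:text-[15px].
      if (/-\[[^\]]+\]/.test(utility)) {
        arbitrary += 1;
      }
    }
  }
  return total === 0 ? 0 : arbitrary / total;
}
```

Called on `["bg-[#1A5276] p-4 rounded-lg text-white"]`, this returns 0.25: one arbitrary utility out of four.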
Three archetypes of drift
Arbitrary values do not enter a codebase uniformly. Three patterns account for almost all real-world drift. Learn to recognise them in a diff and you will catch 90% of the damage before it merges.
The one-off exception that sticks around
A designer eyeballed 17 pixels in Figma. 16 looked a touch tight, 20 a touch loose. The engineer — correctly, under a deadline — typed p-[17px]. The card shipped. Everyone forgot. The component library now quietly contains one location where spacing does not come from the scale — and every component built by copy-paste from it inherits the deviation.
Cost: almost nothing the day it ships. Real cost: discovered the day someone asks why padding looks different on the settings page.
The shadow scale
You have text-sm (14px), text-base (16px), and text-lg (18px) as tokens. Your codebase has 13, 15, and 17 hiding between them. Each one was the path of least resistance on some Tuesday. Together they form a parallel typographic scale nobody wrote down, nobody owns, and nobody can refactor because removing any single instance might be the one the design team actually wanted.
Cost: your type ramp is now six values, not three. Your vertical rhythm is an illusion.
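A shadow scale can be made visible with very little code. The following is a minimal sketch under stated assumptions: the `typeScale` map is hypothetical and only mirrors the three sizes named above, and the pattern only recognises pixel font sizes written as `text-[Npx]`.

```typescript
// Hypothetical token map mirroring the sizes named in the text (values in px).
const typeScale: Record<string, number> = {
  "text-sm": 14,
  "text-base": 16,
  "text-lg": 18,
};

// Returns the distinct arbitrary pixel font sizes that are not on the scale,
// sorted ascending. Sizes that equal a token value are not reported here.
function shadowSizes(utilities: string[], scale: Record<string, number>): number[] {
  const onScale = new Set(Object.keys(scale).map((name) => scale[name]));
  const found = new Set<number>();
  for (const utility of utilities) {
    const match = /^text-\[(\d+(?:\.\d+)?)px\]$/.exec(utility);
    if (match) {
      const px = Number(match[1]);
      if (!onScale.has(px)) {
        found.add(px);
      }
    }
  }
  return Array.from(found).sort((a, b) => a - b);
}
```

Given the utilities `text-[13px]`, `text-sm`, `text-[15px]`, `text-[17px]`, and a repeated `text-[15px]`, the function returns `[13, 15, 17]`, which is exactly the unwritten scale described above.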
The hex-outside-palette
This is the most damaging archetype. The value is correct. The mechanism is wrong. The token exists, and the engineer typed the hex anyway, probably because their editor offered to autocomplete from the style guide rather than from the Tailwind config. When the brand team shifts primary to #173F62, every component using bg-brand updates. This one does not. The design system quietly breaks its only real promise: change the token, change the product.
Cost: your design system is now suggestions, not infrastructure. Nobody notices until the rebrand.
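Because the hex already equals a token, this archetype is the easiest to detect mechanically. Here is a minimal sketch, assuming a hypothetical `palette` map from token name to hex; the `brand-primary` entry borrows the pairing shown in the sample lint output later in this article, and the function name `suggestToken` is invented for illustration.

```typescript
// Hypothetical palette: token name -> hex value.
const palette: Record<string, string> = {
  "brand-primary": "#1A5276",
};

// If the utility is a colour utility with an arbitrary six-digit hex that
// equals a palette value, return the token-based replacement; otherwise null.
function suggestToken(utility: string, tokens: Record<string, string>): string | null {
  const match = /^(bg|text|border)-\[(#[0-9a-fA-F]{6})\]$/.exec(utility);
  if (!match) {
    return null;
  }
  const prefix = match[1];
  const hex = match[2].toLowerCase();
  for (const name of Object.keys(tokens)) {
    // Hex comparison is case-insensitive.
    if (tokens[name].toLowerCase() === hex) {
      return `${prefix}-${name}`;
    }
  }
  return null;
}
```

With this palette, `suggestToken("bg-[#1a5276]", palette)` returns `"bg-brand-primary"`, while a hex that matches no token returns `null` and is left for a human to judge.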
The design system you have is whatever survives your last thirty AI-generated pull requests.
An observation that keeps getting more true
Why AI coding agents amplify this tenfold
Before AI wrote your UI, arbitrary values accumulated at human speed — one per sprint, one per refactor, one per tired Friday. Code review caught most of them because the diff was small enough for a human to read, and because the engineer writing the component knew the token scale by heart.
That constraint is gone. Claude Code, Cursor, Windsurf, and Codex know Tailwind's syntax perfectly and know your design system not at all. Ask any of them to "add a card component with a subtle accent border" and you get back fluent, runnable code with an arbitrary colour, an arbitrary padding, and a rounded corner that does not match any of your four token radii.
This is not a model-quality problem. It is a context problem. The agent does not know what your tokens are, so it defaults to the most specific value it can generate — a hex, a pixel count, a raw CSS string. Every one of them is technically correct. Every one of them adds a line to your shadow scale.
The math, rough but directional
- 20% of drift is caught by design review, and only the parts that look obviously wrong in Figma diffs.
- 10% is caught by code review, when the PR is small enough to read and the reviewer happens to know the scale.
- 70% ships. Compounds. Becomes the shadow scale nobody owns.
The only durable fix is a deterministic check that runs on every diff, knows your token scale, and surfaces the three archetypes before a human ever looks at the PR — ideally before the agent even finishes writing.
What the existing tools actually catch
Most teams already run a linter. The honest answer to "do we need another one?" is: the tools you have are excellent at what they were built for. None of them were built for this.
| Tool | Hex outside palette | Shadow scale | One-off exception | Autofix |
|---|---|---|---|---|
| stylelint (CSS-level linting). Reads authored CSS. Never sees Tailwind utility classes. | None | None | None | None |
| eslint-plugin-tailwindcss (class order + duplicates). Sorts classes and flags duplicates. Does not evaluate whether a class exists in your scale. | None | None | None | Partial |
| Prettier, tailwind plugin (formatting only). Reorders classes. No semantic analysis. | None | None | None | None |
| Code review (humans on small diffs). Catches the obvious cases, misses the drift that compounds. | Partial | None | Partial | None |
| Deslint (no-arbitrary-* rule family). Evaluates every className against your imported token scale. Flags deterministically. | Full | Full | Full | Partial |
Keep stylelint. Keep Prettier. They do their jobs well. They just do not see what an agent is generating into your JSX at commit time. That is the layer that needs its own check.
How to lint arbitrary values deterministically
Deslint approaches the problem in three moves. Import the token scale. Flag anything outside it. Leave the semantic choice of which token is right to a human.
1. Import your token scale as configuration
Deslint reads a .deslintrc.json that knows your colours, spacing, type, and radii. You can write it by hand, but the point of deslint import-tokens is that you do not have to.
# Figma Variables
npx deslint import-tokens --figma <file-id> --format deslintrc
# Style Dictionary
npx deslint import-tokens --style-dictionary ./tokens --format deslintrc
The command prints a per-bucket summary: how many colours, radii, spacing values, and font families it found, and which rules each bucket unlocks. Merge the emitted fragment into .deslintrc.json and you are done with configuration.
2. Four rules, one per archetype axis
- no-arbitrary-colors catches: bg-[#1A5276], text-[hsl(210,40%,35%)]
- no-arbitrary-spacing catches: p-[17px], mt-[22px], gap-[13px]
- no-arbitrary-typography catches: text-[13px], text-[1.05rem], leading-[27px]
- no-arbitrary-border-radius catches: rounded-[11px], rounded-[0.375rem]
Each rule reads its allowed set from the imported tokens plus Tailwind's default scale. Anything else is a drift violation with a rule ID, a file, a line, and a column. No judgement, no heuristics: deterministic enough to gate a merge on.
3. What a real run looks like
src/components/Button.tsx
12:21 error 'bg-[#1A5276]' matches existing token 'brand-primary'.
Use 'bg-brand-primary' instead. no-arbitrary-colors
12:48 warning 'p-[17px]' is not on the spacing scale.
Nearest tokens: p-4 (16px), p-5 (20px). no-arbitrary-spacing
14:10 warning 'text-[15px]' is not on the type scale.
Nearest tokens: text-sm (14px), text-base (16px).
no-arbitrary-typography
3 problems (1 error, 2 warnings)
1 error and 0 warnings auto-fixable with `--fix`.
Design Health Score: 88/100
Notice what deslint does and does not do. The hex that matches an existing token gets an error and a safe autofix: the token exists, the mechanism is wrong, the swap is unambiguous. The off-scale spacing and typography get warnings with the nearest legal values, but no autofix, because choosing between 16 and 20 px is a design decision, not a linter's call.
This is the principle: surface the drift loudly, automate only the fixes that cannot be wrong. The linter that tries to be clever about the rest is the linter your team turns off in month three.
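The split between "fix automatically" and "only report" can be sketched in a few lines. This is an illustration of the principle, not deslint's implementation: the `Finding` type, the `checkPadding` function, and the `spacingPx` map are all hypothetical, the map lists four Tailwind default spacing steps in pixels, and applying the exact-match autofix to spacing (rather than colour) is an assumption made for the example.

```typescript
// Hypothetical result type: only exact matches carry a fix.
type Finding =
  | { severity: "error"; message: string; fix: string }
  | { severity: "warning"; message: string };

// Four Tailwind default spacing steps, in px (p-4 is 1rem, i.e. 16px).
const spacingPx: Record<string, number> = { "3": 12, "4": 16, "5": 20, "6": 24 };

function checkPadding(utility: string, scale: Record<string, number>): Finding | null {
  const match = /^p-\[(\d+)px\]$/.exec(utility);
  if (!match) {
    return null; // not an arbitrary pixel padding, nothing to report
  }
  const px = Number(match[1]);
  const steps = Object.keys(scale).sort((a, b) => scale[a] - scale[b]);

  // Exact match: the swap is unambiguous, so it is safe to autofix.
  const exact = steps.find((step) => scale[step] === px);
  if (exact !== undefined) {
    return {
      severity: "error",
      message: `'${utility}' matches existing token 'p-${exact}'.`,
      fix: `p-${exact}`,
    };
  }

  // Off scale: report the neighbours and leave the choice to a human.
  const below = steps.filter((step) => scale[step] < px).pop();
  const above = steps.find((step) => scale[step] > px);
  const nearest = [below, above]
    .filter((step): step is string => step !== undefined)
    .map((step) => `p-${step} (${scale[step]}px)`);
  return {
    severity: "warning",
    message: `'${utility}' is not on the spacing scale. Nearest tokens: ${nearest.join(", ")}.`,
  };
}
```

For `p-[17px]` this produces a warning naming `p-4 (16px)` and `p-5 (20px)` with no fix attached; for `p-[16px]` it produces an error whose fix is `p-4`. Encoding the absence of a fix in the type keeps a caller from applying a guess by accident.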
Three commands to measure your own drift
Before you decide whether this matters, measure. Point deslint at your repo and look at the number. A score above 90 means your design system is infrastructure. Below 70, it is suggestions.
# 1. install
npm install --save-dev @deslint/cli
# 2. import your tokens (Figma / Style Dictionary / Stitch)
npx deslint import-tokens --figma <file-id> --format deslintrc
# 3. measure
npx deslint coverage
Want deslint inside the AI loop, not after it?
The CLI tells you what drifted. The MCP server tells the agent before it writes. Claude Code, Cursor, Codex, and Windsurf can call deslint as a tool — the same deterministic rules, a single stdio subprocess, zero cloud.
One last honest note
Arbitrary values are not the enemy. The best Tailwind codebases have a small, deliberate population of them — the truly exceptional cases a token scale cannot anticipate. What those codebases have in common is that someone decided, on each one, that the exception was earned.
A deterministic linter is how you get that conversation to happen at the pull request, when it is cheap, instead of at the rebrand, when it is expensive. Everything else — the archetypes, the tooling, the 34 rules — is mechanics in service of that one idea.