fallow_engine/health/styling_score/mod.rs

//! Styling-health score: a SECOND health axis derived purely from the structural
//! CSS analytics ([`CssAnalyticsReport`]), orthogonal to the JS/TS code-health
//! score. Mirrors the code-score shape (`vital_signs::compute_health_score`):
//! start at 100, subtract capped per-category penalties, map the result to a
//! letter grade via the shared [`letter_grade`]. The code score is never touched.
//!
//! # Penalty rubric (v3, [`STYLING_HEALTH_FORMULA_VERSION`])
//!
//! The weights below were recalibrated from v1 (v1 -> v2 re-normalized
//! `dead_surface` and `token_erosion`; v2 -> v3 re-weighted the duplication
//! family toward value DRIFT, see below) after running across real projects
//! (government design systems plus small Tailwind apps); any further
//! recalibration bumps [`STYLING_HEALTH_FORMULA_VERSION`] (the same versioning
//! discipline the code score uses). Every category is capped and evaluated
//! against the analytics that are always present once `--css` ran, so there is
//! no "missing pipeline" partial-credit case to model.
//!
//! | Category | Cap | Signal | Scaling |
//! |---|---|---|---|
//! | `duplication` | 20pt | `summary.duplicate_declarations_total` (removable declarations from copy-paste blocks) | `total / max(non_atomic_declarations, 1) * 80`, capped (v3 down-weighted from `* 200`: exact CSS duplication is the least-harmful pattern, so ~25% of declarations removable = full 20pt, a soft hint not a dominant term; the non-atomic denominator is the 3c atomic-exclusion behavior) |
//! | `dead_surface` | 20pt | (a) unused `@theme` tokens, as a share of all defined `@theme` tokens; (b) unreferenced classes + unused at-rules + dead `@font-face`, as a share of the non-atomic declaration count | token term `min(unused_theme_tokens / max(theme_tokens_defined, 1) * 15, 15)` + other term `min(other_dead / max(non_atomic_declarations, 1) * 150, 8)`, summed then capped at 20 (the token term is a *per-population* death ratio so a handful of dead tokens in a declaration-sparse Tailwind project no longer explodes the penalty) |
//! | `broken_references` | 15pt | `unresolved_class_references` + `keyframes_undefined` | `count * 3`, capped (so 5 broken refs = full 15pt) |
//! | `token_erosion` | 10pt | mixed `font-size` units (above 2) + distinct Tailwind arbitrary-value tokens + distinct HARDCODED `box-shadow`/`border-radius`/`line-height` values (v3 sprawl/drift sub-term) | `min((extra_units * 2), 4)` unit term + `min(arbitrary / 18, 8)` arbitrary term + `min(sum_per_axis_excess / 6, 5)` sprawl term (per-axis excess above baselines 10/8/6), summed then capped at 10 |
//! | `structural` | 10pt | `!important` density + deep nesting | `max(important_pct - 5, 0)` + `max(max_nesting - 4, 0)` (one point per percentage point or nesting level above the floor), summed then capped at 10 |
//!
//! All point values are rounded to one decimal and the final score is clamped to
//! `[0, 100]`, matching the code score's `round1` / clamp behaviour.
//!
//! ## What v1 -> v2 changed, and why
//!
//! - `dead_surface` was normalized by `files_analyzed` in v1, so a 2-stylesheet
//!   Tailwind project with a few dead entities was penalized far more heavily
//!   than a 32-file design system with one dead entity. v2 splits the category
//!   into two independently-normalized terms. Unused `@theme` tokens are divided
//!   by the *total number of `@theme` tokens defined* (a per-population death
//!   ratio, threaded in as `theme_tokens_defined`): this is the principled
//!   denominator because Tailwind projects author almost no CSS declarations, so
//!   dividing a few unused tokens by `total_declarations` exploded the penalty (a
//!   project with 4 unused tokens over 24 declarations capped the whole category
//!   at 20). The remaining dead entities (unreferenced classes, unused at-rules,
//!   dead `@font-face`) still divide by `total_declarations`, the size-stable
//!   denominator `duplication` uses, so neither term swings with stylesheet count.
//! - `token_erosion` in v1 added `tailwind_arbitrary_values` raw, so just ~10
//!   distinct arbitrary values maxed the category, punishing ordinary Tailwind
//!   apps. v2 saturates the arbitrary term via a divisor and caps the unit term
//!   so neither sub-signal alone reaches the ceiling.
//!
//! ## What v2 -> v3 changed, and why
//!
//! Research is clear that exact CSS duplication is the LEAST-harmful CSS pattern
//! (repeated declarations gzip away; graphical properties are loosely coupled; CSS
//! has no native abstraction so some repetition is unavoidable), while the real
//! maintenance harm is value DRIFT / inconsistency (the same design intent
//! expressed with divergence: a radius `6px` here and `5px` there; a shadow
//! `rgba(0,0,0,.1)` vs `.12`), which design tokens address. So v3 re-weights the
//! duplication FAMILY toward drift and away from byte-identical repetition:
//!
//! - `duplication` exact-block scale dropped from `* 200` to `* 80` (see
//!   `EXACT_DUP_SCALE`). The notation-canonical exact-block detector is kept
//!   (lightningcss already collapses `0px`/`#fff`/`rgb()`), just down-weighted to
//!   a soft hint. The 20pt cap is unchanged.
//! - `token_erosion` gains a HARDCODED-value-sprawl drift sub-term (see
//!   `value_sprawl_term`) sourced from the previously-descriptive-only
//!   `summary.unique_box_shadows` / `unique_border_radii` / `unique_line_heights`
//!   distinct-value counts. It counts only HARDCODED literals: a system that
//!   tokenizes its scales via `var(--*)` scores 0 (lightningcss parses
//!   `box-shadow: var(--x)` as `Property::Unparsed`, so var-referenced and
//!   `@theme`-defined values never reach the typed-property collectors). Per-axis
//!   baselines (10/8/6) clear every observed real design system; the term
//!   saturates and is sub-capped at 5pt inside the unchanged 10pt category.
//!
//! The original v3 plan (re-canonicalize the duplicate-block fingerprint to detect
//! drift clones) was DROPPED after a Step-0 audit found it a no-op: the css-metrics
//! fingerprint serializes via lightningcss `to_css_string`, which already
//! canonicalizes `0px`/`#fff`/`rgb()`, so it is notation-semantic, not byte-literal.
//! Genuine value drift is surfaced via the un-tokenized-literal sprawl count above.
//!
//! ## Deliberately excluded signals
//!
//! `custom_properties_unreferenced` and `custom_properties_undefined` are
//! intentionally NOT folded into this score. They are false-positive-prone for
//! exactly the projects this axis most needs to grade fairly: a design system
//! exports custom properties that are referenced by *consumers*, not within its
//! own package, so "unreferenced-in-package" is the intended state rather than a
//! smell; and in a cross-package monorepo a property defined in one package and
//! consumed in another reads as "undefined" to a single-package analysis. Both
//! signals stay available in the raw `CssAnalyticsReport` for descriptive
//! surfaces, but they do not move the grade.
//!
//! ## Confidence (descriptive metadata, NOT part of the formula)
//!
//! The grade carries a [`StylingHealthConfidence`] marking it `Low` when the
//! authored-declaration count is below `MIN_CONFIDENT_DECLARATIONS`, or when at
//! least `ATOMIC_CONFIDENCE_SHARE` of the analyzed declarations come from flat
//! atomic CSS-in-JS (each with a stated reason), so a grade computed from a thin
//! or structurally inert CSS surface is not presented with the same authority as
//! one from a full design system. Confidence NEVER feeds the score: `score` /
//! `grade` / `penalties` are byte-identical whether it is high or low. The
//! framing is "thin authored-CSS surface", not "unreliable analysis": a
//! utility-first Tailwind project legitimately authors little CSS, so a low mark
//! means the declaration-normalized rubric had little to measure, not that
//! fallow's analysis failed. The EMPTY case (no import-reachable stylesheet, so
//! `css_analytics` and `styling_health` are both `None`) is the strongest form
//! of withholding and is handled one layer up by the human "No stylesheets
//! analyzed" note; this function only ever runs on a non-empty report.
//!
//! ## Calibration (v3, corpus-locked)
//!
//! The v2 weights were validated against a 10-project corpus (3 design systems /
//! SCSS, 4 Tailwind apps, 3 empty / CSS-in-JS); v3 re-ran that corpus plus the 3c
//! atomic CSS-in-JS smoke (Braid / vanilla-extract / StyleX / Panda / emotion) and
//! drift anchors. [`STYLING_HEALTH_FORMULA_VERSION`] bumps 2 -> 3 because rubric
//! constants moved (`EXACT_DUP_SCALE` and the new sprawl sub-term). v3 result, NO
//! band misclassification: every real design system stays A with a 0 sprawl term
//! (utrecht 5 distinct hardcoded shadows < baseline 10; rijkshuisstijl, jsonforms
//! 7 radii < baseline 8, all 0pt sprawl); atomic CSS-in-JS gains no false drift
//! penalty (<= 3 distinct values per axis); a clean token-driven system keeps a 0
//! sprawl term (var-referenced values are uncounted). The sprawl term fires only on
//! genuine hardcoded drift (a 16-distinct-per-axis anchor scores ~4pt). The
//! duplication down-weight moves moderate-dup projects by ~1-2pt (no band flip).
//!
//! Consumer note: a v2 -> v3 bump moves `styling_health.score`/`grade` for some
//! projects, so `--format json` snapshot diffs (including fallow's own golden
//! snapshots) and styling-grade trend dashboards see a one-time step-change at the
//! version boundary; gate on `formula_version`. No exit-code, badge, gate, regression
//! baseline, or trend-snapshot consumes the styling score, so nothing fails; the
//! code `health_score` is byte-unchanged.

use fallow_output::{
    CssAnalyticsReport, STYLING_HEALTH_FORMULA_VERSION, StylingHealth, StylingHealthConfidence,
    StylingHealthPenalties, letter_grade,
};

const DUPLICATION_CAP: f64 = 20.0;
const DEAD_SURFACE_CAP: f64 = 20.0;

/// Per-removable-declaration scale for the EXACT-block duplication penalty (v3).
/// Down-weighted from the v2 `* 200` because exact (notation-canonical) CSS
/// duplication is the LEAST-harmful CSS pattern: repeated declarations gzip away,
/// graphical properties are loosely coupled, and CSS has no native abstraction so
/// some repetition is unavoidable. Exact duplication therefore stays a soft hint,
/// not a dominant term; the maintenance harm CSS tooling should weight is value
/// DRIFT (the `token_erosion` sprawl sub-term below), not byte-identical repeats.
/// At `* 80` the category caps at 25% removable declarations (was 10% at `* 200`).
const EXACT_DUP_SCALE: f64 = 80.0;
const BROKEN_REFERENCES_CAP: f64 = 15.0;
const TOKEN_EROSION_CAP: f64 = 10.0;
const STRUCTURAL_CAP: f64 = 10.0;

/// Authored-CSS declaration floor below which the grade is marked low-confidence.
/// Below this, the declaration-normalized penalty ratios are hypersensitive. The
/// worked figures here use the v2 `* 200` duplication scale: a single minimal
/// duplicate block (4 declarations appearing twice = 4 removable) contributed
/// `4 / total * 200`, which is the full 20pt cap at 40 declarations and 16pt at
/// 50. At the v3 [`EXACT_DUP_SCALE`] (`* 80`) the same block costs 8pt at 40
/// declarations and 6.4pt at 50, still a large share of the 20pt category. A
/// handful of `!important` declarations likewise pushes the structural penalty
/// to its cap. So below ~50 authored declarations a single finding can move a
/// penalty category to, or a long way toward, its ceiling and the grade reflects
/// sampling noise rather than systematic quality; above it, individual findings
/// contribute proportionally. Empirically separates the calibration corpus:
/// fallow-tools (24 declarations) and leenders-coaching (38) read as thin
/// authored surfaces, while every design system and the `>= 145`-declaration
/// Tailwind apps stay confident.
const MIN_CONFIDENT_DECLARATIONS: u32 = 50;

/// Share of analyzed declarations originating from flat-by-construction atomic
/// object CSS-in-JS (StyleX/Panda) at or above which the grade is marked
/// low-confidence: the structural axis (nesting, `!important` density) is inert
/// for compile-time-atomic CSS, so a predominantly-atomic project's grade
/// reflects token hygiene only. Distinct from [`MIN_CONFIDENT_DECLARATIONS`]
/// (a thin-surface caveat); when both fire the atomic caveat is the more
/// informative one and wins the single `confidence_reason` slot.
const ATOMIC_CONFIDENCE_SHARE: f64 = 0.7;

/// `!important` density (as a percentage of declarations) below which no
/// structural penalty accrues. A small amount of `!important` is normal.
const IMPORTANT_DENSITY_FLOOR: f64 = 5.0;

/// Style-rule nesting depth at or below which no structural penalty accrues.
/// Shallow nesting is healthy; the penalty grows past this floor.
const NESTING_DEPTH_FLOOR: f64 = 4.0;

/// The number of distinct `font-size` units considered a healthy baseline (e.g.
/// `rem` for type plus `px` for fixed chrome). Each unit beyond this erodes the
/// token system.
const FONT_SIZE_UNIT_BASELINE: u32 = 2;

/// `dead_surface` unused-`@theme`-token term scale (v2): the unused-token count
/// is normalized by the TOTAL number of `@theme` tokens defined (a per-population
/// death ratio that is `>= 0`; the [`TOKEN_DEATH_TERM_CAP`] `.min` bounds the
/// term even if a caller violates the population invariant and the ratio exceeds
/// 1.0), then multiplied by this factor. So 100% of defined tokens dead
/// contributes the full [`TOKEN_DEATH_TERM_CAP`], 20% dead ~= 3pt. This is the
/// principled, size-independent denominator: Tailwind projects author almost no
/// CSS declarations, so the v1 `total_declarations` denominator let a few unused
/// tokens explode the penalty; dividing by the token population fixes that
/// without rewarding or punishing project size.
const TOKEN_DEATH_SCALE: f64 = 15.0;

/// `dead_surface` unused-`@theme`-token term cap (v2). Bounds the token-death
/// contribution so even a fully-dead token population leaves room for the other
/// dead-entity term within the 20pt category cap.
const TOKEN_DEATH_TERM_CAP: f64 = 15.0;

/// `dead_surface` non-token-dead-entity term scale (v2): unreferenced classes,
/// unused `@property`/`@layer` at-rules, and dead `@font-face` families are
/// normalized by `total_declarations` (the same size-stable denominator the
/// duplication penalty uses) and multiplied by this factor before the term cap.
const OTHER_DEAD_SCALE: f64 = 150.0;

/// `dead_surface` non-token-dead-entity term cap (v2). Bounds the
/// declaration-share dead-entity contribution so it cannot dominate the category
/// on its own; the token-death term carries the rest of the 20pt budget.
const OTHER_DEAD_TERM_CAP: f64 = 8.0;

/// `token_erosion` per-extra-unit weight (v2). Each distinct `font-size` unit
/// past [`FONT_SIZE_UNIT_BASELINE`] adds this many points to the unit term.
const FONT_SIZE_UNIT_WEIGHT: f64 = 2.0;

/// `token_erosion` font-size-unit term cap (v2). Bounds the unit contribution so
/// `font-size` units alone can never dominate the category; the arbitrary-value
/// term carries the rest of the budget.
const FONT_SIZE_UNIT_TERM_CAP: f64 = 4.0;

/// `token_erosion` arbitrary-value divisor (v2). Distinct Tailwind arbitrary
/// values are divided by this before the arbitrary term cap, so the term grows
/// gently (50 distinct values ~= 2.8pt, 100 ~= 5.6pt) and reaches its 8pt cap
/// only at 144 distinct values, instead of instant-capping the category. Normal
/// Tailwind usage yields a moderate penalty, not the ceiling.
const ARBITRARY_VALUE_DIVISOR: f64 = 18.0;

/// `token_erosion` arbitrary-value term cap (v2). Bounds the arbitrary-value
/// contribution so the unit term still has room within the 10pt category cap.
const ARBITRARY_VALUE_TERM_CAP: f64 = 8.0;

/// `token_erosion` hardcoded-value-sprawl baselines (v3), per axis: the count of
/// distinct HARDCODED literal values for a should-be-tokenized property below
/// which no sprawl penalty accrues. A clean token-driven system reuses a small
/// palette via `var(--*)` (which lightningcss parses as `Property::Unparsed`, so
/// var-referenced and `@theme`-defined values are NOT counted), so its hardcoded
/// distinct count is ~0; a drifted system hardcodes many ad-hoc values. Baselines
/// are set conservatively, at each property's natural FULL-SCALE cardinality (so
/// a legitimately rich scale is not penalized), and are corpus-validated to clear
/// every real design system (utrecht 5 shadows, jsonforms 7 radii) at 0:
/// - shadow: rich elevation systems (Material authors up to ~24 levels) have the
///   widest legit range, so the highest baseline.
/// - radius: a complete radius scale (none/xs..2xl/full) is ~8.
/// - line-height: a complete line-height scale (tight..loose + reset) is ~6.
const SHADOW_SPRAWL_BASELINE: u32 = 10;
const RADIUS_SPRAWL_BASELINE: u32 = 8;
const LINE_HEIGHT_SPRAWL_BASELINE: u32 = 6;

/// `token_erosion` sprawl divisor (v3). The summed per-axis excess (distinct
/// hardcoded values above each baseline) is divided by this before the sprawl
/// sub-cap, so the term saturates gently (mirroring [`ARBITRARY_VALUE_DIVISOR`]):
/// a large multi-brand monorepo accumulates a bounded, slowly-growing penalty
/// rather than a per-value cliff. Deliberately UN-NORMALIZED by declaration count:
/// normalizing would rebuild the v1 size-denominator bug in reverse (20 drifted
/// radii over 100k declarations would round to nothing, masking real drift),
/// whereas a literals-only project-wide count is bounded by design INTENT, not
/// file count (utrecht is 792 declarations yet only 5 distinct shadows).
const SPRAWL_DIVISOR: f64 = 6.0;

/// `token_erosion` sprawl sub-term cap (v3). Bounds the hardcoded-value-sprawl
/// contribution within the unchanged [`TOKEN_EROSION_CAP`] (10pt), summed with
/// the unit + arbitrary terms. The category cap is NOT raised, so the existing
/// font-size-unit / Tailwind-arbitrary cohort is unaffected; when those terms
/// already approach the 10pt ceiling the sprawl term is mildly diluted (it
/// under-counts, never false-positives, the correct direction for a descriptive
/// grade).
const SPRAWL_TERM_CAP: f64 = 5.0;

/// Internal scoring inputs threaded into the styling-health grade, NOT serialized
/// (mirrors the prior `theme_tokens_defined` parameter, so the wire contract /
/// schema is unchanged). The declaration counts are split into an atomic and a
/// non-atomic share so flat-by-construction atomic object CSS-in-JS (StyleX,
/// Panda) does not dilute the declaration-normalized penalties or invert the
/// confidence knee: every penalty that divides by a declaration count divides by
/// the NON-ATOMIC count, and the confidence trigger reads the non-atomic count
/// plus the atomic share. For authored `.css` / SFC / non-atomic CSS-in-JS the
/// non-atomic counts equal the summary aggregates and `atomic_declarations` is 0,
/// so the grade is byte-identical to the pre-object-CSS-in-JS behavior.
#[derive(Debug, Clone, Copy, Default)]
pub struct StylingScoringInputs {
    /// Total Tailwind `@theme` tokens DEFINED across the project (the population
    /// from which `summary.unused_theme_tokens` is a subset). Denominator for the
    /// per-population token-death ratio in `dead_surface`.
    pub theme_tokens_defined: u32,
    /// Declarations from CSS whose structure is meaningful: authored `.css`, SFC
    /// `<style>`, and non-atomic CSS-in-JS (vanilla-extract / emotion, object +
    /// template). The denominator for every declaration-normalized penalty and
    /// the confidence trigger.
    pub non_atomic_declarations: u32,
    /// `!important` declarations from the non-atomic surface (structural numerator).
    pub non_atomic_important_declarations: u32,
    /// Deepest nesting from the non-atomic surface (structural nesting term).
    pub non_atomic_max_nesting_depth: u8,
    /// Declarations from flat atomic object CSS-in-JS (StyleX/Panda). Excluded
    /// from the penalties; drives the predominantly-atomic confidence caveat.
    pub atomic_declarations: u32,
}

impl StylingScoringInputs {
    /// Build inputs for a report with no atomic object CSS-in-JS: the non-atomic
    /// surface IS the whole summary, so the grade matches the pre-3c behavior.
    /// Used by the back-compat [`compute_styling_health`] entry and unit tests.
    #[allow(
        dead_code,
        reason = "back-compat test helper; production uses explicit StylingScoringInputs"
    )]
    #[must_use]
    pub fn from_report(report: &CssAnalyticsReport, theme_tokens_defined: u32) -> Self {
        let s = &report.summary;
        Self {
            theme_tokens_defined,
            non_atomic_declarations: s.total_declarations,
            non_atomic_important_declarations: s.important_declarations,
            non_atomic_max_nesting_depth: s.max_nesting_depth,
            atomic_declarations: 0,
        }
    }
}

/// Compute the styling-health score from the structural CSS analytics, treating
/// the whole report as the gradeable surface (no atomic object CSS-in-JS split).
///
/// Mirrors the code score: penalties subtract from a starting 100, the result is
/// rounded and clamped, and the grade reuses the shared [`letter_grade`]
/// thresholds. See the module docs for the per-category rubric (v3).
///
/// `theme_tokens_defined` is the total number of Tailwind `@theme` tokens DEFINED
/// across the project. It is an internal scoring input only, NOT a serialized
/// field, so the wire contract / schema is unchanged.
#[allow(
    dead_code,
    reason = "back-compat test helper; production uses compute_styling_health_with_inputs"
)]
#[must_use]
pub fn compute_styling_health(
    report: &CssAnalyticsReport,
    theme_tokens_defined: u32,
) -> StylingHealth {
    compute_styling_health_with_inputs(
        report,
        &StylingScoringInputs::from_report(report, theme_tokens_defined),
    )
}

/// Compute the styling-health score from the structural CSS analytics plus the
/// atomic / non-atomic declaration split (CSS program Phase 3c). The
/// declaration-normalized penalties and the confidence trigger read the
/// non-atomic counts in `inputs`, so flat atomic object CSS-in-JS neither dilutes
/// the grade nor inflates confidence. The descriptive `report` is still the
/// source for the non-ratio signals (`token_erosion`, `broken_references`) and
/// the serialized aggregates.
#[must_use]
pub fn compute_styling_health_with_inputs(
    report: &CssAnalyticsReport,
    inputs: &StylingScoringInputs,
) -> StylingHealth {
    let penalties = compute_styling_penalties(report, inputs);
    let score = apply_styling_penalties(&penalties);
    let grade = letter_grade(score);
    let (confidence, confidence_reason) = styling_confidence(report, inputs);

    StylingHealth {
        formula_version: STYLING_HEALTH_FORMULA_VERSION,
        score,
        grade,
        penalties,
        confidence,
        confidence_reason,
    }
}

/// Classify the grade's confidence from the analyzed CSS surface. `Low` in two
/// cases, with the more informative reason winning the single slot:
///
/// 1. **Predominantly atomic** (precedence): at least [`ATOMIC_CONFIDENCE_SHARE`]
///    of analyzed declarations come from flat compile-time-atomic object
///    CSS-in-JS (StyleX/Panda), whose structure (nesting, `!important` density)
///    is inert, so the grade reflects token hygiene only, not structure.
/// 2. **Thin authored surface**: the non-atomic (gradeable) declaration count is
///    below [`MIN_CONFIDENT_DECLARATIONS`], so the declaration-normalized ratios
///    are hypersensitive.
///
/// `High` (no reason) otherwise. Descriptive metadata only: it never feeds the
/// score. See the module docs for why the framing is a property of the CSS
/// surface, not "unreliable analysis".
fn styling_confidence(
    report: &CssAnalyticsReport,
    inputs: &StylingScoringInputs,
) -> (StylingHealthConfidence, Option<String>) {
    let total = inputs
        .non_atomic_declarations
        .saturating_add(inputs.atomic_declarations);
    if inputs.atomic_declarations > 0
        && total > 0
        && f64::from(inputs.atomic_declarations) / f64::from(total) >= ATOMIC_CONFIDENCE_SHARE
    {
        // Atomic CSS is flat at the COMPILED layer; the source rules fallow lifts
        // are not representative of structure, so the structural axis is inert.
        let reason = "structure (nesting, !important density) is not assessable for \
                      compile-time-atomic CSS-in-JS (StyleX/Panda); this grade reflects \
                      token hygiene only"
            .to_string();
        return (StylingHealthConfidence::Low, Some(reason));
    }
    if inputs.non_atomic_declarations >= MIN_CONFIDENT_DECLARATIONS {
        return (StylingHealthConfidence::High, None);
    }
    let reason = format!(
        "graded from only {} declaration{} across {} stylesheet{}",
        inputs.non_atomic_declarations,
        if inputs.non_atomic_declarations == 1 {
            ""
        } else {
            "s"
        },
        report.summary.files_analyzed,
        if report.summary.files_analyzed == 1 {
            ""
        } else {
            "s"
        },
    );
    (StylingHealthConfidence::Low, Some(reason))
}

fn compute_styling_penalties(
    report: &CssAnalyticsReport,
    inputs: &StylingScoringInputs,
) -> StylingHealthPenalties {
    StylingHealthPenalties {
        duplication: duplication_penalty(report, inputs),
        dead_surface: dead_surface_penalty(report, inputs),
        broken_references: broken_references_penalty(report),
        token_erosion: token_erosion_penalty(report),
        structural: structural_penalty(inputs),
    }
}

fn apply_styling_penalties(penalties: &StylingHealthPenalties) -> f64 {
    let mut score = 100.0_f64;
    score -= penalties.duplication;
    score -= penalties.dead_surface;
    score -= penalties.broken_references;
    score -= penalties.token_erosion;
    score -= penalties.structural;
    round1(score).clamp(0.0, 100.0)
}

/// Copy-paste declaration blocks: penalize by the share of non-atomic
/// declarations that could be removed by consolidating duplicate blocks. At the
/// v3 [`EXACT_DUP_SCALE`], ~25% removable -> full cap.
///
/// Atomic object CSS-in-JS (StyleX/Panda) is excluded from duplicate-block
/// fingerprinting upstream, so `duplicate_declarations_total` is already a
/// non-atomic numerator; dividing by the non-atomic declaration count keeps the
/// ratio from being diluted by flat atomic declarations flooding the denominator.
fn duplication_penalty(report: &CssAnalyticsReport, inputs: &StylingScoringInputs) -> f64 {
    let removable = f64::from(report.summary.duplicate_declarations_total);
    let total = f64::from(inputs.non_atomic_declarations).max(1.0);
    round1((removable / total * EXACT_DUP_SCALE).min(DUPLICATION_CAP))
}

/// Dead styling surface, computed as two independently-normalized terms:
///
/// 1. **Token-death term** ([`TOKEN_DEATH_SCALE`], capped at
///    [`TOKEN_DEATH_TERM_CAP`]): unused `@theme` tokens as a share of ALL defined
///    `@theme` tokens (`theme_tokens_defined`). This per-population death ratio is
///    size-independent. v1 divided unused tokens by `total_declarations`, which
///    exploded for Tailwind projects (they author almost no CSS declarations, so
///    a few unused tokens capped the whole category); normalizing by the token
///    population the tokens are drawn from is the principled fix.
/// 2. **Other-dead term** ([`OTHER_DEAD_SCALE`], capped at
///    [`OTHER_DEAD_TERM_CAP`]): unreferenced classes, unused `@property`/`@layer`
///    at-rules, and dead `@font-face` families as a share of the non-atomic
///    declaration count (the same size-stable denominator the duplication penalty
///    uses). These scale with authored-CSS size, so the declaration denominator is
///    correct for them; only the `@theme` tokens needed the per-population
///    treatment.
///
/// The two terms are summed and capped at [`DEAD_SURFACE_CAP`]. Unused `@theme`
/// tokens are EXCLUDED from the other-dead term (they live only in term 1). The
/// other-dead entities are authored-CSS constructs (no atomic object CSS-in-JS
/// contributes a class / at-rule / `@font-face` to them), so the term is
/// normalized by the non-atomic declaration count to avoid atomic dilution.
fn dead_surface_penalty(report: &CssAnalyticsReport, inputs: &StylingScoringInputs) -> f64 {
    let s = &report.summary;

    let token_population = f64::from(inputs.theme_tokens_defined).max(1.0);
    let token_death_ratio = f64::from(s.unused_theme_tokens) / token_population;
    let token_term = (token_death_ratio * TOKEN_DEATH_SCALE).min(TOKEN_DEATH_TERM_CAP);

    let other_dead = f64::from(
        s.unreferenced_css_classes
            .saturating_add(s.unused_property_registrations)
            .saturating_add(s.unused_layers)
            .saturating_add(s.unused_font_faces),
    );
    let total = f64::from(inputs.non_atomic_declarations).max(1.0);
    let other_term = (other_dead / total * OTHER_DEAD_SCALE).min(OTHER_DEAD_TERM_CAP);

    round1((token_term + other_term).min(DEAD_SURFACE_CAP))
}

/// Broken references: markup classes one edit from a defined class, plus
/// animations referencing a `@keyframes` defined nowhere. Each is a likely typo
/// or stale rename; 5 broken refs reach the cap.
fn broken_references_penalty(report: &CssAnalyticsReport) -> f64 {
    let s = &report.summary;
    let broken = f64::from(
        s.unresolved_class_references
            .saturating_add(s.keyframes_undefined),
    );
    round1((broken * 3.0).min(BROKEN_REFERENCES_CAP))
}

/// Design-token erosion: mixing `font-size` units past a healthy baseline and
/// Tailwind arbitrary-value bypasses both work against a single source of truth
/// for the scale.
///
/// v2 splits the category into two saturating terms. The unit term is capped at
/// [`FONT_SIZE_UNIT_TERM_CAP`] so `font-size` units alone cannot dominate. The
/// arbitrary-value term divides the distinct-value count by
/// [`ARBITRARY_VALUE_DIVISOR`] and caps at [`ARBITRARY_VALUE_TERM_CAP`], so it
/// grows gently (50 distinct values ~= 2.8pt, 100 ~= 5.6pt, 8pt cap at 144)
/// instead of instant-capping the whole category at 10 (v1 reached the ceiling
/// at just 10 arbitrary values, punishing normal Tailwind usage). v3 adds a
/// third saturating term, [`value_sprawl_term`], for hardcoded `box-shadow` /
/// `border-radius` / `line-height` value drift (var-blind: scales tokenized via
/// `var(--*)` score 0). The category cap stays 10pt.
fn token_erosion_penalty(report: &CssAnalyticsReport) -> f64 {
    let s = &report.summary;
    let extra_units = f64::from(
        s.font_size_units_used
            .saturating_sub(FONT_SIZE_UNIT_BASELINE),
    );
    let unit_term = (extra_units * FONT_SIZE_UNIT_WEIGHT).min(FONT_SIZE_UNIT_TERM_CAP);
    let arbitrary = f64::from(s.tailwind_arbitrary_values);
    let arbitrary_term = (arbitrary / ARBITRARY_VALUE_DIVISOR).min(ARBITRARY_VALUE_TERM_CAP);
    round1((unit_term + arbitrary_term + value_sprawl_term(s)).min(TOKEN_EROSION_CAP))
}

/// Hardcoded-value-sprawl sub-term (v3): the distinct count of hardcoded literal
/// `box-shadow` / `border-radius` / `line-height` values above each per-axis
/// healthy baseline, summed and saturated. This is the design-token DRIFT signal
/// the v3 reweight shifts the duplication-family weight toward: a system that
/// tokenizes its scales via `var(--*)` scores 0 (var-referenced values are
/// invisible to the `unique_*` counts), while one that hardcodes many ad-hoc
/// values accrues a bounded, gently-growing penalty. Counts come from
/// `summary.unique_*` (recomputed at health time, never cached); see the module
/// docs and the [`SHADOW_SPRAWL_BASELINE`] / [`SPRAWL_DIVISOR`] rationale.
fn value_sprawl_term(s: &fallow_output::CssAnalyticsSummary) -> f64 {
    let excess = s
        .unique_box_shadows
        .saturating_sub(SHADOW_SPRAWL_BASELINE)
        .saturating_add(s.unique_border_radii.saturating_sub(RADIUS_SPRAWL_BASELINE))
        .saturating_add(
            s.unique_line_heights
                .saturating_sub(LINE_HEIGHT_SPRAWL_BASELINE),
        );
    (f64::from(excess) / SPRAWL_DIVISOR).min(SPRAWL_TERM_CAP)
}

/// Structural smells: `!important` density above a healthy floor and deep
/// style-rule nesting past a shallow floor.
///
/// Computed over the NON-ATOMIC surface only: flat compile-time-atomic object
/// CSS-in-JS (StyleX/Panda) has zero `!important` and minimal nesting by
/// construction, so including it would dilute the `!important` density (lowering
/// the penalty) and never raise nesting, trivially inflating the grade.
fn structural_penalty(inputs: &StylingScoringInputs) -> f64 {
    let important_pct = if inputs.non_atomic_declarations > 0 {
        f64::from(inputs.non_atomic_important_declarations)
            / f64::from(inputs.non_atomic_declarations)
            * 100.0
    } else {
        0.0
    };
    let important = (important_pct - IMPORTANT_DENSITY_FLOOR).max(0.0);
    let nesting = (f64::from(inputs.non_atomic_max_nesting_depth) - NESTING_DEPTH_FLOOR).max(0.0);
    round1((important + nesting).min(STRUCTURAL_CAP))
}

fn round1(value: f64) -> f64 {
    (value * 10.0).round() / 10.0
}

#[cfg(test)]
mod tests;
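
// Worked sketch of the rubric arithmetic above. This is a minimal,
// self-contained illustration and not part of the module: the free functions
// `duplication`, `broken_references`, and `value_sprawl` are hypothetical names
// that take raw counts instead of a `CssAnalyticsReport`, with the scales, caps,
// and baselines copied from the constants in this file. It is meant to show
// where three of the terms saturate.

```rust
/// Round to one decimal place, as the module's `round1` does.
fn round1(value: f64) -> f64 {
    (value * 10.0).round() / 10.0
}

/// Exact-block duplication: removable share * 80, capped at 20pt.
/// (Hypothetical standalone signature; the scale and cap mirror the v3 rubric.)
fn duplication(removable: u32, non_atomic_declarations: u32) -> f64 {
    let total = f64::from(non_atomic_declarations).max(1.0);
    round1((f64::from(removable) / total * 80.0).min(20.0))
}

/// Broken references: 3pt per broken reference, capped at 15pt.
fn broken_references(count: u32) -> f64 {
    round1((f64::from(count) * 3.0).min(15.0))
}

/// Hardcoded-value sprawl: per-axis excess above the 10/8/6 baselines,
/// summed, divided by 6, and sub-capped at 5pt.
fn value_sprawl(shadows: u32, radii: u32, line_heights: u32) -> f64 {
    let excess = shadows
        .saturating_sub(10)
        .saturating_add(radii.saturating_sub(8))
        .saturating_add(line_heights.saturating_sub(6));
    (f64::from(excess) / 6.0).min(5.0)
}
```

// Under these assumptions, 25 removable declarations out of 100 reach the 20pt
// duplication cap, 5 broken references reach the 15pt cap, and 16 distinct
// hardcoded values on each sprawl axis give (6 + 8 + 10) / 6 = 4pt, matching the
// "16-distinct-per-axis anchor" figure in the calibration notes.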