fallow-engine 3.0.0

Typed analysis engine facade for fallow consumers
//! Styling-health score: a SECOND health axis derived purely from the structural
//! CSS analytics ([`CssAnalyticsReport`]), orthogonal to the JS/TS code-health
//! score. Mirrors the code-score shape (`vital_signs::compute_health_score`):
//! start at 100, subtract capped per-category penalties, map the result to a
//! letter grade via the shared [`letter_grade`]. The code score is never touched.
//!
//! # Penalty rubric (v3, [`STYLING_HEALTH_FORMULA_VERSION`])
//!
//! The weights below were recalibrated from v1 (v1 -> v2 re-normalized
//! `dead_surface` and `token_erosion`; v2 -> v3 re-weighted the duplication
//! family toward value DRIFT, see below) after running across real projects
//! (government design systems plus small Tailwind apps); any further
//! recalibration bumps [`STYLING_HEALTH_FORMULA_VERSION`] (the same versioning
//! discipline the code score uses). Every category is capped and evaluated
//! against the analytics that are always present once `--css` ran, so there is
//! no "missing pipeline" partial-credit case to model.
//!
//! | Category | Cap | Signal | Scaling |
//! |---|---|---|---|
//! | `duplication` | 20pt | `summary.duplicate_declarations_total` (removable declarations from copy-paste blocks) | `total / max(non_atomic_declarations, 1) * 80`, capped (v3 down-weighted from `* 200`: exact CSS duplication is the least-harmful pattern, so ~25% of declarations removable = full 20pt, a soft hint not a dominant term; the non-atomic denominator is the 3c atomic-exclusion behavior) |
//! | `dead_surface` | 20pt | (a) unused `@theme` tokens, as a share of all defined `@theme` tokens; (b) unreferenced classes + unused at-rules + dead `@font-face`, as a share of `non_atomic_declarations` | token term `min(unused_theme_tokens / max(theme_tokens_defined, 1) * 15, 15)` + other term `min(other_dead / max(non_atomic_declarations, 1) * 150, 8)`, summed then capped at 20 (the token term is a *per-population* death ratio so a handful of dead tokens in a declaration-sparse Tailwind project no longer explodes the penalty) |
//! | `broken_references` | 15pt | `unresolved_class_references` + `keyframes_undefined` | `count * 3`, capped (so 5 broken refs = full 15pt) |
//! | `token_erosion` | 10pt | mixed `font-size` units (above 2) + distinct Tailwind arbitrary-value tokens + distinct HARDCODED `box-shadow`/`border-radius`/`line-height` values (v3 sprawl/drift sub-term) | `min((extra_units * 2), 4)` unit term + `min(arbitrary / 18, 8)` arbitrary term + `min(sum_per_axis_excess / 6, 5)` sprawl term (per-axis excess above baselines 10/8/6), summed then capped at 10 |
//! | `structural` | 10pt | `!important` density + deep nesting | `max(important_pct - 5, 0)` + `max(max_nesting - 4, 0)`, summed then capped at 10 |
//!
//! All point values are rounded to one decimal, and the final score is clamped
//! to `[0, 100]`, matching the code score's `round1` / clamp behaviour.
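/*!

As a worked illustration of the rubric above, the sketch below applies five
already-capped category penalties to a starting score of 100. It is a
standalone approximation for readers, not this module's code: `Penalties` and
`score_from` are hypothetical names, and the sample penalty values are invented.

```rust
/// Hypothetical stand-in for the five capped category penalties.
struct Penalties {
    duplication: f64,
    dead_surface: f64,
    broken_references: f64,
    token_erosion: f64,
    structural: f64,
}

/// Round to one decimal, as the rubric describes.
fn round1(value: f64) -> f64 {
    (value * 10.0).round() / 10.0
}

/// Start at 100, subtract each penalty, round, then clamp to [0, 100].
fn score_from(p: &Penalties) -> f64 {
    let raw = 100.0
        - p.duplication
        - p.dead_surface
        - p.broken_references
        - p.token_erosion
        - p.structural;
    round1(raw).clamp(0.0, 100.0)
}
```

With invented penalties of 8.0, 3.0, 6.0, 2.5, and 1.0, `score_from` yields 79.5.
Because the category caps sum to 75 (20 + 20 + 15 + 10 + 10), the lowest
reachable score under this rubric is 25.
*/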
//!
//! ## What v1 -> v2 changed, and why
//!
//! - `dead_surface` was normalized by `files_analyzed` in v1, so a 2-stylesheet
//!   Tailwind project with a few dead entities scored far higher than a 32-file
//!   design system with one dead entity. v2 splits the category into two
//!   independently-normalized terms. Unused `@theme` tokens are divided by the
//!   *total number of `@theme` tokens defined* (a per-population death ratio,
//!   threaded in as `theme_tokens_defined`): this is the principled denominator
//!   because Tailwind projects author almost no CSS declarations, so dividing a
//!   few unused tokens by `total_declarations` exploded the penalty (a project
//!   with 4 unused tokens over 24 declarations capped the whole category at 20).
//!   The remaining dead entities (unreferenced classes, unused at-rules, dead
//!   `@font-face`) still divide by `total_declarations`, the size-stable
//!   denominator `duplication` uses, so neither term swings with stylesheet count.
//! - `token_erosion` in v1 added `tailwind_arbitrary_values` raw, so just ~10
//!   distinct arbitrary values maxed the category, punishing ordinary Tailwind
//!   apps. v2 saturates the arbitrary term via a divisor and caps the unit term
//!   so neither sub-signal alone reaches the ceiling.
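/*!

The two-term `dead_surface` split described above can be sketched as follows.
This is a simplified standalone rendering of the v2 formula from the rubric
table; the function names are hypothetical and the inputs are plain counts
rather than the real report types.

```rust
/// Token-death term: unused `@theme` tokens over ALL defined `@theme` tokens.
fn token_term(unused_theme_tokens: u32, theme_tokens_defined: u32) -> f64 {
    let population = f64::from(theme_tokens_defined).max(1.0);
    (f64::from(unused_theme_tokens) / population * 15.0).min(15.0)
}

/// Other-dead term: remaining dead entities over the declaration count.
fn other_term(other_dead: u32, declarations: u32) -> f64 {
    let total = f64::from(declarations).max(1.0);
    (f64::from(other_dead) / total * 150.0).min(8.0)
}

/// Sum both terms, cap at the 20pt category ceiling, round to one decimal.
fn dead_surface(unused: u32, defined: u32, other_dead: u32, declarations: u32) -> f64 {
    let sum = token_term(unused, defined) + other_term(other_dead, declarations);
    (sum.min(20.0) * 10.0).round() / 10.0
}
```

For the 24-declaration project with 4 unused tokens mentioned above, and
assuming (purely for illustration) that it defines 40 `@theme` tokens,
`dead_surface(4, 40, 0, 24)` is 1.5pt rather than the capped 20pt.
*/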
//!
//! ## What v2 -> v3 changed, and why
//!
//! Research is clear that exact CSS duplication is the LEAST-harmful CSS pattern
//! (repeated declarations gzip away; graphical properties are loosely coupled; CSS
//! has no native abstraction so some repetition is unavoidable), while the real
//! maintenance harm is value DRIFT / inconsistency (the same design intent
//! expressed with divergence: a radius `6px` here and `5px` there; a shadow
//! `rgba(0,0,0,.1)` vs `.12`), which design tokens address. So v3 re-weights the
//! duplication FAMILY toward drift and away from byte-identical repetition:
//!
//! - `duplication` exact-block scale dropped from `* 200` to `* 80` (see
//!   `EXACT_DUP_SCALE`). The notation-canonical exact-block detector is kept
//!   (lightningcss already collapses `0px`/`#fff`/`rgb()`), just down-weighted to
//!   a soft hint. The 20pt cap is unchanged.
//! - `token_erosion` gains a HARDCODED-value-sprawl drift sub-term (see
//!   `value_sprawl_term`) sourced from the previously-descriptive-only
//!   `summary.unique_box_shadows` / `unique_border_radii` / `unique_line_heights`
//!   distinct-value counts. It counts only HARDCODED literals: a system that
//!   tokenizes its scales via `var(--*)` scores 0 (lightningcss parses
//!   `box-shadow: var(--x)` as `Property::Unparsed`, so var-referenced and
//!   `@theme`-defined values never reach the typed-property collectors). Per-axis
//!   baselines (10/8/6) clear every observed real design system; the term
//!   saturates and is sub-capped at 5pt inside the unchanged 10pt category.
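/*!

To make the effect of the `* 200` to `* 80` change concrete, the sketch below
evaluates the exact-block duplication formula for both scales. It is a
standalone illustration with a hypothetical function name; the scale is passed
in as a parameter only so the two versions can be compared side by side.

```rust
/// Exact-block duplication penalty for a given scale (v2: 200.0, v3: 80.0),
/// capped at the 20pt category ceiling and rounded to one decimal.
fn duplication_penalty(removable: u32, declarations: u32, scale: f64) -> f64 {
    let total = f64::from(declarations).max(1.0);
    let raw = f64::from(removable) / total * scale;
    (raw.min(20.0) * 10.0).round() / 10.0
}
```

With 10 removable declarations out of 100, the v2 scale gives the full 20pt
while the v3 scale gives 8pt; under v3 the cap is reached at 25 removable out
of 100.
*/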
//!
//! The original v3 plan (re-canonicalize the duplicate-block fingerprint to detect
//! drift clones) was DROPPED after a Step-0 audit found it a no-op: the css-metrics
//! fingerprint serializes via lightningcss `to_css_string`, which already
//! canonicalizes `0px`/`#fff`/`rgb()`, so it is notation-semantic, not byte-literal.
//! Genuine value drift is surfaced via the un-tokenized-literal sprawl count above.
//!
//! ## Deliberately excluded signals
//!
//! `custom_properties_unreferenced` and `custom_properties_undefined` are
//! intentionally NOT folded into this score. They are false-positive-prone for
//! exactly the projects this axis most needs to grade fairly: a design system
//! exports custom properties that are referenced by *consumers*, not within its
//! own package, so "unreferenced-in-package" is the intended state rather than a
//! smell; and in a cross-package monorepo a property defined in one package and
//! consumed in another reads as "undefined" to a single-package analysis. Both
//! signals stay available in the raw `CssAnalyticsReport` for descriptive
//! surfaces, but they do not move the grade.
//!
//! ## Confidence (descriptive metadata, NOT part of the formula)
//!
//! The grade carries a [`StylingHealthConfidence`] marking it `Low` when the
//! authored-declaration count is below `MIN_CONFIDENT_DECLARATIONS` (with a
//! stated reason), so a grade computed from a thin CSS surface is not presented
//! with the same authority as one from a full design system. Confidence NEVER
//! feeds the score: `score` / `grade` / `penalties` are byte-identical whether it
//! is high or low. The framing is "thin authored-CSS surface", not "unreliable
//! analysis": a utility-first Tailwind project legitimately authors little CSS,
//! so a low mark means the declaration-normalized rubric had little to measure,
//! not that fallow's analysis failed. The EMPTY case (no import-reachable
//! stylesheet, so `css_analytics` and `styling_health` are both `None`) is the
//! strongest form of withholding and is handled one layer up by the human "No
//! stylesheets analyzed" note; this function only ever runs on a non-empty report.
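/*!

A minimal sketch of the confidence classification, assuming the thresholds
defined further down in this file (a floor of 50 non-atomic declarations and a
0.7 atomic share). `Confidence` and `classify` are hypothetical names used only
for this illustration; the real function also produces a reason string.

```rust
#[derive(Debug, PartialEq)]
enum Confidence {
    High,
    Low,
}

/// Classify confidence from the non-atomic and atomic declaration counts.
fn classify(non_atomic: u32, atomic: u32) -> Confidence {
    let total = non_atomic.saturating_add(atomic);
    // Predominantly atomic surface: the structural axis is inert.
    if atomic > 0 && f64::from(atomic) / f64::from(total) >= 0.7 {
        return Confidence::Low;
    }
    // Thin authored surface: ratios are hypersensitive below the floor.
    if non_atomic < 50 {
        return Confidence::Low;
    }
    Confidence::High
}
```

A 24-declaration project classifies as `Low`, a 145-declaration project as
`High`, and a project with 100 non-atomic plus 400 atomic declarations as `Low`
because 80% of its surface is atomic. The score itself is not an input here,
which mirrors the rule that confidence never feeds the formula.
*/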
//!
//! ## Calibration (v3, corpus-locked)
//!
//! The v2 weights were validated against a 10-project corpus (3 design systems /
//! SCSS, 4 Tailwind apps, 3 empty / CSS-in-JS); v3 re-ran that corpus plus the 3c
//! atomic CSS-in-JS smoke (Braid / vanilla-extract / StyleX / Panda / emotion) and
//! drift anchors. [`STYLING_HEALTH_FORMULA_VERSION`] bumps 2 -> 3 because rubric
//! constants moved (`EXACT_DUP_SCALE` and the new sprawl sub-term). v3 result, NO
//! band misclassification: every real design system stays A with a 0 sprawl term
//! (utrecht 5 distinct hardcoded shadows < baseline 10; rijkshuisstijl, jsonforms
//! 7 radii < baseline 8, all 0pt sprawl); atomic CSS-in-JS gains no false drift
//! penalty (<= 3 distinct values per axis); a clean token-driven system keeps a 0
//! sprawl term (var-referenced values are uncounted). The sprawl term fires only on
//! genuine hardcoded drift (a 16-distinct-per-axis anchor scores ~4pt). The
//! duplication down-weight moves moderate-dup projects by ~1-2pt (no band flip).
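/*!

The sprawl anchors quoted above can be reproduced with a small standalone
sketch of the sub-term, using the baselines (10/8/6), divisor (6), and sub-cap
(5pt) from the rubric. The function takes plain counts instead of the real
summary type, which is a simplification made for this example.

```rust
/// Hardcoded-value-sprawl term: per-axis excess above baselines 10/8/6,
/// summed, divided by 6, and capped at 5pt.
fn value_sprawl_term(shadows: u32, radii: u32, line_heights: u32) -> f64 {
    let excess = shadows
        .saturating_sub(10)
        .saturating_add(radii.saturating_sub(8))
        .saturating_add(line_heights.saturating_sub(6));
    (f64::from(excess) / 6.0).min(5.0)
}
```

Five distinct shadows or seven distinct radii stay under their baselines and
score 0. The 16-distinct-per-axis anchor has excesses of 6, 8, and 10, so it
scores 24 / 6 = 4pt, and larger counts saturate at 5pt.
*/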
//!
//! Consumer note: a v2 -> v3 bump moves `styling_health.score`/`grade` for some
//! projects, so `--format json` snapshot diffs (including fallow's own golden
//! snapshots) and styling-grade trend dashboards see a one-time step-change at the
//! version boundary; gate on `formula_version`. No exit-code, badge, gate, regression
//! baseline, or trend-snapshot consumes the styling score, so nothing fails; the
//! code `health_score` is byte-unchanged.
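/*!

One way a consumer might gate on `formula_version` is sketched below. The
`Snapshot` struct and `comparable_delta` function are hypothetical and not part
of fallow; the sketch also assumes the version is an integer, which the JSON
output may or may not match exactly.

```rust
/// Hypothetical consumer-side snapshot of the styling grade.
struct Snapshot {
    formula_version: u32,
    score: f64,
}

/// Return the score delta only when both snapshots used the same rubric.
fn comparable_delta(previous: &Snapshot, current: &Snapshot) -> Option<f64> {
    if previous.formula_version == current.formula_version {
        Some(current.score - previous.score)
    } else {
        // A version bump is a rubric change, not a project regression.
        None
    }
}
```

A dashboard built this way would plot a delta within one formula version and
show a break in the series, rather than a drop, at the v2 -> v3 boundary.
*/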

use fallow_output::{
    CssAnalyticsReport, STYLING_HEALTH_FORMULA_VERSION, StylingHealth, StylingHealthConfidence,
    StylingHealthPenalties, letter_grade,
};

const DUPLICATION_CAP: f64 = 20.0;
const DEAD_SURFACE_CAP: f64 = 20.0;

/// Per-removable-declaration scale for the EXACT-block duplication penalty (v3).
/// Down-weighted from the v2 `* 200` because exact (notation-canonical) CSS
/// duplication is the LEAST-harmful CSS pattern: repeated declarations gzip away,
/// graphical properties are loosely coupled, and CSS has no native abstraction so
/// some repetition is unavoidable. Exact duplication therefore stays a soft hint,
/// not a dominant term; the maintenance harm CSS tooling should weight is value
/// DRIFT (the `token_erosion` sprawl sub-term below), not byte-identical repeats.
/// At `* 80` the category caps at 25% removable declarations (was 10% at `* 200`).
const EXACT_DUP_SCALE: f64 = 80.0;
const BROKEN_REFERENCES_CAP: f64 = 15.0;
const TOKEN_EROSION_CAP: f64 = 10.0;
const STRUCTURAL_CAP: f64 = 10.0;

/// Authored-CSS declaration floor below which the grade is marked low-confidence.
/// Below this, the declaration-normalized penalty ratios are hypersensitive: a
/// single minimal duplicate block (4 declarations appearing twice = 4 removable)
/// contributed `4 / total * 200` to duplication under the v2 scale (the full 20pt
/// cap at 40 declarations, 16pt at 50) and still contributes `4 / total * 80`
/// under the v3 [`EXACT_DUP_SCALE`] (8pt at 40, 6.4pt at 50), and a handful of
/// `!important` declarations likewise pushes the structural penalty to its cap.
/// So below ~50 authored declarations a single finding can move an entire
/// penalty category toward its ceiling and the grade reflects sampling noise
/// rather than systematic quality; above it, individual findings contribute
/// proportionally. The floor empirically separates the calibration corpus:
/// fallow-tools (24 declarations) and leenders-coaching (38) read as thin
/// authored surfaces, while every design system and the `>= 145`-declaration
/// Tailwind apps stay confident.
const MIN_CONFIDENT_DECLARATIONS: u32 = 50;

/// Share of analyzed declarations originating from flat-by-construction atomic
/// object CSS-in-JS (StyleX/Panda) at or above which the grade is marked
/// low-confidence: the structural axis (nesting, `!important` density) is inert
/// for compile-time-atomic CSS, so a predominantly-atomic project's grade
/// reflects token hygiene only. Distinct from [`MIN_CONFIDENT_DECLARATIONS`]
/// (a thin-surface caveat); when both fire the atomic caveat is the more
/// informative one and wins the single `confidence_reason` slot.
const ATOMIC_CONFIDENCE_SHARE: f64 = 0.7;

/// `!important` density (as a percentage of declarations) below which no
/// structural penalty accrues. A small amount of `!important` is normal.
const IMPORTANT_DENSITY_FLOOR: f64 = 5.0;

/// Style-rule nesting depth at or below which no structural penalty accrues.
/// Shallow nesting is healthy; the penalty grows past this floor.
const NESTING_DEPTH_FLOOR: f64 = 4.0;

/// The number of distinct `font-size` units considered a healthy baseline (e.g.
/// `rem` for type plus `px` for fixed chrome). Each unit beyond this erodes the
/// token system.
const FONT_SIZE_UNIT_BASELINE: u32 = 2;

/// `dead_surface` unused-`@theme`-token term scale (v2): the unused-token count
/// is normalized by the TOTAL number of `@theme` tokens defined (a per-population
/// death ratio that is `>= 0`; the [`TOKEN_DEATH_TERM_CAP`] `.min` bounds the
/// term even if a caller violates the population invariant and the ratio exceeds
/// 1.0), then multiplied by this factor. So 100% of defined tokens dead
/// contributes the full [`TOKEN_DEATH_TERM_CAP`], 20% dead ~= 3pt. This is the
/// principled, size-independent denominator: Tailwind projects author almost no
/// CSS declarations, so the v1 `total_declarations` denominator let a few unused
/// tokens explode the penalty; dividing by the token population fixes that
/// without rewarding or punishing project size.
const TOKEN_DEATH_SCALE: f64 = 15.0;

/// `dead_surface` unused-`@theme`-token term cap (v2). Bounds the token-death
/// contribution so even a fully-dead token population leaves room for the other
/// dead-entity term within the 20pt category cap.
const TOKEN_DEATH_TERM_CAP: f64 = 15.0;

/// `dead_surface` non-token-dead-entity term scale (v2): unreferenced classes,
/// unused `@property`/`@layer` at-rules, and dead `@font-face` families are
/// normalized by the non-atomic declaration count (the same size-stable
/// denominator the duplication penalty uses) and multiplied by this factor
/// before the term cap.
const OTHER_DEAD_SCALE: f64 = 150.0;

/// `dead_surface` non-token-dead-entity term cap (v2). Bounds the
/// declaration-share dead-entity contribution so it cannot dominate the category
/// on its own; the token-death term carries the rest of the 20pt budget.
const OTHER_DEAD_TERM_CAP: f64 = 8.0;

/// `token_erosion` per-extra-unit weight (v2). Each distinct `font-size` unit
/// past [`FONT_SIZE_UNIT_BASELINE`] adds this many points to the unit term.
const FONT_SIZE_UNIT_WEIGHT: f64 = 2.0;

/// `token_erosion` font-size-unit term cap (v2). Bounds the unit contribution so
/// `font-size` units alone can never dominate the category; the arbitrary-value
/// term carries the rest of the budget.
const FONT_SIZE_UNIT_TERM_CAP: f64 = 4.0;

/// `token_erosion` arbitrary-value divisor (v2). Distinct Tailwind arbitrary
/// values are divided by this before the arbitrary term cap, so the term grows
/// gently (50 values ~= 2.8pt, 100 ~= 5.6pt) and reaches its 8pt cap only at 144
/// distinct values, instead of maxing the whole category at ~10 values as v1
/// did. Normal Tailwind usage yields a moderate penalty, not the ceiling.
const ARBITRARY_VALUE_DIVISOR: f64 = 18.0;

/// `token_erosion` arbitrary-value term cap (v2). Bounds the arbitrary-value
/// contribution so the unit term still has room within the 10pt category cap.
const ARBITRARY_VALUE_TERM_CAP: f64 = 8.0;

/// `token_erosion` hardcoded-value-sprawl baselines (v3), per axis: the count of
/// distinct HARDCODED literal values for a should-be-tokenized property below
/// which no sprawl penalty accrues. A clean token-driven system reuses a small
/// palette via `var(--*)` (which lightningcss parses as `Property::Unparsed`, so
/// var-referenced and `@theme`-defined values are NOT counted), so its hardcoded
/// distinct count is ~0; a drifted system hardcodes many ad-hoc values. Baselines
/// are set conservatively, at each property's natural FULL-SCALE cardinality (so
/// a legitimately rich scale is not penalized), and are corpus-validated to clear
/// every real design system (utrecht 5 shadows, jsonforms 7 radii) at 0:
/// - shadow: rich elevation systems (Material authors up to ~24 levels), the
///   widest legit range, so the highest baseline.
/// - radius: a complete radius scale (none/xs..2xl/full) is ~8.
/// - line-height: a complete line-height scale (tight..loose + reset) is ~6.
const SHADOW_SPRAWL_BASELINE: u32 = 10;
const RADIUS_SPRAWL_BASELINE: u32 = 8;
const LINE_HEIGHT_SPRAWL_BASELINE: u32 = 6;

/// `token_erosion` sprawl divisor (v3). The summed per-axis excess (distinct
/// hardcoded values above each baseline) is divided by this before the sprawl
/// sub-cap, so the term saturates gently (mirroring [`ARBITRARY_VALUE_DIVISOR`]):
/// a large multi-brand monorepo accumulates a bounded, slowly-growing penalty
/// rather than a per-value cliff. Deliberately UN-NORMALIZED by declaration count:
/// normalizing would rebuild the v1 size-denominator bug in reverse (20 drifted
/// radii over 100k declarations would round to nothing, masking real drift),
/// whereas a literals-only project-wide count is bounded by design INTENT, not
/// file count (utrecht is 792 declarations yet only 5 distinct shadows).
const SPRAWL_DIVISOR: f64 = 6.0;

/// `token_erosion` sprawl sub-term cap (v3). Bounds the hardcoded-value-sprawl
/// contribution within the unchanged [`TOKEN_EROSION_CAP`] (10pt), summed with
/// the unit + arbitrary terms. The category cap is NOT raised, so the existing
/// font-size-unit / Tailwind-arbitrary cohort is unaffected; when those terms
/// already approach the 10pt ceiling the sprawl term is mildly diluted (it
/// under-counts, never false-positives, the correct direction for a descriptive
/// grade).
const SPRAWL_TERM_CAP: f64 = 5.0;

/// Internal scoring inputs threaded into the styling-health grade, NOT serialized
/// (mirrors the prior `theme_tokens_defined` parameter, so the wire contract /
/// schema is unchanged). The declaration counts are split into an atomic and a
/// non-atomic share so flat-by-construction atomic object CSS-in-JS (StyleX,
/// Panda) does not dilute the declaration-normalized penalties or invert the
/// confidence knee: every penalty that divides by a declaration count divides by
/// the NON-ATOMIC count, and the confidence trigger reads the non-atomic count
/// plus the atomic share. For authored `.css` / SFC / non-atomic CSS-in-JS the
/// non-atomic counts equal the summary aggregates and `atomic_declarations` is 0,
/// so the grade is byte-identical to the pre-object-CSS-in-JS behavior.
#[derive(Debug, Clone, Copy, Default)]
pub struct StylingScoringInputs {
    /// Total Tailwind `@theme` tokens DEFINED across the project (the population
    /// from which `summary.unused_theme_tokens` is a subset). Denominator for the
    /// per-population token-death ratio in `dead_surface`.
    pub theme_tokens_defined: u32,
    /// Declarations from CSS whose structure is meaningful: authored `.css`, SFC
    /// `<style>`, and non-atomic CSS-in-JS (vanilla-extract / emotion, object +
    /// template). The denominator for every declaration-normalized penalty and
    /// the confidence trigger.
    pub non_atomic_declarations: u32,
    /// `!important` declarations from the non-atomic surface (structural numerator).
    pub non_atomic_important_declarations: u32,
    /// Deepest nesting from the non-atomic surface (structural nesting term).
    pub non_atomic_max_nesting_depth: u8,
    /// Declarations from flat atomic object CSS-in-JS (StyleX/Panda). Excluded
    /// from the penalties; drives the predominantly-atomic confidence caveat.
    pub atomic_declarations: u32,
}

impl StylingScoringInputs {
    /// Build inputs for a report with no atomic object CSS-in-JS: the non-atomic
    /// surface IS the whole summary, so the grade matches the pre-3c behavior.
    /// Used by the back-compat [`compute_styling_health`] entry and unit tests.
    #[allow(
        dead_code,
        reason = "back-compat test helper; production uses explicit StylingScoringInputs"
    )]
    #[must_use]
    pub fn from_report(report: &CssAnalyticsReport, theme_tokens_defined: u32) -> Self {
        let s = &report.summary;
        Self {
            theme_tokens_defined,
            non_atomic_declarations: s.total_declarations,
            non_atomic_important_declarations: s.important_declarations,
            non_atomic_max_nesting_depth: s.max_nesting_depth,
            atomic_declarations: 0,
        }
    }
}

/// Compute the styling-health score from the structural CSS analytics, treating
/// the whole report as the gradeable surface (no atomic object CSS-in-JS split).
///
/// Mirrors the code score: penalties subtract from a starting 100, the result is
/// rounded and clamped, and the grade reuses the shared [`letter_grade`]
/// thresholds. See the module docs for the per-category rubric (v3).
///
/// `theme_tokens_defined` is the total number of Tailwind `@theme` tokens DEFINED
/// across the project. It is an internal scoring input only, NOT a serialized
/// field, so the wire contract / schema is unchanged.
#[allow(
    dead_code,
    reason = "back-compat test helper; production uses compute_styling_health_with_inputs"
)]
#[must_use]
pub fn compute_styling_health(
    report: &CssAnalyticsReport,
    theme_tokens_defined: u32,
) -> StylingHealth {
    compute_styling_health_with_inputs(
        report,
        &StylingScoringInputs::from_report(report, theme_tokens_defined),
    )
}

/// Compute the styling-health score from the structural CSS analytics plus the
/// atomic / non-atomic declaration split (CSS program Phase 3c). The
/// declaration-normalized penalties and the confidence trigger read the
/// non-atomic counts in `inputs`, so flat atomic object CSS-in-JS does not dilute
/// the grade nor inflate confidence. The descriptive `report` is still the source
/// for the non-ratio signals (`token_erosion`, `broken_references`) and the
/// serialized aggregates.
#[must_use]
pub fn compute_styling_health_with_inputs(
    report: &CssAnalyticsReport,
    inputs: &StylingScoringInputs,
) -> StylingHealth {
    let penalties = compute_styling_penalties(report, inputs);
    let score = apply_styling_penalties(&penalties);
    let grade = letter_grade(score);
    let (confidence, confidence_reason) = styling_confidence(report, inputs);

    StylingHealth {
        formula_version: STYLING_HEALTH_FORMULA_VERSION,
        score,
        grade,
        penalties,
        confidence,
        confidence_reason,
    }
}

/// Classify the grade's confidence from the analyzed CSS surface. `Low` in two
/// cases, with the more informative reason winning the single slot:
///
/// 1. **Predominantly atomic** (precedence): at least [`ATOMIC_CONFIDENCE_SHARE`]
///    of analyzed declarations come from flat compile-time-atomic object
///    CSS-in-JS (StyleX/Panda), whose structure (nesting, `!important` density)
///    is inert, so the grade reflects token hygiene only, not structure.
/// 2. **Thin authored surface**: the non-atomic (gradeable) declaration count is
///    below [`MIN_CONFIDENT_DECLARATIONS`], so the declaration-normalized ratios
///    are hypersensitive.
///
/// `High` (no reason) otherwise. Descriptive metadata only: it never feeds the
/// score. See the module docs for why the framing is a property of the CSS
/// surface, not "unreliable analysis".
fn styling_confidence(
    report: &CssAnalyticsReport,
    inputs: &StylingScoringInputs,
) -> (StylingHealthConfidence, Option<String>) {
    let total = inputs
        .non_atomic_declarations
        .saturating_add(inputs.atomic_declarations);
    if inputs.atomic_declarations > 0
        && total > 0
        && f64::from(inputs.atomic_declarations) / f64::from(total) >= ATOMIC_CONFIDENCE_SHARE
    {
        // Atomic CSS is flat at the COMPILED layer; the source rules fallow lifts
        // are not representative of structure, so the structural axis is inert.
        let reason = "structure (nesting, !important density) is not assessable for \
                      compile-time-atomic CSS-in-JS (StyleX/Panda); this grade reflects \
                      token hygiene only"
            .to_string();
        return (StylingHealthConfidence::Low, Some(reason));
    }
    if inputs.non_atomic_declarations >= MIN_CONFIDENT_DECLARATIONS {
        return (StylingHealthConfidence::High, None);
    }
    let reason = format!(
        "graded from only {} declaration{} across {} stylesheet{}",
        inputs.non_atomic_declarations,
        if inputs.non_atomic_declarations == 1 {
            ""
        } else {
            "s"
        },
        report.summary.files_analyzed,
        if report.summary.files_analyzed == 1 {
            ""
        } else {
            "s"
        },
    );
    (StylingHealthConfidence::Low, Some(reason))
}

fn compute_styling_penalties(
    report: &CssAnalyticsReport,
    inputs: &StylingScoringInputs,
) -> StylingHealthPenalties {
    StylingHealthPenalties {
        duplication: duplication_penalty(report, inputs),
        dead_surface: dead_surface_penalty(report, inputs),
        broken_references: broken_references_penalty(report),
        token_erosion: token_erosion_penalty(report),
        structural: structural_penalty(inputs),
    }
}

fn apply_styling_penalties(penalties: &StylingHealthPenalties) -> f64 {
    let mut score = 100.0_f64;
    score -= penalties.duplication;
    score -= penalties.dead_surface;
    score -= penalties.broken_references;
    score -= penalties.token_erosion;
    score -= penalties.structural;
    round1(score).clamp(0.0, 100.0)
}

/// Copy-paste declaration blocks: penalize by the share of all declarations that
/// could be removed by consolidating duplicate blocks. ~25% removable -> full cap
/// (v3 [`EXACT_DUP_SCALE`]; v2 capped at ~10%).
///
/// Atomic object CSS-in-JS (StyleX/Panda) is excluded from duplicate-block
/// fingerprinting upstream, so `duplicate_declarations_total` is already a
/// non-atomic numerator; dividing by the non-atomic declaration count keeps the
/// ratio from being diluted by flat atomic declarations flooding the denominator.
fn duplication_penalty(report: &CssAnalyticsReport, inputs: &StylingScoringInputs) -> f64 {
    let removable = f64::from(report.summary.duplicate_declarations_total);
    let total = f64::from(inputs.non_atomic_declarations).max(1.0);
    round1((removable / total * EXACT_DUP_SCALE).min(DUPLICATION_CAP))
}

/// Dead styling surface, computed as two independently-normalized terms:
///
/// 1. **Token-death term** ([`TOKEN_DEATH_SCALE`], capped at
///    [`TOKEN_DEATH_TERM_CAP`]): unused `@theme` tokens as a share of ALL defined
///    `@theme` tokens (`theme_tokens_defined`). This per-population death ratio is
///    size-independent. v1 divided unused tokens by `total_declarations`, which
///    exploded for Tailwind projects (they author almost no CSS declarations, so
///    a few unused tokens capped the whole category); normalizing by the token
///    population the tokens are drawn from is the principled fix.
/// 2. **Other-dead term** ([`OTHER_DEAD_SCALE`], capped at
///    [`OTHER_DEAD_TERM_CAP`]): unreferenced classes, unused `@property`/`@layer`
///    at-rules, and dead `@font-face` families as a share of the non-atomic
///    declaration count (the same size-stable denominator the duplication penalty
///    uses). These scale with authored-CSS size, so the declaration denominator is
///    correct for them; only the `@theme` tokens needed the per-population
///    treatment.
///
/// The two terms are summed and capped at [`DEAD_SURFACE_CAP`]. Unused `@theme`
/// tokens are EXCLUDED from the other-dead term (they live only in term 1). The
/// other-dead entities are authored-CSS constructs (no atomic object CSS-in-JS
/// contributes a class / at-rule / `@font-face` to them), so the term is
/// normalized by the non-atomic declaration count to avoid atomic dilution.
fn dead_surface_penalty(report: &CssAnalyticsReport, inputs: &StylingScoringInputs) -> f64 {
    let s = &report.summary;

    let token_population = f64::from(inputs.theme_tokens_defined).max(1.0);
    let token_death_ratio = f64::from(s.unused_theme_tokens) / token_population;
    let token_term = (token_death_ratio * TOKEN_DEATH_SCALE).min(TOKEN_DEATH_TERM_CAP);

    let other_dead = f64::from(
        s.unreferenced_css_classes
            .saturating_add(s.unused_property_registrations)
            .saturating_add(s.unused_layers)
            .saturating_add(s.unused_font_faces),
    );
    let total = f64::from(inputs.non_atomic_declarations).max(1.0);
    let other_term = (other_dead / total * OTHER_DEAD_SCALE).min(OTHER_DEAD_TERM_CAP);

    round1((token_term + other_term).min(DEAD_SURFACE_CAP))
}

/// Broken references: markup classes one edit from a defined class, plus
/// animations referencing a `@keyframes` defined nowhere. Each is a likely typo
/// or stale rename; 5 broken refs reach the cap.
fn broken_references_penalty(report: &CssAnalyticsReport) -> f64 {
    let s = &report.summary;
    let broken = f64::from(
        s.unresolved_class_references
            .saturating_add(s.keyframes_undefined),
    );
    round1((broken * 3.0).min(BROKEN_REFERENCES_CAP))
}

/// Design-token erosion: mixing `font-size` units past a healthy baseline and
/// Tailwind arbitrary-value bypasses both work against a single source of truth
/// for the scale.
///
/// v2 splits the category into two saturating terms. The unit term is capped at
/// [`FONT_SIZE_UNIT_TERM_CAP`] so `font-size` units alone cannot dominate. The
/// arbitrary-value term divides the distinct-value count by
/// [`ARBITRARY_VALUE_DIVISOR`] and caps at [`ARBITRARY_VALUE_TERM_CAP`], so it
/// grows gently and reaches its 8pt cap only at 144 distinct values instead of
/// instant-capping the whole category (v1 reached the 10pt ceiling at just 10
/// arbitrary values, punishing normal Tailwind usage). v3 adds a third saturating term,
/// [`value_sprawl_term`], for hardcoded `box-shadow` / `border-radius` /
/// `line-height` value drift (var-blind: scales tokenized via `var(--*)` score 0).
/// The category cap stays 10pt.
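/**

As a worked illustration, the unit and arbitrary-value terms can be sketched
standalone with the constants written out as literals. The function names are
hypothetical, and the sprawl term is passed in precomputed to keep the example
short.

```rust
/// Unit term: 2pt per `font-size` unit beyond the baseline of 2, capped at 4pt.
fn unit_term(font_size_units: u32) -> f64 {
    (f64::from(font_size_units.saturating_sub(2)) * 2.0).min(4.0)
}

/// Arbitrary-value term: distinct values divided by 18, capped at 8pt.
fn arbitrary_term(distinct_arbitrary_values: u32) -> f64 {
    (f64::from(distinct_arbitrary_values) / 18.0).min(8.0)
}

/// Sum the terms with a precomputed sprawl term and cap the category at 10pt.
fn token_erosion(units: u32, arbitrary: u32, sprawl: f64) -> f64 {
    let sum = unit_term(units) + arbitrary_term(arbitrary) + sprawl;
    (sum.min(10.0) * 10.0).round() / 10.0
}
```

Three `font-size` units plus 36 distinct arbitrary values give 2 + 2 = 4pt.
The arbitrary term alone tops out at 8pt (from 144 values), so no single term
reaches the 10pt category cap by itself.
*/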
fn token_erosion_penalty(report: &CssAnalyticsReport) -> f64 {
    let s = &report.summary;
    let extra_units = f64::from(
        s.font_size_units_used
            .saturating_sub(FONT_SIZE_UNIT_BASELINE),
    );
    let unit_term = (extra_units * FONT_SIZE_UNIT_WEIGHT).min(FONT_SIZE_UNIT_TERM_CAP);
    let arbitrary = f64::from(s.tailwind_arbitrary_values);
    let arbitrary_term = (arbitrary / ARBITRARY_VALUE_DIVISOR).min(ARBITRARY_VALUE_TERM_CAP);
    round1((unit_term + arbitrary_term + value_sprawl_term(s)).min(TOKEN_EROSION_CAP))
}

/// Hardcoded-value-sprawl sub-term (v3): the distinct count of hardcoded literal
/// `box-shadow` / `border-radius` / `line-height` values above each per-axis
/// healthy baseline, summed and saturated. This is the design-token DRIFT signal
/// the v3 reweight shifts the duplication-family weight toward: a system that
/// tokenizes its scales via `var(--*)` scores 0 (var-referenced values are
/// invisible to the `unique_*` counts), while one that hardcodes many ad-hoc
/// values accrues a bounded, gently-growing penalty. Counts come from
/// `summary.unique_*` (recomputed at health time, never cached); see the module
/// docs and the [`SHADOW_SPRAWL_BASELINE`] / [`SPRAWL_DIVISOR`] rationale.
fn value_sprawl_term(s: &fallow_output::CssAnalyticsSummary) -> f64 {
    let excess = s
        .unique_box_shadows
        .saturating_sub(SHADOW_SPRAWL_BASELINE)
        .saturating_add(s.unique_border_radii.saturating_sub(RADIUS_SPRAWL_BASELINE))
        .saturating_add(
            s.unique_line_heights
                .saturating_sub(LINE_HEIGHT_SPRAWL_BASELINE),
        );
    (f64::from(excess) / SPRAWL_DIVISOR).min(SPRAWL_TERM_CAP)
}

/// Structural smells: `!important` density above a healthy floor and deep
/// style-rule nesting past a shallow floor.
///
/// Computed over the NON-ATOMIC surface only: flat compile-time-atomic object
/// CSS-in-JS (StyleX/Panda) has zero `!important` and minimal nesting by
/// construction, so including it would dilute the `!important` density (lowering
/// the penalty) and never raise nesting, trivially inflating the grade.
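/**

A standalone sketch of the same arithmetic, with the floors (5% density,
depth 4) and the 10pt cap written as literals. The function name and the plain
integer parameters are simplifications for this example only.

```rust
/// Structural penalty from `!important` density and maximum nesting depth.
fn structural(important: u32, declarations: u32, max_nesting: u8) -> f64 {
    let important_pct = if declarations > 0 {
        f64::from(important) / f64::from(declarations) * 100.0
    } else {
        0.0
    };
    // Only the excess above each floor (5% density, depth 4) is penalized.
    let density_excess = (important_pct - 5.0).max(0.0);
    let nesting_excess = (f64::from(max_nesting) - 4.0).max(0.0);
    ((density_excess + nesting_excess).min(10.0) * 10.0).round() / 10.0
}
```

With 8 `!important` declarations out of 100 and a maximum nesting depth of 6,
the excesses are 3 and 2, for a 5pt penalty. Values at or below both floors
score 0.
*/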
fn structural_penalty(inputs: &StylingScoringInputs) -> f64 {
    let important_pct = if inputs.non_atomic_declarations > 0 {
        f64::from(inputs.non_atomic_important_declarations)
            / f64::from(inputs.non_atomic_declarations)
            * 100.0
    } else {
        0.0
    };
    let important = (important_pct - IMPORTANT_DENSITY_FLOOR).max(0.0);
    let nesting = (f64::from(inputs.non_atomic_max_nesting_depth) - NESTING_DEPTH_FLOOR).max(0.0);
    round1((important + nesting).min(STRUCTURAL_CAP))
}

fn round1(value: f64) -> f64 {
    (value * 10.0).round() / 10.0
}

#[cfg(test)]
mod tests;