# Architecture
## Overview
sbom-tools follows a linear pipeline that normalizes inputs, performs semantic
diffing and scoring, and renders the result through reports or the TUI.
```
SBOM files
-> parsers (CycloneDX/SPDX, streaming for large files)
-> NormalizedSbom (canonical model)
-> matching (PURL, alias, ecosystem, adaptive fuzzy, LSH index)
-> diff engine (semantic + graph)
-> DiffResult / QualityReport
-> reports (json/sarif/html/markdown/csv/summary/table/side-by-side) or TUI
```
## Core Modules
- **cli** (`src/cli/`): Clap command handlers for diff, view, validate, quality, query, diff-multi, timeline, matrix, completions, and config-schema.
- **config** (`src/config/`): Typed configuration with YAML/JSON support, presets, validation, and schema generation.
- **parsers** (`src/parsers/`): CycloneDX/SPDX format detection and parsing into NormalizedSbom. Includes a streaming parser for large files (>512MB) with progress callbacks.
- **model** (`src/model/`): Canonical data model — NormalizedSbom, Component, CanonicalId, DocumentMetadata, Vulnerability, DependencyEdge, License.
- **matching** (`src/matching/`): Multi-tier fuzzy matching for component alignment.
  - Exact PURL match, alias lookup, ecosystem-specific normalization, string similarity (Jaro-Winkler, Levenshtein).
  - Adaptive thresholds that adjust based on score distribution.
  - LSH (locality-sensitive hashing) index for fast candidate lookup.
  - Custom rule engine for user-defined matching rules.
- **diff** (`src/diff/`): Semantic diff engine with graph-aware dependency diffing, incremental diff tracking, and cost-model scoring.
- **enrichment** (`src/enrichment/`): OSV and KEV vulnerability database integration plus EOL detection via endoflife.date API (feature-gated behind `enrichment`). Includes file-based caching with TTL and staleness tracking.
- **quality** (`src/quality/`): SBOM quality scoring and compliance checks against NTIA, FDA, CRA, NIST SSDF, and EO 14028 standards.
- **pipeline** (`src/pipeline/`): Orchestrates the parse → enrich → diff → report workflow. Handles stage sequencing and output routing.
- **reports** (`src/reports/`): Report generators for JSON, SARIF, HTML, Markdown, CSV, summary, table, and side-by-side formats. Includes a streaming reporter for large outputs.
- **tui** (`src/tui/`): Interactive ratatui-based UI for exploring diffs and single SBOMs. Supports diff mode, view mode, fleet comparison, and timeline views.
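
The tiered matching described under **matching** can be illustrated with a minimal sketch. It assumes a trimmed-down `Comp` type, a plain alias map, and a fixed threshold, all of which are hypothetical and not the crate's actual types. It uses only Levenshtein similarity and leaves out ecosystem normalization, Jaro-Winkler, adaptive thresholds, the LSH index, and custom rules.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down component; not the crate's `Component` type.
#[derive(Debug, Clone)]
struct Comp {
    name: String,
    purl: Option<String>,
}

/// Which tier produced the match (illustrative names).
#[derive(Debug, PartialEq)]
enum MatchTier {
    ExactPurl,
    Alias,
    Fuzzy(f64),
}

/// Levenshtein edit distance over Unicode scalar values (two-row DP).
fn levenshtein(a: &str, b: &str) -> usize {
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut cur = Vec::with_capacity(b.len() + 1);
        cur.push(i + 1);
        for (j, cb) in b.iter().enumerate() {
            let substitute = prev[j] + usize::from(ca != *cb);
            let delete = prev[j + 1] + 1;
            let insert = cur[j] + 1;
            cur.push(substitute.min(delete).min(insert));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Normalized similarity in [0, 1]; 1.0 means identical strings.
fn similarity(a: &str, b: &str) -> f64 {
    let longest = a.chars().count().max(b.chars().count());
    if longest == 0 {
        return 1.0;
    }
    1.0 - levenshtein(a, b) as f64 / longest as f64
}

/// Try the tiers from cheapest and most precise to most permissive.
fn match_pair(
    old: &Comp,
    new: &Comp,
    aliases: &HashMap<String, String>,
    threshold: f64,
) -> Option<MatchTier> {
    // Tier 1: identical PURLs are an unambiguous match.
    if let (Some(a), Some(b)) = (&old.purl, &new.purl) {
        if a == b {
            return Some(MatchTier::ExactPurl);
        }
    }
    // Tier 2: different names that resolve to the same canonical alias.
    let (a, b) = (old.name.to_lowercase(), new.name.to_lowercase());
    let canonical = |n: &String| aliases.get(n).cloned().unwrap_or_else(|| n.clone());
    if a != b && canonical(&a) == canonical(&b) {
        return Some(MatchTier::Alias);
    }
    // Tier 3: string similarity, accepted only at or above the threshold.
    let score = similarity(&a, &b);
    (score >= threshold).then_some(MatchTier::Fuzzy(score))
}
```

Ordering the tiers this way means the more expensive similarity computation only runs for pairs that the exact and alias checks could not settle.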
## Data Flow
### Single Diff (`diff` command)
The `diff` command uses the full pipeline:
1. CLI parses arguments and merges config (`src/cli/`, `src/config/`).
2. `pipeline::parse_sbom_with_context()` reads and parses both SBOMs into `ParsedSbom` values, which preserve the raw content for the TUI Source tab.
3. Optional enrichment (feature-gated) mutates both SBOMs in place with OSV/KEV data via `pipeline::enrich_sbom()`. The CLI layer currently makes this call; the pipeline module does not.
4. `pipeline::compute_diff()` builds `DiffEngine` with matching config, rules, and graph options, then diffs.
5. `pipeline::output_report()` selects reporter format, pre-computes CRA compliance, and writes to file or stdout. For TUI output, raw content is preserved; for non-TUI, it is dropped to save memory.
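
As a rough illustration of how these stages chain together, the sketch below models parse, diff, and report as three plain functions. Every name and signature here (`parse_stage`, `diff_stage`, `report_stage`, `run_diff`, the toy `ParsedSbom` and `DiffResult`) is a hypothetical stand-in, an "SBOM" is reduced to one component name per line, and the optional enrichment step is omitted.

```rust
use std::collections::BTreeSet;

/// Stand-in for `ParsedSbom`: normalized data plus optional raw text.
struct ParsedSbom {
    components: BTreeSet<String>,
    raw_content: Option<String>,
}

/// Output targets reduced to two cases for illustration.
#[derive(Clone, Copy, PartialEq)]
enum OutputTarget {
    Tui,
    Json,
}

/// Toy diff result holding component names only.
#[derive(Debug, PartialEq)]
struct DiffResult {
    added: Vec<String>,
    removed: Vec<String>,
}

/// Parse step: keep the raw text only when the caller asks for it.
fn parse_stage(raw: &str, keep_raw: bool) -> ParsedSbom {
    let components = raw
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(String::from)
        .collect();
    ParsedSbom {
        components,
        raw_content: keep_raw.then(|| raw.to_string()),
    }
}

/// Diff step: set difference in both directions.
fn diff_stage(old: &ParsedSbom, new: &ParsedSbom) -> DiffResult {
    DiffResult {
        added: new.components.difference(&old.components).cloned().collect(),
        removed: old.components.difference(&new.components).cloned().collect(),
    }
}

/// Report step: pick a rendering based on the output target.
fn report_stage(diff: &DiffResult, target: OutputTarget) -> String {
    match target {
        OutputTarget::Json => format!(
            "{{\"added\":{},\"removed\":{}}}",
            diff.added.len(),
            diff.removed.len()
        ),
        OutputTarget::Tui => format!("+{} -{}", diff.added.len(), diff.removed.len()),
    }
}

/// Sequence the stages; raw content is retained only for TUI output.
fn run_diff(old_raw: &str, new_raw: &str, target: OutputTarget) -> String {
    let keep_raw = target == OutputTarget::Tui;
    let old = parse_stage(old_raw, keep_raw);
    let new = parse_stage(new_raw, keep_raw);
    let diff = diff_stage(&old, &new);
    report_stage(&diff, target)
}
```

The `keep_raw` flag mirrors the memory trade-off in step 5: only the TUI needs the original text, so other outputs can discard it at parse time.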
### Multi-SBOM Commands (`diff-multi`, `timeline`, `matrix`)
Multi-SBOM commands bypass the pipeline and use `MultiDiffEngine` directly:
```
cli/multi.rs
-> parse_sbom() (direct, not pipeline)
-> FuzzyMatchConfig::from_preset()
-> MultiDiffEngine::new()
-> .diff_multi() / .timeline() / .matrix()
-> JSON or TUI output only
```
Key differences from single-diff:
- No `DiffConfig`: options are passed as scattered function parameters instead.
- No enrichment: vulnerability data is not available in multi-SBOM views.
- No report format variety: JSON or TUI only (no SARIF/CSV/HTML/Markdown).
- No streaming support.
- No matching rules.
### Query Command (`query`)
The `query` command searches for components across multiple SBOMs:
```
cli/query.rs
-> parse_multiple_sboms() (reused from multi.rs)
-> Optional: enrich_sbom() / enrich_eol() (feature-gated)
-> For each SBOM: NormalizedSbomIndex::build()
-> QueryFilter::matches() on each component via ComponentSortKey
-> Deduplicate by (name_lower, version), merge found_in sources
-> Output: table (default), JSON, or CSV
```
Key design:
- Reuses `parse_multiple_sboms()` and `get_sbom_name()` from `cli/multi.rs`
- Supports optional enrichment (OSV vulns + EOL) before searching
- Version filter tries semver range parsing first (for example, `<2.17.0`) and falls back to an exact string match if the value does not parse as a range
- All filters are AND-combined; pattern filter uses `ComponentSortKey::contains()` for broad matching
- Deduplication groups by `(name_lower, version)` and merges `found_in` sources and vulnerability IDs
- Exit code 1 if no matches (useful for CI gate checks)
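
The deduplication and exit-code behaviour listed above can be sketched as follows. The `Hit` and `Merged` types and the function names are hypothetical; the sketch only assumes what the list states, namely grouping by `(name_lower, version)` and merging `found_in` sources and vulnerability IDs.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One query hit from a single SBOM (hypothetical shape).
#[derive(Debug, Clone)]
struct Hit {
    name: String,
    version: String,
    source: String,
    vuln_ids: Vec<String>,
}

/// Merged view of all hits that share a (name_lower, version) key.
#[derive(Debug, Default)]
struct Merged {
    found_in: BTreeSet<String>,
    vuln_ids: BTreeSet<String>,
}

/// Group hits case-insensitively by name and exactly by version.
fn dedup_hits(hits: &[Hit]) -> BTreeMap<(String, String), Merged> {
    let mut out = BTreeMap::new();
    for hit in hits {
        let key = (hit.name.to_lowercase(), hit.version.clone());
        let entry: &mut Merged = out.entry(key).or_default();
        entry.found_in.insert(hit.source.clone());
        entry.vuln_ids.extend(hit.vuln_ids.iter().cloned());
    }
    out
}

/// Exit status for CI gating: 1 when nothing matched, 0 otherwise.
fn exit_code(match_count: usize) -> i32 {
    if match_count == 0 {
        1
    } else {
        0
    }
}
```

Using sets for `found_in` and `vuln_ids` keeps the merge idempotent, so the same component reported by the same SBOM twice does not inflate the output.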
### Enrichment Flow
Enrichment is feature-gated behind the `enrichment` Cargo feature. When enabled,
the CLI layer (`src/cli/diff.rs`) constructs `OsvEnricherConfig` from `DiffConfig.enrichment`
and calls `pipeline::enrich_sbom()` to mutate each SBOM in-place before diffing.
```
DiffConfig.enrichment → OsvEnricherConfig
→ pipeline::enrich_sbom(&mut sbom, &config)
→ OsvEnricher::new() → enricher.enrich(&mut components)
→ Re-insert enriched components into sbom.components
```
The pipeline module exports `enrich_sbom()` but does not orchestrate it — the CLI is
responsible for calling it at the right time.
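
The in-place mutation and re-insertion pattern from the diagram can be sketched without any network access. Everything below is an assumption made for illustration: `StaticEnricher` replaces the real OSV client with a fixed lookup table, `enrich_sbom_sketch` is not the crate's `enrich_sbom()` signature, and components are keyed by a plain string ID.

```rust
use std::collections::HashMap;

/// Minimal component with a vulnerability list (hypothetical shape).
#[derive(Debug, Clone)]
struct Component {
    name: String,
    vulns: Vec<String>,
}

/// Minimal SBOM keyed by a string ID standing in for `CanonicalId`.
struct Sbom {
    components: HashMap<String, Component>,
}

/// Offline stand-in for an enricher: advisories come from a fixed table.
struct StaticEnricher {
    advisories: HashMap<String, Vec<String>>,
}

impl StaticEnricher {
    /// Append known advisory IDs to each matching component.
    fn enrich(&self, components: &mut [Component]) {
        for component in components.iter_mut() {
            if let Some(ids) = self.advisories.get(&component.name) {
                component.vulns.extend(ids.iter().cloned());
            }
        }
    }
}

/// Take the components out, enrich them as a batch, then re-insert them.
/// Returns how many components carry at least one vulnerability afterwards.
fn enrich_sbom_sketch(sbom: &mut Sbom, enricher: &StaticEnricher) -> usize {
    let (ids, mut components): (Vec<String>, Vec<Component>) =
        sbom.components.drain().unzip();
    enricher.enrich(&mut components);
    let enriched = components.iter().filter(|c| !c.vulns.is_empty()).count();
    sbom.components.extend(ids.into_iter().zip(components));
    enriched
}
```

Draining into a flat batch lets the enricher work on a contiguous slice, which suits bulk lookups, while the keys are restored unchanged afterwards.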
## TUI Architecture
The TUI has two systems:
- **Legacy system** (`src/tui/views/`): Monolithic `App` struct holds all state. Each tab renders via functions in `views/*.rs` that take `&App` or `&mut App`. Event handling is a large match tree in `input.rs`. All tabs except Quality use this system.
- **Modern system** (`src/tui/view/views/`): `ViewState` trait with per-tab state structs. Events return `EventResult` enums for navigation, overlays, status messages. Only the Quality tab uses this system as a proof-of-concept.
Both systems coexist. The `App` struct has a `quality_view: Option<QualityView>` field that dispatches to the modern system when present. Migration of remaining tabs to the `ViewState` trait is possible but not planned — the legacy system works and is well-tested.
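
A minimal sketch of how the two systems might coexist is shown below. The trait method, the `Key` enum, and the `EventResult` variants are assumptions chosen for illustration; only the `ViewState` and `EventResult` names and the `quality_view: Option<QualityView>` field come from the description above.

```rust
/// Simplified key events (hypothetical; a real TUI would use its backend's type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Key {
    Up,
    Down,
    Enter,
    Char(char),
}

/// What a view asks the application to do after handling an event.
#[derive(Debug, PartialEq)]
enum EventResult {
    Consumed,
    Ignored,
    NavigateTo(&'static str),
    StatusMessage(String),
}

/// Modern system: each tab owns its state and handles its own events.
trait ViewState {
    fn handle_key(&mut self, key: Key) -> EventResult;
}

struct QualityView {
    selected: usize,
    row_count: usize,
}

impl ViewState for QualityView {
    fn handle_key(&mut self, key: Key) -> EventResult {
        match key {
            Key::Down if self.selected + 1 < self.row_count => {
                self.selected += 1;
                EventResult::Consumed
            }
            Key::Up if self.selected > 0 => {
                self.selected -= 1;
                EventResult::Consumed
            }
            Key::Enter => EventResult::StatusMessage(format!("row {} selected", self.selected)),
            Key::Char('c') => EventResult::NavigateTo("components"),
            _ => EventResult::Ignored,
        }
    }
}

/// Legacy system: one struct holds all state and matches on every key.
struct App {
    quality_view: Option<QualityView>,
    legacy_selected: usize,
}

impl App {
    fn handle_key(&mut self, key: Key) -> EventResult {
        // Dispatch to the modern view when it is present.
        if let Some(view) = self.quality_view.as_mut() {
            return view.handle_key(key);
        }
        // Otherwise fall back to the monolithic match.
        match key {
            Key::Down => {
                self.legacy_selected += 1;
                EventResult::Consumed
            }
            Key::Up => {
                self.legacy_selected = self.legacy_selected.saturating_sub(1);
                EventResult::Consumed
            }
            _ => EventResult::Ignored,
        }
    }
}
```

Returning an `EventResult` instead of mutating shared state keeps each view testable in isolation, which is the main practical difference from the legacy match tree.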
## Invariants and Conventions
- NormalizedSbom is the single source of truth for parsed data.
- Components are keyed by CanonicalId for stability across formats.
- DiffResult summary values are derived from change lists.
- TUI layers should compute selection and sort order from the same source lists they render, so that indices stay consistent.
- Builders use `with_*` naming and `mut self -> Self` pattern.
- Error handling: thiserror for library code, anyhow for CLI.
- No `&String`, `&Vec<T>`, `Box<dyn Error>`, or production panics.
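
The builder and borrowing conventions above can be illustrated with a short sketch. `MatchOptions`, its fields, its default values, and `has_rule` are hypothetical names invented for this example; only the `with_*` naming, the `mut self -> Self` shape, and the preference for slices and `&str` over `&Vec<T>` and `&String` come from the list.

```rust
/// Hypothetical options type used only to demonstrate the builder convention.
#[derive(Debug, Clone, PartialEq)]
struct MatchOptions {
    threshold: f64,
    use_lsh: bool,
    rules: Vec<String>,
}

impl MatchOptions {
    /// Start from illustrative defaults.
    fn new() -> Self {
        Self {
            threshold: 0.85,
            use_lsh: false,
            rules: Vec::new(),
        }
    }

    /// Each `with_*` method takes `self` by value, mutates it, and returns it.
    fn with_threshold(mut self, threshold: f64) -> Self {
        self.threshold = threshold;
        self
    }

    fn with_lsh(mut self, enabled: bool) -> Self {
        self.use_lsh = enabled;
        self
    }

    fn with_rule(mut self, rule: impl Into<String>) -> Self {
        self.rules.push(rule.into());
        self
    }
}

/// Borrowed parameters are `&[T]` and `&str`, never `&Vec<T>` or `&String`.
fn has_rule(rules: &[String], name: &str) -> bool {
    rules.iter().any(|rule| rule == name)
}
```

Taking `self` by value lets calls chain without intermediate bindings, for example `MatchOptions::new().with_threshold(0.9).with_lsh(true)`.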
## Extension Points
- **Matching rules**: Configurable matching behavior via YAML configs and custom rule engine.
- **Enrichment**: OSV/KEV integration for vulnerability data and EOL detection via endoflife.date API (feature-gated).
- **Reports**: Add new generators by implementing ReportGenerator.
- **Compliance**: Add new standards by extending the quality scorer (currently: NTIA, FDA, CRA, NIST SSDF, EO 14028).
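
To give a feel for the **Reports** extension point, the sketch below defines a stand-in trait and one implementation. The trait shape is an assumption: the real `ReportGenerator` signature is not shown in this document, so the methods `format_name` and `generate`, the `DiffSummary` input, and the error type are all hypothetical.

```rust
use std::fmt::Write as _;

/// Toy input type; the real generators would receive the crate's diff result.
struct DiffSummary {
    added: Vec<String>,
    removed: Vec<String>,
}

/// Assumed trait shape, for illustration only.
trait ReportGenerator {
    /// Short identifier a CLI could use to select this format.
    fn format_name(&self) -> &'static str;
    /// Render the diff into a string.
    fn generate(&self, diff: &DiffSummary) -> Result<String, std::fmt::Error>;
}

/// A new format is added by implementing the trait on a unit struct.
struct PlainTextReport;

impl ReportGenerator for PlainTextReport {
    fn format_name(&self) -> &'static str {
        "plain"
    }

    fn generate(&self, diff: &DiffSummary) -> Result<String, std::fmt::Error> {
        let mut out = String::new();
        writeln!(out, "added: {}", diff.added.len())?;
        for name in &diff.added {
            writeln!(out, "+ {name}")?;
        }
        writeln!(out, "removed: {}", diff.removed.len())?;
        for name in &diff.removed {
            writeln!(out, "- {name}")?;
        }
        Ok(out)
    }
}
```

Returning a `Result` rather than panicking on a formatting failure follows the error-handling convention stated under Invariants and Conventions.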
## Known Technical Debt
- Multi-SBOM fleet commands (diff-multi, timeline, matrix) bypass the pipeline (no enrichment, limited output formats). The `query` command supports enrichment.
- Enrichment is orchestrated by CLI, not the pipeline module.
- ~112 `unwrap()` calls across 27 files (most in non-production code: cache, parsers, config) and ~30 `expect()` calls (safe-by-construction).
- 12 lock-poisoning patterns (`lock().unwrap()` / `lock().expect()`).
- 45 integration tests across 7 test files in `tests/`.
- TUI dual system (legacy + ViewState trait) — only Quality migrated.