taudit_core/lib.rs
//! # `taudit-core` — workspace-internal authority graph + rules engine
//!
//! ## Architecture
//!
//! `taudit-core` is the workspace-internal **engine**: graph mutation,
//! BFS propagation, rule evaluation, baselines, suppressions, ignore-pattern
//! handling, and the cross-sink helpers (`compute_fingerprint`,
//! `compute_finding_group_id`, `rule_id_for`) that JSON / SARIF /
//! CloudEvents sinks call directly.
//!
//! The **stable wire types** (everything that crosses the JSON / SARIF /
//! CloudEvents boundary — `Finding`, `FindingCategory`, `Severity`,
//! `Recommendation`, `FindingSource`, `FixEffort`, `FindingExtras`,
//! `NodeKind`, `EdgeKind`, `TrustZone`, `AuthorityCompleteness`,
//! `IdentityScope`, `GapKind`, `Node`, `Edge`, `PipelineSource`,
//! `ParamSpec`, `AuthorityEdgeSummary`, `PropagationPath`, `NodeId`,
//! `EdgeId`, every `META_*` metadata-key constant) live in the
//! [`taudit-api`](https://crates.io/crates/taudit-api) crate.
//!
//! Each module here re-exports the wire types it used to own so
//! existing in-tree imports (`use taudit_core::finding::Finding`,
//! `use taudit_core::graph::NodeKind`, …) keep compiling unchanged.
//!
//! ## API stability
//!
//! `taudit-core` is a **workspace-internal library**, NOT a stable public
//! API. External consumers (`tsign`, `axiom`, custom automation, SIEMs,
//! third-party tooling) should depend on `taudit-api` directly (for the
//! Rust contract) or consume the JSON / SARIF / CloudEvents output
//! contracts (for cross-language integration). Both are versioned and
//! treated as load-bearing:
//!
//! * `taudit-api` `0.x` — the Rust wire-type contract. While at `0.x`
//!   additive changes can ship in any minor; breaking changes require
//!   a `0.{N+1}` minor bump and a CHANGELOG migration note. At `1.0`
//!   this lifts to standard semver.
//! * `contracts/schemas/taudit-report.schema.json` — JSON output
//! * `schemas/finding.v1.json` — single finding object
//! * `schemas/baseline.v1.json` — baseline file format
//! * `contracts/schemas/taudit-cloudevent-finding-v1.schema.json` —
//!   CloudEvents extension attributes
//! * SARIF 2.1.0 — `partialFingerprints` keys are stable
//!
//! Symbols marked `#[doc(hidden)]` here are required to be `pub` for
//! inter-crate visibility within this workspace (sink crates call
//! `compute_fingerprint`, `compute_finding_group_id`, `rule_id_for`,
//! `downgrade_severity` directly), but their signatures may change between
//! minor `taudit` versions without a SemVer bump on `taudit-core`. Treat
//! them as workspace-private (a notional `pub(crate-tree)`, which is not
//! real Rust syntax), not as `pub`.
//!
//! See ADR 0001 (graph as product) and the v1.1.0 release notes for the
//! full rationale behind this split.
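
/*
Illustrative sketch, not part of this crate: the re-export arrangement
described in the module docs above can be pictured roughly as follows. The
modules `api` and `core_engine` stand in for the `taudit-api` and
`taudit-core` crates, and the `rule_id` field and `make_finding` helper are
hypothetical; the real `Finding` is defined in `taudit-api` and is not
reproduced here.

```rust
// Stand-in for the crate that owns the wire type (hypothetical contents).
mod api {
    #[derive(Debug, PartialEq)]
    pub struct Finding {
        pub rule_id: String,
    }
}

// Stand-in for the engine crate: `finding` no longer defines `Finding`,
// it only re-exports it so the old import path keeps resolving.
mod core_engine {
    pub mod finding {
        pub use crate::api::Finding;
    }
}

// An "existing in-tree import" written against the old path.
use core_engine::finding::Finding;

// Hypothetical helper that builds a value through the old path.
pub fn make_finding(rule_id: &str) -> Finding {
    Finding {
        rule_id: rule_id.to_string(),
    }
}
```

Both paths name the same type, so a value built through the old path can be
passed wherever the owning module's type is expected.
*/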

pub mod baselines;
pub mod custom_rules;
pub mod error;
pub mod exploit_path;
pub mod finding;
pub mod graph;
pub mod ignore;
pub mod map;
pub mod ports;
pub mod propagation;
pub mod rules;
pub mod summary;
pub mod suppressions;

// ── Defense-in-depth caps for adversarial config files ────────────────
//
// taudit ingests YAML from PRs (pipeline files, custom-rule files,
// suppressions, .tauditignore). Without caps, a hostile contributor can
// allocate hundreds of MiB by submitting a single file and DoS the CI
// runner before any rule logic runs. The caps below bound that surface.
//
// They are deliberately *constants*, not flags: every realistic CI YAML
// is well under these limits, and a flag would just be another lever
// for an attacker who has already convinced you to merge their PR. If
// a legitimate use case for a larger file emerges we can revisit; for
// now the council prefers a hard ceiling.

/// Maximum size in bytes of any single pipeline / config / invariant YAML
/// taudit will read.
///
/// Files above this size are rejected with a clear error before any
/// allocation for `serde_yaml`. 2 MiB is well above the largest realistic
/// CI YAML; the largest legitimate workflow in the existing
/// `corpus/` is under 100 KiB.
pub const MAX_INPUT_FILE_BYTES: u64 = 2 * 1024 * 1024;

/// Read `path` to a `String`, but refuse files larger than
/// [`MAX_INPUT_FILE_BYTES`].
///
/// Why this exists: a 50 MiB hostile YAML allocates ~150 MiB peak inside
/// `serde_yaml` (triple-parse + a `serde_yaml::Value` for every node).
/// Capping at the filesystem boundary pre-empts that allocation:
/// we never even hand the bytes to the YAML parser.
///
/// `metadata` follows symlinks; that is fine *here* because callers that
/// need an explicit symlink fence call [`read_capped_with_symlink_fence`]
/// instead, which checks the canonical target before calling this.
///
/// The size-cap rejection is returned as a [`std::io::Error`] of kind
/// `InvalidData` so callers can distinguish it from most IO failures if
/// they want. Note that `std::fs::read_to_string` also reports non-UTF-8
/// content as `InvalidData`.
pub fn read_capped(path: &std::path::Path) -> std::io::Result<String> {
    let meta = std::fs::metadata(path)?;
    if meta.len() > MAX_INPUT_FILE_BYTES {
        return Err(std::io::Error::new(
            std::io::ErrorKind::InvalidData,
            format!(
                "taudit refuses files larger than {} bytes ({} MiB). {} is {} bytes. \
                 If you have a legitimate use case for a larger file, please file an issue.",
                MAX_INPUT_FILE_BYTES,
                MAX_INPUT_FILE_BYTES / (1024 * 1024),
                path.display(),
                meta.len(),
            ),
        ));
    }
    std::fs::read_to_string(path)
}
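
/*
Illustrative sketch, not part of this crate: one way a caller might tell the
size-cap rejection apart from other failures. `read_capped_sketch` is a
hypothetical stand-in for `read_capped` that takes the cap as a parameter so
it can be exercised with tiny files; `ReadOutcome` and `classify` are also
hypothetical names. It assumes the same metadata-then-read approach as the
function above.

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::path::Path;

/// Hypothetical stand-in for `read_capped`, with the cap passed in.
pub fn read_capped_sketch(path: &Path, cap: u64) -> io::Result<String> {
    let meta = fs::metadata(path)?;
    if meta.len() > cap {
        return Err(io::Error::new(
            ErrorKind::InvalidData,
            format!(
                "{} is {} bytes, over the {} byte cap",
                path.display(),
                meta.len(),
                cap
            ),
        ));
    }
    fs::read_to_string(path)
}

/// Hypothetical caller-side view of the three outcomes.
#[derive(Debug, PartialEq)]
pub enum ReadOutcome {
    Loaded(String),
    /// `InvalidData`: over the cap, or content that is not valid UTF-8.
    RejectedContent,
    IoFailure(ErrorKind),
}

/// Map the `io::Result` onto `ReadOutcome` by inspecting the error kind.
pub fn classify(path: &Path, cap: u64) -> ReadOutcome {
    match read_capped_sketch(path, cap) {
        Ok(text) => ReadOutcome::Loaded(text),
        Err(e) if e.kind() == ErrorKind::InvalidData => ReadOutcome::RejectedContent,
        Err(e) => ReadOutcome::IoFailure(e.kind()),
    }
}
```

Matching on the error kind rather than the message keeps the caller
independent of the wording of the rejection text.
*/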

/// Read `path` to a `String`, but only if it is either (a) not a symlink
/// or (b) a symlink whose canonical target is a descendant of
/// `cwd_canonical`. Also enforces [`MAX_INPUT_FILE_BYTES`].
///
/// Used for ambient config files that live at the repo root and that an
/// adversarial PR could plant as a symlink — `.taudit-suppressions.yml`
/// and `.tauditignore`. A symlink to `/etc/passwd` plus a YAML parse
/// failure was previously a content-leak channel via stderr; this helper
/// closes that channel.
///
/// `cwd_canonical` should be `std::env::current_dir()?.canonicalize()?`.
/// Pass it in (rather than computing it here) so callers can canonicalise
/// once per scan and so tests can fence against a temporary working
/// directory.
///
/// On macOS, `/tmp` is itself a symlink to `/private/tmp`. Because both
/// `cwd_canonical` and the symlink target are canonicalised, both carry
/// the `/private` prefix, so the descendant check stays correct.
pub fn read_capped_with_symlink_fence(
    path: &std::path::Path,
    cwd_canonical: &std::path::Path,
) -> std::io::Result<String> {
    let meta = std::fs::symlink_metadata(path)?;
    if meta.file_type().is_symlink() {
        // canonicalize follows the chain; a broken symlink errors here,
        // which is the right answer (we are not going to read it).
        let target = std::fs::canonicalize(path)?;
        if !target.starts_with(cwd_canonical) {
            return Err(std::io::Error::new(
                std::io::ErrorKind::PermissionDenied,
                format!(
                    "refusing to read symlinked {} pointing to {} outside the working directory",
                    path.display(),
                    target.display(),
                ),
            ));
        }
    }
    read_capped(path)
}
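
/*
Illustrative sketch, not part of this crate: the descendant check in the
fence relies on `Path::starts_with` comparing whole path components rather
than raw string prefixes. The function names and example paths below are
hypothetical, and both arguments are assumed to be already canonical (the
sketch does not touch the filesystem).

```rust
use std::path::Path;

/// Component-wise descendant check, in the style of the fence above.
pub fn inside_fence(target_canonical: &Path, cwd_canonical: &Path) -> bool {
    target_canonical.starts_with(cwd_canonical)
}

/// Naive string-prefix check, shown only for contrast.
pub fn naive_inside_fence(target: &str, cwd: &str) -> bool {
    target.starts_with(cwd)
}
```

A sibling directory such as `/work/repo-evil` shares a string prefix with
`/work/repo` but not a path component, so the component-wise check rejects
it while the naive check would let it through.
*/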