//! # `taudit-core` — workspace-internal authority graph + rules engine
//!
//! ## Architecture
//!
//! `taudit-core` is the workspace-internal **engine**: graph mutation,
//! BFS propagation, rule evaluation, baselines, suppressions, ignore-pattern
//! handling, and the cross-sink helpers (`compute_fingerprint`,
//! `compute_finding_group_id`, `rule_id_for`) that JSON / SARIF /
//! CloudEvents sinks call directly.
//!
//! The **stable wire types** (everything that crosses the JSON / SARIF /
//! CloudEvents boundary — `Finding`, `FindingCategory`, `Severity`,
//! `Recommendation`, `FindingSource`, `FixEffort`, `FindingExtras`,
//! `NodeKind`, `EdgeKind`, `TrustZone`, `AuthorityCompleteness`,
//! `IdentityScope`, `GapKind`, `Node`, `Edge`, `PipelineSource`,
//! `ParamSpec`, `AuthorityEdgeSummary`, `PropagationPath`, `NodeId`,
//! `EdgeId`, every `META_*` metadata-key constant) live in the
//! [`taudit-api`](https://crates.io/crates/taudit-api) crate.
//!
//! Each module here re-exports the wire types it used to own so
//! existing in-tree imports (`use taudit_core::finding::Finding`,
//! `use taudit_core::graph::NodeKind`, …) keep compiling unchanged.
//!
//! ## API stability
//!
//! `taudit-core` is a **workspace-internal library**, NOT a stable public
//! API. External consumers (`tsign`, `axiom`, custom automation, SIEMs,
//! third-party tooling) should depend on `taudit-api` directly (for the
//! Rust contract) or consume the JSON / SARIF / CloudEvents output
//! contracts (for cross-language integration). Both are versioned and
//! treated as load-bearing:
//!
//! * `taudit-api` `0.x`: the Rust wire-type contract. While at `0.x`,
//!   additive changes can ship in any minor; breaking changes require
//!   a `0.{N+1}` minor bump and a CHANGELOG migration note. At `1.0`
//!   this lifts to standard semver.
//! * `contracts/schemas/taudit-report.schema.json`: JSON output
//! * `schemas/finding.v1.json`: single finding object
//! * `schemas/baseline.v1.json`: baseline file format
//! * `contracts/schemas/taudit-cloudevent-finding-v1.schema.json`:
//!   CloudEvents extension attributes
//! * SARIF 2.1.0: `partialFingerprints` keys are stable
//!
//! Symbols marked `#[doc(hidden)]` here are required to be `pub` for
//! inter-crate visibility within this workspace (sink crates call
//! `compute_fingerprint`, `compute_finding_group_id`, `rule_id_for`,
//! `downgrade_severity` directly), but their signatures may change between
//! minor `taudit` versions without a SemVer bump on `taudit-core`. Treat
//! them as `pub(crate-tree)`, not `pub`.
//!
//! See ADR 0001 (graph as product) and the v1.1.0 release notes for the
//! full rationale behind this split.
// ── Defense-in-depth caps for adversarial config files ────────────────
//
// taudit ingests YAML from PRs (pipeline files, custom-rule files,
// suppressions, .tauditignore). Without caps, a hostile contributor can
// submit a single file that makes the CI runner allocate hundreds of MiB,
// a denial of service that lands before any rule logic runs. The caps
// below bound that surface.
//
// They are deliberately *constants*, not flags: every realistic CI YAML
// is well under these limits, and a flag would just be another lever
// for an attacker who has already convinced you to merge their PR. If
// a legitimate use case for a larger file emerges we can revisit; for
// now the council prefers a hard ceiling.
/// Maximum size in bytes of any single pipeline / config / invariant YAML
/// taudit will read.
///
/// Files above this size are rejected with a clear error before
/// `serde_yaml` allocates anything. 2 MiB is far above any realistic
/// CI YAML: the largest legitimate workflow in the existing `corpus/`
/// is under 100 KiB.
pub const MAX_INPUT_FILE_BYTES: u64 = 2 * 1024 * 1024;
/// Read `path` to a `String`, but refuse files larger than
/// [`MAX_INPUT_FILE_BYTES`].
///
/// Why this exists: a 50 MiB hostile YAML allocates ~150 MiB peak inside
/// `serde_yaml` (triple-parse + a `serde_yaml::Value` for every node).
/// Capping at the filesystem boundary pre-empts that allocation: the
/// bytes are never handed to the YAML parser.
///
/// `metadata` follows symlinks; that is fine *here* because callers that
/// need an explicit symlink fence call [`read_capped_with_symlink_fence`]
/// instead, which canonicalises before calling this.
///
/// Returned [`io::Error`]s use `InvalidData` for the size-cap rejection so
/// callers can distinguish IO failure from cap rejection if they want.
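// Sketch only: the body of the helper documented above is not part of this
// excerpt. The name `read_capped_sketch` and the explicit `max_bytes`
// parameter are assumptions made so the example stands alone; the documented
// helper would compare against `MAX_INPUT_FILE_BYTES` instead.
pub fn read_capped_sketch(path: &std::path::Path, max_bytes: u64) -> std::io::Result<String> {
    // `metadata` follows symlinks, so this is the size of the final target.
    let len = std::fs::metadata(path)?.len();
    if len > max_bytes {
        // `InvalidData` marks a cap rejection, distinct from ordinary IO failure.
        return Err(std::io::Error::new(
            std::io::ErrorKind::InvalidData,
            format!("{} is {len} bytes, above the {max_bytes}-byte cap", path.display()),
        ));
    }
    // Only now are the bytes read; an oversized file never reaches the parser.
    std::fs::read_to_string(path)
}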
/// Read `path` to a `String`, but only if it is either (a) not a symlink
/// or (b) a symlink whose canonical target is a descendant of
/// `cwd_canonical`. Also enforces [`MAX_INPUT_FILE_BYTES`].
///
/// Used for ambient config files that live at the repo root and that an
/// adversarial PR could plant as a symlink — `.taudit-suppressions.yml`
/// and `.tauditignore`. A symlink to `/etc/passwd` plus a YAML parse
/// failure was previously a content-leak channel via stderr; this helper
/// closes that.
///
/// `cwd_canonical` should be `std::env::current_dir()?.canonicalize()?`.
/// Pass it in (rather than computing it here) so callers can canonicalise
/// once per scan and so tests can fence against a temporary working
/// directory.
///
/// On macOS, `/tmp` is itself a symlink to `/private/tmp`. Because both
/// `cwd_canonical` and the symlink target are canonicalised, both resolve
/// through `/private/tmp` and the descendant check stays correct.
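// Sketch only: the body of `read_capped_with_symlink_fence` is not part of
// this excerpt. The name `read_fenced_sketch`, the explicit `max_bytes`
// parameter, and the `PermissionDenied` kind for a fence violation are
// assumptions; the documented helper would use `MAX_INPUT_FILE_BYTES`.
pub fn read_fenced_sketch(
    path: &std::path::Path,
    cwd_canonical: &std::path::Path,
    max_bytes: u64,
) -> std::io::Result<String> {
    // `symlink_metadata` inspects the link itself instead of following it.
    let is_symlink = std::fs::symlink_metadata(path)?.file_type().is_symlink();
    let resolved = if is_symlink {
        // Resolve the whole link chain, then require a descendant of the cwd.
        // `Path::starts_with` compares whole components, so `/repo-evil`
        // does not pass as a child of `/repo`.
        let target = std::fs::canonicalize(path)?;
        if !target.starts_with(cwd_canonical) {
            // The message names the link only; it never echoes target contents.
            return Err(std::io::Error::new(
                std::io::ErrorKind::PermissionDenied,
                format!("{} is a symlink that leaves the working directory", path.display()),
            ));
        }
        target
    } else {
        path.to_path_buf()
    };
    // Size cap, checked on the resolved path before any bytes are read.
    let len = std::fs::metadata(&resolved)?.len();
    if len > max_bytes {
        return Err(std::io::Error::new(
            std::io::ErrorKind::InvalidData,
            format!("{} is {len} bytes, above the {max_bytes}-byte cap", path.display()),
        ));
    }
    std::fs::read_to_string(&resolved)
}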