sbom-tools 0.1.19

Semantic SBOM diff and analysis tool
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
# Standards Support Improvement Plan: CycloneDX 1.7 + SPDX 3.0

## Executive Summary

This plan covers adding CycloneDX 1.7 and SPDX 3.0 support to sbom-tools. These are
fundamentally different in scope: CycloneDX 1.7 is an **incremental, additive update**
from 1.6 (no breaking changes, ~2-3 weeks of work), while SPDX 3.0 is a **ground-up
redesign** requiring a new parser, model adaptations, and careful normalization (~6-8
weeks). The existing architecture's format-agnostic canonical model makes both feasible
without disrupting the diff/matching/quality/enrichment layers.

---

## Part 1: CycloneDX 1.7

### Difficulty: Low-Medium (Incremental)

CycloneDX 1.7 (published October 2025) is backward-compatible with 1.6. All changes are
additive: new fields, new top-level elements, and some structural relaxations. The existing
serde-based parser handles unknown fields gracefully via `Option<T>`, so 1.7 documents
already parse today -- they just lose the new fields.

### Phase 1.1: Parser Updates (cyclonedx.rs)

**Effort: ~3 days**

1. **Add "1.7" to `supported_versions()`**
   - File: `src/parsers/cyclonedx.rs`
   - Trivial: add `"1.7"` to the returned vec

2. **New serde structs for 1.7 fields:**

   ```
   // Top-level citations array
   struct CdxCitation {
       bom_ref: Option<String>,
       pointers: Option<Vec<String>>,      // JSON Pointer (RFC 6901)
       expressions: Option<Vec<String>>,   // JSONPath/XPath
       timestamp: String,                  // Required
       attributed_to: Option<String>,      // bom-ref to entity
       process: Option<String>,            // bom-ref to formulation
       note: Option<String>,
       signature: Option<CdxSignature>,
   }

   // Distribution constraints on metadata
   struct CdxDistributionConstraints {
       tlp: Option<TlpClassification>,     // CLEAR/GREEN/AMBER/AMBER_AND_STRICT/RED
   }

   // License expression details (1.7 relaxation)
   struct CdxExpressionDetail {
       license_identifier: String,
       bom_ref: Option<String>,
       text: Option<CdxAttachment>,
       url: Option<String>,
   }
   ```

3. **Extend existing structs:**
   - `CycloneDxBom`: add `citations: Option<Vec<CdxCitation>>`
   - `CdxMetadata`: add `distribution_constraints: Option<CdxDistributionConstraints>`
   - `CdxComponent`: add `is_external: Option<bool>`, `version_range: Option<String>`,
     `patent_assertions: Option<serde_json::Value>` (preserve but don't deeply parse)
   - `CdxExternalReference`: add `properties: Option<Vec<CdxProperty>>`

4. **License array parsing change:**
   - 1.7 moves the `oneOf` from array-level to item-level, allowing mixed license+expression entries
   - Current `deserialize_licenses` needs to handle both 1.6 (array-level oneOf) and 1.7 (item-level oneOf)
   - Since serde already tries each variant per-item, this may already work; needs testing

5. **New hash algorithms:**
   - Add `Streebog256` and `Streebog512` variants to `HashAlgorithm` enum
   - Already has `Other(String)` fallback, but named variants are better for quality scoring

### Phase 1.2: Model Extensions

**Effort: ~2 days**

1. **DocumentMetadata additions:**
   - Add `distribution_classification: Option<String>` (TLP value)
   - Add `citations_count: usize` (how many citations present -- useful for quality scoring)

2. **Component additions:**
   - Add `is_external: Option<bool>` to `Component` struct
   - Add `version_range: Option<String>` to `Component` struct
   - These are format-agnostic concepts (SPDX 3.0 has similar `supportLevel` on Artifact)

3. **FormatExtensions:**
   - Store full `citations` array in `FormatExtensions::cyclonedx` for lossless round-trip
   - Store patent assertions similarly (domain-specific, not worth canonical modeling yet)

4. **HashAlgorithm enum:**
   - Add `Streebog256`, `Streebog512` variants
   - Update `Display`, `from_str`, quality scoring hash strength classification

5. **ExternalRefType enum:**
   - Add `Citation`, `Patent`, `PatentAssertion`, `PatentFamily` variants

### Phase 1.3: Quality & Compliance Updates

**Effort: ~2 days**

1. **Quality scoring:**
   - Provenance category: award bonus points for documents with `citations` (data provenance)
   - Integrity category: recognize Streebog hash algorithms
   - `check_format_specific()`: update CycloneDX version warning threshold (< 1.5 instead of < 1.4)

2. **Compliance updates:**
   - CRA: `citations` provides Art.13(3) provenance evidence; check and report
   - NIST SSDF: citations support PS.1 provenance traceability

3. **CBOM awareness (light touch):**
   - Don't deeply model the crypto property changes (too specialized)
   - Store in `ComponentExtensions::raw` for preservation
   - Quality: detect `component_type == Cryptographic` and check for crypto properties presence

### Phase 1.4: Testing

**Effort: ~2 days**

1. **Fixture files:**
   - Create `tests/fixtures/cyclonedx/minimal-1.7.cdx.json` with citations, is_external, TLP
   - Create `tests/fixtures/cyclonedx/cbom-1.7.cdx.json` with crypto components
   - Create `tests/fixtures/cyclonedx/mixed-licenses-1.7.cdx.json` testing the license relaxation

2. **Unit tests:**
   - Parse 1.7 document, verify citations extracted
   - Parse 1.7 document, verify is_external flag on components
   - Parse 1.7 document, verify TLP distribution constraint
   - Verify 1.4/1.5/1.6 documents still parse identically (regression)
   - Cross-format diff: 1.7 CycloneDX vs SPDX 2.3

3. **Integration tests:**
   - Diff between 1.6 and 1.7 documents (same components, different format versions)
   - Quality scoring of 1.7 document with citations
   - Compliance checking with CRA standard against 1.7 document

### CycloneDX 1.7 -- Total Effort: ~9 days

---

## Part 2: SPDX 3.0

### Difficulty: High (Ground-Up Redesign)

SPDX 3.0 is not an incremental update from 2.3 -- it is a completely new data model.
The flat document structure is replaced by a graph of typed Elements. Relationships are
first-class objects. Vulnerabilities are native (Security profile). Licensing uses
relationships instead of inline properties. The serialization is JSON-LD.

The good news: our canonical model (`NormalizedSbom`) already abstracts away format
differences, so the diff/matching/quality/enrichment layers need minimal changes. The
challenge is entirely in the parser and normalization layer.

### Phase 2.1: Research & Design (Pre-Implementation)

**Effort: ~3 days**

1. **JSON-LD strategy decision:**
   - Option A: Use a JSON-LD library (`json-ld` crate) for full spec compliance
   - Option B: Treat SPDX 3.0 JSON-LD as "JSON with a `@context` field" and parse with serde
   - **Recommendation: Option B** -- SPDX 3.0 JSON-LD is relatively flat in practice;
     full JSON-LD processing (context resolution, graph expansion) is overkill for parsing.
     The `@context` URL just maps short names to RDF URIs. Real-world SPDX 3.0 files use
     the compact JSON-LD form which is structurally regular JSON.

2. **Profile scope decision:**
   - Must support: **Core** (mandatory) + **Software** (packages/files) + **SimpleLicensing**
   - Should support: **Security** (vulnerabilities/VEX -- aligns with existing enrichment model)
   - Defer: AI, Dataset, Build, Lite, Extension, ExpandedLicensing (can be added incrementally)

3. **Relationship mapping design:**
   - SPDX 3.0 has ~80 relationship types vs our 21 `DependencyType` variants
   - Strategy: map known types to existing enum variants, use `Other(String)` for the rest
   - Key new mappings needed: `affects`, `fixedIn`, `foundBy`, `hasAssessmentFor`,
     `hasConcludedLicense`, `hasDeclaredLicense`

### Phase 2.2: New SPDX 3.0 Parser

**Effort: ~8 days**

File: `src/parsers/spdx3.rs` (new file, separate from existing `spdx.rs`)

**Rationale for separate parser:** The data models are so different that trying to share
code between 2.x and 3.0 parsers would create a maintenance nightmare. Better to have
clean separation with shared normalization logic.

1. **Serde models for SPDX 3.0 JSON-LD:**

   ```
   // Top-level document
   struct Spdx3Document {
       context: serde_json::Value,           // @context (JSON-LD, ignored for parsing)
       spdx_id: String,                      // URI
       type_: String,                        // "SpdxDocument"
       name: Option<String>,
       creation_info: Spdx3CreationInfo,
       data_license: Option<String>,         // Always CC0-1.0
       namespace_map: Option<Vec<Spdx3NamespaceMap>>,
       imports: Option<Vec<Spdx3ExternalMap>>,
       root_element: Option<Vec<String>>,    // URIs of root elements
       element: Option<Vec<Spdx3Element>>,   // All elements in the document
       profile_conformance: Option<Vec<String>>,
   }

   // Polymorphic element (uses serde tag)
   #[serde(tag = "type")]
   enum Spdx3Element {
       Package(Spdx3Package),
       File(Spdx3File),
       Snippet(Spdx3Snippet),
       Relationship(Spdx3Relationship),
       Vulnerability(Spdx3Vulnerability),
       Annotation(Spdx3Annotation),
       Person(Spdx3Agent),
       Organization(Spdx3Agent),
       Tool(Spdx3Agent),
       SoftwareAgent(Spdx3Agent),
       Bom(Spdx3Bom),
       Bundle(Spdx3Bundle),
       // Fallback for unknown element types
       #[serde(other)]
       Unknown,
   }

   struct Spdx3Package {
       spdx_id: String,
       name: Option<String>,
       package_version: Option<String>,
       package_url: Option<String>,           // First-class PURL
       download_location: Option<String>,
       home_page: Option<String>,
       copyright_text: Option<String>,
       description: Option<String>,
       supplied_by: Option<Vec<String>>,       // URIs to Agent elements
       originated_by: Option<Vec<String>>,
       verified_using: Option<Vec<Spdx3IntegrityMethod>>,
       external_identifier: Option<Vec<Spdx3ExternalIdentifier>>,
       external_ref: Option<Vec<Spdx3ExternalRef>>,
       primary_purpose: Option<String>,
       additional_purpose: Option<Vec<String>>,
       content_identifier: Option<String>,
       attribution_text: Option<Vec<String>>,
       support_level: Option<String>,
       valid_until_time: Option<String>,
       built_time: Option<String>,
       release_time: Option<String>,
       creation_info: Option<Spdx3CreationInfo>,
   }

   struct Spdx3Relationship {
       spdx_id: String,
       from: String,                          // URI
       to: Vec<String>,                       // URIs (1..*)
       relationship_type: String,
       start_time: Option<String>,
       end_time: Option<String>,
       completeness: Option<String>,
       creation_info: Option<Spdx3CreationInfo>,
   }

   struct Spdx3Vulnerability {
       spdx_id: String,
       name: Option<String>,
       description: Option<String>,
       published_time: Option<String>,
       modified_time: Option<String>,
       withdrawn_time: Option<String>,
       external_identifier: Option<Vec<Spdx3ExternalIdentifier>>,
       external_ref: Option<Vec<Spdx3ExternalRef>>,
   }
   ```

2. **Element resolution pipeline:**
   - First pass: deserialize all elements into typed structs
   - Build element index: `HashMap<String, &Spdx3Element>` (spdx_id URI -> element)
   - Resolve Agent references: look up `supplied_by`/`originated_by` URIs in the index
   - Resolve Relationship targets: look up `from`/`to` URIs in the index

3. **License extraction via relationships:**
   - Find all Relationship elements where `relationship_type` is `hasDeclaredLicense` or `hasConcludedLicense`
   - The `to` target is a license element (LicenseExpression or CustomLicense)
   - Map to existing `LicenseInfo` model

4. **Vulnerability extraction (Security profile):**
   - Find Vulnerability elements
   - Find VulnAssessmentRelationship elements (CVSS, VEX, EPSS, etc.)
   - Map assessment relationships to `VulnerabilityRef` with scoring data
   - Map VEX relationships to `VexStatus`

### Phase 2.3: Format Detection Updates

**Effort: ~2 days**

1. **Detection for SPDX 3.0 vs 2.x:**
   - SPDX 3.0 JSON-LD has `@context` field and `"type": "SpdxDocument"`
   - SPDX 2.x has `spdxVersion: "SPDX-2.x"` at top level
   - Detection confidence:
     - `@context` containing "spdx.org/rdf/3" → CERTAIN for SPDX 3.0
     - `"type": "SpdxDocument"` → HIGH for SPDX 3.0
     - Presence of `spdxVersion` starting with "SPDX-2" → dispatch to existing 2.x parser
   - File: `src/parsers/detection.rs` -- add SPDX 3.0 detection branch

2. **Parser dispatch:**
   - `ParserKind` enum: add `Spdx3` variant (or detect version in Spdx parser and dispatch)
   - **Recommended:** Keep `ParserKind::Spdx` but have the SPDX detection logic route to
     `Spdx3Parser` vs `SpdxParser` based on detected version

3. **Streaming parser:**
   - `src/parsers/streaming.rs` needs SPDX 3.0 format detection in peek buffer
   - Look for `@context` in first 4KB

### Phase 2.4: Normalization (SPDX 3.0 -> NormalizedSbom)

**Effort: ~5 days**

This is the critical translation layer. SPDX 3.0's graph model must map to our flat
`NormalizedSbom` with `components` HashMap + `edges` Vec.

1. **Element-to-Component mapping:**
   - `Spdx3Package``Component` (primary mapping)
   - `Spdx3File``Component` with `component_type: File`
   - `Spdx3Snippet` → skip or store in extensions (too granular for SBOM diffing)
   - Agent elements → `Creator`/`Organization` (not components)
   - Vulnerability elements → `VulnerabilityRef` (attached to affected components)

2. **Identifier mapping:**
   - `package_url` (first-class in 3.0) → `identifiers.purl`
   - `external_identifier` with type `cpe23``identifiers.cpe`
   - `external_identifier` with type `swid``identifiers.swid`
   - `spdx_id` URI → `identifiers.format_id`
   - `CanonicalId` uses existing tiered fallback (PURL > CPE > SWID > synthetic)

3. **Relationship-to-Edge mapping:**
   - Relationship elements with dependency-related types → `DependencyEdge`
   - License relationships → populate `Component.licenses` on the `from` component
   - Vulnerability assessment relationships → populate `Component.vulnerabilities`
   - `rootElement``primary_component_id`

4. **CreationInfo handling:**
   - Per-element CreationInfo in 3.0 vs document-level in `DocumentMetadata`
   - Strategy: Use SpdxDocument's CreationInfo for `DocumentMetadata.created`/`creators`
   - Store per-element creation info in `ComponentExtensions.annotations` (preserves provenance)

5. **Profile-aware parsing:**
   - Check `profile_conformance` to know which element types to expect
   - If Security profile present, parse Vulnerability + VulnAssessment elements
   - If only Core + Software, skip vulnerability elements gracefully

6. **Handling `NoneElement` and `NoAssertionElement`:**
   - These are SPDX 3.0 sentinel values meaning "explicitly none" vs "unknown"
   - Map to appropriate `None` vs default values in the canonical model

### Phase 2.5: Model Extensions for SPDX 3.0

**Effort: ~3 days**

1. **DependencyType enum -- new variants:**
   ```
   // SPDX 3.0 relationship types that don't map to existing variants
   Affects,                    // Vulnerability → Package
   FixedIn,                    // Vulnerability → Package@version
   FoundBy,                    // Vulnerability → Agent
   HasAssessmentFor,           // VulnAssessment → Vulnerability
   HasConcludedLicense,        // Package → License
   HasDeclaredLicense,         // Package → License
   AmendedBy,                  // Element → Element (modification)
   ConfiguredBy,               // Package → configuration
   CoordinatedBy,              // Vulnerability → Agent
   ExpandedFromArchive,        // File → Package (extraction)
   FileAdded,                  // already exists
   FileDeleted,                // already exists
   HasPrerequisite,            // already mapped via SPDX 2.3
   InvokedBy,                  // Tool → Agent
   PackagedBy,                 // Package → Agent
   ReportedBy,                 // Vulnerability → Agent
   TestedOn,                   // Tool → Package
   TrainedOn,                  // AI model → Dataset
   UnderInvestigationFor,      // VEX → Vulnerability
   UsesTool,                   // Agent → Tool
   ```
   - Most of these won't affect diff/matching (they're vulnerability/agent relationships)
   - Only dependency-related types flow through the diff engine

2. **SbomFormat considerations:**
   - Keep `SbomFormat::Spdx` for both 2.x and 3.0 (same standard family)
   - Distinguish via `format_version` ("2.3" vs "3.0.1")
   - This preserves cross-version diffing without special-casing the format enum

3. **DocumentMetadata additions:**
   - Add `profile_conformance: Vec<String>` (which SPDX 3.0 profiles the document uses)
   - Add `root_elements: Vec<CanonicalId>` (SPDX 3.0 can have multiple root elements)

4. **FormatExtensions:**
   - Store the full `@context` in `FormatExtensions::spdx` for round-trip capability
   - Store unrecognized element types for lossless preservation

### Phase 2.6: Quality & Compliance for SPDX 3.0

**Effort: ~2 days**

1. **Quality scoring adjustments:**
   - Identifiers: SPDX 3.0 has richer `ExternalIdentifier` with typed identifiers;
     award higher scores for documents with typed CPE/PURL/SWHID identifiers
   - Completeness: check `Relationship.completeness` field
   - VulnDocs: if Security profile present, score native vulnerability data
   - Provenance: per-element CreationInfo provides better provenance; score accordingly

2. **Compliance updates:**
   - `check_format_specific()`: add SPDX 3.0 version detection
   - NTIA: SPDX 3.0 satisfies all NTIA minimum elements natively (supplier as typed Agent)
   - CRA: SPDX 3.0's Security profile provides Art.13(9) vulnerability statements
   - EO 14028: SPDX 3.0 format itself satisfies the "machine-readable" requirement

3. **Report generation:**
   - Format version display: show "SPDX 3.0.1" in reports
   - Profile conformance display in summary reports

### Phase 2.7: Testing

**Effort: ~4 days**

1. **Fixture files:**
   - `tests/fixtures/spdx3/minimal.spdx3.json` -- Core + Software only
   - `tests/fixtures/spdx3/with-vulnerabilities.spdx3.json` -- Core + Software + Security
   - `tests/fixtures/spdx3/with-licenses.spdx3.json` -- License relationships
   - `tests/fixtures/spdx3/multi-root.spdx3.json` -- Multiple root elements
   - Source: Generate from SPDX 3.0 examples at https://github.com/spdx/spdx-3-model
     or create manually following the spec

2. **Unit tests (parser):**
   - Parse minimal SPDX 3.0 document
   - Verify element type dispatch (Package, File, Relationship, Vulnerability)
   - Verify license extraction via relationship traversal
   - Verify vulnerability mapping from Security profile
   - Verify Agent resolution for supplier/originator
   - Verify per-element CreationInfo preservation
   - Verify `@context` detection and version routing

3. **Integration tests:**
   - Cross-format diff: SPDX 3.0 vs CycloneDX 1.7 (the key use case)
   - Cross-version diff: SPDX 2.3 vs SPDX 3.0 (same components, different formats)
   - Quality scoring of SPDX 3.0 document
   - Pipeline: parse → enrich → diff → report with SPDX 3.0 input

4. **Regression tests:**
   - All existing SPDX 2.2/2.3 tests must continue passing
   - All existing CycloneDX 1.4-1.6 tests must continue passing
   - Cross-format diff between SPDX 2.3 and CycloneDX 1.6 unchanged

### SPDX 3.0 -- Total Effort: ~27 days

---

## Part 3: Cross-Cutting Concerns

### 3.1: Cross-Format Diffing (SPDX 3.0 <-> CycloneDX 1.7)

**Effort: ~2 days (mostly testing)**

The canonical model already enables this. Both formats normalize to `NormalizedSbom`,
and the diff engine is format-agnostic. The work here is:

1. **Verify CanonicalId stability across formats:**
   - Same PURL in CycloneDX 1.7 and SPDX 3.0 must produce same CanonicalId
   - SPDX 3.0's `packageUrl` (first-class field) should match CycloneDX's `purl` field
   - Test with real-world PURLs (npm, pypi, maven, cargo)

2. **Verify relationship equivalence:**
   - CycloneDX `dependsOn` ↔ SPDX 3.0 `DEPENDS_ON` relationship type
   - Both should produce identical `DependencyEdge` records

3. **Handle metadata differences:**
   - CycloneDX 1.7 `citations` has no SPDX 3.0 equivalent (format-specific)
   - SPDX 3.0 `profileConformance` has no CycloneDX equivalent (format-specific)
   - These should appear in diff as metadata differences, not false positives

### 3.2: Backward Compatibility Guarantees

1. **Parser coexistence:**
   - Existing `SpdxParser` handles 2.2/2.3 (unchanged)
   - New `Spdx3Parser` handles 3.0+ (separate code path)
   - `CycloneDxParser` handles 1.4-1.7 (extended, not replaced)
   - Format detection routes to correct parser automatically

2. **Model backward compatibility:**
   - All new fields on `Component`, `DocumentMetadata` are `Option<T>` (default None)
   - New enum variants are behind `#[non_exhaustive]` (already in place)
   - Serialized output format unchanged for existing fields

3. **CLI backward compatibility:**
   - No new CLI flags needed (format detection is automatic)
   - Existing `--format` hints should work with new versions
   - Reports should display correct format version strings

### 3.3: Documentation

1. **Update README.md** format support table
2. **Update ARCHITECTURE.md** with SPDX 3.0 parser architecture notes
3. **Update CLI help text** for `--format` flag

---

## Implementation Order (Recommended)

### Sprint 1: CycloneDX 1.7 (Week 1-2)
- Phase 1.1: Parser updates
- Phase 1.2: Model extensions
- Phase 1.3: Quality/compliance updates
- Phase 1.4: Testing
- **Ship as PR #1** -- low risk, high immediate value

### Sprint 2: SPDX 3.0 Foundation (Week 3-5)
- Phase 2.1: Research & design decisions
- Phase 2.2: New parser (spdx3.rs)
- Phase 2.3: Format detection updates
- **Ship as PR #2** -- parser + detection only, no model changes yet

### Sprint 3: SPDX 3.0 Integration (Week 6-8)
- Phase 2.4: Normalization pipeline
- Phase 2.5: Model extensions
- Phase 2.6: Quality/compliance
- Phase 2.7: Testing
- Phase 3.1: Cross-format diffing verification
- **Ship as PR #3** -- full integration with tests

### Sprint 4: Polish (Week 9)
- Phase 3.2: Backward compatibility verification
- Phase 3.3: Documentation
- Cross-version regression testing
- **Ship as PR #4** -- docs + final verification

---

## Risk Assessment

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| SPDX 3.0 JSON-LD has edge cases that break serde parsing | Medium | High | Test against official SPDX 3.0 examples; fall back to json-ld crate if needed |
| Real-world SPDX 3.0 files are rare (low adoption) | High | Low | Create comprehensive fixtures; monitor community adoption |
| SPDX 3.0 spec is still evolving (3.0.1 published) | Medium | Medium | Pin to 3.0.1; design parser to handle minor version bumps |
| CycloneDX 1.7 license structure change breaks existing parsing | Low | Medium | Custom deserializer already handles variant formats; add tests |
| New DependencyType variants break exhaustive matches downstream | Low | Low | Already using `#[non_exhaustive]` + `Other(String)` fallback |
| Performance regression with SPDX 3.0 element graph resolution | Low | Medium | Single-pass element indexing; benchmark against large documents |

---

## Dependencies & Crate Changes

### No new crate dependencies required for CycloneDX 1.7
- Existing `serde`, `serde_json`, `quick-xml` are sufficient

### Potential new dependencies for SPDX 3.0
- **None required** if using Option B (serde-based JSON-LD parsing)
- Consider `json-ld` crate only if real-world files use advanced JSON-LD features
  (context imports, graph expansion, blank nodes) -- unlikely for SPDX 3.0 in practice

### spdx crate (v0.13)
- Currently used only for SPDX license expression parsing, NOT for document parsing
- Does not need updating for SPDX 3.0 document support (license expressions unchanged)
- May need update if ExpandedLicensing profile support is added later

---

## What NOT to Do

1. **Don't merge SPDX 2.x and 3.0 parsers** -- the data models are too different;
   shared code would be a maintenance burden
2. **Don't add full JSON-LD processing** -- real-world SPDX 3.0 files are compact JSON-LD
   that parses as regular JSON; full graph expansion is unnecessary
3. **Don't deeply model CycloneDX CBOM changes** -- crypto property restructuring is
   highly specialized; store in extensions and parse on demand
4. **Don't deeply model SPDX 3.0 AI/Dataset/Build profiles** -- defer until there's
   user demand; the Core + Software + Security profiles cover 95% of use cases
5. **Don't change the canonical model fundamentally** -- the flat
   `HashMap<CanonicalId, Component>` works well; SPDX 3.0's graph model flattens
   naturally into components + edges during normalization
6. **Don't break the `spdx` crate dependency** -- it's used for license expression
   parsing only, which is unchanged in SPDX 3.0