//! # seq_geom_parser
//!
//! A parser and extractor for sequencing read geometry descriptions.
//!
//! Geometry strings describe the layout of technical and biological sequences
//! within sequencing reads. For example, `1{b[16]u[12]x:}2{r:}` means:
//! - Read 1: 16 bp cell barcode, 12 bp UMI, discard the rest
//! - Read 2: biological read (full length)
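//!
//! As a rough illustration of what this geometry implies for read 1, the
//! fixed-width fields sit at static byte offsets. The sketch below is
//! standalone and does not use this crate's extractor; `split_read1` is a
//! hypothetical helper introduced only for this example.
//!
//! ```rust
//! /// Hypothetical helper: slice read 1 according to `b[16]u[12]x:`.
//! /// Returns `None` when the read is shorter than 16 + 12 = 28 bases.
//! fn split_read1(read: &[u8]) -> Option<(&[u8], &[u8])> {
//!     if read.len() < 28 {
//!         return None;
//!     }
//!     // b[16] covers bytes 0..16 and u[12] covers bytes 16..28;
//!     // everything from offset 28 onward falls under `x:` and is dropped.
//!     Some((&read[..16], &read[16..28]))
//! }
//!
//! fn main() {
//!     let read = b"AAAACCCCGGGGTTTTACGTACGTACGTNNNNNN";
//!     let (barcode, umi) = split_read1(read).unwrap();
//!     assert_eq!(barcode, &b"AAAACCCCGGGGTTTT"[..]);
//!     assert_eq!(umi, &b"ACGTACGTACGT"[..]);
//! }
//! ```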
//!
//! ## Supported tags
//!
//! | Tag | Meaning | Example |
//! |-----|---------|---------|
//! | `b[N]` | Cell barcode | `b[16]` |
//! | `bN[L]` | Numbered barcode at level N | `b0[8]` |
//! | `s[N]` | Sample barcode (sugar for `b0[N]`) | `s[8]` |
//! | `u[N]` | UMI | `u[12]` |
//! | `r[N]` / `r:` | Biological read (fixed/unbounded) | `r[50]`, `r:` |
//! | `f[SEQ]` | Fixed anchor sequence | `f[TTGCTAGGACCG]` |
//! | `x[N]` / `x:` | Discard (fixed/unbounded) | `x[18]`, `x:` |
//! | `x[N-M]` | Variable-length discard | `x[0-3]` |
//!
//! ## Distance functions
//!
//! Fixed anchors can be wrapped in distance functions for approximate matching:
//! - `hamming(f[SEQ], N)`: match the anchor within Hamming distance N, that
//!   is, with at most N mismatched bases
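//!
//! The matching rule can be pictured with a small standalone sketch. The
//! helpers `hamming` and `find_anchor` are hypothetical and are not part of
//! this crate's API; the leftmost-first scan is an assumption made only for
//! illustration.
//!
//! ```rust
//! /// Number of mismatching positions between two equal-length slices.
//! fn hamming(a: &[u8], b: &[u8]) -> usize {
//!     a.iter().zip(b).filter(|(x, y)| x != y).count()
//! }
//!
//! /// Leftmost offset at which `anchor` matches `read` with at most
//! /// `max_dist` mismatches, or `None` if no window qualifies.
//! fn find_anchor(read: &[u8], anchor: &[u8], max_dist: usize) -> Option<usize> {
//!     if anchor.is_empty() || read.len() < anchor.len() {
//!         return None;
//!     }
//!     read.windows(anchor.len())
//!         .position(|w| hamming(w, anchor) <= max_dist)
//! }
//!
//! fn main() {
//!     let anchor = b"TTGCTAGGACCG";
//!     // The read carries the anchor at offset 4 with one substitution (T -> A).
//!     let read = b"ACGTTTGCAAGGACCGAAAA";
//!     assert_eq!(find_anchor(read, anchor, 0), None);
//!     assert_eq!(find_anchor(read, anchor, 1), Some(4));
//! }
//! ```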
//!
//! ## Variable-length normalization
//!
//! Variable-length tags are supported when a downstream fixed anchor makes
//! their boundaries inferable, such as `b[9-10]u[12]f[SEQ]` or
//! `x[0-3]f[SEQ]s[10]`. Normalization helpers are exposed in [`normalize`] so
//! callers can pad extracted variable-length barcode/UMI sequences to their
//! declared maximum width when needed.
//!
//! ## Complexity Tiers
//!
//! The public API distinguishes three extraction tiers:
//! - [`GeometryComplexity::FixedOffsets`]: every extracted field has a static
//!   offset. Example: `1{b[16]u[12]x:}2{r:}`.
//! - [`GeometryComplexity::InferableVariable`]: one variable-width region per
//!   read, inferred from a fixed right boundary. Example:
//!   `1{b[9-10]f[ACGT]u[12]}2{r:}`.
//! - [`GeometryComplexity::BoundaryResolved`]: the read must first be split by
//!   resolved boundaries such as anchors and read ends. Example:
//!   `1{r:f[ACAGT]b[9-11]}2{u[12]x:}`.
//!
//! ## Boundary Resolution
//!
//! For [`GeometryComplexity::BoundaryResolved`] geometries, extraction proceeds
//! in two phases:
//! 1. Resolve anchor positions in read order.
//! 2. Assign the spans between those resolved boundaries to fields.
//!
//! If multiple anchor placements satisfy the geometry, the solver chooses the
//! monotone placement chain with the minimum total distance score. Ties are
//! broken by choosing the lexicographically leftmost anchor positions.
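//!
//! The selection rule can be sketched as a small exhaustive search. This is an
//! illustrative, standalone model and not the crate's solver: the names
//! `Hit`, `hits`, `solve`, and `resolve` are hypothetical, Hamming distance is
//! assumed as the score, and width constraints on the fields between anchors
//! are ignored.
//!
//! ```rust
//! /// One candidate placement of an anchor: (start offset, distance score).
//! type Hit = (usize, usize);
//!
//! /// All windows of `read` within `max_dist` mismatches of `anchor`.
//! fn hits(read: &[u8], anchor: &[u8], max_dist: usize) -> Vec<Hit> {
//!     if anchor.is_empty() || read.len() < anchor.len() {
//!         return Vec::new();
//!     }
//!     read.windows(anchor.len())
//!         .enumerate()
//!         .filter_map(|(i, w)| {
//!             let d = w.iter().zip(anchor).filter(|(x, y)| x != y).count();
//!             (d <= max_dist).then_some((i, d))
//!         })
//!         .collect()
//! }
//!
//! /// Best chain for anchors `k..`, each starting at or after `min_start`.
//! /// Returns (total distance, start offsets). Tuple ordering compares the
//! /// total first and then the offsets lexicographically, which yields the
//! /// leftmost tie-break.
//! fn solve(
//!     cands: &[Vec<Hit>],
//!     lens: &[usize],
//!     k: usize,
//!     min_start: usize,
//! ) -> Option<(usize, Vec<usize>)> {
//!     if k == cands.len() {
//!         return Some((0, Vec::new()));
//!     }
//!     let mut best: Option<(usize, Vec<usize>)> = None;
//!     for &(pos, d) in cands[k].iter().filter(|h| h.0 >= min_start) {
//!         // Monotone: the next anchor may not start before this one ends.
//!         if let Some((rest_d, rest)) = solve(cands, lens, k + 1, pos + lens[k]) {
//!             let mut chain = vec![pos];
//!             chain.extend(rest);
//!             let cand = (d + rest_d, chain);
//!             if best.as_ref().map_or(true, |b| cand < *b) {
//!                 best = Some(cand);
//!             }
//!         }
//!     }
//!     best
//! }
//!
//! /// Resolve all anchors, given in read order, against one read.
//! fn resolve(read: &[u8], anchors: &[&[u8]], max_dist: usize) -> Option<(usize, Vec<usize>)> {
//!     let cands: Vec<Vec<Hit>> = anchors.iter().map(|a| hits(read, a, max_dist)).collect();
//!     let lens: Vec<usize> = anchors.iter().map(|a| a.len()).collect();
//!     solve(&cands, &lens, 0, 0)
//! }
//!
//! fn main() {
//!     let anchors: [&[u8]; 2] = [b"ACGT", b"GGCC"];
//!     // Both anchors occur twice exactly; the leftmost chain wins the tie.
//!     assert_eq!(resolve(b"ACGTACGTGGCCGGCC", &anchors, 0), Some((0, vec![0, 8])));
//!     // An exact hit at offset 4 beats the one-mismatch hit at offset 0.
//!     assert_eq!(resolve(b"ACGAACGTGGCC", &anchors, 1), Some((0, vec![4, 8])));
//! }
//! ```
//!
//! The exhaustive recursion keeps the rule easy to read; a production solver
//! would more likely use dynamic programming over candidate positions.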
//!
//! ## Public Model vs Compiled Executor
//!
//! The boundary-oriented types exposed from [`types`] describe the public
//! semantic model of a geometry: boundaries, anchors, and segments between
//! resolved boundaries.
//!
//! They are not the same as the extractor's internal compiled representation.
//! [`CompiledGeom`] compiles parsed geometries into private extraction plans in
//! [`extract`] that are optimized for the hot path. This split is intentional:
//! the public types document the model and complexity hierarchy, while the
//! executor keeps a separate IR that can evolve for performance without
//! changing the public API.
//!
//! ## Examples By Tier
//!
//! ```rust
//! use seq_geom_parser::{geometry_complexity, parse_geometry, GeometryComplexity};
//!
//! let simple = parse_geometry("1{b[16]u[12]x:}2{r:}").unwrap();
//! assert_eq!(geometry_complexity(&simple), GeometryComplexity::FixedOffsets);
//!
//! let inferable = parse_geometry("1{b[9-10]f[ACGT]u[12]}2{r:}").unwrap();
//! assert_eq!(
//!     geometry_complexity(&inferable),
//!     GeometryComplexity::InferableVariable
//! );
//!
//! let boundary = parse_geometry("1{r:f[ACAGT]b[9-11]}2{u[12]x:}").unwrap();
//! assert_eq!(
//!     geometry_complexity(&boundary),
//!     GeometryComplexity::BoundaryResolved
//! );
//! ```
// Re-export key types at crate root
pub use ;
pub use ;
pub use *;