marque-core
Format-agnostic text scanner and attribute parser — the front end of the Marque rule engine.
marque-core turns raw byte buffers into structured attributes for downstream
rules. It does no I/O, holds no format-specific knowledge, and never copies
input — every result references the original &[u8] through byte spans.
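A minimal sketch of the zero-copy idea, assuming a span is a half-open byte range [start, end) into the caller's buffer. The Span type below is a hypothetical stand-in, not marque-core's actual definition. It shows that resolving a span yields a slice borrowed from the original &[u8] rather than a copy.

```rust
/// Hypothetical stand-in for marque-core's Span: a half-open byte range
/// [start, end) into the caller's buffer. It owns no text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

impl Span {
    /// Resolve the span against the original buffer. The returned slice
    /// borrows from `source`, so no bytes are copied. Returns None if the
    /// range does not fit inside `source`.
    pub fn slice<'a>(&self, source: &'a [u8]) -> Option<&'a [u8]> {
        source.get(self.start..self.end)
    }
}

fn main() {
    let source = b"(S) example text";
    let span = Span { start: 0, end: 3 };
    let marking = span.slice(source).unwrap();
    // The slice points into `source` itself rather than into a copy.
    assert_eq!(marking.as_ptr(), source.as_ptr());
    println!("{}", std::str::from_utf8(marking).unwrap());
}
```

Because a span is just two integers, it is Copy and can be passed around freely; the lifetime on `slice` ties every resolved view back to the buffer it came from.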
Role in Marque
```text
bytes → [Scanner] → MarkingCandidate → [Parser] → ParsedMarking → marque-engine → diagnostics
```
Scanner uses memchr's SIMD-accelerated byte search to locate candidate regions
cheaply, with zero heap allocation on the hot path, and emits MarkingCandidate
values (a Span plus a MarkingType). Parser then runs an Aho-Corasick automaton,
supplied by the caller through a TokenSet impl, over each candidate to produce
a ParsedMarking (structured attributes plus a span) that the engine hands to
rules.
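The scanning step can be sketched roughly as follows. Everything here is hypothetical: Span, MarkingType (with a single assumed Portion variant), MarkingCandidate, CandidateIter and Scanner stand in for marque-core's real types, and this scanner only looks for "(...)" regions. std's `iter().position` substitutes for memchr's SIMD search, so the sketch shows the allocation-free iterator shape, not the real crate's performance or candidate rules.

```rust
// Hypothetical stand-ins for marque-core's Span, MarkingType and MarkingCandidate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize, // exclusive
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MarkingType {
    Portion, // assumed variant: a parenthesized portion mark such as "(S)"
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MarkingCandidate {
    pub span: Span,
    pub kind: MarkingType,
}

/// Lazily yields candidates. It holds only a borrowed slice and a cursor,
/// so iterating performs no heap allocation.
pub struct CandidateIter<'a> {
    source: &'a [u8],
    pos: usize,
}

impl<'a> Iterator for CandidateIter<'a> {
    type Item = MarkingCandidate;

    fn next(&mut self) -> Option<MarkingCandidate> {
        // `iter().position` stands in for memchr's SIMD byte search.
        let rest = self.source.get(self.pos..)?;
        let open = self.pos + rest.iter().position(|&b| b == b'(')?;
        // The candidate ends at the first ')' after the '('; an unclosed '(' ends the scan.
        let close = open + 1 + self.source[open + 1..].iter().position(|&b| b == b')')?;
        let end = close + 1; // exclusive end, one past ')'
        self.pos = end;
        Some(MarkingCandidate {
            span: Span { start: open, end },
            kind: MarkingType::Portion,
        })
    }
}

pub struct Scanner;

impl Scanner {
    /// An associated function: no Scanner instance is required.
    pub fn scan(source: &[u8]) -> CandidateIter<'_> {
        CandidateIter { source, pos: 0 }
    }
}

fn main() {
    let source = b"(S) example text (U) more";
    for candidate in Scanner::scan(source) {
        let text = &source[candidate.span.start..candidate.span.end];
        println!("{:?} {:?}", candidate.kind, std::str::from_utf8(text).unwrap());
    }
}
```

Returning an iterator rather than a Vec leaves allocation decisions to the caller: a consumer that processes candidates one at a time never touches the heap.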
Usage
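```rust
use …; // import list elided in the source
use …;

let source = b"(S) example text";
let tokens = …::new(…); // caller-supplied TokenSet impl; type elided in the source
let parser = Parser::new(&tokens); // Parser borrows the TokenSet
for candidate in Scanner::scan(source) {
    // … (loop body elided in the source: parse each MarkingCandidate with `parser`)
}
# Ok::<(), …>(())
```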
Scanner::scan is an associated function (no instance needed). Parser
borrows its TokenSet for the duration of parsing. The pivot attribute type
is IsmAttributes (re-exported from marque-ism). Spans are byte offsets
into the original buffer; rule crates read them without allocating.
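A sketch of the borrowing relationship, assuming a TokenSet trait with a single hypothetical `tokens()` method and a hypothetical `parse` method on Parser. The Parser holds a `&'t T`, so the token set must outlive the parser. In place of an Aho-Corasick automaton, `parse` does a naive position-by-position comparison against each token; Classifications is an illustrative token set, not part of marque-core.

```rust
/// Hypothetical TokenSet: the caller supplies the vocabulary to match.
pub trait TokenSet {
    fn tokens(&self) -> &[&'static str];
}

/// The parser borrows its token set for its whole lifetime 't.
pub struct Parser<'t, T: TokenSet> {
    tokens: &'t T,
}

impl<'t, T: TokenSet> Parser<'t, T> {
    pub fn new(tokens: &'t T) -> Self {
        Parser { tokens }
    }

    /// Naive stand-in for an Aho-Corasick search: at each byte offset, take the
    /// first token (in list order) that matches there, record it, and skip past
    /// it. Returns non-overlapping (offset, token) pairs.
    pub fn parse(&self, candidate: &[u8]) -> Vec<(usize, &'static str)> {
        let mut found = Vec::new();
        let mut i = 0;
        while i < candidate.len() {
            let hit = self
                .tokens
                .tokens()
                .iter()
                // Empty tokens are skipped so the loop always makes progress.
                .find(|t| !t.is_empty() && candidate[i..].starts_with(t.as_bytes()));
            match hit {
                Some(t) => {
                    found.push((i, *t));
                    i += t.len();
                }
                None => i += 1,
            }
        }
        found
    }
}

/// Illustrative token set (not part of marque-core).
pub struct Classifications;

impl TokenSet for Classifications {
    fn tokens(&self) -> &[&'static str] {
        &["TS", "S", "C", "U"]
    }
}

fn main() {
    let tokens = Classifications;
    let parser = Parser::new(&tokens);
    for input in [&b"(TS)"[..], &b"(S)"[..], &b"(U)"[..]] {
        println!("{:?}", parser.parse(input));
    }
}
```

Because the parser only borrows the token set, one token set can back several parsers, and the borrow checker rejects any use of a Parser after its TokenSet has been dropped.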
Features
| Feature | Default | Effect |
|---|---|---|
| serde | off | Serialize / Deserialize on public types via marque-ism/serde. |
WASM compatibility
WASM-safe: no file system, network, or thread-local state. The crate compiles
unchanged to wasm32-unknown-unknown and is consumed by marque-wasm. Format
extraction (PDF, DOCX, etc.) is the caller's responsibility; pass in text that
has already been extracted.
License
Marque License 1.0 (LicenseRef-MarqueLicense-1.0). See LICENSE.md.