# marque-core
Format-agnostic text scanner and attribute parser — the front end of the Marque rule engine.
`marque-core` turns raw byte buffers into structured attributes for downstream
rules. It does no I/O, holds no format-specific knowledge, and never copies
input — every result references the original `&[u8]` through byte spans.
## Role in Marque
```
bytes → [Scanner] → MarkingCandidate → [Parser] → ParsedMarking → marque-engine → diagnostics
```
`Scanner` uses `memchr` SIMD to locate candidate regions cheaply, with zero
heap allocation on the hot path, and emits `MarkingCandidate` values (a
`Span` plus a `MarkingType`). `Parser` runs an Aho-Corasick automaton —
supplied by the caller via a `TokenSet` impl — over each candidate to produce
a `ParsedMarking` (structured attributes + span) that the engine hands to
rules.
## Usage
```rust
use marque_core::{Parser, Scanner};
use marque_ism::{CapcoTokenSet, MarkingType};
let source = b"(S) example text";
let tokens = CapcoTokenSet::new();
let parser = Parser::new(&tokens);
for candidate in Scanner::scan(source) {
// Scanner also emits zero-length `PageBreak` candidates; skip them
// before parsing — they carry no parsable content.
if candidate.kind == MarkingType::PageBreak {
continue;
}
let parsed = parser.parse(&candidate, source)?;
// hand `parsed.attrs` + `parsed.source_span` to the engine
}
# Ok::<(), marque_core::CoreError>(())
```
`Scanner::scan` is an associated function (no instance needed). `Parser`
borrows its `TokenSet` for the duration of parsing. The pivot attribute type
is `IsmAttributes` (re-exported from `marque-ism`). Spans are byte offsets
into the original buffer; rule crates read them without allocating.
## Features
| `serde` | off | `Serialize` / `Deserialize` on public types via `marque-ism/serde`. |
## WASM compatibility
WASM-safe. No file system, network, or thread-local state. The crate compiles
unchanged to `wasm32-unknown-unknown` and is consumed by `marque-wasm`. Format
extraction (PDF, DOCX, etc.) is the caller's responsibility — pass already-
extracted text in.
## License
Marque License 1.0 (`LicenseRef-MarqueLicense-1.0`). See [LICENSE.md](./LICENSE.md).