marque-core 0.2.1

<!--
SPDX-FileCopyrightText: 2026 Knitli Inc.

SPDX-License-Identifier: MIT OR Apache-2.0
-->

# marque-core

Format-agnostic text scanner and attribute parser — the front end of the Marque rule engine.

`marque-core` turns raw byte buffers into structured attributes for downstream
rules. It does no I/O, holds no format-specific knowledge, and never copies
input — every result references the original `&[u8]` through byte spans.

## Role in Marque

```
bytes → [Scanner] → MarkingCandidate → [Parser] → ParsedMarking → marque-engine → diagnostics
```

`Scanner` uses SIMD-accelerated `memchr` searches to locate candidate regions
cheaply, with zero heap allocation on the hot path, and emits
`MarkingCandidate` values (a `Span` plus a `MarkingType`). `Parser` then runs
an Aho-Corasick automaton, supplied by the caller through a `TokenSet` impl,
over each candidate to produce a `ParsedMarking` (structured attributes plus a
span) that the engine hands to rules.
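
The two-stage shape described above can be sketched with the standard library
alone. This is an illustrative toy, not the crate's implementation: `ToySpan`,
`toy_scan`, and `toy_parse` are hypothetical names, a plain byte search stands
in for `memchr`, an exact-match lookup stands in for the Aho-Corasick
automaton, and the assumption that a candidate is a parenthesized run is made
purely for the example.

```rust
/// Hypothetical stand-in for a span; NOT the marque-core `Span` type.
/// Half-open byte range `[start, end)` into the scanned buffer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ToySpan {
    pub start: usize,
    pub end: usize,
}

/// Stage 1 (toy scanner): lazily yield the span of each `(`...`)` region.
/// The iterator only stores a cursor, so no candidate is heap-allocated.
pub fn toy_scan(source: &[u8]) -> impl Iterator<Item = ToySpan> + '_ {
    let mut pos = 0;
    std::iter::from_fn(move || {
        // Plain linear search here; the real scanner is described as using memchr.
        let open = pos + source[pos..].iter().position(|&b| b == b'(')?;
        let close = open + source[open..].iter().position(|&b| b == b')')?;
        pos = close + 1;
        Some(ToySpan { start: open, end: close + 1 })
    })
}

/// Stage 2 (toy parser): compare the bytes between the parentheses against a
/// caller-supplied token list and return the matching token, if any.
pub fn toy_parse<'t>(span: ToySpan, source: &[u8], tokens: &[&'t str]) -> Option<&'t str> {
    let inner = source.get(span.start + 1..span.end.checked_sub(1)?)?;
    tokens.iter().copied().find(|t| t.as_bytes() == inner)
}
```

Keeping the first stage this cheap means the more expensive token matching
only runs on the few regions that look like candidates, and the caller decides
which tokens count by passing in its own list.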

## Usage

```rust
use marque_core::{Parser, Scanner};
use marque_ism::{CapcoTokenSet, MarkingType};

let source = b"(S) example text";
let tokens = CapcoTokenSet::new();
let parser = Parser::new(&tokens);

for candidate in Scanner::scan(source) {
    // Scanner also emits zero-length `PageBreak` candidates; skip them
    // before parsing — they carry no parsable content.
    if candidate.kind == MarkingType::PageBreak {
        continue;
    }
    let parsed = parser.parse(&candidate, source)?;
    // hand `parsed.attrs` + `parsed.source_span` to the engine
}
# Ok::<(), marque_core::CoreError>(())
```

`Scanner::scan` is an associated function (no instance needed). `Parser`
borrows its `TokenSet` for the duration of parsing. The pivot attribute type
is `IsmAttributes` (re-exported from `marque-ism`). Spans are byte offsets
into the original buffer; rule crates read them without allocating.
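
To make the zero-copy claim concrete, here is a minimal sketch of how a byte
span can be resolved against the caller's buffer. `ByteSpan` and its `slice`
method are hypothetical and only illustrate the idea; the crate's actual `Span`
type may expose different fields and methods.

```rust
/// Hypothetical span type (not the marque-core `Span`): a half-open byte
/// range `[start, end)` into a buffer owned by the caller.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ByteSpan {
    pub start: usize,
    pub end: usize,
}

impl ByteSpan {
    /// Borrow the covered bytes from `source` without copying them.
    /// Returns `None` if the range does not fit inside `source`.
    pub fn slice<'a>(&self, source: &'a [u8]) -> Option<&'a [u8]> {
        source.get(self.start..self.end)
    }
}
```

The returned slice borrows from `source`, so its lifetime is tied to the
original buffer and reading it involves no allocation.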

## Features

| Feature | Default | Effect |
|---|---|---|
| `serde` | off | `Serialize` / `Deserialize` on public types via `marque-ism/serde`. |

## WASM compatibility

WASM-safe. No file system, network, or thread-local state. The crate compiles
unchanged to `wasm32-unknown-unknown` and is consumed by `marque-wasm`. Format
extraction (PDF, DOCX, etc.) is the caller's responsibility: pass
already-extracted text in.

## License

Marque License 1.0 (`LicenseRef-MarqueLicense-1.0`). See [LICENSE.md](./LICENSE.md).