iriq — IRI extraction, normalization, clustering
A Rust port of the iriq Ruby gem and Go module. Same behavior across all three runtimes — enforced by golden JSON fixtures and a CLI parity harness in CI.
[dependencies]
iriq = "0.29"
For SQLite-backed corpora (the on-disk store with concurrent observers):
[dependencies]
iriq = { version = "0.29", features = ["sqlite"] }
What it does
use ;
// Parse and normalize a single URL.
let iri = parse?;
assert_eq!;
assert_eq!; // default port dropped
assert_eq!;
// Pull IRIs out of free text.
let urls = new.extract_strings;
assert_eq!;
// Annotated trace (what the CLI shows under `-e`).
let tr = trace?;
assert_eq!;
// Streaming clustering with a persistent corpus.
let mut corpus = open?; // .db/.sqlite/.sqlite3 → SQLite
for url in &
corpus.save?;
# Ok::
See the crate docs for the full API and the main project README for the conceptual overview shared with the Ruby + Go siblings.
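
The "default port dropped" comment in the example refers to a common normalization step: an explicit port that matches the scheme's default carries no information, so it can be removed. The following is a minimal standalone sketch of that idea using only the standard library. `default_port` and `drop_default_port` are hypothetical helpers written for illustration; they are not part of the iriq API, which may cover more schemes and edge cases.

```rust
/// Well-known default ports for a few schemes (illustrative subset only).
fn default_port(scheme: &str) -> Option<u16> {
    match scheme {
        "http" | "ws" => Some(80),
        "https" | "wss" => Some(443),
        "ftp" => Some(21),
        _ => None,
    }
}

/// Returns the port only when it differs from the scheme's default.
/// Scheme matching is case-insensitive (an assumption of this sketch).
fn drop_default_port(scheme: &str, port: Option<u16>) -> Option<u16> {
    match (port, default_port(&scheme.to_ascii_lowercase())) {
        // Explicit port equals the default: drop it.
        (Some(p), Some(d)) if p == d => None,
        // Anything else (non-default port, unknown scheme, no port) is kept.
        (p, _) => p,
    }
}
```

With these helpers, `drop_default_port("https", Some(443))` yields `None`, while a non-default port such as `Some(8443)` is returned unchanged.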
Features
| Feature | What it does |
|---|---|
| (default) | Memory + JSON corpus backends. Pure Rust, no system deps. |
| `sqlite` | Adds the SQLite corpus backend via bundled `rusqlite`. Concurrent writers, incremental UPSERTs. |
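
The usage example notes that `.db`, `.sqlite`, and `.sqlite3` paths select the SQLite backend. A rough sketch of how such extension-based dispatch could look is shown here. The `Backend` enum and `backend_for_path` function are hypothetical names, and two behaviors are assumptions of this sketch rather than documented crate behavior: falling back to JSON for every other path, and matching extensions case-insensitively.

```rust
use std::path::Path;

/// Hypothetical on-disk backend kinds, for illustration only.
#[derive(Debug, PartialEq, Eq)]
enum Backend {
    Json,
    Sqlite,
}

/// Picks a backend from the file extension.
/// Assumption: anything that is not a SQLite extension falls back to JSON.
fn backend_for_path(path: &str) -> Backend {
    let ext = Path::new(path)
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    match ext.as_deref() {
        Some("db") | Some("sqlite") | Some("sqlite3") => Backend::Sqlite,
        _ => Backend::Json,
    }
}
```

Under these assumptions, `backend_for_path("corpus.db")` gives `Backend::Sqlite` and `backend_for_path("corpus.json")` gives `Backend::Json`.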
Parity guarantees
This crate's output is byte-identical to that of the Ruby gem and Go module for:
- All segment classification decisions (~25 typed shapes — UUID, ISO date, file, email, IPv4/6, color, coordinate, country, base64, JWT, MIME, phone, etc.).
- `Iriq::Normalizer.normalize` / `iriq.Normalize` outputs, including hint suppression for semantic types and canonical date / currency rendering.
- `Iriq::Trace.for` / `iriq.Trace` JSON structure for `-e` output.
- Corpus shape clustering, param-type inference, `--stats` / `--reinfer` / `--propose-recognizers` / `--cross-host-shapes` output.
- Cross-runtime SQLite corpus files (schema v4 is shared: a `.db` created by the Go CLI opens cleanly under the Rust CLI and vice versa).
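
To make "segment classification" concrete, here is a small standalone sketch that assigns one of a few typed shapes to a path segment. It is not the crate's classifier: the `Shape` enum, the helper names, the handful of shapes covered, and the order in which they are tried are all assumptions chosen for illustration.

```rust
/// A tiny, hypothetical subset of typed segment shapes.
#[derive(Debug, PartialEq, Eq)]
enum Shape {
    Integer,
    Uuid,
    IsoDate,
    Literal,
}

/// True for the canonical 8-4-4-4-12 hexadecimal UUID layout.
fn is_uuid(s: &str) -> bool {
    let groups: Vec<&str> = s.split('-').collect();
    let lens: [usize; 5] = [8, 4, 4, 4, 12];
    groups.len() == lens.len()
        && groups
            .iter()
            .zip(lens.iter())
            .all(|(g, &n)| g.len() == n && g.bytes().all(|b| b.is_ascii_hexdigit()))
}

/// True for YYYY-MM-DD with month 1..=12 and day 1..=31 (no calendar check).
fn is_iso_date(s: &str) -> bool {
    let b = s.as_bytes();
    if b.len() != 10 || b[4] != b'-' || b[7] != b'-' {
        return false;
    }
    let digits = |r: std::ops::Range<usize>| b[r].iter().all(|c| c.is_ascii_digit());
    if !(digits(0..4) && digits(5..7) && digits(8..10)) {
        return false;
    }
    // All ten bytes are ASCII at this point, so string slicing is safe.
    let month: u32 = s[5..7].parse().unwrap_or(0);
    let day: u32 = s[8..10].parse().unwrap_or(0);
    (1..=12).contains(&month) && (1..=31).contains(&day)
}

/// Tries each shape in a fixed order and falls back to `Literal`.
fn classify(segment: &str) -> Shape {
    if !segment.is_empty() && segment.bytes().all(|b| b.is_ascii_digit()) {
        Shape::Integer
    } else if is_uuid(segment) {
        Shape::Uuid
    } else if is_iso_date(segment) {
        Shape::IsoDate
    } else {
        Shape::Literal
    }
}
```

In this sketch, `classify("42")` is `Shape::Integer`, `classify("2024-02-29")` is `Shape::IsoDate`, and a plain word such as `"users"` is `Shape::Literal`. A parity guarantee of the kind described above would require every runtime to reach the same decision for every input, including the order in which overlapping shapes are tried.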
Anywhere they diverge is a bug — file an issue with the diff.
License
MIT, same as the Ruby gem and Go module.