zer-pipeline 1.1.0

End-to-end entity resolution pipeline: ingestion, blocking, comparison, scoring, and clustering
# zer-pipeline

End-to-end entity resolution pipeline orchestration for the zer library.

Ties together ingestion, blocking, comparison, scoring, and clustering into a single `Pipeline` type. Supports deduplication (single source), linkage (two or more sources), and combined link-and-dedupe modes, with optional Tokio-based async progress events.

- **Documentation**: [docs.zal-analytics.ch](https://docs.zal-analytics.ch)
- **Website**: [www.zal-analytics.ch](https://www.zal-analytics.ch)
- **Support & feedback**: [info@zal-analytics.ch](mailto:info@zal-analytics.ch)

## What it provides

| Item | Description |
|------|-------------|
| `Pipeline` / `PipelineBuilder` | Fluent builder and executor for the full ER pipeline |
| `Ingester` / `IngestResult` | Loads records from CSV or any `IntoRecord` source |
| `PipelineConfig` / `LinkMode` | Controls link vs. dedupe mode, batch size, score threshold |
| `PipelineEvent` | Async progress events (blocking done, comparison done, etc.) |
| `ClusterView` / `ClusterIter` | Iterates over resolved entity clusters |
| `BatchReport` | Per-batch statistics (pairs generated, matched, time) |

## Feature flags

| Flag            | Effect |
|-----------------|--------|
| `collect-pairs` | Keeps all scored pairs in memory after judging for PR-AUC analysis; incurs allocation cost proportional to candidate count |
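
To illustrate what retained scored pairs make possible, here is a minimal sketch that estimates PR-AUC as average precision. The `(f64, bool)` tuple (score, is-true-match) and the `average_precision` function are assumptions for illustration only; this README does not show how the crate exposes collected pairs, so you will need to map its actual type into something like this first.

```rust
/// Average precision: a common step-wise estimate of the area under the
/// precision-recall curve. Each input item is (score, is_true_match).
/// Tied scores are not treated specially in this sketch.
fn average_precision(scored: &[(f64, bool)]) -> f64 {
    let total_positives = scored.iter().filter(|(_, m)| *m).count();
    if total_positives == 0 {
        return 0.0;
    }

    // Sort by descending score so rank 1 is the most confident pair.
    let mut sorted = scored.to_vec();
    sorted.sort_by(|a, b| b.0.total_cmp(&a.0));

    let mut true_positives = 0usize;
    let mut precision_sum = 0.0;
    for (rank, (_, is_match)) in sorted.iter().enumerate() {
        if *is_match {
            true_positives += 1;
            // Precision at this cut-off, accumulated once per true match.
            precision_sum += true_positives as f64 / (rank + 1) as f64;
        }
    }
    precision_sum / total_positives as f64
}

fn main() {
    // Hypothetical scored pairs labelled against ground truth.
    let scored = [(0.9, true), (0.8, false), (0.7, true), (0.2, false)];
    println!("average precision = {:.4}", average_precision(&scored));
}
```

Because every candidate pair has to stay in memory for this kind of analysis, it is probably best to enable the flag only for evaluation runs.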

## Usage

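```rust
use zer::pipeline::{Pipeline, PipelineBuilder, PipelineConfig, LinkMode};

// `schema`, `blocker`, `comparator`, `scorer`, `clusterer`, and `ingester`
// are assumed to have been constructed earlier.
let pipeline = PipelineBuilder::new(schema)
    .mode(LinkMode::Dedupe) // single-source deduplication
    .config(PipelineConfig::default())
    .build(blocker, comparator, scorer, clusterer);

// `run` is awaited, so call it from an async context that can propagate errors with `?`.
let report = pipeline.run(ingester).await?;
```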

## Breaking changes

### v1.1

**`LinkedPair`: `record_id_a/b` replaced by `record_key_a/b`**

`LinkedPair` no longer exposes raw numeric record IDs. The fields `record_id_a: RecordId` and `record_id_b: RecordId` are replaced by `record_key_a: String` and `record_key_b: String`, which hold the natural key values (e.g. BSN, KvK number) as supplied via `DatasetConfig` at ingestion time.

```rust
// v1.0
println!("{} ↔ {}", pair.record_id_a, pair.record_id_b);

// v1.1
println!("{} ↔ {}", pair.record_key_a, pair.record_key_b);
```

Evaluation code that built `HashSet<(u64, u64)>` from ground-truth integer IDs must be updated to `HashSet<(String, String)>` using natural key pairs.
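
As a rough illustration of that migration, the sketch below compares predicted pairs against a string-keyed ground-truth set. The `LinkedPair` struct here is a local stand-in that models only the two fields named above, not the crate's real type. The `normalized` and `precision_recall` helpers and the sample keys are hypothetical. Sorting each pair's keys is an assumption made so that `(a, b)` and `(b, a)` count as the same pair.

```rust
use std::collections::HashSet;

/// Local stand-in for the v1.1 `LinkedPair`; only the two key fields
/// described in the changelog are modeled.
struct LinkedPair {
    record_key_a: String,
    record_key_b: String,
}

/// Order the two keys so that (a, b) and (b, a) compare equal.
fn normalized(a: &str, b: &str) -> (String, String) {
    if a <= b {
        (a.to_string(), b.to_string())
    } else {
        (b.to_string(), a.to_string())
    }
}

/// Returns (precision, recall) of the predicted pairs against ground truth.
/// `truth` is expected to contain normalized pairs.
fn precision_recall(
    predicted: &[LinkedPair],
    truth: &HashSet<(String, String)>,
) -> (f64, f64) {
    let predicted_set: HashSet<(String, String)> = predicted
        .iter()
        .map(|p| normalized(&p.record_key_a, &p.record_key_b))
        .collect();

    let true_positives = predicted_set.intersection(truth).count() as f64;
    let precision = if predicted_set.is_empty() {
        0.0
    } else {
        true_positives / predicted_set.len() as f64
    };
    let recall = if truth.is_empty() {
        0.0
    } else {
        true_positives / truth.len() as f64
    };
    (precision, recall)
}

fn main() {
    // Ground truth keyed by natural keys instead of integer record IDs.
    let truth: HashSet<(String, String)> = [("K-001", "K-002"), ("K-003", "K-004")]
        .iter()
        .map(|(a, b)| normalized(a, b))
        .collect();

    let predicted = vec![
        LinkedPair { record_key_a: "K-002".into(), record_key_b: "K-001".into() },
        LinkedPair { record_key_a: "K-005".into(), record_key_b: "K-006".into() },
    ];

    let (precision, recall) = precision_recall(&predicted, &truth);
    println!("precision = {precision:.2}, recall = {recall:.2}");
}
```

If your ground truth is still keyed by integer IDs, you will likely need a one-off mapping from those IDs to the natural keys supplied at ingestion before building the set.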

## License

Apache-2.0 · [GitHub](https://github.com/ZAL-Analytics/zer)