gliner2 0.1.1

Rust implementation of GLiNER2 with compatibility for upstream weights and Python training output.
# Gliner2 Rust


[![Build Status](https://github.com/paul-english/gliner2_rs/workflows/CI/badge.svg)](https://github.com/paul-english/gliner2_rs/actions)
[![Latest version](https://img.shields.io/crates/v/gliner2.svg)](https://crates.io/crates/gliner2)
[![Documentation](https://docs.rs/gliner2/badge.svg)](https://docs.rs/gliner2)
[![Dependency Status](https://deps.rs/repo/github/paul-english/gliner2_rs/status.svg)](https://deps.rs/repo/github/paul-english/gliner2_rs)
![License](https://img.shields.io/crates/l/gliner2.svg)

This project implements the [Gliner2](https://github.com/fastino-ai/GLiNER2) model in Rust, compatible with the original weights and with the output of the Python training.

```bash
cargo add gliner2
# and/or for a cli utility
cargo install gliner2
# LibTorch encoder for the example binary: cargo install gliner2 --features tch
# then: gliner2 --backend tch   (or GLINER2_BACKEND=tch)
```

## Recorded speed (comparison harness)

The [harness/](harness/) scripts run the same **release** Rust binaries (`harness_compare`, `harness_compare_mt` on CPU) against the PyPI `gliner2` package. Timing fields are wall-clock milliseconds from a single process: `load_model_ms` is one-time load; `infer_ms` is per-fixture forward work (entity harness sums all cases for the total row).
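
A minimal sketch of this timing style, assuming plain wall-clock measurement with `std::time::Instant`: load once, time each case separately, and sum the per-case values for the total row. `time_ms` and `run_cases` are hypothetical helpers for illustration, not code taken from the harness.

```rust
use std::time::Instant;

/// Run `f` once and return its result together with elapsed wall-clock milliseconds.
fn time_ms<T, F: FnOnce() -> T>(f: F) -> (T, f64) {
    let start = Instant::now();
    let value = f();
    (value, start.elapsed().as_secs_f64() * 1000.0)
}

/// Hypothetical harness shape: time a one-time `load`, then each case with `infer`.
/// Returns (load_model_ms, per-case infer_ms, summed infer_ms for the total row).
fn run_cases<M, L, I>(load: L, cases: &[&str], infer: I) -> (f64, Vec<f64>, f64)
where
    L: FnOnce() -> M,
    I: Fn(&M, &str),
{
    let (model, load_model_ms) = time_ms(load);
    let per_case: Vec<f64> = cases
        .iter()
        .map(|&text| time_ms(|| infer(&model, text)).1)
        .collect();
    let total = per_case.iter().sum();
    (load_model_ms, per_case, total)
}
```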

**Reproduce (CPU vs CPU):** from the repo root, with Hugging Face access for the default model:

```bash
uv sync --locked --directory harness
# All three flows (entity + multitask + throughput), each with Rust Candle + Rust tch-rs + Python.
# Optional: --candle-only (skip tch-rs / LibTorch), --update-readme (refresh comparison tables in this file)
bash harness/run_compare_all.sh
# Or run steps separately:
bash harness/run_all.sh
bash harness/run_multitask.sh
```

**Rust (tch-rs) timings in the tables:** `bash harness/run_compare_all.sh` sets `GLINER2_BENCH_TCH=1` so every step runs **Candle, tch-rs, and Python**. For `run_all.sh` / `run_multitask.sh` / `run_throughput.sh` alone, set `GLINER2_BENCH_TCH=1` yourself when you want tch-rs. The harness rebuilds `harness_compare` / `harness_throughput` with `--features tch-backend,download-libtorch`, so `torch-sys` downloads a **CPU LibTorch** that matches the pinned `tch` crate (no system LibTorch required). Before running the release binaries, the scripts source [harness/prepend_libtorch_ld_path.sh](harness/prepend_libtorch_ld_path.sh) so the dynamic loader can find `libtorch_cpu.so` under `target/release/build/torch-sys-*/out/...`. Alternatively, install LibTorch yourself and set `LIBTORCH` / `LD_LIBRARY_PATH`; then build with `tch-backend` only (omit `download-libtorch`).

**Entity/multitask compare vs tch:** `compare.py` / `compare_mt.py` check **Candle Rust vs Python** for correctness. The tch JSON is used for **extra timing columns** only. On the current LibTorch encoder bridge, **NER fixture outputs from `--backend tch` can be empty or otherwise diverge from Candle** while wall-clock `infer_ms` is still meaningful. To run the full shell flow without failing on unrelated checks, use `GLINER2_COMPARE_WARN_ONLY=1` with `run_all.sh` / `run_multitask.sh` when needed.

The shell wrappers call Python with `CUDA_VISIBLE_DEVICES=` and `--device cpu` so PyTorch does not use a discrete NVIDIA GPU and weights stay on CPU, matching the Rust side.

For **apples-to-apples timing** with the Rust single-forward path, Python uses **`batch_size=1`**: `batch_extract_entities([text], …, batch_size=1)` on the entity harness and `batch_extract([text], schema, batch_size=1, …)` on the multitask harness (instead of relying on `extract` / `extract_entities` defaults).

**Reading ratios:** for infer times, `python/candle` is `(python infer_ms) / (rust Candle infer_ms)` per case or for the total line. Values **below 1** mean Python spent less time on that measure for these fixtures; **above 1** mean Python was slower. When tch-rs is included (default for `run_compare_all.sh`, or `GLINER2_BENCH_TCH=1` for `run_all.sh` / `run_multitask.sh` alone), `compare.py` / `compare_mt.py` also print `tch/candle` and `python/tch`. The **per-case entity table** below lists both `python/candle` and `python/tch` (the latter is `(python infer_ms) / (rust tch-rs infer_ms)`). In the **auto-generated** tables from `patch_readme.py`, **bold** marks the **lowest** time in milliseconds in that row (load, sum, per-case, or throughput lane) and the **highest** samples/s in throughput rows; ties are all bolded.
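
The ratio and bolding rules can be sketched as two small functions. This is an illustrative reimplementation that assumes "ties" means exactly equal values; it is not the code in `compare.py` or `patch_readme.py`.

```rust
/// Time ratio as printed by the compare scripts, e.g. `python/candle` =
/// (Python infer_ms) / (Rust Candle infer_ms). Below 1 means the numerator side was faster.
fn ratio(numerator_ms: f64, denominator_ms: f64) -> f64 {
    numerator_ms / denominator_ms
}

/// Column indices to bold in a row of millisecond timings: the lowest value,
/// with every exact tie included.
fn lowest_ms_indices(row: &[f64]) -> Vec<usize> {
    let min = row.iter().copied().fold(f64::INFINITY, f64::min);
    row.iter()
        .enumerate()
        .filter(|&(_, &v)| v == min)
        .map(|(i, _)| i)
        .collect()
}
```

For the entity totals recorded below, `ratio(249.0, 378.1)` is about 0.66: Python spent roughly two thirds of Candle's inference time.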

### CPU vs CPU (recorded)

<!-- gliner2-harness:cpu-recorded -->
Model: `fastino/gliner2-base-v1`. **Recorded:** 2026-04-05 (Linux x86_64, local run; numbers vary by machine and load). **tch-rs `infer_ms`:** LibTorch encoder path with `download-libtorch` + [prepend_libtorch_ld_path.sh](harness/prepend_libtorch_ld_path.sh); see caveat above on NER outputs vs Candle.
<!-- /gliner2-harness:cpu-recorded -->

**Entity harness** ([harness/fixtures.json](harness/fixtures.json)) — metadata and per-case infer times:


<!-- gliner2-harness:entity-summary -->
|                              | Rust (Candle) | Rust (tch-rs)   | Python           |
| ---------------------------- | ------------- | --------------- | ---------------- |
| `device_note`                | `cpu`         | `cpu_libtorch`† | `cpu`            |
| `load_model_ms`              | 262.0         | 1072.3          | 3422.5          |
| Sum of `infer_ms` over cases | 378.1         | 151.7           | 249.0            |
| Ratios (total infer)         | —             | tch/cnd **0.40×** | py/cnd **0.66×**; py/tch **1.64×** |
<!-- /gliner2-harness:entity-summary -->


<!-- gliner2-harness:entity-footnote -->
† Expected device label for tch-rs harness JSON when LibTorch is used (`run_compare_all.sh` enables this by default; otherwise set `GLINER2_BENCH_TCH=1`).
<!-- /gliner2-harness:entity-footnote -->


<!-- gliner2-harness:entity-cases -->
| Case id             | Candle `infer_ms` | tch-rs `infer_ms` | python `infer_ms` | `python/candle` | `python/tch` |
| ------------------- | ----------------- | ----------------- | ----------------- | --------------- | ------------ |
| `microsoft_windows` | 86.8 | 35.9 | 50.2 | 0.58× | 1.40× |
| `steve_jobs` | 94.5 | 38.8 | 72.6 | 0.77× | 1.87× |
| `sundar_pichai` | 96.3 | 36.9 | 59.7 | 0.62× | 1.62× |
| `tim_cook_iphone` | 100.6 | 40.0 | 66.4 | 0.66× | 1.66× |
<!-- /gliner2-harness:entity-cases -->


**Multitask harness** ([harness/fixtures_multitask.json](harness/fixtures_multitask.json)) — single fixture `entities_plus_sentiment`:


<!-- gliner2-harness:multitask-summary -->
|                      | Rust (Candle) | Rust (tch-rs)   | Python           |
| -------------------- | ------------- | --------------- | ---------------- |
| `device_note`        | `cpu`         | `cpu_libtorch` | `cpu`            |
| `load_model_ms`      | 244.4         | 1091.5          | 3133.2           |
| Sum of `infer_ms`    | 100.2         | 42.4           | 82.5            |
| Ratios (total infer) | —             | tch/cnd **0.42×** | py/cnd **0.82×**; py/tch **1.95×** |
<!-- /gliner2-harness:multitask-summary -->


These are **short-fixture** timings. Update the tables when you change the model, fixtures, or harness code in a way that affects performance.

### Throughput (local only; not in CI)

**These benchmarks are not run in GitHub Actions** (see [.github/workflows/ci.yml](.github/workflows/ci.yml)). Run them on your machine when you need larger-sample timing.

The harness uses **64 samples** by default, built by cycling texts from [harness/fixtures.json](harness/fixtures.json). Every sample uses the same entity label list `["company", "person", "product", "location", "date"]` so Rust [batch_extract_entities](src/extract.rs) and PyPI `batch_extract_entities` can process the full set. **Sequential rows** use **64× micro-batches of size 1** on both sides (Rust’s `forward` loop vs Python `batch_extract_entities([t], …, batch_size=1)`). **Batched rows** are timed at **`batch_size` 8 and 64** (Rust `--rust-batch-size` and Python `batch_extract_entities` with the same batch sizes).
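
A minimal sketch of that sample layout, assuming the texts are cycled in fixture order; `cycle_samples` and `micro_batches` are hypothetical helpers rather than harness code. With 64 samples, batch sizes 1, 8, and 64 give 64, 8, and 1 encoder calls respectively.

```rust
/// Build `n` samples by cycling the fixture texts in order (hypothetical helper).
fn cycle_samples(fixtures: &[&str], n: usize) -> Vec<String> {
    fixtures.iter().cycle().take(n).map(|s| s.to_string()).collect()
}

/// Split samples into micro-batches of at most `batch_size`, preserving input order.
fn micro_batches(samples: &[String], batch_size: usize) -> Vec<&[String]> {
    assert!(batch_size > 0, "batch_size must be positive");
    samples.chunks(batch_size).collect()
}
```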

```bash
uv sync --locked --directory harness
bash harness/run_throughput.sh
```

Optional: `bash harness/run_throughput.sh [fixtures.json] [rust_seq_out.json] [rust_batch_8_out.json] [rust_batch_64_out.json] [samples] [python_out.json]`. The script runs [harness/compare_throughput.py](harness/compare_throughput.py) on the JSON outputs (sequential + batched batch sizes 8 and 64).

Rust JSON includes a `backend` field (`candle` or `tch`). For LibTorch encoder timing only, set `GLINER2_THROUGHPUT_BACKEND=tch` (builds with `tch-backend,download-libtorch`). For **both** Rust backends plus Python in one run, use `GLINER2_BENCH_TCH=1 bash harness/run_throughput.sh`. You can also pass `--backend candle|tch` directly to `harness_throughput`.

<!-- gliner2-harness:throughput-recorded -->
**Recorded:** 2026-04-05 (Linux x86_64, local run, CPU, `CUDA_VISIBLE_DEVICES=` + `--device cpu` on Python). `warmup_full_passes=8` over all samples before each timed pass. [harness/compare_throughput.py](harness/compare_throughput.py) prints Candle vs tch vs Python (ratios: `py/cnd`, `tch/cnd`, `py/tch`).
<!-- /gliner2-harness:throughput-recorded -->

Batched Rust runs use Rayon for parallel preprocessing and per-record decode. The encoder forward pass is a single batched tensor op; parallelism applies to the CPU-bound work around it.


<!-- gliner2-harness:throughput-table -->
| Lane                           | Candle `infer_ms` | Candle s/s | tch-rs `infer_ms` | tch-rs s/s | Python `infer_ms` | Python s/s | py/candle | py/tch  |
| ------------------------------ | ----------------- | ---------- | ----------------- | ---------- | ----------------- | ---------- | --------- | ------- |
| Sequential (`batch_size` 1)    | 5702              | 11.22       | 3037              | 21.07       | 3475              | 18.42       | **0.61×** | 1.14×   |
| Batched (`batch_size` 8)       | 3299              | 19.40       | 1395              | 45.87       | 1618              | 39.56       | **0.49×** | 1.16×   |
| Batched (`batch_size` 64)      | 2801              | 22.85       | 1263              | 50.66       | 1237              | 51.75       | **0.44×** | 0.98×   |
<!-- /gliner2-harness:throughput-table -->


<!-- gliner2-harness:throughput-loads -->
Load times: Candle ~243 ms; tch ~1089 ms; Python ~2190 ms.
<!-- /gliner2-harness:throughput-loads -->

**Notes:**
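- **tch-rs** is faster than Python at `batch_size` 1 and 8 (py/tch 1.14× and 1.16×) and roughly on par at `batch_size` 64 (0.98×, with Python slightly ahead). Both use LibTorch; tch-rs avoids Python interpreter overhead.
- **Candle** is roughly 2× slower than Python on batched workloads (py/candle 0.49× and 0.44×, i.e. Candle takes about 2.0× to 2.3× as long). Candle's pure-Rust GEMM is the bottleneck. Rayon parallelism accounts for a large share of Candle's batched throughput (with `RAYON_NUM_THREADS=1`, batched throughput drops to 5.27 s/s).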
- `py/candle` and `py/tch` are time ratios: `(Python infer_ms) / (Rust infer_ms)`. Values **below 1** mean Python was faster; **above 1** mean Rust was faster.
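
The `s/s` columns are consistent with the sample count divided by the lane's `infer_ms` in seconds: for the Candle sequential lane, 64 / 5.702 s ≈ 11.22 s/s. Likewise, a py/candle of 0.49× means Candle took about 1 / 0.49 ≈ 2.04 times as long as Python. Two hypothetical helpers for checking rows by hand:

```rust
/// Throughput for a lane: samples divided by the timed pass in seconds.
fn samples_per_sec(samples: usize, infer_ms: f64) -> f64 {
    samples as f64 / (infer_ms / 1000.0)
}

/// Factor by which the Rust side took longer than Python, given a `py/...` ratio.
/// A result below 1 means Rust was faster.
fn rust_slowdown(py_over_rust: f64) -> f64 {
    1.0 / py_over_rust
}
```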

Re-run `bash harness/run_throughput.sh` for Candle-only Rust, or `GLINER2_BENCH_TCH=1 bash harness/run_throughput.sh` to refresh all three lanes (bundled LibTorch via `download-libtorch`).

### GPU vs GPU (not recorded yet)

Fair comparison needs **both** implementations on the same device class (for example CUDA on the PyPI side and a GPU inference path in the Rust harness). That pairing is not wired into the harness yet, so no GPU numbers are published here.


|                  | Rust | Python |
| ---------------- | ---- | ------ |
| Device           | —    | —      |
| `load_model_ms`  | —    | —      |
| Total `infer_ms` | —    | —      |
| `python/rust`    | —    | —      |


## Usage

Like the Python implementation, this crate supports a full extraction API. You load the model once, build a `SchemaTransformer` from the tokenizer, then call `CandleExtractor` (or `TchExtractor`) methods.

### Setup (load model + tokenizer)

```rust
use anyhow::{anyhow, Result};
use gliner2::config::{download_model, ExtractorConfig};
use gliner2::{CandleExtractor, SchemaTransformer};
use std::fs;

fn load_extractor(model_id: &str) -> Result<(CandleExtractor, SchemaTransformer)> {
    // Fetch the model files (config, encoder config, tokenizer, weights) for this Hub id.
    let files = download_model(model_id)?;
    let tokenizer_path = files
        .tokenizer
        .to_str()
        .ok_or_else(|| anyhow!("tokenizer path is not valid UTF-8"))?;
    let transformer = SchemaTransformer::new(tokenizer_path)?;
    let config: ExtractorConfig = serde_json::from_str(&fs::read_to_string(&files.config)?)?;
    let vocab = transformer.tokenizer.get_vocab_size(true);

    let extractor = CandleExtractor::load_cpu(&files, config, vocab)?;
    Ok((extractor, transformer))
}
```

### Entity extraction (`extract_entities`)

Same idea as Python `extract_entities`: pass label names; the returned `serde_json::Value` uses the formatted shape (`entities` → label → list of strings, when `include_spans` / `include_confidence` are false).

```rust
// Runs inside a function returning `anyhow::Result`, using `load_extractor` from the setup section.
use gliner2::ExtractOptions;

let (extractor, transformer) = load_extractor("fastino/gliner2-base-v1")?;
let text = "Apple CEO Tim Cook announced iPhone 15 in Cupertino.";

let entity_types = vec![
    "company".to_string(),
    "person".to_string(),
    "product".to_string(),
    "location".to_string(),
];

let opts = ExtractOptions::default();
let out = extractor.extract_entities(&transformer, text, &entity_types, &opts)?;
// e.g. {"entities":{"company":["Apple"],"person":["Tim Cook"], ...}}

// Optional: character spans + confidence (richer JSON, closer to Python with flags on)
let opts_rich = ExtractOptions {
    include_confidence: true,
    include_spans: true,
    ..Default::default()
};
let _out = extractor.extract_entities(&transformer, text, &entity_types, &opts_rich)?;
```

### Text classification (`classify_text`)

One classification task per call. `labels` is a JSON array of class names, or an object mapping label → description (like Python).

```rust
use gliner2::ExtractOptions;
use serde_json::json;

let (extractor, transformer) = load_extractor("fastino/gliner2-base-v1")?;
let text = "The new phone is amazing and well worth the price.";

// Single-label: scalar string under the task name when format_results is true
let opts = ExtractOptions::default();
let out = extractor.classify_text(
    &transformer,
    text,
    "sentiment",
    json!(["positive", "negative", "neutral"]),
    &opts,
)?;
// e.g. {"sentiment":"positive"}

// Labels with optional descriptions (mirrors Python dict form)
let out2 = extractor.classify_text(
    &transformer,
    text,
    "topic",
    json!({
        "technology": "Tech products and software",
        "business": "Corporate or market news",
        "sports": "Athletics and games"
    }),
    &opts,
)?;
```

### Relation extraction (`extract_relations`)

Pass relation names as a JSON array of strings, or a JSON object (name → description / config), matching Python `relations(...)`.

```rust
use gliner2::ExtractOptions;
use serde_json::json;

let (extractor, transformer) = load_extractor("fastino/gliner2-base-v1")?;
let text = "Tim Cook works for Apple, based in Cupertino.";

let opts = ExtractOptions::default();

// List of relation types → formatted results under "relation_extraction"
let out = extractor.extract_relations(
    &transformer,
    text,
    json!(["works_for", "located_in"]),
    &opts,
)?;
// e.g. {"relation_extraction":{"works_for":[["Tim Cook","Apple"]],"located_in":[["Apple","Cupertino"]]}}

// Dict form (descriptions stored like Python; inference uses relation names)
let _out2 = extractor.extract_relations(
    &transformer,
    text,
    json!({
        "works_for": "Employment between person and organization",
        "founded": "Founder relationship"
    }),
    &opts,
)?;
```

### Structured JSON (`extract_json`)

Field specs use the same string syntax as Python `extract_json` (`name::dtype::[choices]::description`).

```rust
use gliner2::ExtractOptions;
use serde_json::json;

let (extractor, transformer) = load_extractor("fastino/gliner2-base-v1")?;
let text = "iPhone 15 Pro costs $999 and is in stock.";

let structures = json!({
    "product_info": [
        "name::str",
        "price::str",
        "features::list",
        "availability::str::[in_stock|pre_order|sold_out]"
    ]
});
let out = extractor.extract_json(
    &transformer,
    text,
    &structures,
    &ExtractOptions::default(),
)?;
```
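
The spec strings above pack several optional segments after the field name. A simplified, hypothetical parser shows how `name::dtype::[choices]::description` decomposes; it is not the crate's parser, and the handling of edge cases (a missing dtype, a description containing `::`) is assumed rather than documented.

```rust
/// Value type of a structured field (hypothetical mirror of the `dtype` segment).
#[derive(Debug, PartialEq)]
enum Dtype {
    Str,
    List,
}

#[derive(Debug, PartialEq)]
struct FieldSpec {
    name: String,
    dtype: Option<Dtype>, // None when the spec has no dtype segment
    choices: Vec<String>,
    description: Option<String>,
}

/// Parse `name::dtype::[choices]::description`; every segment after `name` is optional.
fn parse_field_spec(spec: &str) -> FieldSpec {
    let mut parts = spec.split("::");
    let name = parts.next().unwrap_or_default().to_string();
    let mut field = FieldSpec { name, dtype: None, choices: Vec::new(), description: None };
    for part in parts {
        match part {
            "str" => field.dtype = Some(Dtype::Str),
            "list" => field.dtype = Some(Dtype::List),
            // `[a|b|c]` lists the allowed choices.
            p if p.starts_with('[') && p.ends_with(']') => {
                field.choices = p[1..p.len() - 1]
                    .split('|')
                    .filter(|c| !c.is_empty())
                    .map(str::to_string)
                    .collect();
            }
            // Anything else is treated as free-text description.
            p => field.description = Some(p.to_string()),
        }
    }
    field
}
```

Applied to `availability::str::[in_stock|pre_order|sold_out]`, it yields the name `availability`, dtype `Str`, and three choices.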

### Multi-task builder (`create_schema` + `extract`)

Combines entities, classifications, relations, and structured fields in one encoder pass. Reuses `extractor` and `transformer` from the setup section and `text` from the examples above.

```rust
use gliner2::{create_schema, ExtractOptions, ValueDtype};
use serde_json::json;

let mut s = create_schema();
s.entities(json!({
    "person": "Names of people",
    "company": "Organization names",
    "product": "Products or offerings",
}));
s.classification_simple("sentiment", json!(["positive", "negative", "neutral"]));
s.classification_simple("category", json!(["technology", "business", "finance", "healthcare"]));
s.relations(json!(["works_for", "founded", "located_in"]));
{
    let _ = s.structure("product_info")
        .field_str("name")
        .field_str("price")
        .field_list("features")
        .field_choices(
            "availability",
            vec![
                "in_stock".into(),
                "pre_order".into(),
                "sold_out".into(),
            ],
            ValueDtype::Str,
        );
}
let (schema_val, meta) = s.build();
let opts = ExtractOptions::default();
let out = extractor.extract(&transformer, text, &schema_val, &meta, &opts)?;
```

### Batch inference

The crate mirrors Python’s batched entry points: records are preprocessed **in parallel** (Rayon), **padded into chunks** of at most `ExtractOptions::batch_size` (default **8**), the encoder runs **once per chunk**, span representations are computed with **`compute_span_rep_batched`** when needed, and each row is then **decoded in parallel** (Rayon). Results are returned in **input order**. Set `RAYON_NUM_THREADS` to control the thread pool size.

Set `batch_size` on `ExtractOptions` for any batch method; it only affects chunking, not the single-sample `extract_*` calls.
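
To make the pipeline concrete, here is a dependency-free sketch that uses `std::thread::scope` in place of Rayon. `batch_run`, `par_map`, and the `preprocess` / `encode` / `decode` closures are hypothetical stand-ins, not the crate's internals. Spawning one thread per item keeps the sketch short; a pool such as Rayon's avoids that overhead.

```rust
use std::thread;

/// Map `f` over `items` on scoped threads; results come back in input order
/// because handles are joined in the order they were spawned.
fn par_map<T: Sync, R: Send>(items: &[T], f: impl Fn(&T) -> R + Sync) -> Vec<R> {
    let f = &f;
    thread::scope(|s| {
        let handles: Vec<_> = items.iter().map(|item| s.spawn(move || f(item))).collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}

/// Hypothetical batch driver: preprocess every record in parallel, call `encode`
/// once per chunk of at most `batch_size` rows, then decode each encoded row in
/// parallel. Output order always matches input order.
fn batch_run<A, B, H, C>(
    inputs: &[A],
    batch_size: usize,
    preprocess: impl Fn(&A) -> B + Sync,
    encode: impl Fn(&[B]) -> Vec<H>,
    decode: impl Fn(&H) -> C + Sync,
) -> Vec<C>
where
    A: Sync,
    B: Send,
    H: Sync,
    C: Send,
{
    assert!(batch_size > 0, "batch_size must be positive");
    let prepared = par_map(inputs, preprocess);
    let mut out = Vec::with_capacity(inputs.len());
    for chunk in prepared.chunks(batch_size) {
        let encoded = encode(chunk); // stand-in for one batched encoder forward pass
        out.extend(par_map(&encoded, &decode));
    }
    out
}
```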

#### Shared schema (one schema for every text)

Use the `CandleExtractor` helpers; they build the same schema as the single-sample methods and call `batch_extract` internally.

```rust
use gliner2::ExtractOptions;
use serde_json::json;

let (extractor, transformer) = load_extractor("fastino/gliner2-base-v1")?;
let texts: Vec<String> = vec![
    "Apple CEO Tim Cook announced iPhone 15.".into(),
    "Google unveiled Gemini in Mountain View.".into(),
];

let entity_types: Vec<String> = ["company", "person", "product", "location"]
    .into_iter()
    .map(String::from)
    .collect();

let mut opts = ExtractOptions::default();
opts.batch_size = 16;

let results = extractor.batch_extract_entities(&transformer, &texts, &entity_types, &opts)?;
// Vec<serde_json::Value>, one formatted result per input line

let cls = extractor.batch_classify_text(
    &transformer,
    &texts,
    "sentiment",
    json!(["positive", "negative", "neutral"]),
    &opts,
)?;

let rels = extractor.batch_extract_relations(
    &transformer,
    &texts,
    json!(["works_for", "located_in"]),
    &opts,
)?;

let structures = json!({
    "product_info": ["name::str", "price::str"]
});
let json_results = extractor.batch_extract_json(&transformer, &texts, &structures, &opts)?;
```

#### Full schema + metadata (`batch_extract`)

For the same multitask flow as [`extract`](#multi-task-builder-create_schema--extract), build `(schema_val, meta)` once and run **`batch_extract`** with **`BatchSchemaMode::Shared`**, or pass per-row schemas and metadata with **`BatchSchemaMode::PerSample`**.

```rust
use gliner2::{batch_extract, create_schema, BatchSchemaMode, ExtractOptions};
use gliner2::schema::infer_metadata_from_schema;
use serde_json::{json, Value};

let (extractor, transformer) = load_extractor("fastino/gliner2-base-v1")?;
let texts: Vec<String> = vec!["First document.".into(), "Second document.".into()];

// Option A — shared multitask schema from the builder
let mut s = create_schema();
s.entities(json!({ "company": "", "person": "" }));
s.classification_simple("sentiment", json!(["positive", "negative", "neutral"]));
let (schema_val, meta) = s.build();

let opts = ExtractOptions {
    batch_size: 8,
    ..Default::default()
};

let out_shared = batch_extract(
    &extractor,
    &transformer,
    &texts,
    BatchSchemaMode::Shared {
        schema: &schema_val,
        meta: &meta,
    },
    &opts,
)?;

// Option B — per-text JSON schemas (e.g. from config); metadata from infer_metadata_from_schema
let schema_a: Value = json!({ "entities": { "person": "" } });
let schema_b: Value = json!({ "entities": { "location": "" } });
let schemas = vec![schema_a.clone(), schema_b.clone()];
let metas = vec![
    infer_metadata_from_schema(&schema_a),
    infer_metadata_from_schema(&schema_b),
];

let out_per = batch_extract(
    &extractor,
    &transformer,
    &texts,
    BatchSchemaMode::PerSample {
        schemas: &schemas,
        metas: &metas,
    },
    &opts,
)?;
```

For a shared schema you can also call **`extractor.batch_extract(&transformer, &texts, &schema_val, &meta, &opts)`** instead of the free function.
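
The difference between the two modes comes down to which schema each record uses. The following hypothetical mirror (not the crate's `BatchSchemaMode`) resolves the schema for record `i`; the length check in per-sample mode is an assumption of this sketch, since this section does not say how mismatched lengths are handled.

```rust
/// Hypothetical mirror of the two batch schema modes: one schema shared by every
/// text, or one schema per text.
enum SchemaMode<'a, S> {
    Shared(&'a S),
    PerSample(&'a [S]),
}

impl<'a, S> SchemaMode<'a, S> {
    /// Assumption for this sketch: per-sample mode needs exactly one schema per text.
    fn check_len(&self, n_texts: usize) -> Result<(), String> {
        match self {
            SchemaMode::PerSample(all) if all.len() != n_texts => {
                Err(format!("got {} schemas for {} texts", all.len(), n_texts))
            }
            _ => Ok(()),
        }
    }

    /// Schema to use for the record at index `i`.
    fn schema_for(&self, i: usize) -> &'a S {
        match *self {
            SchemaMode::Shared(schema) => schema,
            SchemaMode::PerSample(all) => &all[i],
        }
    }
}
```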

Lower-level reuse: after **`transform_extract`** you can run **`extract_from_preprocessed`** on one sample if you already have encoder outputs and span tensors; see [`src/extract.rs`](src/extract.rs).

## Development

### Pre-commit

Git hooks run the same Rust checks as CI (`cargo fmt`, `cargo clippy` on the workspace) plus [Ruff](https://docs.astral.sh/ruff/) on first-party Python (for example under `harness/`). Paths under `reference/` and `.tickets/` are excluded from hooks.

**Prerequisites:** stable Rust with `rustfmt` and `clippy` (for example `rustup component add rustfmt clippy`).

**Install** [pre-commit](https://pre-commit.com/), for example with uv:

```bash
uv tool install pre-commit
```

From the repository root, install the hooks once:

```bash
pre-commit install
```

Optionally validate the whole tree:

```bash
pre-commit run --all-files
```

If you must commit before fixing Clippy, you can skip that hook: `SKIP=cargo-clippy git commit` (use sparingly; CI still enforces warnings as errors).

## CLI specification

The `gliner2` command-line interface offers another way to run the model over a handful of input types.

Install the binary with `cargo install gliner2`. Inference flags mirror [ExtractOptions](src/extract.rs) (`threshold`, `format_results`, `include_confidence`, `include_spans`, `max_len`).

### Command overview

```mermaid
flowchart LR
  subgraph sub [Subcommands]
    entities[entities]
    classify[classify]
    relations[relations]
    jsonCmd[json]
    run[run]
  end
  gliner2[gliner2] --> entities
  gliner2 --> classify
  gliner2 --> relations
  gliner2 --> jsonCmd
  gliner2 --> run
```




| Subcommand          | Purpose                                      | Library analogue                                             |
| ------------------- | -------------------------------------------- | ------------------------------------------------------------ |
| `gliner2 entities`  | Named-entity extraction                      | `CandleExtractor::extract_entities`, `Schema::entities`            |
| `gliner2 classify`  | Text classification (single- or multi-label) | `CandleExtractor::classify_text`, `Schema::classification`         |
| `gliner2 relations` | Relation extraction                          | `CandleExtractor::extract_relations`, `Schema::relations`          |
| `gliner2 json`      | Structured JSON / field extraction           | `CandleExtractor::extract_json`, `Schema::extract_json_structures` |
| `gliner2 run`       | Multitask: full engine schema in one pass    | `CandleExtractor::extract`                                         |


Top-level: `gliner2 --help`, `gliner2 --version`, and `gliner2 <subcommand> --help`.

### Global options

These apply to every subcommand unless stated otherwise.


| Flag                                                       | Description                                                                                                                                                     |
| ---------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `--model <HF_REPO_ID>`                                     | Hugging Face model id (default: `fastino/gliner2-base-v1`, same as `harness/` scripts).                                                                         |
| `--model-dir <DIR>`                                        | Offline layout: `config.json`, `encoder_config/config.json`, `tokenizer.json`, `model.safetensors` (matches `ModelFiles` from [download_model](src/config.rs)). |
| `--config`, `--encoder-config`, `--tokenizer`, `--weights` | Explicit paths instead of `--model` / `--model-dir`.                                                                                                            |
| `-q`, `-v` / `--log-level`                                 | Quiet / verbose logging (exact mapping is implementation-defined).                                                                                              |


Use either Hub resolution (`--model`) **or** a local layout (`--model-dir` or explicit file flags), not a conflicting mix; if both are given, the implementation should reject the invocation with a clear error.
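
One way to enforce this rule is sketched below. `ModelSource` and `resolve_source` are hypothetical CLI-side names; requiring all four explicit paths together is an assumption of the sketch, while the fallback Hub id is the documented `--model` default.

```rust
use std::path::PathBuf;

/// Where the model files come from (hypothetical CLI-side type).
#[derive(Debug, PartialEq)]
enum ModelSource {
    Hub(String),
    Dir(PathBuf),
    Explicit {
        config: PathBuf,
        encoder_config: PathBuf,
        tokenizer: PathBuf,
        weights: PathBuf,
    },
}

/// Resolve the model source from parsed flags, rejecting conflicting mixes.
/// `files` holds --config, --encoder-config, --tokenizer, --weights in that order.
fn resolve_source(
    model: Option<String>,
    model_dir: Option<PathBuf>,
    files: [Option<PathBuf>; 4],
) -> Result<ModelSource, String> {
    let any_file = files.iter().any(Option::is_some);
    let chosen = [model.is_some(), model_dir.is_some(), any_file]
        .iter()
        .filter(|&&set| set)
        .count();
    if chosen > 1 {
        return Err("use only one of --model, --model-dir, or the explicit file flags".into());
    }
    if let Some(id) = model {
        return Ok(ModelSource::Hub(id));
    }
    if let Some(dir) = model_dir {
        return Ok(ModelSource::Dir(dir));
    }
    if any_file {
        // Assumption: explicit mode requires all four paths.
        return match files {
            [Some(config), Some(encoder_config), Some(tokenizer), Some(weights)] => {
                Ok(ModelSource::Explicit { config, encoder_config, tokenizer, weights })
            }
            _ => Err("explicit mode needs --config, --encoder-config, --tokenizer and --weights".into()),
        };
    }
    // Nothing given: fall back to the documented default Hub id.
    Ok(ModelSource::Hub("fastino/gliner2-base-v1".to_string()))
}
```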

**Device and dtype** are intentionally unspecified here until the library exposes them; do not document GPU flags until they exist.

### Shared inference flags


| Flag                            | Maps to                     | Default                   |
| ------------------------------- | --------------------------- | ------------------------- |
| `--threshold <float>`           | `ExtractOptions::threshold` | `0.5`                     |
| `--max-len <N>`                 | `ExtractOptions::max_len`   | unset                     |
| `--include-confidence`          | `include_confidence`        | off                       |
| `--include-spans`               | `include_spans`             | off                       |
| `--raw` / `--no-format-results` | `format_results = false`    | formatted output (`true`) |
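
The defaults in this table can be captured in one struct. The sketch below is a hypothetical CLI-side mirror with a hand-rolled parser to stay dependency-free; the field types (`f32`, `usize`) are assumptions, and a real binary would more likely use an argument-parsing crate and build `ExtractOptions` directly.

```rust
/// Hypothetical CLI-side mirror of the shared inference flags and their defaults.
#[derive(Debug, Clone, PartialEq)]
struct InferenceFlags {
    threshold: f32,           // --threshold, default 0.5
    max_len: Option<usize>,   // --max-len, default unset
    include_confidence: bool, // --include-confidence, default off
    include_spans: bool,      // --include-spans, default off
    format_results: bool,     // --raw / --no-format-results turns this off
}

impl Default for InferenceFlags {
    fn default() -> Self {
        Self {
            threshold: 0.5,
            max_len: None,
            include_confidence: false,
            include_spans: false,
            format_results: true,
        }
    }
}

/// Apply flags from an argument list on top of the defaults.
fn parse_flags<'a>(args: impl IntoIterator<Item = &'a str>) -> Result<InferenceFlags, String> {
    let mut flags = InferenceFlags::default();
    let mut it = args.into_iter();
    while let Some(arg) = it.next() {
        match arg {
            "--threshold" => {
                let v = it.next().ok_or("--threshold needs a value")?;
                flags.threshold = v.parse::<f32>().map_err(|e| format!("bad --threshold: {e}"))?;
            }
            "--max-len" => {
                let v = it.next().ok_or("--max-len needs a value")?;
                flags.max_len = Some(v.parse::<usize>().map_err(|e| format!("bad --max-len: {e}"))?);
            }
            "--include-confidence" => flags.include_confidence = true,
            "--include-spans" => flags.include_spans = true,
            "--raw" | "--no-format-results" => flags.format_results = false,
            other => return Err(format!("unknown flag: {other}")),
        }
    }
    Ok(flags)
}
```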


### Batching

The **library** implements tensor batch inference (`CandleExtractor::batch_extract*`, `ExtractOptions::batch_size`); see **[Batch inference](#batch-inference)** above. The **CLI** is not implemented yet; the contract below assumes the binary will drive those batched APIs for any input that produces **more than one logical record** (for example multi-line JSONL or plain text with `--text-split line` and multiple non-empty lines).


| Flag               | Description                                                                                                                                        |
| ------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------- |
| `--batch-size <N>` | Maximum records per model batch. Default: **8** (implementation may choose a lower value on constrained devices, but must document any deviation). |
| `--batch-size 1`   | Effectively sequential inference (debugging, peak memory limits, or until batched paths are stable).                                               |


**Single-record** inputs (one JSONL line, one JSON object, or `--text-split full` over an entire file) form a single batch of size 1.

**Ordering:** Output lines must follow **the same order as input records**, even when flushing internal batches.

### Input and output

**Input:** final positional argument `INPUT`, or `-` for stdin.


| Flag                  | Description                                                                                                                     |
| --------------------- | ------------------------------------------------------------------------------------------------------------------------------- |
| `--text-field <KEY>`  | Field containing document text in JSON / JSONL records (default: `text`).                                                       |
| `--id-field <KEY>`    | Field to pass through as record id when present (default: `id`).                                                                |
| `--text-split <MODE>` | Plain text: `full` (whole file) or `line` (one record per non-empty line). `sentence` / `char-chunk` reserved. Default: `full`. |



| Format         | Detection / notes                                                                                                                                                                                                                                              |
| -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **JSONL**      | One JSON object per line. Text from `--text-field` (default: `text`). If the input object contains the id key named by `--id-field` (default: `id`), copy that field through to the output object.                                                             |
| **JSON**       | A single object using the same field convention. For many records, use JSONL or preprocess (for example with `jq`).                                                                                                                                            |
| **Plain text** | Controlled by `--text-split`: `full` (default for `.txt`): the entire file is one record; `line`: each non-empty line is one record (multiple lines ⇒ batching). **`sentence` and `char-chunk`** are reserved for a future release (segmentation semantics TBD). |
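
A sketch of how `--text-split` could turn plain text into records. `TextSplit`, `parse_text_split`, and `split_records` are hypothetical names, and treating whitespace-only lines as empty in `line` mode is an assumption (the table only says "non-empty"). The reserved modes are rejected rather than guessed at.

```rust
/// The two plain-text split modes that are currently specified.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TextSplit {
    Full,
    Line,
}

/// Parse the `--text-split` value; reserved modes produce an error.
fn parse_text_split(value: &str) -> Result<TextSplit, String> {
    match value {
        "full" => Ok(TextSplit::Full),
        "line" => Ok(TextSplit::Line),
        "sentence" | "char-chunk" => {
            Err(format!("--text-split {value} is reserved for a future release"))
        }
        other => Err(format!("unknown --text-split mode: {other}")),
    }
}

/// Turn plain-text input into records according to the split mode.
fn split_records(input: &str, mode: TextSplit) -> Vec<String> {
    match mode {
        // `full`: the whole file is a single record (a batch of size 1).
        TextSplit::Full => vec![input.to_string()],
        // `line`: each non-empty line becomes its own record.
        // Assumption: whitespace-only lines are skipped as empty.
        TextSplit::Line => input
            .lines()
            .filter(|l| !l.trim().is_empty())
            .map(str::to_string)
            .collect(),
    }
}

fn main() {
    let text = "First line.\n\nSecond line.\n";
    let mode = parse_text_split("line").expect("valid mode");
    for record in split_records(text, mode) {
        println!("{record}");
    }
}
```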


**Output:** JSONL to stdout by default. Use `--output <PATH>` / `-o <PATH>` to write to a file (`-` means stdout). Optional `--pretty` emits pretty-printed JSON when the implementation can buffer a single record or the full result (for example, a single JSON object input or explicit single-line mode).

**Format inference:** From `INPUT`’s path suffix when possible: `.jsonl` → JSONL, `.json` → single JSON object, `.txt` (or any other suffix) → plain text with `--text-split`. For stdin (`-`), the default input format is **JSONL** (one object per line).
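
Suffix-based inference can be sketched with `std::path::Path::extension`. `InputFormat` and `infer_format` are illustrative names only, and matching the suffix case-sensitively is an assumption.

```rust
use std::path::Path;

#[derive(Debug, PartialEq)]
enum InputFormat {
    Jsonl,
    Json,
    PlainText,
}

/// Infer the input format from the `INPUT` argument.
fn infer_format(input: &str) -> InputFormat {
    // stdin defaults to JSONL (one object per line).
    if input == "-" {
        return InputFormat::Jsonl;
    }
    // Assumption: suffixes are compared case-sensitively.
    match Path::new(input).extension().and_then(|e| e.to_str()) {
        Some("jsonl") => InputFormat::Jsonl,
        Some("json") => InputFormat::Json,
        // `.txt`, any other suffix, or no suffix: plain text split per `--text-split`.
        _ => InputFormat::PlainText,
    }
}

fn main() {
    for input in ["-", "docs.jsonl", "one.json", "article.txt", "README"] {
        println!("{input}: {:?}", infer_format(input));
    }
}
```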

### Output record shape

Each output line is one JSON object, for example:

```json
{"id":"optional","text":"...","result":{ }}
```

`result` matches the Python / Rust **`format_results`** output for the task mix (entities, `relation_extraction`, classification keys, structured parents, etc.), consistent with the comparison harness in `harness/compare.py` and the multitask fixtures. If the input record has no `id`, omit `id` from the output (or emit `null`; implementations should pick one behavior and document it).
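
One output line can be assembled without a JSON library to show the `id` pass-through and omission rule. This is a sketch under stated assumptions: `output_line` and `json_string` are hypothetical helpers, `result_json` arrives already serialized (standing in for `format_results` output), and the id is passed as raw JSON so numeric ids are copied through unchanged. A real implementation would more likely use a serializer such as `serde_json`.

```rust
/// Minimal JSON string escaper: quotes, backslashes, and control characters.
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Build one JSONL output line. `id_json` is the raw JSON of the input's id field;
/// when it is `None`, the `id` key is omitted (the "omit" choice, not `null`).
fn output_line(id_json: Option<&str>, text: &str, result_json: &str) -> String {
    let mut line = String::from("{");
    if let Some(id) = id_json {
        line.push_str(&format!("\"id\":{id},"));
    }
    line.push_str(&format!("\"text\":{},\"result\":{}}}", json_string(text), result_json));
    line
}

fn main() {
    println!("{}", output_line(Some("\"doc-1\""), "Apple released the iPhone.", "{\"entities\":{}}"));
    println!("{}", output_line(None, "No id here.", "{}"));
}
```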

### Subcommands

#### `gliner2 entities`


| Flag                   | Description                                                                                                                                 |
| ---------------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
| `--label <NAME>`       | Repeatable entity type name.                                                                                                                |
| `--labels-json <PATH>` | JSON array of names or object form accepted by `Schema::entities` (name → description string or `{ "description", "dtype", "threshold" }`). |


**Precedence:** If any `--label` is given **and** `--labels-json` is given, exit with a usage error (do not merge).
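
The precedence rule reduces to one check before any labels file is read. `LabelArgs` and `check_label_source` are hypothetical names; the same shape would apply to `--relation` / `--relations-json` and `--structures` / `--structures-json`. Whether passing neither source is also an error is not specified here, so the sketch leaves that case alone.

```rust
/// Label sources as parsed from the command line (illustrative field names).
struct LabelArgs {
    labels: Vec<String>,         // repeatable `--label`
    labels_json: Option<String>, // path given to `--labels-json`
}

/// Reject the combination instead of merging the two sources.
fn check_label_source(args: &LabelArgs) -> Result<(), String> {
    if !args.labels.is_empty() && args.labels_json.is_some() {
        return Err("use either --label or --labels-json, not both".to_string());
    }
    Ok(())
}

fn main() {
    let ok = LabelArgs {
        labels: vec!["company".to_string(), "person".to_string()],
        labels_json: None,
    };
    let conflicting = LabelArgs {
        labels: vec!["company".to_string()],
        labels_json: Some("labels.json".to_string()),
    };
    println!("{:?}", check_label_source(&ok));
    println!("{:?}", check_label_source(&conflicting));
}
```

In a clap-based CLI, the same constraint could be declared on the argument definitions rather than checked by hand.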

#### `gliner2 classify`


| Flag                      | Description                                                                     |
| ------------------------- | ------------------------------------------------------------------------------- |
| `--task <NAME>`           | Required classification task name (JSON key in formatted output).               |
| `--label <NAME>`          | Repeatable class label.                                                         |
| `--labels-json <PATH>`    | Array of labels or object label → description (Python-style).                   |
| `--multi-label`           | Multi-label classification (`Schema::classification` with `multi_label: true`). |
| `--cls-threshold <float>` | Per-task classifier threshold (default `0.5`).                                  |


Same rule: do not combine `--label` with `--labels-json`.

#### `gliner2 relations`


| Flag                      | Description                                                         |
| ------------------------- | ------------------------------------------------------------------- |
| `--relation <NAME>`       | Repeatable relation type name.                                      |
| `--relations-json <PATH>` | JSON array of names or object form accepted by `Schema::relations`. |


Do not pass both repeatable `--relation` and `--relations-json`.

#### `gliner2 json`


| Flag                           | Description                                                      |
| ------------------------------ | ---------------------------------------------------------------- |
| `--structures <PATH>`          | JSON file: object mapping structure name → array of field specs. |
| `--structures-json '<OBJECT>'` | Same object inline.                                              |


Field specs use the same grammar as **Structured JSON (`extract_json`)** above: strings like `name::dtype::[choices]::description` or JSON objects parsed by [`parse_field_spec`](src/schema.rs). Do not pass both `--structures` and `--structures-json`.
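
The string form of a field spec can be illustrated with a small parser. This is a sketch, not `parse_field_spec` itself: `FieldSpec` and `parse_field_string` are hypothetical, the dtype set `{str, list}` and the `|` separator inside `[choices]` are assumptions, and every segment after the name is treated as optional.

```rust
#[derive(Debug, Default, PartialEq)]
struct FieldSpec {
    name: String,
    dtype: Option<String>,
    choices: Vec<String>,
    description: Option<String>,
}

/// Parse `name::dtype::[choices]::description`.
fn parse_field_string(spec: &str) -> Result<FieldSpec, String> {
    let mut parts = spec.split("::");
    let name = parts.next().unwrap_or("").trim();
    if name.is_empty() {
        return Err(format!("field spec has no name: {spec:?}"));
    }
    let mut field = FieldSpec { name: name.to_string(), ..Default::default() };
    for part in parts.map(str::trim) {
        if let Some(inner) = part.strip_prefix('[').and_then(|p| p.strip_suffix(']')) {
            // Choices such as `[a|b|c]` (assumed `|` separator).
            field.choices = inner
                .split('|')
                .map(|c| c.trim().to_string())
                .filter(|c| !c.is_empty())
                .collect();
        } else if part == "str" || part == "list" {
            // Assumed dtype keywords.
            field.dtype = Some(part.to_string());
        } else if !part.is_empty() {
            // Anything else is taken as the human-readable description.
            field.description = Some(part.to_string());
        }
    }
    Ok(field)
}

fn main() {
    let spec = parse_field_string("category::str::[electronics|software]::Product category");
    println!("{spec:?}");
}
```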

#### `gliner2 run`


| Flag                   | Description                                                                                                                                                                                                                                                                                                           |
| ---------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `--schema-file <PATH>` | Required. Full **engine** multitask schema (same shape as Python `GLiNER2.extract(text, schema)`). See [harness/fixtures_multitask.json](harness/fixtures_multitask.json) for a minimal example: `entities`, `classifications`, `relations`, `json_structures`, optional `entity_descriptions` / `json_descriptions`. |


Each entry in `classifications` should include `"true_label": ["N/A"]` when mirroring Python; the harness script [harness/run_multitask_python.py](harness/run_multitask_python.py) sets this if missing.
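
A Rust mirror of the harness behavior for missing `true_label` entries, as a sketch only: `ClassificationTask` is a hypothetical struct holding just `task` and `true_label`, and a missing key is modelled as `None`. Explicit values are left untouched.

```rust
#[derive(Debug)]
struct ClassificationTask {
    task: String,
    true_label: Option<Vec<String>>,
}

/// Fill in `"true_label": ["N/A"]` wherever the key is missing.
fn ensure_true_label(tasks: &mut [ClassificationTask]) {
    for t in tasks.iter_mut() {
        if t.true_label.is_none() {
            t.true_label = Some(vec!["N/A".to_string()]);
        }
    }
}

fn main() {
    let mut tasks = vec![ClassificationTask { task: "sentiment".to_string(), true_label: None }];
    ensure_true_label(&mut tasks);
    for t in &tasks {
        println!("{}: {:?}", t.task, t.true_label);
    }
}
```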

### Environment

- **`HF_TOKEN`**: access to private or gated Hub models.
- Cache and offline behavior follow [Hugging Face Hub](https://huggingface.co/docs/huggingface_hub/index) environment variables (`HF_HOME`, etc.); see upstream docs for the full list.

### Exit codes

- **0** — success.
- **Non-zero** — usage errors, I/O failures, model load failures, or inference errors.

### Examples

```bash
# Entities: JSONL in → JSONL out (multi-record; --batch-size 16 overrides the default of 8)
gliner2 entities --label company --label person --batch-size 16 docs.jsonl --output out.jsonl

# Classify with labels from a file (JSONL input)
gliner2 classify --task sentiment --labels-json labels.json tweets.jsonl

# Relations
gliner2 relations --relation works_for --relation located_in article.txt

# Structured JSON (structures file matches extract_json object shape)
gliner2 json --structures product_fields.json --text-split full product_blurb.txt

# Multitask: JSONL file, custom text field
gliner2 run --schema-file schema.json --text-field body --batch-size 4 docs.jsonl
```

Minimal multitask schema file (trimmed from fixtures):

```json
{
  "json_structures": [],
  "entities": { "company": "", "product": "" },
  "relations": [],
  "classifications": [
    {
      "task": "sentiment",
      "labels": ["positive", "negative", "neutral"],
      "multi_label": false,
      "cls_threshold": 0.5,
      "true_label": ["N/A"]
    }
  ]
}
```

## Python Interface (Not implemented yet)

A Python package wrapping this Rust implementation (`gliner2_rs`) is planned, *if* the Rust implementation can outperform the Python one; it is **not implemented yet** (this section is a placeholder).

```bash
# use your package manager of choice
uv add gliner2_rs
```

```python
from gliner2_rs import Gliner2

extractor = Gliner2.from_pretrained('fastino/gliner2-base-v1')

text = "Apple CEO Tim Cook announced iPhone 15 in Cupertino yesterday."
result = extractor.extract_entities(text, ["company", "person", "product", "location"])

print(result)
# {'entities': {'company': ['Apple'], 'person': ['Tim Cook'], 'product': ['iPhone 15'], 'location': ['Cupertino']}}
```