# Gliner2 Rust

This project implements the Gliner2 model in Rust, compatible with the original weights and outputs of the Python training code.
```sh
# and/or for a CLI utility
cargo install gliner2

# LibTorch encoder for the example binary:
cargo install gliner2 --features tch
# then:
gliner2 --backend tch   # or GLINER2_BACKEND=tch
```
## Recorded speed (comparison harness)
The harness/ scripts run the same release Rust binaries (harness_compare, harness_compare_mt on CPU) against the PyPI gliner2 package. Timing fields are wall-clock milliseconds from a single process: load_model_ms is one-time load; infer_ms is per-fixture forward work (entity harness sums all cases for the total row).
Reproduce (CPU vs CPU): from the repo root, with Hugging Face access for the default model:

```sh
# All three flows (entity + multitask + throughput), each with Rust Candle + Rust tch-rs + Python.
bash harness/run_compare_all.sh
# Optional: --candle-only (skip tch-rs / LibTorch), --update-readme (refresh comparison tables in this file)

# Or run steps separately (set GLINER2_BENCH_TCH=1 to include tch-rs):
bash harness/run_all.sh
bash harness/run_multitask.sh
bash harness/run_throughput.sh
```
Rust (tch-rs) timings in the tables: bash harness/run_compare_all.sh sets GLINER2_BENCH_TCH=1 so every step runs Candle, tch-rs, and Python. For run_all.sh / run_multitask.sh / run_throughput.sh alone, set GLINER2_BENCH_TCH=1 yourself when you want tch-rs. The harness rebuilds harness_compare / harness_throughput with --features tch-backend,download-libtorch, so torch-sys downloads a CPU LibTorch that matches the pinned tch crate (no system LibTorch required). Before running the release binaries, the scripts source harness/prepend_libtorch_ld_path.sh so the dynamic loader can find libtorch_cpu.so under target/release/build/torch-sys-*/out/.... Alternatively, install LibTorch yourself and set LIBTORCH / LD_LIBRARY_PATH; then build with tch-backend only (omit download-libtorch).
Entity/multitask compare vs tch: compare.py / compare_mt.py check Candle Rust vs Python for correctness. The tch JSON is used for extra timing columns only. On the current LibTorch encoder bridge, NER fixture outputs from --backend tch can be empty or otherwise diverge from Candle while wall-clock infer_ms is still meaningful. To run the full shell flow without failing on unrelated checks, use GLINER2_COMPARE_WARN_ONLY=1 with run_all.sh / run_multitask.sh when needed.
The shell wrappers call Python with CUDA_VISIBLE_DEVICES= and --device cpu so PyTorch does not use a discrete NVIDIA GPU and weights stay on CPU, matching the Rust side.
For apples-to-apples timing with the Rust single-forward path, Python uses **batch_size=1**: batch_extract_entities([text], …, batch_size=1) on the entity harness and batch_extract([text], schema, batch_size=1, …) on the multitask harness (instead of relying on extract / extract_entities defaults).
Reading ratios: for infer times, python/candle is (python infer_ms) / (rust Candle infer_ms), per case or for the total line. Values below 1 mean Python spent less time on that measure for these fixtures; values above 1 mean Python was slower. When tch-rs is included (default for run_compare_all.sh, or GLINER2_BENCH_TCH=1 for run_all.sh / run_multitask.sh alone), compare.py / compare_mt.py also print tch/candle and python/tch. The per-case entity table below lists both python/candle and python/tch (the latter is (python infer_ms) / (rust tch-rs infer_ms)). In the auto-generated tables from patch_readme.py, bold marks the lowest time in milliseconds in that row (load, sum, per-case, or throughput lane) and the highest samples/s in throughput rows; ties are all bolded.
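As a sanity check on the convention, a tiny standalone helper (not part of the crate) that reproduces the total-infer ratios from the entity table below:

```rust
/// Time ratio as reported by compare.py / compare_mt.py: other_ms / base_ms.
/// Below 1.0 the "other" side was faster; above 1.0 it was slower.
fn time_ratio(other_ms: f64, base_ms: f64) -> f64 {
    other_ms / base_ms
}
```

With the entity totals below (Candle 378.1 ms, tch-rs 151.7 ms, Python 249.0 ms), `time_ratio(249.0, 378.1)` ≈ 0.66 (py/cnd) and `time_ratio(249.0, 151.7)` ≈ 1.64 (py/tch).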
### CPU vs CPU (recorded)
Model: fastino/gliner2-base-v1. Recorded: 2026-04-05 (Linux x86_64, local run; numbers vary by machine and load). tch-rs infer_ms: LibTorch encoder path with download-libtorch + prepend_libtorch_ld_path.sh; see caveat above on NER outputs vs Candle.
Entity harness (harness/fixtures.json) — metadata and per-case infer times:
| | Rust (Candle) | Rust (tch-rs) | Python |
|---|---|---|---|
| device_note | cpu | cpu_libtorch† | cpu |
| load_model_ms | **262.0** | 1072.3 | 3422.5 |
| Sum of infer_ms over cases | 378.1 | **151.7** | 249.0 |
| Ratios (total infer) | — | tch/cnd 0.40× | py/cnd 0.66×; py/tch 1.64× |
† Expected device label for tch-rs harness JSON when LibTorch is used (run_compare_all.sh enables this by default; otherwise set GLINER2_BENCH_TCH=1).
| Case id | Candle infer_ms | tch-rs infer_ms | python infer_ms | python/candle | python/tch |
|---|---|---|---|---|---|
| microsoft_windows | 86.8 | **35.9** | 50.2 | 0.58× | 1.40× |
| steve_jobs | 94.5 | **38.8** | 72.6 | 0.77× | 1.87× |
| sundar_pichai | 96.3 | **36.9** | 59.7 | 0.62× | 1.62× |
| tim_cook_iphone | 100.6 | **40.0** | 66.4 | 0.66× | 1.66× |
Multitask harness (harness/fixtures_multitask.json) — single fixture entities_plus_sentiment:
| | Rust (Candle) | Rust (tch-rs) | Python |
|---|---|---|---|
| device_note | cpu | cpu_libtorch† | cpu |
| load_model_ms | **244.4** | 1091.5 | 3133.2 |
| Sum of infer_ms | 100.2 | **42.4** | 82.5 |
| Ratios (total infer) | — | tch/cnd 0.42× | py/cnd 0.82×; py/tch 1.95× |
These are short-fixture timings. Update the tables when you change the model, fixtures, or harness code in a way that affects performance.
### Throughput (local only; not in CI)
These benchmarks are not run in GitHub Actions (see .github/workflows/ci.yml). Run them on your machine when you need larger-sample timing.
The harness uses 64 samples by default, built by cycling texts from harness/fixtures.json. Every sample uses the same entity label list ["company", "person", "product", "location", "date"] so Rust batch_extract_entities and PyPI batch_extract_entities can process the full set. Sequential rows use 64× micro-batches of size 1 on both sides (Rust’s forward loop vs Python batch_extract_entities([t], …, batch_size=1)). Batched rows are timed at batch_size 8 and 64 (Rust --rust-batch-size and Python batch_extract_entities with the same batch sizes).
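The sample construction can be sketched as a pure function (illustrative only; the real harness code lives under harness/):

```rust
/// Build `n` benchmark samples by cycling through the fixture texts,
/// mirroring how the harness expands harness/fixtures.json to 64 samples.
fn cycle_samples(fixtures: &[&str], n: usize) -> Vec<String> {
    fixtures
        .iter()
        .cycle()                // repeat the fixture list endlessly
        .take(n)                // stop at the requested sample count
        .map(|s| s.to_string())
        .collect()
}
```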
Optional: bash harness/run_throughput.sh [fixtures.json] [rust_seq_out.json] [rust_batch_8_out.json] [rust_batch_64_out.json] [samples] [python_out.json]. The script runs harness/compare_throughput.py on the JSON outputs (sequential + batched batch sizes 8 and 64).
Rust JSON includes a backend field (candle or tch). For LibTorch encoder timing only, set GLINER2_THROUGHPUT_BACKEND=tch (builds with tch-backend,download-libtorch). For both Rust backends plus Python in one run, use GLINER2_BENCH_TCH=1 bash harness/run_throughput.sh. You can also pass --backend candle|tch directly to harness_throughput.
Recorded: 2026-04-05 (Linux x86_64, local run, CPU, CUDA_VISIBLE_DEVICES= + --device cpu on Python). warmup_full_passes=8 over all samples before each timed pass. harness/compare_throughput.py prints Candle vs tch vs Python (ratios: py/cnd, tch/cnd, py/tch).
Batched Rust runs use Rayon for parallel preprocessing and per-record decode. The encoder forward pass is a single batched tensor op; parallelism applies to the CPU-bound work around it.
| Lane | Candle infer_ms | Candle s/s | tch-rs infer_ms | tch-rs s/s | Python infer_ms | Python s/s | py/candle | py/tch |
|---|---|---|---|---|---|---|---|---|
| Sequential (batch_size 1) | 5702 | 11.22 | **3037** | **21.07** | 3475 | 18.42 | 0.61× | 1.14× |
| Batched (batch_size 8) | 3299 | 19.40 | **1395** | **45.87** | 1618 | 39.56 | 0.49× | 1.16× |
| Batched (batch_size 64) | 2801 | 22.85 | 1263 | 50.66 | **1237** | **51.75** | 0.44× | 0.98× |
Load times: Candle ~243 ms; tch ~1089 ms; Python ~2190 ms.
Notes:
- tch-rs is faster than Python on the sequential and batch_size 8 lanes (py/tch 1.14×–1.16×) and roughly on par at batch_size 64 (py/tch 0.98×). Both use LibTorch; tch-rs avoids Python interpreter overhead.
- Candle is roughly 2× slower than Python on batched workloads (py/candle 0.44×–0.49×); Candle's pure-Rust GEMM is the bottleneck. Rayon parallelism gives ~2× within Candle (with RAYON_NUM_THREADS=1, batched throughput drops to 5.27 samples/s).
- py/candle and py/tch are time ratios: (Python infer_ms) / (Rust infer_ms). Values below 1 mean Python was faster; above 1 mean Rust was faster.
Re-run bash harness/run_throughput.sh for Candle-only Rust, or GLINER2_BENCH_TCH=1 bash harness/run_throughput.sh to refresh all three lanes (bundled LibTorch via download-libtorch).
### GPU vs GPU (not recorded yet)
Fair comparison needs both implementations on the same device class (for example CUDA on the PyPI side and a GPU inference path in the Rust harness). That pairing is not wired into the harness yet, so no GPU numbers are published here.
| | Rust | Python |
|---|---|---|
| Device | — | — |
| load_model_ms | — | — |
| Total infer_ms | — | — |
| python/rust | — | — |
## Usage
Like the Python implementation, this crate supports a full extraction API. You load the model once, build a SchemaTransformer from the tokenizer, then call CandleExtractor (or TchExtractor) methods.
### Setup (load model + tokenizer)

```rust
use anyhow::Result;                                        // error type illustrative
use gliner2::{load_extractor, CandleExtractor, SchemaTransformer}; // module paths illustrative

fn setup() -> Result<(CandleExtractor, SchemaTransformer)> {
    // Loads weights + tokenizer from the Hub and builds the schema transformer
    // from the tokenizer (one-time cost; see load_model_ms above).
    let (extractor, transformer) = load_extractor("fastino/gliner2-base-v1")?;
    Ok((extractor, transformer))
}
```
### Entity extraction (extract_entities)

Same idea as Python extract_entities: pass label names; the returned serde_json::Value uses the formatted shape (entities → label → list of strings, when include_spans / include_confidence are false).

```rust
use gliner2::{load_extractor, ExtractOptions}; // paths illustrative

let (extractor, transformer) = load_extractor("fastino/gliner2-base-v1")?;
let text = "Apple CEO Tim Cook announced iPhone 15 in Cupertino.";
let entity_types = vec!["company", "person", "product", "location"];
let opts = ExtractOptions::default();
// Argument shape is illustrative; see the crate docs for the exact signature.
let out = extractor.extract_entities(&transformer, text, &entity_types, &opts)?;
// e.g. {"entities":{"company":["Apple"],"person":["Tim Cook"], ...}}

// Optional: character spans + confidence (richer JSON, closer to Python with flags on)
let opts_rich = ExtractOptions { include_spans: true, include_confidence: true, ..Default::default() };
let _out = extractor.extract_entities(&transformer, text, &entity_types, &opts_rich)?;
```
### Text classification (classify_text)

One classification task per call. labels is a JSON array of class names, or an object mapping label → description (like Python).

```rust
use gliner2::{load_extractor, ExtractOptions}; // paths illustrative
use serde_json::json;

let (extractor, transformer) = load_extractor("fastino/gliner2-base-v1")?;
let text = "The new phone is amazing and well worth the price.";

// Single-label: scalar string under the task name when format_results is true
let opts = ExtractOptions::default();
// Argument shape is illustrative; see the crate docs for the exact signature.
let out = extractor.classify_text(&transformer, text, "sentiment",
    &json!(["positive", "negative", "neutral"]), &opts)?;
// e.g. {"sentiment":"positive"}

// Labels with optional descriptions (mirrors Python dict form)
let out2 = extractor.classify_text(&transformer, text, "sentiment",
    &json!({"positive": "favorable review", "negative": "unfavorable review"}), &opts)?;
```
### Relation extraction (extract_relations)

Pass relation names as a JSON array of strings, or a JSON object (name → description / config), matching Python relations(...).

```rust
use gliner2::{load_extractor, ExtractOptions}; // paths illustrative
use serde_json::json;

let (extractor, transformer) = load_extractor("fastino/gliner2-base-v1")?;
let text = "Tim Cook works for Apple, based in Cupertino.";
let opts = ExtractOptions::default();

// List of relation types → formatted results under "relation_extraction"
// (argument shape is illustrative; see the crate docs for the exact signature)
let out = extractor.extract_relations(&transformer, text,
    &json!(["works_for", "located_in"]), &opts)?;
// e.g. {"relation_extraction":{"works_for":[["Tim Cook","Apple"]],"located_in":[["Apple","Cupertino"]]}}

// Dict form (descriptions stored like Python; inference uses relation names)
let _out2 = extractor.extract_relations(&transformer, text,
    &json!({"works_for": "employment relation", "located_in": "location relation"}), &opts)?;
```
### Structured JSON (extract_json)

Field specs use the same string syntax as Python extract_json (name::dtype::[choices]::description).

```rust
use gliner2::{load_extractor, ExtractOptions}; // paths illustrative
use serde_json::json;

let (extractor, transformer) = load_extractor("fastino/gliner2-base-v1")?;
let text = "iPhone 15 Pro costs $999 and is in stock.";
// Structure contents are illustrative; specs follow name::dtype::[choices]::description.
let structures = json!({
    "product": ["name::str", "price::str", "availability::str::[in stock,out of stock]"]
});
let out = extractor.extract_json(&transformer, text, &structures, &ExtractOptions::default())?;
```
### Multi-task builder (create_schema + extract)

Combines entities, classifications, relations, and structured fields in one encoder pass. Uses the same (extractor, transformer) and text as in the setup section.

```rust
use gliner2::{create_schema, ExtractOptions}; // paths illustrative
use serde_json::json;

// Method arguments are illustrative; see the crate docs for exact signatures.
let mut s = create_schema();
s.entities(json!(["company", "person", "product"]));
s.classification_simple("sentiment", &["positive", "negative", "neutral"]);
s.classification_simple("topic", &["tech", "business", "other"]);
s.relations(json!(["works_for", "located_in"]));
let (schema_val, meta) = s.build();
let opts = ExtractOptions::default();
let out = extractor.extract(&transformer, text, &schema_val, &meta, &opts)?;
```
### Batch inference
The crate mirrors Python’s batched entry points: records are preprocessed in parallel (Rayon), padded into chunks of at most ExtractOptions::batch_size (default 8), the encoder runs once per chunk, span representations are computed with **compute_span_rep_batched** when needed, then each row is decoded in parallel (Rayon). Results are returned in input order. Set RAYON_NUM_THREADS to control the thread pool size.
Set batch_size on ExtractOptions for any batch method (it only affects chunking, not single-sample extract_* calls).
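The chunking rule can be sketched as a pure function (illustrative; the real implementation also pads each chunk before the encoder call):

```rust
/// Split `n` records into (start, end) chunks of at most `batch_size`,
/// preserving input order — the shape of each encoder call.
fn chunk_bounds(n: usize, batch_size: usize) -> Vec<(usize, usize)> {
    assert!(batch_size > 0, "batch_size must be positive");
    (0..n)
        .step_by(batch_size)
        .map(|start| (start, usize::min(start + batch_size, n)))
        .collect()
}
```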
#### Shared schema (one schema for every text)

Use the CandleExtractor helpers; they build the same schema as the single-sample methods and call batch_extract internally.

```rust
use gliner2::{load_extractor, ExtractOptions}; // paths illustrative
use serde_json::json;

let (extractor, transformer) = load_extractor("fastino/gliner2-base-v1")?;
let texts: Vec<String> = vec![
    "Apple CEO Tim Cook announced iPhone 15 in Cupertino.".to_string(),
    "Sundar Pichai announced new features at Google I/O.".to_string(),
];
let entity_types: Vec<String> = ["company", "person", "product"]
    .into_iter()
    .map(String::from)
    .collect();
let mut opts = ExtractOptions::default();
opts.batch_size = 16;
// Argument shapes are illustrative; see the crate docs for exact signatures.
let results = extractor.batch_extract_entities(&transformer, &texts, &entity_types, &opts)?;
// Vec<serde_json::Value>, one formatted result per input line

let cls = extractor.batch_classify_text(&transformer, &texts, "sentiment",
    &json!(["positive", "negative", "neutral"]), &opts)?;
let rels = extractor.batch_extract_relations(&transformer, &texts,
    &json!(["works_for"]), &opts)?;
let structures = json!({ "product": ["name::str", "price::str"] });
let json_results = extractor.batch_extract_json(&transformer, &texts, &structures, &opts)?;
```
#### Full schema + metadata (batch_extract)

For the same multitask flow as [extract](#multi-task-builder-create_schema--extract), build (schema_val, meta) once and run **batch_extract** with **BatchSchemaMode::Shared**, or pass per-row schemas and metadata with **BatchSchemaMode::PerSample**.

```rust
use gliner2::{batch_extract, create_schema, load_extractor, BatchSchemaMode, ExtractOptions}; // paths illustrative
use gliner2::infer_metadata_from_schema;
use serde_json::{json, Value};

let (extractor, transformer) = load_extractor("fastino/gliner2-base-v1")?;
let texts: Vec<String> = vec![
    "Apple CEO Tim Cook announced iPhone 15 in Cupertino.".to_string(),
    "The new phone is amazing and well worth the price.".to_string(),
];

// Option A — shared multitask schema from the builder
// (call shapes are illustrative; see the crate docs for exact signatures)
let mut s = create_schema();
s.entities(json!(["company", "person", "product"]));
s.classification_simple("sentiment", &["positive", "negative", "neutral"]);
let (schema_val, meta) = s.build();
let opts = ExtractOptions { batch_size: 8, ..Default::default() };
let out_shared = batch_extract(&extractor, &transformer, &texts,
    BatchSchemaMode::Shared(&schema_val, &meta), &opts)?;

// Option B — per-text JSON schemas (e.g. from config); metadata from infer_metadata_from_schema
let schema_a: Value = json!({ "entities": ["company", "person"] });
let schema_b: Value = json!({ "classifications": [{ "task": "sentiment",
    "labels": ["positive", "negative"] }] });
let schemas = vec![schema_a, schema_b];
let metas: Vec<_> = schemas.iter().map(infer_metadata_from_schema).collect();
let out_per = batch_extract(&extractor, &transformer, &texts,
    BatchSchemaMode::PerSample(&schemas, &metas), &opts)?;
```
For a shared schema you can also call **extractor.batch_extract(&transformer, &texts, &schema_val, &meta, &opts)** instead of the free function.
Lower-level reuse: after **transform_extract** you can run **extract_from_preprocessed** on one sample if you already have encoder outputs and span tensors; see [src/extract.rs](src/extract.rs).
## Development

### Pre-commit
Git hooks run the same Rust checks as CI (cargo fmt, cargo clippy on the workspace) plus Ruff on first-party Python (for example under harness/). Paths under reference/ and .tickets/ are excluded from hooks.
Prerequisites: stable Rust with rustfmt and clippy (for example rustup component add rustfmt clippy).
Install pre-commit (either is fine):

```sh
pipx install pre-commit
# or
pip install --user pre-commit
```

From the repository root, install the hooks once:

```sh
pre-commit install
```

Optionally validate the whole tree:

```sh
pre-commit run --all-files
```
If you must commit before fixing Clippy, you can skip that hook: SKIP=cargo-clippy git commit (use sparingly; CI still enforces warnings as errors).
## CLI specification

The command-line interface gliner2 offers another way to run inference over a handful of input types.
Install the binary with cargo install gliner2. Inference flags mirror ExtractOptions (threshold, format_results, include_confidence, include_spans, max_len).
### Command overview

```mermaid
flowchart LR
  subgraph sub [Subcommands]
    entities[entities]
    classify[classify]
    relations[relations]
    jsonCmd[json]
    run[run]
  end
  gliner2[gliner2] --> entities
  gliner2 --> classify
  gliner2 --> relations
  gliner2 --> jsonCmd
  gliner2 --> run
```
| Subcommand | Purpose | Library analogue |
|---|---|---|
| `gliner2 entities` | Named-entity extraction | `CandleExtractor::extract_entities`, `Schema::entities` |
| `gliner2 classify` | Text classification (single- or multi-label) | `CandleExtractor::classify_text`, `Schema::classification` |
| `gliner2 relations` | Relation extraction | `CandleExtractor::extract_relations`, `Schema::relations` |
| `gliner2 json` | Structured JSON / field extraction | `CandleExtractor::extract_json`, `Schema::extract_json_structures` |
| `gliner2 run` | Multitask: full engine schema in one pass | `CandleExtractor::extract` |
Top-level: gliner2 --help, gliner2 --version, and gliner2 <subcommand> --help.
### Global options
These apply to every subcommand unless stated otherwise.
| Flag | Description |
|---|---|
| `--model <HF_REPO_ID>` | Hugging Face model id (default: `fastino/gliner2-base-v1`, same as the `harness/` scripts). |
| `--model-dir <DIR>` | Offline layout: `config.json`, `encoder_config/config.json`, `tokenizer.json`, `model.safetensors` (matches `ModelFiles` from `download_model`). |
| `--config`, `--encoder-config`, `--tokenizer`, `--weights` | Explicit paths instead of `--model` / `--model-dir`. |
| `-q`, `-v` / `--log-level` | Quiet / verbose logging (exact mapping is implementation-defined). |
Use either Hub resolution (--model) or a local layout (--model-dir or explicit file flags), not a conflicting mix; if both are given, the implementation should reject the invocation with a clear error.
Device and dtype are intentionally unspecified here until the library exposes them; do not document GPU flags until they exist.
### Shared inference flags

| Flag | Maps to | Default |
|---|---|---|
| `--threshold <float>` | `ExtractOptions::threshold` | 0.5 |
| `--max-len <N>` | `ExtractOptions::max_len` | unset |
| `--include-confidence` | `include_confidence` | off |
| `--include-spans` | `include_spans` | off |
| `--raw` / `--no-format-results` | `format_results = false` | formatted output (true) |
### Batching
The library implements tensor batch inference (CandleExtractor::batch_extract*, ExtractOptions::batch_size); see Batch inference above. The CLI is not implemented yet; the contract below assumes the binary will drive those batched APIs for any input that produces more than one logical record (for example multi-line JSONL or plain text with --text-split line and multiple non-empty lines).
| Flag | Description |
|---|---|
| `--batch-size <N>` | Maximum records per model batch. Default: 8 (implementation may choose a lower value on constrained devices, but must document any deviation). |
| `--batch-size 1` | Effectively sequential inference (debugging, peak memory limits, or until batched paths are stable). |
Single-record inputs (one JSONL line, one JSON object, or --text-split full over an entire file) form a single batch of size 1.
Ordering: Output lines must follow the same order as input records, even when flushing internal batches.
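The ordering contract can be sketched as a pure function (hypothetical helper, not CLI code): results from each flushed batch are appended in input order.

```rust
/// Apply `f` to records in batches of at most `batch_size`, flattening
/// results back into input order — the CLI's output-ordering contract.
fn run_batched<T, R>(records: &[T], batch_size: usize, f: impl Fn(&[T]) -> Vec<R>) -> Vec<R> {
    let mut out = Vec::with_capacity(records.len());
    for chunk in records.chunks(batch_size.max(1)) {
        out.extend(f(chunk)); // one model batch per chunk, in order
    }
    out
}
```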
### Input and output
Input: final positional argument INPUT, or - for stdin.
| Flag | Description |
|---|---|
| `--text-field <KEY>` | Field containing document text in JSON / JSONL records (default: `text`). |
| `--id-field <KEY>` | Field to pass through as record id when present (default: `id`). |
| `--text-split <MODE>` | Plain text: `full` (whole file) or `line` (one record per non-empty line); `sentence` / `char-chunk` reserved. Default: `full`. |
| Format | Detection / notes |
|---|---|
| JSONL | One JSON object per line. Text from `--text-field` (default: `text`). If the input object contains the id key named by `--id-field` (default: `id`), copy that field through to the output object. |
| JSON | A single object using the same field convention. For many records, use JSONL or preprocess (for example with `jq`). |
| Plain text | Controlled by `--text-split`: `full` (default for `.txt`) — entire file is one record; `line` — each non-empty line is one record (multiple lines ⇒ batching). `sentence` and `char-chunk` are reserved for a future release (segmentation semantics TBD). |
Output: JSONL to stdout by default. --output <PATH> / -o <PATH> (use - for stdout). Optional --pretty: pretty-printed JSON when the implementation can buffer a single record or full result (for example one JSON object input or explicit single-line mode).
Format inference: From INPUT’s path suffix when possible: .jsonl → JSONL, .json → single JSON object, .txt (or other) → plain text with --text-split. For stdin (-), default input format is JSONL (one object per line).
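That inference rule can be sketched as (illustrative; the shipped binary may differ in edge cases):

```rust
#[derive(Debug, PartialEq)]
enum InputFormat {
    Jsonl,
    Json,
    Text,
}

/// Infer the input format from the path suffix; stdin ("-") defaults to JSONL.
fn infer_format(path: &str) -> InputFormat {
    if path == "-" {
        return InputFormat::Jsonl; // stdin: one object per line
    }
    match path.rsplit('.').next() {
        Some("jsonl") => InputFormat::Jsonl,
        Some("json") => InputFormat::Json,
        _ => InputFormat::Text, // .txt or anything else: plain text with --text-split
    }
}
```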
### Output record shape
Each output line is one JSON object, for example:
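A hypothetical record (field names follow the conventions above; whether a missing id is omitted or emitted as null is implementation-defined):

```json
{"id": "doc-1", "result": {"entities": {"company": ["Apple"], "person": ["Tim Cook"]}}}
```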
result matches Python / Rust **format_results** output for the task mix (entities, relation_extraction, classification keys, structured parents, etc.), consistent with the harness direction in harness/compare.py and multitask fixtures. If the input record has no id, omit id from the output (or use null; implementations should pick one behavior and document it).
### Subcommands

#### gliner2 entities
| Flag | Description |
|---|---|
| `--label <NAME>` | Repeatable entity type name. |
| `--labels-json <PATH>` | JSON array of names, or the object form accepted by `Schema::entities` (name → description string or `{ "description", "dtype", "threshold" }`). |
Precedence: If any --label is given and --labels-json is given, exit with a usage error (do not merge).
#### gliner2 classify
| Flag | Description |
|---|---|
| `--task <NAME>` | Required classification task name (JSON key in formatted output). |
| `--label <NAME>` | Repeatable class label. |
| `--labels-json <PATH>` | Array of labels, or object label → description (Python-style). |
| `--multi-label` | Multi-label classification (`Schema::classification` with `multi_label: true`). |
| `--cls-threshold <float>` | Per-task classifier threshold (default 0.5). |
Same rule: do not combine --label with --labels-json.
#### gliner2 relations
| Flag | Description |
|---|---|
| `--relation <NAME>` | Repeatable relation type name. |
| `--relations-json <PATH>` | JSON array of names or the object form accepted by `Schema::relations`. |
Do not pass both repeatable --relation and --relations-json.
#### gliner2 json

| Flag | Description |
|---|---|
| `--structures <PATH>` | JSON file: object mapping structure name → array of field specs. |
| `--structures-json '<OBJECT>'` | Same object inline. |
Field specs use the same grammar as Structured JSON (extract_json) above: strings like name::dtype::[choices]::description or JSON objects parsed by parse_field_spec. Do not pass both --structures and --structures-json.
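To make the `::` grammar concrete, a hypothetical re-implementation sketch (not the crate's parse_field_spec; trailing parts are optional):

```rust
/// Parsed form of a field spec string: name::dtype::[choices]::description.
#[derive(Debug, PartialEq)]
struct FieldSpec {
    name: String,
    dtype: String,
    choices: Vec<String>,
    description: String,
}

/// Minimal parse of the string grammar; missing parts fall back to defaults.
fn parse_spec(s: &str) -> FieldSpec {
    let parts: Vec<&str> = s.split("::").collect();
    let choices: Vec<String> = parts
        .get(2)
        .map(|c| c.trim_start_matches('[').trim_end_matches(']'))
        .filter(|c| !c.is_empty())
        .map(|c| c.split(',').map(|x| x.trim().to_string()).collect())
        .unwrap_or_default();
    FieldSpec {
        name: parts.first().unwrap_or(&"").to_string(),
        dtype: parts.get(1).unwrap_or(&"str").to_string(),
        choices,
        description: parts.get(3).unwrap_or(&"").to_string(),
    }
}
```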
#### gliner2 run

| Flag | Description |
|---|---|
| `--schema-file <PATH>` | Required. Full engine multitask schema (same shape as Python `GLiNER2.extract(text, schema)`). See `harness/fixtures_multitask.json` for a minimal example: `entities`, `classifications`, `relations`, `json_structures`, optional `entity_descriptions` / `json_descriptions`. |
Each entry in classifications should include "true_label": ["N/A"] when mirroring Python; the harness script harness/run_multitask_python.py sets this if missing.
### Environment

- **HF_TOKEN** — access to private or gated Hub models.
- Cache and offline behavior follow Hugging Face Hub environment variables (HF_HOME, etc.); see upstream docs for the full list.
### Exit codes
- 0 — success.
- Non-zero — usage errors, I/O failures, model load failures, or inference errors.
### Examples
```sh
# File paths below are illustrative.

# Entities: JSONL in → JSONL out (multi-record; default --batch-size 8 unless overridden)
gliner2 entities --label company --label person docs.jsonl

# Classify with labels from a file (JSONL input)
gliner2 classify --task sentiment --labels-json labels.json docs.jsonl

# Relations
gliner2 relations --relation works_for --relation located_in docs.jsonl

# Structured JSON (structures file matches extract_json object shape)
gliner2 json --structures structures.json docs.jsonl

# Multitask: JSONL file, custom text field
gliner2 run --schema-file schema.json --text-field body docs.jsonl
```
Minimal multitask schema file (trimmed from fixtures):
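The fixture itself is not reproduced here; a hypothetical schema of this general shape (key names per the `--schema-file` description above; check harness/fixtures_multitask.json for the authoritative example):

```json
{
  "entities": ["company", "person"],
  "classifications": [
    {"task": "sentiment", "labels": ["positive", "negative", "neutral"], "true_label": ["N/A"]}
  ],
  "relations": ["works_for"],
  "json_structures": {}
}
```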
## Python interface (not implemented yet)

A Python package wrapping this Rust implementation (gliner2_rs) is planned if Rust performance can be made to beat Python; it is not implemented yet (this section is a placeholder).

```python
# use your package manager of choice (package not published yet)

# Hypothetical API sketch — names are placeholders until the binding exists.
from gliner2_rs import GLiNER2

model = GLiNER2.from_pretrained("fastino/gliner2-base-v1")
text = "Apple CEO Tim Cook announced iPhone 15 in Cupertino."
entities = model.extract_entities(text, ["company", "person", "product", "location"])
# {'entities': {'company': ['Apple'], 'person': ['Tim Cook'], 'product': ['iPhone 15'], 'location': ['Cupertino']}}
```