gliner2 0.1.0

Gliner2 Rust

Rust implementation of GLiNER2, compatible with the upstream weights and the output of the Python training pipeline.

cargo add gliner2
# and/or, for a CLI utility
cargo install gliner2

Development

Pre-commit

Git hooks run the same Rust checks as CI (cargo fmt, cargo clippy on the workspace) plus Ruff on first-party Python (for example under harness/). Paths under reference/ and .tickets/ are excluded from hooks.

Prerequisites: stable Rust with rustfmt and clippy (for example rustup component add rustfmt clippy).

Install pre-commit (for example via uv):

uv tool install pre-commit

From the repository root, install the hooks once:

pre-commit install

Optionally validate the whole tree:

pre-commit run --all-files

If you must commit before fixing Clippy, you can skip that hook: SKIP=cargo-clippy git commit (use sparingly; CI still enforces warnings as errors).

Recorded speed (comparison harness)

The [harness/](harness/) scripts run the same release Rust binaries (harness_compare, harness_compare_mt on CPU) against the PyPI gliner2 package. Timing fields are wall-clock milliseconds from a single process: load_model_ms is one-time load; infer_ms is per-fixture forward work (entity harness sums all cases for the total row).

Reproduce (CPU vs CPU): from the repo root, with Hugging Face access for the default model:

uv sync --locked --directory harness
bash harness/run_all.sh
bash harness/run_multitask.sh

The shell wrappers call Python with CUDA_VISIBLE_DEVICES= and --device cpu so PyTorch does not use a discrete NVIDIA GPU and weights stay on CPU, matching the Rust side.

For apples-to-apples timing with the Rust single-forward path, Python uses **batch_size=1**: batch_extract_entities([text], …, batch_size=1) on the entity harness and batch_extract([text], schema, batch_size=1, …) on the multitask harness (instead of relying on extract / extract_entities defaults).

Reading the python/rust column: for infer times this is (python infer_ms) / (rust infer_ms), computed per case or for the total line. Values below 1 mean Python spent less time on that measure for these fixtures; values above 1 mean Python was slower.
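As a worked example (illustrative only), the entity-harness totals below give:

```rust
fn main() {
    // Totals from the CPU vs CPU entity harness table.
    let python_infer_ms = 358.7_f64;
    let rust_infer_ms = 569.9_f64;
    // Below 1.0: Python spent less wall-clock time on these fixtures.
    let ratio = python_infer_ms / rust_infer_ms;
    println!("python/rust = {ratio:.3}x"); // prints "python/rust = 0.629x"
}
```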

CPU vs CPU (recorded)

Model: fastino/gliner2-base-v1. Recorded: 2026-04-05 (Linux x86_64, local run; numbers vary by machine and load).

Entity harness ([harness/fixtures.json](harness/fixtures.json)) — metadata and per-case infer times:

|                            | Rust  | Python |
|----------------------------|-------|--------|
| device_note                | cpu   | cpu    |
| load_model_ms              | 445.0 | 3874.4 |
| Sum of infer_ms over cases | 569.9 | 358.7  |
| python/rust (total infer)  |       | 0.629× |

| Case id           | rust infer_ms | python infer_ms | python/rust |
|-------------------|---------------|-----------------|-------------|
| steve_jobs        | 140.0         | 118.9           | 0.849×      |
| tim_cook_iphone   | 151.1         | 85.6            | 0.566×      |
| sundar_pichai     | 144.3         | 78.8            | 0.546×      |
| microsoft_windows | 134.4         | 75.4            | 0.561×      |

Multitask harness ([harness/fixtures_multitask.json](harness/fixtures_multitask.json)) — single fixture entities_plus_sentiment:

|                           | Rust  | Python |
|---------------------------|-------|--------|
| device_note               | cpu   | cpu    |
| load_model_ms             | 409.5 | 4002.4 |
| Sum of infer_ms           | 157.5 | 113.9  |
| python/rust (total infer) |       | 0.724× |

These are short-fixture timings. Update the tables when you change the model, fixtures, or harness code in a way that affects performance.

Throughput (local only; not in CI)

These benchmarks are not run in GitHub Actions (see [.github/workflows/ci.yml](.github/workflows/ci.yml)). Run them on your machine when you need larger-sample timing.

The harness uses 64 samples by default, built by cycling texts from [harness/fixtures.json](harness/fixtures.json). Every sample uses the same entity label list ["company", "person", "product", "location", "date"] so Rust [batch_extract_entities](src/extract.rs) and PyPI batch_extract_entities can process the full set with **batch_size=64**. Sequential rows use 64× micro-batches of size 1 on both sides (Rust’s forward loop vs Python batch_extract_entities([t], …, batch_size=1)). Batched rows use one logical batch of 64 on each side.
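The sample construction described above can be sketched as follows (illustrative only, not the harness code; the fixture texts here are abbreviated stand-ins):

```rust
// Build n samples by cycling the fixture texts, as the throughput harness does.
fn build_samples(fixture_texts: &[&str], n: usize) -> Vec<String> {
    fixture_texts.iter().cycle().take(n).map(|t| t.to_string()).collect()
}

fn main() {
    let fixtures = [
        "Apple CEO Tim Cook announced iPhone 15.",
        "Google unveiled Gemini in Mountain View.",
    ];
    let samples = build_samples(&fixtures, 64);
    // Every sample pairs with the same label list, so one batched call covers the set.
    let labels = ["company", "person", "product", "location", "date"];
    println!("{} samples x {} labels", samples.len(), labels.len());
}
```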

uv sync --locked --directory harness
bash harness/run_throughput.sh

Optional: bash harness/run_throughput.sh [fixtures.json] [rust_seq_out.json] [rust_batch_out.json] [samples]. The script runs [harness/compare_throughput.py](harness/compare_throughput.py) on the three JSON outputs.

Recorded: 2026-04-05 (Linux x86_64, CPU, CUDA_VISIBLE_DEVICES= + --device cpu on Python). warmup_full_passes=2 over all samples before each timed pass.

| Lane                             | total_infer_ms (64 samples) | samples/s | python/rust (infer) |
|----------------------------------|-----------------------------|-----------|---------------------|
| Rust sequential (batch_size 1)   | 8813.2                      | 7.26      |                     |
| Python sequential (batch_size 1) | 4843.0                      | 13.22     | 0.550×              |
| Rust batched (batch_size 64)     | 6794.2                      | 9.42      |                     |
| Python batched (batch_size 64)   | 1650.6                      | 38.78     | 0.243×              |

Load times from that run: Rust sequential ~492 ms, Rust batched ~467 ms, Python ~2613 ms.

Re-run bash harness/run_throughput.sh to refresh; the script prints the same layout via [harness/compare_throughput.py](harness/compare_throughput.py).

GPU vs GPU (not recorded yet)

Fair comparison needs both implementations on the same device class (for example CUDA on the PyPI side and a GPU inference path in the Rust harness). That pairing is not wired into the harness yet, so no GPU numbers are published here.

|                | Rust | Python |
|----------------|------|--------|
| Device         |      |        |
| load_model_ms  |      |        |
| Total infer_ms |      |        |
| python/rust    |      |        |

Usage

Like the Python implementation, this crate supports a full extraction API. You load the model once, build a SchemaTransformer from the tokenizer, then call Extractor methods.

Setup (load model + tokenizer)

use anyhow::Result;
use candle_core::Device;
use candle_nn::VarBuilder;
use candle_transformers::models::debertav2::Config as DebertaConfig;
use gliner2::config::{download_model, ExtractorConfig};
use gliner2::{Extractor, SchemaTransformer};

fn load_extractor(model_id: &str) -> Result<(Extractor, SchemaTransformer)> {
    let files = download_model(model_id)?;
    let device = Device::Cpu;
    let dtype = candle_core::DType::F32;

    let config: ExtractorConfig = serde_json::from_str(&std::fs::read_to_string(&files.config)?)?;
    let mut encoder_config: DebertaConfig =
        serde_json::from_str(&std::fs::read_to_string(&files.encoder_config)?)?;
    let transformer = SchemaTransformer::new(files.tokenizer.to_str().unwrap())?;
    encoder_config.vocab_size = transformer.tokenizer.get_vocab_size(true);

    let vb = unsafe { VarBuilder::from_mmaped_safetensors(&[files.weights], dtype, &device)? };
    let extractor = Extractor::load(config, encoder_config, vb)?;
    Ok((extractor, transformer))
}

Entity extraction (extract_entities)

Same idea as Python extract_entities: pass label names; the returned serde_json::Value uses the formatted shape (entities → label → list of strings, when include_spans / include_confidence are false).

use gliner2::ExtractOptions;

let (extractor, transformer) = load_extractor("fastino/gliner2-base-v1")?;
let text = "Apple CEO Tim Cook announced iPhone 15 in Cupertino.";

let entity_types = vec![
    "company".to_string(),
    "person".to_string(),
    "product".to_string(),
    "location".to_string(),
];

let opts = ExtractOptions::default();
let out = extractor.extract_entities(&transformer, text, &entity_types, &opts)?;
// e.g. {"entities":{"company":["Apple"],"person":["Tim Cook"], ...}}

// Optional: character spans + confidence (richer JSON, closer to Python with flags on)
let opts_rich = ExtractOptions {
    include_confidence: true,
    include_spans: true,
    ..Default::default()
};
let _out = extractor.extract_entities(&transformer, text, &entity_types, &opts_rich)?;

Text classification (classify_text)

One classification task per call. labels is a JSON array of class names, or an object mapping label → description (like Python).

use gliner2::ExtractOptions;
use serde_json::json;

let (extractor, transformer) = load_extractor("fastino/gliner2-base-v1")?;
let text = "The new phone is amazing and well worth the price.";

// Single-label: scalar string under the task name when format_results is true
let opts = ExtractOptions::default();
let out = extractor.classify_text(
    &transformer,
    text,
    "sentiment",
    json!(["positive", "negative", "neutral"]),
    &opts,
)?;
// e.g. {"sentiment":"positive"}

// Labels with optional descriptions (mirrors Python dict form)
let out2 = extractor.classify_text(
    &transformer,
    text,
    "topic",
    json!({
        "technology": "Tech products and software",
        "business": "Corporate or market news",
        "sports": "Athletics and games"
    }),
    &opts,
)?;

Relation extraction (extract_relations)

Pass relation names as a JSON array of strings, or a JSON object (name → description / config), matching Python relations(...).

use gliner2::ExtractOptions;
use serde_json::json;

let (extractor, transformer) = load_extractor("fastino/gliner2-base-v1")?;
let text = "Tim Cook works for Apple, based in Cupertino.";

let opts = ExtractOptions::default();

// List of relation types → formatted results under "relation_extraction"
let out = extractor.extract_relations(
    &transformer,
    text,
    json!(["works_for", "located_in"]),
    &opts,
)?;
// e.g. {"relation_extraction":{"works_for":[["Tim Cook","Apple"]],"located_in":[["Apple","Cupertino"]]}}

// Dict form (descriptions stored like Python; inference uses relation names)
let _out2 = extractor.extract_relations(
    &transformer,
    text,
    json!({
        "works_for": "Employment between person and organization",
        "founded": "Founder relationship"
    }),
    &opts,
)?;

Structured JSON (extract_json)

Field specs use the same string syntax as Python extract_json (name::dtype::[choices]::description).

use gliner2::ExtractOptions;
use serde_json::json;

let (extractor, transformer) = load_extractor("fastino/gliner2-base-v1")?;
let text = "iPhone 15 Pro costs $999 and is in stock.";

let structures = json!({
    "product_info": [
        "name::str",
        "price::str",
        "features::list",
        "availability::str::[in_stock|pre_order|sold_out]"
    ]
});
let out = extractor.extract_json(
    &transformer,
    text,
    &structures,
    &ExtractOptions::default(),
)?;
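The string grammar can be illustrated with a toy splitter (illustrative only; the crate's actual parser is [parse_field_spec](src/schema.rs)):

```rust
// Split a "name::dtype::[choices]::description" spec into its parts.
// A part wrapped in [...] is a |-separated choice list; any other trailing
// part is treated as the description.
fn split_spec(spec: &str) -> (String, Option<String>, Vec<String>, Option<String>) {
    let mut parts = spec.split("::");
    let name = parts.next().unwrap_or_default().to_string();
    let dtype = parts.next().map(str::to_string);
    let mut choices = Vec::new();
    let mut description = None;
    for part in parts {
        if part.starts_with('[') && part.ends_with(']') {
            choices = part[1..part.len() - 1].split('|').map(str::to_string).collect();
        } else {
            description = Some(part.to_string());
        }
    }
    (name, dtype, choices, description)
}

fn main() {
    let (name, dtype, choices, _) =
        split_spec("availability::str::[in_stock|pre_order|sold_out]");
    assert_eq!(name, "availability");
    assert_eq!(dtype.as_deref(), Some("str"));
    assert_eq!(choices, ["in_stock", "pre_order", "sold_out"]);
}
```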

Multi-task builder (create_schema + extract)

Combines entities, classifications, relations, and structured fields in one encoder pass. Uses the same (extractor, transformer) and text as in the setup section.

use gliner2::{create_schema, ExtractOptions, ValueDtype};
use serde_json::json;

let mut s = create_schema();
s.entities(json!({
    "person": "Names of people",
    "company": "Organization names",
    "product": "Products or offerings",
}));
s.classification_simple("sentiment", json!(["positive", "negative", "neutral"]));
s.classification_simple("category", json!(["technology", "business", "finance", "healthcare"]));
s.relations(json!(["works_for", "founded", "located_in"]));
{
    let _ = s.structure("product_info")
        .field_str("name")
        .field_str("price")
        .field_list("features")
        .field_choices(
            "availability",
            vec![
                "in_stock".into(),
                "pre_order".into(),
                "sold_out".into(),
            ],
            ValueDtype::Str,
        );
}
let (schema_val, meta) = s.build();
let opts = ExtractOptions::default();
let out = extractor.extract(&transformer, text, &schema_val, &meta, &opts)?;

Batch inference

The crate mirrors Python’s batched entry points: records are preprocessed, padded into chunks of at most ExtractOptions::batch_size (default 8), the encoder runs once per chunk, span representations are computed with compute_span_rep_batched when needed, then each row is decoded. Results are returned in input order.

Set batch_size on ExtractOptions for any batch method (it only affects chunking, not single-sample extract_* calls).
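The chunking rule can be sketched as (illustrative, not the crate's code):

```rust
// Records are split into chunks of at most batch_size; the encoder runs once
// per chunk, and decoded results are emitted in input order.
fn chunk_counts(n_records: usize, batch_size: usize) -> Vec<usize> {
    (0..n_records)
        .collect::<Vec<_>>()
        .chunks(batch_size)
        .map(|c| c.len())
        .collect()
}

fn main() {
    // 20 records with the default batch_size of 8 -> three encoder passes: 8, 8, 4.
    assert_eq!(chunk_counts(20, 8), vec![8, 8, 4]);
}
```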

Shared schema (one schema for every text)

Use the Extractor helpers; they build the same schema as the single-sample methods and call batch_extract internally.

use gliner2::ExtractOptions;
use serde_json::json;

let (extractor, transformer) = load_extractor("fastino/gliner2-base-v1")?;
let texts: Vec<String> = vec![
    "Apple CEO Tim Cook announced iPhone 15.".into(),
    "Google unveiled Gemini in Mountain View.".into(),
];

let entity_types: Vec<String> = ["company", "person", "product", "location"]
    .into_iter()
    .map(String::from)
    .collect();

let mut opts = ExtractOptions::default();
opts.batch_size = 16;

let results = extractor.batch_extract_entities(&transformer, &texts, &entity_types, &opts)?;
// Vec<serde_json::Value>, one formatted result per input line

let cls = extractor.batch_classify_text(
    &transformer,
    &texts,
    "sentiment",
    json!(["positive", "negative", "neutral"]),
    &opts,
)?;

let rels = extractor.batch_extract_relations(
    &transformer,
    &texts,
    json!(["works_for", "located_in"]),
    &opts,
)?;

let structures = json!({
    "product_info": ["name::str", "price::str"]
});
let json_results = extractor.batch_extract_json(&transformer, &texts, &structures, &opts)?;

Full schema + metadata (batch_extract)

For the same multitask flow as extract, build (schema_val, meta) once and run batch_extract with BatchSchemaMode::Shared, or pass per-row schemas and metadata with BatchSchemaMode::PerSample.

use gliner2::{batch_extract, create_schema, BatchSchemaMode, ExtractOptions};
use gliner2::schema::infer_metadata_from_schema;
use serde_json::{json, Value};

let (extractor, transformer) = load_extractor("fastino/gliner2-base-v1")?;
let texts: Vec<String> = vec!["First document.".into(), "Second document.".into()];

// Option A — shared multitask schema from the builder
let mut s = create_schema();
s.entities(json!({ "company": "", "person": "" }));
s.classification_simple("sentiment", json!(["positive", "negative", "neutral"]));
let (schema_val, meta) = s.build();

let opts = ExtractOptions {
    batch_size: 8,
    ..Default::default()
};

let out_shared = batch_extract(
    &extractor,
    &transformer,
    &texts,
    BatchSchemaMode::Shared {
        schema: &schema_val,
        meta: &meta,
    },
    &opts,
)?;

// Option B — per-text JSON schemas (e.g. from config); metadata from infer_metadata_from_schema
let schema_a: Value = json!({ "entities": { "person": "" } });
let schema_b: Value = json!({ "entities": { "location": "" } });
let schemas = vec![schema_a.clone(), schema_b.clone()];
let metas = vec![
    infer_metadata_from_schema(&schema_a),
    infer_metadata_from_schema(&schema_b),
];

let out_per = batch_extract(
    &extractor,
    &transformer,
    &texts,
    BatchSchemaMode::PerSample {
        schemas: &schemas,
        metas: &metas,
    },
    &opts,
)?;

For a shared schema you can also call extractor.batch_extract(&transformer, &texts, &schema_val, &meta, &opts) instead of the free function.

Lower-level reuse: after transform_extract you can run extract_from_preprocessed on one sample if you already have encoder outputs and span tensors; see src/extract.rs.

CLI specification

The command-line interface is not implemented yet. This section specifies the intended gliner2 binary (see default-run in Cargo.toml) so future work can match the library API and Python GLiNER2 behavior.

Install the binary with cargo install gliner2. Inference flags mirror [ExtractOptions](src/extract.rs) (threshold, format_results, include_confidence, include_spans, max_len).

Command overview

flowchart LR
  subgraph sub [Subcommands]
    entities[entities]
    classify[classify]
    relations[relations]
    jsonCmd[json]
    run[run]
  end
  gliner2[gliner2] --> entities
  gliner2 --> classify
  gliner2 --> relations
  gliner2 --> jsonCmd
  gliner2 --> run
| Subcommand | Purpose | Library analogue |
|------------|---------|------------------|
| gliner2 entities | Named-entity extraction | Extractor::extract_entities, Schema::entities |
| gliner2 classify | Text classification (single- or multi-label) | Extractor::classify_text, Schema::classification |
| gliner2 relations | Relation extraction | Extractor::extract_relations, Schema::relations |
| gliner2 json | Structured JSON / field extraction | Extractor::extract_json, Schema::extract_json_structures |
| gliner2 run | Multitask: full engine schema in one pass | Extractor::extract |

Top-level: gliner2 --help, gliner2 --version, and gliner2 <subcommand> --help.

Global options

These apply to every subcommand unless stated otherwise.

| Flag | Description |
|------|-------------|
| --model <HF_REPO_ID> | Hugging Face model id (default: fastino/gliner2-base-v1, same as the harness/ scripts). |
| --model-dir <DIR> | Offline layout: config.json, encoder_config/config.json, tokenizer.json, model.safetensors (matches ModelFiles from [download_model](src/config.rs)). |
| --config, --encoder-config, --tokenizer, --weights | Explicit paths instead of --model / --model-dir. |
| -q, -v / --log-level | Quiet / verbose logging (exact mapping is implementation-defined). |

Use either Hub resolution (--model) or a local layout (--model-dir or explicit file flags), not a conflicting mix; if both are given, the implementation should reject the invocation with a clear error.

Device and dtype are intentionally unspecified here until the library exposes them; do not document GPU flags until they exist.

Shared inference flags

| Flag | Maps to | Default |
|------|---------|---------|
| --threshold <float> | ExtractOptions::threshold | 0.5 |
| --max-len <N> | ExtractOptions::max_len | unset |
| --include-confidence | include_confidence | off |
| --include-spans | include_spans | off |
| --raw / --no-format-results | format_results = false | formatted output (true) |

Batching

The library implements tensor batch inference (Extractor::batch_extract*, ExtractOptions::batch_size); see Batch inference above. The CLI is not implemented yet; the contract below assumes the binary will drive those batched APIs for any input that produces more than one logical record (for example multi-line JSONL or plain text with --text-split line and multiple non-empty lines).

| Flag | Description |
|------|-------------|
| --batch-size <N> | Maximum records per model batch. Default: 8 (an implementation may choose a lower value on constrained devices, but must document any deviation). |
| --batch-size 1 | Effectively sequential inference (debugging, peak memory limits, or until batched paths are stable). |

Single-record inputs (one JSONL line, one JSON object, or --text-split full over an entire file) form a single batch of size 1.

Ordering: Output lines must follow the same order as input records, even when flushing internal batches.

Input and output

Input: final positional argument INPUT, or - for stdin.

| Flag | Description |
|------|-------------|
| --text-field <KEY> | Field containing document text in JSON / JSONL records (default: text). |
| --id-field <KEY> | Field to pass through as record id when present (default: id). |
| --text-split <MODE> | Plain text: full (whole file) or line (one record per non-empty line). sentence / char-chunk reserved. Default: full. |

| Format | Detection / notes |
|--------|-------------------|
| JSONL | One JSON object per line. Text from --text-field (default: text). If the input object contains the id key named by --id-field (default: id), copy that field through to the output object. |
| JSON | A single object using the same field convention. For many records, use JSONL or preprocess (for example with jq). |
| Plain text | Controlled by --text-split: full (default for .txt) treats the entire file as one record; line makes each non-empty line one record (multiple lines imply batching). **sentence** and **char-chunk** are reserved for a future release (segmentation semantics TBD). |

Output: JSONL to stdout by default. --output <PATH> / -o <PATH> (use - for stdout). Optional --pretty: pretty-printed JSON when the implementation can buffer a single record or full result (for example one JSON object input or explicit single-line mode).

Format inference: From INPUT’s path suffix when possible: .jsonl → JSONL, .json → single JSON object, .txt (or other) → plain text with --text-split. For stdin (-), default input format is JSONL (one object per line).
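A sketch of this inference rule (illustrative; the CLI is not implemented yet, and these names are hypothetical):

```rust
#[derive(Debug, PartialEq)]
enum InputFormat {
    Jsonl,
    Json,
    PlainText,
}

// Infer the input format from the path suffix, per the spec above.
fn infer_format(input: &str) -> InputFormat {
    match input {
        "-" => InputFormat::Jsonl, // stdin defaults to JSONL
        p if p.ends_with(".jsonl") => InputFormat::Jsonl,
        p if p.ends_with(".json") => InputFormat::Json,
        _ => InputFormat::PlainText, // .txt and anything else: --text-split applies
    }
}

// --text-split line: each non-empty line becomes one record.
fn split_lines(text: &str) -> Vec<&str> {
    text.lines().filter(|l| !l.trim().is_empty()).collect()
}

fn main() {
    assert_eq!(infer_format("docs.jsonl"), InputFormat::Jsonl);
    assert_eq!(infer_format("-"), InputFormat::Jsonl);
    assert_eq!(infer_format("notes.txt"), InputFormat::PlainText);
    assert_eq!(split_lines("a\n\nb\n").len(), 2);
}
```

Note the `.jsonl` check runs before `.json`; the match arms are ordered so the longer suffix wins.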

Output record shape

Each output line is one JSON object, for example:

{"id":"optional","text":"...","result":{ }}

result matches Python / Rust **format_results** output for the task mix (entities, relation_extraction, classification keys, structured parents, etc.), consistent with the harness direction in harness/compare.py and multitask fixtures. If the input record has no id, omit id from the output (or use null; implementations should pick one behavior and document it).

Subcommands

gliner2 entities

| Flag | Description |
|------|-------------|
| --label <NAME> | Repeatable entity type name. |
| --labels-json <PATH> | JSON array of names, or the object form accepted by Schema::entities (name → description string or { "description", "dtype", "threshold" }). |

Precedence: If any --label is given and --labels-json is given, exit with a usage error (do not merge).
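The exclusion rule could look like this (hypothetical sketch; `resolve_labels` is not part of the crate or the planned CLI):

```rust
// Reject conflicting label sources instead of merging them.
fn resolve_labels(labels: &[String], labels_json: Option<&str>) -> Result<String, String> {
    match (labels.is_empty(), labels_json) {
        (false, Some(_)) => Err("pass --label or --labels-json, not both".to_string()),
        (false, None) => Ok(format!("{} inline labels", labels.len())),
        (true, Some(path)) => Ok(format!("labels from {path}")),
        (true, None) => Err("no labels given".to_string()),
    }
}

fn main() {
    let inline = vec!["company".to_string(), "person".to_string()];
    assert!(resolve_labels(&inline, Some("labels.json")).is_err());
    assert!(resolve_labels(&inline, None).is_ok());
    assert!(resolve_labels(&[], None).is_err());
}
```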

gliner2 classify

| Flag | Description |
|------|-------------|
| --task <NAME> | Required classification task name (JSON key in formatted output). |
| --label <NAME> | Repeatable class label. |
| --labels-json <PATH> | Array of labels, or object label → description (Python-style). |
| --multi-label | Multi-label classification (Schema::classification with multi_label: true). |
| --cls-threshold <float> | Per-task classifier threshold (default 0.5). |

Same rule: do not combine --label with --labels-json.

gliner2 relations

| Flag | Description |
|------|-------------|
| --relation <NAME> | Repeatable relation type name. |
| --relations-json <PATH> | JSON array of names, or the object form accepted by Schema::relations. |

Do not pass both repeatable --relation and --relations-json.

gliner2 json

| Flag | Description |
|------|-------------|
| --structures <PATH> | JSON file: object mapping structure name → array of field specs. |
| --structures-json '<OBJECT>' | Same object inline. |

Field specs use the same grammar as Structured JSON (extract_json) above: strings like name::dtype::[choices]::description or JSON objects parsed by [parse_field_spec](src/schema.rs). Do not pass both --structures and --structures-json.

gliner2 run

| Flag | Description |
|------|-------------|
| --schema-file <PATH> | Required. Full engine multitask schema (same shape as Python GLiNER2.extract(text, schema)). See [harness/fixtures_multitask.json](harness/fixtures_multitask.json) for a minimal example: entities, classifications, relations, json_structures, optional entity_descriptions / json_descriptions. |

Each entry in classifications should include "true_label": ["N/A"] when mirroring Python; the harness script [harness/run_multitask_python.py](harness/run_multitask_python.py) sets this if missing.

Environment

  • **HF_TOKEN** — access to private or gated Hub models.
  • Cache and offline behavior follow Hugging Face Hub environment variables (HF_HOME, etc.); see upstream docs for the full list.

Exit codes

  • 0 — success.
  • Non-zero — usage errors, I/O failures, model load failures, or inference errors.

Examples

# Entities: JSONL in → JSONL out (multi-record; default --batch-size 8 unless overridden)
gliner2 entities --label company --label person --batch-size 16 docs.jsonl --output out.jsonl

# Classify with labels from a file (JSONL input)
gliner2 classify --task sentiment --labels-json labels.json tweets.jsonl

# Relations
gliner2 relations --relation works_for --relation located_in article.txt

# Structured JSON (structures file matches extract_json object shape)
gliner2 json --structures product_fields.json --text-split full product_blurb.txt

# Multitask: JSONL file, custom text field
gliner2 run --schema-file schema.json --text-field body --batch-size 4 docs.jsonl

Minimal multitask schema file (trimmed from fixtures):

{
  "json_structures": [],
  "entities": { "company": "", "product": "" },
  "relations": [],
  "classifications": [
    {
      "task": "sentiment",
      "labels": ["positive", "negative", "neutral"],
      "multi_label": false,
      "cls_threshold": 0.5,
      "true_label": ["N/A"]
    }
  ]
}

Python Interface (Not implemented yet)

A Python package wrapping this Rust implementation (gliner2_rs) is planned, contingent on the Rust implementation outperforming Python; it is not implemented yet (this section is a placeholder).

# use your package manager of choice
uv add gliner2_rs
from gliner2_rs import Gliner2

gliner2 = Gliner2.from_pretrained('fastino/gliner2-base-v1')

text = "Apple CEO Tim Cook announced iPhone 15 in Cupertino yesterday."
result = gliner2.extract_entities(text, ["company", "person", "product", "location"])

print(result)
# {'entities': {'company': ['Apple'], 'person': ['Tim Cook'], 'product': ['iPhone 15'], 'location': ['Cupertino']}}