Crate siglip2_naflex


SigLIP2 NaFlex

Rust ONNX inference library for SigLIP2 NaFlex (image + text embeddings).

A sibling of textclap (CLAP audio inference) that sits downstream of scenesdetect (keyframe extraction).


§Install

[dependencies]
siglip2-naflex = "0.1"

The package is siglip2-naflex on crates.io but the lib name is siglip2, so import sites are unchanged: use siglip2::*.

§Examples

Runnable examples live in examples/. Notable entry points:

embed_keyframes.rs — single-tower ImageEncoder over a directory of images.
index_and_search.rs — end-to-end retrieval using Siglip2 (bundled constructor, classify, top-K).
bench_ep.rs / bench_ep_text.rs — execution-provider latency microbenchmarks for vision and text.
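
The top-K retrieval step mentioned for index_and_search.rs can be sketched in plain Rust. This is a minimal illustration and not the example's actual code: the helper names `dot` and `top_k` are hypothetical, and it assumes the embeddings are already L2-normalized so that a dot product equals cosine similarity.

```rust
/// Dot product of two equal-length vectors (hypothetical helper).
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embedding dimensions must match");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Score every corpus embedding against `query` and return the `k` best
/// `(corpus_index, score)` pairs, highest score first.
pub fn top_k(query: &[f32], corpus: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = corpus
        .iter()
        .enumerate()
        .map(|(i, emb)| (i, dot(query, emb)))
        .collect();
    // total_cmp is a total order on f32, so the comparison never panics.
    // The sort is stable, so equal scores keep their corpus order.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

A full sort is O(n log n); for large corpora a bounded heap or `select_nth_unstable_by` would avoid sorting entries that are discarded anyway.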

§Parity-against-PyTorch testing

The parity tests against upstream PyTorch live in tests/integration.rs and are gated on SIGLIP2_MODELS_DIR (the released ONNX graphs). They stay #[ignore]-d in default cargo test runs because they need the release artifacts, but the golden fixtures (tests/fixtures/{images,embeddings}/*, text_prompts.json, text_embeddings.npy) are committed in-tree.

The CI workflow at .github/workflows/parity.yml is two-stage:

stage	what it proves	requires
`model-load-smoke`	Runtime can load the released ONNX + tokenizer, session shapes match contract	`FINDIT_INDEXER_TOKEN` repo secret
`parity-against-pytorch`	Cosine-floor parity (≥ 0.99917) against the in-tree PyTorch reference	same secret

The smoke gate is not a parity gate; only parity-against-pytorch is. The parity gate runs on every push and PR in any environment that has the secret. Forks without the secret skip the whole workflow (forks are not expected to hold the release-repo PAT).
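
A cosine-floor check of the kind the parity stage describes can be sketched as follows. This is an illustration under assumptions, not the code in tests/integration.rs: the names `cosine_similarity`, `meets_parity_floor`, and `PARITY_FLOOR` are hypothetical, and only the 0.99917 threshold comes from the table above.

```rust
/// Cosine floor quoted in the CI table above.
pub const PARITY_FLOOR: f64 = 0.99917;

/// Cosine similarity of two embeddings, accumulated in f64 to limit rounding
/// error. Returns `None` for mismatched lengths, empty input, or a zero vector.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f64> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let (mut dot, mut norm_a, mut norm_b) = (0.0f64, 0.0f64, 0.0f64);
    for (&x, &y) in a.iter().zip(b) {
        let (x, y) = (f64::from(x), f64::from(y));
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a.sqrt() * norm_b.sqrt()))
}

/// True when `actual` is at least `PARITY_FLOOR`-similar to `reference`.
pub fn meets_parity_floor(actual: &[f32], reference: &[f32]) -> bool {
    matches!(cosine_similarity(actual, reference), Some(c) if c >= PARITY_FLOOR)
}
```

Treating an undefined similarity as a failure, rather than as a pass, keeps a degenerate all-zero output from slipping through the gate.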

Fixtures are reproducible end-to-end via scripts/generate_synthetic_keyframes.py + scripts/generate_parity_fixtures.py; see tests/fixtures/README.md.

§Model files

The runtime expects the assets from Findit-AI/indexer release models-siglip2-naflex-v1. See models/MODELS.md for the download recipe.

§Cargo features

Defaults: ["inference", "bundled", "decoders"].

Feature	Default	Effect
`inference`	✅	Pulls `ort` + `tokenizers`; activates `ImageEncoder`, `TextEncoder`, `Siglip2`. Native targets only.
`bundled`	✅	Embeds the 32.8 MB text-tower `tokenizer.json` via `include_bytes!` so `Siglip2::bundled` / `TextEncoder::bundled_with_options` work without a tokenizer file on disk. Implies `inference`.
`decoders`	✅	Activates `image` crate JPEG/PNG decoders. Without this, callers supply pre-decoded RGB pixels via `ImageView`.
`serde`		Pulls `serde` + `serde_json`. Activates `Serialize` / `Deserialize` on `Options`, `BatchOptions`, `ThreadOptions`, `LabeledScore` (Serialize only), `LabeledScoreOwned`, plus `Calibration::from_path` / `from_bytes` and the `Siglip2::from_files` constructors that load `calibration.json`. `Embedding` and `Calibration` deliberately do not derive serde. The bundled path (`Siglip2::bundled` / `Calibration::bundled`) does not need this feature; calibration is baked in at build time from `models/siglip2/calibration.json`.
`cuda`		NVIDIA GPUs (Linux/Windows). Requires CUDA toolkit + cuDNN. Implies `inference`.
`tensorrt`		NVIDIA, optimized inference. Falls back to CUDA, then CPU. Implies `inference`.
`directml`		Windows GPUs (any vendor) via DirectX 12. Implies `inference`.
`rocm`		AMD GPUs (Linux). Requires ROCm SDK. Implies `inference`.
`coreml`		macOS / iOS via Core ML (Neural Engine + GPU + Metal). Implies `inference`.

The execution-provider features are off by default — none are required for CPU inference, and each requires its vendor SDK at build time. Building with --features cuda (etc.) will fail on stock CI runners that don’t have the SDK.
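
When `decoders` is disabled, callers hand over pre-decoded RGB pixels, so the buffer length has to agree with the stated dimensions. The sketch below shows what a validating constructor for a borrowed RGB buffer might look like. It does not reproduce the crate's `ImageView`: `RgbView`, `ViewError`, and every method here are hypothetical names, and the tightly packed 3-bytes-per-pixel layout is an assumption.

```rust
/// Reasons a raw buffer cannot be viewed as an RGB image (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum ViewError {
    ZeroDimension,
    Overflow,
    LengthMismatch { expected: usize, actual: usize },
}

/// Borrowed, tightly packed RGB8 pixels (hypothetical stand-in, not `ImageView`).
#[derive(Debug)]
pub struct RgbView<'a> {
    pixels: &'a [u8],
    width: u32,
    height: u32,
}

impl<'a> RgbView<'a> {
    /// Validate once at construction so later indexing cannot go out of bounds.
    pub fn new(pixels: &'a [u8], width: u32, height: u32) -> Result<Self, ViewError> {
        if width == 0 || height == 0 {
            return Err(ViewError::ZeroDimension);
        }
        // checked_mul guards against overflow on 32-bit targets such as wasm32.
        let expected = (width as usize)
            .checked_mul(height as usize)
            .and_then(|n| n.checked_mul(3))
            .ok_or(ViewError::Overflow)?;
        if pixels.len() != expected {
            return Err(ViewError::LengthMismatch { expected, actual: pixels.len() });
        }
        Ok(Self { pixels, width, height })
    }

    /// RGB triple at `(x, y)`, or `None` when the coordinate is out of range.
    pub fn pixel(&self, x: u32, y: u32) -> Option<[u8; 3]> {
        if x >= self.width || y >= self.height {
            return None;
        }
        let i = (y as usize * self.width as usize + x as usize) * 3;
        Some([self.pixels[i], self.pixels[i + 1], self.pixels[i + 2]])
    }
}
```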

§Execution providers without a feature flag

If your deployment needs an EP that isn’t in the list above, build the session yourself with the relevant ort EP enabled and pass it via ImageEncoder::from_ort_session / TextEncoder::from_ort_session / Siglip2::from_parts. ANE-on-Mac is an example: it requires explicit opt-in via the coreml feature and Session::with_execution_providers on the caller side.

§Target / feature contract

The inference family is native-only. ort (ONNX Runtime FFI) and tokenizers (which transitively depends on onig_sys / esaxx-rs) don’t build on wasm32-*. Building for wasm with default features fails deep in upstream C-toolchain code before this crate’s source is touched.

Wasm consumers must opt out:

cargo check --target wasm32-unknown-unknown --no-default-features

Without inference, the public surface is the preprocessing path (Preprocessor, ImageView, PreprocessedBatch), the value types (Embedding, Calibration, Options / BatchOptions / ThreadOptions, LabeledScoreOwned, Error), and the SIMD primitives — useful for browser / edge deployments that compute embeddings server-side and need only the value types and similarity / decoding / preprocessing on the client.
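
A downstream crate that builds for both native and wasm targets could mirror this split by gating its own call sites on the target architecture. The following is only a sketch of that pattern: the function names are hypothetical, nothing here calls into this crate, and a real project would likely pair it with target-specific dependency sections in Cargo.toml.

```rust
/// True on targets where the ONNX-backed encoders can be built at all.
pub const fn inference_available() -> bool {
    !cfg!(target_arch = "wasm32")
}

/// Native builds can run the encoders locally.
#[cfg(not(target_arch = "wasm32"))]
pub fn embedding_source() -> &'static str {
    "local ONNX inference"
}

/// Wasm builds receive embeddings computed elsewhere and only score them.
#[cfg(target_arch = "wasm32")]
pub fn embedding_source() -> &'static str {
    "server-side embeddings"
}
```

Exactly one `embedding_source` definition is compiled per target, so the wasm build never references the native-only code.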

§License

MIT or Apache-2.0, at your option. The bundled tokenizer.json is derived from google/siglip2-base-patch16-naflex (Apache-2.0); see THIRD_PARTY_NOTICES.md.

Re-exports§

pub use calibration::Calibration;
pub use embedding::Embedding;
pub use embedding::LabeledScore;
pub use embedding::LabeledScoreOwned;
pub use error::Error;
pub use error::Result;
pub use image_enc::ImageEncoder; // feature: inference
pub use image_view::ImageView;
pub use options::BatchOptions;
pub use options::Options;
pub use options::ThreadOptions;
pub use preproc::PreprocessedBatch;
pub use preproc::Preprocessor;
pub use siglip2::Siglip2; // feature: inference
pub use text_enc::TextEncoder; // feature: inference

Modules§

calibration: Calibration — sigmoid scale/bias for SigLIP2’s calibrated probabilities.
embedding: Embedding, LabeledScore[Owned].
error: Error type (the full enum and its semantics).
image_enc (feature: inference): Image encoder. ImageView lives in crate::image_view (always compiled, used by the preprocessor on both wasm and native); this module is gated on feature = "inference" and provides the ORT-backed ImageEncoder.
image_view: ImageView — borrowed RGB pixel buffer with validating constructor.
options: Options, BatchOptions, ThreadOptions. See §6 for the full surface and rationale (defaults match the existing findit-siglip2-vision service’s settings).
preproc: Preprocessing pipeline: the algorithm and the public Preprocessor API.
siglip2 (feature: inference): Siglip2 wrapper.
text_enc (feature: inference): Text encoder. See §4 and §5.
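
The calibration entry above describes a sigmoid scale/bias. In SigLIP-style models a pair's probability is the sigmoid of a scaled and shifted similarity between the normalized image and text embeddings, which can be sketched as below. This is not the crate's `Calibration` type: `SigmoidCalibration` and its fields are hypothetical, and the sketch assumes `scale` is already the multiplicative factor rather than its logarithm.

```rust
/// Hypothetical stand-in for a sigmoid calibration; not the crate's `Calibration`.
#[derive(Debug, Clone, Copy)]
pub struct SigmoidCalibration {
    /// Multiplier applied to the similarity (assumed already exponentiated).
    pub scale: f32,
    /// Additive bias applied after scaling.
    pub bias: f32,
}

impl SigmoidCalibration {
    /// Map a cosine similarity in [-1, 1] to a probability in (0, 1).
    pub fn probability(&self, similarity: f32) -> f32 {
        let logit = self.scale * similarity + self.bias;
        1.0 / (1.0 + (-logit).exp())
    }
}
```

Because the sigmoid is monotonic, calibration changes the reported probabilities but not the ranking of labels for a given image.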

Enums§

GraphOptimizationLevel (feature: inference): ONNX Runtime provides various graph optimizations to improve performance. Graph optimizations are essentially graph-level transformations, ranging from small graph simplifications and node eliminations to more complex node fusions and layout optimizations.

Constants§

BUNDLED_TOKENIZER (feature: bundled): Raw bytes of the bundled google/siglip2-base-patch16-naflex tokenizer.json, embedded via include_bytes!. Used internally by the bundled constructors on TextEncoder and Siglip2; exposed publicly so callers who need to assemble a Tokenizer off the bundled JSON (for example, ahead of from_ort_session) can do so without round-tripping through disk.