Crate siglip2_naflex

SigLIP2 NaFlex

Rust ONNX inference library for SigLIP2 NaFlex (image + text embeddings).

A sibling of textclap (CLAP audio inference) and downstream of scenesdetect (keyframe extraction).

§Install

```toml
[dependencies]
siglip2-naflex = "0.1"
```

The package is siglip2-naflex on crates.io but the lib name is siglip2, so import sites are unchanged: use siglip2::*.

§Examples

Runnable examples live in examples/. Notable entry points:

  • embed_keyframes.rs — single-tower ImageEncoder over a directory of images.
  • index_and_search.rs — end-to-end retrieval using Siglip2 (bundled constructor, classify, top-K).
  • bench_ep.rs / bench_ep_text.rs — execution-provider latency microbenchmarks for vision and text.
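The index_and_search.rs example ranks candidates by similarity. As a standalone illustration (not the crate's API), the sketch below does cosine top-K over L2-normalized embeddings using plain slices. The helpers `l2_normalize`, `dot`, and `top_k` are hypothetical, and it is an assumption that the example ranks exactly this way.

```rust
/// L2-normalizes `v` in place so that a plain dot product equals cosine similarity.
/// Zero vectors are left unchanged.
fn l2_normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}

/// Dot product of two equal-length vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embedding dimensions must match");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Returns the `k` highest-scoring (row index, score) pairs, best first.
/// Assumes `query` and every row of `index` are already L2-normalized.
fn top_k(query: &[f32], index: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = index
        .iter()
        .enumerate()
        .map(|(i, row)| (i, dot(query, row)))
        .collect();
    // total_cmp gives a total order on f32, so a NaN score cannot panic the sort.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

Sorting the full score list is O(n log n); for large indices, a partial selection (for example `slice::select_nth_unstable_by` followed by sorting only the first k entries) avoids ordering the tail.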

§Parity-against-PyTorch testing

The parity tests against upstream PyTorch live in tests/integration.rs and are gated on SIGLIP2_MODELS_DIR (the directory holding the released ONNX graphs). They stay #[ignore]d in default cargo test runs because they need the release artifacts, but the golden fixtures (tests/fixtures/{images,embeddings}/*, text_prompts.json, text_embeddings.npy) are committed in-tree.

The CI workflow at .github/workflows/parity.yml is two-stage:

| stage | what it proves | requires |
|---|---|---|
| model-load-smoke | Runtime can load the released ONNX + tokenizer; session shapes match the contract | FINDIT_INDEXER_TOKEN repo secret |
| parity-against-pytorch | Cosine-floor parity (≥ 0.99917) against the in-tree PyTorch reference | same secret |

The smoke stage is not a parity gate; the parity stage is, and it runs on every push and PR wherever the secret is available. Forks without the secret skip the whole workflow (forks are not expected to hold the release-repo PAT).
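The parity stage compares Rust outputs to the committed PyTorch reference embeddings against a cosine floor of 0.99917. How tests/integration.rs applies that floor (per fixture or aggregated, f32 or f64 accumulation) is not stated here, so the following is a minimal standalone sketch assuming a per-fixture check; `check_parity` and `COSINE_FLOOR` are hypothetical names.

```rust
/// Cosine floor quoted for the parity-against-pytorch stage.
const COSINE_FLOOR: f64 = 0.99917;

/// Cosine similarity accumulated in f64 so rounding error stays far below the floor margin.
fn cosine(a: &[f32], b: &[f32]) -> f64 {
    assert_eq!(a.len(), b.len(), "embedding dimensions must match");
    let (mut dot, mut na, mut nb) = (0.0f64, 0.0f64, 0.0f64);
    for (&x, &y) in a.iter().zip(b) {
        let (x, y) = (x as f64, y as f64);
        dot += x * y;
        na += x * x;
        nb += y * y;
    }
    dot / (na.sqrt() * nb.sqrt())
}

/// Returns the worst (lowest) cosine across all fixture pairs, or an error naming
/// the first fixture below `floor`. An empty fixture set returns f64::INFINITY.
fn check_parity(rust: &[Vec<f32>], reference: &[Vec<f32>], floor: f64) -> Result<f64, String> {
    assert_eq!(rust.len(), reference.len(), "fixture counts must match");
    let mut worst = f64::INFINITY;
    for (i, (r, p)) in rust.iter().zip(reference).enumerate() {
        let c = cosine(r, p);
        // Written as !(c >= floor) so that a NaN cosine also fails the gate.
        if !(c >= floor) {
            return Err(format!("fixture {i}: cosine {c:.6} < floor {floor}"));
        }
        worst = worst.min(c);
    }
    Ok(worst)
}
```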

Fixtures are reproducible end-to-end via scripts/generate_synthetic_keyframes.py + scripts/generate_parity_fixtures.py; see tests/fixtures/README.md.

§Model files

The runtime expects the assets from Findit-AI/indexer release models-siglip2-naflex-v1. See models/MODELS.md for the download recipe.

§Cargo features

Defaults: ["inference", "bundled", "decoders"].

| Feature | Default | Effect |
|---|---|---|
| inference | yes | Pulls ort + tokenizers; activates ImageEncoder, TextEncoder, Siglip2. Native targets only. |
| bundled | yes | Embeds the 32.8 MB text-tower tokenizer.json via include_bytes! so Siglip2::bundled / TextEncoder::bundled_with_options work without a tokenizer file on disk. Implies inference. |
| decoders | yes | Activates image crate JPEG/PNG decoders. Without this, callers supply pre-decoded RGB pixels via ImageView. |
| serde | no | Pulls serde + serde_json. Activates Serialize / Deserialize on Options, BatchOptions, ThreadOptions, LabeledScore (Serialize only), LabeledScoreOwned, plus Calibration::from_path / from_bytes and the Siglip2::from_files* constructors that load calibration.json. Embedding and Calibration deliberately do not derive serde. The bundled path (Siglip2::bundled / Calibration::bundled) does not need this feature; calibration is baked in at build time from models/siglip2/calibration.json. |
| cuda | no | NVIDIA GPUs (Linux/Windows). Requires CUDA toolkit + cuDNN. Implies inference. |
| tensorrt | no | NVIDIA, optimized inference. Falls back to CUDA, then CPU. Implies inference. |
| directml | no | Windows GPUs (any vendor) via DirectX 12. Implies inference. |
| rocm | no | AMD GPUs (Linux). Requires ROCm SDK. Implies inference. |
| coreml | no | macOS / iOS via Core ML (Neural Engine + GPU + Metal). Implies inference. |
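Without decoders, callers hand the crate pre-decoded RGB pixels through ImageView, which the crate describes as a borrowed buffer with a validating constructor. The sketch below shows what such validation typically checks, using a hypothetical `RgbView` type and assuming tightly packed, row-major RGB8 (3 bytes per pixel, no row padding). The real ImageView's constructor name, accepted layouts, and error type may differ.

```rust
/// Hypothetical stand-in for a borrowed, validated RGB8 pixel view.
#[derive(Debug, Clone, Copy)]
struct RgbView<'a> {
    pixels: &'a [u8],
    width: u32,
    height: u32,
}

#[derive(Debug, PartialEq)]
enum ViewError {
    ZeroDimension,
    TooLarge,
    LengthMismatch { expected: usize, actual: usize },
}

impl<'a> RgbView<'a> {
    /// Accepts tightly packed, row-major RGB8 data (3 bytes per pixel, no row padding).
    fn new(pixels: &'a [u8], width: u32, height: u32) -> Result<Self, ViewError> {
        if width == 0 || height == 0 {
            return Err(ViewError::ZeroDimension);
        }
        // checked_mul guards against usize overflow on 32-bit targets.
        let expected = (width as usize)
            .checked_mul(height as usize)
            .and_then(|n| n.checked_mul(3))
            .ok_or(ViewError::TooLarge)?;
        if pixels.len() != expected {
            return Err(ViewError::LengthMismatch { expected, actual: pixels.len() });
        }
        Ok(Self { pixels, width, height })
    }

    /// Returns the RGB triple at (x, y), or None when out of bounds.
    fn pixel(&self, x: u32, y: u32) -> Option<[u8; 3]> {
        if x >= self.width || y >= self.height {
            return None;
        }
        let i = (y as usize * self.width as usize + x as usize) * 3;
        Some([self.pixels[i], self.pixels[i + 1], self.pixels[i + 2]])
    }
}
```

Validating once at construction lets every downstream consumer index the buffer without re-checking its length.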

The execution-provider features are off by default — none are required for CPU inference, and each requires its vendor SDK at build time. Building with --features cuda (etc.) will fail on stock CI runners that don’t have the SDK.
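Because each EP feature needs its vendor SDK at build time, downstream crates usually forward them as opt-in features of their own (for example a hypothetical `cuda = ["siglip2-naflex/cuda"]` entry in the downstream Cargo.toml). A minimal sketch, assuming the downstream crate declares features with the same names as the table above, of reporting which EPs a binary was compiled to request:

```rust
/// Lists the execution providers this binary was compiled to request, in this
/// sketch's own preference order. CPU is always last since it needs no vendor SDK.
/// The feature names belong to the hypothetical downstream crate.
fn compiled_providers() -> Vec<&'static str> {
    let mut eps = Vec::new();
    if cfg!(feature = "tensorrt") {
        eps.push("tensorrt");
    }
    if cfg!(feature = "cuda") {
        eps.push("cuda");
    }
    if cfg!(feature = "directml") {
        eps.push("directml");
    }
    if cfg!(feature = "rocm") {
        eps.push("rocm");
    }
    if cfg!(feature = "coreml") {
        eps.push("coreml");
    }
    eps.push("cpu");
    eps
}
```

With no features enabled, only "cpu" is reported, which matches the default CPU-only build.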

§Execution providers without a feature flag

If your deployment needs an EP that isn’t in the list above, build the session yourself with the relevant ort EP enabled and pass it via ImageEncoder::from_ort_session / TextEncoder::from_ort_session / Siglip2::from_parts. ANE-on-Mac is an example: it requires explicit opt-in via the coreml feature and Session::with_execution_providers on the caller side.

§Target / feature contract

The inference feature (and every feature that implies it) is native-only. ort (ONNX Runtime FFI) and tokenizers (which transitively depends on onig_sys / esaxx-rs) don't build on wasm32-*, so building for wasm with default features fails deep in upstream C-toolchain code before this crate's own source is compiled.

Wasm consumers must opt out:

```sh
cargo check --target wasm32-unknown-unknown --no-default-features
```

Without inference, the public surface is the preprocessing path (Preprocessor, ImageView, PreprocessedBatch), the value types (Embedding, Calibration, Options / BatchOptions / ThreadOptions, LabeledScoreOwned, Error), and the SIMD primitives — useful for browser / edge deployments that compute embeddings server-side and need only the value types and similarity / decoding / preprocessing on the client.
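The NaFlex variant resizes each image to a patch grid that keeps its aspect ratio under a patch budget, rather than squashing it to a fixed square. The sketch below ports the size search used by Hugging Face's Siglip2 image processor (binary search on a scale factor, rounding each side up to a multiple of the patch size). It is an assumption that this crate's Preprocessor computes target sizes the same way; patch size 16 (from the patch16 checkpoint name) and a budget of 256 patches are illustrative values, and `naflex_target_size` is a hypothetical name.

```rust
/// Rounds `size * scale` up to a multiple of `patch`, never below one patch.
fn scaled_side(size: u32, scale: f64, patch: u32) -> u32 {
    let scaled = size as f64 * scale;
    let rounded = (scaled / patch as f64).ceil() as u32 * patch;
    rounded.max(patch)
}

/// Largest aspect-preserving (height, width), each a multiple of `patch`,
/// whose patch count does not exceed `max_patches` (assumed >= 1).
fn naflex_target_size(height: u32, width: u32, patch: u32, max_patches: u32) -> (u32, u32) {
    const EPS: f64 = 1e-5;
    let (mut lo, mut hi) = (EPS / 10.0, 100.0);
    // `lo` is always a scale that fits the budget (the tiny initial value yields one patch per side).
    while hi - lo >= EPS {
        let mid = (lo + hi) / 2.0;
        let h = scaled_side(height, mid, patch);
        let w = scaled_side(width, mid, patch);
        if (h / patch) * (w / patch) <= max_patches {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    (scaled_side(height, lo, patch), scaled_side(width, lo, patch))
}
```

A 480×640 (height × width) frame with these values maps to a 14×18 patch grid (224×288 pixels, 252 patches), while a 224×224 input is upscaled to the full 16×16 grid.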

§License

MIT or Apache-2.0, at your option. The bundled tokenizer.json is derived from google/siglip2-base-patch16-naflex (Apache-2.0); see THIRD_PARTY_NOTICES.md.

§Re-exports

pub use calibration::Calibration;
pub use embedding::Embedding;
pub use embedding::LabeledScore;
pub use embedding::LabeledScoreOwned;
pub use error::Error;
pub use error::Result;
pub use image_enc::ImageEncoder; (inference feature)
pub use image_view::ImageView;
pub use options::BatchOptions;
pub use options::Options;
pub use options::ThreadOptions;
pub use preproc::PreprocessedBatch;
pub use preproc::Preprocessor;
pub use siglip2::Siglip2; (inference feature)
pub use text_enc::TextEncoder; (inference feature)

§Modules

calibration
Calibration — sigmoid scale/bias for SigLIP2’s calibrated probabilities.
embedding
Embedding, LabeledScore[Owned].
error
Error type; the module documentation describes the full enum and its semantics.
image_enc (inference feature)
Image encoder. ImageView lives in crate::image_view (always compiled, used by the preprocessor on both wasm and native); this module is gated on feature = "inference" and provides the ORT-backed ImageEncoder.
image_view
ImageView — borrowed RGB pixel buffer with validating constructor.
options
Options, BatchOptions, and ThreadOptions; the module documentation covers the full surface and rationale (defaults match the existing findit-siglip2-vision service's settings).
preproc
Preprocessing pipeline behind the public Preprocessor API.
siglip2 (inference feature)
Siglip2 wrapper.
text_enc (inference feature)
Text encoder; provides the ORT-backed TextEncoder.
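The calibration module holds a sigmoid scale/bias pair. In SigLIP's sigmoid formulation, a calibrated probability is sigmoid(scale * cos + bias), where cos is the cosine similarity between L2-normalized image and text embeddings. A minimal sketch with a hypothetical `SigmoidCalibration` type, assuming `scale` is already exponentiated (SigLIP checkpoints store the logit scale as a log) and using illustrative numbers rather than the released calibration.json values:

```rust
/// Hypothetical mirror of a scale/bias calibration pair. `scale` is assumed to be
/// the already-exponentiated logit scale, not its logarithm.
#[derive(Debug, Clone, Copy)]
struct SigmoidCalibration {
    scale: f32,
    bias: f32,
}

impl SigmoidCalibration {
    /// Maps a cosine similarity to a calibrated probability in (0, 1).
    fn probability(&self, cosine: f32) -> f32 {
        let logit = self.scale * cosine + self.bias;
        // For very negative logits exp() overflows to +inf and the result is 0.0, not NaN.
        1.0 / (1.0 + (-logit).exp())
    }
}
```

Unlike a softmax over a label set, each image/text pair gets an independent probability, so scores for different labels need not sum to 1.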

§Enums

GraphOptimizationLevel (inference feature)
ONNX Runtime provides various graph optimizations to improve performance. Graph optimizations are essentially graph-level transformations, ranging from small graph simplifications and node eliminations to more complex node fusions and layout optimizations.

§Constants

BUNDLED_TOKENIZER (bundled feature)
Raw bytes of the bundled google/siglip2-base-patch16-naflex tokenizer.json, embedded via include_bytes!. Used internally by the bundled constructors on TextEncoder and Siglip2; exposed publicly so callers who need to assemble a Tokenizer from the bundled JSON (for example, ahead of from_ort_session) can do so without round-tripping through disk.