Flash-Rerank – Blazing-fast neural reranking engine.
Provides cross-encoder and ColBERT inference via ONNX Runtime with TensorRT, CUDA, and CPU execution providers.
Re-exports§
- pub use types::CacheMetadata;
- pub use types::Device;
- pub use types::ModelConfig;
- pub use types::ModelFile;
- pub use types::ModelManifest;
- pub use types::Precision;
- pub use types::RerankConfig;
- pub use types::RerankRequest;
- pub use types::RerankResult;
- pub use types::ScorerType;
Modules§
- batch
- Dynamic request batching for GPU inference.
- calibrate
- cascade
- engine
- Inference backends for cross-encoder and ColBERT models.
- fusion
- models
- Model download, caching, and lifecycle management.
- multi_gpu
- tokenize
- types
Enums§
- Error
- Top-level error type for flash_rerank.
Functions§
- rerank
- Convenience function: create an OrtScorer from a model directory and score documents.
Type Aliases§
- Result
- Result alias for flash_rerank operations.
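A crate-level `Result` alias paired with a top-level `Error` enum is a common Rust pattern. The sketch below shows how such a pair typically fits together. It is not the crate's actual definition: the `Error` variants and the `best_index` helper are hypothetical, since this index does not list the real variants or any function signatures.

```rust
use std::fmt;

// Hypothetical stand-in for a crate-level error type. The real
// `flash_rerank::Error` variants are not shown in this index.
#[derive(Debug, PartialEq)]
pub enum Error {
    ModelNotFound(String),
    EmptyInput,
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::ModelNotFound(name) => write!(f, "model not found: {}", name),
            Error::EmptyInput => write!(f, "no documents to score"),
        }
    }
}

impl std::error::Error for Error {}

// The alias fixes the error type, so signatures only name the success type.
pub type Result<T> = std::result::Result<T, Error>;

// Hypothetical helper (not part of the crate): returns the index of the
// highest score, or an error when there is nothing to rank.
pub fn best_index(scores: &[f32]) -> Result<usize> {
    if scores.is_empty() {
        return Err(Error::EmptyInput);
    }
    let mut best = 0;
    for (i, score) in scores.iter().enumerate() {
        if *score > scores[best] {
            best = i;
        }
    }
    Ok(best)
}
```

With an alias like this, callers can propagate failures with `?` and match on a single error enum instead of spelling out the full `std::result::Result<T, Error>` in every signature.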