SAM 2 — Meta’s Segment Anything Model 2 (image + video segmentation).
Mirrors facebookresearch/sam2 so the published
sam2_hiera_{t,s,b+,l}.{pt,safetensors} checkpoints load with no
weight-key remapping.
§Components
- Phase 1: Hiera image encoder + FpnNeck (image_encoder, fpn_neck, preprocess).
- Phase 2: prompt encoder + TwoWayTransformer + mask decoder with object-pointer / object-score / high-res mask path (prompt_encoder, transformer, mask_decoder).
- Phase 3: memory encoder + memory attention for video tracking (memory_encoder, memory_attention).
- Top-level wrapper: Sam2 orchestrator with predict_image() and predict_video_frame() APIs.
§Parity status
Synthetic-weights build tests in [tests] exercise every component
(encoder, prompt encoder, decoder, memory encoder/attention, end-to-end
Sam2 object) for every Hiera variant. Numerical parity against the
PyTorch reference is wired up in tests/sam2_parity.rs behind the
parity-pytorch feature flag. Running the bisect options there against
a real sam2_hiera_*.safetensors checkpoint is follow-up work,
analogous to how SAM v1 Phase 1 reached parity in iterative passes
after the initial graph was wired.
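A parity check of this kind usually comes down to comparing two flat f32 buffers within a tolerance. The sketch below is illustrative only and is not taken from tests/sam2_parity.rs: the helper names max_abs_diff and all_close and the tolerance convention are assumptions.

```rust
/// Hypothetical helper (not part of this crate): largest absolute
/// element-wise difference between two equally sized buffers.
/// Returns `None` when the lengths differ, so a shape mismatch is not
/// reported as numerical drift.
fn max_abs_diff(actual: &[f32], expected: &[f32]) -> Option<f32> {
    if actual.len() != expected.len() {
        return None;
    }
    Some(
        actual
            .iter()
            .zip(expected)
            .map(|(a, e)| (a - e).abs())
            .fold(0.0_f32, f32::max),
    )
}

/// Hypothetical helper: true when every element satisfies
/// `|actual - expected| <= atol + rtol * |expected|`.
/// A NaN on either side makes the comparison fail.
fn all_close(actual: &[f32], expected: &[f32], atol: f32, rtol: f32) -> bool {
    actual.len() == expected.len()
        && actual
            .iter()
            .zip(expected)
            .all(|(a, e)| (a - e).abs() <= atol + rtol * e.abs())
}
```

When bisecting, reporting max_abs_diff per stage (encoder output, FPN levels, decoder logits) is one way to find the first stage where the two implementations diverge.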
Re-exports§
pub use config::SAM2_IMG_SIZE;
pub use config::SAM2_PATCH_GRID;
pub use config::SAM2_PATCH_KERNEL;
pub use config::SAM2_PATCH_PADDING;
pub use config::SAM2_PATCH_STRIDE;
pub use config::SAM2_PIXEL_MEAN;
pub use config::SAM2_PIXEL_STD;
pub use config::SAM2_PROMPT_EMBED_DIM;
pub use config::SAM2_Q_POOL_COUNT;
pub use config::SAM2_Q_STRIDE;
pub use config::Sam2Config;
pub use config::Sam2DecoderConfig;
pub use config::Sam2FpnConfig;
pub use config::Sam2HieraConfig;
pub use config::Sam2MemoryConfig;
pub use config::Sam2MemoryEncoderConfig;
pub use flow::Sam2ImageEncoderBuilt;
pub use flow::Sam2ImageEncoderFlow;
pub use flow::build_sam2_image_encoder_built;
pub use fpn_neck::FpnLevel;
pub use fpn_neck::FpnNeckWeights;
pub use fpn_neck::apply_fpn_neck;
pub use fpn_neck::apply_fpn_neck_host;
pub use fpn_neck_ir::Sam2FpnNeckIr;
pub use fpn_neck_ir::compile_fpn_neck_ir;
pub use image_encoder::build_sam2_image_encoder_graph;
pub use image_encoder::build_sam2_image_encoder_hir;
pub use mask_decoder::Sam2MaskDecoderOutput;
pub use mask_decoder::Sam2MaskDecoderWeights;
pub use mask_decoder::mask_decoder_forward;
pub use memory_attention::Sam2MemoryAttentionWeights;
pub use memory_attention::memory_attention_forward;
pub use memory_attention_ir::MemoryAttentionCompiled;
pub use memory_encoder::Sam2MemoryEncoderOutput;
pub use memory_encoder::Sam2MemoryEncoderWeights;
pub use memory_encoder::memory_encoder_forward;
pub use preprocess::Sam2PreprocessWeights;
pub use preprocess::assemble_patch_tokens;
pub use preprocess::preprocess_image;
pub use prompt_encoder::SAM2_MASK_IN_CHANS;
pub use prompt_encoder::SAM2_PROMPT_GRID;
pub use prompt_encoder::Sam2PromptEncoderOutput;
pub use prompt_encoder::Sam2PromptEncoderWeights;
pub use prompt_encoder::prompt_encoder_forward;
pub use sam2::Sam2;
pub use sam2::Sam2ImagePrediction;
pub use sam2::Sam2VideoState;
pub use transformer::Sam2TwoWayTransformerWeights;
pub use transformer::two_way_transformer_forward;
pub use transformer_ir::compile_two_way_transformer;
Modules§
- axial_rope: SAM2 axial 2-D RoPE (host + cos/sin tables for [Op::Rope] IR).
- cli
- config: SAM 2 model configuration. Mirrors Meta’s segment-anything-2 (a.k.a. facebookresearch/sam2) reference exactly, so the published sam2_hiera_{t,s,b+,l}.pt checkpoints can load without remapping.
- flow: Tier-0 SAM2 Hiera image encoder flow.
- fpn_neck: SAM 2 FPN neck (mirrors sam2/modeling/backbones/image_encoder.py::FpnNeck).
- fpn_neck_ir: SAM2 FPN neck IR (lateral 1×1 convs + top-down nearest ×2 fusion).
- image_encoder: SAM 2 Hiera image encoder HIR builder.
- mask_decoder: SAM 2 mask decoder (host-side).
- memory_attention: SAM 2 memory attention (host-side).
- memory_attention_ir: SAM2 memory attention IR.
- memory_encoder: SAM 2 memory encoder (host-side).
- memory_mask_ir: SAM2 memory-encoder IR subgraphs (MaskDownSampler, prefix fuse, Fuser).
- mlp_ir: Compile SAM2 mask-decoder ReLU MLP heads to IR.
- preprocess: SAM 2 host-side preprocessing.
- prompt_encoder: SAM 2 prompt encoder; Fourier/point embeddings host-side, mask stack via IR.
- prompt_mask_ir: SAM2 prompt-encoder mask downscale (IR).
- sam2: SAM 2 top-level orchestrator. Ties together the IR-graph Hiera image encoder, the host-side FpnNeck, prompt encoder, mask decoder, memory encoder, and memory attention into the two reference APIs.
- transformer: SAM 2 two-way transformer (host-side).
- transformer_ir: Compile SAM2 two-way transformer to IR.
- upscale_ir: SAM2 mask-decoder upscaling (ConvTranspose2d + LN2d + optional high-res 1×1 fuse).
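The fpn_neck_ir entry above mentions top-down nearest ×2 fusion. As a minimal sketch of that step, assuming a single channel stored row-major and lateral 1×1 convolutions that have already been applied, the fusion is an upsample followed by an element-wise add. The function names below are hypothetical and do not correspond to this crate's API.

```rust
/// Hypothetical helper: nearest-neighbour ×2 upsample of a
/// single-channel `h × w` map stored row-major.
fn upsample_nearest_2x(src: &[f32], h: usize, w: usize) -> Vec<f32> {
    assert_eq!(src.len(), h * w, "src must hold h * w elements");
    let (oh, ow) = (h * 2, w * 2);
    let mut out = vec![0.0_f32; oh * ow];
    for y in 0..oh {
        for x in 0..ow {
            // Each output pixel copies the source pixel it falls inside.
            out[y * ow + x] = src[(y / 2) * w + (x / 2)];
        }
    }
    out
}

/// Hypothetical helper: top-down fusion of one pyramid level.
/// `lateral` is the finer level (already passed through its 1×1 conv),
/// `coarse` is the next-coarser level with shape `ch × cw`.
fn fuse_top_down(lateral: &[f32], coarse: &[f32], ch: usize, cw: usize) -> Vec<f32> {
    let up = upsample_nearest_2x(coarse, ch, cw);
    assert_eq!(lateral.len(), up.len(), "lateral must be 2ch × 2cw");
    lateral.iter().zip(&up).map(|(l, u)| l + u).collect()
}
```

A real neck repeats this per channel and per pyramid level, from the coarsest level downward.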
Structs§
Constants§
- SAM_PROFILE_FILE: Colocated with safetensors weights (sam.rlx.toml).
Functions§
- sam2_profile_default
- sam2_profile_near_weights: SAM2 checkpoint graphs; loads sam.rlx.toml next to weights when present.
- sam_profile_near_weights: Load sam.rlx.toml next to weights, or CompileProfile::sam_encoder.
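The "next to weights" lookup described above can be sketched with the standard library alone. This is an illustration under assumptions, not the crate's implementation: profile_path_near and find_profile are hypothetical names, and the fallback to a built-in profile is left to the caller.

```rust
use std::path::{Path, PathBuf};

/// File name taken from the SAM_PROFILE_FILE description above.
const PROFILE_FILE: &str = "sam.rlx.toml";

/// Hypothetical helper: path of the profile file in the same directory
/// as the weights file. A bare file name resolves to the current directory.
fn profile_path_near(weights: &Path) -> PathBuf {
    weights
        .parent()
        .unwrap_or_else(|| Path::new(""))
        .join(PROFILE_FILE)
}

/// Hypothetical helper: `Some(path)` when the sidecar profile exists on
/// disk, `None` when the caller should fall back to a default profile.
fn find_profile(weights: &Path) -> Option<PathBuf> {
    let candidate = profile_path_near(weights);
    candidate.is_file().then_some(candidate)
}
```

Returning an Option keeps the fallback decision at the call site, which matches the "when present" wording of the function summaries.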