SAM 2 — Meta's Segment Anything Model 2 (image + video segmentation).
Mirrors facebookresearch/sam2 so the published
sam2_hiera_{t,s,b+,l}.{pt,safetensors} checkpoints load with no
weight-key remapping.
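One way to read "no weight-key remapping" is that the set of tensor names in a checkpoint matches the set of parameter names the model expects, so no rename table is needed. The sketch below illustrates that check with plain string sets. The `key_mismatch` helper and the key names in the usage note are hypothetical illustrations, not part of this crate's API and not read from a real checkpoint.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: compares the parameter names a model expects with
/// the tensor names found in a checkpoint.
///
/// Returns `(missing, unexpected)`:
/// - `missing`: names the model expects but the checkpoint lacks
/// - `unexpected`: names the checkpoint carries but the model does not use
///
/// Both lists come back empty when the names line up exactly, which is the
/// situation where loading needs no remapping step.
pub fn key_mismatch(
    model_keys: &[&str],
    checkpoint_keys: &[&str],
) -> (Vec<String>, Vec<String>) {
    let model: BTreeSet<&str> = model_keys.iter().copied().collect();
    let ckpt: BTreeSet<&str> = checkpoint_keys.iter().copied().collect();

    // BTreeSet keeps the results sorted, so the output is deterministic.
    let missing = model.difference(&ckpt).map(|k| k.to_string()).collect();
    let unexpected = ckpt.difference(&model).map(|k| k.to_string()).collect();
    (missing, unexpected)
}
```

For example, calling `key_mismatch` with the same list on both sides returns two empty vectors, while a checkpoint that stores a tensor under `encoder.pos_embed` where the model expects `image_encoder.trunk.pos_embed` (illustrative names) reports one missing and one unexpected key.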
Components
- Phase 1: Hiera image encoder + FpnNeck
  ([image_encoder], [fpn_neck], [preprocess]).
- Phase 2: prompt encoder + TwoWayTransformer + mask decoder
  with object-pointer / object-score / high-res mask path
  ([prompt_encoder], [transformer], [mask_decoder]).
- Phase 3: memory encoder + memory attention for video
  tracking ([memory_encoder], [memory_attention]).
- Top-level wrapper: [Sam2] orchestrator with predict_image() and
  predict_video_frame() APIs.
Parity status
Synthetic-weight build tests in [tests] exercise every component
(encoder, prompt encoder, decoder, memory encoder/attention, end-to-end
Sam2 object) for every Hiera variant. Numerical parity against the
PyTorch reference is wired up in tests/sam2_parity.rs behind the
parity-pytorch feature flag. Running the bisect options there
against a real sam2_hiera_*.safetensors checkpoint is follow-up
work, analogous to how SAM v1 Phase 1 reached parity in iterative
passes after the initial graph was wired.
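A minimal sketch of the comparison a parity bisect relies on: measure the largest element-wise difference between this implementation's output and the reference output at each stage, then report the first stage that drifts past a tolerance. The helper names, the stage labels, and the tolerance are assumptions for illustration. The actual contents of tests/sam2_parity.rs are not shown here and may be organised differently.

```rust
/// Largest absolute element-wise difference between two flattened tensors.
///
/// Returns `None` when the lengths differ (a shape mismatch), and treats a
/// NaN difference as infinitely large so it can never pass a tolerance check.
pub fn max_abs_diff(ours: &[f32], reference: &[f32]) -> Option<f32> {
    if ours.len() != reference.len() {
        return None;
    }
    let worst = ours
        .iter()
        .zip(reference)
        .map(|(a, b)| {
            let d = (a - b).abs();
            if d.is_nan() { f32::INFINITY } else { d }
        })
        .fold(0.0_f32, f32::max);
    Some(worst)
}

/// Walks the stages in pipeline order and returns the name of the first one
/// whose output differs from the reference by more than `tol`, or whose
/// shape does not match. Returns `None` when every stage is within `tol`.
///
/// Each entry is `(stage name, our output, reference output)`.
pub fn first_divergent_stage<'a>(
    stages: &[(&'a str, &[f32], &[f32])],
    tol: f32,
) -> Option<&'a str> {
    stages.iter().find_map(|&(name, ours, reference)| {
        match max_abs_diff(ours, reference) {
            Some(d) if d <= tol => None,
            _ => Some(name),
        }
    })
}
```

In a test file, a check like this would typically be compiled only when the feature is enabled, for example under `#[cfg(feature = "parity-pytorch")]`, so default builds do not need the reference outputs. Reporting the first divergent stage rather than only the final mask error narrows the search to one component at a time.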