rlx-sam2 0.2.5

SAM 2 (Hiera) for RLX

SAM 2 — Meta's Segment Anything Model 2 (image + video segmentation).

The module layout mirrors facebookresearch/sam2, so the published sam2_hiera_{t,s,b+,l}.{pt,safetensors} checkpoints load with no weight-key remapping.
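
For readers unfamiliar with the brace shorthand, the sketch below expands the checkpoint pattern into the eight file names it stands for. The constants and the checkpoint_file_names helper are hypothetical and written for this illustration; they are not part of the crate's API, and the names are only as accurate as the pattern quoted above.

```rust
/// Hiera size suffixes exactly as written in the pattern above
/// (tiny, small, base+, large). Hypothetical constant, for illustration only.
const VARIANTS: [&str; 4] = ["t", "s", "b+", "l"];

/// Checkpoint container formats named in the pattern above.
const EXTENSIONS: [&str; 2] = ["pt", "safetensors"];

/// Expand `sam2_hiera_{t,s,b+,l}.{pt,safetensors}` into concrete file names,
/// variant-major: both extensions of "t" first, then "s", and so on.
fn checkpoint_file_names() -> Vec<String> {
    let mut names = Vec::with_capacity(VARIANTS.len() * EXTENSIONS.len());
    for variant in VARIANTS.iter() {
        for ext in EXTENSIONS.iter() {
            names.push(format!("sam2_hiera_{}.{}", variant, ext));
        }
    }
    names
}
```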

Components

  • Phase 1 — Hiera image encoder + FpnNeck ([image_encoder], [fpn_neck], [preprocess]).
  • Phase 2 — prompt encoder + TwoWayTransformer + mask decoder with object-pointer / object-score / high-res mask path ([prompt_encoder], [transformer], [mask_decoder]).
  • Phase 3 — memory encoder + memory attention for video tracking ([memory_encoder], [memory_attention]).
  • Top-level wrapper — [Sam2] orchestrator with predict_image() and predict_video_frame() APIs.
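
How the three phases fit together can be sketched as an ordered list of stages. This is a rough illustration under stated assumptions, not the crate's code: the Stage enum and both functions are hypothetical, and the ordering follows the general SAM 2 design (memory attention conditions the current frame before decoding, and the memory encoder stores the result afterwards) rather than anything confirmed about how [Sam2] is implemented.

```rust
/// Stages named in the component list above.
/// Hypothetical enum for illustration; not a type exported by the crate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stage {
    Preprocess,
    ImageEncoder,
    FpnNeck,
    MemoryAttention,
    PromptEncoder,
    /// In the reference model the TwoWayTransformer runs inside the mask decoder.
    MaskDecoder,
    MemoryEncoder,
}

/// Assumed stage order for a single still image: Phases 1 and 2 only.
fn image_stages() -> Vec<Stage> {
    vec![
        Stage::Preprocess,
        Stage::ImageEncoder,
        Stage::FpnNeck,
        Stage::PromptEncoder,
        Stage::MaskDecoder,
    ]
}

/// Assumed stage order for one video frame. `has_memory` is false on the
/// first tracked frame, when there are no stored memories to attend to.
fn video_frame_stages(has_memory: bool) -> Vec<Stage> {
    let mut stages = vec![Stage::Preprocess, Stage::ImageEncoder, Stage::FpnNeck];
    if has_memory {
        // Condition the current frame's features on earlier frames.
        stages.push(Stage::MemoryAttention);
    }
    stages.push(Stage::PromptEncoder);
    stages.push(Stage::MaskDecoder);
    // Encode this frame's prediction so later frames can attend to it.
    stages.push(Stage::MemoryEncoder);
    stages
}
```

Under these assumptions, the only difference between the image and video paths is the memory pair added in Phase 3.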

Parity status

Synthetic-weight build tests in [tests] exercise every component (encoder, prompt encoder, decoder, memory encoder/attention, and the end-to-end Sam2 object) for every Hiera variant. Numerical parity against the PyTorch reference is wired up in tests/sam2_parity.rs behind the parity-pytorch feature flag. Running the bisect options there against a real sam2_hiera_*.safetensors checkpoint is the follow-up work, analogous to how SAM v1 Phase 1 reached parity in iterative passes after the initial graph was wired.
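
A numerical parity check usually comes down to comparing two flattened tensors within a tolerance. The sketch below shows one common way to do that. Both helpers are hypothetical and are not taken from tests/sam2_parity.rs; the tolerance rule (atol + rtol * |expected|) is the convention used by typical "allclose" checks and is an assumption here, not the crate's documented criterion.

```rust
/// Largest element-wise absolute difference between two flattened tensors.
/// Returns `None` when the lengths differ, since the comparison is then
/// meaningless. A NaN difference propagates to the result instead of being
/// silently dropped. Hypothetical helper, for illustration only.
fn max_abs_diff(actual: &[f32], expected: &[f32]) -> Option<f32> {
    if actual.len() != expected.len() {
        return None;
    }
    let worst = actual
        .iter()
        .zip(expected.iter())
        .map(|(a, e)| (a - e).abs())
        // Keep the larger value; once a NaN is seen it stays the result.
        .fold(0.0_f32, |m, d| if d.is_nan() || d > m { d } else { m });
    Some(worst)
}

/// True when every element satisfies |actual - expected| <= atol + rtol * |expected|.
/// Any NaN makes the check fail, because comparisons with NaN are false.
fn all_close(actual: &[f32], expected: &[f32], atol: f32, rtol: f32) -> bool {
    actual.len() == expected.len()
        && actual
            .iter()
            .zip(expected.iter())
            .all(|(a, e)| (a - e).abs() <= atol + rtol * e.abs())
}
```

Reporting max_abs_diff per component is what makes bisecting practical: the first stage whose difference exceeds the tolerance is where the two graphs diverge. Assuming the integration test target takes its name from the file, as Cargo does by default, `cargo test --features parity-pytorch --test sam2_parity` would select it.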