Crate scenesdetect


scenesdetect

A Rust port of PySceneDetect: scene/shot cut detection built around a Sans-I/O streaming API, designed to work with any frame source.


§Overview

scenesdetect is a from-scratch Rust port of PySceneDetect. It is deliberately Sans-I/O: the crate never opens a file, decodes a packet, or spawns a thread. Callers hand in frames one at a time, and for each frame a detector returns an Option<Timestamp>: Some(cut point) when it detects a cut, None otherwise. Composing those point cuts into scene ranges is the caller’s responsibility, which keeps this crate independent of any particular decoding pipeline.
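Because range composition is left to the caller, here is a minimal sketch of that step. It is not part of the crate: it assumes cuts arrive in ascending order as plain i64 pts values in a single timebase (standing in for the crate's Timestamp), and the Scene type and cuts_to_scenes function are hypothetical names.

```rust
/// Hypothetical half-open scene range `[start, end)`, in pts ticks of one timebase.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Scene {
    pub start: i64,
    pub end: i64,
}

/// Turns ascending point cuts into contiguous scene ranges.
/// `first_pts` is the pts of the first frame; `end_pts` is the pts just past the last frame.
/// Cuts that would produce an empty scene, or fall outside the stream, are skipped.
pub fn cuts_to_scenes(first_pts: i64, end_pts: i64, cuts: &[i64]) -> Vec<Scene> {
    let mut scenes = Vec::new();
    let mut start = first_pts;
    for &cut in cuts {
        if cut <= start || cut >= end_pts {
            continue;
        }
        scenes.push(Scene { start, end: cut });
        start = cut;
    }
    // Close the final scene at the end of the stream.
    if start < end_pts {
        scenes.push(Scene { start, end: end_pts });
    }
    scenes
}
```

In a decode loop, a caller would collect every Some(cut) a detector returns and call this once at end of stream; a streaming variant could emit each Scene as soon as the next cut arrives.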

Timestamps are represented as raw integer pts + Timebase (matching FFmpeg’s AVRational) rather than floating-point seconds, so all arithmetic is exact and cross-stream comparisons are unambiguous.
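The exact comparison works by cross-multiplying instead of dividing. A minimal sketch, assuming an AVRational-style timebase with positive i32 numerator and denominator and an i64 pts; Rational, Pts and cmp_pts are hypothetical names, not the crate's Timebase/Timestamp API.

```rust
use std::cmp::Ordering;

/// Hypothetical AVRational-style timebase: one tick lasts `num / den` seconds.
/// Both fields are assumed to be positive.
#[derive(Debug, Clone, Copy)]
pub struct Rational {
    pub num: i32,
    pub den: i32,
}

/// Hypothetical timestamp: a raw pts counted in ticks of `tb`.
#[derive(Debug, Clone, Copy)]
pub struct Pts {
    pub pts: i64,
    pub tb: Rational,
}

/// Compares `a.pts * a.num / a.den` with `b.pts * b.num / b.den` exactly.
/// Multiplying both sides by the positive `a.den * b.den` removes the division;
/// each product is at most 2^63 * 2^31 * 2^31 = 2^125 in magnitude, so i128 cannot overflow.
pub fn cmp_pts(a: Pts, b: Pts) -> Ordering {
    let lhs = a.pts as i128 * a.tb.num as i128 * b.tb.den as i128;
    let rhs = b.pts as i128 * b.tb.num as i128 * a.tb.den as i128;
    lhs.cmp(&rhs)
}
```

For example, 90 000 ticks of a 1/90000 timebase and 1 000 ticks of a 1/1000 timebase compare Equal, while 30 ticks of the NTSC 1001/30000 timebase (1.001 s) compare Greater than either.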

§Detectors

Module	Algorithm	Good for
`histogram`	YUV-luma histogram correlation	Generic cuts, robust to camera shake
`phash`	DCT-based perceptual hash (pHash)	Similarity-tolerant dedup / cut detection
`threshold`	Mean-brightness state machine	Fade-to-black / fade-in transitions
`content`	HSV-space delta + optional Canny edge delta	Motion/composition changes — the default PySceneDetect algorithm
`adaptive`	Rolling-average wrapper over `content`	Suppresses false positives on sustained fast motion
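As an illustration of the histogram row, a minimal sketch of luma-histogram correlation. It assumes a 256-bin histogram and the Pearson-correlation formula of OpenCV's HISTCMP_CORREL; the function names and the min_correlation cut rule are hypothetical and not this crate's API.

```rust
/// Builds a 256-bin histogram of 8-bit luma samples.
pub fn luma_histogram(luma: &[u8]) -> [u32; 256] {
    let mut hist = [0u32; 256];
    for &y in luma {
        hist[y as usize] += 1;
    }
    hist
}

/// Pearson correlation between two histograms: 1.0 for identical shapes,
/// lower (possibly negative) for dissimilar ones. Returns 1.0 when either
/// histogram has zero variance, matching OpenCV's HISTCMP_CORREL behaviour.
pub fn histogram_correlation(a: &[u32; 256], b: &[u32; 256]) -> f64 {
    let n = a.len() as f64;
    let mean_a = a.iter().map(|&v| v as f64).sum::<f64>() / n;
    let mean_b = b.iter().map(|&v| v as f64).sum::<f64>() / n;
    let (mut cov, mut var_a, mut var_b) = (0.0, 0.0, 0.0);
    for (&x, &y) in a.iter().zip(b.iter()) {
        let dx = x as f64 - mean_a;
        let dy = y as f64 - mean_b;
        cov += dx * dy;
        var_a += dx * dx;
        var_b += dy * dy;
    }
    let denom = (var_a * var_b).sqrt();
    if denom == 0.0 { 1.0 } else { cov / denom }
}

/// Hypothetical decision rule: low correlation between consecutive frames means a cut.
pub fn is_cut(prev: &[u32; 256], cur: &[u32; 256], min_correlation: f64) -> bool {
    histogram_correlation(prev, cur) < min_correlation
}
```

Because the histogram discards pixel positions, small camera shake barely changes it, which is why this approach tolerates shaky footage.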

§Features

Sans-I/O streaming API — hand in LumaFrame / RgbFrame / HsvFrame (zero-copy slices), get Option<Timestamp> back per frame. No allocation on the hot path once the detector is primed.
Hand-written SIMD backends — aarch64 NEON, x86 SSSE3 + AVX2 (runtime-dispatched via is_x86_feature_detected!), and wasm simd128. All with scalar fallbacks, toggleable per-detector via Options::with_simd(false).
Exact rational timestamps — Timebase mirrors FFmpeg’s AVRational; Timestamp compares semantically across timebases via i128 cross-multiply.
no_std + alloc: the crate builds without std, using only alloc; the std feature (enabled by default) provides runtime x86 feature detection.
Optional serde — all Options types derive Serialize / Deserialize under the serde feature.
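The runtime-dispatch pattern can be sketched with a sum-of-absolute-differences kernel, a common building block of per-frame delta scores. This is an illustration, not the crate's code: sad, sad_scalar and sad_avx2 are hypothetical names, and only an AVX2 tier plus the scalar fallback are shown.

```rust
/// Scalar fallback: sum of |a[i] - b[i]| over the common length.
pub fn sad_scalar(a: &[u8], b: &[u8]) -> u64 {
    a.iter().zip(b).map(|(&x, &y)| x.abs_diff(y) as u64).sum()
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sad_avx2(a: &[u8], b: &[u8]) -> u64 {
    use std::arch::x86_64::*;
    let len = a.len().min(b.len());
    let chunks = len / 32;
    let mut acc = _mm256_setzero_si256();
    for i in 0..chunks {
        let va = _mm256_loadu_si256(a.as_ptr().add(i * 32) as *const __m256i);
        let vb = _mm256_loadu_si256(b.as_ptr().add(i * 32) as *const __m256i);
        // VPSADBW: four u64 lanes, each the sum of |a - b| over 8 bytes.
        acc = _mm256_add_epi64(acc, _mm256_sad_epu8(va, vb));
    }
    let mut lanes = [0u64; 4];
    _mm256_storeu_si256(lanes.as_mut_ptr() as *mut __m256i, acc);
    // Handle the tail (fewer than 32 bytes) with the scalar path.
    let tail = chunks * 32;
    lanes.iter().sum::<u64>() + sad_scalar(&a[tail..len], &b[tail..len])
}

/// Picks the fastest available backend at runtime, falling back to scalar.
pub fn sad(a: &[u8], b: &[u8]) -> u64 {
    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was confirmed at runtime just above.
            return unsafe { sad_avx2(a, b) };
        }
    }
    sad_scalar(a, b)
}
```

A per-detector toggle such as Options::with_simd(false) would amount to skipping the detection branch and calling the scalar function directly.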

§Installation

[dependencies]
scenesdetect = "0.1"

§Crate features

Feature	Default	Purpose
`std`	✓	Runtime x86 SIMD dispatch, standard library types
`alloc`		`no_std` build using `alloc` only
`serde`		`Serialize` / `Deserialize` for all `Options` types
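For a no_std build, the feature table implies turning off default features and enabling alloc. A sketch of the corresponding Cargo.toml entries, assuming the features combine as the table describes:

```toml
[dependencies]
# no_std build: drop the default `std` feature, keep heap allocation via `alloc`.
scenesdetect = { version = "0.1", default-features = false, features = ["alloc"] }

# Alternatively, keep std and add serde support for the Options types:
# scenesdetect = { version = "0.1", features = ["serde"] }
```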

§Benchmarks

Numbers below are per-frame runtimes from the benchmark.yml CI workflow on GitHub-hosted runners, compiled with the default release profile (opt-level = 3, thin LTO). Each row is a single process_* call — that is, the full pipeline for one frame including the per-channel delta reduction. Lower is better; fps is 1 s / per-frame time. Full data lives in the Benchmarks workflow artifacts.
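The fps figures are the reciprocal of the per-frame time, and the Uplift column in the SIMD-vs-scalar table is scalar time divided by SIMD time. Two tiny helpers (hypothetical names) make the arithmetic explicit:

```rust
/// Frames per second implied by a per-frame time in milliseconds (fps = 1 s / time).
pub fn fps_from_ms(per_frame_ms: f64) -> f64 {
    1000.0 / per_frame_ms
}

/// Speed-up of the SIMD path over the scalar path for the same workload.
pub fn uplift(simd_ms: f64, scalar_ms: f64) -> f64 {
    scalar_ms / simd_ms
}
```

For example, fps_from_ms(0.93) is about 1 075 (reported as ≈1 080 fps), and uplift(2.78, 24.99) is about 9.0.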

§Per-detector timings at 1080p

Best SIMD-on path, single-threaded:

Detector	macOS aarch64 NEON	Linux x86_64 AVX2	Windows x86_64 AVX2
`histogram`	0.93 ms (≈1 080 fps)	1.24 ms (≈810 fps)	1.26 ms (≈790 fps)
`phash`	1.65 ms (≈610 fps)	2.03 ms (≈490 fps)	2.22 ms (≈450 fps)
`threshold` — luma	0.12 ms (≈8 000 fps)	0.33 ms (≈3 080 fps)	0.34 ms (≈2 940 fps)
`threshold` — RGB	0.38 ms (≈2 650 fps)	0.98 ms (≈1 030 fps)	0.99 ms (≈1 020 fps)
`content` — luma-only	0.48 ms (≈2 080 fps)	0.34 ms (≈2 940 fps)	0.40 ms (≈2 510 fps)
`content` — BGR, no edges	3.38 ms (≈300 fps)	2.78 ms (≈360 fps)	2.84 ms (≈350 fps)
`content` — BGR with Canny edges	58.0 ms (≈17 fps)	71.0 ms (≈14 fps)	75.8 ms (≈13 fps)
`adaptive` — luma-only	0.49 ms (≈2 040 fps)	0.30 ms (≈3 300 fps)	0.40 ms (≈2 500 fps)
`adaptive` — BGR, no edges	3.18 ms (≈315 fps)	2.78 ms (≈360 fps)	3.06 ms (≈325 fps)

§SIMD vs scalar at 1080p (`content::process_bgr`, default weights, no edges)

The BGR path is the hot spot — packed-BGR → planar HSV conversion is where the hand-written SIMD backends earn their keep. Scalar numbers come from the same benches with Options::with_simd(false).
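For reference, the per-pixel work in a packed-BGR to planar-HSV conversion can be sketched as follows. This assumes OpenCV's 8-bit COLOR_BGR2HSV conventions (H in 0..180, S and V in 0..=255); it is a float formulation that may differ by one rounding step from OpenCV's fixed-point code, and the function names are hypothetical rather than this crate's scalar fallback.

```rust
/// Converts one BGR pixel to 8-bit HSV (H in 0..180, S and V in 0..=255).
pub fn bgr_to_hsv(b: u8, g: u8, r: u8) -> (u8, u8, u8) {
    let (bf, gf, rf) = (b as f32, g as f32, r as f32);
    let v = bf.max(gf).max(rf);
    let min = bf.min(gf).min(rf);
    let diff = v - min;
    let s = if v == 0.0 { 0.0 } else { diff * 255.0 / v };
    // Hue in degrees, depending on which channel is the maximum.
    let mut h = if diff == 0.0 {
        0.0
    } else if v == rf {
        60.0 * (gf - bf) / diff
    } else if v == gf {
        120.0 + 60.0 * (bf - rf) / diff
    } else {
        240.0 + 60.0 * (rf - gf) / diff
    };
    if h < 0.0 {
        h += 360.0;
    }
    // Halve to fit a byte, wrapping 180 back to 0 after rounding.
    let h8 = ((h / 2.0).round() as u32 % 180) as u8;
    (h8, s.round() as u8, v as u8)
}

/// Deinterleaves packed BGR into planar H, S and V buffers (trailing bytes ignored).
pub fn bgr_to_hsv_planes(bgr: &[u8]) -> (Vec<u8>, Vec<u8>, Vec<u8>) {
    let n = bgr.len() / 3;
    let (mut h, mut s, mut v) = (Vec::with_capacity(n), Vec::with_capacity(n), Vec::with_capacity(n));
    for px in bgr.chunks_exact(3) {
        let (ph, ps, pv) = bgr_to_hsv(px[0], px[1], px[2]);
        h.push(ph);
        s.push(ps);
        v.push(pv);
    }
    (h, s, v)
}
```

The `chunks_exact(3)` loop is the deinterleave step that PSHUFB (x86) or vld3q_u8 (NEON) performs 16 or more pixels at a time in a hand-written backend.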

Tier	SIMD	Scalar	Uplift
`macos-aarch64-neon`	3.38 ms	4.61 ms	1.36×
`ubuntu-x86_64-default` (runtime AVX2)	2.78 ms	24.99 ms	9.0×
`ubuntu-x86_64-native` (`-C target-cpu=native`)	2.72 ms	9.00 ms	3.3×
`ubuntu-x86_64-ssse3-only` (AVX/AVX2/FMA disabled)	2.09 ms	21.34 ms	10.2×
`windows-x86_64-default`	2.84 ms	57.55 ms	20.3×

A few things fall out of this:

x86 SIMD is very much worth it. On the default and SSSE3-only x86 tiers, the scalar path (without the hand-written std::arch dispatch) runs the BGR pipeline 9–20× slower than the SSSE3/AVX2 backend. The biggest x86 win is the 3-plane deinterleave via PSHUFB, which the compiler doesn’t emit on its own.
NEON uplift is modest because aarch64’s auto-vectorizer handles the scalar fallback well; the hand-written NEON path still wins on the deinterleave (vld3q_u8) but the scalar baseline is already strong.
-C target-cpu=native closes most of the scalar gap on x86 (9 ms vs 25 ms default scalar) by unlocking AVX2 for LLVM’s auto-vectorizer, but it still loses to the hand-written dispatch by ~3×.
Canny edges are expensive. Turning on delta_edges dominates the frame time, at roughly 58–76 ms per 1080p frame. Only enable it when color deltas aren’t enough.
Adaptive overhead is O(1) per frame with respect to window size. Varying window_width from 1 to 16 moves the 1080p luma-only timing by less than 5%; the rolling-sum fix made the per-frame cost flat.
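The flat cost comes from updating a running sum instead of re-summing the window on every frame. A minimal sketch of that idea, assuming (as in PySceneDetect's adaptive detector) that each frame's content score is compared with the mean score of window_width frames on either side; RollingAdaptive, push and the threshold semantics are hypothetical, and PySceneDetect's additional minimum-content check is omitted.

```rust
use std::collections::VecDeque;

/// Hypothetical adaptive layer over per-frame content scores.
pub struct RollingAdaptive {
    window_width: usize,
    threshold: f64,
    scores: VecDeque<f64>,
    sum: f64, // running sum of every score currently in `scores`
}

impl RollingAdaptive {
    pub fn new(window_width: usize, threshold: f64) -> Self {
        assert!(window_width >= 1, "window_width must be at least 1");
        Self {
            window_width,
            threshold,
            scores: VecDeque::with_capacity(2 * window_width + 1),
            sum: 0.0,
        }
    }

    /// Pushes the newest frame's score. Once `2 * window_width + 1` scores are
    /// buffered, returns whether the centre frame (delayed by `window_width`
    /// frames) is a cut: its score divided by the mean of its neighbours must
    /// reach `threshold`. Each call does O(1) work regardless of window size.
    pub fn push(&mut self, score: f64) -> Option<bool> {
        let full = 2 * self.window_width + 1;
        self.scores.push_back(score);
        self.sum += score;
        if self.scores.len() > full {
            // Evict the oldest score and subtract it rather than re-summing.
            self.sum -= self.scores.pop_front().unwrap_or(0.0);
        }
        if self.scores.len() < full {
            return None;
        }
        let centre = self.scores[self.window_width];
        let neighbour_mean = (self.sum - centre) / (2 * self.window_width) as f64;
        // Guard against division by zero on perfectly static footage.
        let ratio = centre / neighbour_mean.max(1e-9);
        Some(ratio >= self.threshold)
    }
}
```

A long-running version might periodically recompute the sum from scratch to bound floating-point drift; that costs O(window) only once every N frames.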

§Reproducing locally

cargo bench --bench content
cargo bench --bench adaptive
# ...or all of them:
cargo bench

The benchmark.yml workflow runs five matrix rows on every push to main and every PR touching src/**, benches/**, or the workflow file: macos-aarch64-neon, ubuntu-x86_64-default, ubuntu-x86_64-native, ubuntu-x86_64-ssse3-only, windows-x86_64-default. The per-run artifact contains both a bencher-format summary and the Criterion HTML detail tree.

§Acknowledgements

scenesdetect is a Rust port of PySceneDetect by Brandon Castellano, released under the BSD 3-Clause license. The detector algorithms — histogram correlation, DCT-based pHash, brightness-threshold fades, HSV + Canny content deltas, and the rolling-average adaptive layer — are re-implementations of the algorithms described in PySceneDetect’s source and documentation. Default parameters mirror PySceneDetect’s where practical; any deliberate deviations are called out in the relevant module docs.

See THIRD-PARTY.md for the full upstream license text and additional third-party notices.

§License

scenesdetect is distributed under the terms of both the MIT license and the Apache License (Version 2.0).

See LICENSE-APACHE, LICENSE-MIT for details.

§Modules

adaptive (std or alloc): Rolling-average / adaptive scene detector built on top of the content detector’s scores. Reduces false positives on fast camera motion.
content (std or alloc): Content-change scene detector using HSV-space per-frame deltas and optional Canny edge comparison.
frame: Frame-input types for the scene detectors.
histogram (std or alloc): Histogram-based scene detector using YUV luma correlation.
phash (std or alloc): Perceptual hash (pHash) scene detector using DCT signatures.
threshold (std or alloc): Intensity-threshold scene detector for fade-in / fade-out transitions.