scenesdetect
A Rust port of PySceneDetect: scene/shot cut detection built around a Sans-I/O streaming API, designed to slot in behind any frame source.
§Overview
scenesdetect is a from-scratch Rust port of PySceneDetect. It is deliberately Sans-I/O: the crate never opens a file, decodes a packet, or spawns a thread. Callers hand frames in one by one, and each detector returns an Option<Timestamp> identifying the cut point — or nothing. Composing those point cuts into scene ranges is the caller’s responsibility, which keeps this crate independent of any particular decoding pipeline.
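Since composing point cuts into scene ranges is left to the caller, a minimal sketch of that step may help. Everything named here (`Pts`, `Scene`, `collect_cuts`, `scenes_from_cuts`) is a hypothetical stand-in rather than part of the crate's API, and the detector is modelled as a plain closure that returns an optional cut per frame.

```rust
/// Hypothetical stand-in for the crate's `Timestamp`; here just a raw pts.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Pts(pub i64);

/// A half-open scene range `[start, end)` in pts units (illustrative type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Scene {
    pub start: Pts,
    pub end: Pts,
}

/// Feed frames one by one to a detector and keep every cut it reports.
/// `detect` plays the role of a detector's per-frame call.
pub fn collect_cuts<F, I, D>(frames: I, mut detect: D) -> Vec<Pts>
where
    I: IntoIterator<Item = F>,
    D: FnMut(&F) -> Option<Pts>,
{
    frames.into_iter().filter_map(|frame| detect(&frame)).collect()
}

/// Turn point cuts into contiguous scene ranges.
/// Assumes `cuts` is sorted ascending and lies strictly between `first` and `end`.
pub fn scenes_from_cuts(first: Pts, end: Pts, cuts: &[Pts]) -> Vec<Scene> {
    let mut scenes = Vec::with_capacity(cuts.len() + 1);
    let mut start = first;
    for &cut in cuts {
        // Each cut closes the current scene and opens the next one.
        scenes.push(Scene { start, end: cut });
        start = cut;
    }
    // The final scene runs from the last cut to the end of the stream.
    scenes.push(Scene { start, end });
    scenes
}
```

With no cuts at all, the whole stream comes back as a single scene, which is usually the behaviour a caller wants for a one-shot clip.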
Timestamps are represented as raw integer pts + Timebase (matching FFmpeg’s AVRational) rather than floating-point seconds, so all arithmetic is exact and cross-stream comparisons are unambiguous.
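To illustrate why integer pts plus a rational timebase keeps comparisons exact, here is a small sketch of comparing two timestamps from different timebases by cross-multiplication. The `Tb` and `Ts` types and the `cmp_semantic` method are hypothetical names for this example, not the crate's `Timebase` / `Timestamp` definitions, and denominators are assumed to be non-zero.

```rust
use core::cmp::Ordering;

/// Hypothetical rational timebase: one tick lasts `num / den` seconds.
#[derive(Debug, Clone, Copy)]
pub struct Tb {
    pub num: u32,
    pub den: u32, // assumed non-zero
}

/// Hypothetical timestamp: a raw tick count in a given timebase.
#[derive(Debug, Clone, Copy)]
pub struct Ts {
    pub pts: i64,
    pub tb: Tb,
}

impl Ts {
    /// Compare two instants without converting to floating-point seconds.
    ///
    /// `a.pts * a.num / a.den` vs `b.pts * b.num / b.den` is decided by
    /// multiplying both sides by the two positive denominators. An i64 times
    /// two u32 values always fits in an i128, so nothing overflows.
    pub fn cmp_semantic(&self, other: &Ts) -> Ordering {
        let lhs = i128::from(self.pts) * i128::from(self.tb.num) * i128::from(other.tb.den);
        let rhs = i128::from(other.pts) * i128::from(other.tb.num) * i128::from(self.tb.den);
        lhs.cmp(&rhs)
    }
}
```

For instance, 1000 ticks at 1/1000 and 90 000 ticks at 1/90 000 both denote exactly one second, and the integer comparison reports them as equal with no rounding involved.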
§Detectors
| Module | Algorithm | Good for |
|---|---|---|
| histogram | YUV-luma histogram correlation | Generic cuts, robust to camera shake |
| phash | DCT-based perceptual hash (pHash) | Similarity-tolerant dedup / cut detection |
| threshold | Mean-brightness state machine | Fade-to-black / fade-in transitions |
| content | HSV-space delta + optional Canny edge delta | Motion/composition changes; the default PySceneDetect algorithm |
| adaptive | Rolling-average wrapper over content | Suppresses false positives on sustained fast motion |
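As a concrete illustration of the first row, the sketch below builds a 256-bin luma histogram for each frame and scores two frames by the Pearson correlation of their histograms (the measure OpenCV exposes as `HISTCMP_CORREL`). The function names are made up for this example, and the crate's actual binning, normalization, and cut threshold are assumptions not shown here.

```rust
/// Count how often each 8-bit luma value occurs in a frame's Y plane.
pub fn luma_histogram(luma: &[u8]) -> [u32; 256] {
    let mut hist = [0u32; 256];
    for &y in luma {
        hist[y as usize] += 1;
    }
    hist
}

/// Pearson correlation between two histograms, in the range [-1, 1].
/// Values near 1 mean "same distribution"; a low value suggests a cut.
pub fn correlation(a: &[u32; 256], b: &[u32; 256]) -> f64 {
    let bins = 256.0;
    let mean_a = a.iter().map(|&v| f64::from(v)).sum::<f64>() / bins;
    let mean_b = b.iter().map(|&v| f64::from(v)).sum::<f64>() / bins;

    let (mut cov, mut var_a, mut var_b) = (0.0, 0.0, 0.0);
    for (&x, &y) in a.iter().zip(b.iter()) {
        let dx = f64::from(x) - mean_a;
        let dy = f64::from(y) - mean_b;
        cov += dx * dy;
        var_a += dx * dx;
        var_b += dy * dy;
    }

    let denom = (var_a * var_b).sqrt();
    // Convention chosen for this sketch: perfectly flat histograms count as identical.
    if denom == 0.0 { 1.0 } else { cov / denom }
}
```

A caller would compute the histogram of each incoming frame, correlate it with the previous frame's histogram, and flag a cut when the score drops below some chosen threshold.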
§Features
- Sans-I/O streaming API: hand in `LumaFrame` / `RgbFrame` / `HsvFrame` (zero-copy slices), get `Option<Timestamp>` back per frame. No allocation on the hot path once the detector is primed.
- Hand-written SIMD backends: aarch64 NEON, x86 SSSE3 + AVX2 (runtime-dispatched via `is_x86_feature_detected!`), and wasm `simd128`. All with scalar fallbacks, toggleable per-detector via `Options::with_simd(false)`.
- Exact rational timestamps: `Timebase` mirrors FFmpeg’s `AVRational`; `Timestamp` compares semantically across timebases via i128 cross-multiply.
- `no_std` + `alloc`: the crate builds without `std`; enable the default `std` feature for runtime x86 feature detection.
- Optional `serde`: all `Options` types derive `Serialize` / `Deserialize` under the `serde` feature.
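The runtime-dispatch pattern mentioned in the list can be sketched as follows. This is not the crate's code: a sum of absolute differences between two luma planes stands in for a real kernel, the names `sad`, `sad_scalar`, and `sad_avx2` are hypothetical, and the `use_simd` flag only imitates the role of a `with_simd(false)` style toggle.

```rust
/// Portable fallback: sum of absolute differences over the common length.
pub fn sad_scalar(a: &[u8], b: &[u8]) -> u64 {
    a.iter().zip(b).map(|(&x, &y)| u64::from(x.abs_diff(y))).sum()
}

/// AVX2 variant, 32 bytes per iteration. Callers must verify AVX2 support first.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn sad_avx2(a: &[u8], b: &[u8]) -> u64 {
    #[cfg(target_arch = "x86")]
    use core::arch::x86::*;
    #[cfg(target_arch = "x86_64")]
    use core::arch::x86_64::*;

    let n = a.len().min(b.len());
    let mut acc = _mm256_setzero_si256();
    let mut i = 0;
    while i + 32 <= n {
        let va = _mm256_loadu_si256(a.as_ptr().add(i) as *const __m256i);
        let vb = _mm256_loadu_si256(b.as_ptr().add(i) as *const __m256i);
        // `_mm256_sad_epu8` yields four 64-bit partial sums per 32 bytes.
        acc = _mm256_add_epi64(acc, _mm256_sad_epu8(va, vb));
        i += 32;
    }
    let mut lanes = [0u64; 4];
    _mm256_storeu_si256(lanes.as_mut_ptr() as *mut __m256i, acc);
    // Leftover bytes that do not fill a full vector go through the scalar path.
    lanes.iter().sum::<u64>() + sad_scalar(&a[i..n], &b[i..n])
}

/// Pick the fastest available implementation at runtime.
pub fn sad(a: &[u8], b: &[u8], use_simd: bool) -> u64 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if use_simd && std::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified on the line above.
            return unsafe { sad_avx2(a, b) };
        }
    }
    let _ = use_simd; // unused on non-x86 targets
    sad_scalar(a, b)
}
```

Because `is_x86_feature_detected!` lives in `std`, a `no_std` build of this sketch would have to choose the backend at compile time instead, which is consistent with the note above about the default `std` feature.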
§Installation
[dependencies]
scenesdetect = "0.1"
§Crate features
| Feature | Default | Purpose |
|---|---|---|
| std | ✓ | Runtime x86 SIMD dispatch, standard library types |
| alloc | | no_std build using alloc only |
| serde | | Serialize / Deserialize for all Options types |
§Benchmarks
Numbers below are per-frame runtimes from the benchmark.yml CI workflow on GitHub-hosted runners, compiled with the default release profile (opt-level = 3, thin LTO). Each row is a single process_* call — that is, the full pipeline for one frame including the per-channel delta reduction. Lower is better; fps is 1 s / per-frame time. Full data lives in the Benchmarks workflow artifacts.
§Per-detector timings at 1080p
Best SIMD-on path, single-threaded:
| Detector | macOS aarch64 NEON | Linux x86_64 AVX2 | Windows x86_64 AVX2 |
|---|---|---|---|
| histogram | 0.93 ms (≈1 080 fps) | 1.24 ms (≈810 fps) | 1.26 ms (≈790 fps) |
| phash | 1.65 ms (≈610 fps) | 2.03 ms (≈490 fps) | 2.22 ms (≈450 fps) |
| threshold (luma) | 0.12 ms (≈8 000 fps) | 0.33 ms (≈3 080 fps) | 0.34 ms (≈2 940 fps) |
| threshold (RGB) | 0.38 ms (≈2 650 fps) | 0.98 ms (≈1 030 fps) | 0.99 ms (≈1 020 fps) |
| content (luma-only) | 0.48 ms (≈2 080 fps) | 0.34 ms (≈2 940 fps) | 0.40 ms (≈2 510 fps) |
| content (BGR, no edges) | 3.38 ms (≈300 fps) | 2.78 ms (≈360 fps) | 2.84 ms (≈350 fps) |
| content (BGR with Canny edges) | 58.0 ms (≈17 fps) | 71.0 ms (≈14 fps) | 75.8 ms (≈13 fps) |
| adaptive (luma-only) | 0.49 ms (≈2 040 fps) | 0.30 ms (≈3 300 fps) | 0.40 ms (≈2 500 fps) |
| adaptive (BGR, no edges) | 3.18 ms (≈315 fps) | 2.78 ms (≈360 fps) | 3.06 ms (≈325 fps) |
§SIMD vs scalar at 1080p (content::process_bgr, default weights, no edges)
The BGR path is the hot spot — packed-BGR → planar HSV conversion is where the hand-written SIMD backends earn their keep. Scalar numbers come from the same benches with Options::with_simd(false).
| Tier | SIMD | Scalar | Uplift |
|---|---|---|---|
| macos-aarch64-neon | 3.38 ms | 4.61 ms | 1.36× |
| ubuntu-x86_64-default (runtime AVX2) | 2.78 ms | 24.99 ms | 9.0× |
| ubuntu-x86_64-native (-C target-cpu=native) | 2.72 ms | 9.00 ms | 3.3× |
| ubuntu-x86_64-ssse3-only (AVX/AVX2/FMA disabled) | 2.09 ms | 21.34 ms | 10.2× |
| windows-x86_64-default | 2.84 ms | 57.55 ms | 20.3× |
A few things fall out of this:
- x86 SIMD is very much worth it. Intel/AMD runners without the hand-written `std::arch` dispatch (i.e. scalar) run the BGR pipeline 9–20× slower than the SSSE3/AVX2 backend. The biggest x86 win is the 3-plane deinterleave via `PSHUFB`, which the compiler doesn’t emit on its own.
- NEON uplift is modest because aarch64’s auto-vectorizer handles the scalar fallback well; the hand-written NEON path still wins on the deinterleave (`vld3q_u8`) but the scalar baseline is already strong.
- `-C target-cpu=native` closes most of the scalar gap on x86 (9 ms vs 25 ms default scalar) by unlocking AVX2 for LLVM’s auto-vectorizer, but it still loses to the hand-written dispatch by ~3×.
- Canny edges are expensive. Turning on `delta_edges` dominates the frame time at ~60–75 ms per 1080p frame. Only enable it when color deltas aren’t enough.
- Adaptive overhead is ≈O(1) per frame. Varying `window_width` from 1 to 16 moves the 1080p luma-only timing by <5%; the rolling-sum fix made the per-frame cost flat.
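The flat per-frame cost in the last point is what a rolling sum typically buys. A minimal sketch of the technique, assuming a fixed-size window of per-frame scores: the `RollingMean` type is hypothetical and is not the crate's adaptive detector, which may track its window differently.

```rust
use std::collections::VecDeque;

/// Mean over the most recent `capacity` scores, updated in O(1) per push.
pub struct RollingMean {
    window: VecDeque<f64>,
    capacity: usize,
    sum: f64,
}

impl RollingMean {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "window must hold at least one score");
        Self {
            window: VecDeque::with_capacity(capacity),
            capacity,
            sum: 0.0,
        }
    }

    /// Add a score and return the mean of the current window.
    pub fn push(&mut self, score: f64) -> f64 {
        if self.window.len() == self.capacity {
            // Subtract the evicted score instead of re-summing the window.
            if let Some(oldest) = self.window.pop_front() {
                self.sum -= oldest;
            }
        }
        self.window.push_back(score);
        self.sum += score;
        // Note: a long-running f64 sum can drift slightly; a production
        // version might periodically recompute it from the window.
        self.sum / self.window.len() as f64
    }
}
```

Each push does one subtraction, one addition, and one division regardless of the window size, which matches the observation that widening the window barely moves the timing.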
§Reproducing locally
cargo bench --bench content
cargo bench --bench adaptive
# ...or all of them:
cargo bench

The benchmark.yml workflow runs five matrix rows on every push to main and every PR touching src/**, benches/**, or the workflow file: macos-aarch64-neon, ubuntu-x86_64-default, ubuntu-x86_64-native, ubuntu-x86_64-ssse3-only, windows-x86_64-default. The per-run artifact contains both a bencher-format summary and the Criterion HTML detail tree.
§Acknowledgements
scenesdetect is a Rust port of PySceneDetect by Brandon Castellano, released under the BSD 3-Clause license. The detector algorithms — histogram correlation, DCT-based pHash, brightness-threshold fades, HSV + Canny content deltas, and the rolling-average adaptive layer — are re-implementations of the algorithms described in PySceneDetect’s source and documentation. Default parameters mirror PySceneDetect’s where practical; any deliberate deviations are called out in the relevant module docs.
See THIRD-PARTY.md for the full upstream license text and additional third-party notices.
§License
scenesdetect is licensed under both the MIT license and the Apache License (Version 2.0).
See LICENSE-APACHE and LICENSE-MIT for details.
Copyright (c) 2026 FinDIT studio authors.
§Modules
- adaptive (requires std or alloc): Rolling-average / adaptive scene detector built on top of the content detector’s scores. Reduces false positives on fast camera motion.
- content (requires std or alloc): Content-change scene detector using HSV-space per-frame deltas and optional Canny edge comparison.
- frame: Frame-input types for the scene detectors.
- histogram (requires std or alloc): Histogram-based scene detector using YUV luma correlation.
- phash (requires std or alloc): Perceptual hash-based scene detector using the DCT-based pHash algorithm.
- threshold (requires std or alloc): Intensity-threshold scene detector for fade-in / fade-out transitions.