# Sequential Frame Extraction (unbundle)
**Status:** Proposal
**Component:** `unbundle` crate — `VideoHandle`
**Date:** 2026-02-12
## Problem
`VideoHandle::frame_at_with_options(timestamp, opts)` performs a full
`avformat_seek_file` + decode-forward cycle on every call, regardless of the
caller's access pattern. When a consumer extracts frames at monotonically
increasing timestamps — the dominant real-world pattern — each call redundantly:
1. Flushes the codec's packet/frame buffers.
2. Seeks backward to the nearest keyframe (which may be the *same* keyframe
the previous call already sought to).
3. Decodes forward through frames that were already decoded and discarded in
the previous call.
### Measured impact
Test file: 1080×1920 H.264 Main, 30 fps, keyframe interval ≈ 8.3 s (≈ 250
inter-keyframe frames).
| Scenario | Frames | Total time | Time per frame |
|---|---|---|---|
| Sequential 5 s interval, re-seeking each time | 108 | 140 s | **1.30 s** |
| Same resolution file, 2 s keyframe interval | 150 | 33 s | 0.22 s |
With 8 s keyframes and 5 s sample spacing, consecutive timestamps often fall
between the *same* pair of keyframes. The decoder decodes ≈ 150–250 frames,
returns one, throws away state, then re-decodes most of the same frames for the
next request.
## Proposed API
### New method: `frames_at_with_options`
```rust
impl VideoHandle<'_> {
    /// Extract frames at multiple timestamps, optimising for sequential access.
    ///
    /// Timestamps are processed in sorted order internally. When consecutive
    /// timestamps fall within the same keyframe interval, the decoder state is
    /// kept warm, so no redundant seek or re-decode occurs.
    ///
    /// Frames are returned in the same order as the input `timestamps` slice.
    /// If a frame cannot be decoded at a given timestamp, that entry is `None`.
    pub fn frames_at_with_options(
        &mut self,
        timestamps: &[Duration],
        opts: &ExtractOptions,
    ) -> Result<Vec<(Duration, Option<DynamicImage>)>>;
}
```
### Convenience wrapper
```rust
impl VideoHandle<'_> {
    pub fn frames_at(
        &mut self,
        timestamps: &[Duration],
    ) -> Result<Vec<(Duration, Option<DynamicImage>)>> {
        self.frames_at_with_options(timestamps, &ExtractOptions::default())
    }
}
```
## Internal algorithm
```
fn frames_at_with_options(timestamps, opts):
    // 1. Build a sorted work list with original indices for output reordering.
    let work = timestamps
        .iter().enumerate()
        .map(|(i, ts)| (ts, i))
        .sorted_by_key(|(ts, _)| ts);
    // 2. Allocate output vec (same length as input, filled with None).
    let mut out = vec![None; timestamps.len()];
    // 3. State: track the current decoder position.
    let mut decoder_pos: Option<Duration> = None;
    for (target_ts, orig_idx) in work {
        let needs_seek = match decoder_pos {
            None => true,
            Some(pos) => {
                // If target is behind current position, must seek.
                // If target is ahead but farther than SKIP_THRESHOLD from
                // current position AND a keyframe exists between them,
                // seeking is cheaper than decoding forward.
                target_ts < pos || should_seek_forward(pos, target_ts)
            }
        };
        if needs_seek {
            seek_to_keyframe_before(target_ts);
            flush_codec_buffers();
        }
        // 4. Decode forward until we reach or pass target_ts.
        //    Keep the *last* frame with pts <= target_ts.
        //    Returns that frame (if any) and the pts the decoder stopped at.
        let (frame, decoded_pts) = decode_forward_to(target_ts, opts);
        decoder_pos = Some(decoded_pts);
        out[orig_idx] = frame.map(|f| apply_extract_options(f, opts));
    }
    Ok(timestamps.iter().cloned().zip(out).collect())
```
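The sort-then-reorder bookkeeping in steps 1 and 2 can be tried in isolation. The following is a minimal sketch, not the crate's implementation: `map_in_sorted_order` is a hypothetical name, and the `decode` closure stands in for the whole seek/decode step so the example runs without FFmpeg.

```rust
use std::time::Duration;

/// Visit `timestamps` in ascending order, but return results in input order.
/// `decode` is a hypothetical stand-in for the seek/decode step.
fn map_in_sorted_order<T>(
    timestamps: &[Duration],
    mut decode: impl FnMut(Duration) -> Option<T>,
) -> Vec<(Duration, Option<T>)> {
    // Pair each timestamp with its original index, then sort by timestamp.
    let mut work: Vec<(Duration, usize)> = timestamps
        .iter()
        .copied()
        .enumerate()
        .map(|(i, ts)| (ts, i))
        .collect();
    work.sort_by_key(|&(ts, _)| ts); // stable sort: duplicates keep input order

    // One slot per input entry; `resize_with` avoids requiring `T: Clone`.
    let mut out: Vec<Option<T>> = Vec::new();
    out.resize_with(timestamps.len(), || None);

    // Process in sorted order, write each result back to its original slot.
    for (ts, orig_idx) in work {
        out[orig_idx] = decode(ts);
    }
    timestamps.iter().copied().zip(out).collect()
}

fn main() {
    let input = [3u64, 1, 2].map(Duration::from_secs);
    let mut visited = Vec::new();
    let result = map_in_sorted_order(&input, |ts| {
        visited.push(ts.as_secs());
        Some(ts.as_secs() * 10)
    });
    println!("visit order: {:?}", visited);
    println!("result (input order): {:?}", result);
}
```

Using `resize_with` rather than `vec![None; n]` is a design choice of this sketch: it drops the `Clone` bound on the frame type.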
### Seek-vs-decode heuristic (`should_seek_forward`)
The skip threshold determines when it's cheaper to seek than to continue
decoding. A reasonable default:
```
SKIP_THRESHOLD = max(2 × keyframe_interval, 15 s)
```
If the gap between the current decoder position and the next target exceeds
this threshold, seek. Otherwise, keep decoding — the frames are already
partially in the codec's pipeline.
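One possible shape for the heuristic is sketched below. It combines the threshold rule with the "a keyframe exists between them" condition from the algorithm comments. The signature (an explicit sorted `keyframes` slice and a `threshold` argument) is an assumption made for illustration, not the crate's API.

```rust
use std::time::Duration;

/// SKIP_THRESHOLD = max(2 × keyframe_interval, 15 s).
fn skip_threshold(keyframe_interval: Duration) -> Duration {
    (keyframe_interval * 2).max(Duration::from_secs(15))
}

/// Returns true when seeking is expected to beat decoding forward.
/// `keyframes` is assumed to be sorted in ascending order.
fn should_seek_forward(
    pos: Duration,
    target: Duration,
    keyframes: &[Duration],
    threshold: Duration,
) -> bool {
    // Backward or zero-length moves are handled by the caller.
    if target <= pos {
        return false;
    }
    // Small gaps: keep decoding.
    if target - pos <= threshold {
        return false;
    }
    // Large gap: seek only if some keyframe lies in (pos, target],
    // otherwise a seek would land at or before the current position.
    let first_after_pos = keyframes.partition_point(|&k| k <= pos);
    keyframes
        .get(first_after_pos)
        .map_or(false, |&k| k <= target)
}

fn main() {
    // Hypothetical keyframe table with a regular 8.3 s interval.
    let keyframes: Vec<Duration> = (0..10u64)
        .map(|i| Duration::from_millis(8_300 * i))
        .collect();
    let threshold = skip_threshold(Duration::from_millis(8_300));
    let pos = Duration::from_secs(10);
    for target in [20u64, 40] {
        let t = Duration::from_secs(target);
        println!(
            "pos=10s target={}s seek={}",
            target,
            should_seek_forward(pos, t, &keyframes, threshold)
        );
    }
}
```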
The keyframe interval can be estimated once during `MediaFile::open` by
sampling the first N keyframe packet timestamps from the demuxer (data already
available from `keyframes()`).
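As a rough illustration of the estimate, the sketch below takes the mean gap across the first `n` keyframe timestamps. It assumes `keyframes()` yields ascending presentation timestamps as `Duration` values; the function name and the choice of mean (rather than, say, median) are illustrative assumptions.

```rust
use std::time::Duration;

/// Estimate the keyframe interval as the mean gap between the first `n`
/// keyframe timestamps. Returns `None` with fewer than two samples or if
/// the timestamps are not ascending.
fn estimate_keyframe_interval(keyframes: &[Duration], n: usize) -> Option<Duration> {
    let sample = &keyframes[..keyframes.len().min(n)];
    if sample.len() < 2 {
        return None;
    }
    // Span from first to last sampled keyframe, divided by the gap count.
    let span = sample.last()?.checked_sub(*sample.first()?)?;
    Some(span / (sample.len() as u32 - 1))
}

fn main() {
    // Hypothetical keyframe timestamps, in seconds.
    let keyframes = [0u64, 8, 16, 27].map(Duration::from_secs);
    println!("{:?}", estimate_keyframe_interval(&keyframes, 4));
    println!("{:?}", estimate_keyframe_interval(&keyframes[..1], 4));
}
```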
## Decoder state management
### What to keep warm
- **Codec context** (`AVCodecContext`): Contains reference frames, motion
vectors, and decoder state. Flushing this with `avcodec_flush_buffers` is
the expensive operation to avoid.
- **Demuxer read position** (`AVFormatContext`): The current read cursor in
the container. No need to re-seek if the next timestamp is ahead of it.
- **Scaler context** (`SwsContext`): Already reusable across frames (no
change needed).
### What to flush on seek
When a seek *is* required:
```c
avformat_seek_file(fmt_ctx, stream_idx, INT64_MIN, target_ts, target_ts, 0);
avcodec_flush_buffers(codec_ctx);
```
This is the existing behaviour — no change here, just done less often.
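When `stream_idx` is non-negative, `avformat_seek_file` expects its timestamps in that stream's time base, so a `Duration` has to be converted first (FFmpeg's own helper for this is `av_rescale_q`). The sketch below shows the arithmetic only; `duration_to_stream_ts` is a hypothetical helper, and how `unbundle` performs this conversion internally is not stated in this proposal.

```rust
use std::time::Duration;

/// Convert a `Duration` into ticks of a stream time base of
/// `tb_num / tb_den` seconds per tick, rounding down.
/// Assumes `tb_num` is non-zero.
fn duration_to_stream_ts(ts: Duration, tb_num: u32, tb_den: u32) -> i64 {
    // ticks = seconds * den / num, computed in integer nanoseconds.
    // i128 keeps the intermediate product from overflowing.
    let nanos = ts.as_nanos() as i128;
    let ticks = nanos * tb_den as i128 / (tb_num as i128 * 1_000_000_000);
    ticks as i64
}

fn main() {
    // A 1/90000 time base is common for MPEG-TS streams.
    println!("{}", duration_to_stream_ts(Duration::from_secs(5), 1, 90_000));
    // A 1001/30000 time base: one tick per NTSC frame.
    println!("{}", duration_to_stream_ts(Duration::from_millis(1_001), 1_001, 30_000));
}
```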
### Thread safety
`VideoHandle` already borrows `MediaFile` mutably (`&mut self`), so no
concurrent access is possible. No synchronisation changes needed.
## Performance model
Given:
- K = keyframe interval (seconds)
- I = sample interval (seconds)
- N = number of timestamps
- D = cost to decode one frame
**Current cost** (re-seek every call):
$$C_{\text{current}} = N \times \frac{K}{2} \times \text{fps} \times D$$
On average, each seek lands K/2 seconds (K/2 × fps frames) before the target,
where fps is the stream's frame rate.
**Proposed cost** (warm decoder for sequential access):
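When I < K, consecutive timestamps can share a keyframe interval, and the warm
decoder processes each inter-keyframe frame at most once. Decoding every
keyframe interval in full (duration / K intervals of K × fps frames each)
gives an upper bound:
$$C_{\text{proposed}} \approx \frac{\text{duration}}{K} \times (K \times \text{fps}) \times D = \text{duration} \times \text{fps} \times D$$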
But since we only need frames up to the last requested timestamp (not the full
video), and we skip gaps > SKIP\_THRESHOLD:
$$C_{\text{proposed}} \approx N \times I \times \text{fps} \times D$$
**Speedup factor** for the I < K case:
$$\text{speedup} \approx \frac{K}{2I}$$
For the measured file (K = 8.3 s, I = 5 s): speedup ≈ 0.83×. This is below 1
because I exceeds K/2, so the averaged model predicts no gain at this spacing.
For denser sampling (I = 2 s, K = 8.3 s): speedup ≈ 2.1×.
The real win comes from eliminating *redundant* decode-forward when consecutive
timestamps share the same keyframe interval. With K = 8.3 s and I = 5 s,
≈ 60% of consecutive pairs share a keyframe, saving ≈ 4 s of decode per
pair → total saving ≈ 40–50% for 108 frames.
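The closed-form model can be evaluated directly. The sketch below expresses both costs in decoded frames (that is, in units of D), converting seconds to frames via fps; the function names are illustrative, and the numbers are model outputs, not measurements.

```rust
/// Frames decoded when every call re-seeks: N × (K/2) × fps.
fn current_cost(n: u32, k_secs: f64, fps: f64) -> f64 {
    n as f64 * (k_secs / 2.0) * fps
}

/// Frames decoded with a warm decoder and I < K: N × I × fps.
fn proposed_cost(n: u32, i_secs: f64, fps: f64) -> f64 {
    n as f64 * i_secs * fps
}

/// Ratio of the two costs: K / (2 I).
fn speedup(k_secs: f64, i_secs: f64) -> f64 {
    k_secs / (2.0 * i_secs)
}

fn main() {
    let (n, fps, k) = (108, 30.0, 8.3);
    for i in [5.0, 2.0] {
        println!(
            "I = {} s: current = {:.0} frames, proposed = {:.0} frames, speedup = {:.2}x",
            i,
            current_cost(n, k, fps),
            proposed_cost(n, i, fps),
            speedup(k, i)
        );
    }
}
```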
## Callback variant (streaming)
For callers that don't want to allocate all frames in memory at once:
```rust
impl VideoHandle<'_> {
    /// Extract frames sequentially, invoking the callback for each.
    /// Internally sorts timestamps and keeps the decoder warm.
    pub fn extract_frames_sequential<F>(
        &mut self,
        timestamps: &[Duration],
        opts: &ExtractOptions,
        callback: F,
    ) -> Result<()>
    where
        F: FnMut(Duration, Option<&DynamicImage>);
}
```
This avoids holding 150 `DynamicImage` values in memory simultaneously.
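The calling pattern might look like the sketch below, which uses a stub `Frame` type and a `decode` closure in place of the real decoder so that it runs standalone. It assumes the callback is invoked in sorted timestamp order, which follows from the internal sort but is not spelled out in the signature; `for_each_sorted` is a hypothetical name.

```rust
use std::time::Duration;

/// Hypothetical stand-in for a decoded frame.
struct Frame {
    pts: Duration,
}

/// Visit timestamps in ascending order, handing each frame to `callback`
/// by reference and dropping it before the next one is decoded.
fn for_each_sorted<D, F>(timestamps: &[Duration], mut decode: D, mut callback: F)
where
    D: FnMut(Duration) -> Option<Frame>,
    F: FnMut(Duration, Option<&Frame>),
{
    let mut sorted = timestamps.to_vec();
    sorted.sort();
    for ts in sorted {
        let frame = decode(ts); // at most one frame is alive at a time
        callback(ts, frame.as_ref());
        // `frame` is dropped here, before the next iteration decodes.
    }
}

fn main() {
    let input = [3u64, 1, 2].map(Duration::from_secs);
    let end = Duration::from_secs(2); // pretend the file is 2 s long
    for_each_sorted(
        &input,
        |ts| if ts <= end { Some(Frame { pts: ts }) } else { None },
        |ts, frame| println!("{:?} -> {:?}", ts, frame.map(|f| f.pts)),
    );
}
```

Because the callback only borrows each frame, a caller that wants to keep one has to clone or encode it inside the callback.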
## Migration path
- `frame_at_with_options` remains unchanged — no breaking change.
- `frames_at_with_options` is additive.
- Callers that extract frames in a loop (univibe, thumbnail strips, video
indexers) switch to the batch API for automatic optimisation.
## Test plan
1. **Correctness**: Extract frames at known timestamps using both
`frame_at_with_options` (loop) and `frames_at_with_options` (batch).
Assert identical pixel output (byte-exact `DynamicImage` comparison).
2. **Unsorted input**: Pass timestamps in reverse and random order. Verify
output ordering matches input ordering, not internal sorted order.
3. **Seek threshold**: Construct a timestamp list with a large gap (> 30 s)
in the middle. Verify the implementation seeks across the gap rather than
decoding through it (observable via wall-clock timing or packet read count).
4. **Edge cases**:
   - Empty timestamp list → empty output.
   - Single timestamp → equivalent to `frame_at_with_options`.
   - Duplicate timestamps → each gets its own output entry, same frame.
   - Timestamp past end of file → `None` for that entry.
   - Timestamp at exactly 0 → first frame.
5. **Performance benchmark**: `bench_seek.rs` extended with a batch variant.
Measure wall time for 100 sequential timestamps at 5 s intervals on a
file with sparse keyframes (≥ 8 s). Target: ≥ 30% faster than the
current loop-of-`frame_at` approach.
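For test 3, counting seeks and decoded frames is likely more stable than wall-clock timing. The sketch below shows the idea with a fake decoder that works in frame indices; `FakeDecoder`, `extract_sorted`, and the fixed frame-count threshold are all illustrative assumptions, and a real test would need equivalent counters exposed by (or wrapped around) the actual decoder.

```rust
/// Fake decoder over a stream with a keyframe every `gop` frames.
/// It only counts work; no pixels are produced.
struct FakeDecoder {
    gop: u64,
    next: u64,    // index of the next frame the decoder would emit
    seeks: u32,   // number of seek calls
    decoded: u64, // number of frames decoded
}

impl FakeDecoder {
    fn new(gop: u64) -> Self {
        Self { gop, next: 0, seeks: 0, decoded: 0 }
    }

    /// Jump to the keyframe at or before `target`.
    fn seek(&mut self, target: u64) {
        self.next = target - target % self.gop;
        self.seeks += 1;
    }

    /// Decode forward through `target`, inclusive.
    fn decode_to(&mut self, target: u64) {
        while self.next <= target {
            self.next += 1;
            self.decoded += 1;
        }
    }
}

/// Warm-decoder extraction over ascending frame indices, seeking only on
/// the first request, on backward moves, or when the gap exceeds `threshold`.
fn extract_sorted(dec: &mut FakeDecoder, targets: &[u64], threshold: u64) {
    let mut pos: Option<u64> = None; // last frame returned
    for &t in targets {
        let needs_seek = match pos {
            None => true,
            Some(p) => t < p || t - p > threshold,
        };
        if needs_seek {
            dec.seek(t);
        }
        dec.decode_to(t);
        pos = Some(t);
    }
}

fn main() {
    // Two clusters of targets separated by a large gap.
    let targets = [10, 20, 30, 3010, 3020];
    let mut dec = FakeDecoder::new(250);
    extract_sorted(&mut dec, &targets, 500);
    println!("seeks = {}, decoded = {}", dec.seeks, dec.decoded);
}
```

A test can then assert on `seeks` and `decoded` directly: crossing the gap should add exactly one seek and keep the decoded-frame count far below the gap length.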