unbundle 4.1.0

Unbundle media files - extract still frames, audio tracks, and subtitles from video files
# Sequential Frame Extraction (unbundle)


**Status:** Proposal  
**Component:** `unbundle` crate — `VideoHandle`  
**Date:** 2026-02-12

## Problem


`VideoHandle::frame_at_with_options(timestamp, opts)` performs a full
`avformat_seek_file` + decode-forward cycle on every call, regardless of the
caller's access pattern. When a consumer extracts frames at monotonically
increasing timestamps — the dominant real-world pattern — each call redundantly:

1. Flushes the codec's packet/frame buffers.
2. Seeks backward to the nearest keyframe (which may be the *same* keyframe
   the previous call already seeked to).
3. Decodes forward through frames that were already decoded and discarded in
   the previous call.

### Measured impact


Test file: 1080×1920 H.264 Main, 30 fps, keyframe interval ≈ 8.3 s (≈ 250
inter-keyframe frames).

| Scenario | Frames | Wall time | Per-frame |
|----------|--------|-----------|-----------|
| Sequential 5 s interval, re-seeking each time | 108 | 140 s | **1.30 s** |
| Same resolution file, 2 s keyframe interval | 150 | 33 s | 0.22 s |

With 8 s keyframes and 5 s sample spacing, consecutive timestamps often fall
between the *same* pair of keyframes. The decoder decodes ≈ 150–250 frames,
returns one, throws away state, then re-decodes most of the same frames for the
next request.

## Proposed API


### New method: `frames_at_with_options`


```rust
impl VideoHandle<'_> {
    /// Extract frames at multiple timestamps, optimising for sequential access.
    ///
    /// Timestamps are processed in sorted order internally. When consecutive
    /// timestamps fall within the same keyframe interval, the decoder state is
    /// kept warm — no redundant seek or re-decode occurs.
    ///
    /// Frames are returned in the same order as the input `timestamps` slice.
    /// If a frame cannot be decoded at a given timestamp, that entry is `None`.
    pub fn frames_at_with_options(
        &mut self,
        timestamps: &[Duration],
        opts: &ExtractOptions,
    ) -> Result<Vec<(Duration, Option<DynamicImage>)>>;
}
```

### Convenience wrapper


```rust
impl VideoHandle<'_> {
    pub fn frames_at(
        &mut self,
        timestamps: &[Duration],
    ) -> Result<Vec<(Duration, Option<DynamicImage>)>> {
        self.frames_at_with_options(timestamps, &ExtractOptions::default())
    }
}
```

## Internal algorithm


```
fn frames_at_with_options(timestamps, opts):
    // 1. Build a sorted work list with original indices for output reordering.
    let work = timestamps
        .iter().enumerate()
        .map(|(i, ts)| (ts, i))
        .sorted_by_key(|(ts, _)| ts);

    // 2. Allocate output vec (same length as input, filled with None).
    let mut out = vec![None; timestamps.len()];

    // 3. State: track the current decoder position.
    let mut decoder_pos: Option<Duration> = None;

    for (target_ts, orig_idx) in work {
        let needs_seek = match decoder_pos {
            None => true,
            Some(pos) => {
                // If target is behind current position, must seek.
                // If target is ahead but farther than SKIP_THRESHOLD from
                // current position AND a keyframe exists between them,
                // seeking is cheaper than decoding forward.
                target_ts < pos || should_seek_forward(pos, target_ts)
            }
        };

        if needs_seek {
            seek_to_keyframe_before(target_ts);
            flush_codec_buffers();
        }

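        // 4. Decode forward until we reach or pass target_ts.
        //    Keep the *last* frame with pts <= target_ts and report its pts.
        //    (A frame decoded past target_ts is held for the next iteration.)
        let (frame, decoded_pts) = decode_forward_to(target_ts, opts);
        decoder_pos = Some(decoded_pts);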

        out[orig_idx] = frame.map(|f| apply_extract_options(f, opts));
    }

    Ok(timestamps.iter().cloned().zip(out).collect())
```
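
The control flow above can be exercised without FFmpeg by simulating the decoder. The following is a minimal sketch, not the crate's implementation: `MockDecoder` and all of its fields are hypothetical, frames are identified by index instead of carrying pixel data, keyframes are assumed to sit at exact multiples of `gop`, and the skip threshold is expressed in frames.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the real decoder. Frames are numbered from 0 and
/// every `gop`-th frame is a keyframe; `decoded` counts the work performed.
struct MockDecoder {
    fps: u64,
    gop: u64,            // keyframe interval, in frames
    total: u64,          // number of frames in the file
    skip_threshold: u64, // forward gap (frames) beyond which seeking is allowed
    next: u64,           // next frame the decoder would produce
    last: Option<u64>,   // last frame produced; None right after a seek
    decoded: u64,
}

impl MockDecoder {
    fn new(fps: u64, gop: u64, total: u64, skip_threshold: u64) -> Self {
        Self { fps, gop, total, skip_threshold, next: 0, last: None, decoded: 0 }
    }

    fn frame_index(&self, ts: Duration) -> u64 {
        ts.as_millis() as u64 * self.fps / 1000
    }

    /// Jump to the keyframe at or before `target`, discarding decoder state.
    fn seek(&mut self, target: u64) {
        self.next = target / self.gop * self.gop;
        self.last = None;
    }

    /// Decode forward until `target` has been produced.
    fn decode_to(&mut self, target: u64) -> u64 {
        while self.next <= target {
            self.last = Some(self.next);
            self.next += 1;
            self.decoded += 1;
        }
        target
    }

    /// Baseline: seek and decode forward on every call.
    fn frame_at(&mut self, ts: Duration) -> Option<u64> {
        let target = self.frame_index(ts);
        if target >= self.total {
            return None;
        }
        self.seek(target);
        Some(self.decode_to(target))
    }

    /// Batch variant: visit timestamps in ascending order, keep the decoder warm.
    fn frames_at(&mut self, timestamps: &[Duration]) -> Vec<(Duration, Option<u64>)> {
        let mut order: Vec<usize> = (0..timestamps.len()).collect();
        order.sort_by_key(|&i| timestamps[i]);
        let mut out = vec![None; timestamps.len()];
        for i in order {
            let target = self.frame_index(timestamps[i]);
            if target >= self.total {
                continue; // past end of file: entry stays None
            }
            let keyframe = target / self.gop * self.gop;
            let needs_seek = match self.last {
                None => true,
                // Behind us, or far enough ahead with a keyframe in between.
                Some(pos) => {
                    target < pos || (target - pos > self.skip_threshold && keyframe > pos)
                }
            };
            if needs_seek {
                self.seek(target);
            }
            out[i] = Some(self.decode_to(target));
        }
        timestamps.iter().copied().zip(out).collect()
    }
}
```

In this idealised mock (30 fps, a keyframe every 250 frames, 108 timestamps spaced 5 s apart), calling `frame_at` in a loop decodes 10,808 frames, while `frames_at` with a threshold of 0 decodes 9,715. With a threshold of 500 frames (2 × the keyframe interval) the batch variant never seeks and decodes 16,051 frames, so the choice of threshold matters for files like the measured one.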

### Seek-vs-decode heuristic (`should_seek_forward`)


The skip threshold determines when it's cheaper to seek than to continue
decoding. A reasonable default:

```
SKIP_THRESHOLD = max(2 × keyframe_interval, 15 s)
```

If the gap between the current decoder position and the next target exceeds
this threshold and a keyframe lies between them, seek. Otherwise, keep
decoding: the reference frames needed to continue are already held in the
codec context, so nothing is decoded twice.

The keyframe interval can be estimated once during `MediaFile::open` by
sampling the first N keyframe packet timestamps from the demuxer (data already
available from `keyframes()`).
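
One possible shape for the heuristic is sketched below. It is illustrative only: the function names, the use of a median to estimate the keyframe interval, and passing the keyframe list as a `&[Duration]` slice are assumptions, not part of the existing `unbundle` API.

```rust
use std::time::Duration;

/// Median gap between consecutive keyframe timestamps (assumed sorted).
/// Returns None when fewer than two distinct keyframes are available.
fn estimate_keyframe_interval(keyframes: &[Duration]) -> Option<Duration> {
    let mut gaps: Vec<Duration> = keyframes
        .windows(2)
        .filter(|w| w[1] > w[0])
        .map(|w| w[1] - w[0])
        .collect();
    if gaps.is_empty() {
        return None;
    }
    gaps.sort();
    Some(gaps[gaps.len() / 2])
}

/// SKIP_THRESHOLD = max(2 × keyframe_interval, 15 s).
fn skip_threshold(keyframe_interval: Duration) -> Duration {
    (keyframe_interval * 2).max(Duration::from_secs(15))
}

/// Seek only when the forward gap exceeds the threshold and a keyframe lies
/// after `pos` and at or before `target`.
fn should_seek_forward(
    pos: Duration,
    target: Duration,
    keyframes: &[Duration],
    threshold: Duration,
) -> bool {
    let gap = target.saturating_sub(pos);
    gap > threshold && keyframes.iter().any(|&k| k > pos && k <= target)
}
```

The median is used here so that a single irregular gap (for example a scene-cut keyframe) does not skew the estimate; a mean over the first N keyframes would serve equally well for regular streams.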

## Decoder state management


### What to keep warm


- **Codec context** (`AVCodecContext`): Contains reference frames, motion
  vectors, and decoder state. `avcodec_flush_buffers` discards all of it, so
  every flush forces a re-decode from the preceding keyframe. That re-decode
  is the expensive operation to avoid.
- **Demuxer read position** (`AVFormatContext`): The current read cursor in
  the container. No need to re-seek if the next timestamp is ahead of it.
- **Scaler context** (`SwsContext`): Already reusable across frames (no
  change needed).

### What to flush on seek


When a seek *is* required:

```c
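/* target_ts is in the stream's time_base units. With max_ts == target_ts the
   demuxer lands on a keyframe at or before the target. */
avformat_seek_file(fmt_ctx, stream_idx, INT64_MIN, target_ts, target_ts, 0);
avcodec_flush_buffers(codec_ctx);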
```

This is the existing behaviour — no change here, just done less often.

### Thread safety


`VideoHandle` already borrows `MediaFile` mutably (`&mut self`), so no
concurrent access is possible. No synchronisation changes needed.

## Performance model


Given:
- K = keyframe interval (seconds)
- I = sample interval (seconds)
- N = number of timestamps
- D = cost to decode one frame
- fps = frame rate (frames per second)

**Current cost** (re-seek every call):

$$C_{\text{current}} = N \times \frac{K}{2} \times \text{fps} \times D$$

On average, each seek lands K/2 seconds (K/2 × fps frames) before the target.

**Proposed cost** (warm decoder for sequential access):

When I < K, consecutive timestamps share a keyframe. The decoder processes
each inter-keyframe frame exactly once:

$$C_{\text{proposed}} \approx \frac{\text{duration}}{K} \times (K \times \text{fps}) \times D = \text{duration} \times \text{fps} \times D$$

But since we only need frames up to the last requested timestamp (not the full
video), and we skip gaps > SKIP\_THRESHOLD:

$$C_{\text{proposed}} \approx N \times I \times \text{fps} \times D$$

**Speedup factor** for the I < K case:

$$\text{speedup} \approx \frac{K}{2I}$$

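For the measured file (K = 8.3 s, I = 5 s): speedup ≈ 0.83×. A value below 1
means that decoding straight through every gap costs more than re-seeking,
which happens whenever I exceeds K/2. For this file the decoder must therefore
still seek when a keyframe lies between its position and the next target; the
default SKIP\_THRESHOLD (16.6 s here) would never trigger that seek at 5 s
spacing, so it may need to be lower for files like this one.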

For denser sampling (I = 2 s, K = 8.3 s): speedup ≈ 2.1×.

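The real win comes from eliminating *redundant* decode-forward when consecutive
timestamps share the same keyframe interval. Assuming sample times fall
uniformly within keyframe intervals, a consecutive pair shares an interval with
probability 1 − I/K, which is ≈ 40% for K = 8.3 s and I = 5 s. For those pairs
the warm decoder skips the seek, the flush, and the re-decode from the keyframe
up to the previous target, (K − I)/2 ≈ 1.7 s of video on average. That is
roughly 0.7 s of the ≈ 4.2 s decoded per frame today, or about 15% less decode
work across the 108 frames, before counting the avoided seeks and flushes.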
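
As a quick check of the model, the cost and speedup expressions can be evaluated directly. This is a minimal sketch: the function names are illustrative, costs are counted in decoded frames (D = 1), and the average seek distance is taken as K/2 seconds of video.

```rust
/// Frames decoded when every call re-seeks: N × (K / 2) × fps.
fn current_cost_frames(n: f64, k_secs: f64, fps: f64) -> f64 {
    n * (k_secs / 2.0) * fps
}

/// Frames decoded when the decoder runs straight through: N × I × fps.
fn proposed_cost_frames(n: f64, i_secs: f64, fps: f64) -> f64 {
    n * i_secs * fps
}

/// Ratio of the two costs, K / (2 × I). Values below 1 mean that decoding
/// straight through is slower than re-seeking.
fn speedup(k_secs: f64, i_secs: f64) -> f64 {
    k_secs / (2.0 * i_secs)
}
```

For K = 8.3 s and I = 5 s, `speedup(8.3, 5.0)` evaluates to 0.83, and for I = 2 s it is about 2.1, matching the figures above.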

## Callback variant (streaming)


For callers that don't want to allocate all frames in memory at once:

```rust
impl VideoHandle<'_> {
    /// Extract frames sequentially, invoking the callback for each.
    /// Internally sorts timestamps and keeps the decoder warm.
    pub fn extract_frames_sequential<F>(
        &mut self,
        timestamps: &[Duration],
        opts: &ExtractOptions,
        callback: F,
    ) -> Result<()>
    where
        F: FnMut(Duration, Option<&DynamicImage>);
}
```

This avoids holding every decoded `DynamicImage` in memory at once: 150 frames
at 1080×1920 in RGB8 would occupy roughly 930 MB.
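
The ownership pattern behind the callback variant can be sketched in isolation. In the example below, `decode` is a hypothetical stand-in for the warm-decoder step and `T` stands in for `DynamicImage`. Invoking the callback in ascending timestamp order is an assumption; the proposal does not fix the order.

```rust
use std::time::Duration;

/// Visit `timestamps` in ascending order and lend each decoded frame to
/// `callback`, so that at most one frame is alive at any time.
fn extract_frames_sequential<T, D, F>(timestamps: &[Duration], mut decode: D, mut callback: F)
where
    D: FnMut(Duration) -> Option<T>,
    F: FnMut(Duration, Option<&T>),
{
    let mut sorted = timestamps.to_vec();
    sorted.sort();
    for ts in sorted {
        // `frame` is dropped at the end of each iteration.
        let frame = decode(ts);
        callback(ts, frame.as_ref());
    }
}
```

Because the callback only borrows the frame, a caller that needs to keep one (for example to write a thumbnail later) has to clone or encode it inside the callback.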

## Migration path


- `frame_at_with_options` remains unchanged — no breaking change.
- `frames_at_with_options` is additive.
- Callers that extract frames in a loop (univibe, thumbnail strips, video
  indexers) switch to the batch API for automatic optimisation.

## Test plan


1. **Correctness**: Extract frames at known timestamps using both
   `frame_at_with_options` (loop) and `frames_at_with_options` (batch).
   Assert identical pixel output (byte-exact `DynamicImage` comparison).

2. **Unsorted input**: Pass timestamps in reverse and random order. Verify
   output ordering matches input ordering, not internal sorted order.

3. **Seek threshold**: Construct a timestamp list with a large gap (> 30 s)
   in the middle. Verify the implementation seeks across the gap rather than
   decoding through it (observable via wall-clock timing or packet read count).

4. **Edge cases**:
   - Empty timestamp list → empty output.
   - Single timestamp → equivalent to `frame_at_with_options`.
   - Duplicate timestamps → each gets its own output entry, same frame.
   - Timestamp past end of file → `None` for that entry.
   - Timestamp at exactly 0 → first frame.

5. **Performance benchmark**: `bench_seek.rs` extended with a batch variant.
   Measure wall time for 100 sequential timestamps at 5 s intervals on a
   file with sparse keyframes (≥ 8 s). Target: ≥ 30% faster than the
   current loop-of-`frame_at` approach.