media-seek 0.2.5

🔍 Container format index parsing and timestamp-to-byte-offset seeking for streaming media


💭 Why media-seek?

Downloading only a time clip from a streaming URL requires knowing which bytes to request. Every container format stores this information differently — MP4 uses a sidx box, WebM has EBML Cues, MP3 uses the Xing TOC, and so on.

media-seek abstracts all of that. You feed it the leading bytes of a stream and it returns a ContainerIndex that answers one question: given a time window [start, end], which bytes do I need to fetch?

There is no HTTP client bundled and no subprocess spawned. Callers implement the small RangeFetcher trait (one associated error type and one fetch method) to supply bytes from whatever transport they use. Formats whose index lies outside the probe window (WebM Cues, OGG page bisection, AVI idx1, MPEG-TS PCR search) request additional ranges through that trait automatically.

📥 How to get it

Add the following to your Cargo.toml file:

[dependencies]
media-seek = "0.2.5"

Check the releases page for the latest version.

🔍 Observability & Tracing

This crate always depends on the tracing crate and emits debug events for each format detection, parse step, and extra fetch.

⚠️ Important: without a configured subscriber, tracing macros do almost nothing. Each call site reduces to a cheap check of whether any subscriber is interested, so if you don't add one the runtime overhead is negligible.

To capture logs, add a subscriber in your application:

[dependencies]
tracing = "0.1"
tracing-subscriber = "0.3"

Then install the subscriber once at startup:

use tracing::Level;
use tracing_subscriber::FmtSubscriber;

let subscriber = FmtSubscriber::builder()
    .with_max_level(Level::DEBUG)
    .finish();
tracing::subscriber::set_global_default(subscriber)
    .expect("setting default subscriber failed");

Refer to the tracing-subscriber documentation for more advanced configuration (JSON output, log levels, targets, etc.).


🎯 Supported formats

| Extension | Index type | Precision | Extra fetches |
|-----------|------------|-----------|---------------|
| mp4, m4a | fMP4 SIDX box | Fragment boundary | No |
| webm | EBML Cues element | Cluster boundary | Maybe |
| mp3 | Xing/VBRI TOC or CBR avg | Frame / 1% TOC entry | No |
| ogg | OGG page granule bisection | Page boundary | Yes (up to 64) |
| flac | SEEKTABLE metadata block | Seek point | No |
| wav | PCM formula | BlockAlign-exact | No |
| aiff | PCM formula | Sample-exact | No |
| aac | ADTS frame scan average | ~21 ms frame | No |
| flv | AMF0 onMetaData keyframes | Keyframe | No |
| avi | idx1 chunk at EOF | Frame | Yes (1 fetch) |
| ts | PCR binary search | ~11 ms TS packet | Yes (up to 64) |

MHTML (storyboard segments), None, and unrecognized magic bytes return Err(Error::UnsupportedFormat).
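
To make the "PCM formula" rows concrete, here is a minimal sketch of how a time window can be mapped to a frame-aligned byte range in uncompressed audio. It is not media-seek's implementation: the `PcmLayout` struct and its field names are hypothetical, and the values would come from the WAV `fmt ` and `data` chunks.

```rust
/// Hypothetical description of a PCM stream (illustrative names only,
/// not media-seek's internal types).
pub struct PcmLayout {
    /// Byte offset of the first sample frame.
    pub data_start: u64,
    /// Length of the sample data in bytes.
    pub data_len: u64,
    /// Sample frames per second.
    pub sample_rate: u32,
    /// Bytes per sample frame (channels * bytes per sample).
    pub block_align: u16,
}

impl PcmLayout {
    /// Maps `[start_secs, end_secs]` to an inclusive byte range made of whole frames.
    pub fn byte_range(&self, start_secs: f64, end_secs: f64) -> Option<(u64, u64)> {
        // Reject negative, inverted, empty, or NaN windows and a zero frame size.
        if !(start_secs >= 0.0) || !(end_secs > start_secs) || self.block_align == 0 {
            return None;
        }
        let align = self.block_align as u64;
        let rate = self.sample_rate as f64;
        let total_frames = self.data_len / align;

        // Round the start down and the end up so the window is fully covered.
        let first_frame = (start_secs * rate).floor() as u64;
        if first_frame >= total_frames {
            return None;
        }
        let end_frame = ((end_secs * rate).ceil() as u64).min(total_frames); // exclusive

        let start = self.data_start + first_frame * align;
        let end = self.data_start + end_frame * align - 1; // inclusive
        Some((start, end))
    }
}
```

Because every frame has the same size, no index has to be read and no extra fetch is needed, which is why these rows list "No" under extra fetches.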

🚀 Quick start

1. Implement RangeFetcher

use media_seek::RangeFetcher;

struct HttpFetcher {
    client: reqwest::Client,
    url: String,
}

impl RangeFetcher for HttpFetcher {
    type Error = reqwest::Error;

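    async fn fetch(&self, start: u64, end: u64) -> std::result::Result<Vec<u8>, Self::Error> {
        // HTTP byte ranges are inclusive on both ends, matching the trait contract.
        let range = format!("bytes={}-{}", start, end);
        self.client
            .get(&self.url)
            .header("Range", range)
            .send()
            .await?
            // Turn 4xx/5xx responses into errors instead of parsing an error page as media.
            .error_for_status()?
            .bytes()
            .await
            .map(|b| b.to_vec())
    }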
}

2. Parse the container index

use media_seek::{parse, RangeFetcher};

async fn example() -> Result<(), Box<dyn std::error::Error>> {
    // `HttpFetcher` is the type from step 1; fill in its fields for your stream.
    let fetcher = HttpFetcher { /**/ };

    // 512 KB is recommended: enough for most format headers and indices.
    // The range is inclusive, so the last byte requested is 512 * 1024 - 1.
    let probe: Vec<u8> = fetcher.fetch(0, 512 * 1024 - 1).await?;

    // Total stream size in bytes (from Content-Length or a HEAD request)
    let total_size: Option<u64> = Some(1_234_567_890);

    let index = parse(&probe, total_size, &fetcher).await?;
    Ok(())
}

3. Translate timestamps to byte ranges

use media_seek::{ContainerIndex, RangeFetcher};

async fn example(
    index: ContainerIndex,
    fetcher: impl RangeFetcher<Error = std::io::Error>,
) -> Result<(), Box<dyn std::error::Error>> {
    if let Some((content_start, content_end)) = index.find_byte_range(60.0, 120.0) {
        // Always prefetch the init segment so decoders have codec parameters
        let init = fetcher.fetch(0, index.init_end_byte).await?;
        let clip = fetcher.fetch(content_start, content_end).await?;

        // Write init + clip to a file, then trim with FFmpeg stream copy:
        // ffmpeg -i combined.mp4 -ss 60 -t 60 -c copy -avoid_negative_ts 1 -y out.mp4
        let _ = (init, clip);
    }
    Ok(())
}
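
The README does not say what happens when the requested window starts so early that the content range touches or overlaps the init segment. As a defensive sketch (the helper name `plan_requests` is hypothetical and not part of media-seek), the two requests can be merged in that case so no byte is fetched twice:

```rust
/// Returns the inclusive byte ranges to request for a clip: the init segment
/// first, then the content. Hypothetical helper, not part of media-seek.
pub fn plan_requests(init_end_byte: u64, content_start: u64, content_end: u64) -> Vec<(u64, u64)> {
    // If the content begins inside or directly after the init segment,
    // a single contiguous request covers both.
    if content_start <= init_end_byte.saturating_add(1) {
        vec![(0, content_end.max(init_end_byte))]
    } else {
        vec![(0, init_end_byte), (content_start, content_end)]
    }
}
```

Each returned pair can be passed directly to `fetch`, since both use inclusive bounds.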

📖 Documentation

The full API reference is available on docs.rs.

parse()

pub async fn parse<F: RangeFetcher>(
    probe: &[u8],
    total_size: Option<u64>,
    fetcher: &F,
) -> Result<ContainerIndex>;

Detects the container format from magic bytes in probe and dispatches to the appropriate parser. Returns Err(Error::UnsupportedFormat) for unrecognized formats.
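
For readers unfamiliar with magic-byte detection, the sketch below shows the general technique using well-known container signatures. It is an illustration only: the `Sniffed` enum and `sniff` function are hypothetical, the crate's actual detection logic may differ, and MP3, AAC, and MPEG-TS are left out because they need frame-sync heuristics rather than a fixed signature.

```rust
/// Hypothetical result type for this sketch (not media-seek's API).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Sniffed {
    Mp4,
    WebM,
    Ogg,
    Flac,
    Wav,
    Avi,
    Aiff,
    Flv,
    Unknown,
}

/// Guesses the container from the leading bytes of a stream.
pub fn sniff(probe: &[u8]) -> Sniffed {
    // ISO BMFF: a 4-byte box size followed by the `ftyp` box type.
    if probe.len() >= 8 && &probe[4..8] == b"ftyp" {
        return Sniffed::Mp4;
    }
    // EBML header ID, shared by WebM and Matroska.
    if probe.starts_with(&[0x1A, 0x45, 0xDF, 0xA3]) {
        return Sniffed::WebM;
    }
    if probe.starts_with(b"OggS") {
        return Sniffed::Ogg;
    }
    if probe.starts_with(b"fLaC") {
        return Sniffed::Flac;
    }
    if probe.starts_with(b"FLV") {
        return Sniffed::Flv;
    }
    // RIFF and IFF containers carry the form type at bytes 8..12.
    if probe.len() >= 12 && probe.starts_with(b"RIFF") {
        if &probe[8..12] == b"WAVE" {
            return Sniffed::Wav;
        }
        if &probe[8..12] == b"AVI " {
            return Sniffed::Avi;
        }
    }
    if probe.len() >= 12
        && probe.starts_with(b"FORM")
        && (&probe[8..12] == b"AIFF" || &probe[8..12] == b"AIFC")
    {
        return Sniffed::Aiff;
    }
    Sniffed::Unknown
}
```

Anything that falls through to `Unknown` here corresponds to the case where `parse` reports `Error::UnsupportedFormat`.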

ContainerIndex

pub struct ContainerIndex {
    /// Last byte (inclusive) of the codec initialisation data (moov, EBML header, …).
    /// A partial download must always include `bytes 0..=init_end_byte`.
    pub init_end_byte: u64,
}

impl ContainerIndex {
    /// Returns `Some((content_start, content_end))` covering `[start_secs, end_secs]`,
    /// expanded to the nearest decodable boundary, or `None` if the range is not covered.
    pub fn find_byte_range(&self, start_secs: f64, end_secs: f64) -> Option<(u64, u64)>;
}
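
The phrase "expanded to the nearest decodable boundary" can be pictured with a sorted list of index entries, each marking where a fragment, cluster, or keyframe begins. The sketch below is one plausible way to do that lookup, not the crate's implementation; `Entry`, `find_range`, and the `stream_end` and `duration_secs` parameters are all assumptions made for illustration.

```rust
/// Hypothetical index entry: a decodable boundary and its byte offset.
#[derive(Debug, Clone, Copy)]
pub struct Entry {
    pub time_secs: f64,
    pub offset: u64,
}

/// Looks up the inclusive byte range covering `[start_secs, end_secs]`.
/// `entries` must be sorted by time with strictly increasing offsets;
/// `stream_end` is the last byte of indexed content.
pub fn find_range(
    entries: &[Entry],
    stream_end: u64,
    duration_secs: f64,
    start_secs: f64,
    end_secs: f64,
) -> Option<(u64, u64)> {
    // Reject empty indexes, inverted or NaN windows, and windows past the end.
    if entries.is_empty() || !(end_secs >= start_secs) || !(start_secs < duration_secs) {
        return None;
    }
    // Last entry that begins at or before `start_secs` (clamped to the first entry).
    let first = entries
        .partition_point(|e| e.time_secs <= start_secs)
        .saturating_sub(1);
    // First entry that begins strictly after `end_secs`; always past `first`.
    let next = entries
        .partition_point(|e| e.time_secs <= end_secs)
        .max(first + 1);

    let content_start = entries[first].offset;
    let content_end = match entries.get(next) {
        // Stop one byte before the next boundary.
        Some(e) => e.offset.saturating_sub(1),
        // The window runs into the final entry, so read to the end.
        None => stream_end,
    };
    Some((content_start, content_end))
}
```

The start is rounded down and the end is rounded up, so the returned bytes usually span slightly more time than requested. This is why the quick start trims the result with FFmpeg afterwards.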

RangeFetcher trait

pub trait RangeFetcher {
    type Error: std::error::Error + Send + Sync + 'static;

    /// Fetches bytes `[start, end]` (inclusive) from the remote stream.
    fn fetch(&self, start: u64, end: u64)
        -> impl Future<Output = std::result::Result<Vec<u8>, Self::Error>> + Send;
}
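
Because `fetch` takes inclusive bounds, off-by-one mistakes are easy to make when backing the trait with something other than HTTP, such as an in-memory buffer in tests. The two helpers below sketch the inclusive semantics; both names are hypothetical, and clamping to the available data is an assumption that mirrors how HTTP servers typically answer ranges that run past the end of a resource.

```rust
/// Formats an HTTP Range header value for an inclusive byte range.
pub fn range_header(start: u64, end: u64) -> String {
    format!("bytes={}-{}", start, end)
}

/// Returns `data[start..=end]`, clamped to the available bytes.
/// An out-of-bounds or inverted range yields an empty slice.
pub fn slice_inclusive(data: &[u8], start: u64, end: u64) -> &[u8] {
    let len = data.len() as u64;
    if start >= len || end < start {
        return &[];
    }
    // Convert the inclusive end into an exclusive one, never past the buffer.
    let stop = end.min(len - 1) + 1;
    &data[start as usize..stop as usize]
}
```

An in-memory `RangeFetcher` for tests could return `slice_inclusive(...).to_vec()` from its `fetch` method.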

fetch is called only when extra data is required beyond the initial probe:

  • WebM — when the Cues element starts beyond the probe window.
  • OGG — up to 64 equidistant binary-search probes across the stream.
  • AVI — one fetch of the last 64 KB to locate the idx1 chunk.
  • MPEG-TS — up to 64 equidistant binary-search probes for PCR timestamps.

All other formats (MP4, MP3, FLAC, WAV, AIFF, AAC, FLV) parse entirely from the probe.
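
The bounded search used for OGG and MPEG-TS can be sketched as a plain bisection over byte offsets with a probe budget. This is a simplified, synchronous illustration rather than the crate's code: `bisect_offset` is a hypothetical name, and the `timestamp_at` closure stands in for "fetch a small range at this offset and read the first granule position or PCR found there".

```rust
/// Narrows `[lo, hi]` towards the largest offset whose timestamp is at or
/// before `target_secs`, using at most `max_probes` probes.
/// Returns the best lower bound found and the number of probes spent.
pub fn bisect_offset<F>(
    mut lo: u64,
    mut hi: u64,
    target_secs: f64,
    max_probes: u32,
    mut timestamp_at: F,
) -> (u64, u32)
where
    F: FnMut(u64) -> Option<f64>,
{
    let mut probes = 0;
    while hi > lo && hi - lo > 1 && probes < max_probes {
        let mid = lo + (hi - lo) / 2;
        probes += 1;
        match timestamp_at(mid) {
            // The probe landed at or before the target: search the upper half.
            Some(t) if t <= target_secs => lo = mid,
            // Past the target, or no timestamp readable here (an assumption
            // of this sketch): search the lower half.
            _ => hi = mid,
        }
    }
    (lo, probes)
}
```

Each probe halves the remaining interval, so 64 probes are enough to narrow any `u64` offset range down to a single byte. In practice the search ends much earlier, once the interval is smaller than one page or packet.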

🚨 Error handling

pub enum Error {
    /// MHTML, plaintext, or unrecognized magic bytes.
    UnsupportedFormat,
    /// Container index could not be parsed (truncated data, invalid structure).
    ParseFailed { reason: String },
    /// An extra Range fetch required by the parser failed.
    FetchFailed(Box<dyn std::error::Error + Send + Sync>),
}

UnsupportedFormat is the expected case for storyboard MHTML segments and non-media content. Callers should handle it by falling back to a full download or reporting that seeking is unavailable.
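
One way to act on these variants is to map each to a recovery action. The sketch below uses a local mirror of the enum (`SeekError`) so that it compiles on its own; the `Action` type and the policy itself are assumptions, and a real application would match on `media_seek::Error` and pick its own strategy.

```rust
use std::error::Error as StdError;

/// Local mirror of the documented error enum, for illustration only.
pub enum SeekError {
    UnsupportedFormat,
    ParseFailed { reason: String },
    FetchFailed(Box<dyn StdError + Send + Sync>),
}

/// Hypothetical recovery actions an application might take.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    /// Seeking is unavailable: download the whole stream and trim locally.
    FullDownload,
    /// The transport failed: the same request may succeed later.
    Retry,
}

pub fn choose_action(err: &SeekError) -> Action {
    match err {
        // Expected for storyboard MHTML segments and non-media content.
        SeekError::UnsupportedFormat => Action::FullDownload,
        // The index is unusable, so fall back just as for unsupported formats.
        SeekError::ParseFailed { .. } => Action::FullDownload,
        // A Range request failed; the container itself may be fine.
        SeekError::FetchFailed(_) => Action::Retry,
    }
}
```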


🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. Make sure to follow the Contributing Guidelines.

📄 License

This project is licensed under the GPL-3.0 License.