mkv-element
A Rust library for reading and writing Matroska (MKV)/WebM elements.
This library provides a simple and efficient way to parse and serialize MKV elements both in memory and on disk, with support for both blocking and asynchronous I/O operations.
First and foremost, the library provides a Header struct for reading and writing element headers (ID and size), and represents every MKV element defined in the Matroska specifications as a Rust struct with typed fields. All elements implement the Element trait, which provides methods for reading and writing the element body and carries the EBML ID that identifies the element type. As a convenience, a prelude module brings all the types into scope.
To read an element, you can either use the element's read_from() method to read the entire element (header + body) from a type implementing std::io::Read, or read the header first using Header::read_from() and then call Element::read_element() to read the body. The latter is useful when you don't know the element type in advance. To write an element, use the element's write_to() method to write the entire element (header + body) to a type implementing std::io::Write.
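The header-first approach boils down to looking at the ID and then deciding which type to parse the body as. A minimal sketch of that decision, using only the standard library: the constants are element IDs from the Matroska specification, while `Decision` and `decide` are hypothetical names for illustration and not part of this crate.

```rust
/// Well-known element IDs from the Matroska specification.
pub const ID_EBML: u32 = 0x1A45_DFA3;
pub const ID_SEGMENT: u32 = 0x1853_8067;
pub const ID_INFO: u32 = 0x1549_A966;
pub const ID_TRACKS: u32 = 0x1654_AE6B;
pub const ID_CLUSTER: u32 = 0x1F43_B675;

/// Hypothetical outcome of inspecting a header before touching the body.
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    /// Parse the body as the named element type.
    ParseAs(&'static str),
    /// Not interested (or unknown): skip the body.
    Skip,
}

/// Decide what to do with an element based only on the ID from its header.
pub fn decide(id: u32) -> Decision {
    match id {
        ID_EBML => Decision::ParseAs("Ebml"),
        ID_SEGMENT => Decision::ParseAs("Segment"),
        ID_INFO => Decision::ParseAs("Info"),
        ID_TRACKS => Decision::ParseAs("Tracks"),
        ID_CLUSTER => Decision::ParseAs("Cluster"),
        _ => Decision::Skip,
    }
}
```

In real code, each `ParseAs` arm would correspond to calling `read_element()` on the matching element type with the header that was just read.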
Asynchronous I/O is supported when the tokio feature is enabled. The async_read_from(), async_read_element(), and async_write_to() methods work with types implementing tokio::io::AsyncRead (for reading) and tokio::io::AsyncWrite (for writing).
All non-master elements in this crate implement the Deref trait, allowing easy access to the inner value. For example, if you have an UnsignedInteger element, you can access its value directly using the * operator or by calling .deref().
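The pattern behind this is an ordinary newtype with a `Deref` implementation. The sketch below illustrates it with a hypothetical `TrackNumberSketch` type; it is a stand-in written for this example, not one of the crate's own element types.

```rust
use std::ops::Deref;

/// Hypothetical stand-in for a leaf element that wraps an unsigned integer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TrackNumberSketch(pub u64);

impl Deref for TrackNumberSketch {
    type Target = u64;

    fn deref(&self) -> &u64 {
        &self.0
    }
}

/// Returns the inner value twice: once via `*`, once via `.deref()`.
pub fn read_both_ways(n: &TrackNumberSketch) -> (u64, u64) {
    (**n, *n.deref())
}
```

Because of `Deref`, methods of the inner type (for example `u64::count_ones`) can also be called directly on the wrapper.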
Primer on Matroska/WebM (EBML) Structure
EBML (Extensible Binary Meta Language) is a binary format similar to XML, but more efficient and flexible. It is used as the underlying format for Matroska (MKV)/WebM files. Matroska (MKV)/WebM files start with an EBML header, followed by one or more segments containing the actual media data and metadata. Roughly, the structure looks like:
┌────────────────── MKV Structure ─────────────────┐
│ ┌────────────── EBML ──────────────┐             │
│ │ Header (Version, ReadVersion)    │             │
│ └──────────────────────────────────┘             │
│ ┌────────────── Segment(s) ────────┐             │
│ │ ┌──────────── Info ──────────┐   │             │
│ │ │ Metadata (Duration, Title) │   │             │
│ │ └────────────────────────────┘   │             │
│ │ ┌──────────── Tracks ────────┐   │             │
│ │ │ Audio/Video Tracks         │   │             │
│ │ └────────────────────────────┘   │             │
│ │ ┌──────────── SeekHead ──────┐   │             │
│ │ │ Index for Seeking          │   │             │
│ │ └────────────────────────────┘   │             │
│ │ ┌──────────── Cluster(s) ────┐   │             │
│ │ │ Media Data (Frames)        │   │             │
│ │ └────────────────────────────┘   │             │
│ │ ┌──────────── Others ────────┐   │             │
│ │ │ Cues, Chapters, Tags...    │   │             │
│ │ └────────────────────────────┘   │             │
│ └──────────────────────────────────┘             │
└──────────────────────────────────────────────────┘
MKV files are made of elements, each with an ID, size, and body. Elements can be of two types:
- Master elements: containers for other elements (like folders)
- Leaf elements: contain a single value of a specific type:
  - Unsigned integers
  - Signed integers
  - Floating point numbers
  - Strings (UTF-8/ASCII)
  - Binary data
  - Dates (timestamps stored as nanoseconds relative to 2001-01-01T00:00:00.000000000 UTC)
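To make the "ID, size, body" layout concrete, here is a minimal standard-library sketch of decoding an element header from raw bytes. Both the ID and the size are EBML variable-length integers: the number of leading zero bits in the first byte gives the total length, the ID keeps its marker bit, and the size drops it. `RawHeader`, `read_vint`, and `read_header` are hypothetical names for this illustration; this is not the crate's `Header` implementation.

```rust
use std::io::{self, Read};

/// Minimal stand-in for an element header (not the crate's `Header` type).
#[derive(Debug, PartialEq, Eq)]
pub struct RawHeader {
    /// Element ID with its length-marker bit kept, as IDs are usually written.
    pub id: u32,
    /// Body size in bytes, or `None` when all size data bits are 1 ("unknown").
    pub size: Option<u64>,
}

fn read_byte<R: Read>(r: &mut R) -> io::Result<u8> {
    let mut b = [0u8; 1];
    r.read_exact(&mut b)?;
    Ok(b[0])
}

/// Reads one variable-length integer.
/// Returns (raw value with marker bit, value without marker bit, length in bytes).
fn read_vint<R: Read>(r: &mut R, max_len: u32) -> io::Result<(u64, u64, u32)> {
    let first = read_byte(r)?;
    // The position of the first set bit encodes the total length in bytes.
    let len = first.leading_zeros() + 1;
    if len > max_len {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "vint too long"));
    }
    let mut raw = first as u64;
    for _ in 1..len {
        raw = (raw << 8) | read_byte(r)? as u64;
    }
    let marker = 1u64 << (7 * len);
    Ok((raw, raw & (marker - 1), len))
}

/// Reads an element header: a 1..=4 byte ID followed by a 1..=8 byte size.
pub fn read_header<R: Read>(r: &mut R) -> io::Result<RawHeader> {
    let (raw_id, _, _) = read_vint(r, 4)?;
    let (_, value, len) = read_vint(r, 8)?;
    let all_ones = (1u64 << (7 * len)) - 1;
    let size = if value == all_ones { None } else { Some(value) };
    Ok(RawHeader { id: raw_id as u32, size })
}
```

For instance, the bytes `1A 45 DF A3 9F` decode to the EBML header ID `0x1A45DFA3` with a 31-byte body.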
See the Matroska specifications for more details.
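The date representation is the least obvious of the leaf types, so a small worked conversion may help. The sketch below assumes you already have the raw signed nanosecond value and turns it into a `std::time::SystemTime`; the function name is hypothetical and the crate may expose dates differently.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Seconds between 1970-01-01T00:00:00 UTC and 2001-01-01T00:00:00 UTC
/// (11323 days of 86400 seconds each).
const MATROSKA_EPOCH_UNIX_SECS: u64 = 978_307_200;

/// Converts a raw date value (signed nanoseconds relative to
/// 2001-01-01T00:00:00 UTC) into a `SystemTime`.
pub fn matroska_date_to_system_time(nanos: i64) -> SystemTime {
    let epoch = UNIX_EPOCH + Duration::from_secs(MATROSKA_EPOCH_UNIX_SECS);
    if nanos >= 0 {
        epoch + Duration::from_nanos(nanos as u64)
    } else {
        // Negative values are dates before 2001.
        epoch - Duration::from_nanos(nanos.unsigned_abs())
    }
}
```

A raw value of 0 therefore maps to Unix time 978307200, and negative values map to instants before 2001.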
Blocking I/O
- Reading elements from types implementing std::io::Read.
- Writing elements to types implementing std::io::Write.
use mkv_element::prelude::*; // prelude brings all the types into scope
use mkv_element::io::blocking_impl::*; // use blocking_impl for blocking I/O
// Create a simple EBML header element
let ebml = Ebml { /* field values omitted */ };
// Write the EBML element to a type implementing std::io::Write
// 1. to a Vec<u8>
let mut buffer = Vec::new();
ebml.write_to(&mut buffer).unwrap();
// 2. to a file
let mut file = std::io::sink(); // replace with an actual file, e.g. std::fs::File::create("path/to/file.mkv").unwrap();
ebml.write_to(&mut file).unwrap();
// An element can be read either with its `read_from()` method,
// or by reading the header first and then calling `read_element()`.
// The latter is useful when you don't know the element type in advance.
// 1. using `read_from()`
let mut buf_cursor = std::io::Cursor::new(&buffer);
let ebml_read_1 = Ebml::read_from(&mut buf_cursor).unwrap();
// or directly from a slice
let ebml_read_2 = Ebml::read_from(&mut &buffer[..]).unwrap();
// or from a file
// let mut file = std::fs::File::open("path/to/file.mkv").unwrap();
// let ebml_read_3 = Ebml::read_from(&mut file).unwrap();
assert_eq!(ebml, ebml_read_1);
assert_eq!(ebml, ebml_read_2);
// 2. using `read_element()`
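let mut buf_cursor = std::io::Cursor::new(&buffer);
let header = Header::read_from(&mut buf_cursor).unwrap();
assert_eq!(header.id, Ebml::ID);
let ebml_read_4 = Ebml::read_element(&header, &mut buf_cursor).unwrap();
assert_eq!(ebml, ebml_read_4);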
Asynchronous I/O
With the tokio feature enabled, async I/O from tokio is supported. Use the async_read_from(), async_read_element(), and async_write_to() methods in place of their blocking counterparts.
Quick Note
- If you need to work with actual MKV files, don't read a whole segment into memory at once; read only the parts you need. Real-world MKV files can be very large.
- According to the Matroska specifications, segments and clusters can have an "unknown" size (all size data bits set to 1). In that case, the segment/cluster extends to the end of the file or until the next segment/cluster. This must be handled by the user: trying to read such an element with this library results in an ElementBodySizeUnknown error.
- This library does not attempt to recover from malformed/corrupted data. If such behavior is desired, extra logic can be added on top of this library.
- Output of this library MAY NOT be the same as input, but should be semantically equivalent and valid. For example, output order of elements may differ from input order, as the order is not strictly enforced by the Matroska specifications.
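The first two notes suggest a header-first workflow: read a header, then either load a small body or seek past a large one, and treat an unknown size as a case the caller must resolve. A minimal standard-library sketch of that body-handling step follows; `Action` and `handle_body` are hypothetical names, and the size is assumed to have been obtained from an already parsed header (with `None` standing for "unknown").

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// What the caller wants to do with an element body.
pub enum Action {
    /// Read the whole body into memory (only sensible for small elements).
    Load,
    /// Jump over the body without reading it.
    Skip,
}

/// Loads or skips an element body whose header has already been read.
///
/// `size` is `None` for an unknown-size element; this sketch reports that
/// as an error and leaves the recovery strategy to the caller.
pub fn handle_body<R: Read + Seek>(
    r: &mut R,
    size: Option<u64>,
    action: Action,
) -> io::Result<Option<Vec<u8>>> {
    let size = match size {
        Some(n) => n,
        None => return Err(io::Error::new(io::ErrorKind::InvalidInput, "unknown body size")),
    };
    match action {
        Action::Skip => {
            if size > i64::MAX as u64 {
                return Err(io::Error::new(io::ErrorKind::InvalidInput, "body too large"));
            }
            r.seek(SeekFrom::Current(size as i64))?;
            Ok(None)
        }
        Action::Load => {
            let mut body = Vec::new();
            // `take` caps the read at `size` bytes so the next element stays unread.
            let read = r.by_ref().take(size).read_to_end(&mut body)?;
            if read as u64 != size {
                return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "truncated body"));
            }
            Ok(Some(body))
        }
    }
}
```

With a `std::fs::File` as the reader, `Action::Skip` lets you step over multi-gigabyte clusters while still loading small elements such as Info or Tracks.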
Acknowledgements
Some of the ideas and code snippets were inspired by the following sources, thanks to their authors:
- mp4-atom by kixelated
- Network protocols, sans I/O