mediaschema 0.2.0

Product-agnostic media-primitive schema (buffa-generated)
mediaschema

Product-agnostic media-primitive schema. The architectural hub is the hand-written domain::* layer (rust-type-conventions, validated try_new-style constructors); every backend — protobuf wire (buffa), sqlx (3 dialects), MongoDB — is a thin lossless conversion to/from the domain.

Locked schema docs live under schema/*.md; they are the specification the implementation tracks.

Architectural model

                              ┌──────────────────────┐
                              │   domain layer       │
                              │   (validated types)  │
                              └──────┬───────────────┘
                                     │
            ┌────────────────────────┼────────────────────────┐
            │                        │                        │
            ▼                        ▼                        ▼
   ┌──────────────────┐   ┌──────────────────────┐   ┌──────────────────┐
   │  buffa wire      │   │  sqlx row mappers    │   │  mongodb bson    │
   │  (proto3 msgs)   │   │  pg / mysql / sqlite │   │  documents +     │
   │  domain ⇄ wire   │   │  domain ⇄ Row*       │   │  IndexModel      │
   └──────────────────┘   └──────────────────────┘   └──────────────────┘
  • Every backend's encode side (From<&Domain> for Backend) is infallible — the domain is the validated side.
  • Every backend's decode side (TryFrom<Backend> for Domain) routes through the same try_new + with_* builders application code uses, so the same invariants are enforced at every wire/storage edge.
  • All three SQL dialects share a single shape (Pg* / MySql* / Sqlite* row structs + borrowed *RowRef<'r> siblings); the column set is the same in all three.
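
The encode/decode asymmetry described in the bullets above can be illustrated with a minimal sketch. Every name here (Clip, ClipRow, ClipError) is a hypothetical stand-in invented for illustration and is not part of the mediaschema API; only the From<&Domain> / TryFrom<Backend> shape and the try_new convention come from the text.

```rust
use std::convert::TryFrom;

/// Hypothetical validated domain type (stand-in for a `domain::*` aggregate).
#[derive(Debug, Clone, PartialEq)]
pub struct Clip {
    title: String,
    duration_ms: u32,
}

/// Hypothetical validation error for the sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum ClipError {
    EmptyTitle,
    DurationOutOfRange,
}

impl Clip {
    /// Validating constructor: the only way to obtain a `Clip`.
    pub fn try_new(title: &str, duration_ms: u32) -> Result<Self, ClipError> {
        if title.trim().is_empty() {
            return Err(ClipError::EmptyTitle);
        }
        Ok(Clip { title: title.to_owned(), duration_ms })
    }
}

/// Hypothetical storage row: plain, unvalidated fields.
#[derive(Debug, Clone, PartialEq)]
pub struct ClipRow {
    pub title: String,
    pub duration_ms: i64, // wider signed column type on the storage side
}

// Encode side: infallible, because a `Clip` has already been validated
// and `u32 -> i64` is a lossless widening.
impl From<&Clip> for ClipRow {
    fn from(clip: &Clip) -> Self {
        ClipRow {
            title: clip.title.clone(),
            duration_ms: i64::from(clip.duration_ms),
        }
    }
}

// Decode side: fallible, and routed through the same `try_new` that
// application code uses, so the invariants are checked again at the edge.
impl TryFrom<ClipRow> for Clip {
    type Error = ClipError;

    fn try_from(row: ClipRow) -> Result<Self, Self::Error> {
        let duration_ms = u32::try_from(row.duration_ms)
            .map_err(|_| ClipError::DurationOutOfRange)?;
        Clip::try_new(&row.title, duration_ms)
    }
}
```

With this shape, ClipRow::from(&clip) always succeeds, while Clip::try_from(row) rejects a row whose title is empty or whose duration does not fit the domain type.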

Aggregate clusters

cluster          facet            per-track      leaf rows
---------------  ---------------  -------------  ---------
Media            Media
MediaFile        MediaFile        -              (1:N copies of a Media)
WatchedLocation  WatchedLocation  -              (FS-event monitor)
Audio            Audio            AudioTrack     AudioSegment (+ Word)
Video            Video            VideoTrack     Scene, Keyframe
Subtitle         Subtitle         SubtitleTrack  SubtitleCue<Id, D> (polymorphic over format: SrtData / VttData / AssData / LrcData + sibling aggregates)
Identity         Person, Speaker (per-track diarized voice)
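
As a rough illustration of what "polymorphic over format" can mean for SubtitleCue<Id, D>, the sketch below uses one generic cue struct with a format-specific payload parameter. The field names and the contents of SrtData / VttData are assumptions made up for this example; the real definitions live in the crate and may differ.

```rust
/// Hypothetical SRT payload (field invented for illustration).
#[derive(Debug, Clone, PartialEq)]
pub struct SrtData {
    pub sequence: u32,
}

/// Hypothetical WebVTT payload (field invented for illustration).
#[derive(Debug, Clone, PartialEq)]
pub struct VttData {
    pub settings: String,
}

/// One cue shape shared by every format; `D` carries the format-specific part.
#[derive(Debug, Clone, PartialEq)]
pub struct SubtitleCue<Id, D> {
    pub id: Id,
    pub start_ms: u64,
    pub end_ms: u64,
    pub text: String,
    pub data: D,
}

// Behaviour that does not depend on the format is written once.
impl<Id, D> SubtitleCue<Id, D> {
    pub fn duration_ms(&self) -> u64 {
        self.end_ms.saturating_sub(self.start_ms)
    }
}

// Behaviour that only makes sense for one format is attached to that
// instantiation alone.
impl<Id> SubtitleCue<Id, VttData> {
    pub fn has_settings(&self) -> bool {
        !self.data.settings.is_empty()
    }
}
```

The design choice this sketch highlights is that shared timing logic is available on every cue, while format-only helpers such as has_settings do not exist on, say, an SRT cue, so misuse fails at compile time.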

Feature flags

The crate exposes three capability tiers, a set of medium-aggregate gates, and a set of optional backend features. The capability tiers are additive (std is the default); the medium-aggregate gates are independent on/off flags layered on top.

flag                          tier / role        enables
----------------------------  -----------------  -------
none (--no-default-features)  no-std + no-alloc  stack-only types (Uuid7, FileChecksum, Rgba, ErrorCode, every unit-variant enum + bitflags! companion, Identified transport envelope). Wire layer not compiled.
alloc (no default)            no-std + alloc     cross-cutting heap-using domain types (Location, ErrorInfo, Provenance, LocalizedText, Media, MediaFile, Person, Speaker, WatchedLocation, UserTag, SceneAnnotation).
std (default)                 std                adds jiff-using aggregates (Speaker, WatchedLocation, …) and Uuid::now_v7.
video (default)               medium gate        compiles the Video / VideoTrack / Scene / Keyframe aggregate tree + all its sqlx / mongodb backends. Pair with a heap tier (std or alloc).
audio (default)               medium gate        compiles the Audio / AudioTrack / AudioSegment / Word aggregate tree + all its sqlx / mongodb / buffa backends. Pair with a heap tier.
subtitle (default)            medium gate        compiles the Subtitle / SubtitleTrack / SubtitleCue aggregate tree + all its sqlx / mongodb / buffa backends. Pair with a heap tier.
buffa                         wire layer         the buffa-generated media.v1 messages + the buffa ⇄ domain bridge under mediaschema::buffa::*. Pair with std or alloc.
json                          wire JSON          serde derives on the wire types (via buffa). Implies std + buffa.
arbitrary                     property tests     arbitrary::Arbitrary on the wire types. Implies std + buffa.
mongodb                       bson backend       bson Document ⇄ domain + per-collection IndexModel constructors. Implies std + json.
sqlx-postgres                 sql backend        postgres Pg*Row types + sqlx::FromRow derives. Implies std.
sqlx-mysql                    sql backend        mysql MySql*Row types + sqlx::FromRow. Implies std.
sqlx-sqlite                   sql backend        sqlite Sqlite*Row types + sqlx::FromRow. Implies std.

The three medium-aggregate gates (video / audio / subtitle) are all part of the default feature set, so out-of-the-box behaviour is unchanged. Consumers that only need a subset of media (for example, an analysis engine that emits FaceDetections but never touches audio or subtitle tracks) can opt out via default-features = false plus a hand-picked subset:

mediaschema = { version = "0.2", default-features = false, features = ["std", "video"] }

Cross-cutting aggregates (Media, MediaFile, Person, Speaker, WatchedLocation, UserTag, SceneAnnotation) plus the Identified<Id, D> transport envelope are always available when a heap tier is on, regardless of which medium features are selected.

Quick start

[dependencies]
mediaschema = { version = "0.2", features = ["std", "buffa"] }

Build a domain aggregate, then encode it through one of the backends:

use mediaschema::domain::{Media, MediaKind, Uuid7};
use mediaschema::domain::primitives::FileChecksum;

let m = Media::try_new(
    Uuid7::new(),
    FileChecksum::from_bytes([1u8; 32]),
    /* container */ mediaframe::container::Container::Mp4,
    /* duration */ 12_345,
    MediaKind::Video,
)?;
// ... pass `m` through any backend bridge.

See examples/ for a complete end-to-end domain → backend encoding round-trip.

Regenerating wire code

The buffa-generated wire layer lives under src/generated/ and is produced from the .proto files in proto/. Regenerate with:

cargo run -p xtask -- gen

This is required after editing any proto/**/*.proto file. Do not hand-edit the generated files.

Versioning

mediaschema is currently pre-1.0. A single Cargo SemVer covers every surface — Rust API, proto wire, sqlx DDL, mongodb document shape — and the bump rule depends only on whether the change is breaking on any of them:

  • 0.x.y patch — purely additive across all surfaces (new fields, new proto numbers, new sqlx columns / migrations, new mongodb keys, new public Rust items, new #[non_exhaustive] variants).
  • 0.x.0 minor — any breaking change on any surface; bumps x.
  • 1.0.0 — every surface stabilises. From then on, removing or renaming any field on any surface requires 2.0, and proto reservation rules switch on (every removed proto field gets reserved N;, old sqlx migration files become immutable, mongodb keys are permanent with a one-major-version grace).

While the crate is pre-1.0 (the current state), the no-proto-reservations policy allows free renumbering of proto fields between 0.x bumps; this is the trade-off for rapid iteration before any consumer pins a stable version. See issue #59 for the full policy and the three open decisions (cutover timing, mongodb grace period, and schema-doc-rev formality).
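
The pre-1.0 bump rule above can be worked through in a few lines. This is a sketch only: Version, Change, and bump are hypothetical helpers written for this example, not items exported by mediaschema. The caller is assumed to classify the change on each surface (Rust API, proto wire, sqlx DDL, mongodb document shape) and pass one entry per surface.

```rust
/// Hypothetical version triple for the sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Version {
    pub major: u64,
    pub minor: u64,
    pub patch: u64,
}

/// Hypothetical classification of a change on one surface.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Change {
    Additive,
    Breaking,
}

/// Applies the pre-1.0 rule: a breaking change on any surface bumps the
/// minor component and resets patch; otherwise only patch is bumped.
/// An empty slice is treated as additive. Returns `None` for versions
/// at or above 1.0, where different rules apply.
pub fn bump(current: Version, per_surface: &[Change]) -> Option<Version> {
    if current.major != 0 {
        return None;
    }
    let breaking = per_surface.iter().any(|c| *c == Change::Breaking);
    Some(if breaking {
        Version { minor: current.minor + 1, patch: 0, ..current }
    } else {
        Version { patch: current.patch + 1, ..current }
    })
}
```

For example, starting from 0.2.0, a release that only adds sqlx columns and new proto numbers would become 0.2.1, whereas a release that is additive on three surfaces but renames one mongodb key would become 0.3.0.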

Licence

MIT OR Apache-2.0.