oxideav-scene
A time-based composition model for oxideav: a Scene is a canvas
populated with Objects (images, videos, text, shapes, audio cues)
animated over a timeline. Scenes are the foundation for three distinct
workloads:
- Document layout — a PDF page is a single-frame scene with text, vector shapes, and image objects laid out in their native coordinate system. Edits (adding a watermark, moving an image, rewrapping a paragraph) happen on the scene, not on rasterised pixels, so text stays selectable and vectors stay crisp on re-export.
- Live streaming compositor — a long-running scene fed by external
operations (AddObject, MoveObject, FadeOut). Intended to sit behind an
RTMP server so a remote control plane can drive a per-viewer overlay:
add a lower-third during a goal, slide a logo in, trigger a sound
effect.
- Non-linear video editor (NLE) timeline — Premiere/Resolve-style
multi-track editing. Tracks are ordered groups of scene objects,
transitions are keyframed cross-fades / wipes, effects are filter
chains attached to a single object.
Zero C dependencies — pure Rust, same rules as the rest of oxideav.
Status
Scaffold. This crate ships the type model + public-API shape for
all three use cases and a placeholder SceneRenderer trait. No real
rendering, encoding, or file-format I/O yet — those land as follow-ups.
- Scene, SceneObject, ObjectKind, Transform, Animation, Keyframe,
Easing, and AudioCue types are in place.
- SceneRenderer + SceneSampler traits are defined but return
Error::Unsupported on every call.
- No oxideav-codec or container integration yet — that comes after the
render pipeline is real.
Data model
Scene
A scene is addressed in its own time_base — same rational type oxideav
uses everywhere. SceneDuration::Indefinite signals a streaming scene:
no end, no rewinding, the composition is driven forward by wall-clock
time + operation messages.
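As a rough sketch of that shape (the field and type names below, including Rational, are assumptions for illustration, not the scaffold's actual definitions):

```rust
/// Rational time base, e.g. 1/90_000 for a 90 kHz clock.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Rational {
    pub num: u32,
    pub den: u32,
}

/// Finite scenes have an end; Indefinite signals a streaming scene
/// driven forward by wall-clock time + operation messages.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SceneDuration {
    /// Duration in ticks of the scene's time_base.
    Finite(u64),
    Indefinite,
}

pub struct Scene {
    pub time_base: Rational,
    pub duration: SceneDuration,
}

impl Scene {
    /// A scene is "live" when it has no fixed end.
    pub fn is_live(&self) -> bool {
        matches!(self.duration, SceneDuration::Indefinite)
    }
}
```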
Canvas
Keeping both raster and vector under one type lets the same
SceneObject/Animation/Transform primitives drive PDFs,
compositor streams, and NLE timelines without forking the API.
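One way that unified type could look (variant and field names are assumptions sketched from the descriptions above):

```rust
/// Unit for vector canvases (PDF points, millimetres, ...).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum VectorUnit {
    Pt,
    Mm,
}

/// One canvas type covers both raster (compositor / NLE) and vector
/// (PDF) scenes, so the same object/animation primitives apply to both.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Canvas {
    /// Pixel-addressed surface, e.g. a 1920x1080 compositor output.
    Raster { width: u32, height: u32 },
    /// Unit-addressed surface, e.g. a 612x792 pt US-Letter PDF page.
    Vector { unit: VectorUnit, width: f64, height: f64 },
}
```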
SceneObject
ObjectKind
ImageSource / VideoSource / LiveStreamHandle own the heavy
resources — e.g. a VideoSource holds a demuxer + decoder pair. Cloning
a SceneObject is therefore cheap: the underlying pixel data lives in
Arc-shared frame storage (managed by oxideav-core) rather than being
copied.
Transform + Animation
Keyframe values are typed per property (Vec2, f32, colour, etc.)
via a KeyframeValue enum that interpolate(a, b, t, easing) acts on.
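A minimal sketch of that interpolation, assuming a two-variant KeyframeValue and a small Easing enum (the real enums carry more variants):

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Easing {
    Linear,
    EaseInQuad,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum KeyframeValue {
    Scalar(f32),
    Vec2(f32, f32),
}

/// Interpolate between two keyframe values of the same variant at
/// normalised time t in [0, 1], after applying the easing curve.
/// Mismatched variants cannot be interpolated and yield None.
pub fn interpolate(
    a: KeyframeValue,
    b: KeyframeValue,
    t: f32,
    easing: Easing,
) -> Option<KeyframeValue> {
    let t = match easing {
        Easing::Linear => t,
        Easing::EaseInQuad => t * t,
    };
    match (a, b) {
        (KeyframeValue::Scalar(x), KeyframeValue::Scalar(y)) => {
            Some(KeyframeValue::Scalar(x + (y - x) * t))
        }
        (KeyframeValue::Vec2(ax, ay), KeyframeValue::Vec2(bx, by)) => {
            Some(KeyframeValue::Vec2(ax + (bx - ax) * t, ay + (by - ay) * t))
        }
        _ => None,
    }
}
```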
AudioCue
Audio cues mix into a single output bus per scene. The render pass
produces (VideoFrame, AudioBuffer) at each timestamp; the audio
buffer spans the interval [last_render_time, this_render_time) at
the scene's sample_rate.
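For example, sizing that audio buffer reduces to a sample count for the half-open interval. A sketch, assuming timestamps in seconds and sample_rate in Hz (the function name is illustrative):

```rust
/// Number of samples the audio buffer for the interval
/// [last_render_time, this_render_time) must hold at `sample_rate`.
pub fn samples_for_interval(last: f64, this: f64, sample_rate: u32) -> usize {
    // Round to nearest so float error doesn't accumulate into drift
    // across many consecutive intervals.
    ((this - last) * sample_rate as f64).round() as usize
}
```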
Rendering pipeline
```
Scene + t → SceneSampler.sample_at(t) → RenderedFrame {
    video: Option<VideoFrame>,   // None for audio-only intervals
    audio: AudioBuffer,          // always valid, may be silence
    operations: Vec<ExportOp>,   // e.g. for PDF export: emit text run X
}
```
A SceneRenderer walks the SceneObject list in z-order, evaluating
transforms + animations at t, clipping against the canvas, and
compositing via the BlendMode. The renderer delegates per-object
content fetching to each ObjectKind's own sampler:
- Image samplers hold a cached decoded VideoFrame.
- Video samplers advance their demuxer/decoder to the requested PTS and
return the most recent frame.
- Text samplers shape glyphs via a pluggable TextShaper trait (default:
a minimal monospace fallback; real layout engines land as separate
crates).
- Shape samplers rasterise on demand via a pure-Rust vector rasteriser
(planned as oxideav-rasterise, another follow-up).
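The per-kind delegation could be expressed as a small trait; this is a hypothetical sketch (ObjectSampler, the VideoFrame fields, and ImageSampler are all assumptions, since the scaffold's traits are placeholders):

```rust
pub struct VideoFrame {
    /// Presentation timestamp in scene time_base ticks; real pixel
    /// storage would live in Arc-shared buffers from oxideav-core.
    pub pts: u64,
}

/// Each ObjectKind supplies its content for a timestamp via its own
/// sampler; the renderer only composites what comes back.
pub trait ObjectSampler {
    /// Return the frame covering `pts`, or None if nothing is visible.
    fn sample(&mut self, pts: u64) -> Option<VideoFrame>;
}

/// An image sampler replays its one cached decoded frame.
pub struct ImageSampler {
    pub cached: VideoFrame,
}

impl ObjectSampler for ImageSampler {
    fn sample(&mut self, pts: u64) -> Option<VideoFrame> {
        // A still image is valid at every timestamp: restamp the
        // cached content with the requested pts.
        Some(VideoFrame { pts })
    }
}
```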
Use cases in detail
PDF pages
Each page becomes a Scene with Canvas::Vector { unit: Pt, width, height } and one SceneObject per glyph run, image, and vector path.
The scene's duration is Finite(1 frame). Edits (redact a region,
drop a watermark, rewrap a column) happen on the scene graph. When the
user re-exports:
- PDF out — the SceneRenderer walks the tree and emits PDF operators
(Tj for text, Do for images, f/S for vectors), preserving structure.
Text remains selectable, hyperlinks survive, bookmarks stay intact.
- PNG / JPEG out — the renderer rasterises at a requested DPI.
Streaming compositor (RTMP server)
A daemon holds one Scene per live channel with duration: Indefinite.
A control-plane protocol (JSON over WebSocket, say) surfaces scene
operations such as AddObject, MoveObject, and FadeOut.
The compositor renders the scene into a VP9/AV1/H.264 encoder fed to an RTMP muxer. Viewers receive a normal stream; the producer only sees the DSL.
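A sketch of what those operations might look like applied to a live scene's object list (the Operation fields and the apply helper are assumptions; the real enum lives in ops.rs):

```rust
pub type ObjectId = u64;

/// Control-plane operations a remote producer can send to a live scene.
pub enum Operation {
    AddObject { id: ObjectId },
    MoveObject { id: ObjectId, x: f32, y: f32 },
    /// Fade an object out over `duration_ms`, then remove it.
    FadeOut { id: ObjectId, duration_ms: u32 },
}

/// Applying operations in arrival order mutates the scene's object
/// list (positions/animations omitted here for brevity).
pub fn apply(objects: &mut Vec<ObjectId>, op: &Operation) {
    match op {
        Operation::AddObject { id } => objects.push(*id),
        Operation::MoveObject { .. } => {
            // Would update the object's Transform in a full model.
        }
        Operation::FadeOut { id, .. } => objects.retain(|o| o != id),
    }
}
```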
NLE timeline (Premiere / Resolve style)
Tracks are SceneObject::Group children with a shared z-order band.
Transitions between clips are implemented as opacity / position
animations that overlap two Video objects. Effects are the
effects: Vec<Effect> vector on each object. Scrubbing + preview
works by driving the SceneSampler at arbitrary timestamps; export
renders the entire duration at the target framerate.
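The overlapping-animation idea behind a cross-fade can be sketched as two opacity ramps whose values always sum to 1 (function names and the use of f64 seconds are illustrative assumptions):

```rust
/// Linear opacity at time t on a ramp from (t0, o0) to (t1, o1),
/// clamped to the endpoints outside the ramp.
fn ramp(t: f64, t0: f64, o0: f64, t1: f64, o1: f64) -> f64 {
    if t <= t0 {
        o0
    } else if t >= t1 {
        o1
    } else {
        o0 + (o1 - o0) * (t - t0) / (t1 - t0)
    }
}

/// During the overlap [start, end], outgoing clip A fades 1 → 0 while
/// incoming clip B fades 0 → 1. Returns (opacity_a, opacity_b).
pub fn cross_fade(t: f64, start: f64, end: f64) -> (f64, f64) {
    (
        ramp(t, start, 1.0, end, 0.0),
        ramp(t, start, 0.0, end, 1.0),
    )
}
```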
Crate layout (scaffold today)
src/
├── lib.rs — module exports + Scene / Canvas root types
├── object.rs — SceneObject + ObjectKind + Transform + BlendMode
├── animation.rs — Animation + Keyframe + Easing + interpolation
├── audio.rs — AudioCue + AudioSource
├── render.rs — SceneRenderer + SceneSampler traits
├── duration.rs — SceneDuration + Lifetime
├── id.rs — ObjectId (stable, editable)
└── ops.rs — Operation enum for the streaming compositor
Everything is pub, and public enums are #[non_exhaustive] so new
variants can land without a SemVer break.
Non-goals (for now)
- Not a vector rasteriser. Shape rendering ships as a separate crate
(oxideav-rasterise, pending).
- Not a text shaper. The TextShaper trait is pluggable; a real shaper
lands in oxideav-text (pending).
- Not an NLE UI. This crate is the data model + renderer core; the UI
is downstream.
- Not a document parser. PDF / SVG ingest land in oxideav-pdf /
oxideav-svg (both pending) and produce Scenes.
License
MIT — same as the rest of oxideav.