oxideav-scene
A time-based composition model for oxideav: a Scene is a canvas
populated with Objects (images, videos, text, shapes, audio cues)
animated over a timeline. Scenes are the foundation for three distinct
workloads:
- Document layout — a PDF page is a single-frame scene with text, vector shapes, and image objects laid out in their native coordinate system. Edits (adding a watermark, moving an image, rewrapping a paragraph) happen on the scene, not on rasterised pixels, so text stays selectable and vectors stay crisp on re-export.
- Live streaming compositor — a long-running scene fed by external
operations (AddObject, MoveObject, FadeOut). Intended to sit behind an
RTMP server so a remote control plane can drive a per-viewer overlay:
add a lower-third during a goal, slide a logo in, trigger a sound effect.
- Non-linear video editor (NLE) timeline — Premiere/Resolve-style
multi-track editing. Tracks are ordered groups of scene objects,
transitions are keyframed cross-fades / wipes, effects are filter chains
attached to a single object.
Zero C dependencies — pure Rust, same rules as the rest of oxideav.
Status
Scaffold. This crate ships the type model + public-API shape for
all three use cases and a placeholder SceneRenderer trait. No real
rendering, encoding, or file-format I/O yet — those land as follow-ups.
- Scene, SceneObject, ObjectKind, Transform, Animation, Keyframe,
Easing, AudioCue types are in place.
- SceneRenderer + SceneSampler traits are defined but return
Error::Unsupported on every call.
- No oxideav-codec or container integration yet — that comes after the
render pipeline is real.
Data model
Scene
A scene is addressed in its own time_base — same rational type oxideav
uses everywhere. framerate is separate: time_base sets the tick
granularity of every scheduled event (keyframe, lifetime, audio cue
trigger); framerate sets the cadence at which the renderer samples the
scene and emits frames to a sink. A scene at time_base = 1/1000 (ms)
and framerate = 30/1 renders at t = 0, 33, 66, 100, … ms. Videos
included via ObjectKind::Video are retimed by the renderer so their
per-frame PTS aligns with this cadence.
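That mapping is plain rational arithmetic; the sketch below is
standalone and uses no crate types:

```rust
// Frame index -> tick in the scene's time_base, for time_base = 1/tb_den
// and framerate = fps_num/fps_den. Truncating division, like the 33 ms
// example above.
fn frame_tick(frame: u64, tb_den: u64, fps_num: u64, fps_den: u64) -> u64 {
    // t = frame / framerate, expressed in time_base ticks
    frame * tb_den * fps_den / fps_num
}

fn main() {
    let ticks: Vec<u64> = (0..4).map(|f| frame_tick(f, 1000, 30, 1)).collect();
    assert_eq!(ticks, vec![0, 33, 66, 100]); // the cadence quoted above
}
```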
SceneDuration::Indefinite signals a streaming scene: no end, no
rewinding, the composition is driven forward by wall-clock time +
operation messages.
Canvas
Keeping both raster and vector under one type lets the same
SceneObject/Animation/Transform primitives drive PDFs,
compositor streams, and NLE timelines without forking the API.
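A plausible shape for the type, pieced together from details elsewhere
in this README (the PDF section uses Canvas::Vector { unit: Pt, width,
height }); lib.rs is authoritative and may differ:

```rust
// Stand-ins so the sketch compiles alone; the real PixelFormat comes
// from oxideav-pixfmt.
pub enum PixelFormat { Yuv420P, Rgb24, Rgba }
pub enum Unit { Pt, Px }

// Sketch of Canvas: variant names and fields inferred from this README.
pub enum Canvas {
    /// Pixel-addressed canvas: compositor streams, NLE export targets.
    Raster { width: u32, height: u32, format: PixelFormat },
    /// Resolution-independent canvas: PDF pages and vector content.
    Vector { unit: Unit, width: f64, height: f64 },
}
```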
SceneObject
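The struct is scaffolded in object.rs. A rough sketch of its shape, with
every field inferred from mentions in this README rather than read off
the code:

```rust
// Stand-ins so the sketch compiles alone; real types live in the
// modules listed under "Crate layout" below.
pub struct ObjectId(pub u64);
pub struct Transform;
pub struct Animation;
pub struct Effect;
pub enum BlendMode { Normal }
pub enum Lifetime { Always }
pub enum ObjectKind { Image, Video, Text, Shape, Group }

// Sketch of SceneObject: every field is named somewhere in this README,
// but the actual layout in object.rs may differ.
pub struct SceneObject {
    pub id: ObjectId,               // stable, editable identity (id.rs)
    pub kind: ObjectKind,           // content payload (next section)
    pub transform: Transform,       // placement on the canvas
    pub animations: Vec<Animation>, // keyframed property tracks
    pub effects: Vec<Effect>,       // per-object filter chain (NLE use case)
    pub blend: BlendMode,           // compositing rule
    pub lifetime: Lifetime,         // when the object exists (duration.rs)
    pub z: i32,                     // paint order
}
```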
ObjectKind
ImageSource / VideoSource / LiveStreamHandle own the heavy
resources — e.g. a VideoSource holds a demuxer + decoder pair — so
cloning a SceneObject is cheap: the underlying pixels are never copied,
they live in Arc-shared frame storage (managed by oxideav-core).
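Sketched out, the variant set looks roughly like this; only the split
into kinds is documented here, the payload types are assumptions:

```rust
// Stand-ins for the heavy handles described above.
pub struct ImageSource;      // cached decoded image
pub struct VideoSource;      // demuxer + decoder pair
pub struct LiveStreamHandle; // connection to a live feed
pub struct TextRun;          // assumed payload
pub struct ShapePath;        // assumed payload
pub struct SceneObject;      // see previous section

// Sketch of ObjectKind: variant names from this README, payloads guessed.
#[non_exhaustive]
pub enum ObjectKind {
    Image(ImageSource),
    Video(VideoSource),
    Live(LiveStreamHandle),
    Text(TextRun),
    Shape(ShapePath),
    Group(Vec<SceneObject>), // NLE tracks are Group children
}
```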
Transform + Animation
Keyframe values are typed per property (Vec2, f32, colour, etc.)
via a KeyframeValue enum that interpolate(a, b, t, easing) acts on.
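A minimal working model of that scheme, assuming two value types and
two easing variants (animation.rs holds the real definitions):

```rust
// Toy KeyframeValue + interpolate(); the real versions live in
// animation.rs. Easing variants here are assumptions.
#[derive(Clone, Copy)]
pub enum Easing { Linear, EaseIn }

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum KeyframeValue {
    Scalar(f32),
    Vec2(f32, f32),
}

fn ease(t: f32, e: Easing) -> f32 {
    match e {
        Easing::Linear => t,
        Easing::EaseIn => t * t,
    }
}

pub fn interpolate(a: KeyframeValue, b: KeyframeValue, t: f32, e: Easing) -> KeyframeValue {
    let t = ease(t.clamp(0.0, 1.0), e);
    match (a, b) {
        (KeyframeValue::Scalar(x), KeyframeValue::Scalar(y)) => {
            KeyframeValue::Scalar(x + (y - x) * t)
        }
        (KeyframeValue::Vec2(x0, y0), KeyframeValue::Vec2(x1, y1)) => {
            KeyframeValue::Vec2(x0 + (x1 - x0) * t, y0 + (y1 - y0) * t)
        }
        // Mismatched value types: hold the start value (the real crate
        // presumably rejects this at construction time).
        _ => a,
    }
}
```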
AudioCue
Audio cues mix into a single output bus per scene. The render pass
produces (VideoFrame, AudioBuffer) at each timestamp; the audio
buffer spans the interval [last_render_time, this_render_time) at
the scene's sample_rate.
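The buffer length follows directly from the render cadence; plain
arithmetic, no crate types:

```rust
// Samples per render interval: at 48_000 Hz and 30 fps, each
// [last, this) interval spans 1/30 s, i.e. 1600 samples per pull.
fn samples_in_interval(sample_rate: u32, fps_num: u32, fps_den: u32) -> u32 {
    sample_rate * fps_den / fps_num
}

fn main() {
    assert_eq!(samples_in_interval(48_000, 30, 1), 1_600);
}
```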
Rendering pipeline
```
Scene + t → SceneSampler.sample_at(t) → RenderedFrame {
    video: Option<VideoFrame>,    // None for audio-only intervals
    audio: AudioBuffer,           // always valid, may be silence
    operations: Vec<ExportOp>,    // e.g. for PDF export: emit text run X
}
```
A SceneRenderer walks the SceneObject list in z-order, evaluating
transforms + animations at t, clipping against the canvas, and
compositing via the BlendMode. The renderer delegates per-object
content fetching to each ObjectKind's own sampler:
- Image samplers hold a cached decoded VideoFrame.
- Video samplers advance their demuxer/decoder to the requested PTS and
return the most recent frame.
- Text samplers shape glyphs via a pluggable TextShaper trait (default:
a minimal monospace fallback; real layout engines land as separate
crates).
- Shape samplers rasterise on demand via a pure-Rust vector rasteriser
(planned as oxideav-rasterise, another follow-up).
Source / Sink
A scene acts as a source of rendered frames. Wrap a Scene plus
a SceneRenderer in a RenderedSource and the resulting value
implements SceneSource: one pull() per frame at the scene's
framerate, timestamps auto-advanced by 1 / framerate. Finite
scenes signal end-of-stream by returning None; indefinite scenes
run until externally stopped.
Consumers implement SceneSink — init(&SourceFormat) once, push
per frame, finalise() at end. The helper drive(source, sink) runs
the pull loop:
```rust
use oxideav_scene::{NullSink, RenderedSource, Scene, StubRenderer};

let scene = Scene::default();
let mut src = RenderedSource::new(scene, StubRenderer::default()); // real renderer goes here
let mut sink = NullSink::default();
// drive(&mut src, &mut sink)?; // when the real renderer lands
```
Downstream crates provide the real sinks — an oxideav-scene-encode
sink that pipes frames into an encoder + muxer, an
oxideav-scene-rtmp sink that writes to an RTMP endpoint, a
WindowSink for live preview, etc. Any of these can slot in without
changing the scene or renderer.
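To make the sink protocol concrete, here is a toy frame-counting sink.
The trait shape is assumed from the init / push / finalise description
above; source.rs is authoritative:

```rust
// Stand-ins so the sketch compiles alone.
pub struct SourceFormat;
pub struct RenderedFrame;
pub type SinkError = std::io::Error;

// Assumed trait shape; the real SceneSink lives in source.rs.
pub trait SceneSink {
    fn init(&mut self, format: &SourceFormat) -> Result<(), SinkError>;
    fn push(&mut self, frame: RenderedFrame) -> Result<(), SinkError>;
    fn finalise(&mut self) -> Result<(), SinkError>;
}

/// Toy sink: counts frames, useful for smoke tests.
pub struct CountingSink { frames: u64 }

impl SceneSink for CountingSink {
    fn init(&mut self, _format: &SourceFormat) -> Result<(), SinkError> {
        self.frames = 0;
        Ok(())
    }
    fn push(&mut self, _frame: RenderedFrame) -> Result<(), SinkError> {
        self.frames += 1;
        Ok(())
    }
    fn finalise(&mut self) -> Result<(), SinkError> {
        println!("rendered {} frames", self.frames);
        Ok(())
    }
}
```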
Automatic pixel-format adaptation
Pixel formats get handled transparently in two places:
Inbound (source → scene): a Video / Image / Live object's
source frames can be in any pixel format the decoder produces —
YUV420P, YUV444P, BGRA, RGB24, NV12, whatever. The renderer converts
them to the canvas's pixel format before compositing via
adapt_frame_to_canvas. Writers of per-object samplers call this
once on each pulled frame; canvases that don't declare a raster
format (vector canvases for PDF export) short-circuit the conversion.
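A toy model of that rule (the real adapt_frame_to_canvas lives in
adapt.rs and performs actual pixel conversion through oxideav-pixfmt;
here only the control flow is modelled):

```rust
// Stand-ins; only the convert-or-short-circuit decision is the point.
#[derive(Clone, Copy, PartialEq, Debug)]
enum PixelFormat { Nv12, Yuv420P }
struct Frame { format: PixelFormat }
struct Canvas { raster_format: Option<PixelFormat> } // None => vector canvas

fn adapt_frame_to_canvas(frame: Frame, canvas: &Canvas) -> Frame {
    match canvas.raster_format {
        // Raster canvas with a different format: convert (stubbed here).
        Some(target) if frame.format != target => Frame { format: target },
        // Vector canvas or already-matching format: short-circuit.
        _ => frame,
    }
}

fn main() {
    let canvas = Canvas { raster_format: Some(PixelFormat::Yuv420P) };
    let pulled = Frame { format: PixelFormat::Nv12 }; // decoder output
    assert_eq!(adapt_frame_to_canvas(pulled, &canvas).format, PixelFormat::Yuv420P);
}
```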
Outbound (scene → sink): when a sink expects a pixel format that
differs from the scene's canvas — e.g. a JPEG writer wants RGB24
while the scene composes in YUV420P — wrap the source in
AdaptedSource:
```rust
use oxideav_scene::{AdaptedSource, RenderedSource, Scene, StubRenderer};
use oxideav_pixfmt::PixelFormat;

let scene = Scene::default(); // canvas: Yuv420P
let src = RenderedSource::new(scene, StubRenderer::default());
let adapted = AdaptedSource::new(src, PixelFormat::Rgba);
// adapted.format().canvas now reports Rgba; pulled frames are
// transparently converted on the way out.
```
Both paths delegate to oxideav-pixfmt
— the same conversion matrix used across oxideav.
Use cases in detail
PDF pages
Each page becomes a Scene with Canvas::Vector { unit: Pt, width, height } and one SceneObject per glyph run, image, and vector path.
The scene's duration is Finite(1 frame). Edits (redact a region,
drop a watermark, rewrap a column) happen on the scene graph. When the
user re-exports:
- PDF out — the SceneRenderer walks the tree and emits PDF operators
(Tj for text, Do for images, f/S for vectors), preserving structure.
Text remains selectable, hyperlinks survive, bookmarks stay intact.
- PNG / JPEG out — the renderer rasterises at a requested DPI.
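A toy version of the operator dispatch; a real PDF-out SceneRenderer
doesn't exist yet, so this only illustrates the mapping:

```rust
// Which PDF operator family each object kind maps to.
enum Kind { TextRun, Image, VectorPath }

fn pdf_operator(kind: &Kind) -> &'static str {
    match kind {
        Kind::TextRun => "Tj",   // show text
        Kind::Image => "Do",     // paint image XObject
        Kind::VectorPath => "f", // fill ("S" to stroke)
    }
}

fn main() {
    assert_eq!(pdf_operator(&Kind::Image), "Do");
}
```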
Streaming compositor (RTMP server)
A daemon holds one Scene per live channel with duration: Indefinite. A control-plane protocol (JSON over WebSocket, say)
surfaces the ops.rs Operation enum — AddObject, MoveObject, FadeOut —
to remote producers.
The compositor renders the scene into a VP9/AV1/H.264 encoder fed to an RTMP muxer. Viewers receive a normal stream; the producer only sees the DSL.
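The enum behind that DSL might look roughly like this; the variant
names come from earlier in this README, the payload fields are guesses:

```rust
// Sketch of ops.rs's Operation enum; payloads are assumptions.
pub type ObjectId = u64; // stand-in; the real ObjectId lives in id.rs
pub struct SceneObject;  // stand-in

#[non_exhaustive]
pub enum Operation {
    AddObject { object: SceneObject },
    MoveObject { id: ObjectId, x: f64, y: f64 },
    FadeOut { id: ObjectId, over_ticks: u64 },
}
```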
NLE timeline (Premiere / Resolve style)
Tracks are SceneObject::Group children with a shared z-order band.
Transitions between clips are implemented as opacity / position
animations that overlap two Video objects. Effects are the
effects: Vec<Effect> vector on each object. Scrubbing + preview
works by driving the SceneSampler at arbitrary timestamps; export
renders the entire duration at the target framerate.
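A cross-fade, for instance, reduces to two overlapping opacity ramps.
Standalone arithmetic, no crate types:

```rust
// Opacity of (outgoing, incoming) clips during a cross-fade starting at
// t0 and lasting d seconds: the overlapping-animation trick above.
fn crossfade(t: f64, t0: f64, d: f64) -> (f64, f64) {
    let p = ((t - t0) / d).clamp(0.0, 1.0);
    (1.0 - p, p)
}

fn main() {
    assert_eq!(crossfade(0.0, 0.0, 1.0), (1.0, 0.0)); // A fully visible
    assert_eq!(crossfade(0.5, 0.0, 1.0), (0.5, 0.5)); // midpoint
    assert_eq!(crossfade(1.0, 0.0, 1.0), (0.0, 1.0)); // B fully visible
}
```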
Crate layout (scaffold today)
```
src/
├── lib.rs       — module exports + Scene / Canvas root types
├── object.rs    — SceneObject + ObjectKind + Transform + BlendMode
├── animation.rs — Animation + Keyframe + Easing + interpolation
├── audio.rs     — AudioCue + AudioSource
├── render.rs    — SceneRenderer + SceneSampler traits + StubRenderer
├── source.rs    — SceneSource + SceneSink + drive() + RenderedSource + NullSink / FnSink
├── adapt.rs     — pixel-format adaptation (inbound + outbound, via oxideav-pixfmt)
├── duration.rs  — SceneDuration + Lifetime
├── id.rs        — ObjectId (stable, editable)
└── ops.rs       — Operation enum for the streaming compositor
```
Everything is pub, and public enums are #[non_exhaustive] so new
variants can land without a SemVer break.
Non-goals (for now)
- Not a vector rasteriser. Shape rendering ships as a separate crate
(oxideav-rasterise, pending).
- Not a text shaper. The TextShaper trait is pluggable; a real shaper
lands in oxideav-text (pending).
- Not an NLE UI. This crate is the data model + renderer core; the UI is
downstream.
- Not a document parser. PDF / SVG ingest land in oxideav-pdf /
oxideav-svg (both pending) and produce Scenes.
License
MIT — same as the rest of oxideav.