Module captioning

Three-level hierarchical caption generation pipeline.

SensorLM’s key insight is that paired (sensor, text) training data can be generated automatically from unlabelled wearable recordings, eliminating the need for human annotation at scale.

§Caption levels

Level	Module	Description	Token budget
1 – Statistical	`statistical`	Mean/max/min/std per channel	512
2 – Structural	`structural`	Trends & anomaly events	512
3 – Semantic	`semantic`	Activities, sleep, mood	256–1024

§Combination keys

The training pipeline selects one of eight caption variants for each batch:

low_level_caption             → level 1 only
middle_level_caption          → level 2 only
high_level_summary_caption    → level 3 only (short)
high_level_all_caption        → level 3 only (full)
middle_low_level_caption      → levels 2 + 1
high_low_level_caption        → levels 3 + 1
high_middle_level_caption     → levels 3 + 2
high_middle_low_level_caption → levels 3 + 2 + 1
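
The mapping above can be sketched as a small enum. This is an illustration only: the variant names, the `ALL` constant, and the `as_str` and `levels` methods are assumptions made for this example, and the crate's actual `CaptionKey` type may be shaped differently. Only the eight key strings and their level combinations come from the list above.

```rust
/// Hypothetical mirror of the eight combination keys (variant names are assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CaptionKey {
    LowLevel,
    MiddleLevel,
    HighLevelSummary,
    HighLevelAll,
    MiddleLow,
    HighLow,
    HighMiddle,
    HighMiddleLow,
}

impl CaptionKey {
    /// Every key, in the order the documentation lists them.
    pub const ALL: [CaptionKey; 8] = [
        CaptionKey::LowLevel,
        CaptionKey::MiddleLevel,
        CaptionKey::HighLevelSummary,
        CaptionKey::HighLevelAll,
        CaptionKey::MiddleLow,
        CaptionKey::HighLow,
        CaptionKey::HighMiddle,
        CaptionKey::HighMiddleLow,
    ];

    /// The key string as written in the list above.
    pub fn as_str(self) -> &'static str {
        match self {
            CaptionKey::LowLevel => "low_level_caption",
            CaptionKey::MiddleLevel => "middle_level_caption",
            CaptionKey::HighLevelSummary => "high_level_summary_caption",
            CaptionKey::HighLevelAll => "high_level_all_caption",
            CaptionKey::MiddleLow => "middle_low_level_caption",
            CaptionKey::HighLow => "high_low_level_caption",
            CaptionKey::HighMiddle => "high_middle_level_caption",
            CaptionKey::HighMiddleLow => "high_middle_low_level_caption",
        }
    }

    /// Caption levels contributing to this key, highest level first,
    /// matching the order shown in the list (e.g. "levels 3 + 2 + 1").
    pub fn levels(self) -> &'static [u8] {
        match self {
            CaptionKey::LowLevel => &[1],
            CaptionKey::MiddleLevel => &[2],
            CaptionKey::HighLevelSummary | CaptionKey::HighLevelAll => &[3],
            CaptionKey::MiddleLow => &[2, 1],
            CaptionKey::HighLow => &[3, 1],
            CaptionKey::HighMiddle => &[3, 2],
            CaptionKey::HighMiddleLow => &[3, 2, 1],
        }
    }
}
```

Note that the short and full level-3 variants share the same level set in this sketch; how the real pipeline distinguishes them is not shown on this page.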

Modules§

semantic: Level-3 (semantic) caption generation.
statistical: Level-1 (statistical) caption generation.
structural: Level-2 (structural) caption generation.
templates: Text templates for all three captioning levels.

Structs§

CaptionContext: All contextual information needed to produce a full multi-level caption.

Functions§

generate_caption: Generate the caption text for the requested CaptionKey.
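
As a rough illustration of what combining levels might involve, the sketch below joins pre-rendered per-level texts in a requested order. It is not the signature of `generate_caption`, which this page does not show. `CaptionParts`, its fields, the `assemble` function, and the single-space separator are all hypothetical stand-ins chosen for this example.

```rust
/// Hypothetical stand-in for the per-level text a caption is built from.
/// The real `CaptionContext` fields are not documented on this page.
pub struct CaptionParts {
    pub statistical: String, // level 1
    pub structural: String,  // level 2
    pub semantic: String,    // level 3
}

/// Join the requested levels in the given order with a single space
/// (the separator is an assumption). Returns `None` for an unknown level.
pub fn assemble(parts: &CaptionParts, levels: &[u8]) -> Option<String> {
    let mut out: Vec<&str> = Vec::with_capacity(levels.len());
    for &level in levels {
        let text = match level {
            1 => parts.statistical.as_str(),
            2 => parts.structural.as_str(),
            3 => parts.semantic.as_str(),
            _ => return None,
        };
        out.push(text);
    }
    Some(out.join(" "))
}
```

Under these assumptions, calling `assemble(&parts, &[3, 1])` would correspond to the `high_low_level_caption` key: the semantic text followed by the statistical text.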
