Module captioning

Three-level hierarchical caption generation pipeline.

SensorLM’s key insight is that paired (sensor, text) training data can be generated automatically from unlabelled wearable recordings, eliminating the need for human annotation at scale.

§Caption levels

Level            Module       Description                    Token budget
1 – Statistical  statistical  Mean/max/min/std per channel   512
2 – Structural   structural   Trends & anomaly events        512
3 – Semantic     semantic     Activities, sleep, mood        256–1024
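
As a rough illustration of the table above, the three levels and their token budgets could be modelled as a small enum. This is a minimal sketch, not the crate's actual API: the `CaptionLevel` type and its methods are hypothetical names, and reading `256–1024` as an inclusive (min, max) range is an assumption.

```rust
/// Hypothetical model of the three caption levels (not the crate's real type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CaptionLevel {
    Statistical = 1,
    Structural = 2,
    Semantic = 3,
}

impl CaptionLevel {
    /// Name of the submodule that generates captions for this level.
    pub fn module_name(self) -> &'static str {
        match self {
            CaptionLevel::Statistical => "statistical",
            CaptionLevel::Structural => "structural",
            CaptionLevel::Semantic => "semantic",
        }
    }

    /// Token budget as an inclusive (min, max) pair.
    /// Assumption: a single value in the table means min == max.
    pub fn token_budget(self) -> (usize, usize) {
        match self {
            CaptionLevel::Statistical | CaptionLevel::Structural => (512, 512),
            CaptionLevel::Semantic => (256, 1024),
        }
    }
}
```

Under these assumptions, `CaptionLevel::Semantic.token_budget()` yields `(256, 1024)`, while the two lower levels have a fixed budget of 512 tokens.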

§Combination keys

The training pipeline selects one of eight caption variants for each batch:

low_level_caption             → level 1 only
middle_level_caption          → level 2 only
high_level_summary_caption    → level 3 only (short)
high_level_all_caption        → level 3 (full)
middle_low_level_caption      → levels 2 + 1
high_low_level_caption        → levels 3 + 1
high_middle_level_caption     → levels 3 + 2
high_middle_low_level_caption → levels 3 + 2 + 1
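
The mapping above can be sketched as an enum with one variant per key. The type name `CaptionKey` appears in this module's documentation, but the variant names, `as_str`, `levels`, and `from_key` below are hypothetical and chosen only for illustration. The sketch also assumes that the order in which levels are listed (for example `3 + 2 + 1`) is the order in which they are combined.

```rust
/// Illustrative stand-in for the eight combination keys.
/// Variant and method names are assumptions, not the crate's real API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CaptionKey {
    LowLevel,
    MiddleLevel,
    HighLevelSummary,
    HighLevelAll,
    MiddleLow,
    HighLow,
    HighMiddle,
    HighMiddleLow,
}

impl CaptionKey {
    /// All eight variants, in the order listed in the documentation.
    pub const ALL: [CaptionKey; 8] = [
        CaptionKey::LowLevel,
        CaptionKey::MiddleLevel,
        CaptionKey::HighLevelSummary,
        CaptionKey::HighLevelAll,
        CaptionKey::MiddleLow,
        CaptionKey::HighLow,
        CaptionKey::HighMiddle,
        CaptionKey::HighMiddleLow,
    ];

    /// The string key as written in the documentation.
    pub fn as_str(self) -> &'static str {
        match self {
            CaptionKey::LowLevel => "low_level_caption",
            CaptionKey::MiddleLevel => "middle_level_caption",
            CaptionKey::HighLevelSummary => "high_level_summary_caption",
            CaptionKey::HighLevelAll => "high_level_all_caption",
            CaptionKey::MiddleLow => "middle_low_level_caption",
            CaptionKey::HighLow => "high_low_level_caption",
            CaptionKey::HighMiddle => "high_middle_level_caption",
            CaptionKey::HighMiddleLow => "high_middle_low_level_caption",
        }
    }

    /// Caption levels included by this key, in the listed order.
    /// Both level-3 variants (short and full) map to `[3]`.
    pub fn levels(self) -> &'static [u8] {
        match self {
            CaptionKey::LowLevel => &[1],
            CaptionKey::MiddleLevel => &[2],
            CaptionKey::HighLevelSummary | CaptionKey::HighLevelAll => &[3],
            CaptionKey::MiddleLow => &[2, 1],
            CaptionKey::HighLow => &[3, 1],
            CaptionKey::HighMiddle => &[3, 2],
            CaptionKey::HighMiddleLow => &[3, 2, 1],
        }
    }

    /// Look up a variant by its string key; returns `None` for unknown keys.
    pub fn from_key(key: &str) -> Option<CaptionKey> {
        Self::ALL.iter().copied().find(|k| k.as_str() == key)
    }
}
```

With this sketch, `CaptionKey::from_key("high_low_level_caption")` returns `Some(CaptionKey::HighLow)`, whose `levels()` are `[3, 1]`. An enum keeps the set of keys closed, so a `match` over it is checked for exhaustiveness at compile time.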

§Modules

semantic
Level-3 (semantic) caption generation.
statistical
Level-1 (statistical) caption generation.
structural
Level-2 (structural) caption generation.
templates
Text templates for all three captioning levels.

§Structs

CaptionContext
All contextual information needed to produce a full multi-level caption.

§Functions

generate_caption
Generate the caption text for the requested CaptionKey.