//! Chunking abstraction for long-form audio inference.
//!
//! `Splitter` is the pluggable boundary between a raw waveform and the
//! encoder-bounded chunks an ASR inference loop consumes. The trait and the
//! built-in [`FixedLengthSplitter`] live here (audio-level preprocessing,
//! same module family as the mel front-end); VAD-driven implementations
//! ship next to their VAD model — see
//! [`SileroVadSplitter`](crate::silero_vad::SileroVadSplitter).
//!
//! [`Transcriber`](crate::gigaam::Transcriber) is generic over the splitter;
//! `S::Error` flows through `TranscribeError<E>` the same way
//! `RnntDecodeError<JitError>` carries the step-backend error.
//!
//! Users with pre-segmented audio (pyannote, manual cuts) can skip the
//! splitter entirely via
//! [`Transcriber::transcribe_chunks`](crate::gigaam::Transcriber::transcribe_chunks).
pub use AudioChunk;
/// Encoder-derived bounds passed to [`Splitter::split`].
///
/// Carries the model-config primitives (not derived seconds counts) so
/// splitters can reason in whichever unit fits them. Helpers
/// [`max_samples`](Self::max_samples), [`align_to_samples`](Self::align_to_samples),
/// and [`encoder_capacity_secs`](Self::encoder_capacity_secs) cover the common
/// derivations.
///
/// The `2 * subsampling_factor` headroom that
/// [`encoder_capacity_secs`](Self::encoder_capacity_secs) subtracts mirrors the
/// JIT prepare loop's `subs_output_length` margin — a chunk filling
/// `max_samples()` is guaranteed to fit through the subsampling stack without
/// padding overflow.
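// Illustrative sketch only, not the crate's definition: one way the helper
// derivations described above could fit together. The struct name, every
// field, and the exact formulas are assumptions made for this example; only
// the three helper names and the `2 * subsampling_factor` headroom come from
// the docs.
#[derive(Debug, Clone, Copy)]
pub struct BoundsSketch {
    /// Sample rate in Hz (assumed field).
    pub sample_rate: usize,
    /// Samples per mel frame (assumed field).
    pub hop_length: usize,
    /// Mel frames per encoder output frame (assumed field).
    pub subsampling_factor: usize,
    /// Longest mel sequence the encoder accepts, in frames (assumed field).
    pub max_mel_frames: usize,
}

impl BoundsSketch {
    /// Mel frames left after reserving the `2 * subsampling_factor` headroom.
    fn usable_mel_frames(&self) -> usize {
        self.max_mel_frames
            .saturating_sub(2 * self.subsampling_factor)
    }

    /// Largest chunk, in samples, that still fits the encoder.
    pub fn max_samples(&self) -> usize {
        self.usable_mel_frames() * self.hop_length
    }

    /// Samples covered by one encoder output frame.
    pub fn align_to_samples(&self) -> usize {
        self.hop_length * self.subsampling_factor
    }

    /// The same capacity as `max_samples`, expressed in seconds.
    pub fn encoder_capacity_secs(&self) -> f64 {
        self.max_samples() as f64 / self.sample_rate as f64
    }
}
// With 16 kHz audio, a 160-sample hop, 4x subsampling and a 3008-frame cap,
// this sketch yields 480_000 samples (30 s) per chunk and a 640-sample stride.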
/// Chunking strategy: turn a waveform into encoder-bounded `AudioChunk`s.
///
/// `split` is called once per
/// [`Transcriber::transcribe`](crate::gigaam::Transcriber::transcribe) call.
/// Implementations may keep mutable state across calls (the Silero VAD JIT
/// carries LSTM state internally), hence `&mut self`. Chunks must satisfy
/// `end_sample <= waveform.len()` and `end_sample - start_sample <=
/// bounds.max_samples()`; alignment is a soft preference (floor-division on
/// `hop_length` tolerates misaligned boundaries).
///
/// Splitters that align chunk ends to encoder strides
/// ([`align_to_samples`](EncoderBounds::align_to_samples)) can round the
/// trailing chunk past `waveform.len()`. The contract still requires
/// in-range chunks — use [`trim_chunks_to_waveform`] to clean up the tail
/// before returning.
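// Illustrative sketch only: a checker for the two hard requirements stated
// above (`end_sample <= waveform.len()` and chunk length within
// `max_samples`). `check_chunk_contract` and its `(start_sample, end_sample)`
// tuples are hypothetical stand-ins, not part of this crate's API. The
// `start <= end` check is an extra assumption added so the length
// subtraction cannot underflow.
pub fn check_chunk_contract(
    chunks: &[(usize, usize)],
    waveform_len: usize,
    max_samples: usize,
) -> Result<(), String> {
    for (i, &(start, end)) in chunks.iter().enumerate() {
        if start > end {
            return Err(format!("chunk {}: start {} is past end {}", i, start, end));
        }
        if end > waveform_len {
            return Err(format!(
                "chunk {}: end {} exceeds waveform length {}",
                i, end, waveform_len
            ));
        }
        if end - start > max_samples {
            return Err(format!(
                "chunk {}: length {} exceeds max_samples {}",
                i,
                end - start,
                max_samples
            ));
        }
    }
    Ok(())
}
// A check like this could run in a splitter's unit tests, after the tail has
// been trimmed back into range.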
/// Trim `chunks` so every entry stays within `0..waveform_len`: drop chunks
/// starting at or past `waveform_len`, then clamp the trailing chunk's
/// `end_sample` down to `waveform_len`. Intended for [`Splitter`]
/// implementations whose stride alignment can push the last chunk past the
/// waveform end (see the trait docs).
///
/// Assumes `chunks` is in increasing `start_sample` order — the
/// `Splitter::split` contract — so a single pass from the back is enough.
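// Illustrative sketch only: the back-to-front trim described above, written
// against a hypothetical `SpanSketch` type instead of the crate's
// `AudioChunk`. The in-place `&mut Vec` signature is an assumption; the real
// function's signature is not shown in this file.
pub fn trim_spans_to_len(chunks: &mut Vec<SpanSketch>, waveform_len: usize) {
    // Chunks are sorted by `start_sample`, so any out-of-range chunk sits at
    // the back: pop until the last chunk starts inside the waveform.
    while chunks
        .last()
        .map_or(false, |c| c.start_sample >= waveform_len)
    {
        chunks.pop();
    }
    // Clamp the surviving trailing chunk down to the waveform end.
    if let Some(last) = chunks.last_mut() {
        last.end_sample = last.end_sample.min(waveform_len);
    }
}

/// Hypothetical stand-in for a chunk: a half-open sample range.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SpanSketch {
    pub start_sample: usize,
    pub end_sample: usize,
}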
/// No-VAD splitter: walks the waveform in `bounds.max_samples()`-sized
/// strides, aligning non-final chunks to `bounds.align_to_samples()`.
///
/// Zero model load. Suitable when the caller already segmented the input,
/// for short utterances that fit a single chunk, or for tests. Fixed-offset
/// cuts ignore speech boundaries, so transcription quality degrades at chunk
/// seams; for production long-form ASR prefer
/// [`SileroVadSplitter`](crate::silero_vad::SileroVadSplitter).
///
/// `max_samples` is not guaranteed to be a multiple of `align_to_samples`.
/// The final aligned chunk's length is
/// `floor(len_remaining / align_to_samples) * align_to_samples`, so its mel
/// length stays an integer multiple of `subsampling_factor`. The very last
/// chunk keeps its unaligned tail; the JIT pads it.
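// Illustrative sketch only: a simplified fixed-length walk over sample
// offsets. `fixed_length_ranges` is a hypothetical free function returning
// `(start_sample, end_sample)` pairs; it is not the crate's
// `FixedLengthSplitter`. It assumes every non-final chunk uses the largest
// multiple of `align_to_samples` that fits in `max_samples`, and that the
// last chunk takes whatever remains, unaligned.
pub fn fixed_length_ranges(
    waveform_len: usize,
    max_samples: usize,
    align_to_samples: usize,
) -> Vec<(usize, usize)> {
    let mut ranges = Vec::new();
    if max_samples == 0 {
        return ranges;
    }
    // Round the cap down to the alignment; fall back to the raw cap when the
    // alignment is zero or larger than the cap.
    let aligned = if align_to_samples == 0 {
        0
    } else {
        (max_samples / align_to_samples) * align_to_samples
    };
    let stride = if aligned == 0 { max_samples } else { aligned };

    let mut start = 0;
    // Emit aligned chunks while more than one chunk's worth remains.
    while waveform_len - start > max_samples {
        ranges.push((start, start + stride));
        start += stride;
    }
    // The final chunk keeps its unaligned tail.
    if start < waveform_len {
        ranges.push((start, waveform_len));
    }
    ranges
}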