//! # wavekat-turn
//!
//! Unified turn detection with multiple backends.
//!
//! Provides a clean abstraction over turn-detection models that predict
//! whether a user has finished speaking. Two traits cover the two
//! fundamental input modalities:
//!
//! - [`AudioTurnDetector`] — operates on raw audio frames (e.g. Pipecat Smart Turn)
//! - [`TextTurnDetector`] — operates on ASR transcript text (e.g. LiveKit EOU)
//!
//! For most use cases, wrap a detector in [`TurnController`] to get
//! automatic state tracking and soft-reset logic for VAD integration.
//! See [`controller`] for details.
//!
//! # Feature flags
//!
//! | Feature | Backend | Input |
//! |---------|---------|-------|
//! | `pipecat` | Pipecat Smart Turn v3 (ONNX) | Audio (16 kHz) |
//! | `livekit` | LiveKit Turn Detector (ONNX) | Text |
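//!
//! # Example
//!
//! A hypothetical sketch of the intended flow. The detector constructor and
//! the controller method names below are assumptions for illustration, not a
//! confirmed API, so the example is marked `ignore`:
//!
//! ```ignore
//! // Assumed constructor: a Pipecat-backed audio detector loaded from ONNX.
//! let detector = PipecatDetector::new("smart-turn-v3.onnx")?;
//! let mut controller = TurnController::new(detector);
//!
//! // Feed every incoming 16 kHz audio chunk to the controller.
//! controller.push_audio(&frame);
//!
//! // When VAD reports "speech stopped", ask for a turn prediction.
//! if vad_stopped {
//!     let prediction = controller.predict()?;
//! }
//! ```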
// Module layout below is assumed from the doc references above; only
// `controller` is named explicitly in the module docs.
pub mod controller;
pub mod error;
pub mod types;

pub use controller::TurnController;
pub use error::TurnError;
pub use types::AudioFrame;
/// The predicted turn state.
/// Per-stage timing entry.
/// A turn detection prediction with confidence and timing metadata.
/// A single turn in the conversation, for context-aware text detectors.
/// Speaker role in a conversation turn.
/// Turn detector that operates on raw audio.
///
/// Implementations buffer audio internally and run prediction on demand.
///
/// **Most users should wrap this in [`TurnController`]** rather than calling
/// these methods directly. The controller tracks prediction state and provides
/// [`reset_if_finished`](TurnController::reset_if_finished) for correct
/// multi-utterance handling.
///
/// # Direct usage (advanced)
///
/// If you need full control over reset logic:
///
/// 1. **Every audio chunk** → [`push_audio`](AudioTurnDetector::push_audio)
/// 2. **VAD fires "speech stopped"** → [`predict`](AudioTurnDetector::predict)
/// 3. **New turn begins** → [`reset`](AudioTurnDetector::reset)
///
/// Note: calling `reset` unconditionally on every VAD speech-start will discard
/// audio context when the user pauses mid-sentence. See [`TurnController`] for
/// the recommended approach.
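///
/// A hypothetical sketch of the three-step loop above. The frame type, the
/// result handling, and the completion check are assumptions for
/// illustration, so the example is marked `ignore`:
///
/// ```ignore
/// // 1. Feed every audio chunk as it arrives.
/// detector.push_audio(&frame);
///
/// // 2. When VAD fires "speech stopped", run the model.
/// let prediction = detector.predict()?;
///
/// // 3. Only once the turn is genuinely over, discard buffered audio.
/// //    (Resetting on every mid-sentence pause loses context — see above.)
/// if turn_is_complete {
///     detector.reset();
/// }
/// ```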
/// Turn detector that operates on ASR transcript text.
///
/// Implementations receive the current (possibly partial) transcript
/// and optionally prior conversation turns for context.
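///
/// A hypothetical sketch of a call site. The method name `predict_text` and
/// its signature are assumptions for illustration, so the example is marked
/// `ignore`:
///
/// ```ignore
/// // Current (possibly partial) transcript plus prior turns for context.
/// let prediction = detector.predict_text("so what do you think", &prior_turns)?;
/// ```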