/* SPDX-License-Identifier: GPL-2.0 */
/* sched_ext exit-kind values mirrored from kernel/sched/ext_internal.h
* enum scx_exit_kind. The error-class kinds (>= SCX_EXIT_ERROR) are the
* values the probe filters on; mirrored here so probe.bpf.c can use
* named constants instead of magic numbers and so userspace can match
* the same wire values when consuming the .bss latch and ringbuf
* events. Values must stay in sync with the kernel enum.
*/
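/* A sketch of the mirrored constants. The numeric values below match the
 * upstream enum as of recent kernels, but the local names are illustrative
 * assumptions — re-check both against the target tree's ext_internal.h. */

```c
#include <assert.h>

/* Mirror of kernel enum scx_exit_kind (kernel/sched/ext_internal.h).
 * Error-class kinds start at 1024; the probe filters on >= SCX_EXIT_ERROR. */
enum ktstr_scx_exit_kind {
	KTSTR_SCX_EXIT_NONE        = 0,
	KTSTR_SCX_EXIT_DONE        = 1,
	KTSTR_SCX_EXIT_UNREG       = 64,   /* userspace-initiated unregistration */
	KTSTR_SCX_EXIT_UNREG_BPF   = 65,   /* BPF-initiated unregistration */
	KTSTR_SCX_EXIT_UNREG_KERN  = 66,   /* kernel-initiated unregistration */
	KTSTR_SCX_EXIT_SYSRQ       = 67,   /* requested via sysrq */
	KTSTR_SCX_EXIT_ERROR       = 1024, /* runtime error */
	KTSTR_SCX_EXIT_ERROR_BPF   = 1025, /* raised via scx_bpf_error() */
	KTSTR_SCX_EXIT_ERROR_STALL = 1026, /* watchdog-detected stall */
};
```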
/* Per-probe-hit captured data, stored in hash map keyed by (func_ip, task_ptr).
* Entry fields are written by fentry/kprobe at function entry.
* Exit fields are written in-place by fexit at function exit
* via bpf_map_lookup_elem on the same key. */
;
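/* A hypothetical sketch of the key/entry pair described above. The
 * (func_ip, task_ptr) key and the str_val/has_str/str_param_idx trio
 * (named identically in the trigger-event doc below) come from the
 * comments; every other field name and width is an assumption. */

```c
#include <assert.h>
#include <stdint.h>

/* Key for the per-probe-hit hash map: one slot per (function, task). */
struct probe_key {
	uint64_t func_ip;  /* fentry/kprobe target address */
	uint64_t task_ptr; /* task_struct pointer at entry */
};

/* Value: entry half written at function entry, exit half updated
 * in place by fexit via bpf_map_lookup_elem on the same key. */
struct probe_entry {
	uint64_t entry_ts;      /* assumption: timestamp at entry    */
	uint64_t exit_ts;       /* assumption: filled in by fexit    */
	char     str_val[64];   /* assumption: 64-byte string buffer */
	uint8_t  has_str;       /* 1 when str_val is populated       */
	uint8_t  str_param_idx; /* which param the string came from  */
};
```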
/* Field dereference spec: for a pointer param, read at base + offset.
* For chained pointer dereferences (e.g. ->cpus_ptr->bits[0]):
* ptr_offset != 0: first read a pointer at base + ptr_offset,
* then read size bytes at pointer + offset.
* ptr_offset == 0: single-level read at base + offset.
*/
;
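/* A hypothetical layout for the dereference spec. Field names mirror the
 * comment (ptr_offset, offset, size); the widths are assumptions. */

```c
#include <assert.h>
#include <stdint.h>

/* Field dereference spec for reading a value out of a pointer param. */
struct field_deref {
	uint32_t ptr_offset; /* != 0: first read a pointer at base + ptr_offset */
	uint32_t offset;     /* then read at (base or hopped pointer) + offset  */
	uint32_t size;       /* bytes to read (1, 2, 4, or 8)                   */
};
```

For the chained case in the comment (e.g. `->cpus_ptr->bits[0]`), `ptr_offset` would hold the offset of `cpus_ptr` within `task_struct`, `offset` would be 0, and `size` would be 8.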
/* Per-function metadata written by userspace before attachment. */
;
/* Event type for ring buffer. EVENT_TRIGGER is currently the only
* record type emitted on the `ktstr_events` ringbuf. EVENT_SCX_EVENT
* (value 3) was previously emitted by `tp_btf/sched_ext_event` but
* fired millions of times per second on a busy scheduler while the
* userspace consumer dropped every record on the floor — the
* handler and its enum value were removed. */
;
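/* A sketch of the event-type enum. EVENT_TRIGGER's numeric value is an
 * assumption; the comment only fixes that 3 belonged to the removed
 * EVENT_SCX_EVENT, so that value is kept reserved here. */

```c
#include <assert.h>

/* Record types on the `ktstr_events` ringbuf. */
enum ktstr_event_type {
	EVENT_TRIGGER = 1, /* assumption: exact value unverified */
	/* 3 = EVENT_SCX_EVENT, removed -- do not reuse */
};
```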
/* Timeline event types written into the dedicated `timeline_events`
* ringbuf by the sched_switch / sched_migrate_task / sched_wakeup
* tracepoint handlers. Drained only on test failure to give zero
* runtime cost on the success path (host side never wakes the
* consumer until the error latch fires). When the ringbuf fills
* before drain, `bpf_ringbuf_reserve` returns NULL and the BPF
* handler bumps the `KTSTR_PCPU_TIMELINE_DROPS` per-CPU slot in
* `ktstr_pcpu_counters` instead of submitting — dropping the
* newest event and surfacing the loss to userspace via the
* cross-CPU sum. */
/* Priority-inheritance boost / unboost event from
* `fentry/fexit` on `rt_mutex_setprio` (kernel/sched/core.c).
* Fires sparsely — only when a real-time mutex chain changes a
* task's effective priority. The probe pairs the entry-side
* snapshot (oldprio + prev_class kva) with the exit-side
* snapshot (newprio + next_class kva) via a per-task scratch
* map keyed by `p` (the task being boosted), then emits one
* timeline record carrying the prio pair.
*
* Field semantics by `type`:
* TL_EVT_PI_BOOST:
* prev_pid = `current->pid` (probe-context tid)
* next_pid = `p->pid` (the boosted task's pid)
* a = `oldprio` (s32 widened to u64; kernel's
* `int` priority on `task_struct.prio`)
* b = `newprio` (s32 widened to u64)
*
* Class transitions are tracked separately — the
* pi_class_changes counter (`KTSTR_PCPU_PI_CLASS_CHANGE_COUNT`
* slot in `ktstr_pcpu_counters`) increments whenever fexit
* observes `next_class != prev_class`, surfacing the class-flip
* count without bloating the per-event wire shape. A future
* expansion can emit a dedicated TL_EVT_CLASS_TRANSITION record
* once the host-side renderer needs per-event class tracking.
*/
/* Lock contention begin event from the `lock:contention_begin`
 * tracepoint (always available, no `CONFIG_LOCK_STAT` gate — the
 * locking primitives themselves, kernel/locking/mutex.c, rwsem.c,
 * qspinlock.c and friends, emit it unconditionally on any
 * waiter-side contention path). Fires whenever a task waits on
* a contended mutex / rwsem / spinlock; gives the host-side
* timeline a per-lock contention sequence to correlate with
* scheduling stalls.
*
* Field semantics by `type`:
* TL_EVT_LOCK_CONTEND:
* prev_pid = `current->pid`
* next_pid = 0 (unused)
* a = lock pointer (kernel virtual address)
* b = lock flags (u32 from the tracepoint;
* LCB_* class bits — F_SPIN, F_READ,
* F_WRITE, F_RT — see
* include/trace/events/lock.h).
*/
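/* The timeline event types collected from the per-type docs above. The
 * names come from the comments; the numeric values are assumptions. */

```c
#include <assert.h>

/* Record types on the dedicated `timeline_events` ringbuf. */
enum tl_evt_type {
	TL_EVT_SWITCH = 0,   /* sched_switch                   */
	TL_EVT_MIGRATE,      /* sched_migrate_task             */
	TL_EVT_WAKEUP,       /* sched_wakeup                   */
	TL_EVT_PI_BOOST,     /* rt_mutex_setprio fentry/fexit  */
	TL_EVT_LOCK_CONTEND, /* lock:contention_begin          */
};
```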
/* Ring-buffer record written by the sched_* tracepoint handlers
* into the dedicated `timeline_events` ringbuf. Compact (40 bytes)
* so the fixed-size ring holds a useful window of events.
*
* Field semantics by `type`:
* TL_EVT_SWITCH:
* prev_pid = `prev->pid`
* next_pid = `next->pid`
* a = `prev_state` (raw `__state` bitfield)
* b = `preempt` (0/1)
* TL_EVT_MIGRATE:
* prev_pid = `p->pid`
* next_pid = 0 (unused)
* a = `dest_cpu`
* b = `task_cpu(p)` (orig_cpu, BTF-read)
* TL_EVT_WAKEUP:
* prev_pid = `p->pid`
* next_pid = 0 (unused)
* a = `task_cpu(p)` (target CPU at wakeup)
* b = 0 (unused)
*
* `cpu` is the host CPU the tracepoint fired on (`bpf_get_smp_processor_id()`).
*
* Note on type: the kernel tp_btf signature for sched_switch declares
* `prev_state` as `unsigned int`; we widen to `u64` here uniformly so
* every variant uses the same `a`/`b` slots regardless of source
* arity. `dest_cpu` / `task_cpu` are `int` in the kernel but always
* fit in u64. */
;
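/* A sketch of the 40-byte record. The size and the names type, cpu,
 * prev_pid, next_pid, a, b come from the comment; the ts field and the
 * exact ordering are assumptions, chosen so the struct packs to exactly
 * 40 bytes with no padding on LP64. */

```c
#include <assert.h>
#include <stdint.h>

/* One timeline record; meaning of prev_pid/next_pid/a/b depends on type. */
struct timeline_event {
	uint32_t type;     /* enum tl_evt_* value                */
	uint32_t cpu;      /* bpf_get_smp_processor_id() at fire */
	uint32_t prev_pid; /* per-type (see field semantics doc) */
	uint32_t next_pid; /* 0 when unused                      */
	uint64_t ts;       /* assumption: bpf_ktime_get_ns()     */
	uint64_t a;        /* per-type slot, widened to u64      */
	uint64_t b;        /* per-type slot, widened to u64      */
};
```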
/* Ring buffer event sent from BPF to userspace on trigger.
*
* For EVENT_TRIGGER:
* args[0] = causal task pointer when the kind is unambiguously
* caused by the currently-running task
* (`SCX_EXIT_ERROR_BPF`), else `0`. Userspace drops
* events with `args[0] == 0` to suppress noise from
* non-causal exit contexts (e.g. kworker-driven
* `SCX_EXIT_ERROR`).
* args[1] = exit kind (scx_exit_kind enum value).
*
* `str_val`/`has_str`/`str_param_idx` are kept in the wire layout
* for ABI stability with `struct probe_entry` (the kprobe-side hash
* map uses an identically-named trio); EVENT_TRIGGER leaves all
* three zeroed. The dedicated EVENT_SCX_EVENT producer that
* populated them was removed (see the enum doc above).
*/
;
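/* A hypothetical wire shape for the trigger event. args[0]/args[1] and
 * the str_val/has_str/str_param_idx trio come straight from the comment;
 * the type field, the str_val length, and the ordering are assumptions. */

```c
#include <assert.h>
#include <stdint.h>

/* Ringbuf record sent to userspace on trigger. */
struct ktstr_event {
	uint32_t type;          /* EVENT_TRIGGER                              */
	uint64_t args[2];       /* [0] causal task ptr or 0, [1] scx_exit_kind */
	char     str_val[64];   /* assumption: 64 bytes; zeroed for TRIGGER   */
	uint8_t  has_str;       /* zeroed for EVENT_TRIGGER                   */
	uint8_t  str_param_idx; /* zeroed for EVENT_TRIGGER                   */
};
```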
/* __KTSTR_INTF_H */