cpu_timer/
lib.rs

//a Documentation
//! This library provides architecture- and implementation-specific CPU
//! counters for high-precision timing, backed by a std::time
//! implementation where an architecture has no explicit CPU support.
//!
//! The timers are really CPU tick counters, and so are not resilient
//! to threads being descheduled or being moved between CPU cores; the
//! library is designed for precise timing of short code sections
//! where the constraints are understood. Furthermore, the timer
//! values are not in seconds but in arbitrary units:
//! useful for comparing execution of different parts of code, but
//! requiring another mechanism to determine the mapping from ticks to
//! seconds.
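//!
//! One possible way to obtain that mapping is to count ticks over a
//! known wall-clock interval. The sketch below is illustrative only and
//! is not part of this library: `ticks_per_second`, `ticks_to_nanos` and
//! the `read_ticks` closure are hypothetical names, and any monotonic
//! tick source can be passed in.
//!
//! ```rust
//! use std::time::{Duration, Instant};
//!
//! /// Estimate how many ticks of `read_ticks` elapse per second by
//! /// spinning for at least `window` of wall-clock time.
//! /// (Hypothetical helper, not provided by the library.)
//! fn ticks_per_second(read_ticks: impl Fn() -> u64, window: Duration) -> f64 {
//!     let wall_start = Instant::now();
//!     let tick_start = read_ticks();
//!     // Busy-wait until the requested window has passed
//!     while wall_start.elapsed() < window {
//!         std::hint::spin_loop();
//!     }
//!     let tick_end = read_ticks();
//!     let secs = wall_start.elapsed().as_secs_f64();
//!     tick_end.wrapping_sub(tick_start) as f64 / secs
//! }
//!
//! /// Convert a tick delta to nanoseconds using a measured rate.
//! fn ticks_to_nanos(ticks: u64, ticks_per_sec: f64) -> f64 {
//!     ticks as f64 * 1e9 / ticks_per_sec
//! }
//! ```
//!
//! A longer window gives a steadier estimate; the result is only
//! meaningful while the thread stays on a core with the same tick rate.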
//!
//! # Precision
//!
//! For some architectures a real CPU assembly instruction is used to get
//! the tick count. For x86_64 this returns (in an unvirtualized
//! world) the real CPU tick counter, with a fine precision. For
//! AArch64 on macOS this is no more precise than using std::time, and has a
//! precision of about 40 ticks. However, the asm implementation has a
//! lower overhead on AArch64 on macOS, so it is still worth using.
//!
//! The library does not attempt to take into account any overheads of
//! using the timers; that is left to the user. Normally the overheads
//! will be small compared to the times being measured.
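//!
//! A user who does need to account for the overhead could estimate it
//! from back-to-back clock reads. The sketch below assumes std::time as
//! the clock; `read_overhead_nanos` and `correct_for_overhead` are
//! hypothetical helpers, not something this library provides.
//!
//! ```rust
//! use std::time::Instant;
//!
//! /// Smallest observed cost, in nanoseconds, of reading the clock twice
//! /// in a row with nothing in between. Returns 0 if `samples` is 0.
//! fn read_overhead_nanos(samples: usize) -> u64 {
//!     (0..samples)
//!         .map(|_| {
//!             let start = Instant::now();
//!             start.elapsed().as_nanos() as u64
//!         })
//!         .min()
//!         .unwrap_or(0)
//! }
//!
//! /// Subtract the overhead from a raw measurement, never going below zero.
//! fn correct_for_overhead(raw: u64, overhead: u64) -> u64 {
//!     raw.saturating_sub(overhead)
//! }
//! ```
//!
//! Taking the minimum rather than the mean keeps scheduling outliers
//! out of the overhead estimate.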
//!
//! # CPU support (for non-experimental rustc target architectures)
//!
//! For the stable rustc-supported architectures, CPU implementations
//! are provided for:
//!
//! - [ ] x86
//! - [x] x86_64
//! - [x] aarch64
//! - [ ] wasm32
//!
//! Unsupported architectures fall back to the [std::time::Instant]
//! `now` method instead (which can be perfectly adequate).
//!
//! # Types
//!
//! The types in the library are all generic on *UseAsm*, a bool that
//! selects whether the CPU architecture-specific version (if provided)
//! of the timer should be used, or whether std::time should be used
//! instead. For architectures without a CPU implementation, the
//! std::time version is used whatever the value of the generic.
//!
//! ## Timer
//!
//! The base type provided by this library is [Timer], which simply
//! has a `start` method and an `elapsed` method, to deliver the ticks
//! (as a u64) since the last `start`. It uses a generic *UseAsm* bool;
//! if true then the CPU-specific timer implementation is used,
//! otherwise it uses std::time.
//!
//! There is an additional method `elapsed_and_update`, which restarts
//! the timer as well as returning the elapsed time, in a single
//! operation.
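//!
//! The start / elapsed / restart behaviour described above can be
//! pictured with a small stand-in built on std::time. This is only a
//! sketch of the semantics, assuming nanoseconds as the unit;
//! `SketchTimer` is a hypothetical type and is not the library's
//! [Timer] implementation.
//!
//! ```rust
//! use std::time::Instant;
//!
//! /// Illustrative stand-in for a start/elapsed timer.
//! struct SketchTimer {
//!     start: Instant,
//! }
//!
//! impl SketchTimer {
//!     fn new() -> Self {
//!         Self { start: Instant::now() }
//!     }
//!
//!     /// Record the current time as the new start point.
//!     fn start(&mut self) {
//!         self.start = Instant::now();
//!     }
//!
//!     /// Nanoseconds since the last start.
//!     fn elapsed(&self) -> u64 {
//!         self.start.elapsed().as_nanos() as u64
//!     }
//!
//!     /// Return the nanoseconds since the last start and restart from
//!     /// the same clock reading, so one read serves both purposes.
//!     fn elapsed_and_update(&mut self) -> u64 {
//!         let now = Instant::now();
//!         let delta = now.duration_since(self.start).as_nanos() as u64;
//!         self.start = now;
//!         delta
//!     }
//! }
//! ```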
//!
//! ## DeltaTimer
//!
//! The [DeltaTimer] allows for *recording* the delta in CPU ticks
//! between the entry to a region of code and the exit from it. It
//! uses a generic *UseAsm* bool.
//!
//! ```
//! # use cpu_timer::DeltaTimer;
//! let mut t = DeltaTimer::<true>::default();
//! t.start();
//! // do something! - timed using CPU ticks
//! t.stop();
//! println!("That took {} cpu 'ticks'", t.value());
//!
//! let mut t = DeltaTimer::<false>::default();
//! t.start();
//! // do something! - timed using std::time
//! t.stop();
//! println!("That took {} nanoseconds", t.value());
//! ```
//!
//! ## AccTimer
//!
//! Frequently one will want to repeatedly time a piece of code, to
//! obtain an average, or to just accumulate the time taken in some
//! code whenever it is called to determine if it is a 'hotspot'. The
//! [AccTimer] accumulates the time delta between start and stop.
//!
//! ```
//! # use cpu_timer::AccTimer;
//! let mut t = AccTimer::<true>::default();
//! for i in 0..100 {
//!     t.start();
//!     // do something!
//!     t.stop();
//!     println!("Iteration {i} took {} ticks", t.last_delta());
//! }
//! println!("That took an average of {} ticks", t.acc_value()/100);
//! ```
//!
//! ## AccArray
//!
//! An [AccArray] is used to accumulate timer values, storing not just
//! the times but also (optionally) the number of occurrences.
//!
//! It is used as `AccArray<A, T, C, N>`; A is a bool; T the time
//! accumulator type; C the counter type; N the number of accumulators.
//!
//!  * A is true if the CPU-specific timer should be used, false if
//!    std::time should be used
//!
//!  * T is the type used for accumulating time deltas (u8, u16, u32,
//!    u64, u128, usize, f32, f64, or () to not accumulate times)
//!
//!  * C is the type used for counting occurrences (u8, u16, u32,
//!    u64, u128, usize, f32, f64, or () to not count occurrences)
//!
//!  * N can be any usize; the space for the accumulators and
//!    counters is statically held within the type, so *N* affects
//!    the size of the AccArray
//!
//! The array can be cleared, which resets the accumulators.
//!
//! A typical use is to first invoke `start` and then later `acc_n` with a
//! specific index which identifies the code just executed; the time
//! elapsed since the last start is accumulated and the occurrences
//! counted.
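//!
//! The idea of statically sized accumulators indexed by code path can
//! be sketched with a const generic and std::time. Everything here is an
//! assumption made for illustration: `SketchAcc` is a hypothetical type
//! with fixed u64 accumulators and u32 counters, and it does not
//! reproduce the library's generic parameters or its exact behaviour.
//!
//! ```rust
//! use std::time::Instant;
//!
//! /// Illustrative stand-in: N time accumulators and N occurrence
//! /// counters stored inline in the type.
//! struct SketchAcc<const N: usize> {
//!     start: Instant,
//!     acc: [u64; N],
//!     cnt: [u32; N],
//! }
//!
//! impl<const N: usize> SketchAcc<N> {
//!     fn new() -> Self {
//!         Self { start: Instant::now(), acc: [0; N], cnt: [0; N] }
//!     }
//!
//!     /// Record the current time as the start point.
//!     fn start(&mut self) {
//!         self.start = Instant::now();
//!     }
//!
//!     /// Add the nanoseconds since the last start to slot `index`
//!     /// and count one occurrence for that slot.
//!     fn acc_n(&mut self, index: usize) {
//!         self.acc[index] += self.start.elapsed().as_nanos() as u64;
//!         self.cnt[index] += 1;
//!     }
//!
//!     /// Reset all accumulators and counters to zero.
//!     fn clear(&mut self) {
//!         self.acc = [0; N];
//!         self.cnt = [0; N];
//!     }
//! }
//! ```
//!
//! Because the arrays live inside the struct, a larger *N* gives a
//! larger value, which is the size trade-off noted for *N* above.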
//!
//! ## AccVec
//!
//! An [AccVec] is a less static version of [AccArray], using storage
//! backed by a `Vec`. It has the same methods, plus additional
//! `push`-related methods.
//!
//! ## Trace
//!
//! The [Trace] type supports tracing the execution path through some
//! logic, recording deltas along the way.
//!
//! ```
//! # use cpu_timer::Trace;
//! let mut t = Trace::<true, u32, 3>::default();
//! t.start();
//!   // do something!
//! t.next();
//!   // do something else!
//! t.next();
//!   // do something else!
//! t.next();
//! println!("The three steps took {:?} ticks", t.trace());
//! ```
//!
//! The trace will have three entries, which are the delta times for
//! the three operations.
//!
//! ## AccTrace
//!
//! The [AccTrace] accumulates a number of iterations of a Trace:
//!
//! ```
//! # use cpu_timer::AccTrace;
//! struct MyThing {
//!     // things ...
//!     /// For timing (perhaps only if #[cfg(debug_assertions)] )
//!     acc: AccTrace<true, u32, 4>,
//! }
//!
//! impl MyThing {
//!     fn do_something_complex(&mut self) {
//!         self.acc.start();
//!         // .. do first complex thing
//!         self.acc.next();
//!         // .. do second complex thing
//!         self.acc.next();
//!         // .. do third complex thing
//!         self.acc.next();
//!         // .. do fourth complex thing
//!         self.acc.next();
//!         self.acc.acc();
//!     }
//! }
//!
//! let mut t = MyThing { // ..
//!     acc: AccTrace::<true, u32, 4>::default()
//! };
//! for _ in 0..100 {
//!     t.do_something_complex();
//! }
//! println!("After 100 iterations the accumulated times for the four steps are {:?} ticks", t.acc.acc_trace());
//! t.acc.clear();
//! // ready to be complex all over again
//! ```
//!
//! The trace will have four entries, which are the accumulated delta times for
//! the four complex things.
//!
//! # OS-specific notes
//!
//! These outputs are generated from tests/cpu_timer.rs, test_timer_values.
//!
//! The tables have a rough granularity matching the precision of the
//! tick counter. The average time taken is calculated using the fastest
//! 95% of 10,000 calls; the slowest 5% are treated as outliers and ignored.
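//!
//! The averaging described above could be computed along these lines.
//! This is a sketch under assumptions, not the code used by the test:
//! `fastest_mean` and `percentile` are hypothetical names, and the
//! percentile uses the nearest-rank convention.
//!
//! ```rust
//! /// Mean of the fastest `keep_percent` percent of `samples`.
//! /// Sorts the slice in place; returns None if nothing is kept.
//! fn fastest_mean(samples: &mut [u64], keep_percent: usize) -> Option<u64> {
//!     samples.sort_unstable();
//!     let keep = samples.len() * keep_percent.min(100) / 100;
//!     if keep == 0 {
//!         return None;
//!     }
//!     // Sum in u128 so that many large samples cannot overflow
//!     let sum: u128 = samples[..keep].iter().map(|&s| s as u128).sum();
//!     Some((sum / keep as u128) as u64)
//! }
//!
//! /// Nearest-rank percentile of an already sorted slice.
//! fn percentile(sorted: &[u64], pct: usize) -> Option<u64> {
//!     if sorted.is_empty() {
//!         return None;
//!     }
//!     // rank = ceil(pct / 100 * len), clamped to at least 1
//!     let rank = (sorted.len() * pct.min(100) + 99) / 100;
//!     Some(sorted[rank.max(1) - 1])
//! }
//! ```
//!
//! With 10,000 samples and `keep_percent` of 95, the mean is taken over
//! the 9,500 smallest values.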
//!
//! ## macOS aarch64
//!
//! MacBook Pro M4 Max, macOS 15.1, rustc 1.84
//!
//! The granularity of the clock appears to be 41 or 42 ticks, and the
//! asm implementation seems to match the std::time implementation for this precision.
//!
//! For asm, the average time taken for a call is 3 ticks in release, 9 ticks in debug.
//!
//! For std::time, the average time taken for a call is 8 ticks in
//! release, 17 ticks in debug. So clearly there is an overhead for
//! using std::time.
//!
//! | Percentile | arch release |   arch debug | std debug    | std release  |
//! |------------|--------------|--------------|--------------|--------------|
//! | 10         |      0       |       0      |       41     |         0    |
//! | 25         |      0       |       0      |       42     |         0    |
//! | 50         |      0       |       0      |       42     |         0    |
//! | 75         |      0       |      41      |       83     |        41    |
//! | 90         |     42       |      41      |       83     |        41    |
//! | 95         |     42       |      41      |       83     |        41    |
//! | 99         |     42       |      42      |       84     |        42    |
//! | 100        |  27084       |    2498      |     2166     |      1125    |
//!
// ### macOS aarch64 std::time release
//
// Percentile distribution
// 56, 0
// 71, 41
// 99, 42
// 100, 1125
//
// average of up to 95 8
//
// ### macOS aarch64 std::time debug
//
// Percentile distribution
// 6, 41
// 18, 42
// 71, 83
// 98, 84
// 99, 125
// 100, 2166
//
// average of up to 95 17
//
// ### macOS aarch64 debug
//
// Percentile distribution
// 52, 0
// 68, 41
// 99, 42
// 100, 2958
//
// average of up to 95 9
//
// ### macOS aarch64 release
//
// Percentile distribution
// 77, 0
// 85, 41
// 99, 42
// 100, 1500
//
// average of up to 95 3
//
//! ## macOS x86_64
//!
//! MacBook Pro 2018, macOS 15.0, rustc 1.84, 2.2GHz i7
//!
//! The granularity of the clock appears to be 2 ticks, and the
//! asm implementation has a lower per-call cost than the std::time implementation.
//!
//! For asm, the average time taken for a call is 15 ticks in release, 78 (but
//! sometimes 66!) ticks in debug.
//!
//! | Percentile | arch release |   arch debug | std debug    | std release  |
//! |------------|--------------|--------------|--------------|--------------|
//! | 10         |     12       |      62      |       72     |        38    |
//! | 25         |     12       |      64      |       74     |        38    |
//! | 50         |     12       |      64      |       79     |        39    |
//! | 75         |     14       |      66      |       81     |        39    |
//! | 90         |     14       |      68      |       83     |        39    |
//! | 95         |     14       |      70      |       83     |        40    |
//! | 99         |     16       |      82      |      132     |        41    |
//! | 100        |  42918       |   65262      |    17101     |     24560    |
//!
// ### macOS x86_64 release
//
// Percentile distribution
// 5, 12
// 73, 14
// 99, 16
// 100, 42918
//
// average of up to 95 15
//
// ### macOS x86_64 debug
//
// Percentile distribution
// 4, 62
// 22, 64
// 55, 66
// 81, 68
// 92, 70
// 96, 72
// 98, 74
// 99, 82
// 100, 65262
//
// average of up to 95 78
//
// ### macOS std::time debug
//
// Percentile distribution
// 1, 70
// 4, 71
// 9, 72
// 15, 73
// 22, 74
// 28, 75
// 34, 76
// 40, 77
// 45, 78
// 50, 79
// 56, 80
// 66, 81
// 79, 82
// 90, 83
// 96, 84
// 98, 85
// 99, 132
// 100, 17101
//
// ### macOS std::time release
//
// Percentile distribution
// 3, 37
// 44, 38
// 92, 39
// 96, 40
// 99, 41
// 100, 24560

//a Imports
mod delta;
mod traits;

mod acc_vec;
mod arch;
mod base;
mod timers;
mod trace;

//a Export to the crate, but not outside
pub(crate) use base::BaseTimer;
pub(crate) use delta::Delta;
pub(crate) use traits::private;

//a Export to outside
pub use acc_vec::{AccArray, AccVec};
pub use arch::TDesc;
pub use timers::{AccTimer, DeltaTimer, Timer};
pub use trace::{AccTrace, Trace};
pub use traits::{TArch, TraceCount, TraceValue};