§latency_trace
This library supports latency measurement for functions and code blocks, both sync and async.
Given code instrumented with the Rust tracing library, this library uses histograms from the hdrhistogram library to capture span latency timings.
Latencies are collected in microseconds for all spans with level trace or higher.
§Design goals
This framework should:
- Be easy to use. It should only require a handful of lines of code to get latency metrics for code instrumented with the Rust tracing library.
- Be self-contained, i.e., it should not depend on the use of external tools like OpenTelemetry collectors, Jaeger, Grafana, etc.
- Support both sync and async code.
- Have low overhead, i.e., the latency associated with the collection of latency information should be low.
§Core concepts
This library collects latency information for spans. Span definitions are created in the code using macros and functions from the Rust tracing library which define span callsites, i.e., the places in the code where spans are defined. As the code is executed, a span definition in the code may be executed multiple times – each such execution is a span instance. Span instances arising from the same span definition are grouped into SpanGroups for latency information collection. Latencies are collected using Histograms from the hdrhistogram library.
The grouping of spans for latency collection is not exactly based on the span definitions in the code. Spans at runtime are structured as a set of span trees that correspond to the nesting of spans from code execution paths. The grouping of runtime spans for latency collection should respect the runtime parent-child relationships among spans.
Thus, SpanGroups form a forest of trees where some pairs of span groups have a parent-child relationship, corresponding to the parent-child relationships of the spans associated with the span groups. This means that if SpanGroup A is the parent of SpanGroup B then, for each span that was assigned to group B, its parent span was assigned to group A.
The coarsest-grained grouping of spans is characterized by a callsite path – a callsite and the (possibly empty) list of its ancestor callsites based on the different runtime execution paths (see span relationships). This is the default SpanGroup definition. Finer-grained groupings of spans can differentiate groups of spans with the same callsite path by taking into account values computed at runtime from the spans’ runtime Attributes.
While the granularity of latency information collection cannot be finer than a SpanGroup, the collected latency information can be subsequently aggregated further by grouping SpanGroups as needed (see Timings::aggregate.)
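For illustration, here is a minimal sketch (using only the tracing macros, with hypothetical span names) of two span definitions whose runtime instances have a parent-child relationship:
use tracing::{instrument, trace_span};

// One span definition (callsite) named `outer`, created by the `instrument` attribute.
#[instrument(level = "trace")]
fn outer() {
    for _ in 0..3 {
        // A second span definition named `inner`; each loop iteration creates a new
        // span instance from this single callsite.
        trace_span!("inner").in_scope(|| {
            // ... work whose latency would be measured ...
        });
    }
}
With the default grouping, every inner span instance has an outer instance as its parent, so the inner SpanGroup (callsite path outer -> inner) is a child of the outer SpanGroup.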
§Key design choices
This framework uses hdrhistogram::Histogram to collect latency information as it provides an efficient data structure for high-fidelity data collection across wide latency value ranges.
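For context only (this is the hdrhistogram crate's API, not this library's), a minimal sketch of recording values into a Histogram and reading summary statistics back out:
use hdrhistogram::Histogram;

fn main() {
    // Auto-resizing histogram of u64 values with 3 significant decimal digits of
    // precision; it covers wide value ranges with modest memory usage.
    let mut hist = Histogram::<u64>::new(3).unwrap();
    for latency_us in [850_u64, 900, 1_250, 10_000] {
        hist.record(latency_us).unwrap();
    }
    println!(
        "count={}, mean={:.1}, p99={}",
        hist.len(),
        hist.mean(),
        hist.value_at_quantile(0.99)
    );
}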
Two other design choices support the low overhead goal.
- The tracing library's Registry is used to store temporary timing data at runtime. As noted in the documentation, "This registry is implemented using a lock-free sharded slab, and is highly optimized for concurrent access."
- Runtime data collection takes place independently on each thread, overwhelmingly without the need for inter-thread coordination. The only inter-thread coordination involves two mutex lock requests per thread (at the beginning of data collection for the thread and at the end) for the entire duration of the measurement, regardless of the number of spans executed. After the test execution has completed, information is post-processed for presentation purposes, with zero impact on the latency measurements. The thread_local_collect library is used to support this design approach.
§Usage modes
This framework is used to measure latencies for a sync or async function f that takes no arguments and contains code instrumented with the tracing framework. Any code to be measured can be wrapped by such a function.
The core LatencyTrace struct can be used directly as a Subscriber or indirectly as a Layer. The former usage is most straightforward, while the latter allows LatencyTrace to be combined with other tracing_subscriber::Layers to form a layered subscriber in more elaborate tracing scenarios. Using LatencyTrace as a tracing_subscriber::Layer also allows filtering to be added, regardless of whether it is combined with other layers.
The following modes of latency information reporting are supported:
- Direct – Executes the instrumented function f and, after f terminates, returns with the reporting of latencies.
- Probed – Executes the instrumented function f and immediately returns an object that can be used to report on partial latency information while f is still running. The returned object also provides a method that waits until f terminates and reports the final latency information.
§Async runtimes
This framework supports tokio out-of-the-box (see LatencyTrace::measure_latencies_tokio and LatencyTrace::measure_latencies_probed_tokio) but other async runtimes can be used as well by simply wrapping the async code with the chosen async runtime and using one of the sync methods (LatencyTrace::measure_latencies or LatencyTrace::measure_latencies_probed). The source code for the above-mentioned tokio variants shows exactly how to do it.
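For instance, a sketch of this wrapping approach is shown below, using tokio as a stand-in for the chosen runtime and assuming that LatencyTrace::measure_latencies accepts a closure (see the tokio variants' source code for the exact pattern used by this library):
use latency_trace::{summary_stats, LatencyTrace};

async fn async_f() {
    // ... instrumented async code ...
}

fn main() {
    let latencies = LatencyTrace::activated_default()
        .unwrap()
        // Wrap the async code in a sync closure that builds the chosen runtime
        // and blocks on the instrumented future.
        .measure_latencies(|| {
            tokio::runtime::Builder::new_multi_thread()
                .enable_all()
                .build()
                .unwrap()
                .block_on(async_f());
        });
    println!("{:?}", latencies.map_values(summary_stats));
}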
§Rust version requirements
This version of this library can be compiled with rustc 1.79.0 or higher. It may work with earlier rustc versions but that is not guaranteed.
§Default cargo feature
To include this library as a dependency without optional features in your Cargo.toml:
[dependencies]
latency_trace = "0"§Optional cargo features
The optional feature flag “tokio” enables the tokio-related methods of LatencyTrace.
[dependencies]
latency_trace = { version = "0", features = ["tokio"] }
To run the doc_async_* examples from the latency_trace crate (see source repo), specify --features tokio or --all-features when invoking cargo run. For example, to run doc_async_probed.rs, do as follows:
cargo run --features tokio --example doc_async_probed
or
cargo run --all-features --example doc_async_probed
§Usage examples
§Simple sync Subscriber example
use latency_trace::{summary_stats, LatencyTrace};
use std::{thread, time::Duration};
use tracing::{info, instrument, trace, trace_span};
#[instrument(level = "info")]
fn f() {
trace!("in f");
for _ in 0..10 {
trace_span!("loop_body").in_scope(|| {
info!("in loop body");
// Simulated work
thread::sleep(Duration::from_micros(1200));
g();
});
}
}
#[instrument(level = "info")]
fn g() {
trace!("in g");
// Simulated work
thread::sleep(Duration::from_micros(800));
}
fn main() {
let latencies = LatencyTrace::activated_default()
.unwrap()
.measure_latencies(f);
println!("\nLatency stats below are in microseconds");
for (span_group, stats) in latencies.map_values(summary_stats) {
println!(" * {:?}, {:?}", span_group, stats);
}
// A shorter way to print the summary stats, with uglier formatting.
println!("\nDebug print of `latencies.map_values(summary_stats)`:");
println!("{:?}", latencies.map_values(summary_stats));
}
§Simple async Subscriber example
//! Requires feature flag "tokio".
use latency_trace::{summary_stats, LatencyTrace};
use std::time::Duration;
use tracing::{instrument, trace_span, Instrument};
#[instrument(level = "trace")]
async fn f() {
for _ in 0..1000 {
async {
// Simulated work
tokio::time::sleep(Duration::from_micros(1200)).await;
g().await;
}
.instrument(trace_span!("loop_body"))
.await
}
}
#[instrument(level = "trace")]
async fn g() {
// Simulated work
tokio::time::sleep(Duration::from_micros(800)).await;
}
fn main() {
let latencies = LatencyTrace::activated_default()
.unwrap()
.measure_latencies_tokio(f);
println!("\nLatency stats below are in microseconds");
for (span_group, stats) in latencies.map_values(summary_stats) {
println!(" * {:?}, {:?}", span_group, stats);
}
// A shorter way to print the summary stats, with uglier formatting.
println!("\nDebug print of `latencies.map_values(summary_stats)`:");
println!("{:?}", latencies.map_values(summary_stats));
}
§Simple sync probed Subscriber example
use latency_trace::{summary_stats, LatencyTrace};
use std::{thread, time::Duration};
use tracing::{info, instrument, trace, trace_span};
#[instrument(level = "info")]
fn f() {
trace!("in f");
for _ in 0..10 {
trace_span!("loop_body").in_scope(|| {
info!("in loop body");
// Simulated work
thread::sleep(Duration::from_micros(1200));
g();
});
}
}
#[instrument(level = "info")]
fn g() {
trace!("in g");
// Simulated work
thread::sleep(Duration::from_micros(800));
}
fn main() {
let probed = LatencyTrace::activated_default()
.unwrap()
.measure_latencies_probed(f)
.unwrap();
// Let the function run for some time before probing latencies.
thread::sleep(Duration::from_micros(16000));
let latencies1 = probed.probe_latencies();
let latencies2 = probed.wait_and_report();
println!("\nlatencies1 in microseconds");
for (span_group, stats) in latencies1.map_values(summary_stats) {
println!(" * {:?}, {:?}", span_group, stats);
}
println!("\nlatencies2 in microseconds");
for (span_group, stats) in latencies2.map_values(summary_stats) {
println!(" * {:?}, {:?}", span_group, stats);
}
}
§Simple async probed Subscriber example
//! Requires feature flag "tokio".
use latency_trace::{summary_stats, LatencyTrace};
use std::{thread, time::Duration};
use tracing::{instrument, trace_span, Instrument};
#[instrument(level = "trace")]
async fn f() {
for _ in 0..1000 {
async {
// Simulated work
tokio::time::sleep(Duration::from_micros(1200)).await;
g().await;
}
.instrument(trace_span!("loop_body"))
.await
}
}
#[instrument(level = "trace")]
async fn g() {
// Simulated work
tokio::time::sleep(Duration::from_micros(800)).await;
}
fn main() {
let probed = LatencyTrace::activated_default()
.unwrap()
.measure_latencies_probed_tokio(f)
.unwrap();
// Let the function run for some time before probing latencies.
thread::sleep(Duration::from_micros(48000));
let latencies1 = probed.probe_latencies();
let latencies2 = probed.wait_and_report();
println!("\nlatencies1 in microseconds");
for (span_group, stats) in latencies1.map_values(summary_stats) {
println!(" * {:?}, {:?}", span_group, stats);
}
println!("\nlatencies2 in microseconds");
for (span_group, stats) in latencies2.map_values(summary_stats) {
println!(" * {:?}, {:?}", span_group, stats);
}
}
§Simple sync Layer example
//! Demonstrates how to use [`LatencyTrace`] as a [`Layer`](tracing_subscriber::layer::Layer)
//! that can be composed with other layers to make a layered [`Subscriber`](tracing::Subscriber).
//!
//! Compare with examples `doc_sync` and `doc_sync_fmt` in the GitHub repo.
use latency_trace::{summary_stats, LatencyTrace};
use std::{thread, time::Duration};
use tracing::{info, instrument, level_filters::LevelFilter, trace, trace_span};
use tracing_subscriber::{fmt::format::FmtSpan, prelude::*, Registry};
#[instrument(level = "info")]
fn f() {
trace!("in f");
for _ in 0..10 {
trace_span!("loop_body").in_scope(|| {
info!("in loop body");
// Simulated work
thread::sleep(Duration::from_micros(1200));
g();
});
}
}
#[instrument(level = "info")]
fn g() {
trace!("in g");
// Simulated work
thread::sleep(Duration::from_micros(800));
}
fn main() {
// LatencyTrace instance from which latency statistics will be extracted later.
let lt = LatencyTrace::default();
// Clone of the above that will be used as a `tracing_subscriber::layer::Layer` that can be
// composed with other tracing layers. Add a filter so that only spans with level `INFO` or
// higher priority (lower level) are aggregated.
let ltl = lt.clone().with_filter(LevelFilter::INFO);
// `tracing_subscriber::fmt::Layer` that can be composed with the above `LatencyTrace` layer.
// Spans with level `TRACE` or higher priority (lower level) are displayed.
let tfmt = tracing_subscriber::fmt::layer()
.with_span_events(FmtSpan::FULL)
.with_filter(LevelFilter::TRACE);
// Instantiate a layered subscriber and set it as the global default.
let layered = Registry::default().with(ltl).with(tfmt);
layered.init();
// Measure latencies.
let latencies = lt.measure_latencies(f);
println!("\nLatency stats below are in microseconds");
for (span_group, stats) in latencies.map_values(summary_stats) {
println!(" * {:?}, {:?}", span_group, stats);
}
// A shorter way to print the summary stats, with uglier formatting.
println!("\nDebug print of `latencies.map_values(summary_stats)`:");
println!("{:?}", latencies.map_values(summary_stats));
}
§Other Layer examples
The Subscriber examples above can easily be adapted, following the Layer example, into analogous layered examples (see the sketch below).
See also test_layer_sync.rs and test_layer_async.rs in the GitHub repo under the dev_support/tests directory.
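For instance, the simple async Subscriber example could be adapted along the following lines (a sketch only; it requires the "tokio" feature and reuses the pattern from the sync Layer example):
use latency_trace::{summary_stats, LatencyTrace};
use std::time::Duration;
use tracing::{instrument, level_filters::LevelFilter};
use tracing_subscriber::{prelude::*, Registry};

#[instrument(level = "info")]
async fn f() {
    // Simulated work
    tokio::time::sleep(Duration::from_micros(1200)).await;
}

fn main() {
    // LatencyTrace instance from which latency statistics will be extracted later.
    let lt = LatencyTrace::default();
    // Use a clone as a filtered layer, as in the sync Layer example.
    let ltl = lt.clone().with_filter(LevelFilter::INFO);
    // Instantiate a layered subscriber and set it as the global default.
    Registry::default().with(ltl).init();
    // Measure latencies with the tokio variant instead of the sync one.
    let latencies = lt.measure_latencies_tokio(f);
    println!("{:?}", latencies.map_values(summary_stats));
}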
§Benchmarks
Execution of the benchmarks defined in the source repo on my laptop indicates that latency tracing overhead measurements can vary significantly. The variability across runs can be large enough that nonsensical negative overheads were sometimes observed (with both the Divan and Criterion benchmarking frameworks), i.e., the median total latency of an instrumented function was less than that of the uninstrumented version of the same function.
To more reliably compare latency differences between two functions, given the above-mentioned variability, I created a utility that separates positive and negative differences so that nonsensical results can be easily excluded. See bench_diff.rs.
Running the benchmark bench_overhead_simple_real_sync with a target function (simple_real_sync) that does heavy in-memory computation, the overhead per span varied from less than 1 µs up to 3 µs, depending on the intensity of the computation. The high end of that range occurred with computations that took 800 µs to 1,200 µs per span, in a multi-threaded scenario with 6 threads and 100 iterations (instantiating 2 spans each) per thread. Thus, the overhead observed in these benchmarks was less than 3/800 (< 1%) on a per-span latency basis.
Where the target function had a thread::sleep() in lieu of real computation, higher overheads were observed, especially as the sleep time increased. It appears that thread context switches have a significant adverse impact on overheads, possibly due to the invalidation of processor caches.
Structs§
- LatencyTrace - Core type supporting latency measurements.
- LatencyTraceCfg - Configuration information for LatencyTrace. It is instantiated with its LatencyTraceCfg::default method and can be customized with its other methods.
- ProbedTrace - Represents an ongoing collection of latency information with the ability to report on partial latencies before the instrumented function completes.
- SpanGroup - Represents a set of tracing::Spans for which latency information should be collected as a group. It is the unit of latency information collection.
- SummaryStats - Common summary statistics useful in latency testing/benchmarking.
- Wrapper - Generic wrapper to facilitate the addition of new methods to the wrapped type.
Enums§
- ActivationError - Error returned by LatencyTrace activation methods.
Functions§
- default_span_grouper - Default span grouper. Used to group spans by callsite and ancestors, ignoring any span attributes.
- group_by_all_fields - Custom span grouper used to group spans by callsite, ancestors, and all span fields and their values.
- group_by_given_fields - Custom span grouper used to group spans by callsite, ancestors, and a given list of span field names. Spans with the same callsites, ancestors, and values for the specified fields are grouped together.
- summary_stats - Computes a SummaryStats from a Timing.