Expand description
Expose USDT probe points from Rust programs.
§Overview
This crate provides methods for compiling definitions of DTrace probes into Rust code, allowing rich, low-overhead instrumentation of userland Rust programs.
DTrace probes are instrumented points in software, usually corresponding to some important
event such as opening a file, writing to standard output, acquiring a lock, and much more.
Probes are grouped into providers, collections of related probes covering distinct classes
functionality. The syscall provider, for example, includes probes for the entry and exit of
certain important system calls, such as write(2)
.
USDT probes may be defined in the D language or inline in Rust code. These definitions are used to create macros, which, when called, fire the corresponding DTrace probe. The two methods for defining probes are very similar – one key difference, besides the syntax used to describe them, is that inline probes support any Rust type that is JSON serializable. We’ll cover each in turn.
§Defining probes in D
Users define a provider, with one or more probe functions in the D language. For example:
provider my_provider {
probe start_work(uint8_t);
probe start_work(char*, uint8_t);
};
Providers and probes may be named in any way, as long as they form valid Rust identifiers. The
names are intended to help understand the behavior of a program, so they should be semantically
meaningful. Probes accept zero or more arguments, data that is associated with the probe event
itself (timestamps, file descriptors, filesystem paths, etc.). The arguments may be specified
as any of the exact bit-width integer types (e.g., int16_t
), pointers to
such integers, or strings (char *
s). See Data types for a full list of
supported types.
Assuming the above is in a file called "test.d"
, the probes may be compiled into Rust code
with:
usdt::dtrace_provider!("test.d");
This procedural macro will generate a Rust macro for each probe defined in the provider. Note
that for versions of rust prior to 1.66 features may be required; see the
notes for a discussion. The invocation of dtrace_provider
(and any required
feature directives) should be at the crate root, i.e., src/lib.rs
or src/main.rs
.
One may then call the start
probe via:
let x: u8 = 0;
my_provider::start_work!(|| x);
We can see that the macros are defined in a module named by the provider, with one macro for each probe, with the same name. See below for how this naming may be configured.
Note that start_work!
is called with a closure which returns the arguments, rather than the
actual arguments themselves. See below for details. Additionally, as the
probes are exposed as macros, they should be included in the crate root, before any other
module or item which references them.
After declaring probes and converting them into Rust code, they must be registered with the
DTrace kernel module. Developers should call the function register_probes
as soon as
possible in the execution of their program to ensure that probes are available. At this point,
the probes should be visible from the dtrace(1)
command-line tool, and can be enabled or
acted upon like any other probe. See registration for a discussion of probe
registration, especially in the context of library crates.
§Inline Rust probes
Writing probes in the D language is convenient and familiar to those who’ve previously used DTrace. There are a few drawbacks though. Maintaining another file may be annoying or error prone, but more importantly, it provides limited support for Rust’s rich type system. In particular, only those types with a clear C analog are currently supported. (See the full list.)
More complex, user-defined types can be supported if one defines the probes in Rust directly.
In particular, this crate supports any type implementing serde::Serialize
, by
serializing the type to JSON and using DTrace’s native JSON support. Providers
can be defined inline by attaching the provider
attribute macro to a module.
#[derive(serde::Serialize)]
pub struct Arg {
pub x: u8,
pub buffer: Vec<i32>,
}
// A module named `test` describes the provider, and each (empty) function definition in the
// module's body generates a probe macro.
#[usdt::provider]
mod test {
use crate::Arg;
fn start_work(x: u8) {}
fn stop_work(arg: &Arg) {}
}
The arg
parameter to the stop
probe will be converted into JSON, and its fields may be
accessed in DTrace with the json
function. The signature is json(string, key)
, where key
is used to access the named key of a JSON-encoded string. For example:
$ dtrace -n 'stop_work { printf("%s", json(copyinstr(arg0), "ok.buffer[0]")); }'
would print the first element of the vector Arg::buffer
.
Important: Notice that the JSON key used in the above example to access the data inside DTrace is
"ok.buffer[0]"
. JSON values serialized to DTrace are alwaysResult
types, because the internal serialization method is fallible. So they are always encoded as objects like{"ok": _}
or{"err": "some error message"}
. In the error case, the message is created by formatting theserde_json::error::Error
that describes why serialization failed.
Note: It’s not possible to define probes in D that accept a serializable type, because the corresponding C type is just
char *
. There’s currently no way to disambiguate such a type from an actual string, when generating the Rust probe macros.
See the probe_test_attr example for a complete example implementing probes in Rust.
§Configurable names
When using the attribute macro or build.rs versions of the code-generator, the names of the
provider and/or probes may be configured. Specifically, the probe_format
argument to the
attribute macro or Builder
method sets a format string used to generate the names of the
probe macros. This can be any string, and will have the keys {provider}
and {probe}
interpolated to the actual names of the provider and probe. As an example, consider a provider
named foo
with a probe named bar
, and a format string of probe_{provider}_{probe}
– the
name of the generated probe macro will be probe_foo_bar
.
In addition, when using the attribute macro version, the name of the provider as seen by DTrace can be configured. This defaults to the name of the provider module. For example, consider a module like this:
#[usdt::provider(provider = "foo")]
mod probes {
fn bar() {}
}
The probe bar
will appear in DTrace as foo:::bar
, and will be accessible in Rust via the
macro probes::bar!
. Note that it’s not possible to rename the probe module when using the
attribute macro version.
Conversely, one can change the name of the generated provider module when using the builder
version, but not the name of the provider as it appears to DTrace. Given a file "test.d"
that
names a provider foo
and a probe bar
, consider this code:
usdt::Builder::new("test.d")
.module("probes")
.build()
.unwrap();
This probe bar
will appear in DTrace as foo:::bar
, but will now be accessible in Rust via
the macro probes::bar!
. Note that it’s not possible to rename the provider as it appears in
DTrace when using the builder version.
§Double-underscores
It’s a DTrace convention to name probes with dashes between words, rather than underscores. So
the probe should be my-probe
rather than my_probe
. The former is not a valid Rust
identifier, but can be achieved by using two underscores in the probe name. This crate
internally translates __
into -
in such cases. For example, the provider:
#[usdt::provider("my__provider")]
mod probes {
fn my__probe() {};
}
will result in a provider and probe name of my__provider
and my-probe
.
Important: This translation of double-underscores to dashes only occurs
in the probe name. Provider names are not modified in any way. This
matches the behavior of existing DTrace implementations, and guarantees that
providers are similarly named regardless of the target platform.
§Examples
See the probe_test_macro, probe_test_build, and probe_test_attr crates in the github repo for detailed working examples showing how the probes may be defined, included, and used.
§Probe arguments
Note that the probe macro is called with a closure which returns the actual arguments. There are two reasons for this. First, it makes clear that the probe may not be evaluated if it is not enabled; the arguments should not include function calls which are relied upon for their side-effects, for example. Secondly, it is more efficient. As the lambda is only called if the probe is actually enabled, this allows passing arguments to the probe that are potentially expensive to construct. However, this cost will only be incurred if the probe is actually enabled.
§Data types
Probes support any of the integer types which have a specific bit-width, e.g., uint16_t
, as
well as strings, which should be specified as char *
. As described above,
any types implementing Serialize
may be used, if the probes are defined in Rust directly.
Below is the full list of supported types.
(u?)int(8|16|32|64)_t
- Pointers to the above integer types
char *
T: Clone + serde::Serialize
(Only when defining probes in Rust)
Currently, up to six (6) arguments are supported, though this limitation may be lifted in the future.
Note: Serializable types must implement the
Clone
trait. It’s important to note that this may almost always be derived, and, more importantly, that the data in probes will never actually be cloned, even when probes are enabled. The trait boundClone
is required to implement type-checking on the probe arguments, and is just an unfortunate leakiness to the abstraction provided by this crate.
§Registration
USDT probes must be registered with the DTrace kernel module. This is done via a call to the
register_probes
function, which must be called before any of the probes become available to
DTrace. Ideally, this would be done automatically; however, while there are methods by which
that could be achieved, they all pose significant concerns around safety, clarity, and/or
explicitness.
At this point, it is incumbent upon the application developer to ensure that
register_probes
is called appropriately. This will register all probes in an application,
including those defined in a library dependency. To avoid foisting an explicit dependency on
the usdt
crate on downstream applications, library writers should re-export the
register_probes
function with:
pub use usdt::register_probes;
The library should clearly document that it defines and uses USDT probes, and that this function should be called by an application. Alternatively, library developers may call this function during some initialization routines required by their library. There is no harm in calling this method multiple times, even in concurrent situations.
§Unique IDs
A common pattern in DTrace scripts is to use a two or more probes to understand a section or
span of code. For example, the syscall:::{entry,return}
probes can be used to time the
duration of system calls. Doing this with USDT probes requires a unique identifier, so that
multiple probes can be correlated with one another. The UniqueId
type can be used for this
purpose. It may be passed as any argument to a probe function, and is guaranteed to be unique
between different invocations of the same probe. See the type’s documentation for details.
§Features
Note: This section is only relevant prior to Rust 1.59 (or Rust 1.66 on macOS).
The USDT crate relies on inline assembly to hook into DTrace. Prior to Rust 1.59, this is
unstable, and requires explicitly opting in with #![feature(asm))]
.
On macOS (only) an additional feature (asm_sym
) is required prior to Rust 1.66 (but after Rust
1.58.0-nightly). The macOS implementation relies on native linker support; it uses the sym
syntax of the asm!
macro which was split into its own feature in Rust 1.58.0-nightly
(2021-10-29).
Unfortunately, because of the way the features were added (see this pull
request), this version of Rust nightly is a Rubicon: the usdt
crate, and
crates using it, cannot be built with compilers both before and after this version.
Specifically, it’s not possible to write the set of features that would allow code to be
compiled with a nightly toolchain before and after this version. If we include the
feature(asm_sym)
directive with a toolchain of 1.57 or earlier, the compiler will generate an
error because that feature isn’t known for those versions. If we omit the directive, it will
compile with previous toolchains, but a newer one will generate an error because the feature is
required for opting into the functionality used in the usdt
crate’s implementation on macOS.
There’s no great solution here. If you’re developing an application, i.e., something that you’re
sure can be built with a specific toolchain such as with a rust-toolchain
file, you can write
the correct feature attribute for that toolchain version.
If you’re building a library, things are more complicated, because you don’t know what toolchain
a consuming application will choose to use. It’s not possible to use a build.rs
file or other
code-generation mechanism, because inner attributes must generally be written directly at the
top of the crate’s root source file. A mechanism that expands to the right tokens is not
sufficient. The only real approach is to specify which versions of the toolchain are supported
by your library in the documentation, as we’ve done here.
§Selecting the no-op implementation
It’s possible to use the usdt
crate in libraries without transitively requiring a nightly
compiler of one’s users (prior to Rust 1.66). Though asm
is a default feature of the usdt
crate, users can opt to build with --no-default-features
, which uses a no-op implementation of
the internals. This generates the same probe macros, but with empty bodies, meaning the code can
be compiled unchanged.
Library developers may choose to re-export this feature, with a name such as probes
, which
implies the asm
feature of the usdt
crate. This feature-gating allows users to select a
nightly compiler in exchange for probes, but still allows the code to be compiled with a stable
toolchain.
Note that prior to Rust 1.66, the appropriate features are required anywhere the generated
macros are called, rather than where they’re defined. (Because they’re macros-by-example, and
expand to an actual asm!
macro call.) So library writers should probably gate the feature
directive on their own re-exported feature, e.g., #![cfg_attr(feature = "probes", feature(asm))]
, and instruct developers consuming their libraries to do the same.
It’s important to keep in mind how Cargo unifies features, however. Specifically, if usdt
is
a dependency of two other dependencies in a package, it’s possible to end up in a confusing
situation. Cargo takes the union of all features in such a case. Thus if one crate is built
expecting to use the no-op implementation and another is built using the real, asm
-based
implementation, the latter will be chosen. This can be confusing or downright dangerous. First,
the former crate will fail at compile time, because the asm!
macro will actually be emitted,
but the #![feature(asm)]
flag will not be included. More troubling, the probes will actually
exist in the resulting object file, even if the user specifically opted to not use them.
To handle this, library writers may place all references to usdt
-related code behind a
conditional compilation directive. This will ensure that the crate is not even used, rather
than it being used with an unexpected implementation. This is most relevant for crates whose
minimum supported Rust version is earlier than 1.66.
Macros§
- Generate DTrace probe macros from a provider definition file.
Structs§
- A simple struct used to build DTrace probes into Rust code in a build.rs script.
- A unique identifier that can be used to correlate multiple USDT probes together.
Enums§
- Errors related to building DTrace probes into Rust code
Functions§
- Extract embedded USDT probe records from a file.
- Register an application’s probes with DTrace.
Attribute Macros§
- Generate a provider from functions defined in a Rust module.