metrique is a crate for emitting unit-of-work metrics.
Unlike many popular metric frameworks that are based on the concept of your application having a fixed-ish set of counters and gauges, which are periodically updated to a central place, metrique is based on the concept of structured metric records. Your application emits a series of metric records - that are essentially structured log entries - to an observability service such as Amazon CloudWatch, and the observability service allows you to view and alarm on complex aggregations of the metrics.
The log entries being structured means that you can easily use problem-specific aggregations to track down the cause of issues, rather than only observing the symptoms.
Getting Started (Applications)
Most metrics your application records will be "unit of work" metrics. In a classic HTTP server, these are typically tied to the request/response scope.
You declare a struct that represents the metrics you plan to capture over the course of the request and annotate it with #[metrics]. That makes it possible to write it to a Sink. Rather than writing to the sink directly, you typically use append_on_drop(sink) to obtain a guard that will automatically write to the sink when dropped.
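To make the guard idea concrete, here is a minimal, std-only sketch of the append-on-drop pattern. It does not use metrique: ToySink, RequestRecord, and AppendOnDropGuard are hypothetical stand-ins invented for illustration, and the real guard returned by append_on_drop may differ in detail.

```rust
use std::ops::{Deref, DerefMut};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a sink: collects rendered entries in memory.
type ToySink = Arc<Mutex<Vec<String>>>;

/// Hypothetical metrics record for one request.
#[derive(Default)]
struct RequestRecord {
    operation: &'static str,
    number_of_ducks: usize,
}

/// Guard that owns the record and appends it to the sink when dropped.
struct AppendOnDropGuard {
    record: Option<RequestRecord>,
    sink: ToySink,
}

impl AppendOnDropGuard {
    fn new(record: RequestRecord, sink: ToySink) -> Self {
        AppendOnDropGuard { record: Some(record), sink }
    }
}

impl Deref for AppendOnDropGuard {
    type Target = RequestRecord;
    fn deref(&self) -> &RequestRecord {
        self.record.as_ref().expect("record is present until drop")
    }
}

impl DerefMut for AppendOnDropGuard {
    fn deref_mut(&mut self) -> &mut RequestRecord {
        self.record.as_mut().expect("record is present until drop")
    }
}

impl Drop for AppendOnDropGuard {
    fn drop(&mut self) {
        // Render the record and append it exactly once.
        if let Some(record) = self.record.take() {
            let line = format!("{}:{}", record.operation, record.number_of_ducks);
            self.sink.lock().unwrap().push(line);
        }
    }
}

/// Handles one "request"; the entry is written when `metrics` leaves scope.
fn count_ducks(sink: ToySink) {
    let mut metrics = AppendOnDropGuard::new(
        RequestRecord { operation: "CountDucks", ..Default::default() },
        sink,
    );
    metrics.number_of_ducks = 5; // fields are mutated through the guard
}
```

Because the write happens in Drop, every exit path of count_ducks (including early returns) emits the record without an explicit call.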
The simplest way to emit the entry is by emitting it to a global entry sink, defined by using the [metrique_writer::sink::global_entry_sink] macro. That will create a global rendezvous point - you can attach a destination by using attach or attach_to_stream, and then write to it by using the sink method (you must attach a destination before calling sink, otherwise you will encounter a panic!).
The example below will write the metrics to a [tracing_appender::rolling::RollingFileAppender] in EMF format.
use std::path::PathBuf;
use metrics;
use ;
use Millisecond;
use ServiceMetrics;
use GlobalEntrySink;
use ;
use Emf;
use ;
// define operation as an enum (you can also define operation as a &'static str)
// define our metrics struct
async
async
That code will create a single metric line (your timestamp and OperationTime may vary).
Getting Started (Libraries)
Library operations should normally return a struct implementing CloseEntry that contains the metrics for their operation. Generally, the best way of getting that is by just using the #[metrics] macro:
use Instrumented;
use Timer;
use Millisecond;
use metrics;
use std::io;
async
Note that we do not use rename_all - the application should be able to choose the naming style.
Read docs/usage_in_libraries.md for more details.
Common Patterns
For more complex examples, see the examples folder.
Timing Events
metrique provides several timing primitives to simplify measuring time. They are all mockable via
[metrique_timesource]:
- Timer/Stopwatch: Reports a Duration using the Instant time-source. It can either be a Timer (in which case it starts as soon as it is created), or a Stopwatch (in which case you must start it manually). In all cases, if you don't stop it manually, it will stop when the record containing it is closed.
- Timestamp: Records a timestamp using the SystemTime time-source. When used with #[metrics(timestamp)], it will be written as the canonical timestamp field for whatever format is in use. Otherwise, it will report its value as a string property containing the duration since the Unix Epoch.

  You can control the formatting of a Timestamp (that is not used as a #[metrics(timestamp)] - the formatting of the canonical timestamp is controlled solely by the formatter) by setting #[metrics(format = ...)] to one of EpochSeconds, EpochMillis (the default), or EpochMicros.
- TimestampOnClose: Records the timestamp when the record is closed.
Usage example:
use ;
use Millisecond;
use EpochSeconds;
use metrics;
use std::time::Duration;
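The difference between a timer that starts on creation and a stopwatch that is started manually can be sketched with std alone. ToyTimer and ToyStopwatch below are hypothetical types written for this illustration. They are not metrique's Timer and Stopwatch, and details such as what an unstarted stopwatch reports are assumptions of the sketch.

```rust
use std::time::{Duration, Instant};

/// Toy timer: starts when created, can be stopped early, and is
/// stopped implicitly when the record that holds it is closed.
struct ToyTimer {
    start: Instant,
    stopped: Option<Duration>,
}

impl ToyTimer {
    fn start_now() -> Self {
        ToyTimer { start: Instant::now(), stopped: None }
    }

    /// Stop explicitly; later calls keep the first recorded duration.
    fn stop(&mut self) -> Duration {
        let start = self.start;
        *self.stopped.get_or_insert_with(|| start.elapsed())
    }

    /// Closing consumes the timer and stops it if it is still running.
    fn close(mut self) -> Duration {
        self.stop()
    }
}

/// Toy stopwatch: measures nothing until it is started manually.
struct ToyStopwatch {
    started: Option<Instant>,
}

impl ToyStopwatch {
    fn new() -> Self {
        ToyStopwatch { started: None }
    }

    fn start(&mut self) {
        self.started = Some(Instant::now());
    }

    /// In this sketch, a stopwatch that was never started reports `None`.
    fn close(self) -> Option<Duration> {
        self.started.map(|s| s.elapsed())
    }
}
```

An explicit stop freezes the reported duration, so work done after the stop but before the record closes is not counted.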
Returning Metrics from Subcomponents
#[metrics] structs are composable. There are two main patterns for subcomponents
recording their own metrics. You can define sub-metrics by having a
#[metrics(subfield)]. Then, you can either return a metric struct along with
the data - metrique provides Instrumented to standardize this - or pass a
(mutable) reference to the metrics struct. See the library metrics example.
This is the recommended approach. It has minimal performance overhead and makes your metrics very predictable.
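The two patterns can be sketched in plain Rust, without any macros. LookupMetrics, RequestMetrics, and the functions below are hypothetical names chosen for illustration. In real code the structs would carry #[metrics] attributes, which are omitted here so the sketch compiles with std alone.

```rust
/// Hypothetical sub-metrics recorded by a library call.
#[derive(Debug, Default, PartialEq)]
struct LookupMetrics {
    cache_hit: bool,
    bytes_read: usize,
}

/// Hypothetical parent metrics that embed the sub-metrics.
#[derive(Debug, Default)]
struct RequestMetrics {
    lookups: u32,
    lookup: LookupMetrics,
}

/// Pattern 1: return the value together with the metrics it produced.
fn lookup_returning(key: &str) -> (String, LookupMetrics) {
    let value = key.to_uppercase();
    let metrics = LookupMetrics { cache_hit: false, bytes_read: value.len() };
    (value, metrics)
}

/// Pattern 2: write into a metrics struct borrowed from the caller.
fn lookup_into(key: &str, metrics: &mut LookupMetrics) -> String {
    let value = key.to_uppercase();
    metrics.bytes_read += value.len();
    value
}

/// The caller owns the parent metrics and decides where sub-metrics go.
fn handle_request(key: &str) -> (String, RequestMetrics) {
    let mut request = RequestMetrics::default();

    let (value, lookup) = lookup_returning(key);
    request.lookup = lookup;
    request.lookups += 1;

    let _again = lookup_into(key, &mut request.lookup);
    request.lookups += 1;

    (value, request)
}
```

Both patterns keep ownership simple: the parent struct is the only place the sub-metrics live once the call returns.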
Metrics with complex lifetimes
Sometimes, managing metrics with a simple ownership and mutable reference pattern does not work well. The
metrique crate provides some tools to help with more complex situations.
Controlling the point of metric emission
Sometimes, your code does not have a single exit point at which you want to report your metrics - maybe your operation spawns some post-processing tasks, and you want your metric entry to include information from all of them.
You don't want to wrap your parent metric in an Arc, as that will prevent you from having mutable access
to metric fields, but you still want to delay metric emission.
To allow for that, the [AppendAndCloseOnDrop] guard (which is what the <MetricName>Guard aliases point to)
has flush_guard and force_flush_guard functions. The flush guards are type-erased (they have
types FlushGuard and ForceFlushGuard, which don't mention the type of the metric entry).
The metric will then be emitted when either:
- The owner handle of the metric and all the FlushGuards have been dropped.
- The owner handle of the metric and any of the ForceFlushGuards have been dropped.
This makes force_flush_guard useful for emitting a metric on a timeout even if some
of the downstream tasks have not completed. That matters because you normally
want metrics even (maybe especially) when things are stuck. The downstream tasks
presumably access the metric struct via an Arc or Slot, so if they eventually finish,
they can still safely write a value to the now-dead metric.
See the examples below to see how the flush guards are used.
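The emission rule described above can be modelled with a small state machine. This is a std-only sketch of the rule as stated in this section, not metrique's implementation: ToyOwner, ToyFlushGuard, ToyForceFlushGuard, and EmitState are hypothetical names, and "emitting" is reduced to setting a flag.

```rust
use std::sync::{Arc, Mutex};

/// Bookkeeping shared by the owner handle and all guards (hypothetical).
#[derive(Default)]
struct EmitState {
    owner_dropped: bool,
    live_flush_guards: usize,
    force_flush_dropped: bool,
    emitted: bool,
}

impl EmitState {
    /// Emit once the owner is gone and either no flush guard is left
    /// or some force-flush guard has been dropped.
    fn maybe_emit(&mut self) {
        let ready = self.live_flush_guards == 0 || self.force_flush_dropped;
        if self.owner_dropped && ready {
            self.emitted = true;
        }
    }
}

type SharedState = Arc<Mutex<EmitState>>;

struct ToyOwner(SharedState);
struct ToyFlushGuard(SharedState);
struct ToyForceFlushGuard(SharedState);

impl ToyOwner {
    /// Returns the owner plus a handle for inspecting the state afterwards.
    fn new() -> (ToyOwner, SharedState) {
        let state = Arc::new(Mutex::new(EmitState::default()));
        (ToyOwner(state.clone()), state)
    }

    fn flush_guard(&self) -> ToyFlushGuard {
        self.0.lock().unwrap().live_flush_guards += 1;
        ToyFlushGuard(self.0.clone())
    }

    fn force_flush_guard(&self) -> ToyForceFlushGuard {
        ToyForceFlushGuard(self.0.clone())
    }
}

impl Drop for ToyOwner {
    fn drop(&mut self) {
        let mut state = self.0.lock().unwrap();
        state.owner_dropped = true;
        state.maybe_emit();
    }
}

impl Drop for ToyFlushGuard {
    fn drop(&mut self) {
        let mut state = self.0.lock().unwrap();
        state.live_flush_guards -= 1;
        state.maybe_emit();
    }
}

impl Drop for ToyForceFlushGuard {
    fn drop(&mut self) {
        let mut state = self.0.lock().unwrap();
        state.force_flush_dropped = true;
        state.maybe_emit();
    }
}

fn emitted(state: &SharedState) -> bool {
    state.lock().unwrap().emitted
}
```

In this model, a timeout task would hold the force-flush guard and drop it when the deadline passes, which releases the entry even while slower tasks still hold ordinary flush guards.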
Using Slots to send values
In some cases, you might want a sub-task (potentially a Tokio task, but maybe just a sub-component of your code) to be able to add some metric fields to your metric entry, but without forcing an ownership relationship.
In that case, you can use Slot, which creates a oneshot channel, over which the value of the metric can be sent.
Note that Slot by itself does not delay the parent metric entry's emission in any way. If your metric entry
is emitted (for example, when your request is finished) before the slot is filled, the metric entry will just
skip the metrics behind the Slot. One option is to make your request wait for the slot
to be filled - either by waiting for your subtask to complete or by using Slot::wait_for_data.
Another option is to use the techniques for controlling the point of metric emission. To make that easy, Slot::open has an OnParentDrop::Wait mode that holds on to a FlushGuard until the slot is closed.
use GlobalEntrySink;
use metrics;
use ;
// sub-fields can also be declared with `#[metrics]`
async
async
async
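The "oneshot channel" behaviour of a slot, including the case where the parent is emitted before the slot is filled, can be sketched with std::sync::mpsc. ToySlot and ToySlotGuard are hypothetical types for illustration only. They are synchronous, whereas the section above describes use with async tasks, and their method names are not metrique's API.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::thread;

/// Toy slot: the parent keeps the receiving half (hypothetical type).
struct ToySlot<T> {
    rx: Receiver<T>,
}

/// Toy guard: handed to the sub-task, which uses it to fill the slot.
struct ToySlotGuard<T> {
    tx: SyncSender<T>,
}

impl<T> ToySlot<T> {
    fn open() -> (ToySlot<T>, ToySlotGuard<T>) {
        let (tx, rx) = sync_channel(1);
        (ToySlot { rx }, ToySlotGuard { tx })
    }

    /// Called when the parent entry is emitted: take the value if it has
    /// arrived, otherwise skip the field without blocking.
    fn close(self) -> Option<T> {
        self.rx.try_recv().ok()
    }

    /// Block until the sub-task fills the slot or drops its guard.
    fn wait_then_close(self) -> Option<T> {
        self.rx.recv().ok()
    }
}

impl<T> ToySlotGuard<T> {
    /// Send the value; if the parent is already gone the value is discarded.
    fn fill(self, value: T) {
        let _ = self.tx.send(value);
    }
}

/// Example flow: a worker thread fills the slot while the parent waits.
fn parent_waits_for_worker() -> Option<u32> {
    let (slot, guard) = ToySlot::open();
    let worker = thread::spawn(move || guard.fill(9));
    let value = slot.wait_then_close();
    worker.join().unwrap();
    value
}
```

The close path never blocks, which mirrors the point made above: a slot on its own does not delay emission, so an unfilled slot simply contributes nothing.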
Using Atomics
You might want to "fan out" work to multiple scopes that are in the background or otherwise operating in parallel. You can accomplish this by using atomic field types to store the metrics, and fanout-friendly wrapper APIs on your metrics entry.
Anything that implements CloseValue can be used as a field. metrique provides a number of basic primitives such as Counter, a thin wrapper around AtomicU64. Most std::sync::atomic types also implement CloseValueRef directly. If you need to build your own primitives, simply ensure they implement CloseValueRef. By using primitives that can be mutated through shared references, you make it possible to use Handle or your own Arc to share the metrics entry around multiple owners or tasks.
For further usage of atomics for concurrent metric updates, see the fanout example.
use GlobalEntrySink;
use metrics;
use ;
use std::sync::Arc;
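As a library-independent illustration of the fan-out idea, the sketch below shares one record between threads through an Arc and updates it through shared references using std atomics. FanoutMetrics and run_fanout are hypothetical names. The sketch uses threads rather than async tasks and does not involve metrique's Counter or Handle types.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical record whose fields can be updated through `&self`.
#[derive(Default)]
struct FanoutMetrics {
    tasks_completed: AtomicU64,
    bytes_processed: AtomicU64,
}

/// Spawn `workers` threads that all update the same record, then read
/// the totals once every worker has finished.
fn run_fanout(workers: u64, bytes_per_worker: u64) -> (u64, u64) {
    let metrics = Arc::new(FanoutMetrics::default());

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let metrics = Arc::clone(&metrics);
            thread::spawn(move || {
                metrics.bytes_processed.fetch_add(bytes_per_worker, Ordering::Relaxed);
                metrics.tasks_completed.fetch_add(1, Ordering::Relaxed);
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // Joining the threads orders their updates before these loads.
    (
        metrics.tasks_completed.load(Ordering::Relaxed),
        metrics.bytes_processed.load(Ordering::Relaxed),
    )
}
```

No worker needs mutable access to the record, which is what makes sharing it behind an Arc possible.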
Controlling metric output
Setting units for metrics
You can provide units for your metrics. These will be included in the output format. You can find all available units in metrique::unit::*. Note that this is an open set, and custom units may be defined.
use metrics;
use Megabyte;
Renaming metric fields
Important: The complex interaction between naming, prefixing, and inflection is deterministic, but sometimes might not do what you expect. It is critical that you add tests that validate that the keys being produced match your expectations.
You can customize how metric field names appear in the output using several approaches:
Rename all fields with a consistent case style
Use the rename_all attribute on the struct to apply a consistent naming convention to all fields:
use metrics;
// All fields will use kebab-case in the output
Supported case styles include: "PascalCase", "camelCase", "snake_case", and "kebab-case".
Important: rename_all is transitive—it will apply to all child structures that are #[metrics(flatten)]'d into the entry. You SHOULD only set rename_all on your root struct. If a struct explicitly sets a name scheme with rename_all, it will not be overridden by a parent.
Add a prefix to all fields
Use the prefix attribute on structs to add a consistent prefix to all fields:
use metrics;
// All fields will be prefixed with "api_"
Add a prefix to all metrics in a subfield
Use the prefix attribute on flatten to add a consistent prefix to fields of the
included struct:
use metrics;
use std::collections::HashMap;
// using `subfield_owned` to allow closing over the `HashMap`
Prefixes will be inflected to the case metrics are emitted in, so if you let rename_all
vary, the inner metric name will be:
- in rename_all = "Preserve", Downstreamsuccess / OtherDownstreamsuccess
- in rename_all = "PascalCase", DownstreamSuccess / OtherDownstreamSuccess
- in rename_all = "kebab-case", downstream-success / other-downstream-success
- in rename_all = "snake_case", downstream_success / other_downstream_success
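The list above can be reproduced with a small, std-only model of prefix inflection. This is an illustrative reimplementation, not metrique's naming code. It assumes the prefixes were written as "Downstream" and "OtherDownstream" and the field as "success", which is inferred from the Preserve row. NameStyle, split_words, and prefixed_name are hypothetical names.

```rust
/// Hypothetical model of the naming styles listed above.
#[derive(Clone, Copy)]
enum NameStyle {
    Preserve,
    PascalCase,
    KebabCase,
    SnakeCase,
}

/// Split an identifier into lowercase words at `_`, `-`, and uppercase letters.
fn split_words(name: &str) -> Vec<String> {
    let mut words = Vec::new();
    let mut current = String::new();
    for ch in name.chars() {
        if ch == '_' || ch == '-' {
            if !current.is_empty() {
                words.push(std::mem::take(&mut current));
            }
        } else {
            if ch.is_ascii_uppercase() && !current.is_empty() {
                words.push(std::mem::take(&mut current));
            }
            current.push(ch.to_ascii_lowercase());
        }
    }
    if !current.is_empty() {
        words.push(current);
    }
    words
}

/// Uppercase the first ASCII letter of a word.
fn capitalize(word: &str) -> String {
    let mut chars = word.chars();
    match chars.next() {
        Some(first) => first.to_ascii_uppercase().to_string() + chars.as_str(),
        None => String::new(),
    }
}

/// Combine a prefix and a field name in the requested style.
fn prefixed_name(prefix: &str, field: &str, style: NameStyle) -> String {
    if let NameStyle::Preserve = style {
        // Preserve concatenates the raw strings without inflecting either.
        return format!("{}{}", prefix, field);
    }
    let mut words = split_words(prefix);
    words.extend(split_words(field));
    match style {
        NameStyle::PascalCase => words.iter().map(|w| capitalize(w)).collect(),
        NameStyle::KebabCase => words.join("-"),
        NameStyle::SnakeCase => words.join("_"),
        NameStyle::Preserve => unreachable!("handled above"),
    }
}
```

A table-driven test of this shape, run against the real output keys, is one way to follow the advice above about validating the names your metrics actually produce.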
Rename individual fields
Use the name attribute on individual fields to override their names:
use metrics;
Combining renaming strategies
You can combine these approaches, with field-level renames taking precedence over struct-level rules:
use metrics;
Types in metrics
Example of a metrics struct:
use ;
use ;
use ;
use metrics;
use ToString;
use std::net::IpAddr;
use ;
use std::time::Duration;
Ordinary fields in metrics need to implement CloseValue<Output: [metrique_writer::Value]>.
If you use a formatter (#[metrics(format)]), your field needs to implement CloseValue,
and its output needs to be supported by the formatter instead of
implementing [metrique_writer::Value].
Nested fields (#[metrics(flatten)]) need to implement [CloseEntry].
Customization
If the standard primitives in metrique don't serve your needs, there's a good
chance you might be able to implement them yourself.
Custom CloseValue and CloseValueRef
If you want to change the behavior when metrics are closed, you can
implement CloseValue or CloseValueRef yourself (CloseValueRef
does not take ownership and will also work behind smart pointers,
for example for Arc<YourValue>).
For instance, here is an example for adding a custom timer type that calculates the time from when it was created, to when it finished, on close (it doesn't do anything that timers::Timer doesn't do, but is useful as an example).
use ;
use ;
;
// this does not take ownership, and therefore should implement `CloseValue` for both &T and T
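The "implement for both &T and T" idea can be shown with a local trait. ToyClose below is a hypothetical stand-in defined for this sketch and is not metrique's CloseValue or CloseValueRef, whose exact definitions are not reproduced here. HighWaterMark is likewise an invented example type.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Local stand-in for a "close" trait; NOT metrique's actual definition.
trait ToyClose {
    type Output;
    fn close(self) -> Self::Output;
}

/// Tracks the largest value observed, updatable through `&self`.
struct HighWaterMark(AtomicU64);

impl HighWaterMark {
    fn new() -> Self {
        HighWaterMark(AtomicU64::new(0))
    }

    fn observe(&self, value: u64) {
        self.0.fetch_max(value, Ordering::Relaxed);
    }
}

/// Closing by reference only reads the value, so no ownership is needed.
impl<'a> ToyClose for &'a HighWaterMark {
    type Output = u64;
    fn close(self) -> u64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// The owned impl delegates to the by-reference impl.
impl ToyClose for HighWaterMark {
    type Output = u64;
    fn close(self) -> u64 {
        <&HighWaterMark as ToyClose>::close(&self)
    }
}

/// Because closing works through a reference, it also works behind an Arc.
impl ToyClose for Arc<HighWaterMark> {
    type Output = u64;
    fn close(self) -> u64 {
        <&HighWaterMark as ToyClose>::close(&*self)
    }
}
```

Putting the real logic in the by-reference impl keeps a single source of truth, and the owned and Arc impls stay one-line delegations.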
Custom ValueFormatters
You can implement custom formatters by creating a custom value formatter using the ValueFormatter trait that formats the value into a ValueWriter, then referring to it using #[metrics(format)].
An example use would look like the following:
use metrics;
use std::time::SystemTime;
use ;
/// Format a SystemTime as UTC time
;
// observe that `format_value` is a static method, so `AsUtcDate`
// is never initialized.
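The remark that the formatter is never instantiated can be illustrated with a local trait that has only an associated function. ToyFormatter, AsEpochSeconds, and render_field are hypothetical names for this sketch and do not reproduce the ValueFormatter or ValueWriter signatures. The sketch formats epoch seconds rather than a UTC date, because std has no calendar formatting.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Local stand-in for a formatter trait; NOT metrique's ValueFormatter.
trait ToyFormatter<T> {
    /// Associated function without `self`: no formatter value is ever built.
    fn format_value(value: &T) -> String;
}

/// Zero-sized marker type that only selects a formatting strategy.
struct AsEpochSeconds;

impl ToyFormatter<SystemTime> for AsEpochSeconds {
    fn format_value(value: &SystemTime) -> String {
        // Times before the epoch are clamped to zero in this sketch.
        let since_epoch = value.duration_since(UNIX_EPOCH).unwrap_or(Duration::ZERO);
        since_epoch.as_secs().to_string()
    }
}

/// Render one field, choosing the formatter purely at the type level.
fn render_field<T, F: ToyFormatter<T>>(name: &str, value: &T) -> String {
    format!("{}={}", name, F::format_value(value))
}
```

A call such as render_field::<SystemTime, AsEpochSeconds>("started_at", &time) names the formatter only as a type parameter, which is the same shape as referring to a formatter type from an attribute.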
Testing
Testing emitted metrics
metrique provides test_entry which allows introspecting the entries that are emitted (without needing to read EMF directly). You can use this functionality in combination with the TestEntrySink to test that you are emitting the metrics that you expect:
Note: enable the test-util feature of metrique to enable test utility features.
use metrics;
use ;
There are two ways to control the queue:
- Pass the queue explicitly when constructing your metric object, e.g. by passing it into init (as done above).
- Use the test-queue functionality provided out-of-the-box by global entry queues:
use GlobalEntrySink;
use ServiceMetrics;
use ;
let TestEntrySink = test_entry_sink;
let _guard = set_test_sink;
See examples/testing.rs and examples/testing-global-queues.rs for more detailed examples.
Debugging common issues
No entries in the log
If you see empty files, e.g. "service_log.{date}.log", this could be because your entries are invalid (e.g. you have two fields with the same name) and are being dropped by metrique-writer. Enable tracing logs to see the errors.
Security Concerns
Sensitive information in metrics
Metrics and logs are often exported to places where they can be read by a large number of people. Therefore, it is important to keep sensitive information, including secret keys and private information, out of them.
The metrique library intentionally does not have mechanisms that put unexpected data within metric entries (for example, bridges from Debug implementations that can put unexpected struct fields in metrics).
However, the metrique library controls neither the information placed in metric entries nor where the metrics end up. Therefore, it is your responsibility as an application writer to avoid using the metrique library to emit sensitive information to where it shouldn't be present.
Metrics being dropped
The metrique library is intended to be used for operational metrics, and therefore it is intentionally designed to drop metrics under high-load conditions rather than having the application grind to a halt.
There are 2 main places where this can happen:
- BackgroundQueue will drop the earliest metric in the queue under load.
- It is possible to explicitly enable sampling (by using sample_by_fixed_fraction or sample_by_congress_at_fixed_entries_per_second). If sampling is being used, metrics will be dropped at random.
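To show what "drop the earliest metric under load" means in practice, here is a std-only sketch of a bounded queue with a drop-oldest policy. DropOldestQueue is a hypothetical type for illustration and says nothing about how BackgroundQueue is actually implemented. The dropped counter is an addition of the sketch.

```rust
use std::collections::VecDeque;

/// Bounded FIFO that evicts the oldest entry when full (hypothetical type).
struct DropOldestQueue<T> {
    capacity: usize,
    entries: VecDeque<T>,
    dropped: u64,
}

impl<T> DropOldestQueue<T> {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        DropOldestQueue {
            capacity,
            entries: VecDeque::with_capacity(capacity),
            dropped: 0,
        }
    }

    /// Never blocks the producer: when full, the oldest entry is discarded.
    fn push(&mut self, entry: T) {
        if self.entries.len() == self.capacity {
            self.entries.pop_front();
            self.dropped += 1;
        }
        self.entries.push_back(entry);
    }

    /// Hand all queued entries to the consumer, oldest first.
    fn drain(&mut self) -> Vec<T> {
        self.entries.drain(..).collect()
    }
}
```

With a capacity of 3 and five pushes, the two oldest entries are lost. That is the trade-off described above: the producer keeps running, at the cost of completeness.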
If your application's security relies on metric entries not being dropped (for example, if you use metric entries to track user log-in operations, and your application relies on log-in operations not being dropped), it is your responsibility to engineer your application to avoid the metrics being dropped.
In that case, you should not be using BackgroundQueue or sampling. It is probably fine to use the Format implementations in that case, but it is recommended to test and audit your use-case to make sure nothing is being missed.
Use of exporters
The metrique library does not currently contain any code that exports the metrics outside of the current process. To make a working system, you normally need to integrate the metrique library with some exporter such as the Amazon CloudWatch Agent.
It is your responsibility to ensure that any agents you are using are kept up to date and configured in a secure manner.