[−][src]Module hdrhistogram::serialization::interval_log

Interval log parsing and writing.

Interval logs, as handled by the Java implementation's HistogramLogWriter, HistogramLogReader, and HistogramLogProcessor, are a way to record a sequence of histograms over time. Suppose you were running a load test for an hour: you might want to record a histogram per second or minute so that you could correlate measurements with behavior you might see in logs, etc.

An interval log contains some initial metadata, then a sequence of histograms, each with some additional metadata (timestamps, etc). See IntervalLogHistogram.

The intervals in the log should be ordered by start timestamp. It's possible to write (and parse) logs with intervals in any order, but the expectation is that they will be sorted.

To parse a log, see IntervalLogIterator. To write a log, see IntervalLogWriterBuilder.

Timestamps

Each interval has a timestamp in seconds associated with it. However, it's not necessarily as simple as just interpreting the number as seconds since the epoch. There are two optional pieces of header metadata: "StartTime" and "BaseTime". Neither, one, or both of these may be present. It is possible to have multiple StartTime or BaseTime entries in the log, or even interleaved with interval histograms, but that is confusing, so this API prevents you from doing so.

When BaseTime is present, per-interval timestamps are the number of seconds since BaseTime. When it is absent, the per-interval timestamps represent fractional seconds since the epoch (aka Unix time). BaseTime is useful because it allows you to have somewhat human-readable timestamps for each interval -- it's easier to see 245.3 and see that that's 4 minutes after the start of whatever it is you're doing than it is to mentally parse a Unix timestamp. Naturally, you can always calculate the deltas after the fact if you're willing to write a little tooling, but in some cases log files are consumed by humans too.

While BaseTime is used to redefine per-interval timestamps for human readability, StartTime provides information about when the process that is generating the log reached some "start" condition. It's frequently the case that intervals will start some seconds after the process started, whether due to initial warmup before a benchmark or because it just takes a while to start up. If this is relevant to your workload, use StartTime to record the actual process start time (or other relevant "start" condition, like when a benchmark begins on an already long-running process). You could then use this when processing a log to more accurately plot interval data over time. Put differently, StartTime doesn't change the interpretation of interval timestamps directly like BaseTime; instead, it provides a hint to the consumer of the log that the "start" (whatever that means to you -- process start, etc) was at a different time than that associated with the first interval.

Example scenario

To explain their usage, suppose we're running a multi-hour benchmark on a process that starts up at a Unix time of 1500000000. We'll be recording separate log files per hour just to demonstrate the interaction between BaseTime, StartTime, and interval log timestamps.

The process starts up, warms up its caches, JIT compiles, etc and is ready to start its benchmark 40 seconds later, so we create the first interval log file and record a StartTime of 1500000040. If the actual process start (rather than benchmark start) is more useful to you, using a StartTime of 1500000000 would be reasonable, but we'll stick with 1500000040.

We'll use a BaseTime of 1500000040 because that's when the benchmark began, but 1500000000 would also be a reasonable choice here -- it would just make the per-interval deltas 40 seconds larger, which might be a more useful way of recording them, depending on the situation.

The benchmark produces an interval histogram for each 60 seconds of workload, so the first one is ready at 1500000100, and is recorded with a delta timestamp of 60. This goes on for another hour, with the last one being 3540 seconds after the start of the benchmark with a corresponding delta of 3540.

At the start of the second hour, the time is 1500003640 and the first log file is ended (which is straightforward: just stop writing to it and close the file) and the second log file is opened. It still uses a StartTime of 1500000040 because we want to represent that this log pertains to something that started an hour ago, but we'll use a BaseTime of 1500003640 so that our delta timestamps start over at 0. Again, you could just as well decide to use the same BaseTime of 1500000040 if you prefer, but it's arguably easier for a human to parse "h hours into the benchmark, s seconds past the hour" than it is to mentally divide by 3600, so we'll go with the new BaseTime.

Suppose now you wanted to process these logs and show information (e.g. the 0.999th quantile of each interval) as the y-axis on a plot where time is the x-axis. You would want to have StartTime be the zero of the x-axis. For each interval, calculate the Unix time by adding its timestamp to BaseTime, where BaseTime is assumed to be zero if it's not specified. The point on the x-axis for that interval would be the result of subtracting StartTime from that Unix time. As an example, the 17th minute in the 2nd hour would have an interval timestamp of approximately 1020, which when added to 1500003640 is 1500004660. The difference between StartTime and then is 4620. You might choose to display that in units of minutes, which would be 4620 / 60 = 77.

Java interop

Since you may be interoperating with the Java reference implementation, here's a summary of the logic used by the HistogramLogReader class for StartTime and BaseTime. It's slightly different than what was described above, presumably for legacy compatibility reasons. This class stores the StartTime as a field which is exposed via a getter, and also integrates filtering intervals based on windows for either "absolute" (Unix time) timestamps or "offset" timestamps (delta vs the StartTime), so if you're using that filtering functionality, you need to understand how it ends up setting its internal version of StartTime.

Neither StartTime nor BaseTime are present: interval timestamps are interpreted as seconds since the epoch. The first interval's timestamp is stored to the StartTime field.
StartTime is present: StartTime is a number of seconds since epoch, and interval timestamps may be interpreted as deltas to be added to StartTime or as "absolute" Unix time depending on a heuristic. In other words, the heuristic chooses between setting the effective BaseTime to 0 or to StartTime. Specifically, the heuristic interprets interval timestamps as deltas if they are more than a year's worth of seconds smaller than StartTime and as absolute timestamps otherwise.
BaseTime is present: BaseTime is a number of seconds since epoch, and interval timestamps are interpreted as deltas. The first interval's (delta) timestamp is stored to the StartTime field. This is likely a bug, since StartTime should be an absolute timestamp, and appears to cause erroneous behavior when filtering by offset timestamps.
BaseTime and StartTime are present: The BaseTime is used like it is when it's the only one present: it's a number of seconds since epoch that serves as the starting point for the per-interval deltas to get a wall-clock time for each interval. No heuristics are applied to guess whether or not the intervals are absolute or deltas.

The Java implementation also supports re-setting the StartTime and BaseTime if those entries exist more than once in the log. Suppose that you had an hour's worth of per-minute intervals, and then you set another StartTime to the current time and added a second hour's worth of intervals. Filtering for "all intervals between 5 and 10 minutes past the start" would result in two disjoint sequences of intervals, which is at the very least arguably unintuitive. However, you may well be working with log files that are structured that way, so this implementation's IntervalLogIterator will expose multiple StartTime, etc, entries as they appear in the log file.

Examples

Parse a single interval from a log.

use std::time;
use hdrhistogram::serialization::interval_log;

// two newline-separated log lines: a comment, then an interval
let log = b"#I'm a comment\nTag=t,0.127,1.007,2.769,base64EncodedHisto\n";

let mut iter = interval_log::IntervalLogIterator::new(&log[..]);

// the comment is consumed and ignored by the parser, so the first event is an Interval
match iter.next().unwrap() {
    Ok(interval_log::LogEntry::Interval(h)) => {
        assert_eq!(time::Duration::new(0, 127_000_000), h.start_timestamp());
    }
    _ => panic!()
}

// there are no more lines in the log; iteration complete
assert_eq!(None, iter.next());

Skip logs that started before 3 seconds.

use hdrhistogram::serialization::interval_log;

let log = "\
    #I'm a comment\n\
    Tag=a,0.123,1.007,2.769,base64EncodedHisto\n\
    1.456,1.007,2.769,base64EncodedHisto\n\
    3.789,1.007,2.769,base64EncodedHisto\n\
    Tag=b,4.123,1.007,2.769,base64EncodedHisto\n\
    5.456,1.007,2.769,base64EncodedHisto\n\
    #Another comment\n"
.as_bytes();

let iter = interval_log::IntervalLogIterator::new(&log);

let count = iter
    // only look at intervals (which are the only non-comment lines in this log)
    .filter_map(|e| match e {
        Ok(interval_log::LogEntry::Interval(ilh)) => Some(ilh),
         _ => None
    })
    // do any filtering you want
    .filter(|ilh| ilh.start_timestamp().as_secs() >= 3)
    .count();

assert_eq!(3, count);

Write a log.

use std::{str, time};
use hdrhistogram;
use hdrhistogram::serialization;
use hdrhistogram::serialization::interval_log;

let mut buf = Vec::new();
let mut serializer = serialization::V2Serializer::new();

let mut h = hdrhistogram::Histogram::<u64>::new_with_bounds(
    1, u64::max_value(), 3).unwrap();
h.record(12345).unwrap();

// limit scope of mutable borrow of `buf`
{
    let now = time::SystemTime::now();
    let mut log_writer = interval_log::IntervalLogWriterBuilder::new()
        .add_comment("Comments are great")
        .with_start_time(now)
        .begin_log_with(&mut buf, &mut serializer)
        .unwrap();

    log_writer.write_comment(
        "You can have comments anywhere in the log").unwrap();

    log_writer
        .write_histogram(
            &h,
            now.elapsed().unwrap(),
            time::Duration::new(12, 345_678_901),
            interval_log::Tag::new("im-a-tag")
        )
        .unwrap();
}

// `buf` is now full of stuff; we check for the first line
assert_eq!("#Comments are great\n", &str::from_utf8(&buf).unwrap()[0..20]);

Structs

IntervalLogHistogram	An individual interval histogram parsed from an interval log.
IntervalLogIterator	Parse interval logs.
IntervalLogWriter	Writes interval histograms in an interval log.
IntervalLogWriterBuilder	Prepare an `IntervalLogWriter`.
Tag	A tag for an interval histogram.

Enums

IntervalLogWriterError	Errors that can occur while writing a log.
LogEntry	Represents one non-comment line in an interval log.
LogIteratorError	Errors that occur when parsing an interval log.