whisper 0.1.11

Whisper time series file support
docs.rs failed to build whisper-0.1.11
Please check the build logs for more information.
See Builds for ideas on how to fix a failed build, or Metadata for how to configure docs.rs builds.
If you believe this is docs.rs' fault, open an issue.
Visit the last successful build: whisper-0.1.3

Whisper

A turbo-charged whisper database implementation.

Open tasks

  • WhisperCache
  • Validate whisper-dump.py behavior
  • Aggregations on write
  • SchemaRegistry or similar
  • Validate retention policies in schema
  • Validate WhisperFile when opening
  • tmpfile support in test cases
  • test suite comparing behavior with python implementation
  • WhisperFile advisory locking
  • WhisperFile#write to io::Result (file deleted while app is running removes from cache, etc)
  • Put most #[derive(Debug)] behind feature flag for test mode only?
  • What should we do when we get a 'nan' value in a datagram? Right now it goes to 0.0.
  • Bug: when writing more points than in archive points writes are placed differently than whisper-py (it removes young points and does not write them). see whisper.py __archive_update_many

What is Whisper?

Whisper is fixed-size file format for storing one run of time series measurements. A measurement comes from a single instance of a thing, said as CPU0 on Computer A or Bytes Transmitted on eth0 for Computer B. To measure a system you will end up with multiple whisper files.

The fixed-size of the file means it has a fixed retention, it can only store so many measurements. Each measurement has a timestamp which corresponds to a predetermined location in the file. And when you get to the end of the file the location just wraps around and overwrites data.

Whisper has clear benefits: easy capacity planning, no dynamic allocations. Whisper-files are the simplest way of storing time series data. This simplicity certainly has its tradeoffs but provides the best raw performance and throughput.

Note: if you want a more modern, clustered, appending time series database it is highly recommended you explore InfluxDB. It's still under heavy development but reflects the future we want.

The Python Implemenation

The original whisper system is written in Python, check out the project on github. There's a severe lack of tests, the code has multiple unused variables, and the most interesting parts are large, undocumented methods.

This Rust Implementation

This is actually version 2 of Xavier's reimplementation. The aim is create a small, fast library which can become the kernel of a full graphite implementation. This maintains full backwards compatibility with your existing whisper files.

How is it faster?

  • Staticly compiled to machine code with few dynamic allocations, no garbage collection, and no runtime
  • Uses POSIX facilities for mmap, commonly read pages of files are kept hot in RAM
  • Files are kept open for longer with the read-through WhisperCache

How is the code better?

  • Functionality is broken out to smaller, unit tested rust modules
  • Whisper concepts are encoded in types to avoid programmer error and keep things explicit

How do I use it?

git clone https://github.com/tureus/whisper-mmap and cargo test to verify things work ok.

Simply opening a whisper file:

let path = Path::new("/tmp/blah.wsp").to_path_buf();
let default_specs = vec!["1s:60s".to_string(), "1m:1y".to_string()];
let schema = Schema::new_from_retention_specs(default_specs);

let file = WhisperFile::new(&path, schema).unwrap();
// do things with the file