Crate persistence

persistence – mutable resizable arrays built on top of mmap

This Rust library provides MmapedVec, a resizable, mutable array type implemented on top of mmap(). It gives you a Vec-like data structure with persistence to disk built in.

MmapedVec is aimed at developers who wish to write software utilizing data-oriented design techniques in run-time environments where all of the following hold true:

  1. You have determined that a Vec-like data structure is appropriate for some or all of your data, and
  2. You require that the data in question be persisted to disk, and
  3. You require that the data in question be synced to disk at certain times or intervals, after said data has been mutated (added to, deleted from, or altered), such that abnormal termination of your program (e.g. program crash, loss of power, etc.) incurs minimal loss of data, and
  4. You are confident that all processes which rely on the data on disk honor the advisory locks that this library applies to the files, so that the integrity of the data is ensured, and
  5. You desire, or at least are fine with, having the on-disk representation of your data be the same as that which it has in memory, and understand that this means that the files are tied to the CPU architecture of the host that they were saved to disk on. If you need to migrate your data to another computer with a different CPU architecture in the future, you convert it then, rather than serializing and deserializing your data between some other format and the in-memory representation all of the time.
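
Point 5 can be made concrete with a small sketch. Assuming elements are stored as their raw in-memory bytes, the same value produces a different byte sequence on little-endian and big-endian hosts. The helper names `native_bytes` and `host_is_little_endian` are hypothetical and are not part of this crate.

```rust
/// Hypothetical helper: the bytes a `u32` occupies in memory on this host.
pub fn native_bytes(value: u32) -> [u8; 4] {
    value.to_ne_bytes()
}

/// True if raw `u32`s written by this host are stored in little-endian order.
pub fn host_is_little_endian() -> bool {
    native_bytes(1) == 1u32.to_le_bytes()
}
```

On a little-endian host such as x86-64, `native_bytes(0x1122_3344)` yields the bytes 44 33 22 11; a big-endian host would store 11 22 33 44, so a file written on one cannot be read as-is on the other.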

Advisory locks

This library makes use of BSD flock() advisory locks on Unix platforms (Linux, macOS, FreeBSD, etc.).

Provided that your software runs in an environment where every process that attempts to open the files you are persisting your data to honors the advisory locks, everything will be fine and dandy :)
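
The sketch below illustrates what "advisory" means in practice. It declares flock() directly against the C library and does not show this crate's internals; the helper names `try_lock_exclusive` and `unlock` are hypothetical, the flag values are the ones used on Linux, macOS and FreeBSD, and the `unsafe extern` syntax assumes Rust 1.82 or later.

```rust
use std::fs::File;
use std::io;
use std::os::raw::c_int;
use std::os::unix::io::AsRawFd;

// flock(2) operation flags (same values on Linux, macOS and FreeBSD).
const LOCK_EX: c_int = 2; // exclusive lock
const LOCK_NB: c_int = 4; // do not block if the lock is held elsewhere
const LOCK_UN: c_int = 8; // release the lock

unsafe extern "C" {
    fn flock(fd: c_int, operation: c_int) -> c_int;
}

/// Try to take an exclusive advisory lock without blocking.
/// Returns Ok(true) if acquired, Ok(false) if another open handle holds it.
pub fn try_lock_exclusive(file: &File) -> io::Result<bool> {
    // SAFETY: the descriptor is valid for as long as `file` is borrowed.
    let rc = unsafe { flock(file.as_raw_fd(), LOCK_EX | LOCK_NB) };
    if rc == 0 {
        return Ok(true);
    }
    let err = io::Error::last_os_error();
    if err.kind() == io::ErrorKind::WouldBlock {
        Ok(false)
    } else {
        Err(err)
    }
}

/// Release a lock previously taken on `file`.
pub fn unlock(file: &File) -> io::Result<()> {
    // SAFETY: the descriptor is valid for as long as `file` is borrowed.
    let rc = unsafe { flock(file.as_raw_fd(), LOCK_UN) };
    if rc == 0 {
        Ok(())
    } else {
        Err(io::Error::last_os_error())
    }
}
```

A second handle opened on the same file fails to take the lock while the first one holds it. Nothing stops a process that never calls flock() from reading or writing the file, though, which is why every cooperating process has to take the lock.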

Motivation

Data persistence is achievable by many different means. No one solution fits all (and this library is no exception to that).

Some of the ways in which data persistence can be achieved include:

  • Relying on a relational database such as PostgreSQL.
  • Making use of the Serde framework for serializing and deserializing Rust data structures, and handling writing to and reading from disk yourself.

But when you choose to apply the data-oriented design paradigm to your problem, you may find that you end up with some big arrays of data whose elements you have ordered so as to be optimized for CPU caches in terms of spatial locality of reference.

When that is the case – when you have those kinds of arrays, and when you want to persist the data in those arrays in the manner we talked about at the beginning of this document, mmap()’ing those arrays to files on disk begins to look pretty alluring, doesn’t it? And there you have it, that was the motivation for writing this library.

What this library is, and what it is not

This library helps you out when you have arrays of data that are being mutated at run-time, and you need to sync the data to disk for persistence at certain points or intervals in time. It does so by making use of mmap() (through the memmap crate) with a little bit of locking and data validation sprinkled on top.
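
To make the "sync to disk, then validate on load" idea concrete, here is a minimal sketch that uses ordinary file I/O from the standard library instead of mmap(). It is not this crate's API: the function names `save` and `load` are hypothetical, and the length check is only an assumed example of the kind of validation one might apply.

```rust
use std::fs::File;
use std::io::{self, Read, Write};
use std::mem::size_of;
use std::path::Path;

/// Write `values` to `path` as raw native-endian bytes, then sync to disk.
pub fn save(path: &Path, values: &[u64]) -> io::Result<()> {
    let mut file = File::create(path)?;
    for v in values {
        file.write_all(&v.to_ne_bytes())?;
    }
    // Ask the OS to push data and metadata to the storage device.
    file.sync_all()
}

/// Read the values back, rejecting files that end in a partial element.
pub fn load(path: &Path) -> io::Result<Vec<u64>> {
    let mut bytes = Vec::new();
    File::open(path)?.read_to_end(&mut bytes)?;
    if bytes.len() % size_of::<u64>() != 0 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "file length is not a whole number of elements",
        ));
    }
    let mut values = Vec::with_capacity(bytes.len() / size_of::<u64>());
    for chunk in bytes.chunks_exact(size_of::<u64>()) {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        values.push(u64::from_ne_bytes(buf));
    }
    Ok(values)
}
```

With mmap() the copy in both directions goes away, because the array in memory is the file's contents, but the two concerns stay the same: deciding when to sync, and checking that what is on disk is well-formed before trusting it.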

What this library is not is something that “gives you” data-oriented design. Indeed, there can be no such thing:

“A big misunderstanding for many new to the data-oriented design paradigm, a concept brought over from abstraction based development, is that we can design a static library or set of templates to provide generic solutions to everything presented in this book as a data-oriented solution. Much like with domain driven design, data-oriented design is product and work-flow specific. You learn how to do data-oriented design, not how to add it to your project. The fundamental truth is that data, though it can be generic by type, is not generic in how it is used.”

Caveats or, some things to keep in mind

TODO: Write about how to use the library correctly.

READY? LET’S GO!

Add the persistence crate to the [dependencies] section of your Cargo.toml manifest and start using this library in your projects.

Star me on GitHub

Don’t forget to star persistence on GitHub if you find this library interesting or useful.

Structs