Crate persistence
persistence – mutable resizable arrays built on top of mmap
This Rust library provides MmapedVec, a resizable, mutable array type implemented on top of mmap(), providing a Vec-like data structure with persistence to disk built into it.
MmapedVec is aimed at developers who wish to write software utilizing data-oriented design techniques in run-time environments where all of the following hold true:
- You have determined that a Vec-like data structure is appropriate for some or all of your data, and
- You require that the data in question be persisted to disk, and
- You require that the data in question be synced to disk at certain times or intervals, after said data has been mutated (added to, deleted from, or altered), such that abnormal termination of your program (e.g. program crash, loss of power, etc.) incurs minimal loss of data, and
- You are confident that all processes which rely on the data on disk honor the advisory locks that this library applies to the files, so that the integrity of the data is ensured, and
- You desire, or at least are fine with, having the on-disk representation of your data be the same as that which it has in memory, and understand that this means that the files are tied to the CPU architecture of the host that they were saved to disk on. If you need to migrate your data to another computer with a different CPU architecture in the future, you convert it then, rather than serializing and deserializing your data between some other format and the in-memory representation all of the time.
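The last point can be made concrete with a small sketch. It does not use this library's API; the Sample type and the helper functions are hypothetical and exist only for illustration. It encodes a record in the host's native byte order, then decodes the same bytes the way a host of the opposite endianness would, which yields different values. Byte order is only one of the architecture-dependent aspects; alignment and padding are ignored here.

```rust
/// Hypothetical record type, used only for illustration (not part of this crate).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Sample {
    id: u32,
    value: u16,
}

/// Encode field by field in the host's native byte order (padding ignored).
fn to_native_bytes(s: &Sample) -> [u8; 6] {
    let mut out = [0u8; 6];
    out[..4].copy_from_slice(&s.id.to_ne_bytes());
    out[4..].copy_from_slice(&s.value.to_ne_bytes());
    out
}

/// Decode on a host with the same byte order as the one that wrote the bytes.
fn from_native_bytes(b: &[u8; 6]) -> Sample {
    Sample {
        id: u32::from_ne_bytes([b[0], b[1], b[2], b[3]]),
        value: u16::from_ne_bytes([b[4], b[5]]),
    }
}

/// Decode the way a host of the opposite endianness would read the same bytes.
fn from_foreign_bytes(b: &[u8; 6]) -> Sample {
    Sample {
        id: u32::from_ne_bytes([b[0], b[1], b[2], b[3]]).swap_bytes(),
        value: u16::from_ne_bytes([b[4], b[5]]).swap_bytes(),
    }
}

fn main() {
    let original = Sample { id: 1, value: 2 };
    let bytes = to_native_bytes(&original);

    // Same architecture: the round trip is lossless and needs no conversion step.
    assert_eq!(from_native_bytes(&bytes), original);

    // Opposite byte order: the same bytes decode to different values.
    let misread = from_foreign_bytes(&bytes);
    assert_ne!(misread, original);
    println!("native: {:?}, misread: {:?}", original, misread);
}
```

This is why a one-off conversion is needed when moving files between hosts with different architectures, while day-to-day reads and writes on the same host need none.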
Advisory locks
This library makes use of BSD flock() advisory locks on Unix platforms (Linux, macOS, FreeBSD, etc.).
Provided that your software runs in an environment where every process that attempts to open the files you are persisting your data to honors the advisory locks, everything will be fine and dandy :)
Motivation
Data persistence is achievable by many different means. No one solution fits all (and this library is no exception to that).
Some of the ways in which data persistence can be achieved include:
- Relying on a relational database such as PostgreSQL.
- Making use of the Serde framework for serializing and deserializing Rust data structures, and handling writing to and reading from disk yourself.
But, in software architecture situations where you choose to apply the data-oriented design paradigm to your problem, you may find that you end up with some big arrays of data where you’ve ordered the elements of each array in such a way as to be optimized for CPU caches in terms of spatial locality of reference.
When that is the case – when you have those kinds of arrays, and when you want to persist the data in those arrays in the manner we talked about at the beginning of this document, mmap()’ing those arrays to files on disk begins to look pretty alluring, doesn’t it? And there you have it, that was the motivation for writing this library.
What this library is, and what it is not
This library helps you out when you have arrays of data that are being mutated at run-time, and you need to sync the data to disk for persistence at certain points or intervals in time. It does so by making use of mmap() (through the memmap crate) with a little bit of locking and data validation sprinkled on top.
What this library is not is something that “gives you” data-oriented design. Indeed, there can be no such thing:
A big misunderstanding for many new to the data-oriented design paradigm, a concept brought over from abstraction based development, is that we can design a static library or set of templates to provide generic solutions to everything presented in this book as a data-oriented solution. Much like with domain driven design, data-oriented design is product and work-flow specific. You learn how to do data-oriented design, not how to add it to your project. The fundamental truth is that data, though it can be generic by type, is not generic in how it is used.
Caveats or, some things to keep in mind
TODO: Write about how to use the library correctly.
READY? LET’S GO!
Add the persistence crate to the [dependencies] section of your Cargo.toml manifest and start using this library in your projects.
Star me on GitHub
Don’t forget to star persistence on GitHub if you find this library interesting or useful.