- You have determined that a Vec-like data structure is appropriate for some or all of your data, and
- You require that the data in question be persisted to disk, and
- You require that the data in question be synced to disk at certain times or intervals, after said data has been mutated (added to, deleted from, or altered), such that abnormal termination of your program (e.g. program crash, loss of power, etc.) incurs minimal loss of data, and
- You are confident that all processes which rely on the data on disk honor the advisory locks that this library applies to the files, so that the integrity of the data is ensured, and
- You desire, or at least are fine with, having the on-disk representation of your data be the same as that which it has in memory, and understand that this means that the files are tied to the CPU architecture of the host that they were saved to disk on. If you need to migrate your data to another computer with a different CPU architecture in the future, you convert it then, rather than serializing and deserializing your data between some other format and the in-memory representation all of the time.
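To make the last point concrete, here is a small sketch of what "tied to the CPU architecture" means for a single integer. It is not part of this library; the function names are made up for illustration, and it only covers byte order (struct padding and pointer width can differ between architectures too).

```rust
/// The bytes a little-endian host (e.g. x86-64) would have on disk for `value`
/// when its in-memory representation is written out unchanged.
pub fn stored_on_little_endian(value: u32) -> [u8; 4] {
    value.to_le_bytes()
}

/// What a big-endian host would see if it mapped those same bytes
/// and read them as its own native u32.
pub fn read_on_big_endian(bytes: [u8; 4]) -> u32 {
    u32::from_be_bytes(bytes)
}

/// The one-time conversion step you would run when migrating the file
/// between hosts of opposite byte order.
pub fn convert_after_migration(misread: u32) -> u32 {
    misread.swap_bytes()
}
```

Reading the value 1 this way yields 0x01000000 on the big-endian side until the conversion is applied, which is why the migration is a deliberate, one-off step rather than something that happens on every load.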
This library makes use of BSD flock() advisory locks on Unix platforms (Linux, macOS,
Provided that your software runs in an environment where every process that attempts to open the files you are persisting your data to honors the advisory locks, everything will be fine and dandy :)
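As an illustration of what "honoring the advisory locks" looks like from another process's point of view, here is a minimal sketch that takes an exclusive, non-blocking flock() on a file. It is not this library's API: `open_exclusive` is a hypothetical helper, the extern declaration and constants are written by hand so that no extra crate is needed, and the constant values are the ones used on Linux and macOS.

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::os::raw::c_int;
use std::os::unix::io::AsRawFd;
use std::path::Path;

// Hand-written binding to the C library's flock(2).
// `unsafe extern` needs Rust 1.82 or newer.
unsafe extern "C" {
    fn flock(fd: c_int, operation: c_int) -> c_int;
}

// Values as defined on Linux and macOS.
const LOCK_EX: c_int = 2; // exclusive lock
const LOCK_NB: c_int = 4; // fail instead of blocking

/// Hypothetical helper: opens `path` (creating it if needed) and takes an
/// exclusive advisory lock, returning an error if someone else holds one.
/// The lock is released when the returned `File` is dropped.
pub fn open_exclusive(path: &Path) -> io::Result<File> {
    let file = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .open(path)?;
    let rc = unsafe { flock(file.as_raw_fd(), LOCK_EX | LOCK_NB) };
    if rc == 0 {
        Ok(file)
    } else {
        Err(io::Error::last_os_error())
    }
}
```

A cooperating process calls something like `open_exclusive` and backs off on error. A process that simply opens the file and writes to it is not stopped by the kernel, which is exactly why the locks are called advisory.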
Data persistence is achievable by many different means. No one solution fits all (and this library is no exception to that).
Some of the ways in which data persistence can be achieved include:
- Relying on a relational database such as PostgreSQL.
- Making use of the Serde framework for serializing and deserializing Rust data structures, and handling writing to and reading from disk yourself.
But, in software architecture situations where you choose to apply the data-oriented design paradigm to your problem, you may find that you end up with some big arrays of data where you’ve ordered the elements of each array in such a way as to be optimized for CPU caches in terms of spatial locality of reference.
When that is the case – when you have those kinds of arrays, and when you want to persist the data in those arrays in the manner we talked about at the beginning of this document, mmap()’ing those arrays to files on disk begins to look pretty alluring, doesn’t it? And there you have it, that was the motivation for writing this library.
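For readers who have not met such arrays before, the following sketch shows the kind of layout meant above: one contiguous array per field, so that a loop touching only some fields streams through memory without dragging the others into the cache. The `Particles` type and its fields are invented for this example and have nothing to do with this library's types.

```rust
/// Hypothetical example data, stored as parallel arrays ("struct of arrays").
/// Index `i` in every array refers to the same particle.
pub struct Particles {
    pub pos_x: Vec<f32>,
    pub vel_x: Vec<f32>,
    pub mass: Vec<f32>,
}

impl Particles {
    /// Advances every position by `vel * dt`.
    /// Only `pos_x` and `vel_x` are read; `mass` never enters the cache here.
    pub fn step(&mut self, dt: f32) {
        for (x, v) in self.pos_x.iter_mut().zip(&self.vel_x) {
            *x += *v * dt;
        }
    }
}
```

Each of those arrays holds plain, fixed-size elements, which is what makes mapping them to files on disk a natural fit.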
This library helps you out when you have arrays of data that are being mutated at run-time, and you need to sync the data to disk for persistence at certain points or intervals in time. It does so by making use of mmap() (through the
with a little bit of locking and data validation sprinkled on top.
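To show the underlying mechanism in isolation, here is a rough sketch that maps a file as an array of u64 values, mutates the array in place and then syncs it to disk. It calls the C library directly rather than going through this library or any mmap crate, so none of the names below are this library's API. `bump_counters` is a hypothetical function, the extern declarations and constants are hand-written, and the sketch assumes 64-bit Linux or macOS. It also leaves out the locking and validation mentioned above.

```rust
use std::fs::OpenOptions;
use std::io;
use std::os::raw::{c_int, c_void};
use std::os::unix::io::AsRawFd;
use std::path::Path;
use std::ptr;

// Hand-written bindings (assumes a 64-bit target, where off_t is i64).
// `unsafe extern` needs Rust 1.82 or newer.
unsafe extern "C" {
    fn mmap(addr: *mut c_void, len: usize, prot: c_int, flags: c_int,
            fd: c_int, offset: i64) -> *mut c_void;
    fn msync(addr: *mut c_void, len: usize, flags: c_int) -> c_int;
    fn munmap(addr: *mut c_void, len: usize) -> c_int;
}

const PROT_READ: c_int = 1;
const PROT_WRITE: c_int = 2;
const MAP_SHARED: c_int = 1;
#[cfg(target_os = "linux")]
const MS_SYNC: c_int = 4;
#[cfg(target_os = "macos")]
const MS_SYNC: c_int = 0x0010;

/// Hypothetical example: treats the file at `path` as `len` u64 counters,
/// adds one to each, syncs the change to disk and returns the new values.
pub fn bump_counters(path: &Path, len: usize) -> io::Result<Vec<u64>> {
    let bytes = len * std::mem::size_of::<u64>();
    let file = OpenOptions::new().read(true).write(true).create(true).open(path)?;
    file.set_len(bytes as u64)?; // a new file is extended with zero bytes
    unsafe {
        let addr = mmap(ptr::null_mut(), bytes, PROT_READ | PROT_WRITE,
                        MAP_SHARED, file.as_raw_fd(), 0);
        if addr as isize == -1 {
            return Err(io::Error::last_os_error()); // MAP_FAILED
        }
        // The mapping is page-aligned, so viewing it as u64 values is fine.
        let data = std::slice::from_raw_parts_mut(addr as *mut u64, len);
        for value in data.iter_mut() {
            *value += 1; // an ordinary in-memory write, no serialization step
        }
        let snapshot = data.to_vec();
        // Block until the dirty pages have been written to the file.
        let rc = msync(addr, bytes, MS_SYNC);
        let sync_error = io::Error::last_os_error();
        munmap(addr, bytes);
        if rc != 0 {
            return Err(sync_error);
        }
        Ok(snapshot)
    }
}
```

Calling `bump_counters` twice on the same path returns all ones and then all twos: the second call picks up exactly the bytes the first one left on disk, in the host's native representation.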
What this library is not, is something that “gives you” data-oriented design. Indeed, there can be no such thing:
> A big misunderstanding for many new to the data-oriented design paradigm, a concept brought over from abstraction based development, is that we can design a static library or set of templates to provide generic solutions to everything presented in this book as a data-oriented solution. Much like with domain driven design, data-oriented design is product and work-flow specific. You learn how to do data-oriented design, not how to add it to your project. The fundamental truth is that data, though it can be generic by type, is not generic in how it is used.
TODO: Write about how to use the library correctly.
Don’t forget to star persistence on GitHub if you find this library interesting or useful.