Walcraft
Walcraft is a Write-Ahead Log (WAL) solution for concurrent environments. The library achieves high performance by combining an in-memory buffer with append-only logs. Logs are stored across multiple files, and the oldest files are deleted to save space.
Features
- Awesome crate name
- Simple to use and customize
- Configurable storage limit
- Configurable page size
- CRC32 checksums for data integrity
- fsync support
- High write throughput
- Built for concurrent and parallel environments
- Prevents write amplification for high frequency writes
- Automatically syncs logs with the disk (default: every 100ms)
- Bring your own serialization format
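The automatic sync listed above works together with the flush rule described under Useful tips: buffered data goes to disk when a page fills up or when the sync interval elapses. The sketch below models only that decision, using a hypothetical SyncPolicy type that is not part of walcraft. Only the 100 ms default comes from this README.

```rust
use std::time::{Duration, Instant};

/// Hypothetical policy deciding when buffered log bytes should be written out.
/// It mirrors the two triggers described in the README: a full page, or an
/// elapsed sync interval (100 ms by default).
pub struct SyncPolicy {
    pub page_size: usize,
    pub sync_interval: Duration,
    last_sync: Instant,
}

impl SyncPolicy {
    pub fn new(page_size: usize, sync_interval: Duration, now: Instant) -> Self {
        SyncPolicy { page_size, sync_interval, last_sync: now }
    }

    /// True if there is buffered data and either a full page has accumulated
    /// or the sync interval has elapsed since the last sync.
    pub fn should_sync(&self, buffered: usize, now: Instant) -> bool {
        if buffered == 0 {
            return false;
        }
        buffered >= self.page_size || now.duration_since(self.last_sync) >= self.sync_interval
    }

    /// Record that a sync happened at `now`, restarting the interval.
    pub fn mark_synced(&mut self, now: Instant) {
        self.last_sync = now;
    }
}
```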
Initialization
Builder Pattern (Recommended)
The builder pattern allows for complete customization of the WAL instance.
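To illustrate the builder pattern itself, here is a self-contained sketch. WalConfig, WalConfigBuilder and their setters are hypothetical names, not walcraft's actual API; only the defaults (4 KB pages, fsync off, 100 ms sync interval) come from this README.

```rust
use std::path::PathBuf;
use std::time::Duration;

/// Hypothetical configuration produced by the builder (not walcraft's type).
#[derive(Debug, Clone, PartialEq)]
pub struct WalConfig {
    pub location: PathBuf,
    pub storage_size_mb: Option<u64>,
    pub page_size: usize,
    pub fsync: bool,
    pub sync_interval: Duration,
}

/// Hypothetical builder: each setter consumes `self` and returns it,
/// so calls can be chained and unset fields keep their defaults.
pub struct WalConfigBuilder {
    config: WalConfig,
}

impl WalConfigBuilder {
    pub fn new(location: impl Into<PathBuf>) -> Self {
        WalConfigBuilder {
            config: WalConfig {
                location: location.into(),
                storage_size_mb: None,
                // README default: 4 KB
                page_size: 4 * 1024,
                // README default: fsync disabled
                fsync: false,
                // README default: sync every 100 ms
                sync_interval: Duration::from_millis(100),
            },
        }
    }

    pub fn storage_size_mb(mut self, mb: u64) -> Self {
        self.config.storage_size_mb = Some(mb);
        self
    }

    pub fn page_size(mut self, bytes: usize) -> Self {
        self.config.page_size = bytes;
        self
    }

    pub fn fsync(mut self, enabled: bool) -> Self {
        self.config.fsync = enabled;
        self
    }

    pub fn sync_interval(mut self, interval: Duration) -> Self {
        self.config.sync_interval = interval;
        self
    }

    pub fn build(self) -> WalConfig {
        self.config
    }
}
```

Taking self by value in every setter is what makes the calls chainable while leaving untouched fields at their defaults.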
Direct Initialization
This method only lets you set the location and the storage size (in MB). The buffer size defaults to 4 KB and fsync is disabled.
use walcraft::Wal;
Usage
Writing logs
use walcraft::Wal;
// Log to write
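The write path described in the introduction, an in-memory buffer in front of an append-only log, can be sketched with the standard library alone. This PageBuffer is hypothetical, not walcraft's writer, and it assumes each record is framed with a 4-byte little-endian length prefix; walcraft's on-disk record format is not described in this README.

```rust
use std::io::{self, Write};

/// Hypothetical append-only writer: records are collected in memory and only
/// written to the sink once the next record would overflow the page, or when
/// `flush` is called. Batching avoids one disk write per small record.
pub struct PageBuffer<W: Write> {
    sink: W,
    page_size: usize,
    buf: Vec<u8>,
}

impl<W: Write> PageBuffer<W> {
    pub fn new(sink: W, page_size: usize) -> Self {
        PageBuffer { sink, page_size, buf: Vec::with_capacity(page_size) }
    }

    /// Append one record framed as [u32 little-endian length][bytes].
    pub fn append(&mut self, record: &[u8]) -> io::Result<()> {
        let framed = 4 + record.len();
        if framed > self.page_size {
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "record larger than a page"));
        }
        // Write out the current page first if this record would not fit in it.
        if self.buf.len() + framed > self.page_size {
            self.flush()?;
        }
        self.buf.extend_from_slice(&(record.len() as u32).to_le_bytes());
        self.buf.extend_from_slice(record);
        Ok(())
    }

    /// Write any buffered bytes to the sink. Data still buffered when the
    /// process exits is lost, which is why flushing before shutdown matters.
    pub fn flush(&mut self) -> io::Result<()> {
        if !self.buf.is_empty() {
            self.sink.write_all(&self.buf)?;
            self.buf.clear();
        }
        self.sink.flush()
    }

    pub fn buffered(&self) -> usize {
        self.buf.len()
    }

    pub fn into_inner(self) -> W {
        self.sink
    }
}
```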
Reading logs
use walcraft::Wal;
// Log to read
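Reading is the mirror image of writing. Assuming the same hypothetical 4-byte little-endian length prefix per record, the sketch below recovers every complete record from a byte buffer and stops at a truncated tail record, such as one left behind by a crash mid-write. It does not reproduce walcraft's LogEntry type or file format.

```rust
/// Hypothetical decoder for records framed as [u32 LE length][payload].
/// Returns every complete record; a truncated record at the end
/// (for example, a write interrupted by a crash) is ignored.
pub fn decode_records(mut bytes: &[u8]) -> Vec<Vec<u8>> {
    let mut records = Vec::new();
    while bytes.len() >= 4 {
        let len = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]) as usize;
        let rest = &bytes[4..];
        if rest.len() < len {
            break; // incomplete tail record
        }
        records.push(rest[..len].to_vec());
        bytes = &rest[len..];
    }
    records
}
```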
Limiting the size of logs
The Wal::new method accepts two arguments. The first is the directory where logs will be stored.
The second (optional) argument is the preferred amount of storage, in MB, that the logs may occupy.
Once the log files exceed this limit, the oldest logs are deleted in chunks to free up space.
use walcraft::Wal;
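The retention rule above can be expressed as a pure function that picks which files to delete. This is a hypothetical sketch: it assumes files carry an increasing sequence number, it deletes whole files rather than walcraft's unspecified chunks, and it never deletes the newest file because that is the one being appended to.

```rust
/// A log file on disk, identified by a hypothetical increasing sequence number.
#[derive(Debug, Clone, PartialEq)]
pub struct LogFile {
    pub seq: u64,
    pub size_bytes: u64,
}

/// Returns the sequence numbers of the oldest files to delete so that the
/// remaining files fit within `limit_mb`. The newest file is always kept.
pub fn files_to_delete(files: &[LogFile], limit_mb: u64) -> Vec<u64> {
    let limit = limit_mb * 1024 * 1024;
    let mut sorted: Vec<&LogFile> = files.iter().collect();
    sorted.sort_by_key(|f| f.seq); // oldest first

    let mut total: u64 = sorted.iter().map(|f| f.size_bytes).sum();
    let mut doomed = Vec::new();
    // Only the files before the newest one are candidates for deletion.
    let candidates = sorted.len().saturating_sub(1);
    for f in sorted.iter().take(candidates) {
        if total <= limit {
            break; // back under the limit
        }
        total -= f.size_bytes;
        doomed.push(f.seq);
    }
    doomed
}
```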
Breaking Changes in version 0.3
We have introduced significant changes in version 0.3 that are not backward compatible with previous versions. As a result, logs created by older versions of the library cannot be read by the new version. The key differences are:
Paged File Layout
The WAL file is now divided into fixed-size pages, and data is managed per page rather than as one continuous byte stream. This layout improves consistency checks, corruption detection and recovery, and internal organization, but it makes files generated by older versions unreadable.
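To make the fixed-size page idea concrete, here is a hypothetical page encoder and decoder. It assumes an 8-byte header (a 4-byte checksum followed by a 4-byte payload length), which is consistent with the PAGE_SIZE - 8 maximum log size mentioned under Useful tips, but the README does not document walcraft's actual header layout. The checksum function is passed in so the sketch stays self-contained.

```rust
/// Hypothetical page header size: 4-byte checksum + 4-byte payload length.
pub const HEADER_LEN: usize = 8;

/// Encode `payload` into exactly `page_size` bytes:
/// [checksum u32 LE][len u32 LE][payload][zero padding].
/// Returns None if the payload does not fit, i.e. len > page_size - 8.
pub fn encode_page(payload: &[u8], page_size: usize, checksum: fn(&[u8]) -> u32) -> Option<Vec<u8>> {
    if payload.len() > page_size.checked_sub(HEADER_LEN)? {
        return None;
    }
    let mut page = vec![0u8; page_size];
    page[0..4].copy_from_slice(&checksum(payload).to_le_bytes());
    page[4..8].copy_from_slice(&(payload.len() as u32).to_le_bytes());
    page[HEADER_LEN..HEADER_LEN + payload.len()].copy_from_slice(payload);
    Some(page)
}

/// Decode a page, returning None if it is malformed or its checksum does not
/// match (a reader would then skip the page).
pub fn decode_page<'a>(page: &'a [u8], checksum: fn(&[u8]) -> u32) -> Option<&'a [u8]> {
    if page.len() < HEADER_LEN {
        return None;
    }
    let stored = u32::from_le_bytes([page[0], page[1], page[2], page[3]]);
    let len = u32::from_le_bytes([page[4], page[5], page[6], page[7]]) as usize;
    let payload = page.get(HEADER_LEN..HEADER_LEN + len)?;
    if checksum(payload) != stored {
        return None;
    }
    Some(payload)
}
```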
Binary Data Only
All data is now handled strictly as binary blobs (&[u8]): the append and read APIs both take and return raw bytes. This change is intended to improve performance and reduce dependency overhead. It also removes the tight coupling with the serde library, giving you more flexibility in how data is serialized and deserialized.
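Because the WAL only ever sees &[u8], the encoding is entirely up to you. As one example of bringing your own format without serde, the sketch below hand-encodes a hypothetical Log struct as fixed-width little-endian fields. Any other scheme (for instance serde with a binary or JSON format) looks the same from the WAL's point of view: bytes in, bytes out.

```rust
/// Hypothetical log record; the WAL itself would only ever see its bytes.
#[derive(Debug, Clone, PartialEq)]
pub struct Log {
    pub id: u64,
    pub value: f64,
}

impl Log {
    /// Encode as 16 bytes: id (u64 LE) followed by value (f64 LE).
    pub fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(16);
        out.extend_from_slice(&self.id.to_le_bytes());
        out.extend_from_slice(&self.value.to_le_bytes());
        out
    }

    /// Decode from exactly 16 bytes; returns None on a length mismatch.
    pub fn from_bytes(bytes: &[u8]) -> Option<Log> {
        if bytes.len() != 16 {
            return None;
        }
        let mut id = [0u8; 8];
        let mut value = [0u8; 8];
        id.copy_from_slice(&bytes[0..8]);
        value.copy_from_slice(&bytes[8..16]);
        Some(Log { id: u64::from_le_bytes(id), value: f64::from_le_bytes(value) })
    }
}
```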
Mandatory CRC32 Checksums
Every page now includes a CRC32 checksum that is computed on write and validated on read. Pages that fail validation are automatically skipped during iteration. This is always enabled and cannot be turned off.
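For reference, the widely used CRC-32 variant (IEEE, reflected polynomial 0xEDB88320) can be computed bit by bit as below. The README does not say which CRC-32 variant walcraft uses, so treat that as an assumption. The verified_pages helper, which drops entries whose stored checksum does not match, is likewise hypothetical and only mirrors the documented skip-on-failure behaviour.

```rust
/// Bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320,
/// initial value and final XOR 0xFFFFFFFF).
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All ones if the low bit is set, else zero.
            let mask = (crc & 1).wrapping_neg();
            // Shift right and XOR in the polynomial when the dropped bit was 1.
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Hypothetical reader step: keep only (stored_crc, payload) pairs whose
/// checksum verifies, silently skipping corrupted pages.
pub fn verified_pages<'a>(pages: &[(u32, &'a [u8])]) -> Vec<&'a [u8]> {
    pages
        .iter()
        .filter(|(stored, payload)| crc32(payload) == *stored)
        .map(|&(_, payload)| payload)
        .collect()
}
```

A table-driven implementation would be considerably faster; the bitwise loop is used here only because it is short and easy to check against the standard test vector, crc32("123456789") == 0xCBF43926.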
Convenient Struct Interface
A convenience method, append_struct<T: Serialize>(item: T), serializes your structs automatically.
On reading, the library returns a LogEntry object, which can be converted to a struct with to_struct::<T>() (where T: Deserialize). Alternatively, use the data() method to get the underlying binary data.
Useful tips
- Storage size: The storage size can be adjusted to limit the amount of space the logs can occupy. Once the limit is reached or exceeded, the older logs are deleted to free up space.
- Page size: The default page size is 4 KB. The maximum size of a single log is PAGE_SIZE - 8 bytes. You can set the page size to any value, but it must be a multiple of 4 KB. It is recommended to keep it as small as possible, between 4 KB and 1 MB. The page size is the size of the buffer used to write the logs: the larger the page, the more data can be written at once, but at the cost of higher write amplification and memory usage.
- Fsync: By default, fsync is disabled. You can enable it with the builder pattern. With fsync enabled, data is written to disk before the write operation returns, so it is not lost in case of a power failure. However, fsync significantly reduces the number of writes that can be completed per second.
- Recovery: The library lets you recover the logs at startup. Read them with the .iter() method, which returns an iterator over the stored logs. Calling this method after writing has started results in a panic.
- Flush: The library automatically flushes the logs to disk once a page is full, and periodically as specified by sync_interval. However, it's advisable to call the .flush() method before terminating the program to ensure that no logs are lost.
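The page-size rules in the tips above reduce to a small check. max_log_size and is_recommended are hypothetical helpers, not walcraft functions; they encode the multiple-of-4-KB requirement, the PAGE_SIZE - 8 limit on a single log, and the recommended 4 KB to 1 MB range.

```rust
pub const KB: usize = 1024;

/// Hypothetical page-size validation following the README's rules: the size
/// must be a non-zero multiple of 4 KB, and a single log may use at most
/// PAGE_SIZE - 8 bytes of it. Returns that maximum log size on success.
pub fn max_log_size(page_size: usize) -> Result<usize, String> {
    if page_size == 0 || page_size % (4 * KB) != 0 {
        return Err(format!("page size {} is not a multiple of 4 KB", page_size));
    }
    Ok(page_size - 8)
}

/// The README recommends staying between 4 KB and 1 MB: larger pages batch
/// more data per write but increase memory use and write amplification.
pub fn is_recommended(page_size: usize) -> bool {
    (4 * KB..=1024 * KB).contains(&page_size)
}
```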
Handling Log Versioning
It's important to note that the WAL library does not support versioning of logs. If you need to handle different versions of logs, you will need to implement your own versioning mechanism. One way to do this is to use an enum to represent the different versions of the logs. You can then use the serde library to serialize and deserialize the logs. The following example demonstrates how to handle log versioning using an enum:
use walcraft::Wal;
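The enum-based approach can also be sketched without external crates. This assumes a hypothetical encoding where the first byte of each log is a version tag and the rest is the version-specific payload; with serde, the enum's own tagging plays the same role. VersionedLog, its variants and its fields are illustrative only.

```rust
/// Hypothetical versioned log: every stored log starts with a 1-byte version
/// tag, so old entries stay readable after the struct evolves.
#[derive(Debug, Clone, PartialEq)]
pub enum VersionedLog {
    V1 { id: u32 },
    V2 { id: u64, note: String },
}

impl VersionedLog {
    pub fn encode(&self) -> Vec<u8> {
        let mut out = Vec::new();
        match self {
            VersionedLog::V1 { id } => {
                out.push(1);
                out.extend_from_slice(&id.to_le_bytes());
            }
            VersionedLog::V2 { id, note } => {
                out.push(2);
                out.extend_from_slice(&id.to_le_bytes());
                out.extend_from_slice(note.as_bytes()); // note fills the remaining bytes
            }
        }
        out
    }

    pub fn decode(bytes: &[u8]) -> Option<VersionedLog> {
        let (&tag, rest) = bytes.split_first()?;
        match tag {
            1 if rest.len() == 4 => Some(VersionedLog::V1 {
                id: u32::from_le_bytes([rest[0], rest[1], rest[2], rest[3]]),
            }),
            2 if rest.len() >= 8 => {
                let mut id = [0u8; 8];
                id.copy_from_slice(&rest[..8]);
                let note = String::from_utf8(rest[8..].to_vec()).ok()?;
                Some(VersionedLog::V2 { id: u64::from_le_bytes(id), note })
            }
            _ => None, // unknown version or malformed payload
        }
    }

    /// Upgrade any older version to the current (V2) shape.
    pub fn into_latest(self) -> VersionedLog {
        match self {
            VersionedLog::V1 { id } => VersionedLog::V2 { id: id as u64, note: String::new() },
            latest => latest,
        }
    }
}
```

Decoding old entries into their original variant and then upgrading them with into_latest lets the rest of the application work with a single, current shape.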
Quirks
The WAL can only be in read mode or write mode, not both at the same time.
- Idle: When created, the WAL is in an idle mode.
- Read: Calling the .iter() method switches the WAL to read mode. In this mode, you cannot write data; any write attempts are ignored. Once reading finishes, the WAL automatically reverts to idle mode.
- Write: When you start writing to the WAL, it switches to write mode and cannot switch back to idle or read mode.
This design prevents conflicts between reading and writing. Ideally, you should read the data at startup, as part of the recovery process, before beginning to write.
use walcraft::Wal;
// Log to write
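The Idle/Read/Write rules above form a small state machine. The sketch below models them with hypothetical Mode and ModeGuard types. It reproduces the documented behaviour (writes ignored while reading, automatic return to idle when reading finishes, and a panic if reading is requested after writing has started, as noted under Useful tips) but is not walcraft's implementation.

```rust
/// The three modes described in the Quirks section.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Mode {
    Idle,
    Read,
    Write,
}

/// Hypothetical mode tracker mirroring the documented rules.
pub struct ModeGuard {
    mode: Mode,
}

impl ModeGuard {
    pub fn new() -> Self {
        ModeGuard { mode: Mode::Idle }
    }

    pub fn mode(&self) -> Mode {
        self.mode
    }

    /// Starting an iterator is only allowed before any write has happened.
    pub fn begin_read(&mut self) {
        if self.mode == Mode::Write {
            panic!("cannot read after writing has started");
        }
        self.mode = Mode::Read;
    }

    /// Called when the iterator is exhausted: return to idle.
    pub fn end_read(&mut self) {
        if self.mode == Mode::Read {
            self.mode = Mode::Idle;
        }
    }

    /// Returns true if the write is accepted; writes while reading are ignored.
    pub fn try_write(&mut self) -> bool {
        match self.mode {
            Mode::Read => false,
            Mode::Idle | Mode::Write => {
                self.mode = Mode::Write; // one-way transition
                true
            }
        }
    }
}
```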
Known issues
- None at the moment.