# Walcraft
Walcraft is a Write-Ahead Log (WAL) solution for concurrent environments. The library provides high performance by using
an in-memory buffer and append-only logs. The logs are stored in multiple files, and older files are deleted to save
space.
# Features
- Awesome crate name
- Simple to use and customize
- Configurable storage limit
- Configurable page size
- CRC32 checksum for data integrity
- fsync support
- High write throughput
- Built for concurrent and parallel environments
- Prevents write amplification for high frequency writes
- Automatically syncs logs with the disk (default: every 100ms)
- Bring your own serialization format
# Initialization
### Builder Pattern (Recommended)
The builder pattern allows for complete customization of the WAL instance.
```rust
use walcraft::{Size, Wal, WalBuilder};

fn main() {
    // create a WAL with 4 KB page size, 10 GB storage and autosync every 50ms
    let wal: Wal = WalBuilder::new()
        .location("/tmp/logs/wal")
        .page_size(Size::Kb(4))
        .storage_size(Size::Gb(10))
        .sync_interval(50)
        .build()
        .unwrap();

    // create a WAL with 16 KB page size, 250 MB of storage, fsync enabled and autosync disabled
    let wal2: Wal = WalBuilder::new()
        .location("/tmp/logs/wal")
        .storage_size(Size::Mb(250))
        .page_size(Size::Kb(16))
        .sync_interval(0)
        .enable_fsync()
        .build()
        .unwrap();
}
```
### Direct Initialization
This method only lets you set the location and the storage size (in MB).
The buffer (page) size defaults to 4 KB and fsync is disabled.
```rust
use walcraft::Wal;

fn main() {
    // Create a WAL instance with 200 MB of storage
    let wal = Wal::new("/tmp/logs/wal", Some(200)).unwrap();
}
```
# Usage
### Writing logs
```rust
use serde::{Deserialize, Serialize};
use walcraft::Wal;

// Log to write
#[derive(Serialize, Deserialize, Clone)]
struct Log {
    id: usize,
    value: f64,
}

fn main() {
    let log = Log { id: 1, value: 5.6234 };

    // create the WAL
    let wal = Wal::new("./tmp/", None).unwrap();

    // write a struct
    wal.append_struct(log).unwrap();

    // write raw bytes
    wal.append(b"raw binary data").unwrap();

    // write a log in another thread
    let wal2 = wal.clone();
    let handle = std::thread::spawn(move || {
        let log = Log { id: 2, value: 0.45 };
        wal2.append_struct(log).unwrap();
    });

    // keep writing logs in current thread
    let log = Log { id: 3, value: 123.59 };
    wal.append_struct(log).unwrap();

    // wait for the writer thread so that its log is covered by the flush below
    handle.join().unwrap();

    // Flush the logs to the disk manually
    // This happens automatically as well after some time. However, it's advised to
    // run this method before terminating the program to ensure that no logs are lost.
    wal.flush().unwrap();
}
```
### Reading logs
```rust
use serde::{Deserialize, Serialize};
use walcraft::Wal;

// Log to read
#[derive(Serialize, Deserialize, Debug)]
struct Log {
    id: usize,
    value: f64,
}

fn main() {
    let wal = Wal::new("./tmp/", None).unwrap();
    let iterator = wal.iter().unwrap();
    for entry in iterator {
        let raw_log = entry.data(); // read raw bytes
        let log: Log = entry.to_struct().unwrap(); // convert raw bytes to struct
        println!("Log: {:?}", log);
    }
}
```
### Limiting the size of logs
The `Wal::new` method accepts two arguments. The first is the directory where logs will be stored.
The second (optional) argument is the preferred amount of storage, in MB, that the logs may occupy.
Once the storage occupied by log files exceeds this limit, the oldest logs are deleted in chunks
to free up space.
```rust
use walcraft::Wal;

fn main() {
    // Unlimited log storage
    let wal = Wal::new("/tmp/logz", None).unwrap();

    // 500 MB of log storage
    let wal = Wal::new("/tmp/logz", Some(500)).unwrap();

    // 20 GB of log storage
    let wal = Wal::new("/tmp/logz", Some(20_000)).unwrap();
}
```
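The pruning rule described above can be pictured with a small, self-contained sketch. This is not Walcraft's implementation: the `prune` function, the `(file name, size)` bookkeeping and the file names are all hypothetical, and the sketch only models "drop the oldest files until the total is back under the limit".

```rust
use std::collections::VecDeque;

// Hypothetical bookkeeping: (file name, size in bytes), oldest file first.
// Removes files from the front until the total size is within `limit`
// and returns the names of the files that were dropped.
fn prune(files: &mut VecDeque<(String, u64)>, limit: u64) -> Vec<String> {
    let mut total: u64 = files.iter().map(|(_, size)| *size).sum();
    let mut removed = Vec::new();
    while total > limit {
        match files.pop_front() {
            Some((name, size)) => {
                total -= size;
                removed.push(name);
            }
            None => break,
        }
    }
    removed
}

fn main() {
    // Three hypothetical log files of 40 bytes each, limit of 100 bytes
    let mut files: VecDeque<(String, u64)> = VecDeque::new();
    files.push_back(("0001.wal".to_string(), 40));
    files.push_back(("0002.wal".to_string(), 40));
    files.push_back(("0003.wal".to_string(), 40));

    let removed = prune(&mut files, 100);
    println!("removed {:?}, kept {} files", removed, files.len());
}
```

Because whole files are dropped, the space actually used can fall well below the limit right after a prune.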
# Breaking Changes in version 0.3
We have introduced significant changes to the WAL library in version 0.3 that are not backward compatible with
the previous versions. Because of these breaking changes, any logs created by older versions of the WAL library will not
be readable by the new version. The key differences are:
**Paged File Layout**
The WAL file is now divided into fixed-size pages, and data is managed on a per-page basis rather than as a continuous
stream of bytes. This layout improves consistency checks, corruption detection and recovery, and internal organization,
but it makes files generated by older versions unreadable under the new scheme.
**Binary Data Only**
All data is now handled strictly as binary blobs (`&[u8]`). Both the append and read APIs expect and return raw bytes.
This change helps streamline performance and reduce dependency overhead. It reduces the tight coupling with the serde
library and offers more flexibility in how data is serialized and deserialized.
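
Since the WAL only sees bytes, any encoding works. As a minimal sketch, the following hand-rolled codec turns a record into a fixed 16-byte layout and back without serde. The `encode` and `decode` helpers and the layout are hypothetical, chosen for illustration only; the resulting bytes are what you would hand to `append`, and `decode` would be applied to the bytes returned by `data()`.

```rust
use std::convert::TryInto;

// Hypothetical record type, similar to the `Log` struct used in the examples.
#[derive(Debug, PartialEq)]
struct Log {
    id: u64,
    value: f64,
}

// Encode as 16 bytes: `id` followed by `value`, both little-endian.
fn encode(log: &Log) -> Vec<u8> {
    let mut buf = Vec::with_capacity(16);
    buf.extend_from_slice(&log.id.to_le_bytes());
    buf.extend_from_slice(&log.value.to_le_bytes());
    buf
}

// Decode the layout produced by `encode`; returns None on a length mismatch.
fn decode(bytes: &[u8]) -> Option<Log> {
    if bytes.len() != 16 {
        return None;
    }
    let id = u64::from_le_bytes(bytes[..8].try_into().ok()?);
    let value = f64::from_le_bytes(bytes[8..].try_into().ok()?);
    Some(Log { id, value })
}

fn main() {
    let log = Log { id: 1, value: 5.6234 };
    let bytes = encode(&log);
    // The round trip gives back the original record
    assert_eq!(decode(&bytes), Some(log));
}
```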
**Mandatory CRC32 Checksums**
Every page now includes a CRC32 checksum that is computed on write and validated on read. Pages that fail
validation are automatically skipped during iteration. This is always enabled and cannot be turned off.
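
To illustrate the idea, here is a self-contained sketch of computing a CRC32 on write and validating it on read. The `crc32` function is the standard bitwise CRC-32 (IEEE polynomial); the `seal` and `verify` helpers and the "payload followed by a 4-byte checksum" layout are assumptions made for this example and are not Walcraft's actual page format.

```rust
// Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
fn crc32(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // mask is all ones when the lowest bit is set, zero otherwise
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

// Hypothetical layout: payload followed by its checksum (4 bytes, little-endian).
fn seal(payload: &[u8]) -> Vec<u8> {
    let mut page = payload.to_vec();
    page.extend_from_slice(&crc32(payload).to_le_bytes());
    page
}

// Returns the payload if the stored checksum matches, None otherwise.
fn verify(page: &[u8]) -> Option<&[u8]> {
    if page.len() < 4 {
        return None;
    }
    let (payload, tail) = page.split_at(page.len() - 4);
    let stored = u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]);
    if crc32(payload) == stored {
        Some(payload)
    } else {
        None
    }
}

fn main() {
    let mut page = seal(b"hello wal");
    assert!(verify(&page).is_some());

    // Flip a single bit to simulate corruption; validation now fails
    page[0] ^= 0x01;
    assert!(verify(&page).is_none());
}
```

A reader built this way would skip any page for which `verify` returns `None`, which mirrors the skipping behaviour described above.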
**Convenient Struct Interface**
A convenience method `append_struct<T: Serialize>(item: T)` is provided to serialize your structs automatically.
On reading, the library returns a `LogEntry` object, which can be converted to a struct using
`to_struct::<T: Deserialize>()`. Alternatively, you can use the `data()` method to get the underlying binary data.
# Useful tips
- **Storage size**: The storage size can be adjusted to limit the amount of space the logs can occupy. Once the limit is
reached or exceeded, the older logs are deleted to free up space.
- **Page size**: The default page size is 4 KB. The maximum size of a single log is `PAGE_SIZE - 8 bytes`.
You can set the page size to any value, as long as it is a multiple of 4 KB. It is recommended to keep it as
small as possible and between 4 KB and 1 MB. The page size is the size of the buffer that is used to write the logs.
The larger the page size, the more data can be written at once,
but the higher the write amplification and memory usage.
- **Fsync**: By default, fsync is disabled. You can enable it by using the builder pattern. With fsync enabled,
a write operation returns only after the data has been written to the disk, so the data is
not lost in case of a power failure. However, this significantly reduces the number of writes that can complete per second.
- **Recovery**: The library provides a way to recover the logs at startup. The `.iter()`
method returns an iterator that you can use to read the logs. Calling this method after writing has started
results in a panic.
- **Flush**: The library automatically flushes the logs to the disk once the page is filled and periodically as
specified by `sync_interval`. However, it's advised to run the `.flush()` method before terminating the program to
ensure that no logs are lost.
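
The page size rules above can be checked before building a WAL. The sketch below is illustrative only: `validate_page_size` and `max_log_size` are hypothetical helpers, not part of Walcraft, and the code assumes 1 KB means 1024 bytes.

```rust
// Assumption: 1 KB = 1024 bytes
const KB: usize = 1024;
// Per-page overhead implied by the `PAGE_SIZE - 8 bytes` limit
const PAGE_OVERHEAD: usize = 8;

// Hypothetical helper: accept only non-zero multiples of 4 KB.
fn validate_page_size(bytes: usize) -> Result<usize, String> {
    if bytes == 0 || bytes % (4 * KB) != 0 {
        return Err(format!("page size {} is not a multiple of 4 KB", bytes));
    }
    Ok(bytes)
}

// Largest single log that fits in a page of the given size.
fn max_log_size(page_size: usize) -> usize {
    page_size - PAGE_OVERHEAD
}

fn main() {
    let page = validate_page_size(16 * KB).unwrap();
    println!("largest log for a {} byte page: {} bytes", page, max_log_size(page));

    // 6 KB is not a multiple of 4 KB, so it is rejected
    assert!(validate_page_size(6 * KB).is_err());
}
```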
### Handling Log Versioning
The WAL library does not support versioning of logs. If you need to handle different versions of logs, you will
need to implement your own versioning mechanism. One way to do this is to use an enum to represent the
different versions of the logs. You can then use the `serde` library to serialize and deserialize the logs.
The following example demonstrates how to handle log versioning using an enum:
```rust
use serde::{Deserialize, Serialize};
use walcraft::Wal;

#[derive(Serialize, Deserialize, Debug, PartialEq)]
enum Log {
    V1 { id: usize, name: String },
    V2 { id: usize, name: String, age: u8 },
}

fn main() {
    // write logs
    let wal = Wal::new("/tmp/walcraft", Some(100)).unwrap();
    wal.append_struct(Log::V1 {
        id: 1,
        name: "Alice".to_string(),
    })
    .unwrap();
    wal.append_struct(Log::V2 {
        id: 2,
        name: "John".to_string(),
        age: 30,
    })
    .unwrap();
    wal.flush().unwrap();
    drop(wal);

    // read logs
    let wal = Wal::new("/tmp/walcraft", Some(100)).unwrap();
    let iterator = wal.iter().unwrap();
    let logs: Vec<Log> = iterator
        .into_iter()
        .map(|entry| entry.to_struct::<Log>().unwrap())
        .collect();
    assert_eq!(logs.len(), 2);
    assert_eq!(logs[0], Log::V1 { id: 1, name: "Alice".to_string() });
    assert_eq!(logs[1], Log::V2 { id: 2, name: "John".to_string(), age: 30 });
}
```
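Once logs of mixed versions have been recovered, it is often convenient to normalize them to the latest version before use. The sketch below shows one way to do that with the same enum, without serde so that it stands alone. The `upgrade` function and the placeholder `age: 0` for old entries are hypothetical choices for illustration.

```rust
#[derive(Debug, PartialEq)]
enum Log {
    V1 { id: usize, name: String },
    V2 { id: usize, name: String, age: u8 },
}

// Hypothetical migration: V1 entries receive a placeholder age of 0,
// V2 entries pass through unchanged.
fn upgrade(log: Log) -> Log {
    match log {
        Log::V1 { id, name } => Log::V2 { id, name, age: 0 },
        v2 @ Log::V2 { .. } => v2,
    }
}

fn main() {
    // Stand-in for entries read back from the WAL
    let recovered = vec![
        Log::V1 { id: 1, name: "Alice".to_string() },
        Log::V2 { id: 2, name: "John".to_string(), age: 30 },
    ];

    let latest: Vec<Log> = recovered.into_iter().map(upgrade).collect();
    println!("{:?}", latest);
}
```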
### Quirks
The WAL can only be in read mode or write mode, not both at the same time.
- **Idle**: When created, the WAL is in an idle mode.
- **Read**: Calling the `.iter()` method switches the WAL to read mode. In this mode, you cannot write data;
any write attempts are ignored. Once reading finishes, the WAL automatically reverts to idle mode.
- **Write**: When you start writing to the WAL, it switches to write mode and cannot switch back to idle or read mode.
This design prevents conflicts between reading and writing. Ideally, you should read the data at startup, as part of the
recovery process, before beginning to write.
```rust
use serde::{Deserialize, Serialize};
use walcraft::Wal;

// Log to write
#[derive(Serialize, Deserialize, Clone, Debug)]
struct Log {
    id: usize,
    value: f64,
}

fn main() {
    // create an instance of WAL
    let wal = Wal::new("/tmp/logz", Some(2000)).unwrap();

    // recovery: Option A (read all data at once)
    // This method reads all the data at once and should only be used
    // if all the logs, depending on storage size, can fit in the memory
    let all_logs = wal.iter().unwrap().collect::<Vec<_>>();

    // recovery: Option B
    // This method reads data in chunks of page size (default: 4 KB).
    // It is memory efficient and ideal when you have a large number of logs
    for entry in wal.iter().unwrap() {
        let log: Log = entry.to_struct().unwrap();
        dbg!(log);
    }

    // start writing
    wal.append_struct(Log { id: 1, value: 3.14 }).unwrap();
    wal.append(b"raw binary data").unwrap();
}
```
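The mode rules listed above can be modelled as a small state machine. This is a sketch for building intuition, not Walcraft's internals: `Mode` and `ModeTracker` are hypothetical names, and rejecting a second read while one is in progress is an assumption of this sketch.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    Idle,
    Read,
    Write,
}

// Hypothetical tracker that mirrors the idle/read/write rules.
struct ModeTracker {
    mode: Mode,
}

impl ModeTracker {
    fn new() -> Self {
        ModeTracker { mode: Mode::Idle }
    }

    // Reading may only start from idle mode.
    fn start_read(&mut self) -> Result<(), &'static str> {
        match self.mode {
            Mode::Idle => {
                self.mode = Mode::Read;
                Ok(())
            }
            // Assumption of this sketch: nested reads are rejected
            Mode::Read => Err("a read is already in progress"),
            Mode::Write => Err("cannot read after writing has started"),
        }
    }

    // Finishing a read returns the tracker to idle mode.
    fn finish_read(&mut self) {
        if self.mode == Mode::Read {
            self.mode = Mode::Idle;
        }
    }

    // Returns true if the write is accepted. Writes during a read are
    // ignored; the first accepted write locks the tracker in write mode.
    fn write(&mut self) -> bool {
        match self.mode {
            Mode::Read => false,
            Mode::Idle | Mode::Write => {
                self.mode = Mode::Write;
                true
            }
        }
    }
}

fn main() {
    let mut tracker = ModeTracker::new();
    tracker.start_read().unwrap();
    assert!(!tracker.write()); // ignored while reading
    tracker.finish_read();
    assert!(tracker.write()); // now in write mode for good
    assert!(tracker.start_read().is_err());
}
```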
# Known issues
- None at the moment.