#![warn(missing_docs)]
#![deny(clippy::unwrap_used)]

/*! # Rust memory mapped vector

This crate provides helpers for building memory mapped data structures.

Sometimes you have to deal with vectors or other data that cannot fit in memory.
Moving them to disk and memory mapping them is a good way to deal with this problem.

## How to use it?

It is quite simple!

```rust
use mmap_vec::MmapVec;

#[derive(Debug, PartialEq, Clone, Copy)]
struct Row {
    id: usize,
    age: u8,
}

let row1 = Row { id: 42, age: 18 };
let row2 = Row { id: 894, age: 99 };

// Create a memory mapped vec 😎
let mut v = MmapVec::new();

// Push can trigger new mmap segment creation, so it can fail.
v.push(row1).unwrap();
v.push(row2).unwrap();

// Check the content
assert_eq!(v[0], row1);
assert_eq!(&v[..], &[row1, row2]);

// Pop content
assert_eq!(v.pop(), Some(row2));
assert_eq!(v.pop(), Some(row1));
```

Check the unit tests for more examples.

## How does it work?

The main idea is to provide a basic `struct Segment`.

This struct provides a constant size memory mapped array of type `T`.
Wrapping `Segment` in a new struct, `MmapVec`, that handles segment growth / shrinking does the trick.
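
The segment-plus-wrapper idea described above can be sketched without any memory mapping.
The snippet below is only an illustration: `ToySegment` and `ToyVec` are hypothetical names, a heap buffer stands in for the mapped file, and the minimum capacity of 4 is an arbitrary assumption.

```rust
// Toy stand-in for a fixed-capacity segment. A heap buffer replaces the
// memory mapped file so the sketch runs without any I/O.
struct ToySegment<T> {
    items: Vec<T>,
    capacity: usize,
}

impl<T> ToySegment<T> {
    fn with_capacity(capacity: usize) -> Self {
        Self { items: Vec::with_capacity(capacity), capacity }
    }

    // Hand the value back when the segment is full instead of growing.
    fn push_within_capacity(&mut self, value: T) -> Result<(), T> {
        if self.items.len() == self.capacity {
            return Err(value);
        }
        self.items.push(value);
        Ok(())
    }
}

// Toy wrapper: swaps in a segment twice as large when the current one is full.
struct ToyVec<T> {
    segment: ToySegment<T>,
}

impl<T> ToyVec<T> {
    fn new() -> Self {
        Self { segment: ToySegment::with_capacity(0) }
    }

    fn push(&mut self, value: T) {
        if self.segment.items.len() == self.segment.capacity {
            // Assumed minimum capacity of 4, chosen only for this sketch.
            let new_capacity = std::cmp::max(self.segment.capacity * 2, 4);
            let mut bigger = ToySegment::with_capacity(new_capacity);
            // Move existing values into the new segment; the old one is dropped.
            bigger.items.append(&mut self.segment.items);
            self.segment = bigger;
        }
        if self.segment.push_within_capacity(value).is_err() {
            unreachable!("the segment was just grown");
        }
    }

    fn len(&self) -> usize {
        self.segment.items.len()
    }

    fn capacity(&self) -> usize {
        self.segment.capacity
    }
}
```

Pushing five values into a fresh `ToyVec` grows it twice, from capacity 0 to 4 and then to 8.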

> Where are the segments stored on disk?

For now, data is stored in `/tmp` under a dedicated folder.

UUID v4 identifiers are generated to avoid name collisions when creating segments.

```bash
❯ ls /tmp/mmap-vec-rs -1
/tmp/mmap-vec-rs/00d977bf-b556-475e-8de5-d35e7baaa39d.seg
/tmp/mmap-vec-rs/6cb81228-9cf3-4918-a3ef-863907b32830.seg
/tmp/mmap-vec-rs/8a86eeaa-1fa8-4535-9e23-6c59e0c9c376.seg
/tmp/mmap-vec-rs/de62bde3-6524-4c4b-b514-24f6a44d6323.seg
```
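
To illustrate the naming concern with the standard library only, here is a rough sketch that swaps the UUID for a process id plus an atomic counter.
This is not what the crate does: `unique_segment_path` is a hypothetical helper, and the scheme is weaker than a UUID v4 because a reused process id could collide with files left over from an earlier run.

```rust
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical helper, not part of this crate: builds a segment path that is
// unique within one process run by combining the process id with a counter.
fn unique_segment_path(dir: &Path) -> PathBuf {
    static NEXT_ID: AtomicU64 = AtomicU64::new(0);
    let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
    dir.join(format!("{}-{}.seg", std::process::id(), id))
}
```

Two consecutive calls with the same folder return two different paths ending in `.seg`.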

> Is segment creation configurable?

Yes! Check out `test_custom_segment_creator::test_custom_segment_builder` for an example.

Segment creation is managed through a trait, so you are free to configure it the way you want.
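
The general shape of a trait-driven strategy with a default type parameter can be sketched as follows.
Everything here is hypothetical (`ToySegmentLocator`, `TmpLocator`, `DataDirLocator`, `ToyStore` and both folders); the real `SegmentBuilder` trait lives in the `segment_builder` module and its exact signature is not reproduced here.

```rust
use std::path::PathBuf;

// Toy stand-in for the builder idea: a strategy object decides where the
// next segment file goes. All names here are illustrative only.
trait ToySegmentLocator: Default {
    fn segment_path(&self, index: usize) -> PathBuf;
}

// Default strategy: everything lands under a fixed temporary folder.
#[derive(Default)]
struct TmpLocator;

impl ToySegmentLocator for TmpLocator {
    fn segment_path(&self, index: usize) -> PathBuf {
        PathBuf::from("/tmp/toy-segments").join(format!("{}.seg", index))
    }
}

// Custom strategy: same contract, different folder.
#[derive(Default)]
struct DataDirLocator;

impl ToySegmentLocator for DataDirLocator {
    fn segment_path(&self, index: usize) -> PathBuf {
        PathBuf::from("/var/data/toy-segments").join(format!("{}.seg", index))
    }
}

// The container is generic over the strategy and defaults to `TmpLocator`.
struct ToyStore<L: ToySegmentLocator = TmpLocator> {
    locator: L,
    created: usize,
}

impl<L: ToySegmentLocator> ToyStore<L> {
    fn new() -> Self {
        Self { locator: L::default(), created: 0 }
    }

    // Ask the strategy for the next path and remember how many were handed out.
    fn next_path(&mut self) -> PathBuf {
        let path = self.locator.segment_path(self.created);
        self.created += 1;
        path
    }
}
```

Writing `ToyStore<DataDirLocator>` instead of plain `ToyStore` is all it takes to switch strategies, which is the same kind of flexibility the builder type parameter gives `MmapVec`.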

> Does this work on Windows?

__Nope__. I am not targeting this OS and would like to keep this crate as simple as possible.

I would also like to keep dependencies to a minimum.

```bash
❯ cargo tree
mmap-vec v0.1.0
├── libc v0.2.147
└── uuid v1.4.1
    └── getrandom v0.2.10
        ├── cfg-if v1.0.0
        └── libc v0.2.147
[dev-dependencies]
└── glob v0.3.1
```

> Is this crate production ready?

Check TODO and DONE below for this 😁.

## TODO & DONE

- [ ] __production ready__ base code
- [x] Unit tests
- [x] Doc
- [x] Configurable segment path creation
- [ ] Serde support
- [x] CI
- [ ] Crate deployment

## Ideas?

- Implement a custom `std::alloc::Allocator` to use with `std::vec::Vec`
 */

use std::{
    io, mem,
    ops::{Deref, DerefMut},
};

pub use segment::Segment;
pub use segment_builder::{DefaultSegmentBuilder, SegmentBuilder};
pub use vec_builder::MmapVecBuilder;

use crate::utils::page_size;

mod segment;
mod segment_builder;
mod utils;
mod vec_builder;

/// A disk memory mapped vector.
#[derive(Debug)]
pub struct MmapVec<T, B: SegmentBuilder = DefaultSegmentBuilder> {
    pub(crate) segment: Segment<T>,
    pub(crate) builder: B,
}

impl<T, B: SegmentBuilder> MmapVec<T, B> {
    /// Create a zero size mmap vec.
    pub fn new() -> Self {
        Self {
            segment: Segment::null(),
            builder: B::default(),
        }
    }

    /// Create a mmap vec with a given capacity.
    ///
    /// This function can fail if a FS / IO call fails.
    pub fn with_capacity(capacity: usize) -> io::Result<Self> {
        MmapVecBuilder::new().capacity(capacity).try_build()
    }

    /// Current vec capacity, i.e. the number of elements the current segment can hold.
    #[inline]
    pub fn capacity(&self) -> usize {
        self.segment.capacity()
    }

    /// Shortens the vec, keeping the first `new_len` elements and dropping
    /// the rest.
    pub fn truncate(&mut self, new_len: usize) {
        self.segment.truncate(new_len);
    }

    /// Clears the vec, removing all values.
    pub fn clear(&mut self) {
        self.segment.clear();
    }

    /// Remove the last value of the vec.
    ///
    /// The value is returned if the vec is not empty.
    #[inline]
    pub fn pop(&mut self) -> Option<T> {
        self.segment.pop()
    }

    /// Append a value to the vec.
    ///
    /// If the vec is too small, a new segment is created and the existing
    /// data is moved to it.
    ///
    /// This is why this function can fail: it depends on FS / IO calls.
    pub fn push(&mut self, value: T) -> io::Result<()> {
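        // Check if we need to grow the inner segment.
        if self.segment.len() == self.segment.capacity() {
            // Guard against zero sized types (division by zero) and against types
            // larger than a page (which would give a minimum capacity of 0).
            let min_capacity = std::cmp::max(page_size() / mem::size_of::<T>().max(1), 1);
            let new_capacity = std::cmp::max(self.segment.capacity() * 2, min_capacity);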
            let new_segment = self.builder.create_new_segment::<T>(new_capacity)?;
            debug_assert!(new_segment.capacity() > self.segment.capacity());

            // Copy previous data to new segment.
            let old_segment = mem::replace(&mut self.segment, new_segment);
            self.segment.fill_from(old_segment);
        }

        // Add new value to vec.
        if self.push_within_capacity(value).is_err() {
            panic!("Failed to push to newly created segment")
        }

        Ok(())
    }

    /// Try to push a new value to the vec.
    ///
    /// If the vec is too small, the value is returned as an `Err`.
    #[inline]
    pub fn push_within_capacity(&mut self, value: T) -> Result<(), T> {
        self.segment.push_within_capacity(value)
    }
}

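// These impls are generic over the builder so that vecs created with a
// custom `SegmentBuilder` also get `Default` and slice access.
impl<T, B: SegmentBuilder> Default for MmapVec<T, B> {
    fn default() -> Self {
        Self::new()
    }
}

impl<T, B: SegmentBuilder> Deref for MmapVec<T, B> {
    type Target = [T];

    fn deref(&self) -> &Self::Target {
        self.segment.deref()
    }
}

impl<T, B: SegmentBuilder> DerefMut for MmapVec<T, B> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        self.segment.deref_mut()
    }
}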