wasefire_store/lib.rs
1// Copyright 2019 Google LLC
2//
3// Licensed under the Apache License, Version 2.0 (the "License");
4// you may not use this file except in compliance with the License.
5// You may obtain a copy of the License at
6//
7// http://www.apache.org/licenses/LICENSE-2.0
8//
9// Unless required by applicable law or agreed to in writing, software
10// distributed under the License is distributed on an "AS IS" BASIS,
11// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12// See the License for the specific language governing permissions and
13// limitations under the License.
14
15// The documentation is easier to read from a browser:
16// - Run: cargo doc --document-private-items --features=std
17// - Open: target/doc/store/index.html
18
19//! Store abstraction for flash storage
20//!
21//! # Specification
22//!
23//! The [store](Store) provides a partial function from keys to values on top of a
24//! [storage](Storage) interface. The store total [capacity](Store::capacity) depends on the size of
25//! the storage. Store [updates](StoreUpdate) may be bundled in [transactions](Store::transaction).
26//! Mutable operations are atomic, including when interrupted.
27//!
28//! The store is flash-efficient in the sense that it uses the storage [lifetime](Store::lifetime)
29//! efficiently. For each page, all words are written at least once between erase cycles and all
30//! erase cycles are used. However, not all written words are user content: Lifetime is also
31//! consumed with metadata and compaction.
32//!
33//! The store is extendable with other entries than key-values. It is essentially a framework
34//! providing access to the storage lifetime. The partial function is simply the most common usage
35//! and can be used to encode other usages.
36//!
37//! ## Definitions
38//!
39//! An _entry_ is a pair of a key and a value. A _key_ is a number between 0 and
40//! [4095](format::MAX_KEY_INDEX). A _value_ is a byte slice with a length between 0 and
41//! [1023](format::Format::max_value_len) bytes (for large enough pages).
42//!
43//! The store provides the following _updates_:
44//! - Given a key and a value, [`StoreUpdate::Insert`] updates the store such that the value is
45//! associated with the key. The values for other keys are left unchanged.
46//! - Given a key, [`StoreUpdate::Remove`] updates the store such that no value is associated with
47//! the key. The values for other keys are left unchanged. Additionally, if there was a value
48//! associated with the key, the value is wiped from the storage (all its bits are set to 0).
49//!
50//! The store provides the following _read-only operations_:
51//! - [`Store::iter`] iterates through the store returning all entries exactly once. The iteration
52//! order is not specified but stable between mutable operations.
53//! - [`Store::capacity`] returns how many words can be stored before the store is full.
54//! - [`Store::lifetime`] returns how many words can be written before the storage lifetime is
55//! consumed.
56//!
57//! The store provides the following _mutable operations_:
58//! - Given a set of independent updates, [`Store::transaction`] applies the sequence of updates.
59//! - Given a threshold, [`Store::clear`] removes all entries with a key greater or equal to the
60//! threshold.
61//! - Given a length in words, [`Store::prepare`] makes one step of compaction unless that many
62//! words can be written without compaction. This operation has no effect on the store but may
63//! still mutate its storage. In particular, the store has the same capacity but a possibly
64//! reduced lifetime.
65//!
66//! A mutable operation is _atomic_ if, when power is lost during the operation, the store is either
67//! updated (as if the operation succeeded) or left unchanged (as if the operation did not occur).
68//! If the store is left unchanged, lifetime may still be consumed.
69//!
70//! The store relies on the following _storage interface_:
71//! - It is possible to [read](Storage::read_slice) a byte slice. The slice won't span multiple
72//! pages.
73//! - It is possible to [write](Storage::write_slice) a word slice. The slice won't span multiple
74//! pages.
75//! - It is possible to [erase](Storage::erase_page) a page.
76//! - The pages are sequentially indexed from 0. If the actual underlying storage is segmented, then
77//! the storage layer should translate those indices to actual page addresses.
78//!
79//! The store has a _total capacity_ of C = (N - 1) * (P - 4) - M - 1 words, where:
80//! - P is the number of words per page
81//! - [N](format::Format::num_pages) is the number of pages
82//! - [M](format::Format::max_prefix_len) is the maximum length in words of a value (256 for large
83//! enough pages)
84//!
85//! The capacity used by each mutable operation is given below (a transient word only uses capacity
86//! during the operation):
87//!
88//! | Operation/Update | Used capacity | Freed capacity | Transient capacity |
89//! | ----------------------- | ---------------- | ----------------- | ------------------ |
90//! | [`StoreUpdate::Insert`] | 1 + value length | overwritten entry | 0 |
91//! | [`StoreUpdate::Remove`] | 0 | deleted entry | see below\* |
92//! | [`Store::transaction`] | 0 + updates | 0 + updates | 1 |
93//! | [`Store::clear`] | 0 | deleted entries | 0 |
94//! | [`Store::prepare`] | 0 | 0 | 0 |
95//!
96//! \*0 if the update is alone in the transaction, otherwise 1.
97//!
98//! The _total lifetime_ of the store is below L = ((E + 1) * N - 1) * (P - 2) and above L - M
99//! words, where E is the maximum number of erase cycles. The lifetime is used when capacity is
100//! used, including transiently, as well as when compaction occurs. Compaction frequency and
101//! lifetime consumption are positively correlated to the store load factor (the ratio of used
102//! capacity to total capacity).
103//!
104//! It is possible to approximate the cost of transient words in terms of capacity: L transient
105//! words are equivalent to C - x words of capacity where x is the average capacity (including
106//! transient) of operations.
107//!
108//! ## Preconditions
109//!
110//! The following assumptions need to hold, or the store may behave in unexpected ways:
111//! - A word can be written [twice](Storage::max_word_writes) between erase cycles.
112//! - A page can be erased [E](Storage::max_page_erases) times after the first boot of the store.
113//! - When power is lost while writing a slice or erasing a page, the next read returns a slice
114//! where a subset (possibly none or all) of the bits that should have been modified have been
115//! modified.
116//! - Reading a slice is deterministic. When power is lost while writing a slice or erasing a slice
117//! (erasing a page containing that slice), reading that slice repeatedly returns the same result
118//! (until it is overwritten or its page is erased).
119//! - To decide whether a page has been erased, it is enough to test if all its bits are equal to 1.
120//! - When power is lost while writing a slice or erasing a page, that operation does not count
121//! towards the limits. However, completing that write or erase operation would count towards the
122//! limits, as if the number of writes per word and number of erase cycles could be fractional.
123//! - The storage is only modified by the store. Note that completely erasing the storage is
124//! supported, essentially losing all content and lifetime tracking. It is preferred to use
125//! [`Store::clear`] with a threshold of 0 to keep the lifetime tracking.
126//!
127//! The store properties may still hold outside some of those assumptions, but with an increasing
128//! chance of failure.
129//!
130//! # Implementation
131//!
132//! We define the following constants:
133//! - [E](format::Format::max_page_erases) <= [65535](format::MAX_ERASE_CYCLE) the number of times a
134//! page can be erased.
135//! - 3 <= [N](format::Format::num_pages) < 64 the number of pages in the storage.
136//! - 8 <= P <= 1024 the number of words in a page.
137//! - [Q](format::Format::virt_page_size) = P - 2 the number of words in a virtual page.
138//! - [M](format::Format::max_prefix_len) = min(Q - 1, 256) the maximum length in words of a value.
139//! - [W](format::Format::window_size) = (N - 1) * Q - M the window size.
140//! - [V](format::Format::virt_size) = (N - 1) * (Q - 1) - M the virtual capacity.
141//! - [C](format::Format::total_capacity) = V - N the user capacity.
142//!
143//! We build a virtual storage from the physical storage using the first 2 words of each page:
144//! - The first word contains the number of times the page has been erased.
145//! - The second word contains the starting word to which this page is being moved during
146//! compaction.
147//!
148//! The virtual storage has a length of (E + 1) * N * Q words and represents the lifetime of the
149//! store. (We reserve the last Q + M words to support adding emergency lifetime.) This virtual
150//! storage has a linear address space.
151//!
152//! We define a set of overlapping windows of N * Q words at each Q-aligned boundary. We call i the
153//! window spanning from i * Q to (i + N) * Q. Only those windows actually exist in the underlying
154//! storage. We use compaction to shift the current window from i to i + 1, preserving the content
155//! of the store.
156//!
157//! For a given state of the virtual storage, we define h\_i as the position of the first entry of
158//! the window i. We call it the head of the window i. Because entries are at most M + 1 words, they
159//! can overlap on the next page only by M words. So we have i * Q <= h_i <= i * Q + M . Since there
160//! are no entries before the first page, we have h\_0 = 0.
161//!
162//! We define t\_i as one past the last entry of the window i. If there are no entries in that
163//! window, we have t\_i = h\_i. We call t\_i the tail of the window i. We define the compaction
164//! invariant as t\_i - h\_i <= V and the window invariant as t\_i - h\_i <= W. The compaction
165//! invariant may temporarily be broken during a sequence of (at most N - 1) compactions.
166//!
167//! We define |x| as the capacity used before position x. We have |x| <= x. We define the capacity
168//! invariant as |t\_i| - |h\_i| <= C.
169//!
170//! Using this virtual storage, entries are appended to the tail as long as there is both virtual
171//! capacity to preserve the compaction invariant and capacity to preserve the capacity invariant.
172//! When virtual capacity runs out, the first page of the window is compacted and the window is
173//! shifted.
174//!
175//! Entries are identified by a prefix of bits. The prefix has to contain at least one bit set to
176//! zero to differentiate from the tail. Entries can be one of:
177//! - [Padding](format::ID_PADDING): A word whose first bit is set to zero. The rest is arbitrary.
178//! This entry is used to mark words partially written after an interrupted operation as padding
179//! such that they are ignored by future operations.
180//! - [Header](format::ID_HEADER): A word whose second bit is set to zero. It contains the following
181//! fields:
182//! - A [bit](format::HEADER_DELETED) indicating whether the entry is deleted.
183//! - A [bit](format::HEADER_FLIPPED) indicating whether the value is word-aligned and has all
184//! bits set to 1 in its last word. The last word of an entry is used to detect that an entry
185//! has been fully written. As such it must contain at least one bit equal to zero.
186//! - The [key](format::HEADER_KEY) of the entry.
187//! - The [length](format::HEADER_LENGTH) in bytes of the value. The value follows the header.
188//! The entry is word-aligned if the value is not.
189//! - The [checksum](format::HEADER_CHECKSUM) of the first and last word of the entry.
190//! - [Erase](format::ID_ERASE): A word used during compaction. It contains the
191//! [page](format::ERASE_PAGE) to be erased and a [checksum](format::WORD_CHECKSUM).
192//! - [Clear](format::ID_CLEAR): A word used during the clear operation. It contains the
193//! [threshold](format::CLEAR_MIN_KEY) and a [checksum](format::WORD_CHECKSUM).
194//! - [Marker](format::ID_MARKER): A word used during a transaction. It contains the [number of
195//! updates](format::MARKER_COUNT) following the marker and a [checksum](format::WORD_CHECKSUM).
196//! - [Remove](format::ID_REMOVE): A word used inside a transaction. It contains the
197//! [key](format::REMOVE_KEY) of the entry to be removed and a [checksum](format::WORD_CHECKSUM).
198//!
199//! Checksums are the number of bits equal to 0.
200//!
201//! # Proofs
202//!
203//! ## Compaction
204//!
205//! Let I be a window at which all invariants hold. We will show that the next N - 1 compactions
206//! will preserve the window invariant (the capacity invariant is trivially preserved) after each
207//! compaction. We will also show that after N - 1 compactions, the compaction invariant is
208//! restored.
209//!
210//! We consider all notations on the virtual storage after the full compaction. We will use the |x|
211//! notation although we update the state of the virtual storage. This is fine because compaction
212//! doesn't change the status of an existing word.
213//!
214//! We first show that after each compaction, the window invariant is preserved.
215//!
216//! ```text
217//! for all (1 <= i <= N - 1) t_{I + i} - h_{I + i} <= W
218//! ```
219//!
220//! We assume i between 1 and N - 1.
221//!
222//! One step of compaction advances the tail by how many words were used in the first page of the
223//! window with the last entry possibly overlapping on the next page.
224//!
225//! ```text
226//! for all j t_{j + 1} = t_j + |h_{j + 1}| - |h_j| + 1
227//! ```
228//!
229//! By induction, we have:
230//!
231//! ```text
232//! t_{I + i} = t_I + |h_{I + i}| - |h_I| + i
233//! ```
234//!
235//! We have the following properties:
236//!
237//! ```text
238//! t_I <= h_I + V
239//! |h_{I + i}| - |h_I| <= h_{I + i} - h_I
240//! ```
241//!
242//! Replacing into our previous equality, we can conclude:
243//!
244//! ```text
245//! t_{I + i} = t_I + |h_{I + i}| - |h_I| + i
246//! <= h_I + V + h_{I + 1} - h_I + i
247//! iff
248//! t_{I + i} - h_{I + 1} <= V + i
249//! <= V + N - 1
250//! = W
251//! ```
252//!
253//! An important corollary is that the tail stays within the window:
254//!
255//! ```text
256//! t_{I + i} <= (I + i + N - 1) * Q
257//! ```
258//!
259//! We have the following property:
260//!
261//! ```text
262//! h_{I + i} <= (I + i) * Q + M
263//! ```
264//!
265//! From which we conclude with the definition of W:
266//!
267//! ```text
268//! t_{I + i} <= h_{I + i} + W
269//! <= (I + i) * Q + M + (N - 1) * Q - M
270//! = (I + i + N - 1) * Q
271//! ```
272//!
273//! We finally show that after N - 1 compactions, the compaction invariant is restored. In
274//! particular, the remaining capacity is available without compaction.
275//!
276//! ```text
277//! V - (t_{I + N - 1} - h_{I + N - 1}) >= C - (|t_{I + N - 1}| - |h_{I + N - 1}|) + 1
278//! ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~
279//! immediate capacity remaining capacity |
280//! reserved for clear
281//! ```
282//!
283//! We can replace the definition of C and simplify:
284//!
285//! ```text
286//! V - (t_{I + N - 1} - h_{I + N - 1}) >= V - N - (|t_{I + N - 1}| - |h_{I + N - 1}|) + 1
287//! iff t_{I + N - 1} - h_{I + N - 1} <= |t_{I + N - 1}| - |h_{I + N - 1}| + N - 1
288//! ```
289//!
290//! We have the following properties:
291//!
292//! ```text
293//! t_{I + N - 1} = t_I + |h_{I + N - 1}| - |h_I| + N - 1
294//! |t_{I + N - 1}| - |h_{I + N - 1}| = |t_I| - |h_I|
295//! |h_{I + N - 1}| - |t_I| <= h_{I + N - 1} - t_I
296//! ```
297//!
298//! From which we conclude:
299//!
300//! ```text
301//! t_{I + N - 1} - h_{I + N - 1} <= |t_{I + N - 1}| - |h_{I + N - 1}| + N - 1
302//! iff t_I + |h_{I + N - 1}| - |h_I| + N - 1 - h_{I + N - 1} <= |t_I| - |h_I| + N - 1
303//! iff t_I + |h_{I + N - 1}| - h_{I + N - 1} <= |t_I|
304//! iff |h_{I + N - 1}| - |t_I| <= h_{I + N - 1} - t_I
305//! ```
306//!
307//! ## Checksum
308//!
309//! The main property we want is that all partially written/erased words are either the initial
310//! word, the final word, or invalid.
311//!
312//! We say that a bit sequence `TARGET` is reachable from a bit sequence `SOURCE` if both have the
313//! same length and `SOURCE & TARGET == TARGET` where `&` is the bitwise AND operation on bit
314//! sequences of that length. In other words, when `SOURCE` has a bit equal to 0 then `TARGET` also
315//! has that bit equal to 0.
316//!
317//! The only written entries start with `101` or `110` and are written from an erased word. Marking
318//! an entry as padding or deleted is a single bit operation, so the property trivially holds. For
319//! those cases, the proof relies on the fact that there is exactly one bit equal to 0 in the 3
320//! first bits. Either the 3 first bits are still `111` in which case we expect the remaining bits
321//! to be equal to 1. Otherwise we can use the checksum of the given type of entry because those 2
322//! types of entries are not reachable from each other. Here is a visualization of the partitioning
323//! based on the first 3 bits:
324//!
325//! | First 3 bits | Description | How to check |
326//! | ------------:| ------------------ | ---------------------------- |
327//! | `111` | Erased word | All bits set to `1` |
328//! | `101` | User entry | Contains a checksum |
329//! | `110` | Internal entry | Contains a checksum |
330//! | `100` | Deleted user entry | No check, atomically written |
331//! | `0??` | Padding entry | No check, atomically written |
332//!
333//! To show that valid entries of a given type are not reachable from each other, we show 3 lemmas:
334//!
335//! 1. A bit sequence is not reachable from another if its number of bits equal to 0 is smaller.
336//! 2. A bit sequence is not reachable from another if they have the same number of bits equals to 0
337//! and are different.
338//! 3. A bit sequence is not reachable from another if it is bigger when they are interpreted as
339//! numbers in binary representation.
340//!
341//! From those lemmas we consider the 2 cases. If both entries have the same number of bits equal to
342//! 0, they are either equal or not reachable from each other because of the second lemma. If they
343//! don't have the same number of bits equal to 0, then the one with less bits equal to 0 is not
344//! reachable from the other because of the first lemma and the one with more bits equal to 0 is not
345//! reachable from the other because of the third lemma and the definition of the checksum.
346//!
347//! # Fuzzing
348//!
349//! For any sequence of operations and interruptions starting from an erased storage, the store is
350//! checked against its model and some internal invariant at each step.
351//!
352//! For any sequence of operations and interruptions starting from an arbitrary storage, the store
353//! is checked not to crash.
354
355#![cfg_attr(not(feature = "std"), no_std)]
356
357#[macro_use]
358extern crate alloc;
359
360#[cfg(feature = "std")]
361mod buffer;
362#[cfg(feature = "std")]
363mod driver;
364#[cfg(feature = "std")]
365mod file;
366pub mod format;
367pub mod fragment;
368#[cfg(feature = "std")]
369mod model;
370mod storage;
371mod store;
372#[cfg(test)]
373mod test;
374
375#[cfg(feature = "std")]
376pub use self::buffer::{BufferCorruptFunction, BufferOptions, BufferStorage, INTERRUPTION};
377#[cfg(feature = "std")]
378pub use self::driver::{
379 StoreDriver, StoreDriverOff, StoreDriverOn, StoreInterruption, StoreInvariant,
380};
381#[cfg(feature = "std")]
382pub use self::file::{FileOptions, FileStorage};
383#[cfg(feature = "std")]
384pub use self::model::{StoreModel, StoreOperation};
385pub use self::storage::{Storage, StorageIndex};
386pub use self::store::{
387 INVALID_ARGUMENT, INVALID_STORAGE, NO_CAPACITY, NO_LIFETIME, Store, StoreHandle, StoreIter,
388 StoreRatio, StoreUpdate,
389};
390
391/// Internal representation of natural numbers.
392///
393/// In Rust natural numbers are represented as `usize`. However, internally we represent them as
394/// `u32`. This is done to preserve semantics across different targets. This is useful when tests
395/// run with `usize = u64` while the actual target has `usize = u32`.
396///
397/// To avoid too many conversions between `usize` and `Nat` which are necessary when interfacing
398/// with Rust, `usize` is used instead of `Nat` in code meant only for tests.
399///
400/// Currently, the store only supports targets with `usize = u32`.
401// Make sure production builds have `usize = 32`.
402#[cfg(any(target_pointer_width = "32", feature = "std"))]
403pub type Nat = u32;
404
405/// Returns the internal representation of a Rust natural number.
406///
407/// # Panics
408///
409/// Panics if the conversion overflows.
410fn usize_to_nat(x: usize) -> Nat {
411 Nat::try_from(x).unwrap()
412}