1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545
//! # AppendList: An append-only list that preserves references to its elements //! //! [![Build Status](https://travis-ci.com/danieldulaney/appendlist.svg?branch=master)](https://travis-ci.com/danieldulaney/appendlist) //! //! This list lets you add new things to the end, even if you're holding a reference //! to something already inside it. It also avoids reallocations. //! //! When should you use it? //! //! - You need to insert into a list while its other elements are borrowed //! - You have so much data that `Vec` reallocations are significant //! //! When shouldn't you use it? //! //! - You are storing indices into a list, rather than actual references (just use //! `Vec<T>`) //! - `Vec` reallocations don't matter very much (normally the case!) //! //! What are some features? //! //! - A `push(&self, item: T)` method (you'd expect `push(&mut self, item: T)`) //! - Non-amortized constant-time insertions and indexes (normally insertions are //! *amortized* constant-time, like `Vec`, or indexing is linear-time, like `LinkedList`) //! - You can hold onto references in the list //! - Only 2 lines of unsafe code //! //! ## What does this let me do? //! //! This example fails to compile because `second_element` has a reference into //! `list` when `list.push()` is called. //! //! ```rust compile_fail //! let list: Vec<u32> = (1..=10).collect(); //! //! let second_element = &list[1]; //! //! list.push(11); // Push needs &mut self, but list is already borrowed //! //! assert_eq!(*second_element, 2); // Fails to compile //! ``` //! //! But if you just swap in `AppendList` for `Vec`, everything works! //! //! ```rust //! use appendlist::AppendList; //! //! let list: AppendList<u32> = (1..=10).collect(); //! //! let second_element = &list[1]; //! //! list.push(11); // Push only needs &self, so this works fine //! //! assert_eq!(*second_element, 2); // All OK! //! ``` //! //! ## What's all this about reallocations? //! //! In general, `Vec`s are pretty cool, and you should use them by default. But they //! have a weakness: when you create //! one, it gets created with a finite amount of space. When it runs out of space, //! it needs to reallocate: grab a new hunk of memory (usually twice as big as //! the current one) and copy everything over, then release the old memory. //! //! Reallocations take O(n) time to do: you need to copy all n elements in the list. //! But you don't have to do them very often: only O(log n) reallocations are needed //! to do n insertions (for the current `Vec` implementation). In fact, the //! reallocations are rare enough that if you spread //! them out across all the insertions, they just add a constant extra time. //! This is why the Rust docs say that `Vec` has "O(1) *amortized* //! push" -- it generally takes constant time to push and it occasionally takes linear //! time, but if you spread out those expensive pushes over all the cheap pushes, //! it's still constant. //! //! Reallocations have another issue: any reference to an element inside the `Vec` //! is invalidated. If you have a reference to one of the elements when it gets //! copied over, your reference has no way of knowing its new location and will //! still point to the old location, which is now invalid memory. Using that reference //! would be a use-after-free bug, so Rust forces you to have no references into a //! `Vec` before you push another element on, just in case that push would reallocate. //! //! The `AppendList` solves both issues by keeping a `Vec` of chunks of data. When //! you push a new element on, it goes to the end of the current chunk. If the chunk //! is full, rather than reallocate it, `AppendList` creates a whole new chunk that //! starts off empty. Each //! chunk is double the size of the last chunk, so only O(log n) allocations are //! needed, and each one takes constant time (for most allocators) rather than linear //! time. By eliminating reallocations, an `AppendList` gives you a couple of benefits //! compared to `Vec`: //! //! - You can keep your references while pushing //! - You don't have to pay the occasional linear reallocate-and-copy cost //! //! However, it also comes with some drawbacks: //! //! - Indexing takes longer (you have to index into a chunk, then into your item) //! - CPU cache behavior might be somewhat worse near a chunk boundary (because the //! next chunk isn't generally contiguous) //! //! ## So should I use this crate? //! //! Probably not. //! //! In general, you should just use a `Vec` and keep track of indices rather than //! references. But if keeping references is very important, then this is your solution. use std::cell::{Cell, UnsafeCell}; use std::fmt::{self, Debug}; use std::iter::FromIterator; use std::ops::Index; // Must be a power of 2 const FIRST_CHUNK_SIZE: usize = 16; /// A list that can be appended to while elements are borrowed /// /// This looks like a fairly bare-bones list API, except that it has a `push` /// method that works on non-`mut` lists. It is safe to hold references to /// values inside this list and push a new value onto the end. /// /// Additionally, the list has O(1) index and O(1) push (not amortized!). /// /// For example, this would be illegal with a `Vec`: /// /// ``` /// use appendlist::AppendList; /// /// let list = AppendList::new(); /// /// list.push(1); /// let first_item = &list[0]; /// list.push(2); /// let second_item = &list[1]; /// /// assert_eq!(*first_item, list[0]); /// assert_eq!(*second_item, list[1]); /// ``` /// /// # Implementation details /// /// This section is not necessary to use the API, it just describes the underlying /// allocation and indexing strategies. /// /// The list is a `Vec` of *chunks*. Each chunk is itself a `Vec<T>`. The list /// will fill up a chunk, then allocate a new chunk with its full capacity. /// Because the capacity of a given chunk never changes, the underlying `Vec<T>` /// never reallocates, so references to that chunk are never invalidated. Each /// chunk is twice the size of the previous chunk, so there will never be more /// than O(log(n)) chunks. /// /// Constant-time indexing is achieved because the chunk ID of a particular index /// can be quickly calculated: if the first chunk has size c, index i will be /// located in chunk floor(log2(i + c) - log2(c)). If c is a power of 2, this /// is equivalent to floor(log2(i + c)) - floor(log2(c)), and a very fast floor /// log2 algorithm can be derived from `usize::leading_zeros()`. pub struct AppendList<T> { chunks: UnsafeCell<Vec<Vec<T>>>, len: Cell<usize>, } impl<T> AppendList<T> { /// Wrapper to get the list of chunks immutably fn chunks(&self) -> &[Vec<T>] { unsafe { &*self.chunks.get() } } /// In test builds, check all of the unsafe invariants /// /// In release builds, no-op fn check_invariants(&self) { #[cfg(test)] { if self.len.get() > 0 { // Correct number of chunks assert_eq!(index_chunk(self.len.get() - 1), self.chunks().len() - 1); // Every chunk holds enough items for chunk_id in 0..self.chunks().len() { assert!(chunk_size(chunk_id) <= self.chunks()[chunk_id].capacity()); } // Intermediate chunks are full for chunk_id in 0..self.chunks().len() - 1 { assert_eq!(chunk_size(chunk_id), self.chunks()[chunk_id].len()); } // Last chunk is correct length assert_eq!( self.chunks().last().unwrap().len(), self.len.get() - chunk_start(self.chunks().len() - 1) ); } else { // No chunks assert_eq!(0, self.chunks().len()); } } } /// Create a new `AppendList` pub fn new() -> Self { Self { chunks: UnsafeCell::new(Vec::new()), len: Cell::new(0), } } /// Append an item to the end /// /// Note that this does not require `mut`. pub fn push(&self, item: T) { self.check_invariants(); // Unsafe code alert! // // Preserve the following invariants: // - Only the last chunk may be modified // - A chunk cannot ever be reallocated // - len must reflect the length // // Invariants are checked in the check_invariants method let mut_chunks = unsafe { &mut *self.chunks.get() }; let new_index = self.len.get(); let chunk_id = index_chunk(new_index); if chunk_id < mut_chunks.len() { // We should always be inserting into the last chunk debug_assert_eq!(chunk_id, mut_chunks.len() - 1); // Insert into the appropriate chunk let chunk = &mut mut_chunks[chunk_id]; // The chunk must not be reallocated! Save the pre-insertion capacity // so we can check it later (debug builds only) #[cfg(test)] let prev_capacity = chunk.capacity(); // Do the insertion chunk.push(item); // Check that the capacity didn't change (debug builds only) #[cfg(test)] assert_eq!(prev_capacity, chunk.capacity()); } else { // Need to allocate a new chunk // New chunk should be the immediate next chunk debug_assert_eq!(chunk_id, mut_chunks.len()); // New chunk must be big enough let mut new_chunk = Vec::with_capacity(chunk_size(chunk_id)); debug_assert!(new_chunk.capacity() >= chunk_size(chunk_id)); new_chunk.push(item); mut_chunks.push(new_chunk); } // Increment the length self.len.set(self.len.get() + 1); self.check_invariants(); } /// Get the length of the list pub fn len(&self) -> usize { self.check_invariants(); self.len.get() } /// Get an item from the list, if it is in bounds /// /// Returns `None` if the `index` is out-of-bounds. Note that you can also /// index with `[]`, which will panic on out-of-bounds. pub fn get(&self, index: usize) -> Option<&T> { self.check_invariants(); if index >= self.len() { return None; } let chunk_id = index_chunk(index); let chunk_start = chunk_start(chunk_id); return Some(&self.chunks()[chunk_id][index - chunk_start]); } /// Get an iterator over the list pub fn iter(&self) -> Iter<T> { self.check_invariants(); Iter { list: &self, index: 0, } } } const fn chunk_size(chunk_id: usize) -> usize { // First chunk is FIRST_CHUNK_SIZE, subsequent chunks double each time FIRST_CHUNK_SIZE << chunk_id } const fn chunk_start(chunk_id: usize) -> usize { // This looks like magic, but I promise it works // Essentially, each chunk is the size of the sum of all chunks before // it. Except that the first chunk is different: it "should" be preceded // by a whole list of chunks that sum to its size, but it's not. Therefore, // there's a "missing" set of chunks the size of the first chunk, so // later chunks need to be updated. chunk_size(chunk_id) - FIRST_CHUNK_SIZE } const fn index_chunk(index: usize) -> usize { // This *is* magic floor_log2(index + FIRST_CHUNK_SIZE) - floor_log2(FIRST_CHUNK_SIZE) } #[inline] const fn floor_log2(x: usize) -> usize { const BITS_PER_BYTE: usize = 8; BITS_PER_BYTE * std::mem::size_of::<usize>() - (x.leading_zeros() as usize) - 1 } impl<T> Default for AppendList<T> { fn default() -> Self { Self::new() } } impl<T> Index<usize> for AppendList<T> { type Output = T; fn index(&self, index: usize) -> &Self::Output { self.get(index) .expect("AppendList indexed beyond its length") } } impl<T> FromIterator<T> for AppendList<T> { fn from_iter<I: IntoIterator<Item = T>>(iter: I) -> Self { let list = Self::new(); for item in iter { list.push(item); } list } } impl<'l, T> IntoIterator for &'l AppendList<T> { type Item = &'l T; type IntoIter = Iter<'l, T>; fn into_iter(self) -> Self::IntoIter { self.iter() } } impl<T: PartialEq> PartialEq for AppendList<T> { fn eq(&self, other: &AppendList<T>) -> bool { let mut s = self.iter(); let mut o = other.iter(); loop { match (s.next(), o.next()) { (Some(a), Some(b)) if a == b => {}, (None, None) => return true, _ => return false, } } } } impl<T: Debug> Debug for AppendList<T> { fn fmt(&self, fmt: &mut fmt::Formatter) -> fmt::Result { fmt.debug_list().entries(self.iter()).finish() } } pub struct Iter<'l, T> { list: &'l AppendList<T>, index: usize, } impl<'l, T> Iterator for Iter<'l, T> { type Item = &'l T; fn next(&mut self) -> Option<Self::Item> { let item = self.list.get(self.index); self.index += 1; item } fn size_hint(&self) -> (usize, Option<usize>) { let remaining = self.list.len() - self.index; (remaining, Some(remaining)) } } #[cfg(test)] mod test { use super::*; #[test] fn from_iterator() { let l: AppendList<i32> = (0..100).collect(); for i in 0..100 { assert_eq!(l[i], i as i32); } } #[test] fn iterator() { let l: AppendList<i32> = (0..100).collect(); let mut i1 = l.iter(); let mut i2 = l.into_iter(); for item in 0..100 { assert_eq!(i1.next(), Some(&item)); assert_eq!(i2.next(), Some(&item)); } assert_eq!(i1.next(), None); assert_eq!(i2.next(), None); } #[test] fn equality() { let a = AppendList::new(); let b = AppendList::new(); assert_eq!(a, b); a.push("foo"); assert_ne!(a, b); b.push("foo"); assert_eq!(a, b); a.push("bar"); a.push("baz"); assert_ne!(a, b); } #[test] fn iterator_size_hint() { let l: AppendList<i32> = AppendList::new(); let mut i = l.iter(); assert_eq!(i.size_hint(), (0, Some(0))); l.push(1); assert_eq!(i.size_hint(), (1, Some(1))); l.push(2); assert_eq!(i.size_hint(), (2, Some(2))); i.next(); assert_eq!(i.size_hint(), (1, Some(1))); l.push(3); assert_eq!(i.size_hint(), (2, Some(2))); i.next(); assert_eq!(i.size_hint(), (1, Some(1))); i.next(); assert_eq!(i.size_hint(), (0, Some(0))); } #[test] fn chunk_sizes_make_sense() { assert_eq!(chunk_size(0), FIRST_CHUNK_SIZE); let mut index = 0; for chunk in 0..20 { // Each chunk starts just after the previous one ends assert_eq!(chunk_start(chunk), index); index += chunk_size(chunk); } } #[test] fn index_chunk_matches_up() { for index in 0..1_000_000 { let chunk_id = index_chunk(index); // Each index happens after its chunk start and before its chunk end assert!(index >= chunk_start(chunk_id)); assert!(index < chunk_start(chunk_id) + chunk_size(chunk_id)); } } #[test] fn empty_list() { let n: AppendList<usize> = AppendList::new(); assert_eq!(n.len(), 0); assert_eq!(n.get(0), None); let d: AppendList<usize> = AppendList::default(); assert_eq!(d.len(), 0); assert_eq!(d.get(0), None); } #[test] fn thousand_item_list() { test_big_list(1_000); } #[test] #[ignore] fn million_item_list() { test_big_list(1_000_000); } fn test_big_list(size: usize) { let l = AppendList::new(); let mut refs = Vec::new(); for i in 0..size { assert_eq!(l.len(), i); l.push(i); refs.push(l[i]); assert_eq!(l.len(), i + 1); } for i in 0..size { assert_eq!(Some(&refs[i]), l.get(i)); } } }