SSTB
An experimental and educational attempt to write a thread-safe SSTable library in Rust.
See the documentation for more details and background.
TODO ([x] means done)
- Prettify and publish benchmark results in the README. For now, one can run `cargo bench` and look at the generated reports.
- `cache=none` does not work: it incorrectly falls back to the default unbounded cache.
- [-] open-source
- write README with badges
- Travis CI tests, etc.
- backtraces in errors
- range queries
- bloom filters on disk
- they slowed things down by 25%, but they work
- the default for the writer's `flush_every` option should depend on the default compression
- read cache size configurable both for page cache and for uncompressed cache
- read cache size should be in bytes, not blocks
- cache cannot be explicitly disabled in some places
- add length to encoded bits
- indexes as separate files; in this case the writer does not need to maintain the index in memory while writing
- remove as much `unsafe` and `unwrap` as possible
- `Mmap` can be put into an `Arc` to remove the unsafe static buffer casts; this should not matter at runtime
- the index can store the number of items and uncompressed length (in case the file is compressed)
- the uncompressed length can be used when allocating memory for uncompressed chunks
- the number of items in the chunk can be used as the `HashMap` capacity IF we bring back the "Block" structure, which avoids scanning the whole chunk every time
- there's a space tradeoff here, so maybe it's all not worth it
- consider getting back the "Block" trait and its implementations
- this would avoid scanning through the chunk on each `get()`
- however, there are costs
- need to allocate HashMaps for lookups
- if length is not known, might even reallocate
- messes up the concurrency as the hashmap becomes the contention point
- an RWLock might help for the majority of the cases
- even if all of this is implemented, there is no guarantee it will end up faster
- use u16 key lengths and u32 value lengths instead of u64, to save space
- mmap with no compression is already multi-threaded, but the API does not reflect that
- zlib bounded and unbounded perform the same in benchmarks
- analyze all casts from u64 to usize
- clippy has a lint for this in its `pedantic` group
- multi-threading
- compression is all over the place
- files and imports are all over the place, reorganize
- fail if keys or values are too long (> `u32::MAX`)
- byte keys
- also, `memchr` is very slow here; it is better to use offsets
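The byte-bounded read cache item above could look roughly like the following sketch: a FIFO-evicting cache bounded by total bytes rather than by number of blocks. Names and structure are illustrative, not the library's API.

```rust
use std::collections::{HashMap, VecDeque};

/// A cache of decompressed chunks, keyed by file offset,
/// bounded by total bytes instead of entry count.
struct ByteBoundedCache {
    map: HashMap<u64, Vec<u8>>,
    order: VecDeque<u64>, // insertion order, for FIFO eviction
    bytes: usize,
    max_bytes: usize,
}

impl ByteBoundedCache {
    fn new(max_bytes: usize) -> Self {
        Self { map: HashMap::new(), order: VecDeque::new(), bytes: 0, max_bytes }
    }

    fn insert(&mut self, offset: u64, chunk: Vec<u8>) {
        self.bytes += chunk.len();
        if let Some(old) = self.map.insert(offset, chunk) {
            // Replaced an existing entry: stop counting its bytes.
            self.bytes -= old.len();
        } else {
            self.order.push_back(offset);
        }
        // Evict oldest entries until the byte budget is respected.
        while self.bytes > self.max_bytes {
            match self.order.pop_front() {
                Some(oldest) => {
                    if let Some(evicted) = self.map.remove(&oldest) {
                        self.bytes -= evicted.len();
                    }
                }
                None => break,
            }
        }
    }

    fn get(&self, offset: u64) -> Option<&[u8]> {
        self.map.get(&offset).map(|v| v.as_slice())
    }
}

fn main() {
    let mut cache = ByteBoundedCache::new(8);
    cache.insert(0, vec![0; 4]);
    cache.insert(100, vec![1; 4]);
    cache.insert(200, vec![2; 4]); // total would be 12 bytes: offset 0 is evicted
    assert!(cache.get(0).is_none());
    assert!(cache.get(100).is_some());
    assert!(cache.get(200).is_some());
    println!("ok");
}
```

A real implementation would likely use a proper LRU (refreshing position on `get`), but the byte accounting is the part the TODO is about.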
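The `Arc`-instead-of-unsafe-static-casts idea can be sketched without the mmap crate by using `Arc<[u8]>` as a stand-in for the mapped buffer; with a real `Mmap` wrapped in an `Arc` the pattern is the same.

```rust
use std::sync::Arc;

/// Stand-in for a memory-mapped table file.
struct Table {
    data: Arc<[u8]>,
}

/// A chunk view that owns a clone of the Arc plus a byte range,
/// instead of an unsafely-extended `&'static [u8]`. The buffer stays
/// alive as long as any chunk references it; cloning the Arc is one
/// atomic increment, so this should not matter at runtime.
struct Chunk {
    data: Arc<[u8]>,
    start: usize,
    end: usize,
}

impl Chunk {
    fn bytes(&self) -> &[u8] {
        &self.data[self.start..self.end]
    }
}

impl Table {
    fn chunk(&self, start: usize, end: usize) -> Chunk {
        Chunk { data: Arc::clone(&self.data), start, end }
    }
}

fn main() {
    let table = Table { data: Arc::from(&b"hello world"[..]) };
    let chunk = table.chunk(6, 11);
    // The chunk can move to another thread without any `unsafe`.
    let handle = std::thread::spawn(move || chunk.bytes().to_vec());
    assert_eq!(handle.join().unwrap(), b"world");
    println!("ok");
}
```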
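The "Block" trade-offs discussed above (a lazily built `HashMap` index, capacity known up front, an `RwLock` so the map is not a contention point for readers) can be sketched like this; the names are hypothetical, not the crate's API.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// A decoded chunk plus a lazily built key -> entry-index map.
struct Block {
    entries: Vec<(Vec<u8>, Vec<u8>)>,
    index: RwLock<Option<HashMap<Vec<u8>, usize>>>,
}

impl Block {
    fn new(entries: Vec<(Vec<u8>, Vec<u8>)>) -> Self {
        Self { entries, index: RwLock::new(None) }
    }

    fn get(&self, key: &[u8]) -> Option<&[u8]> {
        // Fast path: most lookups only take the read lock.
        if let Some(idx) = self.index.read().unwrap().as_ref() {
            return idx.get(key).map(|&i| self.entries[i].1.as_slice());
        }
        // Slow path: one writer builds the index. The item count is
        // known, so the HashMap never reallocates.
        let mut guard = self.index.write().unwrap();
        let idx = guard.get_or_insert_with(|| {
            let mut m = HashMap::with_capacity(self.entries.len());
            for (i, (k, _)) in self.entries.iter().enumerate() {
                m.insert(k.clone(), i);
            }
            m
        });
        idx.get(key).map(|&i| self.entries[i].1.as_slice())
    }
}

fn main() {
    let block = Block::new(vec![
        (b"a".to_vec(), b"1".to_vec()),
        (b"b".to_vec(), b"2".to_vec()),
    ]);
    assert_eq!(block.get(b"b"), Some(&b"2"[..])); // builds the index
    assert_eq!(block.get(b"a"), Some(&b"1"[..])); // read lock only
    assert_eq!(block.get(b"missing"), None);
    println!("ok");
}
```

This shows the space/time trade-off the list mentions: the map duplicates every key in memory, in exchange for O(1) lookups instead of a scan per `get()`.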
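Several items above concern the entry encoding: u16/u32 length prefixes instead of u64, and failing when a key or value exceeds the prefix range instead of truncating. A minimal sketch of such an encoding (hypothetical function names, not the library's current on-disk format):

```rust
use std::convert::{TryFrom, TryInto};

/// Encode one entry as: u16 key length, u32 value length, key bytes,
/// value bytes. Saves 10 bytes per entry versus two u64 prefixes.
fn encode_entry(key: &[u8], value: &[u8]) -> Result<Vec<u8>, String> {
    // Fail early if the lengths do not fit the prefix types.
    let klen = u16::try_from(key.len()).map_err(|_| "key too long".to_string())?;
    let vlen = u32::try_from(value.len()).map_err(|_| "value too long".to_string())?;
    let mut buf = Vec::with_capacity(2 + 4 + key.len() + value.len());
    buf.extend_from_slice(&klen.to_le_bytes());
    buf.extend_from_slice(&vlen.to_le_bytes());
    buf.extend_from_slice(key);
    buf.extend_from_slice(value);
    Ok(buf)
}

/// Decode one entry; returns (key, value, bytes consumed).
/// With explicit lengths there is no need to scan for delimiters.
fn decode_entry(buf: &[u8]) -> Option<(&[u8], &[u8], usize)> {
    let klen = u16::from_le_bytes(buf.get(0..2)?.try_into().ok()?) as usize;
    let vlen = u32::from_le_bytes(buf.get(2..6)?.try_into().ok()?) as usize;
    let key = buf.get(6..6 + klen)?;
    let value = buf.get(6 + klen..6 + klen + vlen)?;
    Some((key, value, 6 + klen + vlen))
}

fn main() {
    let buf = encode_entry(b"key", b"value").unwrap();
    let (k, v, consumed) = decode_entry(&buf).unwrap();
    assert_eq!(k, b"key");
    assert_eq!(v, b"value");
    assert_eq!(consumed, buf.len());
    // A key longer than u16::MAX is rejected rather than silently truncated.
    assert!(encode_entry(&vec![0u8; 70_000], b"v").is_err());
    println!("ok");
}
```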