Crate var_byte_str
A variable-byte encoding of gaps used to represent a string.
This crate is mainly intended for large non-English text whose characters need two or more bytes each in UTF-8 encoding. It encodes a string by iterating over its characters, converting each one to a u32, and calculating the distance from that u32 to the value of the previous character. This distance is called a "gap". Each gap is then compressed using a variable-byte encoding scheme.
It assumes that text usually comes in clusters, where many contiguous characters have code points close to each other. In such cases, a character is likely to take only a single byte plus one extra bit for the sign flag. See README.md for the reasoning behind this.
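The scheme described above can be sketched as follows. This is only an illustration under stated assumptions, not the crate's actual implementation or byte layout: the helper names `encode_var_byte` and `encode_gaps` are hypothetical, each gap magnitude is assumed to be written as 7-bit groups (low-order group first, high bit set when more bytes follow), and the sign flags are assumed to be kept in a separate list.

```rust
/// Hypothetical helper: write `value` as variable bytes, 7 payload bits per
/// byte, low-order group first. The high bit marks "more bytes follow".
fn encode_var_byte(mut value: u32, out: &mut Vec<u8>) {
    loop {
        let low = (value & 0x7F) as u8;
        value >>= 7;
        if value == 0 {
            out.push(low);
            return;
        }
        out.push(low | 0x80);
    }
}

/// Hypothetical helper: returns (variable-byte magnitudes, sign flags).
/// The layout is an assumption made for this sketch only.
fn encode_gaps(text: &str) -> (Vec<u8>, Vec<bool>) {
    let mut bytes = Vec::new();
    let mut negative = Vec::new();
    let mut prev: i64 = 0;
    for c in text.chars() {
        let code = c as u32 as i64;
        // The gap is the signed distance to the previous character.
        let gap = code - prev;
        negative.push(gap < 0);
        encode_var_byte(gap.abs() as u32, &mut bytes);
        prev = code;
    }
    (bytes, negative)
}

fn main() {
    // Six Thai characters; each one takes three bytes in UTF-8.
    let text = "\u{0E2A}\u{0E27}\u{0E31}\u{0E2A}\u{0E14}\u{0E35}";
    let (bytes, negative) = encode_gaps(text);
    println!("UTF-8 bytes: {}", text.len());
    println!("gap bytes:   {} (+ {} sign bits)", bytes.len(), negative.len());
}
```

In this sketch only the first character needs two bytes, because its gap is measured from zero. Every following gap stays within the same script block and fits in a single byte.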
To recover a character, decoding must iterate from the very first character, because each character is stored relative to the one before it. This is similar to typical UTF-derived encodings, where each char may take a different number of bytes.
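A minimal decoding sketch shows why random access is not possible. It is not the crate's API: the functions `decode_chars` and `nth_char` are hypothetical, and the layout is an assumption made for illustration (gap magnitudes stored as 7-bit groups, low-order group first, high bit set when more bytes follow, with one separate sign flag per character).

```rust
/// Hypothetical helper: rebuild characters from variable-byte gap magnitudes
/// and their sign flags. Panics on malformed input; a real decoder would
/// return an error instead.
fn decode_chars(bytes: &[u8], negative: &[bool]) -> Vec<char> {
    let mut chars = Vec::with_capacity(negative.len());
    let mut prev: i64 = 0;
    let mut pos = 0;
    for &is_negative in negative {
        // Reassemble one magnitude from its 7-bit groups.
        let mut magnitude: i64 = 0;
        let mut shift = 0;
        loop {
            let byte = bytes[pos];
            pos += 1;
            magnitude |= ((byte & 0x7F) as i64) << shift;
            shift += 7;
            if byte & 0x80 == 0 {
                break;
            }
        }
        // Each code point is the previous one plus the signed gap.
        let code = if is_negative { prev - magnitude } else { prev + magnitude };
        chars.push(char::from_u32(code as u32).expect("invalid code point"));
        prev = code;
    }
    chars
}

/// Hypothetical helper: fetching character `n` still decodes every gap
/// before it, since each value depends on the previous character.
fn nth_char(bytes: &[u8], negative: &[bool], n: usize) -> Option<char> {
    if n >= negative.len() {
        return None;
    }
    decode_chars(bytes, &negative[..=n]).pop()
}

fn main() {
    // Gaps for six Thai characters under the assumed layout.
    let bytes = [170u8, 28, 3, 10, 7, 22, 33];
    let negative = [false, true, false, true, true, false];
    let text: String = decode_chars(&bytes, &negative).into_iter().collect();
    println!("{}", text);
    println!("{:?}", nth_char(&bytes, &negative, 3));
}
```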
To serialize the encoded string, the feature flag serialize must be enabled. For example, in Cargo.toml:
var_byte_str = { version = "*", features = ["serialize"], default-features = false }
Structs
Chars | An iterator that returns a |
Gaps | An iterator that returns the gap of each character as |
GapsBytes | An iterator that returns each gap as a copy of its variable-byte encoding, along with a sign boolean. Each iteration returns a tuple of |
VarByteString | The core struct that represents a string as variable-byte-encoded gaps. |