Crate var_byte_str

A string representation that stores the gaps between successive characters using variable byte encoding.

This crate is mainly intended for large non-English text whose characters need two or more bytes each in UTF-8. It encodes a string by iterating over each character, converting it to a u32, and calculating the distance from that u32 to the previous character. This distance is called a "gap". Each gap is then compressed using a variable byte encoding scheme.
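The two steps described above can be sketched with the standard library alone. This is an illustration, not the crate's implementation: the function names gaps and encode_magnitude are hypothetical, and the byte layout (7 payload bits per byte, high bit set when more bytes follow, sign handled separately) is an assumption chosen for clarity.

```rust
/// Compute the gap of each character relative to the previous one.
/// The first gap is simply the code point of the first character.
fn gaps(text: &str) -> Vec<i64> {
    let mut prev: i64 = 0;
    text.chars()
        .map(|c| {
            let cur = c as u32 as i64;
            let gap = cur - prev;
            prev = cur;
            gap
        })
        .collect()
}

/// Encode the magnitude of a gap as variable bytes, least significant
/// 7-bit group first. The high bit of a byte is set when more bytes follow.
/// (Assumed layout for illustration; the sign is tracked by the caller.)
fn encode_magnitude(mut value: u64) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let low = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            // Last group: leave the continuation bit clear.
            out.push(low);
            return out;
        }
        out.push(low | 0x80);
    }
}
```

Under these assumptions, a caller would encode each gap as the pair (gap < 0, encode_magnitude(gap.unsigned_abs())). Only the first character of a clustered text pays for its full code point; the following gaps are small numbers.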

It assumes that text usually comes in clusters, where many contiguous characters have code points close to each other. In such cases, a character is likely to take only a single byte, with one extra bit used as a sign flag. See README.md for the reasoning behind this.
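The clustering assumption can be checked on a sample text with a short sketch. The helper small_gap_count is hypothetical, and the threshold of 64 assumes that a single byte spends one bit on the sign and one bit on a continuation marker, leaving 6 bits for the magnitude. The crate's real layout may differ.

```rust
/// Count how many gaps have a magnitude below 64, i.e. would fit in one
/// byte under the assumed layout (1 sign bit, 1 continuation bit, 6
/// payload bits). Returns (small_gaps, total_chars).
fn small_gap_count(text: &str) -> (usize, usize) {
    let mut prev: i64 = 0;
    let mut small = 0;
    let mut total = 0;
    for c in text.chars() {
        let cur = i64::from(u32::from(c));
        if (cur - prev).abs() < 64 {
            small += 1;
        }
        total += 1;
        prev = cur;
    }
    (small, total)
}
```

For a Thai word such as "สวัสดี", every character takes 3 bytes in UTF-8, yet all gaps except the first stay below 64 because the characters sit in the same Unicode block. Text that alternates between distant scripts does not benefit in the same way.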

To recover a character, decoding must iterate from the very first character. This is similar to variable-width UTF encodings, where each char may occupy a different number of bytes.
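The need to walk from the start can be seen in a minimal sketch. It assumes the gaps are already available as a slice of i64 values, each relative to the previous character. The function nth_char is hypothetical and is not part of this crate's API.

```rust
use std::convert::TryFrom;

/// Return the character at `index` by accumulating gaps from the start.
/// `gaps` holds the first code point followed by signed offsets, each
/// relative to the previous character (illustrative representation).
fn nth_char(gaps: &[i64], index: usize) -> Option<char> {
    if index >= gaps.len() {
        return None;
    }
    // Every gap up to and including `index` contributes to the code point,
    // so there is no way to jump straight to the requested position.
    let code: i64 = gaps[..=index].iter().sum();
    u32::try_from(code).ok().and_then(char::from_u32)
}
```

The sketch uses checked conversions, so a corrupted gap sequence yields None instead of an invalid char.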

To serialize the encoded string, the feature flag serialize must be enabled.

For example, in Cargo.toml:

var_byte_str = { version = "*", features = ["serialize"], default-features = false }

Structs

Chars

An iterator that returns each char using an unsafe cast.

Gaps

An iterator that returns the gap of each character as an i64 offset. The first value returned can be cast to a char directly. Each subsequent value must be added to the value accumulated so far in order to obtain an i64 that can be cast to a char.

GapsBytes

An iterator that returns each gap as a copy of its variable byte encoded form along with a sign flag. Each iteration returns a tuple of bool and SmallVec<[u8; 5]>. The bool is true when the bytes represent a value that should be converted to a negative number. The SmallVec<[u8; 5]> holds the absolute value of the difference, least significant byte first.

VarByteString

The core struct that represents a string as variable byte encoded gaps.
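As a rough illustration of how the (bool, bytes) tuples described for GapsBytes relate to the i64 offsets described for Gaps, the sketch below rebuilds signed gaps and then characters. It uses plain Vec<u8> in place of SmallVec<[u8; 5]> to stay within the standard library, and it assumes each byte contributes its low 7 bits with the least significant group first. The crate's actual bit layout is not documented here, so treat that as an assumption. The names gap_from_parts and decode are hypothetical.

```rust
use std::convert::TryFrom;

/// Rebuild a signed gap from a sign flag and magnitude bytes.
/// Assumes 7 payload bits per byte, least significant group first.
fn gap_from_parts(negative: bool, bytes: &[u8]) -> i64 {
    let mut magnitude: i64 = 0;
    for (i, b) in bytes.iter().enumerate() {
        // Mask off the (assumed) continuation bit before shifting into place.
        magnitude |= i64::from(b & 0x7f) << (7 * i);
    }
    if negative { -magnitude } else { magnitude }
}

/// Decode a sequence of (sign, bytes) pairs back into a String by keeping
/// a running code point. Returns None if any step is not a valid char.
fn decode(parts: &[(bool, Vec<u8>)]) -> Option<String> {
    let mut code: i64 = 0;
    let mut out = String::new();
    for (negative, bytes) in parts {
        code += gap_from_parts(*negative, bytes);
        out.push(char::from_u32(u32::try_from(code).ok()?)?);
    }
    Some(out)
}
```

Unlike the unsafe cast mentioned for Chars, this sketch validates every code point with char::from_u32, trading some speed for safety on untrusted input.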