Crate smartstring

Smart String

SmartString is a wrapper around String which offers automatic inlining of small strings, as well as, optionally, improved cache locality for comparisons between larger strings. It comes in two flavours: Compact, which takes up exactly as much space as a String and is generally a little faster, and Prefixed, which is usually slower but can be much faster at string comparisons if your strings tend to have a certain shape. Compact is the default.

What Is It For?

The intended use for SmartString is as a key type for a B-tree (such as std::collections::BTreeMap) or any kind of array operation where cache locality is critical.

In general, it's a nice data type for reducing your heap allocations and increasing the locality of string data. If you use SmartString as a drop-in replacement for String, you're almost certain to see a slight performance boost, as well as slightly reduced memory usage.

How To Use It?

SmartString has the exact same API as String; all the clever bits happen automatically behind the scenes, so you could just:

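use smartstring::alias::String;
use std::fmt::Write;

// `String` here is the SmartString alias, not std's String.
let mut string = String::new();
string.push_str("This is just a string!");
string.clear();
// `write!` returns a `fmt::Result`, so handle it rather than dropping it.
write!(string, "Hello Joe!").unwrap();
assert_eq!("Hello Joe!", string);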

Give Me The Details

The Compact variant is the same size as String and relies on pointer alignment to be able to store a discriminant bit in its inline form that will never be present in its String form, thus giving us 24 bytes (on 64-bit architectures) minus one bit to encode our inline string. It uses 23 bytes to store the string data and the remaining 7 bits to encode the string's length. When the available space is exceeded, it swaps itself out with a String containing its previous contents. Likewise, if the string's length should drop below its inline capacity again, it deallocates the string and moves its contents inline.
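The byte budget described above can be sketched with a toy type built only on the standard library. This is an illustration, not the crate's actual layout: the InlineSketch name, its fields, and the choice of the low bit of the marker byte as the discriminant are all assumptions made for the example.

```rust
use std::mem::size_of;

/// Bytes available for inline string data: the size of `String` minus the
/// one byte that holds the length and the discriminant bit.
const INLINE_CAPACITY: usize = size_of::<String>() - 1;

/// Hypothetical toy layout (an assumption, not the crate's real struct).
struct InlineSketch {
    /// Low bit: discriminant (1 = inline). Upper 7 bits: length in bytes.
    /// The bit position is assumed for illustration only.
    marker: u8,
    data: [u8; INLINE_CAPACITY],
}

impl InlineSketch {
    /// Returns `None` when the string does not fit inline, which is the
    /// point where the real type would switch to a heap `String`.
    fn new(s: &str) -> Option<Self> {
        if s.len() > INLINE_CAPACITY {
            return None;
        }
        let mut data = [0u8; INLINE_CAPACITY];
        data[..s.len()].copy_from_slice(s.as_bytes());
        Some(InlineSketch {
            marker: ((s.len() as u8) << 1) | 1,
            data,
        })
    }

    fn is_inline(&self) -> bool {
        self.marker & 1 == 1
    }

    fn len(&self) -> usize {
        (self.marker >> 1) as usize
    }

    fn as_str(&self) -> &str {
        // The bytes were copied from a valid `&str`, so this cannot fail.
        std::str::from_utf8(&self.data[..self.len()]).unwrap()
    }
}
```

On a 64-bit target INLINE_CAPACITY works out to 23, and the toy struct occupies the same number of bytes as a String, which is the property the Compact variant depends on.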

Given that we use the knowledge that a certain bit in the memory layout of String will always be unset as a discriminant, you would be able to call std::mem::transmute::<_, String>() on a boxed smart string and start using it as a normal String immediately - there's no pointer tagging or similar trickery going on here. (But please don't do that; there's an efficient Into<String> implementation that does the exact same thing with no need to go unsafe in your own code.)

The Prefixed variant stores a string as a String preceded by a copy of up to the first FRAGMENT_SIZE bytes of its content, plus one byte that holds the size of that fragment and the discriminant bit. FRAGMENT_SIZE is calculated to be the size of the padding between the size byte and the String, which would generally be 7 bytes on 64-bit systems. This lets us quickly check for ordering or equality in a cache-local context if the result can be determined by looking at the first few bytes of the string, which is very often the case.
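The comparison strategy described above can be sketched as follows. This is a minimal model using only the standard library, not the crate's implementation: the PrefixedSketch type, its field names, and the hard-coded fragment size of 7 (the 64-bit figure quoted above) are assumptions for illustration.

```rust
use std::cmp::Ordering;

/// Assumed fragment size, matching the 64-bit figure quoted in the text.
const FRAGMENT_SIZE: usize = 7;

/// Hypothetical toy type: a heap `String` plus a copy of its first bytes.
struct PrefixedSketch {
    fragment: [u8; FRAGMENT_SIZE],
    fragment_len: u8,
    full: String,
}

impl PrefixedSketch {
    fn new(s: &str) -> Self {
        let n = s.len().min(FRAGMENT_SIZE);
        let mut fragment = [0u8; FRAGMENT_SIZE];
        fragment[..n].copy_from_slice(&s.as_bytes()[..n]);
        PrefixedSketch {
            fragment,
            fragment_len: n as u8,
            full: s.to_owned(),
        }
    }

    /// The stored prefix bytes, which live next to the struct itself.
    fn fragment_bytes(&self) -> &[u8] {
        &self.fragment[..self.fragment_len as usize]
    }

    /// Compares the inline fragments first and only follows the heap
    /// pointer when the fragments are identical.
    fn compare(&self, other: &Self) -> Ordering {
        match self.fragment_bytes().cmp(other.fragment_bytes()) {
            Ordering::Equal => self.full.as_bytes().cmp(other.full.as_bytes()),
            decided => decided,
        }
    }
}
```

When two keys differ within their first FRAGMENT_SIZE bytes, compare never touches the heap data. When the fragments match, it pays for the fragment check and then for the full comparison as well.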

Given that the Prefixed variant stores Strings with this extra data, its inline variant has room for 31 bytes (once again, on 64-bit architectures).

SmartString is aggressive about inlining strings, meaning that if you modify a heap-allocated string such that it becomes short enough for inlining, it will be inlined immediately and the allocated String will be dropped. This may cause multiple unintended allocations if you repeatedly adjust your string's length across the inline capacity threshold, so if your string's construction can get complicated and you're relying on performance during construction, it might be better to construct it as a String and convert it once construction is done.
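To make the cost of crossing the threshold concrete, here is a small model that counts how many times a string would be moved from inline storage to the heap as its length changes. The function name and the model itself are assumptions for illustration; it ignores reallocations caused by growth while the string is already on the heap.

```rust
/// Counts how often a string moves from inline storage to the heap while its
/// length follows `lengths`. In this simplified model each such move costs
/// one allocation, and any length at or below `inline_capacity` is inlined
/// again immediately.
fn count_heap_promotions(lengths: &[usize], inline_capacity: usize) -> usize {
    let mut boxed = false;
    let mut promotions = 0;
    for &len in lengths {
        let needs_heap = len > inline_capacity;
        if needs_heap && !boxed {
            promotions += 1;
        }
        boxed = needs_heap;
    }
    promotions
}
```

With an inline capacity of 23, a length sequence that oscillates between 20 and 30 three times yields three promotions, whereas a sequence that only grows past the threshold yields one.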

Performance

SmartString doesn't aim to be more performant than String in the general case, except that it doesn't trigger heap allocations for anything shorter than its inline capacity. It can therefore be reasonably expected to exceed String's performance perceptibly on shorter strings, as well as being more memory efficient in these cases. There will always be a slight overhead on all operations on boxed strings, compared to String.

You can assume that the Prefixed variant will be more efficient than String when comparing strings in an array, given keys that tend to differ inside the first couple of bytes. If your strings tend to have identical prefixes, any benefits you'd otherwise gain from the cache locality of the stored prefix would be lost, and the slight added complexity of SmartString's prefix search would become pure overhead. However, it's safe to assume that if you're comparing inlined SmartStrings, they'll always beat String.

Please do also keep in mind that Prefixed strings are slightly larger than plain Strings, which may have an impact on cache locality if you don't plan for it, and certainly on how many strings will fit into a cache line at the same time.
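The size difference can be put into numbers with a little arithmetic. The sketch below assumes 64-bit sizes of 24 bytes for a Compact string (the same as String) and 32 bytes for a Prefixed string (31 inline bytes plus the size byte, as described above). The helper name is hypothetical.

```rust
/// How many whole strings of `string_size` bytes fit into `span_bytes`
/// bytes of contiguous memory, for example a few adjacent cache lines.
fn strings_per_span(span_bytes: usize, string_size: usize) -> usize {
    span_bytes / string_size
}
```

For four 64-byte cache lines (256 bytes), strings_per_span(256, 24) gives 10 while strings_per_span(256, 32) gives 8, so an array of Prefixed strings touches more cache lines for the same number of keys.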

Modules

alias

Convenient type aliases.

Structs

Compact

A compact string representation equal to String in size.

Drain

A draining iterator for a SmartString.

Prefixed

A string representation that always keeps an inline prefix.

SmartString

A smart string.

Constants

FRAGMENT_SIZE

The capacity of the prefix fragment stored by Prefixed SmartStrings.

Traits

SmartStringMode

Marker trait for SmartString representations.