Crate smartstring

Smart String

SmartString is a wrapper around String which offers automatic inlining of small strings. It comes in two flavours: LazyCompact, which takes up exactly as much space as a String and is generally a little faster, and Compact, which is the same as LazyCompact except it will aggressively re-inline any expanded Strings which become short enough to do so. LazyCompact is the default.

What Is It For?

The intended use for SmartString is as a key type for a B-tree (such as std::collections::BTreeMap) or any kind of array operation where cache locality is critical.

In general, it's a nice data type for reducing your heap allocations and increasing the locality of string data. If you use SmartString as a drop-in replacement for String, you're almost certain to see a slight performance boost, as well as slightly reduced memory usage.

How To Use It?

SmartString has the exact same API as String; all the clever bits happen automatically behind the scenes, so you could just:

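// Shadow std's String with the SmartString alias.
use smartstring::alias::String;
use std::fmt::Write;

let mut string = String::new();
string.push_str("This is just a string!");
string.clear();
// write! returns a fmt::Result, which should not be silently dropped.
write!(string, "Hello Joe!").unwrap();
assert_eq!("Hello Joe!", string);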

Give Me The Details

SmartString is the same size as String and relies on pointer alignment to be able to store a discriminant bit in its inline form that will never be present in its String form, thus giving us 24 bytes (on 64-bit architectures) minus one bit to encode our inline string. It uses 23 bytes to store the string data and the remaining 7 bits to encode the string's length. When the available space is exceeded, it swaps itself out with a String containing its previous contents. Likewise, in Compact mode, if the string's length should drop below its inline capacity again, it deallocates the String and moves its contents inline.
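To make the arithmetic above concrete, here is a minimal sketch of the idea using only std. It is not smartstring's actual implementation: the names ToyString, encode_marker, is_inline and decode_len are hypothetical, the choice of the lowest bit as the discriminant is an assumption, and a plain Rust enum stands in for the trick of reusing String's own bytes.

```rust
use std::mem::size_of;

/// Inline capacity assumed here: one byte less than a `String`
/// (23 on 64-bit targets, matching the figure in the text).
const INLINE_CAPACITY: usize = size_of::<String>() - 1;

/// Hypothetical marker byte: the lowest bit is the discriminant
/// (1 = inline), the upper seven bits hold the length.
fn encode_marker(len: usize) -> Option<u8> {
    if len <= INLINE_CAPACITY {
        Some(((len as u8) << 1) | 1)
    } else {
        None
    }
}

fn is_inline(marker: u8) -> bool {
    marker & 1 == 1
}

fn decode_len(marker: u8) -> usize {
    usize::from(marker >> 1)
}

/// Toy two-state string: inline bytes plus a marker, or a heap `String`.
enum ToyString {
    Inline { marker: u8, data: [u8; INLINE_CAPACITY] },
    Boxed(String),
}

impl ToyString {
    fn new() -> Self {
        // Marker 1 means "inline, length 0".
        ToyString::Inline { marker: 1, data: [0; INLINE_CAPACITY] }
    }

    fn as_str(&self) -> &str {
        match self {
            ToyString::Inline { marker, data } => {
                std::str::from_utf8(&data[..decode_len(*marker)])
                    .expect("inline bytes are always valid UTF-8")
            }
            ToyString::Boxed(s) => s.as_str(),
        }
    }

    fn push_str(&mut self, extra: &str) {
        match self {
            ToyString::Boxed(s) => s.push_str(extra),
            ToyString::Inline { marker, data } => {
                let len = decode_len(*marker);
                let new_len = len + extra.len();
                match encode_marker(new_len) {
                    // Still fits: copy the bytes in place and update the length.
                    Some(new_marker) => {
                        data[len..new_len].copy_from_slice(extra.as_bytes());
                        *marker = new_marker;
                    }
                    // Too long: swap to a heap String holding the previous contents.
                    None => {
                        let mut boxed = String::with_capacity(new_len);
                        boxed.push_str(
                            std::str::from_utf8(&data[..len])
                                .expect("inline bytes are always valid UTF-8"),
                        );
                        boxed.push_str(extra);
                        *self = ToyString::Boxed(boxed);
                    }
                }
            }
        }
    }
}
```

Pushing "Hello Joe!" onto a fresh ToyString keeps it in the Inline variant; pushing enough text to exceed INLINE_CAPACITY moves it to Boxed with the earlier contents preserved.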

Given that we use the knowledge that a certain bit in the memory layout of String will always be unset as a discriminant, you would be able to call std::mem::transmute::<String>() on a boxed smart string and start using it as a normal String immediately - there's no pointer tagging or similar trickery going on here. (But please don't do that, there's an efficient Into<String> implementation that does the exact same thing with no need to go unsafe in your own code.)

Compact is aggressive about inlining strings, meaning that if you modify a heap-allocated string such that it becomes short enough for inlining, it will be inlined immediately and the allocated String will be dropped. This may cause multiple unintended allocations if you repeatedly adjust your string's length across the inline capacity threshold, so if your string's construction can get complicated and you care about performance during construction, it might be better to construct it as a String and convert it once construction is done.

LazyCompact looks the same as Compact, except it never re-inlines a string that's already been heap allocated, instead keeping the allocation around in case it needs it. This makes for less cache-local strings, but it is the best choice if you're more worried about time spent on unnecessary allocations than cache locality.
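The trade-off between the two modes can be illustrated with a small model. This is only a sketch under stated assumptions: Mode and count_allocations are hypothetical names, the inline capacity is hard-coded to the 64-bit figure of 23 bytes, a string is treated as fitting inline when its length is at most that capacity, and only inline/boxed transitions are counted (reallocations caused by growth on the heap are ignored).

```rust
/// Inline capacity assumed for this sketch (the 64-bit figure from the text).
const INLINE_CAPACITY: usize = 23;

/// Hypothetical stand-ins for the two re-inlining policies.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Mode {
    Compact,
    LazyCompact,
}

/// Counts the heap allocations triggered as a string takes on each of the
/// given lengths in turn, starting from an empty inline string.
fn count_allocations(mode: Mode, lengths: &[usize]) -> usize {
    let mut boxed = false;
    let mut allocations = 0;
    for &len in lengths {
        let fits_inline = len <= INLINE_CAPACITY;
        if !boxed && !fits_inline {
            // Outgrew the inline buffer: move to the heap.
            boxed = true;
            allocations += 1;
        } else if boxed && fits_inline && mode == Mode::Compact {
            // Compact re-inlines eagerly and drops the allocation;
            // LazyCompact keeps the heap buffer around instead.
            boxed = false;
        }
    }
    allocations
}
```

For a string whose length goes 10, 30, 10, 30, 10, 30, this model counts three allocations under Compact and a single one under LazyCompact, which is the repeated threshold crossing the text warns about.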

Performance

It doesn't aim to be more performant than String in the general case, except that it doesn't trigger heap allocations for anything shorter than its inline capacity and so can be reasonably expected to exceed String's performance perceptibly on shorter strings, as well as being more memory efficient in these cases. There will always be a slight overhead on all operations on boxed strings, compared to String.

Caveat

The way smartstring gets by without a discriminant is dependent on the memory layout of the std::string::String struct, which isn't something the Rust compiler and standard library make any guarantees about. smartstring makes an assumption about how it's been laid out, which has held basically since rustc came into existence, but is nonetheless not a safe assumption to make, and if the layout ever changes, smartstring will stop working properly (at least on little-endian architectures; the assumptions made on big-endian architectures will hold regardless of the actual memory layout). Its test suite does comprehensive validation of these assumptions, and as long as the CI build is passing for any given rustc version, you can be sure it will do its job properly on all tested architectures. You can also check out the smartstring source tree yourself and run cargo test to validate it for your particular configuration.

As an extra precaution, some runtime checks are made as well: if the memory layout assumption no longer holds, smartstring will not work correctly, but it should crash early and there should be no security implications.
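For a rough idea of what a layout sanity check can look like, here is a sketch using only std. It is not the check smartstring itself performs: the function names are hypothetical, and it only verifies the coarse shape assumed throughout this page (a String that is three machine words in size, with word alignment), not which field sits where.

```rust
use std::mem::{align_of, size_of};

/// Returns true if `String` has the coarse three-word shape
/// (pointer, capacity, length) that this sketch assumes.
fn string_layout_looks_familiar() -> bool {
    size_of::<String>() == 3 * size_of::<usize>()
        && align_of::<String>() == align_of::<usize>()
}

/// Panics immediately if the assumption is violated, so a mismatch
/// surfaces as an early crash rather than as silent misbehaviour.
fn assert_string_layout() {
    assert!(
        string_layout_looks_familiar(),
        "unexpected String layout: {} bytes, alignment {}",
        size_of::<String>(),
        align_of::<String>()
    );
}
```

Calling assert_string_layout() once at startup, or from a test, is one way to run such a check on your own target.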

Feature Flags

smartstring comes with optional support for the following crates through Cargo feature flags. You can enable them in your Cargo.toml file like this:

[dependencies]
smartstring = { version = "*", features = ["proptest", "serde"] }

Feature    Description
arbitrary  Arbitrary implementation for SmartString.
proptest   A strategy for generating SmartStrings from a regular expression.
serde      Serialize and Deserialize implementations for SmartString.

Modules

alias

Convenient type aliases.

proptest

proptest strategies (requires the proptest feature flag).

Structs

Compact

A compact string representation equal to String in size with guaranteed inlining.

Drain

A draining iterator for a SmartString.

LazyCompact

A representation similar to Compact but which doesn't re-inline strings.

SmartString

A smart string.

Traits

SmartStringMode

Marker trait for SmartString representations.