Crate smartstring
Smart String
SmartString is a wrapper around String which offers automatic inlining of small strings, as well as, optionally, improved cache locality for comparisons between larger strings. It comes in two flavours: Compact, which takes up exactly as much space as a String and is generally a little faster, and Prefixed, which is usually slower but can be much faster at string comparisons if your strings tend to differ within their first few bytes. Compact is the default.
What Is It For?
The intended use for SmartString is as a key type for a B-tree (such as std::collections::BTreeMap) or any kind of array operation where cache locality is critical.
In general, it's a nice data type for reducing your heap allocations and increasing the locality of string data. If you use SmartString as a drop-in replacement for String, you're almost certain to see a slight performance boost, as well as slightly reduced memory usage.
How To Use It?
SmartString has the exact same API as String; all the clever bits happen automatically behind the scenes, so you could just:
    use smartstring::alias::String;
    use std::fmt::Write;

    let mut string = String::new();
    string.push_str("This is just a string!");
    string.clear();
    // write! returns a fmt::Result; unwrap it so the result is not silently dropped.
    write!(string, "Hello Joe!").unwrap();
    assert_eq!("Hello Joe!", string);
Give Me The Details
The Compact variant is the same size as String and relies on pointer alignment to be able to store a discriminant bit in its inline form that will never be present in its String form, thus giving us 24 bytes (on 64-bit architectures) minus one bit to encode our inline string. It uses 23 bytes to store the string data and the remaining 7 bits to encode the string's length. When the available space is exceeded, it swaps itself out with a String containing its previous contents. Likewise, if the string's length should drop below its inline capacity again, it deallocates the string and moves its contents inline.
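The switch between the inline and boxed forms can be pictured with a toy model. This is only a sketch under stated assumptions: ToyString, from_text and INLINE_CAPACITY are hypothetical names, and the model uses an ordinary enum rather than the bit-packed layout described above, so it is larger than a String and rebuilds its contents on every edit.

```rust
use std::mem::size_of;

/// Inline capacity for this sketch: the size of a `String` minus one byte
/// kept for the length and discriminant (23 on 64-bit targets).
const INLINE_CAPACITY: usize = size_of::<String>() - 1;

/// Hypothetical toy type. It uses an ordinary enum, so unlike the crate's
/// `Compact` layout it is larger than a `String`.
enum ToyString {
    Inline { len: u8, data: [u8; INLINE_CAPACITY] },
    Boxed(String),
}

impl ToyString {
    /// Picks the inline form whenever the text fits, the boxed form otherwise.
    fn from_text(text: &str) -> Self {
        if text.len() <= INLINE_CAPACITY {
            let mut data = [0u8; INLINE_CAPACITY];
            data[..text.len()].copy_from_slice(text.as_bytes());
            ToyString::Inline { len: text.len() as u8, data }
        } else {
            ToyString::Boxed(text.to_owned())
        }
    }

    fn as_str(&self) -> &str {
        match self {
            // The bytes were copied from a `&str`, so they are valid UTF-8.
            ToyString::Inline { len, data } => {
                std::str::from_utf8(&data[..*len as usize]).expect("valid UTF-8")
            }
            ToyString::Boxed(s) => s.as_str(),
        }
    }

    fn is_inline(&self) -> bool {
        matches!(self, ToyString::Inline { .. })
    }

    /// Appends text, switching to the boxed form once the inline space runs out.
    fn push_str(&mut self, extra: &str) {
        let mut combined = String::from(self.as_str());
        combined.push_str(extra);
        *self = ToyString::from_text(&combined);
    }

    /// Shortens the string, moving it back inline as soon as it fits again.
    /// Panics if `new_len` is out of range or not on a character boundary.
    fn truncate(&mut self, new_len: usize) {
        let shortened = self.as_str()[..new_len].to_owned();
        *self = ToyString::from_text(&shortened);
    }
}
```

Starting from ToyString::from_text("short"), pushing a long suffix moves the value to the Boxed variant, and truncating it back to five bytes returns it to the Inline variant.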
Given that we use the knowledge that a certain bit in the memory layout of String will always be unset as a discriminant, you would be able to call std::mem::transmute::<_, String>() on a boxed smart string and start using it as a normal String immediately; there's no pointer tagging or similar trickery going on here. (But please don't do that, there's an efficient Into<String> implementation that does the exact same thing with no need to go unsafe in your own code.)
The Prefixed variant stores strings as one String preceded by up to the first FRAGMENT_SIZE bytes of its content, plus one byte that stores the size of the fragment and the discriminant bit. FRAGMENT_SIZE is calculated to be the size of the padding between the size byte and the String, which would generally be 7 bytes on 64-bit systems. This lets us quickly check for ordering or equality in a cache-local context if it can be determined by looking at the first couple of bytes of the string, which is very often the case.
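The idea of deciding a comparison from the stored fragment alone can be sketched as follows. This is an illustration, not the crate's implementation: PrefixedKey, FRAGMENT and compare are hypothetical names, FRAGMENT is fixed at the 7 bytes the text gives for 64-bit systems, and the boolean in the result only reports whether the full string had to be read.

```rust
use std::cmp::Ordering;

/// Assumed fragment size for this sketch (the usual value on 64-bit systems).
const FRAGMENT: usize = 7;

/// Hypothetical key type: a copy of the leading bytes next to the full string.
struct PrefixedKey {
    fragment_len: u8,
    fragment: [u8; FRAGMENT],
    full: String,
}

impl PrefixedKey {
    fn new(text: &str) -> Self {
        let n = text.len().min(FRAGMENT);
        let mut fragment = [0u8; FRAGMENT];
        fragment[..n].copy_from_slice(&text.as_bytes()[..n]);
        PrefixedKey { fragment_len: n as u8, fragment, full: text.to_owned() }
    }

    fn fragment(&self) -> &[u8] {
        &self.fragment[..self.fragment_len as usize]
    }

    /// Returns the ordering, plus `true` if the full strings had to be compared.
    fn compare(&self, other: &Self) -> (Ordering, bool) {
        let (a, b) = (self.fragment(), other.fragment());
        match a.cmp(b) {
            // Two equal, completely filled fragments say nothing about the
            // remaining bytes, so fall back to the full strings.
            Ordering::Equal if a.len() == FRAGMENT => (self.full.cmp(&other.full), true),
            // Otherwise the fragments differ, or both strings are shorter
            // than the fragment and therefore fully contained in it.
            decided => (decided, false),
        }
    }
}
```

Comparing "apple pie recipe" with "banana bread recipe" is settled by the first byte of the fragments, while "prefix-shared-one" and "prefix-shared-two" share their first 7 bytes and need the fallback to the full strings.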
Given that the Prefixed variant stores Strings with this extra data, its inline variant has room for 31 bytes (once again, on 64-bit architectures).
SmartString is aggressive about inlining strings, meaning that if you modify a heap-allocated string such that it becomes short enough for inlining, it will be inlined immediately and the allocated String will be dropped. This may cause multiple unintended allocations if you repeatedly adjust your string's length across the inline capacity threshold, so if your string's construction can get complicated and you're relying on performance during construction, it might be better to construct it as a String and convert it once construction is done.
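To see why crossing the threshold repeatedly is costly, here is a rough counting model. It is a simplification under stated assumptions: boxing_allocations is a hypothetical helper, the inline capacity is taken as 23 bytes (the Compact figure on 64-bit architectures), and only the allocation made when an inline string becomes boxed is counted, not any later regrowth of the heap buffer.

```rust
/// Assumed inline capacity in bytes (the Compact figure on 64-bit targets).
const INLINE_CAPACITY: usize = 23;

/// Given the string's length after each edit, counts how often an eagerly
/// inlining string would have to allocate because it moved from the inline
/// form to the boxed form. The string is assumed to start out inline.
fn boxing_allocations(lengths: &[usize]) -> usize {
    let mut boxed = false;
    let mut allocations = 0;
    for &len in lengths {
        let needs_box = len > INLINE_CAPACITY;
        if needs_box && !boxed {
            // Inline -> boxed: a fresh heap buffer is required.
            allocations += 1;
        }
        // Boxed -> inline frees the buffer, so the next growth allocates again.
        boxed = needs_box;
    }
    allocations
}
```

In this model, a string whose length goes 10, 30, 20, 30, 20, 30 allocates three times, whereas a value converted once at its final length of 30 allocates once.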
Performance
SmartString doesn't aim to be more performant than String in the general case, except that it doesn't trigger heap allocations for anything shorter than its inline capacity and so can be reasonably expected to exceed String's performance perceptibly on shorter strings, as well as being more memory efficient in these cases. There will always be a slight overhead on all operations on boxed strings, compared to String.
You can assume that the Prefixed variant will be more efficient than String when comparing strings in an array, given keys that tend to differ inside the first couple of bytes. If your strings tend to have identical prefixes, any benefits you'd otherwise gain from the cache locality of the stored prefix would be lost, and the slight added complexity of SmartString's prefix search would become pure overhead. However, it's safe to assume that if you're comparing inlined SmartStrings, they'll always beat String.
Please do also keep in mind that Prefixed strings are slightly larger than plain Strings, which may have an impact on cache locality if you don't plan for it, and certainly on how many strings will fit into a cache line at the same time.
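As a back-of-the-envelope check on that size difference, the following sketch counts how many values fit in a run of cache lines. The numbers are assumptions: a 64-bit target where String takes 24 bytes, a Prefixed value of 32 bytes (31 inline bytes plus the size byte), and a 64-byte cache line; items_in_cache_lines is a hypothetical helper.

```rust
/// Assumed sizes in bytes on a 64-bit target, based on the figures in the text.
const STRING_SIZE: usize = 24;
const PREFIXED_SIZE: usize = 32; // 31 inline bytes plus one size/discriminant byte

/// Assumed cache line size; 64 bytes is common but not universal.
const CACHE_LINE: usize = 64;

/// Number of whole items of `item_size` bytes that fit into `lines`
/// consecutive cache lines of a contiguous array.
fn items_in_cache_lines(item_size: usize, lines: usize) -> usize {
    (lines * CACHE_LINE) / item_size
}
```

Under these assumptions, three cache lines hold eight String-sized values but only six Prefixed-sized ones.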
Modules
alias | Convenient type aliases. |
Structs
Compact | A compact string representation equal to String in size. |
Drain | A draining iterator for a SmartString. |
Prefixed | A string representation that always keeps an inline prefix. |
SmartString | A smart string. |
Constants
FRAGMENT_SIZE | The capacity of the prefix fragment stored by Prefixed. |
Traits
SmartStringMode | Marker trait for SmartString representations. |