arcstr 0.1.0

A better reference-counted string type

arcstr: A better reference-counted string type.

Or, "types" hopefully — plural. The intent is for it to have a couple of those.

It just has one at the moment: ArcStr, which is the important one anyway, and has several advantages over Arc<str>:

  • Only a single pointer. Great for cases where you want to keep the data structure lightweight or need to do some FFI stuff with it or who knows.

  • It's possible to create a const ArcStr from a literal string constant.

    These are zero cost, take no heap allocation, and don't even need to perform atomic reads/writes when being cloned or dropped (or at any other time). They even get stored in the read-only memory of your executable, which can be beneficial for performance and memory usage.

    That said, I won't lie to you: the API for this is... a bit of a janky macro: unsafe { literal_arcstr!(b"stuff") }. Thing is, I can't verify UTF-8 validity and stay const, and various details mean I need a byte string literal like b"...", which unfortunately means the contents could be non-UTF-8.

  • ArcStr::new() is a const function. That isn't true of e.g. Arc<str>, which actually has to heap allocate for each default-initialized string. This shouldn't be surprising given the macro I mentioned. Naturally, this means that ArcStr::default() is free too. This doesn't make us that special, though: most types in libstd get it right, it's just Arc that can't.

  • ArcStr is totally immutable. No more need to lose sleep over code that thinks it has a right to mutate your Arc just because it holds the only reference. This is deliberate and IMO a feature... but I can see why some might want to frame it as a negative.

  • More implementations of various traits like PartialEq<Other> and friends than Arc<str> has AFAIK. That is, sometimes Arc<str>'s ergonomics feel a bit off, but I'm hoping that doesn't happen here.

  • We don't support Weak references, which means the overhead of atomic operations is lower. This is also a "Well, it's a feature to me" situation...
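
A few of the comparisons above can be checked with std alone. The following is a minimal sketch that does not use arcstr at all; the helper names (arc_str_is_two_words, mutate_if_unique, is_valid_utf8) are made up for illustration. It shows that Arc<str> is a two-word "fat" pointer, that a uniquely held Arc<str> can be mutated through Arc::get_mut, and that a byte string literal is not guaranteed to be UTF-8.

```rust
use std::mem::size_of;
use std::sync::Arc;

/// True when `Arc<str>` occupies two machine words (data pointer plus length).
fn arc_str_is_two_words() -> bool {
    size_of::<Arc<str>>() == 2 * size_of::<usize>()
}

/// Uppercases the string in place if `arc` is the only handle to it.
/// Returns whether the mutation happened.
fn mutate_if_unique(arc: &mut Arc<str>) -> bool {
    match Arc::get_mut(arc) {
        Some(s) => {
            s.make_ascii_uppercase();
            true
        }
        None => false,
    }
}

/// Byte string literals are plain `&[u8]`, so nothing forces them to be UTF-8.
fn is_valid_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

fn main() {
    println!("Arc<str> is two words: {}", arc_str_is_two_words());

    // With a single handle, std's Arc hands out a mutable reference.
    let mut only: Arc<str> = Arc::from("stuff");
    println!("mutated while unique: {}", mutate_if_unique(&mut only));

    // Once a second handle exists, the same call is refused.
    let _second_handle = Arc::clone(&only);
    println!("mutated while shared: {}", mutate_if_unique(&mut only));

    println!("b\"stuff\" is UTF-8: {}", is_valid_utf8(b"stuff"));
    println!("b\"\\xff\" is UTF-8: {}", is_valid_utf8(b"\xff"));
}
```

None of this says anything about how ArcStr is implemented; it only illustrates the Arc<str> behavior the list is contrasting against.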

It also has all the stuff you'd expect like optional serde support, no_std, etc.

Planned functionality

So right, yeah, I did mention that "really the intent is for the crate to have a couple of those". What did I mean by that? Well, there are a few things you can build on ArcStr in not much code that are pretty nice:

Substr type

Essentially an ergonomic (ArcStr, Range<usize>), which can be used to avoid allocation when creating a lot of ranges over the same string. A use case for this is parsers and lexers. (Note that in practice I might use Range<u32> and not Range<usize>.)
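
As a rough illustration of the idea, here is a minimal sketch of such a pair. It is not the planned API: it uses std's Arc<str> as a stand-in for ArcStr, and the Substr methods and the tokens helper are hypothetical names chosen for this example. Each token shares the one underlying allocation and only bumps a reference count.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical sketch: a shared string plus a byte range into it.
#[derive(Clone, Debug)]
struct Substr {
    base: Arc<str>, // stand-in for ArcStr
    range: Range<usize>,
}

impl Substr {
    /// Returns `None` if the range is out of bounds or splits a UTF-8 character.
    fn new(base: &Arc<str>, range: Range<usize>) -> Option<Substr> {
        base.get(range.clone())?;
        Some(Substr { base: Arc::clone(base), range })
    }

    /// Borrows the text this substring covers.
    fn as_str(&self) -> &str {
        &self.base[self.range.clone()]
    }
}

/// Splits on whitespace, producing tokens that all share `base`'s allocation.
fn tokens(base: &Arc<str>) -> Vec<Substr> {
    let mut out = Vec::new();
    let mut start: Option<usize> = None;
    for (i, c) in base.char_indices() {
        match (c.is_whitespace(), start) {
            // First non-whitespace character of a new token.
            (false, None) => start = Some(i),
            // Whitespace right after a token: close the token.
            (true, Some(s)) => {
                out.extend(Substr::new(base, s..i));
                start = None;
            }
            _ => {}
        }
    }
    // A token that runs to the end of the input.
    if let Some(s) = start {
        out.extend(Substr::new(base, s..base.len()));
    }
    out
}

fn main() {
    let source: Arc<str> = Arc::from("let x = 42");
    for token in tokens(&source) {
        println!("{:?}", token.as_str());
    }
}
```

Swapping Range<usize> for Range<u32> would shrink each token on 64-bit targets, at the cost of limiting the base string to 4 GiB.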

Key type

Essentially this will be an 8-byte wrapper around ArcStr that allows storing small strings of 7 bytes or fewer inline, without allocation. It will be 8 bytes on both 32-bit and 64-bit platforms, since the 3 bytes or fewer that a 4-byte wrapper could hold inline is not compelling.

Actually, I need to do some investigation into whether 7 bytes is too small as well. The idea is for use as map keys or other small, frequently repeated identifiers.
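
To show where the 8-byte and 7-byte figures come from, here is a minimal sketch of just the inline half of such a type. It is an assumption-laden illustration, not the planned design: InlineKey and its methods are hypothetical names, and a real Key would also need some way to tell an inline value apart from an ArcStr pointer, which this sketch leaves out.

```rust
use std::mem::size_of;

/// Hypothetical inline representation: 7 bytes of UTF-8 plus 1 length byte.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct InlineKey {
    bytes: [u8; 7],
    len: u8,
}

impl InlineKey {
    /// Maximum number of bytes that fit inline.
    const CAPACITY: usize = 7;

    /// Returns `None` when `s` is too long to store inline.
    fn new(s: &str) -> Option<InlineKey> {
        if s.len() > Self::CAPACITY {
            return None;
        }
        let mut bytes = [0u8; 7];
        bytes[..s.len()].copy_from_slice(s.as_bytes());
        Some(InlineKey { bytes, len: s.len() as u8 })
    }

    /// Borrows the stored text.
    fn as_str(&self) -> &str {
        // The bytes were copied from a valid `&str`, so re-validation succeeds.
        std::str::from_utf8(&self.bytes[..self.len as usize])
            .expect("InlineKey always holds valid UTF-8")
    }
}

fn main() {
    println!("size of InlineKey: {} bytes", size_of::<InlineKey>());
    for candidate in ["id", "user_id", "username"] {
        match InlineKey::new(candidate) {
            Some(key) => println!("{:?} fits inline as {:?}", candidate, key.as_str()),
            None => println!("{:?} is too long and would need an allocation", candidate),
        }
    }
}
```

The same layout squeezed into a 4-byte, pointer-sized wrapper on a 32-bit target would leave only 3 bytes for text, which is the case dismissed above.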