Crate char_list

source ·
Expand description

A persistent string type with the same API as a linked-list of characters.

§Contents

§FiniteCharList

The FiniteCharList type is probably what you want to use. It behaves just like a linked list of characters but with better (TODO: verify) performance.

§CharList<Tail>

The CharList<Tail> type allows customization of the data structure by allowing a custom tail. I’m using this to emulate Prolog’s open-ended lists where the tail of the list may be an uninstantiated Prolog variable.

A FiniteCharList is just a type alias for CharList<()>.

Generic CharLists have much fewer abilities provided by this crate because the nature of the tail is unknown. For instance, CharList<Tail> does not implement AsRef<str> or even PartialEq!

The intention is that you will wrap a CharList<YourTail> in a newtype and implement those traits if it makes sense for your problem space. For example, I (plan to) use a CharList<ast::Expression> which will implement PartialEq because I want syntactic equality.

§DISCLAIMER: unsafe

This crate is a work in progress. Specifically Not all uses of unsafe have been validated! Please don’t use this for anything serious yet.

Safety audits are welcome and appreciated! I’m still quite new to writing unsafe code.

Also, this crate depends on front-vec which is also badly in need of auditing.

§Internal Representation

This crate relies on front_vec::FrontString to store arrays of bytes which allow for efficient prepending (pushing characters onto the front of the array).

Each FrontString is owned by a PqRcCell (internal to this crate, soon to be it’s own crate) which is sorta like a cross between a RefCell and a RcBox (the thing an Rc pointer points to).

A CharList is just a pointer + length (though it’s not a slice) which refers to a specific slice of a PqRcCell-managed FrontString.

The PqRcCell ensures that out of all CharLists which refer to it’s FrontString, only those with the longest length are allowed to mutate the inner FrontString value. And they can only* mutate it in ways that other CharLists won’t notice.

* (Ok, at this point we make developers pinky-swear 🤞 they’ll follow those rules, and they gotta wrap their use in an unsafe block. Maybe there’s a better way to do this in the future though 🤔)

That was a lotta words, let’s look at an example with memory diagrams.

§Memory Diagram

§Note: Simplified Diagrams

Every time the word len or refs appears in these diagrams, in the code it corresponds to either priority or priorities. The internal representation of a CharList is actually just a PqRc (Priority Queue Reference Counted) pointer which abstracts over its priority. For CharLists, priority is the same as string length.

§Example

Consider this code:

use assert2::assert; // For nicer assertions.
let icon = FiniteCharList::from("icon");
let nomicon = icon.cons_str("nom");
let rustonomicon = nomicon.cons_str("rusto");
let rustonomicon2 = rustonomicon.clone();

assert!(icon == "icon");
assert!(nomicon == "nomicon");
assert!(rustonomicon == "rustonomicon");
assert!(rustonomicon == rustonomicon2);

These values would be represented in memory like this:

did it work?

(The arrows with dashed lines represent the implied beginnings of CharList slices, i.e. icon implicitly “refers to” the last four bytes in the buffer.)

Notice that all cons_str and clone operations are immutable, i.e. they create new CharLists instead of mutating the original.

§One More cons
let the_book = rustonomicon.cons_str("the ");
assert!(the_book == "the rustonomicon");

Here’s what memory looks like now:

did it work?

Because rustonomicon has the longest length in the refs table, it’s perfectly safe to call the underlying FrontString’s push_char_front method (mutably!).

Notice that by pushing a character onto the front, rustonomicon doesn’t affect any other CharList’s view of the underlying FrontString.

§Dropping

Ok now what happens if we drop the longest three CharLists?

drop(rustonomicon);
drop(the_book);
drop(rustonomicon2);

did it work?

Notice that the internal FrontString’s len went down to 7. This means we can reuse the memory that used to be held by the longer strings!

§Tricky cons

Here’s a problem though. What if we want to cons the character 'b' onto the front of icon?

let janelle_monae = icon.cons('b');
assert!(janelle_monae == "bicon");

Well, if we overwrite the 'm' that currently sits in front of “icon”, we’d be modifying nomicon’s view of the string. That’s no good, so we gotta clone icon’s view of the FrontString and mutate the copy.

did it work?

Modules§

Structs§

  • An efficient string type with the same API as a linked-list of characters.

Traits§

Type Aliases§

  • The full CharList type takes a type parameter which represents the tail of the list. For a FiniteCharList, the tail is NoTail – an alias to ().