Module measureme::stringtable

A string table implementation with a tree-like encoding.

Each entry in the table represents a string and is encoded as a list of components where each component can either be

  1. a string value that contains actual UTF-8 string content,
  2. a string ID that contains a reference to another entry, or
  3. a terminator tag which marks the end of a component list.

The string content of an entry is defined as the concatenation of the content of its components. The content of a string value is its actual UTF-8 bytes. The content of a string ID is the contents of the entry it references.
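
The recursive definition above can be sketched as a small resolver. This is only an illustration under stated assumptions: the `Component` enum, the `HashMap`-backed table, and the `resolve` function are hypothetical names invented for this sketch and are not part of measureme's API. The sketch also assumes that entries never reference each other in a cycle.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory model of one component (not measureme's own types).
/// The terminator is implicit here: it is simply the end of the `Vec`.
enum Component {
    /// Actual UTF-8 string content.
    Value(String),
    /// Reference to another entry, by its ID.
    Ref(u32),
}

/// Computes the string content of entry `id` by concatenating the content of
/// its components, following references recursively.
/// Returns `None` if `id` or any referenced ID is not in the table.
/// Assumption: the table contains no reference cycles.
fn resolve(table: &HashMap<u32, Vec<Component>>, id: u32) -> Option<String> {
    let mut out = String::new();
    for component in table.get(&id)? {
        match component {
            Component::Value(s) => out.push_str(s),
            Component::Ref(other) => out.push_str(&resolve(table, *other)?),
        }
    }
    Some(out)
}
```

With entry 42 holding the value "XYZ", an entry made of "abc", a reference to 42, and "def" resolves to "abcXYZdef".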

The byte-level encoding of component lists uses the structure of UTF-8 in order to save space:

  • The first byte of a UTF-8 encoded code point never starts with the bits 10, as this bit prefix is reserved for continuation bytes in the middle of a multi-byte sequence. We make use of this fact by letting all string ID components start with the 10 prefix. Thus, when parsing the contents of a string value, we know to stop as soon as the byte where the next code point would start carries this prefix.

  • A valid UTF-8 string cannot contain the 0xFF byte and since string IDs start with 10 as described above, they also cannot start with a 0xFF byte. Thus we can safely use 0xFF as our component list terminator.
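
The two rules above amount to a check on a single byte. The following is a minimal sketch of that check; `ByteKind` and `classify` are hypothetical names used for illustration only and do not come from the crate.

```rust
/// What a byte tells a parser when it sits at a position where the next
/// code point or component would start. Hypothetical type for this sketch.
#[derive(Debug, PartialEq)]
enum ByteKind {
    /// 0xFF: end of the component list.
    Terminator,
    /// Top two bits are `10`: first byte of a string ID component.
    StringIdStart,
    /// Anything else: first byte of a UTF-8 code point in a string value.
    ValueStart,
}

/// Classifies `byte`. This is only meaningful at a code point boundary,
/// because continuation bytes inside a multi-byte sequence also carry
/// the `10` prefix.
fn classify(byte: u8) -> ByteKind {
    if byte == 0xFF {
        ByteKind::Terminator
    } else if (byte & 0b1100_0000) == 0b1000_0000 {
        ByteKind::StringIdStart
    } else {
        ByteKind::ValueStart
    }
}
```

The order of the two checks does not matter: 0xFF has the top bits 11, so it can never be mistaken for the start of a string ID.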

The sample composite string ["abc", ID(42), "def", TERMINATOR] would thus be encoded as:

    ['a', 'b', 'c', 128, 0, 0, 42, 'd', 'e', 'f', 255]
                    ^^^^^^^^^^^^^                 ^^^
            string ID 42 with 0b10 prefix        terminator (0xFF)

As you can see, string IDs are encoded in big-endian format so that the highest-order bits show up in the first byte we encounter.
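
The byte layout of the sample can be reproduced with a short encoder. This is a sketch rather than the crate's implementation: the `Component` enum and the `encode` function are hypothetical, and the 4-byte width of an encoded ID (leaving 30 bits for the value after the 10 prefix) is an assumption read off the sample bytes above.

```rust
/// Hypothetical component type for this sketch (not measureme's own).
enum Component<'a> {
    Value(&'a str),
    Ref(u32),
}

/// Component list terminator, as described in the text.
const TERMINATOR: u8 = 0xFF;

/// Encodes a component list into bytes and appends the terminator.
/// Assumption: a string ID occupies 4 bytes and its value fits in 30 bits.
fn encode(components: &[Component]) -> Vec<u8> {
    let mut bytes = Vec::new();
    for component in components {
        match component {
            // String values are written as their raw UTF-8 bytes.
            Component::Value(s) => bytes.extend_from_slice(s.as_bytes()),
            Component::Ref(id) => {
                assert!(*id < (1u32 << 30), "ID must leave room for the prefix");
                // Set the top two bits to `10`, then write big-endian so the
                // prefix lands in the first byte.
                let tagged = *id | 0x8000_0000;
                bytes.extend_from_slice(&tagged.to_be_bytes());
            }
        }
    }
    bytes.push(TERMINATOR);
    bytes
}
```

Encoding the components "abc", ID 42, and "def" with this sketch yields the same eleven bytes as the sample above.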


Each string in the table is referred to via a StringId. StringIds may be generated in two ways:

  1. Calling StringTable::alloc() which returns the StringId for the allocated string.
  2. Calling StringTable::alloc_with_reserved_id() and StringId::reserved().

String IDs allow you to deduplicate strings by allocating a string once and then referring to it by ID as often as needed. This is useful for strings that are recorded many times, and it can significantly reduce the size of profile trace files.
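
The deduplication idea can be illustrated with a toy table. Everything in this sketch (`TinyTable`, its `alloc` method, and `record_events`) is a hypothetical stand-in and does not mirror measureme's actual types or signatures.

```rust
/// Hypothetical stand-in for a string table (not measureme's API).
struct TinyTable {
    strings: Vec<String>,
}

impl TinyTable {
    fn new() -> Self {
        TinyTable { strings: Vec::new() }
    }

    /// Stores `s` and returns its index as an ID.
    fn alloc(&mut self, s: &str) -> u32 {
        self.strings.push(s.to_string());
        (self.strings.len() - 1) as u32
    }
}

/// Records `count` events that all share one label. The label is stored
/// once, and each event only carries the small integer ID.
fn record_events(table: &mut TinyTable, label: &str, count: usize) -> Vec<u32> {
    let id = table.alloc(label);
    vec![id; count]
}
```

Recording a thousand events with the same label this way stores the label text once, plus one ID per event.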

StringIds are partitioned according to type:

[0 .. MAX_PRE_RESERVED_STRING_ID, METADATA_STRING_ID, .. ]

Values from 0 to MAX_PRE_RESERVED_STRING_ID are the allowed IDs for reserved strings. After MAX_PRE_RESERVED_STRING_ID, there is one string ID (METADATA_STRING_ID), which measureme uses internally to record additional metadata about the profiling session. All values after METADATA_STRING_ID are used for other StringIds.
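
The partitioning can be sketched as a range check. The constant values below are placeholders chosen for illustration and should not be taken as the crate's real values. The sketch also assumes that the reserved range includes MAX_PRE_RESERVED_STRING_ID itself and that METADATA_STRING_ID directly follows it. `IdKind` and `kind_of` are hypothetical names.

```rust
// Placeholder values for illustration only; the crate's constants may differ.
const MAX_PRE_RESERVED_STRING_ID: u32 = 99;
// Assumption: the metadata ID is the value directly after the reserved range.
const METADATA_STRING_ID: u32 = MAX_PRE_RESERVED_STRING_ID + 1;

/// Which partition an ID falls into. Hypothetical type for this sketch.
#[derive(Debug, PartialEq)]
enum IdKind {
    Reserved,
    Metadata,
    Regular,
}

/// Maps a raw ID value to its partition.
fn kind_of(id: u32) -> IdKind {
    if id <= MAX_PRE_RESERVED_STRING_ID {
        IdKind::Reserved
    } else if id == METADATA_STRING_ID {
        IdKind::Metadata
    } else {
        IdKind::Regular
    }
}
```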

Structs

StringId

A StringId is used to identify a string in the StringTable.

StringTableBuilder

Write-only version of the string table

Enums

StringComponent

A single component of a string. Used for building composite table entries.

Constants

MAX_STRING_ID
METADATA_STRING_ID

The id of the profile metadata string entry.

STRING_ID_MASK
TERMINATOR

Traits

SerializableString

Anything that implements SerializableString can be written to a StringTable.