
# Encoding Strings

In addition to [parsing strings](string.md), **Trivet** can also produce encoded strings. This is similar to the Rust debug output for a string, but is configurable.

String encoding is handled by the `trivet::strings::StringEncoder` struct. The following will write out a string using a few of the encoding standards.

```rust,ignore
{{#include ../../examples/book_encoding_strings.rs}}
```

This produces the following output.

```text
 Trivet: \n\nStringy † and 𝄠\r\n\x00
   JSON: \n\nStringy † and \ud834\udd20\r\n\u0000
   Rust: \n\nStringy † and 𝄠\r\n\0
 Python: \n\nStringy † and \U0001d120\r\n\x00
```

The example `stringy.rs`, found in the `examples` folder of the distribution, provides a wider playground for examining both how strings are parsed and how they are encoded.

## Configuration

The following are the configuration options available in `StringEncoder`, together with a common setting for each. Escapes are configured separately and are discussed below.

| Option                                                                                        | Common Setting                 |
| --------------------------------------------------------------------------------------------- | ------------------------------ |
| `escape_char`<br>The character to use to introduce an escape                                  | `\`                            |
| `use_ascii_escapes`<br>Use two-digit encoding for escapes in the ASCII range                  | `true`                         |
| `ascii_escape`<br>The character to introduce a two-digit escape                               | `x`                            |
| `low_unicode_escape`<br>The character to introduce a four-digit hexadecimal escape            | `u`                            |
| `high_unicode_escape`<br>The character to introduce an eight-digit hexadecimal escape         | `U`                            |
| `brace_unicode_escape`<br>The character to introduce a bracketed hexadecimal escape           | `u`                            |
| `use_names`<br>If true, use names for encoding characters                                     | `false`                        |
| `encoding_standard`<br>The encoding standard to use, that determines _what_ gets encoded      | `EncodingStandard::Control`    |
| `encoding_method`<br>The encoding method to use, that determines _how_ characters get encoded | `EncodingMethod::BracketedU18` |

The specific settings for these options depend on the string standard chosen. For example, the Python and **Trivet** standards permit using names of the form `\N{dagger}`, while the other standards do not. Python uses eight-digit encoding for U+1D120 '𝄠': `\U0001d120`, while Rust uses curly braces: `\u{1d120}`.
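To make the difference between these escape forms concrete, here is a minimal sketch that builds each one by hand using only the Rust standard library. It is not how **Trivet** implements encoding, and the function names (`two_digit_escape`, `json_escape`, `eight_digit_escape`, `brace_escape`) are hypothetical, chosen only for illustration.

```rust
/// Two-digit form, e.g. `\x00`. Only applies to characters in the ASCII range.
fn two_digit_escape(ch: char) -> Option<String> {
    if ch.is_ascii() {
        Some(format!("\\x{:02x}", ch as u32))
    } else {
        None
    }
}

/// Four-digit form as used by JSON. Characters above U+FFFF do not fit in
/// four digits, so they are written as a UTF-16 surrogate pair.
fn json_escape(ch: char) -> String {
    let mut units = [0u16; 2];
    ch.encode_utf16(&mut units)
        .iter()
        .map(|unit| format!("\\u{:04x}", unit))
        .collect()
}

/// Eight-digit form, e.g. `\U0001d120`.
fn eight_digit_escape(ch: char) -> String {
    format!("\\U{:08x}", ch as u32)
}

/// Curly-brace form, e.g. `\u{1d120}`.
fn brace_escape(ch: char) -> String {
    format!("\\u{{{:x}}}", ch as u32)
}
```

Calling these on U+1D120 gives `\ud834\udd20` from `json_escape`, `\U0001d120` from `eight_digit_escape`, and `\u{1d120}` from `brace_escape`, which matches the forms shown in the sample output above.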

Special escape encodings for characters (such as `\n` for newline) can be set with the `escapes` option in `StringEncoder`. This option is a `std::collections::BTreeMap<char, &'static str>` instance that maps a character to the encoding for that character. For example, the following will tell the encoder to encode U+2020 '†' as `\d`. Note that the map value is given without the escape character.

```rust,ignore
encoder.escapes.insert('\u{2020}', "d");
```

The other major pieces are the `trivet::strings::EncodingStandard` that determines _what_ gets encoded, and the `trivet::strings::EncodingMethod` that determines _how_ characters are encoded. See the documentation of these enums for more details.