r-toml 0.0.27

Regular subset of TOML

A fast, streaming TOML parser for the regular subset of TOML v1.0.0. Available in F# and Rust.

  • Streaming — calls your callback during parsing, no intermediate tree
  • Zero-copy — values are byte spans into the original input
  • Stackless — no recursion, no stack-allocated collections
  • Automata-based — DFA-driven lexer, optimal and predictable (F# benchmarks, Rust benchmarks)
  • Inlined — lambdas inline at the call site, no vtables
  • Single file, no dependencies — drop it in and go
  • Raw UTF-8 — runs on bytes directly, no char conversion
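The streaming, zero-copy and inlined points above fit together roughly as shown below. The sketch is hypothetical (`Span`, `stream_pairs` and `trim` are not r-toml's API) and understands only flat `key = value` lines, but it shows the shape: the callback receives byte ranges into the caller's buffer, nothing is allocated per value, and the callback is a generic parameter rather than a trait object.

```rust
/// A byte range `[begin, end)` into the original input; nothing is copied.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    begin: usize,
    end: usize,
}

impl Span {
    /// Borrow the bytes this span covers.
    fn slice<'a>(&self, input: &'a [u8]) -> &'a [u8] {
        &input[self.begin..self.end]
    }
}

/// Shrink `[begin, end)` past ASCII spaces, tabs and carriage returns.
fn trim(input: &[u8], mut begin: usize, mut end: usize) -> Span {
    let is_ws = |b: u8| b == b' ' || b == b'\t' || b == b'\r';
    while begin < end && is_ws(input[begin]) {
        begin += 1;
    }
    while end > begin && is_ws(input[end - 1]) {
        end -= 1;
    }
    Span { begin, end }
}

/// Hypothetical streaming scanner for plain `key = value` lines only
/// (no headers, comments, or strings containing '='). The callback runs as
/// soon as each pair is found; no tree or string is ever built.
/// `F` is a generic parameter, so the closure is monomorphized and can be
/// inlined at the call site instead of going through a `dyn FnMut` vtable.
fn stream_pairs<F: FnMut(Span, Span)>(input: &[u8], mut f: F) {
    let mut line_start = 0;
    while line_start < input.len() {
        // End of the current line (or end of input).
        let line_end = input[line_start..]
            .iter()
            .position(|&b| b == b'\n')
            .map_or(input.len(), |p| line_start + p);
        let line = &input[line_start..line_end];
        if let Some(eq) = line.iter().position(|&b| b == b'=') {
            let key = trim(input, line_start, line_start + eq);
            let value = trim(input, line_start + eq + 1, line_end);
            f(key, value);
        }
        line_start = line_end + 1;
    }
}

fn main() {
    let input = b"port = 8080\nhostname = 'abc'\n";
    stream_pairs(input, |k, v| {
        println!(
            "{} -> {}",
            String::from_utf8_lossy(k.slice(input)),
            String::from_utf8_lossy(v.slice(input))
        );
    });
}
```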

[Figure: F# benchmarks]

[Figure: Rust benchmarks]

What is this?

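TOML's nested types ([table], [[array]], dotted keys) happen to be expressible as a regular grammar: a header only ever replaces the current table path, and header brackets are never more than two deep, so no stack is needed to track them. r-toml exploits this to parse TOML with a flat DFA instead of a recursive-descent parser. The trade-off: some rarely used TOML features (inline tables, mixed/nested arrays, quoted keys) aren't supported; see below. For typical config files and data storage that avoid those features, it accepts the same documents as a full TOML parser and is much faster than general-purpose parsers.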
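Why no stack is needed can be made concrete with a small table-driven DFA that accepts `[a.b]` and `[[a.b]]` header lines. This is a minimal sketch, assuming bare keys only and no whitespace or comments inside the brackets; the state numbering and the `classify_header` function are hypothetical, not r-toml's actual transition table. Because the bracket depth is bounded at two, every situation fits in one of ten states plus a dead state.

```rust
#[derive(Debug, PartialEq)]
enum Header {
    Table,
    ArrayOfTables,
}

const DEAD: u8 = 10;

/// Map a byte to a column of the transition table.
fn class(b: u8) -> usize {
    match b {
        b'[' => 0,
        b']' => 1,
        b'.' => 2,
        b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'_' | b'-' => 3, // bare-key byte
        _ => 4,
    }
}

// Rows are states, columns are byte classes: '[', ']', '.', key byte, other.
const T: [[u8; 5]; 10] = [
    /* 0 start           */ [1, DEAD, DEAD, DEAD, DEAD],
    /* 1 after "["       */ [2, DEAD, DEAD, 3, DEAD],
    /* 2 after "[["      */ [DEAD, DEAD, DEAD, 5, DEAD],
    /* 3 key in [..]     */ [DEAD, 8, 4, 3, DEAD],
    /* 4 "." in [..]     */ [DEAD, DEAD, DEAD, 3, DEAD],
    /* 5 key in [[..]]   */ [DEAD, 7, 6, 5, DEAD],
    /* 6 "." in [[..]]   */ [DEAD, DEAD, DEAD, 5, DEAD],
    /* 7 first "]" of ]] */ [DEAD, 9, DEAD, DEAD, DEAD],
    /* 8 accept [table]  */ [DEAD; 5],
    /* 9 accept [[arr]]  */ [DEAD; 5],
];

/// Run the DFA over one header line; a single `u8` of state, no recursion.
fn classify_header(line: &[u8]) -> Option<Header> {
    let mut state = 0u8;
    for &b in line {
        state = T[state as usize][class(b)];
        if state == DEAD {
            return None;
        }
    }
    match state {
        8 => Some(Header::Table),
        9 => Some(Header::ArrayOfTables),
        _ => None,
    }
}

fn main() {
    let lines: [&[u8]; 4] = [b"[server]", b"[[products]]", b"[entry.inner]", b"[a.]"];
    for line in lines {
        println!("{:?}", classify_header(line));
    }
}
```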

basic usage (F#)

let toml : byte[] = "
[server]
port = 8080
hostname = 'abc'
"B
let dictionary = RToml.toDictionary(toml)
dictionary["server.port"].kind          // INT
dictionary["server.port"].ToInt(toml)   // 8080

// or any of the other formats
let array = RToml.toArray(toml)
let array2 = RToml.toStructArray(toml)
let valuelist =
    use vlist = RToml.toValueList(toml)
    for v in vlist do () //.. do something
// or iterate over the key-value pairs
RToml.stream (
    toml,
    (fun key value ->
        if value.kind = Token.TRUE then
            let keystr = key.ToString toml // struct to string
            printfn $"{keystr} at pos:{key.key_begin} is set to true"
    )
)

basic usage (Rust)

fn main() {
    let toml = b"
[server]
port = 8080
hostname = 'abc'
";
    let map = r_toml::to_map(toml).unwrap();
    dbg!(&map["server.port"].kind); // INT
    dbg!(&map["server.port"].to_int(toml)); // Ok(8080)

    // or iterate over key-value pairs
    let mut key_buf = Vec::new();
    r_toml::stream(toml, |k, v| {
        println!("{} = {:?}", k.to_str(&mut key_buf, toml), v.kind);
        key_buf.clear();
    });
    // iterator over (String,Value) for convenience
    r_toml::to_iter(toml).for_each(|(k_string, v)| {
        println!("{} = {:?}", k_string, v.kind);
    });
}
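Both examples look values up by dotted paths such as `server.port`. Such paths can be produced without recursion by remembering only the most recent table header and prepending it to each key. The `flatten` function below is a hypothetical sketch: it returns owned strings for readability (the library itself works on byte spans), assumes one entry per line with no inline comments or inline tables, and does not number repeated `[[...]]` entries.

```rust
/// Hypothetical sketch: turn `[table]` headers and `key = value` lines into
/// dotted paths like "server.port". Only the latest header is remembered,
/// so the state is a single string slice rather than a stack.
fn flatten(input: &str) -> Vec<(String, String)> {
    let mut prefix = ""; // current table path, e.g. "server" or "entry.inner"
    let mut out = Vec::new();
    for line in input.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue; // blank line or standalone comment
        }
        if line.starts_with('[') {
            // "[a.b]" and "[[a.b]]" both set the prefix to "a.b".
            prefix = line.trim_matches(|c: char| c == '[' || c == ']').trim();
        } else if let Some((key, value)) = line.split_once('=') {
            let key = key.trim();
            let path = if prefix.is_empty() {
                key.to_string()
            } else {
                format!("{prefix}.{key}")
            };
            out.push((path, value.trim().to_string()));
        }
    }
    out
}

fn main() {
    let toml = "[server]\nport = 8080\nhostname = 'abc'\n";
    for (path, value) in flatten(toml) {
        println!("{path} = {value}");
    }
}
```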

Supported types

  • keys and basic primitives: true/false, 10, 0.005, 'string'
  • multiline strings: '''content''', """content"""
  • datetime: 1979-05-27T07:32:00Z
  • tables: [entry], [entry.inner]
  • arrays of tables: [[products]]
  • typed arrays: [1, 2, 3], ['a', 'b'], [true, false], [1.0, 2.0]
  • comments: # comment (standalone or inline after a value)
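A parser that hands back byte spans still has to tag each value with a kind. The sketch below shows one loose way to do that for the scalar forms listed above. The `Kind` enum and `classify` function are hypothetical stand-ins for r-toml's token kinds; the date check only looks at the leading `YYYY-MM-DD`, and numbers go through Rust's own `i64`/`f64` parsers, which accept a few forms TOML rejects (such as leading zeros or `.5`).

```rust
#[derive(Debug, PartialEq)]
enum Kind {
    Bool,
    Int,
    Float,
    Str,
    DateTime,
    Array,
    Unknown,
}

/// Loosely classify the bytes of a value (already trimmed of whitespace).
fn classify(v: &[u8]) -> Kind {
    if v == b"true" || v == b"false" {
        return Kind::Bool;
    }
    match v {
        [b'\'' | b'"', ..] => Kind::Str,
        [b'[', ..] => Kind::Array,
        // Leading YYYY-MM-DD; the time and offset parts are not checked here.
        [y0, y1, y2, y3, b'-', m0, m1, b'-', d0, d1, ..]
            if [y0, y1, y2, y3, m0, m1, d0, d1].iter().all(|b| b.is_ascii_digit()) =>
        {
            Kind::DateTime
        }
        _ => classify_number(v),
    }
}

/// Try integer first, then float; underscores are stripped beforehand.
fn classify_number(v: &[u8]) -> Kind {
    let s = match std::str::from_utf8(v) {
        Ok(s) => s.replace('_', ""),
        Err(_) => return Kind::Unknown,
    };
    if s.parse::<i64>().is_ok() {
        Kind::Int
    } else if s.parse::<f64>().is_ok() {
        Kind::Float
    } else {
        Kind::Unknown
    }
}

fn main() {
    let values: [&[u8]; 6] = [
        b"true",
        b"10",
        b"0.005",
        b"'string'",
        b"1979-05-27T07:32:00Z",
        b"[1, 2, 3]",
    ];
    for v in values {
        println!("{:?}", classify(v));
    }
}
```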

Unsupported

Inline tables:

person = { address = { postcode = 123, street = "abc" } }

Use the equivalent flat forms instead:

[person.address]
postcode = 123
street = "abc"

Mixed/nested arrays:

data = [[[0,1,3],"abc"],{ x = 1, y = 2}]

Quoted keys:

"person"."address"."name" = "value"

Parsing quoted keys requires collecting or transforming the key contents (for example, resolving escape sequences), which breaks the zero-copy and stackless properties.
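To see why escapes in quoted keys clash with zero-copy parsing, compare a bare key with a quoted key that contains an escape. In this sketch the `key_bytes` helper is hypothetical and decodes only `\"` and `\\`: the bare key and an escape-free quoted key come back as borrowed sub-slices of the input, while the escaped key has to be decoded into a newly allocated buffer.

```rust
use std::borrow::Cow;

/// Hypothetical helper: the bytes of a single key segment.
/// Borrowed = a sub-slice of the input (zero-copy); Owned = a new allocation.
fn key_bytes(seg: &[u8]) -> Cow<'_, [u8]> {
    match seg {
        // Quoted key with a backslash: must be rewritten into a new buffer.
        [b'"', inner @ .., b'"'] if inner.contains(&b'\\') => {
            let mut out = Vec::with_capacity(inner.len());
            let mut bytes = inner.iter();
            while let Some(&b) = bytes.next() {
                if b != b'\\' {
                    out.push(b);
                    continue;
                }
                match bytes.next() {
                    Some(&c) if c == b'"' || c == b'\\' => out.push(c),
                    // Other escapes are kept verbatim in this sketch.
                    Some(&c) => {
                        out.push(b'\\');
                        out.push(c);
                    }
                    None => out.push(b'\\'),
                }
            }
            Cow::Owned(out)
        }
        // Quoted key without escapes: still just a sub-slice.
        [b'"', inner @ .., b'"'] => Cow::Borrowed(inner),
        // Bare key: the segment itself.
        _ => Cow::Borrowed(seg),
    }
}

fn main() {
    let bare = key_bytes(b"address");
    let escaped = key_bytes(br#""a\"b""#);
    println!(
        "{} (borrowed: {}), {} (borrowed: {})",
        String::from_utf8_lossy(&bare),
        matches!(bare, Cow::Borrowed(_)),
        String::from_utf8_lossy(&escaped),
        matches!(escaped, Cow::Borrowed(_))
    );
}
```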

String types

Escape sequences are detected and flagged but not resolved: if your strings contain escapes, the unescaping is up to you. After parsing, every string value has one of these kinds:

  • VALID_STR — no escape sequences, use as-is (multiline leading newline already trimmed)
  • ESC_STR — contains escape sequences, needs post-processing
  • EMPTY_STR — empty string

Array types

Only homogeneous arrays are supported. The parser validates the whole array and returns it as a single byte-span value with one of these kinds:

  • ARR1_INT, ARR1_FLOAT, ARR1_BOOL, ARR1_STR
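An iterator for array values is listed as future work, so with an `ARR1_INT` value the span has to be parsed by hand. A minimal sketch, assuming the span includes the brackets, the array was already validated by the parser, the integers are decimal (underscores allowed), and there are no comments inside the brackets; `int_array` is a hypothetical helper.

```rust
/// Hypothetical helper: parse the elements of an integer array span such as
/// b"[1, 2, 3]". Whitespace, newlines and a trailing comma are accepted.
fn int_array(span: &[u8]) -> Option<Vec<i64>> {
    let s = std::str::from_utf8(span).ok()?.trim();
    let inner = s.strip_prefix('[')?.strip_suffix(']')?;
    inner
        .split(',')
        .map(str::trim)
        .filter(|item| !item.is_empty()) // skips the slot after a trailing comma
        .map(|item| item.replace('_', "").parse::<i64>().ok())
        .collect() // None if any element fails to parse
}

fn main() {
    println!("{:?}", int_array(b"[1, 2, 3]"));
}
```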

[Figure: Token visualization]

Future work (if there's real interest)

  • proper benchmark data from real Cargo.toml / pyproject.toml files
  • iterator for array values
  • TOML compliance tests
  • codegen for other languages
  • string-to-tagged-union deserialization
  • SIMD intrinsics for string scanning