Crate serde_luaq


Note: this library is still a work in progress, and there are no API stability guarantees.

serde_luaq is a library for deserialising (and eventually, serialising) simple, JSON-like data structures from Lua 5.4 source code, without requiring Lua itself.

The goal is to be able to read state from software (mostly games) which is serialised using Lua %q formatting (and similar techniques) without requiring arbitrary code execution.

This library consists of four parts:

§Examples

§peg deserialiser

Deserialise a bare Lua value with the peg parser to LuaValue:

use serde_luaq::{LuaValue, lua_value};
assert_eq!(LuaValue::Boolean(true), lua_value(b"true", /* max table depth */ 16).unwrap());

There are similar deserialisers for a return statement and scripts with one or more variable assignments.

Maximum table depth limits are described in their own section.

§serde deserialiser

from_slice() deserialises a bare Lua value into a type that implements Deserialize:

use serde::Deserialize;
use serde_luaq::{LuaFormat, from_slice};

#[derive(Deserialize, PartialEq, Debug)]
struct ComplexType {
    foo: String,
}

#[derive(Deserialize, PartialEq, Debug)]
struct Test {
    a: bool,
    b: Vec<u32>,
    c: ComplexType,
}

let expected = Test {
    a: true,
    b: vec![1, 2, 3],
    c: ComplexType { foo: "bar".to_string() },
};

assert_eq!(
    expected,
    from_slice(
        b"{a=true, [ [[b]] ]={[3] = 3, 0x1, 2}, ['c'] = { foo = \"bar\" }}",
        LuaFormat::Value,
        /* maximum table depth */ 16,
    ).unwrap(),
);

It can also deserialise from a return statement or script with one or more variable assignments.

Maximum table depth limits are described in their own section.

§Data types

serde_luaq supports a JSON-like subset of Lua 5.4’s data types:

Lua type        | LuaValue variant   | Rust type(s) for Serde
nil             | LuaValue::Nil      | Option::None
boolean         | LuaValue::Boolean  | bool
string          | LuaValue::String   | [u8], Vec<u8>, String (see note)
number          | LuaValue::Number   | LuaNumber (see note)
float subtype   | LuaNumber::Float   | f64
integer subtype | LuaNumber::Integer | i64
table           | LuaValue::Table    | BTreeMap, HashMap, Vec<T>, struct (see note)

The peg deserialisers will always produce a LuaValue.

However, LuaValue doesn’t implement Deserialize, so can’t be used as a Serde field.

Generally speaking, serde_luaq tries to do whatever a default build of Lua 5.4 does, except for:

  • anything which requires evaluating or executing Lua code
  • locale-dependent behaviour
  • platform-dependent behaviour

Unicode identifiers (LUA_UCID) and other locale-specific identifiers are not supported, even if they would be valid in Rust.

§Numbers

serde_luaq follows Lua 5.4’s number handling semantics, but doesn’t implement locale-specific behaviour (eg: using , as a decimal point in addition to .).

The following types can be used with Serde’s data model:

Literal                            | LuaNumber variant           | f64                 | i64
Decimal integer, inside i64 range  | Integer                     | may lose precision  |
Decimal integer, outside i64 range | Float (will lose precision) | will lose precision |
Hexadecimal integer                | Integer                     | may lose precision  |
Decimal float                      | Float                       |                     |
Hexadecimal float [1]              | Float                       |                     |
(0/0) (NaN)                        | Float                       |                     |

  • A LuaNumber field will follow Lua 5.4 semantics, which could be an i64 or f64.

  • An f64 field will accept decimal integer literals from −(2^53 − 1) to (2^53 − 1) without loss of precision.

  • Decimal integer literals outside of the i64 range are converted to f64, and will lose precision. These cannot be used with i64 fields.

  • Hexadecimal integer literals are always coerced to i64, and can always be used with i64 fields. Values outside of the i64 range will only under/overflow as i64, regardless of the field type.

    This means the literal 0xffffffffffffffff is always treated as if it were written -1, even for f64, i8, and u64 fields. For unsigned field types, that is an error.

  • Hexadecimal float literals with more than 16 hex digits will not parse, due to a limitation of the parsing library serde_luaq uses.

    While Lua accepts these values, string.format('%q') would never produce them.

  • Narrower integer fields like i8 and i16 reject all integer literals that are outside of their range.

    The i64 coercion process means that the hexadecimal literal 0xff is treated as if it were written 255, and using it with an i8 field would be an error.

  • Unsigned integer fields like u8 and u16 reject all negative decimal integer literals.

  • Narrower float fields like f32 are first handled as an f64, then converted to f32. This will result in a loss of precision, and values outside of their acceptable range will be set to positive or negative infinity.

  • Wider integer fields like i128 and u64 apply the same limits as i64, even with hexadecimal integer literals.
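
The integer rules above can be modelled with the standard library alone. The following is a minimal sketch of the described behaviour, not serde_luaq's implementation; the function names (hex_literal_to_i64, narrow_to_i8, narrow_to_u64, survives_f64, to_f32) are hypothetical and exist only for illustration.

```rust
/// Wraps the digits of a hexadecimal integer literal (the part after `0x`)
/// modulo 2^64, then reinterprets the bits as an i64.
/// Hypothetical helper: models the documented wrap-around only.
fn hex_literal_to_i64(digits: &str) -> Option<i64> {
    if digits.is_empty() {
        return None;
    }
    let mut acc: u64 = 0;
    for c in digits.chars() {
        // `to_digit(16)` accepts 0-9, a-f and A-F, and returns None otherwise.
        let d = u64::from(c.to_digit(16)?);
        acc = acc.wrapping_mul(16).wrapping_add(d);
    }
    Some(acc as i64)
}

/// Narrower and unsigned fields are modelled as a range check on the i64.
fn narrow_to_i8(v: i64) -> Option<i8> {
    i8::try_from(v).ok()
}

fn narrow_to_u64(v: i64) -> Option<u64> {
    u64::try_from(v).ok()
}

/// True if converting the integer to f64 keeps its exact value.
/// The comparison is done in i128 so that saturating casts cannot hide a loss.
fn survives_f64(v: i64) -> bool {
    (v as f64) as i128 == i128::from(v)
}

/// f64 to f32 conversion: rounds to the nearest f32, and values outside the
/// f32 range become positive or negative infinity.
fn to_f32(v: f64) -> f32 {
    v as f32
}
```

Under this model, hex_literal_to_i64("ffffffffffffffff") yields -1, which narrow_to_u64 rejects, and hex_literal_to_i64("ff") yields 255, which narrow_to_i8 rejects. survives_f64 holds for 2^53 − 1 but not for 2^53 + 1.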

§Strings

Lua strings are “8-bit clean”, and can contain any 8-bit value (ie: [u8]).

For Serde, this is preserved if using a Vec<u8> field with #[serde(with = "serde_bytes")] or a serde_bytes::ByteBuf field. If you don’t use serde_bytes, Serde will expect a sequence of u8 (and won’t read the string).

Lua’s \u{...} escapes follow RFC 2279 (1998) rather than RFC 3629 (2003). RFC 2279 differs by allowing surrogate code points and code points greater than \u{10FFFF}. serde_luaq will convert these escapes into bytes following RFC 2279, which might not be valid in RFC 3629.

Serde String fields can be used if the string literal evaluates to valid RFC 3629 UTF-8. This is not guaranteed even if the input data is &str, as Lua string escapes may evaluate to binary values or invalid sequences (eg: "\xC1\u{7FFFFFFF}").
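
To illustrate why a \u{...} escape can produce bytes that a String field rejects, here is a sketch of an RFC 2279 style encoder written against the standard library. The function name encode_rfc2279 is hypothetical, and this is an illustration of the encoding scheme rather than serde_luaq's own code.

```rust
/// Encodes a code point using the RFC 2279 scheme, which allows 1 to 6 bytes
/// and code points up to 0x7FFFFFFF (including surrogates).
/// Hypothetical helper for illustration only.
fn encode_rfc2279(cp: u32) -> Option<Vec<u8>> {
    // Pick the sequence length and the marker bits of the leading byte.
    let (len, lead): (usize, u8) = match cp {
        0..=0x7F => return Some(vec![cp as u8]),
        0x80..=0x7FF => (2, 0xC0),
        0x800..=0xFFFF => (3, 0xE0),
        0x1_0000..=0x1F_FFFF => (4, 0xF0),
        0x20_0000..=0x3FF_FFFF => (5, 0xF8),
        0x400_0000..=0x7FFF_FFFF => (6, 0xFC),
        _ => return None,
    };
    let mut out = vec![0u8; len];
    let mut rest = cp;
    // Fill continuation bytes from the end, six payload bits at a time.
    for i in (1..len).rev() {
        out[i] = 0x80 | (rest & 0x3F) as u8;
        rest >>= 6;
    }
    // Whatever remains fits in the payload bits of the leading byte.
    out[0] = lead | rest as u8;
    Some(out)
}
```

With this sketch, U+00E9 encodes to the bytes C3 A9, which std::str::from_utf8 accepts. The surrogate U+D800 encodes to ED A0 80 and 0x7FFFFFFF encodes to FD BF BF BF BF BF, and std::str::from_utf8 rejects both.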

Unlike Lua, new-line characters/sequences in strings are kept as-is, and not converted to their platform-specific representation.

§Tables

Lua tables are used for both lists and maps.

The peg deserialisers will always produce a Vec of LuaTableEntry in the order the entries were defined. They do not attempt to reconcile implicit keys with explicit keys, or to resolve duplicate keys.

Unlike Lua, a LuaTableEntry may use any key or value type, including nil and NaN.

As a convenience, identifier-keyed entries ({ a = 1 }) are treated as keyed with str, because with Lua’s default build settings, these are always valid RFC 3629 UTF-8.

The rules are slightly different when using Serde, which is described below.

§Duplicate table keys in Serde

Using duplicate table keys is undefined behaviour in Lua.

However, when using serde_luaq with Serde, later entries always overwrite earlier entries, regardless of how they are defined, ie:

{ ['a'] = 1, a = 2 } == { ['a'] = 2 }
{ 1, [1] = 2 } == { 2 }
{ [1] = 1, 2 } == { 2 }

§Tables as lists in Serde (Vec)

Lua tables are 1-indexed, rather than 0-indexed. serde_luaq will handle these differences on the input side for explicitly-keyed values, and make the resulting Vec 0-indexed:

let a: Vec<i64> = from_slice(
    b"{1, 2, 3}",
    LuaFormat::Value,
    /* max table depth */ 16,
)?;

assert_eq!(a[0], 1);
assert_eq!(a[1], 2);
assert_eq!(a[2], 3);

Table entries may be defined with implicit or explicit keys, or a combination, and may be defined in any order.

Any missing entries will be treated as nil, which can be used with Option:

let b: Vec<Option<i64>> = from_slice(
    b"{1, [4] = 4, 2}",
    LuaFormat::Value,
    /* max table depth */ 16,
)?;

assert_eq!(b, vec![Some(1), Some(2), None, Some(4)]);

let c: Vec<Option<i64>> = from_slice(
    b"{1, [1000] = 1000}",
    LuaFormat::Value,
    /* max table depth */ 16,
)?;

assert_eq!(c.len(), 1000);
assert_eq!(c[0], Some(1));
assert_eq!(c[999], Some(1000));

If you’re working with a sparse table, it’s probably better to handle it as a map (see below). This works everywhere except as the value type of a flattened map field.
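
The list rules described in this section (implicit keys counting from 1, later entries overwriting earlier ones, missing positions becoming nil) can be sketched as a small standalone model. The Entry type and the table_to_vec function are hypothetical and are not part of serde_luaq; rejecting key 0 is an assumption of this sketch.

```rust
/// Hypothetical model of a table entry: the key is `None` for an implicit key
/// (`{1, 2}`) and `Some(k)` for an explicit integer key (`{[4] = 4}`).
type Entry = (Option<usize>, i64);

/// Builds a 0-indexed list from entries that use 1-indexed Lua keys.
/// Returns `None` for key 0, which has no list position (assumption).
fn table_to_vec(entries: &[Entry]) -> Option<Vec<Option<i64>>> {
    let mut out: Vec<Option<i64>> = Vec::new();
    // Implicit keys count from 1, without regard for explicit keys.
    let mut next_implicit = 1usize;
    for &(key, value) in entries {
        let k = match key {
            Some(0) => return None,
            Some(k) => k,
            None => {
                let k = next_implicit;
                next_implicit += 1;
                k
            }
        };
        // Grow the list so position k exists; gaps stay as None (nil).
        if out.len() < k {
            out.resize(k, None);
        }
        // Later entries overwrite earlier entries with the same key.
        out[k - 1] = Some(value);
    }
    Some(out)
}
```

The model also shows why sparse tables are costly as lists: a single entry with key 1000 grows the Vec to 1000 elements.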

§Tables as maps in Serde (BTreeMap/HashMap)

If the key of the map is an integer type, table entries may contain implicit keys. Like Lua, implicit keys start counting at 1, without regard for explicit keys.

let a: BTreeMap<i64, i64> = from_slice(
    b"{1, [4] = 4, 2}",
    LuaFormat::Value,
    /* max table depth */ 16,
)?;

assert_eq!(1, *a.get(&1).unwrap());
assert_eq!(2, *a.get(&2).unwrap());
assert_eq!(4, *a.get(&4).unwrap());

Otherwise, all entries must be explicitly keyed.

For maps, serde_luaq treats “entry present and set to nil” and “entry not present” as distinct states. This means unless a key or value uses an Option type, it must not contain nil:

let input = b"{a = 1, b = nil}";

// Error: b cannot be a unit (None) type
assert!(from_slice::<BTreeMap<String, i64>>(input, LuaFormat::Value, 16).is_err());

// Success: b is set to None, other entries are set to Some
let a: BTreeMap<String, Option<i64>> = from_slice(input, LuaFormat::Value, 16)?;
assert_eq!(Some(1), *a.get("a").unwrap());
assert!(a.get("b").unwrap().is_none()); // present, set to nil
assert!(a.get("c").is_none()); // not present

§Tables as structs

When deserialising a table as a struct, all keys must be written as valid RFC 3629 strings or Lua identifiers.

Unicode identifiers (LUA_UCID) and other locale-specific identifiers are not supported, even if they would be valid Rust identifiers. If used in a table key, these must be written as a string instead:

{ english = "en", ["français"] = "fr" }

Serde does not support numeric keys in structs.

§Flattening

#[serde(flatten)] can be used with a map field:

#[derive(Deserialize, Debug, PartialEq)]
struct Flatten {
    version: i32,
    #[serde(flatten)]
    entries: BTreeMap<String, i64>,
}

let lua = br#"{
    version = 1,
    example = 2,
    hello = 4,
}"#;

assert_eq!(
    Flatten { version: 1, entries: BTreeMap::from([
        ("example".to_string(), 2),
        ("hello".to_string(), 4),
    ])},
    from_slice(lua, LuaFormat::Value, 16)?
);

If a flattened field’s value is a table of only implicitly-keyed and/or numerically-keyed entries, it can only go into a Vec field (eg: BTreeMap<String, Vec<i64>>), and not a nested map (eg: BTreeMap<String, BTreeMap<i64, i64>>).

This is because Serde tries to handle these as an “any” type, and this library forces anything that looks like an array or sparse array to be treated as an array.

§Enums

When deserialising, enums may be represented multiple ways:

enum E {
    /// `"Unit"` or `{["Unit"] = {}}`
    Unit,

    /// `{["NewType"] = 1}`
    NewType(i64),

    /// `{["Tuple"] = {1,2}}` or `{["Tuple"] = {[1]=1,[2]=2}}`
    Tuple(i64, i64),

    /// `{["Struct"] = {["a"] = 1}}`
    Struct { a: i64 },
}

Like with tables, if a variant’s name is a valid Lua identifier, tables may be keyed with an identifier instead of a string (eg: {NewType = 1}).

§Security

While using Lua as a serialisation format is convenient to work with in Lua, its load() and require() functions allow arbitrary code execution, so aren’t safe to use with untrusted inputs. These risks are similar to using JavaScript’s eval() function to load JSON data (instead of JSON.parse()).

For example, this Lua function loads an expression in the string data, similar to what would be produced by the serialize() function described in Programming in Lua:

-- WARNING: this function is insecure and unsafe.
function deserialize(data)
    data = "return (" .. data .. ")"
    local f = load(data, nil, "t")
    if f == nil then
        return error("could not load data")
    end
    local status, r = pcall(f)
    if not status then
        return error("could not call data")
    end
    return r
end

a = deserialize("{hello='world'}")
-- Prints "world"
print(a.hello)

If your program is ever sent untrusted Lua inputs, a malicious actor could insert some code which could do anything to your program or the system it is running on. For example, this input would cause Lua to read and return the contents of /etc/passwd:

(function() f=io.open('/etc/passwd');return f:read('a');end)()

serde_luaq addresses this risk by implementing a JSON-like subset of Lua’s syntax, such that inserting code is a syntax error:

use serde_luaq::{LuaValue, lua_value};

// This would cause Lua to return the contents of a local file:
let input = b"(function() f=io.open('/etc/passwd');return f:read('a');end)()";
// But it's a syntax error here.
assert!(lua_value(input, 16).is_err());

// This would cause Lua to use a lot of RAM:
let input = b"(function() x={};for a=1,100000000 do x[a]=a end;return x;end)()";
// But it's a syntax error here.
assert!(lua_value(input, 16).is_err());

Ideally, serde_luaq shouldn’t use significantly more memory than Lua to read the same data structures, on a LuaValue level (not Serde). If it does, that’s a bug. :)

§Maximum table depth

The max_depth argument controls how deeply nested a table can be before being rejected by serde_luaq.

Set this to the maximum depth of tables that you expect in your input data.

For example:

-- Table of depth 1
a1 = {1, 2, 3}

-- An empty table is still of depth 1
b1 = {}

-- Table of depth 2
a2 = {
    {1, 2, 3},
    {4, 5, 6},
}

This is roughly equivalent to Lua’s LUAI_MAXCCALLS build option, which counts many other nested lexical elements which serde_luaq doesn’t support (like code blocks and parentheses).

Warning: setting max_depth too high allows a heavily-nested table to cause your program to overflow its stack and crash.

What is “too high” depends on your platform and where you call serde_luaq in your program.

Setting max_depth to 0 disables support for tables, even empty tables.
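
As a way to reason about which limit to choose, the depth rule can be modelled on a toy value tree. The Value enum and the depth and within_limit functions below are hypothetical stand-ins and are not serde_luaq types. Note that this sketch is itself recursive, so it is only suitable for small, trusted trees.

```rust
/// Hypothetical stand-in for a parsed value: either a non-table value,
/// or a table holding nested values.
enum Value {
    Scalar,
    Table(Vec<Value>),
}

/// A scalar has depth 0. A table has depth 1 plus the depth of its
/// deepest element, so an empty table still has depth 1.
fn depth(v: &Value) -> usize {
    match v {
        Value::Scalar => 0,
        Value::Table(items) => 1 + items.iter().map(depth).max().unwrap_or(0),
    }
}

/// Models the documented rule: a value is accepted only if its depth
/// does not exceed `max_depth`.
fn within_limit(v: &Value, max_depth: usize) -> bool {
    depth(v) <= max_depth
}
```

In this model, the a1 and b1 examples above have depth 1 and a2 has depth 2. With a max_depth of 0, scalars pass but every table fails, including an empty one.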

§Memory usage

Unless otherwise noted, all memory usage estimates assume a 64-bit target CPU.

serde_luaq requires that the entire input fit in memory, and be less than usize::MAX bytes (4 GiB on 32-bit systems, 16 EiB on 64-bit systems). It is the caller’s responsibility to enforce a reasonable input size limit for the system’s available RAM.

When deserialising a Lua data structure, the minimum sizes of the LuaValue and LuaTableEntry enums are 32 and 16 bytes respectively. Values are checked at compile-time on aarch64, wasm32 and x86_64 targets to prevent regressions.

Heap-allocated variants of these enums (those with Cow or Vec fields) use more memory.

§Large data structures

At present, the highest-known memory usage per byte of input Lua is a table of deeply-nested tables, which consumes up to 96 bytes of RAM for 2 bytes of input Lua (48×). This means a 64 MiB input could use up to 3 GiB of RAM.

Setting a maximum table depth of 2 could limit this to 56 bytes of RAM for 3 bytes of input Lua (18.67×), or 1.167 GiB of RAM for a 64 MiB input.
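
The arithmetic behind these estimates can be reproduced with a small helper, which may be useful when choosing an input size limit. The function worst_case_ram and the constants MIB and GIB are hypothetical names for this sketch; the ratios passed in are the ones quoted above.

```rust
const MIB: u64 = 1 << 20;
const GIB: u64 = 1 << 30;

/// Upper bound on RAM use for an input of `input_bytes`, given a worst-case
/// ratio of `ram_bytes` of RAM per `lua_bytes` of input, rounded up.
/// Returns None if `lua_bytes` is zero or the multiplication overflows.
fn worst_case_ram(input_bytes: u64, ram_bytes: u64, lua_bytes: u64) -> Option<u64> {
    if lua_bytes == 0 {
        return None;
    }
    let total = input_bytes.checked_mul(ram_bytes)?;
    // Ceiling division, so the result is never an underestimate.
    Some(total / lua_bytes + u64::from(total % lua_bytes != 0))
}
```

For a 64 MiB input, worst_case_ram(64 * MIB, 96, 2) gives exactly 3 * GIB, and worst_case_ram(64 * MIB, 56, 3) gives 1,252,698,795 bytes, which is about 1.167 GiB.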

Lua uses similar amounts of memory for such data structures.

When deserialising into your own data structures with Serde, be mindful that some Rust data structures can use significant amounts of memory if you’re not careful. Check out the Rust performance book for tips.

§Large strings

serde_luaq uses Cow to avoid owning strings whenever possible, borrowing from the input buffer (for short strings that don’t contain escape sequences, and long strings) or 'static (for empty strings and those containing a single, non-UTF-8 escape sequence) instead.

Otherwise, it must be reassembled by copying it into an owned buffer.

If the string consists entirely of escape sequences, the parser may temporarily use up to 24 bytes of memory per 2 bytes of input Lua (12×).

The final, reassembled string will use up to 1 byte of memory for each byte of input Lua, plus Vec’s usual overheads (but doesn’t allocate excess capacity).
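
The borrow-or-copy pattern described here can be sketched with std::borrow::Cow. This is a simplified model and not serde_luaq's parser: the unescape function is hypothetical, and it only understands the \n, \t, \\ and \" escapes, treating anything else as an error.

```rust
use std::borrow::Cow;

/// Decodes the body of a short string literal (without its quotes).
/// Borrows from the input when there is nothing to decode, and copies
/// into an owned buffer otherwise. Hypothetical, simplified model.
fn unescape(body: &[u8]) -> Option<Cow<'_, [u8]>> {
    // Fast path: no escape sequences, so the input can be borrowed as-is.
    if !body.contains(&b'\\') {
        return Some(Cow::Borrowed(body));
    }
    // Slow path: reassemble the string. For this escape subset the output
    // is never longer than the input.
    let mut out = Vec::with_capacity(body.len());
    let mut bytes = body.iter().copied();
    while let Some(b) = bytes.next() {
        if b != b'\\' {
            out.push(b);
            continue;
        }
        // A trailing backslash or an unsupported escape is an error here.
        match bytes.next()? {
            b'n' => out.push(b'\n'),
            b't' => out.push(b'\t'),
            b'\\' => out.push(b'\\'),
            b'"' => out.push(b'"'),
            _ => return None,
        }
    }
    Some(Cow::Owned(out))
}
```

A body with no backslashes comes back as Cow::Borrowed and costs no allocation, while a body containing any escape is copied once into a Cow::Owned buffer.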

Unlike Lua, serde_luaq does not de-duplicate strings in a string table.

§Lua version compatibility

serde_luaq targets syntax compatibility with Lua 5.4.

As it does not execute Lua code, there are only a small number of compatibility issues with older and other versions of Lua.

§Lua 5.3

Lua 5.3 over/underflows decimal integers that don’t fit in an i64, rather than coercing to f64.

Hexadecimal integers over/underflow in both Lua 5.3 and 5.4.
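
The difference between the two versions can be sketched for unsigned decimal literals as follows. The Number enum and the decimal_lua54 and decimal_lua53 functions are hypothetical names for this illustration, and they model only the behaviour described above (the sign is not part of the literal).

```rust
/// Hypothetical stand-in for a Lua number with integer and float subtypes.
#[derive(Debug, PartialEq)]
enum Number {
    Integer(i64),
    Float(f64),
}

/// True if the text is a non-empty run of ASCII decimal digits.
fn is_decimal(digits: &str) -> bool {
    !digits.is_empty() && digits.bytes().all(|b| b.is_ascii_digit())
}

/// Lua 5.4 style: a decimal integer that does not fit in an i64 becomes a float.
fn decimal_lua54(digits: &str) -> Option<Number> {
    if !is_decimal(digits) {
        return None;
    }
    match digits.parse::<i64>() {
        Ok(i) => Some(Number::Integer(i)),
        // Digits-only input can only fail to parse because it overflows.
        Err(_) => digits.parse::<f64>().ok().map(Number::Float),
    }
}

/// Lua 5.3 style: a decimal integer that does not fit in an i64 wraps around.
fn decimal_lua53(digits: &str) -> Option<i64> {
    if !is_decimal(digits) {
        return None;
    }
    Some(digits.bytes().fold(0i64, |acc, b| {
        acc.wrapping_mul(10).wrapping_add(i64::from(b - b'0'))
    }))
}
```

For the literal 9223372036854775808 (2^63), decimal_lua54 returns Number::Float(9223372036854775808.0), while decimal_lua53 returns i64::MIN.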

§Lua 5.2

Lua 5.2 and earlier, and Luau, always use f64 for numbers, and do not have an integer subtype.

§Lua 5.1 and earlier

  • serde_luaq only allows basic Latin letters in identifiers.

    Lua 5.1 and earlier allows locale-dependent letters.

  • serde_luaq does not allow goto as an identifier name.

    This is not a reserved keyword in Lua 5.1 and earlier.

  • serde_luaq allows empty statements in script mode.

    This is not allowed in Lua 5.1.

§Luau

Like Lua 5.2, Luau uses f64 for numbers.

It also adds type annotations, binary integer literals, separators for all integer literals and string interpolation.

None of these features are supported by serde_luaq.

§Ravi

Ravi adds type annotations and some other language features, which aren’t supported by serde_luaq.


  1. Not supported on WASM targets before v0.2.1. 

Structs§

JsonConversionOptions
Lua to JSON conversion options.

Enums§

Error
JsonConversionError
Errors when converting Lua to JSON.
LuaConversionError
Errors when converting JSON to Lua.
LuaFormat
The format of the input Lua buffer.
LuaNumber
Lua 5.4 number types.
LuaTableEntry
Lua table entry.
LuaValue
Basic Lua 5.4 data types that are equivalent to those available in JSON, similar to serde_json::Value.

Functions§

from_json_value
Converts a JSON value to a Lua value.
from_slice
Parses a byte slice containing a Lua expression in format.
from_str
Parses a str containing a Lua expression in format.
lua_value
Parse a bare Lua value expression as a LuaValue.
return_statement
Parse a Lua return statement into a LuaValue.
script
Parse a Lua script containing variable assignments into a Vec of (&str, LuaValue).
to_json_value
Converts a LuaValue into a serde_json::Value.

Type Aliases§

Result