Note: this library is still a work in progress, and there are no API stability guarantees.
serde_luaq is a library for deserialising (and eventually, serialising) simple, JSON-like data
structures from Lua 5.4 source code, without requiring Lua itself.
The goal is to be able to read state from software (mostly games) which is serialised using
Lua %q formatting (and similar techniques)
without requiring arbitrary code execution.
This library consists of four parts:
- A LuaValue enum, which describes Lua’s basic data types (nil, boolean, string, number, table).
- A peg-based parser for parsing a &[u8] (containing Lua) into a LuaValue from a bare Lua value expression, a single return statement or script with variable assignments.
- A Serde-based Deserialize implementation for converting a LuaValue into your own data types.
- Optional lossy converter to and from serde_json’s Value type.
§Examples
§peg deserialiser
Deserialise a bare Lua value with the peg parser to LuaValue:
use serde_luaq::{LuaValue, lua_value};
assert_eq!(LuaValue::Boolean(true), lua_value(b"true", /* max table depth */ 16).unwrap());
There are similar deserialisers for a return statement and
scripts with one or more variable assignments.
Maximum table depth limits are described in their own section.
§serde deserialiser
from_slice() deserialises a bare Lua value into a type that
implements Deserialize:
use serde::Deserialize;
use serde_luaq::{LuaFormat, from_slice};
#[derive(Deserialize, PartialEq, Debug)]
struct ComplexType {
foo: String,
}
#[derive(Deserialize, PartialEq, Debug)]
struct Test {
a: bool,
b: Vec<u32>,
c: ComplexType,
}
let expected = Test {
a: true,
b: vec![1, 2, 3],
c: ComplexType { foo: "bar".to_string() },
};
assert_eq!(
expected,
from_slice(
b"{a=true, [ [[b]] ]={[3] = 3, 0x1, 2}, ['c'] = { foo = \"bar\" }}",
LuaFormat::Value,
/* maximum table depth */ 16,
).unwrap(),
);
It can also deserialise from a return statement or
script with one or more variable assignments.
Maximum table depth limits are described in their own section.
§Data types
serde_luaq supports a JSON-like subset of Lua 5.4’s data types:
| Lua type | LuaValue variant | Rust type(s) for Serde |
|---|---|---|
| nil | LuaValue::Nil | Option::None |
| boolean | LuaValue::Boolean | bool |
| string | LuaValue::String | [u8], Vec<u8>, String (see note) |
| number | LuaValue::Number | LuaNumber (see note) |
| …float subtype | LuaNumber::Float | f64 |
| …integer subtype | LuaNumber::Integer | i64 |
| table | LuaValue::Table | BTreeMap, HashMap, Vec<T>, struct (see note) |
The peg deserialisers will always produce a LuaValue.
However, LuaValue doesn’t implement Deserialize, so can’t be used as a
Serde field.
Generally speaking, serde_luaq tries to do whatever a default build of Lua 5.4 does,
except for:
- anything which requires evaluating or executing Lua code
- locale-dependent behaviour
- platform-dependent behaviour
Unicode identifiers (LUA_UCID) and other locale-specific identifiers are not supported, even
if they would be valid in Rust.
§Numbers
serde_luaq follows Lua 5.4’s number handling semantics, but doesn’t implement
locale-specific behaviour (eg: using , as a decimal point in addition to .).
The following types can be used with Serde’s data model:
| Literal | LuaNumber variant | f64 | i64 |
|---|---|---|---|
| Decimal integer, inside i64 range | Integer | may lose precision | ✅ |
| Decimal integer, outside i64 range | Float (will lose precision) | will lose precision | ❌ |
| Hexadecimal integer | Integer | may lose precision | ✅ |
| Decimal float | Float | ✅ | ❌ |
| Hexadecimal float [1] | Float | ✅ | ❌ |
| (0/0) (NaN) | Float | ✅ | ❌ |
- A LuaNumber field will follow Lua 5.4 semantics, which could be an i64 or f64.
- An f64 field will accept decimal integer literals from −(2^53 − 1) to (2^53 − 1) without loss of precision.
- Decimal integer literals outside of the i64 range are converted to f64, and will lose precision. These cannot be used with i64 fields.
- Hexadecimal integer literals are always coerced to i64, and can always be used with i64 fields. Values outside of the i64 range will only under/overflow as i64, regardless of the field type.

  This means the literal 0xffffffffffffffff is always treated as if it were written -1, even for f64, i8, and u64 fields. This would be an error for unsigned types.
- Hexadecimal float literals with more than 16 hex digits will not parse, due to a limitation of the parsing library serde_luaq uses.

  While Lua accepts these values, string.format('%q') would never produce them.
- Narrower integer fields like i8 and i16 reject all integer literals that are outside of their range.

  The i64 coercion process means that the hexadecimal literal 0xff is treated as if it were written 255, and using it with an i8 field would be an error.
- Unsigned integer fields like u8 and u16 reject all negative decimal integer literals.
- Narrower float fields like f32 are first handled as an f64, then converted to f32. This will result in a loss of precision, and values outside of their acceptable range will be set to positive or negative infinity.
- Wider integer fields like i128 and u64 apply the same limits as i64, even with hexadecimal integer literals.
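The integer rules above can be illustrated with the standard library alone. This is a minimal sketch, not serde_luaq's implementation: the Number enum and the decimal_integer and hex_integer helpers are hypothetical names made up for this example, and they only handle unsigned digit strings (no sign, no 0x prefix).

```rust
/// The two numeric subtypes, mirroring the Integer/Float split described above.
#[derive(Debug, PartialEq)]
enum Number {
    Integer(i64),
    Float(f64),
}

/// Hypothetical helper: evaluates an unsigned decimal integer literal.
fn decimal_integer(literal: &str) -> Option<Number> {
    if literal.is_empty() || !literal.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    match literal.parse::<i64>() {
        Ok(i) => Some(Number::Integer(i)),
        // Too large for i64: fall back to the float subtype, losing precision.
        Err(_) => literal.parse::<f64>().ok().map(Number::Float),
    }
}

/// Hypothetical helper: evaluates the digits of a hexadecimal integer literal
/// (without the `0x` prefix), wrapping around modulo 2^64.
fn hex_integer(digits: &str) -> Option<i64> {
    if digits.is_empty() {
        return None;
    }
    let mut acc: u64 = 0;
    for c in digits.chars() {
        let digit = u64::from(c.to_digit(16)?);
        acc = acc.wrapping_mul(16).wrapping_add(digit);
    }
    // Reinterpret the 64-bit pattern as a signed integer.
    Some(acc as i64)
}
```

With these helpers, hex_integer("ffffffffffffffff") gives Some(-1) and hex_integer("ff") gives Some(255). A narrower conversion such as i8::try_from(255i64) then fails, which matches the i8 behaviour described above.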
§Strings
Lua strings are “8-bit clean”, and can contain any 8-bit value (ie: [u8]).
For Serde, this is preserved if using a Vec<u8> field
with #[serde(with = "serde_bytes")] or a serde_bytes::ByteBuf field. If you
don’t use serde_bytes, Serde will expect a sequence of u8 (and won’t
read the string).
Lua’s \u{...} escapes follow RFC 2279 (1998) rather than RFC 3629 (2003). RFC 2279
differs by allowing surrogate code points and code points greater than
\u{10FFFF}. serde_luaq will convert these escapes into bytes following RFC 2279, which might
not be valid in RFC 3629.
Serde String fields can be used if the string literal evaluates to valid RFC 3629 UTF-8. This
is not guaranteed even if the input data is &str, as Lua string escapes may
evaluate to binary values or invalid sequences (eg: "\xC1\u{7FFFFFFF}").
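To make the difference between the two RFCs concrete, here is a minimal sketch of RFC 2279 encoding, which allows sequences of up to six bytes for code points up to 0x7FFFFFFF. The encode_rfc2279 function is a hypothetical name for this example and is not part of serde_luaq's API.

```rust
/// Hypothetical helper: encodes a code point following RFC 2279, which permits
/// surrogates and values up to 0x7FFFFFFF (sequences of up to 6 bytes).
fn encode_rfc2279(cp: u32) -> Option<Vec<u8>> {
    // Pick the sequence length and the marker bits of the lead byte.
    let (len, lead): (usize, u8) = match cp {
        0..=0x7F => return Some(vec![cp as u8]),
        0x80..=0x7FF => (2, 0xC0),
        0x800..=0xFFFF => (3, 0xE0),
        0x1_0000..=0x1F_FFFF => (4, 0xF0),
        0x20_0000..=0x3FF_FFFF => (5, 0xF8),
        0x400_0000..=0x7FFF_FFFF => (6, 0xFC),
        _ => return None,
    };
    let mut out = vec![0u8; len];
    let mut rest = cp;
    // Fill continuation bytes from the end, 6 bits at a time.
    for i in (1..len).rev() {
        out[i] = 0x80 | (rest & 0x3F) as u8;
        rest >>= 6;
    }
    // Whatever remains fits in the payload bits of the lead byte.
    out[0] = lead | rest as u8;
    Some(out)
}
```

Under this sketch, a surrogate such as 0xD800 encodes to the bytes ED A0 80, and 0x7FFFFFFF encodes to FD BF BF BF BF BF. Rust's String::from_utf8 rejects both, which is why such literals need a byte-oriented field rather than String.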
Unlike Lua, new-line characters/sequences in strings are kept as-is, and not converted to their platform-specific representation.
§Tables
Lua tables are used for both lists and maps.
The peg deserialisers will always produce a Vec of
LuaTableEntry in the order the entries were defined. It does not attempt to reconcile
implicit keys mixed with explicit keys, nor duplicate keys.
Unlike Lua, a LuaTableEntry may use any key or value type, including
nil and NaN.
As a convenience, identifier-keyed entries ({ a = 1 }) are
treated as keyed with str, because with Lua’s default build settings, these are always
valid RFC 3629 UTF-8.
The rules are slightly different when using Serde, which is described below.
§Duplicate table keys in Serde
Using duplicate table keys is undefined behaviour in Lua.
However, when using serde_luaq with Serde, later entries always overwrite earlier entries,
regardless of how they are defined, ie:
{ ['a'] = 1, a = 2 } == { ['a'] = 2 }
{ 1, [1] = 2 } == { 2 }
{ [1] = 1, 2 } == { 2 }
§Tables as lists in Serde (Vec)
Lua tables are 1-indexed, rather than 0-indexed. serde_luaq will handle these differences on
the input side for explicitly-keyed values, and make the resulting Vec 0-indexed:
let a: Vec<i64> = from_slice(
b"{1, 2, 3}",
LuaFormat::Value,
/* max table depth */ 16,
)?;
assert_eq!(a[0], 1);
assert_eq!(a[1], 2);
assert_eq!(a[2], 3);
Table entries may be defined with implicit or explicit keys, or a combination, and may be defined in any order.
Any missing entries will be treated as nil, which can be used with Option:
let b: Vec<Option<i64>> = from_slice(
b"{1, [4] = 4, 2}",
LuaFormat::Value,
/* max table depth */ 16,
)?;
assert_eq!(b, vec![Some(1), Some(2), None, Some(4)]);
let c: Vec<Option<i64>> = from_slice(
b"{1, [1000] = 1000}",
LuaFormat::Value,
/* max table depth */ 16,
)?;
assert_eq!(c.len(), 1000);
assert_eq!(c[0], Some(1));
assert_eq!(c[999], Some(1000));
If you’re working with a sparse table, it’s probably better to handle it as a map (see below). This works everywhere but as a flattened field’s map value type.
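The list rules (implicit keys count from 1, later entries overwrite earlier ones, gaps become nil) can be sketched without the library. This is an illustration only: the Entry enum and table_to_vec function are hypothetical, handle only integer values, and assume that keys below 1 are rejected, which the text above does not specify.

```rust
/// Hypothetical, simplified table entry holding integer values only.
enum Entry {
    /// An implicitly-keyed entry, like the `5` in `{ 5 }`.
    Implicit(i64),
    /// An explicitly-keyed entry, like `[2] = 5`.
    Keyed(i64, i64),
}

/// Hypothetical helper: reconciles entries into a 0-indexed list.
fn table_to_vec(entries: &[Entry]) -> Option<Vec<Option<i64>>> {
    let mut out: Vec<Option<i64>> = Vec::new();
    // Implicit keys start at 1 and ignore any explicit keys.
    let mut next_implicit: i64 = 1;
    for entry in entries {
        let (key, value) = match *entry {
            Entry::Implicit(v) => {
                let k = next_implicit;
                next_implicit += 1;
                (k, v)
            }
            Entry::Keyed(k, v) => (k, v),
        };
        // Assumption for this sketch: keys below 1 cannot be list indices.
        if key < 1 {
            return None;
        }
        let index = (key - 1) as usize;
        if index >= out.len() {
            // Missing entries are filled with None (nil).
            out.resize(index + 1, None);
        }
        // Later entries overwrite earlier ones.
        out[index] = Some(value);
    }
    Some(out)
}
```

Note that a single large explicit key makes the resulting Vec that long, which is one reason a map is the better fit for sparse tables.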
§Tables as maps in Serde (BTreeMap/HashMap)
If the key of the map is an integer type, table entries may contain implicit keys. Like Lua, implicit keys start counting at 1, without regard for explicit keys.
let a: BTreeMap<i64, i64> = from_slice(
b"{1, [4] = 4, 2}",
LuaFormat::Value,
/* max table depth */ 16,
)?;
assert_eq!(1, *a.get(&1).unwrap());
assert_eq!(2, *a.get(&2).unwrap());
assert_eq!(4, *a.get(&4).unwrap());
Otherwise, all entries must be explicitly keyed.
For maps, serde_luaq treats “entry present and set to nil” and “entry not present” as
distinct states. This means unless a key or value uses an Option type, it must not contain
nil:
let input = b"{a = 1, b = nil}";
// Error: b cannot be a unit (None) type
assert!(from_slice::<BTreeMap<String, i64>>(input, LuaFormat::Value, 16).is_err());
// Success: b is set to None, other entries are set to Some
let a: BTreeMap<String, Option<i64>> = from_slice(input, LuaFormat::Value, 16)?;
assert_eq!(Some(1), *a.get("a").unwrap());
assert!(a.get("b").unwrap().is_none()); // present, set to nil
assert!(a.get("c").is_none()); // not present
§Tables as structs
When deserialising a table as a struct, all keys must be written as valid
RFC 3629 strings or Lua identifiers.
Unicode identifiers (LUA_UCID) and other locale-specific identifiers are not supported, even
if they would be valid Rust identifiers. If used in a table key, these must be written as a
string instead:
{ english = "en", ["français"] = "fr" }
Serde does not support numeric keys in structs.
§Flattening
#[serde(flatten)] can be used with a map field:
#[derive(Deserialize, Debug, PartialEq)]
struct Flatten {
version: i32,
#[serde(flatten)]
entries: BTreeMap<String, i64>,
}
let lua = br#"{
version = 1,
example = 2,
hello = 4,
}"#;
assert_eq!(
Flatten { version: 1, entries: BTreeMap::from([
("example".to_string(), 2),
("hello".to_string(), 4),
])},
from_slice(lua, LuaFormat::Value, 16)?
);
If a flattened field’s value is a table of only implicitly-keyed and/or numerically-keyed
entries, it can only go into a Vec field (eg: BTreeMap<String, Vec<i64>>), and not a
nested map (eg: BTreeMap<String, BTreeMap<i64, i64>>).
This is because Serde tries to handle these as an “any” type, and this library forces anything that looks like an array or sparse array to be treated as an array.
§Enums
When deserialising, enums may be represented multiple ways:
enum E {
/// `"Unit"` or `{["Unit"] = {}}`
Unit,
/// `{["NewType"] = 1}`
NewType(i64),
/// `{["Tuple"] = {1,2}}` or `{["Tuple"] = {[1]=1,[2]=2}}`
Tuple(i64, i64),
/// `{["Struct"] = {["a"] = 1}}`
Struct { a: i64 },
}
Like with tables, if a variant’s name is a valid Lua identifier, tables
may be keyed with an identifier instead of a string (eg: {NewType = 1}).
§Security
While using Lua as a serialisation format is convenient to work with in Lua,
its load() and require() functions allow arbitrary code execution, so
aren’t safe to use with untrusted inputs. These risks are similar to using
JavaScript’s eval() function to load JSON data (instead of
JSON.parse()).
For example, this Lua function loads an expression in the string data, similar to what would
be produced by the serialize() function described in Programming in Lua:
-- WARNING: this function is insecure and unsafe.
function deserialize(data)
data = "return (" .. data .. ")"
local f = load(data, nil, "t")
if f == nil then
return error("could not load data")
end
local status, r = pcall(f)
if not status then
return error("could not call data")
end
return r
end
a = deserialize("{hello='world'}")
-- Prints "world"
print(a.hello)
If your program is ever sent untrusted Lua inputs, a malicious actor could insert some code
which could do anything to your program or the system it is running on. For example, this input
would cause Lua to read and return the contents of /etc/passwd:
(function() f=io.open('/etc/passwd');return f:read('a');end)()
serde_luaq addresses this risk by implementing a JSON-like subset of Lua’s syntax, such that
inserting code is a syntax error:
use serde_luaq::{LuaValue, lua_value};
// This would cause Lua to return the contents of a local file:
let input = b"(function() f=io.open('/etc/passwd');return f:read('a');end)()";
// But it's a syntax error here.
assert!(lua_value(input, 16).is_err());
// This would cause Lua to use a lot of RAM:
let input = b"(function() x={};for a=1,100000000 do x[a]=a end;return x;end)()";
// But it's a syntax error here.
assert!(lua_value(input, 16).is_err());
Ideally, serde_luaq shouldn’t use significantly more memory than Lua to
read the same data structures, on a LuaValue level (not Serde). If it does, that’s a
bug. :)
§Maximum table depth
The max_depth argument controls how deeply nested a table can be before being rejected by
serde_luaq.
Set this to the maximum depth of tables that you expect in your input data.
For example:
-- Table of depth 1
a1 = {1, 2, 3}
-- An empty table is still of depth 1
b1 = {}
-- Table of depth 2
a2 = {
{1, 2, 3},
{4, 5, 6},
}
This is roughly equivalent to Lua’s LUAI_MAXCCALLS build option, which counts many other
nested lexical elements which serde_luaq doesn’t support (like code blocks and parentheses).
Warning: setting max_depth too high allows a heavily-nested table to cause your program
to overflow its stack and crash.
What is “too high” depends on your platform and where you call serde_luaq in your program.
Setting max_depth to 0 disables support for tables, even empty tables.
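The depth definition above can be expressed as a short recursive function. This is only a sketch on a hypothetical, simplified Value tree (not LuaValue), and it measures depth after the fact, whereas a parser would need to enforce the limit while parsing in order to bound its own recursion.

```rust
/// Hypothetical, simplified value tree: either a non-table value or a table.
enum Value {
    Leaf,
    Table(Vec<Value>),
}

/// Depth as defined above: non-table values are 0, an empty table is 1, and
/// each level of nesting adds 1.
fn table_depth(value: &Value) -> usize {
    match value {
        Value::Leaf => 0,
        Value::Table(items) => 1 + items.iter().map(table_depth).max().unwrap_or(0),
    }
}

/// Hypothetical check mirroring the max_depth rule: a limit of 0 rejects every
/// table, even an empty one.
fn within_max_depth(value: &Value, max_depth: usize) -> bool {
    table_depth(value) <= max_depth
}
```

For the Lua example above, a1 and b1 have depth 1 and a2 has depth 2, so a2 would be rejected with a limit of 1.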
§Memory usage
Unless otherwise noted, all memory usage estimates assume a 64-bit target CPU.
serde_luaq requires that the entire input fit in memory, and be less than usize::MAX
bytes (4 GiB on 32-bit systems, 16 EiB on 64-bit systems). It is the caller’s responsibility
to enforce a reasonable input size limit for the system’s available RAM.
When deserialising a Lua data structure, the minimum sizes of the LuaValue and
LuaTableEntry enums are 32 and 16 bytes respectively. Values are checked at compile-time
on aarch64, wasm32 and x86_64 targets to prevent regressions.
Heap-allocated variants of these enums (those with Cow or Vec
fields) use more memory.
§Large data structures
At present, the highest-known memory usage per byte of input Lua is a table of deeply-nested tables, which consumes up to 96 bytes of RAM for 2 bytes of input Lua (48×). This means a 64 MiB input could use up to 3 GiB of RAM.
Setting a maximum table depth of 2 could limit this to 56 bytes of RAM for 3 bytes of input Lua (18.67×), or 1.167 GiB of RAM for a 64 MiB input.
Lua uses similar amounts of memory for such data structures.
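The arithmetic behind these figures is a simple ratio, which can be used to pick an input size limit for a given RAM budget. The worst_case_ram helper below is a hypothetical name for this sketch. It takes the ratios quoted above as given rather than measuring anything.

```rust
const MIB: u64 = 1024 * 1024;
const GIB: u64 = 1024 * MIB;

/// Hypothetical helper: worst-case RAM estimate in bytes, given that
/// `ram_bytes` of RAM may be used for every `per_input_bytes` of input.
/// Returns None for a zero divisor or if the result does not fit in a u64.
fn worst_case_ram(input_bytes: u64, ram_bytes: u64, per_input_bytes: u64) -> Option<u64> {
    if per_input_bytes == 0 {
        return None;
    }
    // Use u128 for the intermediate product so it cannot overflow.
    let total = u128::from(input_bytes) * u128::from(ram_bytes) / u128::from(per_input_bytes);
    u64::try_from(total).ok()
}
```

For a 64 MiB input, worst_case_ram(64 * MIB, 96, 2) gives exactly 3 GiB, and worst_case_ram(64 * MIB, 56, 3) gives about 1.167 GiB, matching the figures above.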
When deserialising into your own data structures with Serde, be mindful that some Rust data structures can use significant amounts of memory if you’re not careful. Check out the Rust performance book for tips.
§Large strings
serde_luaq uses Cow to avoid owning strings whenever possible, borrowing
from the input buffer (for short strings that don’t contain escape sequences, and long strings)
or 'static (for empty strings and those containing a single, non-UTF-8 escape sequence) instead.
Otherwise, it must be reassembled by copying it into an owned buffer.
If the string consists entirely of escape sequences, the parser may temporarily use up to 24 bytes of memory per 2 bytes of input Lua (12×).
The final, reassembled string will use up to 1 byte of memory for each byte of input Lua, plus
Vec’s usual overheads (but doesn’t allocate excess capacity).
Unlike Lua, serde_luaq does not de-duplicate strings in a string table.
§Lua version compatibility
serde_luaq targets syntax compatibility with Lua 5.4.
As it does not execute Lua code, there are only a small number of compatibility issues with older and other versions of Lua.
§Lua 5.3
Lua 5.3 over/underflows decimal integers that don’t fit in an i64, rather than
coercing to f64.
Hexadecimal integers over/underflow in both Lua 5.3 and 5.4.
§Lua 5.2
Lua 5.2 and earlier, and Luau, always use f64 for numbers, and do not have an integer
subtype.
§Lua 5.1 and earlier
- serde_luaq only allows basic Latin letters in identifiers.

  Lua 5.1 and earlier allows locale-dependent letters.
- serde_luaq does not allow goto as an identifier name.

  This is not a reserved keyword in Lua 5.1 and earlier.
- serde_luaq allows empty statements in script mode.
§Luau
Like Lua 5.2, Luau uses f64 for numbers.
It also adds type annotations, binary integer literals, separators for all integer literals and string interpolation.
None of these features are supported by serde_luaq.
§Ravi
Ravi adds type annotations and some other language features, which aren’t supported by
serde_luaq.
[1] Hexadecimal float literals are not supported on WASM targets before v0.2.1.
Structs§
- JsonConversionOptions: Lua to JSON conversion options.
Enums§
- Error
- JsonConversionError: Errors when converting Lua to JSON.
- LuaConversionError: Errors when converting JSON to Lua.
- LuaFormat: The format of the input Lua buffer.
- LuaNumber: Lua 5.4 number types.
- LuaTableEntry: Lua table entry.
- LuaValue: Basic Lua 5.4 data types that are equivalent to those available in JSON, similar to serde_json::Value.
Functions§
- from_json_value: Converts a JSON value to a Lua value.
- from_slice: Parses a byte slice containing a Lua expression in format.
- from_str: Parses a str containing a Lua expression in format.
- lua_value: Parse a bare Lua value expression as a LuaValue.
- return_statement: Parse a Lua return statement into a LuaValue.
- script: Parse a Lua script containing variable assignments into a Vec of (&str, LuaValue).
- to_json_value: Converts a LuaValue into a serde_json::Value.