lexpr: S-expressions for Rust
[dependencies]
lexpr = "0.1.3"
You may be looking for:
- API Documentation
- API Documentation for master branch
- Serde support for S-expressions
- TODO
- Goals and a survey of other S-expression crates
- Release Notes
S-expressions are the human-readable, textual representation of code and data in the Lisp family of languages. lexpr aims to provide the tools to:
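-
Embed S-expression data into Rust programs using the sexp macro:
    use lexpr::sexp;
    let address = sexp!(((name . "Jane Doe") (street . "4026 Poe Lane")));
-
Construct and destructure S-expression data using a full-featured API:
    use lexpr::Value;
    let names = Value::list(vec!["Alice", "Bob", "Mallory"]);
    println!("The last name is {}", names[2].as_str().unwrap());
-
Parse and serialize S-expression data from and to its textual representation.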
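To make these tasks concrete, here is a minimal, std-only sketch of what S-expression data can look like as a Rust type and how it maps back to text. The `Sexp` enum and its `sym`, `cons` and `list` helpers are hypothetical illustrations, not lexpr's `Value` type or its API; printing booleans in Scheme style (#t/#f) is likewise an assumption of this sketch.

```rust
use std::fmt;

/// A hypothetical, minimal S-expression model (not lexpr's `Value` type).
#[derive(Debug, Clone, PartialEq)]
pub enum Sexp {
    Nil, // the empty list ()
    Bool(bool),
    Number(i64),
    Symbol(String),
    Str(String),
    Cons(Box<Sexp>, Box<Sexp>), // a pair (car . cdr)
}

impl Sexp {
    pub fn sym(name: &str) -> Sexp {
        Sexp::Symbol(name.to_string())
    }

    pub fn cons(car: Sexp, cdr: Sexp) -> Sexp {
        Sexp::Cons(Box::new(car), Box::new(cdr))
    }

    /// Build a proper list: nested pairs terminated by the empty list.
    pub fn list(items: Vec<Sexp>) -> Sexp {
        items
            .into_iter()
            .rev()
            .fold(Sexp::Nil, |tail, item| Sexp::cons(item, tail))
    }
}

impl fmt::Display for Sexp {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Sexp::Nil => write!(f, "()"),
            Sexp::Bool(b) => write!(f, "{}", if *b { "#t" } else { "#f" }),
            Sexp::Number(n) => write!(f, "{}", n),
            Sexp::Symbol(s) => write!(f, "{}", s),
            // Rust's Debug escaping stands in for real Lisp string escaping.
            Sexp::Str(s) => write!(f, "{:?}", s),
            Sexp::Cons(car, cdr) => {
                write!(f, "({}", car)?;
                let mut rest: &Sexp = cdr;
                loop {
                    match rest {
                        Sexp::Nil => break,
                        Sexp::Cons(a, d) => {
                            write!(f, " {}", a)?;
                            rest = &**d;
                        }
                        // An improper tail is printed in dotted-pair notation.
                        other => {
                            write!(f, " . {}", other)?;
                            break;
                        }
                    }
                }
                write!(f, ")")
            }
        }
    }
}
```

A proper list is a chain of pairs ending in the empty list, which is why `list` folds from the right; a pair whose tail is not a list prints as, for example, `(a . 1)`.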
To get a better idea of the direction lexpr is headed in, you may want to take a look at the TODO or the "why" document.
Supported Lisp dialects
Currently, lexpr focuses on two dialects: Scheme, with syntax mostly based on R6RS and R7RS plus some extensions, and Emacs Lisp. The following features, common across dialects, are not yet implemented:
- Comments. These are not a high priority, as the primary use case for lexpr is data exchange between Lisp and Rust programs.
- Syntactic shorthands for quote, quasiquote, unquote and unquote-splicing. Again, these are not usually important when using S-expressions as a data exchange format.
- Support for number syntax is currently quite limited. Integers and floating-point values written in decimal notation should work, though.
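Each of these shorthands abbreviates a two-element list headed by the corresponding symbol; for example, 'x reads as (quote x) and ,@xs as (unquote-splicing xs). The std-only sketch below illustrates that mapping with a hypothetical `expand_shorthand` function that works on text. It is not part of lexpr, and it only rewrites prefixes at the start of a datum, whereas a real reader would apply the expansion while parsing nested data.

```rust
/// Reader shorthands and the symbols they abbreviate. ",@" is listed
/// before "," so that the longer prefix wins.
const SHORTHANDS: [(&str, &str); 4] = [
    (",@", "unquote-splicing"),
    ("'", "quote"),
    ("`", "quasiquote"),
    (",", "unquote"),
];

/// Hypothetical helper: expand shorthand prefixes at the start of a datum,
/// e.g. "'x" becomes "(quote x)". Stacked leading prefixes such as "`,x"
/// are expanded recursively; prefixes nested inside a list are left alone.
pub fn expand_shorthand(datum: &str) -> String {
    for &(prefix, symbol) in SHORTHANDS.iter() {
        if let Some(rest) = datum.strip_prefix(prefix) {
            return format!("({} {})", symbol, expand_shorthand(rest));
        }
    }
    datum.to_string()
}
```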
Further dialect-specific omissions, both those planned to be addressed in the future and deliberate ones, are listed below.
Scheme
- For strings, the continuation line syntax (using a trailing backslash) is not yet implemented.
- Directives such as #!fold-case and #!no-fold-case are not implemented. It is not clear whether these will be implemented at all.
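In R7RS, a backslash inside a string literal followed by optional spaces or tabs, a line ending, and further spaces or tabs denotes nothing: the whole sequence is dropped from the string. A minimal std-only sketch of that rule follows; `strip_line_continuations` is a hypothetical helper that works on the raw text between the quotes and copies every other escape unchanged for a later decoding step.

```rust
/// Hypothetical helper: remove R7RS line continuations from the raw text
/// between a string literal's quotes. Other escapes are copied unchanged.
pub fn strip_line_continuations(raw: &str) -> String {
    let chars: Vec<char> = raw.chars().collect();
    let is_intraline = |c: char| c == ' ' || c == '\t';
    let mut out = String::with_capacity(raw.len());
    let mut i = 0;
    while i < chars.len() {
        if chars[i] != '\\' {
            out.push(chars[i]);
            i += 1;
            continue;
        }
        // Skip intraline whitespace after the backslash.
        let mut j = i + 1;
        while j < chars.len() && is_intraline(chars[j]) {
            j += 1;
        }
        // A line ending is "\n", "\r\n" or "\r".
        let after_newline = match chars.get(j).copied() {
            Some('\n') => Some(j + 1),
            Some('\r') if chars.get(j + 1).copied() == Some('\n') => Some(j + 2),
            Some('\r') => Some(j + 1),
            _ => None,
        };
        match after_newline {
            Some(mut k) => {
                // Continuation: also drop leading whitespace on the next line.
                while k < chars.len() && is_intraline(chars[k]) {
                    k += 1;
                }
                i = k;
            }
            None => {
                // Ordinary escape: copy the backslash and the escaped character.
                out.push('\\');
                if let Some(&c) = chars.get(i + 1) {
                    out.push(c);
                }
                i += 2;
            }
        }
    }
    out
}
```

Copying the escaped character together with its backslash keeps an escaped backslash (`\\`) from being mistaken for the start of a continuation.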
Emacs Lisp
Strings in Emacs Lisp are somewhat difficult to deal with, for the following reasons:
-
They can be either "unibyte" strings, which correspond to byte vectors in Scheme, or "multibyte" strings, which can handle Unicode. Whether a string is considered unibyte or multibyte depends on its contents; see Section 2.3.8.2, "Non-ASCII Characters in Strings" in the Emacs Lisp manual for details.
-
Whether a string is considered unibyte or multibyte depends not only on its contents, but also on the source it is read from.
-
A multibyte string can include characters outside of the Unicode codepoint range. This happens, for instance, when the string includes a hexadecimal or octal escape interpreted as a single byte, potentially violating the encoding rules of the multibyte source.
-
Emacs Lisp string syntax supports a multitude of escaping modes, some of which originate from representing keyboard event sequences in strings. Using these "keyboard-oriented" escapes inside strings is explicitly discouraged in the Emacs Lisp manual.
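The characters outside of the Unicode range mentioned above come from how Emacs stores raw bytes in multibyte text: per the Emacs Lisp manual, the raw bytes 0x80 through 0xFF are represented by the character codes #x3FFF80 through #x3FFFFF. The std-only sketch below (hypothetical function names) computes that code and shows that it has no Rust `char` equivalent.

```rust
/// Hypothetical helper: the Emacs character code used for a raw byte
/// inside a multibyte string. ASCII bytes (< 0x80) are ordinary characters;
/// bytes 0x80..=0xFF map to 0x3FFF80..=0x3FFFFF.
pub fn emacs_char_for_raw_byte(byte: u8) -> u32 {
    if byte < 0x80 {
        byte as u32
    } else {
        0x3F_FF00 + byte as u32
    }
}

/// Such codes lie above U+10FFFF, so they cannot be represented as a Rust `char`.
pub fn as_unicode(code: u32) -> Option<char> {
    char::from_u32(code)
}
```

For example, a `\xFC` escape becomes the code #x3FFFFC, for which `as_unicode` returns `None`.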
lexpr deals with this complexity as follows:
-
The input source is always considered to be "multibyte" using the UTF-8 encoding; other encodings are not supported.
-
Mixing non-ASCII UTF-8 characters (either directly part of the input or represented using escape sequences) with hexadecimal or octal escape sequences that produce a single byte outside of the ASCII range results in a parse error. For instance, the following string cannot be parsed by lexpr:
    "\xFC\N{U+203D}"
Emacs, however, would parse this as a string containing the "character" sequence #x3ffffc, #x203d. Note that the first "character" is not a valid Unicode codepoint.
-
Strings containing only ASCII characters and at least one single-byte hexadecimal or octal escape are parsed as byte vectors instead of strings. This mirrors the Emacs Lisp rules for when a string is considered to be "unibyte".
When producing S-expression text, byte vectors are always represented as a sequence of octal-escaped bytes.
-
The escaping styles supported by lexpr are:
  - Hexadecimal (\xN...) and octal (\N...)
  - Unicode (\uNNNN, \U00NNNNNN)
  - Named Unicode (\N{U+X...}). Note that the syntax that refers to codepoints by their full name (e.g. \N{LATIN SMALL LETTER A WITH GRAVE}) is deliberately not supported.
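The escape styles listed above can be decoded with a small std-only sketch. The `Piece` type and `decode_escape` function are hypothetical, not lexpr's parser: the input is the text immediately after a backslash, and the result distinguishes single bytes (hexadecimal or octal escapes with values up to 0xFF) from characters (Unicode-style escapes). Treating hexadecimal or octal values above 0xFF as characters, and limiting octal escapes to three digits, are assumptions modeled on Emacs behavior.

```rust
/// A decoded escape: a single raw byte, or a character (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Piece {
    Byte(u8),
    Char(char),
}

/// Count the leading ASCII bytes of `s` (at most `max`) that satisfy `pred`.
fn count_digits(s: &str, max: usize, pred: fn(&u8) -> bool) -> usize {
    s.as_bytes().iter().take(max).take_while(|b| pred(*b)).count()
}

fn parse_num(digits: &str, radix: u32) -> Result<u32, String> {
    u32::from_str_radix(digits, radix)
        .map_err(|e| format!("bad escape digits {:?}: {}", digits, e))
}

fn to_char(code: u32) -> Result<char, String> {
    char::from_u32(code).ok_or_else(|| format!("#x{:x} is not a Unicode scalar value", code))
}

fn exact_hex(s: &str, len: usize) -> Result<u32, String> {
    if count_digits(s, len, u8::is_ascii_hexdigit) != len {
        return Err(format!("expected {} hexadecimal digits", len));
    }
    parse_num(&s[..len], 16)
}

/// Decode the escape sequence in `body`, which starts right after the backslash.
/// Returns the decoded piece and the number of bytes of `body` consumed.
pub fn decode_escape(body: &str) -> Result<(Piece, usize), String> {
    let (code, used) = match body.as_bytes().first().copied() {
        // \xN... : any number of hexadecimal digits
        Some(b'x') => {
            let n = count_digits(&body[1..], usize::MAX, u8::is_ascii_hexdigit);
            (parse_num(&body[1..1 + n], 16)?, 1 + n)
        }
        // \uNNNN and \U00NNNNNN always denote characters.
        Some(b'u') => return Ok((Piece::Char(to_char(exact_hex(&body[1..], 4)?)?), 5)),
        Some(b'U') => return Ok((Piece::Char(to_char(exact_hex(&body[1..], 8)?)?), 9)),
        // \N{U+X...} : codepoint by number; \N{NAME} is deliberately rejected.
        Some(b'N') => {
            let rest = body[1..]
                .strip_prefix("{U+")
                .ok_or("only the \\N{U+X...} form is supported")?;
            let close = rest.find('}').ok_or("unterminated \\N{U+...}")?;
            let c = to_char(parse_num(&rest[..close], 16)?)?;
            return Ok((Piece::Char(c), 1 + "{U+".len() + close + 1));
        }
        // \N... : one to three octal digits (the three-digit limit is an assumption)
        Some(b'0'..=b'7') => {
            let n = count_digits(body, 3, |b: &u8| (b'0'..=b'7').contains(b));
            (parse_num(&body[..n], 8)?, n)
        }
        _ => return Err("unsupported escape sequence".to_string()),
    };
    // Hexadecimal and octal escapes that fit in one byte yield a raw byte.
    if code <= 0xFF {
        Ok((Piece::Byte(code as u8), used))
    } else {
        Ok((Piece::Char(to_char(code)?), used))
    }
}
```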
It is expected that these restrictions will not be an impediment when using S-expressions as a data exchange format between Emacs Lisp and Rust programs. In short, S-expressions produced by Rust should always be parsable by Emacs, and the other direction should work as long as no strings with non-Unicode "characters" are involved.
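The string-handling rules listed above can be sketched as follows. The input is assumed to be a string literal already split into segments (literal or Unicode-escaped characters versus single bytes from hexadecimal or octal escapes); the `Segment` and `Decoded` types and the function names are hypothetical. The three-digit octal form produced by `to_octal_escapes` is an assumption, since the text only says that byte vectors are written as octal-escaped bytes.

```rust
/// One piece of a string literal after escape decoding (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub enum Segment {
    Char(char), // literal character or Unicode-style escape
    Byte(u8),   // single byte from a hexadecimal or octal escape
}

#[derive(Debug, PartialEq)]
pub enum Decoded {
    Str(String),
    Bytes(Vec<u8>),
}

/// - non-ASCII characters mixed with escaped bytes >= 0x80 are an error;
/// - only ASCII characters plus at least one byte escape gives a byte vector;
/// - anything else is an ordinary (UTF-8) string.
pub fn classify(segments: &[Segment]) -> Result<Decoded, String> {
    let has_non_ascii_char = segments
        .iter()
        .any(|s| matches!(s, Segment::Char(c) if !c.is_ascii()));
    let has_byte = segments.iter().any(|s| matches!(s, Segment::Byte(_)));
    let has_high_byte = segments
        .iter()
        .any(|s| matches!(s, Segment::Byte(b) if *b >= 0x80));

    if has_non_ascii_char && has_high_byte {
        return Err("non-ASCII characters mixed with non-ASCII byte escapes".to_string());
    }
    if has_byte && !has_non_ascii_char {
        let bytes = segments
            .iter()
            .map(|s| match *s {
                Segment::Char(c) => c as u8, // all characters are ASCII here
                Segment::Byte(b) => b,
            })
            .collect();
        return Ok(Decoded::Bytes(bytes));
    }
    // Any byte escapes left are below 0x80, i.e. plain ASCII characters.
    let text = segments
        .iter()
        .map(|s| match *s {
            Segment::Char(c) => c,
            Segment::Byte(b) => b as char,
        })
        .collect();
    Ok(Decoded::Str(text))
}

/// Render a byte vector as an Emacs Lisp string made only of octal escapes.
pub fn to_octal_escapes(bytes: &[u8]) -> String {
    let mut out = String::from("\"");
    for b in bytes {
        out.push_str(&format!("\\{:03o}", b));
    }
    out.push('"');
    out
}
```

Because every byte is escaped, the output is plain ASCII, which fits the expectation that text produced on the Rust side stays readable by Emacs.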
Licensing
The code and documentation in the lexpr crate is free software, dual-licensed under the MIT or Apache-2.0 license, at your option.
The lexpr repository contains code and documentation adapted from the following projects:
- serde_json, also dual-licensed under the MIT/Apache-2.0 licenses.
- sexpr, Copyright 2017 Zephyr Pellerin, dual-licensed under the same licenses.