This crate provides facilities for parsing, printing and manipulating S-expression data. S-expressions are the format used to represent code and data in the Lisp language family.
((name . "John Doe")
(age . 43)
(address
(street "10 Downing Street")
(city "London"))
(phones "+44 1234567" "+44 2345678"))
lexpr also supports more complex types, including keywords and configurable tokens for true, false and nil, by default using Scheme syntax:
(define-class rectangle ()
(width
#:init-value #nil ;; Nil value
#:settable #t ;; true
#:guard (> width 10)
)
(height
#:init-value 10
#:writable #f ;; false
))
Note that keywords, and the corresponding #: notation, are not part of standard Scheme, but are supported by lexpr’s default parser settings.
There are three common ways that you might find yourself needing to work with S-expression data in Rust:
- As text data. An unprocessed string of S-expression data that you receive from a Lisp program, read from a file, or prepare to send to a Lisp program.
- As a dynamically typed representation. Maybe you want to check that some S-expression data is valid before passing it on, but without knowing the structure of what it contains. Or you want to handle arbitrarily structured data, like Lisp code.
- As a statically typed Rust data structure. When you expect all or most of your data to conform to a particular structure and want to get real work done without the dynamically typed nature of S-expressions tripping you up.
Only the first two items of this list are handled by lexpr; for conversion from and to statically typed Rust data structures, see the serde-lexpr crate.
Operating on dynamically typed S-expression data
Any valid S-expression can be manipulated using the Value data structure.
Constructing S-expression values
use lexpr::{Value, parse::Error};

fn main() -> Result<(), Error> {
    // Some S-expressions as a &str.
    let data = r#"((name . "John Doe")
                   (age . 43)
                   (phones "+44 1234567" "+44 2345678"))"#;

    // Parse the string of data into lexpr::Value.
    let v: Value = lexpr::from_str(data)?;

    // Access parts of the data by indexing with square brackets.
    println!("Please call {} at the number {}", v["name"], v["phones"][1]);

    Ok(())
}
What are S-expressions?
S-expressions, as mentioned above, are the notation used by various dialects of Lisp to represent data (and code). As a data format, it is roughly comparable to JSON (JavaScript Object Notation), but syntactically more lightweight. Also, JSON is designed for consumption and generation by machines, which is reflected by the fact that it does not specify a syntax for comments. S-expressions, on the other hand, are intended to be written and read by humans as well as machines. In this respect, they are more like YAML, but have a simpler and less syntactically rigid structure. For example, indentation does not convey any information to the parser, but is used only to allow for easier digestion by humans.
Different Lisp dialects have notational differences for some data types, and some may lack specific data types completely. This section gives an overview of the different types of values representable by the Value data type and how they relate to different Lisp dialects. All examples are given in the syntax used by the Guile Scheme implementation.
The parser and serializer implementation in lexpr can be tailored to parse and generate S-expression data in various “dialects” in use by different Lisp variants; the aim is to cover large parts of R6RS and R7RS Scheme with some Guile and Racket extensions, as well as Emacs Lisp.
In the following, the S-expression values that are modeled by lexpr are introduced. In general, S-expression values can be split into two categories: primitive types and compound types.
Primitive types
Primitive, or non-compound, types are those that cannot recursively contain arbitrary other values. Numbers, strings and booleans fall into this category.
Symbols and keywords
Lisp has a data type not commonly found in other languages, namely “symbols”. A symbol is conceptually similar to an identifier in other languages, but allows a much richer set of characters than identifiers typically do. Also, identifiers in other languages usually cannot be used in data; Lisps expose them as a primitive data type, a result of the homoiconicity of the Lisp language family.
this-is-a-symbol ; A single symbol, dashes are allowed
another.symbol ; Periods are allowed as well
foo$bar!<_>? ; As are quite a few other characters
Another data type, present in some Lisp dialects such as Emacs Lisp, Common Lisp, and several Scheme implementations, is the keyword. Keywords are also supported by lexpr. They are very similar to symbols, but are typically prefixed by : or #: and are used for different purposes in the language.
#:foo ; A keyword named "foo", written in Guile/Racket notation
:bar ; A keyword named "bar", written in Emacs Lisp or Common Lisp notation
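To illustrate how the two keyword spellings differ from plain symbols at the token level, here is a minimal sketch. The `Atom` type and the `classify` function are hypothetical and not part of lexpr’s API. The sketch also accepts both prefixes at once, whereas a dialect-aware parser would presumably recognize only the prefix of the configured dialect.

```rust
/// Hypothetical token classification; not part of lexpr's API.
#[derive(Debug, PartialEq)]
enum Atom<'a> {
    Symbol(&'a str),
    Keyword(&'a str),
}

/// Classifies a bare token as a keyword (Guile/Racket `#:name` or
/// Emacs Lisp/Common Lisp `:name`) or, failing that, as a symbol.
fn classify(token: &str) -> Atom<'_> {
    if let Some(name) = token.strip_prefix("#:") {
        Atom::Keyword(name)
    } else if let Some(name) = token.strip_prefix(':') {
        Atom::Keyword(name)
    } else {
        Atom::Symbol(token)
    }
}

fn main() {
    for token in ["this-is-a-symbol", "foo$bar!<_>?", "#:foo", ":bar"] {
        // Prints e.g. Symbol("this-is-a-symbol") or Keyword("foo").
        println!("{:?}", classify(token));
    }
}
```

The keyword name is stored without its prefix, so `#:foo` and `:foo` name the same keyword regardless of the notation used to write it.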
Booleans
While Scheme has a primitive boolean data type, more traditional Lisps such as Emacs Lisp and Common Lisp do not; they instead use the symbols t and nil to represent boolean values. Using parser options, lexpr can parse these symbols as booleans, which may be desirable in some circumstances, as booleans are simpler to handle than symbols.
#t ; The literal representing true
#f ; The literal representing false
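The choice between `#t`/`#f` and the traditional `t`/`nil` symbols can be pictured as a single parser switch. The following sketch uses a hypothetical `Options` struct with a hypothetical `t_nil_as_bool` flag; lexpr’s actual parser options may be named and structured differently, so treat this only as an illustration of the idea.

```rust
/// Hypothetical parser option; lexpr's real options may differ.
#[derive(Clone, Copy)]
struct Options {
    /// When set, the traditional-Lisp symbols `t` and `nil` read as booleans.
    t_nil_as_bool: bool,
}

#[derive(Debug, PartialEq)]
enum Atom {
    Bool(bool),
    Symbol(String),
}

/// Reads a single atom token according to `opts`.
fn read_atom(token: &str, opts: Options) -> Atom {
    match token {
        // The Scheme literals are always booleans.
        "#t" => Atom::Bool(true),
        "#f" => Atom::Bool(false),
        // The traditional-Lisp spellings are booleans only when enabled.
        "t" if opts.t_nil_as_bool => Atom::Bool(true),
        "nil" if opts.t_nil_as_bool => Atom::Bool(false),
        other => Atom::Symbol(other.to_string()),
    }
}

fn main() {
    let scheme = Options { t_nil_as_bool: false };
    let elisp = Options { t_nil_as_bool: true };
    println!("{:?}", read_atom("t", scheme)); // Symbol("t")
    println!("{:?}", read_atom("t", elisp)); // Bool(true)
}
```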
The empty list and “nil”
In traditional Lisps, the end of a list is represented by a special atom written as nil. In Scheme, the empty list is an atom written as (), and there nil is just a regular symbol. Both nil and the empty list are present and distinguishable in lexpr.
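One straightforward way to keep nil and the empty list distinguishable is to give each its own variant. In this hypothetical sketch, `Nil` and `Null` are separate variants of an illustrative `Value` enum (not lexpr’s type), and a hypothetical `elisp_nil` flag decides whether the token `nil` denotes the special atom or an ordinary symbol. For brevity, `()` is treated as a single token.

```rust
/// Hypothetical value type with distinct variants for `nil` and `()`.
#[derive(Debug, PartialEq)]
enum Value {
    /// The special `nil` atom of traditional Lisps.
    Nil,
    /// The Scheme empty list, written `()`.
    Null,
    Symbol(String),
}

/// Reads one token; `elisp_nil` selects whether `nil` is the special
/// atom (traditional Lisp) or an ordinary symbol (Scheme).
fn read(token: &str, elisp_nil: bool) -> Value {
    match token {
        "()" => Value::Null,
        "nil" if elisp_nil => Value::Nil,
        other => Value::Symbol(other.to_string()),
    }
}

fn main() {
    println!("{:?}", read("nil", false)); // Symbol("nil")
    println!("{:?}", read("nil", true)); // Nil
    println!("{:?}", read("()", true)); // Null
}
```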
Numbers
Numbers are represented by the Number abstract data type. It can handle signed and unsigned integers, each up to 64 bits in size, as well as floating-point numbers. The Scheme syntax for hexadecimal, octal, and binary literals is supported.
1 -4 3.14 ; A positive, a negative, and a floating-point number
#xDEADBEEF ; An integer written using hexadecimal notation
#o0677 ; Octal
#b10110 ; Binary
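The radix prefixes map directly onto Rust’s `i64::from_str_radix`. The sketch below is a simplified, hypothetical literal reader (`parse_int` is not lexpr’s number parser): it handles only signed 64-bit integers, so a literal such as `#xFFFFFFFFFFFFFFFF`, which needs the unsigned range, is rejected here rather than falling back to `u64`.

```rust
use std::num::ParseIntError;

/// Parses an integer literal with an optional Scheme radix prefix
/// (`#x`, `#o`, `#b`); a simplified sketch, not lexpr's number parser.
fn parse_int(literal: &str) -> Result<i64, ParseIntError> {
    let (radix, digits) = match literal.get(..2) {
        Some("#x") | Some("#X") => (16, &literal[2..]),
        Some("#o") | Some("#O") => (8, &literal[2..]),
        Some("#b") | Some("#B") => (2, &literal[2..]),
        // No prefix: plain decimal, including an optional sign.
        _ => (10, literal),
    };
    i64::from_str_radix(digits, radix)
}

fn main() {
    for lit in ["1", "-4", "#xDEADBEEF", "#o0677", "#b10110"] {
        println!("{} => {:?}", lit, parse_int(lit));
    }
}
```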
Scheme has an elaborate numerical type hierarchy (called the “numeric tower”), which supports fractions, numbers of arbitrary size, and complex numbers. These more advanced number types are not yet supported by lexpr.
Characters
Characters are Unicode code points, represented by Rust’s char data type embedded in the Value::Char variant.
Strings
"Hello World!"
Lists
Lists are sequences of values, each of which is either an atom or another list. In fact, Lisp does not have a “real” list data type; instead, lists are represented by chains of so-called “cons cells”, which are used to form a singly-linked list, terminated by the empty list (or nil in traditional Lisps). It is also possible for the terminator to not be the empty list, but instead be of an arbitrary other data type. In this case, the list is referred to as an “improper” or “dotted” list. Here are some examples:
("Hello" "World") ; A regular list
;; A list with another, single-element list as
;; its second item
("Hello" ("World"))
(1 . 2) ; A cons cell, represented as an improper list by `lexpr`
(1 2 . 3) ; A dotted (improper) list
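Cons cells and the proper/improper distinction can be modeled with a recursive enum. This is a hypothetical sketch (the `Sexp` type and the `cons` and `is_proper` helpers are illustrative, not lexpr’s `Cons` API): a list is proper exactly when following the chain of cdr fields ends at the empty list.

```rust
/// Hypothetical minimal S-expression type built from cons cells.
#[derive(Debug)]
enum Sexp {
    /// The empty list `()`.
    Null,
    Int(i64),
    Str(String),
    /// A cons cell `(car . cdr)`.
    Cons(Box<Sexp>, Box<Sexp>),
}

fn cons(car: Sexp, cdr: Sexp) -> Sexp {
    Sexp::Cons(Box::new(car), Box::new(cdr))
}

/// Follows the cdr chain; the list is proper iff it ends in `()`.
fn is_proper(mut v: &Sexp) -> bool {
    loop {
        match v {
            Sexp::Null => return true,
            Sexp::Cons(_, cdr) => v = &**cdr,
            // Any other terminator makes the list improper (dotted).
            _ => return false,
        }
    }
}

fn main() {
    // ("Hello" "World") is ("Hello" . ("World" . ()))
    let proper = cons(
        Sexp::Str("Hello".into()),
        cons(Sexp::Str("World".into()), Sexp::Null),
    );
    // (1 2 . 3) is (1 . (2 . 3))
    let dotted = cons(Sexp::Int(1), cons(Sexp::Int(2), Sexp::Int(3)));
    println!("{} {}", is_proper(&proper), is_proper(&dotted)); // true false
}
```

Because the chain must be walked cell by cell, reaching the n-th element of such a list takes O(n) steps, which is the contrast drawn with vectors below.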
Lists are not only used to represent sequences of values, but also associative arrays, also known as maps. A map is represented as a list containing cons cells, where the first field of each cons cell (called car, for obscure historical reasons) is the key, and the second field (cdr) is the associated value.
;; An association list with the symbols `a` and `b` as keys
((a . 42) (b . 43))
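Looking up a key in such an association list means walking the outer list and comparing the car of each entry with the key. The sketch below uses a hypothetical `alist_get` helper over a simplified `Sexp` type; the string indexing in the `v["name"]` example earlier appears to perform a comparable lookup, but this is not lexpr’s implementation.

```rust
/// Hypothetical minimal value type; lexpr's `Value` has many more variants.
#[derive(Debug, PartialEq)]
enum Sexp {
    Null,
    Int(i64),
    Symbol(String),
    Cons(Box<Sexp>, Box<Sexp>),
}

fn cons(car: Sexp, cdr: Sexp) -> Sexp {
    Sexp::Cons(Box::new(car), Box::new(cdr))
}

/// Returns the cdr of the first entry whose car is the symbol `key`.
/// Entries that are not cons cells, or whose car is not a symbol, are skipped.
fn alist_get<'a>(alist: &'a Sexp, key: &str) -> Option<&'a Sexp> {
    let mut rest = alist;
    while let Sexp::Cons(entry, tail) = rest {
        if let Sexp::Cons(car, cdr) = &**entry {
            if let Sexp::Symbol(name) = &**car {
                if name.as_str() == key {
                    return Some(&**cdr);
                }
            }
        }
        rest = &**tail;
    }
    None
}

fn main() {
    // ((a . 42) (b . 43))
    let alist = cons(
        cons(Sexp::Symbol("a".into()), Sexp::Int(42)),
        cons(cons(Sexp::Symbol("b".into()), Sexp::Int(43)), Sexp::Null),
    );
    println!("{:?}", alist_get(&alist, "b")); // Some(Int(43))
}
```

Returning the cdr (rather than the whole entry) means an entry like `(phones "+44 1234567" "+44 2345678")` yields the list of phone numbers, which can then be indexed further.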
Vectors
In contrast to lists, which are represented as singly-linked chains of “cons cells”, vectors allow O(1) indexing, and thus are quite similar to Rust’s Vec data type.
#(1 2 "three") ; A vector in Scheme notation
Byte vectors
Byte vectors are similar to regular vectors, but are uniform: each element only holds a single byte, i.e. an exact integer in the range of 0 to 255, inclusive.
#u8(41 42 43) ; A byte vector
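The 0 to 255 constraint corresponds exactly to Rust’s `u8` range, so validating the body of a `#u8(...)` literal can be sketched as parsing each element into a `u8`. The `parse_bytevector` function is hypothetical and accepts only whitespace-separated decimal integers.

```rust
/// Parses the body of a byte vector literal such as `#u8(41 42 43)`,
/// given here as "41 42 43". Hypothetical simplified sketch: only
/// whitespace-separated decimal integers are accepted.
fn parse_bytevector(body: &str) -> Result<Vec<u8>, String> {
    body.split_whitespace()
        .map(|tok| {
            // `u8` parsing rejects non-numeric tokens and anything
            // outside 0..=255, including negative numbers.
            tok.parse::<u8>()
                .map_err(|_| format!("not an exact integer in 0..=255: {}", tok))
        })
        // Stops at the first error, otherwise collects all bytes.
        .collect()
}

fn main() {
    println!("{:?}", parse_bytevector("41 42 43")); // Ok([41, 42, 43])
    println!("{:?}", parse_bytevector("41 256")); // Err("not an exact integer in 0..=255: 256")
}
```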
Modules
- List “cons cell” data type and accompanying iterator types.
- S-expression values including source location.
- Dynamically typed number type.
- S-expression parser and options.
- Converting S-expression values into text.
- The Value enum, a dynamically typed way of representing any valid S-expression value.
Macros
- Construct a Value using syntax similar to regular S-expressions.
Structs
- A Lisp “cons cell”.
- Combines an S-expression value with location information.
- Represents an S-expression number, whether integer or floating point.
- Parser for the S-expression text representation.
- A printer for S-expression values.
Enums
- Represents an S-expression value.
Traits
- A type that can be used to index into a lexpr::Value.
Functions
- Parse a value from an IO stream of S-expressions, using the default parser options.
- Parse a value from an IO stream containing a single S-expression.
- Parse a value from bytes representing a single S-expression, using the default parser options.
- Parse a value from bytes representing a single S-expression.
- Parse a value from a string slice representing a single S-expression, using the default parser options.
- Parse a value from a string slice representing a single S-expression.
- Serialize the given value as an S-expression string, using the default printer options.
- Serialize the given value as an S-expression string.
- Serialize the given value as a byte vector containing S-expression text, using the default printer options.
- Serialize the given value as a byte vector containing S-expression text.
- Serialize the given value as S-expression text into the IO stream, using the default printer options.
- Serialize the given value as S-expression text into the IO stream.