// Copyright 2018 Steven Bosnick
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE-2.0 or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.

//! `luther_derive` provides a procedural macro to derive the `luther::Lexer` trait.
//!
//! Deriving the `luther::Lexer` trait is expected to be the primary (possibly only)
//! way of implementing this trait. The trait can be derived on an `enum` of token
//! types where the variants of the `enum` are annotated with a regular expression.
//! Not all variants of the `enum` need to be annotated with a regular expression, but
//! variants that do not have such an annotation will not be returned by the lexer
//! that `luther_derive` generates.
//!
//! Generating the lexer adds a visible type name for the deterministic finite automaton
//! (DFA) that the lexer uses internally. Once hygienic procedural macros are available it
//! should be possible to hide this name, but with the current implementation of procedural
//! macros the name is visible. By default the name is formed by appending `Dfa` to the
//! name of the `enum` on which `luther::Lexer` is derived (for example, `TokenDfa` for
//! `Token`). This default can be overridden with the `dfa` option of the `luther` attribute.
//!
//! # Example
//! ```rust
//! extern crate luther;
//!
//! #[macro_use]
//! extern crate luther_derive;
//!
//! #[derive(Lexer)]
//! enum Token {
//!     #[luther(regex = "ab")]
//!     Ab,
//!
//!     #[luther(regex = "acc*")]
//!     Acc(String),
//! }
//!
//! # fn main() {}
//! ```
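//!
//! To make the internal automaton concrete, the following sketch hand-builds a DFA for the
//! two regular expressions in the example above (`ab` and `acc*`), with each accepting
//! state labelled by the variant it recognizes. It is an illustration only: the `Label`
//! enum and the `step`, `accepts` and `classify` functions are hypothetical and are not
//! the code that `luther_derive` generates.
//!
//! ```rust
//! // Hypothetical labels for accepting states (mirrors the `Token` variants above).
//! #[derive(Debug, PartialEq, Clone, Copy)]
//! enum Label {
//!     Ab,
//!     Acc,
//! }
//!
//! // Transition function. States: 0 = start, 1 = read "a", 2 = read "ab",
//! // 3 = read "ac" followed by zero or more 'c'. `None` is the dead (error) state.
//! fn step(state: u8, c: char) -> Option<u8> {
//!     match (state, c) {
//!         (0, 'a') => Some(1),
//!         (1, 'b') => Some(2),
//!         (1, 'c') | (3, 'c') => Some(3),
//!         _ => None,
//!     }
//! }
//!
//! // Which variant, if any, an accepting state recognizes.
//! fn accepts(state: u8) -> Option<Label> {
//!     match state {
//!         2 => Some(Label::Ab),
//!         3 => Some(Label::Acc),
//!         _ => None,
//!     }
//! }
//!
//! // Runs the DFA over the whole input; entering the dead state means no match.
//! fn classify(input: &str) -> Option<Label> {
//!     let mut state = 0;
//!     for c in input.chars() {
//!         state = step(state, c)?;
//!     }
//!     accepts(state)
//! }
//! ```
//!
//! In this sketch `classify("acc")` yields `Some(Label::Acc)` and `classify("abc")` yields
//! `None`. A real lexer must also decide where each token ends in a longer input, which
//! this whole-string matcher does not attempt.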
//!
//! # Capturing the recognized characters
//! If a variant of the `enum` on which the lexer is being generated contains a single
//! field (like the `Acc` variant in the above example) and that field's type implements
//! `str::FromStr` (as `String` does for `Acc`), then the generated lexer captures the
//! recognized characters when it matches that variant's regular expression. It converts
//! the characters to a value of the field's type using the type's `str::FromStr`
//! implementation.
//!
//! It is an error for an `enum` variant to include more than one type; `luther_derive`
//! detects this error. It is also an error for the single type not to implement
//! `str::FromStr`, but `luther_derive` cannot detect this error. That case will likely
//! show up as a confusing error message from the compiler.
//!
//! For now the single type included in a variant must also implement `default::Default`,
//! although this restriction may be lifted in the future.
//!
//! The code to capture the characters will be something similar to
//! `characters.parse().unwrap_or_default()`, where `characters` is a `&str` holding the
//! recognized characters.
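//!
//! As a minimal sketch of that capture step (the helper name `capture` is hypothetical,
//! not part of `luther` or of the generated code), any type implementing both
//! `str::FromStr` and `default::Default` can be built from the recognized characters,
//! with a failed parse falling back to the type's default value:
//!
//! ```rust
//! use std::str::FromStr;
//!
//! // Hypothetical helper: parse the recognized characters into the field's type,
//! // falling back to `T::default()` when `FromStr` rejects them.
//! fn capture<T: FromStr + Default>(characters: &str) -> T {
//!     characters.parse().unwrap_or_default()
//! }
//! ```
//!
//! For a `String` field the parse cannot fail, so the captured value is exactly the
//! recognized text. For a numeric field such as `u32`, this sketch would turn input that
//! fails to parse (for example, a digit string too large for a `u32`) into `0`.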
//!
//! # The `luther` attribute
//! `luther_derive` recognizes the `luther` attribute both on the `enum` for which
//! `luther::Lexer` is being derived and on the variants of that `enum`. The attribute
//! supports several options, which are written like `#[luther(option = "value")]`.
//!
//! The `luther` attribute supports the following options; the bracketed note says where
//! each option is valid (on the enum or on a variant):
//!
//! * `dfa`: the name to use for the generated deterministic finite automaton [enum]
//! * `regex`: the regular expression to recognize for a particular variant [variant]
//! * `priority_group`: the priority group to which a variant belongs [variant]
//!
//! # Priority groups
//! It is possible for the regular expressions for more than one `enum` variant to match
//! the same input. For example, the following regular expressions all match the input
//! "auto":
//!
//! 1. "auto"
//! 2. "[a-z]+"
//! 3. "[a-z]+[0-9]*"
//!
//! The lexer generated by `luther_derive` favours a simple string given as the `regex`
//! option over more complicated regular expressions. In the list above this means that
//! item 1 is preferred over item 2 or 3. This rule lets the lexer prefer keywords over
//! identifiers, for example.
//!
//! If the preference for simple strings is not enough to resolve the ambiguity, use the
//! `priority_group` option of the `luther` attribute to indicate which of the competing
//! variants has the higher priority (a smaller number indicates a higher priority).
//! Within a priority group, `luther_derive` still favours simple strings over more
//! complicated regular expressions.
//!
//! The default value for `priority_group` if it is not specified is 1.
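//!
//! The ordering rule above can be modelled as a sort key: the lower `priority_group` wins
//! first, and within a group a simple string beats a more complicated regular expression.
//! The sketch below assumes a hypothetical `Candidate` type describing each variant whose
//! regular expression matched the same input, and treats a remaining tie as an unresolved
//! ambiguity; it models only the rule, not how `luther_derive` implements it.
//!
//! ```rust
//! // Hypothetical description of a variant whose regex matched the input.
//! #[derive(Debug, Clone, Copy, PartialEq)]
//! struct Candidate {
//!     regex: &'static str,
//!     // Smaller number = higher priority (the default group is 1).
//!     priority_group: u32,
//!     // True when the `regex` option is a simple string such as "auto".
//!     is_simple_string: bool,
//! }
//!
//! // Ranking key: lower group first, then simple strings before other regexes
//! // (`false` sorts before `true`, hence the negation).
//! fn rank(c: &Candidate) -> (u32, bool) {
//!     (c.priority_group, !c.is_simple_string)
//! }
//!
//! // Returns the preferred candidate, or `None` if the two best candidates still tie.
//! fn preferred(candidates: &[Candidate]) -> Option<Candidate> {
//!     let mut sorted = candidates.to_vec();
//!     sorted.sort_by_key(rank);
//!     match sorted.as_slice() {
//!         [] => None,
//!         [first, second, ..] if rank(first) == rank(second) => None,
//!         [first, ..] => Some(*first),
//!     }
//! }
//! ```
//!
//! With the three expressions from the "auto" list all in the default group, this model
//! picks `"auto"`; `"[a-z]+"` and `"[a-z]+[0-9]*"` on their own would tie until one of
//! them is moved to a different priority group.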
//!
//! # Errors
//! `luther_derive` raises a compile-time error in the following circumstances (among
//! others):
//!
//! * the `#[derive(Lexer)]` invocation is on a `struct` rather than an `enum`
//! * none of the variants of the `enum` has a `luther` attribute with the `regex` option
//! * one of the `regex` values specified for a variant would match the empty string
//! * a variant has fields other than a single unnamed field (a tuple of arity 1)
//! * the value provided for the `regex` option can't be parsed as a regular expression
//! * the value provided for the `priority_group` option can't be parsed as an integer
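//!
//! The last error can be illustrated with a sketch of how a `priority_group` value might be
//! read. It assumes a hypothetical `parse_priority_group` helper and an unsigned `u32`
//! group number (the integer type used by `luther_derive` is not stated here): a missing
//! option falls back to the default group 1, and a non-integer value is rejected.
//!
//! ```rust
//! // Hypothetical helper; `value` is the string from `priority_group = "..."`, if any.
//! fn parse_priority_group(value: Option<&str>) -> Result<u32, String> {
//!     match value {
//!         // No `priority_group` option: use the documented default of 1.
//!         None => Ok(1),
//!         // Anything that is not a valid `u32` becomes an error message.
//!         Some(text) => text
//!             .parse::<u32>()
//!             .map_err(|e| format!("invalid priority_group {:?}: {}", text, e)),
//!     }
//! }
//! ```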

extern crate proc_macro;
extern crate redfa;
extern crate syn;

#[macro_use]
extern crate quote;

#[macro_use]
extern crate itertools;

mod enum_info;
mod generate;
mod dfa;

use proc_macro::TokenStream;
use syn::DeriveInput;

type Dfa<'info, 'ast: 'info> = redfa::Dfa<char, Option<&'info enum_info::VariantInfo<'ast>>>;

/// Procedural macro to derive the `luther::Lexer` trait.
///
/// The macro will also recognize and act on the `luther` attribute. See the
/// crate level documentation for more information about the `luther`
/// attribute.
#[proc_macro_derive(Lexer, attributes(luther))]
pub fn luther_derive(input: TokenStream) -> TokenStream {
    let ast: DeriveInput = syn::parse(input).expect("failed to parse the input token stream");

    let info: enum_info::EnumInfo = (&ast).into();

    let (dfa, error_state) = dfa::build_dfa(&info);

    let expanded = generate::generate_lexer_impl(&info, &dfa, error_state);

    expanded.into()
}