Derive Macro nom_derive::Nom
#[derive(Nom)]
{
    // Attributes available to this derive:
    #[nom]
}
The Nom derive automatically generates a parse function for the structure
using nom parsers. It will try to infer parsers for primitive or known
types, but also allows you to specify parsers using custom attributes.
Deriving parsers supports struct and enum types.
Many examples are provided, and more can be found in the project tests.
Attributes
Derived parsers can be controlled using the nom attribute, with a sub-attribute.
For example, #[nom(Value)].
To specify multiple attributes, use a comma-separated list: #[nom(Debug, Count="4")].
The available attributes are:
Attribute | Supports | Description |
---|---|---|
BigEndian | all | Set the endianness to big endian |
Cond | fields | Used on an Option<T> to read a value of type T only if the condition is met |
Count | fields | Set the expected number of items to parse |
Debug | all | Print error message and input if parser fails (at runtime) |
DebugDerive | top-level | Print the generated code to stderr during build |
Default | fields | Do not parse, set a field to the default value for the type |
If | fields | Similar to Cond |
Ignore | fields | An alias for Default |
LittleEndian | all | Set the endianness to little endian |
Map | fields | Parse field, then apply a function |
Parse | fields | Use a custom parser function for reading this field |
Selector | all | Used to specify the value matching an enum variant |
Value | fields | Store result of evaluated expression in field |
Verify | fields | After parsing, check that condition is true and return an error if false. |
See below for examples.
Deriving parsers for Struct
Import the Nom derive attribute:
use nom_derive::Nom;
and add it to structs or enums.
For simple structures, the parsers are automatically generated:
#[derive(Nom)] struct S { a: u32, b: u16, c: u16 }
This also works for tuple structs:
#[derive(Nom)] struct S(u32);
Byteorder
By default, integers are parsed as big endian.
The LittleEndian attribute can be applied to a struct to change all integer parsers:
#[derive(Debug, PartialEq, Nom)] // Debug and PartialEq are needed by assert_eq!
#[nom(LittleEndian)]
struct LittleEndianStruct {
    a: u32,
    b: u16,
    c: u16,
}

let input = b"\x00\x00\x00\x01\x12\x34\x56\x78";
let res = LittleEndianStruct::parse(input);
assert_eq!(
    res,
    Ok((&input[8..], LittleEndianStruct { a: 0x0100_0000, b: 0x3412, c: 0x7856 }))
);
The BigEndian and LittleEndian attributes can also be specified for struct fields.
If both per-struct and per-field attributes are present, the more specific wins.
For example, all fields of the following struct will be parsed as big-endian,
except b:
#[derive(Nom)]
#[nom(BigEndian)]
struct MixedEndianStruct {
    a: u32,
    #[nom(LittleEndian)]
    b: u16,
    c: u16,
}
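To make the precedence rule concrete, here is a hand-written, std-only sketch of the byte decoding such a parser has to perform. It is not the code nom-derive generates: `parse_mixed` is a hypothetical name, and it returns a plain `Option` instead of nom's `IResult`.

```rust
#[derive(Debug, PartialEq)]
struct MixedEndianStruct {
    a: u32,
    b: u16,
    c: u16,
}

/// Hypothetical hand-written equivalent (not generated by nom-derive).
/// Returns the remaining input and the struct, or None if input is too short.
fn parse_mixed(input: &[u8]) -> Option<(&[u8], MixedEndianStruct)> {
    if input.len() < 8 {
        return None;
    }
    // Struct-level BigEndian applies to `a` and `c`.
    let a = u32::from_be_bytes([input[0], input[1], input[2], input[3]]);
    // Field-level LittleEndian is more specific, so it wins for `b`.
    let b = u16::from_le_bytes([input[4], input[5]]);
    let c = u16::from_be_bytes([input[6], input[7]]);
    Some((&input[8..], MixedEndianStruct { a, b, c }))
}
```

With the input `b"\x00\x00\x00\x01\x12\x34\x56\x78"`, this sketch yields `a = 1`, `b = 0x3412` and `c = 0x5678`.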
Deriving and Inferring Parsers
nom-derive is also able to infer parsers for some common types: integers, Option, Vec, etc.
If the parser cannot be inferred, a default function will be called. It is also possible to
override this using the Parse attribute.
The following sections give more details.
Option types
If a field is an Option<T>, the generated parser is opt(complete(T::parse)).
For example:
#[derive(Debug, PartialEq, Nom)] // Debug and PartialEq are needed by assert_eq!
struct S {
    a: Option<u32>,
}

let input = b"\x00\x00\x00\x01";
let res = S::parse(input);
assert_eq!(res, Ok((&input[4..], S { a: Some(1) })));
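The effect of wrapping the parser in opt(complete(...)) is that a failed or incomplete read becomes None and consumes no input. A std-only sketch of that behaviour for a big-endian u32 follows; `parse_opt_u32` is a hypothetical helper written for illustration, not a nom or nom-derive function.

```rust
/// Hypothetical illustration of the behaviour of `opt(complete(be_u32))`.
fn parse_opt_u32(input: &[u8]) -> (&[u8], Option<u32>) {
    match input {
        // Enough bytes: decode a big-endian u32 and advance past it.
        [b0, b1, b2, b3, rest @ ..] => {
            (rest, Some(u32::from_be_bytes([*b0, *b1, *b2, *b3])))
        }
        // Not enough bytes: consume nothing and yield None instead of an error.
        _ => (input, None),
    }
}
```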
Vec types
If a field is a Vec<T>, the generated parser is many0(complete(T::parse)).
For example:
#[derive(Debug, PartialEq, Nom)] // Debug and PartialEq are needed by assert_eq!
struct S {
    a: Vec<u16>,
}

let input = b"\x00\x00\x00\x01";
let res = S::parse(input);
assert_eq!(res, Ok((&input[4..], S { a: vec![0, 1] })));
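In other words, items are read repeatedly until the item parser can no longer succeed, and whatever is left over stays in the input. A std-only sketch of that loop for big-endian u16 items is shown here; `parse_vec_u16` is a hypothetical name used only for illustration.

```rust
/// Hypothetical illustration of the behaviour of `many0(complete(be_u16))`.
fn parse_vec_u16(input: &[u8]) -> (&[u8], Vec<u16>) {
    let mut items = Vec::new();
    let mut rest = input;
    // Keep reading while at least two bytes remain.
    while let [hi, lo, tail @ ..] = rest {
        items.push(u16::from_be_bytes([*hi, *lo]));
        rest = tail;
    }
    // A trailing odd byte (if any) is left unparsed.
    (rest, items)
}
```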
Count
The Count="n" attribute can be used to specify the number of items to parse.
Notes:
- the subparser is inferred as usual (item type must be Vec< ... >)
- the number of items (n) can be any expression, and will be cast to usize
For example:
#[derive(Nom)]
struct S {
    a: u16,
    #[nom(Count="a")]
    b: Vec<u16>,
}
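The struct above describes a length-prefixed list: a is read first, then exactly a items are read into b. The following std-only sketch shows that logic written by hand, assuming big-endian integers; `be_u16` and `parse_s` are hypothetical helpers, and failures are reported as None rather than as nom errors.

```rust
#[derive(Debug, PartialEq)]
struct S {
    a: u16,
    b: Vec<u16>,
}

/// Hypothetical helper: read one big-endian u16.
fn be_u16(input: &[u8]) -> Option<(&[u8], u16)> {
    match input {
        [hi, lo, rest @ ..] => Some((rest, u16::from_be_bytes([*hi, *lo]))),
        _ => None,
    }
}

/// Hypothetical hand-written equivalent of the derived parser.
fn parse_s(input: &[u8]) -> Option<(&[u8], S)> {
    let (mut rest, a) = be_u16(input)?;
    let mut b = Vec::new();
    // The count expression is cast to usize, as noted above.
    for _ in 0..a as usize {
        let (tail, item) = be_u16(rest)?; // fail if fewer than `a` items exist
        b.push(item);
        rest = tail;
    }
    Some((rest, S { a, b }))
}
```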
Default parsing function
If a field with type T is not a primitive or known type, the generated parser is
T::parse(input).
This function can be automatically derived, or specified as a method for the struct. In that case, the function must be a static method with the same API as a nom combinator, returning the wrapped struct when parsing succeeds.
For example (using Nom derive):
#[derive(Nom)]
struct S2 {
    c: u16,
}

#[derive(Nom)]
struct S {
    a: u16,
    b: S2,
}
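Example (defining parse method):
// Imports assume nom 5, where the map! macro and le_u16 live at these paths.
use nom::{map, IResult};
use nom::number::streaming::le_u16;
use nom_derive::Nom;

// no Nom derive
struct S2 {
    c: u16,
}

impl S2 {
    fn parse(i: &[u8]) -> IResult<&[u8], S2> {
        map!(
            i,
            le_u16,      // little-endian
            |c| S2 { c } // return a struct S2
        )
    }
}

#[derive(Nom)]
struct S {
    a: u16,
    b: S2,
}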
Custom parsers
Sometimes, the default parsers generated automatically are not those you want.
The Parse custom attribute allows for specifying the parser that
will be inserted in the nom parser.
The parser is called with input as argument, so the signature of the parser must be equivalent to:
fn parser(i: &[u8]) -> IResult<&[u8], T> {
    // ...
}
For example, to specify the parser of a field:
#[derive(Nom)] struct S{ #[nom(Parse="le_u16")] a: u16 }
The Parse argument can be a complex expression:
#[derive(Nom)]
struct S {
    pub a: u8,
    #[nom(Parse="cond(a > 0,be_u16)")]
    pub b: Option<u16>,
}
Note that you are responsible for providing correct code.
Default
If a field is marked as Ignore (or Default), it will not be parsed.
Its value will be the default value for the field type.
This is convenient if the structure has more fields than the serialized value.
#[derive(Nom)]
struct S {
    pub a: u8,
    #[nom(Ignore)]
    pub b: Option<u16>,
}
Map
The Map attribute can be used to apply a function to the result
of the parser.
It is often used combined with the Parse attribute.
#[derive(Nom)]
struct S {
    pub a: u8,
    #[nom(Parse="be_u8", Map = "|x: u8| x.to_string()")]
    pub b: String,
}
Conditional Values
The Cond custom attribute allows for specifying a condition.
The generated parser will use the cond! combinator, which calls the
child parser only if the condition is met.
The type with this attribute must be an Option type.
#[derive(Nom)]
struct S {
    pub a: u8,
    #[nom(Cond="a == 1")]
    pub b: Option<u16>,
}
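As an illustration of the control flow, here is a std-only sketch of a hand-written parser for this struct, assuming big-endian integers. `parse_s` is a hypothetical name, and None stands in for a nom error; the real generated code uses nom combinators instead.

```rust
#[derive(Debug, PartialEq)]
struct S {
    a: u8,
    b: Option<u16>,
}

/// Hypothetical hand-written equivalent of the derived parser.
fn parse_s(input: &[u8]) -> Option<(&[u8], S)> {
    let (&a, rest) = input.split_first()?;
    if a == 1 {
        // Condition met: the child parser must run and succeed.
        match rest {
            [hi, lo, tail @ ..] => {
                let b = Some(u16::from_be_bytes([*hi, *lo]));
                Some((tail, S { a, b }))
            }
            _ => None,
        }
    } else {
        // Condition not met: nothing is consumed for `b`.
        Some((rest, S { a, b: None }))
    }
}
```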
Value
The Value attribute does not parse data. It is used to store the result
of the evaluated expression in the variable.
Previous fields can be used in the expression.
#[derive(Nom)]
struct S {
    pub a: u8,
    #[nom(Value = "a.to_string()")]
    pub b: String,
}
Verifications
The Verify custom attribute allows for specifying a verifying function.
The generated parser will use the verify! combinator, which runs the
child parser and then checks a condition on its result (raising an error if the condition is false).
The argument used in the verify function is passed as a reference.
#[derive(Nom)]
struct S {
    #[nom(Verify="*a == 1")]
    pub a: u8,
}
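The order of operations (parse first, then check) and the reference passed to the condition can be seen in this std-only sketch. `parse_s` is a hypothetical hand-written stand-in for the derived parser, with None in place of a nom error.

```rust
#[derive(Debug, PartialEq)]
struct S {
    a: u8,
}

/// Hypothetical hand-written equivalent of the derived parser.
fn parse_s(input: &[u8]) -> Option<(&[u8], S)> {
    // Parse the field first; `a` is a reference to the parsed byte.
    let (a, rest) = input.split_first()?;
    // Then check the condition on the reference, as in Verify="*a == 1".
    if *a == 1 {
        Some((rest, S { a: *a }))
    } else {
        None
    }
}
```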
Deriving parsers for Enum
The Nom derive can also be used to generate parsers for Enum types.
The generated parser will use a value (called the selector) to determine
which variant is parsed.
Variants with named and unnamed fields are supported.
In addition to derive(Nom), a Selector attribute must be used:
- on the enum, to specify the type of selector to match
- on each variant, to specify the value associated with this variant.
#[derive(Nom)]
#[nom(Selector="u8")]
pub enum U1 {
    #[nom(Selector="0")]
    Field1(u32),
    #[nom(Selector="1")]
    Field2(Option<u32>),
}
The generated function will look like:
impl U1 {
    pub fn parse(i: &[u8], selector: u8) -> IResult<&[u8], U1> {
        match selector {
            ...
        }
    }
}
It can be called either directly (U1::parse(i, n)) or using nom (call!(U1::parse, n)).
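To give an idea of what the elided match body has to do, here is a std-only sketch of a selector-driven parser for U1. It is an illustration under assumptions, not the generated code: `be_u32` and `parse_u1` are hypothetical names, integers are assumed big-endian, and None replaces nom's error type.

```rust
#[derive(Debug, PartialEq)]
enum U1 {
    Field1(u32),
    Field2(Option<u32>),
}

/// Hypothetical helper: read one big-endian u32.
fn be_u32(input: &[u8]) -> Option<(&[u8], u32)> {
    match input {
        [b0, b1, b2, b3, rest @ ..] => {
            Some((rest, u32::from_be_bytes([*b0, *b1, *b2, *b3])))
        }
        _ => None,
    }
}

/// Hypothetical hand-written equivalent: the selector picks the variant.
fn parse_u1(input: &[u8], selector: u8) -> Option<(&[u8], U1)> {
    match selector {
        0 => {
            let (rest, v) = be_u32(input)?;
            Some((rest, U1::Field1(v)))
        }
        // Option<u32> field: a short read yields None and consumes nothing.
        1 => match be_u32(input) {
            Some((rest, v)) => Some((rest, U1::Field2(Some(v)))),
            None => Some((input, U1::Field2(None))),
        },
        // No variant matches the selector: report an error.
        _ => None,
    }
}
```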
The selector can be a primitive type (u8), or any other type implementing the PartialEq trait.
#[derive(Debug,PartialEq,Eq,Clone,Copy,Nom)]
pub struct MessageType(pub u8);

#[derive(Nom)]
#[nom(Selector="MessageType")]
pub enum U1 {
    #[nom(Selector="MessageType(0)")]
    Field1(u32),
    #[nom(Selector="MessageType(1)")]
    Field2(Option<u32>),
}

// Example of call from a struct:
#[derive(Nom)]
pub struct S1 {
    pub msg_type: MessageType,
    #[nom(Parse="{ |i| U1::parse(i, msg_type) }")]
    pub msg_value: U1,
}
Default case
By default, if no value of the selector matches the input value, a nom error
ErrorKind::Switch is raised. This can be changed by using _ as the selector
value for one of the variants.
#[derive(Nom)]
#[nom(Selector="u8")]
pub enum U2 {
    #[nom(Selector="0")]
    Field1(u32),
    #[nom(Selector="_")]
    Field2(u32),
}
If the _ selector is not the last variant, the generated code will use it
as the last match to avoid unreachable code.
Special case: specifying parsers for fields
Sometimes, an unnamed field requires a custom parser. In that case, the
field (not the variant) must be annotated with the Parse attribute.
Named fields:
#[derive(Nom)]
#[nom(Selector="MessageType")]
pub enum U3<'a> {
    #[nom(Selector="MessageType(0)")]
    Field1 { a: u32 },
    #[nom(Selector="MessageType(1)")]
    Field2 {
        #[nom(Parse="take(4 as usize)")]
        a: &'a [u8],
    },
}
Unnamed fields:
#[derive(Nom)]
#[nom(Selector="MessageType")]
pub enum U3<'a> {
    #[nom(Selector="MessageType(0)")]
    Field1(u32),
    #[nom(Selector="MessageType(1)")]
    Field2(
        #[nom(Parse="take(4 as usize)")]
        &'a [u8],
    ),
}
Special case: fieldless enums
If the entire enum is fieldless (a list of constant integer values), a parser can be derived if
- the Enum has a repr(ty) attribute, with ty an integer type
- the Enum implements the Eq trait
In that case, the Selector attribute must not be specified.
#[repr(u8)]
#[derive(Eq, PartialEq, Nom)] // Eq requires PartialEq
pub enum U3 {
    A,
    B = 2,
    C,
}
The generated parser will parse an element of type ty (as Big Endian), try
to match it to the enum values, and return an instance of Enum if it succeeds
(wrapped in an IResult).
For example, U3::parse(b"\x02") will return Ok((&b""[..],U3::B)).
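The matching step can be sketched with the standard library alone. The version below is a hypothetical hand-written equivalent for the repr(u8) enum above, not the generated code: `parse_u3` is an illustrative name and None stands in for a nom error. Note that with B = 2, the implicit discriminant of C is 3.

```rust
#[repr(u8)]
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum U3 {
    A,     // discriminant 0
    B = 2, // explicit discriminant
    C,     // previous + 1 = 3
}

/// Hypothetical hand-written equivalent of the derived parser.
fn parse_u3(input: &[u8]) -> Option<(&[u8], U3)> {
    // Read one element of the repr type (u8 here).
    let (&byte, rest) = input.split_first()?;
    // Compare it against each variant's discriminant.
    let value = match byte {
        x if x == U3::A as u8 => U3::A,
        x if x == U3::B as u8 => U3::B,
        x if x == U3::C as u8 => U3::C,
        _ => return None, // no variant has this value
    };
    Some((rest, value))
}
```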
Limitations
Except if the entire enum is fieldless (a list of constant integer values), unit fields are not supported.
Debug
Errors in generated parsers may be hard to understand and debug.
The Debug attribute inserts calls to nom's dbg_dmp function, which will print
an error message and the input if the parser fails. This attribute can be applied either to
fields or at top-level (all sub-parsers will be wrapped).
This helps resolve parse errors (at runtime).
#[derive(Nom)]
pub struct S {
    pub a: u32,
    #[nom(Debug)]
    pub b: u64,
}
DebugDerive
The DebugDerive attribute, if applied to top-level, makes the generator print the
generated code to stderr.
This helps resolve compiler errors.
#[derive(Nom)]
#[nom(DebugDerive)]
pub struct S {
    pub a: u32,
}