Derive Macro nom_derive::Nom

#[derive(Nom)]
{
    // Attributes available to this derive:
    #[nom]
}

The Nom derive automatically generates a parse function for the structure using nom parsers. It will try to infer parsers for primitive or known types, but also allows you to specify parsers using custom attributes.

Deriving parsers supports struct and enum types.

Many examples are provided, and more can be found in the project tests.

Attributes

Derived parsers can be controlled using the nom attribute, with a sub-attribute. For example, #[nom(Value)].

Most combinators support using literal strings (#[nom(Count="4")]) or parenthesized values (#[nom(Count(4))]).

To specify multiple attributes, use a comma-separated list: #[nom(Debug, Count="4")].

The available attributes are:

Attribute	Supports	Description
AlignAfter	fields	Skip bytes until aligned to a multiple of the provided value, after parsing value
AlignBefore	fields	Skip bytes until aligned to a multiple of the provided value, before parsing value
BigEndian	all	Set the endianness to big endian
Cond	fields	Used on an `Option<T>` to read a value of type `T` only if the condition is met
Complete	fields	Transforms Incomplete into Error
Count	fields	Set the expected number of items to parse
Debug	all	Print error message and input if parser fails (at runtime)
DebugDerive	top-level	Print the generated code to stderr during build
Default	fields	Do not parse, set a field to the default value for the type
ErrorIf	fields	Before parsing, check condition is true and return an error if false.
Exact	top-level	Check that input was entirely consumed by parser
If	fields	Similar to `Cond`
Ignore	fields	An alias for `Default`
InputName	top-level	Change the internal name of input
LittleEndian	all	Set the endianness to little endian
Map	fields	Parse field, then apply a function
Move	fields	Add the specified offset to current position, before parsing
MoveAbs	fields	Go to the specified absolute position, before parsing
Parse	fields	Use a custom parser function for reading from a file
PreExec	all	Execute Rust code before parsing field or struct
PostExec	all	Execute Rust code after parsing field or struct
Selector	all	Used to specify the value matching an enum variant
SetEndian	all	Dynamically set the endianness
SkipAfter	fields	Skip the specified number of bytes, after parsing
SkipBefore	fields	Skip the specified number of bytes, before parsing
Tag	fields	Parse a constant pattern
Take	fields	Take `n` bytes of input
Value	fields	Store result of evaluated expression in field
Verify	fields	After parsing, check that condition is true and return an error if false.

See below for examples.

Deriving parsers for `Struct`

Import the Nom derive attribute:

use nom_derive::Nom;

and add it to structs or enums.

For simple structures, the parsers are automatically generated:

#[derive(Nom)]
struct S {
  a: u32,
  b: u16,
  c: u16
}

This also works for tuple structs:

#[derive(Nom)]
struct S(u32);

Byteorder

By default, integers are parsed as big endian.

The LittleEndian attribute can be applied to a struct to change all integer parsers:

#[derive(Nom)]
#[nom(LittleEndian)]
struct LittleEndianStruct {
  a: u32,
  b: u16,
  c: u16
}

let input = b"\x00\x00\x00\x01\x12\x34\x56\x78";
let res = LittleEndianStruct::parse(input);
assert_eq!(res, Ok((&input[8..],
    LittleEndianStruct{a:0x0100_0000,b:0x3412,c:0x7856}))
);
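To see where the expected values come from, the same eight bytes can be decoded by hand with the standard library. This is only an illustrative sketch: the function name `decode_le` is hypothetical, and this is not the code that nom-derive generates.

```rust
/// Hypothetical helper: decode a u32 followed by two u16 values,
/// all little-endian. Returns None if fewer than 8 bytes are available.
fn decode_le(input: &[u8]) -> Option<(u32, u16, u16)> {
    match input {
        [a0, a1, a2, a3, b0, b1, c0, c1, ..] => Some((
            // The least significant byte comes first in the input.
            u32::from_le_bytes([*a0, *a1, *a2, *a3]),
            u16::from_le_bytes([*b0, *b1]),
            u16::from_le_bytes([*c0, *c1]),
        )),
        _ => None,
    }
}
```

With the input above, the bytes `00 00 00 01` read as little-endian give `0x0100_0000`, and `12 34` gives `0x3412`.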

The BigEndian and LittleEndian attributes can be specified for struct fields. If both per-struct and per-field attributes are present, the more specific wins.

For example, all fields of the following struct will be parsed as big-endian, except b:

#[derive(Nom)]
#[nom(BigEndian)]
struct MixedEndianStruct {
  a: u32,
  #[nom(LittleEndian)]
  b: u16,
  c: u16
}

The SetEndian attribute changes the endianness of all following integer parsers to the provided endianness (expected argument has type nom::number::Endianness). The expression can be any expression or function returning an endianness, and will be evaluated once at the location of the attribute.

Only the parsers after this attribute (including it) are affected: if SetEndian is applied to the third field of a struct having 4 fields, only the fields 3 and 4 will have dynamic endianness.

This allows dynamic (runtime) change of the endianness, at a small cost (a test is done before every following integer parser). However, if the argument is static or known at compilation, the compiler will remove the test during optimization.

If a BigEndian or LittleEndian attribute is applied to a field, it takes precedence over SetEndian for that field.

For example, to create a parse function having two arguments (input, and the endianness):

#[derive(Nom)]
#[nom(ExtraArgs(endian: Endianness))]
#[nom(SetEndian(endian))] // Set dynamically the endianness
struct MixedEndianStruct {
  a: u32,
  b: u16,
  #[nom(BigEndian)] // Field c will always be parsed as BigEndian
  c: u16
}

let res = MixedEndianStruct::parse(input, Endianness::Big);
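The runtime test mentioned above can be pictured with a small hand-written sketch. It uses only the standard library; the `Endian` enum and `read_u16` function are hypothetical stand-ins, not nom or nom-derive items.

```rust
/// Hypothetical stand-in for an endianness value chosen at runtime.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Endian {
    Big,
    Little,
}

/// Read a u16 using the byte order selected at runtime.
/// Returns the remaining input and the value, or None if input is too short.
fn read_u16(input: &[u8], endian: Endian) -> Option<(&[u8], u16)> {
    match input {
        [b0, b1, rest @ ..] => {
            let bytes = [*b0, *b1];
            // This is the per-integer test: one branch on the endianness.
            let value = match endian {
                Endian::Big => u16::from_be_bytes(bytes),
                Endian::Little => u16::from_le_bytes(bytes),
            };
            Some((rest, value))
        }
        _ => None,
    }
}
```

When `endian` is a constant at the call site, an optimizing compiler can usually fold the inner `match` away, which is the effect described above.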

Deriving and Inferring Parsers

nom-derive is also able to infer parsers for some usual types: integers, Option, Vec, etc.

If the parser cannot be inferred, a default function will be called. It is also possible to override this using the Parse attribute.

The following sections give more details.

Option types

If a field is an Option<T>, the generated parser is opt(complete(T::parse)).

For example:

#[derive(Nom)]
struct S {
  a: Option<u32>
}

let input = b"\x00\x00\x00\x01";
let res = S::parse(input);
assert_eq!(res, Ok((&input[4..],S{a:Some(1)})));
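The effect of wrapping a parser in opt(complete(...)) is that a failure, including running out of input, yields None and leaves the input untouched. A minimal std-only sketch of that behavior for a big-endian u32 (the function name `parse_opt_u32` is hypothetical, and this is not the generated code):

```rust
/// Try to read a big-endian u32. If there are not enough bytes,
/// return None and hand back the input unchanged instead of failing.
fn parse_opt_u32(input: &[u8]) -> (&[u8], Option<u32>) {
    match input {
        [b0, b1, b2, b3, rest @ ..] => {
            (rest, Some(u32::from_be_bytes([*b0, *b1, *b2, *b3])))
        }
        // Too short: nothing is consumed.
        _ => (input, None),
    }
}
```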

Vec types

If a field is a Vec<T>, the generated parser is many0(complete(T::parse)).

For example:

#[derive(Nom)]
struct S {
  a: Vec<u16>
}

let input = b"\x00\x00\x00\x01";
let res = S::parse(input);
assert_eq!(res, Ok((&input[4..],S{a:vec![0,1]})));
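The many0(complete(...)) combination keeps applying the item parser until it fails, then returns what was collected. A hand-written sketch of the same loop for big-endian u16 items, using only the standard library (`parse_many_u16` is a hypothetical name):

```rust
/// Collect big-endian u16 values until fewer than 2 bytes remain.
/// Returns the unconsumed input and the collected items.
fn parse_many_u16(mut input: &[u8]) -> (&[u8], Vec<u16>) {
    let mut items = Vec::new();
    while let [b0, b1, rest @ ..] = input {
        items.push(u16::from_be_bytes([*b0, *b1]));
        input = rest;
    }
    // A trailing odd byte, if any, is left in the remaining input.
    (input, items)
}
```

Note that this never fails: an empty or too-short input simply produces an empty Vec.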

Count

The Count(n) attribute can be used to specify the number of items to parse.

Notes:

the subparser is inferred as usual (item type must be Vec< ... >)
the number of items (n) can be any expression, and will be cast to usize

For example:

#[derive(Nom)]
struct S {
  a: u16,
  #[nom(Count="a")]
  b: Vec<u16>
}
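Written by hand, a length-prefixed list like the struct above could be parsed as follows. This is a std-only sketch under the assumption that both the count and the items are big-endian u16 values; `parse_counted` is a hypothetical name.

```rust
/// Read a u16 count `a`, then exactly `a` u16 items.
/// Returns None if the input ends before all items are read.
fn parse_counted(input: &[u8]) -> Option<(&[u8], u16, Vec<u16>)> {
    let (mut rest, a) = match input {
        [b0, b1, rest @ ..] => (rest, u16::from_be_bytes([*b0, *b1])),
        _ => return None,
    };
    let mut b = Vec::with_capacity(a as usize);
    // The count is cast to usize, as the Count attribute does.
    for _ in 0..a as usize {
        match rest {
            [b0, b1, tail @ ..] => {
                b.push(u16::from_be_bytes([*b0, *b1]));
                rest = tail;
            }
            _ => return None,
        }
    }
    Some((rest, a, b))
}
```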

Tag

The Tag(value) attribute is used to parse a constant value (or "magic").

For example:

#[derive(Nom)]
struct S<'a> {
  #[nom(Tag(b"TAG"))]
  tag: &'a[u8],
  a: u16,
  b: u16,
}
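Conceptually, parsing a tag means checking that the input starts with the expected bytes and returning that prefix. A minimal std-only sketch (the helper `expect_tag` is hypothetical, not a nom function):

```rust
/// Check that `input` starts with `tag`.
/// On success, return (remaining input, matched bytes).
fn expect_tag<'a>(input: &'a [u8], tag: &[u8]) -> Option<(&'a [u8], &'a [u8])> {
    if input.starts_with(tag) {
        let (matched, rest) = input.split_at(tag.len());
        Some((rest, matched))
    } else {
        None
    }
}
```

The matched slice borrows from the input, which is why the struct above needs the lifetime parameter 'a.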

Take

The Take="n" attribute can be used to take n bytes of input.

Notes:

the number of bytes (n) can be any expression, and will be cast to usize

For example:

#[derive(Nom)]
struct S<'a> {
  a: u16,
  #[nom(Take="1")]
  b: &'a [u8],
}

Default parsing function

If a field with type T is not a primitive or known type, the generated parser is T::parse(input).

This function can be automatically derived, or specified as a method for the struct. In that case, the function must be a static method with the same API as a nom combinator, returning the wrapped struct when parsing succeeds.

For example (using Nom derive):

#[derive(Nom)]
struct S2 {
  c: u16
}

#[derive(Nom)]
struct S {
  a: u16,
  b: S2
}

Example (defining parse method):

// no Nom derive
struct S2 {
  c: u16
}

impl S2 {
    fn parse(i:&[u8]) -> IResult<&[u8],S2> {
        map!(
            i,
            le_u16, // little-endian
            |c| S2{c} // return a struct S2
        )
    }
}

#[derive(Nom)]
struct S {
  a: u16,
  b: S2
}

Custom parsers

Sometimes, the default parsers generated automatically are not those you want.

The Parse custom attribute allows for specifying the parser that will be inserted in the nom parser.

The parser is called with input as argument, so the signature of the parser must be equivalent to:

fn parser(i: &[u8]) -> IResult<&[u8], T> {
// ...
}

For example, to specify the parser of a field:

#[derive(Nom)]
struct S{
    #[nom(Parse="le_u16")]
    a: u16
}

The Parse argument can be a complex expression:

#[derive(Nom)]
struct S{
    pub a: u8,
    #[nom(Parse="cond(a > 0,be_u16)")]
    pub b: Option<u16>,
}

Note that you are responsible for providing correct code.

Default

If a field is marked as Ignore (or Default), it will not be parsed. Its value will be the default value for the field type.

This is convenient if the structure has more fields than the serialized value.

#[derive(Nom)]
struct S{
    pub a: u8,
    #[nom(Ignore)]
    pub b: Option<u16>,
}

Complete

The Complete attribute transforms Incomplete into Error.

The default is to use streaming parsers.

#[derive(Nom)]
struct S{
    pub a: u8,
    #[nom(Complete)]
    pub b: u64,
}

Map

The Map attribute can be used to apply a function to the result of the parser. It is often used combined with the Parse attribute.

#[derive(Nom)]
struct S{
    pub a: u8,
    #[nom(Parse="be_u8", Map = "|x: u8| x.to_string()")]
    pub b: String,
}

Conditional Values

The Cond custom attribute allows for specifying a condition. The generated parser will use the cond! combinator, which calls the child parser only if the condition is met. The type with this attribute must be an Option type.

#[derive(Nom)]
struct S{
    pub a: u8,
    #[nom(Cond="a == 1")]
    pub b: Option<u16>,
}
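The conditional logic for the struct above can be sketched by hand as follows, using only the standard library and assuming big-endian integers. The function name `parse_cond` is hypothetical, and this is not the generated code.

```rust
/// Parse `a: u8`, then `b: u16` only if `a == 1`.
/// Returns (remaining input, a, b), or None if the input is too short.
fn parse_cond(input: &[u8]) -> Option<(&[u8], u8, Option<u16>)> {
    let (&a, rest) = input.split_first()?;
    if a != 1 {
        // Condition not met: `b` is None and no bytes are consumed for it.
        return Some((rest, a, None));
    }
    match rest {
        [b0, b1, tail @ ..] => Some((tail, a, Some(u16::from_be_bytes([*b0, *b1])))),
        _ => None,
    }
}
```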

Value

The Value attribute does not parse data. It is used to store the result of the evaluated expression in the variable.

Previous fields can be used in the expression.

#[derive(Nom)]
struct S{
    pub a: u8,
    #[nom(Value = "a.to_string()")]
    pub b: String,
}

Verifications

The Verify custom attribute allows for specifying a verifying function. The generated parser will use the verify combinator, which runs the child parser and succeeds only if the parsed value verifies the condition (otherwise, an error is raised).

The argument used in verify function is passed as a reference.

#[derive(Nom)]
struct S{
    #[nom(Verify="*a == 1")]
    pub a: u8,
}
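A hand-written equivalent of this check makes the reference passing visible. The sketch below uses only the standard library; `parse_verified` and the string error values are hypothetical and merely stand in for nom's error types.

```rust
/// Parse one byte, then check it with a predicate that takes a reference.
fn parse_verified(input: &[u8]) -> Result<(&[u8], u8), &'static str> {
    let (a, rest) = input.split_first().ok_or("incomplete")?;
    // The predicate receives `&u8`, hence the dereference in `*a == 1`.
    let check = |a: &u8| *a == 1;
    if check(a) {
        Ok((rest, *a))
    } else {
        Err("verify failed")
    }
}
```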

The ErrorIf attribute checks the provided condition, and returns an error if the test returns false. The condition is tested before any parsing occurs for this field, and does not change the input pointer.

Error has type ErrorKind::Verify (nom).

The argument used in verify function is passed as a reference.

#[derive(Nom)]
struct S{
    pub a: u8,
    #[nom(ErrorIf(a != 1))]
    pub b: u8,
}

Exact

The Exact custom attribute adds a verification after parsing the entire element. It succeeds if the input has been entirely consumed by the parser.

#[derive(Nom)]
#[nom(Exact)]
struct S{
    pub a: u8,
}
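In other words, the check happens on the remaining input once all fields are parsed. A minimal std-only sketch for the struct above (`parse_exact` and the string errors are hypothetical stand-ins for the generated parser and nom's errors):

```rust
/// Parse a single byte and require that nothing is left afterwards.
fn parse_exact(input: &[u8]) -> Result<u8, &'static str> {
    let (&a, rest) = input.split_first().ok_or("incomplete")?;
    if rest.is_empty() {
        Ok(a)
    } else {
        // Input was not entirely consumed.
        Err("trailing bytes")
    }
}
```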

PreExec

The PreExec custom attribute executes the provided code before parsing the field or structure.

This attribute can be specified multiple times. Statements will be executed in order.

Note that the current input can be accessed, as a regular variable (see InputName). If you create a new variable with the same name, it will be used as input (resulting in side-effects).

Expected value: a valid Rust statement

#[derive(Nom)]
struct S{
    #[nom(PreExec="let sz = i.len();")]
    pub a: u8,
    #[nom(Value(sz))]
    pub sz: usize,
}

PostExec

The PostExec custom attribute executes the provided code after parsing the field or structure.

This attribute can be specified multiple times. Statements will be executed in order.

Note that the current input can be accessed, as a regular variable (see InputName). If you create a new variable with the same name, it will be used as input (resulting in side-effects).

Expected value: a valid Rust statement

#[derive(Nom)]
struct S{
    #[nom(PostExec="let b = a + 1;")]
    pub a: u8,
    #[nom(Value(b))]
    pub b: u8,
}

If applied to the top-level element, the statement is executed after the entire element is parsed.

If parsing a structure, the built structure is available in the struct_def variable.

If parsing an enum, the built structure is available in the enum_def variable.

#[derive(Debug)]
#[derive(Nom)]
#[nom(PostExec(println!("parsing done: {:?}", struct_def);))]
struct S{
    pub a: u8,
    pub b: u8,
}

Alignment and Padding

AlignAfter/AlignBefore: skip bytes until aligned to a multiple of the provided value. Alignment is calculated relative to the start of the original parser input.
SkipAfter/SkipBefore: skip the specified number of bytes
Move: add the specified offset to current position, before parsing. Offset can be negative.
MoveAbs: go to specified absolute position (relative to the start of original parser input), before parsing

If multiple directives are provided, they are applied in order of appearance of the attribute.

If the new position would be before the start of the slice or beyond its end, an error is raised (TooLarge or Incomplete, depending on the case).

Expected value: a valid Rust value (immediate value, or expression)

#[derive(Nom)]
struct S{
    pub a: u8,
    #[nom(AlignBefore(4))]
    pub b: u8,
}
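The alignment arithmetic can be illustrated with a short std-only sketch. The names `padding` and `parse_aligned` are hypothetical, and the sketch assumes a non-zero alignment; it mirrors the struct above, where `b` must start at an offset that is a multiple of 4.

```rust
/// Number of bytes to skip so that `offset` becomes a multiple of `align`.
/// Assumes `align` is non-zero.
fn padding(offset: usize, align: usize) -> usize {
    (align - offset % align) % align
}

/// Parse `a: u8`, skip to the next multiple of 4 (measured from the start
/// of the original input), then parse `b: u8`.
fn parse_aligned(input: &[u8]) -> Option<(&[u8], u8, u8)> {
    let (&a, rest) = input.split_first()?;
    // Offset from the start of the original input (1 byte consumed so far).
    let offset = input.len() - rest.len();
    let rest = rest.get(padding(offset, 4)..)?;
    let (&b, rest) = rest.split_first()?;
    Some((rest, a, b))
}
```

Here `a` occupies offset 0, three padding bytes are skipped, and `b` is read at offset 4.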

Deriving parsers for `Enum`

The Nom attribute can also be used to generate parsers for Enum types. The generated parser will use a value (called the selector) to determine which variant is parsed. Named and unnamed enums are supported.

In addition to derive(Nom), a Selector attribute must be used:

on the structure, to specify the type of selector to match
on each variant, to specify the value associated with this variant.

#[derive(Nom)]
#[nom(Selector="u8")]
pub enum U1{
    #[nom(Selector="0")] Field1(u32),
    #[nom(Selector="1")] Field2(Option<u32>),
}

The generated function will look like:

impl U1{
    pub fn parse(i:&[u8], selector: u8) -> IResult<&[u8],U1> {
        match selector {
            ...
        }
    }
}
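To make the elided match arms more concrete, here is a hand-written, std-only sketch of selector-based dispatch for an enum shaped like U1. It is not the generated code: it returns Option instead of IResult, assumes big-endian integers, and the names `read_be_u32` and `parse_u1` are hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum U1 {
    Field1(u32),
    Field2(Option<u32>),
}

/// Hypothetical helper: read a big-endian u32.
fn read_be_u32(input: &[u8]) -> Option<(&[u8], u32)> {
    match input {
        [b0, b1, b2, b3, rest @ ..] => {
            Some((rest, u32::from_be_bytes([*b0, *b1, *b2, *b3])))
        }
        _ => None,
    }
}

/// Pick the variant to parse from the selector value.
fn parse_u1(input: &[u8], selector: u8) -> Option<(&[u8], U1)> {
    match selector {
        0 => {
            let (rest, v) = read_be_u32(input)?;
            Some((rest, U1::Field1(v)))
        }
        // Option fields never fail: missing data becomes None.
        1 => match read_be_u32(input) {
            Some((rest, v)) => Some((rest, U1::Field2(Some(v)))),
            None => Some((input, U1::Field2(None))),
        },
        // No variant matches this selector value.
        _ => None,
    }
}
```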

It can be called either directly (U1::parse(i, n)) or using nom (call!(U1::parse, n)).

The selector can be a primitive type (u8), or any other type implementing the PartialEq trait.

#[derive(Debug,PartialEq,Eq,Clone,Copy,Nom)]
pub struct MessageType(pub u8);

#[derive(Nom)]
#[nom(Selector="MessageType")]
pub enum U1{
    #[nom(Selector="MessageType(0)")] Field1(u32),
    #[nom(Selector="MessageType(1)")] Field2(Option<u32>),
}

// Example of call from a struct:
#[derive(Nom)]
pub struct S1{
    pub msg_type: MessageType,
    #[nom(Parse="{ |i| U1::parse(i, msg_type) }")]
    pub msg_value: U1
}

Default case

By default, if no value of the selector matches the input value, a nom error ErrorKind::Switch is raised. This can be changed by using _ as selector value for one of the variants.

#[derive(Nom)]
#[nom(Selector="u8")]
pub enum U2{
    #[nom(Selector="0")] Field1(u32),
    #[nom(Selector="_")] Field2(u32),
}

If the _ selector is not the last variant, the generated code will use it as the last match to avoid unreachable code.

Special case: specifying parsers for fields

Sometimes, an unnamed field requires a custom parser. In that case, the field (not the variant) must be annotated with attribute Parse.

Named fields:

#[derive(Nom)]
#[nom(Selector="MessageType")]
pub enum U3<'a>{
    #[nom(Selector="MessageType(0)")] Field1{a:u32},
    #[nom(Selector="MessageType(1)")] Field2{
        #[nom(Parse="take(4 as usize)")]
        a: &'a[u8]
    },
}

Unnamed fields:

#[derive(Nom)]
#[nom(Selector="MessageType")]
pub enum U3<'a>{
    #[nom(Selector="MessageType(0)")] Field1(u32),
    #[nom(Selector="MessageType(1)")] Field2(
        #[nom(Parse="take(4 as usize)")] &'a[u8]
    ),
}

Special case: fieldless enums

If the entire enum is fieldless (a list of constant integer values), a parser can be derived if

the Enum has a repr(ty) attribute, with ty an integer type
the Enum implements the Eq trait

In that case, the Selector attribute must not be specified.

#[repr(u8)]
#[derive(PartialEq,Eq,Nom)]
pub enum U3{
    A,
    B = 2,
    C
}

The generated parser will parse an element of type ty (as Big Endian), try to match to enum values, and return an instance of Enum if it succeeds (wrapped in an IResult).

For example, U3::parse(b"\x02") will return Ok((&b""[..],U3::B)).
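The matching step can be sketched by hand with the standard library only. The function `parse_u3` is hypothetical and returns Option rather than IResult; the enum mirrors U3 above, whose discriminants are A = 0, B = 2 and C = 3.

```rust
#[repr(u8)]
#[derive(Debug, PartialEq, Eq)]
enum U3 {
    A,
    B = 2,
    C,
}

/// Read one byte and map it to the variant with the same discriminant.
fn parse_u3(input: &[u8]) -> Option<(&[u8], U3)> {
    let (&byte, rest) = input.split_first()?;
    let value = match byte {
        x if x == U3::A as u8 => U3::A,
        x if x == U3::B as u8 => U3::B,
        x if x == U3::C as u8 => U3::C,
        // The byte does not correspond to any variant.
        _ => return None,
    };
    Some((rest, value))
}
```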

Input Name

Internally, the parser will use a variable to follow the input. By default, this variable is named i.

This can cause problems, for example, if one field of the structure has the same name.

The internal variable name can be renamed using the InputName top-level attribute.

#[derive(Nom)]
#[nom(InputName(aaa))]
pub struct S {
    pub i: u8,
}

Note that this variable can be used as usual, for example to peek data without advancing in the current stream, to determine the number of remaining bytes, etc.

#[derive(Nom)]
#[nom(InputName(i))]
pub struct S {
    pub a: u8,
    #[nom(Value(i.len()))]
    pub remaining_len: usize,
}

This can create side-effects: if you create a variable with the same name as the input, it will shadow it. While this is generally an error, it can sometimes be useful.

For example, to skip 2 bytes of input:

#[derive(Nom)]
#[nom(InputName(i))]
pub struct S {
    pub a: u8,
    // skip 2 bytes
    // XXX this will panic if input is smaller than 2 bytes at this point
    #[nom(PreExec(let i = &i[2..];))]
    pub b: u8,
}
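The panic comes from indexing with `&i[2..]` on a slice that is too short. As a point of comparison, the standard library's `get` returns an Option instead of panicking. A small sketch (the helper name `skip_bytes` is hypothetical and not part of nom-derive):

```rust
/// Skip `n` bytes, returning None instead of panicking when the
/// input holds fewer than `n` bytes.
fn skip_bytes(input: &[u8], n: usize) -> Option<&[u8]> {
    input.get(n..)
}
```

How the None case is turned into a parser error inside a PreExec statement is left open here, since that depends on the surrounding generated code.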

Limitations

Except if the entire enum is fieldless (a list of constant integer values), unit fields are not supported.

Debug

Errors in generated parsers may be hard to understand and debug.

The Debug attribute inserts calls to nom's dbg_dmp function, which will print an error message and the input if the parser fails. This attribute can be applied either to fields or at top-level (all sub-parsers will be wrapped).

This helps resolve parse errors (at runtime).

#[derive(Nom)]
pub struct S {
    pub a: u32,
    #[nom(Debug)]
    pub b: u64,
}

DebugDerive

The DebugDerive attribute, if applied to top-level, makes the generator print the generated code to stderr.

This helps resolve compiler errors.

#[derive(Nom)]
#[nom(DebugDerive)]
pub struct S {
    pub a: u32,
}