# nom, eating data byte by byte
nom is a parser combinators library written in Rust. Its goal is to provide tools to build safe parsers without compromising the speed or memory consumption. To that end, it makes extensive use of Rust's strong typing, zero-copy parsing, push streaming and pull streaming, and provides macros and traits to abstract most of the error-prone plumbing.
This work is currently experimental: the API and syntax may still change a lot, but it can already be used to parse relatively complex formats like MP4.
## Features
Here are the current and planned features, with their status:
- byte oriented: the basic type is `&[u8]` and parsers will work as much as possible on byte array slices
- zero copy:
  - in the parsers: a parsing chain will almost always return a slice of its input data
  - in the producers and consumers: some copying still happens
- streaming:
  - push: a parser can be directly fed a producer's output and called every time there is data available
  - pull: a consumer will take a parser and a producer, and handle all the data gathering and, if available, seeking in the stream
- macro-based syntax: easier parser building through macro usage
- state machine handling: consumers provide a basic way of managing state machines
- safe parsing: while I have some confidence in Rust's abilities, this will be put to the test via extensive fuzzing and disassembling
- descriptive errors: currently, errors are just integers, but they should express what went wrong
Reference documentation is available here.
Some benchmarks are available on GitHub.
## Installation
nom is available on crates.io and can be included in your Cargo-enabled project like this:
```toml
[dependencies]
nom = "~0.2.0"
```
Then include it in your code like this (`#[macro_use]` imports nom's macros):
```rust
#[macro_use]
extern crate nom;
```
While it is not mandatory to use the macros, they make it a lot easier to build parsers with nom.
## Usage
### Parser combinators
nom uses parser combinators to build and reuse parsers. To work with parser combinators, instead of writing the whole grammar from scratch and generating a parser, or writing the complete state machine by hand, you write small, reusable functions that you combine to build more interesting parsers.
This has a few advantages:
- the parsers are small and easy to write
- the parsers are easy to reuse (if they're general enough, please add them to nom!)
- the parsers are easy to test (unit tests and property-based tests)
- the parser combination code looks close to the grammar you would have written
- you can build partial parsers, specific to the data you need at the moment, and ignore the rest
Here is an example of one such parser:
This function takes a byte array as input, and tries to consume 4 bytes.
A parser combinator in Rust is basically a function which, for an input type `I` and an output type `O`, has the following signature:
```rust
fn parser(input: I) -> IResult<I, O>;
```
`IResult` is an enumeration that can represent:
- a correct result `Done(I,O)`, with the first element being the rest of the input (not parsed yet), and the second being the output value
- an error `Error(Err)`, with `Err` being an integer
- an `Incomplete(u32)`, indicating that more input is necessary (for now the value is ignored, but it should indicate how much is needed)
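To make the signature and the three cases concrete, here is a standalone sketch that does not use nom at all: it re-declares an `IResult` enum following the description above (nom's actual definition may differ, for example in its error type) and a hypothetical `take4` function that, like the example parser, tries to consume 4 bytes. The output is a slice of the input, so nothing is copied.

```rust
// Standalone model of the IResult described above (not nom's own definition).
#[derive(Debug, PartialEq)]
enum IResult<I, O> {
    /// (remaining input, output value)
    Done(I, O),
    /// error code, an integer as described above
    Error(u32),
    /// more input is needed; the value hints at how much
    Incomplete(u32),
}

/// Hypothetical parser: consumes exactly 4 bytes and returns them
/// as a sub-slice of the input (zero copy).
fn take4(input: &[u8]) -> IResult<&[u8], &[u8]> {
    if input.len() < 4 {
        // Not enough data yet: report how many bytes are missing.
        IResult::Incomplete((4 - input.len()) as u32)
    } else {
        let (out, rest) = input.split_at(4);
        IResult::Done(rest, out)
    }
}
```

Calling `take4(b"abcdef")` yields `Done` with `"ef"` as the remaining input and `"abcd"` as the output, while `take4(b"ab")` asks for 2 more bytes.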
There is already a large list of parsers available, like:
- **length_value**: a byte indicating the size of the following buffer
- **not_line_ending**: returning as much data as possible until a line ending is found
- **line_ending**: matches a line ending
- **alpha**: will return the longest alphabetical array from the beginning of the input
- **digit**: will return the longest numerical array from the beginning of the input
- **alphanumeric**: will return the longest alphanumeric array from the beginning of the input
- **space**: will return the longest array containing only spaces
- **multispace**: will return the longest array containing spaces, `\r` or `\n`
- **be_u8**, **be_u16**, **be_u32**, **be_u64** to parse big endian unsigned integers of multiple sizes
- **be_f32**, **be_f64** to parse big endian floating point numbers
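To illustrate the kind of work two of these parsers do, here is a plain-Rust sketch of `digit` and `be_u16` written against a locally declared `IResult`. It is not nom's implementation: returning `Error(0)` when no digit is found, and treating the end of the input as the end of the digit run, are assumptions of this sketch.

```rust
// Local model of IResult (not nom's own definition).
#[derive(Debug, PartialEq)]
enum IResult<I, O> {
    Done(I, O),
    Error(u32),
    Incomplete(u32),
}

/// Longest prefix of ASCII digits. Assumption: no digit at all is an
/// error with code 0, and the end of input simply ends the match.
fn digit(input: &[u8]) -> IResult<&[u8], &[u8]> {
    let n = input.iter().take_while(|b| b.is_ascii_digit()).count();
    if n == 0 {
        IResult::Error(0)
    } else {
        IResult::Done(&input[n..], &input[..n])
    }
}

/// Big endian u16: the first byte is the most significant one.
fn be_u16(input: &[u8]) -> IResult<&[u8], u16> {
    if input.len() < 2 {
        return IResult::Incomplete((2 - input.len()) as u32);
    }
    let value = u16::from_be_bytes([input[0], input[1]]);
    IResult::Done(&input[2..], value)
}
```

For example, `be_u16(b"\x01\x02\xff")` reads `0x0102` and leaves `"\xff"` as the remaining input.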
#### Making new parsers with macros
Macros are the main way to make new parsers by combining other ones. Those macros accept other macros or function names as arguments. You then need to make a function out of that combinator with **named!**, or a closure with **closure!**. Here is how you would do it with the **tag!** and **take!** combinators:
```rust
// the parser names below are only illustrative
named!(abcd_parser, tag!("abcd")); // will consume bytes if the input begins with "abcd"
named!(take_10, take!(10)); // will consume 10 bytes of input
```
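Conceptually, each of these macro invocations produces an ordinary function from `&[u8]` to `IResult`. The sketch below writes the equivalent plain Rust functions by hand, without nom: `tag`, `take`, `abcd_parser` and `take_10` are hypothetical stand-ins, and the choice to return `Incomplete` when the input is a strict prefix of the tag (and `Error(0)` on a mismatch) is an assumption about the intended behaviour.

```rust
// Local model of IResult (not nom's own definition).
#[derive(Debug, PartialEq)]
enum IResult<I, O> {
    Done(I, O),
    Error(u32),
    Incomplete(u32),
}

/// Matches the literal `t` at the start of `input`.
fn tag<'a>(input: &'a [u8], t: &[u8]) -> IResult<&'a [u8], &'a [u8]> {
    let n = input.len().min(t.len());
    if input[..n] != t[..n] {
        IResult::Error(0) // mismatch
    } else if n < t.len() {
        // Input so far is a prefix of the tag: wait for more data.
        IResult::Incomplete((t.len() - n) as u32)
    } else {
        IResult::Done(&input[n..], &input[..n])
    }
}

/// Takes exactly `n` bytes.
fn take(input: &[u8], n: usize) -> IResult<&[u8], &[u8]> {
    if input.len() < n {
        IResult::Incomplete((n - input.len()) as u32)
    } else {
        IResult::Done(&input[n..], &input[..n])
    }
}

// Hand-written equivalents of the two named parsers.
fn abcd_parser(input: &[u8]) -> IResult<&[u8], &[u8]> {
    tag(input, b"abcd")
}

fn take_10(input: &[u8]) -> IResult<&[u8], &[u8]> {
    take(input, 10)
}
```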
The **named!** macro can take three different syntaxes:
```rust
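named!(my_function( &[u8] ) -> &[u8], tag!("abcd"));
named!(my_function<&[u8], &[u8]>, tag!("abcd"));
named!(my_function, tag!("abcd")); // when you know the parser takes &[u8] as input, and returns &[u8] as output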
```
Here are the basic macros available:
- **tag!**: will match the byte array provided as argument
- **is_not!**: will match the longest array not containing any of the bytes of the array provided to the macro
- **is_a!**: will match the longest array containing only bytes of the array provided to the macro
- **filter!**: will walk the whole array and apply the closure to each suffix until the function fails
- **take!**: will take as many bytes as the number provided
- **take_until!**: will take as many bytes as possible until it encounters the provided byte array, and will skip it
- **take_until_and_leave!**: will take as many bytes as possible until it encounters the provided byte array, and will leave it in the remaining input
- **take_until_either!**: will take as many bytes as possible until it encounters one of the bytes of the provided array, and will skip it
- **take_until_either_and_leave!**: will take as many bytes as possible until it encounters one of the bytes of the provided array, and will leave it in the remaining input
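The difference between skipping and leaving the delimiter can be sketched in plain Rust. These `take_until` and `take_until_and_leave` functions are hypothetical stand-ins for the macros, not nom's code; returning `Incomplete(0)` when the delimiter is not found (since it may still arrive in a later chunk) is an assumption.

```rust
// Local model of IResult (not nom's own definition).
#[derive(Debug, PartialEq)]
enum IResult<I, O> {
    Done(I, O),
    Error(u32),
    Incomplete(u32),
}

/// Index of the first occurrence of `pat` in `input`, if any.
fn find(input: &[u8], pat: &[u8]) -> Option<usize> {
    if pat.is_empty() {
        return Some(0); // `windows(0)` would panic
    }
    input.windows(pat.len()).position(|w| w == pat)
}

/// Output: bytes before `pat`. Remaining input starts after `pat` (skipped).
fn take_until<'a>(input: &'a [u8], pat: &[u8]) -> IResult<&'a [u8], &'a [u8]> {
    match find(input, pat) {
        Some(i) => IResult::Done(&input[i + pat.len()..], &input[..i]),
        None => IResult::Incomplete(0), // assumption: delimiter may come later
    }
}

/// Output: bytes before `pat`. Remaining input still starts with `pat`.
fn take_until_and_leave<'a>(input: &'a [u8], pat: &[u8]) -> IResult<&'a [u8], &'a [u8]> {
    match find(input, pat) {
        Some(i) => IResult::Done(&input[i..], &input[..i]),
        None => IResult::Incomplete(0),
    }
}
```

On `b"key=value"` with delimiter `b"="`, both return `"key"`, but the first leaves `"value"` and the second leaves `"=value"`.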
#### Combining parsers
`IResult` implements a few traits that make it easy to combine parsers. Here are their definitions:
```rust
```
- **map**: applies a function to the output of an `IResult` and puts the result in the output of an `IResult` with the same remaining input
- **flat_map**: applies a parser to the output of an `IResult` and returns a new `IResult` with the same remaining input
- **map_opt**: applies a function returning an `Option` to the output of an `IResult`, and returns `Done` if the result is `Some`, or `Error` otherwise
- **map_res**: applies a function returning a `Result` to the output of an `IResult`, and returns `Done` if the result is `Ok`, or `Error` otherwise
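Here is a minimal sketch of how `map` and `map_opt` behave, written as inherent methods on a locally declared `IResult` instead of nom's traits. The method bodies, the `digits` and `small_number` helpers, and the use of error code 0 are assumptions made for illustration.

```rust
// Local model of IResult (not nom's own definition).
#[derive(Debug, PartialEq)]
enum IResult<I, O> {
    Done(I, O),
    Error(u32),
    Incomplete(u32),
}

impl<I, O> IResult<I, O> {
    /// Transforms the output, keeping the same remaining input.
    fn map<N, F: FnOnce(O) -> N>(self, f: F) -> IResult<I, N> {
        match self {
            IResult::Done(rest, o) => IResult::Done(rest, f(o)),
            IResult::Error(e) => IResult::Error(e),
            IResult::Incomplete(n) => IResult::Incomplete(n),
        }
    }

    /// Like `map`, but a `None` from `f` turns the result into an error.
    fn map_opt<N, F: FnOnce(O) -> Option<N>>(self, f: F) -> IResult<I, N> {
        match self {
            IResult::Done(rest, o) => match f(o) {
                Some(n) => IResult::Done(rest, n),
                None => IResult::Error(0), // assumption: error code 0
            },
            IResult::Error(e) => IResult::Error(e),
            IResult::Incomplete(n) => IResult::Incomplete(n),
        }
    }
}

/// Hypothetical helper: takes the (possibly empty) run of leading digits.
fn digits(input: &[u8]) -> IResult<&[u8], &[u8]> {
    let n = input.iter().take_while(|b| b.is_ascii_digit()).count();
    IResult::Done(&input[n..], &input[..n])
}

/// Leading digits parsed as a u8; fails if there are none or on overflow.
fn small_number(input: &[u8]) -> IResult<&[u8], u8> {
    digits(input).map_opt(|d| std::str::from_utf8(d).ok().and_then(|s| s.parse::<u8>().ok()))
}
```

A `map_res` method would have the same shape as `map_opt`, matching on `Ok`/`Err` instead of `Some`/`None`.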
#### Combining parsers with macros
Here again, we use macros to combine parsers easily in useful patterns:
```rust
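named!(abcd_p, tag!("abcd"));
// the inputs below are only illustrative
assert_eq!(abcd_p(&b"abcdab"[..]), Done(&b"ab"[..], &b"abcd"[..]));
assert_eq!(abcd_p(&b"abcd"[..]), Done(&b""[..], &b"abcd"[..]));
assert_eq!(abcd_p(&b"abcdefgh"[..]), Done(&b"efgh"[..], &b"abcd"[..]));
// make the abcd_p parser optional
named!(abcd_opt<&[u8], Option<&[u8]> >, opt!(abcd_p));
assert_eq!(abcd_opt(&b"abcdxxx"[..]), Done(&b"xxx"[..], Some(&b"abcd"[..])));
assert_eq!(abcd_opt(&b"efghxxx"[..]), Done(&b"efghxxx"[..], None));
// the abcd_p parser can happen 0 or more times
named!(multi<&[u8], Vec<&[u8]> >, many0!(abcd_p));
let a = b"abcdef";
let b = b"abcdabcdef";
let c = b"azerty";
assert_eq!(multi(&a[..]), Done(&b"ef"[..], vec![&b"abcd"[..]]));
assert_eq!(multi(&b[..]), Done(&b"ef"[..], vec![&b"abcd"[..], &b"abcd"[..]]));
assert_eq!(multi(&c[..]), Done(&b"azerty"[..], Vec::new()));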
```
Here are the basic combining macros available:
- **opt!**: will make the parser optional
- **many0!**: will apply the parser 0 or more times
- **many1!**: will apply the parser 1 or more times
- **fold0!**: takes an assembling macro and a parser, and will fold the macro on many0 of the provided parser
- **fold1!**: takes an assembling macro and a parser, and will fold the macro on many1 of the provided parser
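The behaviour of `opt!`, `many0!` and `many1!` can be sketched as plain generic functions that take a parser as an argument. This is not how nom implements them; the names, the local `IResult`, and the choices to propagate `Incomplete`, stop on `Error`, and stop when the parser consumes nothing are assumptions of this sketch.

```rust
// Local model of IResult (not nom's own definition).
#[derive(Debug, PartialEq)]
enum IResult<I, O> {
    Done(I, O),
    Error(u32),
    Incomplete(u32),
}

fn tag<'a>(input: &'a [u8], t: &[u8]) -> IResult<&'a [u8], &'a [u8]> {
    if input.starts_with(t) {
        IResult::Done(&input[t.len()..], &input[..t.len()])
    } else if t.starts_with(input) {
        IResult::Incomplete((t.len() - input.len()) as u32)
    } else {
        IResult::Error(0)
    }
}

fn abcd_p(input: &[u8]) -> IResult<&[u8], &[u8]> {
    tag(input, b"abcd")
}

/// Optional parser: an error yields `None` and consumes nothing.
fn opt<'a, O, F>(input: &'a [u8], parser: F) -> IResult<&'a [u8], Option<O>>
where
    F: Fn(&'a [u8]) -> IResult<&'a [u8], O>,
{
    match parser(input) {
        IResult::Done(rest, o) => IResult::Done(rest, Some(o)),
        IResult::Error(_) => IResult::Done(input, None),
        IResult::Incomplete(n) => IResult::Incomplete(n),
    }
}

/// Applies `parser` 0 or more times, collecting its outputs.
fn many0<'a, O, F>(mut input: &'a [u8], parser: F) -> IResult<&'a [u8], Vec<O>>
where
    F: Fn(&'a [u8]) -> IResult<&'a [u8], O>,
{
    let mut out = Vec::new();
    loop {
        match parser(input) {
            // Nothing consumed: stop, otherwise we would loop forever.
            IResult::Done(rest, _) if rest.len() == input.len() => break,
            IResult::Done(rest, o) => {
                out.push(o);
                input = rest;
            }
            IResult::Error(_) => break,
            IResult::Incomplete(n) => return IResult::Incomplete(n),
        }
    }
    IResult::Done(input, out)
}

/// Like `many0`, but at least one success is required.
fn many1<'a, O, F>(input: &'a [u8], parser: F) -> IResult<&'a [u8], Vec<O>>
where
    F: Fn(&'a [u8]) -> IResult<&'a [u8], O>,
{
    match many0(input, parser) {
        IResult::Done(rest, v) => {
            if v.is_empty() {
                IResult::Error(0)
            } else {
                IResult::Done(rest, v)
            }
        }
        other => other,
    }
}
```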
There are more complex parsers like **chain!**, which is used to parse a whole buffer, gather data along the way, then assemble everything in a final closure, if none of the subparsers failed or returned an `Incomplete`:
```rust
;
;
named!;
let r = f;
assert_eq!;
let r2 = f;
assert_eq!;
```
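As a rough illustration of what such a chain does, here is a plain-Rust sketch (not nom's `chain!` syntax) that parses `"abcd"`, an optional second `"abcd"`, a byte `a`, `"efgh"` and a byte `b`, and only builds the final struct once every step has succeeded. The local `IResult`, the `step!` helper macro, the `A` struct and the sequence itself are hypothetical.

```rust
// Local model of IResult (not nom's own definition).
#[derive(Debug, PartialEq)]
enum IResult<I, O> {
    Done(I, O),
    Error(u32),
    Incomplete(u32),
}

// Hypothetical helper: unwraps a Done, or returns early with Error/Incomplete.
macro_rules! step {
    ($e:expr) => {
        match $e {
            IResult::Done(rest, o) => (rest, o),
            IResult::Error(e) => return IResult::Error(e),
            IResult::Incomplete(n) => return IResult::Incomplete(n),
        }
    };
}

#[derive(Debug, PartialEq)]
struct A {
    a: u8,
    b: u8,
}

fn tag<'a>(input: &'a [u8], t: &[u8]) -> IResult<&'a [u8], &'a [u8]> {
    if input.starts_with(t) {
        IResult::Done(&input[t.len()..], &input[..t.len()])
    } else if t.starts_with(input) {
        IResult::Incomplete((t.len() - input.len()) as u32)
    } else {
        IResult::Error(0)
    }
}

fn byte(input: &[u8]) -> IResult<&[u8], u8> {
    match input.first() {
        Some(&b) => IResult::Done(&input[1..], b),
        None => IResult::Incomplete(1),
    }
}

fn f(input: &[u8]) -> IResult<&[u8], A> {
    let (i, _) = step!(tag(input, b"abcd"));
    // Optional step: on error, keep the input unchanged.
    let i = match tag(i, b"abcd") {
        IResult::Done(rest, _) => rest,
        IResult::Error(_) => i,
        IResult::Incomplete(n) => return IResult::Incomplete(n),
    };
    let (i, a) = step!(byte(i));
    let (i, _) = step!(tag(i, b"efgh"));
    let (i, b) = step!(byte(i));
    // Every step succeeded: assemble the final value.
    IResult::Done(i, A { a, b })
}
```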
More examples of chain usage can be found in the .
### Producers
While parser combinators alone are useful, you often need to handle the plumbing to feed them with data from a file, a network connection or a memory buffer. In nom, you can use producers to abstract those data accesses. A `Producer` has to implement the following trait:
```rust
use std::io::SeekFrom;
```
nom currently provides `FileProducer` and `MemProducer`. Network and channel producers will soon be implemented. To use them, see the following code:
```rust
use ;
new.map
```
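The idea behind a memory producer, and why a consumer has to gather data, can be sketched without nom. The `ByteProducer` trait, the `Chunk` enum, the `ChunkedMem` struct and the `count_lines` consumer below are all hypothetical and do not reflect nom's actual `Producer` trait; they only show a buffer handed out in fixed-size chunks, and a pull-style loop that keeps the partial tail of a line until the next chunk completes it.

```rust
/// What a producer hands back on each call (hypothetical type).
#[derive(Debug, PartialEq)]
enum Chunk<'a> {
    Data(&'a [u8]),
    Eof,
}

/// Hypothetical minimal producer trait: yields the next chunk of input.
trait ByteProducer {
    fn produce<'b>(&'b mut self) -> Chunk<'b>;
}

/// Hands out an in-memory buffer in chunks of at most `chunk` bytes.
struct ChunkedMem<'a> {
    buf: &'a [u8],
    chunk: usize,
    pos: usize,
}

impl<'a> ByteProducer for ChunkedMem<'a> {
    fn produce<'b>(&'b mut self) -> Chunk<'b> {
        if self.pos >= self.buf.len() {
            return Chunk::Eof;
        }
        let end = (self.pos + self.chunk).min(self.buf.len());
        let buf = self.buf;
        self.pos = end;
        Chunk::Data(&buf[end - (end - self.pos.min(end)).max(0) - (end - (end))..end][..])
    }
}
```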