Expand description
§Spindle
Spindle is a simple and efficient expression and byte sequence generator to aid fuzz testing parsers and de-serializers. Spindle spins raw, untyped byte buffers into structured data.
§Overview
Spindle’s syntax lets users define the structure of generated data. This syntax compiles to Grammar, a state machine that can be arbitrarily traversed to produce structure-aware, matching expressions.
Spindle works with fuzzers such as cargo-fuzz or AFL because it is an extension of arbitrary; the traversal of the state machine is deterministically dependent on Unstructured.
Spindle is particularly useful for generating semi-correct and interesting inputs that attack edge cases of parsers and de-serializers, such as mixing familiar tokens in incorrect places or sprinkling in Unicode characters.
Spindle is developed and leveraged by AWS to fuzz test the parsers and de-serializers in their backend systems.
§Examples
For more examples, see the examples folder.
use spindle_lib::Grammar;
use arbitrary::Unstructured;
let math: Grammar = r#"
expr : u16 | paren | expr symbol expr ;
paren : "(" expr symbol expr ")" ;
symbol : r"-|\+|\*|÷" ;
"#.parse().unwrap();
let mut wool = Unstructured::new(b"poiuytasdbvcxeygrey");
let yarn: String = math.expression(&mut wool, None).unwrap();
// (21359*39933))+13082-62216The state machine traversal always starts at the first rule. In the example,
expris the first rule and evaluates to eitheru16,paren, or the concatenation ofexprandsymbolandexpr.;delimits different rules.u16is a pre-defined rule that directly evaluates tou16::arbitrary(u).parenevaluates to the concatenation of the literal"(",expr,symbol,expr, and")".symbolevaluates to any arbitrary string that matches the regex-|\+|\*|÷.
§Semi-Correct Expression
This grammar is similar to the well formed math expression grammar, but sometimes includes an extra closing parenthesis and/or an arbitrary symbol.
use spindle_lib::Grammar;
use arbitrary::Unstructured;
let math: Grammar = r#"
expr : u16 | paren | expr symbol expr ;
paren : "(" expr symbol expr ")" ")"? ;
symbol : r"-|\+|\*|÷" | String ;
"#.parse().unwrap();
let mut wool = Unstructured::new(b"poiuytasdbvcxeygrey");
let yarn: String = math.expression(&mut wool, None).unwrap();
// (44637*32200)Ѱ'x124928390-27338)§Usage in Fuzzer
use spindle_lib::Grammar;
use libfuzzer_sys::fuzz_target;
use arbitrary::{Arbitrary, Result, Unstructured};
use std::sync::LazyLock;
static GRAMMAR: LazyLock<Grammar> = LazyLock::new(|| {
r#"
expr : u16 | paren | expr symbol expr ;
paren : "(" expr symbol expr ")" ")"? ;
symbol : r"-|\+|\*|÷" | String ;
"#.parse().unwrap()
});
struct MathExpression(String);
impl<'a> Arbitrary<'a> for MathExpression {
fn arbitrary(u: &mut Unstructured<'a>) -> Result<Self> {
Ok(Self(GRAMMAR.expression(u, None)?))
}
}
fuzz_target!(|expr: MathExpression| {
// ... my_parser(expr);
});§Samples
6705d81051237=
♣69382149-12901+8851÷50*3993043534
(8198942155÷60177552446447)
(586643-96)*036074789
(8÷68){K2628
(5798))
(0868430}ݾ▼73)
0135259
(930-6*9502)
5045620÷91599§Grammar Syntax
For examples see examples.
| Syntax | Description |
|---|---|
rule : X ; | Defines a rule with name “rule” with some pattern X. “rule” can be referenced in the same grammar, e.g. another_rule : rule+ ; |
X? | Evaluates to either X or nothing. |
X+ | Evaluates to X 1 or more times (up to and including crate::MAX_REPEAT) |
X* | Evaluates to X 0 or more times (up to and including crate::MAX_REPEAT) |
X{k} | Evaluates to X exactly k times, where k is a u32. |
X{min,max} | Evaluates X at least min times and at most (including) max times. min and max are u32. |
X | Y | Evaluates to either X or Y. |
"X" | Literal value inside the quotes, e.g. "foo" |
[X] | Literal Vec<u8>, e.g. [1, 2]. |
r"X" | Arbitrarily evaluates the regex inside the quotes, e.g. r"[A-Z]+". |
X Y | Evaluates to X and then Y. |
(X) | Groups the expression inside the parenthesis, e.g. (X | Y)+. |
u16, String, etc | A pre-defined type that evaluates to T::arbitrary(u). See more. Supported pre-defined rules are String, char, f32, f64, and signed + unsigned integer types. |
§Visitor
A Visitor is state that is initialized before traversal and mutated as different rules are visited during the traversal, e.g. visit_or. Visitors that are already implemented are String and Vec<u8> for output buffers, and u64 for classification.
Users can implement their own Visitor to
- use a different output buffer
- use a different classification
- gather data
- build an abstract syntax tree
§Example
use spindle_lib::{Grammar, Visitor};
let math: Grammar = r#"
expr : u16 | paren | expr symbol expr ;
paren : "(" expr symbol expr ")" ;
symbol : r"-|\+|\*|÷" ;
"#.parse().unwrap();
/// Detects if any prime numbers were generated.
#[derive(Debug, Default)]
struct PrimeDetector(bool);
impl Visitor for PrimeDetector {
fn new() -> Self {
Self::default()
}
fn visit_u16(&mut self, num: u16) {
let num_is_prime = (2..num).all(|a| num % a != 0);
self.0 |= num_is_prime;
}
}
let mut wool = arbitrary::Unstructured::new(b"qwerty");
let (expr, any_primes): (String, PrimeDetector) = math.expression(&mut wool, None).unwrap();
let yarn: String = math.expression(&mut wool, None).unwrap();
assert!(any_primes.0);§Example
In this example, a math specific abstract syntax tree (AST) is built during the arbitrary traversal.
use spindle_lib::{Grammar, Visitor};
let math: Grammar = r#"
expr : u16 | paren | expr symbol expr ;
paren : "(" expr symbol expr ")" ;
symbol : r"-|\+|\*|÷" ;
"#.parse().unwrap();
#[derive(Debug, Default)]
struct MathAst {
cur_op: Option<char>,
stack: Vec<MathAstNode>,
}
#[derive(Debug)]
enum MathAstNode {
Num(u16),
Expr(Box<MathAstNode>, char, Box<MathAstNode>)
}
impl Visitor for MathAst {
fn new() -> Self {
Self::default()
}
fn visit_regex(&mut self, regex_output: &[u8]) {
assert_eq!(regex_output.len(), 1);
let c = regex_output[0] as char;
self.cur_op = Some(c);
}
fn visit_u16(&mut self, num: u16) {
match self.cur_op {
None => self.stack.push(MathAstNode::Num(num)),
Some(symbol) => {
let left = Box::new(self.stack.pop().unwrap());
let right = Box::new(MathAstNode::Num(num));
let new = MathAstNode::Expr(left, symbol, right);
self.stack.push(new);
self.cur_op = None;
}
}
}
}
let mut wool = arbitrary::Unstructured::new(b"494392");
// MathAst { cur_op: None, stack: [Expr(Num(13108), '*', Num(0))] }
let yarn: MathAst = math.expression(&mut wool, None).unwrap();Structs§
- Error
- The type of error that can occur when parsing a grammar.
- Grammar
- A state machine that produces
Arbitrarymatching expressions or byte sequences fromUnstructured.
Constants§
- MAX_
REPEAT - The maxium repititions in
+and*rules. This can be overriden with explicit range rules, e.g."3"{0,2345678}repeats up to 2345678"3"s.
Traits§
- Visitor
- Defines state that is built during
crate::Grammar::expression.