# proc-macro-regex
A procedural macro library for matching an arbitrary string or byte array against a regular expression.
## Usage
Add this to your `Cargo.toml`:
```toml
[dependencies]
proc-macro-regex = "~1.1.0"
```
## Example
The macro `regex!` creates a function of the given name which takes a string or byte array and
returns `true` if the argument matches the regex, otherwise `false`.
```rust
use proc_macro_regex::regex;

/// Create the function with the signature:
/// fn regex_email(s: &str) -> bool;
regex!(regex_email "^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+$");

fn main() {
    println!("Returns true == {}", regex_email("example@example.org"));
    println!("Returns false == {}", regex_email("example.example.org"));
}
```
The regex is interpreted the same way as in the [regex](https://crates.io/crates/regex) crate. If
the regex starts with `^` and ends with `$`, the whole string must match; otherwise, the function
checks whether the string contains a match for the regex.
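
For a pattern made only of literal characters, the two modes reduce to familiar string operations. The sketch below uses the hypothetical functions `anchored` and `unanchored` to stand in for what `regex!(anchored "^abc$")` and `regex!(unanchored "abc")` would return; it uses plain `==` and `str::contains` instead of the macro, so it runs without the crate.

```rust
/// Hypothetical stand-in for `regex!(anchored "^abc$")`: with both `^` and `$`,
/// the whole input must match, which for a literal pattern is plain equality.
fn anchored(s: &str) -> bool {
    s == "abc"
}

/// Hypothetical stand-in for `regex!(unanchored "abc")`: without anchors,
/// the function reports whether the pattern occurs anywhere in the input.
fn unanchored(s: &str) -> bool {
    s.contains("abc")
}

fn main() {
    for input in ["abc", "xabcx", "ab"] {
        println!(
            "{:>6}: anchored = {}, unanchored = {}",
            input,
            anchored(input),
            unanchored(input)
        );
    }
}
```

Only the exact input `abc` satisfies the anchored form, while `xabcx` also satisfies the unanchored one.
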
## How it works
The macro builds a *deterministic finite automaton* (DFA) that processes the given input.
Depending on the size of the DFA and the characters in the regex, either a lookup table or a
code-based implementation (binary search) is generated. If the lookup table would be larger than
65536 bytes (this limit can be changed), the code-based implementation is used. The code-based
implementation is also used if the regex contains any Unicode (non-ASCII) character.
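
The table-driven approach can be sketched in plain Rust. This is a minimal sketch, assuming the unanchored pattern `abc` and a hand-built table; `build_table`, `contains_abc`, and `ACCEPT` are hypothetical names, and the macro writes its table out as a `static` at compile time rather than building it at runtime.

```rust
/// Sentinel "accept" state, mirroring the `u8::MAX` check in the generated code.
const ACCEPT: u8 = u8::MAX;

/// Builds a 3-state transition table for the unanchored pattern "abc".
/// State 0: nothing matched, 1: saw "a", 2: saw "ab".
/// Hypothetical helper: the macro emits such a table as a `static` instead.
fn build_table() -> [[u8; 256]; 3] {
    // Default for every (state, byte) pair: fall back to the start state.
    let mut table = [[0u8; 256]; 3];
    for state in table.iter_mut() {
        // An 'a' always begins a new candidate match, whatever came before.
        state[b'a' as usize] = 1;
    }
    table[1][b'b' as usize] = 2;
    table[2][b'c' as usize] = ACCEPT;
    table
}

/// Runs the DFA one byte at a time: exactly one array lookup per input byte.
fn contains_abc(table: &[[u8; 256]; 3], s: &str) -> bool {
    let mut state = 0u8;
    for c in s.bytes() {
        state = table[state as usize][c as usize];
        if state == ACCEPT {
            // Early exit as soon as "abc" has been seen.
            return true;
        }
    }
    false
}

fn main() {
    let table = build_table();
    println!("{}", contains_abc(&table, "xxabcxx"));
    println!("{}", contains_abc(&table, "aabbcc"));
}
```

The table has 3 × 256 = 768 one-byte entries, so its size grows by 256 bytes for every additional DFA state.
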
For example, the following macro invocation:
```rust
regex!(example_1 "abc");
```
Generates:
```rust
fn example_1(s: &str) -> bool {
    static TABLE: [[u8; 256]; 3usize] = [ ... ];
    let mut state = 0;
    for c in s.bytes() {
        state = TABLE[state as usize][c as usize];
        if state == u8::MAX {
            return true;
        }
    }
    false
}
```
An optional third argument sets the maximum size of the lookup table in bytes. Here the limit is
256 bytes, which is smaller than the 768-byte table (3 states × 256 bytes) of `example_1`, so a
code-based implementation (binary search) of the DFA is generated instead.
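
The size check behind the third argument is simple arithmetic. The sketch below assumes one byte per table entry, as in the `[[u8; 256]; 3usize]` table of `example_1`; `Backend`, `table_size`, and `choose_backend` are hypothetical names, not part of the crate's API.

```rust
/// Which implementation a DFA would get (hypothetical names for illustration).
#[derive(Debug, PartialEq)]
enum Backend {
    LookupTable,
    BinarySearch,
}

/// Size in bytes of a `[[u8; 256]; states]` table, assuming one byte per entry.
fn table_size(states: usize) -> usize {
    states * 256 * std::mem::size_of::<u8>()
}

/// Hypothetical decision rule: use code instead of a table when the table
/// would exceed `limit` bytes or the pattern contains non-ASCII characters.
fn choose_backend(states: usize, limit: usize, has_non_ascii: bool) -> Backend {
    if has_non_ascii || table_size(states) > limit {
        Backend::BinarySearch
    } else {
        Backend::LookupTable
    }
}

fn main() {
    // "abc" needs 3 states: 3 * 256 = 768 bytes.
    println!("{:?}", choose_backend(3, 65536, false)); // prints LookupTable
    println!("{:?}", choose_backend(3, 256, false)); // prints BinarySearch
}
```
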
```rust
regex!(example_2 "abc" 256);
```
Generates:
```rust
fn example_2(s: &str) -> bool {
    let mut state = 0;
    for c in s.bytes() {
        state = if state < 1usize {
            match c {
                97u8 => 1usize,
                _ => 0usize,
            }
        } else {
            if state == 1usize {
                match c {
                    97u8 => 1usize,
                    98u8 => 2usize,
                    _ => 0usize,
                }
            } else {
                match c {
                    97u8 => 1usize,
                    99u8 => return true,
                    _ => 0usize,
                }
            }
        };
    }
    false
}
```
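
The generated code above branches on the state with nested `if` expressions and dispatches on the byte with `match`. A related way to keep transitions compact, sketched here as an illustration rather than as the macro's actual output, is to store each state's outgoing transitions as sorted, non-overlapping character ranges and locate the right one by binary search. Unicode scalar values go up to U+10FFFF, so a table row indexed by every possible `char` would need 0x110000 entries, while a range list stays small. `Transition` and `step` are hypothetical names.

```rust
use std::cmp::Ordering;

/// One outgoing transition: every char in `lo..=hi` moves the DFA to `next`.
/// Hypothetical representation used only for this sketch.
struct Transition {
    lo: char,
    hi: char,
    next: usize,
}

/// Finds the next state for `c` by binary search over sorted, non-overlapping
/// ranges; chars outside every range fall back to `default`.
fn step(transitions: &[Transition], c: char, default: usize) -> usize {
    transitions
        .binary_search_by(|t| {
            if t.hi < c {
                Ordering::Less // range lies entirely below `c`
            } else if t.lo > c {
                Ordering::Greater // range lies entirely above `c`
            } else {
                Ordering::Equal // `c` falls inside this range
            }
        })
        .map(|i| transitions[i].next)
        .unwrap_or(default)
}

fn main() {
    // Transitions out of one state for a class like [a-zäöü], sorted by `lo`.
    let transitions = [
        Transition { lo: 'a', hi: 'z', next: 1 },
        Transition { lo: 'ä', hi: 'ä', next: 1 },
        Transition { lo: 'ö', hi: 'ö', next: 1 },
        Transition { lo: 'ü', hi: 'ü', next: 1 },
    ];
    for c in "aä1ü".chars() {
        println!("{} -> {}", c, step(&transitions, c, 0));
    }
}
```

Each lookup costs O(log n) comparisons in the number of ranges instead of a single array index, trading some speed for a much smaller footprint.
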
To change the visibility of the function, add a visibility keyword such as `pub` before the function name.
```rust
regex!(pub example_3 "abc");
```
Generates:
```rust
pub fn example_3(s: &str) -> bool {
    // same as in example_1 (see above)
}
```
To match a byte slice instead of a string, pass a byte string literal as the regex.
```rust
regex!(example_4 b"abc");
```
Generates:
```rust
fn example_4(s: &[u8]) -> bool {
    // same as in example_1 (see above)
}
```
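
Matching on bytes also makes it possible to handle input that is not valid UTF-8, which a `&str` can never hold. The sketch below uses a naive sliding-window search in place of a DFA; `contains_abc_bytes` and `contains_abc_str` are hypothetical names standing in for functions like `example_4` and `example_1`.

```rust
/// Hypothetical byte-slice matcher for the unanchored pattern "abc"
/// (naive sliding window, not the DFA the macro would generate).
fn contains_abc_bytes(s: &[u8]) -> bool {
    s.windows(3).any(|w| w == b"abc")
}

/// A `&str` version can reuse the byte matcher, because a DFA over bytes
/// only ever looks at `s.as_bytes()`.
fn contains_abc_str(s: &str) -> bool {
    contains_abc_bytes(s.as_bytes())
}

fn main() {
    // 0xFF never appears in valid UTF-8, so this input cannot be a `&str`.
    let raw: &[u8] = &[0xFF, b'a', b'b', b'c'];
    println!("{}", contains_abc_bytes(raw));
    println!("{}", contains_abc_str("xabc"));
}
```
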
The generated code should work with `#![no_std]`, too.
## proc-macro-regex vs regex
Advantages:
* Regexes are compiled at compile time (no runtime initialization, no `lazy_static`)
* The generated code has no dependencies
* No heap allocation
* Approximately 12%-68% higher throughput for non-trivial regexes
Disadvantages:
* No capture groups (currently)
* Regexes cannot be created at runtime
### Performance
The following table compares the throughput of this crate with that of the regex crate. To run
the benchmark yourself, use `cargo bench --bench compare`.
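
| Regex  | proc-macro-regex | regex        | Improvement |
|--------|------------------|--------------|-------------|
| E-Mail | 743.95 MiB/s     | 441.67 MiB/s | 68.44 %     |
| URL    | 584.62 MiB/s     | 519.00 MiB/s | 12.64 %     |
| IPv6   | 746.92 MiB/s     | 473.38 MiB/s | 57.78 %     |
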
This was compiled with `rustc 1.53.0-nightly (392ba2ba1 2021-04-17)`.
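
The percentage column is consistent with the relative throughput gain of proc-macro-regex over regex, computed as `(a / b - 1) * 100`. The sketch below recomputes it from the table values; `gain_percent` is a hypothetical helper, not part of the benchmark code.

```rust
/// Relative throughput gain of `a` over `b`, in percent: (a / b - 1) * 100.
fn gain_percent(a: f64, b: f64) -> f64 {
    (a / b - 1.0) * 100.0
}

fn main() {
    // (name, proc-macro-regex MiB/s, regex MiB/s) from the benchmark table.
    let rows = [
        ("E-Mail", 743.95, 441.67),
        ("URL", 584.62, 519.00),
        ("IPv6", 746.92, 473.38),
    ];
    for (name, ours, theirs) in rows {
        println!("{}: {:.2} %", name, gain_percent(ours, theirs));
    }
}
```

Rounded to two decimals, this reproduces 68.44 %, 12.64 %, and 57.78 %.
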
## License
This project is licensed under the [BSD-3-Clause](https://opensource.org/licenses/BSD-3-Clause)
license.
### Contribution
Any contribution intentionally submitted for inclusion in `proc-macro-regex` by you shall
be licensed as [BSD-3-Clause](https://opensource.org/licenses/BSD-3-Clause), without any additional
terms or conditions.