safe-regex 0.3.0

[![crates.io version](https://img.shields.io/crates/v/safe-regex.svg)](https://crates.io/crates/safe-regex)
[![license: Apache 2.0](https://gitlab.com/leonhard-llc/safe-regex-rs/-/raw/main/license-apache-2.0.svg)](http://www.apache.org/licenses/LICENSE-2.0)
[![unsafe forbidden](https://gitlab.com/leonhard-llc/safe-regex-rs/-/raw/main/unsafe-forbidden-success.svg)](https://github.com/rust-secure-code/safety-dance/)
[![pipeline status](https://gitlab.com/leonhard-llc/safe-regex-rs/badges/main/pipeline.svg)](https://gitlab.com/leonhard-llc/safe-regex-rs/-/pipelines)

A safe regular expression library.

# Features
- `forbid(unsafe_code)`
- Good test coverage (~80%)
- Runtime is linear.
- Memory usage is constant.  Does not allocate.
- Compiles your regular expression to a simple Rust function
- Rust compiler checks and optimizes the matcher
- Supports basic regular expression syntax:
  - Any byte: `.`
  - Sequences: `abc`
  - Classes: `[-ab0-9]`, `[^ab]`
  - Repetition: `a?`, `a*`, `a+`, `a{1}`, `a{1,}`, `a{,1}`, `a{1,2}`, `a{,}`
  - Alternates: `a|b|c`
  - Capturing groups: `a(bc)?`
  - Non-capturing groups: `a(?:bc)?`
- `no_std`, by omitting the default `"std"` feature
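
To give a feel for what "a simple Rust function" with linear runtime and no allocation can look like, here is a hand-written sketch for the expression `[ab][0-9]*`. It is an illustration only, not the code that `regex!` actually generates. The function name `is_match_ab_digits` is hypothetical, and the sketch assumes the expression must match the whole input.

```rust
/// Hand-written illustration (NOT the output of `regex!`):
/// checks the whole input against `[ab][0-9]*`.
fn is_match_ab_digits(input: &[u8]) -> bool {
    // Split off the first byte; an empty input cannot match `[ab]`.
    let (first, rest) = match input.split_first() {
        Some(pair) => pair,
        None => return false,
    };
    // Class `[ab]`: the first byte must be `a` or `b`.
    if !matches!(*first, b'a' | b'b') {
        return false;
    }
    // Repetition `[0-9]*`: every remaining byte must be an ASCII digit.
    rest.iter().all(|b| b.is_ascii_digit())
}
```

Each input byte is inspected at most once and nothing is placed on the heap, which is the property the feature list describes.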

# Limitations
- Only works on byte slices, not strings.
- Partially optimized.  Runtime is about 10 times slower than the
  [`regex`](https://crates.io/crates/regex) crate.
  Here are relative runtimes measured with
  [`safe-regex-rs/bench`](https://gitlab.com/leonhard-llc/safe-regex-rs/-/tree/main/bench)
  run on a 2018 MacBook Pro:

  | `regex` | `safe_regex` | expression |
  | ----- | ---------- | ---------- |
  | 1 | 6 | find phone num `.*([0-9]{3})[-. ]?([0-9]{3})[-. ]?([0-9]{4}).*` |
  | 1 | 20 | find date time `.*([0-9]+)-([0-9]+)-([0-9]+) ([0-9]+):([0-9]+).*` |
  | 1 | 0.75 | parse date time `([0-9]+)-([0-9]+)-([0-9]+) ([0-9]+):([0-9]+)` |
  | 1 | 50 | check PEM Base64 `[a-zA-Z0-9+/]{0,64}=*` |
  | 1 | 20-500 | substring search `.*(2G8H81RFNZ).*` |

# Alternatives
- [`regex`](https://crates.io/crates/regex)
  - Mature & Popular
  - Maintained by the core Rust language developers
  - Contains `unsafe` code.
  - Allocates
  - Compiles your regular expression at runtime at first use.
  - Subsequent uses must retrieve it from the cache.
- [`pcre2`](https://crates.io/crates/pcre2)
  - Uses PCRE library which is written in unsafe C.
- [`regular-expression`](https://crates.io/crates/regular-expression)
  - No documentation
- [`rec`](https://crates.io/crates/rec)

# Cargo Geiger Safety Report
```

Metric output format: x/y
    x = unsafe code used by the build
    y = total unsafe code found in the crate

Symbols: 
    🔒  = No `unsafe` usage found, declares #![forbid(unsafe_code)]
    ❓  = No `unsafe` usage found, missing #![forbid(unsafe_code)]
    ☢️  = `unsafe` usage found

Functions  Expressions  Impls  Traits  Methods  Dependency

0/0        0/0          0/0    0/0     0/0      🔒  safe-regex 0.3.0
0/0        0/0          0/0    0/0     0/0      🔒  └── safe-regex-macro 0.3.0
0/0        0/0          0/0    0/0     0/0      🔒      ├── safe-proc-macro2 1.0.68
0/0        0/0          0/0    0/0     0/0      🔒      │   └── unicode-xid 0.2.4
0/0        0/0          0/0    0/0     0/0      🔒      └── safe-regex-compiler 0.3.0
0/0        0/0          0/0    0/0     0/0      🔒          ├── safe-proc-macro2 1.0.68
0/0        0/0          0/0    0/0     0/0      🔒          └── safe-quote 1.0.15
0/0        0/0          0/0    0/0     0/0      🔒              └── safe-proc-macro2 1.0.68

0/0        0/0          0/0    0/0     0/0    

```
# Examples
```rust
use safe_regex::{regex, Matcher0};
let matcher: Matcher0<_> =
    regex!(br"[ab][0-9]*");
assert!(matcher.is_match(b"a42"));
assert!(!matcher.is_match(b"X"));
```

```rust
use safe_regex::{regex, Matcher3};
let matcher: Matcher3<_> =
    regex!(br"([ab])([0-9]*)(suffix)?");
let (prefix, digits, suffix) =
    matcher.match_slices(b"a42").unwrap();
assert_eq!(b"a", prefix);
assert_eq!(b"42", digits);
assert_eq!(b"", suffix);
let (prefix_range, digits_r, suffix_r)
    = matcher.match_ranges(b"a42").unwrap();
assert_eq!(0..1_usize, prefix_range);
assert_eq!(1..3_usize, digits_r);
assert_eq!(0..0_usize, suffix_r);
```
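
The values asserted above are consistent with `match_ranges` returning index ranges into the input and `match_slices` returning the bytes those ranges cover. Assuming that relationship holds, a range can be turned back into a slice with the standard library alone. The helper name `slice_for_range` is hypothetical and not part of this crate.

```rust
use std::ops::Range;

/// Returns the bytes of `input` covered by a capture range,
/// or `None` if the range does not fit inside `input`.
fn slice_for_range(input: &[u8], range: Range<usize>) -> Option<&[u8]> {
    // `get` performs the bounds check instead of panicking like `input[range]`.
    input.get(range)
}
```

With the input `b"a42"`, the range `1..3` yields the digits `b"42"`, and the empty range `0..0` reported for the unmatched `(suffix)?` group yields an empty slice.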

# Changelog
- v0.3.0 - Add `assert_match` and default `std` feature.
- v0.2.6 - Fix some Clippy warnings on `regex!` macro invocation sites.
- v0.2.5 - Fix `no_std`.  Thank you, Soares Chen! [github.com/soareschen](https://github.com/soareschen) [gitlab.com/soareschen-informal](https://gitlab.com/soareschen-informal)
- v0.2.4
  - Bug fixes, reducing performance.
  - Optimize non-match runtime.
- v0.2.3
  - Rename `match_all` -> `match_slices`.
  - Add `match_ranges`.
- v0.2.2 - Simplify `match_all` return type
- v0.2.1 - Non-capturing groups, bug fixes
- v0.2.0
  - Linear-time & constant-memory algorithm! :)
  - Work around rustc optimizer hang on regexes with exponential execution paths like "a{,30}".
    See `src/bin/uncompilable/main.rs`.
- v0.1.1 - Bug fixes and more tests.
- v0.1.0 - First published version

# TO DO
- 11+ capturing groups
- Increase coverage
- Add fuzzing tests
- Common character classes: whitespace, letters, punctuation, etc.
- Match strings
- Repeated capturing groups: `(ab|cd)*`.
  Idea: Return a `MatcherNIter` struct that is an iterator that returns `MatcherN` structs.
- Implement optimizations explained in <https://swtch.com/%7Ersc/regexp/regexp3.html>.
  Some of the code already exists in `tests/dfa_single_pass.rs`
  and `tests/nfa_without_capturing.rs`.
- Once [const generics](https://github.com/rust-lang/rust/issues/44580)
  are stable, use the feature to simplify some types.
- Once
  [trait bounds on `const fn` parameters are stable](https://github.com/rust-lang/rust/issues/57563),
  make the `MatcherN::new` functions `const`.

# Development
- An overview of how this library works:
  <https://news.ycombinator.com/item?id=27301320>

License: Apache-2.0