Crate safe_regex

crates.io version license: Apache 2.0 unsafe forbidden pipeline status

A safe regular expression library.

Features

  • forbid(unsafe_code)
  • Good test coverage (~80%)
  • Runtime is linear in the length of the data and memory usage is constant for a given regex. Runtime and memory usage are both bounded by O(n * r * g), where
    • n is the length of the data to check
    • r is the length of the regex
    • g is the number of capturing groups in the regex
  • Does not allocate
  • no_std
  • Rust compiler checks and optimizes the matcher
  • Supports basic regular expression syntax:
    • Any byte: .
    • Sequences: abc
    • Classes: [-ab0-9], [^ab]
    • Repetition: a?, a*, a+, a{1}, a{1,}, a{,1}, a{1,2}, a{,}
    • Alternates: a|b|c
    • Capturing groups: a(b*)?
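
To make the "linear runtime, no allocation, byte slices" claims concrete, here is a minimal hand-written sketch of a matcher for the pattern `[abc][0-9]*`. It is an illustration only: the function name `matches_abc_then_digits` is hypothetical, and this is not the code that the `regex!` macro generates. It assumes the pattern must match the whole input.

```rust
/// Hand-written equivalent of the pattern `[abc][0-9]*`, matched against the
/// whole input. Illustration only; not code produced by `regex!`.
fn matches_abc_then_digits(data: &[u8]) -> bool {
    // Split off the first byte; an empty input cannot match `[abc]`.
    let (first, rest) = match data.split_first() {
        Some(parts) => parts,
        None => return false,
    };
    // `[abc]`: the first byte must be one of a, b, c.
    if !matches!(*first, b'a' | b'b' | b'c') {
        return false;
    }
    // `[0-9]*`: every remaining byte must be an ASCII digit.
    rest.iter().all(|b| b.is_ascii_digit())
}
```

Each input byte is inspected once and nothing is placed on the heap, so the sketch also works in a no_std setting.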

Limitations

  • Only works on byte slices, not strings.

  • Partially optimized. Runtime is about 10 times slower than the regex crate. Here are relative runtimes measured with safe-regex-rs/bench run on a 2018 MacBook Pro:

    regex  safe_regex  expression
    1      6           find phone num .*([0-9]{3})[-. ]?([0-9]{3})[-. ]?([0-9]{4}).*
    1      18          find date time .*([0-9]+)-([0-9]+)-([0-9]+) ([0-9]+):([0-9]+).*
    1      0.9         parse date time ([0-9]+)-([0-9]+)-([0-9]+) ([0-9]+):([0-9]+)
    1      30          check PEM Base64 [a-zA-Z0-9+/=]{0,64}=*
    1      20-550      substring search .*(2G8H81RFNZ).*
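
Because the matchers only accept byte slices, text has to be converted on the way in and, if needed, on the way out. The sketch below uses only the standard library; the helper names `to_input` and `group_to_str` are hypothetical and are not part of safe_regex.

```rust
/// Borrow a `&str` as the byte slice a byte-oriented matcher would accept.
/// This is a zero-copy view of the same memory.
fn to_input(text: &str) -> &[u8] {
    text.as_bytes()
}

/// Turn a captured byte slice back into text.
/// Returns `None` when the bytes are not valid UTF-8.
fn group_to_str(group: &[u8]) -> Option<&str> {
    std::str::from_utf8(group).ok()
}
```

A captured group can split a multi-byte UTF-8 character (for example when the regex uses `.`, which matches any byte), so the conversion back to `&str` is fallible by design.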

Alternatives

  • regex
    • Mature & Popular
    • Maintained by the core Rust language developers
    • Contains unsafe code.
  • pcre2
    • Uses PCRE library which is written in unsafe C.
  • regular-expression
    • No documentation
  • rec

Cargo Geiger Safety Report

Examples

// Check whether a byte slice matches a regex with no capturing groups.
use safe_regex::{regex, IsMatch, Matcher0};
let matcher: Matcher0<_> =
    regex!(br"[abc][0-9]*");
assert!(matcher.is_match(b"a42"));
assert!(!matcher.is_match(b"X"));

// Extract the contents of two capturing groups.
use safe_regex::{regex, IsMatch, Matcher2};
let matcher: Matcher2<_> =
    regex!(br"([abc])([0-9]*)");
let (prefix, digits) =
    matcher.match_all(b"a42").unwrap();
assert_eq!(b"a", prefix.unwrap());
assert_eq!(b"42", digits.unwrap());

Changelog

  • v0.2.0
    • Linear-time & constant-memory algorithm! :)
    • Work around rustc optimizer hang on regexes with exponential execution paths like “a{,30}”. See src/bin/uncompilable/main.rs.
  • v0.1.1 - Bug fixes and more tests.
  • v0.1.0 - First published version

TO DO

  • DONE - Read about regular expressions
  • DONE - Read about NFAs, https://swtch.com/~rsc/regexp/
  • DONE - Design API
  • DONE - Implement
  • DONE - Add integration tests
  • Simplify match_all return type
  • Non-capturing groups
  • 11+ capturing groups
  • Increase coverage
  • Add fuzzing tests
  • Common character classes: whitespace, letters, punctuation, etc.
  • Match strings
  • Implement the optimizations explained in https://swtch.com/%7Ersc/regexp/regexp3.html. Some of the code already exists in tests/dfa_single_pass.rs and tests/nfa_without_capturing.rs.
  • Once const generics are stable, use the feature to simplify some types.
  • Once trait bounds on `const fn` parameters are stable, make the MatcherN::new functions const.

Release Process

  1. Edit Cargo.toml and bump version number.
  2. Run ../release.sh

Modules

internal

Macros

regex

Compiles a regular expression into a Rust type.

Structs

Matcher0

A compiled regular expression with no capturing groups.

Matcher1

A compiled regular expression with 1 capturing group.

Matcher2

A compiled regular expression with 2 capturing groups.

Matcher3

A compiled regular expression with 3 capturing groups.

Matcher4

A compiled regular expression with 4 capturing groups.

Matcher5

A compiled regular expression with 5 capturing groups.

Matcher6

A compiled regular expression with 6 capturing groups.

Matcher7

A compiled regular expression with 7 capturing groups.

Matcher8

A compiled regular expression with 8 capturing groups.

Matcher9

A compiled regular expression with 9 capturing groups.

Matcher10

A compiled regular expression with 10 capturing groups.

Traits

IsMatch

Provides an is_match function.