regress 0.1.3

A regular expression engine targeting EcmaScript syntax
# regress - REGex in Rust with EcmaScript Syntax

oh no why

## Introduction

regress is a backtracking regular expression engine implemented in Rust, which targets JavaScript regular expression syntax. See [the crate documentation](https://docs.rs/regress) for more.

It's fast, Unicode-aware, has few dependencies, and has a big test suite. It makes fewer guarantees than the `regex` crate (notably, as a backtracking engine it does not guarantee linear-time matching), but it supports more syntactic features, such as backreferences and lookaround assertions.
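
As a rough illustration of what "backtracking" means here, and why it makes backreferences straightforward to support, the sketch below matches a tiny flattened program against a string. The `Node` format and the `match_here`/`find` functions are hypothetical and far simpler than regress's real compiler and matcher: only literal characters, `.`, greedy `*` over a single character, capture groups, and backreferences are handled.

```rust
/// One instruction of a toy, flattened regex program (hypothetical, for illustration).
#[derive(Debug)]
enum Node {
    Char(char),      // a literal character
    Any,             // `.`: in this toy it matches any character, including newlines
    Star(Box<Node>), // greedy `*` applied to a single-character atom (Char or Any)
    Open(usize),     // start of capture group n
    Close(usize),    // end of capture group n
    Backref(usize),  // `\n`: must repeat exactly what group n captured
}

/// Does a single-character atom accept `c`?
fn atom_matches(atom: &Node, c: char) -> bool {
    match atom {
        Node::Char(x) => *x == c,
        Node::Any => true,
        _ => false, // only single-character atoms are supported under `Star`
    }
}

/// Try to match `prog` starting at `pos`; on success return the end position.
/// Capture state is written as we go and undone whenever a branch fails.
fn match_here(
    prog: &[Node],
    input: &[char],
    pos: usize,
    starts: &mut [usize],
    caps: &mut [Option<(usize, usize)>],
) -> Option<usize> {
    let (node, rest) = match prog.split_first() {
        None => return Some(pos), // program exhausted: success
        Some(pair) => pair,
    };
    match node {
        Node::Char(_) | Node::Any => {
            let c = *input.get(pos)?;
            if atom_matches(node, c) {
                match_here(rest, input, pos + 1, starts, caps)
            } else {
                None
            }
        }
        Node::Star(atom) => {
            // Greedily consume as many characters as possible...
            let mut max = 0;
            while pos + max < input.len() && atom_matches(atom, input[pos + max]) {
                max += 1;
            }
            // ...then give them back one at a time until the rest of the program matches.
            for n in (0..=max).rev() {
                if let Some(end) = match_here(rest, input, pos + n, starts, caps) {
                    return Some(end);
                }
            }
            None
        }
        Node::Open(g) => {
            let saved = starts[*g];
            starts[*g] = pos;
            let result = match_here(rest, input, pos, starts, caps);
            if result.is_none() {
                starts[*g] = saved; // undo on failure
            }
            result
        }
        Node::Close(g) => {
            let saved = caps[*g];
            caps[*g] = Some((starts[*g], pos));
            let result = match_here(rest, input, pos, starts, caps);
            if result.is_none() {
                caps[*g] = saved; // undo on failure
            }
            result
        }
        Node::Backref(g) => {
            // As in ECMAScript, a backreference to an unset group matches the empty string.
            let (s, e) = caps[*g].unwrap_or((0, 0));
            let len = e - s;
            if pos + len <= input.len() && input[s..e] == input[pos..pos + len] {
                match_here(rest, input, pos + len, starts, caps)
            } else {
                None
            }
        }
    }
}

/// Leftmost search: try each start position in turn. Returns a range of *char* indices.
fn find(prog: &[Node], groups: usize, text: &str) -> Option<(usize, usize)> {
    let input: Vec<char> = text.chars().collect();
    for start in 0..=input.len() {
        let mut starts = vec![0; groups];
        let mut caps: Vec<Option<(usize, usize)>> = vec![None; groups];
        if let Some(end) = match_here(prog, &input, start, &mut starts, &mut caps) {
            return Some((start, end));
        }
    }
    None
}
```

For `(a*)b\1` on `"aaba"`, the attempt at position 0 tries `a*` with two, one, then zero characters before giving up, and the search then succeeds at position 1. That retry loop is what lets a backreference compare against whatever the group captured on the current path; it is also why a backtracking engine can take exponential time on some patterns.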

## Usage

Add this to your `Cargo.toml`:

```toml
[dependencies]
regress = "0.1"
```

### Fun Tools

The `tester` binary can be used for some fun.

You can see how things get compiled with the `dump-phases` crate feature:

    > cargo run --features dump-phases --bin tester 'x{3,4}' 'i'

You can run a little benchmark too, for example:

    > cargo run --release --bin tester 'abcd' 'i' --file ~/3200.txt


## Want to contribute?

This was my first Rust program, so there is no doubt room for improvement.

There's still plenty missing; maybe you'd like to contribute?

### Currently Missing Syntax

- Named capture groups like `(?<count>\d+)`
- Named character classes like `[[:alpha:]]`
- Unicode property escapes like `\p{Sc}`

### Currently Missing Features

- An API for replacing a string while substituting in capture groups (e.g. with `$1`)
- An API for escaping a string to make it a literal
- Implementing `std::str::pattern::Pattern`
- The `tester` binary needs some real usage.
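
Neither the replacement API nor the escaping API exists yet. As a rough sketch of what they might do, assuming ECMAScript conventions, the hypothetical free functions `escape` and `expand` below escape every `SyntaxCharacter` (`^ $ \ . * + ? ( ) [ ] { } |`) and expand a `String.prototype.replace`-style template. Only `$$`, `$&` and `$1` to `$9` are handled; two-digit groups and the other `$` forms are left out, and capture positions are assumed to be byte ranges into the searched text.

```rust
use std::ops::Range;

/// Escape every ECMAScript `SyntaxCharacter` so that `s` is matched literally.
fn escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        if "^$\\.*+?()[]{}|".contains(c) {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

/// Expand a replacement template containing `$$`, `$&`, and `$1`..`$9`.
/// `whole` is the byte range of the full match; `groups[i]` is the byte range of
/// capture group i + 1, or `None` if that group did not participate.
fn expand(template: &str, text: &str, whole: Range<usize>, groups: &[Option<Range<usize>>]) -> String {
    let mut out = String::new();
    let mut chars = template.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '$' {
            out.push(c);
            continue;
        }
        match chars.peek().copied() {
            Some('$') => {
                chars.next();
                out.push('$');
            }
            Some('&') => {
                chars.next();
                out.push_str(&text[whole.clone()]);
            }
            Some(d @ '1'..='9') if d.to_digit(10).unwrap() as usize <= groups.len() => {
                chars.next();
                let idx = d.to_digit(10).unwrap() as usize - 1;
                // A group that did not participate expands to the empty string.
                if let Some(r) = &groups[idx] {
                    out.push_str(&text[r.clone()]);
                }
            }
            // Anything else, including `$n` past the last group, stays literal.
            _ => out.push('$'),
        }
    }
    out
}
```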

### Missing Performance Optimizations

- Anchored matches like `^abc` still perform a string search. We should compute whether the whole regex is anchored, and optimize matching if so.
- Non-greedy loops like `.*?` will eagerly compute their maximum match. This doesn't affect correctness, but it does mean they may examine more of the input than they need to.
- Case-insensitive literals should compute the "preimage" (i.e. the characters which fold together with them) instead of folding. In particular, if a character's preimage is only that character, it can be compared directly, which will accelerate matching.
- Pure literal searches should use Boyer-Moore or a similar algorithm.
- The fold table should be bitpacked more tightly, e.g. using 24 bits for a code point.
- There are lots of vectorization opportunities.
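
For the Boyer-Moore item, a minimal sketch of the Boyer-Moore-Horspool simplification (a bad-character shift table only, no good-suffix rule) over bytes might look like the following. The function name `horspool_find` is hypothetical and not part of regress. Searching raw bytes is sound for a `&str` literal because UTF-8 is self-synchronizing, so a byte-level match of valid UTF-8 always starts on a character boundary.

```rust
/// Boyer-Moore-Horspool search: the first byte offset where `needle` occurs in `haystack`.
fn horspool_find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    let m = needle.len();
    if m == 0 {
        return Some(0);
    }
    if m > haystack.len() {
        return None;
    }
    // Shift table: how far the window may slide when its last byte is `b`.
    // Bytes absent from needle[..m - 1] allow a full shift of `m`.
    let mut shift = [m; 256];
    for (i, &b) in needle[..m - 1].iter().enumerate() {
        shift[b as usize] = m - 1 - i;
    }
    let mut pos = 0;
    while pos + m <= haystack.len() {
        // Compare the whole window (classic Horspool compares right to left).
        if &haystack[pos..pos + m] == needle {
            return Some(pos);
        }
        pos += shift[haystack[pos + m - 1] as usize];
    }
    None
}
```

On typical text the shift table lets the window skip up to `needle.len()` bytes at a time, so the search often inspects far fewer bytes than a naive scan that advances one position per step.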