Oniguruma is the regex engine behind Ruby, PHP (mbstring), TextMate grammars, and tools like jq. It supports features that most regex libraries don't: named captures with multiple syntaxes, look-behind of variable length, conditional patterns, absent expressions, 886 Unicode properties, subexpression calls, and 12 syntax modes from Perl to POSIX.
Ferroni is a line-by-line Rust port of this engine — same structure, same
opcodes, same optimization passes — with SIMD-vectorized search via
memchr layered on top. The result:
up to 61x faster than C on scanner first-match, while an idiomatic Rust
API (Regex::new(), typed errors, Match/Captures) keeps the ergonomics
clean.
For syntax highlighting, Ferroni also includes a multi-pattern Scanner API compatible with vscode-oniguruma, used by Shiki, VS Code, and other TextMate-based highlighters.
Why Ferroni?
Full Oniguruma, pure Rust. Named captures, variable-length look-behind, conditionals, absent expressions, Unicode properties, subexpression calls — everything the C engine supports, without linking against C. If your pattern works in Oniguruma, it works in Ferroni. Every opcode and optimization pass is ported 1:1 and verified by 1,882 tests from three independent sources.
No more CVEs from C. C Oniguruma has a track record of memory safety vulnerabilities -- CVE-2019-13224 (CVSS 9.8), CVE-2019-19204, CVE-2019-19246, CVE-2019-19012, CVE-2019-13225 -- affecting Ruby, PHP, and anything linking against it. Ferroni eliminates buffer overflows, use-after-free, and NULL dereferences structurally through Rust's type system. 0.4% unsafe code, all documented in ADR-002.
No C toolchain required. Pure cargo build. Cross-compiles to
wasm32-unknown-unknown. Ship it as a Node.js native module via
napi-rs without node-gyp or a C compiler on the
user's machine.
Built-in multi-pattern scanner. For syntax highlighting with TextMate
grammars, Ferroni includes a
vscode-oniguruma-compatible Scanner API — regex engine and
scanner in a single dependency. cargo add ferroni and you're done.
Quick start
Add to your Cargo.toml:
[dependencies]
ferroni = "1"
Regex
use *;
Scanner API
The Scanner matches multiple patterns simultaneously -- the core operation behind TextMate-based syntax highlighting. Results include UTF-16 position mapping for direct use with vscode-textmate and Shiki.
use ;
let mut scanner = new.unwrap;
let code = r#"const x = "hello" // greeting"#;
let m = scanner.find_next_match.unwrap;
assert_eq!; // pattern 0 matched first ("const")
assert_eq!;
assert_eq!;
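To illustrate what UTF-16 position mapping means here: the engine works on UTF-8 bytes, while vscode-textmate and Shiki address text in UTF-16 code units. The following is a minimal sketch of that conversion using only the standard library. The helper name utf16_offset is hypothetical and is not part of Ferroni's API; the Scanner does this mapping for you.

```rust
/// Map a UTF-8 byte offset in `text` to the equivalent UTF-16 code-unit offset.
/// Returns None if `byte_offset` is out of range or falls inside a multi-byte character.
/// (Hypothetical helper for illustration; not a Ferroni function.)
fn utf16_offset(text: &str, byte_offset: usize) -> Option<usize> {
    if !text.is_char_boundary(byte_offset) {
        return None;
    }
    // Every char before the offset contributes 1 or 2 UTF-16 code units.
    Some(text[..byte_offset].chars().map(char::len_utf16).sum::<usize>())
}

fn main() {
    let line = "let s = \"héllo 😀\"; // done";
    let byte_pos = line.find("//").unwrap();
    let utf16_pos = utf16_offset(line, byte_pos).unwrap();
    println!("byte offset {byte_pos} -> UTF-16 offset {utf16_pos}");
}
```

The two offsets only differ once the line contains non-ASCII text, which is why highlighters cannot pass byte offsets through unchanged.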
For fine-grained control, use RegexBuilder:
use *;
let re = builder
.case_insensitive
.build
.unwrap;
assert!;
The full C-ported API is also available for advanced usage:
use onig_new;
use onig_search;
use *;
use OnigSyntaxOniguruma;
let reg = onig_new.unwrap;
let input = b"Date: 2026-02-12";
let = onig_search;
assert!;
assert_eq!; // match starts at byte 6
Supported features
Scanner -- multi-pattern matching with result caching, two search strategies (RegSet for short strings, per-regex for long strings), and automatic UTF-16 position mapping. API-compatible with vscode-oniguruma.
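As a rough mental model of multi-pattern matching, the sketch below tries every pattern from a start position and keeps the leftmost hit. It is a toy under stated assumptions: it uses plain string literals instead of regexes, has no caching and no RegSet, and the ToyMatch type, the find_next_match signature, and the rule that ties go to the earlier pattern are all assumptions of this sketch, not Ferroni's actual API or implementation.

```rust
/// Hypothetical result type for this sketch (not Ferroni's match type).
#[derive(Debug, PartialEq)]
struct ToyMatch {
    index: usize, // which pattern matched
    start: usize, // byte offset where the match starts
    end: usize,   // byte offset just past the match
}

/// Try every literal pattern from `from` onward and keep the leftmost hit.
/// Ties go to the pattern listed first (an assumption of this sketch).
fn find_next_match(patterns: &[&str], text: &str, from: usize) -> Option<ToyMatch> {
    // `get` returns None if `from` is out of range or not on a char boundary.
    let haystack = text.get(from..)?;
    let mut best: Option<ToyMatch> = None;
    for (index, pat) in patterns.iter().enumerate() {
        if let Some(rel) = haystack.find(*pat) {
            let start = from + rel;
            // Strict `<` keeps the earlier pattern when start positions are equal.
            if best.as_ref().map_or(true, |b| start < b.start) {
                best = Some(ToyMatch { index, start, end: start + pat.len() });
            }
        }
    }
    best
}

fn main() {
    let patterns = ["const", "\"", "//"];
    let code = r#"const x = "hello" // greeting"#;
    let mut pos = 0;
    // Walk the line token by token, resuming after each match.
    while let Some(m) = find_next_match(&patterns, code, pos) {
        println!("pattern {} at {}..{}", m.index, m.start, m.end);
        pos = m.end;
    }
}
```

A tokenizer calls this kind of function repeatedly on the same line, which is the access pattern that result caching is meant to speed up.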
Full Oniguruma regex -- every feature from the C engine:
- All Perl/Ruby/Python syntax -- (?:...), (?=...), (?!...), (?<=...), (?<!...), (?>...)
- Named captures -- (?<name>...), (?'name'...), (?P<name>...)
- Backreferences -- \k<name>, \g<name>, relative \g<-1>
- Conditionals -- (?(cond)T|F)
- Absent expressions -- (?~...)
- Unicode properties -- \p{Script_Extensions=Greek}, \p{Lu}, \p{Emoji} (886 names)
- Grapheme clusters -- \X, text segment boundaries \y, \Y
- Callouts -- (?{...}), (*FAIL), (*MAX{n}), (*COUNT), (*CMP)
- 12 syntax modes -- Oniguruma, Ruby, Perl, Perl_NG, Python, Java, Emacs, Grep, GNU, POSIX Basic/Extended, ASIS
- Safety limits -- retry, time, stack, subexp call depth (global + per-search)
Performance
All numbers compare Ferroni against C Oniguruma at -O3,
measured with Criterion on
Apple M1 Ultra. Bold = faster engine. See
full tables for all benchmarks.
Syntax highlighting
Syntax highlighters like Shiki compile a full TextMate grammar -- hundreds of regex patterns -- and scan each line token by token. We benchmark against complete, unmodified Shiki grammars for TypeScript (279 patterns), CSS (117 patterns), and Rust (81 patterns). No cherry-picked subsets.
| Scenario | Ferroni | C Oniguruma | Speedup |
|---|---|---|---|
| TypeScript (279 patterns) | | | |
| Compile | 10.1 ms | 16.8 ms | 1.7x |
| First match | 414 ns | 25.3 us | 61x |
| Tokenize full line | 7.0 us | 221 us | 32x |
| Rust (81 patterns) | | | |
| Compile | 257 us | 181 us | 0.7x |
| First match | 181 ns | 5.6 us | 31x |
| Tokenize full line | 8.2 us | 82.2 us | 10x |
| CSS (117 patterns) | | | |
| Compile | 13.7 ms | 19.0 ms | 1.4x |
| Tokenize full line | 1.60 ms | 14.9 ms | 9.3x |
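The Speedup column is the C Oniguruma time divided by the Ferroni time, so values below 1 mean Ferroni is slower. A small sketch that recomputes a few rows from the table, with all times converted to nanoseconds (the speedup helper is just for illustration):

```rust
/// Speedup of Ferroni over C: C time divided by Ferroni time, both in the same unit.
fn speedup(c_ns: f64, ferroni_ns: f64) -> f64 {
    c_ns / ferroni_ns
}

fn main() {
    // (scenario, Ferroni ns, C Oniguruma ns), copied from the table above.
    let rows = [
        ("TypeScript first match", 414.0, 25_300.0),
        ("TypeScript tokenize full line", 7_000.0, 221_000.0),
        ("Rust compile", 257_000.0, 181_000.0),
    ];
    for &(name, ferroni, c) in rows.iter() {
        println!("{name}: {:.1}x", speedup(c, ferroni));
    }
}
```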
Text search and log scanning
First-match latency and full-scan rejection on log-sized inputs. The
regex crate is included where the
pattern is compatible with its syntax.
| Scenario | Ferroni | C Oniguruma | regex |
|---|---|---|---|
| Literal in 50 KB | 74 ns | 150 ns | 10 ns |
| No match, 50 KB | 1.53 us | 9.5 us | 1.46 us |
| No match, 10 KB | 357 ns | 1.96 us | 298 ns |
| Field extract, 50 KB | 127 ns | 172 ns | 56 ns |
| Timestamp, 50 KB | 120 ns | 177 ns | 54 ns |
| RegSet multi-pattern (5) | 101 ns | 395 ns | — |
The regex crate's DFA engine gives it a clear advantage on text search
workloads. memchr (shared by both
Ferroni and regex) enables SIMD-accelerated literal scans, but regex
goes further with full DFA-based matching that avoids per-character
backtracking. RegSet multi-pattern has no direct regex equivalent.
Pattern matching
One representative pattern per regex feature. Bold = fastest engine.
regex is omitted for features it does not support (lookaround,
backreferences).
| Category | Ferroni | C Oniguruma | regex |
|---|---|---|---|
| Literal exact | 104 ns | 159 ns | 11 ns |
| Quantifier greedy | 183 ns | 261 ns | 65 ns |
| Lookaround combined | 83 ns | 292 ns | — |
| Unicode \p{Greek}+ | 96 ns | 251 ns | 60 ns |
| Backref (\w+) \1 | 79 ns | 199 ns | — |
| Case-insensitive phrase | 101 ns | 188 ns | 62 ns |
| Alternation, 2 branches | 62 ns | 157 ns | 48 ns |
| Alternation, 10 branches | 49 ns | 225 ns | 21 ns |
| Named capture date | 361 ns | 277 ns | 44 ns |
Compilation
Simple patterns compile within 5% of C. The regex crate compiles
significantly slower due to DFA construction -- the cost of its faster
matching. Lookbehind is not supported by regex.
| Pattern | Ferroni | C Oniguruma | regex |
|---|---|---|---|
| Literal | 439 ns | 448 ns | 2.33 us |
| Named capture | 4.67 us | 5.78 us | 193 us |
| Lookbehind | 992 ns | 556 ns | — |
Where Ferroni is slower
- vs regex crate -- for patterns that regex supports, its DFA engine is 2-10x faster at matching (but 5-40x slower to compile)
- Named capture extraction -- 1.3x slower than C (region bookkeeping overhead)
- Scanner warm cache -- 2.2x slower than C (C's pointer comparison vs hash lookup)
Ferroni vs the regex crate
The regex crate is faster at matching for all patterns it supports, thanks
to its DFA-based engine with guaranteed linear time. However, it compiles
5-40x slower and does not support: variable-length lookbehind,
backreferences, conditional patterns, absent expressions, subexpression
calls, named captures with multiple syntaxes ((?<n>), (?'n'),
(?P<n>)), TextMate grammar support, or drop-in replacement for Ruby/PHP
regex behavior. Use regex when your
patterns fit its syntax and compilation cost is amortized. Use Ferroni when
you need full Oniguruma compatibility.
Criterion writes its HTML report to target/criterion/report/index.html.
Architecture
Each C source file maps 1:1 to a Rust module (ADR-001):
| C File | Rust Module | Purpose |
|---|---|---|
| regparse.c | regparse.rs | Pattern parser |
| regcomp.c | regcomp.rs | AST-to-bytecode compiler |
| regexec.c | regexec.rs | VM executor |
| regint.h | regint.rs | Internal types and opcodes |
| oniguruma.h | oniguruma.rs | Public types and constants |
| regenc.c | regenc.rs | Encoding trait |
| regsyntax.c | regsyntax.rs | 12 syntax definitions |
| regset.c | regset.rs | Multi-regex search (RegSet) |
| regerror.c | regerror.rs | Error messages |
| regtrav.c | regtrav.rs | Capture tree traversal |
| unicode.c | unicode/mod.rs | Unicode tables and segmentation |
| -- | scanner.rs | Multi-pattern scanner for syntax highlighting |
Compilation pipeline (same as C):
onig_new() -> onig_compile()
-> onig_parse_tree() (pattern -> AST)
-> reduce_string_list() (merge adjacent strings)
-> tune_tree() (6 optimization sub-passes)
-> compile_tree() (AST -> VM bytecode)
-> set_optimize_info() (extract search strategy)
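To give a feel for what a step like "merge adjacent strings" does, here is a toy version on a flat node list. This is only a sketch: the Node enum and merge_adjacent_strings are hypothetical and heavily simplified, and they do not reflect the real reduce_string_list() code or Ferroni's AST types.

```rust
/// Hypothetical, heavily simplified AST node for this sketch.
#[derive(Debug, PartialEq)]
enum Node {
    Str(String), // literal text
    AnyChar,     // stands in for every non-literal node
}

/// Merge runs of adjacent literal nodes into a single literal node.
fn merge_adjacent_strings(nodes: Vec<Node>) -> Vec<Node> {
    let mut out: Vec<Node> = Vec::with_capacity(nodes.len());
    for node in nodes {
        if let Node::Str(next) = &node {
            // If the previous output node is also a literal, extend it in place.
            if let Some(Node::Str(prev)) = out.last_mut() {
                prev.push_str(next);
                continue;
            }
        }
        out.push(node);
    }
    out
}

fn main() {
    // Roughly "ab" "c" . "d", which collapses to "abc" . "d".
    let parsed = vec![
        Node::Str("ab".to_string()),
        Node::Str("c".to_string()),
        Node::AnyChar,
        Node::Str("d".to_string()),
    ];
    println!("{:?}", merge_adjacent_strings(parsed));
}
```

Longer literal runs matter because a later stage can only pick a literal-based search strategy from literals it can see as one unit.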
Scope
Ferroni targets ASCII/UTF-8 workloads. The following are intentionally not included:
- 27 of 29 encodings -- only ASCII and UTF-8 (ADR-003)
- POSIX/GNU API -- regcomp/regexec/regfree (ADR-012)
- C memory management -- replaced by Rust's Drop trait
- onig_new_deluxe -- C-specific allocation, use onig_new() instead
Running tests
# Full UTF-8 suite (requires increased stack for debug builds)
RUST_MIN_STACK=268435456
# Other suites
RUST_MIN_STACK=268435456
Warning: Never run cargo test -- --ignored; the conditional_recursion_complex test intentionally hangs.
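RUST_MIN_STACK raises the default stack size of threads spawned through std::thread, which is where the test harness runs each test. If you hit stack overflows with deeply recursive patterns in your own debug builds, the same effect can be had per thread. A minimal sketch using only the standard library; depth and run_with_big_stack are hypothetical stand-ins for your own recursive workload:

```rust
use std::thread;

/// Deliberately non-tail-recursive so that each level consumes a stack frame.
/// Stands in for a deeply recursive workload.
fn depth(n: u64) -> u64 {
    if n == 0 { 0 } else { 1 + depth(n - 1) }
}

/// Run the recursive workload on a thread with an enlarged stack.
fn run_with_big_stack(levels: u64) -> u64 {
    // 256 MiB, the same value as RUST_MIN_STACK=268435456 above.
    const STACK_BYTES: usize = 268_435_456;
    thread::Builder::new()
        .stack_size(STACK_BYTES)
        .spawn(move || depth(levels))
        .expect("failed to spawn thread")
        .join()
        .expect("worker thread panicked")
}

fn main() {
    println!("{}", run_with_big_stack(100_000));
}
```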
Test coverage
1,882 tests from three independent sources:
- 1,554 ported 1:1 from C Oniguruma's test suite
- 25 from vscode-oniguruma's TypeScript tests (Scanner API, UTF-16 mapping)
- 303 Rust-specific tests for edge cases, error paths, and gaps in the upstream suites
C Oniguruma has no coverage reporting. Ferroni's test suite is a strict superset of the upstream C suite.
| Metric | Value | Notes |
|---|---|---|
| Function coverage | >94% | All reachable API and internal functions |
| Line coverage | ~82% | 42 deeply recursive tests overflow under LLVM instrumentation |
| Tests executed | 1,840 of 1,882 | All 1,882 pass in normal cargo test |
Coverage measured with cargo-llvm-cov, reported to Codecov.
Architecture decision records
| ADR | Decision |
|---|---|
| 001 | 1:1 structural parity with C original |
| 002 | Unsafe code policy |
| 003 | Encoding scope: ASCII and UTF-8 only |
| 004 | C-to-Rust translation patterns |
| 005 | Idiomatic Rust API layer |
| 006 | Scanner API for TextMate tokenization |
| 007 | SIMD-accelerated search via memchr |
| 008 | Rust-only optimizations and performance philosophy |
| 009 | Dependency philosophy |
| 010 | Benchmark strategy |
| 011 | Test strategy and C test suite parity |
| 012 | POSIX and GNU API not ported |
| 013 | Stack overflow mitigation in debug builds |
| 014 | Porting bugs: lessons learned |
Contributing
Contributions are welcome! Please read CONTRIBUTING.md and review the ADRs before submitting a PR.
Acknowledgments
Ferroni is built on the work of K. Kosako and the Oniguruma contributors. The C original powers regex in Ruby, PHP, TextMate, jq, and many other projects. The Scanner API and its test suite are based on vscode-oniguruma by Nicolò Ribaudo and the VS Code team.
License
BSD-2-Clause (same as Oniguruma)
Copyright 2026 Sebastian Software GmbH