Syntax highlighting in VS Code, Shiki, and every editor built on TextMate grammars runs on two things: an Oniguruma regex engine and a multi-pattern scanner. Today, that means C code with native bindings via vscode-oniguruma.
Ferroni puts both into a single Rust crate. Same regex semantics, same
Scanner API, no C compiler needed. Just cargo build.
It is a line-by-line port of Oniguruma's C source -- same structure, same
opcodes, same optimization passes -- with SIMD-vectorized search via
memchr layered on top. The result:
up to 6x faster than C on full-text scanning, while an idiomatic Rust
API (Regex::new(), typed errors, Match/Captures) keeps the ergonomics
clean.
Why Ferroni?
Regex engine + scanner in one crate. If you're building a syntax
highlighter, a TextMate grammar host, or anything that matches multiple
patterns against source code, you used to need C Oniguruma plus native
bindings. Ferroni gives you both the regex engine and the
vscode-oniguruma-compatible Scanner API in a single
dependency. cargo add ferroni and you're done.
No more CVEs from C. C Oniguruma has a track record of memory safety vulnerabilities -- CVE-2019-13224 (CVSS 9.8), CVE-2019-19204, CVE-2019-19246, CVE-2019-19012, CVE-2019-13225 -- affecting Ruby, PHP, and anything linking against it. In safe Rust, buffer overflows, use-after-free, and NULL dereferences are ruled out structurally by the type system. Only 0.4% of Ferroni's code is unsafe, and all of it is documented in ADR-005.
Drop-in compatible. If your pattern works in Oniguruma on ASCII or UTF-8 input, it works in Ferroni. Every opcode and every optimization pass is ported 1:1 from C and verified by 1,882 tests from three independent sources.
No C toolchain required. Pure cargo build. Cross-compiles to
wasm32-unknown-unknown. Ship it as a Node.js native module via
napi-rs without node-gyp or a C compiler on the
user's machine.
Quick start
Add to your Cargo.toml:
[dependencies]
ferroni = "1"
Regex
use *;
Scanner API
The Scanner matches multiple patterns simultaneously -- the core operation behind TextMate-based syntax highlighting. Results include UTF-16 position mapping for direct use with vscode-textmate and Shiki.
use ;
let mut scanner = new.unwrap;
let code = r#"const x = "hello" // greeting"#;
let m = scanner.find_next_match.unwrap;
assert_eq!; // pattern 0 matched first ("const")
assert_eq!;
assert_eq!;
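To make the UTF-16 position mapping concrete: a regex engine working on UTF-8 reports byte offsets, while JavaScript consumers such as vscode-textmate index strings by UTF-16 code units. The sketch below shows one way to convert between the two using only the standard library. The function name `utf8_to_utf16_offset` is hypothetical, and this is not Ferroni's implementation, which this README does not show.

```rust
/// Convert a UTF-8 byte offset into the equivalent UTF-16 code-unit offset.
/// Returns `None` if `byte_offset` is out of range or falls inside a
/// multi-byte character.
/// NOTE: illustrative helper, not part of Ferroni's API.
fn utf8_to_utf16_offset(text: &str, byte_offset: usize) -> Option<usize> {
    // `get` checks both the bounds and the char-boundary alignment.
    let prefix = text.get(..byte_offset)?;
    // Each char contributes one UTF-16 unit, or two if it lies outside the BMP.
    Some(prefix.chars().map(char::len_utf16).sum())
}
```

For the string "é😀x", byte offset 6 (just before the x) maps to UTF-16 offset 3: one unit for é and two for the emoji.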
For fine-grained control, use RegexBuilder:
use *;
let re = builder
.case_insensitive
.build
.unwrap;
assert!;
The full C-ported API is also available for advanced usage:
use onig_new;
use onig_search;
use *;
use OnigSyntaxOniguruma;
let reg = onig_new.unwrap;
let input = b"Date: 2026-02-12";
let = onig_search;
assert!;
assert_eq!; // match starts at byte 6
Supported features
Scanner -- multi-pattern matching with result caching, two search strategies (RegSet for short strings, per-regex for long strings), and automatic UTF-16 position mapping. API-compatible with vscode-oniguruma.
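The two search strategies can be illustrated with plain substring patterns standing in for compiled regexes. In this sketch, `position_lead` makes one pass over the input and tries every pattern at each position, while `regex_lead` runs one independent search per pattern and keeps the earliest hit. The names echo the position-lead and regex-lead rows in the benchmark table; the types and functions are hypothetical, and how Ferroni chooses between its RegSet and per-regex paths is not specified here.

```rust
/// Which pattern matched, and at which byte offset.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
struct Hit {
    pattern: usize,
    start: usize,
}

/// One pass over the haystack; at each position, try every pattern in order.
/// Assumes non-empty patterns (literal needles stand in for regexes).
fn position_lead(patterns: &[&str], haystack: &str) -> Option<Hit> {
    for (start, _) in haystack.char_indices() {
        for (pattern, needle) in patterns.iter().enumerate() {
            if haystack[start..].starts_with(*needle) {
                return Some(Hit { pattern, start });
            }
        }
    }
    None
}

/// One full search per pattern; keep the earliest hit.
/// Ties on position go to the pattern with the lower index.
fn regex_lead(patterns: &[&str], haystack: &str) -> Option<Hit> {
    patterns
        .iter()
        .enumerate()
        .filter_map(|(pattern, needle)| {
            haystack.find(*needle).map(|start| Hit { pattern, start })
        })
        .min_by_key(|hit| (hit.start, hit.pattern))
}
```

Both functions return the same answer and differ only in cost. The combined pass stops at the first position where anything matches, which tends to suit short inputs; independent searches let each pattern use its own fast scan, which tends to pay off on longer ones.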
Full Oniguruma regex -- every feature from the C engine:
- All Perl/Ruby/Python syntax -- (?:...), (?=...), (?!...), (?<=...), (?<!...), (?>...)
- Named captures -- (?<name>...), (?'name'...), (?P<name>...)
- Backreferences -- \k<name>, \g<name>, relative \g<-1>
- Conditionals -- (?(cond)T|F)
- Absent expressions -- (?~...)
- Unicode properties -- \p{Script_Extensions=Greek}, \p{Lu}, \p{Emoji} (886 names)
- Grapheme clusters -- \X, text segment boundaries \y, \Y
- Callouts -- (?{...}), (*FAIL), (*MAX{n}), (*COUNT), (*CMP)
- 12 syntax modes -- Oniguruma, Ruby, Perl, Perl_NG, Python, Java, Emacs, Grep, GNU, POSIX Basic/Extended, ASIS
- Safety limits -- retry, time, stack, subexp call depth (global + per-search)
Performance
Ferroni wins 31 of 42 execution benchmarks against C Oniguruma compiled at -O3.
Measured with Criterion on an Apple M1 Ultra. Bold = faster engine.
Highlights
| Scenario | Ferroni | C Oniguruma | Factor |
|---|---|---|---|
| Full-text scan, no match, 50 KB | **1.5 us** | 9.3 us | 6.0x |
| Full-text scan, no match, 10 KB | **378 ns** | 1.9 us | 5.0x |
| Scanner, short string | **168 ns** | 407 ns | 2.4x |
| Multi-pattern RegSet | **147 ns** | 396 ns | 2.7x |
| Scanner, warm cache | 24 ns | **23 ns** | 1.06x |
The largest gains come from SIMD-vectorized search via
memchr -- NEON on ARM, SSE2/AVX2 on
x86-64 -- replacing C's hand-written byte loops with vectorized scans.
See ADR-006.
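As a rough illustration of why a fast byte scan helps: a literal search can first jump to the next occurrence of the pattern's first byte and only then verify the rest. The sketch below uses a standard-library byte search as a stand-in so that it runs without dependencies; in real code, `memchr::memchr(first, &haystack[from..])` performs the same step with SIMD. The function name is hypothetical and this is not Ferroni's search code, whose details live in ADR-006.

```rust
/// Find `needle` in `haystack` by scanning for its first byte, then verifying
/// the remaining bytes at each candidate position.
/// NOTE: illustrative sketch, not Ferroni's implementation.
fn find_with_byte_prefilter(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    // An empty needle matches at the start by convention.
    let Some((&first, rest)) = needle.split_first() else {
        return Some(0);
    };
    let mut from = 0;
    while from < haystack.len() {
        // Stand-in for `memchr::memchr(first, &haystack[from..])`.
        let candidate = from + haystack[from..].iter().position(|&b| b == first)?;
        // Verify the rest of the needle right after the candidate byte.
        if haystack[candidate + 1..].starts_with(rest) {
            return Some(candidate);
        }
        from = candidate + 1;
    }
    None
}
```

When the first byte never occurs in the input, the loop body runs once and the whole search is a single byte scan, which is the no-match case where the benchmark gap is largest.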
The Scanner warm path (all patterns served from cache, the steady-state in a syntax highlighter) runs at 24 ns -- within 6% of the C implementation. No heap allocation on cache hits.
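A minimal sketch of the kind of result caching that makes a warm path cheap, assuming a single literal pattern in place of a compiled regex. The idea: if the previous search began at or before the current start offset, its outcome can often be reused without searching again. `CachedFinder` and its fields are hypothetical, and Ferroni's actual cache layout and invalidation rules are not described in this README.

```rust
/// Outcome of the most recent real search.
#[derive(Clone, Copy)]
struct CacheEntry {
    searched_from: usize,    // offset the search started at
    found_at: Option<usize>, // None = no match from `searched_from` to the end
}

/// Hypothetical single-pattern finder with a one-entry result cache.
struct CachedFinder<'a> {
    needle: &'a str,
    cache: Option<CacheEntry>,
    searches: usize, // number of real (uncached) searches performed
}

impl<'a> CachedFinder<'a> {
    fn new(needle: &'a str) -> Self {
        CachedFinder { needle, cache: None, searches: 0 }
    }

    /// First match at or after `start`. The cache is only valid for one line;
    /// the caller must call `reset` before searching a different line.
    fn find_from(&mut self, line: &str, start: usize) -> Option<usize> {
        if let Some(entry) = self.cache {
            if entry.searched_from <= start {
                match entry.found_at {
                    // No match from an earlier offset means none from here.
                    None => return None,
                    // The cached match is still at or ahead of `start`.
                    Some(pos) if pos >= start => return Some(pos),
                    // The cached match is behind us; fall through and search.
                    Some(_) => {}
                }
            }
        }
        self.searches += 1;
        let found_at = line.get(start..)?.find(self.needle).map(|i| i + start);
        self.cache = Some(CacheEntry { searched_from: start, found_at });
        found_at
    }

    fn reset(&mut self) {
        self.cache = None;
    }
}
```

A cache hit here is two comparisons and no allocation, which is consistent with a warm path measured in tens of nanoseconds.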
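Compilation is typically 1.2-1.6x slower than C (Rust allocates more per compilation); literal and char-class patterns are at parity, while named captures are an outlier at about 8x slower. Patterns are compiled once and matched millions of times, so this cost is amortized.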
Regex execution
| Benchmark | Rust | C | Ratio |
|---|---|---|---|
| Literal match | | | |
| exact string | **135 ns** | 159 ns | 0.85 |
| anchored start | **105 ns** | 151 ns | 0.69 |
| anchored end | 167 ns | **163 ns** | 1.02 |
| word boundary | **120 ns** | 151 ns | 0.80 |
| Quantifiers | | | |
| greedy | **215 ns** | 255 ns | 0.84 |
| lazy | **193 ns** | 206 ns | 0.93 |
| possessive | **199 ns** | 224 ns | 0.89 |
| nested | **200 ns** | 212 ns | 0.94 |
| Alternation | | | |
| 2 branches | **107 ns** | 155 ns | 0.69 |
| 5 branches | **122 ns** | 173 ns | 0.71 |
| 10 branches | 246 ns | **220 ns** | 1.12 |
| nested | **129 ns** | 184 ns | 0.70 |
| Backreferences | | | |
| simple (\w+) \1 | **150 ns** | 183 ns | 0.82 |
| nested | **156 ns** | 185 ns | 0.84 |
| named | **152 ns** | 188 ns | 0.81 |
| Lookaround | | | |
| positive lookahead | **128 ns** | 170 ns | 0.76 |
| negative lookahead | **140 ns** | 172 ns | 0.82 |
| positive lookbehind | 279 ns | **264 ns** | 1.05 |
| negative lookbehind | 359 ns | **334 ns** | 1.08 |
| combined | 301 ns | **280 ns** | 1.08 |
| Unicode properties | | | |
| \p{Lu}+ | **92 ns** | 150 ns | 0.62 |
| \p{Letter}+ | **128 ns** | 164 ns | 0.78 |
| \p{Greek}+ | 323 ns | **246 ns** | 1.31 |
| \p{Cyrillic}+ | 450 ns | **329 ns** | 1.37 |
| Case-insensitive | | | |
| single word | **107 ns** | 155 ns | 0.69 |
| phrase | **161 ns** | 183 ns | 0.88 |
| alternation | **113 ns** | 148 ns | 0.76 |
| Named captures | | | |
| date extraction | 460 ns | **282 ns** | 1.63 |
| Large text (first match) | | | |
| literal 10 KB | **113 ns** | 145 ns | 0.78 |
| literal 50 KB | **114 ns** | 153 ns | 0.75 |
| timestamp 10 KB | 243 ns | **186 ns** | 1.31 |
| timestamp 50 KB | 240 ns | **175 ns** | 1.37 |
| field extract 10 KB | **160 ns** | 170 ns | 0.94 |
| field extract 50 KB | **162 ns** | 170 ns | 0.95 |
| no match 10 KB | **378 ns** | 1.9 us | 0.20 |
| no match 50 KB | **1.5 us** | 9.3 us | 0.17 |
| RegSet | | | |
| position-lead (5 patterns) | **147 ns** | 396 ns | 0.37 |
| regex-lead (5 patterns) | **164 ns** | 233 ns | 0.70 |
| Match at position | | | |
| \d+ at offset 4 | **118 ns** | 154 ns | 0.76 |
| Scanner (vs vscode-oniguruma C) | | | |
| short string (RegSet path) | **168 ns** | 407 ns | 0.41 |
| long string, cold (per-regex) | 191 ns | **188 ns** | 1.02 |
| long string, warm (cached) | 24 ns | **23 ns** | 1.06 |
Regex compilation
| Pattern | Rust | C | Ratio |
|---|---|---|---|
| literal | **416 ns** | 449 ns | 0.93 |
| .* | 745 ns | **517 ns** | 1.44 |
| alternation | 1,711 ns | **1,410 ns** | 1.21 |
| char class | 641 ns | **635 ns** | 1.01 |
| quantifier | 1,356 ns | **1,059 ns** | 1.28 |
| group | 1,040 ns | **803 ns** | 1.30 |
| backref | 1,578 ns | **983 ns** | 1.60 |
| lookahead | 733 ns | **474 ns** | 1.55 |
| lookbehind | 678 ns | **538 ns** | 1.26 |
| named capture | 46,153 ns | **5,734 ns** | 8.05 |
Running benchmarks
# HTML report: target/criterion/report/index.html
Architecture
Each C source file maps 1:1 to a Rust module (ADR-001):
| C File | Rust Module | Purpose |
|---|---|---|
| regparse.c | regparse.rs | Pattern parser |
| regcomp.c | regcomp.rs | AST-to-bytecode compiler |
| regexec.c | regexec.rs | VM executor |
| regint.h | regint.rs | Internal types and opcodes |
| oniguruma.h | oniguruma.rs | Public types and constants |
| regenc.c | regenc.rs | Encoding trait |
| regsyntax.c | regsyntax.rs | 12 syntax definitions |
| regset.c | regset.rs | Multi-regex search (RegSet) |
| regerror.c | regerror.rs | Error messages |
| regtrav.c | regtrav.rs | Capture tree traversal |
| unicode.c | unicode/mod.rs | Unicode tables and segmentation |
| -- | scanner.rs | Multi-pattern scanner for syntax highlighting |
Compilation pipeline (same as C):
onig_new() -> onig_compile()
-> onig_parse_tree() (pattern -> AST)
-> reduce_string_list() (merge adjacent strings)
-> tune_tree() (6 optimization sub-passes)
-> compile_tree() (AST -> VM bytecode)
-> set_optimize_info() (extract search strategy)
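To give a feel for one of these passes, here is a toy version of the "merge adjacent strings" step on a made-up AST. The `Node` enum and `merge_adjacent_strings` function are hypothetical and far simpler than the node types that reduce_string_list() operates on in Oniguruma and Ferroni; the sketch only shows the shape of the transformation.

```rust
/// Toy pattern AST (hypothetical; not Ferroni's node type).
#[derive(Debug, PartialEq, Clone)]
enum Node {
    Str(String),     // literal text
    AnyChar,         // `.`
    List(Vec<Node>), // concatenation of child nodes
}

/// Merge neighbouring `Str` children of every `List`, recursively.
/// Nested lists are reduced in place but not flattened into their parent.
fn merge_adjacent_strings(node: Node) -> Node {
    match node {
        Node::List(children) => {
            let mut merged: Vec<Node> = Vec::with_capacity(children.len());
            for child in children {
                // Reduce the child first so nested lists are handled too.
                let child = merge_adjacent_strings(child);
                if let Node::Str(next) = &child {
                    if let Some(Node::Str(prev)) = merged.last_mut() {
                        // Previous sibling is also a string: extend it.
                        prev.push_str(next);
                        continue;
                    }
                }
                merged.push(child);
            }
            Node::List(merged)
        }
        other => other,
    }
}
```

Fewer, longer string nodes give later stages more literal text to work with, for example when extracting a search strategy.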
Scope
Ferroni targets ASCII/UTF-8 workloads. The following are intentionally not included:
- 27 of 29 encodings -- only ASCII and UTF-8 (ADR-002)
- POSIX/GNU API -- regcomp/regexec/regfree (ADR-007)
- C memory management -- replaced by Rust's Drop trait
- onig_new_deluxe -- C-specific allocation, use onig_new() instead
Running tests
# Full UTF-8 suite (requires increased stack for debug builds)
RUST_MIN_STACK=268435456
# Other suites
RUST_MIN_STACK=268435456
Warning: Never run cargo test -- --ignored; the conditional_recursion_complex test intentionally hangs.
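RUST_MIN_STACK raises the default stack size of threads spawned by the standard library, which includes the threads the test harness runs tests on. The value 268435456 is 256 MiB. The same effect can be had programmatically when embedding deeply recursive work in your own code; the sketch below shows the idea with `std::thread::Builder`. The helper names are hypothetical and this is not part of Ferroni's test setup.

```rust
use std::thread;

/// Run `work` on a new thread with `stack_bytes` of stack and return its result.
fn run_with_stack<T, F>(stack_bytes: usize, work: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(work)
        .expect("failed to spawn thread")
        .join()
        .expect("worker thread panicked")
}

/// Deliberately recursive depth counter, standing in for a recursive
/// parser or matcher that needs a deep stack in debug builds.
fn depth(n: u64) -> u64 {
    if n == 0 {
        0
    } else {
        1 + depth(n - 1)
    }
}
```

A call such as `run_with_stack(256 * 1024 * 1024, || depth(100_000))` gives the recursive work the same 256 MiB that the environment variable requests above.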
Test coverage
1,882 tests from three independent sources:
- 1,554 ported 1:1 from C Oniguruma's test suite
- 25 from vscode-oniguruma's TypeScript tests (Scanner API, UTF-16 mapping)
- 303 Rust-specific tests for edge cases, error paths, and gaps in the upstream suites
C Oniguruma has no coverage reporting. Ferroni's test suite is a strict superset of the upstream C test suite.
| Metric | Value | Notes |
|---|---|---|
| Function coverage | >94% | All reachable API and internal functions |
| Line coverage | ~82% | 42 deeply recursive tests overflow under LLVM instrumentation |
| Tests executed | 1,840 of 1,882 | All 1,882 pass in normal cargo test |
Coverage measured with cargo-llvm-cov, reported to Codecov.
Architecture decision records
| ADR | Decision |
|---|---|
| 001 | 1:1 structural parity with C original |
| 002 | ASCII and UTF-8 only |
| 003 | Stack overflow mitigation in debug builds |
| 004 | C-to-Rust translation patterns |
| 005 | Unsafe code policy |
| 006 | SIMD-accelerated search via memchr |
| 007 | POSIX and GNU API not ported |
| 008 | Test strategy and C test suite parity |
| 009 | Porting bugs: lessons learned |
| 010 | Idiomatic Rust API layer |
Contributing
Contributions are welcome! Please read CONTRIBUTING.md and review the ADRs before submitting a PR.
Acknowledgments
Ferroni is built on the work of K. Kosako and the Oniguruma contributors. The C original powers regex in Ruby, PHP, TextMate, and many other projects. The Scanner API and its test suite are based on vscode-oniguruma by Nicolò Ribaudo and the VS Code team.
License
BSD-2-Clause (same as Oniguruma)
Copyright 2026 Sebastian Software GmbH