# Technical Architecture: regex101 for the Terminal (Rust TUI)
Comprehensive research and implementation guide for building a terminal-native regex
debugger using Rust, ratatui, regex-syntax, and PCRE2.
---
## Table of Contents
1. [Ratatui TUI Framework](#1-ratatui-tui-framework)
2. [regex-syntax Crate: AST Parsing](#2-regex-syntax-crate-ast-parsing)
3. [PCRE2 FFI from Rust](#3-pcre2-ffi-from-rust)
4. [Cross-Compilation Considerations](#4-cross-compilation-considerations)
5. [Project Structure](#5-project-structure)
6. [Dependency Summary](#6-dependency-summary)
7. [Implementation Roadmap](#7-implementation-roadmap)
---
## 1. Ratatui TUI Framework
### 1.1 Current Version and Status
- **Crate:** `ratatui` v0.30.0 (released 2025-12-26)
- **Terminal backend:** `crossterm` v0.28+ (cross-platform: Linux, macOS, Windows)
- **Ecosystem:** 18.6k GitHub stars, 18M+ downloads, 2,400+ dependent crates
- **Architecture:** Immediate-mode rendering with intermediate buffers
Starting with v0.30.0, ratatui was reorganized from a monolithic crate into a modular
workspace:
- `ratatui-core` -- core widget traits, text types, layout engine
- `ratatui-widgets` -- built-in widget implementations
- `ratatui` -- umbrella crate that re-exports everything (use this for applications)
Most applications should depend on the umbrella `ratatui` crate. Widget library authors
should depend on `ratatui-core` for better API stability.
### 1.2 Immediate-Mode Rendering Model
Ratatui uses an immediate-mode rendering paradigm: **the entire UI is redrawn from
scratch every frame** based on current application state. There is no retained widget
tree. The framework handles diffing buffers between frames to minimize actual terminal
I/O.
```rust
// Core render loop pattern
loop {
    terminal.draw(|frame| {
        // Rebuild the entire UI from application state every frame.
        // frame.area() gives the full terminal Rect.
        render_app(frame, &app_state);
    })?;
    // Handle input events
    handle_events(&mut app_state)?;
}
```
Key implications for the regex debugger:
- The regex pattern, test string, match results, and explanation are all just state.
- On every keystroke, update the state then let the next `draw()` reflect it.
- No need to manage widget lifecycle or partial updates -- the framework handles diffing.
- Performance: only cells that changed between frames are written to the terminal, so a
  full redraw on every keystroke stays cheap even with complex layouts.
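The "UI is a function of state" idea can be sketched without any terminal code. In this minimal illustration, `AppState`, `apply_key`, and `view` are hypothetical names, and `view` returns plain strings where the real application would build ratatui widgets.

```rust
/// Hypothetical application state: everything the UI needs lives here.
#[derive(Default)]
pub struct AppState {
    pub pattern: String,
    pub test_text: String,
}

/// Hypothetical input step: mutate state only, never touch the screen.
pub fn apply_key(state: &mut AppState, c: char) {
    state.pattern.push(c);
}

/// Hypothetical render step: rebuild every row from state on each call.
pub fn view(state: &AppState) -> Vec<String> {
    vec![
        format!("Regex: {}", state.pattern),
        format!("Test:  {}", state.test_text),
    ]
}
```

Because `view` reads state and nothing else, the same function can be exercised in tests without a terminal.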
### 1.3 Real-Time Input Handling (Keystroke-by-Keystroke)
There are two primary architectures for input handling:
**Architecture A: Synchronous Polling (simpler, good for most cases)**
```rust
use std::io;
use std::time::Duration;

use crossterm::event::{self, Event, KeyCode, KeyEventKind};
use ratatui::DefaultTerminal;

// `App` and `render` are application items defined elsewhere.
fn run_app(terminal: &mut DefaultTerminal, app: &mut App) -> io::Result<()> {
    loop {
        terminal.draw(|frame| render(frame, app))?;
        // Poll with timeout to allow periodic UI updates (cursor blink, etc.)
        if event::poll(Duration::from_millis(50))? {
            if let Event::Key(key) = event::read()? {
                // IMPORTANT: On Windows, crossterm emits Press, Release, AND Repeat
                // events. Always filter to Press only.
                if key.kind == KeyEventKind::Press {
                    match key.code {
                        KeyCode::Char(c) => app.insert_char(c),
                        KeyCode::Backspace => app.delete_char(),
                        KeyCode::Enter => app.submit(),
                        KeyCode::Esc => return Ok(()),
                        KeyCode::Tab => app.cycle_focus(),
                        KeyCode::Left => app.move_cursor_left(),
                        KeyCode::Right => app.move_cursor_right(),
                        _ => {}
                    }
                }
            }
        }
    }
}
```
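The `insert_char`, `delete_char`, and cursor-movement calls above imply some editor state behind `App`. The following is a minimal sketch of what that state might look like; `Editor` and its fields are hypothetical. It assumes the cursor is tracked in characters (not bytes) and ignores grapheme clusters, which a real input widget would likely need to handle.

```rust
/// Hypothetical single-line editor; `cursor` counts characters, not bytes.
#[derive(Default)]
pub struct Editor {
    text: String,
    cursor: usize,
}

impl Editor {
    /// Byte offset of the cursor, needed because `String` is indexed in bytes.
    fn byte_offset(&self) -> usize {
        self.text
            .char_indices()
            .nth(self.cursor)
            .map(|(i, _)| i)
            .unwrap_or(self.text.len())
    }

    pub fn insert_char(&mut self, c: char) {
        let at = self.byte_offset();
        self.text.insert(at, c);
        self.cursor += 1;
    }

    /// Backspace: remove the character before the cursor, if any.
    pub fn delete_char(&mut self) {
        if self.cursor == 0 {
            return;
        }
        self.cursor -= 1;
        let at = self.byte_offset();
        self.text.remove(at);
    }

    pub fn move_cursor_left(&mut self) {
        self.cursor = self.cursor.saturating_sub(1);
    }

    pub fn move_cursor_right(&mut self) {
        let len = self.text.chars().count();
        self.cursor = (self.cursor + 1).min(len);
    }

    pub fn text(&self) -> &str {
        &self.text
    }

    pub fn cursor(&self) -> usize {
        self.cursor
    }
}
```

Converting the character index to a byte offset on every edit keeps multi-byte input such as `é` from splitting a UTF-8 sequence.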
**Architecture B: Async with Tokio + Channels (better for concurrent work)**
This is the recommended approach for the regex debugger because regex compilation and
matching (especially PCRE2 JIT) can be non-trivial work that should not block the UI.
```rust
use std::time::Duration;

use crossterm::event::{Event, EventStream, KeyEvent, KeyEventKind};
use futures::StreamExt; // provides `.next()` on `EventStream`
use tokio::sync::mpsc;

enum AppEvent {
    Key(KeyEvent),
    Tick,
    Render,
    Resize(u16, u16),
}

async fn event_loop(tx: mpsc::UnboundedSender<AppEvent>) {
    let mut reader = EventStream::new();
    let mut tick_interval = tokio::time::interval(Duration::from_millis(250));
    let mut render_interval = tokio::time::interval(Duration::from_millis(16)); // ~60fps
    loop {
        tokio::select! {
            _ = tick_interval.tick() => { tx.send(AppEvent::Tick).ok(); }
            _ = render_interval.tick() => { tx.send(AppEvent::Render).ok(); }
            Some(Ok(event)) = reader.next() => {
                match event {
                    Event::Key(key) if key.kind == KeyEventKind::Press => {
                        tx.send(AppEvent::Key(key)).ok();
                    }
                    Event::Resize(w, h) => {
                        tx.send(AppEvent::Resize(w, h)).ok();
                    }
                    _ => {}
                }
            }
        }
    }
}
```
**Required crossterm features for async:**
```toml
crossterm = { version = "0.28", features = ["event-stream"] }
tokio = { version = "1", features = ["full"] }
futures = "0.3"
```
**Recommendation for the regex debugger:** Use Architecture B. Regex compilation (especially
PCRE2 with JIT) and AST explanation generation should happen in a background task so the
UI stays responsive. Send results back via channels.
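The "compute in the background, send results back via channels" recommendation can be sketched as follows. To stay dependency-free, this illustration uses a `std` thread and `std::sync::mpsc` rather than Tokio, and a literal substring count stands in for real regex work. `Job`, `JobResult`, `spawn_worker`, and `accept` are hypothetical names. The generation counter is an assumption added here so the UI can discard results from outdated keystrokes.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread;

/// Hypothetical job: `generation` increases with every keystroke.
pub struct Job {
    pub generation: u64,
    pub pattern: String,
    pub text: String,
}

/// Hypothetical result tagged with the generation that produced it.
pub struct JobResult {
    pub generation: u64,
    pub match_count: usize,
}

/// Spawns a worker thread; the returned channel ends are kept by the UI loop.
pub fn spawn_worker() -> (Sender<Job>, Receiver<JobResult>) {
    let (job_tx, job_rx) = mpsc::channel::<Job>();
    let (res_tx, res_rx) = mpsc::channel::<JobResult>();
    thread::spawn(move || {
        // The loop ends when the UI drops its `Sender<Job>`.
        for job in job_rx {
            // Stand-in for real regex work: count literal occurrences.
            let match_count = if job.pattern.is_empty() {
                0
            } else {
                job.text.matches(job.pattern.as_str()).count()
            };
            let result = JobResult { generation: job.generation, match_count };
            if res_tx.send(result).is_err() {
                break; // UI side is gone
            }
        }
    });
    (job_tx, res_rx)
}

/// Keeps a result only if it belongs to the newest request.
pub fn accept(latest_generation: u64, result: &JobResult) -> bool {
    result.generation == latest_generation
}
```

With this shape the UI thread never waits on compilation: it bumps the generation on each keystroke, sends a job, and ignores any result whose generation is stale.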
### 1.4 Syntax Highlighting / Colored Text Spans
Ratatui's text rendering model is built on three composable types:
```
Span   ->   Line   ->   Text
 ^           ^           ^
string     Spans       Lines
chunk
```
**Core types:**
```rust
use ratatui::style::{Color, Modifier, Style, Stylize};
use ratatui::text::{Line, Span, Text};
use ratatui::widgets::{Block, Paragraph, Wrap};
// A Span is a styled string fragment
let plain = Span::raw("hello");
let styled = Span::styled("world", Style::default()
    .fg(Color::Red)
    .bg(Color::White)
    .add_modifier(Modifier::BOLD));
// A Line is a sequence of Spans (one terminal row)
let group_line = Line::from(vec![
    Span::styled("Group 1: ", Style::default().fg(Color::Yellow)),
    Span::styled("captured text", Style::default().fg(Color::Green).bg(Color::Black)),
]);
// Shorthand style API (`Stylize` trait, v0.26+)
let shorthand_line = Line::from(vec![
    "hello".red(),
    " ".into(),
    "world".red().bold(),
]);
// Text is a vec of Lines (multi-row content)
let text = Text::from(vec![
    Line::from(vec![plain, styled]),
    group_line,
    shorthand_line,
]);
// Render via Paragraph widget
let paragraph = Paragraph::new(text)
.block(Block::bordered().title("Matches"))
.wrap(Wrap { trim: true });
frame.render_widget(paragraph, area);
```
**For the regex debugger, this means:**
- Each match region in the test string becomes a colored `Span`
- Different capture groups get different `fg` colors
- Non-matching text is plain `Span::raw()`
- The explanation panel uses styled Spans for syntax highlighting of regex tokens
- Build a function: `fn highlight_matches(text: &str, matches: &[MatchResult]) -> Text`
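The core of such a `highlight_matches` function is splitting the test string into matched and unmatched runs. The sketch below shows only that step and uses no ratatui types; `MatchResult` and `Segment` are hypothetical shapes chosen for illustration. It assumes the matches are sorted, non-overlapping, and fall on UTF-8 character boundaries.

```rust
/// Hypothetical match record: byte range plus capture-group index (0 = whole match).
#[derive(Debug, Clone, PartialEq)]
pub struct MatchResult {
    pub start: usize,
    pub end: usize,
    pub group: usize,
}

/// A run of text that is either unmatched (`None`) or belongs to a group.
#[derive(Debug, PartialEq)]
pub struct Segment<'a> {
    pub text: &'a str,
    pub group: Option<usize>,
}

/// Splits `text` into alternating plain and matched segments.
/// Assumes `matches` is sorted by `start`, non-overlapping, and on char boundaries.
pub fn segment_matches<'a>(text: &'a str, matches: &[MatchResult]) -> Vec<Segment<'a>> {
    let mut segments = Vec::new();
    let mut pos = 0;
    for m in matches {
        if m.start < pos || m.end > text.len() || m.start >= m.end {
            continue; // skip overlapping, out-of-range, or empty matches
        }
        if m.start > pos {
            segments.push(Segment { text: &text[pos..m.start], group: None });
        }
        segments.push(Segment { text: &text[m.start..m.end], group: Some(m.group) });
        pos = m.end;
    }
    if pos < text.len() {
        segments.push(Segment { text: &text[pos..], group: None });
    }
    segments
}
```

Each `Segment` then maps directly onto a span: `Span::raw` for `group: None`, and `Span::styled` with a per-group color otherwise.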
**Available colors:**
- 16 ANSI colors: `Color::Red`, `Color::Green`, `Color::Blue`, etc.
- 256 indexed: `Color::Indexed(196)`
- True color RGB: `Color::Rgb(255, 100, 50)`
- Modifiers: `BOLD`, `ITALIC`, `UNDERLINED`, `REVERSED`, `DIM`, `CROSSED_OUT`
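One simple way to give capture groups distinct colors is to cycle through a fixed palette. The sketch below is illustrative only: `GROUP_PALETTE` and `group_color` are hypothetical names, and the RGB values are arbitrary. Each triple would be passed to `Color::Rgb(r, g, b)` when building spans.

```rust
/// Hypothetical palette as RGB triples; the values are illustrative only.
const GROUP_PALETTE: [(u8, u8, u8); 4] = [
    (255, 100, 50),
    (80, 200, 120),
    (90, 160, 255),
    (220, 180, 60),
];

/// Returns a stable color for a capture-group index, wrapping around the palette.
pub fn group_color(group_index: usize) -> (u8, u8, u8) {
    GROUP_PALETTE[group_index % GROUP_PALETTE.len()]
}
```

Wrapping with `%` means patterns with more groups than palette entries reuse colors instead of panicking.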
### 1.5 Multiple Panes / Panels Layout
Ratatui's `Layout` engine divides a `Rect` into sub-`Rect`s using constraints:
```rust
use ratatui::layout::{Constraint, Direction, Layout, Rect};
use ratatui::Frame;

fn render(frame: &mut Frame, app: &App) {
    let area = frame.area();
    // Top-level: vertical split into 3 rows
    let main_layout = Layout::new(Direction::Vertical, [
        Constraint::Length(3), // Row 1: regex input (fixed height)
        Constraint::Min(5),    // Row 2: main content (fills remaining)
        Constraint::Length(3), // Row 3: status bar (fixed height)
    ]).split(area);
    // Row 2: horizontal split into left and right panels
    let content_layout = Layout::new(Direction::Horizontal, [
        Constraint::Percentage(60), // Left: test string + matches
        Constraint::Percentage(40), // Right: explanation panel
    ]).split(main_layout[1]);
    // Left panel: further vertical split
    let left_layout = Layout::new(Direction::Vertical, [
        Constraint::Percentage(50), // Test string input
        Constraint::Percentage(50), // Match results
    ]).split(content_layout[0]);
    // Render widgets into each sub-area
    render_regex_input(frame, main_layout[0], app);
    render_test_string(frame, left_layout[0], app);
    render_match_results(frame, left_layout[1], app);
    render_explanation(frame, content_layout[1], app);
    render_status_bar(frame, main_layout[2], app);
}
```
**Constraint types and when to use each:**
| Constraint | Behavior | Typical use |
|---|---|---|
| `Length(n)` | Exact n rows/cols | Input bars, status bars |
| `Min(n)` | At least n, expands to fill | Main content areas |
| `Max(n)` | At most n | Sidebars that shouldn't dominate |
| `Percentage(p)` | p% of parent | Proportional panels |
| `Ratio(a, b)` | a/b of parent | Precise fractions |
| `Fill(weight)` | Fill remaining space proportionally | Flexible content |
**Responsive layout best practices:**
- Use `Min`/`Max` constraints rather than fixed `Length` for content panels
- Check `frame.area().width` and `frame.area().height` to switch between layouts
(e.g., stack panels vertically on narrow terminals)
- Use `Fill` for elements that should share remaining space
- Listen for `Event::Resize` to trigger re-layout
- Layout results are cached automatically; `Layout::init_cache(size)` only resizes that
  cache, which may help for unusually complex layouts
```rust
// Responsive: switch between horizontal and vertical layout
fn content_layout(area: Rect) -> (Rect, Rect) {
    if area.width >= 120 {
        // Wide terminal: side-by-side
        let chunks = Layout::new(Direction::Horizontal, [
            Constraint::Percentage(60),
            Constraint::Percentage(40),
        ]).split(area);
        (chunks[0], chunks[1])
    } else {
        // Narrow terminal: stacked
        let chunks = Layout::new(Direction::Vertical, [
            Constraint::Percentage(50),
            Constraint::Percentage(50),
        ]).split(area);
        (chunks[0], chunks[1])
    }
}
```
### 1.6 Useful Built-in Widgets
| Widget | Use in the regex debugger |
|---|---|
| `Paragraph` | Test string display, explanation text, match results |
| `Block` | Borders and titles around every panel |
| `Tabs` | Engine selector (Rust regex / PCRE2 / fancy-regex) |
| `List` | Match list, capture group list |
| `Table` | Capture group details (name, index, span, value) |
| `Scrollbar` | Long test strings, long explanation output |
| `Clear` | Clearing areas before re-rendering |
### 1.7 Key ratatui Dependencies
```toml
[dependencies]
ratatui = { version = "0.30", features = ["all-widgets"] }
crossterm = { version = "0.28", features = ["event-stream"] }
```
---
## 2. regex-syntax Crate: AST Parsing
### 2.1 Current Version and Status
- **Crate:** `regex-syntax` v0.8.9
- **Maintained by:** Andrew Gallant (BurntSushi), same author as ripgrep
- **Stability:** Mature, part of the official `rust-lang/regex` repository
- **API guarantees:** Non-recursive parsing and traversal (constant stack space)
- **Memory:** Heap usage proportional to original pattern string length in bytes
### 2.2 Parsing a Regex into an AST
```rust
use regex_syntax::ast::parse::Parser;
use regex_syntax::ast::Ast;
// Parse a regex pattern into an AST
let mut parser = Parser::new();
let ast: Ast = parser.parse(r"(\d{3})-(\d{4})").unwrap();
// The parser handles the full regex syntax including:
// - Character classes: [a-z], \d, \w, \p{Greek}
// - Quantifiers: *, +, ?, {n,m}
// - Groups: (...), (?:...), (?P<name>...)
// - Escape sequences: \n, \t, \x{1F600}
// - Flags: (?i), (?m), (?s), (?x)
// - Comments (in extended mode)
```
There are two levels of representation:
- **AST (`regex_syntax::ast::Ast`):** Faithful to the concrete syntax. Preserves
whitespace, comments, and exact notation used. Best for generating explanations
because it retains the user's original intent.
- **HIR (`regex_syntax::hir::Hir`):** High-Level Intermediate Representation. Simplified,
normalized form. Better for compilation/matching but loses syntactic information.
**For the explanation engine, use AST.** It preserves exactly what the user typed.
### 2.3 Complete AST Node Types
The `Ast` enum has 12 variants:
```rust
pub enum Ast {
    Empty(Box<Span>),                    // Empty regex; matches the empty string
    Flags(Box<SetFlags>),                // Inline flags: (?is)
    Literal(Box<Literal>),               // Single character: a, \n, \x41
    Dot(Box<Span>),                      // Any character: .
    Assertion(Box<Assertion>),           // Zero-width: ^, $, \b, \B
    ClassUnicode(Box<ClassUnicode>),     // Unicode class: \pL, \p{Greek}
    ClassPerl(Box<ClassPerl>),           // Perl class: \d, \w, \s, \D, \W, \S
    ClassBracketed(Box<ClassBracketed>), // Bracketed class: [a-z0-9]
    Repetition(Box<Repetition>),         // Quantifier: a*, b+, c?, d{3,5}
    Group(Box<Group>),                   // Group: (a), (?:a), (?P<n>a)
    Alternation(Box<Alternation>),       // Alternation: a|b|c
    Concat(Box<Concat>),                 // Concatenation: abc (implicit)
}
```
**Supporting types in detail:**
```
Literal
+-- span: Span
+-- kind: LiteralKind              -- how it was written (verbatim, escaped, hex, ...)
+-- c: char

Repetition
+-- span: Span
+-- op: RepetitionOp
|   +-- span: Span
|   +-- kind: RepetitionKind
|       +-- ZeroOrOne              -- ?
|       +-- ZeroOrMore             -- *
|       +-- OneOrMore              -- +
|       +-- Range(RepetitionRange)
|           +-- Exactly(u32)       -- {n}
|           +-- AtLeast(u32)       -- {n,}
|           +-- Bounded(u32, u32)  -- {n,m}
+-- greedy: bool
+-- ast: Box<Ast>                  (the sub-expression being repeated)

Group
+-- span: Span
+-- kind: GroupKind
|   +-- CaptureIndex(u32)                    -- (...)
|   +-- CaptureName { starts_with_p, name }  -- (?P<name>...) or (?<name>...)
|   |       name: CaptureName { span, name: String, index: u32 }
|   +-- NonCapturing(Flags)                  -- (?:...)
+-- ast: Box<Ast>                  (the group's content)

Assertion
+-- span: Span
+-- kind: AssertionKind
    +-- StartLine                  -- ^
    +-- EndLine                    -- $
    +-- StartText                  -- \A
    +-- EndText                    -- \z
    +-- WordBoundary               -- \b
    +-- NotWordBoundary            -- \B
    +-- WordBoundaryStart, WordBoundaryEnd, ...  -- \b{start}, \b{end}, \<, \> (0.8+)

ClassBracketed
+-- span: Span
+-- negated: bool
+-- kind: ClassSet
    +-- Item(ClassSetItem)
    +-- BinaryOp(ClassSetBinaryOp)

ClassSetItem (enum)
+-- Empty | Literal | Range | Ascii | Unicode | Perl | Bracketed | Union

ClassPerl
+-- span: Span
+-- kind: ClassPerlKind            -- Digit | Space | Word
+-- negated: bool

ClassUnicode
+-- span: Span
+-- negated: bool
+-- kind: ClassUnicodeKind
    +-- OneLetter(char)                 -- \pL
    +-- Named(String)                   -- \p{Greek}
    +-- NamedValue { op, name, value }  -- \p{Script=Greek}

Alternation
+-- span: Span
+-- asts: Vec<Ast>

Concat
+-- span: Span
+-- asts: Vec<Ast>

Flags / SetFlags
+-- items: Vec<FlagsItem>
    +-- kind: FlagsItemKind        -- Negation | Flag(Flag)
        Flag: CaseInsensitive | MultiLine | DotMatchesNewLine
              | SwapGreed | Unicode | IgnoreWhitespace | CRLF
```
### 2.4 Walking the AST with the Visitor Trait
The `Visitor` trait enables non-recursive depth-first traversal:
```rust
use regex_syntax::ast::{self, Ast, Visitor};
use regex_syntax::ast::parse::Parser;

struct ExplainVisitor {
    explanations: Vec<String>,
    depth: usize,
}

/// Node kinds that emit an explanation line and therefore add one depth level.
/// `visit_pre` and `visit_post` must agree on this set or indentation drifts.
fn is_explained(ast: &Ast) -> bool {
    matches!(
        ast,
        Ast::Literal(_)
            | Ast::Dot(_)
            | Ast::Assertion(_)
            | Ast::ClassPerl(_)
            | Ast::Repetition(_)
            | Ast::Group(_)
            | Ast::Alternation(_)
    )
}

impl Visitor for ExplainVisitor {
    type Output = Vec<String>;
    type Err = String;

    fn finish(self) -> Result<Self::Output, Self::Err> {
        Ok(self.explanations)
    }

    fn visit_pre(&mut self, ast: &Ast) -> Result<(), Self::Err> {
        // Concatenation is implicit, and the remaining kinds are not explained yet.
        if !is_explained(ast) {
            return Ok(());
        }
        let indent = "  ".repeat(self.depth);
        let explanation = match ast {
            Ast::Literal(lit) => {
                format!("{}Match the character '{}'", indent, lit.c)
            }
            Ast::Dot(_) => {
                format!("{}Match any single character", indent)
            }
            Ast::Assertion(a) => {
                let desc = match a.kind {
                    ast::AssertionKind::StartLine => "Assert position at start of line",
                    ast::AssertionKind::EndLine => "Assert position at end of line",
                    ast::AssertionKind::StartText => "Assert position at start of string",
                    ast::AssertionKind::EndText => "Assert position at end of string",
                    ast::AssertionKind::WordBoundary => "Assert position at a word boundary",
                    ast::AssertionKind::NotWordBoundary => "Assert position NOT at a word boundary",
                    // regex-syntax 0.8 adds \b{start}, \b{end}, \<, \> and the
                    // half-boundary variants; group them under one description.
                    _ => "Assert position at the start or end of a word",
                };
                format!("{}{}", indent, desc)
            }
            Ast::ClassPerl(cls) => {
                let (name, desc) = match cls.kind {
                    ast::ClassPerlKind::Digit => ("\\d", "digit [0-9]"),
                    ast::ClassPerlKind::Space => ("\\s", "whitespace"),
                    ast::ClassPerlKind::Word => ("\\w", "word character [a-zA-Z0-9_]"),
                };
                let neg = if cls.negated { "non-" } else { "" };
                format!("{}Match any {}{} ({})", indent, neg, desc, name)
            }
            Ast::Repetition(rep) => {
                let quantifier = match &rep.op.kind {
                    ast::RepetitionKind::ZeroOrOne => "zero or one time".to_string(),
                    ast::RepetitionKind::ZeroOrMore => "zero or more times".to_string(),
                    ast::RepetitionKind::OneOrMore => "one or more times".to_string(),
                    ast::RepetitionKind::Range(range) => match range {
                        ast::RepetitionRange::Exactly(n) => format!("exactly {} time(s)", n),
                        ast::RepetitionRange::AtLeast(n) => format!("{} or more times", n),
                        ast::RepetitionRange::Bounded(n, m) => {
                            format!("between {} and {} times", n, m)
                        }
                    },
                };
                let greedy = if rep.greedy { "" } else { " (lazy)" };
                format!("{}Repeat {}{}", indent, quantifier, greedy)
            }
            Ast::Group(group) => {
                let kind = match &group.kind {
                    ast::GroupKind::CaptureIndex(i) => format!("Capturing group #{}", i),
                    ast::GroupKind::CaptureName { name, .. } => {
                        format!("Named capturing group '{}'", name.name)
                    }
                    ast::GroupKind::NonCapturing(_) => "Non-capturing group".to_string(),
                };
                format!("{}{}", indent, kind)
            }
            Ast::Alternation(_) => {
                format!("{}Match one of the following alternatives:", indent)
            }
            // Unreachable as long as this match covers everything in `is_explained`.
            _ => return Ok(()),
        };
        self.explanations.push(explanation);
        self.depth += 1;
        Ok(())
    }

    fn visit_post(&mut self, ast: &Ast) -> Result<(), Self::Err> {
        // Only nodes that increased the depth in `visit_pre` decrease it here.
        if is_explained(ast) {
            self.depth = self.depth.saturating_sub(1);
        }
        Ok(())
    }

    fn visit_alternation_in(&mut self) -> Result<(), Self::Err> {
        let indent = "  ".repeat(self.depth);
        self.explanations.push(format!("{}-- OR --", indent));
        Ok(())
    }
}

// Usage:
fn explain_regex(pattern: &str) -> Result<Vec<String>, String> {
    let mut parser = Parser::new();
    let ast = parser.parse(pattern).map_err(|e| e.to_string())?;
    let visitor = ExplainVisitor {
        explanations: Vec::new(),
        depth: 0,
    };
    ast::visit(&ast, visitor)
}
```
**Key Visitor trait methods:**
| Method | When called | Use in the explanation engine |
|---|---|---|
| `start()` | Before traversal begins | Initialize state |
| `visit_pre(&Ast)` | Before descending into children | Generate explanation for current node |
| `visit_post(&Ast)` | After all children processed | Close brackets, decrease depth |
| `visit_alternation_in()` | Between alternation branches | Insert "OR" separator |
| `visit_concat_in()` | Between concatenated elements | (Rarely needed) |
| `visit_class_set_item_pre/post()` | Around class set items | Explain character ranges |
| `visit_class_set_binary_op_pre/in/post()` | Around set operations | Explain intersection/subtraction |
| `finish()` | After traversal completes | Return accumulated result |
**Critical design note:** The `visit()` function runs in constant stack space. It drives
the traversal iteratively with an explicit stack on the heap and calls the `Visitor`
methods as callbacks. This matters because user-provided regex patterns can be deeply
nested, and a recursive traversal would risk stack overflow.
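The explicit-stack technique itself is easy to show on a toy tree. The sketch below is not the `regex-syntax` implementation; `Node`, `Frame`, and `walk` are hypothetical names, and the trace strings exist only to make the pre/post ordering visible.

```rust
/// Toy expression tree standing in for `Ast` (hypothetical type).
pub enum Node {
    Leaf(char),
    Seq(Vec<Node>),
}

/// Work items on the explicit stack: enter a node, or leave a sequence.
enum Frame<'a> {
    Enter(&'a Node),
    Exit,
}

/// Depth-first walk without recursion; returns a trace of pre/post events.
pub fn walk(root: &Node) -> Vec<String> {
    let mut trace = Vec::new();
    let mut stack = vec![Frame::Enter(root)];
    while let Some(frame) = stack.pop() {
        match frame {
            Frame::Enter(Node::Leaf(c)) => trace.push(format!("leaf {c}")),
            Frame::Enter(Node::Seq(children)) => {
                trace.push("pre seq".to_string());
                // Schedule the post-visit first so it runs after all children.
                stack.push(Frame::Exit);
                // Push children in reverse so they pop in source order.
                for child in children.iter().rev() {
                    stack.push(Frame::Enter(child));
                }
            }
            Frame::Exit => trace.push("post seq".to_string()),
        }
    }
    trace
}
```

The call stack depth stays constant no matter how deeply the tree nests; only the heap-allocated `Vec` grows.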
### 2.5 AST vs HIR: When to Use Which
| Aspect | AST | HIR |
|---|---|---|
| Preserves syntax | Yes -- `\d` stays as `\d` | No -- `\d` is expanded into explicit character ranges |
| Comments/whitespace | Preserved | Stripped |
| Flags inline | Preserved as nodes | Applied/resolved |
| Use case | Explanation engine | Compilation, optimization |
| Module | `regex_syntax::ast` | `regex_syntax::hir` |
**For this project, use the AST for the explanation engine.** The HIR is not needed,
because matching is delegated to the regex or pcre2 crates, which handle their own
compilation.
---
## 3. PCRE2 FFI from Rust
### 3.1 Crate Status
**Primary crate:** `pcre2` v0.2.11
- Author: Andrew Gallant (BurntSushi) -- same author as ripgrep and regex
- Repository: https://github.com/BurntSushi/rust-pcre2
- Status: Functional but lightly maintained. PRs welcome but maintainer bandwidth is limited.
- Safety: Safe Rust wrapper around unsafe FFI calls
**FFI crate:** `pcre2-sys` (automatically pulled in)
- Build script looks for system `libpcre2` by default
- If `PCRE2_SYS_STATIC` env var is set, builds `libpcre2.a` from bundled sources
- Bundling sources means no system PCRE2 dependency needed for distribution
**API design:** The `pcre2` crate intentionally mirrors the `regex` crate API, making it
possible to write engine-agnostic code with a trait abstraction.
### 3.2 Compilation and Matching
```rust
use pcre2::bytes::Regex;

fn demo() -> Result<(), Box<dyn std::error::Error>> {
    // Basic compilation and matching
    let re = Regex::new(r"(\d{3})-(\d{4})")?;
    let text = b"Call 555-1234 or 555-5678";

    // Check if pattern matches
    assert!(re.is_match(text)?);

    // Find first match
    if let Some(m) = re.find(text)? {
        println!("Match at {}..{}", m.start(), m.end());
    }

    // Capture groups: indexing `Captures` yields the matched bytes
    if let Some(caps) = re.captures(text)? {
        println!("Full match: {:?}", String::from_utf8_lossy(&caps[0]));
        println!("Area code: {:?}", String::from_utf8_lossy(&caps[1]));
        println!("Number: {:?}", String::from_utf8_lossy(&caps[2]));
    }

    // Iterate all matches
    for caps in re.captures_iter(text) {
        let caps = caps?;
        println!("Found: {:?}", String::from_utf8_lossy(&caps[0]));
    }

    // Named captures
    let re = Regex::new(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")?;
    if let Some(caps) = re.captures(b"2025-03-15")? {
        let year = std::str::from_utf8(caps.name("year").unwrap().as_bytes())?;
        println!("Year: {year}");
    }
    Ok(())
}
```
### 3.3 RegexBuilder Configuration
```rust
use pcre2::bytes::RegexBuilder;
let re = RegexBuilder::new()
    .caseless(true)         // Case-insensitive matching
    .dotall(true)           // . matches newlines
    .multi_line(true)       // ^ and $ match line boundaries
    .extended(true)         // Ignore whitespace in pattern (verbose mode)
    .utf(true)              // Enable UTF-8 mode
    .ucp(true)              // Unicode character properties for \w, \d, \s
    .jit_if_available(true) // Enable JIT compilation (falls back gracefully)
    .crlf(true)             // \r\n line endings
    .build(r"pattern here")?;
```
### 3.4 What PCRE2 Supports That Rust regex Does Not
This is a critical differentiator for the regex debugger. The ability to test patterns
with PCRE2 features is a major selling point.
| Feature | Rust `regex` | PCRE2 | `fancy-regex` |
|---|---|---|---|
| Backreferences `\1` | No | Yes | Yes |
| Positive lookahead `(?=...)` | No | Yes | Yes |
| Negative lookahead `(?!...)` | No | Yes | Yes |
| Positive lookbehind `(?<=...)` | No | Yes | Yes |
| Negative lookbehind `(?<!...)` | No | Yes | Yes |
| Atomic groups `(?>...)` | No | Yes | Yes |
| Possessive quantifiers `a++` | No | Yes | Yes |
| Conditional patterns `(?(cond)yes|no)` | No | Yes | Yes |
| Recursive patterns `(?R)` | No | Yes | No |
| Subroutine calls `(?&name)` | No | Yes | No |
| `\K` (reset match start) | No | Yes | Yes |
| Named backrefs `\k<name>` | No | Yes | Yes |
| Callouts | No | Yes | No |
| Unicode properties `\p{...}` | Yes | Yes | Yes |
| Guaranteed linear time | Yes | No | Partial |
| No C dependency | Yes | No | Yes |
### 3.5 fancy-regex as an Alternative/Supplement
**Crate:** `fancy-regex` -- pure Rust, no FFI
- Delegates to the Rust `regex` crate for patterns that don't need fancy features
- Falls back to a backtracking VM only for lookaround/backreferences
- Does NOT support: recursive patterns, subroutine calls, callouts
- Current releases document possessive quantifiers, conditionals, and `\K`; confirm
  against the crate documentation for the version you pin, since coverage has grown
  across releases
**Recommendation:** Support all three engines:
1. **Rust `regex`** -- default, fastest, guaranteed linear time
2. **`fancy-regex`** -- adds lookaround/backreferences without C dependency
3. **`pcre2`** -- full PCRE2 feature set, optional (requires C library)
This gives users a choice and avoids the PCRE2 C dependency for basic use cases.
### 3.6 Engine Abstraction Layer
```rust
/// Unified trait for regex engines.
/// `EngineError` is the unified error type from `engine/error.rs`.
pub trait RegexEngine: Send + Sync {
    fn name(&self) -> &str;
    /// Returns a boxed trait object so each engine can return its own compiled type.
    fn compile(
        &self,
        pattern: &str,
        flags: &EngineFlags,
    ) -> Result<Box<dyn CompiledRegex>, EngineError>;
}

pub trait CompiledRegex: Send + Sync {
    fn is_match(&self, text: &str) -> Result<bool, EngineError>;
    fn find_all(&self, text: &str) -> Result<Vec<Match>, EngineError>;
    fn captures_all(&self, text: &str) -> Result<Vec<CaptureGroup>, EngineError>;
}

#[derive(Debug, Clone, PartialEq)]
pub struct Match {
    pub start: usize,
    pub end: usize,
    pub text: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct CaptureGroup {
    pub index: usize,
    pub name: Option<String>,
    pub start: usize,
    pub end: usize,
    pub text: String,
}

// `Default` is required by the tests, which call `EngineFlags::default()`.
#[derive(Debug, Clone, Default)]
pub struct EngineFlags {
    pub case_insensitive: bool,
    pub multi_line: bool,
    pub dot_matches_newline: bool,
    pub unicode: bool,
    pub extended: bool,
}
```
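To show how an engine might plug into this abstraction, here is a trimmed-down sketch with a stand-in engine that treats the pattern as a literal string. It is not one of the three real engines. `LiteralEngine` and `LiteralPattern` are hypothetical, the traits are reduced to one method each, the flags parameter is omitted, and `EngineError` is assumed to be a simple string wrapper. `compile` returns `Box<dyn CompiledRegex>` because a trait cannot be returned by value.

```rust
/// Error type assumed for this sketch only.
#[derive(Debug)]
pub struct EngineError(pub String);

#[derive(Debug, PartialEq)]
pub struct Match {
    pub start: usize,
    pub end: usize,
    pub text: String,
}

pub trait CompiledRegex: Send + Sync {
    fn find_all(&self, text: &str) -> Result<Vec<Match>, EngineError>;
}

pub trait RegexEngine: Send + Sync {
    fn name(&self) -> &str;
    fn compile(&self, pattern: &str) -> Result<Box<dyn CompiledRegex>, EngineError>;
}

/// Hypothetical stand-in engine that treats the pattern as a literal string.
pub struct LiteralEngine;

struct LiteralPattern(String);

impl RegexEngine for LiteralEngine {
    fn name(&self) -> &str {
        "literal"
    }

    fn compile(&self, pattern: &str) -> Result<Box<dyn CompiledRegex>, EngineError> {
        if pattern.is_empty() {
            return Err(EngineError("empty pattern".to_string()));
        }
        Ok(Box::new(LiteralPattern(pattern.to_string())))
    }
}

impl CompiledRegex for LiteralPattern {
    fn find_all(&self, text: &str) -> Result<Vec<Match>, EngineError> {
        Ok(text
            .match_indices(self.0.as_str())
            .map(|(start, hit)| Match {
                start,
                end: start + hit.len(),
                text: hit.to_string(),
            })
            .collect())
    }
}
```

Because callers hold only `Box<dyn RegexEngine>`, switching engines at runtime is a matter of swapping one boxed value.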
---
## 4. Cross-Compilation Considerations
### 4.1 Pure Rust Components (No Issues)
The following crates are pure Rust and cross-compile trivially:
- `ratatui` + `crossterm` -- cross-platform by design (Linux, macOS, Windows)
- `regex` + `regex-syntax` -- pure Rust, no system dependencies
- `fancy-regex` -- pure Rust
Cross-compilation for these requires only adding the target:
```bash
rustup target add x86_64-unknown-linux-gnu
rustup target add x86_64-apple-darwin
rustup target add x86_64-pc-windows-msvc
rustup target add aarch64-unknown-linux-gnu
rustup target add aarch64-apple-darwin
```
### 4.2 PCRE2 Cross-Compilation Challenges
PCRE2 is a C library. The `pcre2-sys` crate handles this in two ways:
**Option A: System PCRE2 (default)**
- Requires `libpcre2-8` installed on the build system
- Harder for cross-compilation (need target-arch PCRE2 libraries)
- Package managers: `apt install libpcre2-dev`, `brew install pcre2`
**Option B: Static bundled build (recommended for distribution)**
```bash
# Build with bundled PCRE2 sources
PCRE2_SYS_STATIC=1 cargo build --release
```
- `pcre2-sys` bundles PCRE2 C source code and compiles it with `cc`
- Requires a C compiler for the target platform
- For cross-compilation, set `CC` and `CFLAGS`:
```bash
CC=x86_64-linux-gnu-gcc PCRE2_SYS_STATIC=1 cargo build \
--target x86_64-unknown-linux-gnu --release
```
**PCRE2 cross-compilation gotchas:**
1. PCRE2's `pcre2_dftables` utility must run on the build host, not the target.
The `pcre2-sys` build script handles this correctly for static builds.
2. musl targets: The build script used to assume static linking for all musl targets.
Set `PCRE2_SYS_STATIC=0` to explicitly disable if linking to system PCRE2 on musl.
3. Windows: Statically linking against non-DLL PCRE2 requires defining `PCRE2_STATIC`
before including `pcre2.h`. The `pcre2-sys` crate handles this automatically.
### 4.3 Recommended Build Strategy
**Make PCRE2 optional via a Cargo feature:**
```toml
[features]
default = ["pcre2-engine"]
pcre2-engine = ["pcre2"]
[dependencies]
pcre2 = { version = "0.2", optional = true }
```
This way:
- Default builds include PCRE2 support (most users want it)
- `cargo build --no-default-features` builds without any C dependency
- CI can test both configurations
- Distribution binaries use static PCRE2: `PCRE2_SYS_STATIC=1 cargo build --release`
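On the code side, the feature gate might surface as a list of engines compiled into the binary, for example to populate the engine selector. This is a minimal sketch: `available_engines` is a hypothetical helper, and it assumes the feature is named `pcre2-engine` as in the manifest above.

```rust
/// Lists engine names compiled into this build.
/// Assumes the Cargo feature is called `pcre2-engine`.
pub fn available_engines() -> Vec<&'static str> {
    let mut engines = vec!["regex", "fancy-regex"];
    // `cfg!` evaluates to a compile-time boolean, so both branches type-check.
    if cfg!(feature = "pcre2-engine") {
        engines.push("pcre2");
    }
    engines
}
```

Code that names PCRE2 types directly would still need `#[cfg(feature = "pcre2-engine")]` on the item, since `cfg!` does not remove code from compilation.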
### 4.4 CI/CD Cross-Platform Build Matrix
```yaml
# GitHub Actions example (job-level keys)
strategy:
  matrix:
    include:
      - target: x86_64-unknown-linux-gnu
        os: ubuntu-latest
      - target: aarch64-unknown-linux-gnu
        os: ubuntu-latest
        use_cross: true
      - target: x86_64-apple-darwin
        os: macos-latest
      - target: aarch64-apple-darwin
        os: macos-latest
      - target: x86_64-pc-windows-msvc
        os: windows-latest
env:
  PCRE2_SYS_STATIC: 1
steps:
  - uses: actions/checkout@v4
  - uses: dtolnay/rust-toolchain@stable
    with:
      targets: ${{ matrix.target }}
  # Install and use `cross` only for targets that need a foreign C toolchain
  - if: matrix.use_cross
    run: cargo install cross --locked
  - if: matrix.use_cross
    run: cross build --release --target ${{ matrix.target }}
  - if: ${{ !matrix.use_cross }}
    run: cargo build --release --target ${{ matrix.target }}
```
**Tool recommendation:** Use `cross` (https://github.com/cross-rs/cross) for
Linux cross-compilation. It provides Docker containers with pre-configured
cross-compilation toolchains including C compilers, which handles the PCRE2
C compilation automatically.
### 4.5 Terminal Backend Compatibility
Crossterm provides cross-platform terminal manipulation:
- **Linux/macOS:** Uses termios/POSIX APIs. Works in all standard terminals.
- **Windows:** Uses Windows Console API. Works in cmd.exe, PowerShell, and
Windows Terminal. Supports Windows 7+ (legacy console) and Windows 10+
(virtual terminal sequences).
- **SSH sessions:** Work correctly because crossterm operates on stdin/stdout.
No platform-specific code is needed in the application layer.
---
## 5. Project Structure
### 5.1 Recommended Module Layout
```
regex-tui/
+-- Cargo.toml
+-- build.rs # (only if needed for PCRE2 build customization)
+-- src/
| +-- main.rs # Entry point: CLI arg parsing, terminal setup, run loop
| +-- app.rs # App state struct, mode management, event dispatch
| +-- lib.rs # Library root (for testability)
|
| +-- ui/
| | +-- mod.rs # Re-exports, top-level render() function
| | +-- layout.rs # Layout computation, responsive breakpoints
| | +-- regex_input.rs # Regex pattern input widget
| | +-- test_input.rs # Test string input widget
| | +-- match_display.rs # Match results display with highlighting
| | +-- explanation.rs # Regex explanation panel
| | +-- status_bar.rs # Status bar (engine, match count, errors)
| | +-- engine_selector.rs # Engine tab selector widget
| | +-- help.rs # Help overlay / keybinding reference
| | +-- theme.rs # Color palette, style definitions
| |
| +-- engine/
| | +-- mod.rs # RegexEngine trait, EngineFlags, Match, CaptureGroup
| | +-- rust_regex.rs # Rust regex crate implementation
| | +-- fancy.rs # fancy-regex implementation
| | +-- pcre2.rs # PCRE2 implementation (behind feature gate)
| | +-- error.rs # Unified error types across engines
| |
| +-- explain/
| | +-- mod.rs # Public explain() API
| | +-- visitor.rs # AST Visitor implementation
| | +-- formatter.rs # Plain-English formatting of AST nodes
| | +-- tree.rs # Tree-structured explanation model
| |
| +-- highlight/
| | +-- mod.rs # Match highlighting logic
| | +-- colors.rs # Color assignment for capture groups
| | +-- spans.rs # Convert matches to ratatui Spans/Lines
| |
| +-- input/
| | +-- mod.rs # Input handling, cursor management
| | +-- editor.rs # Text editor state (cursor pos, selection, scroll)
| | +-- keybindings.rs # Keybinding configuration
| |
| +-- config/
| | +-- mod.rs # Configuration loading/saving
| | +-- cli.rs # CLI argument definitions (clap)
| | +-- settings.rs # Persistent settings (default engine, theme, etc.)
| |
| +-- event.rs # Event types, event loop (async channel pattern)
|
+-- tests/
| +-- integration/
| | +-- engine_tests.rs # Cross-engine match result comparison
| | +-- explain_tests.rs # Explanation output regression tests
| | +-- ui_tests.rs # Snapshot tests using TestBackend
| |
| +-- snapshots/ # insta snapshot files
| +-- ...
|
+-- benches/
| +-- engine_bench.rs # Benchmarks for regex compilation + matching
| +-- render_bench.rs # Benchmarks for UI rendering
|
+-- assets/
    +-- demo.gif # For README
    +-- screenshots/ # For README
```
### 5.2 Separation of Concerns
```
+-------------------------------------------------------------------+
| main.rs |
| - Parse CLI args (clap) |
| - Initialize terminal (crossterm raw mode, alternate screen) |
| - Create App, run event loop |
| - Restore terminal on exit |
+-------------------------------------------------------------------+
|
v
+-------------------------------------------------------------------+
| app.rs |
| struct App { |
| mode: AppMode, // Normal, EditingRegex, EditingTest |
| regex_input: Editor, // Current regex pattern text |
| test_input: Editor, // Current test string text |
| engine: Box<dyn RegexEngine>, |
| matches: Vec<CaptureGroup>, // Latest match results |
| explanation: Vec<ExplainNode>, // Latest explanation |
| error: Option<String>, // Compilation error message |
| focus: PanelFocus, // Which panel has focus |
| } |
| |
| - Dispatches key events to the correct handler |
| - On regex/test string change: recompiles, re-matches, re-explains|
| - Owns all mutable state |
+-------------------------------------------------------------------+
|
+--------------------+--------------------+
| | |
v v v
+----------------+ +------------------+ +------------------+
| engine/ | | explain/ | | ui/ |
| | | | | |
| - Compiles | | - Parses regex | | - Reads App |
| pattern | | into AST | | state (immut) |
| - Runs matches | | - Walks AST | | - Builds Spans, |
| - Returns | | - Generates | | Lines, widgets |
| captures | | explanation | | - Renders into |
| - Reports | | tree | | Frame |
| errors | | | | |
+----------------+ +------------------+ +------------------+
```
**Data flow on every keystroke:**
```
User types char
|
v
Event loop receives KeyEvent
|
v
App.handle_key() updates Editor (regex_input or test_input)
|
v
App.recompute() is called:
|
+-- engine.compile(pattern, flags) -> Result<CompiledRegex, Error>
| |
| +-- On error: store error message, clear matches
| +-- On success: compiled.captures_all(test_string) -> matches
|
+-- explain::explain(pattern) -> explanation tree
|
v
Next frame: terminal.draw(|frame| ui::render(frame, &app))
|
+-- ui renders regex_input with syntax-highlighted pattern
+-- ui renders test_input with colored match spans
+-- ui renders explanation tree
+-- ui renders match table / capture groups
+-- ui renders error (if any) in status bar
```
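The compile-then-match branch of this flow can be sketched in plain Rust. In the illustration below, `toy_compile` is a hypothetical stand-in for `engine.compile`: it rejects unbalanced parentheses and otherwise searches for the pattern as a literal. The `App` fields are a reduced, assumed subset of the real struct.

```rust
/// Hypothetical slice of app state touched by recomputation.
#[derive(Default)]
pub struct App {
    pub pattern: String,
    pub test_text: String,
    pub matches: Vec<(usize, usize)>,
    pub error: Option<String>,
}

/// Stand-in for `engine.compile`: rejects unbalanced parentheses, otherwise
/// returns a matcher that searches for the pattern as a literal.
fn toy_compile(pattern: &str) -> Result<impl Fn(&str) -> Vec<(usize, usize)>, String> {
    let open = pattern.matches('(').count();
    let close = pattern.matches(')').count();
    if open != close {
        return Err(format!("unbalanced parentheses in '{pattern}'"));
    }
    let needle = pattern.to_string();
    Ok(move |text: &str| -> Vec<(usize, usize)> {
        if needle.is_empty() {
            return Vec::new();
        }
        text.match_indices(needle.as_str())
            .map(|(start, hit)| (start, start + hit.len()))
            .collect()
    })
}

impl App {
    /// Mirrors the flow above: compile, then either store the error and
    /// clear stale matches, or run the matcher and clear the error.
    pub fn recompute(&mut self) {
        match toy_compile(&self.pattern) {
            Ok(matcher) => {
                self.matches = matcher(&self.test_text);
                self.error = None;
            }
            Err(message) => {
                self.matches.clear();
                self.error = Some(message);
            }
        }
    }
}
```

Clearing `matches` on error keeps the UI from highlighting results that belong to an earlier, valid pattern.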
### 5.3 Testing Strategies
**Unit Tests (per module, in-file):**
```rust
// engine/rust_regex.rs
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_basic_match() {
        let engine = RustRegexEngine;
        let compiled = engine.compile(r"\d+", &EngineFlags::default()).unwrap();
        let matches = compiled.find_all("abc 123 def 456").unwrap();
        assert_eq!(matches.len(), 2);
        assert_eq!(matches[0].text, "123");
        assert_eq!(matches[1].text, "456");
    }

    #[test]
    fn test_invalid_pattern() {
        let engine = RustRegexEngine;
        assert!(engine.compile(r"(unclosed", &EngineFlags::default()).is_err());
    }
}
```
**Explanation Regression Tests:**
```rust
// explain/mod.rs
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_explain_simple_pattern() {
        let explanation = explain(r"\d{3}-\d{4}").unwrap();
        // Use insta for snapshot testing
        insta::assert_snapshot!(format_explanation(&explanation));
    }

    #[test]
    fn test_explain_complex_groups() {
        let explanation = explain(r"(?P<area>\d{3})-(?P<number>\d{4})").unwrap();
        insta::assert_snapshot!(format_explanation(&explanation));
    }
}
```
**UI Snapshot Tests with TestBackend:**
```rust
// tests/integration/ui_tests.rs
// `App`, `ui`, and `buffer_to_string` are project-local items, not part of ratatui.
use ratatui::backend::TestBackend;
use ratatui::Terminal;

#[test]
fn test_main_layout_renders() {
    // Fixed 80x24 headless terminal so snapshots are deterministic.
    let backend = TestBackend::new(80, 24);
    let mut terminal = Terminal::new(backend).unwrap();
    let app = App::new_with_test_data(
        r"\d+",
        "hello 123 world 456",
    );
    terminal.draw(|frame| {
        ui::render(frame, &app);
    }).unwrap();
    // Snapshot test the rendered buffer
    let buffer = terminal.backend().buffer().clone();
    insta::assert_snapshot!(buffer_to_string(&buffer));
}
```
**Cross-Engine Consistency Tests:**
```rust
// tests/integration/engine_tests.rs
#[test]
fn test_engines_agree_on_basic_patterns() {
    let patterns = vec![
        (r"\d+", "abc 123"),
        (r"[a-z]+", "Hello World"),
        (r"(\w+)\s(\w+)", "hello world"),
    ];
    // `mut` is only used when the PCRE2 feature is enabled.
    #[allow(unused_mut)]
    let mut engines: Vec<Box<dyn RegexEngine>> = vec![
        Box::new(RustRegexEngine),
        Box::new(FancyRegexEngine),
    ];
    // `#[cfg]` cannot strip an element inside `vec![...]` (the build fails
    // when the feature is off), so the optional engine is pushed separately.
    #[cfg(feature = "pcre2-engine")]
    engines.push(Box::new(Pcre2Engine));

    for (pattern, text) in patterns {
        let results: Vec<_> = engines.iter()
            .map(|e| e.compile(pattern, &EngineFlags::default())
                .unwrap()
                .find_all(text)
                .unwrap())
            .collect();
        // All engines should produce the same number of matches for basic patterns
        for i in 1..results.len() {
            assert_eq!(
                results[0].len(), results[i].len(),
                "Engine mismatch on pattern '{}': {} vs {}",
                pattern, engines[0].name(), engines[i].name()
            );
        }
    }
}
```
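The consistency test above compares only match counts, so two engines that each find two matches at different offsets would still pass. A stricter check could compare spans and report the first point of disagreement. The sketch below assumes a simplified `MatchSpan` with byte offsets only; `Divergence` and `first_divergence` are hypothetical names, not part of any engine crate.

```rust
/// Byte range of one match (simplified; a real type may also carry captures).
#[derive(Debug, Clone, PartialEq, Eq)]
struct MatchSpan {
    start: usize,
    end: usize,
}

/// Describes where two engines' results first disagree.
#[derive(Debug, PartialEq, Eq)]
enum Divergence {
    /// Both engines produced a match at `index`, but the spans differ.
    SpanMismatch { index: usize, left: MatchSpan, right: MatchSpan },
    /// All shared positions agree, but one engine found more matches.
    CountMismatch { left: usize, right: usize },
}

/// Returns the first disagreement between two match lists, if any.
fn first_divergence(left: &[MatchSpan], right: &[MatchSpan]) -> Option<Divergence> {
    for (index, (l, r)) in left.iter().zip(right.iter()).enumerate() {
        if l != r {
            return Some(Divergence::SpanMismatch {
                index,
                left: l.clone(),
                right: r.clone(),
            });
        }
    }
    if left.len() != right.len() {
        return Some(Divergence::CountMismatch {
            left: left.len(),
            right: right.len(),
        });
    }
    None
}
```

In a test, `assert_eq!(first_divergence(&a, &b), None)` yields a failure message that names the offending match index. The same helper could feed the engine comparison mode planned in Phase 3.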
**Testing tools:**
- `insta` -- snapshot testing (https://crates.io/crates/insta)
- `ratatui::backend::TestBackend` -- headless terminal backend for unit tests
- `ratatui-testlib` -- PTY-based integration testing for end-to-end TUI tests
- `proptest` / `quickcheck` -- property-based testing for regex engine edge cases
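
Property-based tests need properties to assert. One candidate set, sketched below under the assumption that every engine reports matches as sorted, non-overlapping byte ranges, is that each span stays in bounds, falls on UTF-8 character boundaries, and agrees with the text it claims to cover. `check_match_invariants` is a hypothetical helper; the `proptest` wiring that would generate patterns and haystacks is omitted.

```rust
/// One match as reported by an engine (assumed shape).
#[derive(Debug, Clone)]
struct MatchSpan {
    start: usize,
    end: usize,
    text: String,
}

/// Checks structural invariants that should hold for any engine's output.
/// Returns a description of the first violation found.
fn check_match_invariants(haystack: &str, matches: &[MatchSpan]) -> Result<(), String> {
    let mut previous_end = 0;
    for (i, m) in matches.iter().enumerate() {
        if m.start > m.end || m.end > haystack.len() {
            return Err(format!("match {}: span {}..{} is out of bounds", i, m.start, m.end));
        }
        if !haystack.is_char_boundary(m.start) || !haystack.is_char_boundary(m.end) {
            return Err(format!("match {}: span splits a UTF-8 character", i));
        }
        if m.start < previous_end {
            return Err(format!("match {}: overlaps or precedes the previous match", i));
        }
        if &haystack[m.start..m.end] != m.text.as_str() {
            return Err(format!("match {}: text does not equal the spanned slice", i));
        }
        previous_end = m.end;
    }
    Ok(())
}
```

The character-boundary check matters most for engines that work on bytes, where an offset can land inside a multi-byte character and make slicing the test string panic during rendering.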
---
## 6. Dependency Summary
### 6.1 Cargo.toml
```toml
[package]
name = "regex-tui"
version = "0.1.0"
edition = "2021"
description = "regex101 for the terminal - an interactive regex debugger TUI"
license = "MIT OR Apache-2.0"
[features]
default = ["pcre2-engine"]
pcre2-engine = ["dep:pcre2"]
[dependencies]
# TUI framework
ratatui = { version = "0.30", features = ["all-widgets"] }
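# Keep in sync with the crossterm version ratatui is built against
# (ratatui also re-exports it as `ratatui::crossterm`).
crossterm = { version = "0.28", features = ["event-stream"] }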
# Async runtime (for non-blocking input and background regex work)
tokio = { version = "1", features = ["full"] }
futures = "0.3"
# Regex engines
regex = "1"
regex-syntax = "0.8" # AST parsing for explanations
fancy-regex = "0.14" # Lookaround + backreference support (pure Rust)
pcre2 = { version = "0.2", optional = true } # Full PCRE2 (requires C lib)
# CLI and config
clap = { version = "4", features = ["derive"] }
serde = { version = "1", features = ["derive"] }
toml = "0.8" # Config file format
directories = "5" # XDG/platform config dirs
# Error handling
anyhow = "1"
thiserror = "2"
# Logging (debug builds)
tracing = "0.1"
tracing-subscriber = "0.3"
[dev-dependencies]
insta = { version = "1", features = ["yaml"] }
proptest = "1"
```
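The `pcre2-engine` feature above makes PCRE2 an optional dependency, so code that references it has to be gated on the same feature. A minimal sketch of that gating follows; the function and the engine name strings are illustrative assumptions, not the project's actual registry.

```rust
/// Lists the engines compiled into this build.
/// The name strings are placeholders for whatever `RegexEngine::name()` returns.
fn available_engines() -> Vec<&'static str> {
    // `mut` is only needed when the optional engine is compiled in.
    #[allow(unused_mut)]
    let mut engines = vec!["rust-regex", "fancy-regex"];

    // Compiled only when the crate is built with the `pcre2-engine` feature.
    #[cfg(feature = "pcre2-engine")]
    engines.push("pcre2");

    engines
}
```

Because the feature is in `default`, a plain `cargo build` includes PCRE2, while `cargo build --no-default-features` produces a pure-Rust binary with no C library requirement.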
### 6.2 Version Summary
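| Crate | Version | Purpose | System Dependencies |
|-------|---------|---------|---------------------|
| `ratatui` | 0.30.0 | TUI framework | None |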
| `crossterm` | 0.28.x | Terminal backend | None |
| `regex` | 1.x | Rust regex engine | None |
| `regex-syntax` | 0.8.9 | Regex AST parsing | None |
| `fancy-regex` | 0.14.x | Extended regex (lookaround, backrefs) | None |
| `pcre2` | 0.2.11 | Full PCRE2 engine | libpcre2 (bundled option) |
| `clap` | 4.x | CLI argument parsing | None |
---
## 7. Implementation Roadmap
### Phase 1: Core MVP (Weeks 1-4)
- [ ] Project scaffold with module structure
- [ ] Basic TUI shell: regex input + test string input + results pane
- [ ] Rust `regex` engine integration with match highlighting
- [ ] Real-time recompilation on every keystroke
- [ ] Basic error display for invalid patterns
- [ ] Capture group coloring (distinct colors per group)
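
Match highlighting and per-group coloring both reduce to splitting the test string into plain and highlighted segments. The sketch below assumes the spans arrive as sorted, non-overlapping `(start, end, group)` byte ranges on character boundaries. `Segment`, `segment`, and the palette are hypothetical; in the UI layer each segment would map to a ratatui `Span` with a `Style`.

```rust
/// A slice of the test string, tagged with the capture group covering it (if any).
#[derive(Debug, PartialEq)]
struct Segment<'a> {
    text: &'a str,
    group: Option<usize>,
}

/// Splits `text` into alternating plain and highlighted segments.
/// `spans` must be sorted, non-overlapping (start, end, group) byte ranges.
fn segment<'a>(text: &'a str, spans: &[(usize, usize, usize)]) -> Vec<Segment<'a>> {
    let mut out = Vec::new();
    let mut cursor = 0;
    for &(start, end, group) in spans {
        // Emit the unhighlighted gap before this span, if there is one.
        if cursor < start {
            out.push(Segment { text: &text[cursor..start], group: None });
        }
        out.push(Segment { text: &text[start..end], group: Some(group) });
        cursor = end;
    }
    // Emit whatever trails the last span.
    if cursor < text.len() {
        out.push(Segment { text: &text[cursor..], group: None });
    }
    out
}

/// Placeholder color names; a real UI would use terminal color values.
const PALETTE: [&str; 4] = ["cyan", "yellow", "magenta", "green"];

/// Cycles through the palette so every group gets a color.
fn color_for_group(group: usize) -> &'static str {
    PALETTE[group % PALETTE.len()]
}
```

Cycling with a modulo means patterns with more groups than palette entries reuse colors, which is a simple trade-off that keeps the palette small.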
### Phase 2: Explanation Engine (Weeks 5-7)
- [ ] AST Visitor implementation covering all 12 node types
- [ ] Plain-English explanation panel
- [ ] Tree-structured explanation with indentation
- [ ] Syntax highlighting of the regex pattern itself (color-coded tokens)
### Phase 3: Multi-Engine Support (Weeks 8-10)
- [ ] Engine abstraction trait
- [ ] `fancy-regex` engine integration
- [ ] PCRE2 engine integration (behind feature flag)
- [ ] Engine selector tabs in UI
- [ ] Engine comparison mode (show where engines differ)
### Phase 4: Polish (Weeks 11-14)
- [ ] Responsive layout for different terminal sizes
- [ ] Scrollable panels for long content
- [ ] Keybinding help overlay
- [ ] Configuration file support (default engine, theme, etc.)
- [ ] Clipboard integration (copy regex, copy matches)
- [ ] History of recent patterns
### Phase 5: Distribution (Weeks 15-16)
- [ ] CI/CD pipeline for cross-platform builds
- [ ] `cargo install` readiness
- [ ] Homebrew formula
- [ ] README with demo GIFs
- [ ] Release on crates.io
---
## Sources
- [Ratatui Documentation](https://ratatui.rs/)
- [Ratatui GitHub](https://github.com/ratatui/ratatui)
- [ratatui crate docs.rs](https://docs.rs/ratatui/latest/ratatui/)
- [ratatui Layout](https://ratatui.rs/concepts/layout/)
- [ratatui Rendering](https://ratatui.rs/concepts/rendering/)
- [ratatui Async Event Stream](https://ratatui.rs/tutorials/counter-async-app/async-event-stream/)
- [ratatui Snapshot Testing](https://ratatui.rs/recipes/testing/snapshots/)
- [ratatui v0.30.0 Highlights](https://ratatui.rs/highlights/v030/)
- [regex-syntax AST enum](https://docs.rs/regex-syntax/latest/regex_syntax/ast/enum.Ast.html)
- [regex-syntax ast module](https://docs.rs/regex-syntax/latest/regex_syntax/ast/index.html)
- [regex-syntax Visitor trait](https://docs.rs/regex-syntax/latest/regex_syntax/ast/trait.Visitor.html)
- [regex-syntax visit function](https://docs.rs/regex-syntax/latest/regex_syntax/ast/fn.visit.html)
- [pcre2 Rust crate](https://docs.rs/pcre2/latest/pcre2/)
- [pcre2 Regex API](https://docs.rs/pcre2/latest/pcre2/bytes/struct.Regex.html)
- [pcre2 RegexBuilder API](https://docs.rs/pcre2/latest/pcre2/bytes/struct.RegexBuilder.html)
- [rust-pcre2 GitHub](https://github.com/BurntSushi/rust-pcre2)
- [fancy-regex crate](https://docs.rs/fancy-regex/latest/fancy_regex/)
- [fancy-regex GitHub](https://github.com/fancy-regex/fancy-regex)
- [crossterm crate](https://docs.rs/crossterm/)
- [crossterm GitHub](https://github.com/crossterm-rs/crossterm)
- [Rust cross-compilation guide](https://rust-lang.github.io/rustup/cross-compilation.html)
- [cross tool](https://github.com/cross-rs/cross)
- [pcre2-sys static linking](https://github.com/BurntSushi/rust-pcre2/issues/7)
- [ratatui-testlib](https://docs.rs/ratatui-testlib)
- [PCRE2 syntax documentation](https://www.pcre.org/current/doc/html/pcre2syntax.html)