ansi_escape_sequences/lib.rs
1//! # ANSI Regex - Professional ANSI Escape Sequence Processing
2//!
//! A Rust library for detecting, matching, and stripping ANSI escape sequences in terminal
//! text. It covers common ANSI/VT100 control sequences with regex patterns that are compiled
//! once and shared as `'static` references, plus convenience functions built on top of them.
10//!
11//! ## 🚀 Features
12//!
//! - **High Performance**: Regex patterns are compiled once and handed out as `'static` references
//! - **Comprehensive Coverage**: Supports CSI, OSC, and other ANSI escape sequences
//! - **Flexible API**: Multiple matching modes (global, first-only, owned)
//! - **Minimal Dependencies**: Depends only on the `regex` crate
//! - **Memory Efficient**: Uses `LazyLock` so each pattern is built lazily, on first use
//! - **Thread Safe**: All operations are thread-safe and can be used in concurrent environments
19//!
20//! ## 📖 ANSI Escape Sequences Overview
21//!
22//! ANSI escape sequences are special character sequences used to control terminal formatting,
23//! cursor positioning, and other terminal behaviors. They typically start with the ESC character
24//! (`\u{001B}` or `\x1B`) followed by specific control characters.
25//!
//! ### Common ANSI Sequence Types
//!
//! - **CSI (Control Sequence Introducer)**: `ESC[` followed by parameters and a final character
//!   - Example: `\u{001B}[31m` (red foreground color)
//!   - Example: `\u{001B}[2J` (clear screen)
//! - **OSC (Operating System Command)**: `ESC]` followed by data and ended by a string
//!   terminator (BEL, `ESC\`, or ST)
//!   - Example: `\u{001B}]0;Window Title\u{0007}` (set window title)
//! - **C1 Control Characters**: Direct 8-bit control characters like `\u{009B}` (CSI)
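//!
//! As a minimal sketch of how such sequences are assembled, the helpers below build the
//! CSI and OSC examples from their parts using only the standard library. The names `sgr`
//! and `osc_title` are hypothetical and are not part of this crate's API.
//!
//! ```rust
//! /// Build a CSI "Select Graphic Rendition" sequence: ESC, `[`, parameters, `m`.
//! /// Hypothetical helper for illustration only.
//! fn sgr(codes: &[u8]) -> String {
//!     let params: Vec<String> = codes.iter().map(|c| c.to_string()).collect();
//!     format!("\u{001B}[{}m", params.join(";"))
//! }
//!
//! /// Build an OSC "set window title" sequence terminated by BEL.
//! /// Hypothetical helper for illustration only.
//! fn osc_title(title: &str) -> String {
//!     format!("\u{001B}]0;{title}\u{0007}")
//! }
//! ```
//!
//! Under these assumptions, `sgr(&[31])` yields the red foreground sequence shown in the
//! list, and `osc_title("Window Title")` yields the window title sequence.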
34//!
35//! ## 🔧 Quick Start
36//!
37//! ### Basic Usage
38//!
39//! ```rust
40//! use ansi_escape_sequences::{ansi_regex, strip_ansi, has_ansi};
41//!
42//! // Check if text contains ANSI sequences
43//! let colored_text = "\u{001B}[31mHello\u{001B}[0m World";
44//! assert!(has_ansi(colored_text));
45//!
46//! // Strip all ANSI sequences
47//! let clean_text = strip_ansi(colored_text);
48//! assert_eq!(clean_text, "Hello World");
49//!
50//! // Get a regex for custom processing
51//! let regex = ansi_regex(None);
52//! let matches: Vec<_> = regex.find_iter(colored_text).collect();
53//! assert_eq!(matches.len(), 2); // Found 2 ANSI sequences
54//! ```
55//!
56//! ### Advanced Configuration
57//!
58//! ```rust
59//! use ansi_escape_sequences::{ansi_regex, AnsiRegexOptions};
60//!
61//! // Match only the first ANSI sequence
62//! let options = AnsiRegexOptions::new().only_first();
63//! let regex = ansi_regex(Some(options));
64//!
65//! // Global matching (default behavior)
66//! let options = AnsiRegexOptions::new().global();
67//! let regex = ansi_regex(Some(options));
68//!
69//! // Find all ANSI sequences with detailed information
70//! let colored_text = "\u{001B}[31mHello\u{001B}[0m World";
71//! let sequences = AnsiRegexOptions::find_ansi_sequences(colored_text);
//! for seq in sequences {
//!     println!("Found ANSI sequence: {:?} at position {}",
//!              seq.as_str(), seq.start());
//! }
76//! ```
77//!
78//! ### Performance-Optimized Usage
79//!
80//! ```rust
81//! use ansi_escape_sequences::{ansi_regex, AnsiRegexOptions};
82//!
83//! // Use pre-compiled static regex for best performance
84//! let global_regex = ansi_regex(None); // For finding all matches
85//! let first_regex = ansi_regex(Some(AnsiRegexOptions::new().only_first())); // For first match only
86//!
87//! // Process large amounts of text efficiently
88//! let large_text = "\u{001B}[31mRed\u{001B}[0m text"; // Your large text with ANSI codes
89//! let clean_text = global_regex.replace_all(large_text, "");
90//! assert_eq!(clean_text, "Red text");
91//! ```
92//!
93//! ## 🎯 Use Cases
94//!
95//! - **Log Processing**: Clean ANSI codes from log files for storage or analysis
96//! - **Terminal Emulators**: Parse and process terminal control sequences
97//! - **Text Processing**: Extract plain text content from terminal-formatted strings
98//! - **CLI Tools**: Handle colored output in command-line applications
99//! - **Web Applications**: Convert terminal output for web display
100//! - **Testing**: Validate terminal output in automated tests
101//!
102//! ## ⚡ Performance Notes
103//!
//! - Each static regex is compiled once, on first use
//! - Static references avoid recompiling or cloning the regex on every call
//! - Patterns target the common ANSI sequence formats (CSI and OSC)
//! - Thread-safe operations suitable for concurrent processing
108//!
109//! ## 🔍 Supported ANSI Sequences
110//!
111//! This library recognizes and matches:
112//!
113//! - **CSI sequences**: `ESC[` + parameters + final byte
114//! - **OSC sequences**: `ESC]` + data + string terminator
115//! - **C1 control characters**: Direct 8-bit equivalents
116//! - **String terminators**: BEL (`\u{0007}`), ESC\ (`\u{001B}\u{005C}`), ST (`\u{009C}`)
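//!
//! To illustrate how the three string terminators bound an OSC sequence, here is a minimal
//! standard-library sketch. The function `osc_payload` is a hypothetical name, not part of
//! this crate; it assumes the sequence starts at the beginning of the input and mirrors the
//! non-greedy behavior of stopping at the earliest terminator.
//!
//! ```rust
//! /// Return the payload of an OSC sequence at the start of `s`, if one is present.
//! /// Hypothetical helper for illustration only.
//! fn osc_payload(s: &str) -> Option<&str> {
//!     // An OSC sequence is introduced by ESC followed by `]`.
//!     let body = s.strip_prefix("\u{001B}]")?;
//!     // Stop at the earliest of the three terminators: BEL, ESC \, or ST.
//!     let end = ["\u{0007}", "\u{001B}\u{005C}", "\u{009C}"]
//!         .iter()
//!         .filter_map(|t| body.find(*t))
//!         .min()?;
//!     Some(&body[..end])
//! }
//! ```
//!
//! An unterminated sequence yields `None` here, just as the OSC alternative of the regex
//! does not match without a terminator.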
117//!
118//! ## 📚 Examples
119
#![deny(missing_docs)]
#![warn(clippy::all)]
#![warn(clippy::pedantic)]
#![warn(clippy::nursery)]

use regex::Regex;
use std::sync::LazyLock;
127
128/// Comprehensive error types for ANSI regex operations
129///
130/// This enum represents all possible errors that can occur when working with ANSI regex patterns.
131/// Currently, the primary error case is regex compilation failure, but this enum is designed
132/// to be extensible for future error types.
133///
134/// # Examples
135///
136/// ```rust
137/// use ansi_escape_sequences::AnsiRegexError;
138///
139/// // Errors are typically handled internally, but can be exposed in custom implementations
/// match some_regex_operation() {
///     Ok(result) => println!("Success: {:?}", result),
///     Err(AnsiRegexError::RegexCompilation(e)) => {
///         eprintln!("Regex compilation failed: {}", e);
///     }
/// }
146/// # fn some_regex_operation() -> Result<(), AnsiRegexError> { Ok(()) }
147/// ```
#[derive(Debug)]
pub enum AnsiRegexError {
    /// Failed to compile the internal regex pattern
    ///
    /// This error occurs when the underlying regex engine fails to compile the ANSI pattern.
    /// This should be extremely rare in normal usage as the patterns are pre-validated.
    RegexCompilation(regex::Error),
}

impl std::fmt::Display for AnsiRegexError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            Self::RegexCompilation(err) => {
                write!(f, "Failed to compile regex pattern: {err}")
            }
        }
    }
}

impl std::error::Error for AnsiRegexError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            Self::RegexCompilation(err) => Some(err),
        }
    }
}

impl From<regex::Error> for AnsiRegexError {
    fn from(err: regex::Error) -> Self {
        Self::RegexCompilation(err)
    }
}
180
181/// Configuration options for ANSI regex pattern matching behavior
182///
183/// This struct provides a builder-pattern API for configuring how ANSI escape sequences
184/// are matched and processed. It allows fine-tuning of the regex behavior for different
185/// use cases and performance requirements.
186///
187/// # Design Philosophy
188///
189/// The options are designed to be:
190/// - **Composable**: Chain method calls for readable configuration
/// - **Immutable**: Each builder method consumes `self` and returns the updated value
/// - **Zero-cost**: The builder methods are `const fn`, so options can be built at compile time
193/// - **Explicit**: Clear method names that describe the intended behavior
194///
195/// # Examples
196///
197/// ## Basic Configuration
198///
199/// ```rust
200/// use ansi_escape_sequences::AnsiRegexOptions;
201///
202/// // Default configuration (matches all sequences globally)
203/// let default_opts = AnsiRegexOptions::new();
204///
205/// // Match only the first ANSI sequence found
206/// let first_only = AnsiRegexOptions::new().only_first();
207///
208/// // Explicitly configure for global matching (same as default)
209/// let global_opts = AnsiRegexOptions::new().global();
210/// ```
211///
212/// ## Method Chaining
213///
214/// ```rust
215/// use ansi_escape_sequences::AnsiRegexOptions;
216///
217/// // Configuration methods can be chained fluently
/// let opts = AnsiRegexOptions::new()
///     .only_first() // Find first match only
///     .global();    // Override to global matching
221///
222/// // The last method in the chain takes precedence
223/// assert!(!opts.is_only_first());
224/// ```
225///
226/// ## Performance Considerations
227///
228/// ```rust
229/// use ansi_escape_sequences::AnsiRegexOptions;
230///
231/// // For processing large texts where you only need to know if ANSI codes exist
232/// let check_opts = AnsiRegexOptions::new().only_first();
233///
234/// // For complete processing where all sequences matter
235/// let process_opts = AnsiRegexOptions::new().global();
236/// ```
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AnsiRegexOptions {
    only_first: bool,
}

impl AnsiRegexOptions {
    /// Create new options with default values (global matching)
    #[must_use]
    pub const fn new() -> Self {
        Self { only_first: false }
    }
248
249 /// Configure to match only the first ANSI sequence encountered
250 ///
    /// Selecting this option signals that only the first ANSI escape sequence matters. The
    /// regex returned by `ansi_regex` uses the same pattern as the global one; the first-only
    /// behavior comes from calling `find()` or `is_match()` on it instead of `find_iter()`.
    ///
    /// # Performance Benefits
    ///
    /// - `find()` and `is_match()` stop searching after the first match
    /// - No collection of match results needs to be allocated
    /// - Ideal for validation and detection use cases
260 ///
261 /// # Examples
262 ///
263 /// ```rust
264 /// use ansi_escape_sequences::{ansi_regex, AnsiRegexOptions};
265 ///
266 /// let text = "\u{001B}[31mRed\u{001B}[0m \u{001B}[32mGreen\u{001B}[0m";
267 /// let options = AnsiRegexOptions::new().only_first();
268 /// let regex = ansi_regex(Some(options));
269 ///
270 /// // Only finds the first sequence
271 /// let first_match = regex.find(text).unwrap();
272 /// assert_eq!(first_match.as_str(), "\u{001B}[31m");
273 /// ```
    #[must_use]
    pub const fn only_first(mut self) -> Self {
        self.only_first = true;
        self
    }
279
280 /// Configure to match all ANSI sequences globally (default behavior)
281 ///
282 /// This is the default matching mode that finds all ANSI escape sequences in the text.
283 /// Use this when you need to process, count, or extract all ANSI codes from the input.
284 ///
285 /// # Use Cases
286 ///
287 /// - Complete ANSI code removal
288 /// - Counting total sequences
289 /// - Processing all formatting information
290 /// - Converting formatted text to other formats
291 ///
292 /// # Examples
293 ///
294 /// ```rust
295 /// use ansi_escape_sequences::{ansi_regex, AnsiRegexOptions};
296 ///
297 /// let text = "\u{001B}[31mRed\u{001B}[0m \u{001B}[32mGreen\u{001B}[0m";
298 /// let options = AnsiRegexOptions::new().global();
299 /// let regex = ansi_regex(Some(options));
300 ///
301 /// // Finds all sequences
302 /// let matches: Vec<_> = regex.find_iter(text).collect();
303 /// assert_eq!(matches.len(), 4); // 2 color codes + 2 reset codes
304 /// ```
    #[must_use]
    pub const fn global(mut self) -> Self {
        self.only_first = false;
        self
    }
310
311 /// Check if configured for first-only matching
312 ///
313 /// Returns `true` if the options are configured to match only the first ANSI sequence,
314 /// `false` if configured for global matching.
315 ///
316 /// # Examples
317 ///
318 /// ```rust
319 /// use ansi_escape_sequences::AnsiRegexOptions;
320 ///
321 /// let first_only = AnsiRegexOptions::new().only_first();
322 /// assert!(first_only.is_only_first());
323 ///
324 /// let global = AnsiRegexOptions::new().global();
325 /// assert!(!global.is_only_first());
326 /// ```
    #[must_use]
    pub const fn is_only_first(&self) -> bool {
        self.only_first
    }
331
332 /// Remove all ANSI escape sequences from a string
333 ///
334 /// This is a high-performance utility function that strips all ANSI escape sequences
335 /// from the input text, returning clean, plain text suitable for storage, logging,
336 /// or display in non-terminal environments.
337 ///
338 /// # Performance Characteristics
339 ///
    /// - Uses the pre-compiled static regex, so no pattern is compiled per call
    /// - Always returns an owned `String`; input without ANSI sequences is copied as well
    /// - Delegates the replacement itself to `Regex::replace_all`
343 ///
344 /// # Examples
345 ///
346 /// ## Basic Usage
347 ///
348 /// ```rust
349 /// use ansi_escape_sequences::strip_ansi;
350 ///
351 /// let colored = "\u{001B}[31mError:\u{001B}[0m Something went wrong";
352 /// let clean = strip_ansi(colored);
353 /// assert_eq!(clean, "Error: Something went wrong");
354 /// ```
355 ///
356 /// ## Complex ANSI Sequences
357 ///
358 /// ```rust
359 /// use ansi_escape_sequences::strip_ansi;
360 ///
361 /// let complex = "\u{001B}[1;31;4mBold Red Underlined\u{001B}[0m Normal";
362 /// let clean = strip_ansi(complex);
363 /// assert_eq!(clean, "Bold Red Underlined Normal");
364 /// ```
365 ///
366 /// ## Log Processing
367 ///
368 /// ```rust
369 /// use ansi_escape_sequences::strip_ansi;
370 ///
371 /// let log_line = "[INFO] \u{001B}[32m✓\u{001B}[0m Operation completed successfully";
372 /// let clean_log = strip_ansi(log_line);
373 /// assert_eq!(clean_log, "[INFO] ✓ Operation completed successfully");
374 /// ```
    #[must_use]
    pub(crate) fn strip_ansi(text: &str) -> String {
        ansi_regex_global().replace_all(text, "").into_owned()
    }
379
380 /// Check if a string contains any ANSI escape sequences
381 ///
382 /// This function provides a fast way to detect the presence of ANSI escape sequences
383 /// without extracting or processing them. It's optimized for validation and filtering
384 /// use cases where you only need a boolean result.
385 ///
386 /// # Performance Benefits
387 ///
388 /// - Early termination on first match found
389 /// - No memory allocation for match results
390 /// - Optimized regex matching for existence check
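    ///
    /// As a hedged illustration of why detection can terminate early, note that both
    /// alternatives of the crate's pattern begin with ESC or the C1 CSI character. The
    /// standard-library sketch below uses that as a necessary (not sufficient) condition.
    /// The name `may_contain_ansi` is hypothetical and is not part of this crate.
    ///
    /// ```rust
    /// /// Return `false` when `text` cannot contain an ANSI sequence.
    /// /// A `true` result only means a full regex check is still needed.
    /// fn may_contain_ansi(text: &str) -> bool {
    ///     // Every match must start with ESC (U+001B) or C1 CSI (U+009B).
    ///     text.chars().any(|c| c == '\u{001B}' || c == '\u{009B}')
    /// }
    /// ```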
391 ///
392 /// # Examples
393 ///
394 /// ## Basic Detection
395 ///
396 /// ```rust
397 /// use ansi_escape_sequences::has_ansi;
398 ///
399 /// assert!(has_ansi("\u{001B}[31mcolored\u{001B}[0m"));
400 /// assert!(!has_ansi("plain text"));
401 /// ```
402 ///
403 /// ## Filtering Content
404 ///
405 /// ```rust
406 /// use ansi_escape_sequences::has_ansi;
407 ///
    /// let messages = vec![
    ///     "Plain message",
    ///     "\u{001B}[32mSuccess!\u{001B}[0m",
    ///     "Another plain message",
    /// ];
    ///
    /// let colored_messages: Vec<_> = messages
    ///     .iter()
    ///     .filter(|msg| has_ansi(msg))
    ///     .collect();
418 ///
419 /// assert_eq!(colored_messages.len(), 1);
420 /// ```
421 ///
422 /// ## Conditional Processing
423 ///
424 /// ```rust
425 /// use ansi_escape_sequences::{has_ansi, strip_ansi};
426 ///
    /// fn process_text(text: &str) -> String {
    ///     if has_ansi(text) {
    ///         // Process colored text
    ///         strip_ansi(text)
    ///     } else {
    ///         // Pass through plain text unchanged
    ///         text.to_string()
    ///     }
    /// }
436 /// ```
    #[must_use]
    pub(crate) fn has_ansi(text: &str) -> bool {
        ansi_regex_global().is_match(text)
    }
441
442 /// Find and extract all ANSI escape sequences from a string
443 ///
444 /// This function returns detailed information about every ANSI escape sequence found
445 /// in the input text, including their positions and content. This is useful for
446 /// analysis, conversion, or detailed processing of terminal-formatted text.
447 ///
448 /// # Return Value
449 ///
450 /// Returns a `Vec<regex::Match>` where each `Match` contains:
451 /// - The matched ANSI sequence as a string slice
452 /// - Start and end positions in the original text
453 /// - Methods for extracting the sequence content
454 ///
455 /// # Examples
456 ///
457 /// ## Basic Sequence Extraction
458 ///
459 /// ```rust
460 /// use ansi_escape_sequences::AnsiRegexOptions;
461 ///
462 /// let text = "\u{001B}[31mRed\u{001B}[0m Normal \u{001B}[32mGreen\u{001B}[0m";
463 /// let sequences = AnsiRegexOptions::find_ansi_sequences(text);
464 ///
465 /// assert_eq!(sequences.len(), 4);
466 /// assert_eq!(sequences[0].as_str(), "\u{001B}[31m"); // Red color
467 /// assert_eq!(sequences[1].as_str(), "\u{001B}[0m"); // Reset
468 /// assert_eq!(sequences[2].as_str(), "\u{001B}[32m"); // Green color
469 /// assert_eq!(sequences[3].as_str(), "\u{001B}[0m"); // Reset
470 /// ```
471 ///
472 /// ## Position Analysis
473 ///
474 /// ```rust
475 /// use ansi_escape_sequences::AnsiRegexOptions;
476 ///
477 /// let text = "Hello \u{001B}[31mWorld\u{001B}[0m!";
478 /// let sequences = AnsiRegexOptions::find_ansi_sequences(text);
479 ///
    /// for (i, seq) in sequences.iter().enumerate() {
    ///     println!("Sequence {}: '{}' at position {}-{}",
    ///              i, seq.as_str(), seq.start(), seq.end());
    /// }
484 /// ```
485 ///
486 /// ## Content Reconstruction
487 ///
488 /// ```rust
489 /// use ansi_escape_sequences::AnsiRegexOptions;
490 ///
491 /// let original = "\u{001B}[1mBold\u{001B}[0m text";
492 /// let sequences = AnsiRegexOptions::find_ansi_sequences(original);
493 ///
494 /// // Reconstruct with different formatting
495 /// let mut result = original.to_string();
    /// for seq in sequences.iter().rev() { // Reverse to maintain positions
    ///     let replacement = format!("<ansi:{}>", seq.as_str().len());
    ///     result.replace_range(seq.range(), &replacement);
    /// }
500 /// ```
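    ///
    /// ## Why Replacement Runs in Reverse
    ///
    /// The reverse iteration above matters because each replacement can change the length
    /// of the string. The standard-library sketch below isolates that idea with plain byte
    /// ranges. The helper `replace_ranges` is a hypothetical name, and it assumes the ranges
    /// are sorted, non-overlapping, and fall on character boundaries.
    ///
    /// ```rust
    /// use std::ops::Range;
    ///
    /// /// Replace each byte range in `text` with `marker`, working back to front.
    /// /// Hypothetical helper for illustration only.
    /// fn replace_ranges(text: &str, ranges: &[Range<usize>], marker: &str) -> String {
    ///     let mut out = text.to_string();
    ///     // Editing the last range first keeps the earlier offsets valid.
    ///     for range in ranges.iter().rev() {
    ///         out.replace_range(range.clone(), marker);
    ///     }
    ///     out
    /// }
    /// ```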
    #[must_use]
    pub fn find_ansi_sequences(text: &str) -> Vec<regex::Match> {
        ansi_regex_global().find_iter(text).collect()
    }
}

impl Default for AnsiRegexOptions {
    fn default() -> Self {
        Self::new()
    }
}
512
513/// Create a high-performance regex pattern for matching ANSI escape sequences
514///
515/// This is the primary entry point for the library, providing access to pre-compiled,
516/// optimized regex patterns for ANSI escape sequence detection and processing. The function
517/// returns static references to cached regex instances for maximum performance.
518///
519/// # Design Philosophy
520///
521/// - **Zero-allocation**: Returns references to static, pre-compiled regex patterns
/// - **Performance-first**: Patterns run on the `regex` crate's linear-time matching engine
523/// - **Flexible**: Supports both global and first-match-only modes
524/// - **Thread-safe**: All returned regex instances are safe for concurrent use
525///
526/// # Arguments
527///
528/// * `options` - Configuration options for regex behavior. Pass `None` for default global matching,
529/// or `Some(AnsiRegexOptions)` for custom configuration.
530///
531/// # Returns
532///
533/// A reference to a pre-compiled `Regex` that matches ANSI escape sequences according to
534/// the specified options. The regex lifetime is `'static`, making it suitable for storage
535/// in static variables or long-lived data structures.
536///
537/// # Performance Characteristics
538///
539/// - **Compilation**: Happens once per pattern type using `LazyLock`
540/// - **Memory**: Minimal overhead with shared static instances
541/// - **Matching**: Optimized for common ANSI sequence patterns
/// - **Concurrency**: Shared instances can be used from multiple threads without external locking
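///
/// The compile-once behavior can be sketched with the standard library alone. In the
/// example below a `String` stands in for the compiled regex, and a counter records how
/// often the initializer runs. The names `BUILDS`, `PATTERN`, and `pattern` are
/// hypothetical and only mirror the shape of this crate's internal statics.
///
/// ```rust
/// use std::sync::atomic::{AtomicUsize, Ordering};
/// use std::sync::LazyLock;
///
/// /// Counts how many times the initializer below runs.
/// static BUILDS: AtomicUsize = AtomicUsize::new(0);
///
/// /// Stand-in for an expensive value such as a compiled regex.
/// static PATTERN: LazyLock<String> = LazyLock::new(|| {
///     BUILDS.fetch_add(1, Ordering::SeqCst);
///     String::from("expensive to build")
/// });
///
/// /// Return a `'static` reference; the first call initializes, later calls reuse.
/// fn pattern() -> &'static str {
///     PATTERN.as_str()
/// }
/// ```
///
/// Calling `pattern()` repeatedly leaves the counter at one, which is the property the
/// static regex instances rely on.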
543///
544/// # Examples
545///
546/// ## Basic Usage
547///
548/// ```rust
549/// use ansi_escape_sequences::ansi_regex;
550///
551/// // Default configuration - matches all ANSI sequences
552/// let regex = ansi_regex(None);
553///
554/// let text = "\u{001B}[31mRed text\u{001B}[0m and \u{001B}[32mgreen text\u{001B}[0m";
555/// assert!(regex.is_match(text));
556///
557/// // Count all ANSI sequences
558/// let count = regex.find_iter(text).count();
559/// assert_eq!(count, 4); // 2 color codes + 2 reset codes
560/// ```
561///
562/// ## Configuration Options
563///
564/// ```rust
565/// use ansi_escape_sequences::{ansi_regex, AnsiRegexOptions};
566///
567/// // Match only the first ANSI sequence (more efficient for detection)
568/// let first_only = AnsiRegexOptions::new().only_first();
569/// let regex = ansi_regex(Some(first_only));
570///
571/// let text = "\u{001B}[31mRed\u{001B}[0m \u{001B}[32mGreen\u{001B}[0m";
572/// let first_match = regex.find(text).unwrap();
573/// assert_eq!(first_match.as_str(), "\u{001B}[31m");
574///
575/// // Global matching (explicit, same as default)
576/// let global = AnsiRegexOptions::new().global();
577/// let regex = ansi_regex(Some(global));
578/// let all_matches: Vec<_> = regex.find_iter(text).collect();
579/// assert_eq!(all_matches.len(), 4);
580/// ```
581///
582/// ## Advanced Processing
583///
584/// ```rust
585/// use ansi_escape_sequences::ansi_regex;
586///
587/// let regex = ansi_regex(None);
588/// let input = "Normal \u{001B}[1;31mBold Red\u{001B}[0m Normal";
589///
590/// // Extract plain text by removing ANSI codes
591/// let clean_text = regex.replace_all(input, "");
592/// assert_eq!(clean_text, "Normal Bold Red Normal");
593///
594/// // Replace ANSI codes with HTML tags
/// let html = regex.replace_all(input, |caps: &regex::Captures| {
///     match caps.get(0).unwrap().as_str() {
///         "\u{001B}[1;31m" => "<span style='color: red; font-weight: bold'>",
///         "\u{001B}[0m" => "</span>",
///         _ => "",
///     }
/// });
602/// ```
603///
604/// ## Performance Optimization
605///
606/// ```rust
607/// use ansi_escape_sequences::{ansi_regex, AnsiRegexOptions};
608///
609/// // For validation/detection only - use first-match mode
610/// let detector = ansi_regex(Some(AnsiRegexOptions::new().only_first()));
611/// let text_with_ansi = "\u{001B}[31mRed\u{001B}[0m text";
612/// let plain_text = "Plain text";
613///
614/// // Check for ANSI codes efficiently
615/// assert!(detector.is_match(text_with_ansi)); // Stops at first match
616/// assert!(!detector.is_match(plain_text));
617///
618/// // For complete processing - use global mode
619/// let processor = ansi_regex(None);
620///
621/// // Strip all ANSI codes
622/// let clean_text = processor.replace_all(text_with_ansi, "").into_owned();
623/// assert_eq!(clean_text, "Red text");
624/// ```
625///
626/// ## Static Storage
627///
628/// ```rust
629/// use ansi_escape_sequences::ansi_regex;
630/// use std::sync::LazyLock;
631///
632/// // Store regex in static variable for application-wide use
/// static ANSI_DETECTOR: LazyLock<&'static regex::Regex> = LazyLock::new(|| {
///     ansi_regex(None)
/// });
///
/// fn process_log_line(line: &str) -> String {
///     ANSI_DETECTOR.replace_all(line, "").into_owned()
/// }
640/// ```
#[must_use]
pub fn ansi_regex(options: Option<AnsiRegexOptions>) -> &'static Regex {
    let opts = options.unwrap_or_default();

    if opts.is_only_first() {
        ansi_regex_first()
    } else {
        ansi_regex_global()
    }
}
651
652/// Internal ANSI escape sequence pattern constants and utilities
653///
654/// This module contains the low-level pattern definitions used to construct the ANSI regex.
655/// These patterns are based on the ANSI X3.64 standard and VT100/VT220 terminal specifications.
656///
657/// # Pattern Design
658///
659/// The patterns are designed to be:
660/// - **Comprehensive**: Cover all common ANSI escape sequence types
/// - **Efficient**: Keep matching cheap and limit false positives
662/// - **Standards-compliant**: Follow ANSI/ISO terminal control standards
663/// - **Unicode-aware**: Handle both 7-bit and 8-bit control sequences
664///
665/// # Technical Details
666///
667/// ANSI escape sequences follow these general patterns:
668/// - **7-bit sequences**: Start with ESC (0x1B) followed by specific characters
669/// - **8-bit sequences**: Use C1 control characters directly (0x80-0x9F range)
670/// - **Parameter sequences**: Include numeric parameters separated by semicolons
671/// - **String sequences**: Contain arbitrary data terminated by specific sequences
mod patterns {
673 /// Valid string terminator sequences for OSC and similar commands
674 ///
675 /// These terminators are used to end string-based ANSI sequences like OSC (Operating System Command).
676 /// The pattern matches any of the three standard terminators:
677 ///
678 /// - **BEL** (`\u{0007}`): Bell character, traditional terminator
679 /// - **ESC\\** (`\u{001B}\u{005C}`): ESC followed by backslash, standard terminator
680 /// - **ST** (`\u{009C}`): String Terminator C1 control character
681 ///
682 /// # Examples of Usage
683 ///
684 /// ```text
685 /// \u{001B}]0;Window Title\u{0007} // OSC sequence with BEL terminator
686 /// \u{001B}]0;Window Title\u{001B}\u{005C} // OSC sequence with ESC\ terminator
687 /// \u{001B}]0;Window Title\u{009C} // OSC sequence with ST terminator
688 /// ```
    pub const STRING_TERMINATORS: &str = r"(?:\u{0007}|\u{001B}\u{005C}|\u{009C})";
690
691 /// ESC (Escape) character - the foundation of 7-bit ANSI sequences
692 ///
693 /// The ESC character (`\u{001B}` or decimal 27) introduces most ANSI escape sequences.
694 /// It's followed by specific characters to form different types of control sequences:
695 ///
696 /// - **CSI**: ESC + `[` → Control Sequence Introducer
697 /// - **OSC**: ESC + `]` → Operating System Command
698 /// - **DCS**: ESC + `P` → Device Control String
699 /// - **APC**: ESC + `_` → Application Program Command
700 ///
701 /// # Historical Context
702 ///
703 /// The ESC character was chosen because it's outside the printable ASCII range
704 /// and unlikely to appear in normal text, making it safe for control purposes.
    pub const ESC: &str = r"\u{001B}";
706
707 /// C1 CSI (Control Sequence Introducer) - 8-bit equivalent of ESC[
708 ///
709 /// The C1 CSI character (`\u{009B}` or decimal 155) is the 8-bit equivalent
710 /// of the two-character sequence ESC[. It directly introduces control sequences
711 /// without needing the ESC prefix.
712 ///
713 /// # Usage Context
714 ///
715 /// - **7-bit mode**: Uses ESC[ (two characters)
716 /// - **8-bit mode**: Uses CSI directly (one character)
717 /// - **Compatibility**: Modern terminals support both forms
718 ///
719 /// # Examples
720 ///
721 /// ```text
722 /// \u{001B}[31m // 7-bit: ESC[ followed by parameters and final byte
723 /// \u{009B}31m // 8-bit: CSI directly followed by parameters and final byte
724 /// ```
    pub const C1_CSI: &str = r"\u{009B}";
}
727
// Pre-compiled regex patterns for common use cases
static ANSI_REGEX_GLOBAL: LazyLock<Regex> = LazyLock::new(|| {
    Regex::new(&build_ansi_pattern()).expect("Failed to compile global ANSI regex")
});

static ANSI_REGEX_FIRST: LazyLock<Regex> = LazyLock::new(|| {
    // For first-only matching, we use the same pattern but the caller will use find() instead of find_iter()
    // The regex itself doesn't need to be different - the behavior difference is in how it's used
    Regex::new(&build_ansi_pattern()).expect("Failed to compile first-only ANSI regex")
});
738
739/// Build the comprehensive ANSI escape sequence regex pattern
740///
741/// This function constructs the complete regex pattern that matches all supported ANSI escape
742/// sequences. The pattern is optimized for performance while maintaining comprehensive coverage
743/// of ANSI/VT terminal control sequences.
744///
745/// # Pattern Architecture
746///
747/// The regex combines two main pattern types using alternation (`|`):
748///
/// 1. **OSC (Operating System Command) sequences**: `ESC]` + data + string terminator
/// 2. **CSI (Control Sequence Introducer) sequences**: ESC or C1 CSI + intermediates + parameters + final byte
751///
752/// # Technical Implementation
753///
754/// ## OSC Pattern: `(?:ESC\][\s\S]*?STRING_TERMINATORS)`
755///
756/// - Matches ESC followed by `]` (OSC introducer)
757/// - `[\s\S]*?` matches any character (including newlines) non-greedily
758/// - Terminated by any valid string terminator (BEL, ESC\, or ST)
759/// - Non-greedy matching prevents over-consumption across multiple sequences
760///
/// ## CSI Pattern: `[ESC C1_CSI][\[\]()#;?]*(?:\d{1,4}(?:[;:]\d{0,4})*)?[A-PR-TZcf-nq-uy=><~]`
///
/// - `[ESC C1_CSI]` is a character class matching either ESC or the C1 CSI character
/// - `[\[\]()#;?]*` matches optional intermediate characters, including the `[` of `ESC[`
/// - `(?:\d{1,4}(?:[;:]\d{0,4})*)?` matches optional numeric parameters
/// - `[A-PR-TZcf-nq-uy=><~]` matches the final command character
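///
/// The four parts of the CSI pattern can also be read as a small hand-written matcher.
/// The sketch below is a simplified standard-library illustration, not the crate's
/// implementation: `csi_len` is a hypothetical name, it only checks for a sequence at the
/// start of the input, and it assumes ASCII digits (the regex `\d` is Unicode-aware).
///
/// ```rust
/// /// Return the byte length of a CSI-style sequence at the start of `s`, if present.
/// /// Hypothetical helper for illustration only.
/// fn csi_len(s: &str) -> Option<usize> {
///     let mut chars = s.char_indices().peekable();
///
///     // Introducer: ESC or the single C1 CSI character.
///     let (_, first) = chars.next()?;
///     if first != '\u{001B}' && first != '\u{009B}' {
///         return None;
///     }
///
///     // Intermediates: any run of `[ ] ( ) # ; ?` (this consumes the `[` of `ESC[`).
///     while chars.next_if(|&(_, c)| "[]()#;?".contains(c)).is_some() {}
///
///     // Parameters: up to four digits, then `;` or `:` separated groups of up to four.
///     let mut digits = 0;
///     while digits < 4 && chars.next_if(|&(_, c)| c.is_ascii_digit()).is_some() {
///         digits += 1;
///     }
///     if digits > 0 {
///         while chars.next_if(|&(_, c)| c == ';' || c == ':').is_some() {
///             let mut group = 0;
///             while group < 4 && chars.next_if(|&(_, c)| c.is_ascii_digit()).is_some() {
///                 group += 1;
///             }
///         }
///     }
///
///     // Final character: must be in the class `[A-PR-TZcf-nq-uy=><~]`.
///     let (idx, last) = chars.next()?;
///     let is_final = matches!(
///         last,
///         'A'..='P' | 'R'..='T' | 'Z' | 'c' | 'f'..='n' | 'q'..='u' | 'y' | '=' | '>' | '<' | '~'
///     );
///     is_final.then(|| idx + last.len_utf8())
/// }
/// ```
///
/// A greedy scan is enough here because the final-character class shares no characters
/// with the intermediate, digit, or separator classes.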
767///
768/// # Performance Optimizations
769///
/// - **Character classes**: Use efficient character class matching where possible
/// - **Non-greedy quantifiers**: Stop each OSC match at the first string terminator
/// - **Ordered alternation**: The OSC alternative is tried before the CSI alternative
/// - **Bounded repetition**: Each numeric parameter group is limited to four digits
774///
775/// # Standards Compliance
776///
777/// The pattern follows these terminal standards:
778/// - **ANSI X3.64**: Core escape sequence definitions
779/// - **ISO/IEC 6429**: International standard for control functions
780/// - **VT100/VT220**: DEC terminal compatibility
781/// - **xterm**: Extended sequence support
782///
783/// # Examples of Matched Sequences
784///
785/// ```text
786/// \u{001B}[31m // CSI: Set foreground color to red
787/// \u{001B}[2J // CSI: Clear entire screen
788/// \u{001B}[1;1H // CSI: Move cursor to position 1,1
789/// \u{001B}]0;Title\u{0007} // OSC: Set window title
790/// \u{009B}31m // C1 CSI: Set foreground color (8-bit)
791/// ```
fn build_ansi_pattern() -> String {
    use patterns::{C1_CSI, ESC, STRING_TERMINATORS};

    // OSC sequences: ESC ] ... ST (non-greedy until first ST)
    // Matches operating system commands like window title setting
    let osc = format!(r"(?:{ESC}\][\s\S]*?{STRING_TERMINATORS})");

    // CSI sequences: ESC[/C1, optional intermediates, params, final byte
    // Matches control sequence introducers for cursor movement, colors, etc.
    let csi = format!(
        r"[{ESC}{C1_CSI}][\[\]()#;?]*(?:\d{{1,4}}(?:[;:]\d{{0,4}})*)?[A-PR-TZcf-nq-uy=><~]"
    );

    // Combine patterns with alternation, OSC first as it's typically longer
    format!("{osc}|{csi}")
}
808
809/// Get a pre-compiled global ANSI regex (matches all occurrences)
810///
811/// This is equivalent to calling `ansi_regex(None)` but more explicit about the behavior.
812/// Use this when you want to find all ANSI sequences in a string.
#[must_use]
fn ansi_regex_global() -> &'static Regex {
    &ANSI_REGEX_GLOBAL
}
817
818/// Get a pre-compiled ANSI regex for first-match usage
819///
820/// Note: This returns the same regex as `ansi_regex_global()`. The "first only" behavior
821/// is achieved by using `find()` instead of `find_iter()` on the regex.
822/// This function exists for API consistency and clarity of intent.
#[must_use]
fn ansi_regex_first() -> &'static Regex {
    &ANSI_REGEX_FIRST
}
827
828/// Create an owned copy of the ANSI regex
829///
830/// Use this when you need ownership of the regex rather than a reference.
831/// This is less efficient than using the static references but provides more flexibility.
832///
833/// # Arguments
834///
835/// * `options` - Configuration options for the regex pattern
836///
837/// # Returns
838///
839/// An owned regex that matches ANSI escape sequences
840///
841/// # Examples
842///
843/// ```
/// use ansi_escape_sequences::ansi_regex_owned;
845///
846/// let regex = ansi_regex_owned(None);
847/// assert!(regex.is_match("\u{001B}[31m"));
848/// ```
#[must_use]
pub fn ansi_regex_owned(options: Option<AnsiRegexOptions>) -> Regex {
    ansi_regex(options).clone()
}
853
854/// Remove all ANSI sequences from a string
855///
/// This convenience function delegates to the crate-internal `AnsiRegexOptions::strip_ansi()`.
857///
858/// # Arguments
859///
860/// * `text` - The text to strip ANSI sequences from
861///
862/// # Returns
863///
864/// A new string with all ANSI sequences removed
865///
866/// # Examples
867///
868/// ```
869/// use ansi_escape_sequences::strip_ansi;
870///
871/// let text = "\u{001B}[31mred\u{001B}[0m text";
872/// let clean = strip_ansi(text);
873/// assert_eq!(clean, "red text");
874/// ```
#[must_use]
pub fn strip_ansi(text: &str) -> String {
    AnsiRegexOptions::strip_ansi(text)
}
879
880/// Check if a string contains any ANSI sequences
881///
/// This convenience function delegates to the crate-internal `AnsiRegexOptions::has_ansi()`.
883///
884/// # Arguments
885///
886/// * `text` - The text to check for ANSI sequences
887///
888/// # Returns
889///
890/// `true` if the text contains ANSI sequences, `false` otherwise
891///
892/// # Examples
893///
894/// ```
895/// use ansi_escape_sequences::has_ansi;
896///
897/// assert!(has_ansi("\u{001B}[31mred\u{001B}[0m"));
898/// assert!(!has_ansi("plain text"));
899/// ```
#[must_use]
pub fn has_ansi(text: &str) -> bool {
    AnsiRegexOptions::has_ansi(text)
}

#[cfg(test)]
mod tests;