ansi_escape_sequences/
lib.rs

1//! # ANSI Regex - Professional ANSI Escape Sequence Processing
2//!
3//! [![Crates.io](https://img.shields.io/crates/v/ansi-regex.svg)](https://crates.io/crates/ansi-regex)
4//! [![Documentation](https://docs.rs/ansi-regex/badge.svg)](https://docs.rs/ansi-regex)
5//! [![License](https://img.shields.io/crates/l/ansi-regex.svg)](LICENSE)
6//!
//! A Rust library for detecting, matching, and stripping ANSI escape sequences in
//! terminal text. It covers the common ANSI/VT100 terminal control sequences with
//! pre-compiled regex patterns that are shared through `&'static` references.
10//!
11//! ## 🚀 Features
12//!
//! - **High Performance**: Regex patterns are compiled once and handed out as `&'static` references
//! - **Comprehensive Coverage**: Supports CSI, OSC, and other ANSI escape sequences
//! - **Flexible API**: Multiple matching modes (global, first-only, owned)
//! - **Minimal Dependencies**: Depends only on the `regex` crate
//! - **Lazy Initialization**: Patterns are compiled on first use via `std::sync::LazyLock`
//! - **Thread Safe**: All operations are thread-safe and can be used in concurrent environments
19//!
20//! ## 📖 ANSI Escape Sequences Overview
21//!
22//! ANSI escape sequences are special character sequences used to control terminal formatting,
23//! cursor positioning, and other terminal behaviors. They typically start with the ESC character
24//! (`\u{001B}` or `\x1B`) followed by specific control characters.
25//!
26//! ### Common ANSI Sequence Types:
27//!
28//! - **CSI (Control Sequence Introducer)**: `ESC[` followed by parameters and a final character
29//!   - Example: `\u{001B}[31m` (red foreground color)
30//!   - Example: `\u{001B}[2J` (clear screen)
//! - **OSC (Operating System Command)**: `ESC]` followed by data and ended by a string terminator (BEL or ST)
32//!   - Example: `\u{001B}]0;Window Title\u{0007}` (set window title)
33//! - **C1 Control Characters**: Direct 8-bit control characters like `\u{009B}` (CSI)
34//!
35//! ## 🔧 Quick Start
36//!
37//! ### Basic Usage
38//!
39//! ```rust
40//! use ansi_escape_sequences::{ansi_regex, strip_ansi, has_ansi};
41//!
42//! // Check if text contains ANSI sequences
43//! let colored_text = "\u{001B}[31mHello\u{001B}[0m World";
44//! assert!(has_ansi(colored_text));
45//!
46//! // Strip all ANSI sequences
47//! let clean_text = strip_ansi(colored_text);
48//! assert_eq!(clean_text, "Hello World");
49//!
50//! // Get a regex for custom processing
51//! let regex = ansi_regex(None);
52//! let matches: Vec<_> = regex.find_iter(colored_text).collect();
53//! assert_eq!(matches.len(), 2); // Found 2 ANSI sequences
54//! ```
55//!
56//! ### Advanced Configuration
57//!
58//! ```rust
59//! use ansi_escape_sequences::{ansi_regex, AnsiRegexOptions};
60//!
61//! // Match only the first ANSI sequence
62//! let options = AnsiRegexOptions::new().only_first();
63//! let regex = ansi_regex(Some(options));
64//!
65//! // Global matching (default behavior)
66//! let options = AnsiRegexOptions::new().global();
67//! let regex = ansi_regex(Some(options));
68//!
69//! // Find all ANSI sequences with detailed information
70//! let colored_text = "\u{001B}[31mHello\u{001B}[0m World";
71//! let sequences = AnsiRegexOptions::find_ansi_sequences(colored_text);
72//! for seq in sequences {
73//!     println!("Found ANSI sequence: {:?} at position {}",
74//!              seq.as_str(), seq.start());
75//! }
76//! ```
77//!
78//! ### Performance-Optimized Usage
79//!
80//! ```rust
81//! use ansi_escape_sequences::{ansi_regex, AnsiRegexOptions};
82//!
83//! // Use pre-compiled static regex for best performance
84//! let global_regex = ansi_regex(None); // For finding all matches
85//! let first_regex = ansi_regex(Some(AnsiRegexOptions::new().only_first())); // For first match only
86//!
87//! // Process large amounts of text efficiently
88//! let large_text = "\u{001B}[31mRed\u{001B}[0m text"; // Your large text with ANSI codes
89//! let clean_text = global_regex.replace_all(large_text, "");
90//! assert_eq!(clean_text, "Red text");
91//! ```
92//!
93//! ## 🎯 Use Cases
94//!
95//! - **Log Processing**: Clean ANSI codes from log files for storage or analysis
96//! - **Terminal Emulators**: Parse and process terminal control sequences
97//! - **Text Processing**: Extract plain text content from terminal-formatted strings
98//! - **CLI Tools**: Handle colored output in command-line applications
99//! - **Web Applications**: Convert terminal output for web display
100//! - **Testing**: Validate terminal output in automated tests
101//!
102//! ## ⚡ Performance Notes
103//!
//! - Each static regex is compiled once, on first use
//! - `ansi_regex` returns a `&'static Regex`, so obtaining a pattern does not allocate
//! - `strip_ansi` always allocates its returned `String`; calling `replace_all` on the regex
//!   directly yields a `Cow` that typically borrows the input when nothing matches
//! - The shared static regexes can be used from multiple threads
108//!
109//! ## 🔍 Supported ANSI Sequences
110//!
111//! This library recognizes and matches:
112//!
113//! - **CSI sequences**: `ESC[` + parameters + final byte
114//! - **OSC sequences**: `ESC]` + data + string terminator
115//! - **C1 control characters**: Direct 8-bit equivalents
116//! - **String terminators**: BEL (`\u{0007}`), ESC\ (`\u{001B}\u{005C}`), ST (`\u{009C}`)
117//!
118//! ## 📚 Examples
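//!
//! A minimal sketch of how the sequence families listed above are handled, assuming
//! `strip_ansi` and `has_ansi` behave as documented in this file:
//!
//! ```rust
//! use ansi_escape_sequences::{has_ansi, strip_ansi};
//!
//! // OSC window-title sequence ended by each supported string terminator.
//! let titles = [
//!     "\u{001B}]0;Title\u{0007}prompt",         // BEL
//!     "\u{001B}]0;Title\u{001B}\u{005C}prompt", // ESC \
//!     "\u{001B}]0;Title\u{009C}prompt",         // ST
//! ];
//! for text in titles {
//!     assert_eq!(strip_ansi(text), "prompt");
//! }
//!
//! // 8-bit C1 CSI introducer used in place of the two-character ESC[ form.
//! let c1 = "\u{009B}31mred\u{009B}0m";
//! assert!(has_ansi(c1));
//! assert_eq!(strip_ansi(c1), "red");
//! ```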
119
120#![deny(missing_docs)]
121#![warn(clippy::all)]
122#![warn(clippy::pedantic)]
123#![warn(clippy::nursery)]
124
125use regex::Regex;
126use std::sync::LazyLock;
127
128/// Comprehensive error types for ANSI regex operations
129///
/// This enum represents the errors that can occur when working with ANSI regex patterns.
/// Currently, the only variant is regex compilation failure, but the enum is designed
/// to be extensible for future error types.
133///
134/// # Examples
135///
136/// ```rust
137/// use ansi_escape_sequences::AnsiRegexError;
138///
139/// // Errors are typically handled internally, but can be exposed in custom implementations
140/// match some_regex_operation() {
141///     Ok(result) => println!("Success: {:?}", result),
142///     Err(AnsiRegexError::RegexCompilation(e)) => {
143///         eprintln!("Regex compilation failed: {}", e);
144///     }
145/// }
146/// # fn some_regex_operation() -> Result<(), AnsiRegexError> { Ok(()) }
147/// ```
148#[derive(Debug)]
149pub enum AnsiRegexError {
150    /// Failed to compile the internal regex pattern
151    ///
152    /// This error occurs when the underlying regex engine fails to compile the ANSI pattern.
153    /// This should be extremely rare in normal usage as the patterns are pre-validated.
154    RegexCompilation(regex::Error),
155}
156
157impl std::fmt::Display for AnsiRegexError {
158    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
159        match self {
160            Self::RegexCompilation(err) => {
161                write!(f, "Failed to compile regex pattern: {err}")
162            }
163        }
164    }
165}
166
167impl std::error::Error for AnsiRegexError {
168    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
169        match self {
170            Self::RegexCompilation(err) => Some(err),
171        }
172    }
173}
174
175impl From<regex::Error> for AnsiRegexError {
176    fn from(err: regex::Error) -> Self {
177        Self::RegexCompilation(err)
178    }
179}
180
181/// Configuration options for ANSI regex pattern matching behavior
182///
183/// This struct provides a builder-pattern API for configuring how ANSI escape sequences
184/// are matched and processed. It allows fine-tuning of the regex behavior for different
185/// use cases and performance requirements.
186///
187/// # Design Philosophy
188///
189/// The options are designed to be:
/// - **Composable**: Chain method calls for readable configuration
/// - **By-value**: Each builder method consumes `self` and returns the updated options
/// - **Cheap**: The struct holds a single `bool`, and every method is a `const fn`
/// - **Explicit**: Clear method names that describe the intended behavior
194///
195/// # Examples
196///
197/// ## Basic Configuration
198///
199/// ```rust
200/// use ansi_escape_sequences::AnsiRegexOptions;
201///
202/// // Default configuration (matches all sequences globally)
203/// let default_opts = AnsiRegexOptions::new();
204///
205/// // Match only the first ANSI sequence found
206/// let first_only = AnsiRegexOptions::new().only_first();
207///
208/// // Explicitly configure for global matching (same as default)
209/// let global_opts = AnsiRegexOptions::new().global();
210/// ```
211///
212/// ## Method Chaining
213///
214/// ```rust
215/// use ansi_escape_sequences::AnsiRegexOptions;
216///
217/// // Configuration methods can be chained fluently
218/// let opts = AnsiRegexOptions::new()
219///     .only_first()  // Find first match only
220///     .global();     // Override to global matching
221///
222/// // The last method in the chain takes precedence
223/// assert!(!opts.is_only_first());
224/// ```
225///
226/// ## Performance Considerations
227///
228/// ```rust
229/// use ansi_escape_sequences::AnsiRegexOptions;
230///
231/// // For processing large texts where you only need to know if ANSI codes exist
232/// let check_opts = AnsiRegexOptions::new().only_first();
233///
234/// // For complete processing where all sequences matter
235/// let process_opts = AnsiRegexOptions::new().global();
236/// ```
237#[derive(Debug, Clone, PartialEq, Eq)]
238pub struct AnsiRegexOptions {
239    only_first: bool,
240}
241
242impl AnsiRegexOptions {
243    /// Create new options with default values (global matching)
244    #[must_use]
245    pub const fn new() -> Self {
246        Self { only_first: false }
247    }
248
249    /// Configure to match only the first ANSI sequence encountered
250    ///
    /// With this option set, `ansi_regex` returns the regex intended for first-match use.
    /// The pattern itself is identical to the global one; the "first only" behavior comes
    /// from calling `find()` (or `is_match()`) on the result instead of `find_iter()`.
    ///
    /// # Performance Benefits
    ///
    /// - `find()` stops searching after the first match
    /// - No collection of match results is built
    /// - Ideal for validation and detection use cases
260    ///
261    /// # Examples
262    ///
263    /// ```rust
264    /// use ansi_escape_sequences::{ansi_regex, AnsiRegexOptions};
265    ///
266    /// let text = "\u{001B}[31mRed\u{001B}[0m \u{001B}[32mGreen\u{001B}[0m";
267    /// let options = AnsiRegexOptions::new().only_first();
268    /// let regex = ansi_regex(Some(options));
269    ///
270    /// // Only finds the first sequence
271    /// let first_match = regex.find(text).unwrap();
272    /// assert_eq!(first_match.as_str(), "\u{001B}[31m");
273    /// ```
274    #[must_use]
275    pub const fn only_first(mut self) -> Self {
276        self.only_first = true;
277        self
278    }
279
280    /// Configure to match all ANSI sequences globally (default behavior)
281    ///
282    /// This is the default matching mode that finds all ANSI escape sequences in the text.
283    /// Use this when you need to process, count, or extract all ANSI codes from the input.
284    ///
285    /// # Use Cases
286    ///
287    /// - Complete ANSI code removal
288    /// - Counting total sequences
289    /// - Processing all formatting information
290    /// - Converting formatted text to other formats
291    ///
292    /// # Examples
293    ///
294    /// ```rust
295    /// use ansi_escape_sequences::{ansi_regex, AnsiRegexOptions};
296    ///
297    /// let text = "\u{001B}[31mRed\u{001B}[0m \u{001B}[32mGreen\u{001B}[0m";
298    /// let options = AnsiRegexOptions::new().global();
299    /// let regex = ansi_regex(Some(options));
300    ///
301    /// // Finds all sequences
302    /// let matches: Vec<_> = regex.find_iter(text).collect();
303    /// assert_eq!(matches.len(), 4); // 2 color codes + 2 reset codes
304    /// ```
305    #[must_use]
306    pub const fn global(mut self) -> Self {
307        self.only_first = false;
308        self
309    }
310
311    /// Check if configured for first-only matching
312    ///
313    /// Returns `true` if the options are configured to match only the first ANSI sequence,
314    /// `false` if configured for global matching.
315    ///
316    /// # Examples
317    ///
318    /// ```rust
319    /// use ansi_escape_sequences::AnsiRegexOptions;
320    ///
321    /// let first_only = AnsiRegexOptions::new().only_first();
322    /// assert!(first_only.is_only_first());
323    ///
324    /// let global = AnsiRegexOptions::new().global();
325    /// assert!(!global.is_only_first());
326    /// ```
327    #[must_use]
328    pub const fn is_only_first(&self) -> bool {
329        self.only_first
330    }
331
332    /// Remove all ANSI escape sequences from a string
333    ///
334    /// This is a high-performance utility function that strips all ANSI escape sequences
335    /// from the input text, returning clean, plain text suitable for storage, logging,
336    /// or display in non-terminal environments.
337    ///
338    /// # Performance Characteristics
339    ///
    /// - Uses the pre-compiled static regex
    /// - Always allocates the returned `String`, even when no ANSI sequences are found
    /// - Delegates the replacement itself to `Regex::replace_all`
343    ///
344    /// # Examples
345    ///
346    /// ## Basic Usage
347    ///
348    /// ```rust
349    /// use ansi_escape_sequences::strip_ansi;
350    ///
351    /// let colored = "\u{001B}[31mError:\u{001B}[0m Something went wrong";
352    /// let clean = strip_ansi(colored);
353    /// assert_eq!(clean, "Error: Something went wrong");
354    /// ```
355    ///
356    /// ## Complex ANSI Sequences
357    ///
358    /// ```rust
359    /// use ansi_escape_sequences::strip_ansi;
360    ///
361    /// let complex = "\u{001B}[1;31;4mBold Red Underlined\u{001B}[0m Normal";
362    /// let clean = strip_ansi(complex);
363    /// assert_eq!(clean, "Bold Red Underlined Normal");
364    /// ```
365    ///
366    /// ## Log Processing
367    ///
368    /// ```rust
369    /// use ansi_escape_sequences::strip_ansi;
370    ///
371    /// let log_line = "[INFO] \u{001B}[32m✓\u{001B}[0m Operation completed successfully";
372    /// let clean_log = strip_ansi(log_line);
373    /// assert_eq!(clean_log, "[INFO] ✓ Operation completed successfully");
374    /// ```
375    #[must_use]
376    pub(crate) fn strip_ansi(text: &str) -> String {
377        ansi_regex_global().replace_all(text, "").into_owned()
378    }
379
380    /// Check if a string contains any ANSI escape sequences
381    ///
382    /// This function provides a fast way to detect the presence of ANSI escape sequences
383    /// without extracting or processing them. It's optimized for validation and filtering
384    /// use cases where you only need a boolean result.
385    ///
386    /// # Performance Benefits
387    ///
388    /// - Early termination on first match found
389    /// - No memory allocation for match results
390    /// - Optimized regex matching for existence check
391    ///
392    /// # Examples
393    ///
394    /// ## Basic Detection
395    ///
396    /// ```rust
397    /// use ansi_escape_sequences::has_ansi;
398    ///
399    /// assert!(has_ansi("\u{001B}[31mcolored\u{001B}[0m"));
400    /// assert!(!has_ansi("plain text"));
401    /// ```
402    ///
403    /// ## Filtering Content
404    ///
405    /// ```rust
406    /// use ansi_escape_sequences::has_ansi;
407    ///
408    /// let messages = vec![
409    ///     "Plain message",
410    ///     "\u{001B}[32mSuccess!\u{001B}[0m",
411    ///     "Another plain message",
412    /// ];
413    ///
414    /// let colored_messages: Vec<_> = messages
415    ///     .iter()
416    ///     .filter(|msg| has_ansi(msg))
417    ///     .collect();
418    ///
419    /// assert_eq!(colored_messages.len(), 1);
420    /// ```
421    ///
422    /// ## Conditional Processing
423    ///
424    /// ```rust
425    /// use ansi_escape_sequences::{has_ansi, strip_ansi};
426    ///
427    /// fn process_text(text: &str) -> String {
428    ///     if has_ansi(text) {
429    ///         // Process colored text
430    ///         strip_ansi(text)
431    ///     } else {
432    ///         // Pass through plain text unchanged
433    ///         text.to_string()
434    ///     }
435    /// }
436    /// ```
437    #[must_use]
438    pub(crate) fn has_ansi(text: &str) -> bool {
439        ansi_regex_global().is_match(text)
440    }
441
442    /// Find and extract all ANSI escape sequences from a string
443    ///
444    /// This function returns detailed information about every ANSI escape sequence found
445    /// in the input text, including their positions and content. This is useful for
446    /// analysis, conversion, or detailed processing of terminal-formatted text.
447    ///
448    /// # Return Value
449    ///
450    /// Returns a `Vec<regex::Match>` where each `Match` contains:
451    /// - The matched ANSI sequence as a string slice
452    /// - Start and end positions in the original text
453    /// - Methods for extracting the sequence content
454    ///
455    /// # Examples
456    ///
457    /// ## Basic Sequence Extraction
458    ///
459    /// ```rust
460    /// use ansi_escape_sequences::AnsiRegexOptions;
461    ///
462    /// let text = "\u{001B}[31mRed\u{001B}[0m Normal \u{001B}[32mGreen\u{001B}[0m";
463    /// let sequences = AnsiRegexOptions::find_ansi_sequences(text);
464    ///
465    /// assert_eq!(sequences.len(), 4);
466    /// assert_eq!(sequences[0].as_str(), "\u{001B}[31m"); // Red color
467    /// assert_eq!(sequences[1].as_str(), "\u{001B}[0m");  // Reset
468    /// assert_eq!(sequences[2].as_str(), "\u{001B}[32m"); // Green color
469    /// assert_eq!(sequences[3].as_str(), "\u{001B}[0m");  // Reset
470    /// ```
471    ///
472    /// ## Position Analysis
473    ///
474    /// ```rust
475    /// use ansi_escape_sequences::AnsiRegexOptions;
476    ///
477    /// let text = "Hello \u{001B}[31mWorld\u{001B}[0m!";
478    /// let sequences = AnsiRegexOptions::find_ansi_sequences(text);
479    ///
480    /// for (i, seq) in sequences.iter().enumerate() {
481    ///     println!("Sequence {}: '{}' at position {}-{}",
482    ///              i, seq.as_str(), seq.start(), seq.end());
483    /// }
484    /// ```
485    ///
486    /// ## Content Reconstruction
487    ///
488    /// ```rust
489    /// use ansi_escape_sequences::AnsiRegexOptions;
490    ///
491    /// let original = "\u{001B}[1mBold\u{001B}[0m text";
492    /// let sequences = AnsiRegexOptions::find_ansi_sequences(original);
493    ///
494    /// // Reconstruct with different formatting
495    /// let mut result = original.to_string();
496    /// for seq in sequences.iter().rev() { // Reverse to maintain positions
497    ///     let replacement = format!("<ansi:{}>", seq.as_str().len());
498    ///     result.replace_range(seq.range(), &replacement);
499    /// }
500    /// ```
501    #[must_use]
    pub fn find_ansi_sequences(text: &str) -> Vec<regex::Match<'_>> {
503        ansi_regex_global().find_iter(text).collect()
504    }
505}
506
507impl Default for AnsiRegexOptions {
508    fn default() -> Self {
509        Self::new()
510    }
511}
512
513/// Create a high-performance regex pattern for matching ANSI escape sequences
514///
515/// This is the primary entry point for the library, providing access to pre-compiled,
516/// optimized regex patterns for ANSI escape sequence detection and processing. The function
517/// returns static references to cached regex instances for maximum performance.
518///
519/// # Design Philosophy
520///
/// - **Zero-allocation lookup**: Returns references to static, pre-compiled regex patterns
/// - **Performance-first**: Patterns are compiled once, and the `regex` crate matches in time linear in the input length
/// - **Flexible**: Supports both global and first-match-only modes
/// - **Thread-safe**: All returned regex instances are safe for concurrent use
525///
526/// # Arguments
527///
528/// * `options` - Configuration options for regex behavior. Pass `None` for default global matching,
529///   or `Some(AnsiRegexOptions)` for custom configuration.
530///
531/// # Returns
532///
533/// A reference to a pre-compiled `Regex` that matches ANSI escape sequences according to
534/// the specified options. The regex lifetime is `'static`, making it suitable for storage
535/// in static variables or long-lived data structures.
536///
537/// # Performance Characteristics
538///
/// - **Compilation**: Happens once per static regex using `LazyLock`
/// - **Memory**: Two shared static instances, one per matching mode
/// - **Matching**: Optimized for common ANSI sequence patterns
/// - **Concurrency**: Thread-safe; `LazyLock` synchronizes the one-time initialization
543///
544/// # Examples
545///
546/// ## Basic Usage
547///
548/// ```rust
549/// use ansi_escape_sequences::ansi_regex;
550///
551/// // Default configuration - matches all ANSI sequences
552/// let regex = ansi_regex(None);
553///
554/// let text = "\u{001B}[31mRed text\u{001B}[0m and \u{001B}[32mgreen text\u{001B}[0m";
555/// assert!(regex.is_match(text));
556///
557/// // Count all ANSI sequences
558/// let count = regex.find_iter(text).count();
559/// assert_eq!(count, 4); // 2 color codes + 2 reset codes
560/// ```
561///
562/// ## Configuration Options
563///
564/// ```rust
565/// use ansi_escape_sequences::{ansi_regex, AnsiRegexOptions};
566///
567/// // Match only the first ANSI sequence (more efficient for detection)
568/// let first_only = AnsiRegexOptions::new().only_first();
569/// let regex = ansi_regex(Some(first_only));
570///
571/// let text = "\u{001B}[31mRed\u{001B}[0m \u{001B}[32mGreen\u{001B}[0m";
572/// let first_match = regex.find(text).unwrap();
573/// assert_eq!(first_match.as_str(), "\u{001B}[31m");
574///
575/// // Global matching (explicit, same as default)
576/// let global = AnsiRegexOptions::new().global();
577/// let regex = ansi_regex(Some(global));
578/// let all_matches: Vec<_> = regex.find_iter(text).collect();
579/// assert_eq!(all_matches.len(), 4);
580/// ```
581///
582/// ## Advanced Processing
583///
584/// ```rust
585/// use ansi_escape_sequences::ansi_regex;
586///
587/// let regex = ansi_regex(None);
588/// let input = "Normal \u{001B}[1;31mBold Red\u{001B}[0m Normal";
589///
590/// // Extract plain text by removing ANSI codes
591/// let clean_text = regex.replace_all(input, "");
592/// assert_eq!(clean_text, "Normal Bold Red Normal");
593///
594/// // Replace ANSI codes with HTML tags
595/// let html = regex.replace_all(input, |caps: &regex::Captures| {
596///     match caps.get(0).unwrap().as_str() {
597///         "\u{001B}[1;31m" => "<span style='color: red; font-weight: bold'>",
598///         "\u{001B}[0m" => "</span>",
599///         _ => "",
600///     }
601/// });
602/// ```
603///
604/// ## Performance Optimization
605///
606/// ```rust
607/// use ansi_escape_sequences::{ansi_regex, AnsiRegexOptions};
608///
609/// // For validation/detection only - use first-match mode
610/// let detector = ansi_regex(Some(AnsiRegexOptions::new().only_first()));
611/// let text_with_ansi = "\u{001B}[31mRed\u{001B}[0m text";
612/// let plain_text = "Plain text";
613///
614/// // Check for ANSI codes efficiently
615/// assert!(detector.is_match(text_with_ansi)); // Stops at first match
616/// assert!(!detector.is_match(plain_text));
617///
618/// // For complete processing - use global mode
619/// let processor = ansi_regex(None);
620///
621/// // Strip all ANSI codes
622/// let clean_text = processor.replace_all(text_with_ansi, "").into_owned();
623/// assert_eq!(clean_text, "Red text");
624/// ```
625///
626/// ## Static Storage
627///
628/// ```rust
629/// use ansi_escape_sequences::ansi_regex;
630/// use std::sync::LazyLock;
631///
632/// // Store regex in static variable for application-wide use
633/// static ANSI_DETECTOR: LazyLock<&'static regex::Regex> = LazyLock::new(|| {
634///     ansi_regex(None)
635/// });
636///
637/// fn process_log_line(line: &str) -> String {
638///     ANSI_DETECTOR.replace_all(line, "").into_owned()
639/// }
640/// ```
641#[must_use]
642pub fn ansi_regex(options: Option<AnsiRegexOptions>) -> &'static Regex {
643    let opts = options.unwrap_or_default();
644
645    if opts.is_only_first() {
646        ansi_regex_first()
647    } else {
648        ansi_regex_global()
649    }
650}
651
652/// Internal ANSI escape sequence pattern constants and utilities
653///
654/// This module contains the low-level pattern definitions used to construct the ANSI regex.
655/// These patterns are based on the ANSI X3.64 standard and VT100/VT220 terminal specifications.
656///
657/// # Pattern Design
658///
659/// The patterns are designed to be:
660/// - **Comprehensive**: Cover all common ANSI escape sequence types
661/// - **Efficient**: Minimize regex backtracking and false positives
662/// - **Standards-compliant**: Follow ANSI/ISO terminal control standards
663/// - **Unicode-aware**: Handle both 7-bit and 8-bit control sequences
664///
665/// # Technical Details
666///
667/// ANSI escape sequences follow these general patterns:
668/// - **7-bit sequences**: Start with ESC (0x1B) followed by specific characters
669/// - **8-bit sequences**: Use C1 control characters directly (0x80-0x9F range)
670/// - **Parameter sequences**: Include numeric parameters separated by semicolons
671/// - **String sequences**: Contain arbitrary data terminated by specific sequences
672mod patterns {
673    /// Valid string terminator sequences for OSC and similar commands
674    ///
675    /// These terminators are used to end string-based ANSI sequences like OSC (Operating System Command).
676    /// The pattern matches any of the three standard terminators:
677    ///
678    /// - **BEL** (`\u{0007}`): Bell character, traditional terminator
679    /// - **ESC\\** (`\u{001B}\u{005C}`): ESC followed by backslash, standard terminator  
680    /// - **ST** (`\u{009C}`): String Terminator C1 control character
681    ///
682    /// # Examples of Usage
683    ///
684    /// ```text
685    /// \u{001B}]0;Window Title\u{0007}        // OSC sequence with BEL terminator
686    /// \u{001B}]0;Window Title\u{001B}\u{005C} // OSC sequence with ESC\ terminator
687    /// \u{001B}]0;Window Title\u{009C}        // OSC sequence with ST terminator
688    /// ```
689    pub const STRING_TERMINATORS: &str = r"(?:\u{0007}|\u{001B}\u{005C}|\u{009C})";
690
691    /// ESC (Escape) character - the foundation of 7-bit ANSI sequences
692    ///
693    /// The ESC character (`\u{001B}` or decimal 27) introduces most ANSI escape sequences.
694    /// It's followed by specific characters to form different types of control sequences:
695    ///
696    /// - **CSI**: ESC + `[` → Control Sequence Introducer
697    /// - **OSC**: ESC + `]` → Operating System Command  
698    /// - **DCS**: ESC + `P` → Device Control String
699    /// - **APC**: ESC + `_` → Application Program Command
700    ///
701    /// # Historical Context
702    ///
703    /// The ESC character was chosen because it's outside the printable ASCII range
704    /// and unlikely to appear in normal text, making it safe for control purposes.
705    pub const ESC: &str = r"\u{001B}";
706
707    /// C1 CSI (Control Sequence Introducer) - 8-bit equivalent of ESC[
708    ///
709    /// The C1 CSI character (`\u{009B}` or decimal 155) is the 8-bit equivalent
710    /// of the two-character sequence ESC[. It directly introduces control sequences
711    /// without needing the ESC prefix.
712    ///
713    /// # Usage Context
714    ///
715    /// - **7-bit mode**: Uses ESC[ (two characters)
716    /// - **8-bit mode**: Uses CSI directly (one character)
717    /// - **Compatibility**: Modern terminals support both forms
718    ///
719    /// # Examples
720    ///
721    /// ```text
722    /// \u{001B}[31m  // 7-bit: ESC[ followed by parameters and final byte
723    /// \u{009B}31m   // 8-bit: CSI directly followed by parameters and final byte
724    /// ```
725    pub const C1_CSI: &str = r"\u{009B}";
726}
727
728// Pre-compiled regex patterns for common use cases
729static ANSI_REGEX_GLOBAL: LazyLock<Regex> = LazyLock::new(|| {
730    Regex::new(&build_ansi_pattern()).expect("Failed to compile global ANSI regex")
731});
732
733static ANSI_REGEX_FIRST: LazyLock<Regex> = LazyLock::new(|| {
734    // For first-only matching, we use the same pattern but the caller will use find() instead of find_iter()
735    // The regex itself doesn't need to be different - the behavior difference is in how it's used
736    Regex::new(&build_ansi_pattern()).expect("Failed to compile first-only ANSI regex")
737});
738
739/// Build the comprehensive ANSI escape sequence regex pattern
740///
741/// This function constructs the complete regex pattern that matches all supported ANSI escape
742/// sequences. The pattern is optimized for performance while maintaining comprehensive coverage
743/// of ANSI/VT terminal control sequences.
744///
745/// # Pattern Architecture
746///
747/// The regex combines two main pattern types using alternation (`|`):
748///
749/// 1. **OSC (Operating System Command) sequences**: `ESC]...ST`
750/// 2. **CSI (Control Sequence Introducer) sequences**: `ESC[` or C1 CSI + parameters + final byte
751///
752/// # Technical Implementation
753///
754/// ## OSC Pattern: `(?:ESC\][\s\S]*?STRING_TERMINATORS)`
755///
756/// - Matches ESC followed by `]` (OSC introducer)
757/// - `[\s\S]*?` matches any character (including newlines) non-greedily
758/// - Terminated by any valid string terminator (BEL, ESC\, or ST)
759/// - Non-greedy matching prevents over-consumption across multiple sequences
760///
/// ## CSI Pattern: `[ESC C1_CSI][\[\]()#;?]*(?:\d{1,4}(?:[;:]\d{0,4})*)?[A-PR-TZcf-nq-uy=><~]`
///
/// - `[ESC C1_CSI]` matches either the ESC character or the C1 CSI character
/// - `[\[\]()#;?]*` matches the `[` that follows ESC plus any other intermediate characters
/// - `(?:\d{1,4}(?:[;:]\d{0,4})*)?` matches optional numeric parameters
/// - `[A-PR-TZcf-nq-uy=><~]` matches the final command character
767///
768/// # Performance Optimizations
769///
/// - **Character classes**: Use efficient character class matching where possible
/// - **Non-greedy quantifiers**: Stop each OSC match at the first string terminator
/// - **Alternation order**: OSC is tried first, so a terminated OSC sequence is matched whole
///   rather than partially by the CSI branch
/// - **Bounded repetition**: Cap each numeric parameter at four digits
774///
775/// # Standards Compliance
776///
777/// The pattern follows these terminal standards:
778/// - **ANSI X3.64**: Core escape sequence definitions
779/// - **ISO/IEC 6429**: International standard for control functions
780/// - **VT100/VT220**: DEC terminal compatibility
781/// - **xterm**: Extended sequence support
782///
783/// # Examples of Matched Sequences
784///
785/// ```text
786/// \u{001B}[31m           // CSI: Set foreground color to red
787/// \u{001B}[2J            // CSI: Clear entire screen
788/// \u{001B}[1;1H          // CSI: Move cursor to position 1,1
789/// \u{001B}]0;Title\u{0007} // OSC: Set window title
790/// \u{009B}31m            // C1 CSI: Set foreground color (8-bit)
791/// ```
792fn build_ansi_pattern() -> String {
793    use patterns::{C1_CSI, ESC, STRING_TERMINATORS};
794
795    // OSC sequences: ESC ] ... ST (non-greedy until first ST)
796    // Matches operating system commands like window title setting
797    let osc = format!(r"(?:{ESC}\][\s\S]*?{STRING_TERMINATORS})");
798
799    // CSI sequences: ESC[/C1, optional intermediates, params, final byte
800    // Matches control sequence introducers for cursor movement, colors, etc.
801    let csi = format!(
802        r"[{ESC}{C1_CSI}][\[\]()#;?]*(?:\d{{1,4}}(?:[;:]\d{{0,4}})*)?[A-PR-TZcf-nq-uy=><~]"
803    );
804
805    // Combine patterns with alternation, OSC first as it's typically longer
806    format!("{osc}|{csi}")
807}
808
809/// Get a pre-compiled global ANSI regex (matches all occurrences)
810///
811/// This is equivalent to calling `ansi_regex(None)` but more explicit about the behavior.
812/// Use this when you want to find all ANSI sequences in a string.
813#[must_use]
814fn ansi_regex_global() -> &'static Regex {
815    &ANSI_REGEX_GLOBAL
816}
817
818/// Get a pre-compiled ANSI regex for first-match usage
819///
820/// Note: This returns the same regex as `ansi_regex_global()`. The "first only" behavior
821/// is achieved by using `find()` instead of `find_iter()` on the regex.
822/// This function exists for API consistency and clarity of intent.
823#[must_use]
824fn ansi_regex_first() -> &'static Regex {
825    &ANSI_REGEX_FIRST
826}
827
828/// Create an owned copy of the ANSI regex
829///
830/// Use this when you need ownership of the regex rather than a reference.
831/// This is less efficient than using the static references but provides more flexibility.
832///
833/// # Arguments
834///
835/// * `options` - Configuration options for the regex pattern
836///
837/// # Returns
838///
839/// An owned regex that matches ANSI escape sequences
840///
841/// # Examples
842///
843/// ```
/// use ansi_escape_sequences::ansi_regex_owned;
845///
846/// let regex = ansi_regex_owned(None);
847/// assert!(regex.is_match("\u{001B}[31m"));
848/// ```
849#[must_use]
850pub fn ansi_regex_owned(options: Option<AnsiRegexOptions>) -> Regex {
851    ansi_regex(options).clone()
852}
853
854/// Remove all ANSI sequences from a string
855///
/// This is a convenience wrapper around the crate-internal `AnsiRegexOptions::strip_ansi()`.
857///
858/// # Arguments
859///
860/// * `text` - The text to strip ANSI sequences from
861///
862/// # Returns
863///
864/// A new string with all ANSI sequences removed
865///
866/// # Examples
867///
868/// ```
869/// use ansi_escape_sequences::strip_ansi;
870///
871/// let text = "\u{001B}[31mred\u{001B}[0m text";
872/// let clean = strip_ansi(text);
873/// assert_eq!(clean, "red text");
874/// ```
875#[must_use]
876pub fn strip_ansi(text: &str) -> String {
877    AnsiRegexOptions::strip_ansi(text)
878}
879
880/// Check if a string contains any ANSI sequences
881///
/// This is a convenience wrapper around the crate-internal `AnsiRegexOptions::has_ansi()`.
883///
884/// # Arguments
885///
886/// * `text` - The text to check for ANSI sequences
887///
888/// # Returns
889///
890/// `true` if the text contains ANSI sequences, `false` otherwise
891///
892/// # Examples
893///
894/// ```
895/// use ansi_escape_sequences::has_ansi;
896///
897/// assert!(has_ansi("\u{001B}[31mred\u{001B}[0m"));
898/// assert!(!has_ansi("plain text"));
899/// ```
900#[must_use]
901pub fn has_ansi(text: &str) -> bool {
902    AnsiRegexOptions::has_ansi(text)
903}
904
905#[cfg(test)]
906mod tests;