Enum regex::Regex

pub enum Regex {
    // some variants omitted
}

A compiled regular expression.

It is represented as either a sequence of bytecode instructions (dynamic) or as a specialized Rust function (native). It can be used to search, split or replace text. All searching is done with an implicit .*? at the beginning and end of an expression. To force an expression to match the whole string (or a prefix or a suffix), you must use an anchor like ^ or $ (or \A and \z).

While this crate will handle Unicode strings (whether in the regular expression or in the search text), all positions returned are byte indices. Every byte index is guaranteed to be at a Unicode code point boundary.
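Because positions are byte indices rather than character counts, the two can differ whenever the text contains multi-byte code points. The following is a minimal sketch using only the standard library: `byte_range` and `chars_before` are hypothetical helpers, and a literal substring search stands in for a regex match.

```rust
/// Returns the byte range of the first occurrence of `needle` in `text`,
/// in the same `(start, end)` shape that `find` uses in this crate.
/// This is a plain substring search from std, not a regex search.
fn byte_range(text: &str, needle: &str) -> Option<(usize, usize)> {
    text.find(needle).map(|start| (start, start + needle.len()))
}

/// Counts the characters that precede byte offset `idx`.
/// Panics if `idx` does not fall on a code point boundary.
fn chars_before(text: &str, idx: usize) -> usize {
    text[..idx].chars().count()
}
```

For the text "naïve café", the literal "café" starts at byte 7 but at character 6, and the range (7, 12) can be used directly to slice the text because both ends fall on code point boundaries.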

The lifetimes 'r and 't in this crate correspond to the lifetime of a compiled regular expression and text to search, respectively.

The only methods that allocate new strings are the string replacement methods. All other methods (searching and splitting) return borrowed pointers into the string given.
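The distinction between borrowing and allocating can be illustrated with the standard library's own string methods, which follow the same pattern. This is only an analogy under stated assumptions: `is_slice_of`, `fields` and `without_commas` are hypothetical helpers, and they use std's `str::split` and `str::replace` rather than this crate.

```rust
/// Returns true if `part` points into the memory owned by `whole`,
/// i.e. it is a borrowed slice rather than a fresh allocation.
fn is_slice_of(whole: &str, part: &str) -> bool {
    let whole_start = whole.as_ptr() as usize;
    let part_start = part.as_ptr() as usize;
    part_start >= whole_start && part_start + part.len() <= whole_start + whole.len()
}

/// Splits on commas with std's `str::split`; each piece borrows from `text`.
fn fields(text: &str) -> Vec<&str> {
    text.split(',').collect()
}

/// Removes commas with std's `str::replace`, which has to build a new `String`.
fn without_commas(text: &str) -> String {
    text.replace(',', "")
}
```

Every element returned by `fields` lies inside the original buffer, while the result of `without_commas` lives in its own allocation.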

§Examples

Find the location of a US phone number:

let re = Regex::new("[0-9]{3}-[0-9]{3}-[0-9]{4}").unwrap();
assert_eq!(re.find("phone: 111-222-3333"), Some((7, 19)));

§Using the std::str::StrExt methods with Regex

Note: This section requires that this crate be compiled with the pattern Cargo feature enabled.

Since Regex implements Pattern, you can use regexes with methods defined on std::str::StrExt. For example, is_match, find, find_iter and split can be replaced with StrExt::contains, StrExt::find, StrExt::match_indices and StrExt::split.

Here are some examples:

let re = Regex::new(r"\d+").unwrap();
let haystack = "a111b222c";

assert!(haystack.contains(&re));
assert_eq!(haystack.find(&re), Some(1));
assert_eq!(haystack.match_indices(&re).collect::<Vec<_>>(),
           vec![(1, 4), (5, 8)]);
assert_eq!(haystack.split(&re).collect::<Vec<_>>(), vec!["a", "b", "c"]);

Implementations§

impl Regex

pub fn new(re: &str) -> Result<Regex, Error>

Compiles a dynamic regular expression. Once compiled, it can be used repeatedly to search, split or replace text in a string.

If an invalid expression is given, then an error is returned.

pub fn is_match(&self, text: &str) -> bool

Returns true if and only if the regex matches the string given.

§Example

Test if some text contains at least one word with exactly 13 characters:

let text = "I categorically deny having triskaidekaphobia.";
let matched = Regex::new(r"\b\w{13}\b").unwrap().is_match(text);
assert!(matched);

pub fn find(&self, text: &str) -> Option<(usize, usize)>

Returns the start and end byte range of the leftmost-first match in text. If no match exists, then None is returned.

Note that this should only be used if you want to discover the position of the match. Testing the existence of a match is faster if you use is_match.

§Example

Find the start and end location of the first word with exactly 13 characters:

let text = "I categorically deny having triskaidekaphobia.";
let pos = Regex::new(r"\b\w{13}\b").unwrap().find(text);
assert_eq!(pos, Some((2, 15)));

pub fn find_iter<'r, 't>(&'r self, text: &'t str) -> FindMatches<'r, 't>

Returns an iterator for each successive non-overlapping match in text, returning the start and end byte indices with respect to text.

§Example

Find the start and end location of every word with exactly 13 characters:

let text = "Retroactively relinquishing remunerations is reprehensible.";
for pos in Regex::new(r"\b\w{13}\b").unwrap().find_iter(text) {
    println!("{:?}", pos);
}
// Output:
// (0, 13)
// (14, 27)
// (28, 41)
// (45, 58)
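
What "non-overlapping" means can be sketched with the standard library, whose `str::match_indices` also reports disjoint matches. The helper `literal_ranges` is hypothetical and searches for a literal substring instead of a regex.

```rust
/// Collects the `(start, end)` byte ranges of successive non-overlapping
/// occurrences of a literal `needle`, using std's `str::match_indices`.
fn literal_ranges(text: &str, needle: &str) -> Vec<(usize, usize)> {
    text.match_indices(needle)
        .map(|(start, matched)| (start, start + matched.len()))
        .collect()
}
```

Calling `literal_ranges("aaaaa", "aa")` yields `[(0, 2), (2, 4)]`. The occurrences starting at bytes 1 and 3 overlap earlier matches and are not reported, because each search resumes where the previous match ended.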

pub fn captures<'t>(&self, text: &'t str) -> Option<Captures<'t>>

Returns the capture groups corresponding to the leftmost-first match in text. Capture group 0 always corresponds to the entire match. If no match is found, then None is returned.

You should only use captures if you need access to submatches. Otherwise, find is faster for discovering the location of the overall match.

§Examples

Say you have some text with movie names and their release years, like “‘Citizen Kane’ (1941)”. It’d be nice if we could search for text looking like that, while also extracting the movie name and its release year separately.

let re = Regex::new(r"'([^']+)'\s+\((\d{4})\)").unwrap();
let text = "Not my favorite movie: 'Citizen Kane' (1941).";
let caps = re.captures(text).unwrap();
assert_eq!(caps.at(1), Some("Citizen Kane"));
assert_eq!(caps.at(2), Some("1941"));
assert_eq!(caps.at(0), Some("'Citizen Kane' (1941)"));

Note that the full match is at capture group 0. Each subsequent capture group is indexed by the order of its opening (.

We can make this example a bit clearer by using named capture groups:

let re = Regex::new(r"'(?P<title>[^']+)'\s+\((?P<year>\d{4})\)")
               .unwrap();
let text = "Not my favorite movie: 'Citizen Kane' (1941).";
let caps = re.captures(text).unwrap();
assert_eq!(caps.name("title"), Some("Citizen Kane"));
assert_eq!(caps.name("year"), Some("1941"));
assert_eq!(caps.at(0), Some("'Citizen Kane' (1941)"));

Here we name the capture groups, which we can access with the name method. Note that the named capture groups are still accessible with at.

The 0th capture group is always unnamed, so it must always be accessed with at(0).

pub fn captures_iter<'r, 't>(&'r self, text: &'t str) -> FindCaptures<'r, 't>

Returns an iterator over all the non-overlapping capture groups matched in text. This is operationally the same as find_iter (except it yields information about submatches).

§Example

We can use this to find all movie titles and their release years in some text, where the movie is formatted like “‘Title’ (xxxx)”:

let re = Regex::new(r"'(?P<title>[^']+)'\s+\((?P<year>\d{4})\)")
               .unwrap();
let text = "'Citizen Kane' (1941), 'The Wizard of Oz' (1939), 'M' (1931).";
for caps in re.captures_iter(text) {
    println!("Movie: {:?}, Released: {:?}", caps.name("title"), caps.name("year"));
}
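// Output (name returns an Option, so {:?} prints the Some wrapper):
// Movie: Some("Citizen Kane"), Released: Some("1941")
// Movie: Some("The Wizard of Oz"), Released: Some("1939")
// Movie: Some("M"), Released: Some("1931")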

pub fn split<'r, 't>(&'r self, text: &'t str) -> RegexSplits<'r, 't>

Returns an iterator of substrings of text delimited by a match of the regular expression. Namely, each element of the iterator corresponds to text that isn’t matched by the regular expression.

This method will not copy the text given.

§Example

To split a string delimited by arbitrary amounts of spaces or tabs:

let re = Regex::new(r"[ \t]+").unwrap();
let fields: Vec<&str> = re.split("a b \t  c\td    e").collect();
assert_eq!(fields, vec!["a", "b", "c", "d", "e"]);

pub fn splitn<'r, 't>( &'r self, text: &'t str, limit: usize, ) -> RegexSplitsN<'r, 't>

Returns an iterator of at most limit substrings of text delimited by a match of the regular expression. (A limit of 0 will return no substrings.) Namely, each element of the iterator corresponds to text that isn’t matched by the regular expression. The remainder of the string that is not split will be the last element in the iterator.

This method will not copy the text given.

§Example

Get the first two words in some text (with a limit of 3, the third element is the unsplit remainder):

let re = Regex::new(r"\W+").unwrap();
let fields: Vec<&str> = re.splitn("Hey! How are you?", 3).collect();
assert_eq!(fields, vec!["Hey", "How", "are you?"]);
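
The limit semantics described above (at most limit pieces, remainder in the last piece, a limit of 0 yielding nothing) match those of std's `str::splitn`, so they can be tried out without a regex. The helper `first_pieces` is hypothetical and splits on a single literal space.

```rust
/// Splits `text` on a single space into at most `limit` pieces using
/// std's `str::splitn`. The final piece holds whatever was left unsplit.
fn first_pieces(text: &str, limit: usize) -> Vec<&str> {
    text.splitn(limit, ' ').collect()
}
```

Note the difference in argument order: std's `splitn` takes the limit first and the pattern second, whereas the method documented here takes the text first and the limit second.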

pub fn replace<R: Replacer>(&self, text: &str, rep: R) -> String

Replaces the leftmost-first match with the replacement provided. The replacement can be a regular string (where $N and $name are expanded to match capture groups) or a function that takes the matches’ Captures and returns the replaced string.

If no match is found, then a copy of the string is returned unchanged.

§Examples

Note that this function is polymorphic with respect to the replacement. In typical usage, this can just be a normal string:

let re = Regex::new("[^01]+").unwrap();
assert_eq!(re.replace("1078910", ""), "1010");

But anything satisfying the Replacer trait will work. For example, a closure of type |&Captures| -> String provides direct access to the captures corresponding to a match. This allows one to access submatches easily:

let re = Regex::new(r"([^,\s]+),\s+(\S+)").unwrap();
let result = re.replace("Springsteen, Bruce", |caps: &Captures| {
    format!("{} {}", caps.at(2).unwrap_or(""), caps.at(1).unwrap_or(""))
});
assert_eq!(result, "Bruce Springsteen");

But this is a bit cumbersome to use all the time. Instead, a simple syntax is supported that expands $name into the corresponding capture group. Here’s the last example, but using this expansion technique with named capture groups:

let re = Regex::new(r"(?P<last>[^,\s]+),\s+(?P<first>\S+)").unwrap();
let result = re.replace("Springsteen, Bruce", "$first $last");
assert_eq!(result, "Bruce Springsteen");

Note that using $2 instead of $first or $1 instead of $last would produce the same result. To write a literal $ use $$.

Finally, sometimes you just want to replace a literal string with no submatch expansion. This can be done by wrapping a string with NoExpand:

use regex::NoExpand;

let re = Regex::new(r"(?P<last>[^,\s]+),\s+(\S+)").unwrap();
let result = re.replace("Springsteen, Bruce", NoExpand("$2 $last"));
assert_eq!(result, "$2 $last");

pub fn replace_all<R: Replacer>(&self, text: &str, rep: R) -> String

Replaces all non-overlapping matches in text with the replacement provided. This is the same as calling replacen with limit set to 0.

See the documentation for replace for details on how to access submatches in the replacement string.

pub fn replacen<R: Replacer>(&self, text: &str, limit: usize, rep: R) -> String

Replaces at most limit non-overlapping matches in text with the replacement provided. If limit is 0, then all non-overlapping matches are replaced.

See the documentation for replace for details on how to access submatches in the replacement string.
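
The convention that a limit of 0 means "replace everything" is easy to get wrong, since std's `str::replacen` treats a count of 0 as "replace nothing". The following sketch mimics the documented convention with a literal needle instead of a regex. `replacen_literal` is a hypothetical helper and does not reflect how this crate is implemented.

```rust
/// Replaces at most `limit` non-overlapping occurrences of a literal
/// `needle` in `text` with `rep`. A `limit` of 0 means "replace all",
/// following the convention documented for `replacen`.
fn replacen_literal(text: &str, needle: &str, limit: usize, rep: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut last_end = 0;
    for (i, (start, matched)) in text.match_indices(needle).enumerate() {
        // Stop once the requested number of replacements has been made.
        if limit > 0 && i >= limit {
            break;
        }
        // Copy the unmatched text before this occurrence, then the replacement.
        out.push_str(&text[last_end..start]);
        out.push_str(rep);
        last_end = start + matched.len();
    }
    // Copy whatever follows the last replaced occurrence.
    out.push_str(&text[last_end..]);
    out
}
```

With the input "a-b-c-d" and needle "-", a limit of 2 gives "a+b+c-d", while a limit of 0 gives "a+b+c+d".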

pub fn as_str<'a>(&'a self) -> &'a str

Returns the original string of this regex.

Trait Implementations§

impl Clone for Regex

fn clone(&self) -> Regex

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for Regex

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Shows the original regular expression.

impl Display for Regex

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Shows the original regular expression.

impl FromStr for Regex

fn from_str(s: &str) -> Result<Regex, Error>

Attempts to parse a string into a regular expression.

type Err = Error

The associated error which can be returned from parsing.

impl PartialEq for Regex

Equality comparison is based on the original string. It is possible that different regular expressions have the same matching behavior, but are still compared unequal. For example, \d+ and \d\d* match the same set of strings, but are not considered equal.

fn eq(&self, other: &Regex) -> bool

This method tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

This method tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.
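
The string-based equality described for this PartialEq implementation can be sketched as follows. `PatternHandle` is a hypothetical stand-in that stores only the original pattern text. It is an illustration of the comparison rule, not the crate's actual type or implementation.

```rust
/// Hypothetical stand-in for a compiled pattern: it keeps only the
/// original pattern string, which is all that equality looks at.
#[derive(Debug, Clone)]
struct PatternHandle {
    original: String,
}

impl PatternHandle {
    fn new(pattern: &str) -> PatternHandle {
        PatternHandle { original: pattern.to_string() }
    }
}

impl PartialEq for PatternHandle {
    /// Two handles are equal only if their pattern text is identical.
    fn eq(&self, other: &PatternHandle) -> bool {
        self.original == other.original
    }
}

impl Eq for PatternHandle {}
```

Under this rule, two handles built from `\d+` compare equal, while handles built from `\d+` and `\d\d*` compare unequal even though the two patterns match the same set of strings.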

impl Eq for Regex

Auto Trait Implementations§

impl Freeze for Regex

impl RefUnwindSafe for Regex

impl Send for Regex

impl Sync for Regex

impl Unpin for Regex

impl UnwindSafe for Regex

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

default unsafe fn clone_to_uninit(&self, dst: *mut T)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dst.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T> ToString for T
where T: Display + ?Sized,

default fn to_string(&self) -> String

Converts the given value to a String.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.