Struct Regex

pub struct Regex<'a> { /* private fields */ }

Available on crate features regex-automata and regex-lita only.

A compiled regular expression for searching Unicode haystacks.

A Regex can be used to search haystacks, split haystacks into substrings or replace substrings in a haystack with a different substring. All searching is done with an implicit (?s:.)*? at the beginning and end of a pattern. To force an expression to match the whole string (or a prefix or a suffix), use an anchored search or an anchor such as ^ or $ (or \A and \z).
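The effect of the implicit (?s:.)*? prefix can be pictured as "try an anchored match at every start offset". The following is a minimal std-only sketch of that idea, not this crate's implementation: the helper names are hypothetical and the pattern is hard-coded to [0-9]{4}.

```rust
/// Hypothetical anchored matcher for the fixed pattern `[0-9]{4}`:
/// returns the end offset if four ASCII digits begin exactly at `start`.
fn match_four_digits_at(haystack: &str, start: usize) -> Option<usize> {
    let bytes = haystack.as_bytes();
    let end = start.checked_add(4)?;
    if end <= bytes.len() && bytes[start..end].iter().all(|b| b.is_ascii_digit()) {
        Some(end)
    } else {
        None
    }
}

/// Unanchored search: try the anchored matcher at each char boundary.
/// This is what an implicit `(?s:.)*?` prefix amounts to.
fn find_unanchored(haystack: &str) -> Option<(usize, usize)> {
    haystack
        .char_indices()
        .map(|(i, _)| i)
        .find_map(|start| match_four_digits_at(haystack, start).map(|end| (start, end)))
}

/// Whole-string match, the analogue of writing `^[0-9]{4}$`.
fn is_full_match(haystack: &str) -> bool {
    match_four_digits_at(haystack, 0) == Some(haystack.len())
}

fn main() {
    // The unanchored search finds the digits in the middle of the haystack.
    assert_eq!(find_unanchored("year 2010!"), Some((5, 9)));
    // The anchored variant only accepts a haystack that is exactly four digits.
    assert!(!is_full_match("year 2010!"));
    assert!(is_full_match("2010"));
}
```

The offsets are byte offsets, as in the Regex::find results shown in the examples on this page.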

§Overview

The most important methods are as follows:

Regex::new compiles a regex using the default configuration. A Builder permits setting a non-default configuration. (For example, case insensitive matching, verbose mode and others.)
Regex::is_match reports whether a match exists in a particular haystack.
Regex::find reports the byte offsets of a match in a haystack, if one exists. Regex::find_iter returns an iterator over all such matches.
Regex::captures fills a caller-provided Captures (see Regex::create_captures), which reports both the byte offsets of a match in a haystack and the byte offsets of each matching capture group from the regex in the haystack. Regex::captures_iter returns an iterator over all such matches.

§Example

use ib_matcher::regex::lita::Regex;

let re = Regex::new(r"^[0-9]{4}-[0-9]{2}-[0-9]{2}$")?;
assert!(re.is_match("2010-03-14"));

With IbMatcher’s Chinese pinyin and Japanese romaji matching:

// cargo add ib-matcher --features regex,pinyin,romaji
use ib_matcher::{
    matcher::{MatchConfig, PinyinMatchConfig, RomajiMatchConfig},
    regex::{lita::Regex, Match},
};

let config = MatchConfig::builder()
    .pinyin(PinyinMatchConfig::default())
    .romaji(RomajiMatchConfig::default())
    .build();

let re = Regex::builder()
    .ib(config.shallow_clone())
    .build("raki.suta")
    .unwrap();
assert_eq!(re.find("「らき☆すた」"), Some(Match::must(0, 3..18)));

let re = Regex::builder()
    .ib(config.shallow_clone())
    .build("pysou.*?(any|every)thing")
    .unwrap();
assert_eq!(re.find("拼音搜索Everything"), Some(Match::must(0, 0..22)));

let config = MatchConfig::builder()
    .pinyin(PinyinMatchConfig::default())
    .romaji(RomajiMatchConfig::default())
    .mix_lang(true)
    .build();
let re = Regex::builder()
    .ib(config.shallow_clone())
    .build("(?x)^zangsounofuri-?ren # Mixing pinyin and romaji")
    .unwrap();
assert_eq!(re.find("葬送のフリーレン"), Some(Match::must(0, 0..24)));

For more examples and the syntax, see crate::regex.

§Case insensitivity

To enable case insensitivity:

use ib_matcher::{matcher::{PinyinMatchConfig, PlainMatchConfig, MatchConfig}, regex::lita::Regex};

let re = Regex::builder().ib(MatchConfig::default()).build("foo").unwrap();
assert!(re.is_match("FOO"));

// Alternatively, with `case_insensitive()`:
let re = Regex::builder()
    .ib(MatchConfig::builder()
        .case_insensitive(true)
        .pinyin(PinyinMatchConfig::default())
        .build())
    .build("pyss")
    .unwrap();
assert!(re.is_match("PY搜索"));

Note that enabling syntax.case_insensitive currently prevents ib matching (i.e. pinyin and romaji matching) from working. Set only MatchConfigBuilder::case_insensitive (or PlainMatchConfigBuilder::case_insensitive) instead.

For now, a case-insensitive character class must be written explicitly as (?i:[a-z]).

§Synchronization and cloning

In order to make the Regex API convenient, most of the routines hide the fact that a Cache is needed at all. To achieve this, a memory pool is used internally to retrieve Cache values in a thread safe way that also permits reuse. This in turn implies that every such search call requires some form of synchronization. Usually this synchronization is fast enough to not notice, but in some cases, it can be a bottleneck. This typically occurs when all of the following are true:

The same Regex is shared across multiple threads simultaneously, usually via a util::lazy::Lazy or something similar from the once_cell or lazy_static crates.
The primary unit of work in each thread is a regex search.
Searches are run on very short haystacks.

This particular case can lead to high contention on the pool used by a Regex internally, which can in turn noticeably increase latency. This cost can be mitigated in one of the following ways:

Use a distinct copy of a Regex in each thread, usually by cloning it. Cloning a Regex does not do a deep copy of its read-only component. But it does lead to each Regex having its own memory pool, which in turn eliminates the problem of contention. In general, this technique should not result in any additional memory usage when compared to sharing the same Regex across multiple threads simultaneously.
Use lower level APIs, like Regex::try_find, which permit passing a Cache explicitly. In this case, it is up to you to determine how best to provide a Cache. For example, you might put a Cache in thread-local storage if your use case allows for it.

Overall, this is an issue that happens rarely in practice, but it can happen.
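The two mitigations above can be sketched with a toy model that uses only the standard library. Everything here (ToyRegex, Program, Cache, count_matches_per_thread) is a hypothetical stand-in chosen for illustration, not this crate's internals: the read-only part is shared through an Arc, each clone gets a fresh pool, and a lower-level method takes the cache explicitly.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Stand-in for the immutable compiled program (hypothetical).
struct Program {
    needle: String,
}

/// Stand-in for mutable per-search scratch space (hypothetical).
#[derive(Default)]
struct Cache {
    searches: usize,
}

/// Toy regex: a shared read-only program plus a private pool of caches.
struct ToyRegex {
    program: Arc<Program>,
    pool: Mutex<Vec<Cache>>,
}

impl ToyRegex {
    fn new(needle: &str) -> Self {
        ToyRegex {
            program: Arc::new(Program { needle: needle.to_string() }),
            pool: Mutex::new(Vec::new()),
        }
    }

    /// High-level search: borrows a cache from the pool, so every call locks.
    fn is_match(&self, haystack: &str) -> bool {
        let mut cache = self.pool.lock().unwrap().pop().unwrap_or_default();
        let found = self.try_is_match(&mut cache, haystack);
        self.pool.lock().unwrap().push(cache);
        found
    }

    /// Lower-level search: the caller supplies the cache, so no lock is taken.
    fn try_is_match(&self, cache: &mut Cache, haystack: &str) -> bool {
        cache.searches += 1;
        haystack.contains(&self.program.needle)
    }
}

impl Clone for ToyRegex {
    fn clone(&self) -> Self {
        ToyRegex {
            program: Arc::clone(&self.program), // shallow: shares the program
            pool: Mutex::new(Vec::new()),       // fresh pool: no contention
        }
    }
}

/// Gives each worker thread its own clone, so no pool is shared.
/// The toy type is Send; check the auto traits of a real type before doing this.
fn count_matches_per_thread(re: &ToyRegex, inputs: Vec<Vec<&'static str>>) -> usize {
    let handles: Vec<_> = inputs
        .into_iter()
        .map(|chunk| {
            let local = re.clone();
            thread::spawn(move || chunk.iter().filter(|h| local.is_match(h)).count())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    let re = ToyRegex::new("foo");
    let copy = re.clone();
    // The clone shares the read-only program instead of copying it.
    assert!(Arc::ptr_eq(&re.program, &copy.program));

    let total = count_matches_per_thread(&re, vec![vec!["foo", "bar"], vec!["xfoo"]]);
    assert_eq!(total, 2);

    // Explicit-cache path: the caller owns the scratch space.
    let mut cache = Cache::default();
    assert!(copy.try_is_match(&mut cache, "a foo"));
}
```

In this model the only per-clone cost is an empty pool, which mirrors the claim that cloning should not add memory compared with sharing one value across threads.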

§Warning: spin-locks may be used in alloc-only mode

When this crate is built without the std feature and the high level APIs on a Regex are used, then a spin-lock will be used to synchronize access to an internal pool of Cache values. This may be undesirable because a spin-lock is effectively impossible to implement correctly in user space. That is, more concretely, the spin-lock could result in a deadlock.

To avoid the use of spin-locks when the std feature is disabled, use APIs that accept a Cache value explicitly, for example Regex::try_find.
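For readers unfamiliar with the term, here is a minimal sketch of what a spin-lock looks like, assuming a plain atomic flag. SpinLock is a hypothetical name and this is not the lock used by this crate. The point is the loop in lock: a waiter never yields to the scheduler, so if the holder cannot run (for example, a lower-priority thread on a single core), the waiter may spin indefinitely.

```rust
use std::hint;
use std::sync::atomic::{AtomicBool, Ordering};

/// Minimal spin-lock, for illustration only (hypothetical type).
pub struct SpinLock {
    locked: AtomicBool,
}

impl SpinLock {
    pub const fn new() -> Self {
        SpinLock { locked: AtomicBool::new(false) }
    }

    /// Makes a single attempt to take the lock and never waits.
    pub fn try_lock(&self) -> bool {
        self.locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
    }

    /// Busy-waits until the lock is free. Nothing here tells the OS that
    /// this thread is waiting, which is the root of the deadlock risk.
    pub fn lock(&self) {
        while !self.try_lock() {
            hint::spin_loop();
        }
    }

    /// Releases the lock so that a spinning waiter can acquire it.
    pub fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}

fn main() {
    let lock = SpinLock::new();
    lock.lock();
    assert!(!lock.try_lock()); // already held, a second attempt fails
    lock.unlock();
    assert!(lock.try_lock()); // free again after unlock
}
```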

Implementations§

impl<'a> Regex<'a>

pub fn new(pattern: &str) -> Result<Self, BuildError>

pub fn config() -> Config

pub fn create_captures(&self) -> Captures

pub fn builder<'f1>() -> Builder<'a, 'f1>

§Example: change the line terminator

impl<'a> Regex<'a>

pub fn is_match<'h, I: Into<Input<'h>>>(&self, input: I) -> bool

§Example

§Example: consistency with search APIs

pub fn find<'h, I: Into<Input<'h>>>(&self, input: I) -> Option<Match>

§Example

pub fn captures<'h, I: Into<Input<'h>>>( &self, input: I, caps: &mut Captures, ) -> Result<(), MatchError>

§Example

Trait Implementations§

impl<'a> Clone for Regex<'a>

fn clone(&self) -> Regex<'a>

fn clone_from(&mut self, source: &Self)

Auto Trait Implementations§

impl<'a> Freeze for Regex<'a>

impl<'a> !RefUnwindSafe for Regex<'a>

impl<'a> !Send for Regex<'a>

impl<'a> !Sync for Regex<'a>

impl<'a> Unpin for Regex<'a>

impl<'a> !UnwindSafe for Regex<'a>

Blanket Implementations§

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> CloneToUninit for T where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self> where F: FnOnce(&Self) -> bool,

impl<T> Same for T

type Output = T

impl<T> ToOwned for T where T: Clone,

type Owned = T

fn to_owned(&self) -> T

fn clone_into(&self, target: &mut T)

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>
