pub struct Structex<R>
where
    R: RegexEngine,
{ /* private fields */ }
A compiled structural regular expression backed by an underlying regular expression engine.
A Structex can be used to search for tagged substrings within a haystack supported by the
regular expression engine it is backed by. The primary API for making use of a Structex is
the Structex::iter_tagged_captures method which will iterate over the TaggedCaptures within
a given haystack as it is searched.
Implementations§
impl<R> Structex<R>
where
    R: RegexEngine,
pub fn new(se: &str) -> Result<Self, Error>
Compiles a structural regular expression. Once compiled, it may be used repeatedly and cloned cheaply. Compilation itself can be expensive, so Structex instances should be reused wherever possible.
To configure how a given Structex is compiled, see StructexBuilder.
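One common way to follow the "compile once, reuse" advice is to keep the compiled value in a lazily initialised static. The sketch below uses only the standard library; `Compiled`, `compile`, `COMPILE_CALLS` and `shared_expression` are hypothetical stand-ins and are not part of structex. Whether a real Structex can be stored in a static depends on it being Send + Sync, which is not stated here.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for a value that is expensive to build, such as a
/// compiled expression. It is not a structex type.
#[derive(Debug)]
pub struct Compiled {
    source: String,
}

impl Compiled {
    /// Returns the original expression string.
    pub fn as_str(&self) -> &str {
        &self.source
    }
}

/// Counts how often `compile` runs, purely for demonstration.
pub static COMPILE_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for the expensive compilation step.
fn compile(source: &str) -> Compiled {
    COMPILE_CALLS.fetch_add(1, Ordering::SeqCst);
    Compiled {
        source: source.to_string(),
    }
}

/// Returns a process-wide instance, compiling it on first use only.
pub fn shared_expression() -> &'static Compiled {
    static CELL: OnceLock<Compiled> = OnceLock::new();
    CELL.get_or_init(|| compile("x/foo.*bar/ p"))
}
```

Every call to `shared_expression` after the first returns the same reference without running `compile` again.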
§Error
If an invalid expression is given then an error is returned. The exact expressions that are valid to compile will depend on the underlying regular expression engine being used.
§Example
// A Structex backed by the regex crate
type Structex = structex::Structex<regex::Regex>;
// An empty expression is always invalid
assert!(Structex::new("").is_err());
// The top level expression must not be a bare action
assert!(Structex::new("P/I am invalid/").is_err());
// A valid expression with a named action
assert!(Structex::new("x/hello, (world|sailor)!/ p").is_ok());
pub fn as_str(&self) -> &str
Returns the original string of this structex.
§Example
type Structex = structex::Structex<regex::Regex>;
let se = Structex::new("x/foo.*bar/ p").unwrap();
assert_eq!(se.as_str(), "x/foo.*bar/ p");
pub fn actions(&self) -> &[Action]
Returns the registered actions that were parsed from the compiled expression.
§Example
use structex::Action;
type Structex = structex::Structex<regex::Regex>;
let se = Structex::new("x/foo.*bar/ { p; a/baz/; }").unwrap();
let actions = se.actions();
assert_eq!(actions.len(), 2);
assert_eq!(actions[0].id(), 0);
assert_eq!(actions[0].tag(), 'p');
assert_eq!(actions[0].arg(), None);
assert_eq!(actions[1].id(), 1);
assert_eq!(actions[1].tag(), 'a');
assert_eq!(actions[1].arg(), Some("baz"));
Returns the registered tags that were parsed from the compiled expression.
§Example
type Structex = structex::Structex<regex::Regex>;
let se = Structex::new("x/foo.*bar/ { p; a/baz/; }").unwrap();
assert_eq!(se.tags(), &['a', 'p']);
pub fn iter_tagged_captures<'s, 'h, H>(&'s self, haystack: &'h H) -> TaggedCapturesIter<'s, 'h, R, H>
Iterate over all TaggedCaptures within the given haystack in order.
§Examples
By default, matches will be emitted without an associated action attached to them, allowing you to write simple expressions that filter and refine regions of the haystack to locate the structure you are looking for.
type Structex = structex::Structex<regex::Regex>;
let se = Structex::new(r#"
x/(.|\n)*?\./ # split into sentences
g/Alice/ # if the sentence contains "Alice"
n/(\w+)\./ # extract the last word of the sentence
"#).unwrap();
let haystack = r#"This is a multi-line
string that mentions peoples names.
People like Alice and Bob. People
like Claire and David, but really
we're here to talk about Alice.
Alice is everyone's friend."#;
let last_words: Vec<String> = se
.iter_tagged_captures(haystack)
.map(|m| m.submatch_text(1).unwrap().to_string())
.collect();
assert_eq!(&last_words, &["Bob", "Alice", "friend"]);
When writing more complex expressions you will want to assign tagged actions to each matching branch in order to distinguish them:
use structex::TaggedCaptures;
type Structex = structex::Structex<regex::Regex>;
let se = Structex::new(r#"
# split into sentences
x/(.|\n)*?\./ {
# if the sentence contains "Alice" extract the last word of the sentence
g/Alice/ n/(\w+)\./ A;
# if it doesn't, extract the first word of the sentence
v/Alice/ n/(\w+)/ B;
}
"#).unwrap();
let haystack = r#"This is a multi-line
string that mentions peoples names.
People like Alice and Bob. People
like Claire and David, but really
we're here to talk about Alice.
Alice is everyone's friend."#;
let captures: Vec<TaggedCaptures<str>> = se
.iter_tagged_captures(haystack)
.collect();
let words: Vec<(char, &str)> = captures
.iter()
.map(|m| (m.tag().unwrap(), m.submatch_text(1).unwrap()))
.collect();
assert_eq!(
&words,
&[('B', "This"), ('A', "Bob"), ('A', "Alice"), ('A', "friend")]
);
pub fn iter_tagged_captures_between<'s, 'h, H>(&'s self, byte_from: usize, byte_to: usize, haystack: &'h H) -> TaggedCapturesIter<'s, 'h, R, H>
Iterate over all TaggedCaptures within the given haystack between the given byte offsets in order.
See iter_tagged_captures for details of semantics.
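Because byte_from and byte_to are byte offsets rather than character indices, it is usually safer to derive them from the haystack than to count by hand. The following is a minimal std-only sketch; `byte_range` is a hypothetical helper, not part of structex, and it assumes the range is half-open like a Rust slice range.

```rust
/// Hypothetical helper (not part of structex): returns the half-open byte
/// range that starts at the first occurrence of `start` and ends just after
/// the first occurrence of `end` found at or after that point.
pub fn byte_range(haystack: &str, start: &str, end: &str) -> Option<(usize, usize)> {
    // `str::find` returns a byte index that always lies on a char boundary.
    let from = haystack.find(start)?;
    // Search for `end` only in the part of the haystack from `from` onwards.
    let to = from + haystack[from..].find(end)? + end.len();
    Some((from, to))
}
```

For the haystack "naïve café. Alice waved. Bye.", `byte_range(h, "Alice", "waved.")` returns `Some((14, 26))`, even though "Alice" begins at character index 12: the two accented letters each occupy two bytes in UTF-8.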
§Example
type Structex = structex::Structex<regex::Regex>;
let se = Structex::new(r#"
x/(.|\n)*?\./ # split into sentences
g/Alice/ # if the sentence contains "Alice"
n/(\w+)\./ # extract the last word of the sentence
"#).unwrap();
let haystack = r#"This is a multi-line
string that mentions peoples names.
People like Alice and Bob. People
like Claire and David, but really
we're here to talk about Alice.
Alice is everyone's friend."#;
// The byte range 57..156 removes the first and last sentences from the initial haystack.
assert_eq!(
&haystack[57..156],
r"People like Alice and Bob. People
like Claire and David, but really
we're here to talk about Alice."
);
let last_words: Vec<String> = se
.iter_tagged_captures_between(57, 156, haystack)
.map(|m| m.submatch_text(1).unwrap().to_string())
.collect();
assert_eq!(&last_words, &["Bob", "Alice"]);
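To make the three stages of the expression above more concrete, here is a rough emulation using only the standard library. It is a sketch of what the expression selects for this haystack, not a description of how structex evaluates it; `HAYSTACK`, `is_word_char` and `last_words_mentioning` are hypothetical names, and `is_word_char` only approximates the regex class `\w`.

```rust
/// The haystack from the example above, written with explicit `\n` escapes.
pub const HAYSTACK: &str = "This is a multi-line\n\
    string that mentions peoples names.\n\
    People like Alice and Bob. People\n\
    like Claire and David, but really\n\
    we're here to talk about Alice.\n\
    Alice is everyone's friend.";

/// Approximation of the regex class `\w` (assumption: alphanumerics plus `_`).
fn is_word_char(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

/// Hypothetical std-only emulation of the three stages:
/// split into sentences, keep those containing `name`, and
/// extract the word directly before the closing '.'.
pub fn last_words_mentioning<'h>(haystack: &'h str, name: &str) -> Vec<&'h str> {
    haystack
        // Split after every '.', keeping the '.' with its sentence.
        .split_inclusive('.')
        // Drop a trailing fragment that has no terminating '.'.
        .filter(|sentence| sentence.ends_with('.'))
        // Keep only sentences that mention `name`.
        .filter(|sentence| sentence.contains(name))
        .filter_map(|sentence| {
            let body = sentence.strip_suffix('.')?;
            // The trailing run of word characters is the last word.
            let word = body.rsplit(|c: char| !is_word_char(c)).next()?;
            // An empty run means no word sits directly before the '.'.
            (!word.is_empty()).then_some(word)
        })
        .collect()
}
```

Applied to the whole of `HAYSTACK` with the name "Alice" this yields "Bob", "Alice" and "friend"; applied to `&HAYSTACK[57..156]` it yields "Bob" and "Alice", matching the assertions in the examples for this particular haystack.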