pub struct Entity {
    pub kind: EntityKind,
    pub range: (usize, usize),
}
Represents an entity extracted from a given text.
This struct is returned by the entity parsing functions and is meant to be used together with the source string that was passed to them, because the Entity struct itself contains only byte offsets into that string.
§Examples
To load the string in question, you can use the byte offsets directly, or use the substr
method on the Entity itself:
use egg_mode_text::hashtag_entities;
let text = "this is a #hashtag";
let results = hashtag_entities(text, true);
let entity = results.first().unwrap();
assert_eq!(&text[entity.range.0..entity.range.1], "#hashtag");
assert_eq!(entity.substr(text), "#hashtag");
Storing only the byte offsets may seem like a roundabout way to represent the extracted string, but the offsets also let you substitute in text decoration, such as HTML links:
use egg_mode_text::hashtag_entities;
let text = "this is a #hashtag";
let results = hashtag_entities(text, true);
let mut output = String::new();
let mut last_pos = 0;
for entity in results {
    output.push_str(&text[last_pos..entity.range.0]);
    // NOTE: this doesn't URL-encode the hashtag for the link
    let tag = entity.substr(text);
    let link = format!("<a href='https://twitter.com/#!/search?q={0}'>{0}</a>", tag);
    output.push_str(&link);
    last_pos = entity.range.1;
}
output.push_str(&text[last_pos..]);
assert_eq!(output, "this is a <a href='https://twitter.com/#!/search?q=#hashtag'>#hashtag</a>");
Fields§
kind: EntityKind
The kind of entity that was extracted.
range: (usize, usize)
The byte offsets that delimit the entity text. The first index is the byte at which the extracted entity begins; the second is the byte index of the first character after the entity (or one past the last byte of the string if the entity is at the end of the string). For hashtags and symbols, the range includes the # or $ character.
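Because the range is a half-open span of byte offsets rather than character positions, the two diverge as soon as the text contains multi-byte characters. The following is a minimal sketch of converting such a range into character indices, for example to hand it to a UI layer that counts characters. The helper `byte_range_to_char_range` is hypothetical and not part of egg_mode_text; it takes a plain `(usize, usize)` tuple of the same shape as the `range` field.

```rust
/// Hypothetical helper (not part of egg_mode_text): converts a half-open
/// byte range into a half-open range of `char` indices within `text`.
/// Returns `None` if the range is reversed or if either offset does not
/// fall on a `char` boundary of `text`.
fn byte_range_to_char_range(text: &str, range: (usize, usize)) -> Option<(usize, usize)> {
    let (start, end) = range;
    // `is_char_boundary` is also false for offsets past the end of `text`,
    // so the slicing below cannot panic once this check passes.
    if start > end || !text.is_char_boundary(start) || !text.is_char_boundary(end) {
        return None;
    }
    // Count the characters before the entity, then the characters inside it.
    let char_start = text[..start].chars().count();
    let char_len = text[start..end].chars().count();
    Some((char_start, char_start + char_len))
}
```

In the string "café #rust", the "é" occupies two bytes, so a hashtag at bytes 6..11 sits at characters 5..10. Note that the end offset 11 equals the byte length of the string, matching the "one past the end" case described for the range field.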
Implementations§
impl Entity
pub fn substr<'a>(&self, text: &'a str) -> &'a str
Returns the substring matching this entity’s byte offsets from the given text.
§Panics
This function will panic if the byte offsets in this entity do not match codepoint boundaries in the given text. This can happen if the text is not the original string that this entity was parsed from.
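When it is not certain that the text at hand is the string the entity was parsed from, a non-panicking lookup may be preferable. The sketch below uses the standard library's `str::get`, which returns `None` instead of panicking when a range is out of bounds or does not fall on character boundaries. The function `try_substr` is a hypothetical helper, not part of egg_mode_text, and it operates on a plain `(usize, usize)` tuple such as the one stored in the `range` field.

```rust
/// Hypothetical helper (not part of egg_mode_text): returns the slice of
/// `text` covered by the half-open byte range, or `None` if the range is
/// out of bounds, reversed, or splits a multi-byte character.
fn try_substr<'a>(text: &'a str, range: (usize, usize)) -> Option<&'a str> {
    // `str::get` performs the bounds and boundary checks without panicking.
    text.get(range.0..range.1)
}
```

With an `Entity` named `entity`, this could be called as `try_substr(other_text, entity.range)`. A `Some` result only shows that the offsets are valid for that string, not that the slice is the entity that was originally extracted.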