Crate egg_mode_text


A library for parsing text for Twitter, including character counting with URL shortening.

This is an implementation of the twitter-text library that Twitter makes available as reference code to demonstrate how they count characters in tweets and parse links, hashtags, and user mentions.

The most likely entry point into this module is character_count or its close sibling, characters_remaining. These functions parse the given text for URLs and return a character count according to the rules set up by Twitter, with each parsed URL counted at the given short-URL length rather than at its literal length. The remaining *_entities functions parse a given text to show which entities of a given kind Twitter would extract from it; the entities function extracts all kinds at once. These can be used, for example, to provide auto-completion for a screen name or hashtag when composing a tweet.
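
The counting rule described above can be sketched with the standard library alone. This is a simplified illustration, not this crate's implementation: the names sketch_character_count, sketch_characters_remaining, and token_cost are hypothetical, a URL is assumed to be any whitespace-delimited token starting with http:// or https://, and treating empty text as invalid is also an assumption. The real parser follows the much stricter twitter-text rules.

```rust
/// Hypothetical helper: the cost of one whitespace-delimited token.
/// URL-like tokens cost a fixed short-URL length; anything else costs
/// one per Unicode scalar value.
fn token_cost(token: &str, http_url_len: usize, https_url_len: usize) -> usize {
    if token.starts_with("https://") {
        https_url_len
    } else if token.starts_with("http://") {
        http_url_len
    } else {
        token.chars().count()
    }
}

/// Hypothetical stand-in for the idea behind character_count.
fn sketch_character_count(text: &str, http_url_len: usize, https_url_len: usize) -> usize {
    // Whitespace characters are counted literally.
    let whitespace = text.chars().filter(|c| c.is_whitespace()).count();
    // Every other character belongs to exactly one token.
    let tokens: usize = text
        .split_whitespace()
        .map(|t| token_cost(t, http_url_len, https_url_len))
        .sum();
    whitespace + tokens
}

/// Hypothetical stand-in for the idea behind characters_remaining:
/// returns the remaining budget and whether the text fits within `max`.
fn sketch_characters_remaining(
    text: &str,
    max: usize,
    http_url_len: usize,
    https_url_len: usize,
) -> (usize, bool) {
    let len = sketch_character_count(text, http_url_len, https_url_len);
    // Assumption: empty text is not a valid post.
    (max.saturating_sub(len), len > 0 && len <= max)
}

fn main() {
    let text = "Read this: https://example.com/a/very/long/path";
    let count = sketch_character_count(text, 23, 23);
    let (remaining, valid) = sketch_characters_remaining(text, 280, 23, 23);
    println!("count = {count}, remaining = {remaining}, valid = {valid}");
}
```

The short-URL lengths are passed in rather than hard-coded because, as the description notes, the count depends on "the given short-URL lengths".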

As the entities parsed by this module are simplified compared to the entities returned via the Twitter API, they have been combined into one simplified Entity struct, with a companion EntityKind enum to differentiate between them. See the struct documentation for Entity for examples of how to use one.
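
To illustrate the idea of one entity type tagged with a kind, here is a minimal standard-library sketch. SketchEntity, SketchKind, and sketch_entities are hypothetical names and are not this crate's definitions; storing the position as a pair of byte offsets is an assumption made for the example, and the toy extractor only recognizes whitespace-delimited tokens starting with # or @.

```rust
/// Hypothetical mirror of a "kind" enum; only two kinds are sketched.
#[derive(Debug, PartialEq, Clone, Copy)]
enum SketchKind {
    Hashtag,
    ScreenName,
}

/// Hypothetical simplified entity: a kind plus a byte range
/// (start inclusive, end exclusive) into the original text.
#[derive(Debug, PartialEq)]
struct SketchEntity {
    kind: SketchKind,
    range: (usize, usize),
}

impl SketchEntity {
    /// Returns the slice of `text` that this entity covers.
    fn substr<'a>(&self, text: &'a str) -> &'a str {
        &text[self.range.0..self.range.1]
    }
}

/// Toy extractor: scans whitespace-delimited tokens and keeps those
/// that start with '#' or '@', trimming trailing punctuation.
fn sketch_entities(text: &str) -> Vec<SketchEntity> {
    let mut found = Vec::new();
    let mut start: Option<usize> = None;
    // A sentinel space at the end flushes the final token.
    let sentinel = std::iter::once((text.len(), ' '));
    for (i, ch) in text.char_indices().chain(sentinel) {
        if ch.is_whitespace() {
            if let Some(s) = start.take() {
                let token = &text[s..i];
                let trimmed =
                    token.trim_end_matches(|c: char| !(c.is_alphanumeric() || c == '_'));
                let kind = if trimmed.starts_with('#') {
                    Some(SketchKind::Hashtag)
                } else if trimmed.starts_with('@') {
                    Some(SketchKind::ScreenName)
                } else {
                    None
                };
                // Require at least one character after the sigil.
                if let (Some(kind), true) = (kind, trimmed.len() > 1) {
                    found.push(SketchEntity { kind, range: (s, s + trimmed.len()) });
                }
            }
        } else if start.is_none() {
            start = Some(i);
        }
    }
    found
}

fn main() {
    let text = "hi @rustlang, see #rust";
    for entity in sketch_entities(text) {
        println!("{:?}: {}", entity.kind, entity.substr(text));
    }
}
```

Keeping a range instead of an owned string lets a caller highlight or replace the entity in place, which is the kind of lookup an auto-completion feature needs.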

Structs

Entity: Represents an entity extracted from a given text.

Enums

EntityKind: Represents the kinds of entities that can be extracted from a given text.

Functions

character_count: Returns how many characters the given text would be, after accounting for URL shortening.
characters_remaining: Returns how many characters would remain with the given text, if the given bound were used as a maximum. Also returns an indicator of whether the given text is a valid length to post with that maximum.
entities: Parses the given string for all entities: URLs, hashtags, financial symbols (“cashtags”), user mentions, and list mentions.
hashtag_entities: Parses the given string for hashtags, optionally leaving out those that are part of URLs.
mention_entities: Parses the given string for user mentions.
mention_list_entities: Parses the given string for user and list mentions.
reply_mention_entity: Parses the given string for a user mention at the beginning of the text, if present.
symbol_entities: Parses the given string for financial symbols (“cashtags”), optionally leaving out those that are part of URLs.
url_entities: Parses the given string for URLs.