finl_unicode is a crate that provides Unicode support for the finl project. It is not necessarily meant to provide comprehensive Unicode support, although I will consider adding additional use cases as necessary. The current version implements Unicode 14.0.0.

Two features are currently supported:

  • Unicode segmentation. (Specify clusters as a feature when importing the crate.) This extends a peekable iterator over CharIndices with a next_cluster method, which returns an Option<String> containing the next grapheme cluster if there is one, or None if there isn’t.
  • Character category. (Specify categories as a feature when importing the crate.) This extends the char type with methods for testing the category of a character.
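
The character category feature is described above as an extension of char. The following is a minimal sketch of that extension-trait pattern only, not the crate's actual API: the names SketchCategory, SketchCategories, sketch_category and is_sketch_letter are hypothetical, and the classification relies on the standard library's char predicates, which are derived properties rather than exact Unicode general categories.

```rust
/// Hypothetical, heavily reduced stand-in for a Unicode category enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SketchCategory {
    Letter,
    Number,
    Whitespace,
    Other,
}

/// Hypothetical extension trait; the crate's real trait and method names may differ.
trait SketchCategories {
    /// Retrieve the (approximate) category of the character.
    fn sketch_category(self) -> SketchCategory;
    /// Test whether the character falls in the Letter bucket.
    fn is_sketch_letter(self) -> bool;
}

impl SketchCategories for char {
    fn sketch_category(self) -> SketchCategory {
        // Assumption: std's predicates are a good enough approximation
        // for illustration. They are not exact general categories.
        if self.is_alphabetic() {
            SketchCategory::Letter
        } else if self.is_numeric() {
            SketchCategory::Number
        } else if self.is_whitespace() {
            SketchCategory::Whitespace
        } else {
            SketchCategory::Other
        }
    }

    fn is_sketch_letter(self) -> bool {
        self.sketch_category() == SketchCategory::Letter
    }
}

fn main() {
    // Once the trait is in scope, the methods are callable directly on char.
    for c in "a7 !".chars() {
        println!("{:?} -> {:?}", c, c.sketch_category());
    }
}
```

Because the trait is implemented for char itself, callers only need to bring the trait into scope to use the new methods on any char value.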

The default is to compile all features. Note that the Rust compiler/linker does not link unused code, so most of the time there will be no need to remove features.

Building the crate runs a build script which connects to unicode.org to download the data files.


The code in this module provides a trait, implemented for char, that allows testing or retrieving the Unicode category of a character, as well as two enums for identifying character classes.
This module provides two interfaces for accessing clusters from an underlying string. The GraphemeCluster trait extends the Peekable iterators over Chars or CharIndices with a next_cluster method, which returns an Option<String> holding the next cluster if one exists. This is the best method for getting individual clusters from a stream that is normally read one char at a time, but it is not recommended if you wish to iterate over clusters.
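
To illustrate the shape of a next_cluster extension on a peekable iterator, here is a minimal sketch. It is not the crate's implementation: the trait name SketchCluster and the helper is_sketch_extender are hypothetical, and the clustering rule is a deliberate simplification (a base character followed by any combining marks in U+0300 through U+036F) rather than the full Unicode segmentation rules.

```rust
use std::iter::Peekable;
use std::str::CharIndices;

/// Hypothetical trait mirroring the described method shape.
trait SketchCluster {
    fn next_cluster(&mut self) -> Option<String>;
}

/// Simplifying assumption: only the Combining Diacritical Marks block
/// (U+0300..=U+036F) extends a cluster. Real segmentation has many more rules.
fn is_sketch_extender(c: char) -> bool {
    ('\u{0300}'..='\u{036F}').contains(&c)
}

impl<'a> SketchCluster for Peekable<CharIndices<'a>> {
    fn next_cluster(&mut self) -> Option<String> {
        // Take the base character; return None at the end of input.
        let (_, first) = self.next()?;
        let mut cluster = String::new();
        cluster.push(first);
        // Peek ahead and absorb every following extender into the cluster.
        while let Some(&(_, c)) = self.peek() {
            if !is_sketch_extender(c) {
                break;
            }
            cluster.push(c);
            self.next();
        }
        Some(cluster)
    }
}

fn main() {
    let text = "xe\u{0301}a";
    let mut it = text.char_indices().peekable();
    // Ordinary char-by-char access still works on the same iterator...
    let first = it.next();
    println!("{:?}", first);
    // ...and a whole cluster can be pulled only where it is needed.
    while let Some(cluster) = it.next_cluster() {
        println!("{:?}", cluster);
    }
}
```

The sketch shows why this style suits a consumer that mostly reads single chars: the same iterator serves both kinds of access, and a String is allocated only when a cluster is explicitly requested.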