Library for conversion of Latin UTF-8 text into Tengwar, using the Unicode codepoints of the Free Tengwar Font Project. It is designed specifically, but not exclusively, with Tengwar Telcontar in mind, for use within LaTeX macros.
§Overview
The library is split into two main modules. The characters module is
primarily concerned with defining the data and data structures needed to
represent Tengwar. The mode module, on the other hand, is mainly
concerned with transcription, defining the TengwarMode trait for the
rules and the Tokenizer type for applying them.
However, this first level of transcription is usually not enough. Therefore,
the top level of the crate defines the TokenIter type to perform
additional transformations. This higher-level iterator can be configured
at runtime, and is capable of looking ahead and behind to determine the
context, enabling critical situational behaviors.
Three modes are currently provided by default: Quenya (“Classical”),
Beleriand, and Gondor. Each mode implements the TengwarMode
trait.
§Examples
§TengwarMode trait
The most direct way to convert text is TengwarMode::transcribe. This
function accepts any input type that implements AsRef<str>, and can
return any type that implements FromIterator<Token>; this includes
Vec<Token> and String.
use tengwar::{Quenya, TengwarMode};
let text: String = Quenya::transcribe("namárië !");
assert_eq!(text, " ");
§ToTengwar trait
With the use of the ToTengwar helper trait (automatically implemented
for any type implementing AsRef<str>), three methods are provided on
the input type directly. The first is ToTengwar::transcriber, which
constructs a Transcriber for the text, allowing iteration over
Tokens.
The Transcriber also has TranscriberSettings, holding several public
fields, which can be changed to adjust various aspects of its behavior.
use tengwar::{Quenya, ToTengwar};
let mut transcriber = "namárië !".transcriber::<Quenya>();
transcriber.settings.alt_a = true; // Use the alternate form of the A-tehta.
let text: String = transcriber.collect();
assert_eq!(text, " ");
The second method is ToTengwar::to_tengwar. This is mostly a convenience
method, which simply calls ToTengwar::transcriber and immediately
collects the Iterator into a String.
use tengwar::{Quenya, ToTengwar};
let text: String = "namárië !".to_tengwar::<Quenya>();
assert_eq!(text, " ");
The third method is ToTengwar::to_tengwar_with, which does the same, but
takes TranscriberSettings to modify the Transcriber before it is
collected. This allows settings to be specified once and reused.
use tengwar::{Quenya, ToTengwar, TranscriberSettings};
let mut settings = TranscriberSettings::new();
settings.alt_a = true;
settings.nuquerna = true;
let text: String = "namárië !".to_tengwar_with::<Quenya>(settings);
assert_eq!(text, " ");
let text: String = "lotsë súva".to_tengwar_with::<Quenya>(settings);
assert_eq!(text, " ");
§Crate-level function
Also available, and likely the easiest to discover via code completion, is
the top-level transcribe function, which takes an implementor of
TengwarMode as a generic parameter. This function accepts any input
type that implements ToTengwar, and is a passthrough to the
ToTengwar::to_tengwar method.
use tengwar::{Quenya, transcribe};
let text: String = transcribe::<Quenya>("namárië !");
assert_eq!(text, " ");
§In Detail
The core of this library is the Token enum. A Token may hold a simple
char, a Glyph, or a Numeral. An iterator of Tokens can be
collected into a String; this is where the rendering of Tengwar
text truly takes place.
The rest of the library is geared around the creation of Tokens, usually
by iteration, and modifying them before the final call to collect.
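The collection step described above can be sketched without the crate itself. The following is a minimal illustration only: `SketchToken`, its variants' fields, and `render_demo` are hypothetical stand-ins, not the crate's actual `Token` definition, and the rendering rules are deliberately trivial. It shows how implementing `FromIterator` for `String` lets any iterator of tokens be finished with a plain call to `collect`.

```rust
use std::iter::FromIterator;

/// Hypothetical, simplified stand-in for a token type (not the crate's `Token`).
#[derive(Clone, Debug, PartialEq)]
enum SketchToken {
    /// A character passed through unchanged.
    Char(char),
    /// A glyph, reduced here to a base character plus an optional combining mark.
    Glyph { base: char, mark: Option<char> },
    /// A number, rendered here as plain decimal digits.
    Numeral(u32),
}

// Implementing `FromIterator<SketchToken>` for `String` is what makes
// `iterator.collect::<String>()` work; rendering happens in this one place.
impl FromIterator<SketchToken> for String {
    fn from_iter<I: IntoIterator<Item = SketchToken>>(iter: I) -> Self {
        let mut out = String::new();
        for token in iter {
            match token {
                SketchToken::Char(c) => out.push(c),
                SketchToken::Glyph { base, mark } => {
                    out.push(base);
                    // `Option<char>` is iterable, so this pushes zero or one char.
                    out.extend(mark);
                }
                SketchToken::Numeral(n) => out.push_str(&n.to_string()),
            }
        }
        out
    }
}

/// Builds a few tokens by hand and renders them with `collect`.
fn render_demo() -> String {
    let tokens = vec![
        SketchToken::Glyph { base: 'n', mark: Some('\u{301}') },
        SketchToken::Char(' '),
        SketchToken::Numeral(12),
    ];
    tokens.into_iter().collect()
}
```

Because rendering lives entirely in the `FromIterator` implementation, any adapter that yields tokens can stay lazy and unaware of the output format until the final `collect`.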
§Mode
A “Mode” of the Tengwar is essentially an orthography mapping; it correlates conventions of writing in a primary world alphabet to the conventions of writing in the Tengwar.
For this purpose, the TengwarMode trait is provided. A type implementing
this trait is expected to perform essentially as a state machine, taking
input in the form of slices of chars, and using them to progressively
construct Tokens.
§Tokenizer
The first level of iteration is the Tokenizer. This
iterator takes UTF-8 text, breaks it down into a Vec of normalized
Unicode codepoints, and assembles Tokens according to the rules
specified by an implementation of TengwarMode.
Short slices of chars are passed to the Mode type, which determines
whether to accept them as part of a Token. If the chars are not
accepted, the slice is narrowed and tried again, until the width reaches
zero; at this point, the Mode type is shown the full remaining data and
asked whether it can get anything at all from it. If it cannot, a char
is returned unchanged as a Token.
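The narrowing loop described above might look roughly like the sketch below. Everything in it is an assumption made for illustration: `lookup`, `fallback`, `MAX_CHUNK`, and the bracketed output strings are hypothetical and do not reflect the real `TengwarMode` methods or the crate's `Token` type, and Unicode normalization is skipped entirely.

```rust
/// Hypothetical rule table: maps a short run of chars to an output token.
fn lookup(chunk: &[char]) -> Option<&'static str> {
    match chunk {
        ['t', 'h'] => Some("<TH>"),
        ['n', 'g'] => Some("<NG>"),
        ['t'] => Some("<T>"),
        ['n'] => Some("<N>"),
        ['a'] => Some("<A>"),
        _ => None,
    }
}

/// Hypothetical last resort: shown all remaining data, it may consume a run
/// of any length. Here it accepts a run of ASCII digits as one numeral.
fn fallback(rest: &[char]) -> Option<(String, usize)> {
    let len = rest.iter().take_while(|c| c.is_ascii_digit()).count();
    if len == 0 {
        None
    } else {
        let digits: String = rest[..len].iter().collect();
        Some((format!("<NUM {}>", digits), len))
    }
}

/// Widest slice offered to `lookup` (an assumed value).
const MAX_CHUNK: usize = 3;

fn tokenize(text: &str) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    let mut out = Vec::new();
    let mut pos = 0;
    while pos < chars.len() {
        // Start with the widest slice and narrow it until something matches.
        let mut width = MAX_CHUNK.min(chars.len() - pos);
        let mut matched = false;
        while width > 0 {
            if let Some(tok) = lookup(&chars[pos..pos + width]) {
                out.push(tok.to_string());
                pos += width;
                matched = true;
                break;
            }
            width -= 1;
        }
        if matched {
            continue;
        }
        // Width reached zero: offer the full remaining data once.
        if let Some((tok, used)) = fallback(&chars[pos..]) {
            out.push(tok);
            pos += used;
        } else {
            // Nothing applies: pass one char through unchanged.
            out.push(chars[pos].to_string());
            pos += 1;
        }
    }
    out
}
```

With these assumed rules, `tokenize("than 1234!")` matches `th` before `t` because wider slices are tried first, consumes `1234` through the fallback, and passes the space and `!` through unchanged.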
When the Tokenizer yields a Token, the following one is generated. This
allows for one last call to the Mode type, to TengwarMode::finalize,
to modify a Token in light of the one that follows it; this is a very
important step, as some modes require that different base characters are
used depending on what follows them.
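As a rough illustration of this look-ahead step, the sketch below keeps the following item available through `std::iter::Peekable` and gives a `finalize` hook one chance to adjust the current item. The `Piece` enum, the `Lookahead` adapter, and the rule itself (an `r` takes an alternate form unless a vowel follows) are hypothetical and only loosely modeled on the behavior described; the real `TengwarMode::finalize` signature is not shown in this document and may differ.

```rust
use std::iter::Peekable;

/// Hypothetical stand-in for a token.
#[derive(Clone, Debug, PartialEq)]
enum Piece {
    Consonant(char),
    /// A consonant swapped to an alternate base form (shown uppercased).
    AltConsonant(char),
    Vowel(char),
}

/// Hypothetical rule: 'r' takes an alternate form unless a vowel follows it.
fn finalize(current: &mut Piece, next: Option<&Piece>) {
    let is_r = matches!(*current, Piece::Consonant('r'));
    let vowel_follows = matches!(next, Some(Piece::Vowel(_)));
    if is_r && !vowel_follows {
        *current = Piece::AltConsonant('R');
    }
}

/// Adapter that shows each item its successor before yielding it.
struct Lookahead<I: Iterator<Item = Piece>> {
    inner: Peekable<I>,
}

impl<I: Iterator<Item = Piece>> Iterator for Lookahead<I> {
    type Item = Piece;

    fn next(&mut self) -> Option<Piece> {
        let mut current = self.inner.next()?;
        // `peek` generates the following item without consuming it.
        finalize(&mut current, self.inner.peek());
        Some(current)
    }
}

/// Very rough classification of input characters, for the demo only.
fn classify(text: &str) -> impl Iterator<Item = Piece> + '_ {
    text.chars().map(|c| match c {
        'a' | 'e' | 'i' | 'o' | 'u' => Piece::Vowel(c),
        _ => Piece::Consonant(c),
    })
}

fn finalize_all(text: &str) -> Vec<Piece> {
    Lookahead { inner: classify(text).peekable() }.collect()
}
```

In this sketch, the first `r` of `"rar"` is left alone because a vowel follows, while the final `r` has no successor and is swapped to the alternate form.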
§TokenIter / Transcriber
The second level of iteration is the TokenIter. This iterator can wrap
any other iterator that produces Tokens, and its purpose is to apply
contextual rules and transformations specified at runtime. This is what
allows the executable transcriber to take CLI options that change rules,
such as the treatment of “long” tehta variants.
A TokenIter that wraps a Tokenizer can also be called
a Transcriber for simplicity, because it is known that its Tokens
are being produced directly from text.
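The wrapping pattern can be sketched as an ordinary iterator adapter with a public settings field, in the spirit of the `transcriber.settings.alt_a = true` example earlier. The names below (`Tok`, `SketchSettings`, `merge_long`, `ContextIter`, `run`) are hypothetical, the single rule shown is invented for illustration, and only look-ahead is demonstrated, not look-behind.

```rust
use std::iter::Peekable;

/// Hypothetical stand-in for a token.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Tok {
    Short(char),
    Long(char),
    Other(char),
}

/// Runtime-configurable behavior (a hypothetical option, not a real field).
#[derive(Clone, Copy, Debug, Default)]
struct SketchSettings {
    /// Merge two identical short vowels into one long vowel.
    merge_long: bool,
}

/// Wraps any iterator of tokens and applies contextual rules to its output.
struct ContextIter<I: Iterator<Item = Tok>> {
    inner: Peekable<I>,
    /// Public, so callers can adjust behavior before iterating.
    settings: SketchSettings,
}

impl<I: Iterator<Item = Tok>> ContextIter<I> {
    fn new(inner: I) -> Self {
        Self { inner: inner.peekable(), settings: SketchSettings::default() }
    }
}

impl<I: Iterator<Item = Tok>> Iterator for ContextIter<I> {
    type Item = Tok;

    fn next(&mut self) -> Option<Tok> {
        let current = self.inner.next()?;
        if self.settings.merge_long {
            if let Tok::Short(v) = current {
                // Look ahead: is the next token the same short vowel?
                if self.inner.peek() == Some(&Tok::Short(v)) {
                    self.inner.next(); // consume the second half
                    return Some(Tok::Long(v));
                }
            }
        }
        Some(current)
    }
}

/// Demo driver: builds tokens from text, sets the option, and collects.
fn run(text: &str, merge_long: bool) -> Vec<Tok> {
    let tokens = text.chars().map(|c| match c {
        'a' | 'e' | 'i' | 'o' | 'u' => Tok::Short(c),
        _ => Tok::Other(c),
    });
    let mut iter = ContextIter::new(tokens);
    iter.settings.merge_long = merge_long;
    iter.collect()
}
```

Because the adapter is generic over its inner iterator, the same rules apply whether the tokens come from text or from some other source, and the settings can be changed at runtime without touching the rules that produced the tokens.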
§Policy
A “Policy” is similar to a Mode, but rather than defining details about orthography, it instead defines details about typography. This includes details such as valid ligatures and placements of Sa-Rinci.
The Policy trait is provided for this purpose, and is
used as a generic parameter for the Glyph type. Because of this, it
is also a generic parameter for the Token and TokenIter types;
the Tokenizer type is considered to be out of scope
of the Policy system, and simply yields all of its Tokens with the
default policy (policy::Standard).
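One way a policy can be threaded through a glyph type as a generic parameter is sketched below. This is an assumption-laden illustration: `SketchPolicy`, its `ligature_valid` method, `NoLigatures`, `SketchGlyph`, and the single ligature rule are all hypothetical, and the local `Standard` type only borrows the name of the default policy mentioned above. The real `Policy` trait may expose entirely different methods.

```rust
use std::marker::PhantomData;

/// Hypothetical typography policy: decides presentation details at compile time.
trait SketchPolicy {
    /// Whether two adjacent base characters may be joined by a ligature.
    fn ligature_valid(left: char, right: char) -> bool;
}

/// Default policy for this sketch (the rule is invented for illustration).
struct Standard;

/// Alternative policy that never allows ligatures.
struct NoLigatures;

impl SketchPolicy for Standard {
    fn ligature_valid(left: char, right: char) -> bool {
        left == 's' && right == 't'
    }
}

impl SketchPolicy for NoLigatures {
    fn ligature_valid(_left: char, _right: char) -> bool {
        false
    }
}

/// A glyph parameterized by its policy, with `Standard` as the default.
struct SketchGlyph<P: SketchPolicy = Standard> {
    base: char,
    // The policy carries no data; it only selects behavior through the type.
    _policy: PhantomData<P>,
}

impl<P: SketchPolicy> SketchGlyph<P> {
    fn new(base: char) -> Self {
        Self { base, _policy: PhantomData }
    }

    /// Asks the policy whether this glyph may ligate with the next one.
    fn can_ligate_with(&self, next: &Self) -> bool {
        P::ligature_valid(self.base, next.base)
    }
}

/// Convenience wrapper for trying a pair of characters under a given policy.
fn ligates<P: SketchPolicy>(left: char, right: char) -> bool {
    SketchGlyph::<P>::new(left).can_ligate_with(&SketchGlyph::<P>::new(right))
}
```

Since the policy is a type parameter rather than a runtime value, glyphs built under different policies are distinct types, which is consistent with the policy also appearing as a parameter on the types that contain glyphs.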
Re-exports§
pub use characters::Glyph;
pub use characters::Numeral;
pub use characters::VowelStyle;
pub use mode::Beleriand;
pub use mode::Gondor;
pub use mode::Quenya;
pub use mode::TengwarMode;
Modules§
- characters
- This module defines the basic constants and data structures required to work effectively with the Tengwar.
- mode
- This module defines the interface, and default implementations, for “modes” of the Tengwar: High-level rules for text representations.
- policy
Structs§
- TokenIter
- An iterator over a sequence of Tokens which applies various rules. This is the top level construct of the transcription process.
- TranscriberSettings
- Behavior settings to be used by a TokenIter.
Enums§
- Token
- A small container for either plain text or a Glyph specification. Serves as the top level of throughput for the transliteration process.
Traits§
- ToTengwar
- A very small trait serving to implement ergonomic transcription methods directly onto text objects.
Functions§
- transcribe
- Convert a compatible object (typically text) into the Tengwar.
Type Aliases§
- Transcriber
- An iterator over a sequence of Tokens which applies various rules. This is the top level construct of the transcription process.