Crate tengwar

Library for converting Latin UTF-8 text into Tengwar, using the Unicode codepoints of the Free Tengwar Font Project. Designed specifically, but not exclusively, with Tengwar Telcontar in mind, for use within LaTeX macros.

Overview

The library is split into two main modules. The characters module is primarily concerned with defining the data and data structures needed to represent Tengwar. The mode module, on the other hand, is mainly concerned with transcription, defining the TengwarMode trait for the rules and the Tokenizer type for applying them.

However, this first level of transcription is usually not enough. The top level of the crate therefore defines the TokenIter type to perform additional transformations. This higher-level iterator can be configured at runtime, and can look ahead and behind to determine context, which enables situational behaviors that the first level cannot express.

Three modes are currently provided by default: Quenya (“Classical”), Beleriand, and Gondor. Each mode implements the TengwarMode trait.

Examples

`TengwarMode` trait

The most direct way to convert text is TengwarMode::transcribe. This function accepts any input type that implements AsRef<str>, and can return any type that implements FromIterator<Token>; this includes Vec<Token> and String.

use tengwar::{Quenya, TengwarMode};

let text: String = Quenya::transcribe("namárië !");
assert_eq!(text, " ");

`ToTengwar` trait

With the use of the ToTengwar helper trait (automatically implemented for any type implementing AsRef<str>), three methods are provided on the input type directly. The first is ToTengwar::transcriber, which constructs a Transcriber for the text, allowing iteration over Tokens.

The Transcriber also has TranscriberSettings, holding several public fields, which can be changed to adjust various aspects of its behavior.

use tengwar::{Quenya, ToTengwar};

let mut transcriber = "namárië !".transcriber::<Quenya>();
transcriber.settings.alt_a = true; // Use the alternate form of the A-tehta.

let text: String = transcriber.collect();
assert_eq!(text, " ");

The second method is ToTengwar::to_tengwar. This is mostly a convenience method, which simply calls ToTengwar::transcriber and immediately collects the Iterator into a String.

use tengwar::{Quenya, ToTengwar};

let text: String = "namárië !".to_tengwar::<Quenya>();
assert_eq!(text, " ");

The third method is ToTengwar::to_tengwar_with, which does the same, but takes TranscriberSettings to modify the Transcriber before it is collected. This allows settings to be specified once and reused.

use tengwar::{Quenya, ToTengwar, TranscriberSettings};

let mut settings = TranscriberSettings::new();
settings.alt_a = true;
settings.nuquerna = true;

let text: String = "namárië !".to_tengwar_with::<Quenya>(settings);
assert_eq!(text, " ");

let text: String = "lotsë súva".to_tengwar_with::<Quenya>(settings);
assert_eq!(text, " ");

Crate-level function

Also available, and likely the easiest to discover via code completion, is the top-level transcribe function, which takes an implementor of TengwarMode as a generic parameter. This function accepts any input type that implements ToTengwar, and is a passthrough to the ToTengwar::to_tengwar method.

use tengwar::{Quenya, transcribe};

let text: String = transcribe::<Quenya>("namárië !");
assert_eq!(text, " ");

In Detail

The core of this library is the Token enum. A Token may hold a simple char, a Glyph, or a Numeral. An iterator of Tokens can be collected into a String; this is where the rendering of Tengwar text truly takes place.

The rest of the library is geared around the creation of Tokens, usually by iteration, and modifying them before the final call to collect.
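
As a rough illustration of how an iterator of tokens can be collected straight into a String, here is a minimal, self-contained sketch. The Token enum below is a simplified, hypothetical stand-in (its variants, fields, and the render method are assumptions made for this example) and is not the crate's actual definition.

```rust
/// Hypothetical, simplified stand-in for a token type; not the crate's `Token`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Token {
    /// A plain character that is passed through unchanged.
    Char(char),
    /// A glyph stand-in: a base character plus an optional combining mark.
    Glyph { base: char, mark: Option<char> },
    /// A numeral stand-in, rendered here as decimal digits.
    Number(u32),
}

impl Token {
    /// Append the textual form of this token to `out`.
    fn render(&self, out: &mut String) {
        match *self {
            Token::Char(c) => out.push(c),
            Token::Glyph { base, mark } => {
                out.push(base);
                if let Some(m) = mark {
                    out.push(m);
                }
            }
            Token::Number(n) => out.push_str(&n.to_string()),
        }
    }
}

/// Implementing `FromIterator<Token>` for `String` is what lets
/// `.collect::<String>()` work on any iterator of tokens.
impl FromIterator<Token> for String {
    fn from_iter<I: IntoIterator<Item = Token>>(iter: I) -> Self {
        let mut out = String::new();
        for token in iter {
            token.render(&mut out);
        }
        out
    }
}
```

With this in place, any adapter that yields tokens can end in a plain call to collect, so the rendering logic lives in one spot rather than in every iterator.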

Mode

A “Mode” of the Tengwar is essentially an orthography mapping: it correlates the conventions of writing in a primary-world alphabet with the conventions of writing in the Tengwar.

For this purpose, the TengwarMode trait is provided. A type implementing this trait is expected to perform essentially as a state machine, taking input in the form of slices of chars, and using them to progressively construct Tokens.

Tokenizer

The first level of iteration is the Tokenizer. This iterator takes UTF-8 text, breaks it down into a Vec of normalized Unicode codepoints, and assembles Tokens according to the rules specified by an implementation of TengwarMode.

Short slices of chars are passed to the Mode type, which determines whether to accept them as part of a Token. If the chars are not accepted, the slice is narrowed and tried again, until the width reaches zero. At this point, the Mode type is shown the full remaining data and asked whether it can get anything at all from it. If it cannot, a char is returned unchanged as a Token.
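
The narrowing-window loop described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the Mode trait, its MAX_WIDTH constant, the accept method, and ToyMode are all hypothetical names invented for the example, tokens are plain Strings, and the final "show the mode all remaining data" step and Unicode normalization are left out.

```rust
/// Hypothetical mode interface (not the crate's `TengwarMode`).
trait Mode {
    /// Widest window of chars worth offering to the mode.
    const MAX_WIDTH: usize;
    /// Return a token if the whole window is accepted, `None` otherwise.
    fn accept(&self, window: &[char]) -> Option<String>;
}

/// Toy mode that knows the digraph "th" and the single letters "t" and "h".
struct ToyMode;

impl Mode for ToyMode {
    const MAX_WIDTH: usize = 2;

    fn accept(&self, window: &[char]) -> Option<String> {
        match window {
            ['t', 'h'] => Some("[TH]".to_string()),
            ['t'] => Some("[T]".to_string()),
            ['h'] => Some("[H]".to_string()),
            _ => None,
        }
    }
}

/// Offer the widest window first, narrowing until the mode accepts one.
/// If nothing is accepted, pass a single char through unchanged.
fn tokenize<M: Mode>(mode: &M, text: &str) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    let mut tokens = Vec::new();
    let mut pos = 0;

    while pos < chars.len() {
        let widest = M::MAX_WIDTH.min(chars.len() - pos);
        let mut matched = None;

        // Try widths from widest down to 1.
        for width in (1..=widest).rev() {
            if let Some(token) = mode.accept(&chars[pos..pos + width]) {
                matched = Some((token, width));
                break;
            }
        }

        match matched {
            Some((token, width)) => {
                tokens.push(token);
                pos += width;
            }
            None => {
                // Fallback: the char itself becomes the token.
                tokens.push(chars[pos].to_string());
                pos += 1;
            }
        }
    }
    tokens
}
```

Trying the widest window first is what lets a digraph such as "th" win over its individual letters, while unknown characters such as punctuation fall through untouched.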

Whenever the Tokenizer is about to yield a Token, it generates the following one as well. This allows for one last call to the Mode type, to TengwarMode::finalize, to modify a Token in light of the one that follows it. This is a very important step, as some modes require that different base characters are used depending on what follows them.
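
One way to picture this lookahead step is with a peekable iterator, as in the sketch below. Everything here is an assumption made for illustration: the Tok enum, the WithLookahead adapter, the free function finalize, and the toy rule (an R token records whether a vowel follows it) are invented for this example and do not reflect the crate's actual types or the signature of TengwarMode::finalize.

```rust
use std::iter::Peekable;

/// Hypothetical token type for this sketch only.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Tok {
    Vowel(char),
    /// An "r" whose final form depends on what comes next.
    R { before_vowel: bool },
    Other(char),
}

/// Adapter that finalizes each token in light of the one that follows it.
struct WithLookahead<I: Iterator<Item = Tok>> {
    inner: Peekable<I>,
}

impl<I: Iterator<Item = Tok>> WithLookahead<I> {
    fn new(iter: I) -> Self {
        Self { inner: iter.peekable() }
    }
}

impl<I: Iterator<Item = Tok>> Iterator for WithLookahead<I> {
    type Item = Tok;

    fn next(&mut self) -> Option<Tok> {
        let mut current = self.inner.next()?;
        // Peeking produces the following token without consuming it.
        let following = self.inner.peek();
        finalize(&mut current, following);
        Some(current)
    }
}

/// Toy finalization rule: an `R` remembers whether a vowel follows it.
fn finalize(current: &mut Tok, following: Option<&Tok>) {
    if let Tok::R { before_vowel } = current {
        *before_vowel = matches!(following, Some(Tok::Vowel(_)));
    }
}
```

The point of the design is that the decision is deferred until the next token is known, so the base form never has to be guessed and later corrected.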

TokenIter / Transcriber

The second level of iteration is the TokenIter. This iterator can wrap any other iterator that produces Tokens, and its purpose is to apply contextual rules and transformations specified at runtime. This is what allows the executable transcriber to take CLI options that change rules, such as the treatment of “long” tehta variants.

A TokenIter that wraps a Tokenizer can also be called a Transcriber for simplicity, because it is known that its Tokens are being produced directly from text.
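
The general shape of such a wrapper, with rules that can be switched on at runtime, might look like the following sketch. It is deliberately simplified: tokens are plain chars, and the Settings struct, its mark_repeats field, and the RuleIter type are hypothetical names chosen for this example rather than the crate's TranscriberSettings or TokenIter.

```rust
/// Hypothetical runtime settings for this sketch.
#[derive(Clone, Copy, Debug, Default)]
struct Settings {
    /// When set, a token identical to the previous one is replaced by `~`.
    mark_repeats: bool,
}

/// Wraps any iterator of tokens (plain `char`s here) and applies
/// context-dependent rules controlled by `settings`.
struct RuleIter<I> {
    inner: I,
    /// The previous token from the inner iterator, used to look behind.
    last: Option<char>,
    settings: Settings,
}

impl<I: Iterator<Item = char>> RuleIter<I> {
    fn new(inner: I) -> Self {
        Self { inner, last: None, settings: Settings::default() }
    }
}

impl<I: Iterator<Item = char>> Iterator for RuleIter<I> {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        let token = self.inner.next()?;
        let repeated = self.last == Some(token);
        self.last = Some(token);

        if self.settings.mark_repeats && repeated {
            Some('~')
        } else {
            Some(token)
        }
    }
}
```

Because the settings are ordinary fields read on every call to next, they can be changed after construction and before collection, which is the property a command-line front end needs.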

Policy

A “Policy” is similar to a Mode, but rather than defining details about orthography, it instead defines details about typography. This includes details such as valid ligatures and placements of Sa-Rinci.

The Policy trait is provided for this purpose, and is used as a generic parameter for the Glyph type. Because of this, it is also a generic parameter for the Token and TokenIter types. The Tokenizer type is considered to be out of scope of the Policy system, and simply yields all of its Tokens with the default policy (policy::Standard).
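
To show what it means for a policy to be a generic parameter of a glyph type, here is a small sketch using a zero-sized marker. The trait method allows_ligature, the Permissive policy, and the with_policy conversion are assumptions invented for this example; only the idea of a Policy trait with a Standard default comes from the text above.

```rust
use std::marker::PhantomData;

/// Hypothetical typography policy: decides which ligatures are valid.
trait Policy {
    fn allows_ligature(left: char, right: char) -> bool;
}

/// Default policy in this sketch: never ligate.
struct Standard;

impl Policy for Standard {
    fn allows_ligature(_left: char, _right: char) -> bool {
        false
    }
}

/// Made-up alternative policy: ligate identical neighbours.
struct Permissive;

impl Policy for Permissive {
    fn allows_ligature(left: char, right: char) -> bool {
        left == right
    }
}

/// A glyph carries its policy in the type, at no runtime cost.
struct Glyph<P: Policy = Standard> {
    base: char,
    _policy: PhantomData<P>,
}

impl<P: Policy> Glyph<P> {
    fn new(base: char) -> Self {
        Self { base, _policy: PhantomData }
    }

    /// Ask the policy whether this glyph may ligate with the next one.
    fn ligates_with(&self, next: &Glyph<P>) -> bool {
        P::allows_ligature(self.base, next.base)
    }

    /// Re-tag the same glyph data with a different policy.
    fn with_policy<Q: Policy>(self) -> Glyph<Q> {
        Glyph { base: self.base, _policy: PhantomData }
    }
}
```

Under this arrangement, a producer that knows nothing about typography can emit glyphs with the default policy, and a later stage can re-tag them once the desired policy is known.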

Re-exports

pub use characters::Glyph;
pub use characters::Numeral;
pub use characters::VowelStyle;
pub use mode::Beleriand;
pub use mode::Gondor;
pub use mode::Quenya;
pub use mode::TengwarMode;

Modules

characters
This module defines the basic constants and data structures required to work effectively with the Tengwar.
mode
This module defines the interface, and default implementations, for “modes” of the Tengwar: high-level rules for text representations.
policy

Structs

TokenIter
An iterator over a sequence of Tokens which applies various rules. This is the top level construct of the transcription process.
TranscriberSettings
Behavior settings to be used by a TokenIter.

Enums

Token
A small container for either plain text or a Glyph specification. Serves as the top level of throughput for the transliteration process.

Traits

ToTengwar
A very small trait serving to implement ergonomic transcription methods directly onto text objects.

Functions

transcribe
Convert a compatible object (typically text) into the Tengwar.

Type Aliases

Transcriber
An iterator over a sequence of Tokens which applies various rules. This is the top level construct of the transcription process.