phonet 0.8.0

Phonet

Phonet is a CLI tool and library to validate phonotactic patterns for constructed languages. It is compatible with either romanization or phonetic transcription. Words can also be randomly generated (see Argument Syntax).

Syntax Highlighting Extension for VSCode

Formerly named 'Phoner'

Usage

This project can be used as a Rust library, or as a binary.

Binary use

Download latest version here

Argument Syntax

$ phonet --help

Usage: phonet.exe [OPTIONS] [TESTS]

Options:
  -t, --tests <TESTS>
      Custom test, separate with comma (Ignores tests in file)

  -f, --file <FILE>
      Name and path of file to run and test

      Eg. `phonet -f ./myfile.phonet`

      [default: phonet]

  -d, --display-level <DISPLAY_LEVEL>
      What types of outputs to display

      Options can be single letter

      Eg. `phonet -d just-fails` or `phonet -df`

      [default: show-all]

      Possible values:
        - show-all:        Show everything (passes, notes, fails)
        - notes-and-fails: Show most (notes, fails), but not passes
        - just-fails:      Show only fails, not passes or notes
        - hide-all:        Show nothing: not passes, notes, or fails

  -m, --minify [<MINIFY>]
      Minify file and save

      Possible values:
        - tests: Include tests

  -g, --generate [<GENERATE>]
      Generate random words

      Default count 1, specify with number

      --gmin <GENERATE_MIN_LEN>
          Set minimum length for generated words

          Use with the `--generate` or `-g` flag

          Note: This increases generation time exponentially

      --gmax <GENERATE_MAX_LEN>
          Set maximum length for generated words

          Use with the `--generate` or `-g` flag

  -n, --no-color
      Display output in default color

      Use for piping standard output to a file

  -h, --help
      Print help information (use `-h` for a summary)
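
The four display levels can be read as a filter over three kinds of output line. A minimal sketch of that reading follows; the Level and Output enums and the is_shown function are hypothetical names for illustration, not the crate's API.

```rust
/// Kinds of output line (hypothetical names for this sketch).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Output {
    Pass,
    Note,
    Fail,
}

/// Mirrors the four `--display-level` values listed in the help text.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Level {
    ShowAll,
    NotesAndFails,
    JustFails,
    HideAll,
}

/// Returns whether an output line of the given kind is printed at this level.
fn is_shown(level: Level, output: Output) -> bool {
    match level {
        Level::ShowAll => true,
        Level::NotesAndFails => output != Output::Pass,
        Level::JustFails => output == Output::Fail,
        Level::HideAll => false,
    }
}
```

For instance, is_shown(Level::JustFails, Output::Note) is false, which corresponds to `phonet -d just-fails` hiding notes.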

Example

# Runs ./phonet
phonet

# Runs ./phonet, with tests: 'some', 'words' (instead of tests in file)
phonet -t some,words

# Runs ./myfile.phonet
phonet -f myfile.phonet

# Runs ./phonet, only showing fails
phonet -df
# Alternatives:
phonet -d just-fails
phonet -d fails

# Runs ./phonet, and minifies to ./min.phonet without tests
phonet -m

# Runs ./myfile.phonet, without outputting any results, and minifies to ./myfile.min.phonet with tests
phonet -f myfile.phonet -dh -mt

# Runs ./phonet, and generates 1 random word
phonet -g

# Runs ./myfile.phonet, and generates 10 random words
phonet -g10 -f myfile.phonet

# Runs ./phonet, with no color, and writes output to ./phonet.txt
phonet -n > phonet.txt

# Runs ./myfile.phonet, with all test output hidden, and generates 3 random words with length 6-8, writes output to ./phonet.txt (with no color)
phonet -f myfile.phonet -nd h -g 3 --gmin 6 --gmax 8 > ./phonet.txt
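
The two minify examples above (./phonet to ./min.phonet, and ./myfile.phonet to ./myfile.min.phonet) suggest how the output name is derived. The sketch below reproduces only those two cases; minified_name is a hypothetical helper, and returning None for any other name is an assumption.

```rust
/// Derives the minified file name from an input path (hypothetical helper).
/// Covers only the two cases shown in the examples.
fn minified_name(path: &str) -> Option<String> {
    // `myfile.phonet` -> `myfile.min.phonet`
    if let Some(stem) = path.strip_suffix(".phonet") {
        return Some(format!("{stem}.min.phonet"));
    }
    // `phonet` (optionally inside a directory) -> `min.phonet`
    let dir = path.strip_suffix("phonet")?;
    if dir.is_empty() || dir.ends_with('/') || dir.ends_with('\\') {
        Some(format!("{dir}min.phonet"))
    } else {
        None
    }
}
```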

Create Alias / Path

Replace <path_to_file> with the directory of the downloaded binary.

Bash

Add an alias to .bashrc in your home directory

# ~/.bashrc
alias phonet="<path_to_file>/phonet.exe"

Powershell

Add to $env:PATH

$env:Path = "$env:Path;<path_to_file>"

Library use

Add phonet = "0.8.0" to the [dependencies] section of your Cargo.toml file

Short example:

use phonet::Phonet;

fn main() {
  let file = std::fs::read_to_string("phonet").unwrap();

  // Parse file
  Phonet::parse(&file).unwrap()
    // Run tests
    .run()
    // Display results, with the default display level
    .display(Default::default(), false);
}

Long example:

use phonet::{Phonet, DisplayLevel};

fn main() {
  let file = std::fs::read_to_string("phonet").unwrap();

  // Parse file
  let scheme = Phonet::parse(&file).unwrap();

  // Run tests
  let results = scheme.run();

  // Display results - This could be manually implemented
  results.display(DisplayLevel::ShowAll, false);

  // Generate random words
  let words = scheme.generate(10, 3..14).unwrap();
  println!("{words:?}");
}
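
The source does not describe how words are generated. One plausible reading, consistent with the note that a higher minimum length slows generation, is rejection sampling: draw random strings from the letters of the any class and keep those that pass the rules. The sketch below is an assumption along those lines, not the crate's implementation; Lcg, no_repeats, and this generate function with its seed and max_attempts parameters are all hypothetical.

```rust
use std::ops::Range;

/// Tiny linear congruential generator, so the sketch needs no external crate.
struct Lcg(u64);

impl Lcg {
    /// Returns a pseudo-random index in `0..n` (`n` must be non-zero).
    fn below(&mut self, n: usize) -> usize {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        ((self.0 >> 33) % n as u64) as usize
    }
}

/// Stand-in for the rule `! (.)\1`: valid when no letter repeats immediately.
fn no_repeats(word: &str) -> bool {
    let chars: Vec<char> = word.chars().collect();
    chars.windows(2).all(|pair| pair[0] != pair[1])
}

/// Hypothetical generator: draws random words from `letters` and keeps only
/// those that `is_valid` accepts, giving up after `max_attempts` draws.
fn generate(
    count: usize,
    lengths: Range<usize>,
    letters: &str,
    is_valid: fn(&str) -> bool,
    seed: u64,
    max_attempts: usize,
) -> Vec<String> {
    let letters: Vec<char> = letters.chars().collect();
    let mut words = Vec::new();
    if lengths.is_empty() || letters.is_empty() {
        return words;
    }
    let mut rng = Lcg(seed);
    for _ in 0..max_attempts {
        if words.len() == count {
            break;
        }
        // Pick a length in the requested range, then pick each letter.
        let len = lengths.start + rng.below(lengths.end - lengths.start);
        let word: String = (0..len).map(|_| letters[rng.below(letters.len())]).collect();
        if is_valid(&word) {
            words.push(word);
        }
    }
    words
}
```

Under this reading, stricter rules or a narrower length range reject more candidates, so more draws are needed for the same number of words.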

File syntax

A Phonet file is used to define the rules, classes, and tests for the program.

The file should either be called phonet, or end in .phonet

Statements

The syntax is a series of statements, each separated by a semicolon ; or a linebreak.

Comments will only end with a linebreak.

All whitespace is ignored, except to separate words in tests.

Note: This removes spaces inside Regular Expressions as well!
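
The separation rules above can be sketched as a small splitter. This is an illustration under the stated rules only; split_statements and strip_whitespace are hypothetical helpers, not functions from the crate.

```rust
/// Splits a Phonet source into statements on semicolons and linebreaks.
/// A comment (starting with `#`) runs to the end of its line, so a
/// semicolon inside a comment does not end it.
fn split_statements(source: &str) -> Vec<String> {
    let mut statements = Vec::new();
    for line in source.lines() {
        let mut rest = line;
        loop {
            let trimmed = rest.trim();
            if trimmed.is_empty() {
                break;
            }
            if trimmed.starts_with('#') {
                statements.push(trimmed.to_string());
                break;
            }
            match trimmed.split_once(';') {
                Some((head, tail)) => {
                    let head = head.trim();
                    if !head.is_empty() {
                        statements.push(head.to_string());
                    }
                    rest = tail;
                }
                None => {
                    statements.push(trimmed.to_string());
                    break;
                }
            }
        }
    }
    statements
}

/// Removes all whitespace, as happens to rules and classes (but not tests).
fn strip_whitespace(statement: &str) -> String {
    statement.chars().filter(|c| !c.is_whitespace()).collect()
}
```

Stripping whitespace from `+ ^ (<C>? <V>)+ $` gives `+^(<C>?<V>)+$`, which is why a literal space cannot be matched by writing a space in a rule.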

Each statement must begin with an operator:

  • # Hashtag: A whole line comment. A linebreak (not a semicolon) ends the comment
  • $ Dollar: Define a class
  • + Plus or ! Bang: Define a rule
  • @ Commat: Define a reason if a test fails
  • ? Question: Create a test
  • * Star: Create a test note (also with @*)
  • ~ Tilde: Define the mode of the file
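
The operator list maps directly onto a match on the first character of a statement. A minimal sketch, with Kind and classify as hypothetical names:

```rust
/// Statement kinds, one per operator in the list above.
#[derive(Debug, PartialEq)]
enum Kind {
    Comment,
    Class,
    Rule,
    Reason,
    Test,
    Note,
    Mode,
}

/// Classifies a statement by its leading operator character.
/// Returns `None` for an empty statement or an unknown operator.
fn classify(statement: &str) -> Option<Kind> {
    match statement.trim_start().chars().next()? {
        '#' => Some(Kind::Comment),
        '$' => Some(Kind::Class),
        '+' | '!' => Some(Kind::Rule),
        '@' => Some(Kind::Reason),
        '?' => Some(Kind::Test),
        '*' => Some(Kind::Note),
        '~' => Some(Kind::Mode),
        _ => None,
    }
}
```

Note that a noted reason `@*` is still classified by its first character, `@`.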

Classes

Classes are used as shorthand Regular Expressions, substituted into rules at runtime.

Note: Angle brackets will not parse as class names directly after:

  • An opening round bracket and a question mark: (?
  • An opening round bracket, question mark, and letter 'P': (?P
  • A backslash and letter 'k': \k

This is the syntax used for look-behinds, named groups, and named back-references.

Syntax:

  • $ Dollar
  • Name - Must be only characters from [a-zA-Z0-9_]
  • = Equals
  • Value - Regular Expression, may contain other classes in angle brackets <> (as with rules)

The any class, defined with $_ = ..., is used for random word generation.

Example:

# Some consonants
$C = [ptksmn]

# Some vowels
$V = [iueoa]

# Only sibilant consonants
$C_s = [sz]
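
Class substitution, including the three exceptions for angle brackets, can be sketched as follows. This is an illustration only: substitute is a hypothetical function, patterns are assumed to have whitespace already removed, and there is no protection against a class that refers to itself.

```rust
use std::collections::HashMap;

/// Replaces each `<Name>` in `pattern` with the value of that class.
/// Angle brackets directly after `(?`, `(?P`, or `\k` are left untouched,
/// as they belong to regex syntax rather than to a class reference.
fn substitute(pattern: &str, classes: &HashMap<&str, &str>) -> Result<String, String> {
    let mut out = String::new();
    let mut pos = 0;
    while let Some(offset) = pattern[pos..].find('<') {
        let open = pos + offset;
        out.push_str(&pattern[pos..open]);
        let before = &pattern[..open];
        let is_regex_syntax =
            before.ends_with("(?") || before.ends_with("(?P") || before.ends_with("\\k");
        let close = match pattern[open..].find('>') {
            Some(i) if !is_regex_syntax => open + i,
            _ => {
                // Not a class reference: keep the `<` literally.
                out.push('<');
                pos = open + 1;
                continue;
            }
        };
        let name = &pattern[open + 1..close];
        let value = classes
            .get(name)
            .ok_or_else(|| format!("unknown class '{name}'"))?;
        // A class value may itself contain other classes.
        out.push_str(&substitute(value, classes)?);
        pos = close + 1;
    }
    out.push_str(&pattern[pos..]);
    Ok(out)
}
```

With $C = [ptk] and $V = [aeiou], the pattern `^(<C>?<V>)+$` becomes `^([ptk]?[aeiou])+$`, while `(?<x>.)\k<x>` is returned unchanged.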

Rules

Rules are Regular Expressions used to test if a word is valid.

Rules are defined with an intent, either + for positive, or ! for negative.

  • A positive rule must match for a word to be valid
  • A negative rule must not match for a word to be valid

To use a class, use the class name, surrounded by angle brackets <>.

Syntax:

  • + Plus or ! Bang - Plus for positive rule, Bang for negative rule
  • Pattern - Regular Expression, may contain classes in angle brackets <>

Example (with predefined classes):

# Must be (C)V syllable structure
+ ^ (<C>? <V>)+ $

# Must not have two vowels in a row
! <V>{2}
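
How positive and negative rules combine can be sketched as below. To keep the sketch free of external crates, the two example rules are replaced by hand-written stand-in functions using the classes $C = [ptksmn] and $V = [iueoa]; Intent, Rule, and is_valid are hypothetical names, not the crate's API.

```rust
#[derive(Clone, Copy)]
enum Intent {
    Positive,
    Negative,
}

struct Rule {
    intent: Intent,
    /// Stand-in for a compiled Regular Expression.
    matches: fn(&str) -> bool,
}

fn is_vowel(c: char) -> bool {
    "iueoa".contains(c)
}

fn is_consonant(c: char) -> bool {
    "ptksmn".contains(c)
}

/// Stand-in for `+ ^ (<C>? <V>)+ $`.
fn cv_syllables(word: &str) -> bool {
    let mut chars = word.chars().peekable();
    let mut syllables = 0;
    while let Some(&next) = chars.peek() {
        if is_consonant(next) {
            chars.next(); // optional consonant
        }
        match chars.next() {
            Some(c) if is_vowel(c) => syllables += 1,
            _ => return false,
        }
    }
    syllables > 0
}

/// Stand-in for `! <V>{2}`.
fn two_vowels(word: &str) -> bool {
    let chars: Vec<char> = word.chars().collect();
    chars.windows(2).any(|pair| is_vowel(pair[0]) && is_vowel(pair[1]))
}

/// A word is valid when every positive rule matches and no negative rule matches.
fn is_valid(word: &str, rules: &[Rule]) -> bool {
    rules.iter().all(|rule| {
        let matched = (rule.matches)(word);
        match rule.intent {
            Intent::Positive => matched,
            Intent::Negative => !matched,
        }
    })
}
```

With both rules, taso is valid, while tao is invalid: it satisfies the syllable rule but contains two vowels in a row.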

Tests

Tests are checked against all rules, and the result is displayed in the output.

Tests are run in the order of definition.

Like rules, tests must have a defined intent, either + for positive, or ! for negative.

  • A positive test will pass if it is valid
  • A negative test will fail if it is valid

Syntax:

  • ? Question mark
  • + Plus or ! Bang - Plus for positive test, Bang for negative test
  • Tests - A word, or multiple words separated by a space

Example (with predefined rules):

# This should match, to pass
?+ taso
# This test should NOT match, to pass
?! tax
# Each word is a test, all should match to pass
?+ taso sato tasa
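
The test syntax and its pass condition can be sketched as follows. parse_test and passes are hypothetical helpers; whether a word is valid would come from checking it against the rules.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Intent {
    Positive,
    Negative,
}

/// Parses a test statement such as `?+ taso sato` into its intent and words.
/// Returns `None` if the statement is not a well-formed test.
fn parse_test(statement: &str) -> Option<(Intent, Vec<&str>)> {
    let rest = statement.trim().strip_prefix('?')?.trim_start();
    let (intent, words) = if let Some(words) = rest.strip_prefix('+') {
        (Intent::Positive, words)
    } else if let Some(words) = rest.strip_prefix('!') {
        (Intent::Negative, words)
    } else {
        return None;
    };
    // Each whitespace-separated word is its own test.
    Some((intent, words.split_whitespace().collect()))
}

/// A positive test passes when the word is valid;
/// a negative test passes when the word is invalid.
fn passes(intent: Intent, word_is_valid: bool) -> bool {
    match intent {
        Intent::Positive => word_is_valid,
        Intent::Negative => !word_is_valid,
    }
}
```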

Reasons

A reason is placed before a rule, and is shown as the explanation when a test fails because of that rule.

Syntax:

  • @ Commat
  • Optional * Star - Use as a note as well (a noted reason)
  • Text - The reason to show (also printed, if used as a note)

Example:

@ Syllable structure
+ ^ (<C>? <V>)+ $

# This test will NOT match, however it SHOULD (due to the Plus), so it will FAIL, with the above reason
?+ tasto

# This reason has a Star, so it will be used as a note as well
@* Must not have two vowels in a row
! <V>{2}

?+ taso
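
One way to picture reasons is that each rule remembers the most recent reason defined before it. That carry-over is an assumption based on the examples, where one reason can precede several rules. Reason, parse_reason, and attach_reasons are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
struct Reason {
    text: String,
    /// True for a noted reason (`@*`), which is also printed as a note.
    is_note: bool,
}

/// Parses `@ text` or `@* text`; returns `None` for any other statement.
fn parse_reason(statement: &str) -> Option<Reason> {
    let rest = statement.trim().strip_prefix('@')?;
    let (is_note, text) = match rest.strip_prefix('*') {
        Some(text) => (true, text),
        None => (false, rest),
    };
    Some(Reason {
        text: text.trim().to_string(),
        is_note,
    })
}

/// Pairs each rule statement with the text of the latest reason before it.
fn attach_reasons<'a>(statements: &[&'a str]) -> Vec<(&'a str, Option<String>)> {
    let mut current = None;
    let mut rules = Vec::new();
    for &statement in statements {
        if let Some(reason) = parse_reason(statement) {
            current = Some(reason.text);
        } else if statement.starts_with('+') || statement.starts_with('!') {
            rules.push((statement, current.clone()));
        }
    }
    rules
}
```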

Notes

Notes are printed to the terminal output, alongside tests.

They can be used to separate tests into sections, though this is only cosmetic.

Syntax:

  • * Star
  • Text to print to terminal

Example (with predefined rules):

* Should match
?+ taso

* Should not match
?! tatso

Mode

The mode of a Phonet file can be one of these:

  • Romanized: Using <>
  • Broad transcription: Using //
  • Narrow transcription: Using []

This can optionally be specified in a file, although it does not add any functionality.

Syntax:

  • ~ Tilde
  • <.>, /./, or [.] - Mode identifier, with . being any string, or blank

Examples:

# Specify romanized mode (fish icon)
~<>
# Specify broad transcription
~ / this is the mode /
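
Recognising the mode comes down to checking the pair of delimiters around the identifier. A minimal sketch, with Mode and parse_mode as hypothetical names:

```rust
#[derive(Debug, PartialEq)]
enum Mode {
    Romanized,
    Broad,
    Narrow,
}

/// Parses a mode statement such as `~<>` or `~ / text /`.
/// The text between the delimiters is ignored; only the delimiters matter.
fn parse_mode(statement: &str) -> Option<Mode> {
    let body = statement.trim().strip_prefix('~')?.trim();
    let mut chars = body.chars();
    // Both an opening and a closing delimiter are required.
    let (first, last) = (chars.next()?, chars.next_back()?);
    match (first, last) {
        ('<', '>') => Some(Mode::Romanized),
        ('/', '/') => Some(Mode::Broad),
        ('[', ']') => Some(Mode::Narrow),
        _ => None,
    }
}
```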

Examples

See the examples folder for Phonet file examples.

Recommended Syntax Patterns

These formatting tips are not required, but recommended to make the file easier to read.

  1. Specify the mode at the very top of the file
  2. Define all classes at the top of the file
    • Also define an any class first, for word generation
  3. Group related rules and tests, using a noted reason
    • Define rules first, then positive tests, then negative tests
  4. Indent rules and tests under notes or reasons
    • Rules should use 1 indent, tests use 2

Example (this is from example.phonet):

~<> ;# Mode (optional) - This file uses romanized letters

# Class definitions
$_ = [ptkmnswjlaeiou] ;# Any / all letters (required for generating words)
$C = [ptkmnswjl]      ;# Consonants
$V = [aeiou]          ;# Vowels

@* Invalid letters    ;# Noted reason - Prints like a note to standard output
  + ^ <_>+ $          ;# Check that every letter is in the 'any' group
    ?+ taso
    ?! tyxo

* Examples of failing tests
    ?+ tyxo           ;# This test will fail - with the reason 'Invalid letters' (above)
    ?! taso           ;# This test will fail, as a false positive

@* Syllable structure
  + ^ ( <C> <V> )+ $  ;# Check that word is Consonant + Vowel, repeating at least once
    ?+ taso kili
    ?! ano taaso

* Some more tests     ;# Note - Prints to standard output
    ?+ silo tila
    ?! aka axe

@* No repeated letters
  ! (.)\1             ;# This is an unnamed back-reference
  ! (?<x> .) \k<x>    ;# This is a named back-reference (NOT a class)
    ?+ taso
    ?! taaso ttaso

TODO

  • Check all .len() calls on strings, check for non-ascii problems (use .chars().count())

  • Add line number traceback to initial class substitution error

  • Add more docs

  • Add tests !

  • Add more info to Error variants

  • Refactor modules (without breaking api?)

  • Remove unnecessary clones where possible

  • Move gen.rs functionality to gen feature