Crate hebrew_unicode_script


Hebrew_Unicode_Script



§Project Status

This project is currently in maintenance mode.

Current Version: The latest stable release is version 2.0.0. This version includes all core functionalities for identifying and validating Unicode characters associated with the Hebrew script and its relevant Unicode code blocks.

Stability: The crate has been thoroughly tested and is considered stable for production use. Users can rely on its existing features; no significant changes or new functionality are planned.

Updates: While the project is not actively seeking new features or major enhancements, critical bug fixes and security updates will be addressed as needed. Users are encouraged to report any issues they encounter.

§Description

This crate (hebrew_unicode_script) is a low-level library written in Rust and designed to facilitate the identification and validation of Unicode characters (Unicode code points) related to the Hebrew script and associated Unicode code blocks.

Both individual characters and membership of character collections can be checked. Examples of collections are vowels, Yiddish characters, punctuation, etc.

More information can be found in the file ARCHITECTURE.

This library provides two types of interfaces:

  1. free functions, e.g. is_hbr_consonant(c);

  2. a trait (HebrewUnicodeScript) that exposes the same checks as methods on char, e.g. c.is_hbr_block().

Each function in this library returns a boolean value, making it easy to integrate these checks into existing or new applications.
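Because every check is a plain predicate from char to bool, it can be passed directly to iterator adapters such as all and filter, or stored as a function pointer. The sketch below illustrates this with two hypothetical local stand-ins, is_final_consonant and is_semi_vowel, built from the code points listed under Notes. They take the place of the crate's is_hbr_consonant_final and is_hbr_point_semi_vowel only so that the snippet compiles on its own; they are not the crate's implementation.

```rust
// Hypothetical stand-in for the crate's is_hbr_consonant_final:
// the five final forms U+05DA, U+05DD, U+05DF, U+05E3 and U+05E5.
fn is_final_consonant(c: char) -> bool {
    matches!(c, '\u{05DA}' | '\u{05DD}' | '\u{05DF}' | '\u{05E3}' | '\u{05E5}')
}

// Hypothetical stand-in for the crate's is_hbr_point_semi_vowel: U+05B0 ..= U+05B3.
fn is_semi_vowel(c: char) -> bool {
    ('\u{05B0}'..='\u{05B3}').contains(&c)
}

// Checks stored as plain function pointers, each paired with a display name.
const CHECKS: [(&str, fn(char) -> bool); 2] = [
    ("final consonant", is_final_consonant),
    ("semi-vowel", is_semi_vowel),
];

// Counts the characters of `s` that satisfy `check`.
fn count_matching(s: &str, check: fn(char) -> bool) -> usize {
    s.chars().filter(|&c| check(c)).count()
}

// Returns true if every character of `s` satisfies `check`.
fn all_match(s: &str, check: fn(char) -> bool) -> bool {
    s.chars().all(check)
}
```

Iterating over CHECKS and calling count_matching for each entry gives a per-category count for a piece of text without writing one loop per check.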

For an overview of released versions see releases.

§Examples

§Using the function API

Basic usage:

use hebrew_unicode_script::is_hbr_consonant_mem;
use hebrew_unicode_script::is_hbr_consonant_normal;
use hebrew_unicode_script::is_hbr_consonant;
use hebrew_unicode_script::is_script_hbr_consonant;

assert!(is_hbr_consonant_mem('מ'));
assert!(is_hbr_consonant_normal('מ'));
assert!(is_hbr_consonant('מ'));
assert!(is_script_hbr_consonant('מ'));

use hebrew_unicode_script::is_hbr_block;

if is_hbr_block('מ') {
    println!("The character you entered is part of the 'unicode code block Hebrew'");
}

use hebrew_unicode_script::{is_hbr_consonant_final, is_hbr_consonant};

let test_str = "ךםןףץ";
for c in test_str.chars() {
    assert!(is_hbr_consonant_final(c));
    assert!(is_hbr_consonant(c));
}
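Block checks such as is_hbr_block are, at heart, code-point range tests. As a minimal sketch (not the crate's source), the two blocks involved can be expressed with RangeInclusive::contains, using the Unicode block ranges U+0590..U+05FF (Hebrew) and U+FB00..U+FB4F (Alphabetic Presentation Forms). The helper names in_hebrew_block and in_apf_block are hypothetical, and whether the crate's is_apf_block covers the whole APF block or only its Hebrew part is not assumed here. The sketch also shows why a block check and a script check can disagree: U+FB21 (wide alef) belongs to the Hebrew script but lies outside the Hebrew block.

```rust
use core::ops::RangeInclusive;

// Unicode block "Hebrew".
const HEBREW_BLOCK: RangeInclusive<char> = '\u{0590}'..='\u{05FF}';
// Unicode block "Alphabetic Presentation Forms" (contains Latin, Armenian and Hebrew forms).
const APF_BLOCK: RangeInclusive<char> = '\u{FB00}'..='\u{FB4F}';

// Hypothetical stand-in for is_hbr_block: a pure range test.
fn in_hebrew_block(c: char) -> bool {
    HEBREW_BLOCK.contains(&c)
}

// Hypothetical whole-block APF check.
fn in_apf_block(c: char) -> bool {
    APF_BLOCK.contains(&c)
}
```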

A more complex example:

use hebrew_unicode_script::{is_hbr_accent, is_hbr_mark, is_hbr_point, is_hbr_punctuation};
use hebrew_unicode_script::{
    is_hbr_consonant_final, is_hbr_consonant_normal, is_hbr_ligature_yiddish, is_hbr_yod_triangle,
};

fn main() {
    // define a string of characters
    let string_of_chars = "יָ֭דַעְתָּ שִׁבְתִּ֣י abcdefg וְקוּמִ֑י";
    // get a structure that records, per character type, whether it occurs (bool)
    let chartypes = get_character_types(string_of_chars);
    // print the results; {:#?} pretty-prints the struct one field per line
    println!("The following character types were found:");
    println!("{:#?}", chartypes);
}

#[derive(Debug, Default)]
pub struct HebrewCharacterTypes {
    accent: bool,
    mark: bool,
    point: bool,
    punctuation: bool,
    letter: bool,
    letter_normal: bool,
    letter_final: bool,
    yod_triangle: bool,
    ligature_yiddish: bool,
    whitespace: bool,
    non_hebrew: bool,
}

impl HebrewCharacterTypes {
    fn new() -> Self {
        Default::default()
    }
}

pub fn get_character_types(s: &str) -> HebrewCharacterTypes {
    let mut found_character_types = HebrewCharacterTypes::new();
    for c in s.chars() {
        match c {
            c if is_hbr_accent(c) => found_character_types.accent = true,
            c if is_hbr_mark(c) => found_character_types.mark = true,
            c if is_hbr_point(c) => found_character_types.point = true,
            c if is_hbr_punctuation(c) => found_character_types.punctuation = true,
            // without this arm normal consonants would be counted as non-Hebrew
            c if is_hbr_consonant_normal(c) => found_character_types.letter_normal = true,
            c if is_hbr_consonant_final(c) => found_character_types.letter_final = true,
            c if is_hbr_yod_triangle(c) => found_character_types.yod_triangle = true,
            c if is_hbr_ligature_yiddish(c) => found_character_types.ligature_yiddish = true,
            c if c.is_whitespace() => found_character_types.whitespace = true,
            _ => found_character_types.non_hebrew = true,
        }
    }
    // a letter is present if either a normal or a final consonant was found
    found_character_types.letter =
        found_character_types.letter_normal || found_character_types.letter_final;
    found_character_types
}

Output:

The following character types were found:
HebrewCharacterTypes {
    accent: true,
    mark: false,
    point: true,
    punctuation: false,
    letter: true,
    letter_normal: true,
    letter_final: false,
    yod_triangle: false,
    ligature_yiddish: false,
    whitespace: true,
    non_hebrew: true,
}

§Using the trait API

use hebrew_unicode_script::HebrewUnicodeScript;

assert!( 'מ'.is_script_hbr() );
assert!( !'מ'.is_script_hbr_point() );
assert!( 'ױ'.is_script_hbr_ligature_yiddisch() );
assert!( 'מ'.is_hbr_block() );
assert!( !'מ'.is_hbr_point() );

See the documentation of the individual functions and of the HebrewUnicodeScript trait for more examples.
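The trait API offers the same checks as methods on char. A common way to build such an API is an extension trait whose methods delegate to the free functions, so both interfaces share one implementation. The sketch below shows that pattern with a hypothetical trait HebrewCheck and a hypothetical free function is_hebrew_block; it is not the crate's actual HebrewUnicodeScript definition.

```rust
// Hypothetical free function (stand-in for is_hbr_block).
fn is_hebrew_block(c: char) -> bool {
    ('\u{0590}'..='\u{05FF}').contains(&c)
}

// Hypothetical extension trait exposing the same check as a method on char.
trait HebrewCheck {
    fn is_hebrew_block(self) -> bool;
}

impl HebrewCheck for char {
    fn is_hebrew_block(self) -> bool {
        // A bare path call resolves to the free function above, not to this method.
        is_hebrew_block(self)
    }
}
```

Taking self by value is cheap here because char is Copy.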

§Characteristics

This crate (hebrew_unicode_script) uses the #![no_std] attribute.

It depends neither on the Rust standard library nor on a system allocator.
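Because the checks are pure functions on char, code built on top of them can stay allocation-free as well. As a sketch using only core, the hypothetical helper strip_marks below lazily yields a string's characters with Hebrew combining marks (accents and points) removed; it returns an iterator rather than a String, so no allocator is needed. The stand-in is_combining_mark lists the combining marks of the Hebrew block (U+0591..U+05BD, U+05BF, U+05C1, U+05C2, U+05C4, U+05C5, U+05C7). With the crate one would combine is_hbr_accent, is_hbr_point and is_hbr_mark instead, though whether those three cover exactly this set is an assumption not verified here.

```rust
// Hypothetical stand-in: the combining marks (accents, points, marks) of the Hebrew block.
fn is_combining_mark(c: char) -> bool {
    matches!(c,
        '\u{0591}'..='\u{05BD}' | '\u{05BF}' | '\u{05C1}' | '\u{05C2}'
        | '\u{05C4}' | '\u{05C5}' | '\u{05C7}')
}

// Yields the characters of `s` without Hebrew combining marks.
// Uses only `core` items: no String, no Vec, no allocation.
fn strip_marks(s: &str) -> impl Iterator<Item = char> + '_ {
    s.chars().filter(|&c| !is_combining_mark(c))
}
```

Non-combining punctuation such as maqaf (U+05BE) is kept.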

§Install

For installation see the hebrew_unicode_script page at crates.io.

§Safety

All functions are written in safe Rust.

§Panics

None that I am aware of.

§Errors

All functions and trait methods return either true or false; no errors are reported.

§Code Coverage

Current code coverage is 100%


To generate the code coverage report, I used grcov (see here how to use it).

§Notes

§Points

  • Hebrew points can be subdivided into:
    • Vowels (code points: U+05B4 .. U+05BB and U+05C7)
    • Semi-Vowels (code points: U+05B0 .. U+05B3)
    • Reading Signs (code points: U+05BC, U+05BD, U+05BF, U+05C1, U+05C2 and U+FB1E)
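The subdivision of points above maps directly onto a match on code points. A minimal sketch, in which the enum PointKind and the function point_kind are hypothetical and not part of the crate; the crate exposes a comparable grouping through functions such as is_hbr_point_vowel, is_hbr_point_semi_vowel and is_hbr_point_reading_sign.

```rust
// Hypothetical classification of Hebrew points, following the subdivision above.
#[derive(Debug, PartialEq, Eq)]
enum PointKind {
    Vowel,
    SemiVowel,
    ReadingSign,
}

// Returns the kind of point, or None if `c` is not one of the listed points.
fn point_kind(c: char) -> Option<PointKind> {
    match c {
        '\u{05B4}'..='\u{05BB}' | '\u{05C7}' => Some(PointKind::Vowel),
        '\u{05B0}'..='\u{05B3}' => Some(PointKind::SemiVowel),
        '\u{05BC}' | '\u{05BD}' | '\u{05BF}' | '\u{05C1}' | '\u{05C2}' | '\u{FB1E}' => {
            Some(PointKind::ReadingSign)
        }
        _ => None,
    }
}
```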

§Letters

  • Hebrew letters (consonants) can be subdivided into:
    • Normal consonants (code points: U+05D0 .. U+05D9, U+05DB, U+05DC, U+05DE, U+05E0 .. U+05E2, U+05E4, U+05E6 .. U+05EA)
    • Final consonants (code points: U+05DA, U+05DD, U+05DF, U+05E3 and U+05E5)
    • Wide consonants (code points: U+FB21 .. U+FB28)
    • Alternative consonants (code points: U+FB20, U+FB29)
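The consonant groups above can be sketched the same way. The enum ConsonantKind and the function consonant_kind are hypothetical and only mirror the code points listed above; they are not the crate's implementation. As a sanity check, the 27 code points U+05D0..U+05EA split into 22 normal and 5 final consonants.

```rust
// Hypothetical classification of Hebrew consonants, following the subdivision above.
#[derive(Debug, PartialEq, Eq)]
enum ConsonantKind {
    Normal,
    Final,
    Wide,
    Alternative,
}

// Returns the kind of consonant, or None if `c` is not one of the listed letters.
fn consonant_kind(c: char) -> Option<ConsonantKind> {
    match c {
        '\u{05D0}'..='\u{05D9}' | '\u{05DB}' | '\u{05DC}' | '\u{05DE}'
        | '\u{05E0}'..='\u{05E2}' | '\u{05E4}' | '\u{05E6}'..='\u{05EA}' => {
            Some(ConsonantKind::Normal)
        }
        '\u{05DA}' | '\u{05DD}' | '\u{05DF}' | '\u{05E3}' | '\u{05E5}' => {
            Some(ConsonantKind::Final)
        }
        '\u{FB21}'..='\u{FB28}' => Some(ConsonantKind::Wide),
        '\u{FB20}' | '\u{FB29}' => Some(ConsonantKind::Alternative),
        _ => None,
    }
}
```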

§References

§Unicode Script ‘Hebrew’

§Unicode Block ‘Hebrew’

§Unicode Block ‘Alphabetic Presentation Forms’ (APF)

§Unicode problems for Hebrew

There are some issues with Unicode and Hebrew. These are described on the following web page: Unicode Problems


To learn more about Unicode, see: Unicode main site, Unicode Scripts and Unicode Blocks.

See Hebrew Cantillation Marks And Their Encoding for more specifics on this matter.

§License

The hebrew_unicode_script library is distributed under either of

at your option.

§Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this crate by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

§Questions, requests, bugs

I invite you to:

  • Ask questions
  • Make requests for new features or improvements
  • Report bugs or suggest enhancements

Any input is welcome. To do this, you can submit a request here.

§Traits

HebrewUnicodeScript
A trait for identification and validation of Hebrew characters.

§Functions

is_apf_alternative
Checks if the given character is an APF alternative letter.
is_apf_block
Checks if the given character belongs to the Unicode block ‘Alphabetic Presentation Forms’.
is_apf_consonant
Checks if the given character is an APF consonant.
is_apf_consonant_alternative_ayin
Checks if the given character is an APF consonant: alternative ayin.
is_apf_consonant_vowel_alef_mapiq
Checks if the given character is an APF consonant with vowel: alef & mapiq.
is_apf_consonant_vowel_alef_patah
Checks if the given character is an APF consonant with vowel: alef & patah.
is_apf_consonant_vowel_alef_qamats
Checks if the given character is an APF consonant with vowel: alef & qamats.
is_apf_consonant_vowel_bet_dagesh
Checks if the given character is an APF consonant with vowel: bet & dagesh.
is_apf_consonant_vowel_bet_rafe
Checks if the given character is an APF consonant with vowel: bet & rafe.
is_apf_consonant_vowel_dalet_dagesh
Checks if the given character is an APF consonant with vowel: dalet & dagesh.
is_apf_consonant_vowel_final_kaf_dagesh
Checks if the given character is an APF consonant with vowel: final-kaf & dagesh.
is_apf_consonant_vowel_final_pe_dagesh
Checks if the given character is an APF consonant with vowel: final-pe & dagesh.
is_apf_consonant_vowel_gimmel_dagesh
Checks if the given character is an APF consonant with vowel: gimel & dagesh.
is_apf_consonant_vowel_he_mapiq
Checks if the given character is an APF consonant with vowel: he & mapiq.
is_apf_consonant_vowel_kaf_dagesh
Checks if the given character is an APF consonant with vowel: kaf & dagesh.
is_apf_consonant_vowel_kaf_rafe
Checks if the given character is an APF consonant with vowel: kaf & rafe.
is_apf_consonant_vowel_lamed_dagesh
Checks if the given character is an APF consonant with vowel: lamed & dagesh.
is_apf_consonant_vowel_mem_dagesh
Checks if the given character is an APF consonant with vowel: mem & dagesh.
is_apf_consonant_vowel_nun_dagesh
Checks if the given character is an APF consonant with vowel: nun & dagesh.
is_apf_consonant_vowel_pe_dagesh
Checks if the given character is an APF consonant with vowel: pe & dagesh.
is_apf_consonant_vowel_pe_rafe
Checks if the given character is an APF consonant with vowel: pe & rafe.
is_apf_consonant_vowel_qof_dagesh
Checks if the given character is an APF consonant with vowel: qof & dagesh.
is_apf_consonant_vowel_resh_dagesh
Checks if the given character is an APF consonant with vowel: resh & dagesh.
is_apf_consonant_vowel_samekh_dagesh
Checks if the given character is an APF consonant with vowel: samekh & dagesh.
is_apf_consonant_vowel_shin_dagesh
Checks if the given character is an APF consonant with vowel: shin & dagesh.
is_apf_consonant_vowel_shin_dagesh_shindot
Checks if the given character is an APF consonant with vowel: shin & dagesh & shindot.
is_apf_consonant_vowel_shin_dagesh_sindot
Checks if the given character is an APF consonant with vowel: shin & dagesh & sindot.
is_apf_consonant_vowel_shin_shindot
Checks if the given character is an APF consonant with vowel: shin & shindot.
is_apf_consonant_vowel_shin_sindot
Checks if the given character is an APF consonant with vowel: shin & sindot.
is_apf_consonant_vowel_tav_dagesh
Checks if the given character is an APF consonant with vowel: tav & dagesh.
is_apf_consonant_vowel_tet_dagesh
Checks if the given character is an APF consonant with vowel: tet & dagesh.
is_apf_consonant_vowel_tsadi_dagesh
Checks if the given character is an APF consonant with vowel: tsadi & dagesh.
is_apf_consonant_vowel_vav_dagesh
Checks if the given character is an APF consonant with vowel: vav & dagesh.
is_apf_consonant_vowel_vav_holam
Checks if the given character is an APF consonant with vowel: vav & holam.
is_apf_consonant_vowel_yod_dagesh
Checks if the given character is an APF consonant with vowel: yod & dagesh.
is_apf_consonant_vowel_yod_hiriq
Checks if the given character is an APF consonant with vowel: yod & hiriq.
is_apf_consonant_vowel_zayin_dagesh
Checks if the given character is an APF consonant with vowel: zayin & dagesh.
is_apf_consonant_wide
Checks if the given character is an APF wide consonant.
is_apf_consonant_wide_alef
Checks if the given character is an APF consonant: wide alef.
is_apf_consonant_wide_dalet
Checks if the given character is an APF consonant: wide dalet.
is_apf_consonant_wide_final_mem
Checks if the given character is an APF consonant: wide final-mem.
is_apf_consonant_wide_he
Checks if the given character is an APF consonant: wide he.
is_apf_consonant_wide_kaf
Checks if the given character is an APF consonant: wide kaf.
is_apf_consonant_wide_lamed
Checks if the given character is an APF consonant: wide lamed.
is_apf_consonant_wide_resh
Checks if the given character is an APF consonant: wide resh.
is_apf_consonant_wide_tav
Checks if the given character is an APF consonant: wide tav.
is_apf_consonant_with_vowel
Checks if the given character is an APF letter with vowel sign.
is_apf_letter_alternative_plus_sign
Checks if the given character is an APF letter: alternative plus sign.
is_apf_ligature
Checks if the given character is an APF ligature.
is_apf_ligature_alef_lamed
Checks if the given character is an APF ligature: alef-lamed.
is_apf_ligature_yiddisch_yod_yod_patah
Checks if the given character is an APF ligature: Yiddish yod-yod-patah.
is_apf_point_judeo_spanish_varika
Checks if the given character is an APF point: Judeo-Spanish varika.
is_apf_point_reading_sign
Checks if the given character is an APF point (reading sign).
is_hbr_accent
Checks if the given character is a HBR accent.
is_hbr_accent_atnah_hafukh
Checks if the given character is a HBR accent atnah-hafukh.
is_hbr_accent_darga
Checks if the given character is a HBR accent darga.
is_hbr_accent_dehi
Checks if the given character is a HBR accent dehi.
is_hbr_accent_etnahta
Checks if the given character is a HBR accent etnahta.
is_hbr_accent_geresh
Checks if the given character is a HBR accent geresh.
is_hbr_accent_geresh_muqdam
Checks if the given character is a HBR accent geresh-muqdam.
is_hbr_accent_gershayim
Checks if the given character is a HBR accent gershayim.
is_hbr_accent_iluy
Checks if the given character is a HBR accent iluy.
is_hbr_accent_mahapakh
Checks if the given character is a HBR accent mahapakh.
is_hbr_accent_merkha
Checks if the given character is a HBR accent merkha.
is_hbr_accent_merkha_kefula
Checks if the given character is a HBR accent merkha-kefula.
is_hbr_accent_munah
Checks if the given character is a HBR accent munah.
is_hbr_accent_ole
Checks if the given character is a HBR accent ole.
is_hbr_accent_pashta
Checks if the given character is a HBR accent pashta.
is_hbr_accent_pazer
Checks if the given character is a HBR accent pazer.
is_hbr_accent_qadma
Checks if the given character is a HBR accent qadma.
is_hbr_accent_qarney_para
Checks if the given character is a HBR accent qarney-para.
is_hbr_accent_revia
Checks if the given character is a HBR accent revia.
is_hbr_accent_segol
Checks if the given character is a HBR accent segol.
is_hbr_accent_shalshelet
Checks if the given character is a HBR accent shalshelet.
is_hbr_accent_telisha_gedola
Checks if the given character is a HBR accent telisha-gedola.
is_hbr_accent_telisha_qetana
Checks if the given character is a HBR accent telisha-qetana.
is_hbr_accent_tevir
Checks if the given character is a HBR accent tevir.
is_hbr_accent_tipeha
Checks if the given character is a HBR accent tipeha.
is_hbr_accent_yerah_ben_yomo
Checks if the given character is a HBR accent yerah-ben-yomo.
is_hbr_accent_yetiv
Checks if the given character is a HBR accent yetiv.
is_hbr_accent_zaqef_gadol
Checks if the given character is a HBR accent zaqef-gadol.
is_hbr_accent_zaqef_qatan
Checks if the given character is a HBR accent zaqef-qatan.
is_hbr_accent_zarqa
Checks if the given character is a HBR accent zarqa.
is_hbr_accent_zinor
Checks if the given character is a HBR accent zinor.
is_hbr_block
Checks if the given character belongs to the Unicode block ‘Hebrew’ (HBR).
is_hbr_consonant
Checks if the given character is a HBR consonant (final or normal).
is_hbr_consonant_alef
Checks if the given character is a HBR consonant alef.
is_hbr_consonant_ayin
Checks if the given character is a HBR consonant ayin.
is_hbr_consonant_bet
Checks if the given character is a HBR consonant bet.
is_hbr_consonant_dalet
Checks if the given character is a HBR consonant dalet.
is_hbr_consonant_final
Checks if the given character is a HBR final consonant.
is_hbr_consonant_final_kaf
Checks if the given character is a HBR consonant final-kaf.
is_hbr_consonant_final_mem
Checks if the given character is a HBR consonant final-mem.
is_hbr_consonant_final_nun
Checks if the given character is a HBR consonant final-nun.
is_hbr_consonant_final_pe
Checks if the given character is a HBR consonant final-pe.
is_hbr_consonant_final_tsadi
Checks if the given character is a HBR consonant final-tsadi.
is_hbr_consonant_gimel
Checks if the given character is a HBR consonant gimel.
is_hbr_consonant_he
Checks if the given character is a HBR consonant he.
is_hbr_consonant_het
Checks if the given character is a HBR consonant het.
is_hbr_consonant_kaf
Checks if the given character is a HBR consonant kaf.
is_hbr_consonant_lamed
Checks if the given character is a HBR consonant lamed.
is_hbr_consonant_mem
Checks if the given character is a HBR consonant mem.
is_hbr_consonant_normal
Checks if the given character is a HBR normal consonant.
is_hbr_consonant_nun
Checks if the given character is a HBR consonant nun.
is_hbr_consonant_pe
Checks if the given character is a HBR consonant pe.
is_hbr_consonant_qof
Checks if the given character is a HBR consonant qof.
is_hbr_consonant_resh
Checks if the given character is a HBR consonant resh.
is_hbr_consonant_samekh
Checks if the given character is a HBR consonant samekh.
is_hbr_consonant_shin
Checks if the given character is a HBR consonant shin.
is_hbr_consonant_tav
Checks if the given character is a HBR consonant tav.
is_hbr_consonant_tet
Checks if the given character is a HBR consonant tet.
is_hbr_consonant_tsadi
Checks if the given character is a HBR consonant tsadi.
is_hbr_consonant_vav
Checks if the given character is a HBR consonant vav.
is_hbr_consonant_yod
Checks if the given character is a HBR consonant yod.
is_hbr_consonant_zayin
Checks if the given character is a HBR consonant zayin.
is_hbr_ligature_yiddisch_double_vav
Checks if the given character is a HBR Yiddish ligature: double vav.
is_hbr_ligature_yiddisch_double_yod
Checks if the given character is a HBR Yiddish ligature: double yod.
is_hbr_ligature_yiddisch_vav_yod
Checks if the given character is a HBR Yiddish ligature: vav-yod.
is_hbr_ligature_yiddish
Checks if the given character is a HBR Yiddish ligature.
is_hbr_mark
Checks if the given character is a HBR mark.
is_hbr_mark_lower_dot
Checks if the given character is a HBR mark lower-dot.
is_hbr_mark_masora_circle
Checks if the given character is a HBR mark masora-circle.
is_hbr_mark_upper_dot
Checks if the given character is a HBR mark upper-dot.
is_hbr_point
Checks if the given character is a HBR point.
is_hbr_point_dagesh_or_mapiq
Checks if the given character is a HBR point dagesh-or-mapiq.
is_hbr_point_hataf_patah
Checks if the given character is a HBR point hataf-patah.
is_hbr_point_hataf_qamats
Checks if the given character is a HBR point hataf-qamats.
is_hbr_point_hataf_segol
Checks if the given character is a HBR point hataf-segol.
is_hbr_point_hiriq
Checks if the given character is a HBR point hiriq.
is_hbr_point_holam
Checks if the given character is a HBR point holam.
is_hbr_point_holam_haser_for_vav
Checks if the given character is a HBR point holam-haser-for-vav.
is_hbr_point_meteg
Checks if the given character is a HBR point meteg.
is_hbr_point_patah
Checks if the given character is a HBR point patah.
is_hbr_point_qamats
Checks if the given character is a HBR point qamats.
is_hbr_point_qamats_qatan
Checks if the given character is a HBR point qamats-qatan.
is_hbr_point_qubuts
Checks if the given character is a HBR point qubuts.
is_hbr_point_rafe
Checks if the given character is a HBR point rafe.
is_hbr_point_reading_sign
Checks if the given character is a HBR reading sign.
is_hbr_point_segol
Checks if the given character is a HBR point segol.
is_hbr_point_semi_vowel
Checks if the given character is a HBR point semi-vowel.
is_hbr_point_sheva
Checks if the given character is a HBR point sheva.
is_hbr_point_shin_dot
Checks if the given character is a HBR point shin-dot.
is_hbr_point_sin_dot
Checks if the given character is a HBR point sin-dot.
is_hbr_point_tsere
Checks if the given character is a HBR point tsere.
is_hbr_point_vowel
Checks if the given character is a HBR point vowel.
is_hbr_punctuation
Checks if the given character is a HBR punctuation.
is_hbr_punctuation_geresh
Checks if the given character is a HBR punctuation geresh.
is_hbr_punctuation_gershayim
Checks if the given character is a HBR punctuation gershayim.
is_hbr_punctuation_maqaf
Checks if the given character is a HBR punctuation maqaf.
is_hbr_punctuation_nun_hafukha
Checks if the given character is a HBR punctuation nun-hafukha.
is_hbr_punctuation_paseq
Checks if the given character is a HBR punctuation paseq.
is_hbr_punctuation_sof_pasuq
Checks if the given character is a HBR punctuation sof-pasuq.
is_hbr_yod_triangle
Checks if the given character is a HBR yod-triangle.
is_script_hbr
Checks if the given character belongs to the Unicode script ‘Hebrew’.
is_script_hbr_consonant
Checks if the given character is a ‘consonant’ type within the Unicode script ‘Hebrew’.
is_script_hbr_ligature
Checks if the given character is a ‘ligature’ type within the Unicode script ‘Hebrew’.
is_script_hbr_ligature_yiddisch
Checks if the given character is a ‘Yiddish ligature’ type within the Unicode script ‘Hebrew’.
is_script_hbr_point
Checks if the given character is a ‘point’ type within the Unicode script ‘Hebrew’.
is_script_hbr_point_reading_sign
Checks if the given character is a ‘point reading sign’ type within the Unicode script ‘Hebrew’.