Hebrew_Unicode_Script
- Project Status
- Description
- Examples
- Characteristics
- Install
- Safety
- Panics
- Errors
- Code Coverage
- Notes
- References
- License
- Questions, requests, bugs
§Project Status
This project is currently in maintenance mode.
Current Version:
The latest stable release is version 2.0.0. This version includes all core functionalities for identifying and validating Unicode characters associated with the Hebrew script and its relevant Unicode code blocks.
Stability:
The crate has been thoroughly tested and is considered stable for production use. Users can rely on its existing features without concern for significant changes or new functionalities being introduced.
Updates:
While the project is not actively seeking new features or major enhancements, critical bug fixes and security updates will be addressed as needed. Users are encouraged to report any issues they encounter.
§Description
This crate (hebrew_unicode_script) is a low-level library written in Rust, designed to identify and validate Unicode characters (Unicode code points) related to the Hebrew script and its associated Unicode code blocks.
Both checks on individual characters and membership checks against collections are possible. Examples of collections are vowels, Yiddish characters, punctuation, etc.
More information can be found in the file ARCHITECTURE.
This library provides two types of interfaces:
- functions
- a trait (the same functions, but behind a trait).
Each function in this library returns a boolean value, making it easy to integrate these checks into existing or new applications.
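Because every check has the shape `fn(char) -> bool`, it composes directly with iterator adapters such as `filter` and `all`. The sketch below illustrates the idea without depending on the crate: `is_final_consonant_standin` is a local stand-in (not the crate's implementation) built from the final-consonant code points listed in the Notes section, and `count_matching` is a hypothetical helper.

```rust
// Local stand-in for a crate predicate such as `is_hbr_consonant_final`.
// The code points are the final consonants listed in the Notes section.
fn is_final_consonant_standin(c: char) -> bool {
    matches!(
        c,
        '\u{05DA}' | '\u{05DD}' | '\u{05DF}' | '\u{05E3}' | '\u{05E5}'
    )
}

// Hypothetical helper: counts the characters of `s` accepted by `pred`.
fn count_matching(s: &str, pred: fn(char) -> bool) -> usize {
    s.chars().filter(|&c| pred(c)).count()
}

fn main() {
    // shin, lamed, vav, final mem
    let word = "\u{05E9}\u{05DC}\u{05D5}\u{05DD}";
    let finals = count_matching(word, is_final_consonant_standin);
    println!("final consonants in {}: {}", word, finals);
}
```

With the crate as a dependency, the stand-in could presumably be swapped for the corresponding crate function without changing `count_matching`.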
For an overview of released versions see releases.
§Examples
§Using the function API
Basic usage:
use hebrew_unicode_script::is_hbr_consonant_mem;
use hebrew_unicode_script::is_hbr_consonant_normal;
use hebrew_unicode_script::is_hbr_consonant;
use hebrew_unicode_script::is_script_hbr_consonant;
assert!(is_hbr_consonant_mem('מ'));
assert!(is_hbr_consonant_normal('מ'));
assert!(is_hbr_consonant('מ'));
assert!(is_script_hbr_consonant('מ'));
use hebrew_unicode_script::is_hbr_block;
if is_hbr_block('מ') {
println!("The character you entered is part of the 'unicode code block Hebrew'");
}
use hebrew_unicode_script::{is_hbr_consonant_final, is_hbr_consonant};
let test_str = "ךםןףץ";
for c in test_str.chars() {
assert!(is_hbr_consonant_final(c));
assert!(is_hbr_consonant(c));
}
A more complex example:
use hebrew_unicode_script::{is_hbr_accent, is_hbr_mark, is_hbr_point, is_hbr_punctuation};
use hebrew_unicode_script::{is_hbr_consonant_final, is_hbr_consonant_normal};
use hebrew_unicode_script::{is_hbr_ligature_yiddish, is_hbr_yod_triangle};

fn main() {
    // Define a string of characters.
    let string_of_chars = "יָ֭דַעְתָּ שִׁבְתִּ֣י abcdefg וְקוּמִ֑י";
    // Get a structure that indicates whether each type is present (bool).
    let chartypes = get_character_types(string_of_chars);
    // Print the results (pretty-printed Debug output).
    println!("The following letter types are found in: {}", string_of_chars);
    println!("{:#?}", chartypes);
}

#[derive(Debug, Default)]
pub struct HebrewCharacterTypes {
    accent: bool,
    mark: bool,
    point: bool,
    punctuation: bool,
    letter: bool,
    letter_normal: bool,
    letter_final: bool,
    yod_triangle: bool,
    ligature_yiddish: bool,
    whitespace: bool,
    non_hebrew: bool,
}

impl HebrewCharacterTypes {
    fn new() -> Self {
        Default::default()
    }
}

pub fn get_character_types(s: &str) -> HebrewCharacterTypes {
    let mut found_character_types = HebrewCharacterTypes::new();
    for c in s.chars() {
        // The first matching arm wins; anything unmatched counts as non-Hebrew.
        match c {
            c if is_hbr_accent(c) => found_character_types.accent = true,
            c if is_hbr_mark(c) => found_character_types.mark = true,
            c if is_hbr_point(c) => found_character_types.point = true,
            c if is_hbr_punctuation(c) => found_character_types.punctuation = true,
            c if is_hbr_consonant_normal(c) => found_character_types.letter_normal = true,
            c if is_hbr_consonant_final(c) => found_character_types.letter_final = true,
            c if is_hbr_yod_triangle(c) => found_character_types.yod_triangle = true,
            c if is_hbr_ligature_yiddish(c) => found_character_types.ligature_yiddish = true,
            c if c.is_whitespace() => found_character_types.whitespace = true,
            _ => found_character_types.non_hebrew = true,
        }
    }
    // A letter is present if either a normal or a final consonant was seen.
    found_character_types.letter =
        found_character_types.letter_normal || found_character_types.letter_final;
    found_character_types
}
Output result:
The following letter types are found in: יָ֭דַעְתָּ שִׁבְתִּ֣י abcdefg וְקוּמִ֑י
HebrewCharacterTypes {
    accent: true,
    mark: false,
    point: true,
    punctuation: false,
    letter: true,
    letter_normal: true,
    letter_final: false,
    yod_triangle: false,
    ligature_yiddish: false,
    whitespace: true,
    non_hebrew: true,
}
§Using the trait API
use hebrew_unicode_script::HebrewUnicodeScript;
assert!( 'מ'.is_script_hbr() );
assert!( !'מ'.is_script_hbr_point() );
assert!( 'ױ'.is_script_hbr_ligature_yiddisch() );
assert!( 'מ'.is_hbr_block() );
assert!( !'מ'.is_hbr_point() );
See the crate modules for more examples.
§Characteristics
This crate (hebrew_unicode_script) uses the #![no_std] attribute.
It does not depend on the standard library or on a system allocator.
§Install
For installation see the hebrew_unicode_script page at crates.io.
§Safety
All functions are written in safe Rust.
§Panics
Not that I am aware of.
§Errors
All functions and trait methods return either true or false.
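Since the checks only report true or false, any richer error reporting is left to the caller. One possible way to layer it on top is sketched below. Both names are assumptions for illustration: `in_hebrew_block_standin` is a local stand-in that tests only the U+0590 .. U+05FF range from the References section, and `validate` is a hypothetical helper, not part of the crate.

```rust
// Local stand-in for a crate predicate such as `is_hbr_block`; it only
// tests the U+0590..U+05FF range named in the References section.
fn in_hebrew_block_standin(c: char) -> bool {
    ('\u{0590}'..='\u{05FF}').contains(&c)
}

// Hypothetical helper: returns the byte offset and value of the first
// character that is neither whitespace nor accepted by `pred`.
fn validate(s: &str, pred: fn(char) -> bool) -> Result<(), (usize, char)> {
    match s
        .char_indices()
        .find(|&(_, c)| !c.is_whitespace() && !pred(c))
    {
        Some(offender) => Err(offender),
        None => Ok(()),
    }
}

fn main() {
    // shin, lamed, vav, final mem: every character is accepted.
    println!("{:?}", validate("\u{05E9}\u{05DC}\u{05D5}\u{05DD}", in_hebrew_block_standin));
    // A Latin letter after two Hebrew letters and a space is rejected.
    println!("{:?}", validate("\u{05E9}\u{05DC} x", in_hebrew_block_standin));
}
```

Returning the byte offset keeps the helper allocation-free, which fits a `no_std` setting.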
§Code Coverage
Current code coverage is 100%.
To generate the code coverage, I used grcov (see here how to use it).
§Notes
§Points
- Hebrew points can be subdivided into:
- Vowels (code points: U+05B4 .. U+05BB and U+05C7)
- Semi-Vowels (code points: U+05B0 .. U+05B3)
- Reading Signs (code points: U+05BC, U+05BD, U+05BF, U+05C1, U+05C2 and U+FB1E)
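The subdivision above can be written down as a single `match` over code points. This is a minimal sketch that encodes only the ranges listed here; `PointKind` and `classify_point` are hypothetical names for illustration and are not part of the crate's API.

```rust
// Hypothetical classification of Hebrew points, following the list above.
#[derive(Debug, PartialEq)]
enum PointKind {
    Vowel,
    SemiVowel,
    ReadingSign,
}

// Returns the kind of point for `c`, or `None` if `c` is not a listed point.
fn classify_point(c: char) -> Option<PointKind> {
    match c {
        // Vowels: U+05B4 .. U+05BB and U+05C7
        '\u{05B4}'..='\u{05BB}' | '\u{05C7}' => Some(PointKind::Vowel),
        // Semi-vowels: U+05B0 .. U+05B3
        '\u{05B0}'..='\u{05B3}' => Some(PointKind::SemiVowel),
        // Reading signs: U+05BC, U+05BD, U+05BF, U+05C1, U+05C2 and U+FB1E
        '\u{05BC}' | '\u{05BD}' | '\u{05BF}' | '\u{05C1}' | '\u{05C2}' | '\u{FB1E}' => {
            Some(PointKind::ReadingSign)
        }
        _ => None,
    }
}

fn main() {
    for c in ['\u{05B8}', '\u{05B0}', '\u{05BC}', 'a'] {
        println!("U+{:04X}: {:?}", c as u32, classify_point(c));
    }
}
```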
§Letters
- Hebrew letters (consonants) can be subdivided into:
- Normal consonants (code points: U+05D0 .. U+05D9, U+05DB, U+05DC, U+05DE, U+05E0 .. U+05E2, U+05E4, U+05E6 .. U+05EA)
- Final consonants (code points: U+05DA, U+05DD, U+05DF, U+05E3 and U+05E5)
- Wide consonants (code points: U+FB21 .. U+FB28)
- Alternative consonants (code points: U+FB20, U+FB29)
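The same approach works for the letter groups. The sketch below mirrors the four lists above one to one; `ConsonantKind` and `classify_consonant` are hypothetical names, and nothing here is taken from the crate's source.

```rust
// Hypothetical classification of Hebrew consonants, following the list above.
#[derive(Debug, PartialEq)]
enum ConsonantKind {
    Normal,
    Final,
    Wide,
    Alternative,
}

// Returns the consonant group for `c`, or `None` if `c` is not listed.
fn classify_consonant(c: char) -> Option<ConsonantKind> {
    match c {
        // Normal consonants
        '\u{05D0}'..='\u{05D9}'
        | '\u{05DB}'
        | '\u{05DC}'
        | '\u{05DE}'
        | '\u{05E0}'..='\u{05E2}'
        | '\u{05E4}'
        | '\u{05E6}'..='\u{05EA}' => Some(ConsonantKind::Normal),
        // Final consonants
        '\u{05DA}' | '\u{05DD}' | '\u{05DF}' | '\u{05E3}' | '\u{05E5}' => {
            Some(ConsonantKind::Final)
        }
        // Wide consonants (Alphabetic Presentation Forms)
        '\u{FB21}'..='\u{FB28}' => Some(ConsonantKind::Wide),
        // Alternative consonants (Alphabetic Presentation Forms)
        '\u{FB20}' | '\u{FB29}' => Some(ConsonantKind::Alternative),
        _ => None,
    }
}

fn main() {
    // mem, final mem, wide alef, alternative ayin, and a Latin letter
    for c in ['\u{05DE}', '\u{05DD}', '\u{FB21}', '\u{FB20}', 'x'] {
        println!("U+{:04X}: {:?}", c as u32, classify_consonant(c));
    }
}
```

Taken together, the normal and final lists cover every code point from U+05D0 to U+05EA: 22 normal and 5 final consonants.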
§References
§Unicode Script ‘Hebrew’
§Unicode Block ‘Hebrew’
- See https://www.unicode.org/charts/PDF/U0590.pdf
- Note: only the following code-point range is applicable: U+0590 .. U+05FF
- See also: https://graphemica.com/blocks/hebrew/
§Unicode Block ‘Alphabetic Presentation Forms’ (APF)
- See https://www.unicode.org/charts/PDF/UFB00.pdf
- Note: only the following code-point range is applicable: U+FB1D .. U+FB4F
- See also: https://graphemica.com/blocks/alphabetic-presentation-forms
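The two applicable ranges noted above translate into simple range checks. The following is a sketch under the assumption that only these ranges matter; the function names are hypothetical and the crate's own `is_hbr_block` and `is_apf_block` may be implemented differently. It also shows why the APF note is needed: the Alphabetic Presentation Forms block starts at U+FB00, so Latin ligatures such as U+FB01 sit in the same block but outside the Hebrew part.

```rust
// Hypothetical check for the Hebrew block range: U+0590 .. U+05FF.
fn in_hebrew_block_range(c: char) -> bool {
    matches!(c, '\u{0590}'..='\u{05FF}')
}

// Hypothetical check for the Hebrew part of the Alphabetic Presentation
// Forms block: U+FB1D .. U+FB4F.
fn in_apf_hebrew_range(c: char) -> bool {
    matches!(c, '\u{FB1D}'..='\u{FB4F}')
}

fn main() {
    // mem, shin with shin dot (APF), Latin ligature fi (APF), Latin a
    for c in ['\u{05DE}', '\u{FB2A}', '\u{FB01}', 'a'] {
        println!(
            "U+{:04X}: hebrew_block={} apf_hebrew_range={}",
            c as u32,
            in_hebrew_block_range(c),
            in_apf_hebrew_range(c)
        );
    }
}
```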
§Unicode problems for Hebrew
There are some issues with Unicode and Hebrew. These are described on the following web page: Unicode Problems
To learn more about Unicode see: Unicode main site, Unicode Scripts and Unicode Blocks
See Hebrew Cantillation Marks And Their Encoding for more specifics on this matter.
§License
The hebrew_unicode_script library is distributed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
§Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this crate by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
§Questions, requests, bugs
I invite you to:
- Ask questions
- Make requests for new features or improvements
- Report bugs or suggest enhancements
Any input is welcome. To do this, you can submit a request here.
Traits§
- HebrewUnicodeScript - A trait for identification and validation of Hebrew characters
Functions§
- is_apf_alternative - Checks if the given character is an APF alternative letter.
- is_apf_block - Checks if the given character belongs to the Unicode block ‘Alphabetic Presentation Forms’.
- is_apf_consonant - Checks if the given character is an APF consonant.
- is_apf_consonant_alternative_ayin - Checks if the given character is an APF consonant: alternative ayin.
- is_apf_consonant_vowel_alef_mapiq - Checks if the given character is an APF consonant with vowel: alef & mapiq.
- is_apf_consonant_vowel_alef_patah - Checks if the given character is an APF consonant with vowel: alef & patah.
- is_apf_consonant_vowel_alef_qamats - Checks if the given character is an APF consonant with vowel: alef & qamats.
- is_apf_consonant_vowel_bet_dagesh - Checks if the given character is an APF consonant with vowel: bet & dagesh.
- is_apf_consonant_vowel_bet_rafe - Checks if the given character is an APF consonant with vowel: bet & rafe.
- is_apf_consonant_vowel_dalet_dagesh - Checks if the given character is an APF consonant with vowel: dalet & dagesh.
- is_apf_consonant_vowel_final_kaf_dagesh - Checks if the given character is an APF consonant with vowel: final-kaf & dagesh.
- is_apf_consonant_vowel_final_pe_dagesh - Checks if the given character is an APF consonant with vowel: final-pe & dagesh.
- is_apf_consonant_vowel_gimmel_dagesh - Checks if the given character is an APF consonant with vowel: gimel & dagesh.
- is_apf_consonant_vowel_he_mapiq - Checks if the given character is an APF consonant with vowel: he & mapiq.
- is_apf_consonant_vowel_kaf_dagesh - Checks if the given character is an APF consonant with vowel: kaf & dagesh.
- is_apf_consonant_vowel_kaf_rafe - Checks if the given character is an APF consonant with vowel: kaf & rafe.
- is_apf_consonant_vowel_lamed_dagesh - Checks if the given character is an APF consonant with vowel: lamed & dagesh.
- is_apf_consonant_vowel_mem_dagesh - Checks if the given character is an APF consonant with vowel: mem & dagesh.
- is_apf_consonant_vowel_nun_dagesh - Checks if the given character is an APF consonant with vowel: nun & dagesh.
- is_apf_consonant_vowel_pe_dagesh - Checks if the given character is an APF consonant with vowel: pe & dagesh.
- is_apf_consonant_vowel_pe_rafe - Checks if the given character is an APF consonant with vowel: pe & rafe.
- is_apf_consonant_vowel_qof_dagesh - Checks if the given character is an APF consonant with vowel: qof & dagesh.
- is_apf_consonant_vowel_resh_dagesh - Checks if the given character is an APF consonant with vowel: resh & dagesh.
- is_apf_consonant_vowel_samekh_dagesh - Checks if the given character is an APF consonant with vowel: samekh & dagesh.
- is_apf_consonant_vowel_shin_dagesh - Checks if the given character is an APF consonant with vowel: shin & dagesh.
- is_apf_consonant_vowel_shin_dagesh_shindot - Checks if the given character is an APF consonant with vowel: shin & dagesh & shindot.
- is_apf_consonant_vowel_shin_dagesh_sindot - Checks if the given character is an APF consonant with vowel: shin & dagesh & sindot.
- is_apf_consonant_vowel_shin_shindot - Checks if the given character is an APF consonant with vowel: shin & shindot.
- is_apf_consonant_vowel_shin_sindot - Checks if the given character is an APF consonant with vowel: shin & sindot.
- is_apf_consonant_vowel_tav_dagesh - Checks if the given character is an APF consonant with vowel: tav & dagesh.
- is_apf_consonant_vowel_tet_dagesh - Checks if the given character is an APF consonant with vowel: tet & dagesh.
- is_apf_consonant_vowel_tsadi_dagesh - Checks if the given character is an APF consonant with vowel: tsadi & dagesh.
- is_apf_consonant_vowel_vav_dagesh - Checks if the given character is an APF consonant with vowel: vav & dagesh.
- is_apf_consonant_vowel_vav_holam - Checks if the given character is an APF consonant with vowel: vav & holam.
- is_apf_consonant_vowel_yod_dagesh - Checks if the given character is an APF consonant with vowel: yod & dagesh.
- is_apf_consonant_vowel_yod_hiriq - Checks if the given character is an APF consonant with vowel: yod & hiriq.
- is_apf_consonant_vowel_zayin_dagesh - Checks if the given character is an APF consonant with vowel: zayin & dagesh.
- is_apf_consonant_wide - Checks if the given character is an APF wide consonant.
- is_apf_consonant_wide_alef - Checks if the given character is an APF consonant: wide alef.
- is_apf_consonant_wide_dalet - Checks if the given character is an APF consonant: wide dalet.
- is_apf_consonant_wide_final_mem - Checks if the given character is an APF consonant: wide final-mem.
- is_apf_consonant_wide_he - Checks if the given character is an APF consonant: wide he.
- is_apf_consonant_wide_kaf - Checks if the given character is an APF consonant: wide kaf.
- is_apf_consonant_wide_lamed - Checks if the given character is an APF consonant: wide lamed.
- is_apf_consonant_wide_resh - Checks if the given character is an APF consonant: wide resh.
- is_apf_consonant_wide_tav - Checks if the given character is an APF consonant: wide tav.
- is_apf_consonant_with_vowel - Checks if the given character is an APF letter with vowel sign.
- is_apf_letter_alternative_plus_sign - Checks if the given character is an APF letter: alternative plus sign.
- is_apf_ligature - Checks if the given character is an APF ligature.
- is_apf_ligature_alef_lamed - Checks if the given character is an APF ligature: alef-lamed.
- is_apf_ligature_yiddisch_yod_yod_patah - Checks if the given character is an APF ligature: yiddish yod-yod-patah.
- is_apf_point_judeo_spanish_varika - Checks if the given character is an APF point: judeo-spanish varika.
- is_apf_point_reading_sign - Checks if the given character is an APF point.
- is_hbr_accent - Checks if the given character is a HBR accent.
- is_hbr_accent_atnah_hafukh - Checks if the given character is a HBR accent atnah-hafukh.
- is_hbr_accent_darga - Checks if the given character is a HBR accent darga.
- is_hbr_accent_dehi - Checks if the given character is a HBR accent dehi.
- is_hbr_accent_etnahta - Checks if the given character is a HBR accent etnahta.
- is_hbr_accent_geresh - Checks if the given character is a HBR accent geresh.
- is_hbr_accent_geresh_muqdam - Checks if the given character is a HBR accent geresh-muqdam.
- is_hbr_accent_gershayim - Checks if the given character is a HBR accent gershayim.
- is_hbr_accent_iluy - Checks if the given character is a HBR accent iluy.
- is_hbr_accent_mahapakh - Checks if the given character is a HBR accent mahapakh.
- is_hbr_accent_merkha - Checks if the given character is a HBR accent merkha.
- is_hbr_accent_merkha_kefula - Checks if the given character is a HBR accent merkha-kefula.
- is_hbr_accent_munah - Checks if the given character is a HBR accent munah.
- is_hbr_accent_ole - Checks if the given character is a HBR accent ole.
- is_hbr_accent_pashta - Checks if the given character is a HBR accent pashta.
- is_hbr_accent_pazer - Checks if the given character is a HBR accent pazer.
- is_hbr_accent_qadma - Checks if the given character is a HBR accent qadma.
- is_hbr_accent_qarney_para - Checks if the given character is a HBR accent qarney-para.
- is_hbr_accent_revia - Checks if the given character is a HBR accent revia.
- is_hbr_accent_segol - Checks if the given character is a HBR accent segol.
- is_hbr_accent_shalshelet - Checks if the given character is a HBR accent shalshelet.
- is_hbr_accent_telisha_gedola - Checks if the given character is a HBR accent telisha-gedola.
- is_hbr_accent_telisha_qetana - Checks if the given character is a HBR accent telisha-qetana.
- is_hbr_accent_tevir - Checks if the given character is a HBR accent tevir.
- is_hbr_accent_tipeha - Checks if the given character is a HBR accent tipeha.
- is_hbr_accent_yerah_ben_yomo - Checks if the given character is a HBR accent yerah-ben-yomo.
- is_hbr_accent_yetiv - Checks if the given character is a HBR accent yetiv.
- is_hbr_accent_zaqef_gadol - Checks if the given character is a HBR accent zaqef-gadol.
- is_hbr_accent_zaqef_qatan - Checks if the given character is a HBR accent zaqef-qatan.
- is_hbr_accent_zarqa - Checks if the given character is a HBR accent zarqa.
- is_hbr_accent_zinor - Checks if the given character is a HBR accent zinor.
- is_hbr_block - Checks if the given character belongs to the Unicode block ‘Hebrew’ (HBR).
- is_hbr_consonant - Checks if the given character is a HBR consonant (final OR normal).
- is_hbr_consonant_alef - Checks if the given character is a HBR consonant alef.
- is_hbr_consonant_ayin - Checks if the given character is a HBR consonant ayin.
- is_hbr_consonant_bet - Checks if the given character is a HBR consonant bet.
- is_hbr_consonant_dalet - Checks if the given character is a HBR consonant dalet.
- is_hbr_consonant_final - Checks if the given character is a HBR consonant final.
- is_hbr_consonant_final_kaf - Checks if the given character is a HBR consonant final-kaf.
- is_hbr_consonant_final_mem - Checks if the given character is a HBR consonant final-mem.
- is_hbr_consonant_final_nun - Checks if the given character is a HBR consonant final-nun.
- is_hbr_consonant_final_pe - Checks if the given character is a HBR consonant final-pe.
- is_hbr_consonant_final_tsadi - Checks if the given character is a HBR consonant final-tsadi.
- is_hbr_consonant_gimel - Checks if the given character is a HBR consonant gimel.
- is_hbr_consonant_he - Checks if the given character is a HBR consonant he.
- is_hbr_consonant_het - Checks if the given character is a HBR consonant het.
- is_hbr_consonant_kaf - Checks if the given character is a HBR consonant kaf.
- is_hbr_consonant_lamed - Checks if the given character is a HBR consonant lamed.
- is_hbr_consonant_mem - Checks if the given character is a HBR consonant mem.
- is_hbr_consonant_normal - Checks if the given character is a HBR consonant normal.
- is_hbr_consonant_nun - Checks if the given character is a HBR consonant nun.
- is_hbr_consonant_pe - Checks if the given character is a HBR consonant pe.
- is_hbr_consonant_qof - Checks if the given character is a HBR consonant qof.
- is_hbr_consonant_resh - Checks if the given character is a HBR consonant resh.
- is_hbr_consonant_samekh - Checks if the given character is a HBR consonant samekh.
- is_hbr_consonant_shin - Checks if the given character is a HBR consonant shin.
- is_hbr_consonant_tav - Checks if the given character is a HBR consonant tav.
- is_hbr_consonant_tet - Checks if the given character is a HBR consonant tet.
- is_hbr_consonant_tsadi - Checks if the given character is a HBR consonant tsadi.
- is_hbr_consonant_vav - Checks if the given character is a HBR consonant vav.
- is_hbr_consonant_yod - Checks if the given character is a HBR consonant yod.
- is_hbr_consonant_zayin - Checks if the given character is a HBR consonant zayin.
- is_hbr_ligature_yiddisch_double_vav - Checks if the given character is a HBR ligature yiddisch-double-vav.
- is_hbr_ligature_yiddisch_double_yod - Checks if the given character is a HBR ligature yiddisch-double-yod.
- is_hbr_ligature_yiddisch_vav_yod - Checks if the given character is a HBR ligature yiddisch-vav-yod.
- is_hbr_ligature_yiddish - Checks if the given character is a HBR Yiddish ligature.
- is_hbr_mark - Checks if the given character is a HBR mark.
- is_hbr_mark_lower_dot - Checks if the given character is a HBR mark lower-dot.
- is_hbr_mark_masora_circle - Checks if the given character is a HBR mark masora-circle.
- is_hbr_mark_upper_dot - Checks if the given character is a HBR mark upper-dot.
- is_hbr_point - Checks if the given character is a HBR point.
- is_hbr_point_dagesh_or_mapiq - Checks if the given character is a HBR point dagesh_or_mapiq.
- is_hbr_point_hataf_patah - Checks if the given character is a HBR point hataf-patah.
- is_hbr_point_hataf_qamats - Checks if the given character is a HBR point hataf-qamats.
- is_hbr_point_hataf_segol - Checks if the given character is a HBR point hataf-segol.
- is_hbr_point_hiriq - Checks if the given character is a HBR point hiriq.
- is_hbr_point_holam - Checks if the given character is a HBR point holam.
- is_hbr_point_holam_haser_for_vav - Checks if the given character is a HBR point holam-haser_for_vav.
- is_hbr_point_meteg - Checks if the given character is a HBR point meteg.
- is_hbr_point_patah - Checks if the given character is a HBR point patah.
- is_hbr_point_qamats - Checks if the given character is a HBR point qamats.
- is_hbr_point_qamats_qatan - Checks if the given character is a HBR point qamats-qatan.
- is_hbr_point_qubuts - Checks if the given character is a HBR point qubuts.
- is_hbr_point_rafe - Checks if the given character is a HBR point rafe.
- is_hbr_point_reading_sign - Checks if the given character is a HBR reading sign.
- is_hbr_point_segol - Checks if the given character is a HBR point segol.
- is_hbr_point_semi_vowel - Checks if the given character is a HBR point semi-vowel.
- is_hbr_point_sheva - Checks if the given character is a HBR point sheva.
- is_hbr_point_shin_dot - Checks if the given character is a HBR point shin-dot.
- is_hbr_point_sin_dot - Checks if the given character is a HBR point sin-dot.
- is_hbr_point_tsere - Checks if the given character is a HBR point tsere.
- is_hbr_point_vowel - Checks if the given character is a HBR point vowel.
- is_hbr_punctuation - Checks if the given character is a HBR punctuation.
- is_hbr_punctuation_geresh - Checks if the given character is a HBR punctuation geresh.
- is_hbr_punctuation_gershayim - Checks if the given character is a HBR punctuation gershayim.
- is_hbr_punctuation_maqaf - Checks if the given character is a HBR punctuation maqaf.
- is_hbr_punctuation_nun_hafukha - Checks if the given character is a HBR punctuation nun-hafukha.
- is_hbr_punctuation_paseq - Checks if the given character is a HBR punctuation paseq.
- is_hbr_punctuation_sof_pasuq - Checks if the given character is a HBR punctuation sof-pasuq.
- is_hbr_yod_triangle - Checks if the given character is a HBR yod-triangle.
- is_script_hbr - Checks if the given character belongs to the Unicode script ‘Hebrew’.
- is_script_hbr_consonant - Checks if the given character is a ‘consonant’ type within the Unicode script ‘Hebrew’.
- is_script_hbr_ligature - Checks if the given character is a ‘ligature’ type within the Unicode script ‘Hebrew’.
- is_script_hbr_ligature_yiddisch - Checks if the given character is a ‘ligature_yiddisch’ type within the Unicode script ‘Hebrew’.
- is_script_hbr_point - Checks if the given character is a ‘point’ type within the Unicode script ‘Hebrew’.
- is_script_hbr_point_reading_sign - Checks if the given character is a ‘point_reading_sign’ type within the Unicode script ‘Hebrew’.