Crate hebrew_unicode_utils

§Hebrew_Unicode_Utils

§Description

This crate (hebrew_unicode_utils) is a Rust library for editing strings that contain Hebrew characters. It is built on top of the low-level crate hebrew_unicode_script.

The functionality of this crate is limited to the Unicode block Hebrew.
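
To make that scope concrete: the Unicode block Hebrew spans the code points U+0590 to U+05FF. The sketch below is a minimal, std-only illustration of a membership test for that block. The helper names `is_in_hebrew_block` and `count_hebrew_block_chars` are hypothetical and are not part of this crate's API.

```rust
/// Returns true when `c` lies in the Unicode block Hebrew (U+0590..=U+05FF).
/// Hypothetical helper for illustration only, not a function of this crate.
fn is_in_hebrew_block(c: char) -> bool {
    ('\u{0590}'..='\u{05FF}').contains(&c)
}

/// Counts the characters of `s` that fall inside the Hebrew block.
fn count_hebrew_block_chars(s: &str) -> usize {
    s.chars().filter(|&c| is_in_hebrew_block(c)).count()
}
```

Note that Hebrew presentation forms such as U+FB1D live in a different block (Alphabetic Presentation Forms), so a check like this one does not match them.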

The types of functionality of this library can be captured in the following three categories:

  1. Removing characters

    This is about removing a certain set of Hebrew character types from a string.

  2. Showing characters

    This category is about showing only one particular type of Hebrew character, for example, only vowel characters.

    The idea is that this helps people who are learning Hebrew to distinguish the different character types.

    Note:
    Consonants are always shown together with the selected character type (e.g. vowels), because readability would otherwise suffer. Without consonants as base characters, multiple vowels in one sentence would be displayed on top of each other, which would make the sentence unreadable.

  3. Statistics

    This category contains functionality that gives the user information about the particular statistics of a text string.

    For example, it answers the question: “What Hebrew character types are in my text string?”

For an overview of released versions see releases.

§Notes

  • Vowels are sometimes called Hebrew points
  • Accents are sometimes called Hebrew cantillation marks


§Examples

§Removing characters

use hebrew_unicode_utils::remove_hbr_ligature_yiddish;
    
let test_str = "XװױײZ";
let test_str_filtered = remove_hbr_ligature_yiddish(test_str);

assert_eq!(test_str_filtered.as_ref(), "XZ");

use hebrew_unicode_utils::remove_hbr_accent;

let test_str = "בְּרֵאשִׁ֖ית";
let test_str_filtered = remove_hbr_accent(test_str);

assert_eq!(test_str_filtered.as_ref(), "בְּרֵאשִׁית");
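
To illustrate the idea behind the remove functions, here is a minimal, std-only sketch that filters characters by code point range. It is not this crate's implementation. The helper names are hypothetical, and the ranges are assumptions taken from the Unicode chart for the Hebrew block (accents at U+0591 to U+05AE, Yiddish ligatures at U+05F0 to U+05F2). The sketch returns a plain `String`, whereas the functions of this crate return a `Cow`.

```rust
/// Hypothetical helper: true for the Hebrew accents (cantillation marks),
/// assumed here to be U+0591..=U+05AE as in the Unicode chart.
fn is_accent(c: char) -> bool {
    ('\u{0591}'..='\u{05AE}').contains(&c)
}

/// Hypothetical helper: true for the three Yiddish ligatures U+05F0..=U+05F2.
fn is_yiddish_ligature(c: char) -> bool {
    ('\u{05F0}'..='\u{05F2}').contains(&c)
}

/// Returns a copy of `s` without the characters selected by `is_unwanted`.
/// All other characters, Hebrew or not, are kept in their original order.
fn remove_matching(s: &str, is_unwanted: fn(char) -> bool) -> String {
    s.chars().filter(|&c| !is_unwanted(c)).collect()
}
```

Calling `remove_matching(text, is_accent)` drops only the accents, so points and consonants stay in place.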

§Showing characters

use hebrew_unicode_utils::show_hbr_mark;

let input_str = "Q מִצְרָ֑יְמָה ה֯";
let input_str_showed = show_hbr_mark(input_str);
        
assert_eq!(input_str_showed.as_ref(), "Q מצרימה ה֯");

use hebrew_unicode_utils::show_hbr_point_semi_vowel;

let input_str = "ֲדְ נָפֶשׁ גֱכֳע";
let input_str_showed = show_hbr_point_semi_vowel(input_str);

assert_eq!(input_str_showed.as_ref(), "ֲדְ נפש גֱכֳע");
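
The examples above suggest the following rule for the show functions: non-Hebrew characters and consonants are kept, the selected character type is kept, and all other Hebrew characters are dropped. The sketch below expresses that rule with the standard library only. It is an assumption-based illustration, not the crate's code, and all names in it (`show_only`, `is_masora_circle`, and the range helpers) are hypothetical.

```rust
/// True for characters in the Unicode block Hebrew (U+0590..=U+05FF).
fn is_hebrew(c: char) -> bool {
    ('\u{0590}'..='\u{05FF}').contains(&c)
}

/// True for the Hebrew letters alef..tav (U+05D0..=U+05EA), final forms included.
fn is_consonant(c: char) -> bool {
    ('\u{05D0}'..='\u{05EA}').contains(&c)
}

/// Example selector: the masora circle (U+05AF), assumed to be the "mark" type.
fn is_masora_circle(c: char) -> bool {
    c == '\u{05AF}'
}

/// Keeps non-Hebrew characters, consonants, and the selected type;
/// every other character of the Hebrew block is removed.
fn show_only(s: &str, is_selected: fn(char) -> bool) -> String {
    s.chars()
        .filter(|&c| !is_hebrew(c) || is_consonant(c) || is_selected(c))
        .collect()
}
```

Keeping the consonants gives each shown mark a base character to attach to, which matches the readability note in the description.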
    

§Statistics

use hebrew_unicode_utils::get_hbr_character_frequency;
    
let input_string = "Xבהב";
let freq_map = get_hbr_character_frequency(input_string);
assert_eq!(freq_map.contains_key("X"), false);
assert_eq!(freq_map.get(&"ב".to_string()), Some(&2));  
assert_eq!(freq_map.get(&"ה".to_string()), Some(&1));

use hebrew_unicode_utils::get_hbr_character_frequency;
use hebrew_unicode_utils::get_hbr_character_types;
    
let input_string = "Xבהב";
let type_struct = get_hbr_character_types(input_string);
assert_eq!(type_struct.accent, false);
assert_eq!(type_struct.consonant, true);
assert_eq!(type_struct.non_hebrew, true);
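
For readers curious how such statistics can be gathered, here is a minimal, std-only sketch. It is not the crate's implementation: the names `hebrew_char_frequency`, `SeenTypes`, and `seen_types` are hypothetical, the map is keyed by `char` for simplicity (the example above looks keys up by string), and only three of the character types are tracked.

```rust
use std::collections::HashMap;

/// Counts how often each character of the Hebrew block (U+0590..=U+05FF)
/// occurs in `s`; characters outside the block are ignored.
fn hebrew_char_frequency(s: &str) -> HashMap<char, usize> {
    let mut freq = HashMap::new();
    for c in s.chars().filter(|c| ('\u{0590}'..='\u{05FF}').contains(c)) {
        *freq.entry(c).or_insert(0) += 1;
    }
    freq
}

/// Hypothetical, reduced stand-in for a "character types" struct.
#[derive(Debug, Default, PartialEq)]
struct SeenTypes {
    consonant: bool,
    accent: bool,
    non_hebrew: bool,
}

/// Records which of the three tracked character types occur in `s`.
fn seen_types(s: &str) -> SeenTypes {
    let mut seen = SeenTypes::default();
    for c in s.chars() {
        match c {
            '\u{05D0}'..='\u{05EA}' => seen.consonant = true,
            '\u{0591}'..='\u{05AE}' => seen.accent = true,
            // Other characters of the Hebrew block are not tracked in this sketch.
            '\u{0590}'..='\u{05FF}' => {}
            _ => seen.non_hebrew = true,
        }
    }
    seen
}
```

One boolean per type keeps the result cheap to build and easy to test, at the cost of not saying how many characters of each type were found; the frequency map covers that case.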

§Install

For installation see the hebrew_unicode_utils page at crates.io.

§Safety

All functions are written in safe Rust.


§Panics

As far as I know, none of the functions panic.


§Errors

All functions return either a Cow, a struct, or a HashMap, so there are no error values to handle.
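
As background on the Cow return type: `std::borrow::Cow` lets a function hand back the input unchanged without allocating, and allocate a new `String` only when something was actually removed. The sketch below shows that pattern with the standard library only. Whether this crate borrows in the unchanged case is an assumption on my part, and `remove_where` and `is_accent` are hypothetical names.

```rust
use std::borrow::Cow;

/// Hypothetical selector: Hebrew accents, assumed to be U+0591..=U+05AE.
fn is_accent(c: char) -> bool {
    ('\u{0591}'..='\u{05AE}').contains(&c)
}

/// Removes the characters selected by `is_unwanted`.
/// Borrows the input when nothing matches, allocates only otherwise.
fn remove_where(s: &str, is_unwanted: fn(char) -> bool) -> Cow<'_, str> {
    if s.chars().any(is_unwanted) {
        Cow::Owned(s.chars().filter(|&c| !is_unwanted(c)).collect::<String>())
    } else {
        Cow::Borrowed(s)
    }
}
```

A caller can read the result through `as_ref()`, as the examples in this document do, or call `into_owned()` when an owned `String` is needed.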


§License

Licensed under either of Apache License, Version 2.0 or MIT license at your option.


§Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this crate by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.


This crate was inspired by niqqud.

Structs§

HebrewCharacterTypes

Functions§

get_hbr_character_frequency
Gets the frequency of the Hebrew characters in the given string.
get_hbr_character_types
Gets the Hebrew character types found in the given string.
remove_hbr_accent
Removes all Hebrew accents from the given string.
remove_hbr_block
Removes all characters belonging to the Unicode block ‘Hebrew’ from the given string.
remove_hbr_consonant
Removes all Hebrew letters (final and normal) from the given string.
remove_hbr_consonant_final
Removes all Hebrew final letters from the given string.
remove_hbr_consonant_normal
Removes all Hebrew normal letters from the given string.
remove_hbr_ligature_yiddish
Removes all Yiddish ligatures from the given string.
remove_hbr_mark
Removes all Hebrew marks from the given string.
remove_hbr_point
Removes all Hebrew points from the given string.
remove_hbr_point_reading_sign
Removes all Hebrew point reading signs from the given string.
remove_hbr_point_semi_vowel
Removes all Hebrew point semi-vowels from the given string.
remove_hbr_point_vowel
Removes all Hebrew point vowels from the given string.
remove_hbr_punctuation
Removes all Hebrew punctuation characters from the given string.
remove_hbr_yod_triangle
Removes all Hebrew yod triangles from the given string.
show_hbr_accent
Shows all Hebrew accents that are found in the given string.
show_hbr_consonant
Shows all Hebrew letters (final and normal) that are found in the given string.
show_hbr_consonant_final
Shows all Hebrew final letters that are found in the given string.
show_hbr_consonant_normal
Shows all Hebrew normal letters that are found in the given string.
show_hbr_ligature_yiddish
Shows all Yiddish ligatures that are found in the given string.
show_hbr_mark
Shows all Hebrew marks that are found in the given string.
show_hbr_point
Shows all Hebrew points that are found in the given string.
show_hbr_point_reading_sign
Shows all Hebrew point reading signs that are found in the given string.
show_hbr_point_semi_vowel
Shows all Hebrew point semi-vowels that are found in the given string.
show_hbr_point_vowel
Shows all Hebrew point vowels that are found in the given string.
show_hbr_punctuation
Shows all Hebrew punctuation characters that are found in the given string.
show_hbr_yod_triangle
Shows all Hebrew yod triangles that are found in the given string.