Crate jmdict

The JMdict file is a comprehensive multilingual dictionary of the Japanese language. The original JMdict file, which is included in this repository (and hence in releases of this crate), comes as XML. Instead of stuffing the XML into the binary directly, this crate parses the XML at compile-time and generates an optimized representation that is compiled into the binary. The crate’s API affords type-safe access to this embedded database.

§WARNING: Licensing on database files

The database files compiled into the crate are licensed from the Electronic Dictionary Research and Development Group under Creative Commons licenses. Applications linking this crate directly or indirectly must display appropriate copyright notices to users. Please refer to the EDRDG’s license statement for details.

§Basic usage

The database is accessed through the entries() function which provides an iterator over all database entries compiled into the application. While traversing the database and its entries, you will find that, whenever you expect a list of something, you will get an iterator instead. These iterators provide an abstraction between you as the user of the library, and the physical representation of the database as embedded in the binary.
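To illustrate why an iterator over embedded data can be copied cheaply, here is a minimal sketch that does not use this crate at all. `Word`, `WORDS`, `Words`, `words()` and `count_then_collect()` are hypothetical stand-ins invented for this example; the crate's actual types and internal layout may differ.

```rust
/// Hypothetical stand-in for a database record (not a type from this crate).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Word {
    pub text: &'static str,
}

/// Stand-in for data that would be generated at compile time.
static WORDS: [Word; 3] = [
    Word { text: "猫" },
    Word { text: "犬" },
    Word { text: "鳥" },
];

/// An iterator that holds only a reference and a position,
/// so copying it does not copy any of the underlying data.
#[derive(Clone, Copy, Debug)]
pub struct Words {
    data: &'static [Word],
    pos: usize,
}

impl Iterator for Words {
    type Item = Word;

    fn next(&mut self) -> Option<Word> {
        let item = self.data.get(self.pos).copied()?;
        self.pos += 1;
        Some(item)
    }
}

/// Returns an iterator over all stand-in records.
pub fn words() -> Words {
    Words { data: &WORDS, pos: 0 }
}

/// Traverses the same sequence twice by relying on `Words: Copy`.
pub fn count_then_collect(iter: Words) -> (usize, Vec<&'static str>) {
    // `count()` takes the iterator by value; because `Words` is `Copy`,
    // a copy is consumed here and `iter` itself stays at its position.
    let count = iter.count();
    let texts = iter.map(|w| w.text).collect();
    (count, texts)
}
```

Calling `count_then_collect(words())` walks the data twice without allocating a list up front, which is the kind of usage an iterator-based API makes convenient.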

The following example looks up the reading for お母さん in the database:

let kanji_form = "お母さん";

let entry = jmdict::entries().find(|e| {
    e.kanji_elements().any(|k| k.text == kanji_form)
}).unwrap();

let reading_form = entry.reading_elements().next().unwrap().text;
assert_eq!(reading_form, "おかあさん");

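The example above calls `unwrap()`, which panics when no entry matches or when an entry has no reading. A non-panicking lookup could be sketched as follows. To keep the sketch self-contained, it runs against a hypothetical `MockEntry` type backed by slices; `MockEntry`, `MOCK_ENTRIES` and `first_reading()` are assumptions made for illustration, whereas the real crate exposes iterators such as `kanji_elements()` and `reading_elements()`.

```rust
/// Hypothetical stand-in for a dictionary entry (not a type from this crate).
pub struct MockEntry {
    pub kanji: &'static [&'static str],
    pub readings: &'static [&'static str],
}

/// Stand-in data; the second entry has no kanji form at all.
static MOCK_ENTRIES: [MockEntry; 2] = [
    MockEntry { kanji: &["お母さん"], readings: &["おかあさん"] },
    MockEntry { kanji: &[], readings: &["ありがとう"] },
];

/// Returns the first reading of the first entry that has the given kanji
/// form, or `None` if there is no such entry or it has no reading.
pub fn first_reading(kanji_form: &str) -> Option<&'static str> {
    MOCK_ENTRIES
        .iter()
        .find(|e| e.kanji.iter().any(|k| *k == kanji_form))
        .and_then(|e| e.readings.first().copied())
}
```

The same `find(...).and_then(...)` shape should carry over to the real iterators, with the caller deciding how to handle `None`.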
§Cargo features

§Common configurations

  • The default feature includes the most common words (about 30000 entries) and only their English translations.
  • The full feature includes everything in the JMdict.

§Entry selection

  • The scope-uncommon feature includes uncommon words and glosses.
  • The scope-archaic feature includes glosses with the “archaic” label. If disabled, the PartOfSpeech enum will not include variants that are only relevant for archaic vocabulary, such as obsolete conjugation patterns. (The AllPartOfSpeech enum always contains all variants.)

§Target languages

At least one target language must be selected. Selecting a target language will include all available translations in that language. Entries that do not have any translation in any of the selected languages will be skipped.

  • translations-eng: English (included in default)
  • translations-dut: Dutch
  • translations-fre: French
  • translations-ger: German
  • translations-hun: Hungarian
  • translations-rus: Russian
  • translations-slv: Slovenian
  • translations-spa: Spanish
  • translations-swe: Swedish
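
As an illustration of how these features might be combined, the following Cargo.toml fragment selects uncommon words with German translations only. It assumes that the features compose as described above; the version requirement is a placeholder, not a recommendation.

```toml
[dependencies]
# The version requirement below is an assumption; use the release you depend on.
# With default features disabled, at least one translations-* feature
# has to be listed explicitly.
jmdict = { version = "2", default-features = false, features = ["scope-uncommon", "translations-ger"] }
```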

The GlossLanguage enum will only contain variants corresponding to the enabled target languages. For example, in the default configuration, GlossLanguage::English will be the only variant. (The AllGlossLanguage enum always contains all variants.)
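
The relationship between a restricted enum and its `All...` counterpart can be sketched with plain Rust. The names `AllLang`, `Lang` and `Disabled` below are hypothetical stand-ins, and the sketch hard-codes a configuration in which only English is enabled; the crate's own enums are selected by Cargo features and may be implemented differently.

```rust
use std::convert::TryFrom;

/// Stand-in for an enum that always contains every variant.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AllLang {
    English,
    German,
    Russian,
}

/// Stand-in for the restricted enum in an English-only configuration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Lang {
    English,
}

/// Hypothetical error returned when a variant is not enabled.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Disabled;

impl TryFrom<AllLang> for Lang {
    type Error = Disabled;

    /// Narrowing can fail, because the restricted enum lacks some variants.
    fn try_from(value: AllLang) -> Result<Self, Disabled> {
        match value {
            AllLang::English => Ok(Lang::English),
            _ => Err(Disabled),
        }
    }
}

impl From<Lang> for AllLang {
    /// Widening always succeeds.
    fn from(value: Lang) -> Self {
        match value {
            Lang::English => AllLang::English,
        }
    }
}
```

Code that must compile under any feature selection can work with the complete enum and attempt the narrowing conversion only where a restricted value is required.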

§Crippled builds: db-minimal

When the db-minimal feature is enabled, only a severely reduced portion of the JMdict will be parsed (to be exact, only chunks 000, 100 and 999). This is useless for actual usage, but allows for quick edit-compile-test cycles while working on this crate’s code.

§Crippled builds: db-empty

When the db-empty feature is enabled, downloading and parsing of the JMdict contents is disabled entirely. The crate is compiled as usual, but entries() will not yield any entries. This is useful for documentation builds such as those on docs.rs, where --all-features is given.

Structs§

  • An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
  • Error type for all enum conversions of the form impl TryFrom<AllFoo> for Foo.
  • An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
  • An entry in the JMdict dictionary.
  • A particular translation or explanation for a Japanese word or phrase in a different language.
  • An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
  • A representation of a dictionary entry using kanji or other non-kana scripts.
  • An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
  • An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
  • A source word in another language from which a particular Sense of an Entry has been borrowed.
  • An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
  • An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
  • Relative priority of a ReadingElement or KanjiElement.
  • A representation of a dictionary entry using only kana.
  • An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
  • An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
  • The translational equivalent of a Japanese word or phrase.
  • An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
  • An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
  • An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
  • An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.

Enums§

  • The language of a particular Gloss. This enum contains all possible variants, including those that have been disabled by compile-time flags in enum GlossLanguage.
  • Where a word can appear in a sentence for a particular Sense of the word. This enum contains all possible variants, including those that have been disabled by compile-time flags in enum PartOfSpeech.
  • Dialect of Japanese in which a certain vocabulary occurs.
  • The language of a particular Gloss.
  • Type of gloss.
  • Information regarding a certain KanjiElement.
  • Where a word can appear in a sentence for a particular Sense of the word.
  • PriorityInCorpus appears in struct Priority. It describes how often a dictionary entry appears in a certain corpus of text.
  • Information regarding a certain ReadingElement.
  • Information regarding a certain Sense.
  • Field of study where a certain Sense originates.

Traits§

  • Common methods provided by all enums in this crate.

Functions§

  • Returns an iterator over all entries in the database.