Crate jmdict

The JMdict file is a comprehensive multilingual dictionary of the Japanese language. The original JMdict file, which is included in this repository (and hence in releases of this crate), comes as XML. Instead of stuffing the XML into the binary directly, this crate parses the XML at compile time and generates an optimized representation that is compiled into the binary. The crate’s API affords type-safe access to this embedded database.

§WARNING: Licensing on database files

The database files compiled into the crate are licensed from the Electronic Dictionary Research and Development Group under Creative Commons licenses. Applications linking this crate directly or indirectly must display appropriate copyright notices to users. Please refer to the EDRDG’s license statement for details.

§Basic usage

The database is accessed through the entries() function, which provides an iterator over all database entries compiled into the application. While traversing the database and its entries, you will find that, whenever you expect a list of something, you will get an iterator instead. These iterators provide an abstraction between you, as the user of the library, and the physical representation of the database as embedded in the binary.

The following example looks up the reading for お母さん in the database:

let kanji_form = "お母さん";

let entry = jmdict::entries().find(|e| {
    e.kanji_elements().any(|k| k.text == kanji_form)
}).unwrap();

let reading_form = entry.reading_elements().next().unwrap().text;
assert_eq!(reading_form, "おかあさん");
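
The iterator-based design described above can be illustrated with a small, self-contained sketch. It does not use this crate and is not its actual implementation: the names Word, Words, WORDS and words() are hypothetical stand-ins. It only shows one way an iterator can sit in front of data embedded in the binary while staying cheap to copy.

```rust
// Hypothetical stand-in types; these are NOT the jmdict crate's real definitions.

/// One record of a statically embedded table.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Word {
    text: &'static str,
}

/// The "physical" storage: a static array compiled into the binary.
static WORDS: [Word; 3] = [
    Word { text: "お母さん" },
    Word { text: "お父さん" },
    Word { text: "猫" },
];

/// A cursor over the static table. It holds only two indices,
/// so it can derive `Copy` and be duplicated cheaply.
#[derive(Debug, Clone, Copy)]
struct Words {
    start: usize,
    end: usize,
}

impl Iterator for Words {
    type Item = &'static Word;

    fn next(&mut self) -> Option<Self::Item> {
        if self.start < self.end {
            let item = &WORDS[self.start];
            self.start += 1;
            Some(item)
        } else {
            None
        }
    }
}

/// Entry point, analogous in spirit to an `entries()`-style function.
fn words() -> Words {
    Words { start: 0, end: WORDS.len() }
}

fn main() {
    let all = words();
    // `count` consumes its receiver, but `all` is `Copy`, so only a copy is consumed.
    let total = all.count();
    // The original cursor is untouched and can still be searched from the start.
    let mut cursor = all;
    let hit = cursor.find(|w| w.text == "猫");
    println!("{} words, found: {:?}", total, hit.map(|w| w.text));
}
```

Because the cursor stores no owned data, callers never see how the table is laid out, and the layout can change without affecting code that only iterates.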

§Cargo features

§Common configurations

  • The default feature includes the most common words (about 30000 entries) and only their English translations.
  • The full feature includes everything in the JMdict.

§Entry selection

  • The scope-uncommon feature includes uncommon words and glosses.
  • The scope-archaic feature includes glosses with the “archaic” label. If disabled, the PartOfSpeech enum will not include variants that are only relevant for archaic vocabulary, such as obsolete conjugation patterns. (The AllPartOfSpeech enum always contains all variants.)

§Target languages

At least one target language must be selected. Selecting a target language will include all available translations in that language. Entries that do not have any translation in any of the selected languages will be skipped.

  • translations-eng: English (included in default)
  • translations-dut: Dutch
  • translations-fre: French
  • translations-ger: German
  • translations-hun: Hungarian
  • translations-rus: Russian
  • translations-slv: Slovenian
  • translations-spa: Spanish
  • translations-swe: Swedish
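
The selection rule above (keep translations only in the selected languages, and skip entries left without any translation) can be sketched as a simple filter. This is an illustration under stated assumptions, not the crate's build logic, which this page does not show: Lang, RawEntry, select_languages and sample are hypothetical names.

```rust
// Hypothetical stand-in types; the crate's actual build-time processing is not shown here.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Lang {
    Eng,
    Ger,
    Fre,
}

#[derive(Debug, Clone, PartialEq)]
struct RawEntry {
    word: &'static str,
    glosses: Vec<(Lang, &'static str)>,
}

/// Keeps only glosses whose language is in `selected`,
/// then drops every entry that has no gloss left.
fn select_languages(entries: Vec<RawEntry>, selected: &[Lang]) -> Vec<RawEntry> {
    entries
        .into_iter()
        .filter_map(|mut entry| {
            entry.glosses.retain(|(lang, _)| selected.contains(lang));
            if entry.glosses.is_empty() {
                None
            } else {
                Some(entry)
            }
        })
        .collect()
}

/// Made-up sample data for the sketch.
fn sample() -> Vec<RawEntry> {
    vec![
        RawEntry {
            word: "猫",
            glosses: vec![(Lang::Eng, "cat"), (Lang::Ger, "Katze")],
        },
        RawEntry {
            word: "犬",
            glosses: vec![(Lang::Ger, "Hund")],
        },
    ]
}

fn main() {
    // With only English selected, the second sample entry has no gloss left and is skipped.
    for entry in select_languages(sample(), &[Lang::Eng]) {
        println!("{}: {:?}", entry.word, entry.glosses);
    }
    // `Lang::Fre` matches nothing in the sample data, so nothing survives.
    println!("{}", select_languages(sample(), &[Lang::Fre]).len());
}
```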

The GlossLanguage enum will only contain variants corresponding to the enabled target languages. For example, in the default configuration, GlossLanguage::English will be the only variant. (The AllGlossLanguage enum always contains all variants.)
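
The relationship between a restricted enum and its All… counterpart can be pictured with the following self-contained sketch. It mirrors the documented shape impl TryFrom<AllFoo> for Foo with a DisabledVariant error, but the enum variants, the set of "enabled" languages, and the error's fields are assumptions made for illustration; in the crate, the available variants depend on the enabled Cargo features.

```rust
use std::convert::TryFrom;
use std::fmt;

// Hypothetical mirror of the pattern; not the crate's actual source.

/// Every language, regardless of configuration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AllGlossLanguage {
    English,
    German,
    French,
}

/// Only the languages treated as "enabled" in this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GlossLanguage {
    English,
    German,
}

/// Error for variants that exist in the full enum but not in the restricted one.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct DisabledVariant;

impl fmt::Display for DisabledVariant {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "variant is disabled in this build")
    }
}

impl std::error::Error for DisabledVariant {}

impl TryFrom<AllGlossLanguage> for GlossLanguage {
    type Error = DisabledVariant;

    fn try_from(value: AllGlossLanguage) -> Result<Self, Self::Error> {
        match value {
            AllGlossLanguage::English => Ok(GlossLanguage::English),
            AllGlossLanguage::German => Ok(GlossLanguage::German),
            // French is not "enabled" in this sketch, so the conversion fails.
            AllGlossLanguage::French => Err(DisabledVariant),
        }
    }
}

fn main() {
    for lang in [
        AllGlossLanguage::English,
        AllGlossLanguage::German,
        AllGlossLanguage::French,
    ] {
        match GlossLanguage::try_from(lang) {
            Ok(enabled) => println!("{:?} is available as {:?}", lang, enabled),
            Err(err) => println!("{:?}: {}", lang, err),
        }
    }
}
```

A fallible conversion like this lets code that handles the full set of variants, for instance when reading external data, degrade gracefully when a variant is compiled out.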

§Crippled builds: db-minimal

When the db-minimal feature is enabled, only a severely reduced portion of the JMdict will be parsed (to be exact, only chunks 000, 100 and 999). This is completely useless for actual usage, but allows for quick edit-compile-test cycles while working on this crate’s code.

§Crippled builds: db-empty

When the db-empty feature is enabled, downloading and parsing of the JMdict contents is disabled entirely. The crate is compiled as usual, but entries() will return an empty iterator. This is useful for documentation builds, such as those on docs.rs, where --all-features is given.

Structs§

Dialects
An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
DisabledVariant
Error type for all enum conversions of the form impl TryFrom<AllFoo> for Foo.
Entries
An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
Entry
An entry in the JMdict dictionary.
Gloss
A particular translation or explanation for a Japanese word or phrase in a different language.
Glosses
An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
KanjiElement
A representation of a dictionary entry using kanji or other non-kana scripts.
KanjiElements
An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
KanjiInfos
An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
LoanwordSource
A source word in another language from which a particular Sense of an Entry has been borrowed.
LoanwordSources
An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
PartsOfSpeech
An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
Priority
Relative priority of a ReadingElement or KanjiElement.
ReadingElement
A representation of a dictionary entry using only kana.
ReadingElements
An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
ReadingInfos
An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
Sense
The translational equivalent of a Japanese word or phrase.
SenseInfos
An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
SenseTopics
An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
Senses
An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
Strings
An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.

Enums§

AllGlossLanguage
The language of a particular Gloss. This enum contains all possible variants, including those that have been disabled by compile-time flags in enum GlossLanguage.
AllPartOfSpeech
Where a word can appear in a sentence for a particular Sense of the word. This enum contains all possible variants, including those that have been disabled by compile-time flags in enum PartOfSpeech.
Dialect
Dialect of Japanese in which a certain vocabulary occurs.
GlossLanguage
The language of a particular Gloss.
GlossType
Type of gloss.
KanjiInfo
Information regarding a certain KanjiElement.
PartOfSpeech
Where a word can appear in a sentence for a particular Sense of the word.
PriorityInCorpus
PriorityInCorpus appears in struct Priority. It describes how often a dictionary entry appears in a certain corpus of text.
ReadingInfo
Information regarding a certain ReadingElement.
SenseInfo
Information regarding a certain Sense.
SenseTopic
Field of study where a certain Sense originates.

Traits§

Enum
Common methods provided by all enums in this crate.

Functions§

entries
Returns an iterator over all entries in the database.