Crate icu_datagen
icu_datagen is a library to generate data files that can be used in ICU4X data providers.
Data files can be generated either programmatically (i.e. in build.rs), or through a command-line utility.
Also see our datagen tutorial.
Examples
Rust API
use icu_datagen::blob_exporter::*;
use icu_datagen::prelude::*;
use std::fs::File;

DatagenDriver::new()
    .with_keys([icu::list::provider::AndListV1Marker::KEY])
    .with_all_locales()
    .export(
        &DatagenProvider::new_latest_tested(),
        BlobExporter::new_v2_with_sink(Box::new(
            File::create("data.postcard").unwrap(),
        )),
    )
    .unwrap();
Command line
The command line interface can be installed through Cargo.
$ cargo install icu_datagen
Once the tool is installed, you can invoke it like this:
$ icu4x-datagen --keys all --locales de en-AU --format blob --out data.postcard
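When the tool is driven from a script or build step rather than a shell, the same invocation can be assembled with std::process::Command. The following is a minimal sketch using only the standard library. The helper name datagen_command and its parameters are hypothetical and not part of icu_datagen, and it assumes icu4x-datagen is installed and on the PATH. Only the flags from the invocation above are used.

```rust
use std::process::Command;

/// Hypothetical helper (not part of icu_datagen): builds the documented
/// `icu4x-datagen` invocation without running it.
fn datagen_command(locales: &[&str], out: &str) -> Command {
    // Assumption: the `icu4x-datagen` binary is installed and on the PATH.
    let mut cmd = Command::new("icu4x-datagen");
    cmd.args(["--keys", "all"]);
    // The documented invocation lists locales as separate words after
    // `--locales`, so each locale is passed as its own argument.
    cmd.arg("--locales").args(locales);
    cmd.args(["--format", "blob"]);
    cmd.args(["--out", out]);
    cmd
}
```

Calling datagen_command(&["de", "en-AU"], "data.postcard").status() would then spawn the tool and return its exit status as an io::Result. Building the Command separately from running it keeps the argument list easy to inspect.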
For complex invocations, the CLI also supports configuration files:
$ icu4x-datagen config.json
config.json
{
  "keys": {
    "explicit": [
      "core/helloworld@1",
      "fallback/likelysubtags@1",
      "fallback/parents@1",
      "fallback/supplement/co@1"
    ]
  },
  "fallback": "runtimeManual",
  "locales": "all",
  "segmenterModels": ["burmesedict"],
  "additionalCollations": ["big5han"],
  "cldr": "latest",
  "icuExport": "73.1",
  "segmenterLstm": "none",
  "export": {
    "blob": {
      "path": "blob.postcard"
    }
  },
  "overwrite": true
}
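If several such configuration files are needed, they can be produced from a template. The sketch below uses only the standard library and reproduces the fields of the example above, varying only the explicit key list and the blob output path. The helper name render_config is hypothetical and not part of icu_datagen. It assumes its inputs contain no characters that need JSON escaping, and it makes no claim about which fields the CLI treats as optional.

```rust
/// Hypothetical helper (not part of icu_datagen): renders a config file with
/// the same fields as the documented example, varying only the explicit key
/// list and the blob output path.
/// Assumption: inputs contain no characters that need JSON escaping.
fn render_config(keys: &[&str], blob_path: &str) -> String {
    // One quoted key per line, comma-separated, indented to match the template.
    let key_list = keys
        .iter()
        .map(|k| format!("      \"{}\"", k))
        .collect::<Vec<_>>()
        .join(",\n");
    // Literal braces are doubled because `format!` treats `{}` as a placeholder.
    format!(
        r#"{{
  "keys": {{
    "explicit": [
{}
    ]
  }},
  "fallback": "runtimeManual",
  "locales": "all",
  "segmenterModels": ["burmesedict"],
  "additionalCollations": ["big5han"],
  "cldr": "latest",
  "icuExport": "73.1",
  "segmenterLstm": "none",
  "export": {{
    "blob": {{
      "path": "{}"
    }}
  }},
  "overwrite": true
}}
"#,
        key_list, blob_path
    )
}
```

The returned string can be written to disk with std::fs::write and the resulting file passed to icu4x-datagen as shown above.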
More details can be found by running icu4x-datagen --help.
Cargo features
This crate has a lot of dependencies, some of which are not required for all operating modes. These default Cargo features can be disabled to reduce dependencies:
- baked_exporter
  - enables the baked_exporter module
  - enables the --format mod CLI argument
- blob_exporter
  - enables the blob_exporter module, a reexport of icu_provider_blob::export
  - enables the --format blob CLI argument
- fs_exporter
  - enables the fs_exporter module, a reexport of icu_provider_fs::export
  - enables the --format dir CLI argument
- networking
  - enables methods on DatagenProvider that fetch source data from the network
  - enables the --cldr-tag, --icu-export-tag, and --segmenter-lstm-tag CLI arguments that download data
- rayon
  - enables parallelism during export
- use_wasm / use_icu4c
  - see the documentation on icu_codepointtrie_builder
- bin
  - required by the CLI and enabled by default to make cargo install work
- legacy_api
  - enables the deprecated pre-1.3 API
  - enabled by default for semver stability
  - will be removed in 2.0.
Experimental unstable ICU4X components are behind Cargo features which are not enabled by default. Note that these Cargo features affect the behaviour of all_keys:
- icu_compactdecimal
- icu_displaynames
- icu_relativetime
- icu_transliterate
- …
The meta-feature experimental_components is available to activate all experimental components.
Modules
- A data exporter that bakes the data into Rust code.
- Data exporter that creates a binary blob for use with BlobDataProvider.
- Data exporter that creates a file system structure for use with FsDataProvider.
- A prelude for using the datagen API
- syntax (Deprecated): Out::Fs serialization formats.
Structs
- Configuration for a data export operation.
- An ExportableProvider backed by raw CLDR and ICU data.
- SourceData (Deprecated): Bag of options for datagen.
Enums
- Specifies the collation Han database to use.
- A language’s CLDR coverage level.
- Defines how fallback will apply to the generated data.
- Out (Deprecated): The output format for datagen.
Functions
- List of all keys that are available.
- all_keys_with_experimental (Deprecated): Same as all_keys.
- datagen (Deprecated): Runs data generation
- is_missing_cldr_error (Deprecated): Identifies errors that are due to missing CLDR data.
- is_missing_icuexport_error (Deprecated): Identifies errors that are due to missing ICU export data.
- Parses a human-readable key identifier into a DataKey.
- Parses a list of human-readable key identifiers and returns a list of DataKeys.
- Parses a compiled binary and returns a list of DataKeys that it uses at runtime.
- keys_from_file (Deprecated): Parses a file of human-readable key identifiers and returns a list of DataKeys.