Crate icu_datagen

icu_datagen is a library to generate data files that can be used in ICU4X data providers.

Data files can be generated either programmatically (e.g. in build.rs) or through a command-line utility.

Also see our datagen tutorial.

Examples

Rust API

use icu_datagen::blob_exporter::*;
use icu_datagen::prelude::*;
use std::fs::File;

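// Generate the `and` list-formatting data for the full locale family,
// with default runtime-fallback options, into a single blob file.
DatagenDriver::new()
    .with_keys([icu::list::provider::AndListV1Marker::KEY])
    .with_locales_and_fallback([LocaleFamily::FULL], Default::default())
    .export(
        // Raw CLDR/ICU source data at the latest tested versions, fetched
        // from the network (requires the `networking` feature).
        &DatagenProvider::new_latest_tested(),
        // Write the output as a postcard blob for use with BlobDataProvider.
        BlobExporter::new_v2_with_sink(Box::new(
            File::create("data.postcard").unwrap(),
        )),
    )
    .unwrap();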

Command line

The command line interface can be installed through Cargo.

$ cargo install icu_datagen

Once the tool is installed, you can invoke it like this:

$ icu4x-datagen --keys all --locales de en-AU --format blob --out data.postcard

More details can be found by running icu4x-datagen --help.
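If you want to drive the command-line tool from Rust rather than from a shell (for instance from a build script), the invocation above can be assembled with std::process::Command. This is a minimal sketch: the helper name datagen_command is hypothetical, the code only builds the command and inspects its arguments, and actually running it with .status() assumes icu4x-datagen is installed and on PATH.

```rust
use std::process::Command;

/// Hypothetical helper: builds (without running) an `icu4x-datagen`
/// invocation equivalent to
/// `icu4x-datagen --keys all --locales de en-AU --format blob --out data.postcard`.
fn datagen_command(locales: &[&str], out: &str) -> Command {
    let mut cmd = Command::new("icu4x-datagen");
    cmd.args(["--keys", "all"]);
    // `--locales` is followed by one or more locale identifiers as separate arguments.
    cmd.arg("--locales").args(locales);
    cmd.args(["--format", "blob", "--out", out]);
    cmd
}

fn main() {
    let cmd = datagen_command(&["de", "en-AU"], "data.postcard");
    // Only print the command line here; running it would be `cmd.status()`,
    // which requires the `icu4x-datagen` binary to be installed.
    let args: Vec<String> = cmd
        .get_args()
        .map(|a| a.to_string_lossy().into_owned())
        .collect();
    println!("{} {}", cmd.get_program().to_string_lossy(), args.join(" "));
}
```

Building the Command separately from running it keeps the argument list easy to inspect or log before spawning the process.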

Cargo features

This crate has a lot of dependencies, some of which are not required for all operating modes. These default Cargo features can be disabled to reduce dependencies:

  • baked_exporter
    • enables the baked_exporter module
    • enables the --format mod CLI argument
  • blob_exporter
  • fs_exporter
  • networking
    • enables methods on DatagenProvider that fetch source data from the network
    • enables the --cldr-tag, --icu-export-tag, and --segmenter-lstm-tag CLI arguments that download data
  • rayon
    • enables parallelism during export
  • use_wasm / use_icu4c
  • bin
    • required by the CLI and enabled by default to make cargo install work
  • legacy_api
    • enables the deprecated pre-1.3 API
    • enabled by default for semver stability
    • will be removed in 2.0
  • icu_experimental
    • enables data generation for keys defined in the unstable icu_experimental crate
    • note that this feature affects the behaviour of all_keys

The meta-feature experimental_components is available to activate all experimental components.
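The feature list above mostly describes which modules and behaviours each default feature enables; the parts that surface on the command line can be restated as a small lookup. This is a hypothetical helper, not an icu_datagen API, and it covers only the CLI arguments the list explicitly names (for baked_exporter and networking); features whose CLI effects the list does not spell out are ignored.

```rust
/// Hypothetical helper (not part of icu_datagen): returns the CLI arguments
/// that the feature list says are enabled by the given default features.
fn cli_args_enabled_by(features: &[&str]) -> Vec<&'static str> {
    let mut args = Vec::new();
    for feature in features {
        match *feature {
            "baked_exporter" => args.push("--format mod"),
            "networking" => {
                args.extend(["--cldr-tag", "--icu-export-tag", "--segmenter-lstm-tag"])
            }
            // Features whose CLI effects are not spelled out in the list are ignored.
            _ => {}
        }
    }
    args
}

fn main() {
    // A build with `networking` disabled loses the download arguments.
    let offline = ["baked_exporter", "blob_exporter", "fs_exporter", "rayon", "bin"];
    println!("{:?}", cli_args_enabled_by(&offline)); // prints ["--format mod"]
}
```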

Modules

baked_exporter
A data exporter that bakes the data into Rust code.
blob_exporter
Data exporter that creates a binary blob for use with BlobDataProvider.
fs_exporter
Data exporter that creates a file system structure for use with FsDataProvider.
prelude
A prelude for using the datagen API.
syntax (deprecated)
Out::Fs serialization formats.

Structs

DatagenDriver
Configuration for a data export operation.
DatagenProvider
An ExportableProvider backed by raw CLDR and ICU data.
FallbackOptions
Options bag configuring locale inclusion and behavior when runtime fallback is enabled.
LocaleFamily
A family of locales to export.
NoFallbackOptions
Options bag configuring locale inclusion and behavior when runtime fallback is disabled.
SourceData (deprecated)
Bag of options for datagen.

Enums

CollationHanDatabase
Specifies the collation Han database to use.
CoverageLevel
A language’s CLDR coverage level.
DeduplicationStrategy
Choices for determining the deduplication of locales for exported data payloads.
FallbackMode
Defines how fallback will apply to the generated data.
Out (deprecated)
The output format for datagen.
RuntimeFallbackLocation
Choices for the code location of runtime fallback.

Functions

all_keys
List of all keys that are available.
all_keys_with_experimental (deprecated)
Same as all_keys.
datagen (deprecated)
Runs data generation.
is_missing_cldr_error (deprecated)
Identifies errors that are due to missing CLDR data.
is_missing_icuexport_error (deprecated)
Identifies errors that are due to missing ICU export data.
key
Parses a human-readable key identifier into a DataKey.
keys
Parses a list of human-readable key identifiers and returns a list of DataKeys.
keys_from_bin
Parses a compiled binary and returns a list of DataKeys that it uses at runtime.
keys_from_file (deprecated)
Parses a file of human-readable key identifiers and returns a list of DataKeys.
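Human-readable key identifiers, as taken by key, keys and keys_from_file, typically have the shape path@version (for example list/and@1, the key behind the AndListV1Marker used in the Rust API example). The stdlib-only sketch below illustrates splitting such identifiers, assuming one identifier per line for the list form. KeyId, parse_key and parse_key_list are hypothetical names; the sketch checks only the shape and does not resolve anything to an actual DataKey.

```rust
/// Hypothetical: a parsed `path@version` key identifier.
#[derive(Debug, PartialEq)]
struct KeyId<'a> {
    path: &'a str,
    version: u32,
}

/// Splits an identifier such as "list/and@1" into path and version.
/// Only the shape is checked; nothing is looked up among real keys.
fn parse_key(s: &str) -> Result<KeyId<'_>, String> {
    let s = s.trim();
    let (path, version) = s
        .rsplit_once('@')
        .ok_or_else(|| format!("missing '@<version>' in {s:?}"))?;
    if path.is_empty() {
        return Err(format!("empty path in {s:?}"));
    }
    let version = version
        .parse::<u32>()
        .map_err(|e| format!("invalid version in {s:?}: {e}"))?;
    Ok(KeyId { path, version })
}

/// Parses newline-separated identifiers, skipping blank lines
/// (assumed layout: one identifier per line).
fn parse_key_list(text: &str) -> Result<Vec<KeyId<'_>>, String> {
    text.lines()
        .filter(|line| !line.trim().is_empty())
        .map(parse_key)
        .collect()
}

fn main() {
    let keys = parse_key_list("list/and@1\n\ndecimal/symbols@1\n").unwrap();
    for k in &keys {
        println!("{} (version {})", k.path, k.version);
    }
}
```

Collecting into Result<Vec<_>, String> stops at the first malformed line, which mirrors failing fast on a bad key list rather than silently dropping entries.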