Expand description
Documentation on zero-copy deserialization of locale types.
Locale
and LanguageIdentifier
are highly structured types that cannot be directly
stored in a zero-copy data structure, such as those provided by the [zerovec
] crate.
This page explains how to indirectly store these types in a [zerovec
].
There are two main use cases, which have different solutions:
- Lookup: You need to locate a locale in a zero-copy vector, such as when querying a map.
- Obtain: You have a locale stored in a zero-copy vector, and you need to obtain a proper
Locale
orLanguageIdentifier
for use elsewhere in your program.
Lookup
To perform lookup, store the stringified locale in a canonical BCP-47 form as a byte array,
and then use Locale::cmp_bytes()
to perform an efficient, zero-allocation lookup.
use std::str::FromStr;
use icu_locid::Locale;
use zerovec::ZeroMap;
// ZeroMap from locales to integers
let data: &[(&[u8], u32)] = &[
(b"de-DE-u-hc-h12", 5),
(b"en-US-u-ca-buddhist", 10),
(b"my-MM", 15),
(b"sr-Cyrl-ME", 20),
(b"zh-TW", 25),
];
let zm: ZeroMap<[u8], u32> = data.iter().copied().collect();
// Get the value associated with a locale
let loc: Locale = "en-US-u-ca-buddhist".parse().unwrap();
let value = zm.get_copied_by(|bytes| loc.cmp_bytes(bytes).reverse());
assert_eq!(value, Some(10));
Obtain
Obtaining a Locale
or LanguageIdentifier
is not generally a zero-copy operation, since
both of these types may require memory allocation. If possible, architect your code such that
you do not need to obtain a structured type.
If need the structured type, such as if you need to manipulate it in some way, there are two options: storing subtags, and storing a string for parsing.
Storing Subtags
If the data being stored only contains a limited number of subtags, you can store them as a
tuple, and then construct the LanguageIdentifier
externally.
use icu_locid::LanguageIdentifier;
use icu_locid::subtags::{Language, Script, Region};
use icu_locid::{language, script, region, langid};
use zerovec::ZeroMap;
// ZeroMap from integer to LSR (language-script-region)
let data: &[(u32, (Language, Option<Script>, Option<Region>))] = &[
(5, (language!("de"), None, Some(region!("DE")))),
(10, (language!("en"), None, Some(region!("US")))),
(15, (language!("my"), None, Some(region!("MM")))),
(20, (language!("sr"), Some(script!("Cyrl")), Some(region!("ME")))),
(25, (language!("zh"), None, Some(region!("TW")))),
];
let zm: ZeroMap<u32, (Language, Option<Script>, Option<Region>)> =
data.iter().copied().collect();
// Construct a LanguageIdentifier from a tuple entry
let lid: LanguageIdentifier = zm.get_copied(&25)
.expect("element is present")
.into();
assert_eq!(lid, langid!("zh-TW"));
Storing Strings
If it is necessary to store and obtain an arbitrary locale, it is currently recommended to store a BCP-47 string and parse it when needed.
Since the string is stored in an unparsed state, it is not safe to unwrap
the result from
Locale::from_bytes()
. See icu4x#831
for a discussion on potential data models that could ensure that the locale is valid during
deserialization.
use icu_locid::Locale;
use icu_locid::langid;
use zerovec::ZeroMap;
// ZeroMap from integer to locale string
let data: &[(u32, &[u8])] = &[
(5, b"de-DE-u-hc-h12"),
(10, b"en-US-u-ca-buddhist"),
(15, b"my-MM"),
(20, b"sr-Cyrl-ME"),
(25, b"zh-TW"),
(30, b"INVALID"),
];
let zm: ZeroMap<u32, [u8]> = data.iter().copied().collect();
// Construct a Locale by parsing the string.
let value = zm.get(&25).expect("element is present");
let loc = Locale::from_bytes(value);
assert_eq!(loc, Ok(langid!("zh-TW").into()));
// Invalid entries are fallible
let err_value = zm.get(&30).expect("element is present");
let err_loc = Locale::from_bytes(err_value);
assert!(matches!(err_loc, Err(_)));