# ICU MessageFormat Parser (Rust)
A Rust implementation of the ICU MessageFormat parser, optimized for performance and WebAssembly compilation.
## Features
- Full ICU MessageFormat syntax support
- High-performance parsing
- WebAssembly-ready with wasm-bindgen
- Zero-copy parsing where possible
- Comprehensive error handling
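In Rust, "zero-copy where possible" usually means that AST nodes borrow slices of the input instead of allocating new strings. An owned copy is made only when the text has to change. Below is a minimal sketch of that idea using `std::borrow::Cow`. It assumes a simplified escaping rule in which only a doubled apostrophe (`''`) is rewritten; real ICU quoting has more cases. `unescape_literal` is a hypothetical helper, not this crate's API.

```rust
use std::borrow::Cow;

/// Hypothetical helper: returns literal text without copying when possible.
/// Simplification: only the ICU `''` escape (a literal apostrophe) is handled.
fn unescape_literal(raw: &str) -> Cow<'_, str> {
    if raw.contains("''") {
        // The text must change, so an owned String is allocated.
        Cow::Owned(raw.replace("''", "'"))
    } else {
        // Nothing to rewrite: borrow the slice straight from the input.
        Cow::Borrowed(raw)
    }
}

fn main() {
    let message = "Hello {name}! It''s late.";
    // Hand-picked slices stand in for the literal runs a parser would find.
    let greeting = unescape_literal(&message[..6]); // "Hello "
    let note = unescape_literal(&message[12..]); // "! It''s late."
    assert!(matches!(greeting, Cow::Borrowed(_)));
    assert!(matches!(note, Cow::Owned(_)));
    assert_eq!(note, "! It's late.");
    println!("{greeting:?} {note:?}");
}
```

The design choice is that the common case (no escapes) costs no allocation. Callers still see a uniform `&str`-like value through `Cow`'s `Deref`.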
## Project Structure
- `lib.rs` - Main library and WASM bindings
- `parser.rs` - Core parser implementation
- `types.rs` - AST types
- `error.rs` - Error types
- `date_time_pattern_generator.rs` - Date/time pattern support
- `manipulator.rs` - AST manipulation utilities
- `printer.rs` - AST printing utilities
## Building
### Native Rust Library
```bash
# Run tests
bazel test //rust/icu_messageformat_parser:icu_messageformat_parser_test
# Build library
bazel build //rust/icu_messageformat_parser:icu_messageformat_parser
# Run benchmarks
bazel run //rust/icu_messageformat_parser:parser_bench
```
### WebAssembly
The parser can be compiled to WebAssembly using Bazel's platform transition approach.
#### Build with Bazel
```bash
bazel build //rust/icu_messageformat_parser:formatjs_icu_messageformat_parser_wasm
```
This uses `rust_shared_library` with `platform = "@rules_rust//rust/platform:wasm"` to cross-compile to wasm32.
#### What Gets Built
The WASM build includes:
- `formatjs_icu_messageformat_parser_bg.wasm` - The WASM binary (~1.2MB)
- `formatjs_icu_messageformat_parser.js` - JavaScript glue code generated by wasm-bindgen
- `formatjs_icu_messageformat_parser.d.ts` - TypeScript type definitions
- `formatjs_icu_messageformat_parser_bg.wasm.d.ts` - WASM module types
#### WASM Configuration
The WASM build uses:
- **crate-type**: `cdylib` for dynamic library output
- **features**: the `wasm` feature flag enables the wasm-bindgen dependencies
- **platform**: `@rules_rust//rust/platform:wasm` for wasm32 target
- **dependencies**: `wasm-bindgen` and `serde-wasm-bindgen` for JS interop
See [BUILD.bazel](./BUILD.bazel) for the full configuration.
## WASM API
When compiled to WASM, the parser exports two functions:
### `parse(input: string): MessageFormatElement[]`
Parses an ICU MessageFormat message using the default options.
```javascript
import init, { parse } from './formatjs_icu_messageformat_parser.js';
await init();
const ast = parse('Hello {name}!');
console.log(ast);
```
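The `ast` value logged above is an array of element objects. The Rust sketch below shows that shape. It assumes the WASM build serializes elements the way the formatjs JavaScript parser does: a numeric `type` plus a `value`, with `0` for literals and `1` for simple arguments. The `Element` enum and `to_json` helper are hypothetical and are not the contents of `types.rs`.

```rust
/// Hypothetical subset of the AST: only the two element kinds
/// that `Hello {name}!` produces.
#[derive(Debug, PartialEq)]
enum Element {
    Literal(String),
    Argument(String),
}

impl Element {
    /// Numeric type codes as used by the formatjs JavaScript AST.
    fn type_code(&self) -> u8 {
        match self {
            Element::Literal(_) => 0,
            Element::Argument(_) => 1,
        }
    }

    fn value(&self) -> &str {
        match self {
            Element::Literal(v) | Element::Argument(v) => v.as_str(),
        }
    }
}

/// Renders elements as compact JSON.
/// No escaping is done: the sketch assumes plain ASCII values without quotes.
fn to_json(elements: &[Element]) -> String {
    let items: Vec<String> = elements
        .iter()
        .map(|e| format!(r#"{{"type":{},"value":"{}"}}"#, e.type_code(), e.value()))
        .collect();
    format!("[{}]", items.join(","))
}

fn main() {
    // The AST expected for `Hello {name}!`, built by hand.
    let ast = vec![
        Element::Literal("Hello ".to_string()),
        Element::Argument("name".to_string()),
        Element::Literal("!".to_string()),
    ];
    println!("{}", to_json(&ast));
}
```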
### `parse_ignore_tag(input: string): MessageFormatElement[]`
Parses with the `ignore_tag` option enabled, so HTML-like tags such as `<b>` are treated as literal text rather than as tag elements.
```javascript
import init, { parse_ignore_tag } from './formatjs_icu_messageformat_parser.js';
await init();
const ast = parse_ignore_tag('<b>Bold {name}</b>');
console.log(ast);
```
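The toy tokenizer below makes the effect of `ignore_tag` concrete. It is a hypothetical sketch, not the crate's parser. It recognizes only `{argument}` placeholders and simple `<tag>`/`</tag>` markers, and it emits a flat token list instead of the nested tag elements a real AST would contain. With `ignore_tag` set, `<` is ordinary text, so the tags end up inside literal tokens.

```rust
#[derive(Debug, PartialEq)]
enum Token {
    Literal(String),
    Argument(String),
    OpenTag(String),
    CloseTag(String),
}

/// Toy tokenizer (hypothetical): no nesting, plurals, escapes, or error handling.
fn tokenize(input: &str, ignore_tag: bool) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut literal = String::new();
    let mut rest = input;
    while let Some(c) = rest.chars().next() {
        // `{...}` is always an argument; `<...>` is a tag only when tags are not ignored.
        let closer = match c {
            '{' => Some('}'),
            '<' if !ignore_tag => Some('>'),
            _ => None,
        };
        if let Some(end) = closer.and_then(|close| rest.find(close)) {
            let inner = &rest[1..end];
            // Flush any pending literal text before the structured token.
            if !literal.is_empty() {
                tokens.push(Token::Literal(std::mem::take(&mut literal)));
            }
            tokens.push(match (c, inner.strip_prefix('/')) {
                ('{', _) => Token::Argument(inner.trim().to_string()),
                (_, Some(name)) => Token::CloseTag(name.to_string()),
                (_, None) => Token::OpenTag(inner.to_string()),
            });
            rest = &rest[end + 1..];
        } else {
            // Ordinary character (or an unmatched opener): keep it as literal text.
            literal.push(c);
            rest = &rest[c.len_utf8()..];
        }
    }
    if !literal.is_empty() {
        tokens.push(Token::Literal(literal));
    }
    tokens
}

fn main() {
    let message = "<b>Bold {name}</b>";
    println!("{:?}", tokenize(message, false));
    println!("{:?}", tokenize(message, true));
}
```

With `ignore_tag = false`, the message yields `OpenTag("b")`, `Literal("Bold ")`, `Argument("name")`, `CloseTag("b")`. With `ignore_tag = true`, it yields `Literal("<b>Bold ")`, `Argument("name")`, `Literal("</b>")`.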
Both functions return the parsed AST as a JavaScript value and throw an error if parsing fails.
## Usage in Packages
The WASM binary is used by the `@formatjs/icu-messageformat-parser-wasm` npm package, which provides a convenient JavaScript wrapper:
```javascript
import { parse, parseIgnoreTag } from '@formatjs/icu-messageformat-parser-wasm';
// Automatically initializes WASM on first call
const ast = await parse('Hello {name}!');
```
## Implementation Notes
### Platform Transition
The build uses Bazel's platform transition feature to cross-compile from the host platform to wasm32:
```python
load("@rules_rust//rust:defs.bzl", "rust_shared_library")

rust_shared_library(
    name = "formatjs_icu_messageformat_parser_wasm",
    platform = "@rules_rust//rust/platform:wasm",
    crate_features = ["wasm"],
    # ...
)
```
This approach:
- ✅ Works entirely within Bazel's hermetic build system
- ✅ No external tools (like wasm-pack) required at build time
- ✅ Leverages rules_rust's native WASM support
- ✅ Automatically uses the wasm32 dummy CC toolchain
### WASM Bindgen Integration
The `wasm` feature flag in Cargo.toml enables:
- `wasm-bindgen` for JS interop
- `serde-wasm-bindgen` for serializing complex types to JS
- Exported `parse` and `parse_ignore_tag` functions
The Rust code uses `#[cfg(feature = "wasm")]` to conditionally compile WASM-specific code.
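The sketch below illustrates that pattern. It uses a hypothetical native function, `parse_parts`, as a stand-in for the real parser. The native API is always compiled. The `#[wasm_bindgen]` export lives in a module that only exists when the `wasm` feature is enabled, so native builds never need the wasm-bindgen crates. Returning `Result<JsValue, JsValue>` is how wasm-bindgen turns a Rust error into a thrown JavaScript exception.

```rust
/// Hypothetical native entry point, always compiled (usable from tests and benches).
/// Splits a message into literal and `{argument}` parts and rejects unbalanced braces.
pub fn parse_parts(input: &str) -> Result<Vec<String>, String> {
    let mut parts = Vec::new();
    let mut current = String::new();
    let mut in_argument = false;
    for (offset, c) in input.char_indices() {
        match (c, in_argument) {
            // Opening or closing a flat argument ends the current part.
            ('{', false) | ('}', true) => {
                if !current.is_empty() {
                    parts.push(std::mem::take(&mut current));
                }
                in_argument = c == '{';
            }
            // Nested `{` or a stray `}` is an error in this simplified grammar.
            ('{', true) | ('}', false) => {
                return Err(format!("unexpected `{c}` at byte {offset}"));
            }
            _ => current.push(c),
        }
    }
    if in_argument {
        return Err("unclosed argument".to_string());
    }
    if !current.is_empty() {
        parts.push(current);
    }
    Ok(parts)
}

/// Compiled only with `--features wasm`; stripped entirely otherwise.
#[cfg(feature = "wasm")]
mod wasm_api {
    use wasm_bindgen::prelude::*;

    /// Exported to JavaScript; an `Err` becomes a thrown JS exception.
    #[wasm_bindgen]
    pub fn parse(input: &str) -> Result<JsValue, JsValue> {
        let parts = super::parse_parts(input).map_err(|e| JsValue::from_str(&e))?;
        Ok(serde_wasm_bindgen::to_value(&parts)?)
    }
}

fn main() {
    println!("wasm feature enabled: {}", cfg!(feature = "wasm"));
    println!("{:?}", parse_parts("Hello {name}!"));
    println!("{:?}", parse_parts("Hello {name"));
}
```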
## Dependencies
- `icu` - Unicode/ICU functionality
- `regex` - Pattern matching
- `serde` - Serialization framework
- `once_cell` - Lazy static initialization
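`once_cell`'s lazy statics let expensive tables be built on first use and then shared. The standard library's `std::sync::OnceLock` (stable since Rust 1.70) supports the same pattern, and the sketch below uses it. The CLDR plural-category table is only an illustrative assumption about what might be initialized lazily. It is not a description of this crate's internals.

```rust
use std::collections::HashSet;
use std::sync::OnceLock;

/// CLDR plural category keywords, built once on first access (illustrative table).
fn plural_categories() -> &'static HashSet<&'static str> {
    static CATEGORIES: OnceLock<HashSet<&'static str>> = OnceLock::new();
    CATEGORIES.get_or_init(|| {
        ["zero", "one", "two", "few", "many", "other"]
            .into_iter()
            .collect()
    })
}

fn is_plural_category(keyword: &str) -> bool {
    plural_categories().contains(keyword)
}

fn main() {
    assert!(is_plural_category("few"));
    assert!(!is_plural_category("several"));
    // Every call returns the same table; it is never rebuilt.
    assert!(std::ptr::eq(plural_categories(), plural_categories()));
    println!("plural categories: {}", plural_categories().len());
}
```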
WASM-only dependencies (behind `wasm` feature):
- `wasm-bindgen` - JS interop
- `serde-wasm-bindgen` - Serialize to JS values
## Development
### Regenerate Generated Files
```bash
# Regenerate time data
bazel run //rust/icu_messageformat_parser:time-data
# Regenerate regex patterns
bazel run //rust/icu_messageformat_parser:regex
```
### Testing
```bash
# Run Rust tests
bazel test //rust/icu_messageformat_parser:icu_messageformat_parser_test
# Run benchmarks
bazel run //rust/icu_messageformat_parser:parser_bench
```
## References
- [ICU MessageFormat Syntax](https://unicode-org.github.io/icu/userguide/format_parse/messages/)
- [wasm-bindgen Guide](https://rustwasm.github.io/wasm-bindgen/)
- [rules_rust WASM Documentation](https://bazelbuild.github.io/rules_rust/)
- [Bazel Platform Transitions](https://bazel.build/extending/config#user-defined-transitions)