# ICU MessageFormat Parser (Rust)
A Rust implementation of the ICU MessageFormat parser, optimized for performance and WebAssembly compilation.
## Features
- Full ICU MessageFormat syntax support
- High-performance parsing: **2.6-3.7x faster than the JavaScript parser**
- WebAssembly-ready with wasm-bindgen
- Zero-copy parsing where possible
- Comprehensive error handling
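To illustrate what zero-copy parsing means in practice, here is a minimal sketch. It is not the crate's actual implementation: the `Token` type and `tokenize` function are hypothetical, and only literal text and simple `{name}` arguments are handled. Each token borrows a slice of the input instead of allocating a new `String`.

```rust
/// Hypothetical token type for illustration; the crate's real AST lives in `types.rs`.
#[derive(Debug, PartialEq)]
enum Token<'a> {
    /// A run of plain text, borrowed from the input.
    Literal(&'a str),
    /// The name inside a simple `{name}` argument, borrowed from the input.
    Argument(&'a str),
}

/// Splits `input` into literal runs and simple `{name}` arguments.
/// Every token is a slice of `input`, so no text is copied.
fn tokenize(input: &str) -> Vec<Token<'_>> {
    let mut tokens = Vec::new();
    let mut rest = input;
    while !rest.is_empty() {
        match rest.find('{') {
            Some(open) => {
                if open > 0 {
                    tokens.push(Token::Literal(&rest[..open]));
                }
                match rest[open..].find('}') {
                    Some(len) => {
                        let close = open + len;
                        tokens.push(Token::Argument(rest[open + 1..close].trim()));
                        rest = &rest[close + 1..];
                    }
                    None => {
                        // Unclosed brace: this sketch keeps the remainder as literal text.
                        tokens.push(Token::Literal(&rest[open..]));
                        rest = "";
                    }
                }
            }
            None => {
                tokens.push(Token::Literal(rest));
                rest = "";
            }
        }
    }
    tokens
}
```

Calling `tokenize("Hello {name}!")` yields a literal `"Hello "`, an argument `"name"`, and a literal `"!"`, all pointing into the original string. The lifetime `'a` ties the tokens to the input, so the compiler rejects any use of the tokens after the input is dropped.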
## Performance
In an optimized build, the Rust parser is 2.6-3.7x faster than the JavaScript parser and 1.07-1.13x faster than the other Rust implementation it was benchmarked against:
```bash
$ bazel run -c opt //crates/icu_messageformat_parser:comparison_bench
```
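| Benchmark | This parser (Rust) | JavaScript parser | Speedup vs JavaScript | Other Rust parser | Speedup vs other Rust |
| --- | --- | --- | --- | --- | --- |
| complex_msg | 9.22 µs | 23.85 µs | **2.59x faster** | 10.3 µs | 1.11x faster |
| normal_msg | 1.14 µs | 3.27 µs | **2.87x faster** | 1.25 µs | 1.10x faster |
| simple_msg | 163 ns | 600 ns | **3.68x faster** | 184 ns | 1.13x faster |
| string_msg | 118 ns | 320 ns | **2.71x faster** | 126 ns | 1.07x faster |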
**Note:** Always use `-c opt` for benchmarking to enable release optimizations.
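Each speedup figure in the table is the baseline time divided by this parser's time for the same message. A small sketch of that arithmetic, using hypothetical `speedup` and `round2` helpers that are not part of the crate:

```rust
/// Hypothetical helper: how many times faster the candidate is than the baseline.
/// Both times must be expressed in the same unit.
fn speedup(baseline: f64, candidate: f64) -> f64 {
    baseline / candidate
}

/// Rounds to two decimal places, the precision used in the table.
fn round2(x: f64) -> f64 {
    (x * 100.0).round() / 100.0
}
```

For `complex_msg`, `round2(speedup(23.85, 9.22))` gives 2.59, and for `simple_msg`, `round2(speedup(600.0, 163.0))` gives 3.68.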
## Project Structure
- `lib.rs` - Main library and WASM bindings
- `parser.rs` - Core parser implementation
- `types.rs` - AST types
- `error.rs` - Error types
- `date_time_pattern_generator.rs` - Date/time pattern support
- `manipulator.rs` - AST manipulation utilities
- `printer.rs` - AST printing utilities
## Building
### Native Rust Library
```bash
# Run tests
bazel test //crates/icu_messageformat_parser:icu_messageformat_parser_test
# Build library
bazel build //crates/icu_messageformat_parser:icu_messageformat_parser
# Run benchmarks
bazel run -c opt //crates/icu_messageformat_parser:parser_bench
```
### WebAssembly
The parser can be compiled to WebAssembly using Bazel's platform transition approach.
#### Build with Bazel
```bash
bazel build //crates/icu_messageformat_parser:formatjs_icu_messageformat_parser_wasm
```
This uses `rust_shared_library` with `platform = "@rules_rust//rust/platform:wasm"` to cross-compile to wasm32.
#### What Gets Built
The WASM build includes:
- `formatjs_icu_messageformat_parser_bg.wasm` - The WASM binary (~1.2MB)
- `formatjs_icu_messageformat_parser.js` - JavaScript glue code generated by wasm-bindgen
- `formatjs_icu_messageformat_parser.d.ts` - TypeScript type definitions
- `formatjs_icu_messageformat_parser_bg.wasm.d.ts` - WASM module types
#### WASM Configuration
The WASM build uses:
- **crate-type**: `cdylib` for dynamic library output
- **features**: `wasm` feature flag enables wasm-bindgen dependencies
- **platform**: `@rules_rust//rust/platform:wasm` for wasm32 target
- **dependencies**: `wasm-bindgen` and `serde-wasm-bindgen` for JS interop
See [BUILD.bazel](./BUILD.bazel) for the full configuration.
## WASM API
When compiled to WASM, the parser exports two functions:
### `parse(input: string): MessageFormatElement[]`
Parse ICU MessageFormat with default options.
```javascript
import init, {parse} from './formatjs_icu_messageformat_parser.js'
await init()
const ast = parse('Hello {name}!')
console.log(ast)
```
### `parse_ignore_tag(input: string): MessageFormatElement[]`
Parse with `ignore_tag` option enabled (treats HTML-like tags as literals).
```javascript
import init, {parse_ignore_tag} from './formatjs_icu_messageformat_parser.js'
await init()
const ast = parse_ignore_tag('<b>Bold {name}</b>')
console.log(ast)
```
Both functions return the parsed AST as a JavaScript object or throw an error on parse failure.
## Usage in Packages
The WASM binary is used by the `@formatjs/icu-messageformat-parser-wasm` npm package, which provides a convenient JavaScript wrapper:
```javascript
import {parse, parseIgnoreTag} from '@formatjs/icu-messageformat-parser-wasm'
// Automatically initializes WASM on first call
const ast = await parse('Hello {name}!')
```
## Implementation Notes
### Platform Transition
The build uses Bazel's platform transition feature to cross-compile from the host platform to wasm32:
```python
rust_shared_library(
    name = "formatjs_icu_messageformat_parser_wasm",
    platform = "@rules_rust//rust/platform:wasm",
    crate_features = ["wasm"],
    # ...
)
```
This approach:
- ✅ Works entirely within Bazel's hermetic build system
- ✅ No external tools (like wasm-pack) required at build time
- ✅ Leverages rules_rust's native WASM support
- ✅ Automatically uses the wasm32 dummy CC toolchain
### WASM Bindgen Integration
The `wasm` feature flag in Cargo.toml enables:
- `wasm-bindgen` for JS interop
- `serde-wasm-bindgen` for serializing complex types to JS
- Exported `parse` and `parse_ignore_tag` functions
The Rust code uses `#[cfg(feature = "wasm")]` to conditionally compile WASM-specific code.
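A minimal sketch of that gating pattern follows. `parse_message` and its `Vec<String>` return type are hypothetical stand-ins for the crate's real parser and AST, and the body of the exported function is an assumption about how such a binding could be written, not a copy of `lib.rs`.

```rust
/// Hypothetical stand-in for the crate's parser entry point.
/// It only checks that braces are balanced, to keep the sketch short.
pub fn parse_message(input: &str) -> Result<Vec<String>, String> {
    let opens = input.matches('{').count();
    let closes = input.matches('}').count();
    if opens != closes {
        return Err(format!(
            "unbalanced braces: {} opening, {} closing",
            opens, closes
        ));
    }
    Ok(vec![input.to_string()])
}

/// Compiled only when the `wasm` feature is enabled, so native builds
/// never need `wasm-bindgen` or `serde-wasm-bindgen`.
#[cfg(feature = "wasm")]
mod wasm_api {
    use wasm_bindgen::prelude::*;

    /// Exported to JavaScript as `parse`. An `Err` value is thrown as an exception.
    #[wasm_bindgen]
    pub fn parse(input: &str) -> Result<JsValue, JsValue> {
        let ast = super::parse_message(input).map_err(|e| JsValue::from_str(&e))?;
        // Convert the Rust value into a JS value through serde.
        serde_wasm_bindgen::to_value(&ast).map_err(|e| JsValue::from_str(&e.to_string()))
    }
}
```

With the feature disabled, the `wasm_api` module is removed before name resolution, so the core parser and its tests build on the host without any WASM tooling.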
## Dependencies
- `icu` - Unicode/ICU functionality
- `regex` - Pattern matching
- `serde` - Serialization framework
- `once_cell` - Lazy static initialization
WASM-only dependencies (behind `wasm` feature):
- `wasm-bindgen` - JS interop
- `serde-wasm-bindgen` - Serialize to JS values
## Development
### Regenerate Generated Files
```bash
# Regenerate time data
bazel run //crates/icu_messageformat_parser:time-data
# Regenerate regex patterns
bazel run //crates/icu_messageformat_parser:regex
```
### Testing
```bash
# Run Rust tests
bazel test //crates/icu_messageformat_parser:icu_messageformat_parser_test
# Run benchmarks
bazel run -c opt //crates/icu_messageformat_parser:parser_bench
```
## References
- [ICU MessageFormat Syntax](https://unicode-org.github.io/icu/userguide/format_parse/messages/)
- [wasm-bindgen Guide](https://rustwasm.github.io/wasm-bindgen/)
- [rules_rust WASM Documentation](https://bazelbuild.github.io/rules_rust/)
- [Bazel Platform Transitions](https://bazel.build/extending/config#user-defined-transitions)