formatjs_icu_messageformat_parser 0.2.4

ICU MessageFormat parser implementation in Rust
# ICU MessageFormat Parser (Rust)

A Rust implementation of the ICU MessageFormat parser, optimized for performance and WebAssembly compilation.

## Features

- Full ICU MessageFormat syntax support
- High-performance parsing: **2.6-3.7x faster than the JavaScript parser** in the benchmarks below
- WebAssembly-ready with wasm-bindgen
- Zero-copy parsing where possible
- Comprehensive error handling
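Zero-copy parsing generally means that AST text borrows slices of the input string instead of copying them into new `String`s. The sketch below illustrates the idea with `std::borrow::Cow`. It is not the crate's actual AST code: the `unescape_literal` helper is hypothetical, and it simplifies ICU quoting to a single rule (a doubled apostrophe `''` becomes `'`), so it allocates only when that escape is present.

```rust
use std::borrow::Cow;

/// Hypothetical helper: unescape a literal segment of an ICU message.
/// Simplification: only the doubled apostrophe `''` is handled (it becomes `'`);
/// the real ICU quoting rules are more involved.
pub fn unescape_literal(text: &str) -> Cow<'_, str> {
    if !text.contains("''") {
        // Nothing to rewrite: hand back a slice of the input, no allocation.
        return Cow::Borrowed(text);
    }
    // Allocate only when the literal actually contains an escape sequence.
    Cow::Owned(text.replace("''", "'"))
}
```

Because the result is a `Cow<'_, str>`, callers can use it as a `&str` either way, and the cost of an allocation is paid only for literals that need unescaping.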

## Performance

The Rust parser (optimized build) is 2.6-3.7x faster than the JavaScript parser and about 7-13% faster than the SWC parser, another Rust implementation:

```bash
$ bazel run -c opt //crates/icu_messageformat_parser:comparison_bench
```

| Message Type | Rust Parser | JavaScript Parser | Speedup vs JS    | SWC Parser | Speedup vs SWC |
| ------------ | ----------- | ----------------- | ---------------- | ---------- | -------------- |
| complex_msg  | 9.22 µs     | 23.85 µs   | **2.59x faster** | 10.3 µs    | 1.11x faster |
| normal_msg   | 1.14 µs     | 3.27 µs    | **2.87x faster** | 1.25 µs    | 1.10x faster |
| simple_msg   | 163 ns      | 600 ns     | **3.68x faster** | 184 ns     | 1.13x faster |
| string_msg   | 118 ns      | 320 ns     | **2.71x faster** | 126 ns     | 1.07x faster |

**Note:** Always use `-c opt` for benchmarking to enable release optimizations.
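Each speedup column is the ratio of the baseline's time to the Rust parser's time, with both in the same unit. A small hypothetical helper (not part of the crate) reproduces values from the table:

```rust
/// Hypothetical helper: how many times faster `candidate` is than `baseline`.
/// Both durations must be in the same unit (e.g. both in nanoseconds).
pub fn speedup(baseline: f64, candidate: f64) -> f64 {
    baseline / candidate
}

/// Format a speedup the way the table does, rounded to two decimals.
pub fn format_speedup(baseline: f64, candidate: f64) -> String {
    format!("{:.2}x faster", speedup(baseline, candidate))
}
```

For example, the `simple_msg` row compares 600 ns (JavaScript) with 163 ns (Rust), which gives `3.68x faster`.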

## Project Structure

- `lib.rs` - Main library and WASM bindings
- `parser.rs` - Core parser implementation
- `types.rs` - AST types
- `error.rs` - Error types
- `date_time_pattern_generator.rs` - Date/time pattern support
- `manipulator.rs` - AST manipulation utilities
- `printer.rs` - AST printing utilities

## Building

### Native Rust Library

```bash
# Run tests
bazel test //crates/icu_messageformat_parser:icu_messageformat_parser_test

# Build library
bazel build //crates/icu_messageformat_parser:icu_messageformat_parser

# Run benchmarks
bazel run -c opt //crates/icu_messageformat_parser:parser_bench
```

### WebAssembly

The parser can be compiled to WebAssembly using Bazel's platform transition approach.

#### Build with Bazel

```bash
bazel build //crates/icu_messageformat_parser:formatjs_icu_messageformat_parser_wasm
```

This uses `rust_shared_library` with `platform = "@rules_rust//rust/platform:wasm"` to cross-compile to wasm32.

#### What Gets Built

The WASM build includes:

- `formatjs_icu_messageformat_parser_bg.wasm` - The WASM binary (~1.2MB)
- `formatjs_icu_messageformat_parser.js` - JavaScript glue code generated by wasm-bindgen
- `formatjs_icu_messageformat_parser.d.ts` - TypeScript type definitions
- `formatjs_icu_messageformat_parser_bg.wasm.d.ts` - WASM module types

#### WASM Configuration

The WASM build uses:

- **crate-type**: `cdylib` for dynamic library output
- **features**: `wasm` feature flag enables wasm-bindgen dependencies
- **platform**: `@rules_rust//rust/platform:wasm` for wasm32 target
- **dependencies**: `wasm-bindgen` and `serde-wasm-bindgen` for JS interop

See [BUILD.bazel](./BUILD.bazel) for the full configuration.

## WASM API

When compiled to WASM, the parser exports two functions:

### `parse(input: string): MessageFormatElement[]`

Parse ICU MessageFormat with default options.

```javascript
import init, {parse} from './formatjs_icu_messageformat_parser.js'

await init()
const ast = parse('Hello {name}!')
console.log(ast)
```

### `parse_ignore_tag(input: string): MessageFormatElement[]`

Parse with the `ignore_tag` option enabled, which treats HTML-like tags as literal text.

```javascript
import init, {parse_ignore_tag} from './formatjs_icu_messageformat_parser.js'

await init()
const ast = parse_ignore_tag('<b>Bold {name}</b>')
console.log(ast)
```

Both functions return the parsed AST as a JavaScript object or throw an error on parse failure.
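To make the effect of `ignore_tag` concrete, here is a deliberately simplified tokenizer sketch. It is not this crate's parser: it only recognizes flat `{argument}` placeholders and `<tag>`/`</tag>` markers, with no nesting, quoting, or error reporting, and the `Token` type and `tokenize` function are hypothetical. With `ignore_tag` set, `<` is treated as ordinary literal text.

```rust
/// Hypothetical token type for this sketch; slices borrow from the input.
#[derive(Debug, PartialEq)]
pub enum Token<'a> {
    Literal(&'a str),
    Argument(&'a str),
    OpenTag(&'a str),
    CloseTag(&'a str),
}

/// Hypothetical, simplified tokenizer. Unterminated `{` or `<` is kept as literal text.
pub fn tokenize(input: &str, ignore_tag: bool) -> Vec<Token<'_>> {
    let mut tokens = Vec::new();
    let mut literal_start = 0; // start of the pending literal run
    let mut i = 0;
    while i < input.len() {
        let close = match input.as_bytes()[i] {
            b'{' => '}',
            // Tags are only recognized when ignore_tag is off.
            b'<' if !ignore_tag => '>',
            _ => {
                i += 1;
                continue;
            }
        };
        // Without a closing delimiter, keep scanning as literal text.
        let Some(offset) = input[i + 1..].find(close) else {
            i += 1;
            continue;
        };
        let end = i + 1 + offset;
        // Flush literal text that precedes this placeholder or tag.
        if literal_start < i {
            tokens.push(Token::Literal(&input[literal_start..i]));
        }
        let inner = input[i + 1..end].trim();
        tokens.push(if close == '}' {
            Token::Argument(inner)
        } else if let Some(name) = inner.strip_prefix('/') {
            Token::CloseTag(name)
        } else {
            Token::OpenTag(inner)
        });
        i = end + 1;
        literal_start = i;
    }
    if literal_start < input.len() {
        tokens.push(Token::Literal(&input[literal_start..]));
    }
    tokens
}
```

On `<b>Bold {name}</b>`, the default mode yields an open tag, a literal, an argument, and a close tag, while `ignore_tag` mode yields `"<b>Bold "`, the `name` argument, and `"</b>"` as literals.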

## Usage in Packages

The WASM binary is used by the `@formatjs/icu-messageformat-parser-wasm` npm package, which provides a convenient JavaScript wrapper:

```javascript
import {parse, parseIgnoreTag} from '@formatjs/icu-messageformat-parser-wasm'

// Automatically initializes WASM on first call
const ast = await parse('Hello {name}!')
```

## Implementation Notes

### Platform Transition

The build uses Bazel's platform transition feature to cross-compile from the host platform to wasm32:

```python
rust_shared_library(
    name = "formatjs_icu_messageformat_parser_wasm",
    platform = "@rules_rust//rust/platform:wasm",
    crate_features = ["wasm"],
    # ...
)
```

This approach:

- ✅ Works entirely within Bazel's hermetic build system
- ✅ No external tools (like wasm-pack) required at build time
- ✅ Leverages rules_rust's native WASM support
- ✅ Automatically uses the wasm32 dummy CC toolchain

### WASM Bindgen Integration

The `wasm` feature flag in Cargo.toml enables:

- `wasm-bindgen` for JS interop
- `serde-wasm-bindgen` for serializing complex types to JS
- Exported `parse` and `parse_ignore_tag` functions

The Rust code uses `#[cfg(feature = "wasm")]` to conditionally compile WASM-specific code.
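The sketch below shows the general shape of feature-gated bindings. Only the exported names `parse` and `parse_ignore_tag` come from this README; the `wasm_api` module, the `build_flavor` helper, and `crate::parse_with_options` are hypothetical stand-ins, not the crate's real API. Returning `Result<JsValue, JsValue>` from a `#[wasm_bindgen]` function is what makes an `Err` surface as a thrown error in JavaScript.

```rust
/// Illustrative helper (not part of the crate): reports which bindings were compiled in.
pub fn build_flavor() -> &'static str {
    if cfg!(feature = "wasm") { "wasm" } else { "native" }
}

// This module is compiled only when the `wasm` feature is enabled, so native
// builds never need wasm-bindgen or serde-wasm-bindgen.
#[cfg(feature = "wasm")]
mod wasm_api {
    use wasm_bindgen::prelude::*;

    // `crate::parse_with_options` is a hypothetical parsing entry point;
    // its name and signature are assumptions for this sketch.
    fn run(input: &str, ignore_tag: bool) -> Result<JsValue, JsValue> {
        let ast = crate::parse_with_options(input, ignore_tag)
            .map_err(|e| JsValue::from_str(&e.to_string()))?;
        // Convert the serde-serializable AST into a plain JS value.
        serde_wasm_bindgen::to_value(&ast).map_err(JsValue::from)
    }

    #[wasm_bindgen]
    pub fn parse(input: &str) -> Result<JsValue, JsValue> {
        run(input, false)
    }

    #[wasm_bindgen]
    pub fn parse_ignore_tag(input: &str) -> Result<JsValue, JsValue> {
        run(input, true)
    }
}
```

Because `#[cfg]` removes the module before name resolution, a native build without the feature compiles even though the gated code refers to crates that are not linked.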

## Dependencies

- `icu` - Unicode/ICU functionality
- `regex` - Pattern matching
- `serde` - Serialization framework
- `once_cell` - Lazy static initialization

WASM-only dependencies (behind `wasm` feature):

- `wasm-bindgen` - JS interop
- `serde-wasm-bindgen` - Serialize to JS values

## Development

### Regenerate Generated Files

```bash
# Regenerate time data
bazel run //crates/icu_messageformat_parser:time-data

# Regenerate regex patterns
bazel run //crates/icu_messageformat_parser:regex
```

### Testing

```bash
# Run Rust tests
bazel test //crates/icu_messageformat_parser:icu_messageformat_parser_test

# Run benchmarks
bazel run -c opt //crates/icu_messageformat_parser:parser_bench
```

## References

- [ICU MessageFormat Syntax](https://unicode-org.github.io/icu/userguide/format_parse/messages/)
- [wasm-bindgen Guide](https://rustwasm.github.io/wasm-bindgen/)
- [rules_rust WASM Documentation](https://bazelbuild.github.io/rules_crates/)
- [Bazel Platform Transitions](https://bazel.build/extending/config#user-defined-transitions)