formatjs_icu_messageformat_parser 0.2.0

ICU MessageFormat parser implementation in Rust
# ICU MessageFormat Parser (Rust)

A Rust implementation of the ICU MessageFormat parser, optimized for performance and WebAssembly compilation.

## Features

- Full ICU MessageFormat syntax support
- High-performance parsing
- WebAssembly-ready with wasm-bindgen
- Zero-copy parsing where possible
- Comprehensive error handling

## Project Structure

- `lib.rs` - Main library and WASM bindings
- `parser.rs` - Core parser implementation
- `types.rs` - AST types
- `error.rs` - Error types
- `date_time_pattern_generator.rs` - Date/time pattern support
- `manipulator.rs` - AST manipulation utilities
- `printer.rs` - AST printing utilities

## Building

### Native Rust Library

```bash
# Run tests
bazel test //rust/icu_messageformat_parser:icu_messageformat_parser_test

# Build library
bazel build //rust/icu_messageformat_parser:icu_messageformat_parser

# Run benchmarks
bazel run //rust/icu_messageformat_parser:parser_bench
```

### WebAssembly

The parser can be compiled to WebAssembly using Bazel's platform transition approach.

#### Build with Bazel

```bash
bazel build //rust/icu_messageformat_parser:formatjs_icu_messageformat_parser_wasm
```

This uses `rust_shared_library` with `platform = "@rules_rust//rust/platform:wasm"` to cross-compile to wasm32.

#### What Gets Built

The WASM build includes:

- `formatjs_icu_messageformat_parser_bg.wasm` - The WASM binary (~1.2 MB)
- `formatjs_icu_messageformat_parser.js` - JavaScript glue code generated by wasm-bindgen
- `formatjs_icu_messageformat_parser.d.ts` - TypeScript type definitions
- `formatjs_icu_messageformat_parser_bg.wasm.d.ts` - WASM module types

#### WASM Configuration

The WASM build uses:

- **crate-type**: `cdylib` for dynamic library output
- **features**: `wasm` feature flag enables wasm-bindgen dependencies
- **platform**: `@rules_rust//rust/platform:wasm` for wasm32 target
- **dependencies**: `wasm-bindgen` and `serde-wasm-bindgen` for JS interop

See [BUILD.bazel](./BUILD.bazel) for the full configuration.

## WASM API

When compiled to WASM, the parser exports two functions:

### `parse(input: string): MessageFormatElement[]`

Parse an ICU MessageFormat string with default options.

```javascript
import init, { parse } from './formatjs_icu_messageformat_parser.js';

await init();
const ast = parse('Hello {name}!');
console.log(ast);
```
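
Once you have the AST, a common next step is to walk it. The sketch below collects the argument names a message refers to. It assumes the WASM output follows the same element shape as the JavaScript `@formatjs/icu-messageformat-parser` package (numeric `type` codes, `options` on select/plural elements, `children` on tag elements); that shape is an assumption here, not something this README confirms. `collectArgumentNames` is a hypothetical helper, not part of the exported API.

```javascript
// Element type codes as used by the JavaScript @formatjs parser (assumed to
// match the WASM output; verify against the generated .d.ts before relying on it).
const TYPE = {
  literal: 0, argument: 1, number: 2, date: 3, time: 4,
  select: 5, plural: 6, pound: 7, tag: 8,
};

// Recursively gather every argument name referenced by the message.
function collectArgumentNames(ast, names = new Set()) {
  for (const el of ast) {
    switch (el.type) {
      case TYPE.argument:
      case TYPE.number:
      case TYPE.date:
      case TYPE.time:
        names.add(el.value);
        break;
      case TYPE.select:
      case TYPE.plural:
        names.add(el.value);
        // Each option holds its own nested list of elements.
        for (const option of Object.values(el.options)) {
          collectArgumentNames(option.value, names);
        }
        break;
      case TYPE.tag:
        collectArgumentNames(el.children, names);
        break;
      default:
        // Literal and pound elements carry no argument name.
        break;
    }
  }
  return names;
}
```

Calling `collectArgumentNames(parse('Hello {name}!'))` would then yield a `Set` of names, provided the element shape assumption holds.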

### `parse_ignore_tag(input: string): MessageFormatElement[]`

Parse with the `ignore_tag` option enabled, which treats HTML-like tags as literal text.

```javascript
import init, { parse_ignore_tag } from './formatjs_icu_messageformat_parser.js';

await init();
const ast = parse_ignore_tag('<b>Bold {name}</b>');
console.log(ast);
```

Both functions return the parsed AST as a JavaScript value (an array of elements) and throw if parsing fails.
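
Because both functions throw on failure, callers that process untrusted messages may prefer a result object over a `try`/`catch` at every call site. The following is a minimal sketch of such a wrapper. `tryParse` is a hypothetical helper, and the parse function is passed in as a parameter so the sketch does not depend on the WASM module being loaded. The shape of the thrown value is not documented here, so the wrapper stringifies it defensively.

```javascript
// Wrap a throwing parse function so callers get a result object instead.
// `parseFn` stands in for `parse` or `parse_ignore_tag`.
function tryParse(parseFn, input) {
  try {
    return { ok: true, ast: parseFn(input) };
  } catch (err) {
    // The thrown value may be an Error or a plain value; handle both.
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: message };
  }
}
```

With the real module this would be used as `tryParse(parse, 'Hello {name}!')` after `await init()`, then branching on `result.ok`.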

## Usage in Packages

The WASM binary is used by the `@formatjs/icu-messageformat-parser-wasm` npm package, which provides a convenient JavaScript wrapper:

```javascript
import { parse, parseIgnoreTag } from '@formatjs/icu-messageformat-parser-wasm';

// Automatically initializes WASM on first call
const ast = await parse('Hello {name}!');
```
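
The "initializes on first call" behavior can be pictured as a cached initialization promise. The sketch below shows one way such a wrapper might work; it is not the package's actual implementation. `makeLazyParse` is a hypothetical name, and `init` and `parseSync` are injected stand-ins for the default export and `parse` from the wasm-bindgen glue code.

```javascript
// Build an async parse function that runs `init` at most once, on first use.
function makeLazyParse(init, parseSync) {
  let ready = null; // cached promise shared by all callers

  return async function parse(input) {
    if (ready === null) {
      // Set synchronously so concurrent first calls share one initialization.
      ready = Promise.resolve(init());
    }
    await ready;
    return parseSync(input);
  };
}
```

Caching the promise rather than a boolean flag means that calls arriving while initialization is still in flight wait on the same promise instead of starting a second load.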

## Implementation Notes

### Platform Transition

The build uses Bazel's platform transition feature to cross-compile from the host platform to wasm32:

```python
rust_shared_library(
    name = "formatjs_icu_messageformat_parser_wasm",
    platform = "@rules_rust//rust/platform:wasm",
    crate_features = ["wasm"],
    # ...
)
```

This approach:

- ✅ Works entirely within Bazel's hermetic build system
- ✅ No external tools (like wasm-pack) required at build time
- ✅ Leverages rules_rust's native WASM support
- ✅ Automatically uses the wasm32 dummy CC toolchain

### WASM Bindgen Integration

The `wasm` feature flag in Cargo.toml enables:

- `wasm-bindgen` for JS interop
- `serde-wasm-bindgen` for serializing complex types to JS
- Exported `parse` and `parse_ignore_tag` functions

The Rust code uses `#[cfg(feature = "wasm")]` to conditionally compile WASM-specific code.

## Dependencies

- `icu` - Unicode/ICU functionality
- `regex` - Pattern matching
- `serde` - Serialization framework
- `once_cell` - Lazy static initialization

WASM-only dependencies (behind `wasm` feature):

- `wasm-bindgen` - JS interop
- `serde-wasm-bindgen` - Serialize to JS values

## Development

### Regenerate Generated Files

```bash
# Regenerate time data
bazel run //rust/icu_messageformat_parser:time-data

# Regenerate regex patterns
bazel run //rust/icu_messageformat_parser:regex
```

### Testing

```bash
# Run Rust tests
bazel test //rust/icu_messageformat_parser:icu_messageformat_parser_test

# Run benchmarks
bazel run //rust/icu_messageformat_parser:parser_bench
```

## References

- [ICU MessageFormat Syntax](https://unicode-org.github.io/icu/userguide/format_parse/messages/)
- [wasm-bindgen Guide](https://rustwasm.github.io/wasm-bindgen/)
- [rules_rust WASM Documentation](https://bazelbuild.github.io/rules_rust/)
- [Bazel Platform Transitions](https://bazel.build/extending/config#user-defined-transitions)