# Byte Segment Highlighting for M-Bus Parser
## Context
The parser currently produces structured data (frames, data records, values) but loses the mapping back to raw byte positions. Users who want to visualize or inspect raw frames have no way to know which bytes correspond to which protocol fields. This feature adds a "meta layer" that labels every byte in a frame with its protocol role, enabling UIs to render hex views with hover tooltips, coloring by layer, and grouping of related fields (e.g., all bytes of a data record).
## Approach
Add a new `src/annotate.rs` module behind `#[cfg(feature = "std")]` in the root crate. It takes raw frame bytes, parses them using the existing parser, and returns a flat `Vec<ByteSegment>` covering every byte of the original input. No modifications to existing sub-crates or the no_std parser API.
The annotation contract is original-byte based: `start` and `end` always refer to offsets in the exact byte slice passed to `annotate_frame`. The implementation may build a normalized scratch buffer for parsing (for example, wireless Format A with CRC bytes stripped), but it must carry an offset map back to the original frame before emitting segments. This keeps the API useful for hex viewers and avoids silently annotating bytes that the user cannot find in their input.
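One consequence of the original-byte contract is that a field which is contiguous in the scratch buffer can straddle a stripped CRC in the original frame. The sketch below shows one possible way to translate a scratch-buffer range back to original offsets. It assumes the offset map is a plain slice where `offset_map[i]` is the original offset of scratch byte `i`; both that representation and the `map_to_original` name are hypothetical, not part of the existing crate.

```rust
/// Maps the half-open range `start..end` in the normalized (CRC-stripped)
/// buffer back to one or more half-open ranges in the original frame.
///
/// `offset_map[i]` is assumed to hold the original offset of normalized
/// byte `i`. Panics if `start..end` is out of bounds for `offset_map`.
fn map_to_original(offset_map: &[usize], start: usize, end: usize) -> Vec<(usize, usize)> {
    let mut spans: Vec<(usize, usize)> = Vec::new();
    for &orig in &offset_map[start..end] {
        // Extend the current span while original offsets stay adjacent.
        if let Some(last) = spans.last_mut() {
            if last.1 == orig {
                last.1 = orig + 1;
                continue;
            }
        }
        // A stripped byte (for example a CRC) intervened: start a new span.
        spans.push((orig, orig + 1));
    }
    spans
}
```

With this shape, a field that crosses a CRC would be emitted as two `ByteSegment`s with the same `kind` and `group`, which keeps every emitted range valid in the caller's input.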
## Core Types
```rust
// src/annotate.rs
#[non_exhaustive]
pub enum SegmentKind {
    // Link layer
    StartByte, Length, CField, AField, Checksum, StopByte,
    // Application layer header
    CiField, IdentificationNumber, ManufacturerCode, Version,
    DeviceType, AccessNumber, Status, ConfigurationField,
    // Data record fields
    Dif, Dife, Vif, Vife, PlaintextVif, DataPayload,
    // Wireless
    LField, WirelessManufacturerId, Crc,
    // Extended link layer
    ExtendedLinkLayer,
    // Opaque payloads
    EncryptedPayload, ManufacturerSpecific,
    // Fallback
    IdleFiller, Unknown,
}

pub struct ByteSegment {
    pub start: usize,         // inclusive byte offset in original frame
    pub end: usize,           // exclusive byte offset
    pub kind: SegmentKind,    // type-safe field identifier
    pub detail: String,       // hover tooltip (e.g. "C Field: RspUd (ACD: false)")
    pub group: Option<usize>, // data record index, links related DIF/VIF/data
    pub layer: u8,            // 0=frame, 1=app header, 2=record sub-field
}
```
Label text is derived from `SegmentKind` via `Display` impl rather than stored separately. `group` uses `Option<usize>` instead of `Option<String>` to avoid allocations.
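As a rough illustration of the label derivation, the sketch below implements `Display` for a trimmed-down copy of `SegmentKind`. Only a handful of variants are shown, and the label strings are illustrative assumptions; the plan does not fix the exact wording.

```rust
use std::fmt;

// Trimmed-down stand-in for the plan's `SegmentKind`; the real enum has
// many more variants and is marked `#[non_exhaustive]`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SegmentKind {
    StartByte,
    CField,
    Dif,
    DataPayload,
    Unknown,
}

impl fmt::Display for SegmentKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Label strings are assumptions chosen for this example.
        let label = match self {
            SegmentKind::StartByte => "Start Byte",
            SegmentKind::CField => "C Field",
            SegmentKind::Dif => "DIF",
            SegmentKind::DataPayload => "Data",
            SegmentKind::Unknown => "Unknown",
        };
        f.write_str(label)
    }
}
```

A UI can then call `segment.kind.to_string()` for the short label and fall back to `detail` for the longer tooltip text.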
## Public API
```rust
pub fn annotate_frame(data: &[u8]) -> Result<Vec<ByteSegment>, MbusError>
```
`annotate_frame` is also reachable through `serialize_mbus_data` as the `"annotated"` format, which serializes the segments to JSON.
`annotate_frame` does not decrypt payloads. If a variable data block is encrypted, annotate the transport headers and then emit one `EncryptedPayload` segment for the encrypted bytes instead of trying to parse DIF/VIF records. A key-aware API can be added later, but it must still emit offsets for the original ciphertext bytes, not offsets into decrypted scratch data.
## Implementation Steps
### 1. Create `src/annotate.rs` with types and entry point
- `SegmentKind` enum with `Display` impl for labels
- `ByteSegment` struct with `serde::Serialize`
- `annotate_frame()` that tries wired, then wireless parsing
### 2. Annotate wired frame link layer
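Offsets are fixed by the frame format. `L` is the value of the length byte, ranges are half-open, and `-` means the field does not exist in that frame type:
| Field | Long frame offset | Short frame offset |
| --- | --- | --- |
| Start | 0 | 0 |
| Length | 1..3 | - |
| Start repeat | 3 | - |
| C Field | 4 | 1 |
| A Field | 5 | 2 |
| User Data | 6..L+4 | - |
| Checksum | L+4 | 3 |
| Stop | L+5 | 4 |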
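The long-frame column above can be turned into code fairly directly. The following is a minimal sketch, assuming a standalone helper named `long_frame_layout` (a hypothetical name) that returns `(label, start, end)` triples instead of full `ByteSegment` values. It checks the frame shape only and deliberately skips checksum validation.

```rust
/// Returns `(label, start, end)` triples (end exclusive) for a wired long
/// frame, or `None` if the bytes do not have the shape `68 L L 68 ... CS 16`.
/// Checksum validation is intentionally left out of this sketch.
fn long_frame_layout(frame: &[u8]) -> Option<Vec<(&'static str, usize, usize)>> {
    // Smallest long frame: 4 header bytes + C, A, CI + checksum + stop.
    if frame.len() < 9 || frame[0] != 0x68 || frame[3] != 0x68 || frame[1] != frame[2] {
        return None;
    }
    // L counts the C field, the A field and the user data (CI + payload).
    let l = frame[1] as usize;
    if l < 3 || frame.len() != l + 6 || frame[l + 5] != 0x16 {
        return None;
    }
    Some(vec![
        ("Start", 0, 1),
        ("Length", 1, 3),
        ("Start repeat", 3, 4),
        ("C Field", 4, 5),
        ("A Field", 5, 6),
        ("User Data", 6, l + 4),
        ("Checksum", l + 4, l + 5),
        ("Stop", l + 5, l + 6),
    ])
}
```

In the real module the `User Data` span would not be emitted as a single segment; it would be handed to the application-layer and data-record annotators so that each byte ends up with its most specific label.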
### 3. Annotate application layer header
Within user data (starting at frame byte 6 for long frames):
**CI=0x72/0x76 (variable data, long TPL, 13-byte header):**
- CI(0), ID(1..5), Manufacturer(5..7), Version(7), DeviceType(8), AccessNum(9), Status(10), Config(11..13)
- Data records start at relative offset 13
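The long TPL layout listed above is a fixed table, so one option is to keep it as data and shift it by the CI offset. The sketch below assumes a hypothetical `long_tpl_segments` helper and plain string labels; the actual module would emit `ByteSegment` values with `layer: 1`.

```rust
/// Field layout of the 13-byte long TPL header (CI = 0x72 / 0x76),
/// as half-open ranges relative to the CI byte.
const LONG_TPL_LAYOUT: [(&str, usize, usize); 8] = [
    ("CI", 0, 1),
    ("Identification Number", 1, 5),
    ("Manufacturer", 5, 7),
    ("Version", 7, 8),
    ("Device Type", 8, 9),
    ("Access Number", 9, 10),
    ("Status", 10, 11),
    ("Configuration", 11, 13),
];

/// Shifts the relative layout to absolute frame offsets.
/// `ci_offset` is 6 for a wired long frame; `user_data_end` is the exclusive
/// end of the user data. Returns `None` if the header would overrun it.
fn long_tpl_segments(
    ci_offset: usize,
    user_data_end: usize,
) -> Option<Vec<(&'static str, usize, usize)>> {
    if ci_offset + 13 > user_data_end {
        return None;
    }
    Some(
        LONG_TPL_LAYOUT
            .iter()
            .map(|&(name, start, end)| (name, ci_offset + start, ci_offset + end))
            .collect(),
    )
}
```

Data records then begin at `ci_offset + 13`, which matches the relative offset stated above.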
**CI=0x7A and encrypted CI variants 0xA0..0xAF (short TPL):**
- CI(0), AccessNum(1), Status(2), Config(3..5)
- Data records start at relative offset 5
- For CI=0xA0, account for the extra encryption configuration byte after the CI: data records then start at relative offset 6, matching the current parser.
**CI=0x8C/0x8D/0x8E (extended link layers):**
- CI=0x8C (ELL I): annotate CI + 2 ELL bytes, then annotate the nested short TPL header parsed by the current application-layer parser.
- CI=0x8D (ELL II): annotate CI + 8 ELL bytes, then treat the remaining bytes as the variable data block. The current parser creates the short TPL context from ELL fields; there is no inner CI/TPL header to annotate.
- CI=0x8E (ELL III): annotate CI + 16 ELL bytes, then treat the remaining bytes as the variable data block. The current parser creates the short TPL context from ELL fields; there is no inner CI/TPL header to annotate.
- If an ELL frame is encrypted or otherwise cannot be parsed into records, emit a single opaque payload segment for the remaining bytes.
### 4. Annotate data records
Walk variable data block byte-by-byte using existing parsers:
- Parse `DataInformationBlock::try_from()` -> emit `Dif` (1 byte) + `Dife` (extension bytes)
- Parse `ValueInformationBlock::try_from()` -> emit `Vif` (1 byte) + `Vife` (extension bytes)
- If `ValueInformationBlock::get_size()` includes plaintext VIF bytes (`0x7C`/`0xFC` with length-prefixed ASCII), emit the extra length/text bytes as `PlaintextVif`
- Parse `DataRecord` -> compute data payload size = `record.get_size() - header_size`
- Emit `DataPayload` for remaining bytes
- All segments in one record share `group: Some(record_index)`
- Handle idle fillers (0x2F), manufacturer-specific blocks (0x0F), and "more records follow" blocks (0x1F). Manufacturer-specific and more-records-follow tails should be one opaque segment covering the remaining bytes, because the existing parser treats them as a terminal special record.
- On parse failure, mark remaining bytes as `Unknown` (never leave gaps)
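To make the walk above concrete, here is a simplified, self-contained sketch of annotating a single record. It does not call the crate's `DataInformationBlock` or `ValueInformationBlock` parsers. Instead it re-derives the header size from the extension bits and the fixed-length DIF data field codes, and it returns `None` for everything it does not handle (variable-length data, special-function DIFs such as `0x0F`/`0x1F`/`0x2F`, plaintext VIFs). The `Seg` tuple and both helper names are hypothetical.

```rust
/// (start, end, kind, group): a simplified stand-in for `ByteSegment`.
type Seg = (usize, usize, &'static str, usize);

/// Payload length in bytes for the fixed-size DIF data field codes.
/// Returns `None` for variable length (0xD) and special functions (0xF).
fn fixed_data_len(dif: u8) -> Option<usize> {
    match dif & 0x0F {
        0x0 | 0x8 => Some(0),
        0x1 | 0x9 => Some(1),
        0x2 | 0xA => Some(2),
        0x3 | 0xB => Some(3),
        0x4 | 0x5 | 0xC => Some(4),
        0x6 | 0xE => Some(6),
        0x7 => Some(8),
        _ => None,
    }
}

/// Skips a chain of extension bytes; a byte with bit 7 set announces another.
/// Returns the offset after the chain, or `None` if the data ends mid-chain.
fn skip_extensions(data: &[u8], mut i: usize, mut more: bool) -> Option<usize> {
    while more {
        more = (*data.get(i)? & 0x80) != 0;
        i += 1;
    }
    Some(i)
}

/// Annotates one record starting at `pos` and returns its segments plus the
/// offset of the next record. `None` tells the caller to fall back, for
/// example by marking the remaining bytes as `Unknown`.
fn annotate_record(data: &[u8], pos: usize, group: usize) -> Option<(Vec<Seg>, usize)> {
    let dif = *data.get(pos)?;
    let payload_len = fixed_data_len(dif)?;
    let mut segs: Vec<Seg> = vec![(pos, pos + 1, "Dif", group)];

    let vif_pos = skip_extensions(data, pos + 1, (dif & 0x80) != 0)?;
    if vif_pos > pos + 1 {
        segs.push((pos + 1, vif_pos, "Dife", group));
    }

    let vif = *data.get(vif_pos)?;
    if (vif & 0x7F) == 0x7C {
        return None; // plaintext VIF: left to the real parser in this sketch
    }
    segs.push((vif_pos, vif_pos + 1, "Vif", group));
    let payload_pos = skip_extensions(data, vif_pos + 1, (vif & 0x80) != 0)?;
    if payload_pos > vif_pos + 1 {
        segs.push((vif_pos + 1, payload_pos, "Vife", group));
    }

    let end = payload_pos + payload_len;
    if end > data.len() {
        return None; // truncated payload
    }
    if payload_len > 0 {
        segs.push((payload_pos, end, "DataPayload", group));
    }
    Some((segs, end))
}
```

The planned implementation should prefer the sizes reported by the existing parsers, since those also cover the cases this sketch rejects.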
### 5. Annotate wireless frames
- LField(0), CField(1), ManufacturerID(2..10), then user data from byte 10+
- For Format B / already stripped frames, annotate offsets directly from the input.
- For Format A frames, validate and strip CRCs into a scratch buffer for parsing, but keep an original-offset map:
  - CRC after the first 10 bytes -> `Crc`
  - CRC after each 16-byte data block -> `Crc`
  - All parsed application-layer segments map back to their non-CRC original offsets
  - The original L-field remains `LField` even though the normalized scratch buffer rewrites it
- If CRC validation fails but wireless parsing succeeds without stripping, annotate it as an already stripped/non-Format-A frame.
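The Format A block structure described above (10 bytes plus CRC, then 16-byte blocks each followed by a CRC) determines where the CRC segments sit. The following sketch computes those positions and the offset map from the frame length alone. It assumes 2-byte CRCs and a possibly shorter final block, does not validate any CRC value, and uses a hypothetical `format_a_layout` helper name.

```rust
use std::ops::Range;

/// Splits a Format A frame of `len` bytes into CRC ranges and an offset map.
/// `map[i]` is the original offset of byte `i` in the CRC-stripped buffer.
/// CRC values are not checked here; only positions are derived.
fn format_a_layout(len: usize) -> (Vec<Range<usize>>, Vec<usize>) {
    let mut crcs = Vec::new();
    let mut map = Vec::new();
    let mut pos = 0;
    let mut block = 10; // first block: L, C and the 8 bytes at offsets 2..10
    while pos < len {
        // Each block ends with a 2-byte CRC; the final block may hold fewer
        // than 16 data bytes.
        let data_len = block.min((len - pos).saturating_sub(2));
        map.extend(pos..pos + data_len);
        pos += data_len;
        let crc_end = (pos + 2).min(len);
        crcs.push(pos..crc_end);
        pos = crc_end;
        block = 16;
    }
    (crcs, map)
}
```

Each returned range becomes a `Crc` segment, and `map` is what the application-layer annotator would consult before emitting offsets, so the JSON output refers to positions the user can find in their input.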
### 6. Wire into lib.rs and mbus_data.rs
- `src/lib.rs`: add `#[cfg(feature = "std")] pub mod annotate;` and re-export
- `src/mbus_data.rs`: add `"annotated"` branch in `serialize_mbus_data`
- The `"annotated"` serializer should ignore `key` for the first version and use `EncryptedPayload` for encrypted variable data. Do not parse decrypted scratch bytes unless a separate key-aware annotation API is added.
### 7. Tests
- Verify a known long frame: all bytes covered, no gaps, correct segment kinds
- Verify short frame and single character frame
- Verify data record sub-field offsets match expected positions
- Verify contiguity: segments sorted by start, each `end == next.start`
- Verify wireless Format A: CRC bytes are emitted as `Crc`, parsed fields map to original offsets, and all original bytes are covered
- Verify ELL II and ELL III: no nonexistent inner CI/TPL header is annotated
- Verify encrypted variable data: payload is marked `EncryptedPayload` and no DIF/VIF records are emitted
- Verify plaintext VIF (`0x7C`/`0xFC`): length/text bytes are annotated as `PlaintextVif`
- Verify manufacturer-specific and more-records-follow tails are opaque and terminal
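Several of the tests above share the same coverage and contiguity invariant, so it may be worth factoring into a helper. A minimal sketch follows, using a hypothetical `Span` stand-in that carries only the offsets of a `ByteSegment` and a hypothetical `check_coverage` function.

```rust
/// Minimal stand-in for `ByteSegment`: only the offsets matter for this check.
struct Span {
    start: usize,
    end: usize,
}

/// Checks that `segments` are sorted, non-empty, gap-free and cover exactly
/// `0..frame_len`. Returns a description of the first violation found.
fn check_coverage(segments: &[Span], frame_len: usize) -> Result<(), String> {
    let mut expected = 0;
    for (i, s) in segments.iter().enumerate() {
        if s.start != expected {
            return Err(format!("segment {i} starts at {}, expected {expected}", s.start));
        }
        if s.end <= s.start {
            return Err(format!("segment {i} is empty or inverted"));
        }
        expected = s.end;
    }
    if expected != frame_len {
        return Err(format!("segments end at {expected}, frame has {frame_len} bytes"));
    }
    Ok(())
}
```

Each test could then assert `check_coverage(...).is_ok()` once and spend the rest of its body on the segment kinds it actually cares about.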
## Files to Modify
- **Create**: `src/annotate.rs` (new module, all types + logic + tests)
- **Edit**: `src/lib.rs` (add module declaration + re-export, ~3 lines)
- **Edit**: `src/mbus_data.rs` (add `"annotated"` format branch, ~10 lines)
No changes to sub-crates or Cargo.toml needed.
## Verification
1. `cargo test --features std` - new tests pass
2. `cargo test` (no features) - existing no_std tests unaffected
3. Manual: call `serialize_mbus_data(hex_string, "annotated", None)` with the example frame from lib.rs docs and verify JSON output covers all 83 bytes with correct labels
4. Manual: call `"annotated"` on a wireless Format A sample and verify the JSON contains explicit CRC segments at original byte offsets