qubit-codec-misc 0.1.0

Miscellaneous byte and text format codecs for Rust applications
# Qubit Misc Codec

[![Rust CI](https://github.com/qubit-ltd/rs-codec-misc/actions/workflows/ci.yml/badge.svg)](https://github.com/qubit-ltd/rs-codec-misc/actions/workflows/ci.yml)
[![Coverage](https://img.shields.io/endpoint?url=https://qubit-ltd.github.io/rs-codec-misc/coverage-badge.json)](https://qubit-ltd.github.io/rs-codec-misc/coverage/)
[![Crates.io](https://img.shields.io/crates/v/qubit-codec-misc.svg?color=blue)](https://crates.io/crates/qubit-codec-misc)
[![Rust](https://img.shields.io/badge/rust-1.94+-blue.svg?logo=rust)](https://www.rust-lang.org)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)
[![中文文档](https://img.shields.io/badge/文档-中文版-blue.svg)](README.zh_CN.md)

Reusable byte and text codecs for Rust applications.

## Overview

Qubit Misc Codec provides small, explicit codecs for stable byte and text
encodings commonly needed across Qubit Rust crates and applications. Its API
stays lightweight, typed, and idiomatic, with direct concrete methods for common
use cases and traits for generic boundaries.

This crate focuses on textual encodings with clear wire-format semantics:

- hexadecimal byte strings
- Base64 byte strings
- C integer literal fragments
- C string literal byte fragments
- percent-encoded UTF-8 text
- `application/x-www-form-urlencoded` UTF-8 text fragments

It intentionally does not replace Rust's `Display`, `FromStr`, `TryFrom`, or
`serde` APIs for ordinary object conversion.

## Design Goals

- **Explicit Semantics**: each codec documents its alphabet, separator, padding,
  and decoding rules.
- **Small API Surface**: expose direct `encode` and `decode` methods first, with
  traits available for generic call sites.
- **No Hidden Panics**: malformed input is reported as `MiscCodecError` instead of
  panicking.
- **Layered Traits**: `Codec` covers low-level single-value or quantum
  conversion, while `ValueEncoder` and `ValueDecoder` remain owned whole-value
  convenience traits. Generic adapters, buffered engines, and policy hooks live
  in `qubit-codec`; import them from the core crate when building custom
  adapters around these codecs.
- **Reusable Implementations**: common encodings live in one crate instead of
  being reimplemented by downstream crates.
- **Minimal Dependencies**: rely on well-maintained crates only where they add
  real value.

## Features

### 🔡 **Hexadecimal Bytes**

- **Lowercase by Default**: `HexCodec::new()` produces contiguous lowercase hex.
- **Uppercase Mode**: `HexCodec::upper()` or `with_uppercase(true)` produces
  uppercase digits.
- **Optional Whole Prefix**: add and require a prefix such as `0x` before the
  entire encoded value.
- **Optional Per-Byte Prefix**: add and require a byte prefix such as `0x`
  before each encoded byte.
- **Optional Separator**: write and accept separators between bytes, such as
  `:` or a space.
- **Whitespace Handling**: optionally ignore ASCII whitespace while decoding.
- **Prefix Case Handling**: optionally ignore ASCII case when matching
  configured prefixes while decoding.
- **Buffer APIs**: `encode_into` and `decode_into` append into existing buffers.
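
The decoding rules listed above (required whole prefix, ignored ASCII whitespace, strict digit pairs) can be sketched with the standard library alone. The function below is a hypothetical illustration, not the `HexCodec` implementation: the name `hex_decode`, its signature, and the `String` error type are assumptions made for this example, and it is more lenient than a real codec might be because it skips whitespace anywhere, even between the two digits of one byte.

```rust
/// Hypothetical helper (not part of `qubit-codec-misc`): decodes hex text
/// that must start with `prefix`, skipping ASCII whitespace between digits.
fn hex_decode(text: &str, prefix: &str) -> Result<Vec<u8>, String> {
    // The whole-value prefix is required, mirroring the "add and require" rule.
    let body = text
        .strip_prefix(prefix)
        .ok_or_else(|| format!("missing prefix {prefix:?}"))?;

    let mut out = Vec::new();
    let mut high: Option<u8> = None; // pending high nibble, if any

    // `index` is relative to the text after the prefix.
    for (index, ch) in body.char_indices() {
        if ch.is_ascii_whitespace() {
            continue;
        }
        let digit = ch
            .to_digit(16)
            .ok_or_else(|| format!("invalid hex digit {ch:?} at index {index}"))? as u8;
        match high.take() {
            Some(h) => out.push((h << 4) | digit),
            None => high = Some(digit),
        }
    }

    // A dangling nibble means the input had an odd number of digits.
    if high.is_some() {
        return Err("odd number of hex digits".to_string());
    }
    Ok(out)
}
```

Under these assumptions, `hex_decode("0x1F 8B 00 FF", "0x")` yields the bytes `1f 8b 00 ff`, while input without the prefix or with an odd digit count is rejected instead of panicking.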

### 🔐 **Base64 Bytes**

- **Standard Alphabet**: padded and no-padding standard Base64.
- **URL-Safe Alphabet**: padded and no-padding URL-safe Base64.
- **Quantum Core**: `Base64QuantumCodec` handles complete three-byte to
  four-unit Base64 quanta; final padding stays in the facade/transcoder layer.
- **Typed Errors**: malformed input is reported as `MiscCodecError::InvalidInput`.
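
To make the "three bytes to four units" quantum concrete, here is a minimal standard-library sketch of encoding one complete quantum. It is not the `Base64QuantumCodec` implementation; `encode_quantum`, `STANDARD`, and `URL_SAFE` are hypothetical names chosen for this example, and padding of a final partial group is deliberately left out, as it is in the quantum layer described above.

```rust
// Hypothetical alphabets for illustration; both contain exactly 64 symbols.
const STANDARD: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
const URL_SAFE: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/// Hypothetical helper (not part of `qubit-codec-misc`): encodes one full
/// three-byte quantum into four alphabet units. No padding is produced.
fn encode_quantum(alphabet: &[u8; 64], input: [u8; 3]) -> [u8; 4] {
    // Pack the three bytes into a 24-bit group, most significant byte first.
    let group = (u32::from(input[0]) << 16)
        | (u32::from(input[1]) << 8)
        | u32::from(input[2]);

    // Split the group into four 6-bit indices and map each to the alphabet.
    [
        alphabet[((group >> 18) & 0x3f) as usize],
        alphabet[((group >> 12) & 0x3f) as usize],
        alphabet[((group >> 6) & 0x3f) as usize],
        alphabet[(group & 0x3f) as usize],
    ]
}
```

With the URL-safe alphabet, the bytes `[251, 255, 239]` map to the units `-__v`; with the standard alphabet, `Man` maps to `TWFu`.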

### 🔤 **C String Literal Bytes**

- **Mixed Text and Escapes**: decodes fragments such as `PK\003\004` and
  `\xd0\xcf`.
- **C Escape Support**: handles simple, octal, hexadecimal, and universal byte
  escapes.
- **Byte-Oriented Output**: decodes directly to raw bytes without requiring
  UTF-8.
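
As a rough illustration of byte-oriented escape decoding, the sketch below handles a few simple escapes, octal escapes of up to three digits, and hexadecimal escapes. It is not the `CStringLiteralCodec` implementation: `decode_c_fragment` is a hypothetical name, the error type is a plain `String`, universal escapes are omitted, and hexadecimal escapes are assumed to consume at most two digits, which is a simplification of the C rules.

```rust
/// Hypothetical helper (not part of `qubit-codec-misc`): decodes a C string
/// literal fragment into raw bytes. Assumes `\x` takes at most two digits.
fn decode_c_fragment(text: &str) -> Result<Vec<u8>, String> {
    let bytes = text.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] != b'\\' {
            out.push(bytes[i]); // ordinary byte, copied as-is
            i += 1;
            continue;
        }
        i += 1; // skip the backslash
        let Some(&kind) = bytes.get(i) else {
            return Err("trailing backslash".to_string());
        };
        match kind {
            b'n' => { out.push(b'\n'); i += 1; }
            b'r' => { out.push(b'\r'); i += 1; }
            b't' => { out.push(b'\t'); i += 1; }
            b'\\' | b'"' | b'\'' => { out.push(kind); i += 1; }
            b'0'..=b'7' => {
                // Octal escape: one to three octal digits.
                let mut value: u32 = 0;
                let mut count = 0;
                while count < 3 && i < bytes.len() && (b'0'..=b'7').contains(&bytes[i]) {
                    value = value * 8 + u32::from(bytes[i] - b'0');
                    i += 1;
                    count += 1;
                }
                let byte = u8::try_from(value)
                    .map_err(|_| "octal escape out of byte range".to_string())?;
                out.push(byte);
            }
            b'x' => {
                // Hex escape: one or two hex digits (assumption of this sketch).
                i += 1;
                let mut value: u8 = 0;
                let mut count = 0;
                while count < 2 && i < bytes.len() {
                    let Some(digit) = (bytes[i] as char).to_digit(16) else { break };
                    value = value * 16 + digit as u8;
                    i += 1;
                    count += 1;
                }
                if count == 0 {
                    return Err("hex escape without digits".to_string());
                }
                out.push(value);
            }
            other => return Err(format!("unsupported escape \\{}", other as char)),
        }
    }
    Ok(out)
}
```

Under these assumptions, the fragment `PK\003\004` decodes to the four bytes `50 4b 03 04`, and `\xd0\xcf` decodes to `d0 cf` even though those bytes are not valid UTF-8.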

### 🔢 **C Integer Literals**

- **Radix Detection**: decodes decimal, octal, and `0x`/`0X` hexadecimal
  integer literals.
- **Unsigned Output**: returns `u64` for non-negative integer literal fragments.
- **Precise Errors**: reports invalid digits with their original input index.
- **Value-Token Decode**: remains a `ValueDecoder<str>` convenience codec because
  integer literal encoding strategy and token boundaries are not part of the
  single-value core abstraction yet.
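
The radix detection and indexed error reporting described above can be approximated as follows. This is a standard-library sketch, not the `CIntegerLiteralCodec` implementation: `parse_c_integer` and its `String` errors are assumptions for illustration, and integer suffixes such as `u` or `L` are not handled.

```rust
/// Hypothetical helper (not part of `qubit-codec-misc`): parses a
/// non-negative C integer literal, detecting the radix from its prefix.
fn parse_c_integer(text: &str) -> Result<u64, String> {
    // `offset` is how many prefix bytes were skipped, so that reported
    // indices refer to positions in the original input.
    let (radix, digits, offset) = if let Some(rest) = text
        .strip_prefix("0x")
        .or_else(|| text.strip_prefix("0X"))
    {
        (16, rest, 2)
    } else if text.len() > 1 && text.starts_with('0') {
        (8, &text[1..], 1)
    } else {
        (10, text, 0)
    };

    if digits.is_empty() {
        return Err("empty integer literal".to_string());
    }

    let mut value: u64 = 0;
    for (i, ch) in digits.char_indices() {
        let digit = ch
            .to_digit(radix)
            .ok_or_else(|| format!("invalid digit {ch:?} at index {}", offset + i))?;
        // Checked arithmetic reports overflow instead of panicking or wrapping.
        value = value
            .checked_mul(u64::from(radix))
            .and_then(|v| v.checked_add(u64::from(digit)))
            .ok_or_else(|| "literal overflows u64".to_string())?;
    }
    Ok(value)
}
```

In this sketch, `0123` is read as octal and yields `83`, while `089` fails with an error that points at index 1, the position of the `8` in the original text.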

### 🌐 **Percent-Encoding**

- **UTF-8 Text**: encodes and decodes UTF-8 strings.
- **RFC 3986 Unreserved Set**: leaves ASCII letters, digits, `-`, `.`, `_`, and
  `~` unchanged.
- **Uppercase Escapes**: writes percent escapes such as `%2F` and `%E4`.
- **Malformed Escape Detection**: reports truncated or invalid `%XX` sequences.
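
The encoding side of these rules is small enough to sketch directly. The function below is an illustrative standard-library version, not the `PercentCodec` implementation; `percent_encode` is a hypothetical name. It keeps the RFC 3986 unreserved set unchanged and writes every other UTF-8 byte as an uppercase `%XX` escape.

```rust
/// Hypothetical helper (not part of `qubit-codec-misc`): percent-encodes
/// UTF-8 text, leaving only the RFC 3986 unreserved characters unescaped.
fn percent_encode(text: &str) -> String {
    const HEX: &[u8; 16] = b"0123456789ABCDEF";
    let mut out = String::with_capacity(text.len());
    for &byte in text.as_bytes() {
        if byte.is_ascii_alphanumeric() || matches!(byte, b'-' | b'.' | b'_' | b'~') {
            // Unreserved ASCII: copied unchanged.
            out.push(byte as char);
        } else {
            // Everything else, including each byte of a multi-byte UTF-8
            // sequence, becomes an uppercase %XX escape.
            out.push('%');
            out.push(HEX[(byte >> 4) as usize] as char);
            out.push(HEX[(byte & 0x0f) as usize] as char);
        }
    }
    out
}
```

For example, `percent_encode("a b/中")` produces `a%20b%2F%E4%B8%AD`: the space and slash are escaped individually, and the three UTF-8 bytes of `中` become three escapes.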

### 📝 **Form URL Encoding**

- **Form Fragment Codec**: handles `application/x-www-form-urlencoded` text
  fragments.
- **Space as Plus**: encodes spaces as `+` and decodes `+` back to spaces.
- **Percent Compatibility**: shares the same UTF-8 and `%XX` validation behavior
  as `PercentCodec`.
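
A decoder that follows these rules might look like the sketch below: `+` becomes a space, `%XX` escapes are validated, and the resulting bytes must form valid UTF-8. This is an illustration written against the standard library only, not the `FormUrlencodedCodec` implementation; `form_decode` and its `String` errors are assumptions for the example.

```rust
/// Hypothetical helper (not part of `qubit-codec-misc`): decodes an
/// `application/x-www-form-urlencoded` text fragment into UTF-8 text.
fn form_decode(text: &str) -> Result<String, String> {
    let bytes = text.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'+' => {
                // Form encoding writes spaces as `+`.
                out.push(b' ');
                i += 1;
            }
            b'%' => {
                // An escape needs exactly two hex digits after the `%`.
                let pair = bytes
                    .get(i + 1..i + 3)
                    .ok_or_else(|| format!("truncated escape at index {i}"))?;
                let high = (pair[0] as char).to_digit(16);
                let low = (pair[1] as char).to_digit(16);
                match (high, low) {
                    (Some(h), Some(l)) => out.push((h * 16 + l) as u8),
                    _ => return Err(format!("invalid escape at index {i}")),
                }
                i += 3;
            }
            other => {
                out.push(other);
                i += 1;
            }
        }
    }
    // Escapes may have produced arbitrary bytes, so UTF-8 is checked last.
    String::from_utf8(out).map_err(|_| "decoded bytes are not valid UTF-8".to_string())
}
```

With this sketch, `name%3DQubit+Codec` decodes to `name=Qubit Codec`, while `%4`, `%ZZ`, and `%FF` are each rejected (truncated escape, invalid digits, and invalid UTF-8 respectively).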

### 🎯 **Focused Public API**

- **`ValueEncoder<Input>`**: encodes borrowed input into an associated output type.
- **`ValueDecoder<Input>`**: decodes borrowed input into an associated output type.
- **`Codec` with associated `Value` and `Unit`**: low-level unsafe trait for one value or one codec
  quantum over caller-provided unit buffers.
- **`CodecValueEncoder<C>` / `CodecBufferedEncoder<C>` /
  `CodecBufferedDecoder<C>`**: default value and buffered adapters
  available from `qubit-codec`.
- **`BufferedEncodeEngine` / `BufferedEncodeHooks` /
  `BufferedDecodeEngine` / `BufferedDecodeHooks`**: reusable buffered engines
  and policy hooks available from `qubit-codec` for custom adapters.
- **`MiscCodecError` / `MiscCodecResult`**: common error and result types for bundled
  codecs.

## Installation

Add this to your `Cargo.toml`:

```toml
[dependencies]
qubit-codec-misc = "0.1"
```

## Quick Start

### Hexadecimal Bytes

```rust
use qubit_codec_misc::HexCodec;

fn main() {
    let codec = HexCodec::upper()
        .with_prefix("0x")
        .with_separator(" ");

    let encoded = codec.encode(&[0x1f, 0x8b, 0x00, 0xff]);
    assert_eq!("0x1F 8B 00 FF", encoded);

    let decoded = codec
        .decode("0x1F 8B 00 FF")
        .expect("hex text should decode");
    assert_eq!(vec![0x1f, 0x8b, 0x00, 0xff], decoded);
}
```

### Base64 Bytes

```rust
use qubit_codec_misc::Base64Codec;

fn main() {
    let codec = Base64Codec::standard();

    let encoded = codec.encode(b"hello");
    assert_eq!("aGVsbG8=", encoded);

    let decoded = codec
        .decode("aGVsbG8=")
        .expect("Base64 text should decode");
    assert_eq!(b"hello".to_vec(), decoded);
}
```

### URL-Safe Base64 Without Padding

```rust
use qubit_codec_misc::Base64Codec;

fn main() {
    let codec = Base64Codec::url_safe_no_pad();

    let encoded = codec.encode(&[251, 255, 239]);
    assert_eq!("-__v", encoded);

    let decoded = codec
        .decode("-__v")
        .expect("URL-safe Base64 text should decode");
    assert_eq!(vec![251, 255, 239], decoded);
}
```

### C String Literal Bytes

```rust
use qubit_codec_misc::CStringLiteralCodec;

fn main() {
    let codec = CStringLiteralCodec::new();

    let decoded = codec
        .decode(r"PK\003\004")
        .expect("C string literal should decode");
    assert_eq!(b"PK\x03\x04".to_vec(), decoded);

    let encoded = codec.encode(&[0xd0, 0xcf, 0x11, 0xe0]);
    assert_eq!(r"\xD0\xCF\x11\xE0", encoded);
}
```

### C Integer Literals

```rust
use qubit_codec_misc::CIntegerLiteralCodec;

fn main() {
    let codec = CIntegerLiteralCodec::new();

    assert_eq!(123, codec.decode("123").expect("decimal should decode"));
    assert_eq!(83, codec.decode("0123").expect("octal should decode"));
    assert_eq!(
        0xbeef_c0de,
        codec.decode("0xBEEFC0DE").expect("hex should decode")
    );
}
```

### Percent-Encoding UTF-8 Text

```rust
use qubit_codec_misc::PercentCodec;

fn main() {
    let codec = PercentCodec::new();

    let encoded = codec.encode("a b/中");
    assert_eq!("a%20b%2F%E4%B8%AD", encoded);

    let decoded = codec
        .decode("a%20b%2F%E4%B8%AD")
        .expect("percent-encoded text should decode");
    assert_eq!("a b/中", decoded);
}
```

### Form URL Encoding

```rust
use qubit_codec_misc::FormUrlencodedCodec;

fn main() {
    let codec = FormUrlencodedCodec::new();

    let encoded = codec.encode("name=Qubit Codec");
    assert_eq!("name%3DQubit+Codec", encoded);

    let decoded = codec
        .decode("name%3DQubit+Codec")
        .expect("form-url-encoded text should decode");
    assert_eq!("name=Qubit Codec", decoded);
}
```

### Generic Trait Usage

Use the traits when application code should depend on an encoding capability
instead of a concrete codec type.

```rust
use qubit_codec_misc::{
    MiscCodecError,
    ValueEncoder,
    HexCodec,
};

fn encode_payload<C>(codec: &C, payload: &[u8]) -> Result<String, MiscCodecError>
where
    C: ValueEncoder<[u8], Output = String, Error = MiscCodecError>,
{
    codec.encode(payload)
}

fn main() {
    let text = encode_payload(&HexCodec::new(), &[0xab, 0xcd])
        .expect("hex encoding should not fail");
    assert_eq!("abcd", text);
}
```

## API Reference

### Trait Operations

| Trait | Method | Description |
|-------|--------|-------------|
| `ValueEncoder<Input>` | `encode(&Input)` | Encode borrowed input into an associated output type |
| `ValueDecoder<Input>` | `decode(&Input)` | Decode borrowed input into an associated output type |
| `Codec` with associated `Value` and `Unit` | `decode_unchecked`, `encode_unchecked` | Convert one value or codec quantum against caller-provided unit buffers |

The low-level `Codec` implementations intentionally exclude facade concerns:
hex prefix/separator handling, UTF-8 `String` validation, and Base64 final
padding are handled by value helpers or future buffered layers.

### `HexCodec` Operations

| Method | Description |
|--------|-------------|
| `new()` | Create a lowercase codec without prefix or separators |
| `upper()` | Create an uppercase codec without prefix or separators |
| `with_uppercase(enabled)` | Configure digit case |
| `with_prefix(prefix)` | Add and require a whole-output prefix, such as `0x1F8B` |
| `with_byte_prefix(prefix)` | Add and require a prefix before every byte, such as `0x1F 0x8B` |
| `with_separator(separator)` | Add and accept a separator between bytes |
| `with_ignored_ascii_whitespace(enabled)` | Ignore ASCII whitespace while decoding |
| `with_ignore_prefix_case(enabled)` | Ignore ASCII case when matching configured prefixes while decoding |
| `encode(bytes)` | Encode bytes into hexadecimal text |
| `encode_into(bytes, output)` | Append encoded text into an existing `String` |
| `decode(text)` | Decode hexadecimal text into bytes |
| `decode_into(text, output)` | Append decoded bytes into an existing `Vec<u8>` |

### `Base64Codec` Operations

| Method | Alphabet | Padding | Description |
|--------|----------|---------|-------------|
| `standard()` | Standard | Yes | Create standard Base64 codec |
| `standard_no_pad()` | Standard | No | Create standard Base64 codec without padding |
| `url_safe()` | URL-safe | Yes | Create URL-safe Base64 codec |
| `url_safe_no_pad()` | URL-safe | No | Create URL-safe Base64 codec without padding |
| `encode(bytes)` | Configured | Configured | Encode bytes into Base64 text |
| `decode(text)` | Configured | Configured | Decode Base64 text into bytes |

### `Base64QuantumCodec` Operations

| Method | Alphabet | Units | Description |
|--------|----------|-------|-------------|
| `standard()` | Standard | 4 | Create a standard Base64 quantum codec |
| `url_safe()` | URL-safe | 4 | Create a URL-safe Base64 quantum codec |
| `Codec<Value = [u8; 3], Unit = u8>` | Configured | 4 | Encode or decode one complete Base64 quantum without padding finalization |

### `CStringLiteralCodec` Operations

| Method | Description |
|--------|-------------|
| `new()` | Create a C string literal byte codec |
| `encode(bytes)` | Encode bytes into a C string literal fragment |
| `decode(text)` | Decode a C string literal fragment into bytes |

### `CIntegerLiteralCodec` Operations

| Method | Description |
|--------|-------------|
| `new()` | Create a C integer literal decoder |
| `decode(text)` | Decode a non-negative C integer literal fragment into `u64` |

`CIntegerLiteralCodec` intentionally remains a value-token decoder. It does not
implement `Codec<Value = u64, Unit = u8>` yet, because doing so would require committing to
token-boundary and encode-format policies that belong above the single-value core.

### Text Codec Operations

| Type | Method | Description |
|------|--------|-------------|
| `PercentCodec` | `new()` | Create a percent codec |
| `PercentCodec` | `encode(text)` | Encode UTF-8 text using percent encoding |
| `PercentCodec` | `decode(text)` | Decode percent-encoded UTF-8 text |
| `FormUrlencodedCodec` | `new()` | Create a form-url-encoded codec |
| `FormUrlencodedCodec` | `encode(text)` | Encode UTF-8 text, using `+` for spaces |
| `FormUrlencodedCodec` | `decode(text)` | Decode UTF-8 text, treating `+` as spaces |

## Error Handling

Bundled decoders return `MiscCodecResult<T>`, an alias for
`Result<T, MiscCodecError>`.

| Error | Meaning |
|-------|---------|
| `MissingPrefix` | A configured whole or per-byte hex prefix was required but missing |
| `InvalidDigit` | Input contained a digit that is invalid for the requested radix |
| `InvalidLength` | Input length does not satisfy a codec requirement |
| `InvalidEscape` | Input contained a malformed or unsupported escape sequence |
| `InvalidCharacter` | Input contained a character that cannot appear in that context |
| `InvalidInput` | Input was rejected by a codec-specific validator |
| `InvalidUtf8` | Decoded bytes were not valid UTF-8 |

## Performance Considerations

Codec implementations operate on borrowed byte slices or strings and return
owned output only when the target format requires it. Configuration is stored in
small value types, and generic trait use does not require dynamic dispatch.

## Testing & Code Coverage

This project keeps codec behavior covered by integration tests under `tests/`.

### Running Tests

```bash
# Run all tests
cargo test

# Run with coverage report
./coverage.sh

# Generate text format report
./coverage.sh text

# Align code with CI requirements
./align-ci.sh

# Run CI checks (format, clippy, test, coverage, audit)
./ci-check.sh
```

## Dependencies

Runtime dependencies are intentionally small:

- `base64` provides the Base64 engines.
- `thiserror` provides the public error type implementation.

## License

Copyright (c) 2026. Haixing Hu.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

See [LICENSE](LICENSE) for the full license text.

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

### Development Guidelines

- Follow Rust API Guidelines.
- Keep tests comprehensive and deterministic.
- Document public APIs and behavior changes.
- Ensure all checks pass before submitting a PR.

## Author

**Haixing Hu**

## Related Projects

More Rust libraries from Qubit are available under the
[qubit-ltd](https://github.com/qubit-ltd) GitHub organization.

---

Repository: [https://github.com/qubit-ltd/rs-codec-misc](https://github.com/qubit-ltd/rs-codec-misc)