fmtbuf
======
Write a formatted string into a fixed buffer.
This is useful when you have a user-provided buffer you want to write into, which frequently arises when writing foreign
function interfaces for C, where strings are expected to have a null terminator.
```rust
use fmtbuf::WriteBuf;
use core::fmt::Write;

let mut buf: [u8; 10] = [0; 10];
let mut writer = WriteBuf::new(&mut buf);
if let Err(e) = write!(&mut writer, "🚀🚀🚀") {
    println!("write error: {e:?}");
}
let written = match writer.finish_with_or("!", "…") {
    Ok(s) => s, // <- won't be hit since 🚀🚀🚀 is 12 bytes
    Err(e) => {
        println!("writing was truncated");
        e.written()
    }
};
assert_eq!("🚀…", written);
```
A few things happened in that example:
1. We started with a 10 byte buffer
2. Tried to write `"🚀🚀🚀"` to it, which is encoded as 3 `b"\xf0\x9f\x9a\x80"`s (12 bytes)
3. This can't fit into 10 bytes, so only `"🚀🚀"` is stored and the `writer` is noted as having truncated writes
4. We finish the buffer with `"!"` on success or `"…"` (a.k.a. `b"\xe2\x80\xa6"`) on truncation
5. Since we noted truncation in step #3, we try to write `"…"`, but this cannot fit into the buffer either, since
   8 (`"🚀🚀".len()`) + 3 (`"…".len()`) > 10 (`buf.len()`)
6. Roll the buffer back to the end of the first 🚀, then add …, leaving us with `"🚀…"`
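
The byte arithmetic behind steps 2 through 6 can be checked with the standard library alone. The sketch below does not use `fmtbuf`; `fits` and `check_walkthrough` are hypothetical helpers introduced only for illustration, and the two constants are the code points whose encodings are quoted in the list above.

```rust
// U+1F680 encodes as b"\xf0\x9f\x9a\x80" (4 bytes).
const ROCKET: &str = "\u{1F680}";
// U+2026 encodes as b"\xe2\x80\xa6" (3 bytes).
const ELLIPSIS: &str = "\u{2026}";

/// Hypothetical helper: does `body` followed by `suffix` fit in
/// `capacity` bytes?
fn fits(body: &str, suffix: &str, capacity: usize) -> bool {
    body.len() + suffix.len() <= capacity
}

/// Re-checks the sizes from the walkthrough for a 10 byte buffer.
fn check_walkthrough() {
    let capacity = 10;
    assert_eq!(ROCKET.len(), 4);
    assert_eq!(ELLIPSIS.len(), 3);
    // Step 3: three rockets (12 bytes) do not fit, two (8 bytes) do.
    assert!(!fits(&ROCKET.repeat(3), "", capacity));
    assert!(fits(&ROCKET.repeat(2), "", capacity));
    // Step 5: two rockets plus the ellipsis need 11 bytes.
    assert!(!fits(&ROCKET.repeat(2), ELLIPSIS, capacity));
    // Step 6: one rocket plus the ellipsis need only 7 bytes.
    assert!(fits(ROCKET, ELLIPSIS, capacity));
}
```

Calling `check_walkthrough()` passes silently, which confirms that the rollback in step 6 has to drop a whole 4 byte code point to make room for the 3 byte suffix.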
Usage
-----
The primary use case is for implementing APIs like [`strerror_r`](https://linux.die.net/man/3/strerror_r), where the
user provides the buffer.
```rust
use std::{ffi, fmt::Write, io::Error};
use fmtbuf::{TruncatedResultExt, WriteBuf};

#[no_mangle]
pub unsafe extern "C" fn mylib_strerror(
    err: *mut Error,
    buf: *mut ffi::c_char,
    buf_len: usize
) {
    let mut buf = unsafe {
        // Buffer provided by user
        std::slice::from_raw_parts_mut(buf as *mut u8, buf_len)
    };
    // Reserve at least 1 byte at the end because we will always
    // write '\0'
    let mut writer = WriteBuf::with_reserve(buf, 1);

    // Use the standard `write!` macro (no error handling for
    // brevity) -- note that an error here might only indicate
    // write truncation, which is handled gracefully by this
    // library's finish___ functions
    let _ = write!(writer, "{}", err.as_ref().unwrap());

    // null-terminate buffer or add "..." if it was truncated.
    // `TruncatedResultExt` extracts the written `&str` (and its
    // byte length) regardless of whether truncation occurred --
    // useful for setting an FFI out-param.
    let result = writer.finish_with_or("\0", "...\0");
    let _written = result.written();
    let _written_len = result.written_len();
}
```
Features
--------
### `#![no_std]`
Support for `#![no_std]` is enabled by disabling the default features and not re-enabling the `"std"` feature.
```toml
fmtbuf = { version = "*", default-features = false }
```
F.A.Q.
------
### Why not write to `&mut [u8]`?
The Rust Standard Library trait [`std::io::Write`](https://doc.rust-lang.org/stable/std/io/trait.Write.html) is
implemented for [`&mut [u8]`](https://doc.rust-lang.org/stable/std/io/trait.Write.html#impl-Write-for-%26mut+%5Bu8%5D)
which could be used instead of this library.
The problem with this approach is the lack of UTF-8 encoding support (also, it is not available in `#![no_std]`).
```rust
use std::io::{Cursor, Write};

fn main() {
    let mut buf: [u8; 10] = [0; 10];
    let mut writer = Cursor::<&mut [u8]>::new(&mut buf);
    if let Err(e) = write!(&mut writer, "rocket: 🚀") {
        println!("write error: {e:?}");
    }
    let written_len = writer.position() as usize;
    let written = &buf[..written_len];
    println!("wrote {written_len} bytes: {written:?}");
    println!("result: {:?}", std::str::from_utf8(written));
}
```
Running this program will show you the error:
```text
write error: Error { kind: WriteZero, message: "failed to write whole buffer" }
wrote 10 bytes: [114, 111, 99, 107, 101, 116, 58, 32, 240, 159]
result: Err(Utf8Error { valid_up_to: 8, error_len: None })
```
The problem is that `"rocket: 🚀"` is encoded as a 12 byte sequence -- the 🚀 emoji is encoded in UTF-8 as the 4 bytes
`b"\xf0\x9f\x9a\x80"` -- but our target buffer is only 10 bytes long.
The `write!` to the cursor naïvely cuts off the 🚀 mid-encode, making the encoded string invalid UTF-8, even though it
advanced the cursor the entire 10 bytes.
This is expected, since `std::io::Write` comes from `io` and does not know anything about string encoding; it operates
on the `u8` level.
One _could_ use the [`std::str::Utf8Error`](https://doc.rust-lang.org/stable/std/str/struct.Utf8Error.html) to properly
cut off the `buf`.
The only issue with this is performance.
Since `std::str::from_utf8` scans the whole string moving forward, it costs _O(n)_ to test this, whereas `fmtbuf` will
do this in _O(1)_, since it only looks at the final few bytes.
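
The two strategies can be sketched side by side with the standard library alone. Neither function below is `fmtbuf`'s implementation; `valid_prefix_by_scan` and `valid_len_by_tail` are hypothetical names, and the tail-based version assumes its input is a byte-level truncation of an otherwise valid UTF-8 string, so only the final code point can be incomplete.

```rust
/// O(n): validate the whole slice, then cut at `valid_up_to`.
fn valid_prefix_by_scan(bytes: &[u8]) -> &str {
    match std::str::from_utf8(bytes) {
        Ok(s) => s,
        Err(e) => std::str::from_utf8(&bytes[..e.valid_up_to()])
            .expect("bytes before valid_up_to are valid UTF-8"),
    }
}

/// O(1): inspect at most the last 4 bytes to find where the final
/// code point starts and whether it is complete.
/// Assumption: `bytes` is a truncated prefix of valid UTF-8.
fn valid_len_by_tail(bytes: &[u8]) -> usize {
    let len = bytes.len();
    for back in 1..=len.min(4) {
        let idx = len - back;
        let b = bytes[idx];
        // Continuation bytes look like 0b10xx_xxxx; skip over them.
        if b & 0xC0 != 0x80 {
            // `b` starts the final code point; how long should it be?
            let need = match b {
                0x00..=0x7F => 1,
                0xC0..=0xDF => 2,
                0xE0..=0xEF => 3,
                _ => 4,
            };
            // Keep the code point only if all of its bytes are present.
            return if back >= need { len } else { idx };
        }
    }
    // Empty input (or input that violates the assumption above).
    len
}
```

For the 10 byte buffer from the `Cursor` example, both approaches agree that only the first 8 bytes (`"rocket: "`) should be kept; the difference is that the second one never reads bytes 0 through 5.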
### What about Unicode _weird format characters_?
This library only guarantees that the contents of the target buffer are valid UTF-8.
It makes no guarantees about the semantics of text that is truncated in the middle of a sequence relying on Unicode
format characters, specifically `U+200D`, `U+200E`, and `U+200F`.
**What?**
If you don't know what those are, that's okay.
Suffice it to say that human language is complicated and Unicode has a set of features to make things possible, but when
you run out of space to store them in your fixed-size buffer, things go awry.
If you're looking for details, see the mini sections below.
#### `U+200D`: Zero Width Joiner
Certain graphemes like "🙇‍♀" (which you might see as two separate graphemes) are comprised of three code points:
1. 🙇 [`U+1F647` "Person Bowing Deeply"](https://codepoints.net/U+1F647)
2. [`U+200D` "Zero Width Joiner"](https://codepoints.net/U+200D)
3. ♀ [`U+2640` "Female Sign"](https://codepoints.net/U+2640)
So the single grapheme is the 10 byte sequence `b"\xf0\x9f\x99\x87\xe2\x80\x8d\xe2\x99\x80"`.
The question arises: What should happen if the buffer size is only 9?
**On truncation, this library will discard code points which are meant to be modifiers.**
This library will truncate the last Unicode code point, leaving you with `b"\xf0\x9f\x99\x87\xe2\x80\x8d"`: a person
bowing and a zero-width joiner joining with nothing, as the female modifier cannot fit.
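
The 9 byte case can be reproduced without `fmtbuf` by cutting the string at the last code point boundary that fits. This is a minimal sketch of that behaviour, not the library's code; `prefix_that_fits` is a hypothetical helper.

```rust
/// Hypothetical helper: the longest prefix of `s` that is at most
/// `max_bytes` long and ends on a code point boundary.
fn prefix_that_fits(s: &str, max_bytes: usize) -> &str {
    let mut end = max_bytes.min(s.len());
    // Index 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

With `let grapheme = "\u{1F647}\u{200D}\u{2640}";` (10 bytes), `prefix_that_fits(grapheme, 9)` returns the first 7 bytes, and the final `char` of that result is `'\u{200D}'`: valid UTF-8, but a joiner with nothing left to join.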
#### `U+200E` and `U+200F`: Direction Markers
Consider Arabic, which is a right-to-left language:
> ‏آمل أن يحل ‎Rust‏ محل ‎C++‏ يومًا ما.‏
Depending on how compliant with right-to-left presentation your text editor or browser is, you might see that text any
number of ways (if "آمل" appears on the right-hand side of the text, then the presentation is working).
But note the borrowed words "Rust" and "C++" are still spelled in a left-to-right manner within the right-to-left text
(or they _should_ be).
This is done by encoding [`U+200E` left-to-right mark](https://codepoints.net/U+200E), then writing the borrowed
text, then [`U+200F` right-to-left mark](https://codepoints.net/U+200F) to continue.
What happens if text is reversed, but there is not enough space in the buffer to flip it back?
**On truncation, this library might leave you in the middle of a text-reversed run.**
The construction of [Egyptian Hieroglyphs](https://codepoints.net/egyptian_hieroglyphs) and other languages of this sort
face a similar issue.
Where should the cutoff be?
This library does not know the difference between a complete sequence of hieroglyphs and the same sequence with its
final glyph cut off.
Figuring that out is the responsibility of a higher-level construct.
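
As one illustration of what such a higher-level check might look like for direction marks, the sketch below follows this section's simplified model, in which `U+200E` opens a borrowed left-to-right run and `U+200F` resumes right-to-left text. `ends_in_open_ltr_run` is a hypothetical helper, not part of `fmtbuf`, and it is not a substitute for the full Unicode bidirectional algorithm.

```rust
const LRM: char = '\u{200E}'; // left-to-right mark
const RLM: char = '\u{200F}'; // right-to-left mark

/// Hypothetical check: true when the last direction mark in `s` is
/// U+200E, meaning a borrowed run was opened but the mark that
/// resumes right-to-left text did not make it into the buffer.
fn ends_in_open_ltr_run(s: &str) -> bool {
    s.chars().rev().find(|c| *c == LRM || *c == RLM) == Some(LRM)
}
```

A caller could run this on the `&str` it gets back after truncation and decide whether to append a closing mark, drop the partial run, or leave the text alone.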
#### οΏ½
This library implements [`core::fmt::Write`](https://doc.rust-lang.org/stable/core/fmt/trait.Write.html), which only
accepts UTF-8-encoded data.
There is no place for οΏ½ in this library.
However, the result of a truncated run might be replaced by οΏ½ for presentation at a higher level.