Crate fmtbuf

§fmtbuf

Write a formatted string into a fixed buffer. This is useful when you have a user-provided buffer you want to write into, which frequently arises when writing foreign function interfaces for C, where strings are expected to have a null terminator.

use fmtbuf::WriteBuf;
use core::fmt::Write;

let mut buf: [u8; 10] = [0; 10];
let mut writer = WriteBuf::new(&mut buf);
if let Err(e) = write!(&mut writer, "🚀🚀🚀") {
    println!("write error: {e:?}");
}
let written = match writer.finish_with_or("!", "…") {
    Ok(s) => s, // <- won't be hit since 🚀🚀🚀 is 12 bytes
    Err(e) => {
        println!("writing was truncated");
        e.written()
    }
};
assert_eq!("🚀…", written);

A few things happened in that example:

  1. We started with a 10-byte buffer
  2. Tried to write "🚀🚀🚀" to it, which is encoded as 3 b"\xf0\x9f\x9a\x80"s (12 bytes)
  3. This can’t fit into 10 bytes, so only "🚀🚀" is stored and the writer is noted as having truncated writes
  4. We finish the buffer with "!" on success or "…" (a.k.a. b"\xe2\x80\xa6") on truncation
  5. Since we noted truncation in step #3, we try to write "…", but this cannot fit into the buffer either, since 8 ("🚀🚀".len()) + 3 ("…".len()) > 10 (buf.len())
  6. Roll the buffer back to the end of the first 🚀, then add …, leaving us with "🚀…"
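
The rollback in steps 3 through 6 can be approximated with the standard library alone. The following is a minimal sketch of the idea, not fmtbuf's actual implementation: `truncate_to_fit` and `finish_truncated` are hypothetical helper names introduced here, and the sketch allocates a `String` for simplicity, which the real library does not need to do.

```rust
/// Longest prefix of `s` that fits in `max_len` bytes without splitting a
/// UTF-8 code point. (Hypothetical helper, not part of fmtbuf.)
fn truncate_to_fit(s: &str, max_len: usize) -> &str {
    if s.len() <= max_len {
        return s;
    }
    let mut end = max_len;
    // Back up until `end` lands on a code point boundary.
    // Index 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}

/// Keep as much of `text` as possible while leaving room for `suffix`
/// within `capacity` bytes, then append `suffix`.
/// (Hypothetical helper; assumes `suffix.len() <= capacity`.)
fn finish_truncated(text: &str, suffix: &str, capacity: usize) -> String {
    let kept = truncate_to_fit(text, capacity - suffix.len());
    format!("{kept}{suffix}")
}
```

With the numbers from the example, `truncate_to_fit("🚀🚀🚀", 10)` keeps "🚀🚀" (8 bytes), and `finish_truncated("🚀🚀🚀", "…", 10)` rolls back to one 🚀 (4 bytes) before appending … (3 bytes), giving "🚀…".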

§Usage

The primary use case is for implementing APIs like strerror_r, where the user provides the buffer.

use std::{ffi, fmt::Write, io::Error};
use fmtbuf::{TruncatedResultExt, WriteBuf};

#[no_mangle]
pub unsafe extern "C" fn mylib_strerror(
    err: *mut Error,
    buf: *mut ffi::c_char,
    buf_len: usize
) {
    let buf = unsafe {
        // Buffer provided by user
        std::slice::from_raw_parts_mut(buf as *mut u8, buf_len)
    };
    // Reserve at least 1 byte at the end because we will always
    // write '\0'
    let mut writer = WriteBuf::with_reserve(buf, 1);

    // Use the standard `write!` macro (no error handling for
    // brevity) -- note that an error here might only indicate
    // write truncation, which is handled gracefully by this
    // library's finish___ functions
    let _ = write!(writer, "{}", err.as_ref().unwrap());

    // null-terminate buffer or add "..." if it was truncated.
    // `TruncatedResultExt` extracts the written `&str` (and its
    // byte length) regardless of whether truncation occurred --
    // useful for setting an FFI out-param.
    let result = writer.finish_with_or("\0", "...\0");
    let _written = result.written();
    let _written_len = result.written_len();
}

§Features

§#![no_std]

Support for #![no_std] is enabled by disabling the default features and not re-enabling the "std" feature.

fmtbuf = { version = "*", default-features = false }

§F.A.Q.

§Why not write to &mut [u8]?

The Rust Standard Library trait std::io::Write is implemented for &mut [u8], which could be used instead of this library. The problem with this approach is the lack of UTF-8 encoding support (also, it is not available in #![no_std]).

use std::io::{Cursor, Write};

fn main() {
    let mut buf: [u8; 10] = [0; 10];
    let mut writer = Cursor::<&mut [u8]>::new(&mut buf);
    if let Err(e) = write!(&mut writer, "rocket: 🚀") {
        println!("write error: {e:?}");
    }
    let written_len = writer.position() as usize;
    let written = &buf[..written_len];
    println!("wrote {written_len} bytes: {written:?}");
    println!("result: {:?}", std::str::from_utf8(written));
}

Running this program will show you the error:

write error: Error { kind: WriteZero, message: "failed to write whole buffer" }
wrote 10 bytes: [114, 111, 99, 107, 101, 116, 58, 32, 240, 159]
result: Err(Utf8Error { valid_up_to: 8, error_len: None })

The problem is that "rocket: 🚀" is encoded as a 12-byte sequence (the 🚀 emoji is encoded in UTF-8 as the 4 bytes b"\xf0\x9f\x9a\x80"), but our target buffer is only 10 bytes long. The write! to the cursor naïvely cuts off the 🚀 mid-encode, making the encoded string invalid UTF-8, even though it advanced the cursor the entire 10 bytes. This is expected, since std::io::Write comes from io and does not know anything about string encoding; it operates on the u8 level.

One could use the std::str::Utf8Error returned by std::str::from_utf8 (specifically its valid_up_to() method) to cut off the buffer at the end of the last complete code point. The only issue with this is performance. Since std::str::from_utf8 scans the whole string moving forward, it costs O(n) to test this, whereas fmtbuf will do this in O(1), since it only looks at the final few bytes.
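
To make the comparison concrete, here is a hedged sketch of both approaches using only the standard library. `valid_prefix_forward` is the O(n) route through `Utf8Error::valid_up_to`; `complete_prefix_len` illustrates the idea of inspecting only the final few bytes. Both function names are hypothetical, and the second is a sketch of the general technique rather than fmtbuf's actual code. It assumes the bytes are a cut-off copy of valid UTF-8, so only the final code point can be incomplete.

```rust
/// Trim `bytes` to its longest valid UTF-8 prefix using `Utf8Error`.
/// `from_utf8` scans from the start, so this costs O(n).
fn valid_prefix_forward(bytes: &[u8]) -> &str {
    match std::str::from_utf8(bytes) {
        Ok(s) => s,
        // `valid_up_to` is the byte length of the longest valid prefix,
        // so re-validating that prefix cannot fail.
        Err(e) => std::str::from_utf8(&bytes[..e.valid_up_to()]).unwrap(),
    }
}

/// Length of the prefix of `bytes` that ends on a complete code point.
/// Assumes everything before the final code point is valid UTF-8.
/// Looks at no more than the last 4 bytes, so it costs O(1).
fn complete_prefix_len(bytes: &[u8]) -> usize {
    let len = bytes.len();
    for back in 1..=len.min(4) {
        let b = bytes[len - back];
        if b & 0b1100_0000 == 0b1000_0000 {
            continue; // continuation byte: keep looking for the lead byte
        }
        // `b` starts a code point; work out how many bytes it needs.
        let need = if b < 0x80 {
            1
        } else if b >= 0xF0 {
            4
        } else if b >= 0xE0 {
            3
        } else {
            2
        };
        // Complete if all needed bytes are present, otherwise drop it.
        return if need == back { len } else { len - back };
    }
    // Only reached for an empty slice under the stated assumption.
    len
}
```

For the 10 bytes written by the cursor example, both functions agree that only the first 8 bytes ("rocket: ") form complete code points.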

§What about weird Unicode format characters?

This library only guarantees that the contents of the target buffer are valid UTF-8. It does not make any guarantees about the semantics resulting from truncation around Unicode format characters, specifically U+200D, U+200E, and U+200F.

What?

If you don’t know what those are, that’s okay. Suffice it to say that human language is complicated and Unicode has a set of features to make things possible, but when you run out of space to store that in your fixed-size buffer, things go awry. If you’re looking for details, see the mini sections below.

§U+200D: Zero Width Joiner

Certain graphemes like “🙇‍♀” (which you might see as two separate graphemes) are composed of three code points:

  1. 🙇 U+1F647 “Person Bowing Deeply”
  2. U+200D “Zero Width Joiner”
  3. ♀ U+2640 “Female Sign”

So the single grapheme is the 10-byte sequence b"\xf0\x9f\x99\x87\xe2\x80\x8d\xe2\x99\x80". The question arises: What should happen if the buffer size is only 9? On truncation, this library discards whole code points that do not fit, including ones that are meant to be modifiers. Here it will truncate the last Unicode code point, leaving you with b"\xf0\x9f\x99\x87\xe2\x80\x8d": a person bowing and a zero-width joiner joining with nothing, as the female modifier cannot fit.
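
The byte counts above can be checked with a small standard-library sketch. `fit_code_points` is a hypothetical helper written for this illustration (it is not fmtbuf's API); it keeps whole code points until the next one would exceed the byte limit, which is the code-point-level truncation described above.

```rust
/// Longest prefix of `s` made of whole code points that fits in `max_len`
/// bytes. (Hypothetical helper, not part of fmtbuf.)
fn fit_code_points(s: &str, max_len: usize) -> &str {
    let mut end = 0;
    for (start, ch) in s.char_indices() {
        let next = start + ch.len_utf8();
        if next > max_len {
            break; // this code point does not fit; stop before it
        }
        end = next;
    }
    &s[..end]
}

/// The "woman bowing" sequence: U+1F647, U+200D (ZWJ), U+2640.
const BOWING: &str = "\u{1F647}\u{200D}\u{2640}";
```

`BOWING.len()` is 10 (4 + 3 + 3 bytes), and `fit_code_points(BOWING, 9)` returns the 7-byte string "\u{1F647}\u{200D}": the joiner survives, but the female sign it was meant to join does not.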

§U+200E and U+200F: Direction Markers

Consider Arabic, which is a right-to-left language:

โ€ุขู…ู„ ุฃู† ูŠุญู„ โ€ŽRustโ€ ู…ุญู„ โ€ŽC++โ€ ูŠูˆู…ู‹ุง ู…ุง.โ€Ž

Depending on how compliant with right-to-left presentation your text editor or browser is, you might see that text any number of ways (if “آمل” appears on the right-hand side of the text, then the presentation is working). But note the borrowed words “Rust” and “C++” are still spelled in a left-to-right manner within the right-to-left text (or they should be). This is done by encoding U+200E left-to-right mark, then writing the borrowed text, then U+200F right-to-left mark to continue.

What happens if text is reversed, but there is not enough space in the buffer to flip it back? On truncation, this library might leave you in the middle of a text-reversed run.
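
A higher-level caller that cares about this could inspect the truncated text itself. The sketch below assumes the simplified model described above, where U+200E opens a left-to-right run and U+200F closes it; `ends_inside_ltr_run` is a hypothetical helper, it ignores every other bidirectional control character, and it is not a substitute for the full Unicode bidirectional algorithm.

```rust
const LRM: char = '\u{200E}'; // left-to-right mark
const RLM: char = '\u{200F}'; // right-to-left mark

/// Returns true if the last direction mark in `s` is U+200E, meaning the
/// text was cut off before a matching U+200F appeared.
/// (Hypothetical helper under the simplified model stated above.)
fn ends_inside_ltr_run(s: &str) -> bool {
    s.chars().rev().find(|c| *c == LRM || *c == RLM) == Some(LRM)
}
```

A caller might use such a check to decide whether to append U+200F itself or to trim the text back to before the unmatched U+200E.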

The construction of Egyptian Hieroglyphs and other languages of this sort faces a similar issue. Where should the cutoff be? This library does not know the difference between “๐“ช๐“Œ๐“ƒป” and “๐“ช๐“Œ”. Figuring that out is the responsibility of a higher-level construct.

ยง๏ฟฝ

This library implements core::fmt::Write, which only accepts UTF-8-encoded data. There is no place for ๏ฟฝ in this library. However, the result of a truncated run might be replaced by ๏ฟฝ for presentation at a higher level.

Structs§

Truncated
An error type indicating that a result was truncated.
WriteBuf
A write buffer pointing to a &mut [u8].

Traits§

TruncatedResultExt
Extension trait for Result<&str, Truncated> providing uniform access to the written content regardless of whether truncation occurred.