embedded_ffi/lib.rs
#![forbid(intra_doc_link_resolution_failure)]

//! Utilities related to FFI bindings, for embedded platforms that use
//! Unix-like conventions. This is mostly copied from the Rust
//! standard library.
//!
//! Note that `OsString` and `CString` require the `alloc` feature to be
//! enabled in your Cargo.toml.
//!
//! This module provides utilities to handle data across non-Rust
//! interfaces, like other programming languages and the underlying
//! operating system. It is mainly of use for FFI (Foreign Function
//! Interface) bindings and code that needs to exchange C-like strings
//! with other languages.
//!
//! # Overview
//!
//! Rust represents owned strings with the [`String`] type, and
//! borrowed slices of strings with the [`str`] primitive. Both are
//! always in UTF-8 encoding, and may contain nul bytes in the middle,
//! i.e., if you look at the bytes that make up the string, there may
//! be a `\0` among them. Both `String` and `str` store their length
//! explicitly; there are no nul terminators at the end of strings
//! like in C.
//!
//! C strings are different from Rust strings:
//!
//! * **Encodings** - Rust strings are UTF-8, but C strings may use
//!   other encodings. If you are using a string from C, you should
//!   check its encoding explicitly, rather than just assuming that it
//!   is UTF-8 as you can in Rust.
//!
//! * **Character size** - C strings may use `char` or `wchar_t`-sized
//!   characters; please **note** that C's `char` is different from Rust's.
//!   The C standard leaves the actual sizes of those types open to
//!   interpretation, but defines different APIs for strings made up of
//!   each character type. Rust strings are always UTF-8, so different
//!   Unicode characters will be encoded in a variable number of bytes
//!   each. The Rust type [`char`] represents a '[Unicode scalar
//!   value]', which is similar to, but not the same as, a '[Unicode
//!   code point]'.
//!
//! * **Nul terminators and implicit string lengths** - Often, C
//!   strings are nul-terminated, i.e., they have a `\0` character at the
//!   end. The length of a string buffer is not stored, but has to be
//!   calculated; to compute the length of a string, C code must
//!   manually call a function like `strlen()` for `char`-based strings,
//!   or `wcslen()` for `wchar_t`-based ones. Those functions return
//!   the number of characters in the string excluding the nul
//!   terminator, so the buffer length is really `len+1` characters.
//!   Rust strings don't have a nul terminator; their length is always
//!   stored and does not need to be calculated. In Rust, accessing a
//!   string's length is an O(1) operation (because the length is
//!   stored), while in C it is an O(length) operation because the
//!   length needs to be computed by scanning the string for the nul
//!   terminator.
//!
//! * **Internal nul characters** - When C strings have a nul
//!   terminator character, this usually means that they cannot have nul
//!   characters in the middle; a nul character would essentially
//!   truncate the string. Rust strings *can* have nul characters in
//!   the middle, because nul does not have to mark the end of the
//!   string in Rust.
//!
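//! As a rough sketch of the last two points (the helper name `c_strlen`
//! is hypothetical and not part of this crate), a C-style length has to
//! be found by scanning for the first nul byte, whereas a Rust slice
//! simply reports its stored length:
//!
//! ```rust
//! /// Scans for the first nul byte, the way C's `strlen()` would.
//! /// Falls back to the slice length if no nul byte is present.
//! fn c_strlen(bytes: &[u8]) -> usize {
//!     bytes.iter().position(|&b| b == 0).unwrap_or(bytes.len())
//! }
//! ```
//!
//! For the Rust string `"ab\0cd"`, `len()` reports 5 bytes, while
//! `c_strlen` stops at the interior nul and returns 2.
//!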
//! # Representations of non-Rust strings
//!
//! [`CString`] and [`CStr`] are useful when you need to transfer
//! UTF-8 strings to and from languages with a C ABI, like Python.
//!
//! * **From Rust to C:** [`CString`] represents an owned, C-friendly
//!   string: it is nul-terminated, and has no internal nul characters.
//!   Rust code can create a [`CString`] out of a normal string (provided
//!   that the string doesn't have nul characters in the middle), and
//!   then use a variety of methods to obtain a raw `*mut `[`u8`] that can
//!   then be passed as an argument to functions which use the C
//!   conventions for strings.
//!
//! * **From C to Rust:** [`CStr`] represents a borrowed C string; it
//!   is what you would use to wrap a raw `*const `[`u8`] that you got from
//!   a C function. A [`CStr`] is guaranteed to be a nul-terminated array
//!   of bytes. Once you have a [`CStr`], you can convert it to a Rust
//!   [`&str`][`str`] if it's valid UTF-8, or lossily convert it by adding
//!   replacement characters.
//!
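//! A minimal sketch of the C-to-Rust direction. It uses `core::ffi::CStr`
//! as a stand-in, on the assumption that the [`CStr`] re-exported here
//! from `cstr_core` exposes the same `from_bytes_with_nul` and `to_str`
//! methods; the helper name `c_bytes_to_str` is hypothetical.
//!
//! ```rust
//! use core::ffi::CStr;
//!
//! /// Borrows a nul-terminated buffer as a `CStr` and views it as `&str`.
//! /// Returns `None` if the buffer does not end in exactly one nul byte
//! /// with no nul bytes before it, or if the contents are not valid UTF-8.
//! fn c_bytes_to_str(bytes: &[u8]) -> Option<&str> {
//!     CStr::from_bytes_with_nul(bytes).ok()?.to_str().ok()
//! }
//! ```
//!
//! With this helper, `b"hello\0"` converts to `Some("hello")`, while a
//! buffer with an interior nul such as `b"he\0llo\0"` is rejected.
//!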
//! [`OsString`] and [`OsStr`] are useful when you need to transfer
//! strings to and from the operating system itself, or when capturing
//! the output of external commands. Conversions between [`OsString`],
//! [`OsStr`] and Rust strings work similarly to those for [`CString`]
//! and [`CStr`].
//!
//! * [`OsString`] represents an owned string in whatever
//!   representation the operating system prefers. In the Rust standard
//!   library, various APIs that transfer strings to/from the operating
//!   system use [`OsString`] instead of plain strings.
//!
//! * [`OsStr`] represents a borrowed reference to a string in a
//!   format that can be passed to the operating system. It can be
//!   converted into a UTF-8 Rust string slice in a similar way to
//!   [`OsString`].
//!
//! # Conversions
//!
//! ## On Unix
//!
//! On Unix, [`OsStr`] implements the [`OsStrExt`] trait, which
//! augments it with two methods, [`from_bytes`] and [`as_bytes`].
//! These do inexpensive conversions from and to UTF-8 byte slices.
//!
//! Additionally, on Unix [`OsString`] implements the
//! [`OsStringExt`] trait, which provides [`from_vec`] and [`into_vec`]
//! methods that consume their arguments, and take or produce vectors
//! of [`u8`].
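//!
//! The following sketch illustrates both pairs of conversions. It uses
//! the standard library's Unix-only traits as stand-ins, on the
//! assumption that this crate's [`OsStrExt`] and [`OsStringExt`] mirror
//! them; the function names are hypothetical.
//!
//! ```rust
//! use std::ffi::{OsStr, OsString};
//! use std::os::unix::ffi::{OsStrExt, OsStringExt};
//!
//! /// Borrowed round trip: bytes -> `&OsStr` -> bytes, without copying.
//! fn roundtrip_borrowed(bytes: &[u8]) -> &[u8] {
//!     OsStr::from_bytes(bytes).as_bytes()
//! }
//!
//! /// Owned round trip: `Vec<u8>` -> `OsString` -> `Vec<u8>`.
//! /// Each step consumes its argument.
//! fn roundtrip_owned(bytes: Vec<u8>) -> Vec<u8> {
//!     OsString::from_vec(bytes).into_vec()
//! }
//! ```
//!
//! Neither round trip validates or changes the bytes, so input that is
//! not valid UTF-8 comes back unchanged.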
//!
//! [`String`]: alloc::string::String
//! [Unicode scalar value]: http://www.unicode.org/glossary/#unicode_scalar_value
//! [Unicode code point]: http://www.unicode.org/glossary/#code_point
//! [`from_vec`]: OsStringExt::from_vec
//! [`into_vec`]: OsStringExt::into_vec
//! [`from_bytes`]: OsStrExt::from_bytes
//! [`as_bytes`]: OsStrExt::as_bytes
#![no_std]
#[cfg(feature = "alloc")]
extern crate alloc;

#[doc(no_inline)]
pub use cstr_core::CStr;
#[cfg(feature = "alloc")]
#[doc(no_inline)]
pub use cstr_core::CString;

#[cfg(feature = "alloc")]
pub use inner::inner_alloc::OsStringExt;
pub use inner::OsStrExt;
pub use os_str::OsStr;
#[cfg(feature = "alloc")]
pub use os_str::OsString;

mod inner;
mod lossy;
mod os_str;

mod sys_common {
    #[doc(hidden)]
    pub trait AsInner<Inner: ?Sized> {
        fn as_inner(&self) -> &Inner;
    }

    /// A trait for extracting representations from std types
    #[doc(hidden)]
    pub trait IntoInner<Inner> {
        fn into_inner(self) -> Inner;
    }

    /// A trait for creating std types from internal representations
    #[doc(hidden)]
    pub trait FromInner<Inner> {
        fn from_inner(inner: Inner) -> Self;
    }

    pub mod bytestring {
        use core::fmt::{Formatter, Result, Write};

        use crate::lossy::{Utf8Lossy, Utf8LossyChunk};

        pub fn debug_fmt_bytestring(slice: &[u8], f: &mut Formatter<'_>) -> Result {
            // Writes out a valid unicode string with the correct escape sequences
            fn write_str_escaped(f: &mut Formatter<'_>, s: &str) -> Result {
                for c in s.chars().flat_map(|c| c.escape_debug()) {
                    f.write_char(c)?
                }
                Ok(())
            }

            f.write_str("\"")?;
            for Utf8LossyChunk { valid, broken } in Utf8Lossy::from_bytes(slice).chunks() {
                write_str_escaped(f, valid)?;
                for b in broken {
                    write!(f, "\\x{:02X}", b)?;
                }
            }
            f.write_str("\"")
        }
    }
}

// https://tools.ietf.org/html/rfc3629
#[rustfmt::skip]
static UTF8_CHAR_WIDTH: [u8; 256] = [
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, // 0x1F
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, // 0x3F
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, // 0x5F
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, // 0x7F
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, // 0x9F
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, // 0xBF
    0,0,2,2,2,2,2,2,2,2,2,2,2,2,2,2,
    2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2, // 0xDF
    3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3, // 0xEF
    4,4,4,4,4,0,0,0,0,0,0,0,0,0,0,0, // 0xFF
];

/// Given a first byte, determines how many bytes are in this UTF-8 character.
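///
/// A width of 0 marks a byte that cannot start a UTF-8 sequence. As a
/// cross-check, the table can be compared against the lead-byte ranges
/// it encodes. A minimal sketch (`width_by_range` is a hypothetical
/// helper, not part of this crate) that should agree with the table
/// for every byte value:
///
/// ```rust
/// /// Width implied by a UTF-8 lead byte, or 0 if `b` cannot start a sequence.
/// fn width_by_range(b: u8) -> usize {
///     match b {
///         0x00..=0x7F => 1, // ASCII
///         0xC2..=0xDF => 2, // 0xC0 and 0xC1 would only start overlong encodings
///         0xE0..=0xEF => 3,
///         0xF0..=0xF4 => 4, // 0xF5 and above would exceed U+10FFFF
///         _ => 0,           // continuation bytes and invalid lead bytes
///     }
/// }
/// ```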
#[inline]
fn utf8_char_width(b: u8) -> usize {
    UTF8_CHAR_WIDTH[b as usize] as usize
}