#![forbid(unsafe_code, rust_2018_idioms)]
//! This crate contains an assortment of utilities to deal with paths and their conversions.
//!
//! Generally, `git` treats paths as bytes, but on Windows it inherently assumes well-formed UTF-8 as the encoding. Internally, it expects
//! slashes to be used as path separators, and paths stored in files must use slashes, with conversions being performed on Windows accordingly.
//!
//! <details>
//!
//! ### Research
//!
//! * **windows**
//!   - [`dirent.c`](https://github.com/git/git/blob/main/compat/win32/dirent.c#L31:L31) seemingly contains the entire implementation for
//!     opening directories and reading their entries, along with all path conversions (UTF-16 on Windows). The conversion is done
//!     on the fly so that git can work [in UTF-8](https://github.com/git/git/blob/main/compat/win32/dirent.c#L12:L12).
//!   - mingw [is used for the conversion](https://github.com/git/git/blob/main/compat/mingw.h#L579:L579), and it appears to handle surrogates
//!     during the conversion, which would suggest some sort of non-strict UTF-8 converter. In fact, it uses
//!     [WideCharToMultiByte](https://docs.microsoft.com/en-us/windows/win32/api/stringapiset/nf-stringapiset-widechartomultibyte)
//!     under the hood, which by now fails if the resulting UTF-8 would be invalid Unicode, i.e. if it would contain unpaired surrogates.
//!   - `OsString` on Windows already stores strings as WTF-8, which can encode [surrogates](https://unicodebook.readthedocs.io/unicode_encodings.html),
//!     something that UTF-8 isn't allowed to do for security reasons. After all, surrogates are specific to UTF-16 and exist only to extend
//!     its range of encodable code points.
//!   - Informative reading on [WTF-8](https://simonsapin.github.io/wtf-8/#motivation), the encoding Rust uses internally
//!     to deal with both paired surrogates and ill-formed ones (those that aren't in pairs).
//! * **unix**
//!   - `git` uses [opendir](https://man7.org/linux/man-pages/man3/opendir.3.html) and [readdir](https://man7.org/linux/man-pages/man3/readdir.3.html)
//!     to open directories and read their entries, respectively. No encoding is specified, except that these paths are null-terminated.
//!
//! ### Learnings
//!
//! Surrogate pairs are a way to extend the range of encodable values in UTF-16, which is used primarily on Windows and in JavaScript.
//! For a long time, the code points that require surrogates, which must always be used in pairs, were not assigned, until… they were, for rare
//! emoji and the like. The Unicode standard does not require surrogates to occur in pairs, even though unpaired surrogates
//! in UTF-16 are by now considered ill-formed and are not supposed to be converted to UTF-8, for example.
//!
//! This is the reason we have to deal with `to_string_lossy()`; it exists _just_ for that quirk.
//!
//! This also means that Windows is the only platform that can ever see conversion errors, and there it's only older, pre-Vista
//! versions, which incorrectly allow ill-formed UTF-16 strings. Newer versions no longer perform such conversions; going from
//! UTF-16 to UTF-8, for example, triggers an error instead.
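//!
//! The difference between strict and lossy handling of an unpaired surrogate can be sketched with the standard library alone.
//! This is an illustration only; the helper names `utf16_to_utf8` and `utf16_to_utf8_lossy` are hypothetical and not part of this crate.
//!
//! ```rust
//! /// Strictly decode UTF-16 code units, failing if an unpaired surrogate is present.
//! fn utf16_to_utf8(units: &[u16]) -> Result<String, std::string::FromUtf16Error> {
//!     String::from_utf16(units)
//! }
//!
//! /// Decode UTF-16 code units, replacing each unpaired surrogate with U+FFFD.
//! fn utf16_to_utf8_lossy(units: &[u16]) -> String {
//!     String::from_utf16_lossy(units)
//! }
//! ```
//!
//! Given the units `[0x0061, 0xD83D]`, which end in a lone high surrogate, the strict variant returns an error while the lossy
//! variant silently substitutes the replacement character, losing the original information.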
//!
//! ### Conclusions
//!
//! Since [WideCharToMultiByte](https://docs.microsoft.com/en-us/windows/win32/api/stringapiset/nf-stringapiset-widechartomultibyte) has
//! been fixed (from Vista onward) to produce only valid UTF-8, lone surrogate code points will cause a failure, which `git`
//! [doesn't care about](https://github.com/git/git/blob/main/compat/win32/dirent.c#L12:L12).
//!
//! We will, though, which means that from now on we can just convert to UTF-8 on Windows and bubble up errors where necessary,
//! preventing potentially mismatched surrogate pairs from ever being saved to disk by gitoxide.
//!
//! Even though the error only exists on older Windows versions, we will represent it in the type system through fallible function calls.
//! Callers may `.expect()` on the result to indicate they don't wish to handle this special and rare case. Note, though, that servers
//! should never end up in a code path that panics.
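//!
//! A fallible conversion of this kind might look roughly like the following sketch. The names `Utf8PathError` and
//! `try_path_to_bytes` are hypothetical and do not describe this crate's actual API. For simplicity, the sketch rejects
//! non-UTF-8 paths on every platform, which is stricter than the text above requires for unix.
//!
//! ```rust
//! use std::path::Path;
//!
//! /// Hypothetical error for paths that cannot be represented as UTF-8.
//! #[derive(Debug, PartialEq)]
//! struct Utf8PathError;
//!
//! /// Return the bytes of `path` if it is valid UTF-8, or an error otherwise.
//! fn try_path_to_bytes(path: &Path) -> Result<&[u8], Utf8PathError> {
//!     path.to_str().map(str::as_bytes).ok_or(Utf8PathError)
//! }
//! ```
//!
//! A caller that does not want to handle the rare failure could write `try_path_to_bytes(path).expect("well-formed UTF-8")`,
//! while a server would propagate the error with `?` instead.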
//! </details>
/// A dummy type to represent path specs and help find all spots that take path specs once it is implemented.
/// A preliminary version of a path-spec based on a glance at the code.
#[derive(Clone, Debug)]
pub struct Spec(bstr::BString);
mod convert;
mod spec;
use std::{fs::create_dir_all, ops::Deref, path::Path};
pub use convert::*;
use tempfile::tempdir_in;
///
pub mod realpath;
pub use realpath::function::{realpath, realpath_opts};
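/// Create a symlink at `from` that points to `to`, creating the parent directories of `from` first.
///
/// Panics if the parent directories or the symlink cannot be created.
pub fn create_symlink(from: &Path, to: &Path) {
    create_dir_all(from.parent().unwrap()).unwrap();
    #[cfg(not(target_os = "windows"))]
    std::os::unix::fs::symlink(to, from).unwrap();
    // On Windows, file and directory symlinks are distinct; this creates a file symlink.
    #[cfg(target_os = "windows")]
    std::os::windows::fs::symlink_file(to, from).unwrap();
}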
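/// A temporary directory created inside the system's temporary directory, whose path is canonicalized on non-Windows platforms.
pub struct CanonicalizedTempDir {
    /// The underlying temporary directory.
    pub dir: tempfile::TempDir,
}

impl CanonicalizedTempDir {
    /// Create a new temporary directory, panicking if that fails.
    pub fn new() -> Self {
        // On Windows, the temporary directory is used as is, without canonicalization.
        #[cfg(windows)]
        let canonicalized_tempdir = std::env::temp_dir();
        #[cfg(not(windows))]
        let canonicalized_tempdir = std::env::temp_dir().canonicalize().unwrap();
        let dir = tempdir_in(canonicalized_tempdir).unwrap();
        Self { dir }
    }
}

impl Default for CanonicalizedTempDir {
    fn default() -> Self {
        Self::new()
    }
}

impl AsRef<Path> for CanonicalizedTempDir {
    fn as_ref(&self) -> &Path {
        // Relies on the `Deref` implementation below to coerce to `&Path`.
        self
    }
}

impl Deref for CanonicalizedTempDir {
    type Target = Path;

    fn deref(&self) -> &Self::Target {
        self.dir.path()
    }
}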