soft-canonicalize
Path canonicalization that works with non-existing paths.
A Rust implementation inspired by Python 3.6+ pathlib.Path.resolve(strict=False). It provides the same functionality as std::fs::canonicalize (Rust's equivalent of Unix realpath()), extended to handle non-existing paths, with optional features for simplified Windows output (dunce) and virtual filesystem semantics (anchored).
Why Use This?
🚀 Works with non-existing paths - Plan file locations before creating them
⚡ Fast - On a mixed workload, median throughput is ~1.8x faster than Python's pathlib on Windows (13,840 paths/s) and ~3.0x faster on Linux (379,119 paths/s); see the benchmark methodology for the 5-run protocol and environment details
✅ Compatible - 100% behavioral match with std::fs::canonicalize for existing paths, with optional UNC simplification via dunce feature (Windows)
🎯 Virtual filesystem support - Optional anchored feature for bounded canonicalization within directory boundaries
🔒 Robust - 500+ comprehensive tests including symlink cycle protection, malicious stream validation, and edge case handling
🛡️ Safe traversal - Proper .. and symlink resolution with cycle detection
🌍 Cross-platform - Windows, macOS, Linux with comprehensive UNC/symlink handling
🔧 Zero dependencies - Optional features may add minimal dependencies
Lexical vs. Filesystem-Based Resolution
Path resolution libraries fall into two categories:
Lexical Resolution (no I/O):
- Performance: Fast - no filesystem access
- Accuracy: Incorrect if symlinks are present (doesn't resolve them)
- Use when: You're 100% certain no symlinks exist and need maximum performance
- Examples: std::path::absolute, normpath::normalize
Filesystem-Based Resolution (performs I/O):
- Performance: Slower - requires filesystem syscalls to resolve symlinks
- Accuracy: Correct - follows symlinks to their targets
- Use when: Safety is priority over performance, or symlinks may be present
- Examples: std::fs::canonicalize, soft_canonicalize, dunce::canonicalize
Rule of thumb: If you cannot guarantee symlinks won't be introduced, or if correctness is critical, use filesystem-based resolution.
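To make the distinction concrete, here is a minimal std-only sketch of what a lexical resolver does. The helper name lexical_normalize is hypothetical and is not part of this crate or of the libraries listed above. It folds `.` and `..` purely on the path string, so a `..` that follows a symlinked directory would be resolved incorrectly.

```rust
use std::path::{Component, Path, PathBuf};

/// Lexically normalize a path: drop `.` and fold `..` into the preceding
/// component. No filesystem access happens, so symlinks are NOT resolved.
/// (Hypothetical helper, for illustration only.)
fn lexical_normalize(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for component in path.components() {
        match component {
            Component::CurDir => {}
            Component::ParentDir => {
                let last = out.components().next_back();
                let can_pop = matches!(last, Some(Component::Normal(_)));
                let at_root = matches!(
                    last,
                    Some(Component::RootDir) | Some(Component::Prefix(_))
                );
                if can_pop {
                    // `a/b/..` becomes `a`
                    out.pop();
                } else if !at_root {
                    // A leading `..` in a relative path is kept as-is.
                    out.push("..");
                }
                // At the root, `..` is simply dropped.
            }
            other => out.push(other.as_os_str()),
        }
    }
    out
}
```

Calling lexical_normalize(Path::new("/a/b/../c/./d")) yields /a/c/d without checking whether /a/b is a symlink, which is exactly the accuracy trade-off described above.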
Use Cases
Path Comparison
- Equality: Determine if two different path strings point to the same location
- Containment: Check if one path is inside another directory
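As a rough illustration of the containment check, the sketch below compares paths component by component with the standard library's Path::starts_with. Both helper names are hypothetical. The sketch assumes both paths were already canonicalized by a filesystem-based resolver; it performs no I/O itself.

```rust
use std::path::Path;

/// Component-wise containment check. Both paths are assumed to be already
/// canonicalized; this helper resolves nothing on its own.
/// (Hypothetical helper, for illustration only.)
fn is_within(base: &Path, candidate: &Path) -> bool {
    candidate.starts_with(base)
}

/// Naive string-prefix check, shown only to illustrate the pitfall:
/// it treats `/srv/database` as if it were inside `/srv/data`.
fn naive_is_within(base: &str, candidate: &str) -> bool {
    candidate.starts_with(base)
}
```

Comparing whole components avoids the false positive that a plain string prefix produces for sibling directories with a shared name prefix.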
Common Applications
- Build Systems: Resolve output paths during build planning before directories exist
- Configuration Validation: Ensure user-provided paths stay within allowed boundaries
- Deduplication: Detect when different path strings refer to the same planned location
- Cross-Platform Normalization: Handle Windows UNC paths and symlinks consistently
Quick Start
Cargo.toml
[dependencies]
soft-canonicalize = "0.5"
Code Example
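use soft_canonicalize::soft_canonicalize;

let non_existing_path = r"C:\Users\user\documents\..\non\existing\config.json";

// Using Rust's own std canonicalize function:
let result = std::fs::canonicalize(non_existing_path);
assert!(result.is_err());

// Using our crate's function:
let result = soft_canonicalize(non_existing_path);
assert!(result.is_ok());

// Shows the UNC path conversion and path normalization
assert_eq!(
    result.unwrap().to_string_lossy(),
    r"\\?\C:\Users\user\non\existing\config.json"
);
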
// With `dunce` feature enabled, paths are simplified when safe
// assert_eq!(
// result.unwrap().to_string_lossy(),
// r"C:\Users\user\non\existing\config.json"
// );
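For intuition about how a path that does not exist yet can still be canonicalized, here is a simplified std-only sketch: canonicalize the deepest ancestor that exists, then re-append the missing components. The function name canonicalize_existing_prefix is hypothetical, and this is not the crate's actual algorithm. It assumes the non-existing tail contains no `.` or `..` components and does no cycle or boundary checking.

```rust
use std::ffi::OsString;
use std::io;
use std::path::{Path, PathBuf};

/// Canonicalize the deepest existing ancestor of `path`, then re-append the
/// components that do not exist yet.
/// (Hypothetical helper; simplified illustration, not the crate's code.)
fn canonicalize_existing_prefix(path: &Path) -> io::Result<PathBuf> {
    let mut prefix = path.to_path_buf();
    let mut tail: Vec<OsString> = Vec::new();
    loop {
        match std::fs::canonicalize(&prefix) {
            Ok(mut resolved) => {
                // Re-append the missing components in their original order.
                for name in tail.iter().rev() {
                    resolved.push(name);
                }
                return Ok(resolved);
            }
            Err(err) if err.kind() == io::ErrorKind::NotFound => {
                // Peel off the last component and retry with the parent.
                match prefix.file_name() {
                    Some(name) => tail.push(name.to_os_string()),
                    None => return Err(err),
                }
                prefix.pop();
            }
            // Any other error (permissions, not a directory, ...) is returned.
            Err(err) => return Err(err),
        }
    }
}
```

The existing part of the path gets full symlink resolution from std::fs::canonicalize, while the planned part is carried over unchanged.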
Optional Features
- anchored - Virtual filesystem/bounded canonicalization (cross-platform)
- dunce - Simplified Windows path output (Windows-only target-conditional dependency)
Anchored Canonicalization (anchored feature)
For correct symlink resolution within virtual/constrained directory spaces, use anchored_canonicalize. This function implements true virtual filesystem semantics by clamping ALL paths (including absolute symlink targets) to the anchor directory:
[dependencies]
soft-canonicalize = { version = "0.5", features = ["anchored"] }
use soft_canonicalize::anchored_canonicalize;
use std::fs;

// Set up an anchor/root directory (no need to pre-canonicalize)
let anchor = std::env::temp_dir().join("workspace_root");
fs::create_dir_all(&anchor)?;

// Canonicalize paths relative to the anchor (anchor is soft-canonicalized internally)
let resolved_path = anchored_canonicalize(&anchor, "../../../etc/passwd")?;
// Result: /tmp/workspace_root/etc/passwd (lexical .. clamped to anchor)

// Absolute symlinks are also clamped to the anchor
// If there's a symlink: workspace_root/config -> /etc/config
// It resolves to: workspace_root/etc/config (clamped to anchor)
let symlink_path = anchored_canonicalize(&anchor, "config")?;
// Safe: always stays within workspace_root, even if symlink points to /etc/config
Key features of anchored_canonicalize:
- Virtual filesystem semantics: All absolute paths (including symlink targets) are clamped to anchor
- Anchor-relative canonicalization: Resolves paths relative to a specific anchor directory
- Complete symlink clamping: Follows symlink chains with clamping at each step
- Component-by-component: Processes path components in proper order
- Absolute results: Always returns absolute canonical paths within the anchor boundary
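The clamping idea can be sketched with the standard library alone. The function clamp_to_anchor below is hypothetical and covers only the lexical part of the behavior listed above: absolute inputs are re-rooted at the anchor, and `..` never climbs above it. It resolves no symlinks, so it is not a substitute for anchored_canonicalize.

```rust
use std::path::{Component, Path, PathBuf};

/// Lexically clamp `input` under `anchor`. Symlinks are NOT resolved.
/// (Hypothetical helper, for illustration only.)
fn clamp_to_anchor(anchor: &Path, input: &Path) -> PathBuf {
    let mut depth = 0usize; // number of components currently below the anchor
    let mut out = anchor.to_path_buf();
    for component in input.components() {
        match component {
            // A root or drive prefix restarts resolution at the anchor.
            Component::Prefix(_) | Component::RootDir => {
                out = anchor.to_path_buf();
                depth = 0;
            }
            Component::CurDir => {}
            Component::ParentDir => {
                // Only step up while still below the anchor.
                if depth > 0 {
                    out.pop();
                    depth -= 1;
                }
            }
            Component::Normal(name) => {
                out.push(name);
                depth += 1;
            }
        }
    }
    out
}
```

Tracking the depth below the anchor, rather than comparing strings, is what keeps every intermediate result inside the boundary.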
For a complete multi-tenant security example, see:
Simplified Path Output (dunce feature, Windows-only)
By default on Windows, soft_canonicalize returns paths in extended-length UNC format (\\?\C:\foo) for maximum robustness and compatibility with long paths, reserved names, and other Windows filesystem edge cases.
If you need simplified paths (C:\foo) for compatibility with legacy Windows applications or user-facing output, enable the dunce feature:
[dependencies]
soft-canonicalize = { version = "0.5", features = ["dunce"] }
Example:
use soft_canonicalize::soft_canonicalize;

let path = soft_canonicalize(r"C:\Users\user\config.json")?;

// Without dunce feature (default):
// Returns: \\?\C:\Users\user\config.json (extended-length UNC)
// With dunce feature enabled:
// Returns: C:\Users\user\config.json (simplified when safe)
When to use:
- ✅ Legacy applications that don't support UNC paths
- ✅ User-facing output requiring familiar path format
- ✅ Tools expecting traditional Windows path format
How it works:
The dunce crate intelligently simplifies Windows UNC paths (\\?\C:\foo → C:\foo) only when safe:
- Automatically keeps UNC for paths >260 chars
- Automatically keeps UNC for reserved names (CON, PRN, NUL, COM1-9, LPT1-9)
- Automatically keeps UNC for paths with trailing spaces/dots
- Automatically keeps UNC for paths containing .. (literal interpretation)
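The rules in this list can be approximated as a plain string check. The sketch below is an assumption-laden illustration, not dunce's implementation: both function names are hypothetical, the input is assumed to be a drive-letter path such as C:\dir\file with backslash separators, and the length is counted in UTF-16 units. AUX is included because it is also a reserved DOS device name.

```rust
/// True if `component` is a reserved DOS device name, ignoring case and
/// any extension (e.g. `nul.txt`). (Hypothetical helper.)
fn is_reserved_name(component: &str) -> bool {
    let stem = component.split('.').next().unwrap_or("").to_ascii_uppercase();
    if matches!(stem.as_str(), "CON" | "PRN" | "AUX" | "NUL") {
        return true;
    }
    let bytes = stem.as_bytes();
    bytes.len() == 4
        && (stem.starts_with("COM") || stem.starts_with("LPT"))
        && (b'1'..=b'9').contains(&bytes[3])
}

/// True if the simplified form should NOT be used and the extended-length
/// form should be kept instead. (Hypothetical helper; approximates the rules
/// listed above.)
fn must_keep_extended(simplified: &str) -> bool {
    if simplified.encode_utf16().count() > 260 {
        return true;
    }
    simplified
        .split('\\')
        .skip(1) // skip the drive part, e.g. `C:`
        .any(|c| c == ".." || c.ends_with(' ') || c.ends_with('.') || is_reserved_name(c))
}
```

For example, must_keep_extended(r"C:\temp\NUL") is true, while must_keep_extended(r"C:\Users\user\config.json") is false.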
When Paths Must Exist: proc-canonicalize
Since v0.5.0, soft_canonicalize uses proc-canonicalize by default for existing-path canonicalization instead of std::fs::canonicalize. This fixes a critical issue with Linux namespace boundaries.
The Problem with std::fs::canonicalize
On Linux, std::fs::canonicalize resolves "magic symlinks" like /proc/PID/root to their targets:
// std::fs::canonicalize follows magic symlinks incorrectly
let path = std::fs::canonicalize("/proc/1/root")?; // Returns "/" (wrong!)
// This loses the namespace boundary - dangerous for container tooling
The Solution
proc-canonicalize preserves namespace boundaries:
use proc_canonicalize::canonicalize;

let path = canonicalize("/proc/1/root")?; // Returns "/proc/1/root" (correct!)
// Namespace boundary is preserved
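As a rough sketch of the boundary in question, the std-only helper below detects whether a path starts with a /proc/PID/root prefix, which is the part a namespace-aware resolver would want to keep intact. The name proc_root_prefix is hypothetical and this is not proc-canonicalize's code. Accepting the literal `self` in place of a numeric PID is an assumption of this sketch.

```rust
use std::path::{Component, Path, PathBuf};

/// Return the `/proc/<pid>/root` prefix of `path`, if it has one.
/// `<pid>` must be all digits or the literal `self`.
/// (Hypothetical helper; pure path inspection, no I/O.)
fn proc_root_prefix(path: &Path) -> Option<PathBuf> {
    let mut comps = path.components();
    if comps.next()? != Component::RootDir {
        return None;
    }
    if comps.next()?.as_os_str() != "proc" {
        return None;
    }
    let pid = comps.next()?.as_os_str().to_str()?.to_owned();
    let is_pid =
        pid == "self" || (!pid.is_empty() && pid.bytes().all(|b| b.is_ascii_digit()));
    if !is_pid {
        return None;
    }
    if comps.next()?.as_os_str() != "root" {
        return None;
    }
    Some(Path::new("/proc").join(pid).join("root"))
}
```

A tool could use such a check to decide that the prefix must be preserved verbatim while only the remainder of the path is resolved.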
When to Use Which
| Use Case | Function | Reason |
|---|---|---|
| Paths that may not exist | soft_canonicalize | Handles non-existing paths |
| Existing paths (general) | proc_canonicalize::canonicalize | Correct namespace handling |
| Existing paths (std behavior) | std::fs::canonicalize | Legacy compatibility only |
Recommendation: If you need to canonicalize paths that must exist (and would previously use std::fs::canonicalize), use proc_canonicalize::canonicalize for correct Linux namespace handling:
[dependencies]
proc-canonicalize = "0.0"
Comparison with Alternatives
Feature Comparison
| Feature | soft_canonicalize | proc_canonicalize | std::fs::canonicalize | std::path::absolute | dunce::canonicalize |
|---|---|---|---|---|---|
| Resolution type | Filesystem-based | Filesystem-based | Filesystem-based | Lexical | Filesystem-based |
| Works with non-existing paths | ✅ | ❌ | ❌ | ✅ | ❌ |
| Resolves symlinks | ✅ | ✅ | ✅ | ❌ | ✅ |
| Preserves Linux namespaces | ✅ (default) | ✅ | ❌ | N/A | ❌ |
| Simplified Windows paths | ✅ (opt-in dunce feature) | ✅ (opt-in) | ❌ (UNC) | ❌ (varies) | ✅ |
| Virtual/bounded canonicalization | ✅ (opt-in anchored feature) | ❌ | ❌ | ❌ | ❌ |
| Zero dependencies | ✅ (default) | ✅ | ✅ | ✅ | ✅ |
When to Use Each
Choose soft_canonicalize when:
- ✅ You need std::fs::canonicalize behavior for paths that don't exist yet
- ✅ Planning file locations before creating them (build systems, config generation)
- ✅ You want virtual filesystem/bounded canonicalization (with anchored feature)
- ✅ You need simplified Windows paths for legacy apps (with dunce feature)
Choose alternatives when:
- proc_canonicalize::canonicalize - All paths exist and you need correct Linux namespace handling (recommended over std::fs::canonicalize)
- std::fs::canonicalize - All paths exist; only when you specifically need the legacy behavior that resolves /proc/PID/root to /
- std::path::absolute - You only need absolute paths without symlink resolution (lexical, fast)
- dunce::canonicalize - Windows-only, all paths exist, just need UNC simplification
- normpath::normalize - Lexical normalization only, no filesystem I/O (fast but doesn't resolve symlinks)
- path_absolutize - Absolute path resolution without symlink following, with CWD caching optimizations
Related Projects
- strict-path - Type-safe path restriction with compile-time guarantees. Uses soft-canonicalize internally for path validation and boundary enforcement.
Security & CVE Coverage
Security does not depend on enabling features. The core API is secure-by-default; the optional anchored feature is a convenience for virtual roots. We test all modes (no features; --features anchored; --features anchored,dunce).
Built-in protections include:
- NTFS Alternate Data Stream (ADS) validation - Blocks malicious stream placements and traversal attempts
- Symlink cycle detection - Bounded depth tracking prevents infinite loops
- Path traversal clamping - Never ascends past root/share/device boundaries
- Null byte rejection - Early validation prevents injection attacks
- UNC/device semantics - Preserves Windows extended-length and device namespace integrity
- TOCTOU race resistance - Tested against time-of-check-time-of-use attacks
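As one small example of what early validation can look like, the sketch below rejects paths containing a null byte before any filesystem call is made. The function name reject_nul is hypothetical, and the crate's own check may be implemented differently.

```rust
use std::io;
use std::path::Path;

/// Fail fast with `InvalidInput` if the path contains a NUL character.
/// (Hypothetical helper, for illustration only.)
fn reject_nul(path: &Path) -> io::Result<()> {
    // Lossy conversion leaves NUL characters untouched, so the check holds
    // even for paths that are not valid UTF-8.
    if path.as_os_str().to_string_lossy().contains('\0') {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "path contains a null byte",
        ));
    }
    Ok(())
}
```

Running a check like this first means a truncated or smuggled path never reaches the resolution logic.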
See docs/SECURITY.md for detailed analysis, attack scenarios, and test references.
Known Limitations
Windows Short Filename Equivalence
On Windows, the filesystem may generate short filenames (8.3 format) for long directory names. For non-existing paths, this library cannot determine if a short filename form (e.g., PROGRA~1) and its corresponding long form (e.g., Program Files) refer to the same future location:
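use soft_canonicalize::soft_canonicalize;

// These non-existing paths are treated as different (correctly).
// The concrete paths here are illustrative.
let short_form = soft_canonicalize(r"C:\NonExisting\PROGRA~1\file.txt")?;
let long_form = soft_canonicalize(r"C:\NonExisting\Program Files\file.txt")?;

// They will NOT be equal because we cannot determine equivalence
// without filesystem existence
assert_ne!(short_form, long_form);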
This is a fundamental limitation shared by Python's pathlib.Path.resolve(strict=False) and other path canonicalization libraries across languages. Short filename mapping only exists when files/directories are actually created by the filesystem.
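Because equivalence cannot be established for paths that do not exist, a caller may at least want to flag components that look like 8.3 short names. The heuristic below is a hypothetical sketch, not part of this crate's API. It only inspects the shape of the name (a stem of at most 8 characters ending in ~ plus digits, and an extension of at most 3 characters), and can misfire on long names that happen to match.

```rust
/// Heuristic: does this path component look like a Windows 8.3 short name,
/// e.g. `PROGRA~1` or `REPORT~2.TXT`?
/// (Hypothetical helper; shape check only, no filesystem access.)
fn looks_like_short_name(component: &str) -> bool {
    let (stem, ext) = match component.rsplit_once('.') {
        Some((s, e)) => (s, e),
        None => (component, ""),
    };
    if stem.is_empty() || stem.len() > 8 || ext.len() > 3 {
        return false;
    }
    match stem.rsplit_once('~') {
        Some((head, digits)) => {
            !head.is_empty()
                && !digits.is_empty()
                && digits.bytes().all(|b| b.is_ascii_digit())
        }
        None => false,
    }
}
```

A positive result does not prove the component is a short name; it is only a hint that comparing it against a long-form path may be unreliable until the path exists.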
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE)
- MIT license (LICENSE-MIT)
Changelog
See CHANGELOG.md for a detailed history of changes.