proc-canonicalize 0.1.2

Fix std::fs::canonicalize for /proc/PID/root and /proc/PID/cwd paths on Linux
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
# AGENTS: AI Contributor Guide

This repo contains a zero-dependency Rust crate that fixes `std::fs::canonicalize` for Linux `/proc/PID/root` and `/proc/PID/cwd` paths. It preserves namespace boundaries that `std::fs::canonicalize` incorrectly resolves to `/`.

Use this guide when proposing changes, refactors, tests, or docs with an automated agent.

## Golden Rules

- Compatibility: Normal paths must behave exactly like `std::fs::canonicalize`.
- Namespace preservation: `/proc/PID/root` and `/proc/PID/cwd` prefixes must be preserved.
- Zero deps: Keep runtime dependencies at 0 (dev-only `tempfile` is allowed in tests).
- Linux-focused: Primary functionality is Linux-specific; other platforms fall back to std.
- MSRV: Keep Minimum Supported Rust Version at `1.70.0` (edition 2021; no unstable features).
- CI clean: `cargo fmt`, `clippy -D warnings`, tests, docs (rustdoc `-D warnings`), and MSRV all pass locally.

## Public Surface (do not break)

- `pub fn canonicalize(path: impl AsRef<std::path::Path>) -> std::io::Result<std::path::PathBuf>`

Do not change signatures or remove items without a clear migration plan and tests.

## Behavioral Invariants

- Normal paths: Return exactly what `std::fs::canonicalize` returns.
- `/proc/PID/root` paths: Preserve the namespace prefix instead of resolving to `/`.
- `/proc/PID/cwd` paths: Preserve the namespace prefix instead of resolving to the actual cwd.
- `/proc/self/root` and `/proc/self/cwd`: Treated as namespace boundaries.
- `/proc/thread-self/root` and `/proc/thread-self/cwd`: Treated as namespace boundaries.
- Non-Linux platforms: Fall back to `std::fs::canonicalize` directly.
- Optional `dunce` feature (Windows only): Simplifies extended-length paths on Windows.

## Security Architecture & Path Resolution

To prevent namespace escapes and ensure security, the library follows a strict resolution order:

1. **Indirect Symlink Detection**: Before processing, we check if the path contains symlinks that point to a `/proc` magic path (e.g., `/tmp/link -> /proc`).
   - *Why*: If we passed `/tmp/link/self/root` directly to `std::fs::canonicalize`, it would resolve the symlink to `/proc`, see it's a magic link, and resolve it to `/` (host root), bypassing our protection.
   - *Rule*: Any path that *resolves* to a magic path must be treated as a magic path.

2. **Prefix Resolution**: When a magic path is found (e.g., `/proc/self/root/etc`), we:
   - Resolve the prefix (`/proc/self/root`) to its absolute host path (e.g., `/proc/1234/root`).
   - Canonicalize the full path (`/proc/1234/root/etc`).
   - Strip the resolved prefix from the full path.
   - Re-attach the original magic prefix.

3. **Escape Detection**: If the stripped path cannot be re-attached (e.g., it resolved to `../../etc`), we return the absolute path. We never return a path that claims to be inside the namespace if it actually escaped it.

## Known Limitations

This library provides best-effort namespace preservation suitable for **cooperative environments**. It is NOT a security boundary against adversarial filesystem manipulation.

### Fundamental Limitations (require API/design changes to fix)

- **TOCTOU After Symlink Scan**: The code detects symlinks into `/proc` before calling `std::fs::canonicalize`, but an attacker who controls the path can swap components to point into `/proc` after the scan and before canonicalization. Avoidable only with `openat`-style, race-free resolution.

- **PID Reuse Race**: Paths like `/proc/<pid>/root/...` are resolved using the live proc entry. If the target process exits and a new process reuses that PID between your initial check and the canonicalize call, the namespace being consulted may silently change.

- **Mount-Namespace Churn**: If the target PID switches mount namespaces mid-resolution (e.g., during container teardown/startup), the resolved path could straddle namespaces unexpectedly. There's no stabilization of the namespace view during canonicalization.

- **Untrusted /proc Mount**: Logic assumes `/proc` is the real procfs. If an attacker controls a bind-mount or chroot that provides a fake `/proc`, the pattern match will treat it as authoritative and could misrepresent a hostile namespace boundary.

### Detection Limitations (pattern-matching constraints)

- **Bind Mounts of /proc**: Paths involving bind mounts (e.g., `mount --bind /proc /mnt/proc`) are not detected as magic paths unless they explicitly use the `/proc` path. The library relies on path pattern matching (`/proc/PID/...`).

- **Non-Standard /proc Mounts**: If `/proc` is mounted at a different location (e.g., `/custom/proc`), it will not be detected.

### Safe Use Cases

- Host-side container monitoring (container cannot modify host filesystem)
- Path canonicalization where users control input strings but not the filesystem
- Best-effort namespace prefix preservation for logging/display

### Unsafe Use Cases

- Security decisions on paths under attacker-controlled directories
- Validation on shared/remote filesystems (NFS, FUSE)
- PID-based validation during process lifecycle transitions

For adversarial environments, use `O_NOFOLLOW` + `openat()` chains and validate via `fstat()` on the opened file descriptor.

## Repository Layout

- `src/lib.rs`: Core algorithm, Linux/non-Linux branches, helpers, tests.
- `README.md`: User-facing documentation.
- `CHANGELOG.md`: Version history.
- CI helpers: `ci-local.sh` and `ci-local.ps1` replicate GitHub Actions locally.

## Local CI (run before any PR)

- Bash (Linux/macOS/WSL): `bash ci-local.sh`
- PowerShell (Windows): `.\ci-local.ps1`

These scripts:
- Check UTF-8 encodings and BOM for critical files.
- Run `cargo fmt --check`, `clippy -D warnings`, `cargo test --verbose` (includes doctests), and `cargo doc` with `RUSTDOCFLAGS='-D warnings'`.
- Test feature combinations: default features and `--features dunce` on Windows.
- Verify MSRV by building and linting on Rust 1.70.0.

## Coding Guidelines

- Style: Follow `rustfmt` defaults; keep code clear and small; avoid over-abstraction.
- Error handling: Return meaningful `io::Error` with context when appropriate.
- Allocation: Avoid unnecessary allocations; prefer `PathBuf` and component streaming.
- Platform cfg: Keep Linux/non-Linux branches correct; don't introduce behavioral drift.
- Dependencies: Do not add runtime dependencies. If you believe one is strictly necessary, open an issue first.
- **MSRV Compliance**: We support Rust 1.70.0. Run `rustup run 1.70.0 cargo clippy` to catch version-specific lints (e.g., `clippy::needless_borrow` behavior changes between versions).
- **dunce feature usage (CRITICAL)**: Any code that uses `dunce::` functions MUST be guarded with `#[cfg(all(feature = "dunce", windows))]`. The dunce crate is a Windows-only target-conditional dependency.

## Documentation Examples (CRITICAL)

Documentation examples in `lib.rs` must be **testable and tested**. Claims about behavior must be backed by assertions, not just comments.

**Note**: `README.md` examples are NOT tested by `cargo test --doc`, so they should be clean and readable without hidden test scaffolding. However, they should still use `assert!` macros to demonstrate expected behavior clearly.

### Examples Must Demonstrate Unique Value

Don't waste reader time showing that "normal behavior works like std". Examples should show what makes THIS crate different.

**BAD** (adds no value):
```rust
// Normal paths work like std::fs::canonicalize
let our_result = canonicalize(".")?;
let std_result = std::fs::canonicalize(".")?;
assert_eq!(our_result, std_result);
```

**GOOD** (shows why this crate exists):
```rust
// We preserve namespace boundaries that std breaks
let our_result = canonicalize("/proc/self/root")?;
let std_result = std::fs::canonicalize("/proc/self/root")?;
assert_eq!(std_result, PathBuf::from("/"));  // std breaks it
assert_eq!(our_result, PathBuf::from("/proc/self/root"));  // we fix it
```

### Code Example Style for Clarity

1. **Use fully qualified paths when comparing APIs**: Show `std::fs::canonicalize` vs `proc_canonicalize::canonicalize` side-by-side so readers see the difference without mental mapping.
2. **Prefer exact comparisons over partial checks**: Use `assert_eq!(path, Path::new("/proc/self/root"))` instead of `assert!(path.starts_with("/proc/self/root"))`. Exact values reduce cognitive load.
3. **Use `Path::new()` over `PathBuf::from()`**: For assertions, `Path::new("/proc/self/root")` is cleaner than `PathBuf::from("/proc/self/root")` since `PathBuf` derefs to `Path`.
4. **Merge related examples**: When showing problem vs solution, use one code block with both to make the contrast obvious.
5. **Use modern format syntax**: Prefer `format!("/proc/{container_pid}/root")` over `format!("/proc/{}/root", container_pid)`.
6. **Choose information-dense examples**: Use one rich example that demonstrates multiple properties instead of multiple simple examples. For instance, `/proc/self/root/etc` shows both namespace preservation AND path resolution, eliminating the need for separate `/proc/self/root` and `/proc/self/root/etc` examples.

### Rules for `lib.rs` Examples

1. **Use `assert!` macros**: When showing behavior, use assertions to prove the claim.
2. **No `no_run` or `ignore`**: Examples must actually execute. These attributes are forbidden.
3. **Platform-gating is allowed**: Use `#[cfg(target_os = "linux")]` to limit tests to relevant platforms.
4. **Examples are tests**: Doc examples are compiled and run by `cargo test --doc`. They must pass.
5. **No failure messages in doc assertions**: In documentation, the assertion IS the teaching moment. Failure messages are noise.

**BAD** (in documentation):
```rust
assert!(canonical.starts_with(&container_root), "path escapes container boundary");
```

**GOOD** (in documentation):
```rust
// Security: canonical path must stay inside container_root
assert!(canonical.starts_with(&container_root));
```

**Note**: In unit tests (not documentation), failure messages ARE helpful for debugging CI failures:
```rust
// In unit tests, failure messages help diagnose issues quickly
assert!(result.starts_with("/proc/self/root"), "expected /proc prefix, got: {:?}", result);
```

### Correct Pattern (`lib.rs`)

```rust
//! ```rust
//! # #[cfg(target_os = "linux")]
//! # fn main() -> std::io::Result<()> {
//! use std::path::PathBuf;
//!
//! // Prove the claim with an assertion
//! let resolved = proc_canonicalize::canonicalize("/proc/self/root")?;
//! assert_eq!(resolved, PathBuf::from("/proc/self/root"));
//! # Ok(())
//! # }
//! # #[cfg(not(target_os = "linux"))]
//! # fn main() {}
//! ```
```

### Correct Pattern (`README.md`)

README examples are not tested, so keep them clean without hidden scaffolding:

```rust
use proc_canonicalize::canonicalize;
use std::path::PathBuf;

let resolved = canonicalize("/proc/self/root")?;
assert_eq!(resolved, PathBuf::from("/proc/self/root"));
```

### Incorrect Patterns (do NOT use)

```rust
// BAD: Uses no_run - test never executes
//! ```rust,no_run

// BAD: Uses ignore - test is skipped entirely  
//! ```rust,ignore

// BAD: Comment-only claim with no assertion
//! let result = canonicalize(path)?;  // Returns "/proc/self/root"
```

### Why This Matters

- Doc examples are the first thing users see and copy
- Assertions ensure examples stay correct as code evolves
- `cargo test --doc` catches documentation rot automatically
- Comments can lie; assertions cannot

## Tests & Quality Gates

- Run `cargo test` on Linux (primary platform) and optionally Windows/macOS.
- Coverage areas include: namespace boundary detection, PID validation, existing/non-existing paths, permission errors, symlink resolution within namespaces.
- When changing behavior, add focused tests alongside the changed logic.
- Keep tests deterministic and filesystem-safe.

### Namespace Testing Guidelines (CRITICAL)

When writing or reviewing tests for `/proc` namespace paths, follow these rules to ensure comprehensive coverage:

#### 1. Namespace Type Symmetry
For every test on `/proc/.../root`, write a parallel test for `/proc/.../cwd`. They behave differently:
- `/proc/self/root` typically resolves to `/` on the host
- `/proc/self/cwd` resolves to the actual working directory (NOT `/`)

**Example of missing symmetry (BAD):**
```rust
// Only tests root - what about cwd?
fn test_canonicalize_proc_self_root() { ... }
```

**Correct approach (GOOD):**
```rust
fn test_canonicalize_proc_self_root() { ... }
fn test_canonicalize_proc_self_cwd() { ... }  // Parallel test
```

#### 2. Test "Paths Through", Not Just "Prefix Alone"
Always test accessing files *through* the namespace, not just the namespace prefix itself:

**Insufficient (BAD):**
```rust
canonicalize("/proc/self/cwd")  // Only tests the prefix
```

**Comprehensive (GOOD):**
```rust
canonicalize("/proc/self/cwd")           // Prefix alone
canonicalize("/proc/self/cwd/file.txt")  // Path THROUGH the namespace
canonicalize("/proc/self/cwd/dir/file")  // Deeper path through
```

#### 3. Test Namespace Escape via `..`
Paths using `..` can escape the namespace. Test both cases:
- `..` that stays inside (e.g., `/proc/self/root/../etc` → still inside because `..` from `/` is `/`)
- `..` that escapes (e.g., `/proc/self/cwd/..` → escapes to parent of cwd)

#### 4. Test Non-Root Namespace Targets
The namespace prefix may not resolve to `/`. Ensure tests cover:
- Direct paths where prefix resolves to `/` (e.g., `/proc/self/root` on host)
- Indirect symlinks that exercise non-`/` prefix resolution
- Container-like scenarios where `/proc/PID/root` is not `/`

#### 5. Invariant Tests
Add property-based tests that verify invariants:
```rust
// Invariant 1: Result is either prefixed OR escaped absolute
// Invariant 2: canonicalize(canonicalize(x)) == canonicalize(x)
```

### Bug Fix Methodology (CRITICAL)

When fixing bugs, you **MUST** follow this test-driven process to verify both the bug and the fix:

1. **Write tests first**: Create tests that exercise the buggy behavior.
2. **Verify tests FAIL**: Run the tests and confirm they fail, proving the bug exists.
3. **Implement the fix**: Write the code to address the bug.
4. **Verify tests PASS**: Run the tests again and confirm they now pass.
5. **Run full CI**: Ensure no regressions in existing functionality.

**Why this matters**: If you write tests and fixes simultaneously without step 2, you have no proof that:
- The bug actually exists in the codebase
- Your fix addresses the bug (tests might pass for the wrong reason)

**Example workflow**:
```bash
# 1. Write tests for the bug
# 2. Run tests - they should FAIL
cargo test my_new_bug_tests  # Expect failures

# 3. Implement the fix
# 4. Run tests again - they should PASS
cargo test my_new_bug_tests  # Expect success

# 5. Full CI
bash ci-local.sh
```

**Temporary disable technique**: If you accidentally write the fix first, you can still verify by temporarily commenting out the fix, running the tests (should fail), then re-enabling the fix (should pass).

### Self-Documenting Tests (CRITICAL)

Tests should be **self-explanatory through code**, not comments. The code itself must clearly show how the API should or should not be used.

#### Principles

1. **Function names read like sentences**: Use descriptive names that explain what's being tested.
   ```rust
   // GOOD: Name explains the behavior
   fn proc_self_root_preserved_not_resolved_to_slash() { ... }
   fn symlink_to_proc_self_root_preserves_namespace() { ... }
   
   // BAD: Generic or unclear names
   fn test_case_1() { ... }
   fn test_canonicalize() { ... }
   ```

2. **Variable names explain intent**: Names should tell the story.
   ```rust
   // GOOD: Variables explain the scenario
   let container_pid = std::process::id();
   let container_root = format!("/proc/{}/root", container_pid);
   let file_inside_container = format!("{}/etc", container_root);
   let is_inside_container = canonical.starts_with(&container_root);
   
   // BAD: Generic names require mental mapping
   let p = std::process::id();
   let r = format!("/proc/{}/root", p);
   let f = format!("{}/etc", r);
   ```

3. **Tests are copy-able usage examples**: A user should be able to copy test code directly.
   ```rust
   #[test]
   fn reading_container_file_from_host() {
       let container_pid = std::process::id();
       let container_root = format!("/proc/{}/root", container_pid);
       let file_inside_container = format!("{}/etc", container_root);
   
       let canonical_path = canonicalize(file_inside_container).unwrap();
   
       assert!(canonical_path.starts_with(&container_root));
   }
   ```

4. **Show contrast with std when relevant**: Make the difference obvious.
   ```rust
   let our_result = canonicalize(path).unwrap();
   let std_result = std::fs::canonicalize(path).unwrap();
   
   assert_eq!(std_result, PathBuf::from("/"));           // std breaks it
   assert_eq!(our_result, PathBuf::from("/proc/self/root")); // we fix it
   ```

5. **Minimal comments**: If you need a comment to explain what code does, rename the variables instead.
   ```rust
   // BAD: Comment explains what code should say
   let r = canonicalize(p)?;  // Returns the preserved namespace path
   
   // GOOD: Code speaks for itself
   let preserved_namespace_path = canonicalize(container_file)?;
   ```

6. **Section headers organize test modules**: Group related tests under clear headers.
   ```rust
   // ==========================================================================
   // USAGE EXAMPLES: How to use this crate for container monitoring
   // ==========================================================================
   
   // ==========================================================================
   // ERROR CASES: What happens with invalid input
   // ==========================================================================
   ```

#### Why This Matters

- Tests are documentation that can't lie (they compile and run)
- New contributors learn the API by reading tests
- Self-documenting tests don't rot when behavior changes (comments do)
- Copy-able examples reduce user friction

### Testing on Linux (Required)

Most functionality is Linux-specific. Test primarily on Linux or WSL:
```bash
# From WSL or Linux
cargo test --verbose
```

### Testing on Windows (Optional)

Windows tests verify the fallback behavior and optional dunce feature:
```powershell
cargo test --verbose
cargo test --features dunce --verbose
```

## Common Pitfalls (avoid)

- Breaking namespace detection: Don't change the `find_namespace_boundary` logic without thorough testing.
- Incorrect PID validation: PIDs must be numeric, "self", or "thread-self".
- Over-canonicalizing: Only preserve the namespace prefix; let std handle the actual path resolution within the namespace.
- Dropping error context: Don't return bare `io::Error` without meaningful context.

## PR Checklist (agent self-check)

- Normal path behavior unchanged and equal to `std::fs::canonicalize`.
- Namespace boundary preservation working correctly.
- All CI steps pass locally.
- New/changed logic covered by tests.
- Docs updated (README/lib.rs) if user-visible behavior changed.
- No new runtime dependencies; MSRV respected; no unstable features.

## Quick Commands

- Run all local CI: `bash ci-local.sh` or `.\ci-local.ps1`
- Tests (verbose): `cargo test --verbose`
- Tests on Linux (WSL): `wsl -e bash -ilc "cd /mnt/c/path/to/repo && cargo test"`
- Lints: `cargo clippy --all-targets --all-features -- -D warnings`
- Docs (warnings as errors): `RUSTDOCFLAGS='-D warnings' cargo doc --no-deps --all-features`

## Releasing (maintainers)

- Tag as `vX.Y.Z` to trigger publish and GitHub Release via workflows.
- Update `CHANGELOG.md` with clear, user-facing notes.

### CHANGELOG Guidelines

The CHANGELOG is for **users**, not developers. Only include changes that affect the public API or user experience.

**DO include:**
- Bug fixes that affected user-facing behavior
- New features or API additions
- Breaking changes or deprecations
- Security fixes
- Performance improvements users would notice

**DO NOT include:**
- Internal refactoring
- Test additions or changes
- Documentation updates (unless significant)
- CI/tooling changes
- Code style fixes
- Updates to AGENTS.md or other contributor docs

**Format:**
- Keep entries concise (1-2 sentences)
- Focus on *what changed* for the user, not *how* it was implemented
- Technical root cause details belong in commit messages or PR descriptions, not CHANGELOG

---

If anything in this guide appears to conflict with the existing tests, treat the tests as the source of truth and open an issue to correct the guide.