<div align="center">

<img src="assets/Eclipse-logo.png" alt="Eclipse logo" width="240" />

# Eclipse


**Local-first metadata sanitization for documents and images.**  
One Rust CLI to silence the gossip, keep the payload, and leave your files intact. The GitHub repository is named [Eclipse](https://github.com/Karmanya03/Eclipse). The crates.io package stays `eclipse-sanitizer` because `eclipse` is already taken, and the binary keeps the simpler `eclipse` name because even tools deserve a stage name.

![release](https://img.shields.io/badge/release-v0.1.1-f97316?style=flat)
![crates.io](https://img.shields.io/crates/v/eclipse-sanitizer?style=flat&logo=rust&label=crates.io)
![docs.rs](https://img.shields.io/docsrs/eclipse-sanitizer?style=flat&logo=rust&label=docs.rs)
![written in Rust](https://img.shields.io/badge/written%20in-Rust-ff7f00?style=flat&logo=rust&logoColor=white)
![mode](https://img.shields.io/badge/mode-local--first-111827?style=flat)
![audit](https://img.shields.io/badge/audit-JSON-8b5cf6?style=flat)
![platform](https://img.shields.io/badge/platform-Windows%20%7C%20macOS%20%7C%20Linux-2563eb?style=flat)

[What is this](#what-is-this) | [Install](#install) | [Update](#update) | [Uninstall](#uninstall) | [Quick Start](#quick-start) | [CLI Options](#cli-options) | [Commands](#commands) | [Supported Files](#supported-files) | [Security](#security) | [Audit Log](#audit-log) | [Project Structure](#project-structure) | [FAQ](#faq) | [License](#license)

</div>

---

> Metadata has one job: stay in its lane. Eclipse gives it the boot.

## What is this


Eclipse removes common metadata from files without mangling the actual content. It walks directories recursively, can rewrite files in place or into a separate output directory, and compares SHA-256 hashes before and after processing so the sanitized file does not quietly go feral.

In plain English: it is the digital equivalent of telling the metadata to sit down, shut up, and stop posting on main.

## Supported Files


| Type | Extensions | Notes |
| --- | --- | --- |
| PDF | `pdf` | Strips trailer and document metadata fields where possible. |
| OOXML | `docx`, `docm`, `dotx`, `dotm`, `xlsx`, `xlsm`, `xltx`, `xltm`, `pptx`, `pptm`, `potx`, `potm` | Rewrites archives and removes document properties XML. |
| PNG | `png` | Removes metadata chunks such as text and time chunks. |
| JPEG | `jpg`, `jpeg` | Removes common metadata segments like EXIF, XMP, IPTC, and comments. |
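
To make "removes metadata chunks" a little more concrete for PNG, here is a minimal sketch using only the standard library. It is not Eclipse's actual code: the function name `strip_png_metadata` and the exact chunk list (`tEXt`, `zTXt`, `iTXt`, `tIME`) are assumptions made for illustration. It relies on the standard PNG layout of an 8-byte signature followed by length-prefixed chunks.

```rust
/// The fixed 8-byte signature that starts every PNG file.
const PNG_SIGNATURE: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];

/// Chunk types treated as metadata in this sketch (text and time chunks).
/// Assumption: the real sanitizer may use a different list.
const METADATA_CHUNKS: [&[u8; 4]; 4] = [b"tEXt", b"zTXt", b"iTXt", b"tIME"];

/// Copy a PNG byte stream, skipping metadata chunks.
/// Returns the cleaned bytes plus the names of the removed chunks,
/// or `None` if the input is not a well-formed PNG chunk stream.
fn strip_png_metadata(input: &[u8]) -> Option<(Vec<u8>, Vec<String>)> {
    if !input.starts_with(&PNG_SIGNATURE) {
        return None;
    }
    let mut output = PNG_SIGNATURE.to_vec();
    let mut removed = Vec::new();
    let mut pos = PNG_SIGNATURE.len();

    while pos < input.len() {
        // Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        let header = input.get(pos..pos.checked_add(8)?)?;
        let length = u32::from_be_bytes([header[0], header[1], header[2], header[3]]) as usize;
        let kind: [u8; 4] = [header[4], header[5], header[6], header[7]];
        let end = pos.checked_add(12)?.checked_add(length)?;
        let chunk = input.get(pos..end)?;

        if METADATA_CHUNKS.iter().any(|m| **m == kind) {
            removed.push(String::from_utf8_lossy(&kind).into_owned());
        } else {
            // Chunks are copied whole, so their existing CRCs stay valid.
            output.extend_from_slice(chunk);
        }
        pos = end;
    }
    Some((output, removed))
}
```

Because whole chunks are either kept or dropped, nothing inside the image data is re-encoded, which is the general idea behind keeping the payload intact.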

## Install


### Prerequisites


- Rust stable toolchain
- Cargo

If you do not already have Rust installed, use [rustup](https://rustup.rs/) first. It is the least dramatic way to get Cargo on your machine.

### From crates.io


```powershell
cargo install eclipse-sanitizer
```

That is the one-liner if you want the published build and would rather not flirt with `cargo build` on a Friday.

### From source


```powershell
git clone <REPO_URL>
cd Eclipse
cargo build --release
```

### From a local path


```powershell
cargo install --path .
```

That installs the `eclipse` binary into your Cargo bin directory, because the binary should still have a respectable name even if the package name got longer.

## Update


If you installed Eclipse from crates.io, update it with the one-liner:

```powershell
cargo install eclipse-sanitizer --force
```

If you installed from the local source tree, update it with:

```powershell
cargo install --path . --force
```

If you are developing from the repository itself, the usual update flow is:

```powershell
git pull
cargo build --release
```

## Uninstall


If installed with Cargo from crates.io, remove it with:

```powershell
cargo uninstall eclipse-sanitizer
```

If you built from source without installing globally, remove the project directory or the compiled binary in `target/release`.

## Quick Start


1. Build or install Eclipse.
2. Point it at a file or folder.
3. Use `--dry-run` first if you enjoy not being ambushed by your own filesystem.

Run the binary against either a single file or a directory:

```powershell
eclipse <INPUT>
```

Write sanitized output to a separate folder:

```powershell
eclipse <INPUT> --output <OUTPUT_DIR>
```

Preview changes without writing anything:

```powershell
eclipse <INPUT> --dry-run
```

Override the worker thread count:

```powershell
eclipse <INPUT> --jobs 8
```

Example: sanitize a folder into a new destination with four workers.

```powershell
eclipse .\sample-files --output .\sanitized --jobs 4
```

Example: preview a single file and keep your hands clean.

```powershell
eclipse .\report.pdf --dry-run
```

## CLI Options


| Option | Description |
| --- | --- |
| `INPUT` | File or directory to scan. |
| `-o, --output <OUTPUT_DIR>` | Write sanitized output to a separate directory. If omitted, files are rewritten in place. |
| `--jobs <THREADS>` | Override the Rayon worker thread count. Use this when you want to tell the scheduler how hard to flex. |
| `--dry-run` | Report what would be removed without writing output. |

## Commands


### Runtime Commands


| Command | What It Does |
| --- | --- |
| `eclipse <INPUT>` | Scan and sanitize a file or directory in place. |
| `eclipse <INPUT> --output <OUTPUT_DIR>` | Sanitize into a separate output directory while preserving relative paths. |
| `eclipse <INPUT> --dry-run` | Show the planned changes without touching files. |
| `eclipse <INPUT> --jobs <THREADS>` | Run with a custom worker pool size. |
| `eclipse <INPUT> --output <OUTPUT_DIR> --dry-run` | Preview output mapping and metadata removal together. |
| `eclipse <INPUT> --output <OUTPUT_DIR> --jobs <THREADS>` | Full-speed output mode with a custom thread count. |
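
"Preserving relative paths" in `--output` mode can be pictured as stripping the input root from each discovered file and re-rooting the remainder under the output directory. The sketch below shows that idea with the standard library only. The helper name `map_output_path` and the single-file fallback are assumptions, not Eclipse's documented behavior.

```rust
use std::path::{Path, PathBuf};

/// Map a file found under `input_root` to its destination under `output_root`,
/// keeping the path relative to the input root.
/// Returns `None` if `file` is not located under `input_root`.
fn map_output_path(input_root: &Path, output_root: &Path, file: &Path) -> Option<PathBuf> {
    let relative = file.strip_prefix(input_root).ok()?;
    if relative.as_os_str().is_empty() {
        // `input_root` was the file itself: keep just the file name.
        return file.file_name().map(|name| output_root.join(name));
    }
    Some(output_root.join(relative))
}
```

With an input root of `sample-files` and an output root of `sanitized`, a file at `sample-files/reports/q1.pdf` would map to `sanitized/reports/q1.pdf` under this scheme.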

### Cargo Commands


| Command | What It Does |
| --- | --- |
| `cargo build` | Build the project in debug mode. |
| `cargo build --release` | Build an optimized release binary. |
| `cargo run -- --help` | Show the CLI help screen. |
| `cargo test` | Run the unit tests for the sanitizers. |
| `cargo check` | Validate the code compiles without building the final binary. |
| `cargo install eclipse-sanitizer` | Install the published package from crates.io. |
| `cargo install eclipse-sanitizer --force` | Update the published package from crates.io. |
| `cargo uninstall eclipse-sanitizer` | Remove the published package from your machine. |
| `cargo install --path .` | Install Eclipse locally as a Cargo binary. |
| `cargo install --path . --force` | Update a local Cargo installation. |

## Security


Eclipse uses a defensive workflow rather than a hopeful one.

- Files are written through a temporary file first, then moved into place atomically.
- SHA-256 hashes are calculated before and after processing.
- The program verifies the persisted file hash before considering the job successful.
- `--dry-run` avoids writes entirely.
- Interrupt handling is wired so CTRL+C stops further processing cleanly.
- Audit output is kept separate from the files being sanitized.

This is not a “trust me bro” pipeline. It double-checks its own homework and then asks the compiler to sign it in blood.
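
For the curious, the write-to-temp-then-rename pattern from the list above looks roughly like this. It is a minimal sketch under stated assumptions, not the code in `src/app.rs`: the function name `write_atomically` and the `.tmp` suffix are hypothetical, and a direct byte comparison stands in for the SHA-256 check so the example needs no external crates.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Write `bytes` to `dest` through a temporary sibling file, then rename it
/// into place and confirm that what landed on disk matches what was written.
fn write_atomically(dest: &Path, bytes: &[u8]) -> io::Result<()> {
    // Hypothetical naming scheme: "<file name>.tmp" in the same directory,
    // so the rename stays on one filesystem.
    let mut tmp_name = dest
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "destination has no file name"))?
        .to_os_string();
    tmp_name.push(".tmp");
    let tmp_path = dest.with_file_name(tmp_name);

    let mut tmp = File::create(&tmp_path)?;
    tmp.write_all(bytes)?;
    tmp.sync_all()?; // flush file contents to disk before the rename
    drop(tmp); // close the handle so the rename also works on Windows

    fs::rename(&tmp_path, dest)?;

    // Stand-in for the hash check: compare the persisted bytes directly.
    let persisted = fs::read(dest)?;
    if persisted.as_slice() != bytes {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "persisted file does not match"));
    }
    Ok(())
}
```

Keeping the temporary file next to the destination matters because a rename is generally only atomic within a single filesystem.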

## Audit Log


Each run writes structured audit events to `.eclipse_audit.log` in the output directory when one is used. If no output directory is configured, the log is written beside the input data when possible.

The audit stream records details such as:

- timestamp
- status
- file kind
- source path
- destination path
- original hash
- sanitized hash
- removed items
- errors, when present
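
The exact schema of an audit event is not documented here, so treat the following as a hedged illustration of how such a record could be modeled and rendered as one JSON line. Every field name, the `AuditEvent` type, and the hand-rolled serializer are assumptions made for this sketch; a real implementation would more likely lean on a serialization crate.

```rust
/// Hypothetical shape of one audit event. Field names mirror the list above
/// but are illustrative, not Eclipse's actual schema.
struct AuditEvent {
    timestamp: String,
    status: String,
    file_kind: String,
    source_path: String,
    destination_path: Option<String>,
    original_hash: String,
    sanitized_hash: Option<String>,
    removed_items: Vec<String>,
    error: Option<String>,
}

/// Quote and escape a string as a JSON string literal.
fn json_string(value: &str) -> String {
    let mut out = String::from("\"");
    for c in value.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Render an optional string as a JSON string or `null`.
fn json_optional(value: &Option<String>) -> String {
    match value {
        Some(text) => json_string(text),
        None => "null".to_string(),
    }
}

impl AuditEvent {
    /// Render the event as one line of JSON.
    fn to_json_line(&self) -> String {
        let removed: Vec<String> = self.removed_items.iter().map(|item| json_string(item)).collect();
        let fields = [
            format!("\"timestamp\":{}", json_string(&self.timestamp)),
            format!("\"status\":{}", json_string(&self.status)),
            format!("\"file_kind\":{}", json_string(&self.file_kind)),
            format!("\"source_path\":{}", json_string(&self.source_path)),
            format!("\"destination_path\":{}", json_optional(&self.destination_path)),
            format!("\"original_hash\":{}", json_string(&self.original_hash)),
            format!("\"sanitized_hash\":{}", json_optional(&self.sanitized_hash)),
            format!("\"removed_items\":[{}]", removed.join(",")),
            format!("\"error\":{}", json_optional(&self.error)),
        ];
        format!("{{{}}}", fields.join(","))
    }
}
```

The optional fields reflect the prose above: a dry run or a failed file may have no destination or sanitized hash, and errors only appear when something went wrong.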

## Project Structure


| File | Purpose |
| --- | --- |
| [src/main.rs](src/main.rs) | Application entry point. |
| [src/cli.rs](src/cli.rs) | Clap CLI definition. |
| [src/app.rs](src/app.rs) | Discovery, orchestration, reporting, and persistence. |
| [src/models.rs](src/models.rs) | Shared file and run models. |
| [src/hashing.rs](src/hashing.rs) | SHA-256 hashing helper. |
| [src/audit.rs](src/audit.rs) | Tracing-based audit logger. |
| [src/sanitizers/mod.rs](src/sanitizers/mod.rs) | Sanitizer trait and registry. |
| [src/sanitizers/pdf.rs](src/sanitizers/pdf.rs) | PDF sanitization logic. |
| [src/sanitizers/ooxml.rs](src/sanitizers/ooxml.rs) | OOXML archive rewriting. |
| [src/sanitizers/png.rs](src/sanitizers/png.rs) | PNG metadata stripping. |
| [src/sanitizers/jpeg.rs](src/sanitizers/jpeg.rs) | JPEG metadata stripping. |

## FAQ


### No supported files found?


Double-check the extensions. Eclipse only processes files it knows how to sanitize, not every random blob with confidence issues.
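
If you want a mental model of that filter, extension-based detection for the table in Supported Files could look like the sketch below. The `FileKind` enum, the `classify` function, and the case-insensitive matching are assumptions for illustration; the README does not say whether Eclipse treats `.PDF` and `.pdf` alike.

```rust
use std::path::Path;

/// Hypothetical file categories, one per row of the Supported Files table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FileKind {
    Pdf,
    Ooxml,
    Png,
    Jpeg,
}

/// Classify a path by its extension, or return `None` for unsupported files.
/// Assumption: matching is case-insensitive.
fn classify(path: &Path) -> Option<FileKind> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "pdf" => Some(FileKind::Pdf),
        "docx" | "docm" | "dotx" | "dotm" | "xlsx" | "xlsm" | "xltx" | "xltm" | "pptx"
        | "pptm" | "potx" | "potm" => Some(FileKind::Ooxml),
        "png" => Some(FileKind::Png),
        "jpg" | "jpeg" => Some(FileKind::Jpeg),
        _ => None,
    }
}
```

Under this scheme a file with no extension, or with one outside the table, is simply skipped rather than guessed at.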

### Output directory must not be the same as the input directory?


That is a safety guard. Point `--output` somewhere else so the sanitizer does not politely eat its own lunch.

### One or more files failed to sanitize?


Inspect the console output and `.eclipse_audit.log` for the exact file and error. Some files may be malformed, cursed, or just plain rude to parse.

### Build or test fails?


Try these commands in order:

```powershell
cargo check
cargo test
cargo run -- --help
```

## License


MIT License. See [LICENSE](LICENSE) for the legally boring but necessary bits.