<div align="center">
<img src="assets/Eclipse-logo.png" alt="Eclipse logo" width="240" />

# Eclipse

**Local-first metadata sanitization for documents and images.**

One Rust CLI to silence the gossip, keep the payload, and leave your files intact. The GitHub repo is now [Eclipse](https://github.com/Karmanya03/Eclipse), the crates.io package stays `eclipse-sanitizer` because `eclipse` is already taken, and the binary keeps the simpler `eclipse` name because even tools deserve a stage name.

</div>

---

> Metadata has one job: stay in its lane. Eclipse gives it the boot.

## What is this
Eclipse is built to remove common metadata from files without mangling the actual content. It works recursively over directories, can rewrite files in place or into a separate output directory, and checks hashes before and after persistence so the sanitized file does not quietly go feral.

In plain English: it is the digital equivalent of telling the metadata to sit down, shut up, and stop posting on main.
## Supported Files
| Format | Extensions | What Eclipse does |
| --- | --- | --- |
| PDF | `pdf` | Strips trailer and document metadata fields where possible. |
| OOXML | `docx`, `docm`, `dotx`, `dotm`, `xlsx`, `xlsm`, `xltx`, `xltm`, `pptx`, `pptm`, `potx`, `potm` | Rewrites archives and removes document properties XML. |
| PNG | `png` | Removes metadata chunks such as text and time chunks. |
| JPEG | `jpg`, `jpeg` | Removes common metadata segments like EXIF, XMP, IPTC, and comments. |
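
To make the PNG row concrete, here is a minimal sketch of chunk-level stripping using only the standard library. It is not Eclipse's actual sanitizer: the function name `strip_png_metadata` and the exact list of chunk types (`tEXt`, `zTXt`, `iTXt`, `tIME`) are assumptions picked to match "text and time chunks". It does not validate CRCs.

```rust
/// The 8 bytes every PNG file starts with.
const PNG_SIGNATURE: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];

/// Chunk types treated as metadata in this sketch (an assumed list).
const METADATA_CHUNKS: [&[u8; 4]; 4] = [b"tEXt", b"zTXt", b"iTXt", b"tIME"];

/// Returns the PNG bytes without metadata chunks, plus the names of the
/// chunks that were dropped. Returns `None` when the input is not a PNG
/// or a chunk runs past the end of the data.
fn strip_png_metadata(input: &[u8]) -> Option<(Vec<u8>, Vec<String>)> {
    if !input.starts_with(&PNG_SIGNATURE) {
        return None;
    }
    let mut out = PNG_SIGNATURE.to_vec();
    let mut removed = Vec::new();
    let mut pos = PNG_SIGNATURE.len();

    while pos < input.len() {
        // Chunk layout: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        let header = input.get(pos..pos.checked_add(8)?)?;
        let len = u32::from_be_bytes([header[0], header[1], header[2], header[3]]) as usize;
        let kind = [header[4], header[5], header[6], header[7]];
        let end = pos.checked_add(12)?.checked_add(len)?;
        let chunk = input.get(pos..end)?;

        if METADATA_CHUNKS.iter().any(|m| **m == kind) {
            removed.push(String::from_utf8_lossy(&kind).into_owned());
        } else {
            // Kept chunks are copied verbatim, so their CRCs stay valid.
            out.extend_from_slice(chunk);
        }
        pos = end;
    }
    Some((out, removed))
}
```

Because each chunk carries its own CRC over its type and data, dropping a whole chunk leaves the remaining chunks untouched, which is why the image payload survives byte for byte.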
## Install
### Prerequisites
- Rust stable toolchain
- Cargo

If you do not already have Rust installed, use [rustup](https://rustup.rs/) first. It is the least dramatic way to get Cargo on your machine.
### From crates.io
```powershell
cargo install eclipse-sanitizer
```
That is the one-liner if you want the published build and would rather not flirt with `cargo build` on a Friday.
### From source
```powershell
git clone <REPO_URL>
cd Eclipse
cargo build --release
```
### From a local path
```powershell
cargo install --path .
```
That installs the `eclipse` binary into your Cargo bin directory, because the binary should still have a respectable name even if the package name got longer.
## Update
If you installed Eclipse from crates.io, update it with the one-liner:
```powershell
cargo install eclipse-sanitizer --force
```
If you installed from the local source tree, update it with:
```powershell
cargo install --path . --force
```
If you are developing from the repository itself, the usual update flow is:
```powershell
git pull
cargo build --release
```
## Uninstall
If installed with Cargo from crates.io, remove it with:
```powershell
cargo uninstall eclipse-sanitizer
```
If you built from source without installing globally, remove the project directory or the compiled binary in `target/release`.
## Quick Start
1. Build or install Eclipse.
2. Point it at a file or folder.
3. Use `--dry-run` first if you enjoy not being ambushed by your own filesystem.

Run the binary against either a single file or a directory:
```powershell
eclipse <INPUT>
```
Write sanitized output to a separate folder:
```powershell
eclipse <INPUT> --output <OUTPUT_DIR>
```
Preview changes without writing anything:
```powershell
eclipse <INPUT> --dry-run
```
Override the worker thread count:
```powershell
eclipse <INPUT> --jobs 8
```
Example: sanitize a folder into a new destination with four workers.
```powershell
eclipse .\sample-files --output .\sanitized --jobs 4
```
Example: preview a single file and keep your hands clean.
```powershell
eclipse .\report.pdf --dry-run
```
## CLI Options
| Option | Description |
| --- | --- |
| `INPUT` | File or directory to scan. |
| `-o, --output <OUTPUT_DIR>` | Write sanitized output to a separate directory. If omitted, files are rewritten in place. |
| `--jobs <THREADS>` | Override the Rayon worker thread count. Use this when you want to tell the scheduler how hard to flex. |
| `--dry-run` | Report what would be removed without writing output. |
## Commands
### Runtime Commands
| Command | Description |
| --- | --- |
| `eclipse <INPUT>` | Scan and sanitize a file or directory in place. |
| `eclipse <INPUT> --output <OUTPUT_DIR>` | Sanitize into a separate output directory while preserving relative paths. |
| `eclipse <INPUT> --dry-run` | Show the planned changes without touching files. |
| `eclipse <INPUT> --jobs <THREADS>` | Run with a custom worker pool size. |
| `eclipse <INPUT> --output <OUTPUT_DIR> --dry-run` | Preview output mapping and metadata removal together. |
| `eclipse <INPUT> --output <OUTPUT_DIR> --jobs <THREADS>` | Full-speed output mode with a custom thread count. |
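
"Preserving relative paths" can be pictured with a small sketch like the one below. The function name `map_output_path` and the single-file fallback are assumptions for illustration, not a description of how `src/app.rs` actually does it.

```rust
use std::path::{Path, PathBuf};

/// Maps `file` (found under `input_root`) to its destination under
/// `output_dir`, keeping the path relative to the input root.
/// Returns `None` when `file` does not live under `input_root`.
fn map_output_path(input_root: &Path, file: &Path, output_dir: &Path) -> Option<PathBuf> {
    // A single-file input has no directory structure to keep: use the file name.
    if input_root == file {
        return file.file_name().map(|name| output_dir.join(name));
    }
    // Otherwise drop the input root prefix and re-attach the rest under the output directory.
    let relative = file.strip_prefix(input_root).ok()?;
    Some(output_dir.join(relative))
}
```

Under these assumptions, `docs/a/b.pdf` scanned from `docs` with `--output out` would land at `out/a/b.pdf`.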
### Cargo Commands
| Command | Description |
| --- | --- |
| `cargo build` | Build the project in debug mode. |
| `cargo build --release` | Build an optimized release binary. |
| `cargo run -- --help` | Show the CLI help screen. |
| `cargo test` | Run the unit tests for the sanitizers. |
| `cargo check` | Validate the code compiles without building the final binary. |
| `cargo install eclipse-sanitizer` | Install the published package from crates.io. |
| `cargo install eclipse-sanitizer --force` | Update the published package from crates.io. |
| `cargo uninstall eclipse-sanitizer` | Remove the published package from your machine. |
| `cargo install --path .` | Install Eclipse locally as a Cargo binary. |
| `cargo install --path . --force` | Update a local Cargo installation. |
## Security
Eclipse uses a defensive workflow rather than a hopeful one.
- Files are written through a temporary file first, then moved into place atomically.
- SHA-256 hashes are calculated before and after processing.
- The program verifies the persisted file hash before considering the job successful.
- `--dry-run` avoids writes entirely.
- Interrupt handling is wired so CTRL+C stops further processing cleanly.
- Audit output is kept separate from the files being sanitized.

This is not a “trust me bro” pipeline. It double-checks its own homework and then asks the compiler to sign it in blood.
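
The write-then-move step from the list above might look roughly like this sketch, which uses only the standard library. The function name `persist_verified` and the `.eclipse-tmp` suffix are hypothetical. The byte-for-byte read-back is a stand-in for the SHA-256 comparison, since the standard library does not ship SHA-256.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Writes `bytes` to `dest` through a sibling temporary file and a rename,
/// then re-reads the result to confirm it matches what was written.
fn persist_verified(dest: &Path, bytes: &[u8]) -> io::Result<()> {
    // Hypothetical temp name. Keeping it next to `dest` keeps the rename
    // on a single filesystem.
    let mut tmp_name = dest
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "destination has no file name"))?
        .to_os_string();
    tmp_name.push(".eclipse-tmp");
    let tmp = dest.with_file_name(tmp_name);

    // Write everything to the temporary file and flush it to disk first.
    let mut file = File::create(&tmp)?;
    file.write_all(bytes)?;
    file.sync_all()?;
    drop(file);

    // Move the finished file into place, replacing any previous version.
    fs::rename(&tmp, dest)?;

    // Stand-in for the hash check: confirm the persisted bytes are the ones we wrote.
    if fs::read(dest)? != bytes {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            "persisted file does not match sanitized bytes",
        ));
    }
    Ok(())
}
```

The point of the temporary file is that a crash or CTRL+C mid-write leaves a stray temp file behind rather than a half-written original.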
## Audit Log
Each run writes structured audit events to `.eclipse_audit.log` in the output directory when one is used. If no output directory is configured, the log is written beside the input data when possible.
The audit stream records details such as:
- timestamp
- status
- file kind
- source path
- destination path
- original hash
- sanitized hash
- removed items
- errors, when present
## Project Structure
| File | Purpose |
| --- | --- |
| [src/main.rs](src/main.rs) | Application entry point. |
| [src/cli.rs](src/cli.rs) | Clap CLI definition. |
| [src/app.rs](src/app.rs) | Discovery, orchestration, reporting, and persistence. |
| [src/models.rs](src/models.rs) | Shared file and run models. |
| [src/hashing.rs](src/hashing.rs) | SHA-256 hashing helper. |
| [src/audit.rs](src/audit.rs) | Tracing-based audit logger. |
| [src/sanitizers/mod.rs](src/sanitizers/mod.rs) | Sanitizer trait and registry. |
| [src/sanitizers/pdf.rs](src/sanitizers/pdf.rs) | PDF sanitization logic. |
| [src/sanitizers/ooxml.rs](src/sanitizers/ooxml.rs) | OOXML archive rewriting. |
| [src/sanitizers/png.rs](src/sanitizers/png.rs) | PNG metadata stripping. |
| [src/sanitizers/jpeg.rs](src/sanitizers/jpeg.rs) | JPEG metadata stripping. |
## FAQ
### No supported files found?
Double-check the extensions. Eclipse only processes files it knows how to sanitize, not every random blob with confidence issues.
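
For a rough idea of what "knows how to sanitize" means, here is a sketch of extension-based classification built from the Supported Files table. The `FileKind` enum and `classify` function are hypothetical names, and the case-insensitive match is an assumption. Eclipse's real registry lives in `src/sanitizers/mod.rs` and may differ.

```rust
use std::path::Path;

/// The four file families from the Supported Files table.
#[derive(Debug, PartialEq, Clone, Copy)]
enum FileKind {
    Pdf,
    Ooxml,
    Png,
    Jpeg,
}

/// Picks a file kind from the extension alone, ignoring ASCII case.
/// Returns `None` for missing or unsupported extensions.
fn classify(path: &Path) -> Option<FileKind> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "pdf" => Some(FileKind::Pdf),
        "docx" | "docm" | "dotx" | "dotm" | "xlsx" | "xlsm" | "xltx" | "xltm" | "pptx"
        | "pptm" | "potx" | "potm" => Some(FileKind::Ooxml),
        "png" => Some(FileKind::Png),
        "jpg" | "jpeg" => Some(FileKind::Jpeg),
        _ => None,
    }
}
```

A folder full of `.tiff` or `.doc` files would come back empty under this mapping, which is the situation this FAQ entry describes.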
### Output directory must not be the same as the input directory?
That is a safety guard. Point `--output` somewhere else so the sanitizer does not politely eat its own lunch.
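
One plausible way to implement a guard like this is to compare canonical paths, sketched below. How Eclipse actually performs the comparison is not documented here, so treat `is_same_location` as a hypothetical helper.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns true when `input` and `output` resolve to the same location on disk.
/// Both paths must exist, because `canonicalize` resolves them through the filesystem.
fn is_same_location(input: &Path, output: &Path) -> io::Result<bool> {
    // Canonical paths have symlinks, `.` and `..` resolved, so
    // `./files` and `files/../files` compare equal.
    Ok(fs::canonicalize(input)? == fs::canonicalize(output)?)
}
```

A check along these lines catches the sneaky cases where two different-looking paths point at the same folder.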
### One or more files failed to sanitize?
Inspect the console output and `.eclipse_audit.log` for the exact file and error. Some files may be malformed, cursed, or just plain rude to parse.
### Build or test fails?
Try these commands in order:
```powershell
cargo check
cargo test
cargo run -- --help
```
## License
MIT License. See [LICENSE](LICENSE) for the legally boring but necessary bits.