velka 1.4.0

The Code Sin Judge - AST-powered secret scanner and security analyzer
Documentation
# Contributing to Velka

Thank you for your interest in contributing to Velka!

## Development Setup

1. Install Rust (1.70+): https://rustup.rs/
2. Clone the repository
3. Run tests: `cargo test`
4. Run lints: `cargo clippy`

## Code Standards

### Rust Style
- Run `cargo fmt` before committing
- All code must pass `cargo clippy -- -D warnings`
- No `unwrap()` or `expect()` in production code (use `?` or proper error handling)
- Use `anyhow` for binary errors, `thiserror` for library errors

### Commit Messages
- Use conventional commits: `feat:`, `fix:`, `docs:`, `refactor:`, `test:`, `chore:`
- Keep the first line under 72 characters
- Reference issues when applicable: `fix: handle empty files (#123)`

### Branching and Protected Branches

- **Default branch** (e.g. `master` or `main`): protected; changes only via Pull Request.
- **Feature work:** create a branch from the default, then open a PR:
  - `git checkout -b feat/my-feature` (or `fix/`, `docs/`, `chore/`)
  - Make changes, push, open PR. Merge after CI passes and review (if any).

**GitHub branch protection (recommended):**  
Repo → **Settings → Branches → Add rule** for `master` (or `main`):

- Require a pull request before merging (optional: require 1 approval).
- Require status checks to pass: e.g. `check`, `test`, `fmt`, `clippy`, `audit`, `build`.
- Do not allow bypassing the above (optional: allow maintainers to bypass).
- Save.

### Pull Requests
1. Fork the repository (or create a branch if you have write access)
2. Create a feature branch: `git checkout -b feat/my-feature`
3. Make your changes
4. Run tests: `cargo test`
5. Run lints: `cargo clippy`
6. Push and create a PR targeting the default branch
7. Wait for CI to pass, then merge

### Adding a New Detection Rule

Adding a new rule to Velka involves 4 files. Here's the complete workflow:

#### Step 1: Define the rule in `src/engine/rules.rs`

Add an entry to the `RULES` static array:

```rust
Rule {
    id: "MY_PROVIDER_KEY",
    description: "My Provider API key detected",
    pattern: define_regex!(r"(?i)my_provider[_-]?key\s*[:=]\s*['\"]?([a-zA-Z0-9]{32,})['\"]?"),
    severity: Severity::Mortal,        // Mortal = credential, Venial = PII
    expected_len: Some((32, 64)),       // expected token length range
    required_prefix: Some("mpk_"),     // known prefix (or None)
    charset: Some("alphanum"),          // "alphanum", "base64", "hex"
},
```

Key fields:
- `id`: SCREAMING_SNAKE_CASE, unique identifier
- `pattern`: regex that captures the secret value in group 1
- `severity`: `Mortal` for credentials/keys, `Venial` for PII
- `expected_len`, `required_prefix`, `charset`: used by ML classifier for structural scoring

#### Step 2: Add structural validation in `src/engine/structural_validators.rs`

If your rule has a check digit or structural constraint (like CPF, IBAN), add a match arm in `validate_for_rule`:

```rust
"MY_PROVIDER_KEY" => Some(validate_my_provider(snippet)),
```

Then implement the validation function. If the rule has no mathematical validation (just pattern matching), skip this step.

#### Step 3: Add integration tests in `tests/integration_test.rs`

Every rule needs at least 3 tests:

```rust
#[test]
fn test_my_provider_key_detected() {
    let code = r#"MY_PROVIDER_KEY=mpk_a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6"#;
    let sins = scan_str(code).unwrap();
    assert!(!sins.is_empty());
    assert_eq!(sins[0].rule_id, "MY_PROVIDER_KEY");
}

#[test]
fn test_my_provider_key_invalid_rejected() {
    let code = r#"MY_PROVIDER_KEY=mpk_short"#;  // too short
    let sins = scan_str(code).unwrap();
    assert!(sins.is_empty());
}

#[test]
fn test_my_provider_key_placeholder_rejected() {
    let code = r#"MY_PROVIDER_KEY=mpk_00000000000000000000000000000000"#;
    let sins = scan_str(code).unwrap();
    assert!(sins.is_empty());
}
```

#### Step 4: Document in `README.md`

Add the new rule to the detection rules table in the README.

### Security Considerations
- Never log or store actual secret values
- Redaction must be enabled by default
- Error messages must not leak sensitive paths
- All paths must be validated before scanning

## Testing
- Unit tests: `cargo test`
- Integration tests: `cargo test --test '*'`
- Run Velka on itself: `cargo run -- scan .`

## Benchmarks
- Run all: `cargo bench`
- Run only cache benchmarks: `cargo bench scan_1000_files_cache`
- Run a specific bench (e.g. 1000 files): `cargo bench scan_1000_files`
- Benchmarks are in `benches/scan_bench.rs` (throughput with cache off; cache cold vs cache hit for 1000 files).

## Versioning

- **Single source of truth**: `version` in `Cargo.toml`.
- **Release tag**: must be `vX.Y.Z` (e.g. `v1.2.0`) and must match `Cargo.toml`. The publish workflow fails if they differ.
- **Semi-automatic bump** (optional): install [cargo-release]https://github.com/crate-ci/cargo-release, then:
  - `cargo release patch` (1.2.0 → 1.2.1) or `cargo release minor` (1.2.0 → 1.3.0) or `cargo release major`
  - This bumps `Cargo.toml`, commits, creates tag `vX.Y.Z`, and pushes. Then create a **GitHub Release** from that tag to trigger publish.

## Publishing to crates.io

### Automated (recommended)

1. In this repo: **Settings → Secrets and variables → Actions** → add secret `CRATES_IO_TOKEN` (create token at https://crates.io/settings/tokens).
2. Bump `version` in `Cargo.toml` (or use `cargo release patch`), commit and push.
3. Create a **GitHub Release** with tag `vX.Y.Z` (e.g. `v1.2.0`) **matching** the version in `Cargo.toml`.
4. The workflow **Publish to crates.io** runs on release, verifies tag ↔ Cargo.toml, then runs `cargo publish`. You can also trigger it manually: **Actions → Publish to crates.io → Run workflow**.

### Manual

1. Ensure all tests pass: `cargo test`
2. Run `cargo publish --dry-run` to verify the package
3. Log in: `cargo login` (get token from crates.io)
4. Publish: `cargo publish`
5. Docs are built automatically at https://docs.rs/velka

## Questions?

Open an issue for any questions or discussions.