eth-id 0.1.0 - Docs.rs

# ETH.id Privacy Architecture

## Version 1.0.0

---

## Core Privacy Principle

**The document never leaves your machine.**

ETH.id processes documents locally and sends only the minimum necessary information to answer your verification question. This document explains exactly what data is sent to LLM providers in each scenario.

---

## Privacy Filter Modes

The Privacy Filter operates in three modes depending on the claim type:

### 1. Virtualization Mode

**Used for**: Age verification, date-based claims

**How it works**:
1. Document is parsed locally
2. Birth date is extracted
3. Age is calculated locally
4. **Only the calculation result is sent to the LLM**

**Example**:

```
Document contains: "Data de Nascimento: 15/03/1990"
Claim: "Is this person over 18 years old?"

What is sent to LLM:
"Age calculation result: true"

What is NOT sent:
- Birth date (15/03/1990)
- Exact current age (e.g., 35 as of 2026-02-24)
- Any other document content
```

**Privacy guarantee**: The birth date never leaves your machine. The LLM receives only a boolean result.
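The local steps above can be sketched in a few lines of standard-library Rust. This is a minimal sketch and not ETH.id's actual parser. The function names `parse_ddmmyyyy`, `age_on` and `virtualize_age_claim` are hypothetical. "Today" is passed in explicitly rather than read from the system clock, so the result is reproducible.

```rust
/// Parse a "DD/MM/YYYY" date (as printed on Brazilian IDs) into (year, month, day).
/// Hypothetical helper; it does not check days-per-month.
fn parse_ddmmyyyy(s: &str) -> Option<(i32, u32, u32)> {
    let mut parts = s.trim().split('/');
    let day: u32 = parts.next()?.parse().ok()?;
    let month: u32 = parts.next()?.parse().ok()?;
    let year: i32 = parts.next()?.parse().ok()?;
    if parts.next().is_some() || !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return None;
    }
    Some((year, month, day))
}

/// Whole years between `birth` and `today`, both given as (year, month, day).
fn age_on(birth: (i32, u32, u32), today: (i32, u32, u32)) -> i32 {
    let mut age = today.0 - birth.0;
    // One year less if this year's birthday has not arrived yet.
    if (today.1, today.2) < (birth.1, birth.2) {
        age -= 1;
    }
    age
}

/// Virtualization: the birth date and exact age stay inside this function;
/// only the boolean outcome is formatted for the LLM.
fn virtualize_age_claim(birth_date: &str, min_age: i32, today: (i32, u32, u32)) -> Option<String> {
    let birth = parse_ddmmyyyy(birth_date)?;
    let over_min = age_on(birth, today) >= min_age;
    Some(format!("Age calculation result: {}", over_min))
}
```

Take the sample birth date: `virtualize_age_claim("15/03/1990", 18, (2026, 2, 24))` yields the string `Age calculation result: true`. The computed age (35 on that date) never leaves the function.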

---

### 2. Hash Partial Mode

**Used for**: Identity verification (CPF, RG, etc.)

**How it works**:
1. Document is parsed locally
2. Sensitive field is extracted
3. Field is masked/hashed
4. **Only the masked version is sent to the LLM**

**Example**:

```
Document contains: "CPF: 123.456.789-00"
Claim: "Does the CPF match 123.456.789-00?"

What is sent to LLM:
"CPF: 123.***.***-00"

What is NOT sent:
- Full CPF digits
- Any other document content
```

**Privacy guarantee**: Only the first 3 and last 2 digits are exposed. The middle 6 digits never leave your machine.
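A masking step that reproduces the example output could look like the sketch below. `mask_cpf` is a hypothetical helper that only masks and does not hash. It accepts a CPF with or without punctuation and rejects any input that does not contain exactly 11 digits.

```rust
/// Hypothetical Hash Partial masking for a CPF such as "123.456.789-00".
/// Keeps the first 3 and last 2 digits and replaces the middle 6 with '*'.
fn mask_cpf(cpf: &str) -> Option<String> {
    // Ignore dots and dashes; work on the digits only.
    let digits: Vec<char> = cpf.chars().filter(|c| c.is_ascii_digit()).collect();
    // A CPF has exactly 11 digits (9 base digits + 2 check digits).
    if digits.len() != 11 {
        return None;
    }
    let first: String = digits[..3].iter().collect();
    let last: String = digits[9..].iter().collect();
    Some(format!("{}.***.***-{}", first, last))
}
```

The masked string is what would go into `filtered_data`. The middle digits exist only in the local `digits` vector.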

---

### 3. Minimization Mode

**Used for**: Amount verification, signature verification, field presence

**How it works**:
1. Document is parsed locally
2. Only relevant fields are extracted
3. **Only those specific fields are sent to the LLM**

**Example**:

```
Document contains:
"Nome: João Silva
CPF: 123.456.789-00
Renda: R$ 8.500,00
Endereço: Rua X, 123"

Claim: "Is the income above R$ 5,000?"

What is sent to LLM:
"amount: 8500.00"

What is NOT sent:
- Name
- CPF
- Address
- Any other fields
```

**Privacy guarantee**: Only claim-relevant fields are sent. Unrelated data never leaves your machine.
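The extraction step can be sketched as a line scan that keeps one field and ignores the rest. Several details are assumptions taken from the example, not ETH.id's real extractor: the field label `Renda`, the `Key: value` line layout and the `minimize_income` name. The amount is normalized from Brazilian notation (`R$ 8.500,00`) to `8500.00`.

```rust
/// Hypothetical minimization step: scan "Key: value" lines, keep only the
/// income field ("Renda"), and emit it as "amount: <value>". Name, CPF,
/// address and every other line are skipped without being copied anywhere.
fn minimize_income(document_text: &str) -> Option<String> {
    for line in document_text.lines() {
        let Some((key, value)) = line.split_once(':') else { continue };
        if key.trim() != "Renda" {
            continue;
        }
        // "R$ 8.500,00" -> "8500.00": drop the currency symbol, remove the
        // thousands separators ('.'), and turn the decimal comma into a point.
        let normalized: String = value
            .trim()
            .trim_start_matches("R$")
            .trim()
            .replace('.', "")
            .replace(',', ".");
        // f64 is fine for formatting this example; real code might use integer cents.
        let amount: f64 = normalized.parse().ok()?;
        return Some(format!("amount: {:.2}", amount));
    }
    None
}
```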

---

## What is Sent to LLM Providers

### For Each Claim Type

#### Age Verification
```json
{
  "filtered_data": "Age calculation result: true",
  "claim_context": "Virtualized age verification"
}
```

#### CPF Verification
```json
{
  "filtered_data": "CPF: 123.***.***-00",
  "claim_context": "Verify CPF identity claim"
}
```

#### Amount Verification
```json
{
  "filtered_data": "amount: 8500.00",
  "claim_context": "Minimal context for Amount claim"
}
```

#### Signature Verification
```json
{
  "filtered_data": "signature: present\nsigner_count: 2",
  "claim_context": "Minimal context for Signature claim"
}
```
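Serializing these payloads needs correct JSON string escaping, notably for the newline in the signature example. The dependency-free sketch below uses a hypothetical `FilteredPayload` type with the two fields shown above. A real crate would more likely derive `serde::Serialize` and use `serde_json`.

```rust
/// Hypothetical payload type mirroring the JSON examples above.
struct FilteredPayload<'a> {
    filtered_data: &'a str,
    claim_context: &'a str,
}

/// Minimal JSON string escaping: quotes, backslashes and control characters.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Any other control character becomes a \uXXXX escape.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl FilteredPayload<'_> {
    /// Serialize to a compact JSON object with exactly these two keys.
    fn to_json(&self) -> String {
        format!(
            "{{\"filtered_data\":\"{}\",\"claim_context\":\"{}\"}}",
            json_escape(self.filtered_data),
            json_escape(self.claim_context)
        )
    }
}
```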

---

## Offline Mode

When you use the `--offline` flag:

1. **No network calls are made**
2. **Only ZK proofs are used**
3. **LLM verification is disabled**
4. **Document processing is 100% local**

```bash
eth verify --doc passport.pdf --claim "over 18 years old" --offline --zk-only
```

**Privacy guarantee**: Zero data leaves your machine. Complete air-gap operation.
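One way to enforce these rules is to check the requested backend against the flags before any request is built. The `Backend` and `Flags` types and the `check_backend` function are hypothetical. The sketch encodes only the behavior described above. `--offline` forbids every network call, including calls to a local Ollama server, since LLM verification is disabled. It also assumes that `--zk-only` allows nothing but ZK proofs.

```rust
/// Hypothetical verification backends.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Backend {
    Zk,          // local zero-knowledge proof, no network
    RemoteLlm,   // OpenAI / Claude over HTTPS
    LocalOllama, // Ollama over HTTP on localhost:11434
}

/// Hypothetical mirror of the `--offline` and `--zk-only` CLI flags.
struct Flags {
    offline: bool,
    zk_only: bool,
}

/// True if the backend opens a socket of any kind (even to localhost).
fn uses_network(backend: Backend) -> bool {
    matches!(backend, Backend::RemoteLlm | Backend::LocalOllama)
}

/// Reject a backend the flags rule out, before any request is built.
fn check_backend(flags: &Flags, requested: Backend) -> Result<Backend, String> {
    if flags.offline && uses_network(requested) {
        return Err(format!("{:?} needs the network, but --offline is set", requested));
    }
    if flags.zk_only && requested != Backend::Zk {
        return Err(format!("{:?} is not a ZK backend, but --zk-only is set", requested));
    }
    Ok(requested)
}
```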

---

## Debug Mode

Use `--debug` to see exactly what will be sent to the LLM:

```bash
eth verify --doc id.pdf --claim "over 18" --debug
```

**Output**:
```
📊 Privacy Filter Output:
{
  "content": "Age calculation result: true",
  "metadata": {
    "mode": "Virtualization",
    "fields_included": ["date_calculation"],
    "fields_masked": [],
    "original_hash": "a3f2...",
    "filtered_hash": "b8e1..."
  },
  "claim_context": "Virtualized age verification"
}
```

**Recommendation**: Always use `--debug` on your first verification to understand what data is being sent.

---

## What is Stored Locally

### Audit Log (`~/.eth-id/audit/audit.json`)

```json
{
  "session_id": "uuid-v4",
  "timestamp": "2026-02-24T15:30:00Z",
  "document_path": "/path/to/doc.pdf",
  "document_hash": "sha256_hash_64_chars",
  "claim": "Is this person over 18 years old?",
  "result": true,
  "confidence": 1.0,
  "proof_type": "zk"
}
```

**What is NOT stored**:
- Document content
- Extracted fields
- Birth dates, CPFs, or any other sensitive data

The document itself is identified only by its SHA-256 hash (alongside its file path).

---

### Attestation Bundle (`~/.eth-id/attestations/`)

```json
{
  "version": "1.0.0",
  "session_id": "uuid-v4",
  "timestamp": "2026-02-24T15:30:00Z",
  "document_hash": "sha256_hash",
  "claim": "Is this person over 18 years old?",
  "result": {
    "answer": true,
    "confidence": 1.0,
    "reasoning": "Zero-Knowledge proof verified"
  },
  "proof_type": {
    "ZeroKnowledge": {
      "circuit": "age_check",
      "proof": "zk_proof_data"
    }
  },
  "bundle_hash": "sha256_of_bundle"
}
```

**What is NOT stored**:
- Document content
- Birth date or any other extracted fields

Only the document hash and the verification result are recorded.

---

## Memory Safety

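ETH.id uses the Rust `zeroize` crate to securely erase sensitive data from memory:

```rust
use std::collections::HashMap;
use zeroize::Zeroize;

// Assumed layout for illustration: extracted fields plus the raw document text.
struct ParsedDocument {
    fields: HashMap<String, String>,
    raw_text: String,
}

impl Drop for ParsedDocument {
    fn drop(&mut self) {
        // Overwrite every extracted field value with zeros
        for value in self.fields.values_mut() {
            value.zeroize();
        }
        // Overwrite the full raw text of the document
        self.raw_text.zeroize();
    }
}
```

**Privacy guarantee**: When a `ParsedDocument` is dropped at the end of processing, its field values and raw text are overwritten with zeros before the memory is freed.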

---

## Network Traffic Analysis

### With LLM Provider (OpenAI/Claude)

**Outbound**:
- HTTPS POST to api.openai.com or api.anthropic.com
- Payload: Filtered data only (see examples above)
- Headers: API key, content-type

**Inbound**:
- JSON response with boolean answer and confidence
- No document data in response

### With Ollama (Local)

**Outbound**:
- HTTP POST to localhost:11434
- Payload: Filtered data (same as above)
- No external network traffic

**Inbound**:
- JSON response from local Ollama instance
- All processing happens on your machine
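You may want to confirm that a configured endpoint cannot reach the outside network. A pre-flight check can resolve the endpoint and require every resolved address to be loopback. The `is_loopback_only` helper below is a hypothetical sketch built on `std::net::ToSocketAddrs`. It is not an ETH.id option.

```rust
use std::net::{SocketAddr, ToSocketAddrs};

/// Hypothetical pre-flight check: true only if `endpoint` ("host:port")
/// resolves to at least one address and every resolved address is loopback.
fn is_loopback_only(endpoint: &str) -> bool {
    match endpoint.to_socket_addrs() {
        Ok(addrs) => {
            let addrs: Vec<SocketAddr> = addrs.collect();
            !addrs.is_empty() && addrs.iter().all(|a| a.ip().is_loopback())
        }
        // Unresolvable or malformed endpoints are treated as unsafe.
        Err(_) => false,
    }
}
```

IP literals such as `127.0.0.1:11434` are parsed directly. Host names such as `localhost:11434` go through the system resolver, so the result depends on its configuration.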

---

## Comparison with Traditional KYC

### Traditional KYC
```
User → Upload full document → KYC Provider
                ↓
         Store in database
                ↓
         Process and verify
                ↓
         Return result
```

**Privacy**: The provider keeps a copy of the full document, potentially indefinitely.

### ETH.id
```
User → Parse locally → Privacy Filter → Send minimal data → LLM
                                                ↓
                                         Return boolean
                                                ↓
                                    Discard filtered data
```

**Privacy**: Provider never sees full document. Filtered data is ephemeral.
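The ETH.id flow above can be sketched as one function. It hands the LLM only what the filter returns, then drops both the full text and the filtered data once the answer is in. `verify` is a hypothetical name. The Privacy Filter and the LLM backend are modeled as closures to keep the sketch dependency-free. Note that `drop` frees memory but does not zero it.

```rust
/// Hypothetical end-to-end flow for one claim.
/// `filter` stands in for the Privacy Filter, `ask_llm` for the configured backend.
fn verify<F, L>(document_text: String, filter: F, ask_llm: L) -> Option<bool>
where
    F: FnOnce(&str) -> Option<String>,
    L: FnOnce(&str) -> bool,
{
    // Filter locally; only the filter's output is kept.
    let filtered = filter(&document_text)?;
    // Release the full text before any backend call (frees, does not zero).
    drop(document_text);
    // The backend sees only the filtered string.
    let answer = ask_llm(&filtered);
    // Discard the filtered data; only the boolean leaves this function.
    drop(filtered);
    Some(answer)
}
```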

---

## Privacy Best Practices

### Maximum Privacy
1. Use `--offline --zk-only` for all deterministic claims
2. Run Ollama locally for semantic claims
3. Use `--debug` to audit filtered data
4. Avoid sharing attestation bundles, since each one includes the document hash

### Balanced Privacy
1. Use OpenAI/Claude for convenience
2. Review `--debug` output periodically
3. Use ZK for age/amount, LLM for signatures
4. Keep audit logs private

### Compliance Mode
1. Generate attestations for all verifications
2. Export audit logs for compliance review
3. Use deterministic ZK proofs for legal requirements
4. Document Privacy Filter mode for each claim type

---

## Privacy Guarantees Summary

| Claim Type | Data Sent | Privacy Level | Offline Capable |
|------------|-----------|---------------|-----------------|
| Age verification | Calculation result only | ★★★★★ | Yes (ZK) |
| CPF verification | Masked format | ★★★★☆ | No |
| Amount verification | Single field value | ★★★☆☆ | Yes (ZK) |
| Signature verification | Presence + count | ★★★★☆ | No |
| Field presence | Field name only | ★★★★★ | Yes (ZK) |
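The table's mapping from claim type to filter mode and offline capability can be written as exhaustive `match` expressions. With that structure, adding a claim type without choosing a mode fails to compile. The enum and function names are hypothetical. The mappings follow the "Used for" lines of each Privacy Filter mode and the "Offline Capable" column.

```rust
/// Hypothetical claim categories from the summary table.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ClaimType { Age, Cpf, Amount, Signature, FieldPresence }

/// The three Privacy Filter modes.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FilterMode { Virtualization, HashPartial, Minimization }

/// Privacy Filter mode per claim type; exhaustive, so a new ClaimType
/// variant will not compile until it is assigned a mode.
fn filter_mode(claim: ClaimType) -> FilterMode {
    match claim {
        ClaimType::Age => FilterMode::Virtualization,
        ClaimType::Cpf => FilterMode::HashPartial,
        ClaimType::Amount | ClaimType::Signature | ClaimType::FieldPresence => {
            FilterMode::Minimization
        }
    }
}

/// "Offline Capable" column: claims that have a ZK path.
fn offline_capable(claim: ClaimType) -> bool {
    match claim {
        ClaimType::Age | ClaimType::Amount | ClaimType::FieldPresence => true,
        ClaimType::Cpf | ClaimType::Signature => false,
    }
}
```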

---

## Frequently Asked Questions

### Q: Can the LLM provider see my full document?
**A**: No. The Privacy Filter ensures only minimal, claim-relevant data is sent.

### Q: What if I don't trust any LLM provider?
**A**: Run Ollama locally so filtered data never leaves your machine, or use `--offline --zk-only` for complete isolation with no network calls at all.

### Q: Is my document stored anywhere?
**A**: No. Documents are processed in memory only and never written to disk by ETH.id.

### Q: Can someone reconstruct my document from audit logs?
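**A**: No. Audit logs identify the document only by its SHA-256 hash (plus its file path). SHA-256 is a one-way function, so the document content cannot be reconstructed from the hash.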

### Q: What happens to filtered data after verification?
**A**: It's discarded immediately. Only the boolean result is kept.

### Q: Can I verify what data is sent before it's sent?
**A**: Yes. Use the `--debug` flag to see the exact filtered data before the LLM call.

---

## Privacy Contact

For privacy questions: privacy@eth.id (placeholder)

---

## Version History

- **1.0.0** (2026-02-24): Initial privacy documentation