# Corky Functional Specification
> Language-independent specification for the corky email sync and collaboration tool.
> This document captures the exact behavior a port must reproduce.
## 1. Overview
Corky syncs email threads from IMAP providers into a flat directory of Markdown files,
supports AI-assisted drafting, manages mailbox sharing via git submodules or plain directories,
and pushes routing intelligence to Cloudflare.
## 2. Data Directory
### 2.1 Layout
```
{data_dir}/
  conversations/                   # One .md file per thread
  drafts/                          # Outgoing email drafts
  contacts/                        # Per-contact context
    {name}/
      AGENTS.md
      CLAUDE.md -> AGENTS.md
  mailboxes/                       # Named mailboxes (plain dirs or git submodules)
    {name}/
      conversations/
      drafts/
      contacts/
      AGENTS.md
      CLAUDE.md -> AGENTS.md
      README.md
      voice.md
      .gitignore
  social/                          # Social media drafts (YAML frontmatter + body)
    {YYYYMMDD-HHMMSS-platform}.md
  profiles.toml                    # Social media profile registry
  manifest.toml                    # Thread index (generated by sync)
  .sync-state.json                 # IMAP + contact sync state
```
### 2.2 Resolution Order
The data directory is resolved at runtime in this order:
1. `mail/` directory in current working directory (developer workflow)
2. `CORKY_DATA` environment variable
3. App config mailbox (see §2.4)
4. `~/Documents/mail` (hardcoded fallback)
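
The resolution order above can be sketched as a pure function. This is a minimal illustration, not the reference implementation: `resolve_data_dir` and its parameters are hypothetical names, the app config lookup (§2.4) is passed in as an already-resolved path, and treating an empty `CORKY_DATA` as unset is an assumption.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_data_dir(
    cwd: Path,
    env: Mapping[str, str],
    app_config_mailbox: Optional[Path],
    home: Path,
) -> Path:
    """Return the data directory using the four-step order (hypothetical helper)."""
    local = cwd / "mail"
    if local.is_dir():                    # 1. mail/ in the working directory
        return local
    if env.get("CORKY_DATA"):             # 2. environment variable (empty = unset, assumed)
        return Path(env["CORKY_DATA"])
    if app_config_mailbox is not None:    # 3. mailbox from app config
        return app_config_mailbox
    return home / "Documents" / "mail"    # 4. hardcoded fallback
```

Passing `cwd`, `env`, and `home` explicitly keeps the function easy to test; a caller would typically supply `Path.cwd()`, `os.environ`, and `Path.home()`.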
### 2.3 Config Directory
Config always lives inside the data directory (`mail/`).
Config files: `.corky.toml`, `voice.md`, `credentials.json`
### 2.4 App Config
Location: `{platformdirs.user_config_dir("corky")}/config.toml`
- Linux: `~/.config/corky/config.toml`
- macOS: `~/Library/Application Support/corky/config.toml`
- Windows: `%APPDATA%/corky/config.toml`
Stores named mailboxes (data directory references) and a default. Used in resolution step 3.
Mailbox resolution (when no explicit name given):
1. `default_mailbox` set → use that mailbox
2. Exactly one mailbox → use it implicitly
3. Multiple mailboxes, no default → error with list
4. No mailboxes → return None (fall through to step 4 of the §2.2 resolution order)
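
A sketch of the mailbox resolution rules above, assuming the app config has already been parsed into a dict shaped like §3.6. The function name and the exception type are illustrative only.

```python
from typing import Optional


class AmbiguousMailboxError(Exception):
    """Raised when several mailboxes exist and no default is set (hypothetical type)."""


def resolve_mailbox(config: dict) -> Optional[str]:
    """Return the mailbox name to use when none was given explicitly."""
    mailboxes = config.get("mailboxes", {})
    default = config.get("default_mailbox")
    if default:                        # 1. explicit default
        return default
    if len(mailboxes) == 1:            # 2. single mailbox, used implicitly
        return next(iter(mailboxes))
    if len(mailboxes) > 1:             # 3. ambiguous: report the candidates
        raise AmbiguousMailboxError(", ".join(sorted(mailboxes)))
    return None                        # 4. nothing configured
```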
### 2.5 Directory Naming Convention
| Directory | Purpose | Rationale |
|---|---|---|
| `mail/` | Corky data root | Short, familiar, Unix precedent (`maildir`, `/var/mail`). Avoids `mailbox/mailboxes/` stutter. |
| `mailboxes/` | Collaborator namespace | Namespace boundary — prevents collisions between mailbox names and system dirs (`social/`, `contacts/`). |
| `social/` | Social media drafts | Channel-specific content at data root level. |
| `contacts/` | Contact registry | Shared across all mailboxes. |
**Why "mailbox" internally, "mail" as directory:**
- "Mailbox" is the correct abstraction — per-entity message collection (actor model, Hewitt 1973)
- `mail/` is shorthand for the data root; it doesn't need to carry the full semantic weight
- `mailbox/` was considered but creates `mailbox/mailboxes/` path stutter
- `comms/` rejected — militaristic tone clashes with "corky"
- `correspondence/` rejected — too long (15 chars)
- Flatten approaches rejected — namespace collisions between mailbox names and system dirs
**The `mailboxes/` subdir is load-bearing architecture**, not cosmetic. It prevents a collaborator named "social" or "contacts" from colliding with system directories. Do not flatten without solving the namespace problem first.
## 3. File Formats
### 3.1 Conversation Markdown
```markdown
# {Subject}
**Labels**: {label1}, {label2}
**Accounts**: {account1}, {account2}
**Thread ID**: {thread_key}
**Last updated**: {RFC 2822 date}
---
## {Sender Name} <{email}> — {RFC 2822 date}
**To**: {recipient1}, {recipient2}
**CC**: {cc1}
{Body text}
---
## {Reply sender} — {date}
{Body text}
```
Per-message `**To**:` and `**CC**:` lines are emitted after the message header when non-empty. Old files without these lines parse correctly (fields default to empty).
Metadata regex: `^\*\*(.+?)\*\*:\s*(.+)$` (multiline)
Message header regex: `^## (.+?) — (.+)$` (multiline, em dash U+2014)
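
To illustrate how the two regexes apply, here is a small sketch that runs them over a sample conversation. Splitting the file at the first `---` line so that per-message `**To**:` lines are not read as thread metadata is an assumption of this example, as are the sample values.

```python
import re

# Patterns copied from the spec; \u2014 is the em dash used in message headers.
METADATA_RE = re.compile(r"^\*\*(.+?)\*\*:\s*(.+)$", re.MULTILINE)
HEADER_RE = re.compile(r"^## (.+?) \u2014 (.+)$", re.MULTILINE)

SAMPLE = (
    "# Lunch plans\n"
    "**Labels**: correspondence\n"
    "**Thread ID**: lunch plans\n"
    "---\n"
    "## Alice <alice@example.com> \u2014 Mon, 01 Jan 2024 10:00:00 +0000\n"
    "**To**: bob@example.com\n"
    "Noon works.\n"
)

# Assumption: everything before the first separator is thread-level metadata.
head, _, body = SAMPLE.partition("\n---\n")
metadata = dict(METADATA_RE.findall(head))
headers = HEADER_RE.findall(body)  # list of (sender, date) pairs
```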
### 3.2 Draft Markdown
Drafts use YAML frontmatter with the subject as a Markdown heading in the body:
```markdown
---
to: alice@example.com
cc: bob@example.com
status: draft
author: Brian
account: personal
from: brian@example.com
in_reply_to: "<msg-id>"
scheduled_at: null
attachments:
- /tmp/screenshot.png
- ~/Documents/report.pdf
---
# Subject Line
Body text here.
```
Optional fields: `attachments` (list of file paths)
Required fields: `# Subject` heading (in body), `to`, `---` delimiters
Recommended fields: `status`, `author`
Status values: `draft` → `review` → `approved` → `scheduled` → `sent`
Valid send statuses (for draft push --send): `review`, `approved`, `scheduled`
**Legacy format:** The `**Key**: value` format is still supported for backward compatibility:
```markdown
# {Subject}
**To**: {recipient}
**CC**: {optional}
**Status**: draft
**Author**: {name}
**Account**: {optional — account name from .corky.toml}
**From**: {optional — email address, used to resolve account}
**In-Reply-To**: {optional — message ID}
---
{Draft body}
```
`corky draft migrate [--dry-run]` converts legacy drafts to YAML frontmatter format.
### 3.3 .corky.toml
```toml
[owner]
github_user = "username"
name = "Display Name"
[accounts.{name}]
password = "" # Inline password (not recommended)
password_cmd = "" # Shell command to retrieve password
labels = ["correspondence"]
imap_host = "" # Auto-filled by provider preset
imap_port = 993
imap_starttls = false
smtp_host = ""
smtp_port = 465
drafts_folder = "Drafts"
sync_days = 3650 # How far back to sync
default = false # Mark one account as default
[contacts.{name}]
emails = ["addr@example.com"]
shared_with = ["mailbox-name"] # Explicitly share with mailboxes (even without conversation match)
aliases = ["Display Name"] # Match sender names that don't slugify to the directory name
[routing]
for-alex = ["mailboxes/alex"]
shared = ["mailboxes/alice", "mailboxes/bob"]
[mailboxes.alex]
auto_send = false
[watch]
poll_interval = 300 # Seconds between polls
notify = false # Desktop notifications
[gmail]
client_id = "" # OAuth2 client ID for Gmail API
client_id_cmd = "" # Shell command (e.g. "pass corky/gmail/client_id")
client_secret = "" # OAuth2 client secret
client_secret_cmd = "" # Shell command (e.g. "pass corky/gmail/client_secret")
[[gmail.filters]]
label = "for-lucas" # Gmail label to apply (resolved to ID via API)
match = ["from"] # Match fields: "from", "to" (default: ["from"])
addresses = ["alice@example.com", "bob@example.com"]
forward_to = "" # Optional forwarding address
star = false # Add STARRED label
never_spam = false # Remove SPAM label
always_important = false # Add IMPORTANT label
[linkedin]
client_id = "" # Inline client ID
client_id_cmd = "" # Shell command (e.g. "pass corky/linkedin/client_id")
client_secret = "" # Inline client secret
client_secret_cmd = "" # Shell command (e.g. "pass corky/linkedin/client_secret")
```
Secret resolution order (shared `util::resolve_secret`):
1. Inline field value
2. `_cmd` field (shell command, capture stdout, strip trailing whitespace)
3. Error if both empty (LinkedIn credentials fall back to env vars before error — see §12.5)
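
The resolution order above might look like the following in Python. This is a sketch only: the real helper is `util::resolve_secret`, whose signature is not shown in this document, so the parameter names and the error type here are assumptions. The LinkedIn env-var fallback is not modeled.

```python
import subprocess


def resolve_secret(inline: str, cmd: str, name: str = "secret") -> str:
    """Resolve a secret from an inline value or a shell command (hypothetical signature)."""
    if inline:                      # 1. inline field value wins
        return inline
    if cmd:                         # 2. run the command and keep its stdout
        result = subprocess.run(
            cmd, shell=True, check=True, capture_output=True, text=True
        )
        return result.stdout.rstrip()   # strip trailing whitespace
    # 3. both empty
    raise ValueError(f"{name}: neither an inline value nor a _cmd is configured")
```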
Label scoping syntax: `account:label` (e.g. `"proton-dev:INBOX"`) binds a label to a specific account.
### 3.4 .sync-state.json
```json
{
  "accounts": {
    "{account_name}": {
      "labels": {
        "{label_name}": {
          "uidvalidity": 12345,
          "last_uid": 67890
        }
      }
    }
  },
  "contacts": {
    "{contact_name}": {
      "mailboxes": {
        "{mailbox_name}": "fnv1a_hash_hex"
      }
    }
  }
}
```
### 3.5 manifest.toml
```toml
[threads.{slug}]
subject = "Thread Subject"
thread_id = "thread key"
labels = ["label1", "label2"]
accounts = ["account1"]
last_updated = "RFC 2822 date"
contacts = ["contact-name"]
```
Generated after each sync by scanning conversation files and matching sender emails against `[contacts]` in `.corky.toml`.
### 3.6 config.toml (App Config)
```toml
default_mailbox = "personal"
[mailboxes.personal]
path = "~/Documents/mail"
[mailboxes.work]
path = "~/work/mail"
```
Top-level fields:
- `default_mailbox`: name of the default mailbox (set automatically to the first mailbox added)
Mailbox fields:
- `path`: absolute or `~`-relative path to a mail data directory
## 4. Algorithms
### 4.1 Thread Slug Generation
```
fn slugify(text: &str) -> String:
    text = text.to_lowercase()
    text = regex_replace(r"[^a-z0-9]+", "-", text)
    text = text.trim_matches('-')
    text = text[..min(60, text.len())]
    if text.is_empty(): return "untitled"
    return text
```
Slug collisions: If `{slug}.md` exists, try `{slug}-2.md`, `{slug}-3.md`, etc.
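
For reference, a runnable Python rendering of the pseudocode together with the collision rule. `unique_path` is a hypothetical helper name; the steps follow the pseudocode in the same order (lowercase, replace, trim, truncate, fallback).

```python
import re
from pathlib import Path


def slugify(text: str) -> str:
    """Lowercase, collapse non-alphanumerics to '-', trim, cap at 60 chars."""
    text = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    text = text[:60]
    return text or "untitled"


def unique_path(directory: Path, slug: str) -> Path:
    """Return {slug}.md, or {slug}-2.md, {slug}-3.md, ... if taken."""
    candidate = directory / f"{slug}.md"
    n = 2
    while candidate.exists():
        candidate = directory / f"{slug}-{n}.md"
        n += 1
    return candidate
```

Because truncation happens after trimming, a slug cut at 60 characters can end in a hyphen; the pseudocode has the same property.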
### 4.2 Thread Key Derivation
```
fn thread_key_from_subject(subject: &str) -> String:
    regex_replace(r"^(re|fwd?):\s*", "", subject.to_lowercase().trim())
```
Lowercases and trims the subject, then strips a single leading `Re:`, `Fw:`, or `Fwd:` prefix. Nested prefixes (`Re: Fwd: ...`) lose only the outermost layer.
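
The same derivation in Python, shown as a sketch to make the single-layer behavior concrete:

```python
import re

_PREFIX_RE = re.compile(r"^(re|fwd?):\s*")


def thread_key_from_subject(subject: str) -> str:
    """Lowercase and trim the subject, then drop one leading re:/fw:/fwd: prefix."""
    return _PREFIX_RE.sub("", subject.lower().strip(), count=1)
```

A subject such as `FWD: Re: Lunch` therefore yields the key `re: lunch`, which differs from the key of the original `Lunch` thread.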
### 4.3 Message Deduplication
Messages are deduplicated by `(from, date)` tuple. If both match an existing message in the thread, the message is skipped but labels/accounts metadata is still updated.
### 4.4 Multi-Source Accumulation
When the same thread is fetched from multiple labels or accounts:
- Labels are appended (no duplicates)
- Accounts are appended (no duplicates)
- Messages are merged and deduplicated
- Messages are sorted by parsed date
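
A compact sketch of deduplication (§4.3) and accumulation (§4.4) combined. The `Message` and `Thread` classes are simplified stand-ins invented for this example, and the sort assumes every date carries a timezone offset so that parsed datetimes are comparable.

```python
from dataclasses import dataclass, field
from email.utils import parsedate_to_datetime


@dataclass
class Message:
    sender: str
    date: str  # raw RFC 2822 date string
    body: str = ""


@dataclass
class Thread:
    labels: list = field(default_factory=list)
    accounts: list = field(default_factory=list)
    messages: list = field(default_factory=list)


def merge_message(thread: Thread, msg: Message, label: str, account: str) -> bool:
    """Merge one fetched message into a thread; return True if it was new."""
    # Labels and accounts accumulate even when the message is a duplicate.
    if label not in thread.labels:
        thread.labels.append(label)
    if account not in thread.accounts:
        thread.accounts.append(account)
    seen = {(m.sender, m.date) for m in thread.messages}
    if (msg.sender, msg.date) in seen:
        return False
    thread.messages.append(msg)
    thread.messages.sort(key=lambda m: parsedate_to_datetime(m.date))
    return True
```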
### 4.5 Label Routing
Labels in the `[routing]` section of `.corky.toml` route to configured mailbox directories.
Fan-out: one label can route to multiple mailboxes (array of paths).
Plain labels (no routing entry) route to `{data_dir}/conversations/`.
Routing values are paths like `mailboxes/{name}`, resolved relative to data_dir, with `/conversations/` appended.
Account:label syntax (`"proton-dev:INBOX"`):
- Only matches when syncing the named account
- The IMAP folder used is the part after the colon
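
The routing rules can be sketched as three small helpers. All function names are hypothetical, and `routing` stands for the parsed `[routing]` table. Splitting at the first colon is an assumption of this example.

```python
from pathlib import Path


def split_label(entry: str) -> tuple:
    """Split 'account:label' into (account, folder); account is None for plain labels."""
    account, sep, folder = entry.partition(":")
    return (account, folder) if sep else (None, entry)


def applies_to(entry: str, syncing_account: str) -> bool:
    """A scoped label only matches while its named account is being synced."""
    account, _ = split_label(entry)
    return account is None or account == syncing_account


def destinations(label: str, routing: dict, data_dir: Path) -> list:
    """Return every conversations/ directory a label's threads are written to."""
    targets = routing.get(label)
    if not targets:  # plain label: no routing entry
        return [data_dir / "conversations"]
    return [data_dir / target / "conversations" for target in targets]  # fan-out
```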
### 4.6 Manifest Generation
After sync, scan all `.md` files in `conversations/`:
1. Parse each file back into a Thread object
2. For each message, extract emails from `from`, `to`, and `cc` fields (`<email>` regex)
3. Match against `[contacts]` email→name mapping in `.corky.toml`
4. Write `manifest.toml` with thread metadata and matched contacts
A contact appears in the manifest if they sent, received, or were CC'd on any message in the thread.
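
Steps 2 and 3 could be sketched as follows. The exact `<email>` pattern and the case-insensitive comparison are assumptions of this example; messages are represented as plain dicts for brevity.

```python
import re

# Assumed form of the "<email>" regex: an address inside angle brackets.
EMAIL_RE = re.compile(r"<([^<>\s]+@[^<>\s]+)>")


def contacts_for_thread(messages: list, contacts: dict) -> list:
    """Return contact names whose emails appear in any from/to/cc field.

    messages: dicts with optional 'from', 'to', 'cc' strings.
    contacts: name -> {'emails': [...]}, as in [contacts.{name}].
    """
    by_email = {
        email.lower(): name
        for name, cfg in contacts.items()
        for email in cfg.get("emails", [])
    }
    matched = set()
    for msg in messages:
        for key in ("from", "to", "cc"):
            for email in EMAIL_RE.findall(msg.get(key, "")):
                name = by_email.get(email.lower())
                if name:
                    matched.add(name)
    return sorted(matched)
```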
## 5. Commands
### 5.1 init
```
corky init --user EMAIL [PATH] [--provider PROVIDER]
[--password-cmd CMD] [--labels LABEL,...] [--github-user USER]
[--name NAME] [--mailbox-name NAME] [--sync] [--force]
```
- `PATH`: project directory (default: `.` — current directory)
- Creates `{path}/mail/{conversations,drafts,contacts}/` with `.gitkeep` files
- Generates `.corky.toml` at `{path}/mail/`
- Installs `voice.md` at `{path}/mail/` if not present
- If inside a git repo: adds `mail` to `.gitignore`
- Installs the email skill to `.claude/skills/email/`
- Registers the project dir as a named mailbox in app config
- `--force`: overwrite existing config; without it, exit 1 if `.corky.toml` exists
- `--sync`: set `CORKY_DATA` env, run sync
- `--provider`: `gmail` (default), `protonmail-bridge`, `imap`
- `--labels`: default `correspondence` (comma-separated)
- `--mailbox-name`: mailbox name to register (default: `"default"`)
### 5.1.1 install-skill
```
corky install-skill NAME
```
- Install an agent skill into the current directory
- Currently supported: `email` (installs `.claude/skills/email/SKILL.md` and `README.md`)
- Skips files that already exist (never overwrites)
- Works from any directory (mailbox repos ship the skill automatically via `mb add`/`mb reset`)
### 5.2 sync
```
corky sync # incremental IMAP sync (default)
corky sync full # full IMAP resync (ignore saved state)
corky sync account NAME # sync one account
corky sync routes # apply routing to existing conversations
corky sync mailbox [NAME] # push/pull shared mailboxes
```
Bare `corky sync` runs incremental IMAP sync for all configured accounts.
Subcommands:
- `full`: ignore saved state, re-fetch all messages within `sync_days`
- `account NAME`: sync only the named account
- `routes`: apply `[routing]` rules to existing `conversations/*.md` files, copying matching threads into mailbox `conversations/` directories
- `mailbox [NAME]`: git push/pull shared mailbox repos (alias for `mailbox sync`)
Exit code: 0 on success.
### 5.3 sync-auth
```
corky sync-auth
```
Gmail OAuth setup. Requires `credentials.json` from Google Cloud Console.
Runs a local server on port 3000 for the OAuth callback.
Outputs the refresh token for `.env`.
### 5.4 list-folders
```
corky list-folders [ACCOUNT]
```
Without argument: lists available account names.
With argument: connects to IMAP and lists all folders with flags.
### 5.5 draft push
```
corky draft push FILE [--send]
corky mailbox draft push FILE [--send]
```
Alias: `corky push-draft` (hidden, backwards-compatible).
Default: creates a draft via IMAP APPEND to the drafts folder.
`--send`: sends via SMTP. Requires Status to be `review` or `approved`.
After sending, updates Status field in the file to `sent`.
**Attachments:** When `attachments` is present in YAML frontmatter, the email is sent as
`multipart/mixed` with the text body and binary attachment parts. Content-type is auto-detected
via `mime_guess` (falls back to `application/octet-stream`). File existence is validated at
send time, not draft creation time.
Account resolution for sending:
1. `account` frontmatter field (`**Account**` in legacy drafts) → match by name in `.corky.toml`
2. `from` frontmatter field (`**From**` in legacy drafts) → match by email address
3. Fall back to default account
### 5.6 add-label
```
corky add-label LABEL --account NAME
```
Text-level TOML edit to add a label to an account's `labels` array.
Preserves comments and formatting. Returns false if label already present.
### 5.7 contact-add (hidden alias)
```
corky contact-add NAME --email EMAIL [--email EMAIL2]
```
Hidden backward-compatible alias for `contact add`. The `--label` and `--account` flags are accepted but ignored.
### 5.8 watch
```
corky watch [--interval N]
```
IMAP polling daemon. Syncs all accounts, then pushes to shared mailboxes.
Desktop notifications on new messages if `notify = true` in `.corky.toml`.
Clean shutdown on SIGTERM/SIGINT.
### 5.9 audit-docs
```
corky audit-docs
```
Checks instruction files (AGENTS.md, README.md, SKILL.md) against codebase:
- Referenced paths exist on disk
- `uv run` scripts are registered
- Type conventions (msgspec, not dataclasses)
- Combined line budget (1000 lines max)
- Staleness (docs older than source)
### 5.10 help
```
corky help [FILTER]
corky --help
```
Shows command reference. Optional filter matches command names.
### 5.11 mailbox add
```
corky mailbox add NAME --label LABEL [--name NAME] [--github] [--pat] [--public] [--account ACCT] [--org ORG]
```
Alias: `corky mb add`
Without `--github`: creates a plain directory at `mailboxes/{name}/` with conversations/drafts/contacts subdirectories and template files (AGENTS.md, README.md, voice.md, .gitignore).
With `--github`: creates a private GitHub repo (`{org}/to-{name}`), initializes with template files, adds as git submodule at `mailboxes/{name}/`. Updates `.corky.toml`.
`--github`: use a git submodule instead of a plain directory
`--pat`: PAT-based access (prints instructions instead of GitHub collaborator invite)
`--public`: public repo visibility
`--org`: override GitHub org (default: owner's github_user)
### 5.12 mailbox sync
```
corky mailbox sync [NAME]
```
Alias: `corky mb sync`
For each mailbox (or one named): git pull --rebase, copy voice.md if newer, sync GitHub Actions workflow, bidirectional topic sync (§7.7), stage+commit+push local changes, update submodule ref in parent. Skips git ops for plain (non-submodule) directories.
### 5.13 mailbox status
```
corky mailbox status
```
Alias: `corky mb status`
Shows incoming/outgoing commit counts for each mailbox submodule.
### 5.14 mailbox remove
```
corky mailbox remove NAME [--delete-repo]
```
Alias: `corky mb remove`
For plain directories: `rm -rf mailboxes/{name}/`.
For submodules: `git submodule deinit -f`, `git rm`, clean up `.git/modules/{path}`.
Removes from `.corky.toml`.
`--delete-repo`: interactively confirms, then deletes GitHub repo via `gh`.
### 5.15 mailbox rename
```
corky mailbox rename OLD NEW [--rename-repo]
```
Alias: `corky mb rename`
Moves `mailboxes/{old}` to `mailboxes/{new}`. Uses `git mv` for submodules, `mv` for plain dirs.
Updates `.corky.toml`.
`--rename-repo`: also rename the GitHub repo via `gh repo rename`.
### 5.16 mailbox reset
```
corky mailbox reset [NAME] [--no-sync]
```
Alias: `corky mb reset`
Pull latest, regenerate all template files (AGENTS.md, README.md, CLAUDE.md symlink, .gitignore, voice.md, notify.yml) at `mailboxes/{name}/`, commit, push.
`--no-sync`: regenerate files without pull/push.
### 5.17 unanswered
```
corky unanswered [SCOPE] [--from NAME]
corky mailbox unanswered [SCOPE] [--from NAME]
```
Alias: `corky find-unanswered` (hidden, backwards-compatible).
Scans conversations for threads where the last message sender doesn't match `--from`.
Scope argument:
- Omitted → scan root `conversations/` + all `mailboxes/*/conversations/`
- `.` → root `conversations/` only
- `NAME` → `mailboxes/{name}/conversations/` only
`--from` resolution: CLI flag > `[owner] name` in `.corky.toml` > error.
Output is grouped by scope when scanning multiple directories.
Sender regex: `^## (.+?) —` (multiline, em dash)
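
A minimal sketch of the per-file check. How the last sender is compared with `--from` is not spelled out above; this example assumes a case-insensitive substring match, and the function name is illustrative.

```python
import re

# Sender regex from the spec; \u2014 is the em dash.
SENDER_RE = re.compile(r"^## (.+?) \u2014", re.MULTILINE)


def is_unanswered(conversation: str, owner: str) -> bool:
    """True when the last message in the thread was not sent by `owner`."""
    senders = SENDER_RE.findall(conversation)
    if not senders:
        return False  # no messages parsed, nothing to answer
    # Assumption: case-insensitive substring comparison against the header text.
    return owner.lower() not in senders[-1].lower()
```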
### 5.18 draft validate
```
corky draft validate [FILE|SCOPE...]
corky mailbox draft validate [FILE|SCOPE...]
```
Alias: `corky validate-draft` (hidden, backwards-compatible).
Validates draft files. Checks: subject heading, required fields (To), recommended fields (Status, Author), valid status value, `---` separator, non-empty body.
Scope argument (when no files given):
- Omitted → scan root `drafts/` + all `mailboxes/*/drafts/`
- `.` → root `drafts/` only
- `NAME` → `mailboxes/{name}/drafts/` only
Exit code: 0 if all valid, 1 if any errors.
### 5.19 mailbox list
```
corky mailbox list
```
Lists all registered mailboxes with paths. Marks the default mailbox. If no mailboxes configured, prints setup instructions.
### 5.20 Global `--mailbox` Flag
```
corky --mailbox NAME <subcommand> [args...]
```
Available on all commands. Resolves the named mailbox via app config and sets `CORKY_DATA` before dispatching to the subcommand.
### 5.21 draft new
```
corky draft new SUBJECT --to EMAIL [--cc EMAIL] [--account NAME] [--from EMAIL]
[--in-reply-to MSG-ID] [--mailbox NAME] [--attach FILE ...]
corky mailbox draft new SUBJECT --to EMAIL [...]
```
Scaffolds a new draft file with pre-filled metadata.
Output: creates `drafts/YYYY-MM-DD-{slug}.md` and prints the path.
- `--mailbox NAME`: create in `mailboxes/{name}/drafts/` instead of root `drafts/`
- `--cc`: CC recipient
- `--account`: sending account name from `.corky.toml`
- `--from`: sending email address
- `--in-reply-to`: message ID for threading
- `--attach`: file path to attach (repeatable)
- Author resolved from `[owner] name` in `.corky.toml`
- Slug collisions handled with `-2`, `-3` suffix (same as sync)
### 5.22 contact add
```
corky contact add NAME --email EMAIL [--email EMAIL2]
corky contact add --from SLUG [NAME]
```
Creates `{data_dir}/contacts/{name}/` with `AGENTS.md` template and `CLAUDE.md` symlink.
Updates `.corky.toml` with the contact's email addresses.
Manual mode (`--email`): requires `NAME` positional. Creates contact with default AGENTS.md.
From-conversation mode (`--from`):
1. Find `conversations/{slug}.md` or `mailboxes/*/conversations/{slug}.md`
2. Parse thread, extract non-owner participants from `from`, `to`, `cc` fields
3. Filter owner emails via `accounts.*.user` in `.corky.toml`
4. Single participant: auto-derive name from display name (slugified)
5. Multiple participants: require positional `NAME` to select one
6. Generate enriched AGENTS.md with pre-filled Topics, Formality, Tone, and Research sections
`--from` and `--email` conflict (clap `conflicts_with`).
### 5.23 contact info
```
corky contact info NAME
```
Aggregates and displays contact information:
1. Contact config from `.corky.toml` (emails)
2. `contacts/{name}/AGENTS.md` content
3. Matching threads from `manifest.toml` (root) and `mailboxes/*/manifest.toml`
4. Summary: thread count, last activity date
Threads are matched where the `contacts` array in manifest contains `NAME`.
### 5.24 contact sync
```
corky contact sync
```
Syncs `contacts/{name}/CLAUDE.md` between root contacts/ and each mailbox contacts/ directory.
**Eligibility (root → mailbox):** A contact syncs to a mailbox if either:
- **Conversation match:** a sender in `mailboxes/{mb}/conversations/*.md` slugifies to the contact name (or an alias)
- **Explicit sharing:** `[contacts.{name}].shared_with` includes the mailbox name
**Mailbox → root:** Always allowed (no eligibility check).
**Resolution:** 3-way merge via content hashes stored in `.sync-state.json`:
1. If only one side changed since last sync → take that side
2. If both changed → conflict: fall back to newest-wins by mtime + warning
3. First sync (no base hash) → mtime wins
Only `CLAUDE.md` is synced; `CLAUDE.local.md` and other files are skipped.
### 5.25 filter auth
```
corky filter auth [--account NAME]
```
Gmail OAuth2 authorization for filter management. Opens a browser for the authorization code flow, starts a local callback server on `127.0.0.1:8484`, and stores the token in the shared token store (keyed as `gmail:{account}` or `gmail:default`).
Required scopes: `gmail.settings.basic` (read/write filters), `gmail.labels` (list labels for name-to-ID resolution).
Client credentials resolved from `[gmail]` in `.corky.toml` (`client_id`/`client_id_cmd`, `client_secret`/`client_secret_cmd`) or env vars (`CORKY_GMAIL_CLIENT_ID`, `CORKY_GMAIL_CLIENT_SECRET`).
### 5.26 filter build
```
corky filter build [--input FILE] [--output FILE]
```
Generates `mailFilters.xml` (Gmail importable format) from filter definitions.
Without `--input`: reads `[[gmail.filters]]` from `.corky.toml`, writes `mailFilters.xml` to the data directory.
With `--input`: reads a standalone TOML file, writes `mailFilters.xml` next to it (or to `--output`).
### 5.27 filter pull
```
corky filter pull [--account NAME]
```
Fetches and displays all current Gmail filters via the Gmail Settings API (read-only). Shows criteria (from, to, subject, query) and actions (add/remove labels, forward) for each filter.
Requires a valid token from `corky filter auth`.
### 5.28 filter push
```
corky filter push [--account NAME] [--dry-run]
```
Pushes `[[gmail.filters]]` from `.corky.toml` to Gmail via the Settings API, replacing all existing filters.
Flow:
1. Load `[[gmail.filters]]` entries from `.corky.toml` (error if none defined)
2. Authenticate via stored token (auto-refreshes if expired)
3. Fetch Gmail label name-to-ID mapping (needed to resolve user label names to API IDs)
4. Convert each config filter to Gmail API format (addresses joined with `OR`, match fields default to `from`)
5. Fetch all existing Gmail filters
6. Delete all existing filters, then create new ones from config
With `--dry-run`: shows existing filters, what would be deleted, and what would be created — no changes made.
**Label resolution:** System labels (INBOX, STARRED, IMPORTANT, SENT, DRAFT, SPAM, TRASH, UNREAD, CATEGORY_*) resolve by name. User labels resolve via case-insensitive lookup against the fetched label map.
**Required OAuth scopes:** `gmail.settings.basic` + `gmail.labels`. If the token was issued before `gmail.labels` was added to the scope list, re-authenticate with `corky filter auth`.
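
Step 4 and the label resolution rule can be sketched as below. The output uses the Gmail API filter shape (`criteria` plus `action` with `addLabelIds`, `removeLabelIds`, `forward`). Uppercasing system label names and putting the same joined address list under every `match` field are assumptions of this example, not confirmed behavior.

```python
SYSTEM_LABELS = {"INBOX", "STARRED", "IMPORTANT", "SENT", "DRAFT", "SPAM", "TRASH", "UNREAD"}


def resolve_label(name: str, label_map: dict) -> str:
    """System labels resolve by name; user labels via case-insensitive lookup."""
    upper = name.upper()
    if upper in SYSTEM_LABELS or upper.startswith("CATEGORY_"):
        return upper
    by_lower = {key.lower(): value for key, value in label_map.items()}
    return by_lower[name.lower()]  # KeyError if the label does not exist


def to_api_filter(cfg: dict, label_map: dict) -> dict:
    """Convert one [[gmail.filters]] entry to a Gmail API filter body."""
    joined = " OR ".join(cfg["addresses"])
    criteria = {match_field: joined for match_field in cfg.get("match", ["from"])}
    add = [resolve_label(cfg["label"], label_map)]
    remove = []
    if cfg.get("star"):
        add.append("STARRED")
    if cfg.get("always_important"):
        add.append("IMPORTANT")
    if cfg.get("never_spam"):
        remove.append("SPAM")
    action = {"addLabelIds": add}
    if remove:
        action["removeLabelIds"] = remove
    if cfg.get("forward_to"):
        action["forward"] = cfg["forward_to"]
    return {"criteria": criteria, "action": action}
```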
### 5.29 filter check
```
corky filter check [--account NAME]
```
Read-only drift detection: compares `[[gmail.filters]]` in `.corky.toml` against live Gmail filters without making changes.
Flow:
1. Load local filters from `.corky.toml`, convert to API format
2. Fetch remote filters from Gmail (2 API calls: filters + labels)
3. Normalize both to comparable criteria signatures (from/to/query)
4. Report: filters in `.corky.toml` but not on Gmail, filters on Gmail but not in `.corky.toml`
**Watch integration:** `corky watch` runs this check hourly (same cadence as auto-upgrade). On drift, prints a warning suggesting `corky filter push`. Auth or config errors are silently ignored in watch mode.
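
Steps 3 and 4 of the check amount to a set difference over criteria signatures. In this sketch the normalization (trim and lowercase) is an assumption, and both inputs are lists of Gmail API filter dicts.

```python
def signature(api_filter: dict) -> tuple:
    """Reduce a filter to a comparable (from, to, query) signature."""
    criteria = api_filter.get("criteria", {})
    return tuple(criteria.get(key, "").strip().lower() for key in ("from", "to", "query"))


def drift(local: list, remote: list) -> tuple:
    """Return (missing_on_gmail, missing_in_config) as sorted signature lists."""
    local_sigs = {signature(f) for f in local}
    remote_sigs = {signature(f) for f in remote}
    return sorted(local_sigs - remote_sigs), sorted(remote_sigs - local_sigs)
```

Two empty lists mean the configuration and Gmail agree on criteria; actions are not compared in this sketch.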
### 5.30 transcribe
```
corky transcribe FILE [--model NAME] [--language CODE]
[--output FILE] [--speakers NAME,...]
[--diarize]
```
Transcribe an audio file to timestamped text using whisper-rs. Requires the `transcribe` feature flag (`transcribe-cuda` for GPU acceleration).
**Supported formats:** WAV, MP3, FLAC, OGG, M4A, AAC (via symphonia), AMR and others (via ffmpeg fallback).
**Audio pipeline:** All formats are decoded to 16 kHz mono f32 PCM. Multi-channel audio is averaged to mono. Resampling uses sinc interpolation via rubato.
**Model resolution:**
1. `--model` flag (e.g. `--model tiny.en`)
2. `[transcription] model` in `.corky.toml`
3. Default: `large-v3-turbo`
Models auto-download from HuggingFace to `~/.cache/corky/models/` (or custom path from `[transcription] model_path`).
**Output modes:**
1. **Plain** (no flags): `[HH:MM:SS.mmm --> HH:MM:SS.mmm] text`
2. **Speaker labels** (`--speakers "Ron,Brian"`): YAML frontmatter + bold speaker names with timestamps. Uses whisper's tdrz speaker turn detection.
3. **Diarization** (`--diarize`): Uses pyannote-rs (ONNX Runtime) for speaker segmentation and embedding-based clustering. More accurate than tdrz for mono phone recordings where both speakers share the same audio channel.
**Diarization pipeline** (requires `diarize` feature):
1. Run whisper transcription → timestamped text segments
2. Convert f32 audio to i16 for pyannote-rs
3. Run segmentation model (`segmentation-3.0.onnx`) → speech segments with timestamps
4. Extract speaker embeddings per segment (`wespeaker_en_voxceleb_CAM++.onnx`)
5. Cluster embeddings via cosine similarity (threshold: 0.5) → speaker IDs
6. Merge: for each whisper segment, find the diarized speaker with most temporal overlap
7. Label output with speaker names
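
Step 6 (assigning each whisper segment to the speaker with the most temporal overlap) can be sketched like this. Segments are plain tuples invented for the example, and summing overlap per speaker across several diarized segments is one reasonable reading of "most temporal overlap", not a confirmed detail.

```python
def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time ranges (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(whisper_segments: list, diarized_segments: list) -> list:
    """Label each (start, end, text) with the best-overlapping speaker ID.

    diarized_segments: (start, end, speaker_id) tuples.
    Segments with no overlap at all are labeled "Unknown".
    """
    labeled = []
    for start, end, text in whisper_segments:
        totals = {}
        for d_start, d_end, speaker in diarized_segments:
            shared = overlap(start, end, d_start, d_end)
            if shared > 0:
                totals[speaker] = totals.get(speaker, 0.0) + shared
        speaker = max(totals, key=totals.get) if totals else "Unknown"
        labeled.append((speaker, text))
    return labeled
```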
**Interactive speaker labeling** (when `--diarize` without `--speakers`):
The diarization pipeline detects speakers as numeric IDs. When no `--speakers` names are provided, corky shows representative text excerpts for each detected speaker and prompts interactively:
```
Speaker 1 excerpts:
1. "Manufacturing China is up and running..."
2. "the traditional model of payment processing..."
3. "crypto is just superior in so many ways"
Who is Speaker 1? (name or enter to skip): Ron Berkes
Speaker 2 excerpts:
1. "I want to plan for the future of people accepting..."
2. "let's catch up soon"
Who is Speaker 2? (name or enter to skip): Brian Takita
```
When `--speakers` is provided with `--diarize`, names are mapped to speaker IDs in order of first appearance (no interactive prompt).
**ONNX models:** Auto-downloaded from pyannote-rs GitHub releases to `~/.cache/corky/models/`. No gated HuggingFace access required.
**Output format** (with speakers or diarize):
```markdown
---
date: 2026-02-27
type: phone-call
participants:
- Brian Takita
- Ron Berkes
duration: "3:16"
---
**Ron Berkes** [00:00:00.000 → 00:00:05.780]
Good. That post went viral. Oh, yeah.
**Brian Takita** [00:00:05.780 → 00:00:07.360]
Yeah.
```
**Feature flags:**
- `transcribe` — whisper-rs transcription + tdrz speaker turns
- `transcribe-cuda` — GPU-accelerated transcription
- `diarize` — pyannote-rs speaker diarization (implies `transcribe`)
**Edge cases:**

| ID | Case | Behavior |
|---|---|---|
| T1 | File not found | Exit with error message |
| T2 | Unknown model name | Exit with list of known models |
| T3 | No ffmpeg installed | Error with install instructions per OS |
| T4 | `--diarize` without feature | Error: "Diarization support not compiled" |
| T5 | No speakers detected | Output with "Unknown" speaker labels |
| T6 | Single speaker detected | All segments labeled as that speaker |
| T7 | Interactive prompt, user skips | Label as "Speaker N" |
| T8 | Very short segments | May fail embedding extraction; labeled "Unknown" |
## 6. Sync Algorithm
### 6.1 State
Per-account, per-label state: `(uidvalidity: u32, last_uid: u32)`
### 6.2 Incremental Sync
For each account, for each label:
1. `SELECT` the IMAP folder
2. Check `UIDVALIDITY` — if changed from stored value, do full sync
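3. If incremental: `SEARCH UID {last_uid+1}:*`, then filter out UIDs `<= last_uid` (IMAP resolves `*` to the highest existing UID, so the range always returns at least that message even when nothing is new)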
4. If full: `SEARCH SINCE {today - sync_days}`
5. For each UID: `FETCH RFC822`, parse email, merge to thread file
6. Update `(uidvalidity, last_uid)` in state
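
The decision in steps 2 to 4 can be expressed as a pure function that returns the SEARCH criteria to issue. The function names are hypothetical, and treating missing stored state as a full sync is an assumption of this sketch.

```python
from datetime import date, timedelta
from typing import Optional, Tuple

# IMAP dates use English month abbreviations regardless of locale.
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def search_criteria(
    stored: Optional[Tuple[int, int]],  # (uidvalidity, last_uid) or None
    current_uidvalidity: int,
    sync_days: int,
    today: date,
    force_full: bool = False,
) -> Tuple[str, str]:
    """Return (mode, criteria) for the IMAP SEARCH command."""
    if force_full or stored is None or stored[0] != current_uidvalidity:
        since = today - timedelta(days=sync_days)
        stamp = f"{since.day:02d}-{_MONTHS[since.month - 1]}-{since.year}"
        return "full", f"SINCE {stamp}"
    return "incremental", f"UID {stored[1] + 1}:*"


def new_uids(uids: list, last_uid: int) -> list:
    """Drop UIDs at or below last_uid (the n:* range can return the last message)."""
    return [uid for uid in uids if uid > last_uid]
```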
### 6.3 Message Parsing
From RFC822:
- Subject: `email.header.decode_header()` (handles encoded words)
- From: `email.header.decode_header()`
- To: `email.header.decode_header()` (comma-separated recipients)
- CC: `email.header.decode_header()` (comma-separated recipients)
- Date: raw header string
- Body: walk multipart for `text/plain` without `Content-Disposition`, or get payload for non-multipart
- Thread key: `thread_key_from_subject(subject)`
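
The parsing rules above map directly onto Python's standard `email` package. The helper names here are illustrative, and the UTF-8 fallback for parts without a declared charset is an assumption.

```python
import email
from email.header import decode_header


def decode(value) -> str:
    """Decode RFC 2047 encoded words in a header value."""
    parts = []
    for data, charset in decode_header(value or ""):
        if isinstance(data, bytes):
            data = data.decode(charset or "utf-8", errors="replace")
        parts.append(data)
    return "".join(parts)


def _text(part) -> str:
    payload = part.get_payload(decode=True) or b""
    return payload.decode(part.get_content_charset() or "utf-8", errors="replace")


def plain_body(msg) -> str:
    """First text/plain part without Content-Disposition, or the sole payload."""
    if not msg.is_multipart():
        return _text(msg)
    for part in msg.walk():
        if part.get_content_type() == "text/plain" and part.get("Content-Disposition") is None:
            return _text(part)
    return ""


def parse_message(raw: bytes) -> dict:
    msg = email.message_from_bytes(raw)
    return {
        "subject": decode(msg["Subject"]),
        "from": decode(msg["From"]),
        "to": decode(msg["To"]),
        "cc": decode(msg["Cc"]),
        "date": msg["Date"] or "",  # raw header string, not parsed
        "body": plain_body(msg),
    }
```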
### 6.4 Merge
For each message:
1. Find existing thread file by scanning `**Thread ID**` metadata in all `.md` files
2. If found, parse back into Thread object
3. Check dedup: `(from, date)` tuple
4. If new: append message, sort by date, update `last_date`
5. Accumulate labels and accounts
6. Write markdown, set file mtime to last message date
### 6.5 Orphan Cleanup
On `corky sync full`: track all files written/updated. After sync, delete any `.md` files in `conversations/` not in the touched set.
### 6.6 State Persistence
State is saved only after all accounts complete successfully. If sync crashes mid-way, state is not saved — next run re-fetches.
### 6.7 Contact Sync
**State:** Per-contact, per-mailbox FNV-1a content hash in `.sync-state.json` under `contacts.{name}.mailboxes.{mb}`.
**Slugification:** Sender names from `## {Name} — {Date}` headers are slugified: strip " via ..." suffix, strip `<email>` brackets, lowercase, replace non-alphanumeric with hyphens, collapse runs, trim.
**Eligibility:** `build_eligible_set()` scans conversation headers + checks `shared_with` + resolves aliases per mailbox.
**3-way merge:** For each contact-mailbox pair where both files exist:
1. Hash root and mailbox content
2. If hashes match → already in sync, record hash
3. Compare each hash against stored base:
- Root matches base → root unchanged, take mailbox
- Mailbox matches base → mailbox unchanged, take root
- Neither matches → conflict, mtime tiebreak + warning
- No base → first sync, mtime tiebreak
4. Write winning content, preserve mtime, update stored hash
**Ineligible contacts (both exist):** Same 3-way logic but only allows mailbox → root direction.
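
A sketch of the hash and the 3-way decision for an eligible contact. The spec names FNV-1a but not the width, so the 64-bit variant used here is an assumption, as are the function names and the choice to let root win an exact mtime tie.

```python
FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3


def fnv1a_hex(data: bytes) -> str:
    """64-bit FNV-1a hash as 16 hex digits (width is an assumption)."""
    h = FNV64_OFFSET
    for byte in data:
        h = ((h ^ byte) * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF
    return f"{h:016x}"


def pick_winner(root: bytes, mailbox: bytes, base, root_mtime: float, mailbox_mtime: float):
    """Return (winner, conflict); winner is 'same', 'root', or 'mailbox'.

    base is the stored hash from the last sync, or None on first sync.
    """
    root_hash, mailbox_hash = fnv1a_hex(root), fnv1a_hex(mailbox)
    if root_hash == mailbox_hash:
        return "same", False
    if base is not None:
        if root_hash == base:        # root unchanged, take mailbox
            return "mailbox", False
        if mailbox_hash == base:     # mailbox unchanged, take root
            return "root", False
    # Both changed (conflict) or no base (first sync): newest mtime wins.
    newest = "root" if root_mtime >= mailbox_mtime else "mailbox"
    return newest, base is not None
```

The second element of the result signals whether a warning should be printed: it is true only when a base hash exists and neither side matches it.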
## 7. Mailbox Lifecycle
### 7.1 Add
Without `--github` (plain directory):
1. Create `mailboxes/{name}/` with conversations/drafts/contacts subdirectories
2. Write template files (AGENTS.md, CLAUDE.md symlink, README.md, voice.md, .gitignore, `.claude/skills/email/`)
3. Update `.corky.toml`
With `--github` (submodule):
1. Create GitHub repo (`gh repo create`)
2. Add collaborator (`gh api repos/.../collaborators/...`) or print PAT instructions
3. Clone to temp dir, write template files, commit, push
4. Add as git submodule at `mailboxes/{name}/`
5. Update `.corky.toml`
### 7.2 Sync
1. `git pull --rebase` in submodule (skipped for plain directories)
2. Copy `voice.md` if root copy is newer
3. Sync workflow template if newer
4. **Bidirectional topic sync** (see §7.7)
5. Stage, commit, push local changes (skipped for plain directories)
6. Update submodule ref in parent (`git add {submodule_path}`) (skipped for plain directories)
### 7.3 Status
For each mailbox submodule:
1. `git fetch`
2. `git rev-list --count HEAD..@{u}` (incoming)
3. `git rev-list --count @{u}..HEAD` (outgoing)
### 7.4 Remove
For plain directories: `rm -rf mailboxes/{name}/`.
For submodules:
1. `git submodule deinit -f`
2. `git rm -f`
3. Clean up `.git/modules/{path}`
Then:
4. Remove from `.corky.toml`
5. Optionally delete GitHub repo (interactive confirmation)
### 7.5 Rename
1. Move `mailboxes/{old}` to `mailboxes/{new}` (`git mv` for submodules, `mv` for plain dirs)
2. Optionally `gh repo rename`
3. Update `.corky.toml` entry
### 7.6 Reset
1. `git pull --rebase` (submodules only)
2. Regenerate: AGENTS.md, CLAUDE.md (symlink), README.md, .gitignore, voice.md, `.claude/skills/email/` at `mailboxes/{name}/`
3. Stage, commit, push (submodules only)
4. Update submodule ref in parent (submodules only)
### 7.7 Topic Sync
Topics are bidirectionally synced between root and mailbox during `corky mailbox sync`.
**Configuration:** Topics opt in via the `mailboxes` field in `.corky.toml`:
```toml
[topics.brian-takita]
keywords = ["corky", "agent-doc"]
mailboxes = ["lucas"]
description = "Personal brand and tooling identity"
```
**Directory mapping:**
- Root: `topics/{name}/` (e.g., `topics/brian-takita/README.md`)
- Mailbox: `mailboxes/{mailbox}/topics/{name}/` (e.g., `mailboxes/lucas/topics/brian-takita/README.md`)
**Algorithm:**
1. Load topics from `.corky.toml`, filter by `mailboxes` containing the current mailbox name
2. For each matching topic, collect all files from both root and mailbox topic directories
3. For each file in the union:
- **Forward (root → mailbox):** Copy if root file is newer or mailbox file is missing
- **Reverse (mailbox → root):** Copy if mailbox file is newer or root file is missing
4. Newer file wins (mtime comparison). Missing files are always copied.
**Edge cases:**
| # | Scenario | Expected behavior |
|---|---|---|
| TS1 | No topics list this mailbox | No-op |
| TS2 | Root topic dir exists, mailbox doesn't | Forward copy creates mailbox topic dir |
| TS3 | Mailbox topic dir exists, root doesn't | Reverse copy creates root topic dir |
| TS4 | Both exist, root newer | Forward copy (root wins) |
| TS5 | Both exist, mailbox newer | Reverse copy (mailbox wins) |
| TS6 | New file on one side only | Copied to other side |
| TS7 | Different files on each side | Both copied (union of files) |
| TS8 | Subdirectories in topic | Recursively synced |
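The per-file decision in the algorithm above can be sketched as a pure function over the two modification times. This is a minimal illustration, not corky's actual code: `CopyDirection` and `copy_direction` are hypothetical names, and treating equal mtimes as "no copy" is an assumption the spec does not state explicitly.

```rust
use std::time::SystemTime;

/// Which way a single topic file should be copied (hypothetical type).
#[derive(Debug, PartialEq)]
enum CopyDirection {
    RootToMailbox,
    MailboxToRoot,
    Skip,
}

/// Decide the copy direction from the two modification times.
/// `None` means the file is missing on that side.
fn copy_direction(root: Option<SystemTime>, mailbox: Option<SystemTime>) -> CopyDirection {
    match (root, mailbox) {
        // Missing files are always copied (TS2, TS3, TS6).
        (Some(_), None) => CopyDirection::RootToMailbox,
        (None, Some(_)) => CopyDirection::MailboxToRoot,
        // Both exist: the newer file wins (TS4, TS5).
        (Some(r), Some(m)) if r > m => CopyDirection::RootToMailbox,
        (Some(r), Some(m)) if m > r => CopyDirection::MailboxToRoot,
        // Equal mtimes or missing on both sides: nothing to do (assumption).
        _ => CopyDirection::Skip,
    }
}
```

Applying this function to every path in the union of both directory trees covers TS7 and TS8 as well, since each file is decided independently.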
## 8. Draft Lifecycle
### 8.1 Create
Manual: create file in `{data_dir}/drafts/` or `{data_dir}/mailboxes/{name}/drafts/`.
Filename convention: `YYYY-MM-DD-{slug}.md`.
### 8.2 Validate
`corky draft validate` checks the draft format (see §5.18).
### 8.3 Push / Send
`corky draft push FILE`: IMAP APPEND to drafts folder.
`corky draft push FILE --send`: SMTP send, update Status to `sent`.
Account resolution: `**Account**` field → `**From**` field → default account (see §11.2).
## 9. Watch Daemon
### 9.1 Poll Loop
```
while not shutdown:
    for each account:
        sync_account(full=false)
    save_state()
    count_new = compare uid snapshots before/after
    if count_new > 0:
        sync_mailboxes()
        notify(count_new)
    schedule_run()  # publish any due scheduled items (email + social)
    wait(interval) or shutdown
```
### 9.2 Signals
SIGTERM, SIGINT → clean shutdown (finish current poll, then exit).
### 9.3 Notifications
- macOS: `osascript -e 'display notification ...'`
- Linux: `notify-send`
- Silently degrades if tool not installed.
### 9.4 Config
`[watch]` section in `.corky.toml`:
- `poll_interval`: seconds (default 300)
- `notify`: bool (default false)
CLI `--interval` overrides config.
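The precedence between the CLI flag, the config value, and the default can be written as a single fallback chain. The sketch below is illustrative only; `effective_poll_interval` and its `Option<u64>` parameters are assumed names, not part of corky's API.

```rust
use std::time::Duration;

/// Default poll interval in seconds, as documented for `[watch]`.
const DEFAULT_POLL_INTERVAL_SECS: u64 = 300;

/// Resolve the poll interval: CLI `--interval` first, then
/// `poll_interval` from `.corky.toml`, then the default.
fn effective_poll_interval(cli: Option<u64>, config: Option<u64>) -> Duration {
    let secs = cli.or(config).unwrap_or(DEFAULT_POLL_INTERVAL_SECS);
    Duration::from_secs(secs)
}
```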
## 10. Provider Presets
| Field | Gmail | Local bridge (127.0.0.1) | Generic IMAP |
|---|---|---|---|
| imap_host | imap.gmail.com | 127.0.0.1 | (required) |
| imap_port | 993 | 1143 | 993 |
| imap_starttls | false | true | false |
| smtp_host | smtp.gmail.com | 127.0.0.1 | (required) |
| smtp_port | 465 | 1025 | 465 |
| drafts_folder | [Gmail]/Drafts | Drafts | Drafts |
Preset values are defaults — any field explicitly set on the account wins.
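One way to model "preset values are defaults" is to keep every account field optional and fall back to the preset field by field. This is a sketch under assumptions: the `Settings` and `Overrides` structs and the `resolve` function are hypothetical, and only the Gmail values from the table are shown.

```rust
/// Fully resolved connection settings for one account (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Settings {
    imap_host: String,
    imap_port: u16,
    imap_starttls: bool,
    smtp_host: String,
    smtp_port: u16,
    drafts_folder: String,
}

/// Fields set explicitly on the account; `None` means "not set".
#[derive(Debug, Default)]
struct Overrides {
    imap_host: Option<String>,
    imap_port: Option<u16>,
    imap_starttls: Option<bool>,
    smtp_host: Option<String>,
    smtp_port: Option<u16>,
    drafts_folder: Option<String>,
}

/// Gmail defaults, copied from the preset table.
fn gmail_preset() -> Settings {
    Settings {
        imap_host: "imap.gmail.com".to_string(),
        imap_port: 993,
        imap_starttls: false,
        smtp_host: "smtp.gmail.com".to_string(),
        smtp_port: 465,
        drafts_folder: "[Gmail]/Drafts".to_string(),
    }
}

/// Any explicitly set account field wins over the preset value.
fn resolve(preset: Settings, account: Overrides) -> Settings {
    Settings {
        imap_host: account.imap_host.unwrap_or(preset.imap_host),
        imap_port: account.imap_port.unwrap_or(preset.imap_port),
        imap_starttls: account.imap_starttls.unwrap_or(preset.imap_starttls),
        smtp_host: account.smtp_host.unwrap_or(preset.smtp_host),
        smtp_port: account.smtp_port.unwrap_or(preset.smtp_port),
        drafts_folder: account.drafts_folder.unwrap_or(preset.drafts_folder),
    }
}
```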
## 11. Account Resolution
### 11.1 Secret Resolution
Shared pattern (`util::resolve_secret`) used by both account passwords and social OAuth credentials:
1. Inline value (non-empty string)
2. `_cmd` field (shell command via `sh -c`, capture stdout, strip trailing whitespace)
3. Error with context message if both are empty
**Account passwords:** `password` > `password_cmd` > error
**Social credentials:** `client_id` > `client_id_cmd` > env var fallback (see §12.5)
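The three-step pattern might look roughly like the following. The spec names `util::resolve_secret` but does not give its signature, so the parameters here (including `context`) are assumptions, as is the choice to treat a non-zero exit status of the command as an error.

```rust
use std::process::Command;

/// Resolve a secret from an inline value or a `_cmd` shell command.
/// Sketch only: the real signature of `util::resolve_secret` is not
/// given in this spec.
fn resolve_secret(inline: Option<&str>, cmd: Option<&str>, context: &str) -> Result<String, String> {
    // 1. A non-empty inline value wins.
    if let Some(value) = inline.filter(|v| !v.is_empty()) {
        return Ok(value.to_string());
    }
    // 2. Otherwise run the command via `sh -c` and capture stdout.
    if let Some(cmd) = cmd.filter(|c| !c.is_empty()) {
        let output = Command::new("sh")
            .arg("-c")
            .arg(cmd)
            .output()
            .map_err(|e| format!("{}: failed to run command: {}", context, e))?;
        // Assumption: a failing command is an error, not an empty secret.
        if !output.status.success() {
            return Err(format!("{}: command failed ({})", context, output.status));
        }
        let stdout = String::from_utf8_lossy(&output.stdout);
        // Strip trailing whitespace (typically the final newline).
        return Ok(stdout.trim_end().to_string());
    }
    // 3. Both are empty: error with context.
    Err(format!("{}: neither an inline value nor a command is set", context))
}
```

For social credentials, a caller would try the environment variable (§12.5) only after this function returns the "both empty" error.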
### 11.2 Sending Account
For `draft push`:
1. `**Account**` metadata field → lookup by name in `.corky.toml`
2. `**From**` metadata field → lookup by email address (case-insensitive)
3. Default account (first with `default = true`, or first in file)
4. Credential bubbling (see §11.3)
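Steps 1 to 3 can be sketched as a lookup over the accounts of a single `.corky.toml`. The `Account` struct and `resolve_sending_account` are hypothetical names. Two further assumptions: a `**From**` value is a bare address (no display name), and an `**Account**` name that matches nothing falls through to the next step instead of raising an error. Credential bubbling (step 4) is out of scope here.

```rust
/// One configured account (hypothetical, reduced to the fields used here).
#[derive(Debug, PartialEq)]
struct Account {
    name: String,
    user: String, // email address of the account
    is_default: bool,
}

/// Pick the sending account for `draft push`.
fn resolve_sending_account<'a>(
    accounts: &'a [Account],
    account_field: Option<&str>,
    from_field: Option<&str>,
) -> Option<&'a Account> {
    // 1. `**Account**` metadata: lookup by name.
    if let Some(name) = account_field {
        if let Some(found) = accounts.iter().find(|a| a.name == name) {
            return Some(found);
        }
    }
    // 2. `**From**` metadata: case-insensitive lookup by email address.
    if let Some(from) = from_field {
        if let Some(found) = accounts.iter().find(|a| a.user.eq_ignore_ascii_case(from)) {
            return Some(found);
        }
    }
    // 3. First account marked default, else the first account in the file.
    accounts.iter().find(|a| a.is_default).or_else(|| accounts.first())
}
```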
### 11.3 Credential Bubbling
When a draft lives inside a `mailboxes/` subtree, the child mailbox may not have its own IMAP/SMTP credentials. Corky resolves credentials bottom-up:
1. Check the leaf mailbox's `.corky.toml` for matching account credentials
2. Walk parent directories upward, checking each `.corky.toml` for an account whose `user` matches the `**From**` address
3. First match wins
4. If no credentials found at any level, bail with error
This enables child mailboxes to draft replies that the parent's account sends.
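The bottom-up walk maps naturally onto `Path::ancestors`, which yields the starting directory first and then each parent. In this sketch the actual TOML lookup is abstracted behind a caller-supplied closure; `bubble_credentials` and `lookup` are hypothetical names, and walking all the way to the filesystem root (instead of stopping at the data directory) is an assumption.

```rust
use std::path::Path;

/// Walk from the leaf mailbox directory upward and return the first
/// credentials found. `lookup` receives the path of a candidate
/// `.corky.toml` and returns `Some` if it holds a matching account.
fn bubble_credentials<T, F>(leaf_dir: &Path, lookup: F) -> Result<T, String>
where
    F: Fn(&Path) -> Option<T>,
{
    // `ancestors()` starts with `leaf_dir` itself, so the leaf is checked first.
    for dir in leaf_dir.ancestors() {
        let config = dir.join(".corky.toml");
        if let Some(found) = lookup(&config) {
            return Ok(found); // first match wins
        }
    }
    Err(format!("no credentials found at or above {}", leaf_dir.display()))
}
```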
## 12. Social Media Posting
Corky supports publishing social media posts through a unified multi-platform architecture.
### 12.1 Supported Platforms
| Platform | Status | Protocol |
|---|---|---|
| LinkedIn | Implemented | REST (OAuth2 authorization code) |
| Bluesky | Planned | AT Protocol |
| Mastodon | Planned | ActivityPub |
| Twitter | Planned | OAuth2 |
### 12.2 Profile Registry (profiles.toml)
Lives at `{data_dir}/profiles.toml`. Maps human profile names to platform handles and URNs.
```toml
[btakita]
[btakita.linkedin]
handle = "brian-takita"
urn = "urn:li:person:abc123"
[btakita.twitter]
handle = "btakita"
```
**Validation checks:**
1. No duplicate handles within the same platform
2. No duplicate URNs within the same platform
3. No cross-profile URN reuse (same URN across different profiles)
4. Every profile should have at least one platform entry (warning)
5. Cross-platform coherence info (same profile, multiple platforms)
### 12.3 Social Drafts
Social drafts live in `{data_dir}/social/` as Markdown files with YAML frontmatter.
**Filename convention:** `YYYYMMDD-HHMMSS-{platform}.md`
**Frontmatter fields:**
| Field | Required | Default | Description |
|---|---|---|---|
| `platform` | yes | — | linkedin, bluesky, mastodon, twitter |
| `author` | yes | — | Profile name in profiles.toml |
| `visibility` | no | `public` | public, connections (platform-specific) |
| `status` | no | `draft` | draft → ready → published |
| `tags` | no | `[]` | Freeform tags |
| `scheduled_at` | no | — | Publish time for scheduled posts (see §13.3) |
| `published_at` | no | — | Set on publish |
| `post_id` | no | — | Set on publish (platform post ID) |
| `post_url` | no | — | Set on publish (permalink) |
| `images` | no | `[]` | List of image paths (relative to draft file) |
**Images:** The `images` field accepts a list of file paths relative to the draft file location. On publish, each image is uploaded to the platform and attached to the post. LinkedIn supports up to 20 images per post (1 image = single image post, 2+ = carousel).
**Status transitions:** `draft` → `ready` → `published` (one-way).
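The one-way status flow is small enough to encode as an enum with an explicit transition check. The following is a sketch with hypothetical names (`SocialStatus`, `parse`, `can_transition_to`); the error text is illustrative and only mirrors edge case D5.

```rust
/// Status of a social draft (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum SocialStatus {
    Draft,
    Ready,
    Published,
}

impl SocialStatus {
    /// Parse the frontmatter value, listing valid statuses on error.
    fn parse(value: &str) -> Result<SocialStatus, String> {
        match value {
            "draft" => Ok(SocialStatus::Draft),
            "ready" => Ok(SocialStatus::Ready),
            "published" => Ok(SocialStatus::Published),
            other => Err(format!(
                "invalid status '{}': expected draft, ready, or published",
                other
            )),
        }
    }

    /// Only draft -> ready and ready -> published are allowed.
    fn can_transition_to(self, next: SocialStatus) -> bool {
        match (self, next) {
            (SocialStatus::Draft, SocialStatus::Ready) => true,
            (SocialStatus::Ready, SocialStatus::Published) => true,
            _ => false,
        }
    }
}
```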
### 12.4 Token Store
OAuth tokens stored at `{app_config_dir}/tokens.json` keyed by platform URN.
- File permissions: 0600 (owner read/write only)
- Tokens have a 5-minute grace window: tokens expiring within 5 minutes are treated as expired
- Token fields: access_token, refresh_token (optional), expires_at, scopes, platform
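A lookup that honors the grace window could be sketched as below. The `Token` struct is reduced to two fields, timestamps are plain Unix seconds, and `get_valid` is a hypothetical name. Whether a token expiring in exactly 5 minutes counts as expired is not stated; this sketch treats it as expired.

```rust
use std::collections::HashMap;

/// Tokens expiring within this many seconds are treated as expired.
const GRACE_SECS: u64 = 5 * 60;

/// Reduced token record (the real store has more fields).
struct Token {
    access_token: String,
    expires_at: u64, // Unix timestamp in seconds (assumption)
}

/// Return the token for `urn` only if it is valid beyond the grace window.
fn get_valid<'a>(store: &'a HashMap<String, Token>, urn: &str, now: u64) -> Option<&'a Token> {
    store
        .get(urn)
        .filter(|token| token.expires_at > now.saturating_add(GRACE_SECS))
}
```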
### 12.5 OAuth Flow
Authorization code flow (LinkedIn):
1. Build auth URL with client_id, redirect_uri, state, scopes
2. Open browser (`open` crate)
3. Start local HTTP server on `127.0.0.1:8484` (`tiny_http`)
4. Wait for callback (120s timeout)
5. Verify state parameter (CSRF protection)
6. Exchange code for token via POST
7. Fetch user URN via `/v2/userinfo`
8. Store token in tokens.json
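Steps 4 and 5 (and edge cases A2 to A5) come down to parsing the callback query string and checking `state` before trusting `code`. The sketch below uses only the standard library and a hypothetical `parse_callback` function. It does not percent-decode values, which a real implementation would need, and the order of the checks is an assumption.

```rust
/// Extract the authorization code from a callback query string such as
/// `code=abc&state=xyz`, verifying the CSRF `state` value.
/// Sketch only: values are not percent-decoded.
fn parse_callback(query: &str, expected_state: &str) -> Result<String, String> {
    let mut code = None;
    let mut state = None;
    let mut error = None;
    for pair in query.split('&') {
        let mut parts = pair.splitn(2, '=');
        let key = parts.next().unwrap_or("");
        let value = parts.next().unwrap_or("");
        match key {
            "code" => code = Some(value),
            "state" => state = Some(value),
            "error" => error = Some(value),
            _ => {}
        }
    }
    // A5: the provider reported an error (for example, the user denied access).
    if let Some(e) = error {
        return Err(format!("authorization denied: {}", e));
    }
    // A4: the state must match the value sent in the auth URL.
    if state != Some(expected_state) {
        return Err("state mismatch (possible CSRF)".to_string());
    }
    // A3: a missing or empty code is an error.
    match code {
        Some(c) if !c.is_empty() => Ok(c.to_string()),
        _ => Err("callback is missing the code parameter".to_string()),
    }
}
```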
Client credentials resolution order per field:
1. Inline value in `.corky.toml` (e.g. `client_id = "..."`)
2. Shell command via `_cmd` suffix (e.g. `client_id_cmd = "pass corky/linkedin/client_id"`)
3. Environment variable (`CORKY_LINKEDIN_CLIENT_ID` / `CORKY_LINKEDIN_CLIENT_SECRET`)
### 12.6 Publish Flow
1. Parse draft file (YAML frontmatter + body)
2. Verify status is `ready` (not `draft` or `published`)
3. Resolve author → URN via profiles.toml
4. Lookup valid token for URN in token store
5. Upload images (if any): resolve paths relative to draft file, call platform image upload API
6. Call platform API (LinkedIn: POST /rest/posts) with image URNs
7. Update draft frontmatter: status=published, post_id, post_url, published_at
**LinkedIn image upload flow:**
1. `POST /rest/images?action=initializeUpload` with `owner: urn:li:person:{id}` → returns upload URL + image URN
2. `PUT` binary image data to the upload URL
3. Include image URN(s) in post payload: single image → `content.media.id`, 2+ images → `content.multiImage.images[]`
**LinkedIn limits:** 3000 character post body, 20 images max, visibility: PUBLIC or CONNECTIONS.
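The local checks that can run before any network call (status, body length, image count) and the choice of payload shape can be sketched together. `preflight` and `PostContent` are hypothetical names; counting the body in Unicode scalar values is an assumption about how the 3000-character limit is measured, and the order of the checks is not prescribed by the spec.

```rust
const MAX_BODY_CHARS: usize = 3000;
const MAX_IMAGES: usize = 20;

/// Which `content` shape the post payload needs (hypothetical type).
#[derive(Debug, PartialEq)]
enum PostContent {
    TextOnly,    // no `content` field (IM1)
    SingleImage, // `content.media.id`
    MultiImage,  // `content.multiImage.images[]`
}

/// Validate a draft locally before calling the platform API.
fn preflight(status: &str, body: &str, image_count: usize) -> Result<PostContent, String> {
    match status {
        "ready" => {}
        "published" => return Err("already published".to_string()), // PB2
        other => return Err(format!("wrong status: expected ready, found {}", other)), // PB1
    }
    let chars = body.chars().count();
    if chars > MAX_BODY_CHARS {
        return Err(format!("body has {} characters, limit is {}", chars, MAX_BODY_CHARS)); // PB10
    }
    if image_count > MAX_IMAGES {
        return Err(format!("{} images, limit is {}", image_count, MAX_IMAGES)); // IM3
    }
    Ok(match image_count {
        0 => PostContent::TextOnly,
        1 => PostContent::SingleImage,
        _ => PostContent::MultiImage,
    })
}
```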
### 12.7 CLI Commands
```
corky linkedin auth [--profile NAME] # OAuth flow, stores token
corky linkedin draft [BODY] [--author X] [--visibility public] [--tags X,Y]
corky linkedin publish <file> # Publish ready draft
corky linkedin check # Validate profiles.toml
corky linkedin list [--status X] # List LinkedIn drafts
corky linkedin rename-author <old> <new> # Rename across drafts + profiles
```
### 12.8 Edge Case Table
| # | Scenario | Expected behavior |
|---|---|---|
| **Profile Validation** | | |
| P1 | Duplicate handle within same platform | Error: handle already mapped to different URN |
| P2 | Duplicate URN within same platform | Error: URN used by multiple profiles |
| P3 | Same URN across different profiles | Error: cross-profile URN conflict |
| P4 | Profile with no platform entries | Warning: profile has no platform entries |
| P5 | Cross-platform coherence (same profile, multiple platforms) | Info: verify same person |
| P6 | Empty profiles.toml | OK: empty map, no errors |
| P7 | Malformed TOML | Error with parse location |
| P8 | Missing profiles.toml | Error with guidance to run `corky linkedin check` |
| P9 | Collaborator merge introduces duplicate URN | Validation surfaces conflict |
| **Draft Parsing** | | |
| D1 | Valid draft with all fields | Parsed correctly |
| D2 | Missing required field (platform) | Error: missing field |
| D3 | Missing required field (author) | Error: missing field |
| D4 | Unknown platform | Error with list of supported platforms |
| D5 | Invalid status value | Error with valid statuses |
| D6 | No YAML frontmatter delimiters | Error: missing `---` |
| D7 | Empty body after frontmatter | Warning: empty post body |
| D8 | Render/parse round-trip | Identical meta |
| D9 | Malformed YAML | Error with parse details |
| **Token Store** | | |
| T1 | Missing tokens.json | Returns empty store |
| T2 | Save/load round-trip | All fields preserved |
| T3 | Get valid token | Returns Some |
| T4 | Get expired token | Returns None |
| T5 | Token in grace window (<5 min) | Returns None |
| T6 | Multiple tokens for different URNs | Correct per-URN lookup |
| T7 | Upsert overwrites existing | New replaces old |
| T8 | File permissions 0600 on save | Verified via metadata |
| T9 | Malformed tokens.json | Error with parse details |
| **Auth Flow** | | |
| A1 | Correct auth URL generation | Contains client_id, redirect_uri, scopes, state |
| A2 | Valid callback parse | Extracts code + state |
| A3 | Callback missing code | Error message |
| A4 | State mismatch (CSRF) | Error message |
| A5 | Callback with error param | Error: user denied |
| **Publish Flow** | | |
| PB1 | Draft not in "ready" status | Error: wrong status |
| PB2 | Already published | Error: already published |
| PB3 | Author not in profiles.toml | Error with available profiles |
| PB4 | Author has no entry for draft's platform | Error: no platform entry |
| PB5 | No token for resolved URN | Error with `corky linkedin auth` guidance |
| PB6 | Expired token | Error with re-auth guidance |
| PB7 | Successful publish | Updates post_id, post_url, published_at, status |
| PB8 | Network error during API call | Error propagated with context |
| PB9 | API error response (403, etc.) | Error with HTTP status + body |
| PB10 | Body exceeds 3000 char limit | Error with char count |
| **Image Upload** | | |
| IM1 | No images in draft | Text-only post (no content field) |
| IM2 | Image file not found | Error with resolved path |
| IM3 | Too many images (> 20) | Error with count and limit |
| IM4 | Draft round-trip with images | Images preserved in YAML |
| IM5 | Image path resolution | Resolved relative to draft file directory |
| IM6 | Empty images list | Same as no images (omitted from YAML) |
## 13. Scheduling
Unified scheduling for social media posts and email drafts. A single `schedule` module scans both draft systems and dispatches to their existing publish paths.
### 13.1 Architecture
`corky/src/schedule.rs` — single-file module, no traits or shared abstractions.
**Types:**
- `ScheduledKind` — enum: `Social`, `Email`
- `ScheduledItem` — `{ path, kind, scheduled_at, label }`
- `ProcessResult` — `{ path, kind, success, message }`
**Flow:**
1. Scan `social/` for `.md` files where `status: ready` and `scheduled_at <= now + grace`
2. Scan `drafts/` and `mailboxes/*/drafts/` for `.md` files where `**Status**: scheduled` and `**Scheduled-At** <= now + grace`
3. Sort by `scheduled_at` ascending (earliest first)
4. Dispatch: `Social` → `social::publish::publish(path)`, `Email` → `draft::run(path, send=true)`
5. Report results per-item
**Grace window:** 30 seconds. Items scheduled up to 30s in the future are still considered due (handles cron drift / clock skew).
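The filter-and-sort part of the flow (steps 1 to 3, after the files have been scanned) can be sketched as follows. The type names come from the list above, but the field types are assumptions: `path` and `label` are plain strings and `scheduled_at` is a Unix timestamp in seconds. `due_items` is a hypothetical helper.

```rust
/// Items scheduled up to this many seconds in the future count as due.
const GRACE_SECS: i64 = 30;

#[derive(Debug, Clone, Copy, PartialEq)]
enum ScheduledKind {
    Social,
    Email,
}

#[derive(Debug, Clone, PartialEq)]
struct ScheduledItem {
    path: String,
    kind: ScheduledKind,
    scheduled_at: i64, // Unix seconds (assumption)
    label: String,
}

/// Keep only due items and order them earliest first.
fn due_items(mut items: Vec<ScheduledItem>, now: i64) -> Vec<ScheduledItem> {
    items.retain(|item| item.scheduled_at <= now + GRACE_SECS);
    items.sort_by_key(|item| item.scheduled_at);
    items
}
```

A dispatcher would then match on `kind` for each returned item and call the corresponding publish path.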
### 13.2 Email Draft Scheduling
`**Status**: scheduled` is a valid send status alongside `review` and `approved`.
Email draft format:
```markdown
**Status**: scheduled
**Scheduled-At**: 2026-02-25T09:00:00Z
```
Status flow: `draft` → `scheduled` → `sent`
The `Scheduled-At` field uses RFC 3339 / ISO 8601 format with timezone (UTC recommended).
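Recognizing a scheduled email draft requires reading two `**Field**: value` lines. The sketch below shows one way to do that with the standard library; `metadata_field` and `is_schedule_candidate` are hypothetical names. It scans every line of the file (a real parser would likely stop at the end of the metadata header) and leaves RFC 3339 parsing of the timestamp to a date library.

```rust
/// Return the trimmed value of a `**Name**: value` metadata line, if present.
fn metadata_field<'a>(draft: &'a str, name: &str) -> Option<&'a str> {
    let prefix = format!("**{}**:", name);
    draft
        .lines()
        .find_map(|line| line.trim().strip_prefix(prefix.as_str()))
        .map(str::trim)
}

/// A draft is a scheduling candidate only if its status is `scheduled`
/// and it carries a `Scheduled-At` value (see S5 and S7 in §13.5).
fn is_schedule_candidate(draft: &str) -> bool {
    metadata_field(draft, "Status") == Some("scheduled")
        && metadata_field(draft, "Scheduled-At").is_some()
}
```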
### 13.3 Social Draft Scheduling
Social drafts carry `scheduled_at: Option<DateTime<Utc>>` in YAML frontmatter (§12.3). The scheduler treats a draft as due when it has `status: ready` and `scheduled_at <= now + grace` (§13.1).
Status flow: `draft` → `ready` (with `scheduled_at` set) → `published`
### 13.4 CLI Commands
```
corky schedule run # Process all due scheduled items
corky schedule run --dry-run # Show what would be published without doing it
corky schedule list # List all pending scheduled items with times
```
`corky watch` runs scheduled publishing inside its poll loop (§9.1), so a single `corky watch` process handles both IMAP sync and scheduled publishing without a separate cron entry.
`corky schedule run` remains available for one-shot use.
**Standalone cron** (alternative to `corky watch`):
```
*/5 * * * * corky schedule run
```
### 13.5 Edge Case Table
| # | Scenario | Expected behavior |
|---|---|---|
| S1 | No scheduled items | Exit 0, no output |
| S2 | Social item due | Publish via `social::publish`, print summary |
| S3 | Email item due | Send via `draft::run(send=true)`, print summary |
| S4 | Item in future | Skipped |
| S5 | `scheduled_at` missing on ready/scheduled item | Skipped (not a scheduled item) |
| S6 | Multiple items due | Process all, sorted by time, per-item results |
| S7 | Email with wrong status (not "scheduled") | Skipped |
| S8 | Social with wrong status (not "ready") | Skipped |
| S9 | Non-.md files in scan directories | Ignored |
| S10 | Item within 30s grace window | Treated as due |
| S11 | Publish fails (network) | Error logged, item stays scheduled, exit 1 |
| S12 | `--dry-run` flag | Print what would happen, don't publish |