canon-archive 0.2.2

A CLI tool for organizing large media libraries into a canonical archive.

# Pattern Expressions

Pattern expressions define how files are organized in archives. They use `{expr}` syntax to insert dynamic values based on facts.

Patterns are used in the `pattern` field of [cluster manifests](../commands/archive/cluster.md). When you run `canon cluster generate`, it creates a manifest with a default `pattern = "{filename}"` that you can customize.

## Basic Syntax

Patterns consist of literal path segments and expressions in curly braces:

```
{content.DateTimeOriginal|year}/{content.DateTimeOriginal|month}/{filename}
```

This produces paths such as `2024/07/IMG_001.jpg`.
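To make the split between literal segments and `{expr}` placeholders concrete, here is a minimal Python sketch. It is not canon's parser: the `split_pattern` helper and its regular expression are hypothetical, and nested or escaped braces are deliberately not handled.

```python
import re

# Matches one {expr} placeholder; nested braces are out of scope for this sketch.
EXPR_RE = re.compile(r"\{([^{}]+)\}")


def split_pattern(pattern):
    """Return ("literal", text) and ("expr", text) parts in pattern order."""
    parts = []
    pos = 0
    for match in EXPR_RE.finditer(pattern):
        if match.start() > pos:
            # Text between placeholders is copied into the path unchanged.
            parts.append(("literal", pattern[pos:match.start()]))
        parts.append(("expr", match.group(1)))
        pos = match.end()
    if pos < len(pattern):
        parts.append(("literal", pattern[pos:]))
    return parts


pattern = "{content.DateTimeOriginal|year}/{content.DateTimeOriginal|month}/{filename}"
for kind, text in split_pattern(pattern):
    print(kind, repr(text))
```

Each `expr` part is then resolved against a source's facts, while each `literal` part (here, the `/` separators) passes through as written.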

## Fact Keys

Any fact key can be used in a pattern:

- `{source.ext}` - File extension
- `{source.mtime}` - Modification time
- `{content.Make}` - Camera manufacturer (from EXIF)
- `{content.hash.sha256}` - Content hash

The `content.` prefix is optional for content facts, so `{Make}` is equivalent to `{content.Make}`.

## Modifiers

Transform values using the `|` syntax. See [Facts](facts.md#modifiers) for the complete list.

```
{source.mtime|year}           → 2024
{source.mtime|yearmonth}      → 2024-07
{content.hash.sha256|short}   → a1b2c3d4
{source.ext|uppercase}        → JPG
```

Multiple modifiers can be chained:

```
{filename|stem|lowercase}     → img_001
```
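Chaining can be read as left-to-right function application: each modifier receives the output of the one before it. The Python sketch below illustrates that reading only. The `MODIFIERS` table and `apply_chain` helper are hypothetical stand-ins, and the 8-character length for `short` is an assumption inferred from the `a1b2c3d4` example above.

```python
from pathlib import PurePosixPath

# Hypothetical stand-ins for a few modifiers; canon's real ones may differ.
MODIFIERS = {
    "stem": lambda v: PurePosixPath(v).stem,
    "lowercase": str.lower,
    "uppercase": str.upper,
    "short": lambda v: v[:8],  # assumption: 8 characters, as in a1b2c3d4
}


def apply_chain(value, expr):
    """Apply the modifiers named after the key in `expr`, left to right."""
    _key, *names = expr.split("|")
    for name in names:
        value = MODIFIERS[name](value)
    return value


print(apply_chain("IMG_001.jpg", "filename|stem|lowercase"))  # img_001
print(apply_chain("jpg", "source.ext|uppercase"))             # JPG
```

In this sketch an unknown modifier name raises `KeyError`; how canon itself reports an unknown modifier is not covered here.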

## Path Accessors

Extract segments from path values using Python-style indexing:

| Syntax | Meaning |
|--------|---------|
| `key[-1]` | Last segment (filename) |
| `key[0]` | First segment |
| `key[1:3]` | Segments 1 and 2 (end index is exclusive) |
| `key[:-1]` | All but last segment |

Examples with `source.rel_path = "photos/2024/vacation/IMG_001.jpg"`:

```
{source.rel_path[-1]}         → IMG_001.jpg
{source.rel_path[0]}          → photos
{source.rel_path[1:-1]}       → 2024/vacation
{source.rel_path[-1]|stem}    → IMG_001
```
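Because the accessors follow Python's indexing rules, the examples above can be reproduced by splitting the path on `/` and indexing the resulting list. The `access` helper below is a hypothetical illustration, not canon's implementation, and it assumes `/` as the separator.

```python
def access(path, accessor):
    """Apply an accessor such as "[-1]" or "[1:-1]" to the segments of `path`."""
    segments = path.split("/")
    inner = accessor.strip("[]")
    if ":" in inner:
        start, stop = inner.split(":")
        # An empty bound means "from the start" or "to the end", as in Python.
        bounds = slice(int(start) if start else None, int(stop) if stop else None)
        return "/".join(segments[bounds])
    return segments[int(inner)]


rel_path = "photos/2024/vacation/IMG_001.jpg"
print(access(rel_path, "[-1]"))    # IMG_001.jpg
print(access(rel_path, "[0]"))     # photos
print(access(rel_path, "[1:-1]"))  # 2024/vacation
print(access(rel_path, "[:-1]"))   # photos/2024/vacation
```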

## Aliases

Aliases provide shorthand for common expressions. Use `canon facts --show-aliases` to see all available aliases.

| Alias | Expands To |
|-------|------------|
| `filename` | `source.rel_path[-1]` |
| `stem` | `source.rel_path[-1]\|stem` |
| `ext` | `source.rel_path[-1]\|ext` |
| `hash` | `content.hash.sha256` |
| `hash_short` | `content.hash.sha256\|short` |
| `id` | `source.id` |

Example using aliases:

```
{hash_short}_{filename}       → a1b2c3d4_IMG_001.jpg
```
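One way to picture alias expansion is as a textual substitution on the leading key of each expression, leaving any modifiers that follow it in place. The Python sketch below assumes that model. The `expand_aliases` helper is hypothetical, and the table is copied from the one above rather than read from canon.

```python
import re

# Alias table copied from the documentation above.
ALIASES = {
    "filename": "source.rel_path[-1]",
    "stem": "source.rel_path[-1]|stem",
    "ext": "source.rel_path[-1]|ext",
    "hash": "content.hash.sha256",
    "hash_short": "content.hash.sha256|short",
    "id": "source.id",
}


def expand_aliases(pattern):
    """Replace the leading key of each {expr} when it is a known alias."""
    def replace(match):
        key, sep, rest = match.group(1).partition("|")
        # Only the key before the first "|" is looked up; modifiers are kept.
        return "{" + ALIASES.get(key, key) + sep + rest + "}"

    return re.sub(r"\{([^{}]+)\}", replace, pattern)


print(expand_aliases("{hash_short}_{filename}"))
# {content.hash.sha256|short}_{source.rel_path[-1]}
print(expand_aliases("{filename|stem|lowercase}"))
# {source.rel_path[-1]|stem|lowercase}
```

Note that `stem` in the second example is treated as a modifier, not as the `stem` alias, because it does not appear in the key position.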

## Missing Values

Canon requires all facts used in a pattern to have values for every source. If any source is missing a required fact, `canon apply` will refuse to proceed and report which facts are missing.

When you run `canon cluster generate`, the manifest includes comments listing all facts with 100% coverage. These facts are safe to use in your pattern.

If sources are missing required facts, you can:
- Filter them out during generation: `--where 'DateTimeOriginal?'`
- Import the missing facts via the [enrichment pipeline](../commands/enrich/index.md)
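
The all-or-nothing rule can be pictured as a coverage check over every source before any file is touched. The following Python sketch is an illustration under stated assumptions, not canon's code: the `required_keys` and `missing_facts` helpers and the sample data are hypothetical, and the pattern is assumed to use fully qualified keys with aliases already expanded.

```python
import re


def required_keys(pattern):
    """Collect the fact key of each {expr}, dropping modifiers and accessors."""
    keys = set()
    for expr in re.findall(r"\{([^{}]+)\}", pattern):
        keys.add(expr.split("|")[0].split("[")[0])
    return keys


def missing_facts(pattern, sources):
    """Map each source id to the sorted list of pattern keys it lacks."""
    needed = required_keys(pattern)
    report = {}
    for source_id, facts in sources.items():
        missing = sorted(key for key in needed if key not in facts)
        if missing:
            report[source_id] = missing
    return report


# Hypothetical sample data; real fact values come from canon itself.
sources = {
    1: {"content.DateTimeOriginal": "2024-07-15", "source.rel_path": "a/IMG_001.jpg"},
    2: {"source.rel_path": "a/scan.png"},
}
pattern = "{content.DateTimeOriginal|year}/{source.rel_path[-1]}"
print(missing_facts(pattern, sources))  # {2: ['content.DateTimeOriginal']}
```

An empty report corresponds to the case where every fact in the pattern has 100% coverage.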

## Common Patterns

```toml
# Flat (all files in one directory)
pattern = "{filename}"

# Preserve original structure
pattern = "{source.rel_path}"

# By EXIF capture date
pattern = "{content.DateTimeOriginal|year}/{content.DateTimeOriginal|month}/{filename}"

# By date with hash prefix (collision-safe)
pattern = "{content.DateTimeOriginal|date}/{hash_short}_{filename}"

# By camera
pattern = "{content.Make}/{content.Model}/{filename}"

# By file type and year
pattern = "{source.ext}/{source.mtime|year}/{filename}"
```