ascfix 0.5.5

Automatic ASCII diagram repair tool for Markdown files
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
# Architecture Overview

## High-Level Design

```
Input Markdown File
   Scanner (extract diagram blocks)
    Grid (2D character representation)
  Detector (identify primitives)
 Normalizer (fix alignment & padding)
   Renderer (convert back to ASCII)
    Output Markdown
```

Each component is **independent, testable, and focused**.

---

## Core Philosophy

### Conservative & Idempotent
- **Only process well-understood structures** (boxes, arrows, text)
- **Unknown patterns are preserved unchanged**
- **Running twice produces identical output** (safe to apply repeatedly)
- **No content is added or deleted** (only formatting improved)

### Minimal Assumptions
- Don't infer business logic
- Don't guess intent
- Process only what we can verify
- Fail safely (leave content unchanged if ambiguous)

---

## Module Structure

### `src/main.rs` - Entry Point (32 lines)
**Purpose:** CLI interface and error handling

```rust
fn main() -> Result<()> {
    let args = Args::parse_args();
    let processor = Processor::new(args);
    let exit_code = processor.process_all()?;
    std::process::exit(exit_code);
}
```

**Responsibilities:**
- Parse command-line arguments
- Create processor
- Handle exit codes (0 = success, 1 = check mode failure)
- Error reporting to stderr

---

### `src/lib.rs` - Library API (13 lines)
**Purpose:** Expose public API for library users

```rust
pub mod cli;      // CLI types
pub mod modes;    // Mode-specific processing
```

Enables programmatic usage:
```rust
use ascfix::modes::process_by_mode;
use ascfix::cli::Mode;

let result = process_by_mode(&Mode::Diagram, content);
```

---

### `src/cli.rs` - Command-Line Interface (100+ lines)
**Purpose:** Argument parsing and mode definition

**Key Types:**
```rust
#[derive(ValueEnum)]
pub enum Mode {
    Safe,     // Fix tables only
    Diagram,  // Fix tables + diagrams
    Check,    // Validate without writing
}

pub struct Args {
    pub files: Vec<PathBuf>,
    pub mode: Mode,
    pub in_place: bool,
    pub check: bool,
}
```

**Validation:**
- Files must exist
- Mode selected via `--mode` flag
- Check mode activated with `--check` flag

---

### `src/modes.rs` - Processing Strategies (355+ lines)
**Purpose:** Mode-specific processing implementations

**Three Modes:**

#### Safe Mode
- Detects Markdown table patterns (pipes + separators)
- Normalizes column widths
- Leaves diagrams untouched
- **Safest mode** - minimal changes

#### Diagram Mode
- Full pipeline: scan → detect → normalize → render
- Processes both tables and ASCII diagrams
- Only modifies blocks with detected primitives
- Preserves unknown structures

#### Check Mode
- Same as Diagram mode
- No file writes
- Returns exit code (1 if changes needed)
- Useful for CI/CD validation

**Key Function:**
```rust
pub fn process_by_mode(mode: &Mode, content: &str) -> String
```

---

### `src/scanner.rs` - Block Extraction (180+ lines)
**Purpose:** Identify diagram blocks in Markdown

**Algorithm:**
1. Scan for lines with ASCII diagram characters (box corners, arrows)
2. Group consecutive diagram lines into blocks
3. Preserve line numbers for later reinsertion

**Output:**
```rust
pub struct DiagramBlock {
    pub start_line: usize,
    pub lines: Vec<String>,
}
```

**Safety:**
- Ignores content inside code fences (backticks, tildes)
- Preserves original line positions
- No assumptions about diagram structure

---

### `src/grid.rs` - 2D Grid Representation (200+ lines)
**Purpose:** Convert ASCII diagrams to 2D grid for manipulation

**Design:**
```rust
pub struct Grid {
    cells: Vec<Vec<char>>,
}
```

**Operations:**
- `from_lines()` - Parse ASCII text into grid
- `get(row, col)` - Safe access with bounds checking
- `get_mut(row, col)` - Safe mutable access
- `render()` - Convert back to text
- Dimensions calculated from content

**Key Property:**
- All access is bounds-checked
- Returns `Option<char>` for safe access
- No panics on out-of-bounds

---

### `src/primitives.rs` - Type Definitions (300+ lines)
**Purpose:** Define diagram primitives and detection results

**Core Primitives:**
```rust
// Box style variants
pub enum BoxStyle { Single, Double, Rounded }

pub struct Box {
    pub top_left: (usize, usize),
    pub bottom_right: (usize, usize),
    pub style: BoxStyle,
    pub parent_idx: Option<usize>,      // Hierarchy support
    pub child_indices: Vec<usize>,      // Hierarchy support
}

// Arrow types
pub enum ArrowType { Standard, Double, Long, Dashed }

pub struct HorizontalArrow {
    pub row: usize,
    pub start_col: usize,
    pub end_col: usize,
    pub arrow_type: ArrowType,
    pub rightward: bool,
}

pub struct VerticalArrow {
    pub start_row: usize,
    pub end_row: usize,
    pub col: usize,
    pub arrow_type: ArrowType,
    pub downward: bool,
}

pub struct TextRow {
    pub row: usize,
    pub start_col: usize,
    pub end_col: usize,
    pub content: String,
}

// Connection lines (L-shaped paths)
pub struct Segment {
    pub row: usize,
    pub start_col: usize,
    pub end_col: usize,
}

pub struct ConnectionLine {
    pub segments: Vec<Segment>,
    pub from_box: Option<usize>,
    pub to_box: Option<usize>,
}

// Labels attached to primitives
pub enum LabelAttachment {
    Box(usize),
    HorizontalArrow(usize),
    VerticalArrow(usize),
    ConnectionLine(usize),
}

pub struct Label {
    pub row: usize,
    pub col: usize,
    pub content: String,
    pub attached_to: LabelAttachment,
    pub offset: (isize, isize),      // Relative offset from attachment
}

pub struct PrimitiveInventory {
    pub boxes: Vec<Box>,
    pub horizontal_arrows: Vec<HorizontalArrow>,
    pub vertical_arrows: Vec<VerticalArrow>,
    pub text_rows: Vec<TextRow>,
    pub connection_lines: Vec<ConnectionLine>,  // NEW: Phase 4
    pub labels: Vec<Label>,                     // NEW: Phase 6
}
```

**New in Phases 1-6:**
- **BoxStyle enum** (Phase 1): Support for single-line, double-line, and rounded boxes
- **ArrowType enum** (Phase 2): Standard, double, long, and dashed arrow support
- **Box hierarchy fields** (Phase 5): Parent/child relationships for nested boxes
- **ConnectionLine primitive** (Phase 4): L-shaped connection paths with segments
- **Label primitive** (Phase 6): Text labels with attachment tracking and offset preservation

---

### `src/detector.rs` - Primitive Detection (900+ lines)
**Purpose:** Identify boxes, arrows, text rows, hierarchies, connections, and labels

**Detection Algorithm:**

#### Box Detection (All Styles)
1. Find box corners for all supported styles:
   - Single-line: ┌, ┐, └, ┘
   - Double-line: ╔, ╗, ╚, ╝
   - Rounded: ╭, ╮, ╰, ╯
2. Flood-fill from corner to find connected component
3. Verify rectangle shape (straight lines, all four corners)
4. Detect style from corner characters
5. Record box with style information

#### Arrow Detection (Enhanced Types)
- **Horizontal arrows** (→, ←, ⇒, ⇐, ─): Line-based detection with arrow tips required
- **Vertical arrows** (↓, ↑, ⇓, ⇑, │): Column-based detection with arrow tips required
- **Arrow types detected**: Standard, Double, Long, Dashed
- **Direction detection**: Rightward/downward flags set from arrow tips
- **Requirement:** Must have at least one arrow tip to be recognized

#### Text Detection
- Extract content from inside boxes
- Preserve original text exactly
- Record position within box

#### Box Hierarchy Detection (Phase 5)
- For each pair of boxes, check containment relationship
- Box A is inside Box B if: A's corners are strictly inside B's interior (with 1-cell margin)
- Set parent_idx on child boxes
- Populate child_indices on parent boxes
- Handles multiple children and single parent relationships

#### Connection Line Detection (Phase 4 - Conservative)
- Detect L-shaped paths (limited to 4 segments maximum)
- Trace paths from box edges
- Distinguish from box borders
- Record from_box and to_box indices if endpoints connect
- Skip ambiguous or too-complex structures

#### Label Detection (Phase 6 - Framework)
- Identify text near boxes, arrows, and connections
- Calculate attachment type and offset
- Distance threshold: within 2 cells for attachment
- Length limit: labels max 20 characters
- Conservative: skips ambiguous cases

**Safety:**
- Conservative: Unknown patterns not identified
- Errors don't cause panics
- Ambiguous structures left unchanged
- Hierarchy detection skips overlapping (non-nested) boxes
- Connection line detection requires clear endpoints
- Label detection requires close proximity

---

### `src/normalizer.rs` - Layout Improvement (1200+ lines)
**Purpose:** Fix alignment, padding, sizing, hierarchies, connections, and labels

**Normalization Pipeline:**

1. **Box Width Expansion**
   - Find longest text row in each box
   - Calculate required width (content + 2 borders + padding)
   - Expand right edge if needed
   - Idempotent: only expands, never shrinks

2. **Box Style Preservation**
   - Render boxes with correct characters for their detected style
   - Single-line: ┌─┐│└┘
   - Double-line: ╔═╗║╚╝
   - Rounded: ╭─╮│╰╯

3. **Side-by-Side Box Balancing** (Phase 3)
   - Find groups of vertically overlapping, horizontally adjacent boxes
   - Expand each box to match the widest in its group
   - Applies only to clearly adjacent boxes (gap ≤ 1 cell)

4. **Nested Box Hierarchy Expansion** (Phase 5)
   - For each parent box with children, expand to encompass them
   - Add 1-cell margin around each child
   - Only expands parent, never shrinks
   - Processes innermost to outermost (deterministic order)
   - **Known limitation**: Re-detection on second pass can find new hierarchies

5. **Horizontal Arrow Alignment**
   - Detect arrow types and directions
   - Align start/end columns with box edges
   - Maintain relative position
   - Uses BTreeMap for deterministic ordering

6. **Vertical Arrow Alignment**
   - Align arrow column with box center, edge
   - Choose closest alignment point
   - Preserve relative spacing
   - Handle all arrow types

7. **Connection Line Normalization** (Phase 4)
   - Snap endpoints to box edges
   - Straighten segments
   - Preserve L-shape topology
   - Conservative: skip if ambiguous

8. **Label Normalization** (Phase 6)
   - Move labels with their attached primitives via offset
   - Preserve relative positioning
   - Skip if collision detected

9. **Padding Normalization**
   - Enforce uniform 1-space padding inside boxes
   - Add space inside boxes if missing
   - Consistent formatting

**Key Property:**
- **Idempotent (with limitations)**: Running normalization twice produces identical output for simple diagrams
- Complex diagrams with hierarchies may not be idempotent due to re-detection of hierarchies on second pass
- Verified by tests: `test_normalization_idempotent_*` and `idempotence_tests.rs`

**Data Flow:**
```
PrimitiveInventory
normalize_box_widths()
balance_horizontal_boxes()
normalize_nested_boxes()
align_horizontal_arrows()
align_vertical_arrows()
normalize_connection_lines()
normalize_labels()
normalize_padding()
PrimitiveInventory (normalized)
```

---

### `src/renderer.rs` - ASCII Reconstruction (250+ lines)
**Purpose:** Convert normalized primitives back to ASCII text

**Rendering Order:**
1. Create empty grid with sufficient dimensions
2. Draw boxes (borders + corners)
3. Draw text rows
4. Draw horizontal arrows
5. Draw vertical arrows

**Box Drawing:**
```
┌─────┐    ← corners (┌, ┐, └, ┘)
│ txt │    ← borders (─, │)
└─────┘
```

**Arrow Drawing:**
- Horizontal: Lines of ─ with → or ← tips
- Vertical: Lines of │ with ↓ or ↑ tips
- Only overwrites space cells (doesn't corrupt boxes)

**Output:**
```rust
pub fn render_diagram(inventory: &PrimitiveInventory) -> Grid
```

---

### `src/parser.rs` - Markdown Parsing (150+ lines)
**Purpose:** Extract non-diagram content (titles, text, code blocks)

**Features:**
- Ignores content in code fences (backticks, tildes)
- Extracts inline code preservation info
- Preserves line numbers

**Safety:**
- Respects code block boundaries
- No processing inside fenced code

---

### `src/processor.rs` - Main Orchestrator (200+ lines)
**Purpose:** Coordinate file I/O, mode selection, and output

**Key Function:**
```rust
pub fn process_all(&self) -> Result<i32>
```

**Responsibilities:**
1. Read files (with glob support)
2. Select mode based on CLI args
3. Apply processing
4. Handle output (stdout or in-place write)
5. Return appropriate exit code

**Exit Codes:**
- `0` - Success (or no changes needed in check mode)
- `1` - Check mode found changes OR error occurred

---

### `src/io.rs` - File Operations (150+ lines)
**Purpose:** Safe file reading and writing

**Features:**
- Handles glob patterns (e.g., `docs/*.md`)
- Respects file permissions
- Error handling for missing files
- In-place modification with validation

---

## Data Flow Example

### Processing a Diagram

```
Input: "┌──┐\n│Hi│\n└──┘"
Scanner: [DiagramBlock { start_line: 0, lines: ["┌──┐", "│Hi│", "└──┘"] }]
Grid: cells = [['┌', '─', '─', '┐'],
               ['│', 'H', 'i', '│'],
               ['└', '─', '─', '┘']]
Detector: PrimitiveInventory {
  boxes: [Box { top_left: (0,0), bottom_right: (2,3) }],
  text_rows: [TextRow { row: 1, content: "Hi" }],
  ...
}
Normalizer: (no changes needed - already normalized)
Renderer: Grid with same cells
Output: "┌──┐\n│Hi│\n└──┘"
```

### Processing with Changes

```
Input: "┌──┐\n│Hello│\n└──┘"  (too narrow!)
Detector: Box width = 4, text = "Hello" (length 5)
Normalizer: Expand box to width 7
Renderer: "┌─────┐\n│Hello│\n└─────┘"
```

---

## Testing Strategy

### Unit Tests (122 tests)
Located in source files: `src/**/*.rs`
- Test individual functions
- No external dependencies
- Fast execution

### Integration Tests (129 tests)
Located in `tests/` directory
- Test mode processing
- Test full pipeline
- Real scenarios

### Golden File Tests (6 tests)
Located in `tests/golden_file_tests.rs`
- Test real examples
- Compare against expected output
- Catch regressions

### Test Coverage

| Module | Tests | Focus |
|--------|-------|-------|
| normalizer.rs | 110 | Idempotence, correctness |
| detector.rs | 31 | Primitive detection accuracy |
| renderer.rs | 6 | ASCII reconstruction |
| modes.rs | 129 | Mode-specific behavior |

---

## Design Decisions

### Why Grid-Based?
- **Easy reasoning** - All operations are spatial
- **Visual debugging** - Can print grids to understand state
- **Safe access** - Grid.get() returns Option, no panics
- **Testable** - Each grid state is verifiable

### Why Conservative Detection?
- **Safety** - Unknown patterns not modified
- **Predictability** - Only obvious fixes applied
- **No surprises** - Users always control output

### Why Modes?
- **Safe mode** - For any Markdown file (tables only)
- **Diagram mode** - For files with ASCII diagrams
- **Check mode** - For CI/CD validation
- **Flexibility** - Users choose safety level

### Why Idempotence?
- **Safe to apply repeatedly** - No degradation on re-runs
- **Tested property** - Verified by tests
- **User-friendly** - Can process files multiple times
- **CI/CD compatible** - Can be part of automated workflows

---

## Performance

### Time Complexity
- **Scanner**: O(n) - one pass through content
- **Detector**: O(n) - bounds checking on grid
- **Normalizer**: O(p) where p = primitives (typically small)
- **Renderer**: O(n) - render grid to text

**Overall**: O(n) - linear in file size

### Space Complexity
- **Grid**: O(w × h) where w, h = diagram dimensions
- **Primitives**: O(p) - constant typically
- **Overall**: O(n) - linear in file size

### Optimization
- Processes blocks independently (parallelizable)
- Single pass through content
- No unnecessary allocations

---

## Future Extensibility

### Adding New Primitives
1. Define in `src/primitives.rs`
2. Add detection in `src/detector.rs`
3. Add normalization in `src/normalizer.rs`
4. Add rendering in `src/renderer.rs`
5. Add tests for each step

### Adding New Modes
1. Add variant to `Mode` enum in `src/cli.rs`
2. Implement in `src/modes.rs`
3. Add tests

### Adding New Normalizations
1. Implement in `src/normalizer.rs`
2. Call from pipeline in `src/modes.rs`
3. Add tests

---

## Compilation & Optimization

### Build Modes
```bash
cargo build              # Debug (fast compile, slow runtime)
cargo build --release   # Release (slow compile, fast runtime)
```

### Linting
```bash
cargo clippy --all-targets --all-features -- -D warnings
```

### Code Style
```bash
cargo fmt                # Format code
cargo fmt --check        # Check formatting
```

### Testing
```bash
cargo test               # All tests
cargo test --lib        # Unit tests only
cargo test --release    # Optimized tests
```

---

## Dependencies Rationale

### clap 4.5
- **Why**: Industry standard for CLI parsing
- **Benefit**: Declarative argument definition, auto help generation
- **Alternative considered**: manual argument parsing (too error-prone)

### anyhow 1.0
- **Why**: Ergonomic error handling
- **Benefit**: Minimal overhead, context preservation
- **Alternative considered**: custom error types (too verbose for this project)

### Dev Dependencies
- **tempfile**: Safe temporary file creation for tests
- **insta**: Snapshot testing for golden files

---

## References

- See [CONTRIBUTING.md]CONTRIBUTING.md for development guidelines
- See [SECURITY.md]SECURITY.md for security policies
- See [CHANGELOG.md]CHANGELOG.md for version history