hephasm 0.1.0

Assembler for Asmodeus architecture with macro support and extended instructions
# Hephasm

**Assembler for Asmodeus Language**

```
┌───────────────────────────────────────────────────────────────┐
│                                                               │
│  ██╗  ██╗███████╗██████╗ ██╗  ██╗ █████╗ ███████╗███╗   ███╗  │
│  ██║  ██║██╔════╝██╔══██╗██║  ██║██╔══██╗██╔════╝████╗ ████║  │
│  ███████║█████╗  ██████╔╝███████║███████║███████╗██╔████╔██║  │
│  ██╔══██║██╔══╝  ██╔═══╝ ██╔══██║██╔══██║╚════██║██║╚██╔╝██║  │
│  ██║  ██║███████╗██║     ██║  ██║██║  ██║███████║██║ ╚═╝ ██║  │
│  ╚═╝  ╚═╝╚══════╝╚═╝     ╚═╝  ╚═╝╚═╝  ╚═╝╚══════╝╚═╝     ╚═╝  │
│                                                               │
│             AST Converter for Asmodeus Language               │
└───────────────────────────────────────────────────────────────┘
```

**Hephasm** is the assembler component of the Asmodeus toolchain. It takes the Abstract Syntax Tree (AST) produced by Parseid and generates binary machine code that runs on the Machine W virtual machine (Asmachina). It features multi-pass assembly, macro expansion, symbol resolution, and extended instruction set support.

## 🎯 Features

### Core Assembly Capabilities
- **Multi-Pass Assembly**: Three-pass assembler for complete symbol resolution
- **Macro Expansion**: Full macro system with parameter substitution
- **Symbol Table Management**: Forward and backward label references
- **Extended Instruction Set**: Support for MNO, DZI, MOD operations
- **Multiple Addressing Modes**: All Machine W addressing modes supported
- **Directive Processing**: Data definition and memory reservation

### Advanced Features
- **Error Reporting**: Detailed error messages with line numbers
- **Optimization**: Basic code optimization during assembly
- **Binary Generation**: Compact 16-bit machine code output
- **Address Validation**: Bounds checking for all memory references
- **Type Safety**: Operand type validation and conversion

## 🚀 Quick Start

### Basic Usage

```rust
use hephasm::{assemble_source, assemble_program};
use parseid::parse_source;

// Assemble from source code directly
let source = r#"
    start:
        POB #42     ; Load immediate value
        WYJSCIE     ; Output the value
        STP         ; Stop program
"#;

let machine_code = assemble_source(source)?;
println!("Generated {} words of machine code", machine_code.len());

// Or assemble from AST
let ast = parse_source(source)?;
let machine_code = assemble_program(&ast)?;
```

### Extended Instruction Set

```rust
use hephasm::assemble_source_extended;

let extended_program = r#"
    ; Extended arithmetic operations
    start:
        POB #15     ; Load 15
        MNO #3      ; Multiply by 3 (45)
        DZI #5      ; Divide by 5 (9)
        MOD #7      ; Modulo 7 (2)
        WYJSCIE     ; Output result
        STP
"#;

// Enable extended instruction set
let machine_code = assemble_source_extended(extended_program, true)?;
```

### Examining Generated Code

```rust
let source = r#"
    main:
        POB data
        DOD #10
        WYJSCIE
        STP
    data: RST 42
"#;

let machine_code = assemble_source(source)?;

// Print generated instructions in hex
for (addr, word) in machine_code.iter().enumerate() {
    println!("0x{:04X}: 0x{:04X} ({})", addr, word, word);
}

// Expected output:
// 0x0000: 0x2004 (8196)  -- POB 4 (direct addressing)
// 0x0001: 0x090A (2314)  -- DOD #10 (immediate addressing)  
// 0x0002: 0x7800 (30720) -- WYJSCIE
// 0x0003: 0x3800 (14336) -- STP
// 0x0004: 0x002A (42)    -- data: RST 42
```

## 🏗️ Assembly Process

### Three-Pass Assembly

Hephasm assembles a program in three passes:

```
Pass 1: Macro Expansion
├── Collect macro definitions
├── Expand macro calls with parameter substitution
└── Generate expanded program without macros

Pass 2: Symbol Table Building  
├── Scan all labels and data definitions
├── Calculate addresses for all symbols
├── Build complete symbol table
└── Validate symbol references

Pass 3: Code Generation
├── Process instructions into machine code
├── Resolve all symbol references
├── Apply addressing mode encoding
└── Generate final binary output
```
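
To illustrate why symbol addresses are collected before code generation, here is a minimal, self-contained sketch of passes 2 and 3 on a toy program representation. The `Item` type and the `build_symbols` and `resolve` functions are hypothetical names invented for this example and are not part of the Hephasm API. The sketch also assumes that every instruction or data definition occupies exactly one word.

```rust
use std::collections::HashMap;

/// One item of an already macro-expanded toy program (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub enum Item<'a> {
    /// A label definition such as `data:`; it occupies no memory.
    Label(&'a str),
    /// Anything that occupies one word and may reference a symbol by name.
    Word { symbol: Option<&'a str> },
}

/// Pass 2: walk the program once and record the address of every label.
pub fn build_symbols<'a>(items: &[Item<'a>]) -> HashMap<&'a str, u16> {
    let mut symbols = HashMap::new();
    let mut address = 0u16;
    for item in items {
        match item {
            Item::Label(name) => {
                symbols.insert(*name, address);
            }
            // Assumption: each instruction or data word takes one address.
            Item::Word { .. } => address += 1,
        }
    }
    symbols
}

/// Pass 3 (symbol part only): look up every referenced symbol.
/// Returns one entry per word, or the name of the first undefined symbol.
pub fn resolve<'a>(
    items: &[Item<'a>],
    symbols: &HashMap<&'a str, u16>,
) -> Result<Vec<Option<u16>>, &'a str> {
    let mut resolved = Vec::new();
    for item in items {
        if let Item::Word { symbol } = item {
            match symbol {
                Some(name) => match symbols.get(name) {
                    Some(addr) => resolved.push(Some(*addr)),
                    None => return Err(*name),
                },
                None => resolved.push(None),
            }
        }
    }
    Ok(resolved)
}
```

Because the full table exists before any reference is resolved, a forward reference such as `POB data` placed ahead of `data: RST 42` resolves the same way as a backward reference.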

### Instruction Encoding

Machine W instructions use 16-bit encoding:

```
┌─────────────┬─────────────┬─────────────────────────┐
│   Opcode    │ Addr Mode   │       Operand           │
│   (5 bits)  │  (3 bits)   │      (8 bits)           │
└─────────────┴─────────────┴─────────────────────────┘
 15         11 10          8 7                       0
```
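
The layout above can be sketched as a pair of pack and unpack helpers. This is an illustration of the bit arithmetic only: `Fields`, `encode`, and `decode` are hypothetical names, not functions exported by Hephasm.

```rust
/// The three fields of one instruction word, per the diagram above:
/// 5-bit opcode, 3-bit addressing mode, 8-bit operand.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Fields {
    pub opcode: u16,
    pub addr_mode: u16,
    pub operand: u16,
}

/// Pack the fields into one 16-bit word.
/// Returns `None` if any field does not fit into its bit width.
pub fn encode(fields: Fields) -> Option<u16> {
    if fields.opcode > 0b1_1111 || fields.addr_mode > 0b111 || fields.operand > 0xFF {
        return None;
    }
    Some((fields.opcode << 11) | (fields.addr_mode << 8) | fields.operand)
}

/// Split a 16-bit word back into its three fields.
pub fn decode(word: u16) -> Fields {
    Fields {
        opcode: (word >> 11) & 0b1_1111,
        addr_mode: (word >> 8) & 0b111,
        operand: word & 0xFF,
    }
}
```

With these helpers, `POB 4` in direct mode (opcode `0b00100`, mode `0b000`) packs to `0x2004`, and `DOD #10` in immediate mode (opcode `0b00001`, mode `0b001`) packs to `0x090A`, matching the listing in "Examining Generated Code".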

## 🔧 API Reference

### Main Functions

```rust
// Assemble from source code
pub fn assemble_source(source: &str) -> Result<Vec<u16>, Box<dyn std::error::Error>>;

// Assemble with extended instruction set
pub fn assemble_source_extended(source: &str, extended_mode: bool) 
    -> Result<Vec<u16>, Box<dyn std::error::Error>>;

// Assemble from AST
pub fn assemble_program(program: &Program) -> Result<Vec<u16>, AssemblerError>;
pub fn assemble_program_extended(program: &Program, extended_mode: bool) 
    -> Result<Vec<u16>, AssemblerError>;
```

### Assembler Struct

For advanced usage and control:

```rust
use hephasm::Assembler;

let mut assembler = Assembler::new();
// or with extended instruction set
let mut assembler = Assembler::new_with_extended(true);

let machine_code = assembler.assemble(&ast)?;
```

### Error Types

```rust
#[derive(Debug, thiserror::Error)]
pub enum AssemblerError {
    #[error("Undefined symbol '{symbol}' at line {line}")]
    UndefinedSymbol { symbol: String, line: usize },
    
    #[error("Invalid opcode '{opcode}' at line {line}")]
    InvalidOpcode { opcode: String, line: usize },
    
    #[error("Address out of bounds: {address} at line {line}")]
    AddressOutOfBounds { address: u16, line: usize },
    
    #[error("Extended instruction '{instruction}' not enabled at line {line}")]
    ExtendedInstructionNotEnabled { instruction: String, line: usize },
    
    #[error("Invalid addressing mode for instruction at line {line}")]
    InvalidAddressingMode { line: usize },
    
    #[error("Macro '{name}' already defined at line {line}")]
    DuplicateMacro { name: String, line: usize },
    
    #[error("Parser error: {0}")]
    ParserError(#[from] parseid::ParserError),
}
```

## 📖 Examples

### Basic Assembly

```rust
use hephasm::assemble_source;

let basic_program = r#"
    ; Simple addition program
    start:
        POB first       ; Load first number
        DOD second      ; Add second number
        WYJSCIE         ; Output result
        STP             ; Stop

    first:  RST 25      ; Data: 25
    second: RST 17      ; Data: 17
"#;

let machine_code = assemble_source(basic_program)?;

// Verify the generated code
assert_eq!(machine_code.len(), 6);

// Check instruction encoding
// POB first (address 4) -> direct addressing
let pob_instruction = machine_code[0];
let opcode = (pob_instruction >> 11) & 0b11111;
let addr_mode = (pob_instruction >> 8) & 0b111;
let operand = pob_instruction & 0xFF;

assert_eq!(opcode, 0b00100);  // POB opcode
assert_eq!(addr_mode, 0b000); // Direct addressing
assert_eq!(operand, 4);       // Address of 'first'
```

### Macro Assembly

```rust
let macro_program = r#"
    ; Define a macro for adding two values
    MAKRO add_values val1 val2
        POB val1
        DOD val2
        WYJSCIE
    KONM
    
    ; Define another macro with complex logic
    MAKRO conditional_add condition value
        POB condition
        SOM skip_add
        POB result
        DOD value
        LAD result
    skip_add:
        ; Continue...
    KONM
    
    start:
        add_values #10 #20      ; Expands to POB #10, DOD #20, WYJSCIE
        conditional_add flag data_value
        STP
        
    flag: RST 1
    data_value: RST 15
    result: RPA
"#;

let machine_code = assemble_source(macro_program)?;

// The assembler will expand macros and resolve all symbols
println!("Macro program assembled to {} words", machine_code.len());
```
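
Parameter substitution, as used by `add_values` above, can be sketched as a whole-token replacement over the macro body. The `MacroDef` type and the `expand` function below are hypothetical and only illustrate the idea on plain text lines. Hephasm itself works on the AST, and its real expansion logic may differ.

```rust
use std::collections::HashMap;

/// A macro definition: parameter names plus body lines (hypothetical type).
pub struct MacroDef {
    pub params: Vec<String>,
    pub body: Vec<String>,
}

/// Expand one macro call by replacing every token that equals a parameter
/// name with the matching argument.
/// Returns `None` when the number of arguments does not match.
pub fn expand(def: &MacroDef, args: &[&str]) -> Option<Vec<String>> {
    if args.len() != def.params.len() {
        return None;
    }
    // Bind each parameter name to the argument in the same position.
    let bindings: HashMap<&str, &str> = def
        .params
        .iter()
        .map(String::as_str)
        .zip(args.iter().copied())
        .collect();
    let expanded: Vec<String> = def
        .body
        .iter()
        .map(|line| {
            line.split_whitespace()
                .map(|token| bindings.get(token).copied().unwrap_or(token))
                .collect::<Vec<_>>()
                .join(" ")
        })
        .collect();
    Some(expanded)
}
```

Note that this sketch copies labels inside a macro body verbatim, so a macro containing a label (such as `skip_add` in `conditional_add`) would define that label once per call.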

### Extended Instruction Assembly

```rust
use hephasm::assemble_source_extended;

let extended_program = r#"
    ; Factorial calculation using extended instructions
    start:
        POB n           ; Load 5
        LAD counter     ; counter = 5
        POB one         ; result = 1
        LAD result
        
    factorial_loop:
        POB counter     ; if counter == 0, done
        SOZ done
        
        POB result      ; result *= counter
        MNO counter     ; Extended multiplication
        LAD result
        
        POB counter     ; counter--
        ODE one
        LAD counter
        
        SOB factorial_loop
        
    done:
        POB result      ; Output result (120)
        WYJSCIE
        STP
        
    n:       RST 5
    one:     RST 1
    counter: RPA
    result:  RPA
"#;

let machine_code = assemble_source_extended(extended_program, true)?;
```

### Addressing Mode Examples

```rust
let addressing_program = r#"
    test_addressing:
        ; Direct addressing
        POB value           ; Load from memory address
        
        ; Immediate addressing
        POB #42             ; Load literal value
        DOD #10             ; Add literal value
        
        ; Indirect addressing
        POB [pointer]       ; Load from address stored at pointer
        
        ; Register addressing (if supported)
        POB R1              ; Load from register
        LAD R2              ; Store to register
        
        STP
        
    value:   RST 100
    pointer: RST value      ; Points to 'value'
"#;

let machine_code = assemble_source(addressing_program)?;

// Each addressing mode gets encoded differently
for (i, instruction) in machine_code.iter().enumerate() {
    let addr_mode = (instruction >> 8) & 0b111;
    match addr_mode {
        0b000 => println!("Instruction {} uses direct addressing", i),
        0b001 => println!("Instruction {} uses immediate addressing", i),
        0b010 => println!("Instruction {} uses indirect addressing", i),
        0b011 => println!("Instruction {} uses register addressing", i),
        _ => {}
    }
}
```
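
Going one step further than printing the addressing mode, a word can be turned back into readable assembly. The sketch below is a hypothetical mini-disassembler, not part of Hephasm. It covers only the direct and immediate modes, and it takes its opcode values from the Instruction Set Mapping tables in this README.

```rust
/// Map an opcode to its mnemonic, using the values from the
/// Instruction Set Mapping tables in this README.
pub fn mnemonic(opcode: u16) -> Option<&'static str> {
    match opcode {
        0b00001 => Some("DOD"),
        0b00010 => Some("ODE"),
        0b00011 => Some("LAD"),
        0b00100 => Some("POB"),
        0b00101 => Some("SOB"),
        0b00110 => Some("SOM"),
        0b00111 => Some("STP"),
        0b10000 => Some("SOZ"),
        0b10001 => Some("MNO"),
        0b10010 => Some("DZI"),
        0b10011 => Some("MOD"),
        _ => None,
    }
}

/// Render one machine word as assembly text.
/// Returns `None` for opcodes or addressing modes this sketch does not cover.
pub fn disassemble(word: u16) -> Option<String> {
    let opcode = (word >> 11) & 0b1_1111;
    let addr_mode = (word >> 8) & 0b111;
    let operand = word & 0xFF;
    let name = mnemonic(opcode)?;
    if name == "STP" {
        // STP takes no operand.
        return Some(name.to_string());
    }
    match addr_mode {
        0b000 => Some(format!("{name} {operand}")),  // direct
        0b001 => Some(format!("{name} #{operand}")), // immediate
        _ => None,
    }
}
```

For example, `disassemble(0x2004)` yields `POB 4` and `disassemble(0x090A)` yields `DOD #10`.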

### Data Definition and Directives

```rust
let data_program = r#"
    ; Data section with various formats
    program_start:
        POB number
        DOD hex_value
        WYJSCIE
        STP
        
    ; Data definitions
    number:     RST 42          ; Decimal
    hex_value:  RST 0x2A        ; Hexadecimal (same as 42)
    binary_val: RST 0b101010    ; Binary (same as 42)
    negative:   RST -10         ; Negative number
    
    ; Memory reservations
    buffer:     RPA             ; Reserve one word (initialized to 0)
    array:      RPA, RPA, RPA   ; Reserve three words
"#;

let machine_code = assemble_source(data_program)?;

// Data values are placed in memory after code
let code_size = 4; // 4 instructions
println!("Data starts at word {}", code_size);
println!("number = {}", machine_code[code_size]);     // 42
println!("hex_value = {}", machine_code[code_size + 1]); // 42
```
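
The literal formats shown above (decimal, `0x` hexadecimal, `0b` binary, and a leading minus sign) can be converted to a 16-bit word roughly as follows. `parse_literal` is a hypothetical helper written for this README, not a Hephasm or Parseid function. Storing negative values as two's complement is an assumption of the sketch.

```rust
/// Parse a numeric literal such as `42`, `0x2A`, `0b101010` or `-10`
/// into a 16-bit word. Returns `None` for malformed or out-of-range input.
pub fn parse_literal(text: &str) -> Option<u16> {
    let (negative, digits) = match text.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, text),
    };
    let magnitude = if let Some(hex) = digits.strip_prefix("0x") {
        u16::from_str_radix(hex, 16).ok()?
    } else if let Some(bin) = digits.strip_prefix("0b") {
        u16::from_str_radix(bin, 2).ok()?
    } else {
        digits.parse::<u16>().ok()?
    };
    if negative {
        // Assumption: negatives are stored as 16-bit two's complement,
        // so the magnitude may be at most 32768.
        if magnitude > 0x8000 {
            return None;
        }
        Some(magnitude.wrapping_neg())
    } else {
        Some(magnitude)
    }
}
```

Under these assumptions, `42`, `0x2A`, and `0b101010` all produce the same word, and `-10` produces `0xFFF6`.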

### Error Handling

```rust
use hephasm::{assemble_source, AssemblerError};

// Program with undefined symbol
let bad_program = r#"
    start:
        POB undefined_symbol    ; Error: symbol not defined
        STP
"#;

match assemble_source(bad_program) {
    Ok(_) => println!("Assembly successful"),
    Err(e) => {
        if let Some(assembler_err) = e.downcast_ref::<AssemblerError>() {
            match assembler_err {
                AssemblerError::UndefinedSymbol { symbol, line } => {
                    println!("Undefined symbol '{}' at line {}", symbol, line);
                }
                AssemblerError::AddressOutOfBounds { address, line } => {
                    println!("Address {} out of bounds at line {}", address, line);
                }
                _ => println!("Other assembler error: {}", assembler_err),
            }
        }
    }
}

// Program with extended instruction but no extended mode
let extended_without_flag = r#"
    start:
        MNO #5      ; Error: extended instruction not enabled
        STP
"#;

match assemble_source(extended_without_flag) {
    Err(e) => {
        if let Some(AssemblerError::ExtendedInstructionNotEnabled { instruction, line }) 
            = e.downcast_ref::<AssemblerError>() {
            println!("Extended instruction '{}' not enabled at line {}", 
                     instruction, line);
        }
    }
    Ok(_) => unreachable!(),
}
```
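
The extended-mode check behind the second example can be pictured as a simple gate on the mnemonic. The sketch below uses a stand-in `GateError` type and a hypothetical `check_extended` function. It mirrors the shape of `AssemblerError::ExtendedInstructionNotEnabled` but is not the crate's actual implementation.

```rust
/// Toy stand-in for the crate's error type (not the real `AssemblerError`).
#[derive(Debug, PartialEq, Eq)]
pub enum GateError {
    ExtendedInstructionNotEnabled { instruction: String, line: usize },
}

/// Reject the extended mnemonics named in this README (MNO, DZI, MOD)
/// unless extended mode is switched on.
pub fn check_extended(
    mnemonic: &str,
    line: usize,
    extended_mode: bool,
) -> Result<(), GateError> {
    let is_extended = matches!(mnemonic, "MNO" | "DZI" | "MOD");
    if is_extended && !extended_mode {
        return Err(GateError::ExtendedInstructionNotEnabled {
            instruction: mnemonic.to_string(),
            line,
        });
    }
    Ok(())
}
```

Basic instructions pass the gate in either mode, so enabling extended mode does not change how a basic program is checked in this sketch.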

## 🧪 Testing

### Unit Tests

```bash
cargo test -p hephasm
```

### Specific Test Categories

```bash
# Test instruction assembly
cargo test -p hephasm instruction_tests

# Test addressing mode encoding
cargo test -p hephasm addressing_tests

# Test macro expansion
cargo test -p hephasm macro_tests

# Test symbol resolution
cargo test -p hephasm symbol_tests

# Test directive processing
cargo test -p hephasm directive_tests

# Test error conditions
cargo test -p hephasm error_tests
```

### Integration Tests

```bash
cargo test -p hephasm --test integration_tests
```

## 🔍 Performance Characteristics

- **Speed**: Roughly 100K instructions assembled per second
- **Memory**: O(n), where n is the program size
- **Passes**: Always three passes, regardless of program size
- **Symbol Resolution**: O(1) average lookup time with hash tables

### Performance Testing

```rust
use hephasm::assemble_source;
use std::time::Instant;

let large_program = include_str!("large_program.asmod");
let start = Instant::now();
let machine_code = assemble_source(large_program)?;
let duration = start.elapsed();

println!("Assembled {} lines into {} words in {:?}", 
         large_program.lines().count(), machine_code.len(), duration);
```

## 🛠️ Advanced Features

### Custom Assembler Configuration

```rust
use hephasm::Assembler;

let mut assembler = Assembler::new_with_extended(true);

// The assembler handles all configuration internally
// Extended mode enables MNO, DZI, MOD instructions
let machine_code = assembler.assemble(&ast)?;
```

### Manual Assembly Control

```rust
use hephasm::Assembler;
use parseid::parse_source;

let source = r#"
    start:
        POB data
        WYJSCIE
        STP
    data: RST 42
"#;

let ast = parse_source(source)?;
let mut assembler = Assembler::new();

// The assembler runs three passes automatically:
// 1. Macro expansion
// 2. Symbol table building  
// 3. Code generation
let machine_code = assembler.assemble(&ast)?;

println!("Final code size: {} words", machine_code.len());
```

## 🔗 Integration with Asmodeus Pipeline

Hephasm is the final transformation step before execution:

```
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Parseid   │───▶│   Hephasm   │───▶│  Asmachina  │
│  (Parser)   │    │ (Assembler) │    │    (VM)     │
│             │    │             │    │             │
└─────────────┘    └─────────────┘    └─────────────┘
        │                   │                   │
        ▼                   ▼                   ▼
   ┌─────────┐         ┌─────────┐         ┌─────────┐
   │   AST   │         │ Machine │         │Execution│
   │         │         │  Code   │         │ Results │
   └─────────┘         └─────────┘         └─────────┘
```

### Complete Pipeline Usage

```rust
use lexariel::tokenize;
use parseid::parse;
use hephasm::assemble_program;
use asmachina::MachineW;

let source = "POB #42\nWYJSCIE\nSTP";

// Complete compilation pipeline
let tokens = tokenize(source)?;              // Lexariel
let ast = parse(tokens)?;                    // Parseid
let machine_code = assemble_program(&ast)?;  // Hephasm

// Execute the result
let mut machine = MachineW::new();
machine.load_program(&machine_code)?;       // Asmachina
machine.run()?;

assert_eq!(machine.get_output_buffer(), &[42]);
```

## 📊 Instruction Set Mapping

### Basic Instructions

| Assembly | Opcode | Encoding | Description |
|----------|--------|----------|-------------|
| `DOD addr` | 00001 | `00001_000_aaaaaaaa` | Add memory[addr] to AK |
| `DOD #val` | 00001 | `00001_001_vvvvvvvv` | Add immediate value to AK |
| `ODE addr` | 00010 | `00010_000_aaaaaaaa` | Subtract memory[addr] from AK |
| `LAD addr` | 00011 | `00011_000_aaaaaaaa` | Store AK to memory[addr] |
| `POB addr` | 00100 | `00100_000_aaaaaaaa` | Load memory[addr] to AK |
| `POB #val` | 00100 | `00100_001_vvvvvvvv` | Load immediate value to AK |
| `SOB addr` | 00101 | `00101_000_aaaaaaaa` | Jump to addr |
| `SOM addr` | 00110 | `00110_000_aaaaaaaa` | Jump to addr if AK < 0 |
| `SOZ addr` | 10000 | `10000_000_aaaaaaaa` | Jump to addr if AK = 0 |
| `STP` | 00111 | `00111_000_00000000` | Stop execution |

### Extended Instructions

| Assembly | Opcode | Encoding | Description |
|----------|--------|----------|-------------|
| `MNO addr` | 10001 | `10001_000_aaaaaaaa` | Multiply AK by memory[addr] |
| `MNO #val` | 10001 | `10001_001_vvvvvvvv` | Multiply AK by immediate |
| `DZI addr` | 10010 | `10010_000_aaaaaaaa` | Divide AK by memory[addr] |
| `DZI #val` | 10010 | `10010_001_vvvvvvvv` | Divide AK by immediate |
| `MOD addr` | 10011 | `10011_000_aaaaaaaa` | AK = AK % memory[addr] |
| `MOD #val` | 10011 | `10011_001_vvvvvvvv` | AK = AK % immediate |

## 📜 License

This crate is part of the Asmodeus project and is licensed under the MIT License.

## 🔗 Related Components

- **[Parseid](../parseid/)** - Parser that generates AST for Hephasm
- **[Asmachina](../asmachina/)** - Virtual machine that executes Hephasm output
- **[Shared](../shared/)** - Common types and instruction encoding utilities
- **[Main Asmodeus](../)** - Complete language toolchain