umjunsik 0.1.1

Umjunsik Language compiler targeting Lamina IR
# Umjunsik Lang - Lamina

A compiler for Umjunsik Language (엄랭) that targets [Lamina IR](https://github.com/SkuldNorniern/lamina).

## Overview

This project compiles Umjunsik Language to Lamina Intermediate Representation. Lamina is a high-performance compiler backend supporting x86_64 and AArch64 architectures.

## Building

```bash
cargo build --release
```

## Usage

```bash
cargo run -- <file.umm>
```

Or use the compiled binary:

```bash
./target/release/umjunsik <file.umm>
```

## Examples

### Hello World (Print number 3)

`examples/hello.umm`:
```
어떻게
식...!
이 사람이름이냐ㅋㅋ
```

### Multiplication (2 * 2 = 4)

`examples/multiply.umm`:
```
어떻게
식.. ..!
이 사람이름이냐ㅋㅋ
```

### Variables

`examples/variable.umm`:
```
어떻게
엄...
엄어어....
식어어!
이 사람이름이냐ㅋㅋ
```

Explanation:
1. `엄...` : Assign 3 to variable 1
2. `엄어어....` : Assign 4 to variable 2
3. `식어어!` : Print variable 2 (outputs 4)

### Character Output

`examples/printchar.umm`:
```
어떻게
식........... .......ㅋ
식ㅋ
이 사람이름이냐ㅋㅋ
```

Explanation:
1. `식........... .......ㅋ` : Print the ASCII character with code 11 × 7 = 77 (`M`); the space multiplies the two dot groups
2. `식ㅋ` : Print newline

### Conditional

`examples/conditional.umm`:
```
어떻게
엄...
동탄어?식.....!
식..!
이 사람이름이냐ㅋㅋ
```

Explanation:
1. `엄...` : Assign 3 to variable 1
2. `동탄어?식.....!` : If variable 1 is not 0, print 5
3. `식..!` : Print 2

## Implementation Details

### Lexer
The lexer handles Korean characters and special tokens:
- Recognizes keywords: `어떻게`, `엄`, `어`, `식`, `준`, `동탄`, `화이팅`, `이 사람이름이냐ㅋㅋ`
- Handles repeated `어` for variable indexing (e.g., `어어어` = variable 3)
- Processes dots (`.`) and commas (`,`) as number literals (each `.` adds 1, each `,` subtracts 1)
- Supports one-line programs with `~` separator
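
The run-folding behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the crate's actual lexer: the `Tok` enum and `lex_expr` function are hypothetical names, and the sketch assumes that `.` counts as +1, `,` as -1, and that a space between terms means multiplication (as in the multiplication example).

```rust
/// Hypothetical token type for this sketch; the real `token.rs` may differ.
#[derive(Debug, PartialEq)]
enum Tok {
    /// A run of `.` (+1 each) and `,` (-1 each) folded into one value.
    Number(i64),
    /// A run of `어`; the run length is the 1-based variable index.
    Var(usize),
    /// A space between terms, treated here as multiplication.
    Mul,
    /// Any other character, passed through untouched.
    Other(char),
}

/// Tokenizes only the expression part of a line (assumed helper).
fn lex_expr(src: &str) -> Vec<Tok> {
    let mut out = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        match c {
            '.' | ',' => {
                // Fold the whole run of dots and commas into one number.
                let mut n = 0i64;
                while let Some(&d) = chars.peek() {
                    match d {
                        '.' => n += 1,
                        ',' => n -= 1,
                        _ => break,
                    }
                    chars.next();
                }
                out.push(Tok::Number(n));
            }
            '어' => {
                // Count consecutive `어` to get the variable index.
                let mut count = 0usize;
                while chars.peek() == Some(&'어') {
                    count += 1;
                    chars.next();
                }
                out.push(Tok::Var(count));
            }
            ' ' => {
                chars.next();
                out.push(Tok::Mul);
            }
            other => {
                chars.next();
                out.push(Tok::Other(other));
            }
        }
    }
    out
}

fn main() {
    println!("{:?}", lex_expr("어어어"));
    println!("{:?}", lex_expr(".. .."));
}
```

Folding each run into a single token keeps the parser simple, since it never has to count individual characters.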

### Parser
Builds an Abstract Syntax Tree (AST) with:
- Expression nodes: Number, Variable, Add, Sub, Mul
- Statement nodes: Assign, Input, Print, Conditional, Goto, Return
- Handles operator precedence (multiplication before addition)
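
To make the expression nodes concrete, here is a small sketch of how such a tree could be represented and walked. The variant names follow the list above, but the `Box` layout and the `eval` helper are assumptions made for illustration; the compiler itself emits IR rather than evaluating the tree. Unset variables read as 0 here, mirroring the zero initialization described in the code generator section.

```rust
use std::collections::HashMap;

/// Expression nodes named after the list above (layout is an assumption).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Number(i64),
    Variable(usize),
    Add(Box<Expr>, Box<Expr>),
    Sub(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Hypothetical tree walk, used only to show what each node means.
fn eval(expr: &Expr, vars: &HashMap<usize, i64>) -> i64 {
    match expr {
        Expr::Number(n) => *n,
        // Variables that were never assigned read as 0.
        Expr::Variable(i) => *vars.get(i).unwrap_or(&0),
        Expr::Add(a, b) => eval(a, vars) + eval(b, vars),
        Expr::Sub(a, b) => eval(a, vars) - eval(b, vars),
        Expr::Mul(a, b) => eval(a, vars) * eval(b, vars),
    }
}

fn main() {
    let vars = HashMap::from([(1, 3)]);
    // 2 * 2, the shape of the multiplication example.
    let product = Expr::Mul(Box::new(Expr::Number(2)), Box::new(Expr::Number(2)));
    println!("{}", eval(&product, &vars));
    // (variable 1 + 2) - 1
    let sum = Expr::Add(Box::new(Expr::Variable(1)), Box::new(Expr::Number(2)));
    let diff = Expr::Sub(Box::new(sum), Box::new(Expr::Number(1)));
    println!("{}", eval(&diff, &vars));
}
```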

### Code Generator
Generates optimized Lamina IR:
- **Two-pass compilation**: First pass analyzes which variables are used, second pass generates code
- **Lazy variable allocation**: Only allocates stack space for variables that are actually used in the program
- **SSA form**: Temporaries are in Static Single Assignment form; variables are read and written through explicit load/store operations on their stack slots
- **Memory management**: Variables stored on stack using `alloc.ptr.stack`, initialized to 0
- **Control flow**: Generates proper basic blocks for conditionals and goto statements

Example output for a simple variable program:
```lamina
fn @main() -> i64 {
  entry:
    %var_ptr_0 = alloc.ptr.stack i64  # Only allocate used variables
    store.i64 %var_ptr_0, 0

  line_1:
    %t0 = add.i64 3, 0
    store.i64 %var_ptr_0, %t0
    %t1 = load.i64 %var_ptr_0
    print %t1
    ret.i64 0
}
```
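
The two-pass, lazy-allocation idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in `codegen.rs`: the `Stmt` type, `used_slots`, and `emit` are hypothetical, slots are 0-based, and the emitted text only mirrors the shape of the IR shown above using the instructions that appear there. Whether Lamina accepts this exact output has not been verified.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified statements; the real AST has more node kinds.
enum Stmt {
    /// Store a constant into a variable slot (0-based).
    Assign { slot: usize, value: i64 },
    /// Load a variable slot and print it.
    Print { slot: usize },
}

/// Pass 1: collect every slot the program touches.
fn used_slots(program: &[Stmt]) -> BTreeSet<usize> {
    program
        .iter()
        .map(|s| match s {
            Stmt::Assign { slot, .. } | Stmt::Print { slot } => *slot,
        })
        .collect()
}

/// Pass 2: emit IR-shaped text, allocating only the slots found in pass 1.
fn emit(program: &[Stmt]) -> String {
    let mut ir = String::from("fn @main() -> i64 {\n  entry:\n");
    for slot in used_slots(program) {
        ir.push_str(&format!("    %var_ptr_{slot} = alloc.ptr.stack i64\n"));
        ir.push_str(&format!("    store.i64 %var_ptr_{slot}, 0\n"));
    }
    let mut tmp = 0; // counter for fresh temporaries
    for (i, stmt) in program.iter().enumerate() {
        ir.push_str(&format!("\n  line_{}:\n", i + 1));
        match stmt {
            Stmt::Assign { slot, value } => {
                ir.push_str(&format!("    store.i64 %var_ptr_{slot}, {value}\n"));
            }
            Stmt::Print { slot } => {
                ir.push_str(&format!("    %t{tmp} = load.i64 %var_ptr_{slot}\n"));
                ir.push_str(&format!("    print %t{tmp}\n"));
                tmp += 1;
            }
        }
    }
    ir.push_str("    ret.i64 0\n}\n");
    ir
}

fn main() {
    let program = [Stmt::Assign { slot: 0, value: 3 }, Stmt::Print { slot: 0 }];
    print!("{}", emit(&program));
}
```

Using a `BTreeSet` in the first pass keeps the allocation order deterministic, which makes the generated IR easier to diff between runs.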

## Implementation Status

### Implemented
- ✅ Lexer/Tokenizer with Korean character support
- ✅ Parser (AST generation)
- ✅ Optimized Lamina IR code generation
- ✅ Basic arithmetic operations (add, subtract, multiply)
- ✅ Smart variable management (lazy allocation, only used variables)
- ✅ Console output (numbers and characters)
- ✅ Newline output
- ✅ Conditionals (동탄)
- ✅ GOTO (준)
- ✅ Program exit (화이팅!)

### Not Implemented
- ❌ Console input (식?) - Placeholder only (requires external function call)
- ❌ Full compilation to native code (generates Lamina IR only)

## Project Structure

```
umjunsik-lang-lamina/
├── src/
│   ├── main.rs          # Main executable
│   ├── lib.rs           # Library root
│   ├── token.rs         # Token definitions
│   ├── lexer.rs         # Lexer (tokenization)
│   ├── ast.rs           # Abstract syntax tree
│   ├── parser.rs        # Parser
│   └── codegen.rs       # Lamina IR code generator
├── examples/            # Example programs
├── Cargo.toml
└── README.md
```

## License

Apache License 2.0

## References

- [Lamina](https://github.com/SkuldNorniern/lamina) - High-performance compiler backend