# prism: Python Codebase Health Reporter

prism analyzes a Python project directory and prints a structured health report covering
lines of code, cyclomatic complexity, symbol inventory, import usage, and code smells.
It uses only Python stdlib (os, sys, re, ast, json, pathlib, argparse, dataclasses,
collections, textwrap, typing). Target: Python 3.9+.

All source files go under `/workspace/prism/`.
All test files go under `/workspace/prism/tests/`.

Create `/workspace/prism/__init__.py` (empty) and `/workspace/prism/tests/__init__.py`
(empty) as needed.

---

# T1: File Scanner [agent]

Create `/workspace/prism/scanner.py`.

Implement a module that walks a directory tree and collects Python source files.

## Interface

```python
@dataclass
class SourceFile:
    path: str          # absolute path
    rel_path: str      # path relative to project root
    source: str        # full file contents as a string
    size_bytes: int

def scan(root: str, exclude_dirs: Optional[list[str]] = None) -> list[SourceFile]:
    """Walk root recursively, return all .py files.
    Default exclude_dirs: ['.git', '__pycache__', '.venv', 'venv', 'node_modules', '.tox']
    Skips unreadable files silently.
    """
```

Also export `DEFAULT_EXCLUDE_DIRS: list[str]`.

## Tests

Create `/workspace/prism/tests/test_scanner.py`. Use `tempfile.mkdtemp()` to create
a real temporary directory structure. Tests must cover:

- Finds .py files recursively
- Skips excluded directories
- Returns correct rel_path (relative to root)
- SourceFile.source contains the actual file content
- Returns empty list for an empty directory
- Handles a single file at root

Clean up temp dirs with `shutil.rmtree` in teardown.

---

# T2: Line Counter [agent]

Create `/workspace/prism/loc.py`.

Implement a module that counts lines of code in a Python source string.

## Interface

```python
@dataclass
class LocResult:
    total: int      # all lines including blank
    blank: int      # lines that are empty or whitespace-only
    comment: int    # lines whose first non-whitespace char is '#'
    code: int       # total - blank - comment

def count_loc(source: str) -> LocResult:
    """Count lines in a Python source string."""

def count_loc_multi(sources: list[str]) -> LocResult:
    """Sum LocResult across multiple source strings."""
```
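
A minimal sketch of `count_loc` (classification is purely line-based, so triple-quoted docstring lines fall through to `code` automatically, as the tests below require):

```python
from dataclasses import dataclass

@dataclass
class LocResult:
    total: int
    blank: int
    comment: int
    code: int

def count_loc(source):
    lines = source.splitlines()
    blank = comment = 0
    for line in lines:
        stripped = line.strip()
        if not stripped:
            blank += 1                      # empty or whitespace-only
        elif stripped.startswith('#'):
            comment += 1                    # first non-whitespace char is '#'
    total = len(lines)
    # Inline comments ('x = 1  # note') never match startswith('#'), so they count as code.
    return LocResult(total=total, blank=blank, comment=comment,
                     code=total - blank - comment)
```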

## Tests

Create `/workspace/prism/tests/test_loc.py`. Cover:

- Empty string → all zeros
- Only blank lines
- Only comment lines (# comment)
- Only code lines
- Mixed source: verify code = total - blank - comment
- Inline comments on code lines count as code (not comment)
- Docstrings (triple-quoted) count as code lines
- count_loc_multi sums correctly

---

# T3: Cyclomatic Complexity [agent]

Create `/workspace/prism/complexity.py`.

Implement a module that computes cyclomatic complexity for every function and method
in a Python source string using the `ast` module.

Cyclomatic complexity = 1 + number of branching nodes in the function body.
Branching nodes: `If`, `For`, `While`, `ExceptHandler`, `With`, `Assert`,
`comprehension` (inside ListComp/SetComp/DictComp/GeneratorExp), `BoolOp` (each `and`/`or`).

## Interface

```python
@dataclass
class FunctionComplexity:
    name: str           # "MyClass.my_method" or "my_function"
    lineno: int
    complexity: int
    is_method: bool

@dataclass
class ComplexityReport:
    functions: list[FunctionComplexity]
    average: float      # 0.0 if no functions
    maximum: int        # 0 if no functions
    high_complexity: list[FunctionComplexity]   # complexity > 10

def analyze_complexity(source: str, filepath: str = "<unknown>") -> ComplexityReport:
    """Parse source with ast.parse, walk to find all functions and methods,
    compute complexity for each. Returns empty report for unparseable source
    (catch SyntaxError silently)."""
```

## Tests

Create `/workspace/prism/tests/test_complexity.py`. Cover:

- A function with no branches → complexity 1
- A function with one if → complexity 2
- A function with if/elif/else → complexity 3 (elif counts)
- A function with a for loop and an if inside → complexity 3
- A method inside a class: name is "ClassName.method_name"
- Empty source → empty report with average 0.0
- SyntaxError source → empty report (no exception raised)
- high_complexity only includes functions with complexity > 10

---

# T4: Symbol Extractor [agent]

Create `/workspace/prism/symbols.py`.

Implement a module that extracts a symbol inventory from Python source using `ast`.

## Interface

```python
@dataclass
class SymbolSummary:
    classes: int
    functions: int          # top-level functions only (not methods)
    methods: int            # methods inside classes
    async_functions: int    # async def at top level
    async_methods: int      # async def inside classes
    class_names: list[str]
    function_names: list[str]  # top-level function names

def extract_symbols(source: str) -> SymbolSummary:
    """Parse source with ast.parse, walk top-level nodes to count symbols.
    Returns zeroed SymbolSummary for unparseable source."""
```
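
Because only `tree.body` (and each class body) is iterated, nested defs are excluded for free. A reduced sketch returning a plain dict (the name `extract_counts` is illustrative, not part of the interface):

```python
import ast

def extract_counts(source):
    counts = {'classes': 0, 'functions': 0, 'methods': 0}
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return counts  # zeroed summary, no exception
    for node in tree.body:  # top level only: nested classes/defs don't count
        if isinstance(node, ast.ClassDef):
            counts['classes'] += 1
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    counts['methods'] += 1
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            counts['functions'] += 1
    return counts
```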

## Tests

Create `/workspace/prism/tests/test_symbols.py`. Cover:

- Empty source → all zeros
- One class with two methods → classes=1, methods=2, functions=0
- Two top-level functions → functions=2
- Async def at top level → async_functions=1
- Async def inside class → async_methods=1
- class_names contains correct names
- function_names contains correct names
- Nested class (class inside class): only outer counts toward top-level classes
- SyntaxError → zeroed summary, no exception

---

# T5: Import Analyzer [agent]

Create `/workspace/prism/imports.py`.

Implement a module that analyzes import statements in Python source using `ast`.

## Interface

```python
@dataclass
class ImportInfo:
    module: str         # top-level module name, e.g. "os" not "os.path"
    full_name: str      # full import: "os.path", "collections.abc"
    is_stdlib: bool     # True if module is in a known stdlib set
    lineno: int

@dataclass
class ImportReport:
    imports: list[ImportInfo]
    unique_modules: list[str]       # sorted list of unique top-level module names
    stdlib_count: int
    third_party_count: int
    most_common: list[tuple[str, int]]  # top 5 (module, count) by usage

STDLIB_MODULES: set[str]   # a reasonably complete set of stdlib module names

def analyze_imports(source: str) -> ImportReport:
    """Parse source, extract all import and from-import statements.
    Returns empty report for unparseable source."""
```

Populate `STDLIB_MODULES` with at least: os, sys, re, ast, json, io, abc, math,
time, datetime, pathlib, collections, itertools, functools, typing, dataclasses,
contextlib, copy, enum, hashlib, hmac, logging, argparse, unittest, tempfile,
shutil, subprocess, threading, multiprocessing, socket, http, urllib, xml, csv,
sqlite3, struct, array, base64, binascii, codecs, configparser, difflib, email,
fnmatch, fractions, gc, glob, gzip, heapq, html, imaplib, inspect, ipaddress,
keyword, linecache, locale, mimetypes, numbers, operator, pickle, platform, pprint,
profile, queue, random, selectors, signal, smtplib, ssl, stat, statistics, string,
tarfile, telnetlib, textwrap, traceback, types, uuid, warnings, weakref, zipfile,
zlib, builtins, __future__.
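
The extraction logic reduces to two `ast` node types. A sketch (the helper name `top_level_modules` and the trimmed `STDLIB_SAMPLE` set are illustrative stand-ins for `analyze_imports` and the full `STDLIB_MODULES`):

```python
import ast

STDLIB_SAMPLE = {'os', 'sys', 're', 'ast', 'json', 'collections'}  # trimmed for the sketch

def top_level_modules(source):
    """Return (top_level_module, is_stdlib) pairs for each import statement."""
    out = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                mod = alias.name.split('.')[0]   # 'os.path' -> 'os'
                out.append((mod, mod in STDLIB_SAMPLE))
        elif isinstance(node, ast.ImportFrom):
            if node.module:  # skip relative imports like 'from . import x'
                mod = node.module.split('.')[0]
                out.append((mod, mod in STDLIB_SAMPLE))
    return out
```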

## Tests

Create `/workspace/prism/tests/test_imports.py`. Cover:

- `import os` → is_stdlib=True, module="os", full_name="os"
- `import os.path` → module="os", full_name="os.path"
- `from collections import OrderedDict` → module="collections"
- `import requests` → is_stdlib=False
- unique_modules is sorted and deduplicated
- most_common returns up to 5 entries
- Empty source → empty report
- SyntaxError → empty report

---

# T6: Code Smell Detector [agent]

Create `/workspace/prism/smells.py`.

Implement a module that detects common code smells in Python source using `ast`
and line-by-line analysis.

## Interface

```python
@dataclass
class Smell:
    kind: str       # "todo", "long_function", "deep_nesting", "bare_except", "print_call"
    message: str    # human-readable description
    lineno: int
    severity: str   # "info", "warning", "error"

@dataclass
class SmellReport:
    smells: list[Smell]
    by_kind: dict[str, int]     # {kind: count}
    total: int

def detect_smells(source: str, filepath: str = "<unknown>") -> SmellReport:
    """Detect smells. Rules:
    - todo: any line with TODO, FIXME, HACK, or XXX (case-insensitive) in a comment
    - long_function: any function/method body > 50 lines (severity: warning)
    - deep_nesting: any function with ast node depth > 5 from function root (severity: warning)
    - bare_except: `except:` with no exception type (severity: error)
    - print_call: call to `print(...)` at module level or inside a function (severity: info)
    SyntaxError → return empty SmellReport.
    """
```
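
Two of the rules can be sketched briefly: the todo rule is line-based, the bare-except rule is AST-based (`ExceptHandler.type is None`). The helper name `find_basic_smells` is illustrative; note the regex approximates "in a comment" by requiring a `#` before the marker, so a `#` inside a string literal could false-positive:

```python
import ast
import re

TODO_RE = re.compile(r'#.*\b(todo|fixme|hack|xxx)\b', re.IGNORECASE)

def find_basic_smells(source):
    """Return (kind, lineno) pairs for 'todo' and 'bare_except' smells."""
    smells = []
    for i, line in enumerate(source.splitlines(), start=1):
        if TODO_RE.search(line):
            smells.append(('todo', i))
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return []  # empty report on unparseable source
    for node in ast.walk(tree):
        # `except:` with no exception type parses as ExceptHandler(type=None)
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            smells.append(('bare_except', node.lineno))
    return smells
```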

## Tests

Create `/workspace/prism/tests/test_smells.py`. Cover:

- A line `# TODO: fix this` → smell of kind "todo"
- A line `x = 1  # FIXME ugly` → smell of kind "todo"
- A bare `except:` clause → smell of kind "bare_except"
- A print() call → smell of kind "print_call"
- A function longer than 50 lines → smell of kind "long_function"
- No smells in clean code
- SyntaxError source → empty report
- by_kind counts match actual smells list
- total equals len(smells)

---

# T7: Report Formatter [agent]

Create `/workspace/prism/report.py`.

Implement a module that takes analysis results and produces a formatted terminal
report as a string.

## Interface

```python
@dataclass
class AnalysisBundle:
    root: str
    file_count: int
    loc: "LocResult"            # from loc.py (use duck-typed dict or Any)
    complexity: "ComplexityReport"
    symbols: "SymbolSummary"
    imports: "ImportReport"
    smells: "SmellReport"

def format_report(bundle: AnalysisBundle, width: int = 72) -> str:
    """Return a multi-line string suitable for printing to a terminal.

    The report must include sections for:
    - Header: project root, file count, analysis timestamp
    - Lines of Code: total, blank, comment, code percentages
    - Complexity: average, maximum, count of high-complexity functions,
                  list of top-5 worst functions (name, line, complexity)
    - Symbols: class count, function count, method count, async counts
    - Top Imports: top 5 most-used modules with stdlib vs third-party label
    - Code Smells: total count, breakdown by kind, list of error-severity smells

    Use only these Unicode box-drawing characters for borders: ─ │ ┌ ┐ └ ┘ ├ ┤
    (U+2500, U+2502, U+250C, U+2510, U+2514, U+2518, U+251C, U+2524)
    Wrap lines at `width` characters.
    """

def format_summary_line(bundle: AnalysisBundle) -> str:
    """Return a single-line summary:
    e.g. '12 files · 1,204 LOC · avg complexity 2.4 · 3 smells'
    """
```

`format_report` treats the `AnalysisBundle` fields as duck-typed: it only accesses
attributes by name, so the actual types from the other modules will work when wired up.
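
Because only attribute access is needed, `format_summary_line` can be sketched directly against `SimpleNamespace` stand-ins (the f-string format below matches the example in the docstring; `:,` supplies the thousands separator):

```python
from types import SimpleNamespace

def format_summary_line(bundle):
    """One-line summary; bundle is any object exposing these attributes."""
    return (f"{bundle.file_count} files · {bundle.loc.total:,} LOC · "
            f"avg complexity {bundle.complexity.average:.1f} · "
            f"{bundle.smells.total} smells")
```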

## Tests

Create `/workspace/prism/tests/test_report.py`. Use `SimpleNamespace` or dataclasses
to construct mock `AnalysisBundle` objects without importing other prism modules. Cover:

- `format_report` returns a non-empty string
- The string contains the root path
- The string contains LOC information
- `format_summary_line` returns a single line (no newlines)
- `format_summary_line` contains file count
- Report width doesn't exceed `width + 10` chars per line (allow slight overflow for content)

---

# SYNTH: CLI Entry Point and Project Wiring [agent/synthesize]

blocked_by: [T1, T2, T3, T4, T5, T6, T7]
validate: cd /workspace/prism && python -m pytest tests/ -q 2>&1 | tail -5

Create `/workspace/prism/__main__.py` and `/workspace/prism/cli.py`.

You have output from 7 parallel agents. Wire them into a working CLI.

## `cli.py`

```python
def run_analysis(root: str) -> AnalysisBundle:
    """
    1. scanner.scan(root) → list[SourceFile]
    2. For each file, run loc.count_loc, complexity.analyze_complexity,
       symbols.extract_symbols, imports.analyze_imports, smells.detect_smells
    3. Aggregate results:
       - loc: count_loc_multi across all sources
       - complexity: merge all FunctionComplexity lists, recompute average/max/high
       - symbols: sum all SymbolSummary fields, union class_names/function_names
       - imports: merge ImportReport (combine lists, recompute unique_modules,
                  stdlib_count, third_party_count, most_common)
       - smells: merge all SmellReport lists, recompute by_kind and total
    4. Return AnalysisBundle
    """

def main():
    """argparse CLI:

    usage: python -m prism [path] [--json] [--summary]

    path: directory to analyze (default: current directory)
    --json: output raw JSON instead of formatted report
    --summary: print only the one-line summary

    Exit code 0 on success, 1 on error.
    """
```

## `__main__.py`

```python
from prism.cli import main
if __name__ == "__main__":
    main()
```

Also ensure `/workspace/prism/__init__.py` exists (empty is fine).
And `/workspace/prism/tests/__init__.py` exists (empty is fine).

## Merging imports for most_common

Collect all ImportInfo across files. Count occurrences of each top-level module name.
Sort descending, take top 5. Recompute stdlib_count and third_party_count as totals
across all files (not unique — a module used in 3 files counts 3).
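
This merge is a natural fit for `collections.Counter`; a sketch with an illustrative helper name (`merge_most_common` is not part of the required interface):

```python
from collections import Counter

def merge_most_common(per_file_modules, top=5):
    """per_file_modules: one list of top-level module names per file.
    Counts every occurrence across files (a module used in 3 files counts 3),
    then returns the top-N (module, count) pairs in descending order."""
    counts = Counter()
    for modules in per_file_modules:
        counts.update(modules)
    return counts.most_common(top)
```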

## Tests

Create `/workspace/prism/tests/test_cli.py`. Use `tempfile.mkdtemp()` to build a
small synthetic Python project (2–3 .py files with known content). Cover:

- `run_analysis(root)` returns an `AnalysisBundle`
- `bundle.file_count` equals the number of .py files created
- `bundle.loc.total > 0`
- `bundle.symbols.functions >= 0`
- The formatted report string contains the temp dir path
- `format_summary_line` returns a single line

Do NOT test that running `python -m prism` as a subprocess works — just test the
Python API via `run_analysis` and `format_report`.