phpantom_lsp 0.7.0

Fast PHP language server with deep type intelligence. Generics, Laravel, PHPStan annotations. Ready in an instant.
# PHPantom — Inline Completion

Inline completion provides ghost-text suggestions beyond what traditional
LSP completion offers. Where LSP completion fills in a symbol name after
you type `$user->`, inline completion fills in entire expressions,
statements, and blocks before you ask.

The goal is not to compete with cloud-hosted LLMs on general coding
ability. The goal is to be so fast and so PHP-accurate that the cloud
model is still tokenizing your request by the time PHPantom has already
shown the answer. Every suggestion is grounded in type information the
LSP already has. No network, no subscription, no GPU required for the
base experience.

## Philosophy

Traditional AI coding tools remove the human from the loop. The
developer describes intent, the model writes code, then the developer
reviews output they didn't think through. PHPantom's inline completion
takes the opposite approach: the human is thinking and writing code,
and the tool removes the mechanical friction. It fills in the parts
the developer already knows but hasn't typed yet. The developer stays
in the loop the entire time.

This also means the entire pipeline (training data, training scripts,
model weights, context format) must be open and reproducible. A
company should be able to retrain the model on their own codebase. A
contributor should be able to improve the n-gram corpus or add
template patterns without asking permission. A free tool optimized for the team that uses it can be a strong challenger to a proprietary tool trained for the general case.

Everything needed to reproduce the training is published alongside
the model weights:
- The corpus collection scripts (which packages, how they're processed)
- The tokenizer definition
- The training configuration and scripts
- The context format specification (so third parties can train
  compatible models)
- The evaluation suite (so improvements can be measured)

## Architecture

```
PHPantom LSP (main process)
  │
  ├── Template Engine (built-in, zero cost)
  │     Pattern matching on AST + type context
  │     Responds in <1ms
  │
  ├── N-gram Engine (built-in, ~2-5MB model file)
  │     Token-level predictions from PHP corpus
  │     Responds in <5ms
  │
  └── Sidecar: Fine-tuned GGUF model (optional download)
        Fill-in-the-middle with tight PHP context
        ~50-150MB, runs on CPU
        Responds in ~100-500ms
```

Each layer is independently useful. The template engine ships day one
with no model files. The n-gram engine adds a small data file. The
sidecar is an optional download for users who want deeper suggestions.

All three layers share the same context-gathering pipeline: the LSP
already knows the current class outline, variable types, method
signatures, and surrounding code structure. That context is formatted
once and fed to whichever engine is active.
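A minimal sketch of how the shared context might be handed to the layers in priority order. The `InlineEngine` trait, `Suggestion` struct, and `first_suggestion` function are hypothetical names, and `InlineContext` is reduced to one field. An engine that is unavailable (for example, a sidecar that is not installed) simply returns `None`, which gives the silent fallback described for the sidecar.

```rust
/// Heavily simplified stand-in for the shared context (hypothetical).
pub struct InlineContext {
    pub current_line: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Suggestion {
    pub text: String,
    pub confidence: f32,
}

/// Hypothetical trait implemented by the template, n-gram, and model engines.
pub trait InlineEngine {
    fn name(&self) -> &'static str;
    /// Returns `None` when the engine has nothing to offer or is unavailable.
    fn suggest(&self, ctx: &InlineContext) -> Option<Suggestion>;
}

/// Tries engines in priority order; the first suggestion that clears the
/// confidence threshold wins.
pub fn first_suggestion(
    engines: &[Box<dyn InlineEngine>],
    ctx: &InlineContext,
    min_confidence: f32,
) -> Option<(&'static str, Suggestion)> {
    engines.iter().find_map(|engine| {
        engine
            .suggest(ctx)
            .filter(|s| s.confidence >= min_confidence)
            .map(|s| (engine.name(), s))
    })
}
```

Because every engine reads the same context, adding or reordering engines only changes the slice passed in, not the engines themselves.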

### Protocol

Inline completions use the LSP `textDocument/inlineCompletion` request
(proposed in LSP 3.18). For editors that don't support it yet, we can
fall back to `completionItem/resolve` with snippet insert text, or a
custom `phpantom/inlineCompletion` method.

The response is simple: a string of text to show as ghost text at the
cursor, plus an edit range. The editor renders it dimmed and the user
presses Tab to accept.

### Sidecar Protocol

The sidecar process communicates over stdin/stdout with a minimal
JSON protocol:

```json
{ "id": 1, "context": "...", "cursor": "...", "max_tokens": 64 }
```

```json
{ "id": 1, "text": "return $this->customer->fullName;", "confidence": 0.87 }
```

The LSP spawns the sidecar on first use and keeps it alive. If the
sidecar is not installed or crashes, the LSP falls back to templates
and n-grams silently. The sidecar binary is a separate downloadable
artifact, not bundled in the main LSP binary.
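The document does not say how messages are framed on the pipe. The sketch below assumes one JSON object per line and builds a request in the shape shown above using only the standard library; `request_line` and `json_escape` are hypothetical helpers, and a real implementation would more likely use a JSON crate such as `serde_json`.

```rust
/// Escapes a string for inclusion inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Other control characters must be \u-escaped in JSON.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds one newline-terminated request (framing is an assumption).
pub fn request_line(id: u64, context: &str, cursor: &str, max_tokens: u32) -> String {
    format!(
        "{{\"id\":{},\"context\":\"{}\",\"cursor\":\"{}\",\"max_tokens\":{}}}\n",
        id,
        json_escape(context),
        json_escape(cursor),
        max_tokens
    )
}
```

Newline framing works here because every newline inside the payload is escaped, so a raw `\n` can only appear as the message terminator.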

---

## N1. Template Engine

**Effort:** Medium (2-5 days per pattern group)
**Dependencies:** None beyond existing type resolution

The template engine is pattern matching on the current AST node and
cursor position, combined with type information from the existing
completion resolver. No model, no data files, no latency.

### How It Works

1. On each keystroke (debounced), check if the cursor is at a
   template trigger point (after a keyword, inside an empty block,
   at a return statement, etc.)
2. Gather context: surrounding variables and their types, the
   containing function's return type, the class outline, nearby
   assignments.
3. Match against template patterns. Each pattern has a confidence
   score based on how much context it could use.
4. Return the highest-confidence suggestion as ghost text.
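Steps 3 and 4 can be sketched as running every pattern against the context and keeping the best result. The `Pattern` alias and the two toy patterns are hypothetical; real patterns would inspect the AST and resolved types rather than the raw line, and would trim text the user has already typed.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Suggestion {
    pub text: String,
    pub confidence: f32,
}

/// Simplified context: only the trimmed current line (hypothetical).
pub struct Ctx {
    pub current_line: String,
}

/// A template pattern inspects the context and may produce a suggestion.
pub type Pattern = fn(&Ctx) -> Option<Suggestion>;

/// Runs every pattern and keeps the highest-confidence result.
pub fn best_match(patterns: &[Pattern], ctx: &Ctx) -> Option<Suggestion> {
    patterns
        .iter()
        .filter_map(|pattern| pattern(ctx))
        .max_by(|a, b| a.confidence.total_cmp(&b.confidence))
}

/// Toy pattern: a bare `return` where one property matches the return type.
fn return_property(ctx: &Ctx) -> Option<Suggestion> {
    (ctx.current_line == "return").then(|| Suggestion {
        text: "return $this->name;".to_string(),
        confidence: 0.8,
    })
}

/// Toy fallback: any line starting with `return`.
fn return_fallback(ctx: &Ctx) -> Option<Suggestion> {
    ctx.current_line.starts_with("return").then(|| Suggestion {
        text: "return null;".to_string(),
        confidence: 0.2,
    })
}
```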

### Context Gathering

The template engine reuses existing infrastructure:

- **Variable types:** `completion/variable/resolution.rs` already
  resolves every variable in scope to a type. The template engine
  calls the same pipeline.
- **Containing function:** The AST map already stores `MethodInfo`
  and `FunctionInfo` with return types, parameter lists, and
  docblocks.
- **Class outline:** `ClassInfo` gives us all properties, methods,
  constants, and their types for `$this->` context.
- **Selection range spans:** `selection_range.rs` already walks the
  AST to find the containing statement, block, function, and class.
  We can reuse this to scope the context window.

A new `InlineContext` struct bundles all of this:

```rust
pub struct InlineContext {
    /// Variables in scope with their resolved types.
    pub variables: Vec<(String, String)>,
    /// The containing method/function, if any.
    pub containing_function: Option<FunctionSummary>,
    /// The containing class, if any.
    pub containing_class: Option<ClassSummary>,
    /// The AST node immediately before the cursor.
    pub preceding_node: Option<NodeKind>,
    /// The line the cursor is on, trimmed.
    pub current_line: String,
    /// Lines above the cursor within the current block.
    pub block_context: Vec<String>,
}
```

This struct is built once per request and shared across all engines.

### Template Patterns

#### `foreach` with Type-Aware Iteration Variable

Trigger: User types `foreach` or `fore` (snippet prefix).

The engine looks at variables in scope whose types implement
`Traversable`, are arrays, or are generic collections. It picks the
most likely candidate based on:
- Proximity (closest assignment wins)
- Naming (plural names like `$items`, `$users`)
- Type (generic collections with known element types)

For each candidate, it generates the iteration variable name by
singularizing the collection name and resolves the element type:

```php
// $users is Collection<User> in scope
foreach ($users as $user) {
}

// $itemsByCategory is array<string, array<Item>> in scope
foreach ($itemsByCategory as $category => $items) {
}

// $results is QueryBuilder (Traversable<Model>) in scope
foreach ($results as $result) {
}
```

Singularization rules (simple, no NLP needed):
- `$users` → `$user` (drop trailing `s`)
- `$entries` → `$entry` (`ies` → `y`)
- `$addresses` → `$address` (drop `es`)
- `$data` → `$datum` or `$item` (known irregulars / fallback)
- `$list` → `$item` (non-plural name, use generic fallback)

When the element type has a key type (e.g. `array<string, User>`),
include `$key => $value` form.
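The singularization rules above fit in a few suffix checks. This is a sketch under assumptions: the irregulars table is illustrative (it picks `$datum` for `$data`), and the rules are deliberately naive, so a word like `$statuses` would still come out wrong. Names ending in `us` or `ss` are treated as non-plural so `$status` and `$class` fall back to `$item`.

```rust
/// Illustrative irregular plurals (assumed, not from the document).
const IRREGULARS: &[(&str, &str)] = &[("data", "datum"), ("children", "child"), ("people", "person")];

/// Derives a foreach iteration variable from a collection variable name.
pub fn singularize(var: &str) -> String {
    let name = var.trim_start_matches('$');
    if let Some((_, single)) = IRREGULARS.iter().find(|(plural, _)| *plural == name) {
        return format!("${single}");
    }
    let single = if let Some(stem) = name.strip_suffix("ies") {
        format!("{stem}y") // entries -> entry
    } else if ["sses", "shes", "ches", "xes", "zes"].iter().any(|s| name.ends_with(*s)) {
        name[..name.len() - 2].to_string() // addresses -> address
    } else if name.len() > 1 && name.ends_with('s') && !name.ends_with("ss") && !name.ends_with("us") {
        name[..name.len() - 1].to_string() // users -> user
    } else {
        "item".to_string() // not recognizably plural: generic fallback
    };
    format!("${single}")
}
```

Because only the suffix changes, camel-cased names keep their prefix: `$activeUsers` becomes `$activeUser`.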

#### `if` with Nullability Awareness

Trigger: User types `if` near a nullable variable.

The engine checks if any variable in the current block is nullable
(`?Type` or `Type|null`) and has not yet been null-checked. It
suggests the guard:

```php
// $user is ?User, not yet checked
if ($user === null) {
}

// Or, if the next line accesses $user->something,
// prefer the positive form:
if ($user !== null) {
    $user->█
}
```

It can also suggest `instanceof` checks when a variable is a union
of class types:

```php
// $shape is Circle|Square|Triangle
if ($shape instanceof Circle) {
}
```
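The nullability check and the choice between the negative and positive guard can be sketched as two small functions. Both names are hypothetical, types are assumed to arrive as strings from the resolver, and tracking whether a variable was already null-checked is left out.

```rust
/// True for `?Type` or any union that includes `null`.
pub fn is_nullable(ty: &str) -> bool {
    let ty = ty.trim();
    ty.starts_with('?') || ty.split('|').any(|part| part.trim().eq_ignore_ascii_case("null"))
}

/// Builds the guard. If the next line dereferences the variable, prefer the
/// positive `!== null` form so that access ends up inside the block.
pub fn null_guard(var: &str, next_line: &str) -> String {
    let uses_var = next_line.contains(&format!("{var}->"));
    let op = if uses_var { "!==" } else { "===" };
    format!("if ({var} {op} null) {{\n}}")
}
```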

#### `try/catch` with Thrown Exception Detection

Trigger: User types `try` or is inside a `try` block that has no
`catch` yet.

The engine uses the existing throws analysis pipeline
(`source/throws_analysis.rs`) to detect what exceptions the code
inside the try block can throw:

```php
try {
    $this->repository->save($entity);
    $this->mailer->send($notification);
} catch (DatabaseException $e) {
}
// If multiple throwable types detected, suggest multi-catch:
// catch (DatabaseException | MailerException $e)
```
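Turning the detected exception types into a single or multi-catch clause is mostly string assembly. A sketch, assuming the throws analysis yields a list of class names that may contain duplicates; `catch_clause` is a hypothetical name.

```rust
/// Builds a catch clause from detected exception class names, removing
/// duplicates while keeping first-seen order. `None` if nothing was detected.
pub fn catch_clause(exceptions: &[&str]) -> Option<String> {
    let mut unique: Vec<&str> = Vec::new();
    for &e in exceptions {
        if !unique.contains(&e) {
            unique.push(e);
        }
    }
    if unique.is_empty() {
        return None;
    }
    // One type gives `catch (A $e)`; several give `catch (A | B $e)`.
    Some(format!("catch ({} $e) {{\n}}", unique.join(" | ")))
}
```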

#### `match` with Enum Exhaustiveness

Trigger: User types `match` with an enum variable in scope, or
starts a match expression on a variable whose type is a backed or
unit enum.

```php
// $status is Status enum with Active, Inactive, Pending
match ($status) {
    Status::Active => █,
    Status::Inactive => ,
    Status::Pending => ,
}
```

The engine knows all enum cases from `ClassInfo` and fills them in.
If the enum is a `BackedEnum`, it can also suggest the value form.
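Given the case list from `ClassInfo`, the exhaustive skeleton is one arm per case. A sketch with a hypothetical `match_skeleton` function; the cursor marker is left for the editor to place.

```rust
/// Generates a `match` skeleton with one empty arm per enum case.
pub fn match_skeleton(subject: &str, enum_name: &str, cases: &[&str]) -> String {
    let mut out = format!("match ({subject}) {{\n");
    for case in cases {
        out.push_str(&format!("    {enum_name}::{case} => ,\n"));
    }
    out.push('}');
    out
}
```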

#### `return` with Type-Guided Expression

Trigger: User types `return` inside a function with a known return
type.

The engine looks at what's available in scope that matches the return
type:

```php
/** @return string */
public function getCustomerName(): string
{
    // $this->customer is Customer, Customer has fullName: string
    return $this->customer->fullName;█
}

/** @return array<string, mixed> */
public function toArray(): array
{
    return [
    ];
}

/** @return self */
public function withName(string $name): self
{
    return new self(█);
    // or: return clone $this; if immutable pattern detected
}
```

The return type matching works by:
1. If the return type is scalar and exactly one variable/property
   in scope matches, suggest it.
2. If the return type is `self`/`static`, suggest `new self(...)` or
   `clone $this`.
3. If the return type is `array`, suggest an array literal with
   known shape keys if a `@return` docblock specifies them.
4. If a property or method chain reaches the return type in one or
   two steps, suggest the chain.
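Rules 1 and 2 can be sketched as follows, assuming the resolver has already produced `(expression, type)` pairs for what is in scope, including one-hop chains such as `$this->customer->fullName`. The function name is hypothetical; rules 3 and 4 need docblock shapes and member lookup and are not shown.

```rust
const SCALARS: &[&str] = &["string", "int", "float", "bool"];

/// Suggests a return statement from in-scope expressions and their types.
pub fn suggest_return(return_type: &str, in_scope: &[(&str, &str)]) -> Option<String> {
    if SCALARS.contains(&return_type) {
        // Rule 1: suggest only if exactly one candidate matches.
        let mut matches = in_scope.iter().filter(|(_, ty)| *ty == return_type);
        return match (matches.next(), matches.next()) {
            (Some((expr, _)), None) => Some(format!("return {expr};")),
            _ => None, // zero or several candidates: stay silent
        };
    }
    match return_type {
        // Rule 2: self/static returns a new instance.
        "self" | "static" => Some(format!("return new {return_type}();")),
        _ => None,
    }
}
```

Returning `None` when two expressions share the return type follows the rule that showing nothing beats guessing.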

#### Function/Method Body from Signature

Trigger: Cursor is on the first line inside an empty method body.

For simple accessor patterns, the engine can suggest the entire body:

```php
public function getName(): string
{
    return $this->name;█
}

public function setName(string $name): void
{
    $this->name = $name;█
}

public function isActive(): bool
{
    return $this->active;█
}

public function hasPermission(string $permission): bool
{
    return in_array($permission, $this->permissions, true);█
}
```

Detection heuristics:
- `getName()` with return type matching `$this->name` type → getter
- `setName($name)` with void return → setter
- `isX()` / `hasX()` with bool return → boolean property accessor
- `with*()` returning `self`/`static` → immutable setter (clone)
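The heuristics above can be sketched as a classifier over name, parameter count, and return type, plus a body generator. Everything here is hypothetical: the property-type check from the first bullet is omitted, setters assume the parameter has the same name as the property, the `clone` body for withers is one plausible shape, and `hasPermission($permission)` style methods (one parameter) are not covered.

```rust
#[derive(Debug, PartialEq)]
pub enum Accessor {
    Getter(String),
    Setter(String),
    BoolAccessor(String),
    Wither(String),
}

/// Lowercases the first character: "Name" -> "name".
fn lcfirst(s: &str) -> String {
    let mut chars = s.chars();
    match chars.next() {
        Some(first) => first.to_lowercase().collect::<String>() + chars.as_str(),
        None => String::new(),
    }
}

pub fn classify(name: &str, param_count: usize, return_type: &str) -> Option<Accessor> {
    // Strip a prefix only if an uppercase letter follows (so `isolate` is not `is` + `olate`).
    let prop = |prefix: &str| {
        name.strip_prefix(prefix)
            .filter(|rest| rest.starts_with(char::is_uppercase))
            .map(lcfirst)
    };
    match (param_count, return_type) {
        (0, "bool") => prop("is").or_else(|| prop("has")).map(Accessor::BoolAccessor),
        (0, "void") => None,
        (0, _) => prop("get").map(Accessor::Getter),
        (1, "void") => prop("set").map(Accessor::Setter),
        (1, "self" | "static") => prop("with").map(Accessor::Wither),
        _ => None,
    }
}

pub fn body(accessor: &Accessor) -> String {
    match accessor {
        Accessor::Getter(p) | Accessor::BoolAccessor(p) => format!("return $this->{p};"),
        Accessor::Setter(p) => format!("$this->{p} = ${p};"),
        Accessor::Wither(p) => format!("$clone = clone $this;\n$clone->{p} = ${p};\nreturn $clone;"),
    }
}
```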

#### Assignment from Constructor Parameter

Trigger: Inside a constructor body, after the parameter list.

```php
public function __construct(
    private readonly string $name,
    private readonly int $age,
) {
    // No suggestion needed — promoted properties handle it.
}

// But for non-promoted:
public function __construct(string $name, int $age)
{
    $this->name = $name;
    $this->age = $age;█
}
```

The engine matches parameter names to property names and suggests
assignments for any that aren't yet assigned.
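A sketch of that matching, assuming parameter and property names arrive without the `$` sigil and that "already assigned" can be approximated by a textual search of the body; `missing_assignments` is a hypothetical name.

```rust
/// Returns `$this->x = $x;` lines for constructor parameters that have a
/// same-named declared property and are not yet assigned in the body.
pub fn missing_assignments(params: &[&str], properties: &[&str], body: &str) -> Vec<String> {
    params
        .iter()
        .copied()
        .filter(|p| properties.contains(p))
        .filter(|p| !body.contains(&format!("$this->{p} =")))
        .map(|p| format!("$this->{p} = ${p};"))
        .collect()
}
```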

#### Catch Variable Usage

Trigger: Inside a catch block, cursor on the first line.

Based on common patterns per exception type:

```php
catch (ValidationException $e) {
    return response()->json(['errors' => $e->errors()], 422);█
}

catch (\Throwable $e) {
    Log::error($e->getMessage(), ['exception' => $e]);█
}
```

This is more heuristic than type-driven. We match on the exception
class name and suggest common handling patterns. A small hardcoded
table of exception → handler patterns covers the most common cases.
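A sketch of such a table, seeded with the two examples above (both Laravel-flavored). The lookup uses the unqualified class name so that `\Throwable` and a fully qualified `ValidationException` both match; the table contents and function name are illustrative.

```rust
/// Hypothetical handler table: short exception class name -> handler body.
const HANDLERS: &[(&str, &str)] = &[
    ("ValidationException", "return response()->json(['errors' => $e->errors()], 422);"),
    ("Throwable", "Log::error($e->getMessage(), ['exception' => $e]);"),
];

/// Looks up a handler by the last namespace segment of the class name.
pub fn catch_body(exception_class: &str) -> Option<&'static str> {
    let short = exception_class.rsplit('\\').next().unwrap_or(exception_class);
    HANDLERS.iter().find(|(name, _)| *name == short).map(|(_, body)| *body)
}
```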

#### Early Return Guard Clauses

Trigger: Start of a method body, or after an assignment, when the
method has validation-like parameters.

```php
public function process(string $input): Result
{
    if ($input === '') {
        throw new \InvalidArgumentException('Input cannot be empty');
    }
}
```

This triggers when:
- The parameter is a string and the method name suggests processing
- The parameter is nullable and the method doesn't return nullable
- The method has a `@throws` tag for an `InvalidArgumentException`

### Template Priority and Confidence

Each template pattern returns a confidence score (0.0 to 1.0):

- **1.0** — Exact type match, single obvious completion (getter body)
- **0.8** — Strong type match with minor ambiguity (foreach with
  typed collection)
- **0.6** — Heuristic match (enum match arms, try/catch from throws
  analysis)
- **0.4** — Name-based heuristic (singularization guess, common
  pattern match)
- **0.2** — Fallback suggestion (generic catch block body)

Only suggestions above a configurable threshold (default 0.4) are
shown. Below that, showing nothing is better than showing something
wrong.
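The tiers and the threshold check might be encoded as a small enum so patterns name a tier instead of scattering raw numbers. The enum and method names are hypothetical; the scores are the ones listed above.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Confidence {
    Exact,     // 1.0: exact type match, single obvious completion
    Strong,    // 0.8: strong type match with minor ambiguity
    Heuristic, // 0.6: heuristic match
    NameBased, // 0.4: name-based heuristic
    Fallback,  // 0.2: fallback suggestion
}

impl Confidence {
    pub fn score(self) -> f32 {
        match self {
            Confidence::Exact => 1.0,
            Confidence::Strong => 0.8,
            Confidence::Heuristic => 0.6,
            Confidence::NameBased => 0.4,
            Confidence::Fallback => 0.2,
        }
    }

    /// Whether a suggestion at this tier clears the configured threshold.
    pub fn is_shown(self, min_confidence: f32) -> bool {
        self.score() >= min_confidence
    }
}
```

With the default threshold of 0.4, name-based guesses are still shown while fallback suggestions are suppressed.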

---

## N2. N-gram Engine

**Effort:** Medium-High (1-2 weeks including training)
**Dependencies:** Training corpus, PHP tokenizer

The n-gram engine handles the cases where the template engine has no
pattern match but there's still enough local context to make a useful
prediction. It predicts the next few PHP tokens based on the
preceding tokens.

### Why PHP Tokens, Not BPE

BPE (byte-pair encoding) is designed for natural language and treats
PHP syntax as opaque byte sequences. A PHP-aware tokenizer means:
- `$this->` is one token, not four
- `::` is one token
- String literals are one token regardless of length
- Variable names are one token
- `array_map` is one token, not fragmented

This dramatically reduces the sequence length and makes the n-gram
table smaller and more predictive. A 5-gram over PHP tokens covers a
fragment such as `return $this->repository->find`, which BPE would
split into many more pieces.

### Token Vocabulary

The vocabulary is built from PHP's own token types plus symbols:

| Category | Examples | Count |
|---|---|---|
| Keywords | `function`, `return`, `class`, `if`, `foreach`, ... | ~70 |
| Operators | `->`, `=>`, `??`, `?->`, `::`, ... | ~40 |
| Delimiters | `(`, `)`, `{`, `}`, `[`, `]`, `;` | ~10 |
| Types | `string`, `int`, `bool`, `float`, `array`, `void`, ... | ~15 |
| Special | `<VAR>`, `<STRING>`, `<NUMBER>`, `<FQCN>`, `<FUNC>` | ~10 |
| Common identifiers | Top 500 function names from corpus | ~500 |
| Common methods | Top 500 method names from corpus | ~500 |
| Common properties | Top 200 property names from corpus | ~200 |

Total vocabulary: ~1300-1500 tokens. Small enough that the n-gram
table stays compact.

Variable names, string literals, and numbers are replaced with their
category token (`<VAR>`, `<STRING>`, `<NUMBER>`). This means the
n-gram engine predicts *structure*, not specific names. The template
engine and type resolver fill in the actual names.
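A minimal sketch of the category replacement, assuming a hand-written scanner rather than PHP's own lexer. The operator list is partial, identifiers are kept literally (mapping them to vocabulary ids or `<FUNC>`/`<FQCN>` would be a separate step), `$this->` is kept as one token as described above, and comments, heredocs, and `?>` are not handled.

```rust
/// Multi-character operators, longest first so `===` wins over `==`.
const OPERATORS: &[&str] = &[
    "?->", "===", "!==", "<=>", "->", "=>", "::", "??", "==", "!=", "<=", ">=", "&&", "||",
];

/// True if `pat` occurs in `chars` starting at index `i`.
fn starts_at(chars: &[char], i: usize, pat: &str) -> bool {
    pat.chars().enumerate().all(|(k, p)| chars.get(i + k) == Some(&p))
}

/// Tokenizes a PHP fragment, replacing variables, string literals, and
/// numbers with category tokens.
pub fn tokenize(src: &str) -> Vec<String> {
    let chars: Vec<char> = src.chars().collect();
    let is_word = |c: char| c.is_alphanumeric() || c == '_';
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
        } else if c == '$' {
            // Variable: read the name, then special-case `$this->`.
            let mut j = i + 1;
            while j < chars.len() && is_word(chars[j]) {
                j += 1;
            }
            let name: String = chars[i + 1..j].iter().collect();
            if name == "this" && chars.get(j) == Some(&'-') && chars.get(j + 1) == Some(&'>') {
                tokens.push("$this->".to_string());
                i = j + 2;
            } else {
                tokens.push("<VAR>".to_string());
                i = j;
            }
        } else if c == '\'' || c == '"' {
            // String literal: skip to the matching unescaped quote.
            let mut j = i + 1;
            while j < chars.len() && chars[j] != c {
                j += if chars[j] == '\\' { 2 } else { 1 };
            }
            tokens.push("<STRING>".to_string());
            i = j + 1;
        } else if c.is_ascii_digit() {
            let mut j = i;
            while j < chars.len() && (is_word(chars[j]) || chars[j] == '.') {
                j += 1;
            }
            tokens.push("<NUMBER>".to_string());
            i = j;
        } else if is_word(c) || c == '\\' {
            // Keyword, function, class, or namespaced name, kept literally.
            let mut j = i;
            while j < chars.len() && (is_word(chars[j]) || chars[j] == '\\') {
                j += 1;
            }
            tokens.push(chars[i..j].iter().collect());
            i = j;
        } else if let Some(op) = OPERATORS.iter().find(|op| starts_at(&chars, i, op)) {
            tokens.push(op.to_string());
            i += op.len();
        } else {
            tokens.push(c.to_string());
            i += 1;
        }
    }
    tokens
}
```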

### Training

**Corpus:** Top 500-1000 PHP packages from Packagist by monthly
downloads. This covers Laravel, Symfony, PHPStan, Doctrine, PHPUnit,
Monolog, Guzzle, and the rest of the ecosystem that defines PHP
idioms.

**Process:**
1. Clone packages, extract all `.php` files
2. Tokenize with the PHP-aware tokenizer
3. Count all n-grams (3-gram through 7-gram)
4. Apply Kneser-Ney smoothing for unseen n-grams
5. Prune n-grams with count < 3
6. Serialize to a compact binary format

**Hardware:** N-gram training is embarrassingly parallel (count frequencies per file, merge). No GPU needed. Expected training time: minutes, not hours.
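Steps 3 and 5, together with the per-file counting and merge described under **Hardware**, can be sketched with plain hash maps. Kneser-Ney smoothing (step 4) and the binary serialization (step 6) are not shown, and the function names are hypothetical.

```rust
use std::collections::HashMap;

pub type Counts = HashMap<Vec<String>, u32>;

/// Counts every n-gram of order `min_n..=max_n` in one file's token stream.
pub fn count_file(tokens: &[String], min_n: usize, max_n: usize) -> Counts {
    let mut counts = Counts::new();
    for n in min_n.max(1)..=max_n {
        for window in tokens.windows(n) {
            *counts.entry(window.to_vec()).or_insert(0) += 1;
        }
    }
    counts
}

/// Merges one file's counts into the running total. Files are counted
/// independently, so only this step needs coordination.
pub fn merge(into: &mut Counts, other: Counts) {
    for (gram, c) in other {
        *into.entry(gram).or_insert(0) += c;
    }
}

/// Drops n-grams seen fewer than `min_count` times (step 5 uses 3).
pub fn prune(counts: &mut Counts, min_count: u32) {
    counts.retain(|_, c| *c >= min_count);
}
```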

**Output:** A single binary file, ~2-5MB compressed, containing:
- Token vocabulary (string → id mapping)
- N-gram probability tables (5-gram primary, 3-gram fallback)
- Top-k predictions precomputed for the most common contexts

### Runtime

Given the current cursor position:
1. Tokenize the preceding ~50 characters into PHP tokens
2. Look up the 5-gram (or fall back to 3-gram) in the table
3. Get the top-k predicted next tokens
4. If the prediction is a structural token (`return`, `->`, `(`),
   continue predicting up to ~10 tokens to form a complete fragment
5. Replace `<VAR>` placeholders with the most likely variable from
   the type context (using the same `InlineContext` the template
   engine uses)
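Steps 2 to 4 can be sketched as a backoff lookup: try the 4-token context of a 5-gram, fall back to the 2-token context of a 3-gram, and extend greedily with the top prediction until a `;` or a token limit. This raw-count table is a simplification of the smoothed probability tables described above, and `NgramTable` is a hypothetical name.

```rust
use std::collections::HashMap;

/// Next-token counts keyed by the preceding context tokens.
#[derive(Default)]
pub struct NgramTable {
    next: HashMap<Vec<String>, HashMap<String, u32>>,
}

impl NgramTable {
    /// Records 3-gram and 5-gram statistics (contexts of 2 and 4 tokens).
    pub fn train(&mut self, tokens: &[String]) {
        for ctx_len in [2, 4] {
            for w in tokens.windows(ctx_len + 1) {
                let (ctx, next) = w.split_at(ctx_len);
                *self.next.entry(ctx.to_vec()).or_default().entry(next[0].clone()).or_insert(0) += 1;
            }
        }
    }

    /// Top-k next tokens: 5-gram context first, then back off to 3-gram.
    pub fn top_k(&self, history: &[String], k: usize) -> Vec<String> {
        for ctx_len in [4, 2] {
            if history.len() < ctx_len {
                continue;
            }
            if let Some(dist) = self.next.get(&history[history.len() - ctx_len..]) {
                let mut ranked: Vec<(&String, &u32)> = dist.iter().collect();
                // Highest count first; ties broken alphabetically for determinism.
                ranked.sort_by(|a, b| b.1.cmp(a.1).then(a.0.cmp(b.0)));
                return ranked.into_iter().take(k).map(|(t, _)| t.clone()).collect();
            }
        }
        Vec::new()
    }

    /// Extends the history with top-1 predictions until `;` or `max_tokens`.
    pub fn continue_greedy(&self, history: &[String], max_tokens: usize) -> Vec<String> {
        let mut hist = history.to_vec();
        let mut out = Vec::new();
        while out.len() < max_tokens {
            let Some(tok) = self.top_k(&hist, 1).into_iter().next() else { break };
            hist.push(tok.clone());
            out.push(tok.clone());
            if tok == ";" {
                break;
            }
        }
        out
    }
}
```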

The n-gram engine never suggests more than one line. Its role is to
predict the likely *shape* of the next expression, then the type
resolver fills in the actual symbols.

### Example

User is inside a method and has just typed `return $this->`:

1. Preceding tokens: `return`, `$this`, `->`
2. N-gram lookup: after `return <VAR> ->`, the most common next
   tokens in the PHP corpus are property/method names
3. The type resolver narrows this to properties/methods of the
   current class that match the return type
4. Combined suggestion: `return $this->repository->find($id);`

The n-gram engine provides the structural skeleton (`<VAR> -> <FUNC>
( <VAR> )`), and the type resolver fills in `repository`, `find`,
and `$id`.
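Filling the skeleton can be sketched as consuming resolver-provided symbols per placeholder category, in order, then joining tokens with a space only where two word-like tokens would otherwise fuse. `fill_skeleton` and the symbol map are hypothetical, and, like the example above, `repository` is supplied for a `<VAR>` slot.

```rust
use std::collections::HashMap;

/// True for a category placeholder such as `<VAR>` (but not `<=>`).
fn is_placeholder(tok: &str) -> bool {
    tok.len() > 2
        && tok.starts_with('<')
        && tok.ends_with('>')
        && tok[1..tok.len() - 1].chars().all(|c| c.is_ascii_uppercase())
}

/// Replaces placeholders with symbols (consumed in order per category) and
/// joins the tokens into PHP source. `None` if the symbols run out.
pub fn fill_skeleton(skeleton: &[&str], symbols: &HashMap<&str, Vec<&str>>) -> Option<String> {
    let mut used: HashMap<&str, usize> = HashMap::new();
    let mut out = String::new();
    for &tok in skeleton {
        let text = if is_placeholder(tok) {
            let idx = used.entry(tok).or_insert(0);
            let filled = *symbols.get(tok)?.get(*idx)?;
            *idx += 1;
            filled
        } else {
            tok
        };
        // Space only between word-like tokens, e.g. `return` then `$this->`.
        let prev_word = out.chars().last().is_some_and(|c| c.is_alphanumeric() || c == '_');
        let next_word = text.chars().next().is_some_and(|c| c.is_alphanumeric() || c == '_' || c == '$');
        if prev_word && next_word {
            out.push(' ');
        }
        out.push_str(text);
    }
    Some(out)
}
```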

---

## N3. Fine-Tuned GGUF Model

**Effort:** High (2-4 weeks including training and integration)
**Dependencies:** N1 and N2 complete, training infrastructure

This is the "eventually" phase. A small language model fine-tuned
exclusively on PHP code with fill-in-the-middle capability. It runs
as a sidecar process and provides multi-line suggestions that neither
templates nor n-grams can handle.

### Why a Sidecar

- The model is 50-150MB. Embedding it inflates the main binary.
- Model inference takes 100-500ms. It must not block LSP responses.
- Users without the model still get templates and n-grams.
- The model can be updated independently of the LSP.
- Different model sizes can be offered (tiny for laptops, small for
  desktops).

### Base Model Selection

Candidates (as of early 2025, evaluate latest at training time):

| Model | Params | Quantized Size | Notes |
|---|---|---|---|
| SmolLM2-135M | 135M | ~80MB Q4 | Very fast, limited capacity |
| Qwen2.5-Coder-0.5B | 500M | ~300MB Q4 | Good code understanding |
| tiny_starcoder_py | 164M | ~100MB Q4 | Code-focused with FIM support, but trained mainly on Python (StarCoder2 itself starts at 3B) |

Start with 135M-164M class. These run on CPU in <500ms for short
completions. The 0.5B class is better but may be too slow without
quantization tricks.

### Training Approach

**Fill-in-the-middle (FIM) objective.** The model learns to complete
code given a prefix and suffix. This is exactly what inline
completion needs: the user has code above and below the cursor.

**Training data:** Same Packagist corpus as the n-gram engine, but
processed differently:
- Split files into function/method bodies
- Create FIM examples: randomly mask a contiguous span of 1-5 lines
- Prefix and suffix are the surrounding code
- Target is the masked span
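The split itself can be sketched as a deterministic function, with the random choice of span left to the caller so the split is easy to test. `FimExample` and `make_fim_example` are hypothetical names.

```rust
/// One fill-in-the-middle training example.
#[derive(Debug, PartialEq)]
pub struct FimExample {
    pub prefix: String,
    pub middle: String,
    pub suffix: String,
}

/// Joins lines, keeping a trailing newline on each.
fn join(lines: &[&str]) -> String {
    lines.iter().map(|l| format!("{l}\n")).collect()
}

/// Masks `len` lines (1..=5) of a method body starting at line `start`.
pub fn make_fim_example(body_lines: &[&str], start: usize, len: usize) -> Option<FimExample> {
    if !(1..=5).contains(&len) || start + len > body_lines.len() {
        return None;
    }
    Some(FimExample {
        prefix: join(&body_lines[..start]),
        middle: join(&body_lines[start..start + len]),
        suffix: join(&body_lines[start + len..]),
    })
}
```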

**Context format.** The model receives a tightly structured prompt
that mirrors what the LSP knows:

```
<|class|>
class OrderService
  + __construct(OrderRepository $repository, Mailer $mailer)
  + process(Order $order): Result
  + private validate(Order $order): void
<|vars|>
  $order: Order { id: int, customer: Customer, total: Money }
  $this->repository: OrderRepository
<|method|> process(Order $order): Result
<|prefix|>
    $this->validate($order);
    $result = $this->repository->save($order);
<|suffix|>
    return $result;
<|cursor|>
```

The `<|class|>` section is a compressed class outline (method
signatures only, no bodies). The `<|vars|>` section includes resolved
types for variables in scope, with one level of property expansion
for types the cursor is likely to chain through. The `<|method|>`
line identifies which method we're in.
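Rendering the sections in the order shown above is straightforward once the parts are gathered. A sketch with a hypothetical `PromptParts` struct; it assumes `prefix` and `suffix` already end with a newline, and since the format is expected to evolve, the markers are the only fixed part.

```rust
/// Inputs for the structured prompt (field names are illustrative).
pub struct PromptParts<'a> {
    pub class_outline: &'a str,
    pub vars: &'a [&'a str],
    pub method_signature: &'a str,
    pub prefix: &'a str,
    pub suffix: &'a str,
}

/// Renders the sections in the documented order, ending with `<|cursor|>`.
pub fn render_prompt(p: &PromptParts) -> String {
    let mut out = String::new();
    out.push_str("<|class|>\n");
    out.push_str(p.class_outline.trim_end());
    out.push('\n');
    out.push_str("<|vars|>\n");
    for v in p.vars {
        out.push_str("  ");
        out.push_str(v);
        out.push('\n');
    }
    out.push_str(&format!("<|method|> {}\n", p.method_signature));
    out.push_str("<|prefix|>\n");
    out.push_str(p.prefix);
    out.push_str("<|suffix|>\n");
    out.push_str(p.suffix);
    out.push_str("<|cursor|>\n");
    out
}
```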

The exact shape of this context will evolve as we implement and
discover what the model actually needs. For example, completing
`$this->customer->fullName` requires knowing that `$this->customer`
resolves to `Customer` and that `Customer` has a `fullName` property.
The LSP already does this multi-hop resolution for completion, so the
context builder can include resolved property types for variables
that appear in the prefix. How deep to go (one hop? two?) and how
to format it compactly are details to figure out during
implementation.

This format is critical. The model is trained on exactly this
structured context, not raw PHP files. At inference time, the LSP
builds the same structure from its type resolver and the AST, so the
model sees exactly the format it was trained on. This tight coupling
between the LSP's knowledge and the model's training data is what
makes a tiny model competitive with a generic 7B model that has to
figure out the class structure from raw text.

**Fine-tuning approach:** LoRA on a small base model, trained on the
Packagist PHP corpus processed into the structured context format
above. Specific hyperparameters, framework choice, and hardware
requirements depend on the base model selected and what's available
at training time. The training scripts and configuration will be
published so anyone can reproduce or adapt the process.

### Inference

The sidecar process loads the GGUF model at startup and keeps it in
memory. On each request:

1. LSP builds the `InlineContext` (same as template engine)
2. LSP formats it into the structured prompt above
3. LSP sends it to the sidecar over stdin
4. Sidecar generates up to `max_tokens` (default 64, ~2-3 lines)
5. Sidecar returns the completion text
6. LSP validates the suggestion (syntax check, type check) and
   shows it or discards it

**Validation is important.** The model will sometimes hallucinate
method names or wrong types. The LSP already has the type resolver,
so it can check if `$this->repository->find($id)` is actually valid
by resolving the chain. Invalid suggestions are silently dropped.
This is a massive advantage over generic AI completion tools that
have no type checker in the loop.
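A sketch of that chain check, using a toy class index (class name to member name to type) in place of the real resolver. It checks only that each member exists and follows its type; arguments are not checked, and stripping a leading `?` to chain through nullable types is a simplification.

```rust
use std::collections::HashMap;

/// Minimal stand-in for the resolver's class index: class -> member -> type.
pub type ClassIndex = HashMap<String, HashMap<String, String>>;

/// Resolves a member chain hop by hop. Returns the final type, or `None` as
/// soon as a member is unknown, in which case the suggestion is dropped.
pub fn resolve_chain(index: &ClassIndex, start: &str, members: &[&str]) -> Option<String> {
    let mut current = start.to_string();
    for member in members {
        let class = current.trim_start_matches('?');
        current = index.get(class)?.get(*member)?.clone();
    }
    Some(current)
}
```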

### Model Distribution

- The model is not bundled with the LSP binary
- On first use (or via a command), the LSP downloads it from a
  GitHub release or CDN
- Stored in `~/.phpantom/models/`
- Version-pinned to the LSP version to avoid compatibility issues
- Users can provide their own GGUF model path in config (custom
  trained, company-internal, or community-contributed models)

### Reproducibility

Everything needed to train a compatible model from scratch is
published:

- **Corpus scripts:** Download and process packages from Packagist
- **Tokenizer:** The PHP-aware tokenizer used for n-grams, plus the
  context formatter that builds the structured prompt
- **Training scripts:** End-to-end pipeline from raw PHP to GGUF
- **Evaluation suite:** A set of fill-in-the-middle test cases with
  expected completions, so anyone can measure model quality
- **Context format spec:** Documented well enough that someone could
  train a compatible model without reading the LSP source

A company can clone the training repo, point it at their private
codebase, and produce a model that knows their domain objects, their
naming conventions, and their architectural patterns. The LSP doesn't
care where the model came from as long as it speaks the same context
format.

---

## Context Window Design

All three engines share a common context format. The LSP builds this
once per request.

### For Templates and N-grams

```rust
pub struct InlineContext {
    /// Variables in scope with their resolved types.
    pub variables: Vec<(String, String)>,

    /// The containing method/function, if any.
    pub containing_function: Option<FunctionSummary>,

    /// The containing class, if any.
    pub containing_class: Option<ClassSummary>,

    /// The AST node immediately before the cursor.
    pub preceding_node: Option<NodeKind>,

    /// The return type of the containing function, if known.
    pub expected_return_type: Option<String>,

    /// The line the cursor is on, trimmed.
    pub current_line: String,

    /// Lines above the cursor within the current block (max ~20).
    pub block_prefix: Vec<String>,

    /// Lines below the cursor within the current block (max ~10).
    pub block_suffix: Vec<String>,
}
```

### For the GGUF Model

The model context is built from `InlineContext` plus the class
outline and resolved types for variables the cursor is likely to
interact with:

```rust
pub struct ModelContext {
    /// Compressed class outline: method signatures, no bodies.
    pub class_outline: String,

    /// Resolved types for variables in scope, with property
    /// expansion for types that appear in the prefix.
    /// E.g. "$this->customer: Customer { fullName: string, email: string }"
    pub resolved_variables: String,

    /// The current method signature.
    pub method_signature: String,

    /// Code before the cursor (within the current method).
    pub prefix: String,

    /// Code after the cursor (within the current method).
    pub suffix: String,

    /// Maximum tokens to generate.
    pub max_tokens: u32,
}
```

The prefix and suffix are bounded to the current method body. The
class outline is bounded to signature lines only (one line per
member). Resolved variable types include one or two levels of
property expansion for types that are referenced in the prefix or
are likely chain targets. The exact depth and format will be refined
during implementation. The total context should stay compact enough
for a small model (target: under 512 tokens).
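One way to hold the budget is to keep the fixed parts and drop prefix lines farthest from the cursor first. In this sketch the token count is a rough whitespace-based estimate (an assumption; a real implementation would count with the model's own tokenizer), and `fit_prefix` is a hypothetical name.

```rust
/// Rough token estimate: whitespace-separated words (an approximation).
fn estimate_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Keeps the prefix lines nearest the cursor that fit in the budget left
/// after the fixed parts (outline, variables, signature, suffix).
/// `None` if the fixed parts alone exceed the budget.
pub fn fit_prefix(fixed_parts: &[&str], prefix_lines: &[&str], budget: usize) -> Option<Vec<String>> {
    let fixed: usize = fixed_parts.iter().map(|p| estimate_tokens(p)).sum();
    if fixed > budget {
        return None;
    }
    let mut remaining = budget - fixed;
    let mut kept: Vec<String> = Vec::new();
    // Walk backwards from the cursor so the nearest lines survive.
    for line in prefix_lines.iter().rev() {
        let cost = estimate_tokens(line);
        if cost > remaining {
            break;
        }
        remaining -= cost;
        kept.push(line.to_string());
    }
    kept.reverse();
    Some(kept)
}
```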

---

## Configuration

```toml
# .phpantom.toml or editor settings
[inlineCompletion]
# Enable/disable the entire feature
enabled = true

# Minimum confidence to show a suggestion (0.0 - 1.0)
minConfidence = 0.4

# Which engines to use (in priority order)
engines = ["template", "ngram", "model"]

# Debounce delay in milliseconds
debounceMs = 50

[inlineCompletion.model]
# Path to a custom GGUF model (overrides default)
# path = "~/.phpantom/models/phpantom-completion-v1.gguf"

# Maximum tokens to generate per request
maxTokens = 64

# Temperature (lower = more conservative)
temperature = 0.2
```

---

## Phasing and Sprint Placement

### N1. Template Engine (Sprint 7 timeframe)

Implement the `InlineContext` builder and 3-4 high-value template
patterns:
- `foreach` with collection type awareness
- `return` with return-type matching
- Getter/setter body generation
- `try/catch` with throws detection

This is enough to demo the "how did it know" effect. No external
dependencies, no model files, ships in the main binary.

### N2. N-gram Engine (post-Sprint 7)

- Build the PHP tokenizer
- Scrape and process the Packagist corpus
- Train and compress the n-gram model
- Integrate with the `InlineContext` for variable substitution
- Ship the model file as a separate downloadable asset or embed it
  if it stays under 5MB

### N3. GGUF Sidecar (when competing with PhpStorm)

- Select and fine-tune the base model
- Build the sidecar binary (Rust + llama.cpp bindings, or a small
  C++ binary)
- Implement the sidecar protocol
- Add validation (type-check generated code before showing)
- Distribution and auto-download
- Publish all training scripts, corpus tools, and evaluation suite
  so anyone can reproduce or retrain

### Ongoing

Each phase builds on the previous one. Templates keep working even
after the model is available (they're faster for the patterns they
cover). The n-gram engine fills gaps between templates and model
suggestions. The model handles the truly open-ended cases.

As we add more type intelligence to the LSP (better generics, better
narrowing, more Laravel magic), every engine automatically benefits
because they all read from the same `InlineContext`.

---

## Success Criteria

A suggestion is good if:
1. It is **correct** (compiles, type-checks, does what the user
   intended)
2. It is **fast** (appears before the user's next keystroke)
3. It is **non-obvious** (the user couldn't have typed it faster
   than accepting the suggestion)

We'd rather show nothing than show something wrong. A 30% hit rate
with 95% accuracy is far more valuable than an 80% hit rate with 60%
accuracy. Users learn to trust the suggestions and press Tab without
reading them carefully. One wrong suggestion breaks that trust.

Metrics to track (via opt-in telemetry or local dev testing):
- **Acceptance rate:** % of shown suggestions that the user accepts
- **Accuracy:** % of accepted suggestions that aren't immediately
  edited
- **Latency:** p50 and p95 time from keystroke to suggestion shown
- **Coverage:** % of cursor positions where we have a suggestion

Target for N1 (templates only):
- Acceptance rate: >50%
- Accuracy: >90%
- Latency p95: <10ms
- Coverage: ~5-10% of cursor positions (only where we have a strong
  pattern match)