paraglide-launch 0.1.2

Analyze a project and detect deployable services, languages, frameworks, commands, and env vars
# Launch Architecture

A Rust library and CLI that walks a project once and returns structured JSON describing every deployable service it contains — language, version, commands, env vars, framework, and monorepo structure.

## Data Flow

```
Project on disk
┌───────────────────────────────────────────────────────┐
│  Step 1: Walk + Observe                               │
│                                                       │
│  Iterative stack walk, exclusion list applied         │
│  (node_modules, .git, target, dist, ...)              │
│  LocalFs: parallel rayon walk + gitignore filtering   │
│  MemoryFs: sequential walk (tests)                    │
│                                                       │
│  All signals observe every entry name. No file reads. │
└───────────────────┬───────────────────────────────────┘
                    │ (each signal notes which paths it cares about)
┌───────────────────────────────────────────────────────┐
│  Step 2: Generate                                     │
│                                                       │
│  Each signal reads its noted files via FileSystem.    │
│  Content is read, extracted, dropped.                 │
│  Signal errors are non-fatal — logged, skipped.       │
│                                                       │
│  Outputs: services | DirContext | monorepo            │
└───────────────────┬───────────────────────────────────┘
                    │ Vec<SignalOutput>
┌───────────────────────────────────────────────────────┐
│  Step 3: Assemble                                     │
│                                                       │
│  3.1 Dedup services (explicit vs derived names)       │
│  3.2 Merge DirContext per directory                   │
│  3.3 Layer context onto services (ancestor chain)     │
│  3.4 Promote unclaimed dirs with start commands       │
│  3.5 Enrich with monorepo package info                │
└───────────────────┬───────────────────────────────────┘
              Discovery { services, monorepo }
           JSON output
```

## Concepts & Terminology

| Term | Definition | NOT |
|------|-----------|-----|
| Signal | A detector that observes file paths, then reads files and emits results | A validator or transformer |
| Service signal | Emits named, deployable services (Dockerfile, docker-compose, Railway config) | Knows anything about directory context |
| Context signal | Describes a directory (language, version, package manager, env vars) | Knows how many services are in that directory |
| DirContext | Directory-level data emitted by context signals | A service |
| Derived name | A service name inferred from its directory path (`api/` → "api") | An authoritative identity |
| Explicit name | A service name declared in a config file (Compose key, Procfile process type) | A placeholder |
| Context promotion | Elevating a DirContext with a start command to a full Service | Applies when no service signal claimed the directory |
| Ancestor inheritance | Child dir context overrides root context; all propagate to services in subdirs | Only exact-dir matching |
| Registration order | The order signals appear in `default_signals()` — this IS the priority | A confidence score |
| Assembly | The final step that turns raw signal outputs into a coherent Discovery | A resolution pass with thresholds |

## Core Mechanism: The Signal Pipeline

### Signal Trait

```rust
pub trait Signal: Send {
    fn name(&self) -> &'static str;
    fn observe(&mut self, dir: &Path, entry: &DirEntry);
    fn generate(&mut self, fs: &dyn FileSystem) -> Result<SignalOutput, LaunchError>;
}
```

`observe()` is called for every entry during the walk — no file I/O, only name matching. Signals record the paths they care about as internal state. `generate()` reads those files through `FileSystem` and emits structured output. Signals are stateful between observe and generate, stateless between invocations.
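A minimal sketch of this two-phase lifecycle, using stand-in types — a `HashMap` in place of the `FileSystem` trait and `(process type, command)` tuples in place of `SignalOutput`. The real trait signature is the one shown above.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

// Procfile-style stand-in signal: observe() records paths (no I/O at all),
// generate() reads and parses them, then drains its internal state.
struct ProcfileSignal {
    noted: Vec<PathBuf>,
}

impl ProcfileSignal {
    fn observe(&mut self, dir: &Path, entry_name: &str) {
        // Name matching only — file contents are never touched here.
        if entry_name == "Procfile" {
            self.noted.push(dir.join(entry_name));
        }
    }

    fn generate(&mut self, fs: &HashMap<PathBuf, String>) -> Vec<(String, String)> {
        let mut services = Vec::new();
        for path in self.noted.drain(..) {
            let Some(content) = fs.get(&path) else { continue };
            for line in content.lines() {
                if let Some((proc_type, cmd)) = line.split_once(':') {
                    services.push((proc_type.trim().to_string(), cmd.trim().to_string()));
                }
            }
        }
        services
    }
}

fn main() {
    let mut sig = ProcfileSignal { noted: Vec::new() };
    sig.observe(Path::new("apps/api"), "Procfile");
    sig.observe(Path::new("apps/api"), "package.json"); // not matched, ignored

    let mut fs = HashMap::new();
    fs.insert(
        PathBuf::from("apps/api/Procfile"),
        "web: npm start\nworker: node worker.js".to_string(),
    );
    let services = sig.generate(&fs);
    assert_eq!(services.len(), 2);
    assert_eq!(services[0].0, "web");
}
```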

### Two Kinds of Signals

The critical insight: not all signals know how many services live in a directory.

**Service signals** emit named services. A `docker-compose.yml` defines service keys. A `Procfile` defines process types. An `api.Dockerfile` defines one service named "api". These signals know the multiplicity.

**Context signals** know about a directory but not its service count. A `package.json` at `apps/api/` describes language, version, package manager, and scripts — but it can't know whether there's one service or three at that path. It emits a `DirContext`.

Assembly layers context onto services, handling the multiplicity mismatch without heuristics.

**Example:** `apps/api/` has `api.Dockerfile`, `worker.Dockerfile`, and `package.json`.
- Dockerfile signal emits two services: "api" and "worker".
- Package signal emits one `DirContext`: TypeScript, pnpm, scripts.
- Assembly layers the context onto both services. Both inherit TypeScript and pnpm.

### Signal Registration Order = Priority

Signals registered first win field-level conflicts. No confidence scores. The ordering in `default_signals()` is the entire priority system — explicit and debuggable.

```
Railway → Fly → Vercel → Netlify → Heroku                       (platform configs, most authoritative)
Dockerfile → DockerCompose                                      (container specs)
DotEnv → StructuredConfig → Package → Framework → LibraryCalls  (context signals)
Monorepo                                                        (enrichment only)
```

Platform configs win over Dockerfiles, which win over inferred `package.json` scripts. User-written `package.json` scripts win over framework conventions (`Package` before `Framework`). `.env` values win over source-code extraction (`DotEnv` before `LibraryCalls`).
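First-registered-wins can be modeled as a left fold with `Option::or` — once an earlier signal sets a field, later signals cannot overwrite it. This is a sketch of the semantics, not the actual assembly code.

```rust
// Field resolution sketch: values ordered by signal registration; the first
// Some wins, later values only fill gaps.
fn resolve_field<'a>(by_priority: &[Option<&'a str>]) -> Option<&'a str> {
    by_priority.iter().fold(None, |acc, v| acc.or(*v))
}

fn main() {
    // Railway set nothing, Fly set a start command, Package also did → Fly wins.
    let start = resolve_field(&[None, Some("./server"), Some("npm start")]);
    assert_eq!(start, Some("./server"));
}
```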

### Assembly Algorithm

**Step 3.1 — Dedup services.** Group services by directory. Classify each as derived (name == dir last component) or explicit (name from config).

- Only derived services at a directory → merge all into one (derived names are placeholders).
- Explicit services present → explicit wins. Derived services are subsumed: their data becomes `DirContext` and is layered onto all explicit services. (Compose defines "api" + "worker", plain Dockerfile exists → Compose wins, both get the dockerfile path via context.)
- Explicit services with the same name at the same dir → merged (same signal registration order applies).
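The derived/explicit classification in step 3.1 reduces to a single path comparison. The helper name here is hypothetical, for illustration only:

```rust
use std::path::Path;

// Hypothetical helper: a name is "derived" when it merely echoes the last
// component of the service's directory — a placeholder, not an identity.
fn is_derived(name: &str, dir: &str) -> bool {
    Path::new(dir).file_name().and_then(|n| n.to_str()) == Some(name)
}

fn main() {
    assert!(is_derived("api", "apps/api"));     // inferred from the path
    assert!(!is_derived("worker", "apps/api")); // must have come from a config
}
```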

**Step 3.2 — Merge DirContext per directory.** Multiple signals may emit context for the same directory (e.g., `DotEnv` and `Package` both describe `apps/api/`). Merge all by directory — first non-None wins per field, env vars collected from all.

**Step 3.3 — Layer context with ancestor inheritance.** For each service, build a chain from root to service dir. Root context is the base; each child overrides. Then layer the merged chain onto the service — service's own fields win, None fields filled from context. Env vars merged: service's keys take priority.

This propagates a root `.node-version` or root `.env` down to all services while letting subdirectory files override them.
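A sketch of the chain-building half of step 3.3, assuming the empty string stands for the repository root (an assumption of this sketch, not necessarily the real representation). Root comes first so a left-to-right fold lets closer directories override more distant ones.

```rust
// Build the ancestor chain root → service dir: "" (root), then each
// successively deeper prefix of the service's directory.
fn ancestor_chain(dir: &str) -> Vec<String> {
    let mut chain = vec![String::new()];
    let mut acc = String::new();
    for part in dir.split('/').filter(|p| !p.is_empty()) {
        if !acc.is_empty() {
            acc.push('/');
        }
        acc.push_str(part);
        chain.push(acc.clone());
    }
    chain
}

fn main() {
    assert_eq!(ancestor_chain("apps/api"), vec!["", "apps", "apps/api"]);
}
```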

**Step 3.4 — Promote unclaimed directories.** A `DirContext` with a `start` command whose directory isn't already claimed by a service becomes a service. Named from directory. Also handles HTML/SPA contexts without start commands. This is the most common real-world case: a plain repo with `package.json` and a start script.

**Step 3.5 — Enrich with monorepo.** If a `Monorepo` was detected, match services to packages by directory and annotate `detected_by`.

The key invariant: `Service::layer_context()` and `DirContext::merge()` both use "self wins" semantics — the closer/earlier source always takes priority over a more distant/later one.
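The "self wins" invariant falls out naturally from `Option::or`: self's value is kept, and `None` fields are filled from the other side. The struct below is an illustrative two-field stand-in, not the real `DirContext`.

```rust
// "Self wins" merge sketch: the closer/earlier source (self) keeps its
// fields; only its None gaps are filled from the more distant source.
#[derive(Clone, Debug, PartialEq)]
struct Ctx {
    language: Option<String>,
    start: Option<String>,
}

impl Ctx {
    fn merge(self, other: &Ctx) -> Ctx {
        Ctx {
            language: self.language.or_else(|| other.language.clone()),
            start: self.start.or_else(|| other.start.clone()),
        }
    }
}

fn main() {
    let child = Ctx { language: Some("typescript".into()), start: None };
    let root = Ctx { language: Some("javascript".into()), start: Some("npm start".into()) };
    let merged = child.merge(&root);
    assert_eq!(merged.language.as_deref(), Some("typescript")); // self wins
    assert_eq!(merged.start.as_deref(), Some("npm start"));     // gap filled
}
```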

## FileSystem Abstraction

```rust
pub trait FileSystem: Send + Sync {
    fn read_file(&self, path: &Path) -> io::Result<Vec<u8>>;
    fn read_dir(&self, path: &Path) -> io::Result<Vec<DirEntry>>;
    fn exists(&self, path: &Path) -> bool;
}
```

Two implementations:

| Impl | Usage | Walk strategy |
|------|-------|---------------|
| `LocalFs` | Production — thin `std::fs` wrapper | `walk_local()`: parallel rayon + gitignore |
| `MemoryFs` | Tests — HashMap of path → bytes | `walk()`: sequential iterative stack |

`MemoryFs` infers directory structure from file paths — no need to explicitly declare directories. The walk does NOT go through `FileSystem.read_dir()` for the local case: `walk_local()` bypasses the trait to use `rayon` for parallel directory reads while respecting `.gitignore`.

**Why own `DirEntry`?** `std::fs::DirEntry` has fallible accessors and platform-specific metadata. Our `DirEntry` has no errors and no OS-specific surface area — safe to pass to `Signal::observe()` without any error handling at the call site.
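A simplified sketch of how `MemoryFs` can infer directory listings from stored file paths alone — the `DirEntry` shape and the inference logic here are illustrative, not the crate's actual implementation.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::io;
use std::path::{Path, PathBuf};

// Stand-in DirEntry: infallible, carrying only what observe() needs.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct DirEntry {
    path: PathBuf,
    name: String,
    is_dir: bool,
}

struct MemoryFs {
    files: BTreeMap<PathBuf, Vec<u8>>, // path → bytes; directories implied
}

impl MemoryFs {
    fn read_file(&self, path: &Path) -> io::Result<Vec<u8>> {
        self.files
            .get(path)
            .cloned()
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "not found"))
    }

    fn exists(&self, path: &Path) -> bool {
        self.files.contains_key(path) || self.files.keys().any(|p| p.starts_with(path))
    }

    // Directories are inferred: any path component below `dir` is an entry.
    fn read_dir(&self, dir: &Path) -> Vec<DirEntry> {
        let mut out = BTreeSet::new();
        for p in self.files.keys() {
            if let Ok(rest) = p.strip_prefix(dir) {
                if let Some(first) = rest.components().next() {
                    let name = first.as_os_str().to_string_lossy().into_owned();
                    let is_dir = rest.components().count() > 1;
                    out.insert(DirEntry { path: dir.join(&name), name, is_dir });
                }
            }
        }
        out.into_iter().collect()
    }
}

fn main() {
    let mut files = BTreeMap::new();
    files.insert(PathBuf::from("apps/api/package.json"), b"{}".to_vec());
    files.insert(PathBuf::from("README.md"), b"# hi".to_vec());
    let fs = MemoryFs { files };

    let entries = fs.read_dir(Path::new("apps"));
    assert_eq!(entries.len(), 1);
    assert_eq!(entries[0].name, "api");
    assert!(entries[0].is_dir);
    assert!(fs.exists(Path::new("apps/api")));
    assert!(fs.read_file(Path::new("apps/api/package.json")).is_ok());
}
```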

## Walk Strategies

### MemoryFs walk (`discover_with_fs`)
Iterative stack. Sequential. Exclusion list is a `phf::Set` for O(1) lookup.
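The stack walk with exclusion pruning can be sketched as follows — a `HashMap`-of-children tree and a `HashSet` stand in for the real filesystem and the compile-time `phf::Set`:

```rust
use std::collections::{HashMap, HashSet};

// Sequential iterative stack walk: excluded directories are pruned by name
// before descent, so their subtrees are never entered.
fn walk(tree: &HashMap<&str, Vec<&str>>, root: &str, excluded: &HashSet<&str>) -> Vec<String> {
    let mut stack = vec![root.to_string()];
    let mut visited = Vec::new();
    while let Some(dir) = stack.pop() {
        for child in tree.get(dir.as_str()).into_iter().flatten() {
            let name = child.rsplit('/').next().unwrap();
            if excluded.contains(name) {
                continue; // pruned — subtree never visited
            }
            visited.push(child.to_string());
            if tree.contains_key(child) {
                stack.push(child.to_string());
            }
        }
    }
    visited
}

fn main() {
    let tree = HashMap::from([
        ("", vec!["apps", "node_modules"]),
        ("apps", vec!["apps/api"]),
        ("node_modules", vec!["node_modules/react"]),
    ]);
    let excluded = HashSet::from(["node_modules", ".git", "target", "dist"]);
    assert_eq!(walk(&tree, "", &excluded), vec!["apps", "apps/api"]);
}
```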

### LocalFs walk (`discover_local`)
Two-phase:

1. **Phase 1 (parallel):** `rayon::scope` spawns a recursive task per directory. Each task reads its directory with `std::fs::read_dir`, filters excluded dirs and gitignored entries, pushes `DirEntry` batches into a `Mutex<Vec>`, and spawns child tasks.
2. **Phase 2 (sequential):** Iterate all batches and call `signal.observe()`. Sequential because signals are `&mut` and are not `Sync`.

This gives parallel I/O for large repos while keeping signal observation single-threaded.
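The two-phase shape can be sketched without the rayon dependency — `std::thread::scope` stands in for `rayon::scope`, and string tuples stand in for `DirEntry` batches:

```rust
use std::sync::Mutex;
use std::thread;

// Phase 1 (parallel): one task per directory pushes its entry batch into a
// shared Mutex<Vec>, mirroring the walk's collection step.
fn phase1(dirs: &[(&str, Vec<&str>)]) -> Vec<(String, String)> {
    let batches: Mutex<Vec<(String, String)>> = Mutex::new(Vec::new());
    thread::scope(|s| {
        for (dir, entries) in dirs {
            let batches = &batches;
            s.spawn(move || {
                let batch: Vec<(String, String)> = entries
                    .iter()
                    .map(|e| (dir.to_string(), e.to_string()))
                    .collect();
                batches.lock().unwrap().extend(batch);
            });
        }
    });
    batches.into_inner().unwrap()
}

fn main() {
    let dirs = vec![
        ("apps/api", vec!["package.json", "Dockerfile"]),
        ("apps/web", vec!["package.json"]),
    ];
    let entries = phase1(&dirs);

    // Phase 2 (sequential): this is where a &mut signal would observe each
    // entry — single-threaded because signals are not Sync.
    let mut observed: Vec<String> = entries.iter().map(|(d, n)| format!("{d}/{n}")).collect();
    observed.sort();
    assert_eq!(observed.len(), 3);
}
```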

## Signal Catalog

### Service Signals

| Signal | Files Matched | Outputs |
|--------|--------------|---------|
| `RailwaySignal` | `railway.json`, `railway.toml` | services + env vars + commands |
| `FlySignal` | `fly.toml` | services + env vars + resources |
| `VercelSignal` | `vercel.json` | services + framework hint |
| `NetlifySignal` | `netlify.toml` | services + build commands |
| `HerokuSignal` | `Procfile` | services by process type |
| `DockerfileSignal` | `Dockerfile`, `Dockerfile.*`, `*.Dockerfile` | services + ENV/ARG as env vars, CMD/ENTRYPOINT as start |
| `DockerComposeSignal` | `docker-compose.yml`, `compose.yml` + variants | services by service key + env, commands, images |

### Context Signals

| Signal | Files Matched | Outputs |
|--------|--------------|---------|
| `DotEnvSignal` | `.env`, `.env.*` | env vars with defaults |
| `StructuredConfigSignal` | Source files declaring config schemas: TypeScript (Zod), Go (struct tags), Python (Pydantic) | env vars inferred from config schemas |
| `PackageSignal` | `package.json`, `pyproject.toml`, `go.mod`, `Cargo.toml`, `Gemfile`, `composer.json`, `pom.xml`, `mix.exs`, `deno.json` | language, runtime, framework, package_manager, commands, language_config |
| `FrameworkSignal` | Framework config files (`next.config.*`, `astro.config.*`, etc.) | framework name + conventional commands (when Package didn't provide them) |
| `LibraryCallsSignal` | `*.ts`, `*.js`, `*.py`, `*.go`, `*.rs`, etc. | env vars from `process.env.X`, `os.Getenv("X")`, etc. |

### Monorepo Signal

| Signal | Files Matched | Outputs |
|--------|--------------|---------|
| `MonorepoSignal` | `pnpm-workspace.yaml`, `package.json` (workspaces), `turbo.json`, `nx.json` | `Monorepo` with workspace type, orchestrator tool, package list |

## Package Structure

| File | Purpose |
|------|---------|
| `src/lib.rs` | Public API: `discover_local()`, `discover_with_fs()` |
| `src/main.rs` | CLI: `launch [PATH] [--format json\|json-pretty]` |
| `src/error.rs` | `LaunchError`: Filesystem, Parse, Config variants |
| `src/types.rs` | All output types: `Discovery`, `Service`, `DirContext`, `Commands`, `EnvVar`, `Monorepo`, enums; `merge_env_vars()`, `Service::layer_context()` |
| `src/signal.rs` | `Signal` trait, `SignalOutput`, `read_config()` helper |
| `src/fs.rs` | `FileSystem` trait, `LocalFs`, `MemoryFs`, `DirEntry` |
| `src/discovery.rs` | Walk, generate, assemble — the entire pipeline |
| `src/config.rs` | `Config` via figment |
| `src/signals/mod.rs` | `default_signals()` — the signal registry in priority order |
| `src/signals/railway.rs` | Railway platform signal |
| `src/signals/fly.rs` | Fly.io platform signal |
| `src/signals/vercel.rs` | Vercel platform signal |
| `src/signals/netlify.rs` | Netlify platform signal |
| `src/signals/heroku.rs` | Heroku Procfile signal |
| `src/signals/dockerfile.rs` | Dockerfile service signal |
| `src/signals/docker_compose.rs` | Docker Compose service signal |
| `src/signals/dotenv.rs` | .env context signal |
| `src/signals/structured_config.rs` | Schema-based env var extraction |
| `src/signals/package/` | Per-language package file parsers (node, python, go, rust, ruby, php, java, elixir, deno, staticfile) |
| `src/signals/framework.rs` | Framework detection + conventional commands |
| `src/signals/library_calls.rs` | Source-code env var extraction |
| `src/signals/monorepo.rs` | Workspace + orchestrator detection |

## Language Coverage

Each language gets its own module under `signals/package/` with thorough detection.

| Language | Frameworks | Version Sources | Package Managers |
|----------|-----------|-----------------|-----------------|
| JavaScript/TypeScript | Next.js, Nuxt, Remix, SvelteKit, Astro, Solid, Qwik, Gatsby, Express, Fastify, Hono, Elysia, NestJS, Koa, Hapi, Vite | `.node-version`, `.nvmrc`, `package.json` engines, `.tool-versions`, `.mise.toml` | npm, pnpm, yarn, bun |
| Python | Django, Flask, FastAPI, Starlette, Tornado, Sanic, Streamlit, Dash, Gradio | `.python-version`, `runtime.txt`, `pyproject.toml`, `.tool-versions` | pip, poetry, pipenv, uv, conda |
| Go | Echo, Gin, Chi, Fiber, stdlib | `go.mod` directive | go modules |
| Rust | Actix, Axum, Warp, Rocket | `rust-toolchain.toml`, `Cargo.toml` `rust-version` | cargo |
| Ruby | Rails, Sinatra, Hanami | `.ruby-version`, `Gemfile`, `.tool-versions` | bundler |
| PHP | Laravel, Symfony | `composer.json` | composer |
| Java/Kotlin | Spring Boot, Quarkus, Micronaut | `.java-version`, `pom.xml`, `build.gradle` | Maven, Gradle |
| Elixir | Phoenix | `mix.exs`, `.tool-versions` | mix |

## Key Types

```rust
// Top-level output
pub struct Discovery {
    pub services: Vec<Service>,
    pub monorepo: Option<Monorepo>,
}

// A deployable unit
pub struct Service {
    pub name: String,                            // "api"
    pub dir: String,                             // "apps/api"
    pub language: Option<Language>,
    pub language_config: Option<LanguageConfig>, // language-specific details (NodeConfig, PythonConfig, ...)
    pub runtime: Option<RuntimeInfo>,            // { name: "node", version: "20.11.1", source: ".nvmrc" }
    pub framework: Option<String>,
    pub package_manager: Option<PackageManagerInfo>,
    pub network: Option<Network>,                // Private | Public
    pub exec_mode: Option<ExecMode>,             // Daemon | Scheduled
    pub commands: Commands,                      // install, build, start, dev
    pub image: Option<String>,                   // pre-built container image (from docker-compose)
    pub dockerfile: Option<String>,
    pub output_dir: Option<String>,              // for SPAs and static sites
    pub env: Vec<EnvVar>,
    pub system_deps: Vec<String>,
    pub volumes: Vec<Volume>,
    pub resources: Option<Resources>,
    pub replicas: Option<u32>,
    pub restart: Option<Restart>,
    pub healthcheck: Option<String>,
    pub schedule: Option<String>,
    pub detected_by: Vec<String>,                // provenance
}

// Directory-level context (signals don't know service count)
pub struct DirContext {
    pub dir: String,
    // same fields as Service minus identity/deployment fields
}
```

## Design Decisions

### Why service signals vs context signals?

Consider `apps/api/` with `api.Dockerfile`, `worker.Dockerfile`, and `package.json`.

If everything were a service signal, `PackageSignal` would emit one service and we'd need heuristics to decide whether to merge it with "api", "worker", or split it. That's an arbitrary cutoff system.

Instead, `PackageSignal` emits one `DirContext` — TypeScript, pnpm, scripts — without claiming to know the service count. Assembly handles the merge: "I have two explicit services (api, worker) and one context (TypeScript, pnpm). Layer context onto both."

The invariant: service signals know identity; context signals know content. Assembly combines them.

### Why registration order instead of confidence scores?

Confidence scores require calibration and create non-obvious behavior (why did this field come from Fly instead of the Dockerfile?). Registration order is transparent: if you know the signal list, you know exactly what wins. Changes to priority are explicit code changes.

The order also matches real-world semantics: a platform config (Fly) knows more about how to run the app than a generic Dockerfile, which knows more than inferred package.json scripts.

### Why two walk implementations?

`MemoryFs` tests need a deterministic sequential walk — no need for parallelism, no `.gitignore` file to parse. `LocalFs` production use benefits from parallel directory reads on large monorepos and should skip gitignored paths (test fixtures, generated files, build artifacts that weren't excluded by the static list).

Separating them keeps the test path simple while the production path gets the full optimization.

### Why own DirEntry instead of std::fs::DirEntry?

`std::fs::DirEntry::file_type()` is fallible (a syscall). Passing it into `Signal::observe()` would require every signal to handle I/O errors during observation — or we'd panic. Our `DirEntry` surfaces only what signals need (path, name, is_dir) with no error surface. The fallibility is handled once in the walk, not scattered across signals.

### Why no confidence thresholds in assembly?

Every threshold-based system needs tuning, creates edge cases, and is hard to reason about. "We need at least 2 signals to emit X before we believe it" means the 1-signal case breaks silently.

Our rule is simpler: a dir with a start command is a service. No threshold. If you don't want it, don't have a start command. This matches how real projects work — the start command is the unambiguous indicator of deployability.

### Why no async?

File I/O in a discovery tool is sequential reads of small config files. The latency is filesystem cache latency (microseconds), not network latency (milliseconds). Async adds runtime overhead, stack complexity, and lifetime complications for no throughput benefit. Parallelism in `walk_local()` is enough — it uses OS threads via rayon, not async tasks.

## Error Handling

```rust
pub enum LaunchError {
    Filesystem { path: PathBuf, source: io::Error },
    Parse { path: PathBuf, message: String },
    Config { message: String },
}
```

Signal errors are **non-fatal**: `generate()` failures are logged to stderr and skipped. The pipeline continues without that signal's output. A malformed `railway.toml` shouldn't prevent detecting a `package.json` next to it.

Walk errors (can't read a directory) are **fatal**: returned immediately. A missing directory means we can't build a correct picture of the repo.
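The non-fatal policy amounts to a match per signal result: log `Err` to stderr and move on, collect `Ok`. A sketch with illustrative stand-in types — the real loop works over `Signal::generate()` and `SignalOutput`:

```rust
// Non-fatal signal errors: failed generators are logged and skipped; the
// pipeline keeps every output that did succeed.
fn collect_outputs(results: Vec<(&str, Result<Vec<String>, String>)>) -> Vec<String> {
    let mut outputs = Vec::new();
    for (signal, result) in results {
        match result {
            Ok(mut out) => outputs.append(&mut out),
            Err(e) => eprintln!("signal {signal} failed, skipping: {e}"), // non-fatal
        }
    }
    outputs
}

fn main() {
    let results = vec![
        ("railway", Err("bad TOML at line 3".to_string())),
        ("package", Ok(vec!["service: api".to_string()])),
    ];
    // The malformed railway.toml doesn't block the package.json result.
    assert_eq!(collect_outputs(results), vec!["service: api"]);
}
```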

## Configuration

| CLI Flag | Default | Purpose |
|----------|---------|---------|
| `PATH` | `.` (current dir) | Repository root to analyze |
| `--format` | `json-pretty` | Output format: `json` (compact) or `json-pretty` (indented) |

No environment variables. No config files consumed by the CLI itself (config files consumed by signals are in the repository being analyzed).

## Failure Modes

| Failure | Behavior |
|---------|----------|
| Signal parse error (bad TOML, bad JSON) | Log to stderr, skip that signal's output, continue |
| Signal file read error | Log to stderr, skip that signal's output, continue |
| Walk can't read a directory | Return `LaunchError::Filesystem` immediately |
| `.gitignore` parse error (local walk) | Fall back to empty gitignore, continue walk |
| Empty repository | Returns `Discovery { services: [], monorepo: None }` |
| Directory with no start command, no service signal | Not promoted, not in output |