backdisco 0.2.0

# backDisco


**backDisco** is a tool that discovers exposed backend origins from CDN frontends using LLM-assisted pattern analysis and brute force enumeration.

## Overview


Given one known CDN frontend/backend hostname pair, backDisco uses an LLM to identify the naming patterns that relate them, then applies those patterns to a list of target frontends to discover additional backend origins. It can also perform brute force subdomain enumeration based on the extracted patterns.

## Features


- **Multi-API LLM Support**: Automatic detection and support for OpenAI-compatible, Ollama, and Anthropic APIs
- **LLM-Powered Pattern Analysis**: Automatically identifies naming patterns between frontend and backend hostnames
- **Pattern-Based Discovery**: Applies discovered patterns to target frontends to generate backend candidates
- **Position-Aware Candidate Generation**: Parses hostnames by both '.' and '-' delimiters for context-aware expansion
- **Batched LLM Expansion**: Efficiently expands word lists using configurable batch processing
- **SAN Extraction**: Extracts Subject Alternative Names from target certificates to discover additional hosts
- **Brute Force Enumeration**: Generates subdomain candidates based on backend URL patterns with structured generation
- **LLM Word Expansion**: Uses LLM to generate related words for brute force wordlists at specific positions
- **Model Selection**: Interactive model selection when model is not specified
- **Concurrent Verification**: Efficiently tests candidates with configurable concurrency
- **DNS and HTTP Verification**: Validates discovered backends via DNS resolution and HTTP/HTTPS checks

## Requirements


- Rust (edition 2021)
- Access to an LLM API endpoint (OpenAI-compatible, Ollama, or Anthropic)
- Network access to target hosts

## Installation


### From crates.io


```bash
cargo install backdisco
```

### From source


```bash
git clone <repository-url>
cd backdisco
cargo build --release
```

The binary will be located at `target/release/backdisco`.

## Configuration


backDisco supports multiple LLM API types with automatic detection:

- **OpenAI-compatible APIs**: Standard OpenAI API format (e.g., `https://api.openai.com/v1`)
- **Ollama**: Local or remote Ollama instances (e.g., `http://localhost:11434/v1` or `http://localhost:11434/api`)
- **Anthropic**: Claude API endpoints (e.g., `https://api.anthropic.com/v1`)

The LLM endpoint URL and model can be specified via command-line arguments:
- `--llmurl`: LLM API base URL (defaults to `http://localhost:11434/v1`)
- `--model`: Model name (if not provided, the tool will fetch available models and prompt for selection)

The API type is automatically detected from the URL using pattern matching, so you don't need to specify it manually.
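
As an illustration of URL-based detection, here is a minimal sketch. The matching rules, the `ApiType` enum, and the `detect_api_type` function are assumptions made for this example and are not taken from backDisco's source; the real detection logic may differ.

```rust
/// The three API families named in this README.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ApiType {
    OpenAiCompatible,
    Ollama,
    Anthropic,
}

/// Guess the API family from a base URL.
/// NOTE: these rules are illustrative assumptions, not backDisco's actual logic.
pub fn detect_api_type(url: &str) -> ApiType {
    let lower = url.to_ascii_lowercase();
    if lower.contains("anthropic.com") {
        ApiType::Anthropic
    } else if lower.contains(":11434") || lower.contains("ollama") {
        // 11434 is Ollama's default port.
        ApiType::Ollama
    } else {
        // Everything else is treated as an OpenAI-compatible endpoint.
        ApiType::OpenAiCompatible
    }
}
```

Under these assumed rules, `detect_api_type("http://localhost:11434/api")` yields `ApiType::Ollama`, and any URL that is not recognized falls back to `ApiType::OpenAiCompatible`.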

## Usage


### Basic Usage


```bash
backdisco \
  --front "cdn.example.com" \
  --back "api.internal.example.com" \
  --targets targets.txt
```

### With Custom LLM Configuration


Specify a custom LLM endpoint and model:

```bash
backdisco \
  --front "cdn.example.com" \
  --back "api.internal.example.com" \
  --targets targets.txt \
  --llmurl "https://api.openai.com/v1" \
  --model "gpt-4"
```

If `--model` is not specified, the tool will fetch available models and prompt for selection:

```bash
backdisco \
  --front "cdn.example.com" \
  --back "api.internal.example.com" \
  --targets targets.txt \
  --llmurl "http://localhost:11434/v1"
```

### With SAN Extraction


Extract Subject Alternative Names from target certificates:

```bash
backdisco \
  --front "cdn.example.com" \
  --back "api.internal.example.com" \
  --targets targets.txt \
  --extract-sans
```

### With Brute Force Enumeration


Enable brute force subdomain enumeration with position-aware expansion:

```bash
backdisco \
  --front "cdn.example.com" \
  --back "api.internal.example.com" \
  --targets targets.txt \
  --brute \
  --llm-expand 5 \
  --llm-batch-size 20
```

The `--brute` flag enables structured candidate generation that:
- Parses hostnames by both '.' and '-' delimiters to preserve positional context
- Expands words at each position using LLM (e.g., "dev" → ["dev", "prod", "test", "staging"])
- Generates candidates using cartesian products of position expansions
- Processes expansions in batches for efficiency
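
The cartesian-product step can be sketched roughly as follows. The function name `generate_candidates` is hypothetical, and joining every position with `.` is an assumption for this example; the tool itself may reassemble `-` delimiters differently.

```rust
/// Build every combination of one word per position, joined with '.' and
/// suffixed with the base domain.
/// NOTE: hypothetical helper; the '.' join is an assumption for illustration.
pub fn generate_candidates(positions: &[Vec<String>], base: &str) -> Vec<String> {
    // Start with a single empty prefix and extend it one position at a time.
    let mut prefixes: Vec<Vec<&str>> = vec![Vec::new()];
    for words in positions {
        let mut next = Vec::with_capacity(prefixes.len() * words.len());
        for prefix in &prefixes {
            for word in words {
                let mut extended = prefix.clone();
                extended.push(word.as_str());
                next.push(extended);
            }
        }
        prefixes = next;
    }
    prefixes
        .into_iter()
        // With no positions at all there is nothing to prepend to the base.
        .filter(|labels| !labels.is_empty())
        .map(|labels| format!("{}.{}", labels.join("."), base))
        .collect()
}
```

The number of candidates is the product of the per-position list sizes, so three positions with six words each already produce 216 hostnames. This is why the `--llm-expand` and `--max-depth` values have a large effect on run time.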

### Complete Example


```bash
backdisco \
  --front "cdn.example.com" \
  --back "api.internal.example.com" \
  --targets targets.txt \
  --output results.txt \
  --llmurl "http://localhost:11434/v1" \
  --model "llama2" \
  --extract-sans \
  --brute \
  --llm-expand 5 \
  --llm-batch-size 20 \
  --max-depth 3 \
  --concurrency 50 \
  --timeout 5 \
  --gen-wordlist-output candidates.txt \
  --verbose 2
```

## Command-Line Options


| Option | Short | Description |
|--------|-------|-------------|
| `--front` | `-f` | Known frontend hostname (required) |
| `--back` | `-b` | Known backend hostname (required) |
| `--targets` | `-t` | File with target frontends, one per line (required) |
| `--output` | `-o` | Output file (defaults to stdout) |
| `--verbose` | `-v` | Verbosity level 0-2 (default: 1) |
| `--dns-only` | | Skip HTTP checks, DNS only |
| `--timeout` | | HTTP timeout in seconds (default: 5) |
| `--concurrency` | | Concurrent check limit (default: 50) |
| `--extract-sans` | | Extract SANs from target certificates and add to target list |
| `--no-sans` | | Skip SAN extraction (opposite of `--extract-sans`) |
| `--brute` | | Enable brute force subdomain enumeration with position-aware expansion |
| `--llm-expand` | | Number of related words to generate per seed word using LLM (default: 5, 0 to disable) |
| `--llm-batch-size` | | Batch size for LLM position expansion (default: 20) |
| `--max-depth` | | Override maximum subdomain depth for brute forcing (default: derived from backend URL) |
| `--llmurl` | | LLM API base URL (default: `http://localhost:11434/v1`) |
| `--model` | | LLM model name (if not provided, will fetch and prompt for selection) |
| `--gen-wordlist-output` | | Output file for generated candidate list (one hostname per line) |

## How It Works


1. **LLM Configuration**: 
   - If `--llmurl` is provided, uses that endpoint; otherwise uses the default (`http://localhost:11434/v1`)
   - If `--model` is provided, uses that model; otherwise fetches available models and prompts for selection
   - Automatically detects API type (OpenAI-compatible, Ollama, or Anthropic) from the URL

2. **Pre-flight Check**: Verifies connectivity to the LLM endpoint and model availability

3. **Backend Analysis**: 
   - Extracts subdomain words, depth, and base domain from the known backend hostname
   - Parses hostname structure by both '.' and '-' delimiters for position-aware processing

4. **Target Loading**: Reads target frontends from the specified file

5. **SAN Extraction** (optional): Extracts Subject Alternative Names from target certificates to discover additional hosts and wildcard patterns

6. **Pattern Discovery**: Queries the LLM to identify naming patterns between the frontend and backend hostnames

7. **Pattern Application**: Applies discovered patterns to target frontends to generate backend candidates

8. **Brute Force** (optional):
   - Parses hostname structures from backend, targets, and SANs
   - Groups words by position across all structures
   - Expands words at each position using batched LLM calls (context-aware: environment, service type, etc.)
   - Generates structured candidates using cartesian products of position expansions
   - Filters candidates to match backend base domain

9. **Verification**: Tests all candidates via DNS resolution and HTTP/HTTPS checks with configurable concurrency

10. **Output**: Reports live backends with DNS and HTTP status information
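
To make the verification step more concrete, here is a standard-library-only sketch of bounded-concurrency checking. It is a stand-in written for this README, not backDisco's implementation (which may well use an async runtime); `resolves` and `verify_all` are hypothetical names, and only a DNS check is shown. It relies on `std::thread::scope`, available since Rust 1.63.

```rust
use std::net::ToSocketAddrs;
use std::sync::Mutex;
use std::thread;

/// DNS-only liveness check: true if the name resolves to at least one address.
pub fn resolves(host: &str) -> bool {
    (host, 443u16)
        .to_socket_addrs()
        .map(|mut addrs| addrs.next().is_some())
        .unwrap_or(false)
}

/// Run `check` over all candidates using at most `concurrency` worker threads
/// and return the candidates that passed, in input order.
pub fn verify_all<F>(candidates: &[String], concurrency: usize, check: F) -> Vec<String>
where
    F: Fn(&str) -> bool + Sync,
{
    let workers = concurrency.max(1).min(candidates.len().max(1));
    let next = Mutex::new(0usize); // index of the next candidate to claim
    let live = Mutex::new(Vec::new()); // (index, hostname) pairs that passed

    thread::scope(|scope| {
        for _ in 0..workers {
            scope.spawn(|| loop {
                // Claim the next unprocessed index, or stop when none remain.
                let i = {
                    let mut guard = next.lock().unwrap();
                    if *guard >= candidates.len() {
                        break;
                    }
                    let i = *guard;
                    *guard += 1;
                    i
                };
                // The lock is released here, so checks run in parallel.
                if check(candidates[i].as_str()) {
                    live.lock().unwrap().push((i, candidates[i].clone()));
                }
            });
        }
    });

    let mut live = live.into_inner().unwrap();
    live.sort_by_key(|(i, _)| *i);
    live.into_iter().map(|(_, host)| host).collect()
}
```

Calling `verify_all(&candidates, 50, resolves)` would roughly correspond to a `--dns-only` run with the default concurrency. Passing the check as a closure keeps the worker loop independent of whether DNS, HTTP, or both are being tested.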

## Output Format


The tool prints colored output at one of three verbosity levels, selected with `--verbose`:

- **Level 0**: Minimal output
- **Level 1** (default): Standard output with progress and results
- **Level 2**: Detailed debug information including pattern details, SAN extraction results, and failed candidates

Example output:

```
[+] LIVE: api.internal.example.com
    DNS:   192.168.1.100
    HTTPS: 200 OK
    HTTP:  301 Moved Permanently
```

## Examples


### Example 1: Simple Pattern Discovery


Given:
- Frontend: `cdn.example.com`
- Backend: `api.internal.example.com`
- Pattern discovered: Replace `cdn` with `api.internal`

Applied to targets:
- `cdn.target1.com` → `api.internal.target1.com`
- `cdn.target2.com` → `api.internal.target2.com`
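
A prefix-replacement pattern like this one could be applied along the following lines. This is a simplified sketch: `apply_prefix_pattern` is a hypothetical helper, and it only handles the "replace the leading label" case, whereas the patterns the LLM returns may take other forms.

```rust
/// Replace the leading label of `target` with `to` if that label equals `from`.
/// Returns `None` when the target does not match the pattern.
/// NOTE: hypothetical helper covering only leading-label replacement.
pub fn apply_prefix_pattern(target: &str, from: &str, to: &str) -> Option<String> {
    // Split "cdn.target1.com" into ("cdn", "target1.com").
    let (first, rest) = target.split_once('.')?;
    if first.eq_ignore_ascii_case(from) {
        Some(format!("{}.{}", to, rest))
    } else {
        None
    }
}
```

With `from = "cdn"` and `to = "api.internal"`, the target `cdn.target1.com` maps to `api.internal.target1.com`, while a target such as `www.target2.com` yields `None` and produces no candidate.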

### Example 2: SAN Extraction


If `cdn.example.com` has a certificate with SANs:
- `*.internal.example.com`
- `api.internal.example.com`
- `admin.internal.example.com`

These will be added to the target list for pattern application.
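
One way to prepare SAN entries before merging them into the target list is sketched below. Separating wildcard entries from concrete hostnames is an assumption made for this example, as is the `split_sans` name; how backDisco treats wildcard SANs internally is not documented here.

```rust
use std::collections::BTreeSet;

/// Split SAN entries into concrete hostnames and wildcard patterns,
/// lowercasing and de-duplicating both. Results are sorted alphabetically.
/// NOTE: hypothetical helper; the wildcard handling is an assumption.
pub fn split_sans(sans: &[&str]) -> (Vec<String>, Vec<String>) {
    let mut hosts = BTreeSet::new();
    let mut wildcards = BTreeSet::new();
    for san in sans {
        // Normalize: strip whitespace and any trailing root dot, then lowercase.
        let name = san.trim().trim_end_matches('.').to_ascii_lowercase();
        if name.is_empty() {
            continue;
        }
        if name.starts_with("*.") {
            wildcards.insert(name);
        } else {
            hosts.insert(name);
        }
    }
    (hosts.into_iter().collect(), wildcards.into_iter().collect())
}
```

For the three SANs listed above, this returns the hostnames `admin.internal.example.com` and `api.internal.example.com`, plus the single wildcard pattern `*.internal.example.com`.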

### Example 3: Position-Aware Brute Force with LLM Expansion


Backend: `api-v2.internal.example.com`
- Parsed structure: segments `["api", "v2", "internal"]`, base `"example.com"`
- Position-aware expansion:
  - Position 0 (service): `["api"]` → `["api", "rest", "graphql", "rpc", "service", "gateway"]`
  - Position 1 (version): `["v2"]` → `["v2", "v1", "v3", "beta", "alpha", "prod"]`
  - Position 2 (environment): `["internal"]` → `["internal", "int", "private", "corp"]`
- Generated candidates (cartesian product): 
  - `api.v2.internal.example.com`, `rest.v1.internal.example.com`, `graphql.prod.int.example.com`, etc.
- All candidates filtered to match base domain `example.com`
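
The parsing step in this example can be approximated with the sketch below. `HostStructure` and `parse_structure` are hypothetical names, and the base domain is naively assumed to be the last two labels, which is wrong for suffixes such as `co.uk`; a real implementation would likely need public-suffix awareness.

```rust
/// A hostname split into positional segments plus its base domain.
#[derive(Debug, PartialEq, Eq)]
pub struct HostStructure {
    pub segments: Vec<String>,
    pub base: String,
}

/// Split a hostname on both '.' and '-'.
/// NOTE: assumes the base domain is the last two labels (illustrative only).
pub fn parse_structure(host: &str) -> Option<HostStructure> {
    let labels: Vec<&str> = host.trim_end_matches('.').split('.').collect();
    // Reject single-label names and names with empty labels ("a..b").
    if labels.len() < 2 || labels.iter().any(|l| l.is_empty()) {
        return None;
    }
    let (sub, base) = labels.split_at(labels.len() - 2);
    let segments = sub
        .iter()
        // Each subdomain label may itself contain '-'-separated words.
        .flat_map(|&label| label.split('-'))
        .filter(|s| !s.is_empty())
        .map(|s| s.to_ascii_lowercase())
        .collect();
    Some(HostStructure {
        segments,
        base: base.join(".").to_ascii_lowercase(),
    })
}
```

For `api-v2.internal.example.com` this produces the segments `api`, `v2`, `internal` and the base `example.com`, matching the parsed structure shown above.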

## License


This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

## Contributing


Contributions are welcome! Please ensure your code follows Rust best practices and includes appropriate error handling.