nab 0.8.2

Token-optimized HTTP client for LLMs — fetches any URL as clean markdown
# Getting started with nab

This guide takes you from zero to a working nab install in five minutes, then walks through the three main commands (`fetch`, `analyze`, `watch`) and the `nab-mcp` server with real examples.

## Install

Pick one of the following.

### Homebrew (macOS, recommended)

```bash
brew tap MikkoParkkola/tap
brew install nab
```

### crates.io

```bash
cargo install nab
```

Requires Rust 1.93 or newer. Install Rust via [rustup.rs](https://rustup.rs/) if you don't already have it.

### Pre-built binary

```bash
cargo binstall nab
```

Or grab the binary for your platform from [GitHub Releases](https://github.com/MikkoParkkola/nab/releases).

### From source

```bash
git clone https://github.com/MikkoParkkola/nab.git
cd nab
cargo install --path .
```

### Verify the install

```bash
nab --version
```

You should see something like `nab 0.8.2`; the exact number depends on the release you installed.

## First fetch

The simplest possible nab invocation:

```bash
nab fetch https://example.com
```

You will see clean markdown on stdout. No JavaScript was rendered, no headless browser was launched, and no API key was needed. nab made a single HTTP/3 request, parsed the HTML, removed boilerplate, and converted the body to markdown.

### Fetch with browser cookies

By default, `nab fetch` detects your default browser and sends its session cookies with the request. Pass `--cookies` to name a browser explicitly. If you are logged in to GitHub in Brave, this works:

```bash
nab fetch https://github.com/notifications --cookies brave
```

You stay logged in. nab does not run a browser — it reads the cookie store, inserts the cookies into its own request, and pulls the page. Supported browsers: `brave`, `chrome`, `firefox`, `safari`, `edge`, `dia`.

### Fetch with 1Password auto-login

If you have the 1Password CLI installed and unlocked:

```bash
nab fetch https://internal.company.com --1password
```

nab looks up the credentials for the URL, fills in and submits the login form, handles CSRF tokens and TOTP/MFA, then fetches the target page.

### Query-focused extraction

Send a focus query to get only the parts of the page relevant to your question:

```bash
nab fetch https://docs.anthropic.com/en/api/messages \
  --focus "what does the streaming response look like" \
  --max-tokens 2000
```

nab applies BM25-lite scoring to the extracted markdown, keeps the top sections, and respects a strict token budget that never splits mid-block (headings, code blocks, tables stay intact).
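
The "never split mid-block" budget can be pictured with a small shell sketch. This is not nab's implementation: it treats blank-line-separated paragraphs as blocks, counts whitespace-separated words as a rough stand-in for tokens, and walks blocks in document order instead of ranking them by BM25 score. The function name `budget_blocks` is hypothetical.

```bash
# Hypothetical helper, not part of nab: keep whole blocks until a word budget is spent.
# A block that does not fit is skipped; later, smaller blocks may still be kept.
# Usage: budget_blocks MAX_WORDS < page.md
budget_blocks() {
  awk -v max="$1" '
    BEGIN { RS = ""; ORS = "\n\n" }   # paragraph mode: one record per blank-line-separated block
    used + NF <= max {                 # NF is the word count of the block (rough token proxy)
      print                            # emit the block whole, never a fragment
      used += NF
    }
  '
}
```

Because paragraph mode splits on every blank line, a fenced code block that contains blank lines would be treated as several blocks here; a real implementation needs a markdown-aware splitter.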

### Output formats

```bash
nab fetch https://example.com                  # markdown (default)
nab fetch https://example.com --format json    # JSON with confidence scores
nab fetch https://example.com --format compact # one-line summary
nab fetch https://example.com --raw-html       # bypass markdown conversion
```

## First analyze

`nab analyze` transcribes audio and video. It runs locally on your machine. There is no cloud API in the default path.

### Install the model

The default backend on macOS arm64 is FluidAudio (Parakeet TDT v3). Download it once:

```bash
nab models fetch fluidaudio
```

This pulls the FluidAudio binary plus the Parakeet TDT v3 weights (~600 MB total) into `~/.local/share/nab/models/`. You only need to do this once.

### Transcribe

```bash
nab analyze interview.mp4
```

Output is a transcript with segment timestamps. On a 2-hour English audio file, this typically completes in about a minute on Apple Silicon.

### Add diarization

```bash
nab analyze interview.mp4 --diarize
```

Each segment now has a `speaker` field (`SPEAKER_00`, `SPEAKER_01`, ...). Diarization uses the FluidAudio offline VBx clustering with PyAnnote community-1 weights.

### Word-level timestamps

```bash
nab analyze talk.mp4 --word-timestamps
```

Each segment now contains a `words` array with one entry per word, including start, end, and confidence.

### Force a language

```bash
nab analyze finnish_podcast.mp3 --language fi
```

Without `--language`, nab auto-detects. Pass a BCP-47 code to skip detection or override an incorrect guess.

### Active reading

```bash
nab analyze interview.mp4 --active-reading
```

This requires running `nab analyze` from inside an MCP client (Claude Code, Continue, Zed, ...) so that nab can call back to the host LLM. nab sends transcript chunks to the LLM via `sampling/createMessage`, asks it to identify references (papers, people, claims), looks up each reference via `nab fetch`, and inlines the result as a footnote in the transcript.

The transcript stops being a wall of text and starts being annotated with citations.

### Output JSON

```bash
nab analyze podcast.mp3 --format json > transcript.json
```

## First watch

`nab watch` is RSS for the entire web. Add a URL, set an interval, and nab will check it on schedule and notify you when it changes.

```bash
nab watch add https://news.ycombinator.com --interval 10m
```

You will see a watch ID. List your watches:

```bash
nab watch list
```

Inspect a watch's recent check log:

```bash
nab watch logs <id>
```

Remove a watch:

```bash
nab watch remove <id>
```

### Selectors

For pages with a lot of noise, scope the watch to a CSS selector:

```bash
nab watch add https://example.com/pricing \
  --interval 1h \
  --selector "table.pricing" \
  --notify-on regression
```

`--notify-on regression` only fires when the price changes (not when an unrelated banner updates).

### How it works

The watch poller iterates all watches every minute and fetches the ones whose interval has elapsed. It uses conditional GETs (`If-None-Match` and `If-Modified-Since`), so 304 responses cost effectively nothing — they don't even count as a check.

When a watch fires, the change is delivered two ways:

1. **Watch log** on disk (visible via `nab watch logs <id>`)
2. **MCP notification** if the watch was created from inside an MCP client — the client receives `notifications/resources/updated` for the `nab://watch/<id>` resource and reads the diff via `resources/read`
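
The conditional GET that the poller relies on can be reproduced by hand with curl, which is a quick way to check whether a given server honours `If-None-Match` at all. A minimal sketch, assuming you saved the `ETag` value from an earlier response; the function name `unchanged_since` is hypothetical and this is not nab's own code.

```bash
# Hypothetical helper: succeed (exit 0) when the server answers 304 Not Modified.
# Usage: unchanged_since URL ETAG
unchanged_since() {
  local url="$1" etag="$2" status
  # -s: quiet, -o /dev/null: discard the body, -w: print only the HTTP status code
  status="$(curl -s -o /dev/null -w '%{http_code}' \
    -H "If-None-Match: $etag" "$url")"
  [ "$status" = "304" ]
}
```

For example, `unchanged_since https://example.com/pricing '"placeholder-etag"' && echo "no change"`, where the ETag string is a placeholder for whatever the server sent previously.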

## MCP integration

`nab-mcp` is a Model Context Protocol server. It exposes everything nab can do as MCP tools, prompts, and resources.

### Claude Code

Add to `~/.config/claude/mcp.json`:

```json
{
  "mcpServers": {
    "nab": {
      "command": "nab-mcp"
    }
  }
}
```

Restart Claude Code. You should see nab's tools (`fetch`, `analyze`, `watch_create`, ...) appear in the tool palette.

### Continue, Zed, Cursor, Windsurf

Same shape. Each editor has its own MCP config file location, but the structure is identical: point `command` at the `nab-mcp` binary on your `PATH`.

### HTTP transport

For multi-client setups or when running nab on a server:

```bash
nab-mcp --http 127.0.0.1:8765
```

This starts a Streamable HTTP MCP endpoint on localhost. The transport is fully spec-compliant: origin checks, `MCP-Protocol-Version` header validation, session IDs via `MCP-Session-Id`, resumability via `Last-Event-ID`, DELETE for session termination.
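
A quick smoke test is to send a JSON-RPC `initialize` request with curl. The sketch below assumes the endpoint is served at `/mcp`, which is a common convention but is not confirmed for nab, so check the server's startup output for the real path. The protocol version, client name, and function names are illustrative values, not taken from nab.

```bash
# Hypothetical smoke test: build a JSON-RPC initialize request for a Streamable HTTP MCP server.
mcp_init_payload() {
  cat <<'JSON'
{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-06-18","capabilities":{},"clientInfo":{"name":"curl-smoke-test","version":"0.0.1"}}}
JSON
}

# Usage: mcp_initialize BASE_URL   (the /mcp path is an assumption)
mcp_initialize() {
  # -i prints response headers so a session header, if any, is visible
  mcp_init_payload | curl -s -i -X POST "$1/mcp" \
    -H 'Content-Type: application/json' \
    -H 'Accept: application/json, text/event-stream' \
    --data-binary @-
}
```

Running `mcp_initialize http://127.0.0.1:8765` against a working server should return the server's capabilities, and the response headers should include `MCP-Session-Id` if a session was created.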

### Available tools

Once nab-mcp is configured, your MCP client gets 12 tools:

| Tool | Use it for |
|------|-----------|
| `fetch` | Get a URL as markdown |
| `fetch_batch` | Fetch many URLs in parallel (async with progress) |
| `submit` | Submit a form |
| `login` | 1Password auto-login |
| `auth_lookup` | Check 1Password for a URL's credentials |
| `fingerprint` | Generate browser fingerprint profiles |
| `validate` | Run the test suite |
| `benchmark` | Time URL fetches |
| `analyze` | Transcribe audio or video |
| `watch_create` | Create a URL watch and subscribe |
| `watch_list` / `watch_remove` | Manage watches |

Plus 4 prompts (including `match-speakers-with-hebb` for cross-tool composition with the [hebb](https://github.com/MikkoParkkola/hebb) memory server).

## Common recipes

### Fetch with browser cookies and a session

```bash
nab fetch https://app.example.com/dashboard \
  --cookies brave \
  --session work-app
```

The session persists cookies across requests. Subsequent fetches with the same `--session` reuse the saved jar.

### Analyze with diarization and export embeddings for hebb

```bash
nab analyze interview.mp4 \
  --diarize \
  --include-embeddings \
  --format json > interview.json
```

The output JSON contains 256-dim WeSpeaker embeddings per speaker turn. Pipe these to `hebb voice_match` to identify the speakers if you have a voiceprint database.

### Watch a price page

```bash
nab watch add "https://example.com/products/foo" \
  --interval 1h \
  --selector ".product-price" \
  --notify-on regression
```

### Batch fetch a list of URLs

```bash
echo "https://example.com" > urls.txt
echo "https://news.ycombinator.com" >> urls.txt
echo "https://en.wikipedia.org/wiki/Rust_(programming_language)" >> urls.txt

nab fetch --batch urls.txt --parallel 4
```

### Fetch a Google Doc with comments

```bash
nab fetch --cookies brave \
  "https://docs.google.com/document/d/DOCID/edit"
```

nab uses the Google Workspace export API plus OOXML parsing, so you get markdown content with comments and suggested edits inline.

### Fetch a PDF

```bash
nab fetch https://arxiv.org/pdf/2301.00001 -o paper.md
```

PDF conversion uses pdfium for character-level positioning and reconstructs lines, paragraphs, and tables.

## Troubleshooting

### `nab models fetch fluidaudio` fails

Check that you have at least 1 GB free in `~/.local/share/nab/models/`. The download is resumable — re-run the command if it was interrupted.
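
The free-space check can be scripted with standard tools. A minimal sketch using POSIX `df`; the function name `has_free_kb` is hypothetical, and the directory must already exist for `df` to report on it.

```bash
# Hypothetical helper: succeed when DIR has at least MIN_KB kilobytes free.
# Usage: has_free_kb DIR MIN_KB
has_free_kb() {
  local avail
  # df -P gives stable POSIX columns, -k reports 1024-byte blocks, column 4 is the available space
  avail="$(df -Pk "$1" | awk 'NR == 2 { print $4 }')"
  # An empty result (for example a missing directory) is treated as zero free space
  [ "${avail:-0}" -ge "$2" ]
}
```

For the 1 GB threshold mentioned above: `has_free_kb ~/.local/share/nab/models 1048576 || echo "less than 1 GB free"`.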

### `nab analyze` says "model not installed"

Run `nab models fetch fluidaudio` first. Verify with `nab models list`.

### `nab fetch --cookies brave` returns an empty response

Brave (and Chrome) lock the cookie database while the browser is running. Try closing the browser, or use a different `--cookies` source. nab will print a warning when it detects a locked store.
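
If you regularly hit a locked store, trying several cookie sources in turn can be scripted. A minimal sketch that uses only the `--cookies` flag shown in this guide; the function name `fetch_with_cookie_fallback` is hypothetical, and it assumes that an empty stdout means the fetch did not return usable content.

```bash
# Hypothetical helper: try several --cookies sources and print the first non-empty result.
# Usage: fetch_with_cookie_fallback URL BROWSER [BROWSER...]
fetch_with_cookie_fallback() {
  local url="$1" browser out
  shift
  for browser in "$@"; do
    # stderr is dropped so lock warnings do not clutter the output
    out="$(nab fetch "$url" --cookies "$browser" 2>/dev/null)" || true
    if [ -n "$out" ]; then
      printf '%s\n' "$out"
      return 0
    fi
  done
  echo "no cookie source returned content for $url" >&2
  return 1
}
```

For example, `fetch_with_cookie_fallback https://github.com/notifications brave firefox safari` tries Brave first and falls back to Firefox, then Safari.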

### `nab analyze --active-reading` says "sampling not available"

Active reading requires the host LLM to support MCP sampling. Claude Code does. If you're running `nab analyze` from a plain terminal (not an MCP client), the LLM is unreachable and nab falls back to passive transcription.

### Watch poller seems idle

The poller iterates every minute, then picks watches whose `last_check_at + interval` is less than `now`. A 1-hour-interval watch will wait up to a minute past the hour before firing. Use shorter intervals for testing.
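
The due rule above fits in one line of shell arithmetic, which can help when reasoning about why a watch has not fired yet. This is a sketch of the rule as described here, not nab's code; the function name `is_due` is hypothetical and all values are in seconds.

```bash
# Hypothetical helper mirroring the rule above: due when last_check_at + interval < now.
# Usage: is_due LAST_CHECK_AT INTERVAL NOW   (Unix timestamps and durations in seconds)
is_due() {
  [ $(( $1 + $2 )) -lt "$3" ]
}
```

For example, `is_due "$last_check" 3600 "$(date +%s)" && echo "picked up on the next poller pass"`. Since the poller only looks once a minute, the worst-case delay is roughly the interval plus 60 seconds.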

### MCP client doesn't see nab's tools

Verify the binary is on `PATH` and executable: `which nab-mcp && nab-mcp --version`. Restart the MCP client after editing its config.
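
To check both binaries in one go and see where each one resolves, a small helper can be used. A minimal sketch; the function name `check_nab_bins` is hypothetical.

```bash
# Hypothetical helper: report any nab binary that is missing from PATH.
# Returns 0 when both are found, 1 otherwise.
check_nab_bins() {
  local bin missing=0
  for bin in nab nab-mcp; do
    if command -v "$bin" >/dev/null 2>&1; then
      echo "ok: $bin -> $(command -v "$bin")"
    else
      echo "missing: $bin" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Keep in mind that an MCP client launched from a desktop environment may not inherit the same `PATH` as your terminal, so a binary that resolves here can still be invisible to the client.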

## Where to go next

- [README.md](../README.md): feature reference
- [docs/sovereign-stack.md](sovereign-stack.md): composing nab with hebb
- [docs/ARCHITECTURE.md](ARCHITECTURE.md): internal architecture
- [docs/design/](design/): recent design proposals (analyze v2, URL watch resources, active reading, MCP spec closure)