nab 0.8.2

Token-optimized HTTP client for LLMs — fetches any URL as clean markdown
# nab

[![CI](https://github.com/MikkoParkkola/nab/actions/workflows/ci.yml/badge.svg)](https://github.com/MikkoParkkola/nab/actions/workflows/ci.yml)
[![Crates.io](https://img.shields.io/crates/v/nab.svg)](https://crates.io/crates/nab)
[![Downloads](https://img.shields.io/crates/d/nab.svg)](https://crates.io/crates/nab)
[![docs.rs](https://img.shields.io/docsrs/nab)](https://docs.rs/nab)
[![Rust](https://img.shields.io/badge/Rust-1.93+-orange.svg?logo=rust)](https://www.rust-lang.org)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![MCP Protocol](https://img.shields.io/badge/MCP-2025--11--25-blueviolet.svg)](https://modelcontextprotocol.io)
[![nab MCP server](https://glama.ai/mcp/servers/MikkoParkkola/nab/badges/score.svg)](https://glama.ai/mcp/servers/MikkoParkkola/nab)
[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install_MCP-0078d4?logo=visualstudiocode)](https://insiders.vscode.dev/redirect/mcp/install?name=nab&config=%7B%22command%22%3A%22nab-mcp%22%7D)
[![Install in Cursor](https://img.shields.io/badge/Cursor-Install_MCP-black?logo=cursor)](cursor://anysphere.cursor-deeplink/mcp/install?name=nab&config=%7B%22command%22%3A%22nab-mcp%22%7D)

Token-optimized web fetcher + multilingual ASR + URL watcher. MCP 2025-11-25 compliant. Rust. macOS arm64 first, cross-platform.

![demo](demo.gif)

nab is a single Rust binary that does three things very well: it **fetches** any URL as clean markdown (with your real browser cookies and anti-bot evasion), it **analyzes** any audio or video file with on-device multilingual ASR and speaker diarization, and it **watches** any URL for changes and pushes notifications when content moves. Everything runs locally. There are no API keys to set up by default. The output is shaped for LLM context windows.

## Quick start

**Tell your AI assistant** (recommended):

> Read https://github.com/MikkoParkkola/nab and install nab as my web fetching and audio analysis MCP server

Your agent will install the binary, wire itself up, and start fetching. Works in Claude Code, Cursor, Windsurf, and any AI with terminal access.

**Or install and try manually:**

```bash
brew install MikkoParkkola/tap/nab                            # install
nab fetch https://news.ycombinator.com                        # fetch as markdown
nab models fetch fluidaudio                                   # download ASR model
nab analyze interview.mp4 --diarize                           # transcribe + identify speakers
nab watch add https://status.openai.com --interval 5m         # subscribe to changes
```

## Features

| Command | What it does |
|---------|--------------|
| `nab fetch <url>` | Fetch any URL as clean markdown. HTTP/3, browser cookie injection (Brave / Chrome / Firefox / Safari / Edge / Dia), 1Password auto-login, fingerprint spoofing, 11 site providers, query-focused extraction, token budget. |
| `nab analyze <video\|audio>` | Transcribe and diarize. FluidAudio (Parakeet TDT v3) on Apple Neural Engine, 131x realtime on a 2-hour clip, word-level timestamps, 25 EU languages, optional Qwen3-ASR for zh/ja/ko/vi, optional active reading via MCP sampling. |
| `nab watch add <url>` | Monitor a URL and push notifications via subscribable MCP resources. RSS for the entire web. Conditional GETs, semantic diff, adaptive backoff. |
| `nab models fetch <name>` | Persistent install of inference model binaries. Currently `fluidaudio`. Whisper and sherpa-onnx land in Phase 3. |
| `nab-mcp` | MCP 2025-11-25 server. stdio + Streamable HTTP. 12 tools, 4 prompts, 2+N resources, structured logging, sampling, roots, elicitation. |
| `nab::content::ocr` | Apple Vision OCR engine. 15 languages. Apple Neural Engine accelerated. ~10-50 ms per image. macOS only. |

## Installation

### Homebrew (macOS, recommended)

```bash
brew tap MikkoParkkola/tap
brew install nab
```

### From crates.io

```bash
cargo install nab
```

Requires Rust 1.93 or newer.

### Pre-built binary

```bash
cargo binstall nab
```

Or download directly from [GitHub Releases](https://github.com/MikkoParkkola/nab/releases):

| Platform | Binary |
|----------|--------|
| macOS Apple Silicon | `nab-aarch64-apple-darwin` |
| macOS Intel | `nab-x86_64-apple-darwin` |
| Linux x86_64 | `nab-x86_64-unknown-linux-gnu` |
| Linux ARM64 | `nab-aarch64-unknown-linux-gnu` |
| Windows x64 | `nab-x86_64-pc-windows-msvc.exe` |

### From source

```bash
git clone https://github.com/MikkoParkkola/nab.git
cd nab
cargo install --path .
```

## MCP Configuration

Add to your MCP client config (Claude Desktop, Cursor, Windsurf, etc.):

```json
{
  "mcpServers": {
    "nab": {
      "command": "nab-mcp"
    }
  }
}
```

Or use the auto-installer:

```bash
nab mcp install                        # Claude Desktop (default)
nab mcp install --client claude-code   # Claude Code
nab mcp install --client cursor        # Cursor
nab mcp install --client windsurf      # Windsurf
nab mcp install --client codex         # OpenAI Codex CLI
nab mcp install --client vscode        # VS Code Copilot
nab mcp install --client zed           # Zed
nab mcp install --dry-run              # preview without writing
```

Also supported: `gemini`, `amazon-q`, `lm-studio`.

See [MCP integration](#mcp-integration) below for the full list of tools, capabilities, and HTTP transport.

## Usage

### Fetch

```bash
# Basic fetch — auto-detects browser, returns markdown
nab fetch https://example.com

# Use cookies from a specific browser
nab fetch https://github.com/notifications --cookies brave

# 1Password auto-login (TOTP/MFA supported)
nab fetch https://internal.company.com --1password

# Google Workspace (Docs, Sheets, Slides) with comments
nab fetch --cookies brave "https://docs.google.com/document/d/DOCID/edit"

# Query-focused extraction — only sections relevant to "authentication"
nab fetch https://docs.example.com --focus "authentication" --max-tokens 2000

# Output JSON with confidence scores
nab fetch https://example.com --format json

# Batch fetch with parallelism
nab fetch --batch urls.txt --parallel 8
```

Common flags for `fetch`:

| Flag | Description |
|------|-------------|
| `--cookies <browser>` | `auto`, `brave`, `chrome`, `firefox`, `safari`, `edge`, `none` |
| `--1password` / `--op` | 1Password credential lookup + auto-login |
| `--proxy <url>` | HTTP or SOCKS5 proxy |
| `--format <fmt>` | `full` (default), `compact`, `json` |
| `--focus <query>` | BM25-lite query-focused extraction |
| `--max-tokens <n>` | Structure-aware token budget |
| `--raw-html` | Skip markdown conversion |
| `--diff` | Show what changed since the last fetch |
| `--session <name>` | Persistent named session with encrypted cookie store (memory-only on Windows for now) |
| `-X <method>` `-d <data>` | HTTP method + body |
| `-o <path>` | Write body to file |
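The `--focus` and `--max-tokens` flags are described above only as "BM25-lite" extraction and a "structure-aware token budget". The sketch below shows the general idea using the textbook BM25 formula: score each section against the query, then keep the best sections that fit the budget. It is not nab's implementation. `tokenize`, `bm25_scores`, and `select_within_budget` are hypothetical names, and counting whitespace-separated words as tokens is an assumption made to keep the example dependency-free.

```rust
/// Lowercase a text and split it into alphanumeric tokens.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(str::to_lowercase)
        .collect()
}

/// Score every section against the query with textbook BM25 (k1 = 1.2, b = 0.75).
/// Hypothetical helper for illustration; not part of nab's API.
fn bm25_scores(sections: &[&str], query: &str) -> Vec<f64> {
    let (k1, b) = (1.2_f64, 0.75_f64);
    let docs: Vec<Vec<String>> = sections.iter().map(|s| tokenize(s)).collect();
    let n = docs.len() as f64;
    let total_len: usize = docs.iter().map(Vec::len).sum();
    let avg_len = (total_len as f64 / n.max(1.0)).max(1.0);
    let terms = tokenize(query);

    docs.iter()
        .map(|doc| {
            terms
                .iter()
                .map(|term| {
                    // Document frequency: how many sections contain the term.
                    let df = docs.iter().filter(|d| d.contains(term)).count() as f64;
                    let idf = ((n - df + 0.5) / (df + 0.5) + 1.0).ln();
                    // Term frequency within this section.
                    let tf = doc.iter().filter(|t| *t == term).count() as f64;
                    let norm = 1.0 - b + b * doc.len() as f64 / avg_len;
                    idf * tf * (k1 + 1.0) / (tf + k1 * norm)
                })
                .sum::<f64>()
        })
        .collect()
}

/// Keep the best-scoring sections that fit the budget, then restore document order.
/// Token cost is approximated by whitespace-separated word count (an assumption).
fn select_within_budget<'a>(sections: &[&'a str], query: &str, max_tokens: usize) -> Vec<&'a str> {
    let scores = bm25_scores(sections, query);
    let mut order: Vec<usize> = (0..sections.len()).collect();
    order.sort_by(|&x, &y| scores[y].total_cmp(&scores[x]));

    let mut used = 0;
    let mut kept = Vec::new();
    for i in order {
        let cost = sections[i].split_whitespace().count();
        // Skip sections that do not match the query at all or would overflow the budget.
        if scores[i] > 0.0 && used + cost <= max_tokens {
            used += cost;
            kept.push(i);
        }
    }
    kept.sort_unstable();
    kept.into_iter().map(|i| sections[i]).collect()
}
```

Calling `select_within_budget(&sections, "authentication", 2000)` on a page split into sections would return only the matching sections, in their original order, much as the `--focus "authentication" --max-tokens 2000` example above is described.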

### Analyze

`nab analyze` transcribes audio and video files locally. The default backend on macOS arm64 is FluidAudio, which runs Parakeet TDT v3 on the Apple Neural Engine.

```bash
# Download the ASR model (~600 MB, one-time)
nab models fetch fluidaudio

# Transcribe a video
nab analyze interview.mp4

# Add speaker diarization (PyAnnote community-1)
nab analyze interview.mp4 --diarize

# Force a language hint (BCP-47)
nab analyze podcast.mp3 --language fi

# Word-level timestamps
nab analyze talk.mp4 --word-timestamps

# Active reading: nab uses MCP sampling to look up references mentioned in the audio
nab analyze interview.mp4 --active-reading

# Expose speaker embeddings for matching against hebb's voiceprint database
nab analyze interview.mp4 --diarize --include-embeddings

# Output JSON
nab analyze podcast.mp3 --format json
```

Real numbers from a 2 h 09 m English audio file (Karen Hao interview, MacBook Pro M-series):

| Metric | Value |
|--------|-------|
| Wall time | 59.6 s |
| Realtime factor | 131x |
| FluidAudio mean confidence | 97.18 % |
| Audio extraction (ffmpeg) | ~650x realtime |

| Backend | Platform | Languages | Diarization |
|---------|----------|-----------|-------------|
| `fluidaudio` (default on macOS arm64) | macOS arm64 | 25 EU languages, +zh/ja/ko/vi via Qwen3-ASR (opt-in) | PyAnnote community-1 |
| `sherpa-onnx` (Phase 3) | Linux/x86, macOS, Windows | Parakeet ONNX, 25+ langs | sherpa-onnx pyannote-seg-3.0 |
| `whisper-rs` (Phase 3) | Universal fallback | whisper-large-v3-turbo, 99 langs | none |

### Watch

`nab watch` turns any URL into a subscribable resource. MCP clients receive `notifications/resources/updated` when the content changes.

```bash
nab watch add https://news.ycombinator.com --interval 10m
nab watch add https://example.com/pricing --interval 1h --selector "table.pricing"
nab watch add https://api.openai.com/status --interval 5m --notify-on regression
nab watch list
nab watch logs <id>
nab watch remove <id>
```
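For orientation, the `notifications/resources/updated` message mentioned above is a JSON-RPC notification whose `params` carry the URI of the changed resource. The sketch below builds such a message by hand. `resource_updated` is a hypothetical helper, and the `nab://watch/1` URI in the usage note is an assumption: this README does not state the URI scheme nab uses for watch resources.

```rust
/// Build a `notifications/resources/updated` JSON-RPC message for one resource URI.
/// Hypothetical helper for illustration; not part of nab's API.
fn resource_updated(uri: &str) -> String {
    // Escape the two characters that would break a JSON string literal.
    // Control characters are not handled in this sketch.
    let escaped = uri.replace('\\', "\\\\").replace('"', "\\\"");
    format!(
        r#"{{"jsonrpc":"2.0","method":"notifications/resources/updated","params":{{"uri":"{escaped}"}}}}"#
    )
}
```

For example, `resource_updated("nab://watch/1")` yields a single-line JSON object with `method` set to `notifications/resources/updated` and `params.uri` set to the given URI. A real server would use a JSON library rather than string formatting.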

Per-watch options:

| Flag | Default | Description |
|------|---------|-------------|
| `--interval <duration>` | 1h | Polling interval (`5m`, `1h`, `24h`) |
| `--selector <css>` | none | CSS selector to scope diff to one element |
| `--notify-on <kind>` | `any` | `any`, `regression`, `semantic` |
| `--diff <kind>` | `semantic` | `text`, `semantic`, `dom` |
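The `--interval` values shown in the table (`5m`, `1h`, `24h`) are a number followed by a unit suffix. A minimal sketch of parsing that shape into a `std::time::Duration` follows. `parse_interval` is a hypothetical name, and the `s` and `d` suffixes are assumptions; the README only shows minutes and hours.

```rust
use std::time::Duration;

/// Parse strings such as "5m", "1h", or "24h" into a Duration.
/// Hypothetical helper; the units nab itself accepts may differ.
fn parse_interval(s: &str) -> Option<Duration> {
    let s = s.trim();
    let unit = s.chars().last()?;
    // Everything before the unit suffix must be a non-negative integer.
    let value: u64 = s[..s.len() - unit.len_utf8()].parse().ok()?;
    let secs_per_unit = match unit {
        's' => 1,      // assumed unit
        'm' => 60,
        'h' => 3_600,
        'd' => 86_400, // assumed unit
        _ => return None,
    };
    value.checked_mul(secs_per_unit).map(Duration::from_secs)
}
```

Returning `None` for anything unrecognized keeps malformed input such as `10x` or an empty string from silently becoming a zero-length interval.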

The poller uses conditional GETs (`If-None-Match`, `If-Modified-Since`), so 304 responses cost effectively nothing. Watches with five consecutive failures auto-mute. Adaptive backoff applies on 429 and 503.
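The polling rules in the paragraph above can be pictured as a small state machine driven by the HTTP status code. This is a sketch under stated assumptions, not nab's code: `WatchState`, `Outcome`, and `record` are hypothetical names, doubling the delay is one possible backoff policy, the 24-hour cap is invented for the example, and treating 429/503 as separate from "failures" is an assumption. Only the five-failure mute threshold and the 304/429/503 behavior come from the text.

```rust
use std::time::Duration;

const MUTE_AFTER_FAILURES: u32 = 5; // from the paragraph above
const MAX_BACKOFF: Duration = Duration::from_secs(24 * 60 * 60); // assumed cap

#[derive(Debug, PartialEq)]
enum Outcome {
    NotModified, // 304: the conditional GET matched, nothing to diff
    Fetched,     // 2xx: a body arrived and still needs diffing
    BackedOff,   // 429 or 503: slow down
    Failed,      // anything else counts toward auto-mute
}

/// Hypothetical per-watch polling state.
#[derive(Debug)]
struct WatchState {
    base_interval: Duration,
    next_delay: Duration,
    consecutive_failures: u32,
    muted: bool,
}

impl WatchState {
    fn new(base_interval: Duration) -> Self {
        Self { base_interval, next_delay: base_interval, consecutive_failures: 0, muted: false }
    }

    /// Update the state from one poll's HTTP status and classify the result.
    fn record(&mut self, status: u16) -> Outcome {
        match status {
            304 => { self.recover(); Outcome::NotModified }
            200..=299 => { self.recover(); Outcome::Fetched }
            429 | 503 => {
                // Double the delay until the cap is reached.
                self.next_delay = (self.next_delay * 2).min(MAX_BACKOFF);
                Outcome::BackedOff
            }
            _ => {
                self.consecutive_failures += 1;
                self.muted = self.consecutive_failures >= MUTE_AFTER_FAILURES;
                Outcome::Failed
            }
        }
    }

    /// A successful poll clears the failure streak and restores the base interval.
    fn recover(&mut self) {
        self.consecutive_failures = 0;
        self.next_delay = self.base_interval;
    }
}
```

A poller built this way would send `If-None-Match` and `If-Modified-Since` with the validators saved from the previous response, call `record` with the status it gets back, and sleep for `next_delay` before the next attempt.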

### Models

```bash
nab models list                           # show installed model versions
nab models fetch fluidaudio               # download FluidAudio binary + Parakeet weights
nab models update fluidaudio              # check for upstream updates
nab models verify fluidaudio              # checksum + smoke test
```

Phase 3 will add `whisper` and `sherpa-onnx` subcommands.

## MCP integration

`nab-mcp` is a native Rust MCP server. It runs over stdio (default) or Streamable HTTP. It is fully compliant with MCP protocol version `2025-11-25`.

### Quick setup (recommended)

```bash
nab mcp install                        # Claude Desktop (default)
nab mcp install --client claude-code   # Claude Code
nab mcp install --client cursor        # Cursor
nab mcp install --client windsurf      # Windsurf
nab mcp install --client codex         # OpenAI Codex CLI
nab mcp install --client vscode        # VS Code Copilot
nab mcp install --client zed           # Zed
nab mcp install --dry-run              # preview what would change
```

Also supported: `gemini`, `amazon-q`, `lm-studio`. The installer auto-detects the `nab-mcp` binary path, backs up your existing config, and adds the `nab` entry. Restart your client after installing.

### Manual setup

Add to your MCP client configuration (`~/.config/claude/mcp.json` or equivalent):

```json
{
  "mcpServers": {
    "nab": {
      "command": "nab-mcp"
    }
  }
}
```

### HTTP transport

```bash
nab mcp serve --http 127.0.0.1:8765
# or directly:
nab-mcp --http 127.0.0.1:8765
```

Bind to localhost by default. Origin checks and `MCP-Protocol-Version` header validation are enforced per spec.
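As a rough picture of what those two checks involve, the sketch below rejects a request whose `Origin` is not a loopback host and a request whose `MCP-Protocol-Version` is not in a supported list. It is an illustration, not nab's implementation: `check_headers`, the loopback-only origin policy, the version list, and the choice to accept a missing version header are all assumptions, and the status codes follow my reading of the MCP Streamable HTTP transport spec.

```rust
/// Protocol versions this hypothetical server accepts.
const SUPPORTED_VERSIONS: &[&str] = &["2025-11-25", "2025-06-18", "2025-03-26"];

/// Return the HTTP status to reject with, or `None` to accept the request.
/// Hypothetical helper for illustration; not part of nab's API.
fn check_headers(origin: Option<&str>, protocol_version: Option<&str>) -> Option<u16> {
    if let Some(origin) = origin {
        let without_scheme = origin
            .strip_prefix("http://")
            .or_else(|| origin.strip_prefix("https://"))
            .unwrap_or(origin);
        // Drop a trailing ":port" but leave a bare IPv6 literal such as "[::1]" intact.
        let host = match without_scheme.rsplit_once(':') {
            Some((h, port)) if !port.is_empty() && port.chars().all(|c| c.is_ascii_digit()) => h,
            _ => without_scheme,
        };
        // Assumed policy: only loopback origins may talk to a locally bound server.
        if !matches!(host, "localhost" | "127.0.0.1" | "[::1]") {
            return Some(403);
        }
    }
    match protocol_version {
        // A version header that is present but unsupported is a bad request.
        Some(v) if !SUPPORTED_VERSIONS.contains(&v) => Some(400),
        // A missing header is accepted in this sketch.
        _ => None,
    }
}
```

The origin check is what protects a server bound to `127.0.0.1` from DNS rebinding: a page on another site can reach the port through the browser, but its `Origin` header will not be a loopback host.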

### MCP capabilities

| Capability | Status |
|-----------|--------|
| Tools | 12 tools with structured output schemas, annotations, validation errors |
| Prompts | 4 prompts (`fetch-and-extract`, `multi-page-research`, `authenticated-fetch`, `match-speakers-with-hebb`) |
| Resources | 2 static + N dynamic watch resources, all subscribable |
| Logging | `notifications/message` with RFC 5424 levels |
| Sampling | nab calls back to the host LLM for active reading, focus extraction, form auto-fill |
| Roots | `roots/list` queried for workspace-scoped saves |
| Elicitation | Form mode + URL mode for OAuth/SSO |
| Argument completion | `completion/complete` for tool args |
| Server icons | Light + dark SVG |
| Transports | stdio + Streamable HTTP (resumable, session-scoped) |

The 12 MCP tools:

| Tool | Description |
|------|-------------|
| `fetch` | Fetch URL → markdown, with cookies, focus, token budget, session |
| `fetch_batch` | Parallel multi-URL fetch with task-augmented async execution |
| `submit` | Submit a form with CSRF + smart field extraction |
| `login` | 1Password auto-login with TOTP support |
| `auth_lookup` | Look up 1Password credentials for a URL |
| `fingerprint` | Generate browser fingerprint profiles |
| `validate` | Run the validation test suite |
| `benchmark` | Time URL fetches with stats |
| `analyze` | Transcribe and diarize audio/video |
| `watch_create` | Create a URL watch and subscribe |
| `watch_list` / `watch_remove` | Manage watches |

## Site providers

nab detects URLs for 11 platforms and uses their APIs or structured data instead of scraping HTML.

| Provider | URL pattern | Method |
|----------|-------------|--------|
| Twitter / X | `x.com/*/status/*` | FxTwitter API |
| Reddit | `reddit.com/r/*/comments/*` | JSON API |
| Hacker News | `news.ycombinator.com/item?id=*` | Firebase API |
| GitHub | `github.com/*/*/issues/*`, `*/pull/*` | REST API |
| Google Workspace | Docs, Sheets, Slides | Export API + OOXML |
| YouTube | `youtube.com/watch?v=*`, `youtu.be/*` | oEmbed |
| Wikipedia | `*.wikipedia.org/wiki/*` | REST API |
| StackOverflow | `stackoverflow.com/questions/*` | API |
| Mastodon | `*/users/*/statuses/*` | ActivityPub |
| LinkedIn | `linkedin.com/posts/*` | oEmbed |
| Instagram | `instagram.com/p/*`, `*/reel/*` | oEmbed |

If no provider matches, nab falls back to standard HTML fetch + markdown conversion.
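The dispatch described in this section, matching a URL against known patterns and falling back to plain HTML fetching, might look roughly like the sketch below. It covers four of the patterns from the table using plain string matching. `Provider` and `detect_provider` are hypothetical names, and stripping a leading `www.` is an assumption; nab's actual matching logic is not shown in this README.

```rust
#[derive(Debug, PartialEq)]
enum Provider {
    HackerNews,
    Reddit,
    Wikipedia,
    YouTube,
}

/// Return the provider whose URL pattern matches, or `None` to signal
/// the generic HTML fetch + markdown conversion fallback.
/// Hypothetical helper for illustration; not part of nab's API.
fn detect_provider(url: &str) -> Option<Provider> {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;
    let rest = rest.strip_prefix("www.").unwrap_or(rest);
    let (host, path) = rest.split_once('/').unwrap_or((rest, ""));

    match host {
        "news.ycombinator.com" if path.starts_with("item?id=") => Some(Provider::HackerNews),
        "reddit.com" if path.starts_with("r/") && path.contains("/comments/") => {
            Some(Provider::Reddit)
        }
        "youtube.com" if path.starts_with("watch?v=") => Some(Provider::YouTube),
        "youtu.be" if !path.is_empty() => Some(Provider::YouTube),
        h if h.ends_with(".wikipedia.org") && path.starts_with("wiki/") => {
            Some(Provider::Wikipedia)
        }
        _ => None,
    }
}
```

Note that the Hacker News front page does not match `item?id=*`, so `detect_provider("https://news.ycombinator.com")` returns `None` and the caller would take the generic path.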

## Architecture

nab is built around a small set of orthogonal subsystems: `cmd/` (CLI), `bin/mcp_server/` (MCP server), `content/` (HTML / PDF / OCR pipeline), `analyze/` (ASR + diarization + vision), `watch/` (URL monitoring + subscriptions), `auth/` (cookies + 1Password + WebAuthn), `site/` (per-site providers), and the shared `AcceleratedClient` (HTTP/3 + connection pool + fingerprint store).

See:

- [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md): full module map and data flow
- [docs/sovereign-stack.md](docs/sovereign-stack.md): how nab composes with hebb to form a local-first multimodal stack
- [docs/getting-started.md](docs/getting-started.md): new user onboarding

### Design notes

The `docs/design/` directory tracks recent design proposals:

- [analyze-v2.md](docs/design/analyze-v2.md): multilingual ASR + diarization + vision pipeline
- [url-watch-resources.md](docs/design/url-watch-resources.md): URL watch as MCP subscribable resources
- [active-reading.md](docs/design/active-reading.md): active reading via MCP sampling
- [mcp-spec-closure.md](docs/design/mcp-spec-closure.md): closing the last MCP 2025-11-25 spec gaps

## Companion tools

nab is half of a sovereign multimodal stack. The other half is [hebb](https://github.com/MikkoParkkola/hebb), a neuroscience-inspired memory MCP server. Composition examples:

- `nab analyze --diarize --include-embeddings` → `hebb voice_match` → speakers labeled with names
- `nab fetch URL` → `hebb kv_set` → personal sovereign web memory
- `nab watch add URL` → `hebb kv_set` (on update) → time-series of changes to any web page

See [docs/sovereign-stack.md](docs/sovereign-stack.md) for the full composition story.

## Configuration

nab requires no configuration files. It uses smart defaults: auto-detected browser cookies, randomized fingerprints, and markdown output.

Persistent state lives in `~/.nab/`:

| Path | Purpose |
|------|---------|
| `~/.nab/snapshots/` | Content snapshots for `--diff` change detection |
| `~/.nab/sessions/` | AES-256-GCM encrypted named-session jars (non-Windows) |
| `~/.nab/session-key` | Locally generated master key for session encryption (non-Windows) |
| `~/.nab/fingerprint_versions.json` | Cached browser versions (auto-updates every 14 days) |
| `~/.local/share/nab/watches/` | URL watch state |
| `~/.local/share/nab/models/` | Installed inference model binaries |

Optional plugin configuration at `~/.config/nab/plugins.toml`. See [docs/getting-started.md](docs/getting-started.md) for plugin examples.

### Environment variables

| Variable | Purpose |
|----------|---------|
| `HTTPS_PROXY` / `https_proxy` | HTTPS proxy URL |
| `HTTP_PROXY` / `http_proxy` | HTTP proxy URL |
| `ALL_PROXY` / `all_proxy` | Proxy for all protocols |
| `RUST_LOG` | Logging level (e.g., `nab=debug`) |
| `PUSHOVER_USER` / `PUSHOVER_TOKEN` | Pushover notifications for MFA |
| `TELEGRAM_BOT_TOKEN` / `TELEGRAM_CHAT_ID` | Telegram notifications for MFA |

## Library usage

```rust
use nab::AcceleratedClient;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let client = AcceleratedClient::new()?;
    let html = client.fetch_text("https://example.com").await?;
    println!("Fetched {} bytes", html.len());
    Ok(())
}
```

## Requirements

- **Rust 1.93+** for building from source
- **ffmpeg** for `analyze` and `stream` commands: `brew install ffmpeg`
- **1Password CLI** (optional, for credential integration): see [1Password docs](https://developer.1password.com/docs/cli/get-started/)

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup, code style guidelines, testing instructions, and pull request process.

## Responsible use

This tool includes browser cookie extraction and fingerprint spoofing capabilities. They are intended for legitimate use cases — accessing your own authenticated content, automated testing, sites where you have authorization. Use responsibly.

## Troubleshooting

**MCP server not connecting?** Run `nab-mcp` directly in your terminal to see errors. Verify the binary exists with `which nab-mcp`. If installed via `cargo install nab`, both `nab` and `nab-mcp` should be on your `$PATH`.

**Cookie extraction failing?** Grant Full Disk Access to your terminal in **System Settings > Privacy & Security > Full Disk Access** (macOS). Browser cookies are stored in protected directories. Use `--cookies brave` to target a specific browser.

**ASR model not found?** Run `nab models fetch fluidaudio` to download the model (~542 MB). The model directory is `~/.local/share/nab/models/`. Use `nab models list` to see what's installed.

**Fetch returning HTML instead of markdown?** Some sites block automated access. Try `nab fetch URL --cookies brave` to use your browser session, or `nab fetch URL --1password` for sites that need login.

**"too many open files" on watch?** Increase your ulimit: `ulimit -n 4096`. The default macOS limit (256) is too low for many concurrent watches.

## Ecosystem

nab is part of a suite of MCP tools:

| Tool | Description |
|------|-------------|
| [mcp-gateway](https://github.com/MikkoParkkola/mcp-gateway) | Universal MCP gateway: compact 12-15 tool surface replaces 100+ registrations |
| [trvl](https://github.com/MikkoParkkola/trvl) | AI travel agent: 36 MCP tools for flights, hotels, ground transport |
| **[nab](https://github.com/MikkoParkkola/nab)** | **Web content extraction: fetch any URL with cookies + anti-bot bypass** |
| [axterminator](https://github.com/MikkoParkkola/axterminator) | macOS GUI automation: 34 MCP tools via Accessibility API |

## License

MIT — see [LICENSE](LICENSE).