# pcap-toolkit
A high-performance CLI for inspecting, filtering, sorting, modifying, replaying, and exporting PCAP captures — designed to handle everything from quick triage to TB-scale data pipeline ingestion.
## Table of Contents
- [Why pcap-toolkit?](#why-pcap-toolkit)
- [Features](#features)
- [Usage](#usage)
- [Configuration](#configuration)
- [Installation](#installation)
## Why pcap-toolkit?
`tcpdump` and `tshark` are powerful but stop short of the data engineering workflows that security analysts and threat hunters actually need: deterministic flow IDs for correlation, columnar export for DuckDB or Snowflake, timestamp shifting for lab replay, or sorting a months-long multi-file capture that doesn't fit in RAM.

`pcap-toolkit` fills that gap. Every operation streams packets with a minimal memory footprint and uses `Rayon` for multi-core throughput, so it stays fast whether your input is a 10 MB sample or a 2 TB archive.
## Features
### Inspection — `info` / `stats`
Extract a full capture summary in a single streaming pass, without loading payloads into RAM:
- Start and end timestamps (millisecond precision)
- Total packet count and byte volume
- Unique source and destination IPs
- Per-flow statistics keyed by 5-tuple `(src_ip, dst_ip, src_port, dst_port, protocol)`
- Deterministic **Flow ID** (`xxh3_64` hash) — bidirectional by default so A→B and B→A share one ID; `--unidirectional` for direction-aware keying
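
A bidirectional flow ID can be built by putting the two endpoints in a canonical order before hashing, so both directions produce the same key bytes. This is a minimal sketch under assumptions, not pcap-toolkit's actual key layout: the byte encoding and the names `flow_id` and `push_endpoint` are hypothetical, and it swaps in FNV-1a (std only) for `xxh3_64`, so the IDs it prints will not match the tool's.

```rust
use std::net::IpAddr;

/// FNV-1a (64-bit), a std-only stand-in for xxh3_64 in this sketch.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Append one endpoint: a family tag (4 or 6), the address octets, then the port.
fn push_endpoint(key: &mut Vec<u8>, ip: IpAddr, port: u16) {
    match ip {
        IpAddr::V4(v4) => {
            key.push(4);
            key.extend_from_slice(&v4.octets());
        }
        IpAddr::V6(v6) => {
            key.push(6);
            key.extend_from_slice(&v6.octets());
        }
    }
    key.extend_from_slice(&port.to_be_bytes());
}

/// Hash the 5-tuple. In bidirectional mode the smaller (ip, port) endpoint is written
/// first, so A→B and B→A produce identical key bytes and therefore the same ID.
fn flow_id(src: (IpAddr, u16), dst: (IpAddr, u16), protocol: u8, bidirectional: bool) -> u64 {
    let (a, b) = if bidirectional && dst < src { (dst, src) } else { (src, dst) };
    let mut key = Vec::with_capacity(2 * (1 + 16 + 2) + 1);
    push_endpoint(&mut key, a.0, a.1);
    push_endpoint(&mut key, b.0, b.1);
    key.push(protocol);
    fnv1a_64(&key)
}

fn main() {
    let client: IpAddr = "10.0.0.5".parse().unwrap();
    let server: IpAddr = "192.168.1.5".parse().unwrap();
    // IP protocol 6 is TCP.
    let forward = flow_id((client, 51_000), (server, 443), 6, true);
    let reverse = flow_id((server, 443), (client, 51_000), 6, true);
    assert_eq!(forward, reverse);
    println!("flow id: {forward:016x}");
}
```

Passing `bidirectional: false` keeps the original endpoint order, which corresponds to the `--unidirectional` mode.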
### Filtering
Composable filters applied after sorting, before any output or replay:

| Filter | Flag | Notes |
|---|---|---|
| Protocol | `--proto tcp,udp,icmp` | by name or IP protocol number |
| Source IP / CIDR | `--src-ip 10.0.0.0/8` | exact or prefix, IPv4 and IPv6 |
| Destination IP / CIDR | `--dst-ip 192.168.1.5` | |
| Either endpoint IP | `--ip 10.0.0.0/8` | OR across src and dst |
| Source port / range | `--src-port 1024-65535` | TCP and UDP only |
| Destination port / range | `--dst-port 443` | |
| Either endpoint port | `--port 80,443` | |
| Flow ID | `--flow-id <hex>` | one or more, comma-separated |
| Time window | `--from` / `--to` | RFC 3339 or ms epoch |
| TCP flags | `--tcp-flags SYN,RST` | exact or `any` match |
| Packet length | `--min-len` / `--max-len` | applied to captured length |
| BPF expression | `--filter "tcp and dst port 443"` | pure-Rust implementation, no libpcap required |

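
The CIDR and range forms in the table reduce to integer comparisons. A minimal sketch, assuming a bare address means an exact match (`/32` or `/128`) and that a rule never matches across address families; `Cidr` and `parse_port_range` are hypothetical names, not the tool's API.

```rust
use std::net::IpAddr;
use std::ops::RangeInclusive;

/// A parsed address rule such as `10.0.0.0/8`, `2001:db8::/32`, or `192.168.1.5`.
struct Cidr {
    network: IpAddr,
    prefix_len: u32,
}

impl Cidr {
    fn parse(s: &str) -> Option<Cidr> {
        let (addr, len) = match s.split_once('/') {
            Some((a, l)) => (a.parse::<IpAddr>().ok()?, Some(l.parse::<u32>().ok()?)),
            None => (s.parse::<IpAddr>().ok()?, None), // bare address: exact match
        };
        let max = if addr.is_ipv4() { 32 } else { 128 };
        let prefix_len = len.unwrap_or(max);
        if prefix_len > max {
            return None;
        }
        Some(Cidr { network: addr, prefix_len })
    }

    /// Compare only the first `prefix_len` bits of the network and the candidate.
    fn contains(&self, ip: IpAddr) -> bool {
        match (self.network, ip) {
            (IpAddr::V4(net), IpAddr::V4(ip)) => {
                // checked_shl returns None for a 32-bit shift (/0), which becomes an all-zero mask.
                let mask = u32::MAX.checked_shl(32 - self.prefix_len).unwrap_or(0);
                (u32::from(net) & mask) == (u32::from(ip) & mask)
            }
            (IpAddr::V6(net), IpAddr::V6(ip)) => {
                let mask = u128::MAX.checked_shl(128 - self.prefix_len).unwrap_or(0);
                (u128::from(net) & mask) == (u128::from(ip) & mask)
            }
            _ => false, // IPv4 rules never match IPv6 packets, and vice versa
        }
    }
}

/// Parse `443` or `1024-65535` into an inclusive port range.
fn parse_port_range(s: &str) -> Option<RangeInclusive<u16>> {
    match s.split_once('-') {
        Some((lo, hi)) => {
            let (lo, hi) = (lo.parse::<u16>().ok()?, hi.parse::<u16>().ok()?);
            if lo <= hi { Some(lo..=hi) } else { None }
        }
        None => {
            let port = s.parse::<u16>().ok()?;
            Some(port..=port)
        }
    }
}

fn main() {
    let rule = Cidr::parse("10.0.0.0/8").unwrap();
    println!("{}", rule.contains("10.20.30.40".parse().unwrap()));
    println!("{}", parse_port_range("1024-65535").unwrap().contains(&8080u16));
}
```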
Rules of the same type are OR-ed; different types are AND-ed. Full boolean control (`and` / `or` / `not`) is available in the TOML configuration.
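
The OR-within-type, AND-across-types rule can be sketched with one predicate per filter type. `Packet`, `Rule`, and `keep` are hypothetical names; the example mirrors `proto = ["tcp", "udp"]` and `dst_port = [443, 80]` from the configuration example, using IP protocol numbers 6 (TCP) and 17 (UDP).

```rust
/// Minimal hypothetical packet view with only the fields this example filters on.
struct Packet {
    proto: u8,
    dst_port: Option<u16>, // None for protocols without ports, e.g. ICMP
}

/// One filter type holding its alternative values.
enum Rule {
    Proto(Vec<u8>),
    DstPort(Vec<u16>),
}

impl Rule {
    /// Values of the same type are OR-ed: any one of them is enough.
    fn matches(&self, p: &Packet) -> bool {
        match self {
            Rule::Proto(protos) => protos.contains(&p.proto),
            Rule::DstPort(ports) => p.dst_port.map_or(false, |port| ports.contains(&port)),
        }
    }
}

/// Different types are AND-ed: every rule must match. No rules means keep everything.
fn keep(rules: &[Rule], p: &Packet) -> bool {
    rules.iter().all(|rule| rule.matches(p))
}

fn main() {
    // Mirrors proto = ["tcp", "udp"] and dst_port = [443, 80]; 6 = TCP, 17 = UDP.
    let rules = vec![Rule::Proto(vec![6, 17]), Rule::DstPort(vec![443, 80])];
    let https = Packet { proto: 6, dst_port: Some(443) };
    let ssh = Packet { proto: 6, dst_port: Some(22) };
    println!("https kept: {}, ssh kept: {}", keep(&rules, &https), keep(&rules, &ssh));
}
```

This prints `https kept: true, ssh kept: false`: the SSH packet passes the protocol rule but fails the port rule.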
### Two-Pass Sorting
Strict chronological ordering while holding only a compact index (about 20 bytes per packet), never the packet data itself:
1. **First pass** — build a `(timestamp_ns, byte_offset, length)` index. Kept in memory for normal files; streamed to a `.idx` sidecar on disk for TB-scale inputs (~20 MB index per 1 M packets).
2. **Second pass** — sort the index, then seek-and-stream packets in order to the output pipeline.
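
The two passes can be sketched against an in-memory capture. This assumes the classic little-endian, microsecond-resolution pcap format (24-byte global header, 16-byte record headers) and does not handle pcapng, nanosecond, or byte-swapped files. It collects the sorted packets into a `Vec` instead of streaming them to an output, and `IndexEntry`, `build_index`, and `read_sorted` are hypothetical names. Serialised, each entry is 8 + 8 + 4 = 20 bytes, which matches the per-packet figure quoted above.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// One index entry; serialised it takes 8 + 8 + 4 = 20 bytes.
#[derive(Clone, Copy, Debug)]
struct IndexEntry {
    timestamp_ns: u64,
    byte_offset: u64, // where this packet's data starts in the file
    length: u32,      // captured length in bytes
}

/// Pass 1: read only the 16-byte record headers and seek over the packet data.
fn build_index<R: Read + Seek>(r: &mut R) -> io::Result<Vec<IndexEntry>> {
    r.seek(SeekFrom::Start(24))?; // skip the 24-byte global header
    let mut index = Vec::new();
    let mut hdr = [0u8; 16];
    loop {
        match r.read_exact(&mut hdr) {
            Ok(()) => {}
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => break, // end of capture
            Err(e) => return Err(e),
        }
        let field = |i: usize| u32::from_le_bytes([hdr[i], hdr[i + 1], hdr[i + 2], hdr[i + 3]]);
        let (ts_sec, ts_usec, incl_len) = (field(0), field(4), field(8));
        index.push(IndexEntry {
            timestamp_ns: u64::from(ts_sec) * 1_000_000_000 + u64::from(ts_usec) * 1_000,
            byte_offset: r.stream_position()?,
            length: incl_len,
        });
        r.seek(SeekFrom::Current(i64::from(incl_len)))?;
    }
    Ok(index)
}

/// Pass 2: sort the index (stable, so equal timestamps keep file order), then seek to each packet.
fn read_sorted<R: Read + Seek>(r: &mut R, mut index: Vec<IndexEntry>) -> io::Result<Vec<Vec<u8>>> {
    index.sort_by_key(|e| e.timestamp_ns);
    let mut packets = Vec::with_capacity(index.len());
    for e in &index {
        r.seek(SeekFrom::Start(e.byte_offset))?;
        let mut data = vec![0u8; e.length as usize];
        r.read_exact(&mut data)?;
        packets.push(data); // a real pipeline would write each packet out here instead
    }
    Ok(packets)
}

/// Build one little-endian pcap record (header + data) for the demo.
fn record(ts_sec: u32, ts_usec: u32, data: &[u8]) -> Vec<u8> {
    let len = data.len() as u32;
    let mut rec = Vec::new();
    for field in [ts_sec, ts_usec, len, len] {
        rec.extend_from_slice(&field.to_le_bytes());
    }
    rec.extend_from_slice(data);
    rec
}

fn main() -> io::Result<()> {
    let mut file = vec![0u8; 24]; // global header placeholder: this sketch never parses it
    file.extend(record(20, 0, b"second"));
    file.extend(record(10, 500, b"first"));
    file.extend(record(30, 0, b"third"));
    let mut cursor = Cursor::new(file);
    let index = build_index(&mut cursor)?;
    for packet in read_sorted(&mut cursor, index)? {
        println!("{}", String::from_utf8_lossy(&packet));
    }
    Ok(())
}
```

Running it prints `first`, `second`, and `third`, one per line, even though the records were written out of order.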

Sorted output can be time-sliced into separate files (hourly, daily, or any custom interval).
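
Time slicing reduces to integer division of each timestamp by the slice length. A sketch, assuming slices are aligned to the Unix epoch (so `1h` slices start on the hour, UTC) rather than to the first packet, and that intervals are written as a number followed by `s`, `m`, `h`, or `d`; `parse_interval_ns` and `slice_index` are hypothetical names.

```rust
/// Parse an interval such as "30s", "15m", "1h", or "1d" into nanoseconds.
fn parse_interval_ns(s: &str) -> Option<u64> {
    let unit = s.chars().last()?;
    let count: u64 = s[..s.len() - unit.len_utf8()].parse().ok()?;
    let unit_ns: u64 = match unit {
        's' => 1_000_000_000,
        'm' => 60 * 1_000_000_000,
        'h' => 3_600 * 1_000_000_000,
        'd' => 86_400 * 1_000_000_000,
        _ => return None,
    };
    if count == 0 {
        return None; // a zero-length slice would divide by zero below
    }
    count.checked_mul(unit_ns)
}

/// Epoch-aligned slice number: timestamps in [k * interval, (k + 1) * interval) share slice k.
fn slice_index(timestamp_ns: u64, interval_ns: u64) -> u64 {
    timestamp_ns / interval_ns
}

fn main() {
    let hour = parse_interval_ns("1h").expect("valid interval");
    for ts_ns in [1_700_000_000_000_000_000u64, 1_700_003_600_000_000_000] {
        let k = slice_index(ts_ns, hour);
        // The slice's start time, e.g. usable in an output file name.
        println!("slice {k} starts at {} s since the epoch", k * hour / 1_000_000_000);
    }
}
```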
### Traffic Modification
Applied during the second pass, before writing or replaying:
- **Payload truncation** — `--max-payload-bytes N`: keep only the first N bytes of the application payload, preserving all Ethernet / IP / transport headers. Shrinks storage while retaining full header fidelity for analysis.
- **Timestamp shifting** — provide a target start datetime (ms epoch); every timestamp is shifted by the same delta (target start minus the capture's first timestamp), so inter-packet gaps are preserved. Useful for re-anchoring old captures to a lab timeline.
- **IP address mapping** — replace specific IPs with others (`--replace-ip 10.0.0.1=192.168.1.1`) or via a TOML mapping table. Checksums are automatically recomputed after any header change.
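
When an address changes, the IPv4 header checksum (RFC 1071) must be recomputed. The sketch below assumes a full recompute (an incremental update per RFC 1624 would also work) and covers only the IPv4 header: TCP and UDP checksums also include both addresses through a pseudo-header, so a complete rewrite has to update those too. `internet_checksum` and `remap_ipv4` are hypothetical names, and the mapping is a plain slice of `(from, to)` pairs standing in for `--replace-ip` or the TOML table.

```rust
use std::net::Ipv4Addr;

/// RFC 1071 Internet checksum over `data`, treated as big-endian 16-bit words.
fn internet_checksum(data: &[u8]) -> u16 {
    let mut sum: u64 = 0;
    for chunk in data.chunks(2) {
        let hi = chunk[0];
        let lo = if chunk.len() == 2 { chunk[1] } else { 0 }; // pad an odd trailing byte
        sum += u64::from(u16::from_be_bytes([hi, lo]));
    }
    while sum > 0xffff {
        sum = (sum & 0xffff) + (sum >> 16); // fold carries back into the low 16 bits
    }
    !(sum as u16)
}

/// Rewrite the source (offset 12) and destination (offset 16) addresses of an IPv4 header
/// according to `mapping`, then recompute the header checksum at offset 10.
/// `header` must be exactly the IPv4 header (IHL * 4 bytes). Returns true if anything changed.
fn remap_ipv4(header: &mut [u8], mapping: &[(Ipv4Addr, Ipv4Addr)]) -> bool {
    let mut changed = false;
    for offset in [12usize, 16] {
        let current =
            Ipv4Addr::new(header[offset], header[offset + 1], header[offset + 2], header[offset + 3]);
        if let Some((_, to)) = mapping.iter().find(|(from, _)| *from == current) {
            header[offset..offset + 4].copy_from_slice(&to.octets());
            changed = true;
        }
    }
    if changed {
        header[10] = 0; // the checksum field counts as zero while summing
        header[11] = 0;
        let checksum = internet_checksum(header);
        header[10..12].copy_from_slice(&checksum.to_be_bytes());
    }
    changed
}

fn main() {
    // 20-byte IPv4 header: TTL 64, protocol ICMP, 10.0.0.1 -> 10.0.0.2, checksum not yet set.
    let mut header = [
        0x45, 0x00, 0x00, 0x54, 0x00, 0x00, 0x40, 0x00, 0x40, 0x01, 0x00, 0x00,
        10, 0, 0, 1, 10, 0, 0, 2,
    ];
    let mapping = [(Ipv4Addr::new(10, 0, 0, 1), Ipv4Addr::new(192, 168, 1, 1))];
    remap_ipv4(&mut header, &mapping);
    // A header carrying a correct checksum sums to zero when re-checked.
    assert_eq!(internet_checksum(&header), 0);
    println!("new source: {}.{}.{}.{}", header[12], header[13], header[14], header[15]);
}
```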
### Export
Convert filtered, sorted captures into modern data formats:
- **JSON** — one document per packet with parsed layer fields, flow ID, and Base64/hex payload; optional Zstd payload compression.
- **Apache Parquet** — typed columnar schema (timestamps, IPs as integers, ports, flags, flow ID, payload). Row groups encoded in parallel with `Rayon`.
- **Apache Avro** — schema-first encoding; Avro schema file emitted alongside the data for self-describing datasets.

The exported files are intended for direct loading into DuckDB, Spark, and Snowflake, and the JSON output into Elasticsearch.
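
Storing IPs as integers turns a subnet filter on exported data into a plain range predicate. A sketch, assuming IPv4 is stored as an unsigned 32-bit value; the README does not give exact column types or names, so `src_ip` is hypothetical, and the printed DuckDB query simply reuses the Parquet path from the configuration example.

```rust
use std::net::Ipv4Addr;

/// IPv4 stored as an unsigned 32-bit column; IPv6 would use u128::from(Ipv6Addr) instead.
fn ipv4_to_column(ip: Ipv4Addr) -> u32 {
    u32::from(ip)
}

/// Inclusive integer range covered by a CIDR block, e.g. 10.0.0.0/8.
fn cidr_to_range(network: Ipv4Addr, prefix_len: u32) -> (u32, u32) {
    assert!(prefix_len <= 32, "IPv4 prefix length must be 0..=32");
    let mask = u32::MAX.checked_shl(32 - prefix_len).unwrap_or(0); // /0 gives an empty mask
    let low = u32::from(network) & mask;
    (low, low | !mask)
}

fn main() {
    let (low, high) = cidr_to_range(Ipv4Addr::new(10, 0, 0, 0), 8);
    // `src_ip` is a hypothetical column name.
    println!("SELECT count(*) FROM 'out/traffic.parquet' WHERE src_ip BETWEEN {low} AND {high};");
    println!("192.168.1.5 is stored as {}", ipv4_to_column(Ipv4Addr::new(192, 168, 1, 5)));
}
```

For `10.0.0.0/8` the query's bounds are `167772160` and `184549375`.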
### Live Replay
Send a processed capture back onto a network interface:
- Honour original inter-packet timing or apply a speed multiplier (`--speed 2.0`, `--speed max`)
- Set the replay interface via the CLI (`--interface`) or the TOML config (`[replay] interface`)
- Requires `CAP_NET_RAW`; missing capability is caught early with a clear error
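
Timing-faithful replay can be sketched by giving each packet an absolute deadline relative to a fixed start instant, with the capture's offsets divided by the speed multiplier. Scheduling against absolute deadlines, rather than sleeping for each gap in turn, keeps sleep overshoot from accumulating across packets. `Speed`, `send_offset`, and `replay` are hypothetical names; the multiplier is assumed to be positive, and the `send` callback stands in for the raw-socket write that needs `CAP_NET_RAW`.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical counterpart of `--speed 2.0` / `--speed max`. Multipliers must be > 0.
#[derive(Clone, Copy)]
enum Speed {
    Multiplier(f64),
    Max,
}

/// When a packet should leave, measured from the start of replay.
fn send_offset(first_ts_ns: u64, ts_ns: u64, speed: Speed) -> Duration {
    match speed {
        Speed::Max => Duration::ZERO, // no pacing: send as fast as possible
        Speed::Multiplier(m) => Duration::from_nanos(ts_ns.saturating_sub(first_ts_ns)).div_f64(m),
    }
}

/// Sleep until each packet's absolute deadline, then hand it to `send`.
fn replay(timestamps_ns: &[u64], speed: Speed, mut send: impl FnMut(usize)) {
    let first = match timestamps_ns.first() {
        Some(&ts) => ts,
        None => return,
    };
    let start = Instant::now();
    for (i, &ts) in timestamps_ns.iter().enumerate() {
        let deadline = start + send_offset(first, ts, speed);
        let now = Instant::now();
        if deadline > now {
            thread::sleep(deadline - now);
        }
        send(i); // placeholder for the raw-socket write
    }
}

fn main() {
    // Gaps of 10 ms and 20 ms replayed at 2x: sends at roughly 0, 5, and 15 ms.
    let timestamps = [0, 10_000_000, 30_000_000];
    let start = Instant::now();
    replay(&timestamps, Speed::Multiplier(2.0), |i| {
        println!("packet {i} sent at {:?}", start.elapsed());
    });
}
```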
## Usage
```sh
# Summarise a capture
pcap-toolkit info traffic.pcap
# Show per-flow statistics
pcap-toolkit stats traffic.pcap
# Filter to HTTPS traffic from a subnet and export to Parquet
pcap-toolkit export --proto tcp --dst-port 443 --src-ip 10.0.0.0/8 \
--format parquet --output out.parquet traffic.pcap
# Sort a large capture and split into hourly files
pcap-toolkit sort --slice 1h --output sorted/ traffic.pcap
# Shift timestamps so the capture starts now, then replay at 2× speed
pcap-toolkit replay --shift now --speed 2.0 --interface eth0 traffic.pcap
# Extract a specific flow by ID
pcap-toolkit export --flow-id a3f2c1b0e4d5... --format json traffic.pcap
# Use a BPF expression for complex filtering
pcap-toolkit export --filter "tcp and dst port 443 and src net 10.0.0.0/8" traffic.pcap
```
> Commands and flags are illustrative — see `pcap-toolkit --help` for the authoritative reference as the CLI stabilises.
## Configuration
All options are available as CLI flags or in a TOML config file for repeatable pipelines:
```toml
# pcap-toolkit.toml
[[input]]
path = "captures/*.pcap"
[sort]
enabled = true
slice = "1h"
[filter]
proto = ["tcp", "udp"]
dst_port = [443, 80]
src_ip = ["10.0.0.0/8"]
unidirectional = false # bidirectional flow IDs (default)
[[output]]
format = "parquet"
path = "out/traffic.parquet"
compress_payload = true
[[output]]
format = "json"
path = "out/traffic.json"
[replay]
interface = "eth0"
speed = 1.0
```
CLI flags take precedence over the config file.
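
One common way to implement this precedence is to load both sources into the same struct of `Option` fields and merge field by field. This is a sketch of that pattern, not pcap-toolkit's code: `ReplaySettings` and `merge` are hypothetical, the field names are borrowed from the `[replay]` table above, and a real implementation would fill the two structs from a CLI parser and a TOML deserializer.

```rust
/// Hypothetical settings struct; `None` means "not set by this source".
#[derive(Debug, Default, Clone, PartialEq)]
struct ReplaySettings {
    interface: Option<String>,
    speed: Option<f64>,
}

impl ReplaySettings {
    /// Field-by-field merge: a value set on the CLI wins, otherwise the config file's value is used.
    fn merge(cli: ReplaySettings, file: ReplaySettings) -> ReplaySettings {
        ReplaySettings {
            interface: cli.interface.or(file.interface),
            speed: cli.speed.or(file.speed),
        }
    }
}

fn main() {
    // As if read from the [replay] table in pcap-toolkit.toml.
    let file = ReplaySettings { interface: Some("eth0".into()), speed: Some(1.0) };
    // As if the user passed only `--speed 2.0` on the command line.
    let cli = ReplaySettings { speed: Some(2.0), ..Default::default() };
    let effective = ReplaySettings::merge(cli, file);
    println!("{effective:?}");
}
```

The interface comes from the file and the speed from the CLI, so the effective settings are `eth0` at `2.0`.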
## Installation
### Pre-built binaries
Download the latest binary for your platform from the
[releases page](https://codeberg.org/slundi/pcap-toolkit/releases).
### From crates.io
```sh
cargo install pcap-toolkit
```
### With Nix
```sh
nix run git+https://codeberg.org/slundi/pcap-toolkit.git
```
Or add it as a flake input to your NixOS or home-manager configuration:
```nix
inputs.pcap-toolkit.url = "git+https://codeberg.org/slundi/pcap-toolkit.git";
```
### From source
```sh
git clone https://codeberg.org/slundi/pcap-toolkit.git
cd pcap-toolkit
cargo install --path .
```