# Code Review: pcap-toolkit
## Architecture Overview
Well-structured streaming tool with clean separation of concerns: PCAP I/O (`pcap/`),
filtering (`filter.rs`, `bpf.rs`), two-pass sorting (`sort/`), multi-format export
(`export/`), and replay (`replay/`). The `Filter` → `FilterRule` trait-based composition
via `impl_filter_body!` is elegant. The two-thread export pipeline (bounded
`sync_channel` at `export/mod.rs:208`) and parallel first-pass via Rayon are the right
tools for those jobs.
---
## Correctness Issues
### ✅ 1. Sidecar file leak on partial first-pass failure — `sort/mod.rs:97–105`
```rust
let results: Vec<Result<...>> = inputs.par_iter().map(|p| first_pass(...)).collect();
let mut headers_and_stores = Vec::with_capacity(inputs.len());
for r in results {
headers_and_stores.push(r?); // ← exits early on first error
}
```
If N first passes succeed and one fails, the `?` short-circuits the loop. The sidecar
`.idx` files from the successful passes are never cleaned up. The existing cleanup block
at lines 112–122 only covers the link-type mismatch case. Fix: collect sidecar paths
before iterating results, clean them up in an error branch.
### ✅ 2. TCP data-offset < 5 not rejected in truncation — `transform.rs:543`
```rust
PROTO_TCP => {
if data.len() < ts + 20 { return (origlen, false); }
((data[ts + 12] >> 4) * 4) as usize
}
```
A corrupt data-offset field < 5 (meaning < 20 bytes) produces a `transport_hdr_len`
smaller than the minimum TCP header, making `payload_start` point into the header
itself. The bounds check guards against total buffer underflow but not against invalid
header lengths. Add: `if transport_hdr_len < 20 { return (origlen, false); }`.
### ✅ 3. `src_port`/`dst_port` are `Some(0)` for ICMP in export records — `export/mod.rs:343–355`
`FlowKey::new` always fills `src_port = 0` / `dst_port = 0` for non-TCP/UDP
(see `filter.rs:402`). The export record ends up with `src_port: Some(0)` for ICMP
packets, which is misleading. Downstream consumers need out-of-band protocol knowledge
to know whether port fields are meaningful. Emit `None` when `protocol ∉ {6, 17}`:
```rust
let src_port = if matches!(key.protocol, 6 | 17) { Some(key.src_port) } else { None };
let dst_port = if matches!(key.protocol, 6 | 17) { Some(key.dst_port) } else { None };
```
### ✅ 4. Bare integer epoch silently treated as milliseconds — `filter.rs:300–302`
```rust
if let Ok(ms) = s.parse::<u64>() {
return Ok(ms.saturating_mul(1_000_000));
}
```
A user who passes a Unix epoch in **seconds** (the dominant convention) gets a timestamp
1000× too far in the future with no error or warning. The behaviour is documented, but
it is surprising enough to warrant a louder note in the CLI help text and the error
message when `--from`/`--to` filters produce zero matches.
---
## Performance
### ✅ 5. `flow_ids` stored as `Vec`, O(n) scan per packet — `filter.rs:601`
```rust
if !b.flow_ids().contains(&id) {
return false;
}
```
`resolve_min_flow_filter` builds a `HashSet<u64>` internally but then converts it to a
`Vec` at `main.rs:648`. For large allowlists (common with `--min-flow-packets`), every
packet pays a linear scan. Keep the qualifying set as a `HashSet<u64>` in `Filter`, or
at minimum sort the vec and use binary search.
### ✅ 6. `Pps` timing overflow at high rates — `replay/mod.rs:127`
```rust
let target_ns = sent_count * 1_000_000_000 / pps;
```
`sent_count * 1_000_000_000` overflows `u64` after ~18.4 billion packets. At 1 Mpps
that is roughly 5 hours of replay, after which the computed target wraps to zero and
pacing breaks silently. Use `u128` for the intermediate product or apply saturating
arithmetic.
---
## Code Quality
### ✅ 7. Three nearly-identical filter builders — `main.rs:177–606`
`build_filter`, `build_export_filter`, and `build_replay_filter` are ~120 lines of
copy-paste that differ only in their argument struct type. A future filter field added
to one will silently be missing from the others (this has nearly happened with
`min_len`/`max_len`). Unify with a shared trait or a free function taking the common
fields as parameters.
### ✅ 8. `--format` silently applies only to the first `--output` — `main.rs:339–347`
```rust
let format = if i == 0 {
args.format.as_deref().and_then(ExportFormat::parse)
.or_else(|| ExportFormat::from_extension(path))
} else {
ExportFormat::from_extension(path)
};
```
If a user passes `--format parquet --output a.pq --output b.pq`, only `a.pq` receives
the explicit format. The behaviour is labelled "backward compat" but is not documented
in the CLI help. Add a note to the `--format` help text.
---
## Missing Tests
### 9. No test for `--min-flow-packets`
`resolve_min_flow_filter` (`main.rs:616`) is the most recently added complex feature.
It has no direct test. The pre-scan + intersection logic is non-trivial and deserves an
integration test that verifies flows with fewer than N packets are excluded from the
output.
### 10. No test for time-slice boundary and empty-slice cases
`SlicedWriter` rotates output files on time boundaries. There are no tests verifying
that a packet exactly on a boundary goes to the expected file, or that an empty time
window does not produce a zero-byte output file.
### 11. No integration tests for compound BPF expressions
`bpf.rs` implements a pure-Rust libpcap-compatible parser. Tests for compound
expressions (`tcp and port 443`, `not udp`, nested `or`/`and`) would catch regressions
in precedence or short-circuit evaluation.
---
## Security
### 12. `unsafe` in `open_raw_socket` is correct — `replay/mod.rs:222–235`
The `unsafe` block uses `SockAddr::try_init` to write a `SockAddrLl` into a zeroed
storage buffer. The struct is `#[repr(C)]`, all fields are initialised, and `*len` is
set to `size_of::<SockAddrLl>()`. This is the canonical pattern for AF_PACKET sockets
without a `libc` dependency. No issues.
### 13. Output paths accept arbitrary destinations
If `--output` can be influenced by untrusted input, packets could be written to
arbitrary paths. Acceptable in the current trusted-operator context, but worth noting
if the tool is ever wrapped in an API or web frontend.
---
## Summary Table
| Bug | Sidecar `.idx` file leak on first-pass error | `sort/mod.rs:97–105` |
| Bug | TCP data-offset < 5 not rejected before truncation | `transform.rs:543` |
| Misleading | `src_port`/`dst_port` `Some(0)` for ICMP in export | `export/mod.rs:343` |
| Misleading | Bare integer epoch silently treated as milliseconds | `filter.rs:300` |
| Performance | `flow_ids` Vec gives O(n) lookup per packet | `filter.rs:601` |
| Performance | `Pps` timing overflows u64 after ~5 h at 1 Mpps | `replay/mod.rs:127` |
| Maintainability | Three identical filter builders (150 lines duplicated) | `main.rs:177–606` |
| Missing test | `--min-flow-packets` end-to-end behaviour | — |
| Missing test | Time-slice boundary and empty-window cases | — |
| Missing test | Compound BPF expression parsing and evaluation | — |
The codebase is well-structured and clearly written. Items 1 and 2 (sidecar leak and
data-offset check) are the most important to address before a 1.0 release.