freeswitch-sofia-trace-parser 0.5.0

Parser for FreeSWITCH mod_sofia SIP trace dump files
Documentation
# FreeSWITCH Sofia Trace Parser — Developer Guide


## Project Type


This is a **library-first** crate. `src/bin/main.rs` is a sample (but complete) CLI implementation.
`Cargo.lock` is gitignored per Cargo convention for libraries.

## Library Code Rules


- **No `unwrap()`/`expect()`/`panic!()` in library code** outside of tests. Return
  `Result` or `Option` instead.
- **Binary-only dependencies must be feature-gated** behind the `cli` feature.
  Library consumers (`default-features = false`) must not pull in CLI deps.
- **CLI-only modules must not be `pub` in `lib.rs`.** Code used only by the binary
  lives under `src/bin/` (e.g. `src/bin/grep.rs`), not in the library.
- **No `pub mod` names that shadow `std`** (e.g. don't name a module `fmt`, `io`,
  `collections`).
- **Never expose dependency types in public signatures.** A dependency major-version
  bump becomes a semver break if its types leak into the public API.

## Build & Test Workflow


**Always run `cargo fmt` before every commit.** The pre-commit hook enforces
formatting, clippy, gitleaks, tests, and semver-checks.

```sh
cargo fmt
cargo check --no-default-features --message-format=short  # lib only
cargo check --message-format=short
cargo clippy --fix --allow-dirty --message-format=short
cargo test --lib                    # unit tests (fast, no sample files needed)
cargo test --test level1_samples    # Level 1 integration tests (requires samples/)
cargo test --test level2_samples    # Level 2 integration tests (requires samples/)
cargo test --test level3_samples    # Level 3 integration tests (requires samples/)
```

## Release Workflow


Before tagging a release:

```sh
cargo semver-checks --baseline-rev <previous-tag> --default-features=false
cargo clippy --release -- -D warnings
cargo test --release
cargo build --release
```

Tag with a signed annotated tag. Include a brief changelog in the tag message:

```sh
git tag -as v0.X.0 -m "v0.X.0

- Brief changelog entry
- Another change"
git push --tags
```

**Never `cargo publish` without completing these steps first:**

1. Create a signed annotated tag (`git tag -as`)
2. Push the tag (`git push --tags`)
3. Wait for CI to pass on the tagged commit
4. Only then `cargo publish`

## Test Architecture

### Unit tests (`cargo test --lib`)

Always available, no external dependencies. Cover:

- Frame header parsing (all transports, address formats, timestamp variants)
- Frame iterator (boundary detection, truncated first/last frames, file concatenation, garbage recovery)
- Message reassembly (TCP grouping, UDP pass-through, direction/address splits)
- Aggregation splitting (Content-Length based multi-message splitting)
- SIP parsing (request/status lines, headers, body extraction)

### Integration tests (`cargo test --test level{1,2,3}_samples`)

Require production sample files in `samples/` (gitignored, contain PII).
Tests skip gracefully if files are missing — they check `path.exists()` and return early.

Sample files are raw binary FreeSWITCH dump files (~50-350MB each):

- `esinet1-v4-tcp.dump.{20..29}` — TCP IPv4
- `esinet1-v4-udp.dump.{20..29}` — UDP IPv4
- `esinet1-v6-tls.dump.{20..29}` — TLS IPv6
- `internal-v4.dump.{20..29}` — internal TCP IPv4
- `internal-v6.dump.{20..29}` — internal TCP IPv6
- `esinet1-v6-tls.dump.180` — TLS IPv6 with real traffic (INVITE/NOTIFY/BYE)
- `esinet1-v4-tls.dump.{179,180}` — TLS IPv4 (180 has real traffic)

Logrotate numbering: higher number = older file.

Level 3 tests tolerate a small number of parse failures (~0.004% on TCP) caused by
TCP reassembly edge cases producing fragments without valid SIP first lines.

The `file_concatenation_two_dumps` test validates `Read::chain()` across two files
(simulating `cat dump.29 dump.28 | parser`).

### Running integration tests

```sh
# All integration tests
cargo test --test level1_samples -- --nocapture
cargo test --test level2_samples -- --nocapture
cargo test --test level3_samples -- --nocapture

# Single test
cargo test --test level1_samples esinet1_v4_tcp -- --nocapture
```

## Development Methodology — TDD


This project follows test-driven development:

1. Write failing tests that reproduce the bug or specify the new behavior
2. Confirm tests fail (`cargo test --lib`)
3. `cargo fmt && git commit --no-verify` (red phase — clippy/tests will fail, but code must be formatted)
4. Implement the fix/feature
5. Confirm all tests pass
6. Commit the implementation (hooks run normally)

## Investigation Principle


Before modifying the data stream (frame parsing, message reassembly, SIP parsing),
consider all 3 parsing levels. The parser aims for 100% accuracy — no missing bytes.
If a new dump file triggers errors, investigate the root cause across all levels before
assuming malformed data and adding workarounds.

## Key Design Decisions


### Boundary detection: byte_count-first strategy


The `\x0B\n` boundary is validated two ways:

1. **Primary**: Check at expected position (`content_start + byte_count`). If `\x0B` is there, accept it. This handles file concatenation where garbage follows the boundary.
2. **Fallback**: Scan for `\x0B\n` followed by a valid frame header (`recv/sent N bytes ...`). This handles `\x0B` appearing in XML/binary content.

### Streaming design


All iterators accept `impl Read`. Truncated first frames are expected and logged via `tracing::warn!`. The parser never panics on malformed input.

### Multi-level architecture


```
Level 1: FrameIterator  — raw bytes → Frame (header + content)
Level 2: MessageIterator — Frame → SipMessage (reassembled + split)
Level 3: ParsedSipMessage — SipMessage → parsed headers/body
```

Each level wraps the previous, all streaming.