freeswitch-sofia-trace-parser
Rust library and CLI for parsing FreeSWITCH mod_sofia SIP trace dump files.
[dependencies]
freeswitch-sofia-trace-parser = "0"
Overview
FreeSWITCH logs SIP traffic to dump files at
/var/log/freeswitch/sip_traces/{profile}/{profile}.dump (rotated as .dump.1.xz, etc.).
This library provides a streaming, multi-level parser:
- Level 1 (Frames): Split raw bytes on `\x0B\n` boundaries, parse frame headers
- Level 2 (Messages): Reassemble TCP segments, split aggregated messages by Content-Length
- Level 3 (Parsed SIP): Extract method/status, headers, body, and multipart MIME parts
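
To make the Level 1 boundary rule concrete, here is a minimal sketch of splitting a byte buffer on `\x0B\n`. The function name `split_frames` is hypothetical and is not part of this crate's API. The sketch is deliberately simplified: it treats every `\x0B\n` as a boundary, whereas the parser described here only accepts a boundary when a valid frame header follows.

```rust
/// Split a raw dump buffer into candidate frames on the `\x0B\n` boundary.
/// Hypothetical helper for illustration; it does not validate frame headers.
fn split_frames(input: &[u8]) -> Vec<&[u8]> {
    const BOUNDARY: &[u8] = b"\x0B\n";
    let mut frames = Vec::new();
    let mut start = 0;
    let mut i = 0;
    while i + BOUNDARY.len() <= input.len() {
        if &input[i..i + BOUNDARY.len()] == BOUNDARY {
            // Emit the bytes before the boundary, skipping empty frames.
            if i > start {
                frames.push(&input[start..i]);
            }
            i += BOUNDARY.len();
            start = i;
        } else {
            i += 1;
        }
    }
    // EOF without a trailing boundary: keep the remainder as a final frame.
    if start < input.len() {
        frames.push(&input[start..]);
    }
    frames
}
```

Because the function works on `&[u8]` and returns sub-slices, it copies nothing and does not require the input to be valid UTF-8.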
Library Usage
Raw messages (Level 2)
use File;
use ;
let file = open?;
for result in new
Parsed SIP messages (Level 3)
use File;
use ParsedMessageIterator;
let file = open?;
for result in new
Multipart body splitting (SDP + EIDO/PIDF)
use File;
use ParsedMessageIterator;
let file = open?;
for result in new
Content-type-aware body access
ParsedSipMessage provides three methods for body access:
- `body_data()`: raw bytes as UTF-8 (no processing, exact wire representation)
- `body_text()`: for JSON content types, unescapes RFC 8259 string escape sequences (`\r\n` → CRLF, `\t` → tab, `\"` → `"`, `\uXXXX` → Unicode including surrogate pairs); passthrough for all other content types
- `json_field(key)`: parses body as JSON, returns unescaped string value for a top-level key; returns `None` if content type is not JSON, body is invalid, key is missing, or value is not a string
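
The escape handling that `body_text()` is described as performing can be illustrated with a small standalone sketch. The functions `unescape_json` and `read_hex4` below are hypothetical and are not the crate's implementation. They cover the RFC 8259 escapes listed above, including `\uXXXX` surrogate pairs, and substitute U+FFFD for malformed escapes, which is an assumption of this sketch.

```rust
/// Read exactly four hex digits and return their value.
fn read_hex4(chars: &mut std::str::Chars<'_>) -> Option<u32> {
    let mut value = 0;
    for _ in 0..4 {
        value = value * 16 + chars.next()?.to_digit(16)?;
    }
    Some(value)
}

/// Unescape RFC 8259 string escapes (hypothetical helper, not the crate API).
fn unescape_json(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('r') => out.push('\r'),
            Some('t') => out.push('\t'),
            Some('b') => out.push('\u{0008}'),
            Some('f') => out.push('\u{000C}'),
            Some('u') => {
                let decoded = match read_hex4(&mut chars) {
                    // High surrogate: a `\uDC00..=\uDFFF` low surrogate must follow.
                    Some(hi @ 0xD800..=0xDBFF) => {
                        let mut look = chars.clone();
                        if look.next() == Some('\\') && look.next() == Some('u') {
                            match read_hex4(&mut look) {
                                Some(lo @ 0xDC00..=0xDFFF) => {
                                    chars = look; // consume the low surrogate
                                    char::from_u32(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00))
                                }
                                _ => None,
                            }
                        } else {
                            None
                        }
                    }
                    Some(cp) => char::from_u32(cp),
                    None => None,
                };
                // Assumption of this sketch: malformed escapes become U+FFFD.
                out.push(decoded.unwrap_or('\u{FFFD}'));
            }
            Some(other) => out.push(other), // covers \" \\ and \/
            None => out.push('\\'),         // trailing lone backslash
        }
    }
    out
}
```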
JSON-aware behavior activates for application/json and any application/*+json subtype (e.g., application/emergencyCallData.AbandonedCall+json). Matching is case-insensitive; media type parameters like charset=utf-8 are ignored.
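
The matching rule can be sketched as a small predicate. The name `is_json_content_type` is hypothetical; the crate may structure this check differently. The sketch strips media type parameters, lowercases the result, and accepts `application/json` or any `application/*+json` subtype.

```rust
/// Return true for `application/json` and `application/*+json` media types.
/// Hypothetical helper for illustration, not the crate's API.
fn is_json_content_type(content_type: &str) -> bool {
    // Drop parameters such as `; charset=utf-8`, then compare case-insensitively.
    let media_type = content_type
        .split(';')
        .next()
        .unwrap_or("")
        .trim()
        .to_ascii_lowercase();
    match media_type.strip_prefix("application/") {
        Some(subtype) => {
            subtype == "json" || (subtype.len() > "+json".len() && subtype.ends_with("+json"))
        }
        None => false,
    }
}
```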
use File;
use ParsedMessageIterator;
let file = open?;
for result in new
Streaming from pipes
use ;
use MessageIterator;
let child = new
.arg
.stdout
.spawn?;
for msg in new
Concatenating multiple files
use File;
use FrameIterator;
let f1 = open?;
let f2 = open?;
let chain = chain;
for frame in new
Edge Cases Handled
- Truncated first frame (rotated files, `xzgrep` extracts, pipe mid-stream)
- `\x0B` in XML/binary content (not a boundary unless followed by valid header)
- Multiple SIP messages aggregated in one TCP read
- TCP segment reassembly (consecutive same-direction same-address frames)
- File concatenation (`cat dump.2 dump.1 | parser`)
- Non-UTF-8 content (works on `&[u8]`)
- EOF without trailing `\x0B\n`
- Multipart MIME bodies (SDP + PIDF/EIDO splitting for NG-911)
- JSON body unescaping for `application/json` and `application/*+json` content types
- TLS keep-alive whitespace (RFC 5626 CRLF probes, sofia-sip bare `\n`)
- Logrotate replay detection (partial frame re-written at start of new file)
- Incomplete frames at EOF (byte_count exceeds available content)
- Byte-level input coverage tracking (`ParseStats` with unparsed region reporting)
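
As an illustration of the "multiple SIP messages aggregated in one TCP read" case, the following sketch splits a reassembled byte buffer into complete messages using the Content-Length header. All function names are hypothetical and the logic is simplified: it assumes a missing or unparsable Content-Length means an empty body, and it accepts the compact header form `l`. It is not the crate's implementation.

```rust
/// Find the first occurrence of `needle` in `haystack`.
fn find_subslice(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    haystack.windows(needle.len()).position(|w| w == needle)
}

/// Extract Content-Length (or compact form `l`) from a header block; 0 if absent.
fn content_length(headers: &[u8]) -> usize {
    for line in headers.split(|&b| b == b'\n') {
        let line = std::str::from_utf8(line).unwrap_or("").trim();
        if let Some((name, value)) = line.split_once(':') {
            let name = name.trim();
            if name.eq_ignore_ascii_case("content-length") || name.eq_ignore_ascii_case("l") {
                return value.trim().parse().unwrap_or(0);
            }
        }
    }
    0
}

/// Split a buffer into complete SIP messages plus the unconsumed remainder.
fn split_messages(buf: &[u8]) -> (Vec<&[u8]>, &[u8]) {
    let mut messages = Vec::new();
    let mut rest = buf;
    while let Some(pos) = find_subslice(rest, b"\r\n\r\n") {
        let header_end = pos + 4;
        let total = header_end + content_length(&rest[..pos]);
        if total > rest.len() {
            break; // body not fully received yet; wait for the next segment
        }
        messages.push(&rest[..total]);
        rest = &rest[total..];
    }
    (messages, rest)
}
```

The remainder returned alongside the messages is what a reassembler would keep buffered until the next same-direction, same-address frame arrives.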
Validated Against Production Data
Tested against 83 production dump files (~12GB) from FreeSWITCH NG-911 infrastructure:
| Profile | Files | Frames | Messages | Multi-frame | byte_count mismatches |
|---|---|---|---|---|---|
| TCP IPv4 | 14 | 6.2M | 6.0M | 21,492 (max 7) | 0 |
| UDP IPv4 | 13 | 4.8M | 4.8M (1:1) | 0 | 0 |
| TLS IPv6 | 18 | 5.9M | 5.9M | 108 | 0 |
| TLS IPv4 | 5 | 660K | 660K | 70 | 0 |
| TCP IPv6 | 3 | 327K | 327K | - | 0 |
| UDP IPv6 | 3 | 301K | 301K (1:1) | 0 | 0 |
| Internal TCP v4 | 13 | 723K | - | - | 0 |
| Internal TCP v6 | 13 | 836K | - | - | 0 |
- Zero byte_count mismatches across all frames
- 99.99%+ of reassembled messages start with a valid SIP request/response line
- Level 3 SIP parsing: 100% on all tested profiles (TCP, UDP, TLS)
- Multipart body splitting: 1,223 multipart messages, 2,446 parts (SDP + PIDF), 0 failures
- File concatenation (`cat dump.29 dump.28 |`): 965,515 frames, zero mismatches
Input coverage tracking
Every sample file is verified for byte-level parse coverage. Each unparsed region is
classified by SkipReason:
- `PartialFirstFrame`: truncated frame at start of file (logrotate, pipe, grep extract), capped at 65535 bytes
- `OversizedFrame`: skipped region exceeds 65535 bytes (corrupt or non-dump content)
- `ReplayedFrame`: logrotate wrote a partial frame tail at the start of the new file
- `MidStreamSkip`: unrecoverable bytes skipped mid-stream (e.g., TCP reassembly edge case)
- `IncompleteFrame`: frame at EOF with fewer bytes than declared in the header
- `InvalidHeader`: data starts with `recv`/`sent` but header fails to parse
ParseStats exposes bytes_read, bytes_skipped, and detailed UnparsedRegion records
with offset, length, and skip reason for each region.
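
The bookkeeping described above might be modeled roughly as follows. The type and field names mirror those mentioned in this section, but the field types, the `regions` field name, the methods `record_skip` and `coverage`, and the helper `classify_leading_skip` are assumptions made for this sketch and may not match the crate's real definitions.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SkipReason {
    PartialFirstFrame,
    OversizedFrame,
    ReplayedFrame,
    MidStreamSkip,
    IncompleteFrame,
    InvalidHeader,
}

#[derive(Debug)]
struct UnparsedRegion {
    offset: u64,
    length: u64,
    reason: SkipReason,
}

#[derive(Debug, Default)]
struct ParseStats {
    bytes_read: u64,
    bytes_skipped: u64,
    regions: Vec<UnparsedRegion>, // assumed field name
}

impl ParseStats {
    /// Record one skipped region and keep the running total in sync.
    fn record_skip(&mut self, offset: u64, length: u64, reason: SkipReason) {
        self.bytes_skipped += length;
        self.regions.push(UnparsedRegion { offset, length, reason });
    }

    /// Fraction of input bytes that were parsed (1.0 for empty input).
    fn coverage(&self) -> f64 {
        if self.bytes_read == 0 {
            return 1.0;
        }
        (self.bytes_read - self.bytes_skipped) as f64 / self.bytes_read as f64
    }
}

/// Assumed rule: a leading skip above 65535 bytes counts as oversized.
fn classify_leading_skip(length: u64) -> SkipReason {
    if length > 65535 {
        SkipReason::OversizedFrame
    } else {
        SkipReason::PartialFirstFrame
    }
}
```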
Memory Profile
The parser is designed for constant-memory streaming of arbitrarily large inputs,
including multi-day dump file chains (50GB+). Memory behavior was validated using
jemalloc heap profiling (_RJEM_MALLOC_CONF=prof:true) and gdb inspection of live
data structures during processing of 50+ chained dump files.
Parser internals at runtime (gdb-verified):
- `FrameIterator::buf`: 64KB capacity, ~200 bytes used (single read buffer, never grows)
- `MessageIterator::buffers`: 0 entries (TCP reassembly buffers evicted after message extraction)
- `MessageIterator::ready`: 0 entries, capacity 10 (drained each iteration)
Design choices that maintain constant memory:
- `SkipTracking` defaults to `CountOnly`: no allocation for unparsed region tracking unless opted in
- TCP connection buffers are eagerly removed after complete message extraction
- Stale buffers (>2h inactive) are evicted via time-based sweep to handle TLS ephemeral port accumulation
- `flush_all()` clears the entire buffer map at EOF
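
The eager removal and time-based sweep can be sketched with a plain `HashMap`. Everything here is hypothetical: the `Reassembler` and `ConnBuffer` types, the string connection key, and the use of caller-supplied timestamps in seconds are assumptions for illustration, not the crate's internals.

```rust
use std::collections::HashMap;

/// Per-connection reassembly buffer (hypothetical).
struct ConnBuffer {
    data: Vec<u8>,
    last_seen_secs: u64,
}

/// Minimal reassembly map with eager removal and a stale-buffer sweep.
struct Reassembler {
    buffers: HashMap<String, ConnBuffer>,
    max_idle_secs: u64,
}

impl Reassembler {
    fn new(max_idle_secs: u64) -> Self {
        Self { buffers: HashMap::new(), max_idle_secs }
    }

    /// Append bytes for a connection and refresh its activity timestamp.
    fn push(&mut self, key: &str, bytes: &[u8], now_secs: u64) {
        let entry = self.buffers.entry(key.to_string()).or_insert(ConnBuffer {
            data: Vec::new(),
            last_seen_secs: now_secs,
        });
        entry.data.extend_from_slice(bytes);
        entry.last_seen_secs = now_secs;
    }

    /// Eagerly remove a buffer once its message has been extracted.
    fn take(&mut self, key: &str) -> Option<Vec<u8>> {
        self.buffers.remove(key).map(|b| b.data)
    }

    /// Evict buffers idle for longer than the limit; returns how many were dropped.
    fn sweep(&mut self, now_secs: u64) -> usize {
        let max_idle = self.max_idle_secs;
        let before = self.buffers.len();
        self.buffers
            .retain(|_, b| now_secs.saturating_sub(b.last_seen_secs) <= max_idle);
        before - self.buffers.len()
    }
}
```

Without the sweep, each short-lived TLS connection from a new ephemeral port that ends mid-message would leave one entry behind for the rest of the run.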
Consumers processing many files should open files lazily (one at a time) rather
than using Read::chain() upfront, which keeps all file handles and decompression
state alive for the entire run. With 50+ XZ-compressed dump files, eager chaining
consumed 172MB of LZMA decoder state alone.
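
One way to open files lazily is a small `Read` adapter that holds a list of paths and opens the next file only when the current one is exhausted. The `LazyChain` type below is a hypothetical sketch, not something this crate provides. It reads plain files with the standard library only; for compressed dumps, each file would be wrapped in a decompressor at the point where it is opened.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::PathBuf;

/// Reads a sequence of files as one stream, keeping at most one handle open.
struct LazyChain {
    pending: std::vec::IntoIter<PathBuf>,
    current: Option<File>,
}

impl LazyChain {
    fn new(paths: Vec<PathBuf>) -> Self {
        Self { pending: paths.into_iter(), current: None }
    }
}

impl Read for LazyChain {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if buf.is_empty() {
            return Ok(0);
        }
        loop {
            // Open the next file only when there is no active one.
            if self.current.is_none() {
                match self.pending.next() {
                    Some(path) => self.current = Some(File::open(path)?),
                    None => return Ok(0), // all files consumed
                }
            }
            let n = self.current.as_mut().unwrap().read(buf)?;
            if n > 0 {
                return Ok(n);
            }
            // Current file hit EOF: drop its handle before moving on.
            self.current = None;
        }
    }
}
```

Since `LazyChain` implements `Read`, it can be passed anywhere a reader is accepted, in the same position a `Read::chain()` value would occupy.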
CLI Tool
OPTIONS keepalives are excluded by default (use --all-methods to include them).
# One-line summary (OPTIONS excluded by default)
# Pipe from xzcat
|
# Filter by method — shows INVITE requests and their 100/180/200 responses
# Filter by Call-ID regex
# Header regex — all sent INVITEs from a specific extension
# Grep for a string anywhere in the SIP message (headers + body)
# Body grep — match only in message body (SDP, EIDO XML, etc.)
# Extract SDP body from a specific call's INVITEs
# Full SIP message output
# Statistics: method and status code distribution
# Multiple files (concatenated in order)
# Raw frames (level 1) or reassembled messages (level 2)
Dialog mode
Use -D to expand matched messages to full Call-ID conversations. When any message
matches, all messages sharing its Call-ID are output. Single pass — works with stdin/pipes.
# Find dialogs containing INVITEs, show full call flow
# Find all dialogs related to an incident ID (across profiles)
# Find dialogs by phone number anywhere in message
# Find dialogs by body content (EIDO XML, PIDF)
# Works with stdin/pipes
|
Terminated dialogs (BYE + 200 OK) that never matched are pruned during processing to limit memory usage. Unmatched Call-IDs with only OPTIONS traffic are never buffered.
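
The single-pass behavior can be pictured as a buffer keyed by Call-ID: messages are held until some message in that dialog matches, at which point the held messages are flushed and later ones pass straight through. The `DialogExpander` and `Msg` types below are hypothetical and only illustrate the idea; the CLI's actual data structures and pruning logic are not shown in this document.

```rust
use std::collections::{HashMap, HashSet};

/// Minimal stand-in for a parsed message (hypothetical).
struct Msg {
    call_id: String,
    summary: String,
}

/// Buffers messages per Call-ID until one of them matches the filter.
#[derive(Default)]
struct DialogExpander {
    matched: HashSet<String>,
    pending: HashMap<String, Vec<Msg>>,
}

impl DialogExpander {
    /// Feed one message; returns the messages that are ready for output.
    fn push(&mut self, msg: Msg, is_match: bool) -> Vec<Msg> {
        // Dialog already matched: emit immediately.
        if self.matched.contains(&msg.call_id) {
            return vec![msg];
        }
        if is_match {
            // First match: flush everything buffered for this Call-ID.
            let mut out = self.pending.remove(&msg.call_id).unwrap_or_default();
            self.matched.insert(msg.call_id.clone());
            out.push(msg);
            return out;
        }
        // Not matched yet: hold the message in case a later one matches.
        self.pending.entry(msg.call_id.clone()).or_default().push(msg);
        Vec::new()
    }

    /// Drop a terminated dialog that never matched, to bound memory.
    fn prune(&mut self, call_id: &str) {
        self.pending.remove(call_id);
    }
}
```

In this sketch the caller decides when a dialog has terminated and calls `prune`; a matched dialog's earlier messages are emitted together at the moment of the first match rather than at their original position in the stream.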
Filter options
| Flag | Description |
|---|---|
| `-m, --method <VERB>` | Include method (request + responses via CSeq), repeatable |
| `-x, --exclude <VERB>` | Exclude method (request + responses), repeatable |
| `-c, --call-id <REGEX>` | Match Call-ID by regex |
| `-d, --direction <DIR>` | Filter by direction (recv/sent) |
| `-a, --address <REGEX>` | Match address by regex |
| `-H, --header <NAME=REGEX>` | Match header value by regex, repeatable |
| `-g, --grep <REGEX>` | Match regex against full reconstructed SIP message |
| `-b, --body-grep <REGEX>` | Match regex against message body only |
| `-D, --dialog` | Expand matches to full Call-ID conversations |
| `--all-methods` | Include OPTIONS (excluded by default) |
Output modes
| Flag | Description |
|---|---|
| (default) | One-line summary per message |
| `--full` | Full SIP message with metadata header |
| `--headers` | Headers only, no body |
| `--body` | Body only (for SDP/PIDF extraction) |
| `--raw` | Raw reassembled bytes (level 2) |
| `--frames` | Raw frames (level 1) |
| `--stats` | Method and status code distribution + input coverage |
| `--unparsed` | Report unparsed input regions to stderr (combinable with any mode) |
Building
Testing
# Unit tests (no external files needed)
# Integration tests (requires production samples in samples/)
Integration tests validate at each parser level:
- Level 1: Frame parsing, transport detection, address format, byte_count accuracy, and parse stats coverage (max 1 partial first frame per file, zero invalid header skips)
- Level 2: TCP reassembly, UDP pass-through, interleaved multi-address reassembly, frame accounting, and parse stats delegation
- Level 3: SIP request/response parsing, Call-ID/CSeq extraction, multipart MIME splitting, method distribution, and parse stats delegation
The all_samples_consistent_frame_counts test iterates all sample files per profile and asserts parse stats on each individually.
See CLAUDE.md for test architecture details.
License
LGPL-2.1-or-later