xphone
A Rust library for SIP calling and RTP media. Register with a SIP trunk or PBX, or accept calls directly as a SIP server — and get decoded PCM audio frames through crossbeam channels.
Also available in Go (the more mature implementation).
Table of Contents
- Status | Scope and Limitations | Tested Against | Use Cases
- Quick Start | Connection Modes | Working with Audio
- Features | Call States | Call Control | Media Pipeline
- Configuration | RTP Port Range | NAT Traversal | Opus Codec | G.729 Codec
- Testing | Example App | Logging | Stack | Roadmap
Status — Beta
xphone-rust is in active development and used in production alongside xbridge. Feature coverage is broad but real-world mileage is still limited — not all features have been exercised under diverse production conditions. The Go implementation has more production exposure; if you're evaluating and language is flexible, start there.
The entire SIP and RTP stack is implemented from scratch in Rust — no external SIP or RTP crate dependencies.
Scope and limitations
xphone is a voice data-plane library — SIP signaling and RTP media. It is not a telephony platform.
You are responsible for:
- Billing, number provisioning, and call routing rules
- Recording storage and playback infrastructure
- High availability, persistence, and failover
- Rate limiting, authentication, and abuse prevention at the application level
Security boundaries:
- SRTP uses SDES key exchange only. DTLS-SRTP is not supported — xphone cannot interop with WebRTC endpoints that require it.
- TLS is supported for SIP transport. See Configuration for transport options.
- There is no built-in authentication layer for your application — xphone authenticates to SIP servers, not your end users.
Codec constraints:
- Opus requires the opus-codec feature and system-installed libopus.
- G.729 uses a pure Rust implementation (g729-sys), with no system dependencies.
- G.711 and G.722 are always available with no external dependencies.
- PCM sample rate is fixed at 8 kHz (narrowband) or 16 kHz (G.722 wideband). There is no configurable sample rate.
Tested against
| Category | Tested with |
|---|---|
| SIP trunks | Telnyx, Twilio SIP, VoIP.ms, Vonage |
| PBXes | Asterisk, FreeSWITCH, 3CX |
| Integration tests | fakepbx (in-process SIP server, real SIP over loopback) + Dockerized Asterisk (xpbx) in CI |
| Unit tests | MockPhone & MockCall — full Phone/Call API mocks |
This is not a comprehensive compatibility matrix. If you hit issues with a provider or PBX not listed here, please open an issue.
Use cases
- AI voice agents — pipe call audio directly into your STT/LLM/TTS pipeline without a telephony platform
- Softphones and click-to-call — embed SIP calling into any Rust application against a trunk or PBX
- Call recording and monitoring — tap the PCM audio stream for transcription, analysis, or storage
- Outbound dialers — programmatic dialing with DTMF detection for IVR automation
- Unit-testable call flows — MockPhone and MockCall let you test every call branch without a SIP server
See the demos repo for working examples.
Quick Start
Install
Add to your Cargo.toml:
[dependencies]
xphone = "0.4"
Requires Rust 1.87+.
Receive calls
use Arc;
use ;
PCM format: Vec<i16>, mono, 8000 Hz, 160 samples per frame (20ms) — the standard input format for most speech-to-text APIs.
Make an outbound call
use DialOptions;
use Duration;
let opts = DialOptions ;
let call = phone.dial?;
if let Some = call.pcm_reader
dial accepts a full SIP URI or just the number — if no host is given, your configured SIP server is used.
Connection Modes
xphone supports two ways to connect to the SIP world. Both produce the same Call API — accept, end, DTMF, pcm_reader/writer are identical.
Phone mode (SIP client)
Registers with a SIP server like a normal endpoint. Use this with SIP trunks (Telnyx, Vonage), PBXes (Asterisk, FreeSWITCH), or any SIP registrar. No PBX is required — you can register directly with a SIP trunk provider:
let phone = new;
phone.on_incoming;
phone.connect?;
Server mode (SIP trunk)
Accepts and places calls directly with trusted SIP peers — no registration required. Use this when trunk providers send INVITEs to your public IP, or when a PBX routes calls to your application:
let server = new;
server.on_incoming;
server.listen.await?;
Peers are authenticated by IP/CIDR or SIP digest auth. Per-peer codec and RTP address overrides are supported.
Which mode? Use Phone when you register to a SIP server (most setups). Use Server when SIP peers send INVITEs directly to your application (Twilio SIP Trunk, direct PBX routing, peer-to-peer).
Working with Audio
xphone exposes audio as a stream of PCM frames through crossbeam channels.
Frame format
| Property | Value |
|---|---|
| Encoding | 16-bit signed PCM |
| Channels | Mono |
| Sample rate | 8000 Hz |
| Samples per frame | 160 |
| Frame duration | 20ms |
Reading inbound audio
call.pcm_reader() returns a crossbeam_channel::Receiver<Vec<i16>>:
if let Some = call.pcm_reader
Important: Read frames promptly. The inbound buffer holds 256 frames (~5 seconds). If you fall behind, the oldest frames are silently dropped.
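The drop-oldest behavior described above can be pictured with a small bounded queue. This is only a sketch: DropOldest is a hypothetical type built on the standard library, not xphone's internal buffer, and the 256-frame capacity is taken from the note above.

```rust
use std::collections::VecDeque;

/// Bounded frame queue that evicts the oldest frame when full.
/// Hypothetical stand-in for illustration; not xphone's internal type.
pub struct DropOldest {
    frames: VecDeque<Vec<i16>>,
    capacity: usize,
    dropped: u64,
}

impl DropOldest {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { frames: VecDeque::with_capacity(capacity), capacity, dropped: 0 }
    }

    /// Push a frame; if the queue is already full, discard the oldest one first.
    pub fn push(&mut self, frame: Vec<i16>) {
        if self.frames.len() == self.capacity {
            self.frames.pop_front();
            self.dropped += 1;
        }
        self.frames.push_back(frame);
    }

    /// Take the oldest buffered frame, if any.
    pub fn pop(&mut self) -> Option<Vec<i16>> {
        self.frames.pop_front()
    }

    pub fn len(&self) -> usize {
        self.frames.len()
    }

    /// Number of frames discarded so far because the reader fell behind.
    pub fn dropped(&self) -> u64 {
        self.dropped
    }
}
```

At 20ms per frame, 256 buffered frames correspond to 5.12 seconds of audio, so a reader that stalls for longer than that starts losing the earliest part of the backlog.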
Writing outbound audio
call.pcm_writer() returns a crossbeam_channel::Sender<Vec<i16>>. Send one 20ms frame at a time:
if let Some = call.pcm_writer
Important: pcm_writer() sends each buffer as an RTP packet immediately, so the caller must provide frames at real-time rate (one 160-sample frame every 20ms). For TTS or file playback, use paced_pcm_writer() instead.
Paced writer (for TTS / pre-generated audio)
call.paced_pcm_writer() accepts arbitrary-length PCM buffers and handles framing + pacing internally:
if let Some = call.paced_pcm_writer
pcm_writer and paced_pcm_writer are mutually exclusive: using one suppresses the other for that call.
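To make "framing + pacing" concrete, here is a rough sketch of what a paced writer has to do. It is not xphone's implementation: frame_pcm and send_paced are hypothetical helpers, the send callback stands in for whatever actually transmits a frame, and padding the final partial frame with silence is an assumption.

```rust
use std::thread;
use std::time::{Duration, Instant};

const FRAME_SAMPLES: usize = 160; // 20ms at 8000 Hz
const FRAME_DURATION: Duration = Duration::from_millis(20);

/// Split arbitrary-length PCM into 160-sample frames.
/// The last partial frame is padded with zeros (silence).
pub fn frame_pcm(pcm: &[i16]) -> Vec<Vec<i16>> {
    pcm.chunks(FRAME_SAMPLES)
        .map(|chunk| {
            let mut frame = chunk.to_vec();
            frame.resize(FRAME_SAMPLES, 0);
            frame
        })
        .collect()
}

/// Hand frames to `send` at real-time rate, one every 20ms.
/// Deadlines are computed from the start time so that sleep jitter
/// does not accumulate over a long playback.
pub fn send_paced<F: FnMut(Vec<i16>)>(pcm: &[i16], mut send: F) {
    let start = Instant::now();
    for (i, frame) in frame_pcm(pcm).into_iter().enumerate() {
        let deadline = start + FRAME_DURATION * i as u32;
        if let Some(wait) = deadline.checked_duration_since(Instant::now()) {
            thread::sleep(wait);
        }
        send(frame);
    }
}
```

With a plain pcm_writer, the application would have to run a loop like send_paced itself; the paced writer exists so that it does not.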
Raw RTP access
For lower-level control — pre-encoded audio, custom codecs, or RTP header inspection:
if let Some = call.rtp_reader
if let Some = call.rtp_writer
rtp_writer and pcm_writer are mutually exclusive: if you write to rtp_writer, pcm_writer is ignored for that call.
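For RTP header inspection, the fixed 12-byte header defined in RFC 3550 section 5.1 can be decoded as below. This sketch works on raw bytes and makes no claim about the packet type that rtp_reader actually yields; RtpHeader and parse_rtp_header are hypothetical names.

```rust
/// Fields of the fixed RTP header (RFC 3550, section 5.1).
#[derive(Debug, PartialEq)]
pub struct RtpHeader {
    pub version: u8,
    pub padding: bool,
    pub extension: bool,
    pub csrc_count: u8,
    pub marker: bool,
    pub payload_type: u8,
    pub sequence: u16,
    pub timestamp: u32,
    pub ssrc: u32,
}

/// Parse the first 12 bytes of an RTP packet.
/// Returns None if the buffer is too short or the version is not 2.
pub fn parse_rtp_header(packet: &[u8]) -> Option<RtpHeader> {
    if packet.len() < 12 {
        return None;
    }
    let version = packet[0] >> 6;
    if version != 2 {
        return None;
    }
    Some(RtpHeader {
        version,
        padding: (packet[0] & 0x20) != 0,
        extension: (packet[0] & 0x10) != 0,
        csrc_count: packet[0] & 0x0F,
        marker: (packet[1] & 0x80) != 0,
        payload_type: packet[1] & 0x7F,
        sequence: u16::from_be_bytes([packet[2], packet[3]]),
        timestamp: u32::from_be_bytes([packet[4], packet[5], packet[6], packet[7]]),
        ssrc: u32::from_be_bytes([packet[8], packet[9], packet[10], packet[11]]),
    })
}
```

The payload starts after the fixed header plus 4 bytes per CSRC entry (and after the extension header, when the extension bit is set).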
Converting to f32
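Many audio and ML libraries expect f32 samples in the range [-1.0, 1.0]. A minimal sketch of the conversion, assuming the usual normalization by 32768; the helper names are hypothetical and not part of the xphone API:

```rust
/// Convert 16-bit signed PCM to f32 samples in [-1.0, 1.0).
pub fn pcm_i16_to_f32(frame: &[i16]) -> Vec<f32> {
    frame.iter().map(|&s| f32::from(s) / 32768.0).collect()
}

/// Convert f32 samples back to 16-bit PCM, clamping out-of-range input.
pub fn pcm_f32_to_i16(frame: &[f32]) -> Vec<i16> {
    frame
        .iter()
        .map(|&s| (s.clamp(-1.0, 1.0) * 32767.0).round() as i16)
        .collect()
}
```

The first function fits frames read from pcm_reader; the second prepares synthesized audio for one of the writers.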
Features
Calling — stable
- SIP registration with auto-reconnect and keepalive
- Inbound and outbound calls
- Hold / resume (re-INVITE)
- Blind transfer (REFER) and attended transfer (REFER with Replaces, RFC 3891)
- Call waiting (Phone.calls() API)
- Session timers (RFC 4028)
- Mute / unmute
- 302 redirect following
- Early media (183 Session Progress)
- Outbound proxy routing (Config::outbound_proxy)
- Separate outbound credentials (outbound_username / outbound_password)
- P-Asserted-Identity for caller ID (DialOptions::caller_id)
- Custom headers on outbound INVITEs (DialOptions::custom_headers)
- Server::dial_uri: dial arbitrary SIP URIs without pre-configured peers
- EndReason::TransferFailed: surfaces REFER failures instead of silently dropping them
DTMF — stable
- RFC 4733 (RTP telephone-events)
- SIP INFO (RFC 2976)
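For reference, an RFC 4733 telephone-event payload is four bytes: the event code, a flags byte (end bit plus 6-bit volume), and a 16-bit duration in RTP timestamp units. The sketch below illustrates that wire format only; the function names are hypothetical and this is not xphone's encoder.

```rust
/// Map a DTMF digit to its RFC 4733 event code (0-15).
pub fn dtmf_event_code(digit: char) -> Option<u8> {
    match digit {
        '0'..='9' => Some(digit as u8 - b'0'),
        '*' => Some(10),
        '#' => Some(11),
        'A'..='D' => Some(digit as u8 - b'A' + 12),
        _ => None,
    }
}

/// Build the 4-byte telephone-event payload.
/// `volume` is the power level in -dBm0 (0-63); `duration` is in RTP timestamp units.
pub fn encode_telephone_event(event: u8, end: bool, volume: u8, duration: u16) -> [u8; 4] {
    let end_bit: u8 = if end { 0x80 } else { 0x00 };
    let flags = end_bit | (volume & 0x3F); // reserved bit stays 0
    let d = duration.to_be_bytes();
    [event, flags, d[0], d[1]]
}
```

At an 8000 Hz clock, a duration of 800 corresponds to a 100ms key press.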
Audio codecs — stable
- G.711 u-law (PCMU), G.711 A-law (PCMA) — built-in
- G.722 wideband — built-in
- Opus: optional, requires libopus (--features opus-codec)
- G.729: optional, pure Rust (--features g729-codec)
- Jitter buffer
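To show what "Codec Decode" means for the simplest codec in the list, here is the standard G.711 mu-law expansion from one payload byte to one 16-bit sample. It is a textbook sketch for illustration, not a copy of xphone's built-in decoder.

```rust
/// Decode one G.711 mu-law byte to a 16-bit linear PCM sample.
pub fn ulaw_decode(byte: u8) -> i16 {
    const BIAS: i32 = 0x84;
    let u = !byte; // mu-law bytes are transmitted bit-inverted
    let exponent = u32::from((u >> 4) & 0x07);
    let mantissa = i32::from(u & 0x0F);
    let magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS;
    if (u & 0x80) != 0 {
        (-magnitude) as i16
    } else {
        magnitude as i16
    }
}

/// Decode a whole PCMU payload: one byte in, one sample out.
pub fn ulaw_decode_frame(payload: &[u8]) -> Vec<i16> {
    payload.iter().map(|&b| ulaw_decode(b)).collect()
}
```

A 20ms PCMU packet carries 160 payload bytes, which decode to exactly one 160-sample frame.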
Video — newer, less production mileage
- H.264 (RFC 6184) and VP8 (RFC 7741)
- Depacketizer/packetizer pipeline
- Mid-call video upgrade/downgrade (re-INVITE)
- Video upgrade accept/reject API
- VideoReader / VideoWriter / VideoRTPReader / VideoRTPWriter
- RTCP PLI/FIR for keyframe requests
Security — stable
- SRTP (AES_CM_128_HMAC_SHA1_80) with SDES key exchange
- SRTP replay protection (RFC 3711)
- SRTCP encryption (RFC 3711 §3.4)
- Key material zeroization
- Separate SRTP contexts for audio and video
Network — stable
- TCP and TLS SIP transport
- STUN NAT traversal (RFC 5389)
- TURN relay for symmetric NAT (RFC 5766)
- ICE-Lite (RFC 8445 §2.5)
- RTCP Sender/Receiver Reports (RFC 3550)
Messaging — newer, less production mileage
- SIP MESSAGE (RFC 3428)
- SIP SUBSCRIBE/NOTIFY (RFC 6665)
- Generic event subscriptions (presence, dialog, etc.)
- MWI / voicemail notification (RFC 3842)
- BLF / Busy Lamp Field monitoring
Testing — stable
- MockPhone and MockCall — full API mocks for unit testing
Call States
Idle -> Ringing (inbound) or Dialing (outbound)
-> RemoteRinging -> Active <-> OnHold -> Ended
call.on_state;
call.on_ended;
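The state diagram above can be written down as a transition table, which is handy when testing call-flow logic. This is a sketch under assumptions: the CallState enum here is a local stand-in that reuses the state names from the diagram, and allowing every live state to move to Ended is an assumption rather than documented behavior.

```rust
/// Local stand-in for the call states named in the diagram above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CallState {
    Idle,
    Ringing,
    Dialing,
    RemoteRinging,
    Active,
    OnHold,
    Ended,
}

/// Returns true if the diagram allows moving from `from` to `to`.
pub fn can_transition(from: CallState, to: CallState) -> bool {
    use CallState::*;
    match (from, to) {
        (Idle, Ringing) | (Idle, Dialing) => true, // inbound or outbound start
        (Dialing, RemoteRinging) => true,
        (Ringing, Active) | (RemoteRinging, Active) => true,
        (Active, OnHold) | (OnHold, Active) => true, // hold and resume
        (Ended, _) => false,                         // Ended is terminal
        (_, Ended) => true,                          // assumption: any live call can end
        _ => false,
    }
}
```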
Call Control
call.hold?;
call.resume?;
call.blind_transfer?;
call_a.attended_transfer?; // works for both Phone and Server calls
call.mute?;
call.unmute?;
call.send_dtmf?;
call.on_dtmf;
// Mid-call video upgrade
call.add_video?;
call.on_video_request;
call.on_video;
phone.send_message?;
Media Pipeline
Audio
Inbound:
SIP Trunk -> RTP/UDP -> Jitter Buffer -> Codec Decode -> pcm_reader (Vec<i16>)
Outbound:
pcm_writer (Vec<i16>) -> Codec Encode -> RTP/UDP -> SIP Trunk
rtp_writer -> RTP/UDP -> SIP Trunk (raw mode)
Video
Inbound:
SIP Trunk -> RTP/UDP -> Depacketizer (H.264/VP8) -> video_reader (VideoFrame)
-> video_rtp_reader (raw video RTP packets)
Outbound:
video_writer (VideoFrame) -> Packetizer (H.264/VP8) -> RTP/UDP -> SIP Trunk
video_rtp_writer -> RTP/UDP -> SIP Trunk (raw mode)
Video uses a separate RTP port and independent SRTP contexts. RTCP PLI/FIR requests trigger keyframe generation on the sender side.
All channels are buffered (256 entries). Inbound taps drop oldest on overflow; outbound writers drop newest. Audio frames are 160 samples at 8000 Hz = 20ms. Video frames carry codec-specific NAL units (H.264) or encoded frames (VP8).
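The drop-newest policy on the outbound side amounts to a non-blocking send on a bounded channel. The sketch below uses std::sync::mpsc as a stand-in for crossbeam-channel so that it has no dependencies; frame_channel and try_write are hypothetical helpers, not xphone functions.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Create a bounded frame channel (the README states 256 entries).
pub fn frame_channel(capacity: usize) -> (SyncSender<Vec<i16>>, Receiver<Vec<i16>>) {
    sync_channel(capacity)
}

/// Try to enqueue a frame without blocking.
/// Returns false if the frame was discarded because the buffer is full
/// (drop-newest) or because the receiving side has gone away.
pub fn try_write(tx: &SyncSender<Vec<i16>>, frame: Vec<i16>) -> bool {
    match tx.try_send(frame) {
        Ok(()) => true,
        Err(TrySendError::Full(_)) => false,
        Err(TrySendError::Disconnected(_)) => false,
    }
}
```

With this policy a producer that runs ahead of real time loses its most recent frames, which is the opposite of the inbound side, where the oldest frames are discarded.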
Each pipeline runs on a dedicated std::thread per call, bridged to the application via crossbeam-channel.
Configuration
let phone = new;
// Or use the builder:
let phone = new;
See docs.rs for all options.
RTP Port Range
Each active call requires an even-numbered UDP port for RTP audio. Configure an explicit range for production deployments behind firewalls:
let phone = new;
Only even ports are used (per RTP spec). Maximum concurrent audio-only calls = (max - min) / 2.
| Range | Even ports | Max concurrent calls |
|---|---|---|
| 10000–10100 | 50 | ~50 |
| 10000–12000 | 1000 | ~1000 |
| 10000–20000 | 5000 | ~5000 |
When ports run out: inbound calls receive a 500 Internal Server Error and outbound dials fail with an error. Widen the range before investigating SIP server configuration.
Default (0, 0) lets the OS assign ephemeral ports. This works for development but is impractical in production where firewall rules need a known range.
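The capacity formula can be checked with a few lines of arithmetic. The helpers below are hypothetical and treat the range as half-open (max itself is not used), which is an assumption chosen to match the (max - min) / 2 figure in the table.

```rust
/// Maximum concurrent audio-only calls for a port range, per (max - min) / 2.
pub fn max_concurrent_calls(min: u16, max: u16) -> u16 {
    max.saturating_sub(min) / 2
}

/// Even-numbered ports in the half-open range [min, max).
pub fn even_ports(min: u16, max: u16) -> impl Iterator<Item = u16> {
    (min..max).filter(|p| p % 2 == 0)
}
```

For example, max_concurrent_calls(10000, 10100) gives 50, matching the first row of the table. Video calls use a separate RTP port, so plan for fewer concurrent calls if video is enabled.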
NAT Traversal
STUN (most deployments)
Discovers your public IP via a STUN Binding Request:
let phone = new;
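For context, a STUN Binding Request with no attributes is just a 20-byte header (RFC 5389 section 6). The sketch below builds one to show the wire format; it is illustrative only and not xphone's STUN client. The caller is assumed to supply a random transaction ID.

```rust
/// Fixed value that identifies RFC 5389 STUN messages.
pub const MAGIC_COOKIE: u32 = 0x2112_A442;

/// Build a STUN Binding Request header with no attributes.
/// `transaction_id` should be 12 random bytes chosen by the caller.
pub fn binding_request(transaction_id: [u8; 12]) -> [u8; 20] {
    let mut msg = [0u8; 20];
    msg[0..2].copy_from_slice(&0x0001u16.to_be_bytes()); // type: Binding Request
    msg[2..4].copy_from_slice(&0u16.to_be_bytes()); // length: no attributes follow
    msg[4..8].copy_from_slice(&MAGIC_COOKIE.to_be_bytes());
    msg[8..20].copy_from_slice(&transaction_id);
    msg
}
```

The server's success response carries an XOR-MAPPED-ADDRESS attribute containing the public IP and port it saw, which is the address that ends up advertised to the remote side.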
TURN (symmetric NAT)
For environments where STUN alone fails (cloud VMs, corporate firewalls):
let phone = new;
ICE-Lite
SDP-level candidate negotiation (RFC 8445 §2.5):
let phone = new;
Only enable STUN/TURN/ICE when the SIP server is on the public internet. Do not enable them when connecting over a VPN or private network.
Opus Codec
Opus is optional and requires system-installed libopus. The default build has no external C dependencies.
Install libopus
# Debian / Ubuntu
sudo apt-get install libopus-dev
# macOS
brew install opus
Build with Opus
cargo build --features opus-codec
Usage
let phone = new;
Opus runs at 8kHz natively — no resampling needed. PCM frames remain Vec<i16>, mono, 160 samples (20ms). RTP timestamps use 48kHz clock per RFC 7587.
Without the opus-codec feature, Codec::Opus is accepted in configuration but will not be negotiated.
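The difference between the 8kHz PCM rate and the 48kHz RTP clock shows up in how far the RTP timestamp advances per packet. A small worked calculation, using a hypothetical helper:

```rust
/// RTP timestamp increment for one frame at the given clock rate.
pub fn rtp_ticks_per_frame(clock_rate_hz: u32, frame_ms: u32) -> u32 {
    clock_rate_hz * frame_ms / 1000
}
```

For a 20ms frame this gives 160 ticks at 8000 Hz (G.711) and 960 ticks at 48000 Hz (Opus), even though the application still sees 160 PCM samples per frame in both cases.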
G.729 Codec
G.729 is optional via the g729-codec feature. Unlike Opus, it uses a pure Rust implementation (g729-sys) — no system libraries required.
Build with G.729
cargo build --features g729-codec
Usage
let phone = new;
G.729 runs at 8kHz, 8 kbps CS-ACELP. SDP advertises annexb=no — Annex B (VAD/CNG) is not supported.
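The 8 kbps figure translates directly into payload size. Assuming the same 20ms packetization used for the PCM frames elsewhere in this README (an assumption; the packet time for G.729 is not stated here), the arithmetic looks like this:

```rust
/// Encoded payload size in bytes for a constant-bitrate codec.
pub fn payload_bytes(bitrate_bps: u32, frame_ms: u32) -> u32 {
    bitrate_bps * frame_ms / 1000 / 8
}
```

At 8000 bps a 20ms packet carries 20 bytes of G.729 payload, compared with 160 bytes for G.711 at 64000 bps.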
Testing
Unit tests with mocks
MockPhone and MockCall provide the same API as the real types:
use MockPhone;
let phone = new;
phone.connect.unwrap;
phone.on_incoming;
phone.simulate_incoming;
assert_eq!;
use MockCall;
let call = new;
call.accept.unwrap;
call.send_dtmf.unwrap;
assert_eq!;
call.simulate_dtmf;
Integration tests with FakePBX (no Docker)
End-to-end tests with Asterisk
Or using the Makefile:
Example App
examples/sipcli is a terminal SIP client with registration, calls, hold, resume, DTMF, mute, transfer, video calls, echo mode, and speaker output:
# Audio-only
# With video display (H.264 decoding + window)
Logging
xphone uses the tracing crate for structured logging:
fmt
.with_max_level
.init;
All SIP messages, RTP stats, media events, and call state transitions are instrumented with tracing spans and events.
To silence library logs in production:
use EnvFilter;
fmt
.with_env_filter
.init;
Stack
| Layer | Implementation |
|---|---|
| SIP Signaling | Built-in (message parsing, digest auth, transactions, UDP/TCP/TLS) |
| RTP / SRTP / SRTCP | Built-in (std::net::UdpSocket, AES_CM_128_HMAC_SHA1_80, replay protection) |
| G.711 / G.722 | Built-in (PCMU, PCMA, G.722 ADPCM) |
| G.729 | g729-sys (optional, g729-codec feature, pure Rust) |
| Opus | opus (optional, opus-codec feature, libopus FFI) |
| H.264 / VP8 | Built-in packetizer/depacketizer (RFC 6184, RFC 7741) |
| RTCP | Built-in (RFC 3550 SR/RR + PLI/FIR) |
| Jitter Buffer | Built-in |
| STUN | Built-in (RFC 5389) |
| TURN | Built-in (RFC 5766) |
| ICE-Lite | Built-in (RFC 8445 §2.5) |
| TUI (sipcli) | ratatui + cpal |
No external SIP or RTP crate dependencies — the entire protocol stack is implemented from scratch.
Roadmap
- DTLS-SRTP key exchange (WebRTC interop)
- Full ICE (connectivity checks, nomination)
Changelog
See CHANGELOG.md.
License
MIT