# ADS-B Aircraft Anomaly Detection System
## Product Requirements Document (PRD)
---
## Vision Statement
**Build a robust, real-time system that monitors local aircraft activity via ADS-B and alerts on genuinely anomalous behavior using machine learning.**
---
## Problem Statement
### The Challenge
Aircraft in controlled airspace occasionally exhibit unusual behavior that may indicate:
- Equipment malfunctions
- Security threats (spoofing, jamming)
- Emergency situations
- Unauthorized or illegal activity
- Test/research aircraft operations
Traditional flight tracking focuses on position and altitude, but misses subtle behavioral anomalies in transmission patterns, signal characteristics, and identity management.
### Market Reality
- **ADS-B receivers** (like PiAware) are affordable and widely deployed
- **Data is inherently partial** - aircraft transmit different message types at different intervals
- **No widely adopted open-source tool**, to our knowledge, performs ML-based anomaly detection on real-world partial ADS-B data
- **Security community** needs tools to detect spoofing and injection attacks
---
## Success Criteria
### Primary Goals
1. **Detect 5-20 genuine anomalies per day** in a typical metropolitan area
2. **<20% false positive rate** to avoid alert fatigue
3. **<30 second detection latency** from anomalous behavior to alert
4. **90%+ aircraft coverage** - analyze the vast majority of detected aircraft
### Key Results
- **Threat Detection**: Catch spoofed aircraft, signal injection, equipment testing
- **Safety Monitoring**: Flag aircraft with equipment malfunctions or unusual patterns
- **Research Value**: Generate insights into local air traffic behavioral patterns
- **Operational Reliability**: 24/7 operation with minimal maintenance
---
## User Stories
### Primary User: Security Researcher
- "As a security researcher, I want to detect ADS-B spoofing attempts so I can study attack patterns"
- "I want to identify aircraft transmitting suspicious callsigns or hex codes"
- "I need to monitor for signal injection attacks near airports"
### Secondary User: Aviation Enthusiast
- "As an aviation enthusiast, I want to spot unusual aircraft in my area"
- "I want to identify military or government aircraft that might be interesting"
- "I want alerts when aircraft exhibit unusual transmission patterns"
### Tertiary User: Safety Monitor
- "As a safety monitor, I want to detect aircraft with equipment malfunctions"
- "I need to identify aircraft that might be in emergency situations"
- "I want to track aircraft that deviate from normal behavioral patterns"
---
## Core Functionality
### Essential Features (MVP)
1. **Real-time ADS-B ingestion** from PiAware or similar receivers
2. **Behavioral pattern analysis** using machine learning
3. **Multi-tier anomaly detection** working with partial/incomplete data
4. **Configurable alerting** with confidence scoring
5. **Web dashboard** for monitoring and investigation
### Important Features (V1.1)
1. **Historical analysis** - detect patterns over time
2. **Alert correlation** - link multiple anomalous behaviors from same aircraft
3. **Geographic filtering** - focus on specific areas of interest
4. **Export capabilities** - data export for external analysis
5. **API access** - programmatic access to alerts and data
### Nice-to-Have Features (V2.0)
1. **Multi-receiver support** - combine data from multiple ADS-B receivers
2. **Threat intelligence integration** - cross-reference with known bad actors
3. **Predictive analysis** - forecast unusual activity periods
4. **Mobile alerts** - push notifications to mobile devices
5. **Community sharing** - share anomaly patterns with other researchers
---
## Technical Architecture
### Data Sources
**Primary**: ADS-B receiver (PiAware) JSON feed
- Aircraft hex IDs (always available)
- Flight callsigns (intermittent)
- Position data (sporadic)
- Altitude/speed (rare)
- Signal strength/timing (usually available)
**Secondary**: External enrichment (optional)
- Aircraft registration databases
- Airline/operator information
- Airport reference data
### Detection Methodology
#### Tier 1: Temporal Analysis (High Coverage)
**Data Required**: Message timestamps, frequency
**Detects**: Transmission timing anomalies
- Rapid-fire transmission (test equipment)
- Irregular intervals (malfunctioning transponders)
- Burst patterns (signal generators)
- Long silence periods followed by activity
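A minimal sketch of the Tier 1 checks, assuming each aircraft's message arrival times are available as ascending millisecond timestamps. The names `TemporalStats`, `temporal_stats`, `is_rapid_fire` and `has_long_silence` are hypothetical, and the thresholds are passed in rather than fixed.

```rust
/// Summary of inter-arrival times for one aircraft (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct TemporalStats {
    /// Message intervals per second over the observed span.
    pub rate_per_sec: f64,
    pub mean_gap_ms: f64,
    pub max_gap_ms: f64,
    /// Coefficient of variation of the gaps (std dev / mean);
    /// near 0 for clockwork-regular traffic, large for bursty traffic.
    pub gap_cv: f64,
}

/// Computes stats from ascending message timestamps in milliseconds.
/// Returns None with fewer than three messages or a zero time span.
pub fn temporal_stats(ts_ms: &[i64]) -> Option<TemporalStats> {
    if ts_ms.len() < 3 {
        return None;
    }
    let gaps: Vec<f64> = ts_ms.windows(2).map(|w| (w[1] - w[0]) as f64).collect();
    let span_ms = (ts_ms[ts_ms.len() - 1] - ts_ms[0]) as f64;
    if span_ms <= 0.0 {
        return None;
    }
    let n = gaps.len() as f64;
    let mean = gaps.iter().sum::<f64>() / n;
    let var = gaps.iter().map(|g| (g - mean).powi(2)).sum::<f64>() / n;
    let max_gap = gaps.iter().cloned().fold(f64::MIN, f64::max);
    Some(TemporalStats {
        rate_per_sec: n * 1000.0 / span_ms,
        mean_gap_ms: mean,
        max_gap_ms: max_gap,
        gap_cv: var.sqrt() / mean,
    })
}

/// Rapid-fire transmission: rate above the configured ceiling.
pub fn is_rapid_fire(stats: &TemporalStats, max_rate_per_sec: f64) -> bool {
    stats.rate_per_sec > max_rate_per_sec
}

/// Long silence followed by activity: any single gap above the limit.
pub fn has_long_silence(stats: &TemporalStats, max_gap_ms: f64) -> bool {
    stats.max_gap_ms > max_gap_ms
}
```

For example, five messages spaced 50 ms apart give a rate of 20 per second, which `is_rapid_fire` flags against a ceiling of 10. The coefficient of variation is one simple way to separate irregular or bursty senders from regular ones; an ML model could consume the same features.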
#### Tier 2: Signal Analysis (Medium Coverage)
**Data Required**: RSSI, signal characteristics
**Detects**: Signal-based anomalies
- Impossible signal strengths (physics violations)
- Unusually strong signals (nearby equipment/spoofing)
- Signal pattern inconsistencies
- Multi-path or interference signatures
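The signal checks could be sketched as a range test followed by a comparison against the aircraft's own recent readings. This is an illustration under assumptions: `SignalVerdict`, `SignalThresholds` and `classify_rssi` are hypothetical names, the z-score test stands in for whatever model is eventually chosen, and the bounds must match the scale the receiver actually reports.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SignalVerdict {
    Normal,
    /// Outside the plausible range configured for the receiver.
    OutOfRange,
    /// Plausible, but strong enough to suggest a very close transmitter.
    SuspiciouslyStrong,
    /// Far from this aircraft's own recent readings.
    Outlier,
}

/// Hypothetical threshold set; values come from configuration.
pub struct SignalThresholds {
    pub min_rssi: f64,
    pub max_rssi: f64,
    pub suspicious_rssi: f64,
    pub max_z_score: f64,
}

/// Classifies one RSSI reading against fixed bounds and recent history.
pub fn classify_rssi(rssi: f64, history: &[f64], t: &SignalThresholds) -> SignalVerdict {
    if !rssi.is_finite() || rssi < t.min_rssi || rssi > t.max_rssi {
        return SignalVerdict::OutOfRange;
    }
    if rssi >= t.suspicious_rssi {
        return SignalVerdict::SuspiciouslyStrong;
    }
    // Only compare against history once a few readings exist.
    if history.len() >= 5 {
        let n = history.len() as f64;
        let mean = history.iter().sum::<f64>() / n;
        let std = (history.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n).sqrt();
        if std > 0.0 && ((rssi - mean) / std).abs() > t.max_z_score {
            return SignalVerdict::Outlier;
        }
    }
    SignalVerdict::Normal
}
```

With bounds of -120 and -10 and a suspicious level of -20, a reading of -15 is reported as `SuspiciouslyStrong`, while a jump from a steady -60 to -40 is reported as `Outlier`.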
#### Tier 3: Identity Analysis (Medium Coverage)
**Data Required**: Hex IDs, callsigns
**Detects**: Identity-based anomalies
- Suspicious callsign patterns (TEST123, FAKE001)
- Invalid hex code patterns (000000, FFFFFF)
- Identity switching (same aircraft, different callsigns)
- Inconsistent registration data
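A rough sketch of the identity checks using only the standard library. All names here (`IdentityIssue`, `check_hex`, `check_callsign`, `CallsignTracker`) are hypothetical. The repeated-digit rule follows the example patterns listed above and is a heuristic: an address such as AAAAAA could in principle be a legitimately assigned one, so a hit is a reason to look closer, not proof of spoofing.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum IdentityIssue {
    MalformedHex,
    PlaceholderHex,
    MalformedCallsign,
    SuspiciousCallsign,
}

/// ICAO 24-bit addresses are six hex digits; one digit repeated six
/// times (000000, FFFFFF) is treated here as a placeholder pattern.
pub fn check_hex(hex: &str) -> Option<IdentityIssue> {
    let h = hex.trim();
    if h.len() != 6 || !h.chars().all(|c| c.is_ascii_hexdigit()) {
        return Some(IdentityIssue::MalformedHex);
    }
    let first = h.as_bytes()[0].to_ascii_uppercase();
    if h.bytes().all(|b| b.to_ascii_uppercase() == first) {
        return Some(IdentityIssue::PlaceholderHex);
    }
    None
}

/// Callsigns are at most eight characters; after trimming padding only
/// letters and digits are accepted in this sketch.
pub fn check_callsign(callsign: &str, suspicious_prefixes: &[&str]) -> Option<IdentityIssue> {
    let cs = callsign.trim().to_ascii_uppercase();
    if cs.is_empty() || cs.len() > 8 || !cs.chars().all(|c| c.is_ascii_alphanumeric()) {
        return Some(IdentityIssue::MalformedCallsign);
    }
    if suspicious_prefixes.iter().any(|p| cs.starts_with(*p)) {
        return Some(IdentityIssue::SuspiciousCallsign);
    }
    None
}

/// Tracks which callsigns each hex has used, to spot identity switching.
pub struct CallsignTracker {
    seen: HashMap<String, HashSet<String>>,
}

impl CallsignTracker {
    pub fn new() -> Self {
        CallsignTracker { seen: HashMap::new() }
    }

    /// Records a callsign and returns how many distinct callsigns
    /// this hex has used so far.
    pub fn observe(&mut self, hex: &str, callsign: &str) -> usize {
        let set = self.seen.entry(hex.trim().to_ascii_uppercase()).or_default();
        set.insert(callsign.trim().to_ascii_uppercase());
        set.len()
    }
}
```

A count above 1 from `observe` is not anomalous by itself, since aircraft change callsigns between flights; it becomes interesting when the switch happens within a single session.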
#### Tier 4: Behavioral Analysis (Low Coverage)
**Data Required**: Position, altitude, speed (when available)
**Detects**: Flight behavior anomalies
- Unusual flight paths or altitudes
- Impossible speed/altitude changes
- Geographic outliers
- Restricted area violations
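One way to test for impossible speed or altitude changes is to compare two consecutive position fixes and derive the ground speed and climb rate they imply. The sketch below uses the haversine great-circle distance; `Fix`, the function names and the limits passed to `is_impossible` are illustrative assumptions, not values from this document.

```rust
/// Mean Earth radius in nautical miles (6371 km / 1.852).
const EARTH_RADIUS_NM: f64 = 3440.065;

/// One position report (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub struct Fix {
    pub ts_ms: i64,
    pub lat: f64,
    pub lon: f64,
    pub alt_ft: f64,
}

/// Great-circle distance between two points in nautical miles.
pub fn haversine_nm(lat1: f64, lon1: f64, lat2: f64, lon2: f64) -> f64 {
    let dphi = (lat2 - lat1).to_radians();
    let dlam = (lon2 - lon1).to_radians();
    let a = (dphi / 2.0).sin().powi(2)
        + lat1.to_radians().cos() * lat2.to_radians().cos() * (dlam / 2.0).sin().powi(2);
    2.0 * EARTH_RADIUS_NM * a.sqrt().min(1.0).asin()
}

/// Elapsed hours from `a` to `b`; None unless `b` is strictly later.
fn hours_between(a: &Fix, b: &Fix) -> Option<f64> {
    let dt_ms = b.ts_ms - a.ts_ms;
    if dt_ms <= 0 {
        None
    } else {
        Some(dt_ms as f64 / 3_600_000.0)
    }
}

/// Ground speed in knots implied by moving from `a` to `b`.
pub fn implied_speed_kt(a: &Fix, b: &Fix) -> Option<f64> {
    let h = hours_between(a, b)?;
    Some(haversine_nm(a.lat, a.lon, b.lat, b.lon) / h)
}

/// Climb (positive) or descent (negative) rate in feet per minute.
pub fn implied_climb_fpm(a: &Fix, b: &Fix) -> Option<f64> {
    let h = hours_between(a, b)?;
    Some((b.alt_ft - a.alt_ft) / (h * 60.0))
}

/// True when either implied rate exceeds the caller's limits.
pub fn is_impossible(a: &Fix, b: &Fix, max_speed_kt: f64, max_climb_fpm: f64) -> bool {
    let too_fast = implied_speed_kt(a, b).map_or(false, |v| v > max_speed_kt);
    let too_steep = implied_climb_fpm(a, b).map_or(false, |c| c.abs() > max_climb_fpm);
    too_fast || too_steep
}
```

Two fixes one degree of latitude apart (about 60 nm) and 60 seconds apart imply roughly 3,600 knots, which any realistic limit rejects. Because position reports are sparse in this data, the check only applies to sessions that have at least two fixes.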
### System Components
#### Ingestion Engine
- Real-time ADS-B data consumption
- Data validation and normalization
- Message deduplication
- Session management (group messages by aircraft)
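Deduplication and session grouping can be sketched together. This assumes the receiver is polled as a snapshot and that each aircraft entry carries a cumulative message counter, as the `messages` field in dump1090-style `aircraft.json` does; the types `Observation`, `Session` and `SessionTable` are hypothetical.

```rust
use std::collections::HashMap;

/// One aircraft entry taken from a polled snapshot.
#[derive(Debug, Clone)]
pub struct Observation {
    pub hex: String,
    pub ts_ms: i64,
    /// Cumulative message counter reported by the receiver.
    pub messages: u64,
    pub flight: Option<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Session {
    pub first_seen: i64,
    pub last_seen: i64,
    pub message_count: u64,
    pub last_counter: u64,
    pub flight: Option<String>,
}

pub struct SessionTable {
    sessions: HashMap<String, Session>,
    max_gap_ms: i64,
}

impl SessionTable {
    pub fn new(max_gap_ms: i64) -> Self {
        SessionTable { sessions: HashMap::new(), max_gap_ms }
    }

    /// Folds an observation into its session and returns the number of
    /// new messages it represents (0 for a duplicate snapshot).
    pub fn ingest(&mut self, obs: &Observation) -> u64 {
        let key = obs.hex.trim().to_ascii_lowercase();
        let max_gap = self.max_gap_ms;
        if let Some(s) = self.sessions.get_mut(&key) {
            let continues =
                obs.ts_ms - s.last_seen <= max_gap && obs.messages >= s.last_counter;
            if continues {
                let new = obs.messages - s.last_counter;
                if new > 0 {
                    s.last_seen = obs.ts_ms;
                    s.message_count += new;
                    s.last_counter = obs.messages;
                    if obs.flight.is_some() {
                        s.flight = obs.flight.clone();
                    }
                }
                return new;
            }
        }
        // New aircraft, gap too long, or counter reset: start a fresh session.
        self.sessions.insert(
            key,
            Session {
                first_seen: obs.ts_ms,
                last_seen: obs.ts_ms,
                message_count: 1,
                last_counter: obs.messages,
                flight: obs.flight.clone(),
            },
        );
        1
    }

    pub fn get(&self, hex: &str) -> Option<&Session> {
        self.sessions.get(&hex.trim().to_ascii_lowercase())
    }
}
```

Dividing the value returned by `ingest` by the poll interval gives a per-aircraft message rate even though the feed is polled only once per second.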
#### Analysis Engine
- Multi-tier ML anomaly detection
- Real-time scoring and classification
- Confidence calculation
- Historical baseline learning
#### Alert Engine
- Configurable alerting thresholds
- Alert aggregation and deduplication
- Multiple notification channels
- Alert lifecycle management
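Thresholding, deduplication and rate limiting can share one small gate in front of the notification channels. The following is a sketch under assumptions: `AlertGate` and `GateDecision` are hypothetical, and the per-aircraft cooldown is an illustrative parameter that this document does not define.

```rust
use std::collections::{HashMap, VecDeque};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GateDecision {
    Send,
    BelowThreshold,
    Duplicate,
    RateLimited,
}

pub struct AlertGate {
    confidence_threshold: f64,
    max_alerts_per_hour: usize,
    /// Minimum time between alerts of one type for one aircraft.
    cooldown_ms: i64,
    last_sent: HashMap<(String, String), i64>,
    sent_times: VecDeque<i64>,
}

impl AlertGate {
    pub fn new(confidence_threshold: f64, max_alerts_per_hour: usize, cooldown_ms: i64) -> Self {
        AlertGate {
            confidence_threshold,
            max_alerts_per_hour,
            cooldown_ms,
            last_sent: HashMap::new(),
            sent_times: VecDeque::new(),
        }
    }

    /// Decides whether an alert should be delivered at time `now_ms`.
    pub fn decide(&mut self, hex: &str, anomaly_type: &str, confidence: f64, now_ms: i64) -> GateDecision {
        if confidence < self.confidence_threshold {
            return GateDecision::BelowThreshold;
        }
        let key = (hex.to_string(), anomaly_type.to_string());
        if let Some(&last) = self.last_sent.get(&key) {
            if now_ms - last < self.cooldown_ms {
                return GateDecision::Duplicate;
            }
        }
        // Forget deliveries older than one hour before counting.
        while let Some(&t) = self.sent_times.front() {
            if now_ms - t >= 3_600_000 {
                self.sent_times.pop_front();
            } else {
                break;
            }
        }
        if self.sent_times.len() >= self.max_alerts_per_hour {
            return GateDecision::RateLimited;
        }
        self.sent_times.push_back(now_ms);
        self.last_sent.insert(key, now_ms);
        GateDecision::Send
    }
}
```

Suppressed alerts would still be stored for review; the gate only governs what is pushed to users.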
#### Dashboard & API
- Real-time monitoring interface
- Historical analysis tools
- Alert investigation capabilities
- RESTful API for integration
### Technology Stack
- **Language**: Python (rapid development, rich ML ecosystem)
- **Database**: Time-series optimized (SQLite initially, PostgreSQL+TimescaleDB for scale)
- **ML Framework**: scikit-learn, pandas (proven, stable)
- **Web Framework**: FastAPI + React or Streamlit (depending on complexity needs)
- **Deployment**: Docker containers with docker-compose
---
## Anomaly Examples
### High-Confidence Anomalies (Auto-Alert)
- **Rapid Transmission**: Aircraft broadcasting >10 messages/second
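- **Impossible Physics**: RSSI values outside the configured plausible range (default -120 to -10 dBm; note that dump1090-fa reports `rssi` in dBFS, which is always negative, so the bounds need calibrating per receiver)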
- **Test Callsigns**: Aircraft with obviously fake identifiers
- **Hex Spoofing**: Common fake hex patterns or duplicates
### Medium-Confidence Anomalies (Review Queue)
- **Timing Irregularities**: Message intervals outside normal distribution
- **Strong Signals**: RSSI suggesting very close proximity to receiver
- **Identity Inconsistencies**: Callsign format violations or unusual patterns
- **Behavioral Outliers**: Flight patterns significantly different from baseline
### Low-Confidence Anomalies (Monitor)
- **Subtle Timing Variations**: Minor deviations from expected patterns
- **Signal Fluctuations**: Unusual RSSI variance or patterns
- **Geographic Anomalies**: Aircraft in less common but not impossible locations
- **Equipment Signatures**: Patterns suggesting specific transponder types or ages
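The three confidence bands could be driven by a single score per aircraft. The sketch below combines per-tier confidences with a noisy-OR rule and maps the result to a band. Both the combination rule (which assumes the tiers are independent) and the cutoff values are assumptions for illustration; `Disposition`, `combine_confidences` and `route` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Disposition {
    AutoAlert,
    ReviewQueue,
    Monitor,
}

/// Noisy-OR combination: the chance that at least one tier is right,
/// treating tier confidences as independent probabilities.
pub fn combine_confidences(scores: &[f64]) -> f64 {
    let miss: f64 = scores.iter().map(|c| 1.0 - c.clamp(0.0, 1.0)).product();
    1.0 - miss
}

/// Maps a confidence to a handling band using caller-supplied cutoffs.
pub fn route(confidence: f64, auto_alert_min: f64, review_min: f64) -> Disposition {
    if confidence >= auto_alert_min {
        Disposition::AutoAlert
    } else if confidence >= review_min {
        Disposition::ReviewQueue
    } else {
        Disposition::Monitor
    }
}
```

Two independent tiers at 0.5 each combine to 0.75, so an aircraft that looks mildly odd on several tiers can still reach the review queue.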
---
## Performance Requirements
### Scalability
- **Throughput**: Handle 1000+ aircraft messages per second
- **Storage**: Retain 30 days of detailed data, 1 year of aggregated data
- **Analysis Speed**: Process incoming messages within 5 seconds
- **Concurrent Users**: Support 10+ simultaneous dashboard users
### Reliability
- **Uptime**: 99%+ availability (brief maintenance windows acceptable)
- **Data Integrity**: <0.1% message loss rate
- **False Positive Rate**: <20% of alerts are false alarms (at least 80% should be genuine anomalies)
- **Response Time**: Web dashboard responsive within 2 seconds
### Resource Usage
- **Memory**: Operate effectively on 4GB+ RAM systems
- **Storage**: <100GB storage for 30 days typical metropolitan area
- **CPU**: Efficient enough for Raspberry Pi 4 or equivalent
- **Network**: Minimal bandwidth usage for ADS-B ingestion
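The throughput and storage targets interact, and a quick calculation shows how. The helper below is a hypothetical back-of-envelope function; it assumes the message rate is sustained for the whole retention window, which is a worst case rather than typical traffic.

```rust
/// Total messages if `msgs_per_sec` were sustained for `retention_days`.
/// Returns None on arithmetic overflow.
pub fn total_messages(msgs_per_sec: u64, retention_days: u64) -> Option<u64> {
    msgs_per_sec.checked_mul(86_400)?.checked_mul(retention_days)
}

/// Whole bytes available per stored message within `storage_bytes`.
/// Returns None when no messages are expected or on overflow.
pub fn bytes_per_message_budget(
    storage_bytes: u64,
    msgs_per_sec: u64,
    retention_days: u64,
) -> Option<u64> {
    let total = total_messages(msgs_per_sec, retention_days)?;
    if total == 0 {
        return None;
    }
    Some(storage_bytes / total)
}
```

At a sustained 1,000 messages per second, 30 days is about 2.6 billion messages, which leaves roughly 38 bytes each within 100 GB. At 50 messages per second the budget rises to about 771 bytes. Storing the full raw JSON per message is therefore only compatible with the storage target at typical rates well below the peak throughput figure, or with aggregation.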
---
## Security & Privacy
### Security Considerations
- **Input Validation**: Robust validation of ADS-B data (untrusted source)
- **Alert Security**: Alerts themselves could be used for intelligence gathering
- **Access Control**: Dashboard and API should require authentication
- **Data Protection**: Local storage by default, optional remote backup
### Privacy Considerations
- **Aircraft Privacy**: Some aircraft operators may prefer not to be tracked
- **Data Retention**: Clear policies on how long to retain aircraft data
- **Sharing Controls**: User control over what data (if any) is shared externally
- **Anonymization**: Option to anonymize aircraft identifiers in stored data
---
## Success Metrics & KPIs
### Detection Effectiveness
- **True Positive Rate**: % of real anomalies successfully detected
- **False Positive Rate**: % of alerts that are false alarms
- **Detection Latency**: Average time from anomaly occurrence to alert
- **Coverage Rate**: % of detected aircraft that are analyzed for anomalies
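These rates can be computed from reviewed alert counts. A minimal sketch, assuming alerts are labeled during review and that missed anomalies are known from some external source; `ReviewCounts` and its methods are hypothetical names. The false-alarm figure follows the definition above, the share of alerts that are false alarms.

```rust
#[derive(Debug, Clone, Copy, Default)]
pub struct ReviewCounts {
    /// Alerts confirmed as genuine anomalies.
    pub true_positives: u64,
    /// Alerts judged to be false alarms.
    pub false_positives: u64,
    /// Known anomalies that raised no alert.
    pub false_negatives: u64,
}

/// Safe ratio that yields None for an empty denominator.
fn ratio(num: u64, den: u64) -> Option<f64> {
    if den == 0 {
        None
    } else {
        Some(num as f64 / den as f64)
    }
}

impl ReviewCounts {
    /// Share of real anomalies that were detected: TP / (TP + FN).
    pub fn true_positive_rate(&self) -> Option<f64> {
        ratio(self.true_positives, self.true_positives + self.false_negatives)
    }

    /// Share of alerts that were false alarms: FP / (TP + FP).
    pub fn false_alarm_share(&self) -> Option<f64> {
        ratio(self.false_positives, self.true_positives + self.false_positives)
    }
}

/// Share of detected aircraft that were analyzed by at least one tier.
pub fn coverage_rate(analyzed: u64, detected: u64) -> Option<f64> {
    ratio(analyzed, detected)
}
```

With 9 confirmed alerts, 1 false alarm and 3 missed anomalies, the true positive rate is 0.75 and the false-alarm share is 0.10, which would meet the <20% target.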
### Operational Metrics
- **System Uptime**: % of time system is operational and processing data
- **Data Quality**: % of ingested messages that are valid and processed
- **User Engagement**: Dashboard usage, alert review rates, user retention
- **Performance**: Message processing rate, storage efficiency, response times
### Research Value
- **Anomaly Diversity**: Number of different anomaly types detected
- **Pattern Discovery**: New behavioral patterns discovered over time
- **Community Value**: Usage by security researchers and aviation enthusiasts
- **Data Export**: Volume and frequency of data exports for external research
---
## Risk Assessment
### Technical Risks
- **False Positive Flood**: Too many alerts overwhelm users → Conservative thresholds initially
- **Performance Bottlenecks**: ML processing can't keep up → Asynchronous processing design
- **Data Quality Issues**: Bad ADS-B data causes incorrect alerts → Input validation and sanity checking
### Operational Risks
- **Alert Fatigue**: Users ignore alerts due to too many false positives → Confidence scoring and tuning
- **Missed Threats**: Real anomalies go undetected → Multiple detection approaches, human oversight
- **Privacy Concerns**: Aircraft operators object to monitoring → Clear privacy policies and opt-outs
### Business Risks
- **Limited Interest**: Fewer users than expected → Focus on core security researcher community first
- **Regulatory Issues**: Aviation authorities restrict or regulate the system → Compliance research and engagement
- **Technical Obsolescence**: ADS-B protocols change or are supplemented → Modular design for adaptability
---
## Go-to-Market Strategy
### Phase 1: Security Research Community (Months 1-3)
- **Target**: Security researchers studying ADS-B vulnerabilities
- **Distribution**: GitHub open source release, security conference demos
- **Success**: 50+ GitHub stars, 5+ security research papers citing the tool
### Phase 2: Aviation Enthusiast Community (Months 4-6)
- **Target**: Aviation hobbyists and spotters
- **Distribution**: Aviation forums, Reddit communities, maker communities
- **Success**: 500+ installations, active community contributions
### Phase 3: Professional/Commercial (Months 7-12)
- **Target**: Airport security, aviation safety organizations
- **Distribution**: Professional networks, industry conferences
- **Success**: 5+ professional installations, possible commercial licensing
---
## Development Timeline
### Month 1: Foundation
- [ ] Core ADS-B ingestion and session management
- [ ] Basic temporal anomaly detection (Tier 1)
- [ ] Simple alerting system
- [ ] Minimal dashboard for monitoring
### Month 2: Detection Enhancement
- [ ] Signal-based anomaly detection (Tier 2)
- [ ] Identity pattern analysis (Tier 3)
- [ ] Alert confidence scoring and filtering
- [ ] Enhanced dashboard with investigation tools
### Month 3: Polish & Release
- [ ] Behavioral analysis for position data (Tier 4)
- [ ] Performance optimization and testing
- [ ] Documentation and deployment guides
- [ ] Open source release and community building
### Months 4-6: Community & Enhancement
- [ ] User feedback integration
- [ ] Additional anomaly detection patterns
- [ ] API development for integrations
- [ ] Multi-receiver support planning
---
## Theoretical Architecture
### Language Choice: Rust vs Python
#### Recommendation: **Rust**
**Why Rust over Python?**
| Aspect | Rust | Python |
|--------|------|--------|
| **Performance** | Native speed, zero-cost abstractions | Interpreted, GIL limitations |
| **Deployment** | Single static binary | Complex dependency management |
| **Memory Safety** | Compile-time guarantees | Runtime errors possible |
| **Concurrency** | Excellent async/await, no GIL | Limited by GIL, complex threading |
| **Resource Usage** | Low memory footprint | Higher memory overhead |
| **24/7 Reliability** | Memory safety prevents crashes | Runtime errors can crash service |
**ML Ecosystem in Rust** (Rapidly Improving):
- **linfa**: Toolkit of classical ML algorithms, modeled on scikit-learn
- **smartcore**: ML algorithms with good performance
- **candle**: Tensor and neural-network framework with a PyTorch-like API
- **polars**: DataFrame operations (often faster than pandas)
**Trade-offs**:
- **Learning Curve**: Rust is harder initially, but Doctor Biz is smart
- **Development Speed**: Slightly slower initially, much faster once proficient
- **Debugging**: Excellent tooling with `cargo` and built-in testing
### System Architecture: Single Binary Design
```
┌─────────────────────────────────────────────────────────────┐
│              adsb-anomaly (Single Rust Binary)              │
├─────────────────────────────────────────────────────────────┤
│ ┌─────────────┐  ┌───────────────┐  ┌─────────────────────┐ │
│ │ Ingestion   │  │ Analysis      │  │ Alert & Web         │ │
│ │ Service     │  │ Engine        │  │ Dashboard           │ │
│ │             │  │               │  │                     │ │
│ │ • HTTP Poll │  │ • Multi-Tier  │  │ • Axum Web Server   │ │
│ │ • Validate  │  │   ML Models   │  │ • HTMX Frontend     │ │
│ │ • Sessions  │  │ • Async Proc  │  │ • WebSocket Alerts  │ │
│ └─────────────┘  └───────────────┘  └─────────────────────┘ │
│                                                             │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │                SQLite Database (adsb.db)                │ │
│ │  • aircraft_messages         • aircraft_sessions        │ │
│ │  • anomaly_detections        • system_metrics           │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```
### Core Technical Decisions
#### 1. Single Binary Architecture
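```rust
// main.rs - Single binary with async tasks
use anyhow::Result; // assumed error type; any Result<T> alias works
use sqlx::SqlitePool;
use tokio::{signal, sync::mpsc};

#[tokio::main]
async fn main() -> Result<()> {
    // Open (or create) the database; schema migrations would run here
    let db = SqlitePool::connect("sqlite://adsb.db?mode=rwc").await?;

    // Start concurrent services; the channel carries real-time alerts
    // from the ingestion task to the analysis task
    let (tx, rx) = mpsc::channel(1000);
    tokio::spawn(ingestion_service(db.clone(), tx));
    tokio::spawn(analysis_service(db.clone(), rx));
    tokio::spawn(web_service(db.clone(), 8080));

    // Wait for shutdown signal
    signal::ctrl_c().await?;
    Ok(())
}
```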
**Benefits**:
- **Zero Docker**: Just run `./adsb-anomaly --config config.toml`
- **Easy Deployment**: Single file to deploy
- **Resource Efficient**: Shared memory, no IPC overhead
- **Fast Development**: `cargo run` and you're running
#### 2. SQLite for Simplicity
```sql
-- Simple, embedded, reliable
CREATE TABLE aircraft_messages (
    id INTEGER PRIMARY KEY,
    ts INTEGER NOT NULL,      -- Unix timestamp ms
    hex TEXT NOT NULL,        -- Aircraft ICAO code
    flight TEXT,              -- Callsign (nullable)
    lat REAL,                 -- Position (often null)
    lon REAL,
    altitude INTEGER,         -- Feet (often null)
    speed INTEGER,            -- Knots (often null)
    rssi REAL,                -- Signal strength
    raw_json TEXT             -- Full message for debugging
);

CREATE INDEX idx_messages_ts_hex ON aircraft_messages(ts, hex);
CREATE INDEX idx_messages_hex_ts ON aircraft_messages(hex, ts);

CREATE TABLE aircraft_sessions (
    id INTEGER PRIMARY KEY,
    hex TEXT NOT NULL UNIQUE, -- One active session per aircraft
    first_seen INTEGER NOT NULL,
    last_seen INTEGER NOT NULL,
    message_count INTEGER DEFAULT 1,
    -- Data availability flags
    has_position BOOLEAN DEFAULT FALSE,
    has_altitude BOOLEAN DEFAULT FALSE,
    has_callsign BOOLEAN DEFAULT FALSE,
    -- Latest known values
    flight TEXT,
    lat REAL,
    lon REAL,
    altitude INTEGER,
    speed INTEGER,
    last_rssi REAL,           -- Most recent signal strength (read by Tier 2)
    -- Analysis tiers this session supports
    tier_temporal BOOLEAN DEFAULT FALSE,
    tier_signal BOOLEAN DEFAULT FALSE,
    tier_identity BOOLEAN DEFAULT FALSE,
    tier_behavioral BOOLEAN DEFAULT FALSE
);

CREATE TABLE anomaly_detections (
    id INTEGER PRIMARY KEY,
    ts INTEGER NOT NULL,
    hex TEXT NOT NULL,
    anomaly_type TEXT NOT NULL, -- "temporal", "signal", "identity", "behavioral"
    confidence REAL NOT NULL,   -- 0.0 to 1.0
    details TEXT,               -- JSON with specific anomaly info
    reviewed BOOLEAN DEFAULT FALSE
);
```
**Why SQLite?**
- **Embedded**: No separate database server to manage
- **Reliable**: ACID transactions, WAL mode for concurrency
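- **Fast**: Handles append-heavy time-series data well when indexed on timestamp and aircraft hex
- **Portable**: Database is just a file
- **Backup**: `sqlite3 adsb.db ".backup adsb_backup.db"` (safe while the service is running; a plain `cp` can miss data still held in the WAL file)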
#### 3. Rust Async Architecture
```rust
// Three main async tasks communicating via channels
use std::time::Duration;

use anyhow::Result; // assumed error type; any Result<T> alias works
use sqlx::SqlitePool;
use tokio::{sync::mpsc, time::interval};

#[derive(Debug, Clone)]
pub struct AircraftMessage {
    pub hex: String,
    pub ts: i64,
    pub flight: Option<String>,
    pub lat: Option<f64>,
    pub lon: Option<f64>,
    pub rssi: Option<f64>,
    // ... other fields
}

// Mirrors the anomaly_type column in anomaly_detections
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AnomalyType {
    Temporal,
    Signal,
    Identity,
    Behavioral,
}

#[derive(Debug, Clone)]
pub struct AnomalyAlert {
    pub hex: String,
    pub anomaly_type: AnomalyType,
    pub confidence: f64,
    pub explanation: String,
    pub timestamp: i64,
}

// Ingestion: Fetch from PiAware every second
async fn ingestion_service(
    db: SqlitePool,
    alert_tx: mpsc::Sender<AnomalyAlert>,
) -> Result<()> {
    let mut ticker = interval(Duration::from_secs(1));
    loop {
        ticker.tick().await;

        // Fetch from PiAware
        let messages = fetch_aircraft_json().await?;

        // Store raw messages
        store_messages(&db, &messages).await?;

        // Update or create sessions
        for msg in messages {
            let session = upsert_session(&db, &msg).await?;

            // Quick anomaly check (temporal analysis)
            if let Some(alert) = check_temporal_anomaly(&session) {
                // A send error only means the receiver has shut down
                let _ = alert_tx.send(alert).await;
            }
        }
    }
}

// Analysis: Deeper ML analysis on batches
async fn analysis_service(
    db: SqlitePool,
    mut alert_rx: mpsc::Receiver<AnomalyAlert>,
) -> Result<()> {
    let mut ticker = interval(Duration::from_secs(30));
    loop {
        tokio::select! {
            _ = ticker.tick() => {
                // Run batch analysis every 30 seconds
                run_batch_analysis(&db).await?;
            }
            Some(alert) = alert_rx.recv() => {
                // Process real-time alerts
                handle_alert(&db, alert).await?;
            }
        }
    }
}

// Web: Simple dashboard with WebSocket alerts
async fn web_service(db: SqlitePool, port: u16) -> Result<()> {
    use axum::{routing::get, Router};

    let app = Router::new()
        .route("/", get(dashboard_handler))
        .route("/api/sessions", get(sessions_api))
        .route("/api/anomalies", get(anomalies_api))
        .route("/ws", get(websocket_handler))
        .with_state(db);

    let listener = tokio::net::TcpListener::bind(
        format!("0.0.0.0:{}", port)
    ).await?;
    axum::serve(listener, app).await?;
    Ok(())
}
```
#### 4. Configuration Management
```toml
# config.toml - Simple TOML configuration
[adsb]
piaware_url = "http://192.168.1.100/dump1090-fa/data/aircraft.json"
poll_interval_ms = 1000
[database]
path = "adsb.db"
wal_mode = true
vacuum_interval_hours = 24
[analysis]
# Temporal anomaly thresholds
max_messages_per_second = 10.0
min_message_interval_ms = 50
max_session_gap_seconds = 600
# Signal anomaly thresholds (dump1090-fa reports rssi in dBFS, not dBm; calibrate per receiver)
min_rssi_dbm = -120.0
max_rssi_dbm = -10.0
suspicious_rssi_dbm = -20.0
# Identity patterns (regex)
suspicious_callsigns = ["TEST.*", "FAKE.*", "ANOM.*"]
invalid_hex_patterns = ["000000", "FFFFFF", "AAAAAA"]
[alerts]
confidence_threshold = 0.7
max_alerts_per_hour = 100
webhook_url = "" # Optional Slack/Discord webhook
[web]
port = 8080
dashboard_title = "ADS-B Anomaly Monitor"
```
#### 5. ML Implementation Strategy
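```rust
// Use Rust ML crates for core detection
use linfa::prelude::*;
use linfa_clustering::Dbscan;

/// Placeholder for the temporal outlier model. An isolation forest is the
/// intended technique; neither linfa nor smartcore appears to ship one, so
/// the concrete implementation (separate crate or custom) is still open.
pub trait OutlierModel: Send {
    /// Returns an anomaly score in 0.0..=1.0 for one feature vector.
    fn score(&self, features: &[f64]) -> f64;
}

pub struct MultiTierDetector {
    temporal_model: Option<Box<dyn OutlierModel>>,
    signal_thresholds: SignalThresholds,
    identity_patterns: IdentityPatterns,
}

impl MultiTierDetector {
    pub async fn analyze_session(&mut self, session: &AircraftSession) -> Vec<AnomalyAlert> {
        let mut alerts = Vec::new();

        // Tier 1: Temporal Analysis (needs at least three messages)
        if session.message_count >= 3 {
            if let Some(alert) = self.check_temporal_anomaly(session) {
                alerts.push(alert);
            }
        }

        // Tier 2: Signal Analysis (only when an RSSI reading exists)
        if let Some(rssi) = session.last_rssi {
            if let Some(alert) = self.check_signal_anomaly(session, rssi) {
                alerts.push(alert);
            }
        }

        // Tier 3: Identity Analysis
        if let Some(alert) = self.check_identity_anomaly(session) {
            alerts.push(alert);
        }

        // Tier 4 (behavioral) is not shown; it applies only to sessions
        // that have position data
        alerts
    }
}
```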
### Deployment & Operations
#### Single Binary Deployment
```bash
# Development
git clone https://github.com/user/adsb-anomaly
cd adsb-anomaly
cargo run --release

# Production deployment
cargo build --release
sudo cp target/release/adsb-anomaly /usr/local/bin/
sudo mkdir -p /etc/adsb-anomaly
sudo cp config.toml /etc/adsb-anomaly/

# Systemd service (after installing the unit file)
sudo systemctl daemon-reload
sudo systemctl enable adsb-anomaly
sudo systemctl start adsb-anomaly
```
#### No Docker Required
```ini
# systemd service file: /etc/systemd/system/adsb-anomaly.service
[Unit]
Description=ADS-B Anomaly Detection System
After=network.target
[Service]
Type=simple
User=adsb
WorkingDirectory=/opt/adsb-anomaly
ExecStart=/usr/local/bin/adsb-anomaly --config /etc/adsb-anomaly/config.toml
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
```
#### Resource Requirements
- **Memory**: ~50MB base + ~10MB per 1000 active aircraft
- **CPU**: Single core sufficient for typical metropolitan area
- **Storage**: ~1GB/month for typical traffic (space is reclaimed by the periodic `VACUUM` scheduled via `vacuum_interval_hours`)
- **Network**: ~1KB/s from PiAware (minimal bandwidth)
### Development Workflow
#### Fast Iteration
```bash
# Terminal 1: Run with auto-reload
cargo watch -x 'run --release'
# Terminal 2: Test API
curl http://localhost:8080/api/sessions
# Terminal 3: Database inspection
sqlite3 adsb.db ".tables"
sqlite3 adsb.db "SELECT COUNT(*) FROM aircraft_sessions;"
```
#### Testing Strategy
```rust
// Built-in Rust testing
#[cfg(test)]
mod tests {
    use super::*;

    #[tokio::test]
    async fn test_temporal_anomaly_detection() {
        let session = create_rapid_fire_session();
        // analyze_session takes &mut self, so the detector must be mutable
        let mut detector = MultiTierDetector::new();
        let alerts = detector.analyze_session(&session).await;

        assert_eq!(alerts.len(), 1);
        assert_eq!(alerts[0].anomaly_type, AnomalyType::Temporal);
        assert!(alerts[0].confidence > 0.8);
    }
}
// Run the tests with: cargo test
```
### Why This Architecture Wins
1. **Simplicity**: Single binary, single database file, single config file
2. **Performance**: Rust native speed, async concurrency, efficient memory usage
3. **Reliability**: Memory safety, structured error handling, automatic restarts
4. **Maintainability**: Strong typing, excellent tooling, built-in testing
5. **Deployment**: `scp` the binary and run - no container orchestration needed
6. **Monitoring**: Built-in metrics, structured logging, embedded dashboard
**This architecture delivers maximum functionality with minimum operational complexity.**
---
**This system will be the first open-source, ML-powered ADS-B anomaly detection platform designed for real-world partial data scenarios. It fills a critical gap in aviation security and research tooling.**