adsb-anomaly 0.2.2

# ADS-B Aircraft Anomaly Detection System
## Product Requirements Document (PRD)

---

## Vision Statement
**Build a robust, real-time system that monitors local aircraft activity via ADS-B and alerts on genuinely anomalous behavior using machine learning.**

---

## Problem Statement

### The Challenge
Aircraft in controlled airspace occasionally exhibit unusual behavior that may indicate:
- Equipment malfunctions
- Security threats (spoofing, jamming)
- Emergency situations
- Unauthorized or illegal activity
- Test/research aircraft operations

Traditional flight tracking focuses on position and altitude, but misses subtle behavioral anomalies in transmission patterns, signal characteristics, and identity management.

### Market Reality
- **ADS-B receivers** (like PiAware) are affordable and widely deployed
- **Data is inherently partial** - aircraft transmit different message types at different intervals
- **No known open-source system** performs ML-based anomaly detection on real-world partial ADS-B data
- **Security community** needs tools to detect spoofing and injection attacks

---

## Success Criteria

### Primary Goals
1. **Detect 5-20 genuine anomalies per day** in a typical metropolitan area
2. **<20% false positive rate** to avoid alert fatigue
3. **<30 second detection latency** from anomalous behavior to alert
4. **90%+ aircraft coverage** - analyze the vast majority of detected aircraft

### Key Results
- **Threat Detection**: Catch spoofed aircraft, signal injection, equipment testing
- **Safety Monitoring**: Flag aircraft with equipment malfunctions or unusual patterns
- **Research Value**: Generate insights into local air traffic behavioral patterns
- **Operational Reliability**: 24/7 operation with minimal maintenance

---

## User Stories

### Primary User: Security Researcher
- "As a security researcher, I want to detect ADS-B spoofing attempts so I can study attack patterns"
- "I want to identify aircraft transmitting suspicious callsigns or hex codes"
- "I need to monitor for signal injection attacks near airports"

### Secondary User: Aviation Enthusiast
- "As an aviation enthusiast, I want to spot unusual aircraft in my area"
- "I want to identify military or government aircraft that might be interesting"
- "I want alerts when aircraft exhibit unusual transmission patterns"

### Tertiary User: Safety Monitor
- "As a safety monitor, I want to detect aircraft with equipment malfunctions"
- "I need to identify aircraft that might be in emergency situations"
- "I want to track aircraft that deviate from normal behavioral patterns"

---

## Core Functionality

### Essential Features (MVP)
1. **Real-time ADS-B ingestion** from PiAware or similar receivers
2. **Behavioral pattern analysis** using machine learning
3. **Multi-tier anomaly detection** working with partial/incomplete data
4. **Configurable alerting** with confidence scoring
5. **Web dashboard** for monitoring and investigation

### Important Features (V1.1)
1. **Historical analysis** - detect patterns over time
2. **Alert correlation** - link multiple anomalous behaviors from same aircraft
3. **Geographic filtering** - focus on specific areas of interest
4. **Export capabilities** - data export for external analysis
5. **API access** - programmatic access to alerts and data

### Nice-to-Have Features (V2.0)
1. **Multi-receiver support** - combine data from multiple ADS-B receivers
2. **Threat intelligence integration** - cross-reference with known bad actors
3. **Predictive analysis** - forecast unusual activity periods
4. **Mobile alerts** - push notifications to mobile devices
5. **Community sharing** - share anomaly patterns with other researchers

---

## Technical Architecture

### Data Sources
**Primary**: ADS-B receiver (PiAware) JSON feed
- Aircraft hex IDs (always available)
- Flight callsigns (intermittent)
- Position data (sporadic)
- Altitude/speed (rare)
- Signal strength/timing (usually available)

**Secondary**: External enrichment (optional)
- Aircraft registration databases
- Airline/operator information
- Airport reference data

### Detection Methodology

#### Tier 1: Temporal Analysis (High Coverage)
**Data Required**: Message timestamps, frequency
**Detects**: Transmission timing anomalies
- Rapid-fire transmission (test equipment)
- Irregular intervals (malfunctioning transponders)
- Burst patterns (signal generators)
- Long silence periods followed by activity
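
The Tier 1 checks listed here need nothing beyond per-aircraft timestamps. The following is a minimal sketch rather than the project's implementation: `TemporalStats`, `temporal_stats`, and `is_temporal_anomaly` are hypothetical names, timestamps are assumed to be Unix milliseconds in ascending order, and the thresholds are passed in by the caller.

```rust
/// Timing summary for one aircraft (hypothetical helper type).
#[derive(Debug, PartialEq)]
pub struct TemporalStats {
    pub messages_per_second: f64,
    pub min_interval_ms: i64,
    pub max_interval_ms: i64,
}

/// Computes timing statistics from ascending Unix-millisecond timestamps.
/// Returns `None` when no rate can be computed (fewer than two messages,
/// or a non-positive time span).
pub fn temporal_stats(ts_ms: &[i64]) -> Option<TemporalStats> {
    if ts_ms.len() < 2 {
        return None;
    }
    let intervals: Vec<i64> = ts_ms.windows(2).map(|w| w[1] - w[0]).collect();
    let span_ms = ts_ms[ts_ms.len() - 1] - ts_ms[0];
    if span_ms <= 0 {
        return None;
    }
    Some(TemporalStats {
        messages_per_second: intervals.len() as f64 * 1000.0 / span_ms as f64,
        min_interval_ms: *intervals.iter().min()?,
        max_interval_ms: *intervals.iter().max()?,
    })
}

/// Flags rapid-fire transmission or a long silence followed by activity.
pub fn is_temporal_anomaly(s: &TemporalStats, max_rate: f64, max_gap_ms: i64) -> bool {
    s.messages_per_second > max_rate || s.max_interval_ms > max_gap_ms
}
```

Burst patterns and irregular intervals would need a distribution-aware check on top of these summary values; this sketch only covers the two threshold-style cases.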

#### Tier 2: Signal Analysis (Medium Coverage)
**Data Required**: RSSI, signal characteristics
**Detects**: Signal-based anomalies
- Impossible signal strengths (physics violations)
- Unusually strong signals (nearby equipment/spoofing)
- Signal pattern inconsistencies
- Multi-path or interference signatures
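
As a rough illustration of the Tier 2 checks, the sketch below classifies a single RSSI reading against caller-supplied bounds and measures the spread of recent readings. The names `SignalVerdict`, `classify_rssi`, and `rssi_std_dev` are hypothetical, and the bounds are assumptions that would need calibrating against what a given receiver actually reports.

```rust
/// Outcome of a single-reading signal check (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum SignalVerdict {
    Normal,
    Suspicious,
    Impossible,
}

/// Classifies one RSSI reading.
/// `min`/`max` bound the physically plausible range; readings at or above
/// `suspicious` (but still within range) suggest a very close transmitter.
pub fn classify_rssi(rssi: f64, min: f64, max: f64, suspicious: f64) -> SignalVerdict {
    if !rssi.is_finite() || rssi < min || rssi > max {
        SignalVerdict::Impossible
    } else if rssi >= suspicious {
        SignalVerdict::Suspicious
    } else {
        SignalVerdict::Normal
    }
}

/// Population standard deviation of recent readings, usable as a simple
/// "signal pattern inconsistency" feature. Returns `None` for empty input.
pub fn rssi_std_dev(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let variance = samples.iter().map(|s| (s - mean).powi(2)).sum::<f64>() / n;
    Some(variance.sqrt())
}
```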

#### Tier 3: Identity Analysis (Medium Coverage)
**Data Required**: Hex IDs, callsigns
**Detects**: Identity-based anomalies
- Suspicious callsign patterns (TEST123, FAKE001)
- Invalid hex code patterns (000000, FFFFFF)
- Identity switching (same aircraft, different callsigns)
- Inconsistent registration data
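
The identity checks can be approximated with plain string handling. In this sketch the function names are hypothetical, the blocklist and prefixes are supplied by the caller, and simple prefix matching stands in for full regular expressions to keep the example dependency-free.

```rust
/// True when `hex` is not six hexadecimal digits or matches a blocklisted value.
pub fn is_invalid_hex(hex: &str, blocklist: &[&str]) -> bool {
    let h = hex.trim().to_ascii_uppercase();
    let well_formed = h.len() == 6 && h.chars().all(|c| c.is_ascii_hexdigit());
    !well_formed || blocklist.iter().any(|b| b.eq_ignore_ascii_case(h.as_str()))
}

/// True when the trimmed callsign starts with any suspicious prefix
/// (case-insensitive). Empty callsigns are never flagged.
pub fn is_suspicious_callsign(callsign: &str, prefixes: &[&str]) -> bool {
    let c = callsign.trim().to_ascii_uppercase();
    !c.is_empty()
        && prefixes
            .iter()
            .any(|p| c.starts_with(p.to_ascii_uppercase().as_str()))
}

/// Counts how often the callsign changes between consecutive non-empty
/// observations for the same aircraft (identity switching).
pub fn callsign_switches(history: &[&str]) -> usize {
    let seen: Vec<&str> = history
        .iter()
        .map(|c| c.trim())
        .filter(|c| !c.is_empty())
        .collect();
    seen.windows(2).filter(|w| w[0] != w[1]).count()
}
```

Checking registration consistency would additionally need one of the external enrichment sources and is left out here.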

#### Tier 4: Behavioral Analysis (Low Coverage)
**Data Required**: Position, altitude, speed (when available)
**Detects**: Flight behavior anomalies
- Unusual flight paths or altitudes
- Impossible speed/altitude changes
- Geographic outliers
- Restricted area violations
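
One way to catch "impossible speed" cases is to compute the speed implied by two consecutive position fixes. The sketch below uses the haversine great-circle distance; `Fix`, `implied_speed_knots`, and `is_impossible_jump` are hypothetical names, and the maximum plausible speed is an assumption chosen by the caller.

```rust
/// Mean Earth radius in nautical miles.
const EARTH_RADIUS_NM: f64 = 3440.065;

/// Great-circle distance between two lat/lon points (degrees), in nautical miles.
pub fn haversine_nm(lat1: f64, lon1: f64, lat2: f64, lon2: f64) -> f64 {
    let (phi1, phi2) = (lat1.to_radians(), lat2.to_radians());
    let d_phi = (lat2 - lat1).to_radians();
    let d_lambda = (lon2 - lon1).to_radians();
    let a = (d_phi / 2.0).sin().powi(2)
        + phi1.cos() * phi2.cos() * (d_lambda / 2.0).sin().powi(2);
    // Clamp guards against rounding pushing the argument slightly above 1.
    2.0 * EARTH_RADIUS_NM * a.sqrt().min(1.0).asin()
}

/// One position report (hypothetical type); `ts_ms` is Unix milliseconds.
#[derive(Debug, Clone, Copy)]
pub struct Fix {
    pub lat: f64,
    pub lon: f64,
    pub ts_ms: i64,
}

/// Ground speed implied by moving from `a` to `b`, in knots.
/// Returns `None` when `b` is not strictly later than `a`.
pub fn implied_speed_knots(a: Fix, b: Fix) -> Option<f64> {
    let dt_hours = (b.ts_ms - a.ts_ms) as f64 / 3_600_000.0;
    if dt_hours <= 0.0 {
        return None;
    }
    Some(haversine_nm(a.lat, a.lon, b.lat, b.lon) / dt_hours)
}

/// True when the implied speed exceeds `max_knots`.
pub fn is_impossible_jump(a: Fix, b: Fix, max_knots: f64) -> bool {
    implied_speed_knots(a, b).map_or(false, |v| v > max_knots)
}
```

Because position data is sporadic, a check like this only applies to sessions with at least two fixes, which matches the tier's low coverage.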

### System Components

#### Ingestion Engine
- Real-time ADS-B data consumption
- Data validation and normalization
- Message deduplication
- Session management (group messages by aircraft)
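
Deduplication and session grouping can be illustrated with two in-memory maps. This is a sketch under assumptions: `Msg`, `Session`, and `SessionTable` are hypothetical types, and a message is treated as a duplicate when the same aircraft reports the same millisecond timestamp twice.

```rust
use std::collections::{HashMap, HashSet};

/// Minimal message shape for this example (hypothetical).
#[derive(Debug, Clone)]
pub struct Msg {
    pub hex: String,
    pub ts_ms: i64,
}

/// Per-aircraft session summary (hypothetical).
#[derive(Debug, PartialEq)]
pub struct Session {
    pub first_seen: i64,
    pub last_seen: i64,
    pub message_count: u64,
}

#[derive(Default)]
pub struct SessionTable {
    seen: HashSet<(String, i64)>,
    sessions: HashMap<String, Session>,
}

impl SessionTable {
    pub fn new() -> Self {
        Self::default()
    }

    /// Normalizes the hex ID, drops duplicates, and updates the session.
    /// Returns `false` when the message was a duplicate.
    pub fn ingest(&mut self, msg: &Msg) -> bool {
        let hex = msg.hex.trim().to_ascii_uppercase();
        if !self.seen.insert((hex.clone(), msg.ts_ms)) {
            return false;
        }
        let s = self.sessions.entry(hex).or_insert(Session {
            first_seen: msg.ts_ms,
            last_seen: msg.ts_ms,
            message_count: 0,
        });
        s.first_seen = s.first_seen.min(msg.ts_ms);
        s.last_seen = s.last_seen.max(msg.ts_ms);
        s.message_count += 1;
        true
    }

    /// Looks up a session by hex ID (case-insensitive).
    pub fn session(&self, hex: &str) -> Option<&Session> {
        self.sessions.get(&hex.trim().to_ascii_uppercase())
    }
}
```

The `seen` set grows without bound in this sketch; a long-running service would need to expire old entries.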

#### Analysis Engine
- Multi-tier ML anomaly detection
- Real-time scoring and classification
- Confidence calculation
- Historical baseline learning

#### Alert Engine
- Configurable alerting thresholds
- Alert aggregation and deduplication
- Multiple notification channels
- Alert lifecycle management
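
Alert deduplication and rate limiting might look like the following sketch. `AlertGate` is a hypothetical type: it suppresses repeats of the same (aircraft, anomaly type) pair within a cooldown window and enforces a rolling hourly cap. The cooldown is an assumption introduced for this example.

```rust
use std::collections::{HashMap, VecDeque};

const HOUR_MS: i64 = 3_600_000;

/// Decides whether an alert should be sent (hypothetical helper).
pub struct AlertGate {
    cooldown_ms: i64,
    max_per_hour: usize,
    last_sent: HashMap<(String, String), i64>,
    sent_times: VecDeque<i64>,
}

impl AlertGate {
    pub fn new(cooldown_ms: i64, max_per_hour: usize) -> Self {
        Self {
            cooldown_ms,
            max_per_hour,
            last_sent: HashMap::new(),
            sent_times: VecDeque::new(),
        }
    }

    /// Returns `true` and records the alert if it passes both the hourly cap
    /// and the per-(aircraft, type) cooldown. `now_ms` must not go backwards.
    pub fn should_send(&mut self, hex: &str, kind: &str, now_ms: i64) -> bool {
        // Forget sends that fell out of the rolling one-hour window.
        while let Some(&t) = self.sent_times.front() {
            if now_ms - t >= HOUR_MS {
                self.sent_times.pop_front();
            } else {
                break;
            }
        }
        if self.sent_times.len() >= self.max_per_hour {
            return false;
        }
        let key = (hex.to_string(), kind.to_string());
        if let Some(&last) = self.last_sent.get(&key) {
            if now_ms - last < self.cooldown_ms {
                return false;
            }
        }
        self.last_sent.insert(key, now_ms);
        self.sent_times.push_back(now_ms);
        true
    }
}
```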

#### Dashboard & API
- Real-time monitoring interface
- Historical analysis tools
- Alert investigation capabilities
- RESTful API for integration

### Technology Stack
- **Language**: Python (rapid development, rich ML ecosystem)
- **Database**: Time-series optimized (SQLite initially, PostgreSQL+TimescaleDB for scale)
- **ML Framework**: scikit-learn, pandas (proven, stable)
- **Web Framework**: FastAPI + React or Streamlit (depending on complexity needs)
- **Deployment**: Docker containers with docker-compose

---

## Anomaly Examples

### High-Confidence Anomalies (Auto-Alert)
- **Rapid Transmission**: Aircraft broadcasting >10 messages/second
- **Impossible Physics**: RSSI values outside -120 to -10 dBm range
- **Test Callsigns**: Aircraft with obviously fake identifiers
- **Hex Spoofing**: Common fake hex patterns or duplicates

### Medium-Confidence Anomalies (Review Queue)
- **Timing Irregularities**: Message intervals outside normal distribution
- **Strong Signals**: RSSI suggesting very close proximity to receiver
- **Identity Inconsistencies**: Callsign format violations or unusual patterns
- **Behavioral Outliers**: Flight patterns significantly different from baseline

### Low-Confidence Anomalies (Monitor)
- **Subtle Timing Variations**: Minor deviations from expected patterns
- **Signal Fluctuations**: Unusual RSSI variance or patterns
- **Geographic Anomalies**: Aircraft in less common but not impossible locations
- **Equipment Signatures**: Patterns suggesting specific transponder types or ages
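
The three confidence bands map naturally onto a small routing function. In this sketch `Disposition` and `route` are hypothetical names and both cut-off values are assumptions supplied by the caller.

```rust
/// Where an anomaly goes after scoring (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Disposition {
    AutoAlert,
    ReviewQueue,
    Monitor,
}

/// Routes an anomaly by confidence in [0, 1].
/// `auto_alert_at` and `review_at` are tunable cut-offs with
/// `review_at <= auto_alert_at`.
pub fn route(confidence: f64, auto_alert_at: f64, review_at: f64) -> Disposition {
    let c = confidence.clamp(0.0, 1.0);
    if c >= auto_alert_at {
        Disposition::AutoAlert
    } else if c >= review_at {
        Disposition::ReviewQueue
    } else {
        Disposition::Monitor
    }
}
```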

---

## Performance Requirements

### Scalability
- **Throughput**: Handle 1000+ aircraft messages per second
- **Storage**: Retain 30 days of detailed data, 1 year of aggregated data
- **Analysis Speed**: Process incoming messages within 5 seconds
- **Concurrent Users**: Support 10+ simultaneous dashboard users

### Reliability
- **Uptime**: 99%+ availability (brief maintenance windows acceptable)
- **Data Integrity**: <0.1% message loss rate
- **False Positive Rate**: <20% of alerts should be false alarms
- **Response Time**: Web dashboard responsive within 2 seconds

### Resource Usage
- **Memory**: Operate effectively on 4GB+ RAM systems
- **Storage**: <100GB storage for 30 days typical metropolitan area
- **CPU**: Efficient enough for Raspberry Pi 4 or equivalent
- **Network**: Minimal bandwidth usage for ADS-B ingestion

---

## Security & Privacy

### Security Considerations
- **Input Validation**: Robust validation of ADS-B data (untrusted source)
- **Alert Security**: Alerts themselves could be used for intelligence gathering
- **Access Control**: Dashboard and API should require authentication
- **Data Protection**: Local storage by default, optional remote backup

### Privacy Considerations
- **Aircraft Privacy**: Some aircraft operators may prefer not to be tracked
- **Data Retention**: Clear policies on how long to retain aircraft data
- **Sharing Controls**: User control over what data (if any) is shared externally
- **Anonymization**: Option to anonymize aircraft identifiers in stored data

---

## Success Metrics & KPIs

### Detection Effectiveness
- **True Positive Rate**: % of real anomalies successfully detected
- **False Positive Rate**: % of alerts that are false alarms
- **Detection Latency**: Average time from anomaly occurrence to alert
- **Coverage Rate**: % of detected aircraft that are analyzed for anomalies
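
These rates can be computed from reviewed alert counts. The sketch below follows the definitions in this list, so the "false positive rate" is the share of alerts that turn out to be false alarms. `ReviewCounts` and the function names are hypothetical.

```rust
/// Counts gathered from manual alert review (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub struct ReviewCounts {
    pub true_alerts: u64,      // alerts confirmed as real anomalies
    pub false_alerts: u64,     // alerts judged to be false alarms
    pub missed_anomalies: u64, // real anomalies that raised no alert
}

/// Share of real anomalies that were detected: TP / (TP + FN).
pub fn true_positive_rate(c: ReviewCounts) -> Option<f64> {
    let total = c.true_alerts + c.missed_anomalies;
    (total > 0).then(|| c.true_alerts as f64 / total as f64)
}

/// Share of alerts that were false alarms: FP / (TP + FP).
pub fn false_alarm_share(c: ReviewCounts) -> Option<f64> {
    let total = c.true_alerts + c.false_alerts;
    (total > 0).then(|| c.false_alerts as f64 / total as f64)
}

/// Share of detected aircraft that were analyzed.
pub fn coverage_rate(analyzed: u64, detected: u64) -> Option<f64> {
    (detected > 0).then(|| analyzed as f64 / detected as f64)
}
```

Each function returns `None` when its denominator is zero, so an empty review period is not reported as a perfect score.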

### Operational Metrics
- **System Uptime**: % of time system is operational and processing data
- **Data Quality**: % of ingested messages that are valid and processed
- **User Engagement**: Dashboard usage, alert review rates, user retention
- **Performance**: Message processing rate, storage efficiency, response times

### Research Value
- **Anomaly Diversity**: Number of different anomaly types detected
- **Pattern Discovery**: New behavioral patterns discovered over time
- **Community Value**: Usage by security researchers and aviation enthusiasts
- **Data Export**: Volume and frequency of data exports for external research

---

## Risk Assessment

### Technical Risks
- **False Positive Flood**: Too many alerts overwhelm users → Conservative thresholds initially
- **Performance Bottlenecks**: ML processing can't keep up → Asynchronous processing design
- **Data Quality Issues**: Bad ADS-B data causes incorrect alerts → Input validation and sanity checking

### Operational Risks
- **Alert Fatigue**: Users ignore alerts due to too many false positives → Confidence scoring and tuning
- **Missed Threats**: Real anomalies go undetected → Multiple detection approaches, human oversight
- **Privacy Concerns**: Aircraft operators object to monitoring → Clear privacy policies and opt-outs

### Business Risks
- **Limited Interest**: Fewer users than expected → Focus on core security researcher community first
- **Regulatory Issues**: Aviation authorities restrict or regulate the system → Compliance research and engagement
- **Technical Obsolescence**: ADS-B protocols change or are supplemented → Modular design for adaptability

---

## Go-to-Market Strategy

### Phase 1: Security Research Community (Months 1-3)
- **Target**: Security researchers studying ADS-B vulnerabilities
- **Distribution**: GitHub open source release, security conference demos
- **Success**: 50+ GitHub stars, 5+ security research papers citing the tool

### Phase 2: Aviation Enthusiast Community (Months 4-6)
- **Target**: Aviation hobbyists and spotters
- **Distribution**: Aviation forums, Reddit communities, maker communities
- **Success**: 500+ installations, active community contributions

### Phase 3: Professional/Commercial (Months 7-12)
- **Target**: Airport security, aviation safety organizations
- **Distribution**: Professional networks, industry conferences
- **Success**: 5+ professional installations, possible commercial licensing

---

## Development Timeline

### Month 1: Foundation
- [ ] Core ADS-B ingestion and session management
- [ ] Basic temporal anomaly detection (Tier 1)
- [ ] Simple alerting system
- [ ] Minimal dashboard for monitoring

### Month 2: Detection Enhancement
- [ ] Signal-based anomaly detection (Tier 2)
- [ ] Identity pattern analysis (Tier 3)
- [ ] Alert confidence scoring and filtering
- [ ] Enhanced dashboard with investigation tools

### Month 3: Polish & Release
- [ ] Behavioral analysis for position data (Tier 4)
- [ ] Performance optimization and testing
- [ ] Documentation and deployment guides
- [ ] Open source release and community building

### Months 4-6: Community & Enhancement
- [ ] User feedback integration
- [ ] Additional anomaly detection patterns
- [ ] API development for integrations
- [ ] Multi-receiver support planning

---

## Theoretical Architecture

### Language Choice: Rust vs Python

#### Recommendation: **Rust**

**Why Rust over Python?**

| Factor | Rust ✅ | Python ❌ |
|--------|---------|-----------|
| **Performance** | Native speed, zero-cost abstractions | Interpreted, GIL limitations |
| **Deployment** | Single static binary | Complex dependency management |
| **Type & Memory Safety** | Compile-time guarantees | Many errors surface only at runtime |
| **Concurrency** | Excellent async/await, no GIL | Limited by GIL, complex threading |
| **Resource Usage** | Low memory footprint | Higher memory overhead |
| **24/7 Reliability** | Memory safety rules out a class of crashes (panics still need handling) | Unhandled exceptions can crash the service |

**ML Ecosystem in Rust** (Rapidly Improving):
- **linfa**: scikit-learn-inspired ML toolkit for Rust
- **smartcore**: ML algorithms with good performance
- **candle**: PyTorch-like tensor operations
- **polars**: DataFrame operations (often faster than pandas)

**Trade-offs**:
- **Learning Curve**: Rust is harder initially, but Doctor Biz is smart
- **Development Speed**: Slightly slower initially, much faster once proficient
- **Debugging**: Excellent tooling with `cargo` and built-in testing

### System Architecture: Single Binary Design

```
┌─────────────────────────────────────────────────────────────┐
│                    adsb-anomaly (Single Rust Binary)        │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌──────────────┐  ┌─────────────────────┐ │
│  │   Ingestion │  │   Analysis   │  │   Alert & Web       │ │
│  │   Service   │  │   Engine     │  │   Dashboard         │ │
│  │             │  │              │  │                     │ │
│  │ • HTTP Poll │  │ • Multi-Tier │  │ • Axum Web Server   │ │
│  │ • Validate  │  │   ML Models  │  │ • HTMX Frontend     │ │
│  │ • Sessions  │  │ • Async Proc │  │ • WebSocket Alerts  │ │
│  └─────────────┘  └──────────────┘  └─────────────────────┘ │
│                                                             │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │              SQLite Database (adsb.db)                  │ │
│  │  • aircraft_messages    • aircraft_sessions             │ │
│  │  • anomaly_detections  • system_metrics                │ │
│  └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```

### Core Technical Decisions

#### 1. Single Binary Architecture
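```rust
// main.rs - Single binary with async tasks
use anyhow::Result; // assumed error type; any `Result` alias works
use sqlx::SqlitePool;
use tokio::{signal, sync::mpsc};

#[tokio::main]
async fn main() -> Result<()> {
    // Open the database (create it if missing); migrations would run here
    let db = SqlitePool::connect("sqlite://adsb.db?mode=rwc").await?;

    // Channel carrying quick-check alerts from ingestion to analysis
    let (tx, rx) = mpsc::channel(1000);

    // Start concurrent services
    tokio::spawn(ingestion_service(db.clone(), tx));
    tokio::spawn(analysis_service(db.clone(), rx));
    tokio::spawn(web_service(db.clone(), 8080));

    // Wait for shutdown signal
    signal::ctrl_c().await?;
    Ok(())
}
```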

**Benefits**:
- **Zero Docker**: Just run `./adsb-anomaly --config config.toml`
- **Easy Deployment**: Single file to deploy
- **Resource Efficient**: Shared memory, no IPC overhead
- **Fast Development**: `cargo run` and you're running

#### 2. SQLite for Simplicity
```sql
-- Simple, embedded, reliable
CREATE TABLE aircraft_messages (
    id INTEGER PRIMARY KEY,
    ts INTEGER NOT NULL,           -- Unix timestamp ms
    hex TEXT NOT NULL,             -- Aircraft ICAO code
    flight TEXT,                   -- Callsign (nullable)
    lat REAL, lon REAL,           -- Position (often null)
    altitude INTEGER,              -- Feet (often null)
    speed INTEGER,                 -- Knots (often null)
    rssi REAL,                     -- Signal strength
    raw_json TEXT                  -- Full message for debugging
);

CREATE INDEX idx_messages_ts_hex ON aircraft_messages(ts, hex);
CREATE INDEX idx_messages_hex_ts ON aircraft_messages(hex, ts);

CREATE TABLE aircraft_sessions (
    id INTEGER PRIMARY KEY,
    hex TEXT NOT NULL UNIQUE,      -- One active session per aircraft
    first_seen INTEGER NOT NULL,
    last_seen INTEGER NOT NULL,
    message_count INTEGER DEFAULT 1,
    -- Data availability flags
    has_position BOOLEAN DEFAULT FALSE,
    has_altitude BOOLEAN DEFAULT FALSE,
    has_callsign BOOLEAN DEFAULT FALSE,
    -- Latest known values
    flight TEXT,
    lat REAL, lon REAL, altitude INTEGER, speed INTEGER,
    -- Analysis tiers this session supports
    tier_temporal BOOLEAN DEFAULT FALSE,
    tier_signal BOOLEAN DEFAULT FALSE,
    tier_identity BOOLEAN DEFAULT FALSE,
    tier_behavioral BOOLEAN DEFAULT FALSE
);

CREATE TABLE anomaly_detections (
    id INTEGER PRIMARY KEY,
    ts INTEGER NOT NULL,
    hex TEXT NOT NULL,
    anomaly_type TEXT NOT NULL,    -- "temporal", "signal", "identity", "behavioral"
    confidence REAL NOT NULL,      -- 0.0 to 1.0
    details TEXT,                  -- JSON with specific anomaly info
    reviewed BOOLEAN DEFAULT FALSE
);
```
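
The `tier_*` columns in `aircraft_sessions` could be derived from the data-availability flags. The mapping below is one possible interpretation, not a rule stated in this document: `Availability`, `TierFlags`, and `tier_flags` are hypothetical names, and the minimum of three messages for temporal analysis is an assumption.

```rust
/// What data a session has accumulated so far (hypothetical type).
#[derive(Debug, Default, Clone, Copy)]
pub struct Availability {
    pub message_count: u64,
    pub has_rssi: bool,
    pub has_callsign: bool,
    pub has_position: bool,
}

/// Which analysis tiers a session can support (mirrors the `tier_*` columns).
#[derive(Debug, PartialEq)]
pub struct TierFlags {
    pub temporal: bool,
    pub signal: bool,
    pub identity: bool,
    pub behavioral: bool,
}

/// Derives tier support from data availability.
pub fn tier_flags(a: Availability) -> TierFlags {
    TierFlags {
        // Assumption: at least three messages are needed to judge timing.
        temporal: a.message_count >= 3,
        signal: a.has_rssi,
        // Assumption: identity analysis is only marked once a callsign is seen.
        identity: a.has_callsign,
        behavioral: a.has_position,
    }
}
```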

**Why SQLite?**
- **Embedded**: No separate database server to manage
- **Reliable**: ACID transactions, WAL mode for concurrency
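- **Fast**: Handles this write-heavy time-series workload well with proper indexing
- **Portable**: Database is just a file
- **Backup**: `sqlite3 adsb.db ".backup adsb_backup.db"` (a plain `cp` of a live WAL-mode database can miss or tear recent writes)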

#### 3. Rust Async Architecture
```rust
// Three main async tasks communicating via channels
use std::time::Duration;

use anyhow::Result; // assumed error type; any `Result` alias works
use sqlx::SqlitePool;
use tokio::{sync::mpsc, time::interval};

pub struct AircraftMessage {
    pub hex: String,
    pub ts: i64,
    pub flight: Option<String>,
    pub lat: Option<f64>,
    pub lon: Option<f64>,
    pub rssi: Option<f64>,
    // ... other fields
}

pub struct AnomalyAlert {
    pub hex: String,
    pub anomaly_type: AnomalyType,
    pub confidence: f64,
    pub explanation: String,
    pub timestamp: i64,
}

// Ingestion: Fetch from PiAware every second
async fn ingestion_service(
    db: SqlitePool,
    alert_tx: mpsc::Sender<AnomalyAlert>
) -> Result<()> {
    let mut interval = interval(Duration::from_secs(1));

    loop {
        interval.tick().await;

        // Fetch from PiAware
        let messages = fetch_aircraft_json().await?;

        // Store raw messages
        store_messages(&db, &messages).await?;

        // Update or create sessions
        for msg in messages {
            let session = upsert_session(&db, &msg).await?;

            // Quick anomaly check (temporal analysis)
            if let Some(alert) = check_temporal_anomaly(&session) {
                let _ = alert_tx.send(alert).await;
            }
        }
    }
}

// Analysis: Deeper ML analysis on batches
async fn analysis_service(
    db: SqlitePool,
    mut alert_rx: mpsc::Receiver<AnomalyAlert>
) -> Result<()> {
    let mut interval = interval(Duration::from_secs(30));

    loop {
        tokio::select! {
            _ = interval.tick() => {
                // Run batch analysis every 30 seconds
                run_batch_analysis(&db).await?;
            }
            Some(alert) = alert_rx.recv() => {
                // Process real-time alerts
                handle_alert(&db, alert).await?;
            }
        }
    }
}

// Web: Simple dashboard with WebSocket alerts
async fn web_service(db: SqlitePool, port: u16) -> Result<()> {
    use axum::{routing::get, Router};

    let app = Router::new()
        .route("/", get(dashboard_handler))
        .route("/api/sessions", get(sessions_api))
        .route("/api/anomalies", get(anomalies_api))
        .route("/ws", get(websocket_handler))
        .with_state(db);

    let listener = tokio::net::TcpListener::bind(
        format!("0.0.0.0:{}", port)
    ).await?;

    axum::serve(listener, app).await?;
    Ok(())
}
```

#### 4. Configuration Management
```toml
# config.toml - Simple TOML configuration
[adsb]
piaware_url = "http://192.168.1.100/dump1090-fa/data/aircraft.json"
poll_interval_ms = 1000

[database]
path = "adsb.db"
wal_mode = true
vacuum_interval_hours = 24

[analysis]
# Temporal anomaly thresholds
max_messages_per_second = 10.0
min_message_interval_ms = 50
max_session_gap_seconds = 600

# Signal anomaly thresholds (dump1090 reports `rssi` in dBFS, always negative; calibrate per receiver)
min_rssi_dbm = -120.0
max_rssi_dbm = -10.0
suspicious_rssi_dbm = -20.0

# Identity patterns (regex)
suspicious_callsigns = ["TEST.*", "FAKE.*", "ANOM.*"]
invalid_hex_patterns = ["000000", "FFFFFF", "AAAAAA"]

[alerts]
confidence_threshold = 0.7
max_alerts_per_hour = 100
webhook_url = ""  # Optional Slack/Discord webhook

[web]
port = 8080
dashboard_title = "ADS-B Anomaly Monitor"
```
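
Threshold values like these are worth sanity-checking at startup. The sketch below mirrors a few of the keys in a plain struct and validates them; `Thresholds` and `validate` are hypothetical names, the struct flattens values from the `[analysis]` and `[alerts]` sections for brevity, and actual TOML parsing (for example with a deserialization crate) is deliberately left out.

```rust
/// Subset of the configuration values used by the detectors (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct Thresholds {
    pub max_messages_per_second: f64,
    pub max_session_gap_seconds: u64,
    pub min_rssi_dbm: f64,
    pub max_rssi_dbm: f64,
    pub suspicious_rssi_dbm: f64,
    pub confidence_threshold: f64,
}

impl Default for Thresholds {
    /// Defaults copied from the example config.toml.
    fn default() -> Self {
        Self {
            max_messages_per_second: 10.0,
            max_session_gap_seconds: 600,
            min_rssi_dbm: -120.0,
            max_rssi_dbm: -10.0,
            suspicious_rssi_dbm: -20.0,
            confidence_threshold: 0.7,
        }
    }
}

impl Thresholds {
    /// Rejects combinations that would make the detectors meaningless.
    pub fn validate(&self) -> Result<(), String> {
        if !(self.max_messages_per_second > 0.0) {
            return Err("max_messages_per_second must be positive".into());
        }
        if self.min_rssi_dbm >= self.max_rssi_dbm {
            return Err("min_rssi_dbm must be below max_rssi_dbm".into());
        }
        if !(self.min_rssi_dbm..=self.max_rssi_dbm).contains(&self.suspicious_rssi_dbm) {
            return Err("suspicious_rssi_dbm must lie within the RSSI range".into());
        }
        if !(0.0..=1.0).contains(&self.confidence_threshold) {
            return Err("confidence_threshold must be between 0.0 and 1.0".into());
        }
        Ok(())
    }
}
```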

#### 5. ML Implementation Strategy
```rust
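// Use Rust ML crates for core detection
use linfa::prelude::*;
use linfa_clustering::Dbscan;
// Placeholder: smartcore does not appear to provide an isolation forest, so
// `IsolationForest` is assumed to come from a dedicated crate or a local module.
use crate::models::IsolationForest;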

pub struct MultiTierDetector {
    temporal_model: Option<IsolationForest<f64>>,
    signal_thresholds: SignalThresholds,
    identity_patterns: IdentityPatterns,
}

impl MultiTierDetector {
    pub async fn analyze_session(&mut self, session: &AircraftSession) -> Vec<AnomalyAlert> {
        let mut alerts = Vec::new();

        // Tier 1: Temporal Analysis
        if session.message_count >= 3 {
            if let Some(alert) = self.check_temporal_anomaly(session) {
                alerts.push(alert);
            }
        }

        // Tier 2: Signal Analysis
        if let Some(rssi) = session.last_rssi {
            if let Some(alert) = self.check_signal_anomaly(session, rssi) {
                alerts.push(alert);
            }
        }

        // Tier 3: Identity Analysis
        if let Some(alert) = self.check_identity_anomaly(session) {
            alerts.push(alert);
        }

        alerts
    }
}
```
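
Until a trained model is in place, a robust statistical baseline can fill the temporal slot. The sketch below is a swapped-in technique, not an isolation forest: it scores a new message interval against a learned baseline using the modified z-score (median and median absolute deviation). All function names are hypothetical, and the linear mapping from score to confidence is an assumption.

```rust
/// Median of a slice; `None` for empty input.
fn median(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let mut v = values.to_vec();
    v.sort_by(|a, b| a.total_cmp(b));
    let mid = v.len() / 2;
    Some(if v.len() % 2 == 0 {
        (v[mid - 1] + v[mid]) / 2.0
    } else {
        v[mid]
    })
}

/// Modified z-score of `x` against `baseline`:
/// 0.6745 * (x - median) / MAD.
/// Returns `None` when the baseline is empty or has zero spread.
pub fn modified_z_score(baseline: &[f64], x: f64) -> Option<f64> {
    let med = median(baseline)?;
    let deviations: Vec<f64> = baseline.iter().map(|v| (v - med).abs()).collect();
    let mad = median(&deviations)?;
    if mad == 0.0 {
        return None;
    }
    Some(0.6745 * (x - med) / mad)
}

/// Maps a score to a confidence in [0, 1]; scores at or beyond
/// `cutoff` (which must be positive) saturate at 1.0.
pub fn confidence_from_score(z: f64, cutoff: f64) -> f64 {
    (z.abs() / cutoff).min(1.0)
}
```

A cutoff around 3.5 is a common rule of thumb for the modified z-score, but the right value here would have to come from tuning against real traffic.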

### Deployment & Operations

#### Single Binary Deployment
```bash
# Development
git clone https://github.com/user/adsb-anomaly
cd adsb-anomaly
cargo run --release

# Production deployment
cargo build --release
sudo cp target/release/adsb-anomaly /usr/local/bin/
sudo mkdir -p /etc/adsb-anomaly
sudo cp config.toml /etc/adsb-anomaly/

# Systemd service (after installing the unit file)
sudo systemctl daemon-reload
sudo systemctl enable adsb-anomaly
sudo systemctl start adsb-anomaly
```

#### No Docker Required
```ini
# systemd service file: /etc/systemd/system/adsb-anomaly.service
[Unit]
Description=ADS-B Anomaly Detection System
After=network.target

[Service]
Type=simple
User=adsb
WorkingDirectory=/opt/adsb-anomaly
ExecStart=/usr/local/bin/adsb-anomaly --config /etc/adsb-anomaly/config.toml
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```

#### Resource Requirements
- **Memory**: ~50MB base + ~10MB per 1000 active aircraft
- **CPU**: Single core sufficient for typical metropolitan area
- **Storage**: ~1GB/month for typical traffic (space from expired rows is reclaimed by the scheduled `VACUUM`)
- **Network**: ~1KB/s from PiAware (minimal bandwidth)

### Development Workflow

#### Fast Iteration
```bash
# Terminal 1: Run with auto-reload (requires `cargo install cargo-watch`)
cargo watch -x 'run --release'

# Terminal 2: Test API
curl http://localhost:8080/api/sessions

# Terminal 3: Database inspection
sqlite3 adsb.db ".tables"
sqlite3 adsb.db "SELECT COUNT(*) FROM aircraft_sessions;"
```

#### Testing Strategy
```rust
// Built-in Rust testing
#[cfg(test)]
mod tests {
    use super::*;

    #[tokio::test]
    async fn test_temporal_anomaly_detection() {
        let session = create_rapid_fire_session();
        let mut detector = MultiTierDetector::new(); // `analyze_session` takes `&mut self`
        let alerts = detector.analyze_session(&session).await;

        assert_eq!(alerts.len(), 1);
        assert_eq!(alerts[0].anomaly_type, AnomalyType::Temporal);
        assert!(alerts[0].confidence > 0.8);
    }
}

// Run the tests with: cargo test
```

### Why This Architecture Wins

1. **Simplicity**: Single binary, single database file, single config file
2. **Performance**: Rust native speed, async concurrency, efficient memory usage
3. **Reliability**: Memory safety, structured error handling, automatic restarts
4. **Maintainability**: Strong typing, excellent tooling, built-in testing
5. **Deployment**: `scp` the binary and run - no container orchestration needed
6. **Monitoring**: Built-in metrics, structured logging, embedded dashboard

**This architecture delivers maximum functionality with minimum operational complexity.**

---

**This system aims to be the first open-source, ML-powered ADS-B anomaly detection platform designed for real-world partial data scenarios, filling a gap in aviation security and research tooling.**