// hexz_server/lib.rs
#![cfg_attr(test, allow(clippy::unwrap_used, clippy::expect_used, unused_results))]

//! HTTP, NBD, and S3 gateway server implementations for exposing Hexz archives.
//!
//! This module provides network-facing interfaces for accessing compressed Hexz
//! archive data over standard protocols. It supports three distinct serving modes:
//!
//! 1. **HTTP Range Server** (`serve_http`): Exposes disk and auxiliary streams via
//!    HTTP/1.1 range requests with `DoS` protection and partial content support.
//! 2. **NBD (Network Block Device) Server** (`serve_nbd`): Allows mounting archives
//!    as Linux block devices using the standard NBD protocol.
//! 3. **S3 Gateway** (`serve_s3_gateway`): Planned S3-compatible API for cloud
//!    integration (currently unimplemented).
//!
//! # Architecture Overview
//!
//! All servers expose the same underlying `Archive` API, which provides:
//! - Block-level decompression with LRU caching
//! - Dual-stream access (disk and memory archives)
//! - Random access with minimal I/O overhead
//! - Thread-safe concurrent reads via `Arc<Archive>`
//!
//! The servers differ in protocol semantics and use cases:
//!
//! | Protocol | Use Case | Access Pattern | Authentication |
//! |----------|----------|----------------|----------------|
//! | HTTP     | Browser/API access | Range requests | None (planned) |
//! | NBD      | Linux block device mount | Block-level reads | None |
//! | S3       | Cloud integration | Object API | AWS `SigV4` (planned) |
//!
//! # Design Decisions
//!
//! ## Why HTTP Range Requests?
//!
//! HTTP range requests (RFC 7233) provide a standardized way to access large files
//! in chunks without loading the entire file into memory. This aligns perfectly with
//! Hexz's block-indexed architecture, allowing clients to fetch only the data they
//! need. The implementation:
//!
//! - Returns HTTP 206 (Partial Content) for range requests
//! - Returns HTTP 416 (Range Not Satisfiable) for invalid ranges
//! - Clamps requests to `MAX_CHUNK_SIZE` (32 MiB) to prevent memory exhaustion
//! - Supports both bounded (`bytes=0-1023`) and unbounded (`bytes=1024-`) ranges
//!
//! ## Why NBD Protocol?
//!
//! The Network Block Device protocol allows mounting remote storage as a local block
//! device on Linux systems. This enables:
//! - Transparent filesystem access (mount archive, browse files)
//! - Use of standard Linux tools (`dd`, `fsck`, `mount`)
//! - Zero application changes (existing software works unmodified)
//!
//! Trade-offs:
//! - **Pro**: Native OS integration, no special client software required
//! - **Pro**: Kernel handles caching and buffering
//! - **Con**: No built-in encryption or authentication
//! - **Con**: TCP-based, higher latency than local disk
//!
//! ## Security Architecture
//!
//! ### Current Security Posture (localhost-only)
//!
//! All servers bind to `127.0.0.1` (loopback) by default, preventing network exposure.
//! This is appropriate for:
//! - Local development and testing
//! - Forensics workstations accessing local archives
//! - Scenarios where network access is provided via SSH tunnels or VPNs
//!
//! ### Attack Surface
//!
//! The current implementation has a minimal attack surface:
//! 1. **`DoS` via large reads**: Mitigated by `MAX_CHUNK_SIZE` clamping (32 MiB)
//! 2. **Range header parsing**: Simplified parser with strict validation
//! 3. **Connection exhaustion**: Limited by OS socket limits, no artificial cap
//! 4. **Path traversal**: N/A (no filesystem access, only fixed `/disk` and `/memory` routes)
//!
//! ### Future Security Enhancements (Planned)
//!
//! - TLS/HTTPS support for encrypted transport
//! - Token-based authentication (Bearer tokens)
//! - Rate limiting per IP address
//! - Configurable bind addresses (`0.0.0.0` for network access)
//! - Request logging and audit trails
//!
//! # Performance Characteristics
//!
//! ## HTTP Server
//!
//! - **Throughput**: ~500-2000 MB/s (limited by decompression, not network)
//! - **Latency**: ~1-5 ms per request (includes decompression)
//! - **Concurrency**: Handles 1000+ concurrent connections (Tokio async runtime)
//! - **Memory**: ~100 KB per connection + block cache overhead
//!
//! ## NBD Server
//!
//! - **Throughput**: ~500-1000 MB/s (similar to HTTP, plus NBD protocol overhead)
//! - **Latency**: ~2-10 ms per block read (includes TCP RTT + decompression)
//! - **Concurrency**: One Tokio task per client connection
//!
//! ## Bottlenecks
//!
//! For local (localhost) connections, the main bottlenecks are:
//! 1. **Decompression CPU time** (80% of latency for LZ4, more for ZSTD)
//! 2. **Block cache misses** (requires backend I/O)
//! 3. **Memory allocation** for large reads (mitigated by clamping)
//!
//! Network bandwidth is rarely a bottleneck for localhost connections.
//!
//! # Examples
//!
//! ## Starting an HTTP Server
//!
//! ```no_run
//! use std::sync::Arc;
//! use hexz_core::Archive;
//! use hexz_store::local::FileBackend;
//! use hexz_core::algo::compression::lz4::Lz4Compressor;
//! use hexz_server::serve_http;
//!
//! # #[tokio::main]
//! # async fn main() -> anyhow::Result<()> {
//! let backend = Arc::new(FileBackend::new("archive.hxz".as_ref())?);
//! let compressor = Box::new(Lz4Compressor::new());
//! let snap = Archive::new(backend, compressor, None)?;
//!
//! // Start HTTP server on port 8080
//! serve_http(snap, 8080, "127.0.0.1").await?;
//! # Ok(())
//! # }
//! ```
//!
//! ## Starting an NBD Server
//!
//! ```no_run
//! use std::sync::Arc;
//! use hexz_core::Archive;
//! use hexz_store::local::FileBackend;
//! use hexz_core::algo::compression::lz4::Lz4Compressor;
//! use hexz_server::serve_nbd;
//!
//! # #[tokio::main]
//! # async fn main() -> anyhow::Result<()> {
//! let backend = Arc::new(FileBackend::new("archive.hxz".as_ref())?);
//! let compressor = Box::new(Lz4Compressor::new());
//! let snap = Archive::new(backend, compressor, None)?;
//!
//! // Start NBD server on port 10809
//! serve_nbd(snap, 10809, "127.0.0.1").await?;
//! # Ok(())
//! # }
//! ```
//!
//! ## Client Usage Examples
//!
//! ### HTTP Client (curl)
//!
//! ```bash
//! # Fetch the first 4KB of the main stream
//! curl -H "Range: bytes=0-4095" http://localhost:8080/disk -o chunk.bin
//!
//! # Fetch 1MB starting at offset 1MB
//! curl -H "Range: bytes=1048576-2097151" http://localhost:8080/memory -o mem_chunk.bin
//!
//! # Fetch from offset to EOF (server will clamp to MAX_CHUNK_SIZE)
//! curl -H "Range: bytes=1048576-" http://localhost:8080/disk
//! ```
//!
//! ### NBD Client (Linux)
//!
//! ```bash
//! # Connect NBD client to server
//! sudo nbd-client localhost 10809 /dev/nbd0
//!
//! # Mount the block device (read-only)
//! sudo mount -o ro /dev/nbd0 /mnt/archive
//!
//! # Access files normally
//! ls -la /mnt/archive
//! cat /mnt/archive/important.log
//!
//! # Disconnect when done
//! sudo umount /mnt/archive
//! sudo nbd-client -d /dev/nbd0
//! ```
//!
//! # Protocol References
//!
//! - **HTTP Range Requests**: [RFC 7233](https://tools.ietf.org/html/rfc7233)
//! - **NBD Protocol**: [NBD Protocol Specification](https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md)
//! - **S3 API**: [AWS S3 API Reference](https://docs.aws.amazon.com/s3/index.html) (future work)

pub mod nbd;

use axum::{
    Router,
    extract::State,
    http::{HeaderMap, StatusCode, header},
    response::{IntoResponse, Response},
    routing::get,
};
use hexz_core::{Archive, ArchiveStream};
use std::net::SocketAddr;
use std::sync::Arc;
use tokio::net::TcpListener;

// Security rationale for the default bind address (localhost only).
//
// All servers default to the loopback address (`127.0.0.1`) to prevent
// accidental exposure of archive data to the local network or internet.
// Archives may contain sensitive information (credentials, personal data,
// proprietary code), so network exposure must be an explicit, informed decision.
//
// ## Current Behavior
//
// All servers (HTTP, NBD, S3) bind to `127.0.0.1` by default, making them
// accessible only from the local machine. Remote access requires:
// - SSH port forwarding: `ssh -L 8080:localhost:8080 user@server`
// - VPN tunnel with local forwarding
// - Reverse proxy with authentication (e.g., nginx with TLS + basic auth)
//
// ## Future Enhancement
//
// The serve functions already accept a `bind` argument; a future version will
// additionally support configurable bind addresses via command-line flags or
// configuration files:
//
// ```bash
// # Proposed CLI syntax (not yet implemented)
// hexz-server --bind 0.0.0.0:8080 --auth-token mytoken123 archive.hxz
// ```
//
// Network exposure will require authentication to be enabled (enforced by the CLI).
/// Length in bytes of the HTTP `Range` header prefix `"bytes="`.
///
/// The HTTP Range header format is defined in RFC 7233 as:
///
/// ```text
/// Range: bytes=<start>-<end>
/// ```
///
/// This constant represents the length of the literal string `"bytes="` (6 bytes),
/// which is stripped during parsing. The parser supports:
///
/// - Bounded ranges: `bytes=0-1023` (fetch bytes 0 through 1023 inclusive)
/// - Unbounded ranges: `bytes=1024-` (fetch from byte 1024 to EOF)
/// - Single-byte ranges: `bytes=0-0` (fetch only byte 0)
///
/// Unsupported range types (will return HTTP 416):
/// - Suffix ranges: `bytes=-500` (last 500 bytes)
/// - Multi-part ranges: `bytes=0-100,200-300`
///
/// # Rationale for Limited Support
///
/// Suffix ranges and multi-part ranges are rarely used in practice and add
/// significant parsing complexity. If needed for browser compatibility, they
/// can be added in a future version without breaking existing clients.
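///
/// # Example
///
/// A minimal sketch of the bounded-range parse that this prefix supports
/// (illustrative only; the server's actual parser may differ in details):
///
/// ```
/// let header = "bytes=0-1023";
/// let spec = &header[6..]; // skip the "bytes=" prefix (RANGE_PREFIX_LEN)
/// let (start, end) = spec.split_once('-').unwrap();
/// assert_eq!(start.parse::<u64>().unwrap(), 0);
/// assert_eq!(end.parse::<u64>().unwrap(), 1023);
/// ```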
const RANGE_PREFIX_LEN: usize = 6;

/// Maximum allowed read size per HTTP request to prevent `DoS` attacks.
///
/// # Value
///
/// 32 MiB (33,554,432 bytes)
///
/// # `DoS` Protection Rationale
///
/// Without a limit, a malicious client could request the entire archive in a single
/// HTTP request (e.g., `Range: bytes=0-`), forcing the server to:
///
/// 1. Decompress gigabytes of data
/// 2. Allocate gigabytes of heap memory
/// 3. Hold that memory while slowly transmitting over the network
///
/// With multiple concurrent requests, this could exhaust server memory and CPU,
/// causing crashes or unresponsiveness (denial of service).
///
/// # Why 32 MiB?
///
/// This value balances throughput efficiency and resource protection:
///
/// - **Large enough**: Clients can fetch substantial chunks with low overhead
///   (at 1 Gbps, 32 MiB transfers in ~256 ms)
/// - **Small enough**: Even 100 concurrent maximal requests consume <3.2 GB RAM,
///   which is manageable on modern servers
/// - **Common practice**: Many HTTP servers use similar limits (nginx default: 16 MiB,
///   AWS S3 max single GET: 5 GB but recommends <100 MB for performance)
///
/// # Clamping Behavior
///
/// When a client requests more than `MAX_CHUNK_SIZE` bytes:
///
/// 1. The server clamps the end offset: `end = min(end, start + MAX_CHUNK_SIZE - 1)`
/// 2. Returns HTTP 206 with the clamped range in the `Content-Range` header
/// 3. The client sees a short read and can issue follow-up requests
///
/// Example:
///
/// ```text
/// Client request:  Range: bytes=0-67108863   (64 MiB)
/// Server response: Content-Range: bytes 0-33554431/total  (32 MiB)
/// ```
///
/// The client must check the `Content-Range` header to detect clamping.
///
/// # Future Work
///
/// This limit could be made configurable via CLI flags for scenarios where higher
/// memory usage is acceptable (e.g., dedicated forensics servers with 128+ GB RAM).
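///
/// # Example
///
/// The clamping rule above can be sketched as (illustrative; the values mirror
/// the example above):
///
/// ```
/// const MAX_CHUNK_SIZE: u64 = 32 * 1024 * 1024;
/// let (start, requested_end) = (0u64, 67_108_863u64); // client asks for 64 MiB
/// let end = requested_end.min(start + MAX_CHUNK_SIZE - 1);
/// assert_eq!(end, 33_554_431); // clamped to a 32 MiB window (inclusive end)
/// ```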
const MAX_CHUNK_SIZE: u64 = 32 * 1024 * 1024;

/// Shared application state for the HTTP serving layer.
///
/// This struct is wrapped in `Arc` and cloned for each HTTP request handler.
/// The inner `snap` field is also `Arc`-wrapped, so cloning `AppState` is cheap
/// (just incrementing reference counts, no data copying).
///
/// # Thread Safety
///
/// `AppState` is `Send + Sync` because `Archive` is `Send + Sync`. The underlying
/// block cache uses `Mutex` for interior mutability, so multiple concurrent requests
/// can safely read from the same archive.
///
/// # Memory Overhead
///
/// Each `AppState` clone adds ~16 bytes (one `Arc` pointer). With 1000 concurrent
/// connections, this overhead is negligible (~16 KB).
struct AppState {
    /// The opened Hexz archive file being served via HTTP.
    ///
    /// This is the same `Archive` instance for all requests. It contains:
    /// - The storage backend (local file, S3, etc.)
    /// - Block cache (shared across all requests)
    /// - Decompressor instances (thread-local via pooling)
    snap: Arc<Archive>,
}

/// Exposes an `Archive` over the NBD (Network Block Device) protocol.
///
/// Starts a TCP listener on `<bind>:<port>` that implements the NBD protocol,
/// allowing Linux clients to mount the Hexz archive as a local block device
/// using standard tools like `nbd-client`.
///
/// This function runs indefinitely, accepting connections in a loop. Each client
/// connection is handled in a separate Tokio task, allowing concurrent clients.
///
/// # Arguments
///
/// - `snap`: The Hexz archive file to expose. Must be wrapped in `Arc` for sharing
///   across multiple client connections.
/// - `port`: TCP port to bind to (e.g., `10809`).
/// - `bind`: Address to bind to (use `"127.0.0.1"` for localhost-only access).
///
/// # Returns
///
/// This function never returns under normal operation (it runs forever). It only
/// returns `Err` if:
/// - The TCP listener fails to bind (port already in use, permission denied)
/// - An unrecoverable I/O error occurs on the listener socket
///
/// Individual client errors (malformed requests, disconnects) are logged but do not
/// stop the server.
///
/// # Errors
///
/// - `std::io::Error`: If binding to the socket fails or the listener encounters
///   a fatal error.
///
/// # Examples
///
/// ```no_run
/// use std::sync::Arc;
/// use hexz_core::Archive;
/// use hexz_store::local::FileBackend;
/// use hexz_core::algo::compression::lz4::Lz4Compressor;
/// use hexz_server::serve_nbd;
///
/// # #[tokio::main]
/// # async fn main() -> anyhow::Result<()> {
/// let backend = Arc::new(FileBackend::new("vm_archive.hxz".as_ref())?);
/// let compressor = Box::new(Lz4Compressor::new());
/// let snap = Archive::new(backend, compressor, None)?;
///
/// // Start NBD server (runs forever)
/// serve_nbd(snap, 10809, "127.0.0.1").await?;
/// # Ok(())
/// # }
/// ```
///
/// ## Client-Side Usage (Linux)
///
/// ```bash
/// # Connect to the NBD server
/// sudo nbd-client localhost 10809 /dev/nbd0
///
/// # Mount the block device (read-only, automatically detected filesystem)
/// sudo mount -o ro /dev/nbd0 /mnt/archive
///
/// # Browse files normally
/// ls -la /mnt/archive
/// sudo cat /mnt/archive/var/log/syslog
///
/// # Unmount and disconnect
/// sudo umount /mnt/archive
/// sudo nbd-client -d /dev/nbd0
/// ```
///
/// # Security Considerations
///
/// ## No Encryption
///
/// The NBD protocol transmits data in plaintext. For localhost connections this
/// is acceptable, but for remote access consider:
///
/// - **SSH tunnel**: `ssh -L 10809:localhost:10809 user@server`
/// - **VPN**: `WireGuard`, `OpenVPN`, etc.
/// - **TLS wrapper**: `stunnel` or similar
///
/// ## No Authentication
///
/// Any process with network access to the port can connect. The default loopback
/// binding mitigates this, but if exposing to the network, use firewall rules or
/// SSH key authentication.
///
/// ## Read-Only Enforcement
///
/// The NBD server always exports archives as read-only (NBD flag `NBD_FLAG_READ_ONLY`).
/// Write attempts return `EPERM` (operation not permitted). However, a malicious
/// NBD client could theoretically attempt to crash the server via protocol abuse.
///
/// # Performance Notes
///
/// - **Concurrency**: Each client spawns a separate Tokio task. With 100 concurrent
///   clients, memory overhead is ~10 MB (100 KB per task).
/// - **Throughput**: Typically 500-1000 MB/s for sequential reads, limited by
///   decompression rather than NBD protocol overhead.
/// - **Latency**: ~2-10 ms per read, including TCP round-trip and decompression.
///
/// # Panics
///
/// This function does not panic under normal operation. Client errors are logged
/// and handled gracefully.
pub async fn serve_nbd(snap: Arc<Archive>, port: u16, bind: &str) -> anyhow::Result<()> {
    let addr: SocketAddr = format!("{bind}:{port}").parse()?;
    let listener = TcpListener::bind(addr).await?;

    tracing::info!("NBD server listening on {}", addr);
    println!("NBD server started on {addr}. Use 'nbd-client localhost {port} /dev/nbd0' to mount.");

    loop {
        // Accept incoming NBD connections
        let (socket, remote_addr) = match listener.accept().await {
            Ok(conn) => conn,
            Err(e) => {
                tracing::warn!("NBD accept error (continuing): {}", e);
                continue;
            }
        };
        tracing::debug!("Accepted NBD connection from {}", remote_addr);

        let snap_clone = snap.clone();
        _ = tokio::spawn(async move {
            if let Err(e) = nbd::handle_client(socket, snap_clone).await {
                tracing::error!("NBD client error: {}", e);
            }
        });
    }
}

/// Exposes an `Archive` as an S3-compatible object storage gateway.
///
/// # Implementation Status: NOT IMPLEMENTED
///
/// This function is a **placeholder** for future S3 API compatibility. It currently
/// blocks forever without serving any requests. Calling this function will NOT panic,
/// but it provides no useful functionality.
///
/// # Planned Functionality
///
/// When implemented, this gateway will provide S3-compatible HTTP endpoints for:
///
/// ## Supported Operations (Planned)
///
/// - `GET /<bucket>/<key>`: Retrieve archive data as an S3 object
/// - `HEAD /<bucket>/<key>`: Get object metadata (size, `ETag`)
/// - `GET /<bucket>/<key>?range=bytes=<start>-<end>`: Partial object retrieval
/// - `GET /<bucket>?list-type=2`: List objects (future: multi-archive support)
///
/// ## S3 API Compatibility Goals
///
/// - **Authentication**: AWS Signature Version 4 (`SigV4`) for production use
/// - **Authorization**: IAM-style policies (read-only by default)
/// - **Error responses**: Standard S3 XML error responses
/// - **Metadata**: `ETag` (CRC32 of archive header), Content-Type, Last-Modified
///
/// ## Mapping Hexz Concepts to S3
///
/// | Hexz Concept | S3 Equivalent | Mapping Strategy |
/// |--------------|---------------|------------------|
/// | Archive file | Bucket | One bucket per archive |
/// | Main stream | Object `disk.img` | Virtual object, synthesized from archive |
/// | Auxiliary stream | Object `memory.img` | Virtual object, synthesized from archive |
/// | Block index | N/A | Transparent to S3 clients |
///
/// ## Example S3 API Usage (Planned)
///
/// ```bash
/// # Configure AWS CLI to point to local S3 gateway
/// export AWS_ACCESS_KEY_ID=minioadmin
/// export AWS_SECRET_ACCESS_KEY=minioadmin
/// export AWS_ENDPOINT_URL=http://localhost:9000
///
/// # List buckets (archives)
/// aws s3 ls
///
/// # List objects in an archive
/// aws s3 ls s3://my-archive/
///
/// # Download the main stream
/// aws s3 cp s3://my-archive/disk.img disk_copy.img
///
/// # Download a range (100 MB starting at offset 1 GB)
/// aws s3api get-object --bucket my-archive --key disk.img \
///   --range bytes=1073741824-1178599423 chunk.bin
/// ```
///
/// # Configuration (Planned)
///
/// Future configuration options (not yet implemented):
///
/// - **Bind address**: CLI flag `--s3-bind 0.0.0.0:9000` (default: `127.0.0.1`)
/// - **Authentication**: `--s3-access-key` and `--s3-secret-key` for `SigV4`
/// - **Bucket name**: `--s3-bucket-name <name>` (default: derived from archive filename)
/// - **Anonymous access**: `--s3-allow-anonymous` flag (dangerous, for testing only)
///
/// # Why S3 Compatibility?
///
/// S3 is a de facto standard for object storage. Supporting the S3 API enables:
///
/// 1. **Cloud integration**: Use Hexz with existing cloud infrastructure (AWS, `MinIO`, etc.)
/// 2. **Tool compatibility**: Any S3-compatible tool (s3cmd, rclone, boto3) works with Hexz
/// 3. **Caching CDNs**: Front the gateway with `CloudFront` or similar for caching
/// 4. **Lifecycle policies**: Future support for automated archive expiration
///
/// # Security Considerations (Planned)
///
/// When implemented, the S3 gateway will require authentication by default:
///
/// - **`SigV4` authentication**: All requests must include valid AWS Signature V4 headers
/// - **Read-only mode**: No PUT/DELETE operations to prevent accidental modification
/// - **Rate limiting**: Per-access-key request throttling to prevent abuse
/// - **TLS requirement**: Production deployments must use HTTPS (enforced by CLI flag check)
///
/// # Performance Goals (Planned)
///
/// - **Throughput**: Match HTTP server performance (~500-2000 MB/s)
/// - **Latency**: <10 ms for authenticated requests (signature verification adds ~1-2 ms)
/// - **Concurrency**: Handle 1000+ concurrent S3 GET requests
///
/// # Limitations (Planned)
///
/// The S3 gateway will NOT support:
///
/// - **Write operations**: No PUT, POST, DELETE (archives are read-only)
/// - **Multipart uploads**: N/A for read-only gateway
/// - **Bucket policies**: Simplified IAM-like policies only
/// - **Versioning**: Archives are immutable, no object versioning needed
/// - **Server-side encryption**: Use TLS for transport encryption instead
///
/// # Arguments
///
/// - `_snap`: The Hexz archive to expose (currently unused).
/// - `port`: TCP port to bind to on the loopback interface (e.g., `9000`).
///
/// # Returns
///
/// This function never returns (blocks indefinitely on `std::future::pending()`).
/// It does not perform any useful work in the current implementation.
///
/// # Errors
///
/// Currently, this function cannot return an error (it blocks forever). In the
/// future implementation, it will return errors for:
///
/// - Socket binding failures
/// - Configuration validation errors
/// - Unrecoverable I/O errors on the listener
///
/// # Examples
///
/// ```no_run
/// use std::sync::Arc;
/// use hexz_core::Archive;
/// use hexz_store::local::FileBackend;
/// use hexz_core::algo::compression::lz4::Lz4Compressor;
/// use hexz_server::serve_s3_gateway;
///
/// # #[tokio::main]
/// # async fn main() -> anyhow::Result<()> {
/// let backend = Arc::new(FileBackend::new("archive.hxz".as_ref())?);
/// let compressor = Box::new(Lz4Compressor::new());
/// let snap = Archive::new(backend, compressor, None)?;
///
/// // WARNING: This will block forever without serving requests
/// serve_s3_gateway(snap, 9000).await?;
/// # Ok(())
/// # }
/// ```
///
/// # Implementation Roadmap
///
/// 1. **Phase 1**: Basic GET/HEAD operations with no authentication (localhost-only)
/// 2. **Phase 2**: AWS `SigV4` authentication and bucket listing
/// 3. **Phase 3**: Multi-archive support (multiple buckets)
/// 4. **Phase 4**: TLS support and network binding options
/// 5. **Phase 5**: IAM-style policies and access control
///
/// # Call for Contributions
///
/// Implementing S3 compatibility is a substantial undertaking. If you are interested
/// in contributing, see `docs/s3_gateway_design.md` (to be created) for the design
/// specification and implementation plan.
#[deprecated(note = "Not implemented. Blocks indefinitely without serving requests.")]
pub async fn serve_s3_gateway(_snap: Arc<Archive>, port: u16) -> anyhow::Result<()> {
    tracing::info!("Starting S3 Gateway on port {}", port);
    println!("S3 Gateway started on port {port} (Not fully implemented)");
    std::future::pending::<()>().await; // Keep alive
    unreachable!();
}

/// Exposes an `Archive` over HTTP with range request support.
///
/// Starts an HTTP/1.1 server on `<bind>:<port>` that exposes archive data via
/// two endpoints:
///
/// - `GET /disk`: Serves the main stream (persistent storage archive)
/// - `GET /memory`: Serves the auxiliary stream (RAM archive)
///
/// Both endpoints support HTTP range requests (RFC 7233) for partial content retrieval.
///
/// # Protocol Behavior
///
/// ## Full Content Request (No Range Header)
///
/// ```http
/// GET /disk HTTP/1.1
/// Host: localhost:8080
/// ```
///
/// Response:
///
/// ```http
/// HTTP/1.1 206 Partial Content
/// Content-Type: application/octet-stream
/// Content-Range: bytes 0-33554431/10737418240
/// Accept-Ranges: bytes
///
/// [First 32 MiB of data, clamped by MAX_CHUNK_SIZE]
/// ```
///
/// Note: Even without a `Range` header, the response is clamped to `MAX_CHUNK_SIZE`
/// and returns HTTP 206 (not 200) to indicate partial content.
///
/// ## Range Request (Partial Content)
///
/// ```http
/// GET /memory HTTP/1.1
/// Host: localhost:8080
/// Range: bytes=1048576-2097151
/// ```
///
/// Response (success):
///
/// ```http
/// HTTP/1.1 206 Partial Content
/// Content-Type: application/octet-stream
/// Content-Range: bytes 1048576-2097151/8589934592
/// Accept-Ranges: bytes
///
/// [1 MiB of data from offset 1048576]
/// ```
///
/// Response (invalid range):
///
/// ```http
/// HTTP/1.1 416 Range Not Satisfiable
/// Content-Range: bytes */8589934592
/// ```
///
/// ## Error Responses
///
/// - **416 Range Not Satisfiable**: Invalid range syntax or out-of-bounds request
/// - **500 Internal Server Error**: Backend I/O failure or decompression error
///
/// # HTTP Range Request Limitations
///
/// ## Supported Range Types
///
/// - **Bounded ranges**: `bytes=<start>-<end>` (both offsets specified)
/// - **Unbounded ranges**: `bytes=<start>-` (from start to EOF, clamped to `MAX_CHUNK_SIZE`)
///
/// ## Unsupported Range Types
///
/// These return HTTP 416 (Range Not Satisfiable):
///
/// - **Suffix ranges**: `bytes=-<suffix-length>` (e.g., `bytes=-1024` for last 1KB)
/// - **Multi-part ranges**: `bytes=0-100,200-300` (multiple ranges in one request)
///
/// Rationale: These are rarely used and add significant implementation complexity.
/// Standard range requests cover 99% of real-world use cases.
///
/// # `DoS` Protection Mechanisms
///
/// ## Request Size Clamping
///
/// All reads are clamped to `MAX_CHUNK_SIZE` (32 MiB) to prevent memory exhaustion:
///
/// ```text
/// Client requests:  bytes=0-1073741823   (1 GB)
/// Server clamps to: bytes=0-33554431     (32 MiB)
/// Response header:  Content-Range: bytes 0-33554431/total
/// ```
///
/// The client detects clamping by comparing the `Content-Range` header to the
/// requested range and can issue follow-up requests for remaining data.
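///
/// Because of this clamping, a client fetching a large object must loop until
/// the full range has been retrieved. A client-side sketch (illustrative only;
/// assumes the `reqwest` crate, which is not a dependency of this crate):
///
/// ```rust,ignore
/// let client = reqwest::blocking::Client::new();
/// let mut buf: Vec<u8> = Vec::new();
/// let mut offset: u64 = 0;
/// let total: u64 = 100 * 1024 * 1024; // learned from a prior Content-Range header
/// while offset < total {
///     let resp = client
///         .get("http://127.0.0.1:8080/disk")
///         .header("Range", format!("bytes={offset}-"))
///         .send()?;
///     let chunk = resp.bytes()?; // server returns at most MAX_CHUNK_SIZE bytes
///     offset += chunk.len() as u64; // advance by what was actually returned
///     buf.extend_from_slice(&chunk);
/// }
/// ```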
///
/// ## Connection Limits
///
/// The server relies on OS-level TCP connection limits (controlled by `ulimit -n`
/// and kernel parameters). Tokio's async runtime handles thousands of concurrent
/// connections efficiently (each connection consumes ~100 KB of memory).
///
/// For production deployments, consider:
///
/// - **Reverse proxy**: nginx or Caddy with connection limits and rate limiting
/// - **Firewall rules**: Limit connections per IP address
/// - **Resource limits**: Set `ulimit -n` to a reasonable value (e.g., 4096)
///
/// # Arguments
///
/// - `snap`: The Hexz archive file to expose. Must be wrapped in `Arc` for sharing
///   across request handlers.
/// - `port`: TCP port to bind to (e.g., `8080`, `3000`).
/// - `bind`: Address to bind to (use `"127.0.0.1"` for localhost-only access).
743///
744/// # Returns
745///
746/// This function runs indefinitely, serving HTTP requests until the server is shut
747/// down (e.g., via Ctrl+C signal). It only returns `Err` if:
748///
749/// - The TCP listener fails to bind (port already in use, permission denied)
750/// - The HTTP server encounters a fatal error (should be extremely rare)
751///
752/// Individual request errors (invalid ranges, read failures) are handled gracefully
/// and return appropriate HTTP error responses without stopping the server.
///
/// # Errors
///
/// - `std::io::Error`: If binding to the socket fails.
/// - `anyhow::Error`: If the HTTP server encounters an unrecoverable error.
///
/// # Examples
///
/// ## Server Setup
///
/// ```no_run
/// use std::sync::Arc;
/// use hexz_core::Archive;
/// use hexz_store::local::FileBackend;
/// use hexz_core::algo::compression::lz4::Lz4Compressor;
/// use hexz_server::serve_http;
///
/// # #[tokio::main]
/// # async fn main() -> anyhow::Result<()> {
/// let backend = Arc::new(FileBackend::new("archive.hxz".as_ref())?);
/// let compressor = Box::new(Lz4Compressor::new());
/// let snap = Archive::new(backend, compressor, None)?;
///
/// // Start HTTP server on port 8080 (runs forever)
/// serve_http(snap, 8080, "127.0.0.1").await?;
/// # Ok(())
/// # }
/// ```
///
/// ## Client Usage (curl)
///
/// ```bash
/// # Fetch first 4 KiB of main stream
/// curl -H "Range: bytes=0-4095" http://localhost:8080/disk -o chunk.bin
///
/// # Fetch 1 MiB starting at a 1 MiB offset
/// curl -H "Range: bytes=1048576-2097151" http://localhost:8080/memory -o mem_chunk.bin
///
/// # Fetch from offset to EOF (clamped to 32 MiB)
/// curl -H "Range: bytes=1048576-" http://localhost:8080/disk -o large_chunk.bin
///
/// # Full GET (no range header, returns first 32 MiB)
/// curl http://localhost:8080/disk -o first_32mb.bin
/// ```
///
/// ## Client Usage (Python)
///
/// ```python
/// import requests
///
/// # Fetch a range
/// headers = {'Range': 'bytes=0-4095'}
/// response = requests.get('http://localhost:8080/disk', headers=headers)
/// assert response.status_code == 206  # Partial Content
/// data = response.content
/// print(f"Fetched {len(data)} bytes")
///
/// # Parse Content-Range header
/// content_range = response.headers['Content-Range']
/// # Example: "bytes 0-4095/10737418240"
/// print(f"Content-Range: {content_range}")
/// ```
///
/// # Performance Characteristics
///
/// ## Throughput
///
/// - **Local (127.0.0.1)**: 500-2000 MB/s (limited by decompression, not HTTP overhead)
/// - **1 Gbps network**: ~120 MB/s (network-bound)
/// - **10 Gbps network**: ~800 MB/s (may be decompression-bound for LZ4, network-bound for ZSTD)
///
/// ## Latency
///
/// - **Cache hit**: ~80 μs (block already decompressed)
/// - **Cache miss**: ~1-5 ms (includes decompression and backend I/O)
/// - **Network RTT**: Add local RTT (~0.1 ms for localhost, ~10-50 ms for remote)
///
/// ## Memory Usage
///
/// - **Per connection**: ~100 KB (Tokio task stack + buffers)
/// - **Per request**: ~32 MB worst-case (if requesting `MAX_CHUNK_SIZE`)
/// - **Block cache**: Shared across all connections (typically 100-500 MB)
///
/// With 1000 concurrent connections, memory overhead is ~100 MB for connections
/// plus the shared block cache.
///
/// # Security Considerations
///
/// ## Current Security Posture
///
/// - **Localhost-only by default**: Intended to bind to `127.0.0.1`, so the server is not reachable from the network
/// - **No authentication**: Anyone with local access can read archive data
/// - **No TLS**: Plaintext HTTP (acceptable for loopback)
/// - **`DoS` protection**: Request size clamping, but no rate limiting
///
/// ## Threat Model
///
/// For localhost-only deployments, the threat model assumes:
///
/// 1. **Trusted local environment**: All local users are trusted (or isolated via OS permissions)
/// 2. **No remote attackers**: Firewall prevents external access
/// 3. **Process isolation**: Archive data is no more sensitive than other local files
///
/// ## Future Security Enhancements (Planned)
///
/// - **TLS/HTTPS**: Certificate-based encryption for network access
/// - **Bearer token auth**: Simple token in the `Authorization` header
/// - **Rate limiting**: Per-IP request throttling
/// - **Audit logging**: Request logs with client IP and byte ranges
///
/// # Panics
///
/// This function does not panic under normal operation. Request handling errors
/// are converted to HTTP error responses.
pub async fn serve_http(snap: Arc<Archive>, port: u16, bind: &str) -> anyhow::Result<()> {
    let addr: SocketAddr = format!("{bind}:{port}").parse()?;
    let listener = TcpListener::bind(addr).await?;
    tracing::info!("HTTP server listening on {}", addr);
    serve_http_with_listener(snap, listener).await
}

/// Like [`serve_http`], but accepts a pre-bound [`TcpListener`].
///
/// This avoids a TOCTOU race when the caller needs to discover a free port:
/// bind to port 0 and pass the resulting listener directly instead of
/// re-binding by port number.
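///
/// # Examples
///
/// A sketch of the port-0 pattern (the archive setup lines mirror the
/// [`serve_http`] example above; the path and types are carried over from
/// there, not prescribed by this function):
///
/// ```no_run
/// # use std::sync::Arc;
/// # use hexz_core::Archive;
/// # use hexz_store::local::FileBackend;
/// # use hexz_core::algo::compression::lz4::Lz4Compressor;
/// use tokio::net::TcpListener;
/// use hexz_server::serve_http_with_listener;
///
/// # #[tokio::main]
/// # async fn main() -> anyhow::Result<()> {
/// # let backend = Arc::new(FileBackend::new("archive.hxz".as_ref())?);
/// # let compressor = Box::new(Lz4Compressor::new());
/// # let snap = Archive::new(backend, compressor, None)?;
/// // Bind to port 0: the OS assigns a free port, which we can report
/// // before handing the listener to the server (no re-bind, no race).
/// let listener = TcpListener::bind("127.0.0.1:0").await?;
/// println!("listening on {}", listener.local_addr()?);
/// serve_http_with_listener(snap, listener).await?;
/// # Ok(())
/// # }
/// ```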
pub async fn serve_http_with_listener(
    snap: Arc<Archive>,
    listener: TcpListener,
) -> anyhow::Result<()> {
    let state = Arc::new(AppState { snap });

    let app = Router::new()
        .route("/disk", get(get_disk))
        .route("/memory", get(get_memory))
        .with_state(state);

    axum::serve(listener, app).await?;
    Ok(())
}

/// HTTP handler for the `/disk` endpoint.
///
/// Serves the main stream (persistent storage archive) from the Hexz file.
/// Delegates to `handle_request` with `ArchiveStream::Main`.
///
/// # Route
///
/// `GET /disk`
///
/// # Request Headers
///
/// - `Range` (optional): HTTP range request (e.g., `bytes=0-4095`)
///
/// # Response Headers
///
/// - `Content-Type`: Always `application/octet-stream` (raw binary data)
/// - `Content-Range`: Byte range served (e.g., `bytes 0-4095/10737418240`)
/// - `Accept-Ranges`: Always `bytes` (indicates range request support)
///
/// # Response Status Codes
///
/// - **206 Partial Content**: Successful range request
/// - **416 Range Not Satisfiable**: Invalid or out-of-bounds range
/// - **500 Internal Server Error**: Archive read failure
///
/// # Examples
///
/// See `serve_http` for client usage examples.
async fn get_disk(headers: HeaderMap, State(state): State<Arc<AppState>>) -> impl IntoResponse {
    handle_request(&headers, &state.snap, ArchiveStream::Main)
}

/// HTTP handler for the `/memory` endpoint.
///
/// Serves the auxiliary stream (RAM archive) from the Hexz file.
/// Delegates to `handle_request` with `ArchiveStream::Auxiliary`.
///
/// # Route
///
/// `GET /memory`
///
/// # Request Headers
///
/// - `Range` (optional): HTTP range request (e.g., `bytes=0-4095`)
///
/// # Response Headers
///
/// - `Content-Type`: Always `application/octet-stream` (raw binary data)
/// - `Content-Range`: Byte range served (e.g., `bytes 0-4095/8589934592`)
/// - `Accept-Ranges`: Always `bytes` (indicates range request support)
///
/// # Response Status Codes
///
/// - **206 Partial Content**: Successful range request
/// - **416 Range Not Satisfiable**: Invalid or out-of-bounds range
/// - **500 Internal Server Error**: Archive read failure
///
/// # Examples
///
/// See `serve_http` for client usage examples.
async fn get_memory(headers: HeaderMap, State(state): State<Arc<AppState>>) -> impl IntoResponse {
    handle_request(&headers, &state.snap, ArchiveStream::Auxiliary)
}

/// Core HTTP request handler that translates `Range` headers into archive reads.
///
/// This function implements the HTTP range request logic for both `/disk` and `/memory`
/// endpoints. It performs the following steps:
///
/// 1. Parse the `Range` header (if present) or default to full stream access
/// 2. Clamp the requested range to `MAX_CHUNK_SIZE` to prevent `DoS`
/// 3. Read the data from the archive via `Archive::read_at`
/// 4. Return HTTP 206 with `Content-Range` header, or error status codes
///
/// # Arguments
///
/// - `headers`: HTTP request headers from the client (parsed by Axum)
/// - `snap`: The Hexz archive to read from
/// - `stream`: Which logical stream to read (`ArchiveStream::Main` or `ArchiveStream::Auxiliary`)
///
/// # Returns
///
/// An Axum `Response` with one of the following status codes:
///
/// - **206 Partial Content**: Successful read (even for full stream requests)
/// - **416 Range Not Satisfiable**: Invalid range syntax or out-of-bounds offset
/// - **500 Internal Server Error**: Archive read failure (decompression error, I/O error)
///
/// # HTTP Range Request Parsing
///
/// The `Range` header is expected in the format `bytes=<start>-<end>` where:
///
/// - `<start>` is the starting byte offset (inclusive, zero-indexed)
/// - `<end>` is the ending byte offset (inclusive), or omitted for "to EOF"
///
/// ## Examples of Supported Ranges
///
/// ```text
/// Range: bytes=0-1023         → Read bytes 0-1023 (1024 bytes)
/// Range: bytes=1024-2047      → Read bytes 1024-2047 (1024 bytes)
/// Range: bytes=1048576-       → Read from 1 MiB to EOF (clamped to MAX_CHUNK_SIZE)
/// (no Range header)           → Read from start to EOF (clamped to MAX_CHUNK_SIZE)
/// ```
///
/// ## Examples of Unsupported/Invalid Ranges
///
/// These return HTTP 416:
///
/// ```text
/// Range: bytes=-1024          → Suffix range (last 1024 bytes) - not supported
/// Range: bytes=0-100,200-300  → Multi-part range - not supported
/// Range: bytes=1000-500       → Start > end - invalid
/// Range: bytes=999999999999-  → Start beyond EOF - out of bounds
/// ```
///
/// # `DoS` Protection: Range Clamping Algorithm
///
/// To prevent a malicious client from requesting gigabytes of data in a single
/// request, the handler clamps the effective range:
///
/// ```text
/// requested_length = end - start + 1
/// if requested_length > MAX_CHUNK_SIZE:
///     end = start + MAX_CHUNK_SIZE - 1
///     if end >= total_size:
///         end = total_size - 1
/// ```
///
/// The clamped range is reflected in the `Content-Range` response header:
///
/// ```text
/// Content-Range: bytes <actual_start>-<actual_end>/<total_size>
/// ```
///
/// Clients must check this header to detect clamping and issue follow-up requests
/// for remaining data.
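///
/// For example, a client can derive its next request from the `Content-Range`
/// value with a small helper (`next_range` below is a hypothetical client-side
/// function, not part of this crate; it loops until `None`):
///
/// ```rust
/// // Given a `Content-Range` value like "bytes 0-33554431/10737418240",
/// // return the next inclusive (start, end) range to request, or `None`
/// // once the transfer is complete.
/// fn next_range(content_range: &str) -> Option<(u64, u64)> {
///     let rest = content_range.strip_prefix("bytes ")?;
///     let (range, total) = rest.split_once('/')?;
///     let (_, end) = range.split_once('-')?;
///     let end: u64 = end.parse().ok()?;
///     let total: u64 = total.parse().ok()?;
///     if end + 1 >= total { None } else { Some((end + 1, total - 1)) }
/// }
///
/// // After a clamped 32 MiB response, request the remainder (the server
/// // will clamp this follow-up request again).
/// assert_eq!(
///     next_range("bytes 0-33554431/10737418240"),
///     Some((33554432, 10737418239)),
/// );
/// assert_eq!(next_range("bytes 0-999/1000"), None); // transfer complete
/// ```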
///
/// ## Clamping Example
///
/// ```text
/// Client request:    Range: bytes=0-67108863 (64 MiB)
/// Total size:        10 GiB
/// Server clamps to:  0-33554431 (32 MiB due to MAX_CHUNK_SIZE)
/// Response header:   Content-Range: bytes 0-33554431/10737418240
/// ```
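///
/// The clamping rule can be written as a small pure function (a sketch that
/// mirrors the pseudocode above; the 32 MiB `MAX_CHUNK_SIZE` value is an
/// assumption taken from the example, not read from the crate):
///
/// ```rust
/// const MAX_CHUNK_SIZE: u64 = 32 * 1024 * 1024; // 32 MiB, per the example above
///
/// // Clamp an inclusive byte range [start, end] to at most MAX_CHUNK_SIZE
/// // bytes, without running past the end of a stream of `total_size` bytes.
/// fn clamp_range(start: u64, mut end: u64, total_size: u64) -> (u64, u64) {
///     if end - start + 1 > MAX_CHUNK_SIZE {
///         end = start + MAX_CHUNK_SIZE - 1;
///         if end >= total_size {
///             end = total_size.saturating_sub(1);
///         }
///     }
///     (start, end)
/// }
///
/// // The 64 MiB request from the example is cut down to 32 MiB.
/// assert_eq!(clamp_range(0, 67_108_863, 10_737_418_240), (0, 33_554_431));
/// ```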
///
/// # Error Handling
///
/// ## Range Parsing Errors
///
/// If `parse_range` returns `None`, the handler returns HTTP 416 (Range Not
/// Satisfiable). This occurs when:
///
/// - The `Range` header does not start with `"bytes="`
/// - The start/end offsets are not valid integers
/// - The start offset is greater than the end offset
/// - The end offset is beyond the stream size
///
/// ## Archive Read Errors
///
/// If `snap.read_at` returns `Err(_)`, the handler returns HTTP 500 (Internal
/// Server Error). This occurs when:
///
/// - Decompression fails (corrupted compressed data)
/// - Backend I/O fails (disk error, network timeout for remote backends)
/// - Decryption fails (incorrect key, corrupted ciphertext)
///
/// The specific error is not exposed to the client (it is only logged internally)
/// to avoid information leakage.
///
/// # Edge Cases
///
/// ## Empty Range
///
/// If the calculated range length is 0 (e.g., due to clamping at EOF), the handler
/// returns HTTP 416. This should be rare in practice since clients typically request
/// valid ranges.
///
/// ## Zero-Sized Stream
///
/// If the archive stream size is 0 (empty disk or memory archive), any range
/// request returns HTTP 416 because no valid offsets exist.
///
/// ## Single-Byte Range
///
/// A request like `bytes=0-0` (fetch only byte 0) is valid and returns 1 byte with
/// HTTP 206 and `Content-Range: bytes 0-0/<total>`.
///
/// # Performance Characteristics
///
/// - **No Range Header**: Clamps to `MAX_CHUNK_SIZE`, then performs one `read_at` call
/// - **Valid Range**: One `read_at` call (may hit the block cache or require decompression)
/// - **Invalid Range**: Immediate return (no archive I/O)
///
/// For cache hits, latency is ~80 μs. For cache misses, latency is ~1-5 ms depending
/// on backend speed and compression algorithm.
///
/// # Security Notes
///
/// - **No authentication**: This function does not check credentials (handled by
///   future middleware or a reverse proxy)
/// - **`DoS` mitigation**: Request size clamping prevents memory exhaustion
/// - **Information leakage**: Error responses do not reveal internal details
///   (e.g., "decompression failed" is hidden behind HTTP 500)
///
/// # Examples
///
/// See `serve_http`, `get_disk`, and `get_memory` for usage context.
fn handle_request(headers: &HeaderMap, snap: &Arc<Archive>, stream: ArchiveStream) -> Response {
    let total_size = snap.size(stream);
    // An empty stream has no valid byte offsets, so any request is unsatisfiable.
    if total_size == 0 {
        return StatusCode::RANGE_NOT_SATISFIABLE.into_response();
    }

    let (start, mut end) = if let Some(range) = headers.get(header::RANGE) {
        match parse_range(range.to_str().unwrap_or(""), total_size) {
            Some(r) => r,
            None => return StatusCode::RANGE_NOT_SATISFIABLE.into_response(),
        }
    } else {
        (0, total_size - 1)
    };

    // SECURITY: DoS Protection
    // Clamp the requested range to avoid huge memory allocations.
    if end - start + 1 > MAX_CHUNK_SIZE {
        end = start + MAX_CHUNK_SIZE - 1;
        // Ensure we don't go past EOF after clamping
        if end >= total_size {
            end = total_size - 1;
        }
    }

    let len = (end - start + 1) as usize;
    if len == 0 {
        // Defensive: unreachable since start <= end, but guard against regressions.
        return StatusCode::RANGE_NOT_SATISFIABLE.into_response();
    }

    match snap.read_at(stream, start, len) {
        Ok(data) => (
            StatusCode::PARTIAL_CONTENT,
            [
                (header::CONTENT_TYPE, "application/octet-stream"),
                (
                    header::CONTENT_RANGE,
                    &format!("bytes {start}-{end}/{total_size}"),
                ),
                (header::ACCEPT_RANGES, "bytes"),
            ],
            data,
        )
            .into_response(),
        Err(_) => StatusCode::INTERNAL_SERVER_ERROR.into_response(),
    }
}

/// Parses an HTTP `Range` header into absolute byte offsets.
///
/// Implements a subset of HTTP range request syntax (RFC 7233), supporting only
/// simple byte ranges without multi-part or suffix ranges.
///
/// # Supported Syntax
///
/// - **Bounded range**: `bytes=<start>-<end>` (both offsets specified)
///   - Example: `bytes=0-1023` → Returns `(0, 1023)`
/// - **Unbounded range**: `bytes=<start>-` (from start to EOF)
///   - Example: `bytes=1024-` → Returns `(1024, size-1)`
///
/// # Unsupported Syntax
///
/// - **Suffix range**: `bytes=-<length>` (last N bytes)
///   - Example: `bytes=-1024` → Returns `None`
/// - **Multi-part range**: `bytes=0-100,200-300`
///   - Example: `bytes=0-100,200-300` → Returns `None`
///
/// These are rejected because:
/// 1. They are rarely used in practice (<1% of range requests)
/// 2. They add significant parsing and response generation complexity
/// 3. The HTTP 416 error response is acceptable for clients that need them
///
/// # Arguments
///
/// - `range`: The value of the `Range` header (e.g., `"bytes=0-1023"`)
/// - `size`: The total size of the stream in bytes (used to validate offsets)
///
/// # Returns
///
/// - `Some((start, end))`: Valid range with absolute byte offsets (both inclusive)
/// - `None`: Invalid syntax or out-of-bounds range
///
/// # Error Conditions
///
/// Returns `None` if:
///
/// 1. **Missing prefix**: Header does not start with `"bytes="`
///    - Example: `"items=0-100"` → Error
/// 2. **Invalid integer**: Start or end cannot be parsed as `u64`
///    - Example: `"bytes=abc-def"` → Error
/// 3. **Inverted range**: Start offset is greater than end offset
///    - Example: `"bytes=1000-500"` → Error
/// 4. **Out of bounds**: End offset is beyond the stream size
///    - Example: `"bytes=0-999999"` when size is 1000 → Error
///
/// # Parsing Algorithm
///
/// ```text
/// 1. Check for "bytes=" prefix (RANGE_PREFIX_LEN = 6)
/// 2. Split remaining string on '-' delimiter
/// 3. Parse start offset (parts[0])
/// 4. Parse end offset (parts[1] if present and non-empty, else size-1)
/// 5. Validate: start <= end && end < size
/// 6. Return (start, end)
/// ```
///
/// # Edge Cases
///
/// ## Empty String After Prefix
///
/// ```text
/// Range: bytes=
/// ```
///
/// Returns `None` because there is no start offset.
///
/// ## Single Byte Range
///
/// ```text
/// Range: bytes=0-0
/// ```
///
/// Returns `Some((0, 0))` (valid, requests exactly 1 byte).
///
/// ## Range at EOF
///
/// ```text
/// Range: bytes=0-999 (size = 1000)
/// ```
///
/// Returns `Some((0, 999))` (valid, end is inclusive and equals `size - 1`).
///
/// ## Range Beyond EOF
///
/// ```text
/// Range: bytes=0-1000 (size = 1000)
/// ```
///
/// Returns `None` because offset 1000 does not exist (valid range is 0-999).
///
/// # Examples
///
/// ```text
/// parse_range("bytes=0-1023", 10000)   -> Some((0, 1023))
/// parse_range("bytes=1024-", 10000)    -> Some((1024, 9999))
/// parse_range("0-1023", 10000)         -> None   // missing "bytes=" prefix
/// parse_range("bytes=0-10000", 10000)  -> None   // out of bounds
/// parse_range("bytes=1000-500", 10000) -> None   // inverted range
/// ```
///
/// # Performance
///
/// - **Time complexity**: O(n) where n is the length of the range string (typically <20 chars)
/// - **Allocation**: One heap allocation for the `Vec<&str>` collected from `split('-')`
/// - **Typical latency**: <1 μs (negligible compared to archive read latency)
///
/// # Security
///
/// This function is resilient to malicious input:
///
/// - **Integer overflow**: `parse::<u64>()` rejects values that do not fit in a `u64`
/// - **Unbounded length**: The `Range` header is bounded by HTTP header size limits
///   (typically 8 KB, enforced by the HTTP server)
/// - **No allocation attacks**: Only one small `Vec` allocation is performed
pub fn parse_range(range: &str, size: u64) -> Option<(u64, u64)> {
    if !range.starts_with("bytes=") {
        return None;
    }
    let parts: Vec<&str> = range[RANGE_PREFIX_LEN..].split('-').collect();
    let start = parts[0].parse::<u64>().ok()?;
    let end = if parts.len() > 1 && !parts[1].is_empty() {
        parts[1].parse::<u64>().ok()?
    } else {
        size.saturating_sub(1)
    };
    if start > end || end >= size {
        return None;
    }
    Some((start, end))
}