hexz_server/lib.rs
1//! HTTP, NBD, and S3 gateway server implementations for exposing Hexz snapshots.
2//!
3//! This module provides network-facing interfaces for accessing compressed Hexz
4//! snapshot data over standard protocols. It supports three distinct serving modes:
5//!
6//! 1. **HTTP Range Server** (`serve_http`): Exposes disk and memory streams via
7//! HTTP 1.1 range requests with DoS protection and partial content support.
8//! 2. **NBD (Network Block Device) Server** (`serve_nbd`): Allows mounting snapshots
9//! as Linux block devices using the standard NBD protocol.
10//! 3. **S3 Gateway** (`serve_s3_gateway`): Planned S3-compatible API for cloud
11//! integration (currently unimplemented).
12//!
13//! # Architecture Overview
14//!
15//! All servers expose the same underlying `File` API, which provides:
16//! - Block-level decompression with LRU caching
17//! - Dual-stream access (disk and memory snapshots)
18//! - Random access with minimal I/O overhead
19//! - Thread-safe concurrent reads via `Arc<File>`
20//!
21//! The servers differ in protocol semantics and use cases:
22//!
23//! | Protocol | Use Case | Access Pattern | Authentication |
24//! |----------|----------|----------------|----------------|
25//! | HTTP | Browser/API access | Range requests | None (planned) |
26//! | NBD | Linux block device mount | Block-level reads | None |
27//! | S3 | Cloud integration | Object API | AWS SigV4 (planned) |
28//!
29//! # Design Decisions
30//!
31//! ## Why HTTP Range Requests?
32//!
33//! HTTP range requests (RFC 7233) provide a standardized way to access large files
34//! in chunks without loading the entire file into memory. This aligns perfectly with
35//! Hexz's block-indexed architecture, allowing clients to fetch only the data they
36//! need. The implementation:
37//!
38//! - Returns HTTP 206 (Partial Content) for range requests
39//! - Returns HTTP 416 (Range Not Satisfiable) for invalid ranges
40//! - Clamps requests to `MAX_CHUNK_SIZE` (32 MiB) to prevent memory exhaustion
41//! - Supports both bounded (`bytes=0-1023`) and unbounded (`bytes=1024-`) ranges
42//!
43//! ## Why NBD Protocol?
44//!
45//! The Network Block Device protocol allows mounting remote storage as a local block
46//! device on Linux systems. This enables:
47//! - Transparent filesystem access (mount snapshot, browse files)
48//! - Use of standard Linux tools (`dd`, `fsck`, `mount`)
49//! - Zero application changes (existing software works unmodified)
50//!
51//! Trade-offs:
52//! - **Pro**: Native OS integration, no special client software required
53//! - **Pro**: Kernel handles caching and buffering
54//! - **Con**: No built-in encryption or authentication
55//! - **Con**: TCP-based, higher latency than local disk
56//!
57//! ## Security Architecture
58//!
59//! ### Current Security Posture (localhost-only)
60//!
61//! All servers bind to `127.0.0.1` (loopback) by default, preventing network exposure.
62//! This is appropriate for:
63//! - Local development and testing
64//! - Forensics workstations accessing local snapshots
65//! - Scenarios where network access is provided via SSH tunnels or VPNs
66//!
67//! ### Attack Surface
68//!
69//! The current implementation has a minimal attack surface:
70//! 1. **DoS via large reads**: Mitigated by `MAX_CHUNK_SIZE` clamping (32 MiB)
71//! 2. **Range header parsing**: Simplified parser with strict validation
72//! 3. **Connection exhaustion**: Limited by OS socket limits, no artificial cap
73//! 4. **Path traversal**: N/A (no filesystem access, only fixed `/disk` and `/memory` routes)
74//!
75//! ### Future Security Enhancements (Planned)
76//!
77//! - TLS/HTTPS support for encrypted transport
78//! - Token-based authentication (Bearer tokens)
79//! - Rate limiting per IP address
80//! - Configurable bind addresses (`0.0.0.0` for network access)
81//! - Request logging and audit trails
82//!
83//! # Performance Characteristics
84//!
85//! ## HTTP Server
86//!
87//! - **Throughput**: ~500-2000 MB/s (limited by decompression, not network)
88//! - **Latency**: ~1-5 ms per request (includes decompression)
89//! - **Concurrency**: Handles 1000+ concurrent connections (Tokio async runtime)
90//! - **Memory**: ~100 KB per connection + block cache overhead
91//!
92//! ## NBD Server
93//!
94//! - **Throughput**: ~500-1000 MB/s (similar to HTTP, plus NBD protocol overhead)
95//! - **Latency**: ~2-10 ms per block read (includes TCP RTT + decompression)
96//! - **Concurrency**: One Tokio task per client connection
97//!
98//! ## Bottlenecks
99//!
100//! For local (localhost) connections, the primary bottleneck is:
101//! 1. **Decompression CPU time** (80% of latency for LZ4, more for ZSTD)
102//! 2. **Block cache misses** (requires backend I/O)
103//! 3. **Memory allocation** for large reads (mitigated by clamping)
104//!
105//! Network bandwidth is rarely a bottleneck for localhost connections.
106//!
107//! # Examples
108//!
109//! ## Starting an HTTP Server
110//!
111//! ```no_run
112//! use std::sync::Arc;
113//! use hexz_core::File;
114//! use hexz_core::store::local::FileBackend;
115//! use hexz_core::algo::compression::lz4::Lz4Compressor;
116//! use hexz_server::serve_http;
117//!
118//! # #[tokio::main]
119//! # async fn main() -> anyhow::Result<()> {
120//! let backend = Arc::new(FileBackend::new("snapshot.hxz".as_ref())?);
121//! let compressor = Box::new(Lz4Compressor::new());
122//! let snap = File::new(backend, compressor, None)?;
123//!
124//! // Start HTTP server on port 8080
125//! serve_http(snap, 8080).await?;
126//! # Ok(())
127//! # }
128//! ```
129//!
130//! ## Starting an NBD Server
131//!
132//! ```no_run
133//! use std::sync::Arc;
134//! use hexz_core::File;
135//! use hexz_core::store::local::FileBackend;
136//! use hexz_core::algo::compression::lz4::Lz4Compressor;
137//! use hexz_server::serve_nbd;
138//!
139//! # #[tokio::main]
140//! # async fn main() -> anyhow::Result<()> {
141//! let backend = Arc::new(FileBackend::new("snapshot.hxz".as_ref())?);
142//! let compressor = Box::new(Lz4Compressor::new());
143//! let snap = File::new(backend, compressor, None)?;
144//!
145//! // Start NBD server on port 10809
146//! serve_nbd(snap, 10809).await?;
147//! # Ok(())
148//! # }
149//! ```
150//!
151//! ## Client Usage Examples
152//!
153//! ### HTTP Client (curl)
154//!
155//! ```bash
156//! # Fetch the first 4KB of the disk stream
157//! curl -H "Range: bytes=0-4095" http://localhost:8080/disk -o chunk.bin
158//!
159//! # Fetch 1MB starting at offset 1MB
160//! curl -H "Range: bytes=1048576-2097151" http://localhost:8080/memory -o mem_chunk.bin
161//!
162//! # Fetch from offset to EOF (server will clamp to MAX_CHUNK_SIZE)
163//! curl -H "Range: bytes=1048576-" http://localhost:8080/disk
164//! ```
165//!
166//! ### NBD Client (Linux)
167//!
168//! ```bash
169//! # Connect NBD client to server
170//! sudo nbd-client localhost 10809 /dev/nbd0
171//!
172//! # Mount the block device (read-only)
173//! sudo mount -o ro /dev/nbd0 /mnt/snapshot
174//!
175//! # Access files normally
176//! ls -la /mnt/snapshot
177//! cat /mnt/snapshot/important.log
178//!
179//! # Disconnect when done
180//! sudo umount /mnt/snapshot
181//! sudo nbd-client -d /dev/nbd0
182//! ```
183//!
184//! # Protocol References
185//!
186//! - **HTTP Range Requests**: [RFC 7233](https://tools.ietf.org/html/rfc7233)
187//! - **NBD Protocol**: [NBD Protocol Specification](https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md)
188//! - **S3 API**: [AWS S3 API Reference](https://docs.aws.amazon.com/s3/index.html) (future work)
189
190pub mod nbd;
191
192use axum::{
193 Router,
194 extract::State,
195 http::{HeaderMap, StatusCode, header},
196 response::{IntoResponse, Response},
197 routing::get,
198};
199use hexz_core::{File, SnapshotStream};
200use std::net::SocketAddr;
201use std::sync::Arc;
202use tokio::net::TcpListener;
203
204/// IPv4 address for all server listeners (localhost only).
205///
206/// # Security Rationale
207///
208/// This constant defaults to the loopback address (`127.0.0.1`) to prevent
209/// accidental exposure of snapshot data to the local network or internet.
210/// Snapshots may contain sensitive information (credentials, personal data,
211/// proprietary code), so network exposure must be an explicit, informed decision.
212///
213/// ## Current Behavior
214///
215/// All servers (HTTP, NBD, S3) bind to `127.0.0.1`, making them accessible only
216/// from the local machine. Remote access requires:
217/// - SSH port forwarding: `ssh -L 8080:localhost:8080 user@server`
218/// - VPN tunnel with local forwarding
219/// - Reverse proxy with authentication (e.g., nginx with TLS + basic auth)
220///
221/// ## Future Enhancement
222///
223/// To enable network access, a future version will support configurable bind
224/// addresses via command-line flags or configuration files:
225///
226/// ```bash
227/// # Proposed CLI syntax (not yet implemented)
228/// hexz-server --bind 0.0.0.0:8080 --auth-token mytoken123 snapshot.st
229/// ```
230///
231/// Network exposure will require authentication to be enabled (enforced by the CLI).
232const BIND_ADDR: [u8; 4] = [127, 0, 0, 1];
233
234/// Length in bytes of the HTTP `Range` header prefix `"bytes="`.
235///
236/// The HTTP Range header format is defined in RFC 7233 as:
237///
238/// ```text
239/// Range: bytes=<start>-<end>
240/// ```
241///
242/// This constant represents the length of the literal string `"bytes="` (6 bytes),
243/// which is stripped during parsing. The parser supports:
244///
245/// - Bounded ranges: `bytes=0-1023` (fetch bytes 0 through 1023 inclusive)
246/// - Unbounded ranges: `bytes=1024-` (fetch from byte 1024 to EOF)
247/// - Single-byte ranges: `bytes=0-0` (fetch only byte 0)
248///
249/// Unsupported range types (will return HTTP 416):
250/// - Suffix ranges: `bytes=-500` (last 500 bytes)
251/// - Multi-part ranges: `bytes=0-100,200-300`
252///
253/// # Rationale for Limited Support
254///
255/// Suffix ranges and multi-part ranges are rarely used in practice and add
256/// significant parsing complexity. If needed for browser compatibility, they
257/// can be added in a future version without breaking existing clients.
258const RANGE_PREFIX_LEN: usize = 6;
259
260/// Maximum allowed read size per HTTP request to prevent DoS attacks.
261///
262/// # Value
263///
264/// 32 MiB (33,554,432 bytes)
265///
266/// # DoS Protection Rationale
267///
268/// Without a limit, a malicious client could request the entire snapshot in a single
269/// HTTP request (e.g., `Range: bytes=0-`), forcing the server to:
270///
271/// 1. Decompress gigabytes of data
272/// 2. Allocate gigabytes of heap memory
273/// 3. Hold that memory while slowly transmitting over the network
274///
275/// With multiple concurrent requests, this could exhaust server memory and CPU,
276/// causing crashes or unresponsiveness (denial of service).
277///
278/// # Why 32 MiB?
279///
280/// This value balances throughput efficiency and resource protection:
281///
282/// - **Large enough**: Clients can fetch substantial chunks with low overhead
283/// (at 1 Gbps, 32 MiB transfers in ~256 ms)
284/// - **Small enough**: Even 100 concurrent maximal requests consume <3.2 GB RAM,
285/// which is manageable on modern servers
286/// - **Common practice**: Many HTTP servers use similar limits (nginx default: 16 MiB,
287/// AWS S3 max single GET: 5 GB but recommends <100 MB for performance)
288///
289/// # Clamping Behavior
290///
291/// When a client requests more than `MAX_CHUNK_SIZE` bytes:
292///
293/// 1. The server clamps the end offset: `end = min(end, start + MAX_CHUNK_SIZE - 1)`
294/// 2. Returns HTTP 206 with the clamped range in the `Content-Range` header
295/// 3. The client sees a short read and can issue follow-up requests
296///
297/// Example:
298///
299/// ```text
300/// Client request: Range: bytes=0-67108863 (64 MiB)
301/// Server response: Content-Range: bytes 0-33554431/total (32 MiB)
302/// ```
303///
304/// The client must check the `Content-Range` header to detect clamping.
305///
306/// # Future Work
307///
308/// This limit could be made configurable via CLI flags for scenarios where higher
309/// memory usage is acceptable (e.g., dedicated forensics servers with 128+ GB RAM).
310const MAX_CHUNK_SIZE: u64 = 32 * 1024 * 1024;
311
312/// Shared application state for the HTTP serving layer.
313///
314/// This struct is wrapped in `Arc` and cloned for each HTTP request handler.
315/// The inner `snap` field is also `Arc`-wrapped, so cloning `AppState` is cheap
316/// (just incrementing reference counts, no data copying).
317///
318/// # Thread Safety
319///
320/// `AppState` is `Send + Sync` because `File` is `Send + Sync`. The underlying
321/// block cache uses `Mutex` for interior mutability, so multiple concurrent requests
322/// can safely read from the same snapshot.
323///
324/// # Memory Overhead
325///
326/// Each `AppState` clone adds ~16 bytes (one `Arc` pointer). With 1000 concurrent
327/// connections, this overhead is negligible (~16 KB).
328struct AppState {
329 /// The opened Hexz snapshot file being served via HTTP.
330 ///
331 /// This is the same `File` instance for all requests. It contains:
332 /// - The storage backend (local file, S3, etc.)
333 /// - Block cache (shared across all requests)
334 /// - Decompressor instances (thread-local via pooling)
335 snap: Arc<File>,
336}
337
338/// Exposes a `File` over NBD (Network Block Device) protocol.
339///
340/// Starts a TCP listener on `127.0.0.1:<port>` that implements the NBD protocol,
341/// allowing Linux clients to mount the Hexz snapshot as a local block device
342/// using standard tools like `nbd-client`.
343///
344/// This function runs indefinitely, accepting connections in a loop. Each client
345/// connection is handled in a separate Tokio task, allowing concurrent clients.
346///
347/// # Arguments
348///
349/// - `snap`: The Hexz snapshot file to expose. Must be wrapped in `Arc` for sharing
350/// across multiple client connections.
351/// - `port`: TCP port to bind to on the loopback interface (e.g., `10809`).
352///
353/// # Returns
354///
355/// This function never returns under normal operation (it runs forever). It only
356/// returns `Err` if:
357/// - The TCP listener fails to bind (port already in use, permission denied)
358/// - An unrecoverable I/O error occurs on the listener socket
359///
360/// Individual client errors (malformed requests, disconnects) are logged but do not
361/// stop the server.
362///
363/// # Errors
364///
365/// - `std::io::Error`: If binding to the socket fails or the listener encounters
366/// a fatal error.
367///
368/// # Examples
369///
370/// ```no_run
371/// use std::sync::Arc;
372/// use hexz_core::File;
373/// use hexz_core::store::local::FileBackend;
374/// use hexz_core::algo::compression::lz4::Lz4Compressor;
375/// use hexz_server::serve_nbd;
376///
377/// # #[tokio::main]
378/// # async fn main() -> anyhow::Result<()> {
379/// let backend = Arc::new(FileBackend::new("vm_snapshot.hxz".as_ref())?);
380/// let compressor = Box::new(Lz4Compressor::new());
381/// let snap = File::new(backend, compressor, None)?;
382///
383/// // Start NBD server (runs forever)
384/// serve_nbd(snap, 10809).await?;
385/// # Ok(())
386/// # }
387/// ```
388///
389/// ## Client-Side Usage (Linux)
390///
391/// ```bash
392/// # Connect to the NBD server
393/// sudo nbd-client localhost 10809 /dev/nbd0
394///
395/// # Mount the block device (read-only, automatically detected filesystem)
396/// sudo mount -o ro /dev/nbd0 /mnt/snapshot
397///
398/// # Browse files normally
399/// ls -la /mnt/snapshot
400/// sudo cat /mnt/snapshot/var/log/syslog
401///
402/// # Unmount and disconnect
403/// sudo umount /mnt/snapshot
404/// sudo nbd-client -d /dev/nbd0
405/// ```
406///
407/// # Security Considerations
408///
409/// ## No Encryption
410///
411/// The NBD protocol transmits data in plaintext. For localhost connections this
412/// is acceptable, but for remote access consider:
413///
414/// - **SSH tunnel**: `ssh -L 10809:localhost:10809 user@server`
415/// - **VPN**: WireGuard, OpenVPN, etc.
416/// - **TLS wrapper**: `stunnel` or similar
417///
418/// ## No Authentication
419///
420/// Any process with network access to the port can connect. The default loopback
421/// binding mitigates this, but if exposing to the network, use firewall rules or
422/// SSH key authentication.
423///
424/// ## Read-Only Enforcement
425///
426/// The NBD server always exports snapshots as read-only (NBD flag `NBD_FLAG_READ_ONLY`).
427/// Write attempts return `EPERM` (operation not permitted). However, a malicious
428/// NBD client could theoretically attempt to crash the server via protocol abuse.
429///
430/// # Performance Notes
431///
432/// - **Concurrency**: Each client spawns a separate Tokio task. With 100 concurrent
433/// clients, memory overhead is ~10 MB (100 KB per task).
434/// - **Throughput**: Typically 500-1000 MB/s for sequential reads, limited by
435/// decompression rather than NBD protocol overhead.
436/// - **Latency**: ~2-10 ms per read, including TCP round-trip and decompression.
437///
438/// # Panics
439///
440/// This function does not panic under normal operation. Client errors are logged
441/// and handled gracefully.
442pub async fn serve_nbd(snap: Arc<File>, port: u16) -> anyhow::Result<()> {
443 let addr = SocketAddr::from((BIND_ADDR, port));
444 let listener = TcpListener::bind(addr).await?;
445
446 tracing::info!("NBD server listening on {}", addr);
447 println!(
448 "NBD server started on {}. Use 'nbd-client localhost {} /dev/nbd0' to mount.",
449 addr, port
450 );
451
452 loop {
453 // Accept incoming NBD connections
454 let (socket, remote_addr) = match listener.accept().await {
455 Ok(conn) => conn,
456 Err(e) => {
457 tracing::warn!("NBD accept error (continuing): {}", e);
458 continue;
459 }
460 };
461 tracing::debug!("Accepted NBD connection from {}", remote_addr);
462
463 let snap_clone = snap.clone();
464 tokio::spawn(async move {
465 if let Err(e) = nbd::handle_client(socket, snap_clone).await {
466 tracing::error!("NBD client error: {}", e);
467 }
468 });
469 }
470}
471
472/// Exposes a `File` as an S3-compatible object storage gateway.
473///
474/// # Implementation Status: NOT IMPLEMENTED
475///
476/// This function is a **placeholder** for future S3 API compatibility. It currently
477/// blocks forever without serving any requests. Calling this function will NOT panic,
478/// but it provides no useful functionality.
479///
480/// # Planned Functionality
481///
482/// When implemented, this gateway will provide S3-compatible HTTP endpoints for:
483///
484/// ## Supported Operations (Planned)
485///
486/// - `GET /<bucket>/<key>`: Retrieve snapshot data as an S3 object
487/// - `HEAD /<bucket>/<key>`: Get object metadata (size, ETag)
488/// - `GET /<bucket>/<key>?range=bytes=<start>-<end>`: Partial object retrieval
489/// - `GET /<bucket>?list-type=2`: List objects (future: multi-snapshot support)
490///
491/// ## S3 API Compatibility Goals
492///
493/// - **Authentication**: AWS Signature Version 4 (SigV4) for production use
494/// - **Authorization**: IAM-style policies (read-only by default)
495/// - **Error responses**: Standard S3 XML error responses
496/// - **Metadata**: ETag (CRC32 of snapshot header), Content-Type, Last-Modified
497///
498/// ## Mapping Hexz Concepts to S3
499///
500/// | Hexz Concept | S3 Equivalent | Mapping Strategy |
501/// |----------------|---------------|------------------|
502/// | Snapshot file | Bucket | One bucket per snapshot |
503/// | Disk stream | Object `disk.img` | Virtual object, synthesized from snapshot |
504/// | Memory stream | Object `memory.img` | Virtual object, synthesized from snapshot |
505/// | Block index | N/A | Transparent to S3 clients |
506///
507/// ## Example S3 API Usage (Planned)
508///
509/// ```bash
510/// # Configure AWS CLI to point to local S3 gateway
511/// export AWS_ACCESS_KEY_ID=minioadmin
512/// export AWS_SECRET_ACCESS_KEY=minioadmin
513/// export AWS_ENDPOINT_URL=http://localhost:9000
514///
515/// # List buckets (snapshots)
516/// aws s3 ls
517///
518/// # List objects in a snapshot
519/// aws s3 ls s3://my-snapshot/
520///
521/// # Download the disk stream
522/// aws s3 cp s3://my-snapshot/disk.img disk_copy.img
523///
524/// # Download a range (100 MB starting at offset 1 GB)
525/// aws s3api get-object --bucket my-snapshot --key disk.img \
526/// --range bytes=1073741824-1178599423 chunk.bin
527/// ```
528///
529/// # Configuration (Planned)
530///
531/// Future configuration options (not yet implemented):
532///
533/// - **Bind address**: CLI flag `--s3-bind 0.0.0.0:9000` (default: `127.0.0.1`)
534/// - **Authentication**: `--s3-access-key` and `--s3-secret-key` for SigV4
535/// - **Bucket name**: `--s3-bucket-name <name>` (default: derived from snapshot filename)
536/// - **Anonymous access**: `--s3-allow-anonymous` flag (dangerous, for testing only)
537///
538/// # Why S3 Compatibility?
539///
540/// S3 is a de facto standard for object storage. Supporting the S3 API enables:
541///
542/// 1. **Cloud integration**: Use Hexz with existing cloud infrastructure (AWS, MinIO, etc.)
543/// 2. **Tool compatibility**: Any S3-compatible tool (s3cmd, rclone, boto3) works with Hexz
544/// 3. **Caching CDNs**: Front the gateway with CloudFront or similar for caching
545/// 4. **Lifecycle policies**: Future support for automated snapshot expiration
546///
547/// # Security Considerations (Planned)
548///
549/// When implemented, the S3 gateway will require authentication by default:
550///
551/// - **SigV4 authentication**: All requests must include valid AWS Signature V4 headers
552/// - **Read-only mode**: No PUT/DELETE operations to prevent accidental modification
553/// - **Rate limiting**: Per-access-key request throttling to prevent abuse
554/// - **TLS requirement**: Production deployments must use HTTPS (enforced by CLI flag check)
555///
556/// # Performance Goals (Planned)
557///
558/// - **Throughput**: Match HTTP server performance (~500-2000 MB/s)
559/// - **Latency**: <10 ms for authenticated requests (signature verification adds ~1-2 ms)
560/// - **Concurrency**: Handle 1000+ concurrent S3 GET requests
561///
562/// # Limitations (Planned)
563///
564/// The S3 gateway will NOT support:
565///
566/// - **Write operations**: No PUT, POST, DELETE (snapshots are read-only)
567/// - **Multipart uploads**: N/A for read-only gateway
568/// - **Bucket policies**: Simplified IAM-like policies only
569/// - **Versioning**: Snapshots are immutable, no object versioning needed
570/// - **Server-side encryption**: Use TLS for transport encryption instead
571///
572/// # Arguments
573///
574/// - `_snap`: The Hexz snapshot to expose (currently unused).
575/// - `port`: TCP port to bind to on the loopback interface (e.g., `9000`).
576///
577/// # Returns
578///
579/// This function never returns (blocks indefinitely on `std::future::pending()`).
580/// It does not perform any useful work in the current implementation.
581///
582/// # Errors
583///
584/// Currently, this function cannot return an error (it blocks forever). In the
585/// future implementation, it will return errors for:
586///
587/// - Socket binding failures
588/// - Configuration validation errors
589/// - Unrecoverable I/O errors on the listener
590///
591/// # Examples
592///
593/// ```no_run
594/// use std::sync::Arc;
595/// use hexz_core::File;
596/// use hexz_core::store::local::FileBackend;
597/// use hexz_core::algo::compression::lz4::Lz4Compressor;
598/// use hexz_server::serve_s3_gateway;
599///
600/// # #[tokio::main]
601/// # async fn main() -> anyhow::Result<()> {
602/// let backend = Arc::new(FileBackend::new("snapshot.hxz".as_ref())?);
603/// let compressor = Box::new(Lz4Compressor::new());
604/// let snap = File::new(backend, compressor, None)?;
605///
606/// // WARNING: This will block forever without serving requests
607/// serve_s3_gateway(snap, 9000).await?;
608/// # Ok(())
609/// # }
610/// ```
611///
612/// # Implementation Roadmap
613///
614/// 1. **Phase 1**: Basic GET/HEAD operations with no authentication (localhost-only)
615/// 2. **Phase 2**: AWS SigV4 authentication and bucket listing
616/// 3. **Phase 3**: Multi-snapshot support (multiple buckets)
617/// 4. **Phase 4**: TLS support and network binding options
618/// 5. **Phase 5**: IAM-style policies and access control
619///
620/// # Call for Contributions
621///
622/// Implementing S3 compatibility is a substantial undertaking. If you are interested
623/// in contributing, see `docs/s3_gateway_design.md` (to be created) for the design
624/// specification and implementation plan.
625#[deprecated(note = "Not implemented. Blocks indefinitely without serving requests.")]
626pub async fn serve_s3_gateway(_snap: Arc<File>, port: u16) -> anyhow::Result<()> {
627 tracing::info!("Starting S3 Gateway on port {}", port);
628 println!(
629 "S3 Gateway started on port {} (Not fully implemented)",
630 port
631 );
632 std::future::pending::<()>().await; // Keep alive
633 unreachable!();
634}
635
636/// Exposes a `File` over HTTP with range request support.
637///
638/// Starts an HTTP 1.1 server on `127.0.0.1:<port>` that exposes snapshot data via
639/// two endpoints:
640///
641/// - `GET /disk`: Serves the disk stream (persistent storage snapshot)
642/// - `GET /memory`: Serves the memory stream (RAM snapshot)
643///
644/// Both endpoints support HTTP range requests (RFC 7233) for partial content retrieval.
645///
646/// # Protocol Behavior
647///
648/// ## Full Content Request (No Range Header)
649///
650/// ```http
651/// GET /disk HTTP/1.1
652/// Host: localhost:8080
653/// ```
654///
655/// Response:
656///
657/// ```http
658/// HTTP/1.1 206 Partial Content
659/// Content-Type: application/octet-stream
660/// Content-Range: bytes 0-33554431/10737418240
661/// Accept-Ranges: bytes
662///
663/// [First 32 MiB of data, clamped by MAX_CHUNK_SIZE]
664/// ```
665///
666/// Note: Even without a `Range` header, the response is clamped to `MAX_CHUNK_SIZE`
667/// and returns HTTP 206 (not 200) to indicate partial content.
668///
669/// ## Range Request (Partial Content)
670///
671/// ```http
672/// GET /memory HTTP/1.1
673/// Host: localhost:8080
674/// Range: bytes=1048576-2097151
675/// ```
676///
677/// Response (success):
678///
679/// ```http
680/// HTTP/1.1 206 Partial Content
681/// Content-Type: application/octet-stream
682/// Content-Range: bytes 1048576-2097151/8589934592
683/// Accept-Ranges: bytes
684///
685/// [1 MiB of data from offset 1048576]
686/// ```
687///
688/// Response (invalid range):
689///
690/// ```http
691/// HTTP/1.1 416 Range Not Satisfiable
692/// Content-Range: bytes */8589934592
693/// ```
694///
695/// ## Error Responses
696///
697/// - **416 Range Not Satisfiable**: Invalid range syntax or out-of-bounds request
698/// - **500 Internal Server Error**: Backend I/O failure or decompression error
699///
700/// # HTTP Range Request Limitations
701///
702/// ## Supported Range Types
703///
704/// - **Bounded ranges**: `bytes=<start>-<end>` (both offsets specified)
705/// - **Unbounded ranges**: `bytes=<start>-` (from start to EOF, clamped to `MAX_CHUNK_SIZE`)
706///
707/// ## Unsupported Range Types
708///
709/// These return HTTP 416 (Range Not Satisfiable):
710///
711/// - **Suffix ranges**: `bytes=-<suffix-length>` (e.g., `bytes=-1024` for last 1KB)
712/// - **Multi-part ranges**: `bytes=0-100,200-300` (multiple ranges in one request)
713///
714/// Rationale: These are rarely used and add significant implementation complexity.
715/// Standard range requests cover 99% of real-world use cases.
716///
717/// # DoS Protection Mechanisms
718///
719/// ## Request Size Clamping
720///
721/// All reads are clamped to `MAX_CHUNK_SIZE` (32 MiB) to prevent memory exhaustion:
722///
723/// ```text
724/// Client requests: bytes=0-1073741823 (1 GB)
725/// Server clamps to: bytes=0-33554431 (32 MiB)
726/// Response header: Content-Range: bytes 0-33554431/total
727/// ```
728///
729/// The client detects clamping by comparing the `Content-Range` header to the
730/// requested range and can issue follow-up requests for remaining data.
731///
732/// ## Connection Limits
733///
734/// The server relies on OS-level TCP connection limits (controlled by `ulimit -n`
735/// and kernel parameters). Tokio's async runtime handles thousands of concurrent
736/// connections efficiently (each connection consumes ~100 KB of memory).
737///
738/// For production deployments, consider:
739///
740/// - **Reverse proxy**: nginx or Caddy with connection limits and rate limiting
741/// - **Firewall rules**: Limit connections per IP address
742/// - **Resource limits**: Set `ulimit -n` to a reasonable value (e.g., 4096)
743///
744/// # Arguments
745///
746/// - `snap`: The Hexz snapshot file to expose. Must be wrapped in `Arc` for sharing
747/// across request handlers.
748/// - `port`: TCP port to bind to on the loopback interface (e.g., `8080`, `3000`).
749///
750/// # Returns
751///
752/// This function runs indefinitely, serving HTTP requests until the server is shut
753/// down (e.g., via Ctrl+C signal). It only returns `Err` if:
754///
755/// - The TCP listener fails to bind (port already in use, permission denied)
756/// - The HTTP server encounters a fatal error (should be extremely rare)
757///
758/// Individual request errors (invalid ranges, read failures) are handled gracefully
759/// and return appropriate HTTP error responses without stopping the server.
760///
761/// # Errors
762///
763/// - `std::io::Error`: If binding to the socket fails.
764/// - `anyhow::Error`: If the HTTP server encounters an unrecoverable error.
765///
766/// # Examples
767///
768/// ## Server Setup
769///
770/// ```no_run
771/// use std::sync::Arc;
772/// use hexz_core::File;
773/// use hexz_core::store::local::FileBackend;
774/// use hexz_core::algo::compression::lz4::Lz4Compressor;
775/// use hexz_server::serve_http;
776///
777/// # #[tokio::main]
778/// # async fn main() -> anyhow::Result<()> {
779/// let backend = Arc::new(FileBackend::new("snapshot.hxz".as_ref())?);
780/// let compressor = Box::new(Lz4Compressor::new());
781/// let snap = File::new(backend, compressor, None)?;
782///
783/// // Start HTTP server on port 8080 (runs forever)
784/// serve_http(snap, 8080).await?;
785/// # Ok(())
786/// # }
787/// ```
788///
789/// ## Client Usage (curl)
790///
791/// ```bash
792/// # Fetch first 4KB of disk stream
793/// curl -H "Range: bytes=0-4095" http://localhost:8080/disk -o chunk.bin
794///
795/// # Fetch 1MB starting at 1MB offset
796/// curl -H "Range: bytes=1048576-2097151" http://localhost:8080/memory -o mem_chunk.bin
797///
798/// # Fetch from offset to EOF (clamped to 32 MiB)
799/// curl -H "Range: bytes=1048576-" http://localhost:8080/disk -o large_chunk.bin
800///
801/// # Full GET (no range header, returns first 32 MiB)
802/// curl http://localhost:8080/disk -o first_32mb.bin
803/// ```
804///
805/// ## Client Usage (Python)
806///
807/// ```python
808/// import requests
809///
810/// # Fetch a range
811/// headers = {'Range': 'bytes=0-4095'}
812/// response = requests.get('http://localhost:8080/disk', headers=headers)
813/// assert response.status_code == 206 # Partial Content
814/// data = response.content
815/// print(f"Fetched {len(data)} bytes")
816///
817/// # Parse Content-Range header
818/// content_range = response.headers['Content-Range']
819/// # Example: "bytes 0-4095/10737418240"
820/// print(f"Content-Range: {content_range}")
821/// ```
822///
823/// # Performance Characteristics
824///
825/// ## Throughput
826///
827/// - **Local (127.0.0.1)**: 500-2000 MB/s (limited by decompression, not HTTP overhead)
828/// - **1 Gbps network**: ~120 MB/s (network-bound)
829/// - **10 Gbps network**: ~800 MB/s (may be decompression-bound for LZ4, network-bound for ZSTD)
830///
831/// ## Latency
832///
833/// - **Cache hit**: ~80μs (block already decompressed)
834/// - **Cache miss**: ~1-5 ms (includes decompression and backend I/O)
835/// - **Network RTT**: Add local RTT (~0.1 ms for localhost, ~10-50 ms for remote)
836///
837/// ## Memory Usage
838///
839/// - **Per connection**: ~100 KB (Tokio task stack + buffers)
840/// - **Per request**: ~32 MB worst-case (if requesting `MAX_CHUNK_SIZE`)
841/// - **Block cache**: Shared across all connections (typically 100-500 MB)
842///
843/// With 1000 concurrent connections, memory overhead is ~100 MB for connections
844/// plus the shared block cache.
845///
846/// # Security Considerations
847///
848/// ## Current Security Posture
849///
850/// - **Localhost-only**: Binds to `127.0.0.1`, not accessible from network
851/// - **No authentication**: Anyone with local access can read snapshot data
852/// - **No TLS**: Plaintext HTTP (acceptable for loopback)
853/// - **DoS protection**: Request size clamping, but no rate limiting
854///
855/// ## Threat Model
856///
857/// For localhost-only deployments, the threat model assumes:
858///
859/// 1. **Trusted local environment**: All local users are trusted (or isolated via OS permissions)
860/// 2. **No remote attackers**: Firewall prevents external access
861/// 3. **Process isolation**: Snapshot data is not more sensitive than other local files
862///
863/// ## Future Security Enhancements (Planned)
864///
865/// - **TLS/HTTPS**: Certificate-based encryption for network access
866/// - **Bearer token auth**: Simple token in `Authorization` header
867/// - **Rate limiting**: Per-IP request throttling
868/// - **Audit logging**: Request logs with client IP and byte ranges
869///
870/// # Panics
871///
872/// This function does not panic under normal operation. Request handling errors
873/// are converted to HTTP error responses.
874pub async fn serve_http(snap: Arc<File>, port: u16) -> anyhow::Result<()> {
875 let addr = SocketAddr::from((BIND_ADDR, port));
876 let listener = TcpListener::bind(addr).await?;
877 tracing::info!("HTTP server listening on {}", addr);
878 serve_http_with_listener(snap, listener).await
879}
880
881/// Like [`serve_http`], but accepts a pre-bound [`TcpListener`].
882///
883/// This avoids a TOCTOU race when the caller needs to discover a free port
884/// (bind to port 0) and then pass the listener directly instead of
885/// re-binding by port number.
886pub async fn serve_http_with_listener(
887 snap: Arc<File>,
888 listener: TcpListener,
889) -> anyhow::Result<()> {
890 let state = Arc::new(AppState { snap });
891
892 let app = Router::new()
893 .route("/disk", get(get_disk))
894 .route("/memory", get(get_memory))
895 .with_state(state);
896
897 axum::serve(listener, app).await?;
898 Ok(())
899}
900
901/// HTTP handler for the `/disk` endpoint.
902///
903/// Serves the disk stream (persistent storage snapshot) from the Hexz file.
904/// Delegates to `handle_request` with `SnapshotStream::Disk`.
905///
906/// # Route
907///
908/// `GET /disk`
909///
910/// # Request Headers
911///
912/// - `Range` (optional): HTTP range request (e.g., `bytes=0-4095`)
913///
914/// # Response Headers
915///
916/// - `Content-Type`: Always `application/octet-stream` (raw binary data)
917/// - `Content-Range`: Byte range served (e.g., `bytes 0-4095/10737418240`)
918/// - `Accept-Ranges`: Always `bytes` (indicates range request support)
919///
920/// # Response Status Codes
921///
922/// - **206 Partial Content**: Successful range request
923/// - **416 Range Not Satisfiable**: Invalid or out-of-bounds range
924/// - **500 Internal Server Error**: Snapshot read failure
925///
926/// # Examples
927///
928/// See `serve_http` for client usage examples.
929async fn get_disk(headers: HeaderMap, State(state): State<Arc<AppState>>) -> impl IntoResponse {
930 handle_request(headers, &state.snap, SnapshotStream::Disk)
931}
932
933/// HTTP handler for the `/memory` endpoint.
934///
935/// Serves the memory stream (RAM snapshot) from the Hexz file.
936/// Delegates to `handle_request` with `SnapshotStream::Memory`.
937///
938/// # Route
939///
940/// `GET /memory`
941///
942/// # Request Headers
943///
944/// - `Range` (optional): HTTP range request (e.g., `bytes=0-4095`)
945///
946/// # Response Headers
947///
948/// - `Content-Type`: Always `application/octet-stream` (raw binary data)
949/// - `Content-Range`: Byte range served (e.g., `bytes 0-4095/8589934592`)
950/// - `Accept-Ranges`: Always `bytes` (indicates range request support)
951///
952/// # Response Status Codes
953///
954/// - **206 Partial Content**: Successful range request
955/// - **416 Range Not Satisfiable**: Invalid or out-of-bounds range
956/// - **500 Internal Server Error**: Snapshot read failure
957///
958/// # Examples
959///
960/// See `serve_http` for client usage examples.
961async fn get_memory(headers: HeaderMap, State(state): State<Arc<AppState>>) -> impl IntoResponse {
962 handle_request(headers, &state.snap, SnapshotStream::Memory)
963}
964
965/// Core HTTP request handler that translates `Range` headers into snapshot reads.
966///
967/// This function implements the HTTP range request logic for both `/disk` and `/memory`
968/// endpoints. It performs the following steps:
969///
970/// 1. Parse the `Range` header (if present) or default to full stream access
971/// 2. Clamp the requested range to `MAX_CHUNK_SIZE` to prevent DoS
972/// 3. Read the data from the snapshot via `File::read_at`
973/// 4. Return HTTP 206 with `Content-Range` header, or error status codes
974///
975/// # Arguments
976///
977/// - `headers`: HTTP request headers from the client (parsed by Axum)
978/// - `snap`: The Hexz snapshot file to read from
979/// - `stream`: Which logical stream to read (`Disk` or `Memory`)
980///
981/// # Returns
982///
983/// An Axum `Response` with one of the following status codes:
984///
985/// - **206 Partial Content**: Successful read (even for full stream requests)
986/// - **416 Range Not Satisfiable**: Invalid range syntax or out-of-bounds offset
987/// - **500 Internal Server Error**: Snapshot read failure (decompression error, I/O error)
988///
989/// # HTTP Range Request Parsing
990///
991/// The `Range` header is expected in the format `bytes=<start>-<end>` where:
992///
993/// - `<start>` is the starting byte offset (inclusive, zero-indexed)
994/// - `<end>` is the ending byte offset (inclusive), or omitted for "to EOF"
995///
996/// ## Examples of Supported Ranges
997///
998/// ```text
999/// Range: bytes=0-1023 → Read bytes 0-1023 (1024 bytes)
1000/// Range: bytes=1024-2047 → Read bytes 1024-2047 (1024 bytes)
1001/// Range: bytes=1048576- → Read from 1MB to EOF (clamped to MAX_CHUNK_SIZE)
1002/// (no Range header) → Read from start to EOF (clamped to MAX_CHUNK_SIZE)
1003/// ```
1004///
1005/// ## Examples of Unsupported/Invalid Ranges
1006///
1007/// These return HTTP 416:
1008///
1009/// ```text
1010/// Range: bytes=-1024 → Suffix range (last 1024 bytes) - not supported
1011/// Range: bytes=0-100,200-300 → Multi-part range - not supported
1012/// Range: bytes=1000-500 → Start > end - invalid
1013/// Range: bytes=999999999999- → Start beyond EOF - out of bounds
1014/// ```
1015///
1016/// # DoS Protection: Range Clamping Algorithm
1017///
1018/// To prevent a malicious client from requesting gigabytes of data in a single
1019/// request, the handler clamps the effective range:
1020///
1021/// ```text
1022/// requested_length = end - start + 1
1023/// if requested_length > MAX_CHUNK_SIZE:
1024/// end = start + MAX_CHUNK_SIZE - 1
1025/// if end >= total_size:
1026/// end = total_size - 1
1027/// ```
1028///
1029/// The clamped range is reflected in the `Content-Range` response header:
1030///
1031/// ```text
1032/// Content-Range: bytes <actual_start>-<actual_end>/<total_size>
1033/// ```
1034///
1035/// Clients must check this header to detect clamping and issue follow-up requests
1036/// for remaining data.
1037///
1038/// ## Clamping Example
1039///
1040/// ```text
1041/// Client request: Range: bytes=0-67108863 (64 MiB)
1042/// Total size: 10 GB
1043/// Server clamps to: 0-33554431 (32 MiB due to MAX_CHUNK_SIZE)
1044/// Response header: Content-Range: bytes 0-33554431/10737418240
1045/// ```
1046///
1047/// # Error Handling
1048///
1049/// ## Range Parsing Errors
1050///
1051/// If `parse_range` returns `Err(())`, the handler returns HTTP 416 (Range Not
1052/// Satisfiable). This occurs when:
1053///
1054/// - The `Range` header does not start with `"bytes="`
1055/// - The start/end offsets are not valid integers
1056/// - The start offset is greater than the end offset
1057/// - The end offset is beyond the stream size
1058///
1059/// ## Snapshot Read Errors
1060///
1061/// If `snap.read_at` returns `Err(_)`, the handler returns HTTP 500 (Internal
1062/// Server Error). This occurs when:
1063///
1064/// - Decompression fails (corrupted compressed data)
1065/// - Backend I/O fails (disk error, network timeout for remote backends)
1066/// - Encryption decryption fails (incorrect key, corrupted ciphertext)
1067///
1068/// The specific error is not exposed to the client (only logged internally) to
1069/// avoid information leakage.
1070///
1071/// # Edge Cases
1072///
1073/// ## Empty Range
1074///
1075/// If the calculated range length is 0 (e.g., due to clamping at EOF), the handler
1076/// returns HTTP 416. This should be rare in practice since clients typically request
1077/// valid ranges.
1078///
1079/// ## Zero-Sized Stream
1080///
1081/// If the snapshot stream size is 0 (empty disk or memory snapshot), any range
1082/// request returns HTTP 416 because no valid offsets exist.
1083///
1084/// ## Single-Byte Range
1085///
1086/// A request like `bytes=0-0` (fetch only byte 0) is valid and returns 1 byte with
1087/// HTTP 206 and `Content-Range: bytes 0-0/<total>`.
1088///
1089/// # Performance Characteristics
1090///
1091/// - **No Range Header**: Clamps to `MAX_CHUNK_SIZE`, then performs one `read_at` call
1092/// - **Valid Range**: One `read_at` call (may hit block cache or require decompression)
1093/// - **Invalid Range**: Immediate return (no snapshot I/O)
1094///
1095/// For cache hits, latency is ~80μs. For cache misses, latency is ~1-5 ms depending
1096/// on backend speed and compression algorithm.
1097///
1098/// # Security Notes
1099///
1100/// - **No authentication**: This function does not check credentials (handled by
1101/// future middleware or reverse proxy)
1102/// - **DoS mitigation**: Request size clamping prevents memory exhaustion
1103/// - **Information leakage**: Error responses do not reveal internal details
1104/// (e.g., "decompression failed" is hidden behind HTTP 500)
1105///
1106/// # Examples
1107///
1108/// See `serve_http`, `get_disk`, and `get_memory` for usage context.
1109fn handle_request(headers: HeaderMap, snap: &Arc<File>, stream: SnapshotStream) -> Response {
1110 let total_size = snap.size(stream);
1111
1112 let (start, mut end) = if let Some(range) = headers.get(header::RANGE) {
1113 match parse_range(range.to_str().unwrap_or(""), total_size) {
1114 Ok(r) => r,
1115 Err(_) => return StatusCode::RANGE_NOT_SATISFIABLE.into_response(),
1116 }
1117 } else {
1118 (0, total_size.saturating_sub(1))
1119 };
1120
1121 // SECURITY: DoS Protection
1122 // Clamp the requested range to avoid huge memory allocations.
1123 if end - start + 1 > MAX_CHUNK_SIZE {
1124 end = start + MAX_CHUNK_SIZE - 1;
1125 // Ensure we don't go past EOF after clamping
1126 if end >= total_size {
1127 end = total_size.saturating_sub(1);
1128 }
1129 }
1130
1131 let len = (end - start + 1) as usize;
1132 if len == 0 {
1133 // Handle empty range edge case
1134 return StatusCode::RANGE_NOT_SATISFIABLE.into_response();
1135 }
1136
1137 match snap.read_at(stream, start, len) {
1138 Ok(data) => (
1139 StatusCode::PARTIAL_CONTENT,
1140 [
1141 (header::CONTENT_TYPE, "application/octet-stream"),
1142 (
1143 header::CONTENT_RANGE,
1144 &format!("bytes {}-{}/{}", start, end, total_size),
1145 ),
1146 (header::ACCEPT_RANGES, "bytes"),
1147 ],
1148 data,
1149 )
1150 .into_response(),
1151 Err(_) => StatusCode::INTERNAL_SERVER_ERROR.into_response(),
1152 }
1153}
1154
1155/// Parses an HTTP `Range` header into absolute byte offsets.
1156///
1157/// Implements a subset of HTTP range request syntax (RFC 7233), supporting only
1158/// simple byte ranges without multi-part or suffix ranges.
1159///
1160/// # Supported Syntax
1161///
1162/// - **Bounded range**: `bytes=<start>-<end>` (both offsets specified)
1163/// - Example: `bytes=0-1023` → Returns `(0, 1023)`
1164/// - **Unbounded range**: `bytes=<start>-` (from start to EOF)
1165/// - Example: `bytes=1024-` → Returns `(1024, size-1)`
1166///
1167/// # Unsupported Syntax
1168///
1169/// - **Suffix range**: `bytes=-<length>` (last N bytes)
1170/// - Example: `bytes=-1024` → Returns `Err(())`
1171/// - **Multi-part range**: `bytes=0-100,200-300`
1172/// - Example: `bytes=0-100,200-300` → Returns `Err(())`
1173///
1174/// These are rejected because:
1175/// 1. They are rarely used in practice (<1% of range requests)
1176/// 2. They add significant parsing and response generation complexity
1177/// 3. The HTTP 416 error response is acceptable for clients that need them
1178///
1179/// # Arguments
1180///
1181/// - `range`: The value of the `Range` header (e.g., `"bytes=0-1023"`)
1182/// - `size`: The total size of the stream in bytes (used to validate offsets)
1183///
1184/// # Returns
1185///
1186/// - `Ok((start, end))`: Valid range with absolute byte offsets (both inclusive)
1187/// - `Err(())`: Invalid syntax or out-of-bounds range
1188///
1189/// # Error Conditions
1190///
1191/// Returns `Err(())` if:
1192///
1193/// 1. **Missing prefix**: Header does not start with `"bytes="`
1194/// - Example: `"items=0-100"` → Error
1195/// 2. **Invalid integer**: Start or end cannot be parsed as `u64`
1196/// - Example: `"bytes=abc-def"` → Error
1197/// 3. **Inverted range**: Start offset is greater than end offset
1198/// - Example: `"bytes=1000-500"` → Error
1199/// 4. **Out of bounds**: End offset is beyond the stream size
1200/// - Example: `"bytes=0-999999"` when size is 1000 → Error
1201///
1202/// # Parsing Algorithm
1203///
1204/// ```text
1205/// 1. Check for "bytes=" prefix (RANGE_PREFIX_LEN = 6)
1206/// 2. Split remaining string on '-' delimiter
1207/// 3. Parse start offset (parts[0])
1208/// 4. Parse end offset (parts[1] if present and non-empty, else size-1)
1209/// 5. Validate: start <= end && end < size
1210/// 6. Return (start, end)
1211/// ```
1212///
1213/// # Edge Cases
1214///
1215/// ## Empty String After Prefix
1216///
1217/// ```text
1218/// Range: bytes=
1219/// ```
1220///
1221/// Returns `Err(())` because there is no start offset.
1222///
1223/// ## Single Byte Range
1224///
1225/// ```text
1226/// Range: bytes=0-0
1227/// ```
1228///
1229/// Returns `Ok((0, 0))` (valid, requests exactly 1 byte).
1230///
1231/// ## Range at EOF
1232///
1233/// ```text
1234/// Range: bytes=0-999 (size = 1000)
1235/// ```
1236///
1237/// Returns `Ok((0, 999))` (valid, end is inclusive and equals `size - 1`).
1238///
1239/// ## Range Beyond EOF
1240///
1241/// ```text
1242/// Range: bytes=0-1000 (size = 1000)
1243/// ```
1244///
1245/// Returns `Err(())` because offset 1000 does not exist (valid range is 0-999).
1246///
1247/// # Examples
1248///
1249/// ```text
1250/// parse_range("bytes=0-1023", 10000) -> Ok((0, 1023))
1251/// parse_range("bytes=1024-", 10000) -> Ok((1024, 9999))
1252/// parse_range("0-1023", 10000) -> Err(()) // missing "bytes=" prefix
1253/// parse_range("bytes=0-10000", 10000) -> Err(()) // out of bounds
1254/// parse_range("bytes=1000-500", 10000)-> Err(()) // inverted range
1255/// ```
1256///
1257/// # Performance
1258///
1259/// - **Time complexity**: O(n) where n is the length of the range string (typically <20 chars)
1260/// - **Allocation**: One heap allocation for the `split('-')` iterator's internal state
1261/// - **Typical latency**: <1 μs (negligible compared to snapshot read latency)
1262///
1263/// # Security
1264///
1265/// This function is resilient to malicious input:
1266///
1267/// - **Integer overflow**: `u64::parse` rejects values >2^64-1
1268/// - **Unbounded length**: The `Range` header is bounded by HTTP header size limits
1269/// (typically 8 KB, enforced by the HTTP server)
1270/// - **No allocation attacks**: Uses only one small allocation for splitting
1271#[allow(clippy::result_unit_err)]
1272pub fn parse_range(range: &str, size: u64) -> Result<(u64, u64), ()> {
1273 if !range.starts_with("bytes=") {
1274 return Err(());
1275 }
1276 let parts: Vec<&str> = range[RANGE_PREFIX_LEN..].split('-').collect();
1277 let start = parts[0].parse::<u64>().map_err(|_| ())?;
1278 let end = if parts.len() > 1 && !parts[1].is_empty() {
1279 parts[1].parse::<u64>().map_err(|_| ())?
1280 } else {
1281 size.saturating_sub(1)
1282 };
1283 if start > end || end >= size {
1284 return Err(());
1285 }
1286 Ok((start, end))
1287}