// Copyright (c) 2016-2021 The http-serve developers
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE.txt or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT.txt or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.

//! Helpers for serving HTTP GET and HEAD responses asynchronously with the
//! [http](http://crates.io/crates/http) crate and [tokio](https://crates.io/crates/tokio).
//! Works well with [hyper](https://crates.io/crates/hyper) 0.14.x.
//!
//! This crate supplies two ways to respond to HTTP GET and HEAD requests:
//!
//! *   the `serve` function can be used to serve an `Entity`, a trait representing reusable,
//!     byte-rangeable HTTP entities. `Entity` must be able to produce exactly the same data on
//!     every call, know its size in advance, and be able to produce portions of the data on demand.
//! *   the `streaming_body` function can be used to add a body to an otherwise-complete response.
//!     If a body is needed (on `GET` rather than `HEAD` requests), it returns a `BodyWriter`
//!     (which implements `std::io::Write`). The caller should produce the complete body or call
//!     `BodyWriter::abort`, causing the HTTP stream to terminate abruptly.
//!
//! It supplies a static file `Entity` implementation and a (currently Unix-only)
//! helper for serving a full directory tree from the local filesystem, including
//! automatically looking for `.gz`-suffixed files when the client advertises
//! `Accept-Encoding: gzip`.
//!
//! # Why two ways?
//!
//! They have pros and cons. This table shows some of them:
//!
//! <table>
//!   <tr><th><th><code>serve</code><th><code>streaming_body</code></tr>
//!   <tr><td>automatic byte range serving<td>yes<td>no [<a href="#range">1</a>]</tr>
//!   <tr><td>backpressure<td>yes<td>no [<a href="#backpressure">2</a>]</tr>
//!   <tr><td>conditional GET<td>yes<td>no [<a href="#conditional_get">3</a>]</tr>
//!   <tr><td>sends first byte before length known<td>no<td>yes</tr>
//!   <tr><td>automatic gzip content encoding<td>no [<a href="#gzip">4</a>]<td>yes</tr>
//! </table>
//!
//! <a name="range">\[1\]</a>: `streaming_body` always sends the full body. Byte range serving
//! wouldn't make much sense with its interface. The application will generate all the bytes
//! every time anyway, and `http-serve`'s buffering logic would have to be complex
//! to handle multiple ranges well.
//!
//! <a name="backpressure">\[2\]</a>: `streaming_body` is often appended to while holding
//! a lock or open database transaction, where backpressure is undesired. It'd be
//! possible to add support for "wait points" where the caller explicitly wants backpressure. This
//! would make it more suitable for large streams, even infinite streams like
//! [Server-sent events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events).
//!
//! <a name="conditional_get">\[3\]</a>: `streaming_body` doesn't yet support
//! generating etags or honoring conditional GET requests. PRs welcome!
//!
//! <a name="gzip">\[4\]</a>: `serve` doesn't automatically apply `Content-Encoding:
//! gzip` because the content encoding is a property of the entity you supply. The
//! entity's etag, length, and byte range boundaries must match the encoding. You
//! can use the `http_serve::should_gzip` helper to decide between supplying a plain
//! or gzipped entity. `serve` could automatically apply the related
//! `Transfer-Encoding: gzip` where the browser requests it via `TE: gzip`, but
//! common browsers have
//! [chosen](https://bugs.chromium.org/p/chromium/issues/detail?id=94730) to avoid
//! requesting or handling `Transfer-Encoding`.
//!
//! Use `serve` when:
//!
//! *   metadata (length, etag, etc) and byte ranges can be regenerated cheaply and consistently
//!     via a lazy `Entity`, or
//! *   data can be fully buffered in memory or on disk and reused many times. You may want to
//!     create a pair of buffers for gzipped (for user-agents which specify `Accept-Encoding:
//!     gzip`) vs raw.
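//!
//! A minimal sketch of the buffer-pair idea above, assuming the application has already built
//! both copies. `Buffers` and `pick` are hypothetical names for this example, not part of this
//! crate, and the byte strings are placeholders rather than real gzip output:
//!
//! ```
//! use http::header::{self, HeaderMap, HeaderValue};
//!
//! // Hypothetical pair of pre-built copies of the same document.
//! struct Buffers {
//!     raw: &'static [u8],
//!     gzipped: &'static [u8],
//! }
//!
//! // Returns the buffer to serve and whether it carries `Content-Encoding: gzip`.
//! fn pick(b: &Buffers, req_headers: &HeaderMap) -> (&'static [u8], bool) {
//!     if http_serve::should_gzip(req_headers) {
//!         (b.gzipped, true)
//!     } else {
//!         (b.raw, false)
//!     }
//! }
//!
//! // Placeholder bytes; real code would hold actual gzip output in `gzipped`.
//! let b = Buffers { raw: b"raw", gzipped: b"gz" };
//! let mut h = HeaderMap::new();
//! h.insert(header::ACCEPT_ENCODING, HeaderValue::from_static("gzip"));
//! assert_eq!(pick(&b, &h), (&b"gz"[..], true));
//! assert_eq!(pick(&b, &HeaderMap::new()), (&b"raw"[..], false));
//! ```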
//!
//! Use `streaming_body` when regenerating the entire body each time a response is sent.
//!
//! Once you return a `hyper::Response` to hyper, your only way to signal error to the
//! client is to abruptly close the HTTP connection while sending the body. If you want the ability
//! to return a well-formatted error to the client while producing body bytes, you must buffer the
//! entire body in memory before returning anything to hyper.
//!
//! If you are buffering a response in memory, `serve` requires copying the bytes (when using
//! `Data = Vec<u8>` or similar) or atomic reference-counting (with `Data = Arc<Vec<u8>>` or
//! similar). `streaming_body` doesn't need to keep its own copy for potential future use; it may
//! be cheaper because it can simply hand ownership of the existing `Vec<u8>`s to hyper.
//!
//! # Why the weird type bounds? Why not use `hyper::Body` and `bytes::Bytes` for everything?
//!
//! These bounds are compatible with `hyper::Body` and `bytes::Bytes`, and most callers will use
//! those types. **Note:** if you see an error like the one below, ensure you are using hyper's
//! `stream` feature:
//!
//! ```text
//! error[E0277]: the trait bound `Body: From<Box<(dyn futures::Stream<Item = Result<_, _>> +
//! std::marker::Send + 'static)>>` is not satisfied
//! ```
//!
//! `Cargo.toml` should look similar to the following:
//!
//! ```toml
//! hyper = { version = "0.14.7", features = ["stream"] }
//! ```
//!
//! There are times when it's desirable to have more flexible ownership provided by a
//! type such as `reffers::ARefs<'static, [u8]>`. One is `mmap`-based file serving:
//! `bytes::Bytes` would require copying the data in each chunk. An implementation with `ARefs`
//! could instead `mmap` and `mlock` the data on another thread and provide chunks which `munmap`
//! when dropped. In these cases, the caller can supply an alternate implementation of the
//! `http_body::Body` trait which uses a different `Data` type than `bytes::Bytes`.

#![deny(clippy::print_stderr, clippy::print_stdout)]
#![cfg_attr(docsrs, feature(doc_cfg))]

use bytes::Buf;
use futures_core::Stream;
use http::header::{self, HeaderMap, HeaderValue};
use std::ops::Range;
use std::str::FromStr;
use std::time::SystemTime;

/// Returns a HeaderValue for the given formatted data.
/// Caller must make two guarantees:
///    * The data fits within `max_len` (or the write will panic).
///    * The data are ASCII (or HeaderValue's safety will be violated).
macro_rules! unsafe_fmt_ascii_val {
    ($max_len:expr, $fmt:expr, $($arg:tt)+) => {{
        let mut buf = bytes::BytesMut::with_capacity($max_len);
        use std::fmt::Write;
        write!(buf, $fmt, $($arg)*).expect("fmt_val fits within provided max len");
        unsafe {
            http::header::HeaderValue::from_maybe_shared_unchecked(buf.freeze())
        }
    }}
}

mod chunker;

#[cfg(feature = "dir")]
#[cfg_attr(docsrs, doc(cfg(feature = "dir")))]
pub mod dir;

mod etag;
mod file;
mod gzip;
mod platform;
mod range;
mod serving;

pub use crate::file::ChunkedReadFile;
pub use crate::gzip::BodyWriter;
pub use crate::serving::serve;

/// A reusable, read-only, byte-rangeable HTTP entity for GET and HEAD serving.
/// Must return exactly the same data on every call.
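///
/// A minimal sketch of an in-memory implementation, assuming the `bytes` and `futures-core`
/// crates are available to the caller. `StaticEntity` and `OneChunk` are hypothetical names
/// for this example, and the `text/plain` content type is only an illustration:
///
/// ```
/// use bytes::Bytes;
/// use futures_core::Stream;
/// use http::header::{self, HeaderMap, HeaderValue};
/// use std::convert::Infallible;
/// use std::ops::Range;
/// use std::pin::Pin;
/// use std::task::{Context, Poll};
/// use std::time::SystemTime;
///
/// // A stream that yields a single chunk and then ends.
/// struct OneChunk(Option<Bytes>);
///
/// impl Stream for OneChunk {
///     type Item = Result<Bytes, Infallible>;
///
///     fn poll_next(mut self: Pin<&mut Self>, _: &mut Context<'_>) -> Poll<Option<Self::Item>> {
///         Poll::Ready(self.0.take().map(Ok))
///     }
/// }
///
/// // An entity backed by a static byte slice, so every call sees the same data.
/// struct StaticEntity(&'static [u8]);
///
/// impl http_serve::Entity for StaticEntity {
///     type Error = Infallible;
///     type Data = Bytes;
///
///     fn len(&self) -> u64 {
///         self.0.len() as u64
///     }
///
///     fn get_range(
///         &self,
///         range: Range<u64>,
///     ) -> Box<dyn Stream<Item = Result<Bytes, Infallible>> + Send + Sync> {
///         let data: &'static [u8] = self.0;
///         let chunk = &data[range.start as usize..range.end as usize];
///         Box::new(OneChunk(Some(Bytes::from_static(chunk))))
///     }
///
///     fn add_headers(&self, h: &mut HeaderMap) {
///         h.insert(header::CONTENT_TYPE, HeaderValue::from_static("text/plain"));
///     }
///
///     // This sketch supplies no validators, so conditional requests can't be matched.
///     fn etag(&self) -> Option<HeaderValue> {
///         None
///     }
///
///     fn last_modified(&self) -> Option<SystemTime> {
///         None
///     }
/// }
///
/// let e = StaticEntity(b"hello world");
/// assert_eq!(http_serve::Entity::len(&e), 11);
/// ```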
pub trait Entity: 'static + Send + Sync {
    type Error: 'static + Send + Sync;

    /// The type of a data chunk.
    ///
    /// Commonly `bytes::Bytes` but may be something more exotic.
    type Data: 'static + Send + Sync + Buf + From<Vec<u8>> + From<&'static [u8]>;

    /// Returns the length of the entity's body in bytes.
    fn len(&self) -> u64;

    /// Returns true iff the entity's body has length 0.
    fn is_empty(&self) -> bool {
        self.len() == 0
    }

    /// Gets the body bytes indicated by `range`.
    fn get_range(
        &self,
        range: Range<u64>,
    ) -> Box<dyn Stream<Item = Result<Self::Data, Self::Error>> + Send + Sync>;

    /// Adds entity headers such as `Content-Type` to the supplied `Headers` object.
    /// In particular, these headers are the "other representation header fields" described by [RFC
    /// 7233 section 4.1](https://tools.ietf.org/html/rfc7233#section-4.1); they should exclude
    /// `Content-Range`, `Date`, `Cache-Control`, `ETag`, `Expires`, `Content-Location`, and `Vary`.
    ///
    /// This function will be called only when that section says that headers such as
    /// `Content-Type` should be included in the response.
    fn add_headers(&self, _: &mut HeaderMap);

    /// Returns an etag for this entity, if available.
    /// Implementations are encouraged to provide a strong etag. [RFC 7232 section
    /// 2.1](https://tools.ietf.org/html/rfc7232#section-2.1) notes that only strong etags
    /// are usable for sub-range retrieval.
    fn etag(&self) -> Option<HeaderValue>;

    /// Returns the last modified time of this entity, if available.
    /// Note that `serve` may serve an earlier `Last-Modified:` date than the one returned here if
    /// this time is in the future, as required by [RFC 7232 section
    /// 2.2.1](https://tools.ietf.org/html/rfc7232#section-2.2.1).
    fn last_modified(&self) -> Option<SystemTime>;
}

/// Parses an RFC 7231 section 5.3.1 `qvalue` into an integer in [0, 1000].
/// The result is the qvalue scaled by 1000, so `0.23` parses to 230.
/// ```text
/// qvalue = ( "0" [ "." 0*3DIGIT ] )
///        / ( "1" [ "." 0*3("0") ] )
/// ```
fn parse_qvalue(s: &str) -> Result<u16, ()> {
    match s {
        "1" | "1." | "1.0" | "1.00" | "1.000" => return Ok(1000),
        "0" | "0." => return Ok(0),
        s if !s.starts_with("0.") => return Err(()),
        _ => {}
    };
    let v = &s[2..];
    let factor = match v.len() {
        1 /* 0.x */ => 100,
        2 /* 0.xx */ => 10,
        3 /* 0.xxx */ => 1,
        _ => return Err(()),
    };
    // `u16::from_str` accepts a leading `+`, which the grammar above does not allow.
    if !v.bytes().all(|b| b.is_ascii_digit()) {
        return Err(());
    }
    let v = u16::from_str(v).map_err(|_| ())?;
    let q = v * factor;
    Ok(q)
}

/// Returns true iff it's preferable to use `Content-Encoding: gzip` when responding to the given
/// request, rather than no content coding.
///
/// Use via `should_gzip(req.headers())`.
///
/// Follows the rules of [RFC 7231 section
/// 5.3.4](https://tools.ietf.org/html/rfc7231#section-5.3.4).
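///
/// A few illustrative cases, as a sketch of how the rules play out. `accept_encoding` is a
/// hypothetical helper defined only for this example:
///
/// ```
/// use http::header::{self, HeaderMap, HeaderValue};
///
/// // Hypothetical helper: builds request headers with a single `Accept-Encoding` value.
/// fn accept_encoding(value: &'static str) -> HeaderMap {
///     let mut h = HeaderMap::new();
///     h.insert(header::ACCEPT_ENCODING, HeaderValue::from_static(value));
///     h
/// }
///
/// // gzip is listed, and identity is not preferred over it.
/// assert!(http_serve::should_gzip(&accept_encoding("gzip, deflate")));
/// assert!(http_serve::should_gzip(&accept_encoding("identity;q=0.5, gzip;q=1.0")));
///
/// // A qvalue of 0 means "not acceptable".
/// assert!(!http_serve::should_gzip(&accept_encoding("gzip;q=0")));
///
/// // Without an `Accept-Encoding` header, identity is chosen.
/// assert!(!http_serve::should_gzip(&HeaderMap::new()));
/// ```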
pub fn should_gzip(headers: &HeaderMap) -> bool {
    let v = match headers.get(header::ACCEPT_ENCODING) {
        None => return false,
        Some(v) => v,
    };
    let (mut gzip_q, mut identity_q, mut star_q) = (None, None, None);
    let parts = match v.to_str() {
        Ok(s) => s.split(','),
        Err(_) => return false,
    };
    for qi in parts {
        // Parse.
        let qi = qi.trim();
        let mut parts = qi.rsplitn(2, ';').map(|p| p.trim());
        let last_part = parts
            .next()
            .expect("rsplitn should return at least one part");
        let coding;
        let quality;
        match parts.next() {
            None => {
                coding = last_part;
                quality = 1000;
            }
            Some(c) => {
                if !last_part.starts_with("q=") {
                    return false; // unparseable.
                }
                let q = &last_part[2..];
                match parse_qvalue(q) {
                    Ok(q) => {
                        coding = c;
                        quality = q;
                    }
                    Err(_) => return false, // unparseable.
                };
            }
        }

        if coding == "gzip" {
            gzip_q = Some(quality);
        } else if coding == "identity" {
            identity_q = Some(quality);
        } else if coding == "*" {
            star_q = Some(quality);
        }
    }

    let gzip_q = gzip_q.or(star_q).unwrap_or(0);

    // "If the representation has no content-coding, then it is
    // acceptable by default unless specifically excluded by the
    // Accept-Encoding field stating either "identity;q=0" or "*;q=0"
    // without a more specific entry for "identity"."
    let identity_q = identity_q.or(star_q).unwrap_or(1);

    gzip_q > 0 && gzip_q >= identity_q
}

/// A builder returned by [streaming_body].
pub struct StreamingBodyBuilder {
    chunk_size: usize,
    gzip_level: u32,
    should_gzip: bool,
    body_needed: bool,
}

/// Creates a response and streaming body writer for the given request.
///
/// The streaming body writer is currently `Some(writer)` for `GET` requests and
/// `None` for `HEAD` requests. In the future, `streaming_body` may also support
/// conditional `GET` requests.
///
/// ```
/// # use http::{Request, Response, header::{self, HeaderValue}};
/// use std::io::Write as _;
///
/// fn respond(req: Request<hyper::Body>) -> std::io::Result<Response<hyper::Body>> {
///     let (mut resp, stream) = http_serve::streaming_body(&req).build();
///     if let Some(mut w) = stream {
///         write!(&mut w, "hello world")?;
///     }
///     resp.headers_mut().insert(header::CONTENT_TYPE, HeaderValue::from_static("text/plain"));
///     Ok(resp)
/// }
/// ```
///
/// The caller may also continue appending to `stream` after returning the response to `hyper`.
/// The response will end when `stream` is dropped. The only disadvantage to writing to the stream
/// after the fact is that there's no way to report mid-response errors other than abruptly closing
/// the TCP connection ([BodyWriter::abort]).
///
/// ```
/// # use http::{Request, Response, header::{self, HeaderValue}};
/// use std::io::Write as _;
///
/// fn respond(req: Request<hyper::Body>) -> std::io::Result<Response<hyper::Body>> {
///     let (mut resp, stream) = http_serve::streaming_body(&req).build();
///     if let Some(mut w) = stream {
///         tokio::spawn(async move {
///             for i in 0..10 {
///                 tokio::time::sleep(std::time::Duration::from_secs(1)).await;
///                 write!(&mut w, "write {}\n", i)?;
///             }
///             Ok::<_, std::io::Error>(())
///         });
///     }
///     resp.headers_mut().insert(header::CONTENT_TYPE, HeaderValue::from_static("text/plain"));
///     Ok(resp)
/// }
/// ```
pub fn streaming_body<T>(req: &http::Request<T>) -> StreamingBodyBuilder {
    StreamingBodyBuilder {
        chunk_size: 4096,
        gzip_level: 6,
        should_gzip: should_gzip(req.headers()),
        body_needed: *req.method() != http::method::Method::HEAD,
    }
}

impl StreamingBodyBuilder {
    /// Sets the size of a data chunk.
    ///
    /// This is a compromise between memory usage and efficiency. The default of 4096 is usually
    /// fine; increasing will likely only be noticeably more efficient when compression is off.
    pub fn with_chunk_size(self, chunk_size: usize) -> Self {
        StreamingBodyBuilder { chunk_size, ..self }
    }

    /// Sets the gzip compression level. Defaults to 6.
    ///
    /// `gzip_level` should be an integer between 0 and 9 (inclusive).
    /// 0 means no compression; 9 gives the best compression (but most CPU usage).
    ///
    /// This is only effective if the client supports compression.
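    ///
    /// A minimal sketch of chaining the builder options before `build`, assuming a handler
    /// shaped like the [streaming_body] examples. The chunk size and level shown are
    /// illustrative values, not recommendations:
    ///
    /// ```
    /// # use http::{Request, Response};
    /// use std::io::Write as _;
    ///
    /// fn respond(req: Request<hyper::Body>) -> std::io::Result<Response<hyper::Body>> {
    ///     let (resp, stream) = http_serve::streaming_body(&req)
    ///         .with_chunk_size(16384) // illustrative value
    ///         .with_gzip_level(9) // illustrative value
    ///         .build();
    ///     if let Some(mut w) = stream {
    ///         write!(&mut w, "hello world")?;
    ///     }
    ///     Ok(resp)
    /// }
    /// ```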
    pub fn with_gzip_level(self, gzip_level: u32) -> Self {
        StreamingBodyBuilder { gzip_level, ..self }
    }

    /// Returns the HTTP response and, unless the request is a `HEAD`, a body writer.
    pub fn build<P, D, E>(self) -> (http::Response<P>, Option<BodyWriter<D, E>>)
    where
        D: From<Vec<u8>> + Send + Sync,
        E: Send + Sync,
        P: From<Box<dyn Stream<Item = Result<D, E>> + Send>>,
    {
        let (w, stream) = chunker::BodyWriter::with_chunk_size(self.chunk_size);
        let mut resp = http::Response::new(stream.into());
        resp.headers_mut()
            .append(header::VARY, HeaderValue::from_static("accept-encoding"));

        if self.should_gzip && self.gzip_level > 0 {
            resp.headers_mut()
                .append(header::CONTENT_ENCODING, HeaderValue::from_static("gzip"));
        }

        if !self.body_needed {
            return (resp, None);
        }

        let w = match self.should_gzip && self.gzip_level > 0 {
            true => BodyWriter::gzipped(w, flate2::Compression::new(self.gzip_level)),
            false => BodyWriter::raw(w),
        };

        (resp, Some(w))
    }
}

#[cfg(test)]
mod tests {
    use http::header::HeaderValue;
    use http::{self, header};

    fn ae_hdrs(value: &'static str) -> http::HeaderMap {
        let mut h = http::HeaderMap::new();
        h.insert(header::ACCEPT_ENCODING, HeaderValue::from_static(value));
        h
    }

    #[test]
    fn parse_qvalue() {
        use super::parse_qvalue;
        assert_eq!(parse_qvalue("0"), Ok(0));
        assert_eq!(parse_qvalue("0."), Ok(0));
        assert_eq!(parse_qvalue("0.0"), Ok(0));
        assert_eq!(parse_qvalue("0.00"), Ok(0));
        assert_eq!(parse_qvalue("0.000"), Ok(0));
        assert_eq!(parse_qvalue("0.0000"), Err(()));
        assert_eq!(parse_qvalue("0.2"), Ok(200));
        assert_eq!(parse_qvalue("0.23"), Ok(230));
        assert_eq!(parse_qvalue("0.234"), Ok(234));
        assert_eq!(parse_qvalue("1"), Ok(1000));
        assert_eq!(parse_qvalue("1."), Ok(1000));
        assert_eq!(parse_qvalue("1.0"), Ok(1000));
        assert_eq!(parse_qvalue("1.1"), Err(()));
        assert_eq!(parse_qvalue("1.00"), Ok(1000));
        assert_eq!(parse_qvalue("1.000"), Ok(1000));
        assert_eq!(parse_qvalue("1.001"), Err(()));
        assert_eq!(parse_qvalue("1.0000"), Err(()));
        assert_eq!(parse_qvalue("2"), Err(()));
    }

    #[test]
    fn should_gzip() {
        // "A request without an Accept-Encoding header field implies that the
        // user agent has no preferences regarding content-codings. Although
        // this allows the server to use any content-coding in a response, it
        // does not imply that the user agent will be able to correctly process
        // all encodings." Identity seems safer; don't gzip.
        assert!(!super::should_gzip(&header::HeaderMap::new()));

        // "If the representation's content-coding is one of the
        // content-codings listed in the Accept-Encoding field, then it is
        // acceptable unless it is accompanied by a qvalue of 0.  (As
        // defined in Section 5.3.1, a qvalue of 0 means "not acceptable".)"
        assert!(super::should_gzip(&ae_hdrs("gzip")));
        assert!(super::should_gzip(&ae_hdrs("gzip;q=0.001")));
        assert!(!super::should_gzip(&ae_hdrs("gzip;q=0")));

        // "An Accept-Encoding header field with a combined field-value that is
        // empty implies that the user agent does not want any content-coding in
        // response."
        assert!(!super::should_gzip(&ae_hdrs("")));

        // "The asterisk "*" symbol in an Accept-Encoding field
        // matches any available content-coding not explicitly listed in the
        // header field."
        assert!(super::should_gzip(&ae_hdrs("*")));
        assert!(!super::should_gzip(&ae_hdrs("gzip;q=0, *")));
        assert!(super::should_gzip(&ae_hdrs("identity;q=0, *")));

        // "If multiple content-codings are acceptable, then the acceptable
        // content-coding with the highest non-zero qvalue is preferred."
        assert!(super::should_gzip(&ae_hdrs("identity;q=0.5, gzip;q=1.0")));
        assert!(!super::should_gzip(&ae_hdrs("identity;q=1.0, gzip;q=0.5")));

        // "If an Accept-Encoding header field is present in a request
        // and none of the available representations for the response have a
        // content-coding that is listed as acceptable, the origin server SHOULD
        // send a response without any content-coding."
        assert!(!super::should_gzip(&ae_hdrs("*;q=0")));
    }
}