// Copyright (c) 2016-2021 The http-serve developers
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE.txt or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT.txt or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.

//! Helpers for serving HTTP GET and HEAD responses asynchronously with the
//! [http](http://crates.io/crates/http) crate and [tokio](https://crates.io/crates/tokio).
//! Works well with [hyper](https://crates.io/crates/hyper) 0.14.x.
//!
//! This crate supplies two ways to respond to HTTP GET and HEAD requests:
//!
//! *   the `serve` function can be used to serve an `Entity`, a trait representing reusable,
//!     byte-rangeable HTTP entities. `Entity` must be able to produce exactly the same data on
//!     every call, know its size in advance, and be able to produce portions of the data on demand.
//! *   the `streaming_body` function can be used to add a body to an otherwise-complete response.
//!     If a body is needed (on `GET` rather than `HEAD` requests), it returns a `BodyWriter`
//!     (which implements `std::io::Write`). The caller should produce the complete body or call
//!     `BodyWriter::abort`, causing the HTTP stream to terminate abruptly.
//!
//! It supplies a static file `Entity` implementation and a (currently Unix-only)
//! helper for serving a full directory tree from the local filesystem, including
//! automatically looking for `.gz`-suffixed files when the client advertises
//! `Accept-Encoding: gzip`.
//!
//! # Why two ways?
//!
//! They have pros and cons. This table shows some of them:
//!
//! <table>
//!   <tr><th><th><code>serve</code><th><code>streaming_body</code></tr>
//!   <tr><td>automatic byte range serving<td>yes<td>no [<a href="#range">1</a>]</tr>
//!   <tr><td>backpressure<td>yes<td>no [<a href="#backpressure">2</a>]</tr>
//!   <tr><td>conditional GET<td>yes<td>no [<a href="#conditional_get">3</a>]</tr>
//!   <tr><td>sends first byte before length known<td>no<td>yes</tr>
//!   <tr><td>automatic gzip content encoding<td>no [<a href="#gzip">4</a>]<td>yes</tr>
//! </table>
//!
//! <a name="range">\[1\]</a>: `streaming_body` always sends the full body. Byte range serving
//! wouldn't make much sense with its interface. The application will generate all the bytes
//! every time anyway, and `http-serve`'s buffering logic would have to be complex
//! to handle multiple ranges well.
//!
//! <a name="backpressure">\[2\]</a>: `streaming_body` is often appended to while holding
//! a lock or open database transaction, where backpressure is undesired. It'd be
//! possible to add support for "wait points" where the caller explicitly wants backpressure. This
//! would make it more suitable for large streams, even infinite streams like
//! [Server-sent events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events).
//!
//! <a name="conditional_get">\[3\]</a>: `streaming_body` doesn't yet support
//! generating etags or honoring conditional GET requests. PRs welcome!
//!
//! <a name="gzip">\[4\]</a>: `serve` doesn't automatically apply `Content-Encoding:
//! gzip` because the content encoding is a property of the entity you supply. The
//! entity's etag, length, and byte range boundaries must match the encoding. You
//! can use the `http_serve::should_gzip` helper to decide between supplying a plain
//! or gzipped entity. `serve` could automatically apply the related
//! `Transfer-Encoding: gzip` where the browser requests it via `TE: gzip`, but
//! common browsers have
//! [chosen](https://bugs.chromium.org/p/chromium/issues/detail?id=94730) to avoid
//! requesting or handling `Transfer-Encoding`.
//!
//! Use `serve` when:
//!
//! *   metadata (length, etag, etc) and byte ranges can be regenerated cheaply and consistently
//!     via a lazy `Entity`, or
//! *   data can be fully buffered in memory or on disk and reused many times. You may want to
//!     create a pair of buffers for gzipped (for user-agents which specify `Accept-Encoding:
//!     gzip`) vs raw.
//!
//! Use `streaming_body` when regenerating the entire body each time a response is sent.
//!
//! Once you return a `hyper::Response` to hyper, your only way to signal an error to the
//! client is to abruptly close the HTTP connection while sending the body. If you want the ability
//! to return a well-formatted error to the client while producing body bytes, you must buffer the
//! entire body in memory before returning anything to hyper.
//!
//! If you are buffering a response in memory, `serve` requires copying the bytes (when using
//! `Data = Vec<u8>` or similar) or atomic reference-counting (with `Data = Arc<Vec<u8>>` or
//! similar). `streaming_body` doesn't need to keep its own copy for potential future use; it may
//! be cheaper because it can simply hand ownership of the existing `Vec<u8>`s to hyper.
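//!
/*!
As a rough illustration of the copying versus reference-counting trade-off, here is a minimal
sketch that uses only the standard library. The helper names `copy_for_reuse` and
`share_for_reuse` are hypothetical and not part of this crate; they only contrast what a
`Vec<u8>`-style buffer and an `Arc<Vec<u8>>`-style buffer cost each time the same bytes are
handed out again.

```rust
use std::sync::Arc;

/// Hypothetical helper: hands out an owned copy of a plain byte buffer.
/// Every call allocates and copies all of the bytes.
fn copy_for_reuse(buf: &[u8]) -> Vec<u8> {
    buf.to_vec()
}

/// Hypothetical helper: hands out another handle to a shared buffer.
/// Every call only increments an atomic reference count.
fn share_for_reuse(buf: &Arc<Vec<u8>>) -> Arc<Vec<u8>> {
    Arc::clone(buf)
}
```

Calling `copy_for_reuse` once per response costs time proportional to the body length, while
`share_for_reuse` costs the same regardless of the body length.
*/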
//!
//! # Why the weird type bounds? Why not use `hyper::Body` and `bytes::Bytes` for everything?
//!
//! These bounds are compatible with `hyper::Body` and `bytes::Bytes`, and most callers will use
//! those types. **Note:** if you see an error like the one below, ensure you are using hyper's
//! `stream` feature:
//!
//! ```text
//! error[E0277]: the trait bound `Body: From<Box<(dyn futures::Stream<Item = Result<_, _>> +
//! std::marker::Send + 'static)>>` is not satisfied
//! ```
//!
//! `Cargo.toml` should look similar to the following:
//!
//! ```toml
//! hyper = { version = "0.14.7", features = ["stream"] }
//! ```
//!
//! There are times when it's desirable to have more flexible ownership provided by a
//! type such as `reffers::ARefs<'static, [u8]>`. One is `mmap`-based file serving:
//! `bytes::Bytes` would require copying the data in each chunk. An implementation with `ARefs`
//! could instead `mmap` and `mlock` the data on another thread and provide chunks which `munmap`
//! when dropped. In these cases, the caller can supply an alternate implementation of the
//! `http_body::Body` trait which uses a different `Data` type than `bytes::Bytes`.

#![deny(clippy::print_stderr, clippy::print_stdout)]
#![cfg_attr(docsrs, feature(doc_cfg))]

use bytes::Buf;
use futures_core::Stream;
use http::header::{self, HeaderMap, HeaderValue};
use std::ops::Range;
use std::str::FromStr;
use std::time::SystemTime;

/// Returns a HeaderValue for the given formatted data.
/// Caller must make two guarantees:
///    * The data fits within `max_len` (or the write will panic).
///    * The data are ASCII (or HeaderValue's safety will be violated).
macro_rules! unsafe_fmt_ascii_val {
    ($max_len:expr, $fmt:expr, $($arg:tt)+) => {{
        let mut buf = bytes::BytesMut::with_capacity($max_len);
        use std::fmt::Write;
        write!(buf, $fmt, $($arg)*).expect("fmt_val fits within provided max len");
        unsafe {
            http::header::HeaderValue::from_maybe_shared_unchecked(buf.freeze())
        }
    }}
}

mod chunker;

#[cfg(feature = "dir")]
#[cfg_attr(docsrs, doc(cfg(feature = "dir")))]
pub mod dir;

mod etag;
mod file;
mod gzip;
mod platform;
mod range;
mod serving;

pub use crate::file::ChunkedReadFile;
pub use crate::gzip::BodyWriter;
pub use crate::serving::serve;

/// A reusable, read-only, byte-rangeable HTTP entity for GET and HEAD serving.
/// Must return exactly the same data on every call.
pub trait Entity: 'static + Send + Sync {
    type Error: 'static + Send + Sync;

    /// The type of a data chunk.
    ///
    /// Commonly `bytes::Bytes` but may be something more exotic.
    type Data: 'static + Send + Sync + Buf + From<Vec<u8>> + From<&'static [u8]>;

    /// Returns the length of the entity's body in bytes.
    fn len(&self) -> u64;

    /// Returns true iff the entity's body has length 0.
    fn is_empty(&self) -> bool {
        self.len() == 0
    }

    /// Gets the body bytes indicated by `range`.
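    ///
    /**
One way an in-memory implementation might map `range` onto its buffer is sketched below,
using only the standard library. The helper `slice_for_range` is hypothetical and not part of
this crate, and whether callers always pass an in-bounds range is an assumption this sketch
does not rely on: it returns `None` rather than panicking.

```rust
use std::convert::TryFrom;
use std::ops::Range;

/// Hypothetical helper: returns the bytes of `buf` selected by `range`, or
/// `None` if the range is inverted, out of bounds, or does not fit in `usize`.
fn slice_for_range(buf: &[u8], range: Range<u64>) -> Option<&[u8]> {
    let start = usize::try_from(range.start).ok()?;
    let end = usize::try_from(range.end).ok()?;
    buf.get(start..end)
}
```

A real implementation would then wrap the selected bytes in a one-item stream of `Self::Data`.
    */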
    fn get_range(
        &self,
        range: Range<u64>,
    ) -> Box<dyn Stream<Item = Result<Self::Data, Self::Error>> + Send + Sync>;

    /// Adds entity headers such as `Content-Type` to the supplied `Headers` object.
    /// In particular, these headers are the "other representation header fields" described by [RFC
    /// 7233 section 4.1](https://tools.ietf.org/html/rfc7233#section-4.1); they should exclude
    /// `Content-Range`, `Date`, `Cache-Control`, `ETag`, `Expires`, `Content-Location`, and `Vary`.
    ///
    /// This function will be called only when that section says that headers such as
    /// `Content-Type` should be included in the response.
    fn add_headers(&self, _: &mut HeaderMap);

    /// Returns an etag for this entity, if available.
    /// Implementations are encouraged to provide a strong etag. [RFC 7232 section
    /// 2.1](https://tools.ietf.org/html/rfc7232#section-2.1) notes that only strong etags
    /// are usable for sub-range retrieval.
    fn etag(&self) -> Option<HeaderValue>;

    /// Returns the last modified time of this entity, if available.
    /// Note that `serve` may serve an earlier `Last-Modified:` date than the one returned here if
    /// this time is in the future, as required by [RFC 7232 section
    /// 2.2.1](https://tools.ietf.org/html/rfc7232#section-2.2.1).
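    ///
    /**
The rule about future timestamps can be pictured as taking the earlier of two times. This is
a minimal sketch using only the standard library; `clamp_last_modified` is a hypothetical
helper, and how `serve` applies the rule internally is not shown in this file.

```rust
use std::time::SystemTime;

/// Hypothetical helper: picks the time to advertise in `Last-Modified`,
/// which is never later than `now`.
fn clamp_last_modified(last_modified: SystemTime, now: SystemTime) -> SystemTime {
    std::cmp::min(last_modified, now)
}
```
    */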
    fn last_modified(&self) -> Option<SystemTime>;
}

/// Parses an RFC 7231 section 5.3.1 `qvalue` into an integer in [0, 1000].
/// ```text
/// qvalue = ( "0" [ "." 0*3DIGIT ] )
///        / ( "1" [ "." 0*3("0") ] )
/// ```
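///
/**
The scaling to thousandths can be tried in isolation with the sketch below, which uses only
the standard library. `fraction_to_thousandths` is a hypothetical helper, not this function:
it handles only the digits after `0.` and is slightly stricter, rejecting anything that is
not an ASCII digit.

```rust
/// Hypothetical helper: scales the digits after "0." to thousandths.
fn fraction_to_thousandths(digits: &str) -> Option<u16> {
    if digits.is_empty() || digits.len() > 3 || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let value: u16 = digits.parse().ok()?;
    // Pad to three decimal places: "2" -> 200, "23" -> 230, "234" -> 234.
    let factor = 10u16.pow(3 - digits.len() as u32);
    Some(value * factor)
}
```

Keeping the quality as an integer avoids floating-point comparisons when codings are ranked.
*/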
fn parse_qvalue(s: &str) -> Result<u16, ()> {
    match s {
        "1" | "1." | "1.0" | "1.00" | "1.000" => return Ok(1000),
        "0" | "0." => return Ok(0),
        s if !s.starts_with("0.") => return Err(()),
        _ => {}
    };
    let v = &s[2..];
    let factor = match v.len() {
        1 /* 0.x */ => 100,
        2 /* 0.xx */ => 10,
        3 /* 0.xxx */ => 1,
        _ => return Err(()),
    };
    let v = u16::from_str(v).map_err(|_| ())?;
    let q = v * factor;
    Ok(q)
}

/// Returns true iff it's preferable to use `Content-Encoding: gzip` when responding to the
/// given request, rather than no content coding.
///
/// Use via `should_gzip(req.headers())`.
///
/// Follows the rules of [RFC 7231 section
/// 5.3.4](https://tools.ietf.org/html/rfc7231#section-5.3.4).
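///
/**
Once the header has been parsed, the decision reduces to comparing two qualities. The sketch
below restates that comparison using only the standard library; `prefers_gzip` is a
hypothetical helper, not part of this crate. Each argument is the quality in thousandths
(0 to 1000) listed for that coding, or `None` if the coding was not listed.

```rust
/// Hypothetical helper: decides between gzip and identity from parsed qualities.
fn prefers_gzip(gzip: Option<u16>, identity: Option<u16>, star: Option<u16>) -> bool {
    // An unlisted coding falls back to "*"; without either, gzip is not acceptable.
    let gzip_q = gzip.or(star).unwrap_or(0);
    // Identity is acceptable by default, at the lowest nonzero quality.
    let identity_q = identity.or(star).unwrap_or(1);
    gzip_q > 0 && gzip_q >= identity_q
}
```

For example, `gzip;q=0, *` maps to `prefers_gzip(Some(0), None, Some(1000))`, which is false
because the explicit `gzip` entry takes precedence over `*`.
*/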
pub fn should_gzip(headers: &HeaderMap) -> bool {
    let v = match headers.get(header::ACCEPT_ENCODING) {
        None => return false,
        Some(v) => v,
    };
    let (mut gzip_q, mut identity_q, mut star_q) = (None, None, None);
    let parts = match v.to_str() {
        Ok(s) => s.split(','),
        Err(_) => return false,
    };
    for qi in parts {
        // Parse.
        let qi = qi.trim();
        let mut parts = qi.rsplitn(2, ';').map(|p| p.trim());
        let last_part = parts
            .next()
            .expect("rsplitn should return at least one part");
        let coding;
        let quality;
        match parts.next() {
            None => {
                coding = last_part;
                quality = 1000;
            }
            Some(c) => {
                if !last_part.starts_with("q=") {
                    return false; // unparseable.
                }
                let q = &last_part[2..];
                match parse_qvalue(q) {
                    Ok(q) => {
                        coding = c;
                        quality = q;
                    }
                    Err(_) => return false, // unparseable.
                };
            }
        }

        if coding == "gzip" {
            gzip_q = Some(quality);
        } else if coding == "identity" {
            identity_q = Some(quality);
        } else if coding == "*" {
            star_q = Some(quality);
        }
    }

    let gzip_q = gzip_q.or(star_q).unwrap_or(0);

    // "If the representation has no content-coding, then it is
    // acceptable by default unless specifically excluded by the
    // Accept-Encoding field stating either "identity;q=0" or "*;q=0"
    // without a more specific entry for "identity"."
    //
    // Qualities are in thousandths, so the default of 1 (q=0.001) keeps
    // identity acceptable while ranking it no higher than any listed coding.
    let identity_q = identity_q.or(star_q).unwrap_or(1);

    gzip_q > 0 && gzip_q >= identity_q
}

/// A builder returned by [streaming_body].
pub struct StreamingBodyBuilder {
    chunk_size: usize,
    gzip_level: u32,
    should_gzip: bool,
    body_needed: bool,
}

/// Creates a response and streaming body writer for the given request.
///
/// The streaming body writer is currently `Some(writer)` for `GET` requests and
/// `None` for `HEAD` requests. In the future, `streaming_body` may also support
/// conditional `GET` requests.
///
/// ```
/// # use http::{Request, Response, header::{self, HeaderValue}};
/// use std::io::Write as _;
///
/// fn respond(req: Request<hyper::Body>) -> std::io::Result<Response<hyper::Body>> {
///     let (mut resp, stream) = http_serve::streaming_body(&req).build();
///     if let Some(mut w) = stream {
///         write!(&mut w, "hello world")?;
///     }
///     resp.headers_mut().insert(header::CONTENT_TYPE, HeaderValue::from_static("text/plain"));
///     Ok(resp)
/// }
/// ```
///
/// The caller may also continue appending to `stream` after returning the response to `hyper`.
/// The response will end when `stream` is dropped. The only disadvantage to writing to the stream
/// after the fact is that there's no way to report mid-response errors other than abruptly closing
/// the TCP connection ([BodyWriter::abort]).
///
/// ```
/// # use http::{Request, Response, header::{self, HeaderValue}};
/// use std::io::Write as _;
///
/// fn respond(req: Request<hyper::Body>) -> std::io::Result<Response<hyper::Body>> {
///     let (mut resp, stream) = http_serve::streaming_body(&req).build();
///     if let Some(mut w) = stream {
///         tokio::spawn(async move {
///             for i in 0..10 {
///                 tokio::time::sleep(std::time::Duration::from_secs(1)).await;
///                 write!(&mut w, "write {}\n", i)?;
///             }
///             Ok::<_, std::io::Error>(())
///         });
///     }
///     resp.headers_mut().insert(header::CONTENT_TYPE, HeaderValue::from_static("text/plain"));
///     Ok(resp)
/// }
/// ```
pub fn streaming_body<T>(req: &http::Request<T>) -> StreamingBodyBuilder {
    StreamingBodyBuilder {
        chunk_size: 4096,
        gzip_level: 6,
        should_gzip: should_gzip(req.headers()),
        body_needed: *req.method() != http::method::Method::HEAD,
    }
}

impl StreamingBodyBuilder {
    /// Sets the size of a data chunk.
    ///
    /// This is a compromise between memory usage and efficiency. The default of 4096 is usually
    /// fine; increasing will likely only be noticeably more efficient when compression is off.
    pub fn with_chunk_size(self, chunk_size: usize) -> Self {
        StreamingBodyBuilder { chunk_size, ..self }
    }

    /// Sets the gzip compression level. Defaults to 6.
    ///
    /// `gzip_level` should be an integer between 0 and 9 (inclusive).
    /// 0 means no compression; 9 gives the best compression (but most CPU usage).
    ///
    /// This is only effective if the client supports compression.
    pub fn with_gzip_level(self, gzip_level: u32) -> Self {
        StreamingBodyBuilder { gzip_level, ..self }
    }

    /// Returns the HTTP response and, unless the request is a `HEAD`, a body writer.
    pub fn build<P, D, E>(self) -> (http::Response<P>, Option<BodyWriter<D, E>>)
    where
        D: From<Vec<u8>> + Send + Sync,
        E: Send + Sync,
        P: From<Box<dyn Stream<Item = Result<D, E>> + Send>>,
    {
        let (w, stream) = chunker::BodyWriter::with_chunk_size(self.chunk_size);
        let mut resp = http::Response::new(stream.into());
        resp.headers_mut()
            .append(header::VARY, HeaderValue::from_static("accept-encoding"));

        if self.should_gzip && self.gzip_level > 0 {
            resp.headers_mut()
                .append(header::CONTENT_ENCODING, HeaderValue::from_static("gzip"));
        }

        if !self.body_needed {
            return (resp, None);
        }

        let w = match self.should_gzip && self.gzip_level > 0 {
            true => BodyWriter::gzipped(w, flate2::Compression::new(self.gzip_level)),
            false => BodyWriter::raw(w),
        };

        (resp, Some(w))
    }
}

#[cfg(test)]
mod tests {
    use http::header::HeaderValue;
    use http::{self, header};

    fn ae_hdrs(value: &'static str) -> http::HeaderMap {
        let mut h = http::HeaderMap::new();
        h.insert(header::ACCEPT_ENCODING, HeaderValue::from_static(value));
        h
    }

    #[test]
    fn parse_qvalue() {
        use super::parse_qvalue;
        assert_eq!(parse_qvalue("0"), Ok(0));
        assert_eq!(parse_qvalue("0."), Ok(0));
        assert_eq!(parse_qvalue("0.0"), Ok(0));
        assert_eq!(parse_qvalue("0.00"), Ok(0));
        assert_eq!(parse_qvalue("0.000"), Ok(0));
        assert_eq!(parse_qvalue("0.0000"), Err(()));
        assert_eq!(parse_qvalue("0.2"), Ok(200));
        assert_eq!(parse_qvalue("0.23"), Ok(230));
        assert_eq!(parse_qvalue("0.234"), Ok(234));
        assert_eq!(parse_qvalue("1"), Ok(1000));
        assert_eq!(parse_qvalue("1."), Ok(1000));
        assert_eq!(parse_qvalue("1.0"), Ok(1000));
        assert_eq!(parse_qvalue("1.1"), Err(()));
        assert_eq!(parse_qvalue("1.00"), Ok(1000));
        assert_eq!(parse_qvalue("1.000"), Ok(1000));
        assert_eq!(parse_qvalue("1.001"), Err(()));
        assert_eq!(parse_qvalue("1.0000"), Err(()));
        assert_eq!(parse_qvalue("2"), Err(()));
    }

    #[test]
    fn should_gzip() {
        // "A request without an Accept-Encoding header field implies that the
        // user agent has no preferences regarding content-codings. Although
        // this allows the server to use any content-coding in a response, it
        // does not imply that the user agent will be able to correctly process
        // all encodings." Identity seems safer; don't gzip.
        assert!(!super::should_gzip(&header::HeaderMap::new()));

        // "If the representation's content-coding is one of the
        // content-codings listed in the Accept-Encoding field, then it is
        // acceptable unless it is accompanied by a qvalue of 0. (As
        // defined in Section 5.3.1, a qvalue of 0 means "not acceptable".)"
        assert!(super::should_gzip(&ae_hdrs("gzip")));
        assert!(super::should_gzip(&ae_hdrs("gzip;q=0.001")));
        assert!(!super::should_gzip(&ae_hdrs("gzip;q=0")));

        // "An Accept-Encoding header field with a combined field-value that is
        // empty implies that the user agent does not want any content-coding in
        // response."
        assert!(!super::should_gzip(&ae_hdrs("")));

        // "The asterisk "*" symbol in an Accept-Encoding field
        // matches any available content-coding not explicitly listed in the
        // header field."
        assert!(super::should_gzip(&ae_hdrs("*")));
        assert!(!super::should_gzip(&ae_hdrs("gzip;q=0, *")));
        assert!(super::should_gzip(&ae_hdrs("identity;q=0, *")));

        // "If multiple content-codings are acceptable, then the acceptable
        // content-coding with the highest non-zero qvalue is preferred."
        assert!(super::should_gzip(&ae_hdrs("identity;q=0.5, gzip;q=1.0")));
        assert!(!super::should_gzip(&ae_hdrs("identity;q=1.0, gzip;q=0.5")));

        // "If an Accept-Encoding header field is present in a request
        // and none of the available representations for the response have a
        // content-coding that is listed as acceptable, the origin server SHOULD
        // send a response without any content-coding."
        assert!(!super::should_gzip(&ae_hdrs("*;q=0")));
    }
}