diesel 1.4.8 - Docs.rs

#![doc(hidden)]
//! A primer on prepared statement caching in Diesel
//! ------------------------------------------------
//!
//! Diesel uses prepared statements for virtually all queries. This is most
//! visible in our lack of any sort of "quoting" API. Values must always be
//! transmitted as bind parameters, we do not support direct interpolation. The
//! only method in the public API that doesn't require the use of prepared
//! statements is `Connection#batch_execute`.
//!
//! In order to avoid the cost of re-parsing and planning subsequent queries,
//! Diesel caches the prepared statement whenever possible. Queries will fall
//! into one of three buckets:
//!
//! - Unsafe to cache
//! - Cached by SQL
//! - Cached by type
//!
//! A query is considered unsafe to cache if it represents a potentially
//! unbounded number of queries. This is communicated to the connection through
//! `QueryFragment#is_safe_to_cache_prepared`. While this is done as a full AST
//! pass, after monomorphisation and inlining this will usually be optimized to
//! a constant. Only boxed queries will need to do actual work to answer this
//! question.
//!
//! The majority of AST nodes are safe to cache if their components are safe to
//! cache. There are at least 4 cases where a query is unsafe to cache:
//!
//! - queries containing `IN` with bind parameters
//!     - This requires 1 bind parameter per value, and is therefore unbounded
//!     - `IN` with subselects are cached (assuming the subselect is safe to
//!        cache)
//! - `INSERT` statements
//!     - The SQL varies based on the number of rows being inserted. We could in
//!       theory cache the single row case, but the cost of parsing in a write
//!       query is pretty insignificant compared to reads
//! - `UPDATE` statements
//!     - Technically it's bounded on "number of optional values being passed to
//!       `SET` factorial" but that's still quite high, and not worth caching
//!       for the same reason as single row inserts
//! - `SqlLiteral` nodes
//!     - We have no way of knowing whether the SQL was generated dynamically or
//!       not, so we must assume that it's unbounded
//!
//! For queries which are unsafe to cache, the statement cache will never insert
//! them. They will be prepared and immediately released after use (or in the
//! case of PG they will use the unnamed prepared statement).
//!
//! For statements which are able to be cached, we then have to determine what
//! to use as the cache key. The standard method that virtually all ORMs or
//! database access layers use in the wild is to store the statements in a
//! hash map, using the SQL as the key.
//!
//! However, the majority of queries using Diesel that are safe to cache as
//! prepared statements will be uniquely identified by their type. For these
//! queries, we can bypass the query builder entirely. Since our AST is
//! generally optimized away by the compiler, for these queries the cost of
//! fetching a prepared statement from the cache is the cost of `HashMap<u32,
//! _>::get`, where the key we're fetching by is a compile time constant. For
//! these types, the AST pass to gather the bind parameters will also be
//! optimized to accessing each parameter individually.
//!
//! Determining if a query can be cached by type is the responsibility of the
//! `QueryId` trait. This trait is quite similar to `Any`, but with a few
//! differences:
//!
//! - No `'static` bound
//!     - Something being a reference never changes the SQL that is generated,
//!       so `&T` has the same query id as `T`.
//! - `Option<TypeId>` instead of `TypeId`
//!     - We need to be able to constrain on this trait being implemented, but
//!       not all types will actually have a static query id. Hopefully once
//!       specialization is stable we can remove the `QueryId` bound and
//!       specialize on it instead (or provide a blanket impl for all `T`)
//! - Implementors give a more broad type than `Self`
//!     - This really only affects bind parameters. There are 6 different Rust
//!       types which can be used for a parameter of type `timestamp`. The same
//!       statement can be used regardless of the Rust type, so `Bound<ST, T>`
//!       defines its `QueryId` as `Bound<ST, ()>`.
//!
//! A type returning `Some(id)` or `None` for its query ID is based on whether
//! the SQL it generates can change without the type changing. At the moment,
//! the only type which is safe to cache as a prepared statement but does not
//! have a static query ID is something which has been boxed.
//!
//! One potential optimization that we don't perform is storing the queries
//! which are cached by type ID in a separate map. Since a type ID is a u64,
//! this would allow us to use a specialized map which knows that there will
//! never be hashing collisions (also known as a perfect hashing function),
//! which would mean lookups are always constant time. However, this would save
//! nanoseconds on an operation that will take microseconds or even
//! milliseconds.

use std::any::TypeId;
use std::borrow::Cow;
use std::cell::{RefCell, RefMut};
use std::collections::HashMap;
use std::hash::Hash;
use std::ops::{Deref, DerefMut};

use backend::Backend;
use query_builder::*;
use result::QueryResult;

#[doc(hidden)]
#[allow(missing_debug_implementations)]
pub struct StatementCache<DB: Backend, Statement> {
    pub cache: RefCell<HashMap<StatementCacheKey<DB>, Statement>>,
}

#[allow(clippy::len_without_is_empty, clippy::new_without_default_derive)]
impl<DB, Statement> StatementCache<DB, Statement>
where
    DB: Backend,
    DB::TypeMetadata: Clone,
    DB::QueryBuilder: Default,
    StatementCacheKey<DB>: Hash + Eq,
{
    pub fn new() -> Self {
        StatementCache {
            cache: RefCell::new(HashMap::new()),
        }
    }

    pub fn len(&self) -> usize {
        self.cache.borrow().len()
    }

    pub fn cached_statement<T, F>(
        &self,
        source: &T,
        bind_types: &[DB::TypeMetadata],
        prepare_fn: F,
    ) -> QueryResult<MaybeCached<Statement>>
    where
        T: QueryFragment<DB> + QueryId,
        F: FnOnce(&str) -> QueryResult<Statement>,
    {
        use std::collections::hash_map::Entry::{Occupied, Vacant};

        let cache_key = StatementCacheKey::for_source(source, bind_types)?;

        if !source.is_safe_to_cache_prepared()? {
            let sql = cache_key.sql(source)?;
            return prepare_fn(&sql).map(MaybeCached::CannotCache);
        }

        refmut_map_result(self.cache.borrow_mut(), |cache| {
            match cache.entry(cache_key) {
                Occupied(entry) => Ok(entry.into_mut()),
                Vacant(entry) => {
                    let statement = {
                        let sql = entry.key().sql(source)?;
                        prepare_fn(&sql)
                    };

                    Ok(entry.insert(statement?))
                }
            }
        })
        .map(MaybeCached::Cached)
    }
}

#[doc(hidden)]
#[allow(missing_debug_implementations)]
pub enum MaybeCached<'a, T: 'a> {
    CannotCache(T),
    Cached(RefMut<'a, T>),
}

impl<'a, T> Deref for MaybeCached<'a, T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        match *self {
            MaybeCached::CannotCache(ref x) => x,
            MaybeCached::Cached(ref x) => &**x,
        }
    }
}

impl<'a, T> DerefMut for MaybeCached<'a, T> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        match *self {
            MaybeCached::CannotCache(ref mut x) => x,
            MaybeCached::Cached(ref mut x) => &mut **x,
        }
    }
}

#[doc(hidden)]
#[allow(missing_debug_implementations)]
#[derive(Hash, PartialEq, Eq)]
pub enum StatementCacheKey<DB: Backend> {
    Type(TypeId),
    Sql {
        sql: String,
        bind_types: Vec<DB::TypeMetadata>,
    },
}

impl<DB> StatementCacheKey<DB>
where
    DB: Backend,
    DB::QueryBuilder: Default,
    DB::TypeMetadata: Clone,
{
    pub fn for_source<T>(source: &T, bind_types: &[DB::TypeMetadata]) -> QueryResult<Self>
    where
        T: QueryFragment<DB> + QueryId,
    {
        match T::query_id() {
            Some(id) => Ok(StatementCacheKey::Type(id)),
            None => {
                let sql = Self::construct_sql(source)?;
                Ok(StatementCacheKey::Sql {
                    sql: sql,
                    bind_types: bind_types.into(),
                })
            }
        }
    }

    pub fn sql<T: QueryFragment<DB>>(&self, source: &T) -> QueryResult<Cow<str>> {
        match *self {
            StatementCacheKey::Type(_) => Self::construct_sql(source).map(Cow::Owned),
            StatementCacheKey::Sql { ref sql, .. } => Ok(Cow::Borrowed(sql)),
        }
    }

    fn construct_sql<T: QueryFragment<DB>>(source: &T) -> QueryResult<String> {
        let mut query_builder = DB::QueryBuilder::default();
        source.to_sql(&mut query_builder)?;
        Ok(query_builder.finish())
    }
}

/// Similar to `RefMut::map`, but for functions which return `Result`
///
/// If we were in Haskell (and if `RefMut` were a functor), this would just be
/// `sequenceA`.
fn refmut_map_result<T, U, F>(mut refmut: RefMut<T>, mapper: F) -> QueryResult<RefMut<U>>
where
    F: FnOnce(&mut T) -> QueryResult<&mut U>,
{
    // We can't just use `RefMut::map` here, since to lift the error out of that
    // closure we'd need to return *something*.
    //
    // Instead we will very briefly convert to a raw pointer to eliminate
    // lifetimes from the equation. Ultimately the cast is safe since the input
    // and output lifetimes are identical. However, without the raw pointer
    // we would have two live mutable references at the same time.
    let ptr = mapper(&mut *refmut).map(|x| x as *mut _)?;
    Ok(RefMut::map(refmut, |_| unsafe { &mut *ptr }))
}