Crate avocado

Avocado: the strongly-typed MongoDB driver
This library allows MongoDB users to work directly with statically-typed domain model objects, instead of the dynamically and loosely-typed BSON or JSON representation which is native to Mongo.
The Prelude
Let's get this one out of the way quickly. The most useful and most frequently used types from Avocado, as well as those from the `mongodb` and `bson` crates, are publicly re-exported under the `prelude` module. Therefore, for most purposes, it's enough to import the library in your code like this:

```rust
#[macro_use]
extern crate avocado_derive;
extern crate avocado;

use avocado::prelude::*;
```
Documents
The first step is defining your domain model / entity types. Transcoding them to and from BSON is handled by Serde and the BSON crate.
Avocado can handle any top-level entity type with the following properties:
- It is `Serialize` and `Deserialize`.
- It has a serializable and deserializable unique ID which appears under the key `_id` at the top level. The corresponding field of the `struct` must be of type `Uid<T>` or `Option<Uid<T>>`, where `T` is the document type itself (what would be `Self` in a trait). If the `_id` field is an `Option<Uid<T>>`, it must be marked with `#[serde(skip_serializing_if = "Option::is_none")]`, because `null` IDs can't be returned via `insert_one`, for example (since they don't deserialize successfully as a `Uid<T>`).
- It has a name that is globally unique within the given MongoDB database.

These constraints are captured by the `Doc` trait.
Here's an example of how you can `#[derive]` or manually implement `Doc` for your entity types:
```rust
// Automatically, for more convenience and sensible defaults, respecting
// Serde renaming conventions
#[derive(Debug, Serialize, Deserialize, Doc)]
struct Job {
    #[serde(rename = "_id")]
    pub id: Uid<Job>,
    pub description: String,
    pub salary: u32,
}
```
```rust
// Manually, for complete flexibility and fine-grained control over indexes
// and database operation options
#[derive(Debug, Serialize, Deserialize)]
struct Product {
    #[serde(rename = "_id")]
    pub id: Uid<Product>,
    pub name: String,
    pub num_employees: usize,
}

impl Doc for Product {
    // Mandatory associated items:
    type Id = ObjectId;
    const NAME: &'static str = "Product";

    fn id(&self) -> Option<&Uid<Self>> {
        Some(&self.id)
    }

    fn set_id(&mut self, id: Uid<Self>) {
        self.id = id;
    }

    // Optionally, you can e.g. override the `indexes()` method:
    fn indexes() -> Vec<IndexModel> {
        vec![
            IndexModel {
                keys: doc!{
                    "name": IndexType::Ordered(Order::Ascending),
                },
                options: IndexOptions::default(),
            }
        ]
    }
}
```
Note that the model types `Job` and `Product`:

- Implement the `Serialize` and `Deserialize` traits.
- Implement the `Debug` trait. This is not strictly necessary; however, it is very strongly recommended.
- Have a field which is serialized as `_id`. It doesn't matter what the name of the field is in Rust; here it's `id`, but it could have been anything else, as long as it serializes/deserializes as `_id` in BSON.
- The `Job::Id` associated type is the underlying raw type of `Uid<Job>`, and the same holds for `Product`. When deriving `Doc`, it is controlled by the `#[id_type = "..."]` attribute on the struct declaration. If you don't specify this attribute, the raw ID type will default to `ObjectId`.
- The `NAME` associated constant describes and identifies the collection of values of this type.
This trait is also responsible for a couple of other collection-related properties, such as specifying the indexes to be created on this collection, by means of the `indexes()` static method. By default, this returns an empty vector, meaning no custom indexes apart from the automatically-created index on the `_id` field.
A couple more static methods are also available for customizing the default behavior of the collection when performing various database operations, e.g. querying or insertion. If you don't implement these methods, they return sensible defaults. We'll see more on this later.
When the `Doc` trait is `#[derive]`d, the `Id` type is bound to the type of whichever field serializes as `_id`. If there is no such field, or more than one, you will get a compile-time error. The `NAME` constant will be set to the name of the type, respecting the `#[serde(rename = "...")]` attribute at all times.
A `#[derive]`d `Doc` implementation will not override the various `..._options()` static methods either, leaving their implementations in the default state.
Deriving Doc with indexes

The `#[index(...)]` attribute can be applied to a type several times in order to generate index specifications and implement the `Doc::indexes()` static method. An example is provided below:
```rust
#[derive(Debug, Serialize, Deserialize)]
struct NaiveDate {
    year: u32,
    month: u32,
    day: u32,
}

#[derive(Debug, Serialize, Deserialize, Doc)]
#[id_type = "u64"]
#[index(keys(name = "ascending"))]
#[index(
    unique,
    sparse = false,
    name = "establishment_index",
    min = "-129.5",
    bits = 26,
    keys(
        established::year = "descending",
        established::month = "ascending",
    )
)]
#[index(keys(geolocation_lng_lat = "2dsphere"))]
struct Department {
    #[serde(rename = "_id")]
    guid: Uid<Department>,
    name: Option<String>,
    established: NaiveDate,
    employees: Vec<ObjectId>,
    geolocation_lng_lat: [f32; 2],
}

assert_eq!(Department::indexes(), &[
    IndexModel {
        keys: doc!{
            "name": IndexType::Ordered(Order::Ascending),
        },
        options: IndexOptions::default(),
    },
    IndexModel {
        keys: doc!{
            "established.year": IndexType::Ordered(Order::Descending),
            "established.month": IndexType::Ordered(Order::Ascending),
        },
        options: IndexOptions {
            unique: Some(true),
            sparse: Some(false),
            name: Some(String::from("establishment_index")),
            bits: Some(26),
            min: Some(-129.5),
            ..Default::default()
        },
    },
    IndexModel {
        keys: doc!{
            "geolocation_lng_lat": IndexType::Geo2DSphere,
        },
        options: IndexOptions::default(),
    },
]);
```
This demonstrates the usage of the `index` attribute. To sum up:

- Fields to be indexed are given as path-value pairs in the `keys` sub-attribute. The paths specify the field names, whereas the values describe the type of index that should be created.
  - Multi-component paths, such as `foo::bar::qux`, can be used to index a field of an embedded document or array. This is equivalent to MongoDB's "dot notation"; e.g. the above example translates to the key `"foo.bar.qux"` in the resulting BSON document.
  - If a path (field name) occurs multiple times in the key list, the last occurrence will overwrite any previous ones.
  - The correctness of indexed field names/paths, i.e. the fact that they indeed exist in the `Doc`ument type, is not currently enforced. This is to allow indexes to be created on fields that only exist dynamically, e.g. a `HashMap` which is `#[serde(flatten)]`ed into its containing `struct` type. In the future, this behavior will be improved: the existence of the first segment of each field name will be enforced by default. Only the first segment is checked because further segments, referring to embedded documents/arrays, can't be checked: the derive macro doesn't receive type information, so it only knows about the field names of the type it is being applied to. It will then be possible for individual fields to opt out of this constraint, e.g. using a `dynamic` attribute.
  - The possible values of the index type are: `ascending`, `descending`, `text`, `hashed`, `2d`, `2dsphere`, and `geoHaystack`.
- Additional, optional configuration attributes can be specified, such as `unique`, `sparse`, or `name`. The `name` attribute must be string-valued. The `unique` and `sparse` switches are either boolean-valued key-value pairs or bare words. Specifying a bare word is equivalent to setting it to `true`; e.g. `unique` is the same as `unique = true`.
- The rest of the supported options are:
  - `max = 85.0` — maximal longitude/latitude for `2d` indexes. This must be a floating-point number in the range `[-180, +180]`. Use a string to specify negative values.
  - `min = "-129.5"` — minimal allowed longitude/latitude.
  - `bits = 26` — number of bits of precision for a `2d` index. Must be an integer between 1 and 32, inclusive.
  - `bucket_size` — grouping granularity of `geoHaystack` indexes. Must be a strictly positive integer.
  - `default_language = "french"` — default language of a `text` index.
  - `language_override = "lang"` — field name that indicates the language of a document.
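The path translation described above can be sketched without the derive macro. The helper below is a hypothetical illustration (it is not part of Avocado's API) of how a Rust-style `::`-separated key path maps to MongoDB's dot notation:

```rust
/// Hypothetical helper (not part of Avocado): translate a Rust-style
/// `::`-separated index path into MongoDB's "dot notation".
fn to_dot_notation(path: &str) -> String {
    path.split("::").collect::<Vec<_>>().join(".")
}

fn main() {
    // A multi-component path indexes a field of an embedded document:
    assert_eq!(to_dot_notation("established::year"), "established.year");
    assert_eq!(to_dot_notation("foo::bar::qux"), "foo.bar.qux");
    // Single-component paths are left unchanged:
    assert_eq!(to_dot_notation("name"), "name");
}
```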
Collections and Databases
Once we have defined our entity types, we can start storing and retrieving them. For this, we'll need a database of collections, and one collection per entity type.
Avocado piggybacks on top of the `mongodb` crate. You connect to a MongoDB server using exactly the same client code that you would use if you were using the driver in its "raw" form, and you obtain a named database in exactly the same manner.

Once you have a handle to the desired database, you obtain a handle to a collection within that database. This is where the workflow departs from that of the `mongodb` crate: Avocado has its own strongly-typed, generic `Collection` type. Let's see how these different parts all work together:
```rust
#[derive(Debug, Clone, Serialize, Deserialize, BsonSchema, Doc)]
struct User {
    #[serde(rename = "_id")]
    id: Uid<User>,
    legal_name: String,
}

// Connect to the server using the underlying mongodb crate.
let client = Client::with_uri("mongodb://localhost:27017/")?;

// Obtain a database handle, still using the underlying mongodb crate.
let db = client.db("avocado_example_db");

// Avocado extends database handle types with useful methods which let you
// obtain strongly-typed, generic collection handles.
//
// This is how you obtain such a **new, empty** collection without dynamic
// schema validation. Note that **this drops and recreates the collection.**
// It also creates any indexes specified in the `Doc::indexes()` method.
let users_novalidate: Collection<User> = db.empty_collection_novalidate()?;

// If you also enable the `schema_validation` feature, you can ask for a
// collection which always validates inserted documents based on its schema.
// Of course, this also **drops and recreates the collection,** and
// it also creates any indexes specified in the `Doc::indexes()` method.
let users: Collection<User> = db.empty_collection()?;

// If you need to access an **existing collection without emptying it,**
// here's how you do it:
let users_existing: Collection<User> = db.existing_collection();
```
Operations
Once we get hold of a collection, we can finally start performing actual database operations. Some of the most basic ones are:
- First, we can try and insert some entities.
- Then, we can update them based on their identity (the `_id` field).
- Finally, we can retrieve them subject to some filtering criteria.

Let's see what this looks like in terms of concrete code!
```rust
let alice = User {
    id: Uid::new_oid()?,
    legal_name: String::from("Alice Wonderland"),
};
let bob = User {
    id: Uid::new_oid()?,
    legal_name: String::from("Robert Tables"), // xkcd.com/327
};
let mut eve = User {
    id: Uid::new_oid()?,
    legal_name: String::from("Eve Sdropper"),
};

// You can insert a single entity using `Collection::insert_one()`.
users.insert_one(&eve)?;

// If you have multiple entities, it's more efficient to use
// `insert_many()` instead. It will save you precious network round-trips.
users.insert_many(vec![&alice, &bob])?;

// Update all properties of an entity based on its identity.
eve.legal_name = String::from("Eve Adamson");
users.replace_entity(&eve)?;

// If you want to insert the entity if one with the same ID doesn't exist,
// and update its fields if it does already exist, then use `upsert_entity`:
users.upsert_entity(&eve)?;

// The above two methods constitute a very easy and quick solution to a
// common use case, but they aren't very flexible in terms of specifying
// finer-grained filter criteria, and setting each field of a large document
// may be inefficient too.
// So if you are looking for something more flexible or more efficient,
// try `update_one()`, `update_many()`, `upsert_one()`, or `upsert_many()`.

// Now that we have some data, we can retrieve and filter it:
let filter_criteria = doc!{
    "legal_name": "Robert Tables",
};
for result in users.find_many(filter_criteria)? {
    let entity = result?;
    println!("Found entity: {:#?}", entity);
}
```
Actually, instead of raw, loosely-typed BSON documents, you can specify more sophisticated, custom objects as the filter criteria. For instance, an example implementation thereof can be found in `examples/basic.rs`.

In fact, several traits requiring some sort of filter specification are implemented for `Document`, but you can always make your own. The very purpose of these traits is to make manipulating the database safer and less error-prone by not requiring programmers to write a separate, ad-hoc query document each time they want to perform a query.

For this more advanced (and recommended) use case, see the traits in the `ops` module and the corresponding methods on `Collection`.
For using more descriptive names for some constants in filter or update specification documents, and also for preventing certain classes of typos related to the stringly-typed nature of BSON, several "smart literal" types are provided in the `literal` module.
For query-like traits that produce an output, the raw output value (which is always a `Document`) can be transformed into something else, which the `Output` associated type of the trait can be deserialized from. This is the task of the `transform(raw: Document) -> Result<Bson>` method on these traits, e.g. `Query::transform()` or `Pipeline::transform()`.

For the quick, painless, and idiomatic implementation of these methods, the `DocumentExt` trait is provided. This trait is exported through the `prelude`, so you can use it readily. For the intended usage of its methods (specifically, `remove_str()`), please see the implementation of `GetDescription::transform()` in the example below.
A short example:
```rust
#[derive(Debug, Clone, Serialize, Deserialize, Doc)]
struct Recipe {
    #[serde(rename = "_id")]
    id: Uid<Recipe>,
    ingredients: Vec<String>,
    description: String,
}

#[derive(Debug, Clone)]
struct AddIngredient<'a> {
    recipe_id: &'a Uid<Recipe>,
    ingredient: &'a str,
}

impl<'a> Update<Recipe> for AddIngredient<'a> {
    fn filter(&self) -> Document {
        doc!{ "_id": self.recipe_id }
    }

    fn update(&self) -> Document {
        doc!{ "$push": { "ingredients": self.ingredient } }
    }
}

#[derive(Debug, Clone, Copy)]
struct GetDescription<'a> {
    recipe_id: &'a Uid<Recipe>,
}

impl<'a> Query<Recipe> for GetDescription<'a> {
    type Output = String;

    fn filter(&self) -> Document {
        doc!{ "_id": self.recipe_id }
    }

    fn transform(mut raw: Document) -> AvocadoResult<Bson> {
        raw.remove_str("description")
    }

    fn options() -> FindOptions {
        FindOptions {
            projection: Some(doc!{
                "_id": false,
                "description": true,
            }),
            ..Default::default()
        }
    }
}

let client = Client::with_uri("mongodb://localhost:27017/")?;
let db = client.db("avocado_example_db");
let recipes: Collection<Recipe> = db.empty_collection_novalidate()?;

// Create a new `Recipe` entity and save it to the database.
let r = Recipe {
    id: Uid::new_oid()?,
    ingredients: vec![String::from("cream"), String::from("sugar")],
    description: String::from("mix 'em all together"),
};
recipes.insert_one(&r)?;

// Add an extra ingredient to it.
let u = AddIngredient {
    recipe_id: &r.id,
    ingredient: "strawberries",
};
recipes.update_one(&u)?;

// Retrieve its description in case we already forgot it.
let q = GetDescription { recipe_id: &r.id };
let description = recipes.find_one(q)?;

assert_eq!(description.as_ref(), Some(&r.description));
```
Preventing NoSQL Injection
Basically any database technology is subject to the hazard of DDL/DML (query, modification, and administrative) injection attacks if not enough care is taken.
In the case of traditional relational DB engines, the use of untrusted (e.g. user-supplied) text in formatted / templated SQL strings, and thus the concatenation of potentially arbitrary executable code with what was intended by the programmer, is the most common source of these security bugs.
This is usually mitigated by the use of "prepared statements", meaning that SQL statements are precompiled without any user input, while external values/arguments are marked by special placeholder syntax. Then, for the actual execution of a precompiled statement, parameters are structurally bound to each placeholder in the statement, i.e. by supplying typed values to the DB engine after parsing, without textually pasting them together with the query script.
Several NoSQL databases, including MongoDB, use a more structured query interface. (In fact, MongoDB queries are almost like the programmer writes plain syntax trees by hand.) This gets rid of some of the textual injection attempts. However, in a loosely-typed environment, supplying a query with arbitrary untrusted input can still lead to injection. For example, if one is directly working with the loosely-typed "value tree" representation of JSON, a malicious user might supply a MongoDB query operator document where the programmer was expecting a plain string. An example of this mistake can be found here.
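The difference between the loosely-typed and the strongly-typed approach can be sketched without any MongoDB dependency. The miniature `Value` enum and the two filter builders below are assumptions made purely for illustration (they are not Avocado's API); the typed variant makes the operator-injection shape unrepresentable:

```rust
use std::collections::BTreeMap;

// A tiny stand-in for a loosely-typed BSON value tree (illustration only).
#[derive(Debug, Clone)]
enum Value {
    String(String),
    Null,
    Document(BTreeMap<String, Value>),
}

// Loosely typed: whatever the client sent is pasted into the filter verbatim.
fn filter_loose(untrusted: Value) -> BTreeMap<String, Value> {
    let mut filter = BTreeMap::new();
    filter.insert("legal_name".to_string(), untrusted);
    filter
}

// Strongly typed: the filter field can only ever be a string, so an
// operator document such as `{"$ne": null}` cannot be smuggled in.
fn filter_typed(untrusted: &str) -> BTreeMap<String, Value> {
    let mut filter = BTreeMap::new();
    filter.insert("legal_name".to_string(), Value::String(untrusted.to_string()));
    filter
}

fn main() {
    // A malicious client sends `{"$ne": null}` where a name was expected;
    // the loose filter now matches every document with a non-null name.
    let mut attack = BTreeMap::new();
    attack.insert("$ne".to_string(), Value::Null);
    let loose = filter_loose(Value::Document(attack));
    assert!(matches!(loose["legal_name"], Value::Document(_)));

    // The typed API degrades the same input into a harmless string match.
    let typed = filter_typed("{\"$ne\": null}");
    assert!(matches!(typed["legal_name"], Value::String(_)));
}
```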
Avocado tries to counter these problems by encouraging the use of static types in queries as well as domain models. Therefore, any time you are handling untrusted input, you should build strongly-typed query and/or update objects implementing the `Query`, `Update`, `Upsert`, `Delete`, etc. traits from the `ops` module, instead of using what is effectively dynamic typing with raw BSON or JSON.

Ideally, the "no raw JSON/BSON" rule should be applied transitively in (recursive) data structures: no struct or tuple fields, enum variants, map keys, map/set/array values, etc., nor any substructures thereof, should contain untyped data.
Crate Features
- `schema_validation` (default): enables MongoDB-flavored JSON schema validation via the `magnet_schema` crate.
- `raw_uuid` (default): augments the `Uid` type with convenience methods for working with UUID-based entity/document IDs.
Modules

| Module | Description |
|---|---|
| `coll` | A MongoDB collection of a single homogeneous type. |
| `cursor` | Typed, generic wrapper around MongoDB cursors. |
| `db` | Represents a MongoDB database. |
| `doc` | A document is a direct member of a collection. |
| `error` | |
| `ext` | Convenience extension traits and methods. |
| `literal` | Helper types for making the construction of filter, update, etc. documents a little less stringly-typed. |
| `ops` | High-level database operations: query, update, delete, etc. |
| `prelude` | The Avocado prelude provides re-exports of the most commonly used traits and types for convenience, including ones from the `mongodb` and `bson` crates. |
| `uid` | Strongly-typed unique entity IDs. |