# Charybdis

Rust ORM for ScyllaDB and Apache Cassandra.

Charybdis is an ORM layer on top of the ScyllaDB Rust Driver, focused on ease of use and performance.
## Usage considerations

- Provides an expressive API for CRUD & complex statement operations on the model as a whole
- Provides an easy way to work with a subset of model fields via the automatically generated `partial_<model>!` macro
- Provides an easy way to run complex queries via the automatically generated `find_<model>!` macro
- The automatic migration tool analyzes project files and runs migrations according to differences between the model definitions and the database
## Performance considerations

- It uses prepared statements (shard/token aware) and binds values
- It expects a `CachingSession` as the session argument for operations
- Queries are macro-generated `str` constants (no concatenation at runtime)
- With the `find_<model>!` macro, complex queries are generated at compile time as `&'static str`
- Although it has an expressive API, it is a thin layer on top of `scylla_rust_driver` and does not introduce significant overhead
## Table of Contents

- Charybdis Models
- Automatic migration with `charybdis-migrate`
- Basic Operations
- Configuration Options
- Batch Operations
- Callbacks
- Collections
- Ignored fields
- Roadmap
## Charybdis Models

Before getting started, ensure that the `scylla` dependency is included in your `Cargo.toml` file. The
version of `scylla` should match the one used by the `charybdis` crate:

```toml
[dependencies]
charybdis = "1.2.0"
scylla = "1.0.2"
```
### Define Tables

```rust
use charybdis::macros::charybdis_model;
use charybdis::types::{Text, Timestamp, Uuid};

#[charybdis_model(
    table_name = users,
    partition_keys = [id],
    clustering_keys = [],
    global_secondary_indexes = [username]
)]
pub struct User {
    pub id: Uuid,
    pub username: Text,
    pub email: Text,
    pub created_at: Timestamp,
    pub updated_at: Timestamp,
}
```
### Define UDT

```rust
use charybdis::macros::charybdis_udt_model;
use charybdis::types::Text;

#[charybdis_udt_model(type_name = address)]
pub struct Address {
    pub street: Text,
    pub city: Text,
}
```

🚨 UDT fields must be in the same order as they are in the database.

Note that in order for migration to correctly detect changes on each migration, `type_name` has to
match the struct name. So if we have a struct `ReorderData`, we have to use
`#[charybdis_udt_model(type_name = reorderdata)]` (without underscores).
### Define Materialized Views

```rust
use charybdis::macros::charybdis_view_model;
use charybdis::types::{Text, Timestamp, Uuid};

#[charybdis_view_model(
    table_name = users_by_email,
    base_table = users,
    partition_keys = [email],
    clustering_keys = [id]
)]
pub struct UsersByEmail {
    pub email: Text,
    pub id: Uuid,
    pub username: Text,
    pub created_at: Timestamp,
    pub updated_at: Timestamp,
}
```

The resulting auto-generated migration query will be:

```sql
CREATE MATERIALIZED VIEW IF NOT EXISTS users_by_email
AS SELECT created_at, updated_at, username, email, id
FROM users
WHERE email IS NOT NULL AND id IS NOT NULL
PRIMARY KEY (email, id)
```
## Automatic migration

`charybdis-migrate` enables automatic migration to the database without the need to write migrations by hand. It iterates over project files and generates migrations based on differences between model definitions and the database. It supports the following operations:

- Create new tables
- Create new columns
- Drop columns
- Change field types (drop and recreate the column with the `--drop-and-replace` flag)
- Create secondary indexes
- Drop secondary indexes
- Create UDTs
- Create materialized views
- Table options
  - ⚠️ If the table exists, table options will result in an `ALTER TABLE` query without the `CLUSTERING ORDER` and `COMPACT STORAGE` options.

Model dropping is not supported. If you remove a model, you need to drop the table manually.
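The core idea behind the migration tool (diff the model definitions against the live schema, then emit only the statements needed to reconcile them) can be sketched in plain Rust. This is a toy illustration, not `charybdis-migrate`'s actual code; `diff_columns` and its string-based column maps are invented for the sketch:

```rust
use std::collections::BTreeMap;

// Toy sketch of schema diffing: compare the columns declared in a model
// against the columns that exist in the database, and emit ALTER TABLE
// statements only for the differences.
fn diff_columns(
    table: &str,
    model: &BTreeMap<String, String>, // column name -> CQL type, from the model definition
    db: &BTreeMap<String, String>,    // column name -> CQL type, from the database schema
) -> Vec<String> {
    let mut stmts = Vec::new();

    // columns present in the model but missing in the database -> ADD
    for (name, ty) in model {
        if !db.contains_key(name) {
            stmts.push(format!("ALTER TABLE {table} ADD {name} {ty}"));
        }
    }

    // columns present in the database but not in the model -> DROP
    for name in db.keys() {
        if !model.contains_key(name) {
            stmts.push(format!("ALTER TABLE {table} DROP {name}"));
        }
    }

    stmts
}
```

When the model and the database already match, the diff is empty and no statements run, which mirrors the tool's behavior described below.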
### Running migration

```bash
cargo install charybdis-migrate

migrate --hosts <host> --keyspace <keyspace> --drop-and-replace
```

- ⚠️ Always run migrations from the desired directories (`src` or `test`) to avoid scanning `target` or other large directories.
- ⚠️ If you are working with existing datasets, before running a migration you need to make sure that your model definitions match the database with respect to table names, column names, column types, partition keys, clustering keys and secondary indexes, so you don't alter the structure accidentally. If the structure matches, no migrations will run. As mentioned above, if there is no model definition for a table, it will not be dropped. In the future, we will add a `modelize` command that will generate `src/models` files from an existing data source.
- ⚠️ Make sure that nested collections are `frozen`, as per ScyllaDB requirements; when using the `--drop-and-replace` flag, columns will be dropped and recreated.
### Programmatically running migrations

Within a testing or development environment, we can trigger migrations programmatically:

```rust
use charybdis_migrate::MigrationBuilder;

let migration = MigrationBuilder::new()
    .keyspace("my_keyspace".to_string())
    .drop_and_replace(true)
    .build(&session)
    .await;

migration.run(&session).await;
```
### Global secondary indexes

If we have a model with a global secondary index:

```rust
#[charybdis_model(
    table_name = users,
    partition_keys = [id],
    clustering_keys = [],
    global_secondary_indexes = [username]
)]
```

the resulting query will be:

```sql
CREATE INDEX ON users (username);
```

### Local secondary indexes

Local secondary indexes are scoped to the partition key:

```rust
#[charybdis_model(
    table_name = menus,
    partition_keys = [location],
    clustering_keys = [name, price, dish_type],
    local_secondary_indexes = [dish_type]
)]
```

the resulting query will be:

```sql
CREATE INDEX ON menus ((location), dish_type);
```
## Basic Operations

For each operation you need to bring the respective trait into scope. They are defined
in the `charybdis::operations` module.

### Insert

```rust
use charybdis::operations::Insert;

async fn create_user(session: &CachingSession, user: User) {
    user.insert().execute(session).await.unwrap();
}
```
### Find

- Find by primary key:

  ```rust
  let user = User {
      id,
      ..Default::default()
  };

  let user = user.find_by_primary_key().execute(&session).await?;
  ```

- Find by partition key:

  ```rust
  let users = user.find_by_partition_key().execute(&session).await;
  ```

- Find by primary key associated value:

  ```rust
  let user = User::find_by_primary_key_value((id,)).execute(&session).await;
  ```

- Available find functions bring the following into scope:

  ```rust
  use charybdis::errors::CharybdisError;
  use charybdis::macros::charybdis_model;
  use charybdis::stream::CharybdisModelStream;
  use charybdis::types::{Date, Text, Uuid};
  use scylla::client::caching_session::CachingSession;
  ```
### Custom filtering

Let's use our `Post` model as an example:

```rust
#[charybdis_model(
    table_name = posts,
    partition_keys = [date],
    clustering_keys = [category_id, title]
)]
pub struct Post {
    pub date: Date,
    pub category_id: Uuid,
    pub title: Text,
}
```

We get an automatically generated `find_post!` macro that follows the `find_<struct_name>!` convention. It can be used to create custom queries.

The following will return a stream of `Post` models, and the query will be constructed at compile time as `&'static str`:

```rust
// automatically generated macro rule
let posts = find_post!("category_id in ? AND date > ?", (category_vec, date))
    .execute(&session)
    .await?;
```

We can also use the `find_first_post!` macro to get a single result:

```rust
let post = find_first_post!("category_id in ? AND date > ? LIMIT 1", (category_vec, date))
    .execute(&session)
    .await?;
```

If we just need the `Statement` and not the result, we can use the `find_post_query!` macro:

```rust
let query = find_post_query!("date = ? AND category_id in ?");
```
### Update

```rust
let mut user = User::from_json(json);

user.username = "scylla".to_string();
user.email = "some@email.com".to_string();

user.update().execute(&session).await;
```
### Collections

Let's use our `User` model as an example. `push_to_<field_name>` and `pull_from_<field_name>` methods are generated for each collection field:

```rust
let user: User = User::default();

user.push_tags(tags).execute(&session).await;
user.pull_tags(tags).execute(&session).await;

user.push_post_ids(post_ids).execute(&session).await;
user.pull_post_ids(post_ids).execute(&session).await;
```
### Counter

Let's define a `post_counter` model:

```rust
#[charybdis_model(
    table_name = post_counters,
    partition_keys = [id],
    clustering_keys = []
)]
pub struct PostCounter {
    pub id: Uuid,
    pub likes: Counter,
    pub comments: Counter,
}
```

We can use `increment_<field_name>` and `decrement_<field_name>` methods to update counter fields:

```rust
let post_counter: PostCounter = PostCounter { id, ..Default::default() };

post_counter.increment_likes(1).execute(&session).await;
post_counter.decrement_likes(1).execute(&session).await;

post_counter.increment_comments(1).execute(&session).await;
post_counter.decrement_comments(1).execute(&session).await;
```
### Delete

```rust
let user = User::from_json(json);

user.delete().execute(&session).await;
```

### Macro-generated delete helpers

Let's use our `Post` model as an example. We have macro-generated functions for up to 3 fields from the primary key:

```rust
Post::delete_by_date(date).execute(&session).await?;
Post::delete_by_date_and_category_id(date, category_id).execute(&session).await?;
Post::delete_by_date_and_category_id_and_title(date, category_id, title).execute(&session).await?;
```

### Custom delete queries

We can use the `delete_post!` macro to create custom delete queries:

```rust
delete_post!("date = ? AND category_id in ?", (date, category_vec)).execute(&session).await?
```
## Configuration

Every operation returns a `CharybdisQuery` that can be configured before execution with method
chaining:

```rust
let user: User = User::find_by_id(id)
    .consistency(Consistency::One)
    .timeout(Duration::from_secs(5))
    .execute(&session)
    .await?;

let result: QueryResult = user.update().consistency(Consistency::One).execute(&session).await?;
```

Supported configuration options:

- `consistency`
- `serial_consistency`
- `timestamp`
- `timeout`
- `page_size`
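The chaining pattern works because each configuration method takes the query by value and returns it. A minimal self-contained sketch of that idea (the `Query` struct and its fields here are invented for illustration and are not charybdis's actual `CharybdisQuery` type):

```rust
// Toy builder: each setter consumes `self` and returns it, so calls chain
// until a terminal method (here, just inspecting the struct) is invoked.
#[derive(Debug, Default, PartialEq)]
struct Query {
    consistency: Option<String>,
    timeout_ms: Option<u64>,
    page_size: Option<i32>,
}

impl Query {
    fn consistency(mut self, c: &str) -> Self {
        self.consistency = Some(c.into());
        self
    }

    fn timeout_ms(mut self, ms: u64) -> Self {
        self.timeout_ms = Some(ms);
        self
    }

    fn page_size(mut self, n: i32) -> Self {
        self.page_size = Some(n);
        self
    }
}
```

Because every setter returns the query itself, unset options simply stay at their defaults.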
## Batch

`CharybdisModelBatch` operations are used to perform multiple operations in a single batch.

### Batch Operations

```rust
let users: Vec<User> = vec![/* ... */];
let batch = User::batch();

// inserts
batch.append_inserts(&users);

// or updates
batch.append_updates(&users);

// or deletes
batch.append_deletes(&users);

batch.execute(&session).await?;
```
### Chunked Batch Operations

Chunked batch operations are used to operate on large amounts of data in chunks:

```rust
let users: Vec<User> = vec![/* ... */];
let chunk_size = 100;
let batch = User::batch();

batch.chunked_inserts(&session, &users, chunk_size).await?;
batch.chunked_updates(&session, &users, chunk_size).await?;
batch.chunked_deletes(&session, &users, chunk_size).await?;
```
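The chunking itself is plain slicing: split the rows into fixed-size groups and submit one batch per group, so no single batch grows unboundedly. A toy sketch of that step, independent of charybdis:

```rust
// Toy sketch of the chunking idea behind chunked batch operations:
// split a large set of row keys into fixed-size chunks, one per batch.
fn chunk_ids(ids: &[u32], chunk_size: usize) -> Vec<Vec<u32>> {
    ids.chunks(chunk_size).map(|c| c.to_vec()).collect()
}
```

Each resulting chunk would then be appended to and executed as its own batch.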
### Batch Configuration

Batch operations can be configured before execution with method chaining:

```rust
let batch = User::batch()
    .consistency(Consistency::One)
    .retry_policy(Some(retry_policy))
    .chunked_inserts(&session, &users, chunk_size)
    .await?;
```

We could also use method chaining to append operations to the batch:

```rust
User::batch()
    .consistency(Consistency::One)
    .append_update(&user_1)
    .append_update(&user_2)
    .execute(&session)
    .await?;
```
### Statements Batch

We can use statement batches to perform collection operations in a batch:

```rust
let batch = User::batch();
let users: Vec<User> = vec![/* ... */];

for user in users {
    batch.append_statement(User::PUSH_TAGS_QUERY, (tags.clone(), user.id));
}

batch.execute(&session).await;
```
## Partial Models

### Overview
Partial models allow you to work with a subset of fields from a complete model, making operations more efficient and focused. Each partial model implements the same ORM traits as the original model but only includes the fields you specify.
### Usage

Use the auto-generated `partial_<model>!` macro to create a struct with the same structure as the original model, but
only with the fields you need:

```rust
// auto-generated macro - available in crate::models::<original_model>
partial_user!(UpdateUsernameUser, id, username);
```

This creates a new `UpdateUsernameUser` struct that is equivalent to the `User` model, but only with the `id` and
`username` fields:

```rust
let mut update_user_username = UpdateUsernameUser {
    id,
    username: "scylla".to_string(),
};

update_user_username.update().execute(&session).await?;
```
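Conceptually, the macro expands to a struct containing only the requested fields. A self-contained sketch of that idea (with plain `i64`/`String` standing in for the charybdis column types; the expansion shown is illustrative, not the macro's literal output):

```rust
// Hypothetical expansion of `partial_user!(UpdateUsernameUser, id, username)`:
// a struct with the same field names and types as the original model,
// restricted to the requested subset.
#[derive(Debug, Default, PartialEq)]
struct UpdateUsernameUser {
    id: i64,        // primary key field: partial models must include all of these
    username: String,
}
```

Operations on such a struct touch only these columns, which is where the performance benefit comes from.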
### Design Pattern Benefits

- **Separation of Concerns**: each partial model handles a specific responsibility, e.g. `UpdateCoverImageUser` for updating a user's cover image
- **Type Safety**: the type system enforces which fields are required for each operation
- **Performance**: only the necessary fields are read from and written to the database
- **Maintainability**: clearer intent in code about what is being modified
### Requirements

- The original model must include `#[derive(Default)]`
- Partial model definitions must include all primary key fields
- The macro should be used in the same file as the original model so it has access to the same imports
### Inheritance of Attributes

Partial models inherit:

- all derives defined after the `#[charybdis_model]` macro
- all field attributes from the original model (e.g. `#[serde(rename = "rootId")]`)
- all ORM capabilities of the original model
### As Native

In case we need to run operations on the native model, we can use the `as_native` method:

```rust
let native_user: User = update_user_username
    .as_native()
    .find_by_primary_key()
    .execute(&session)
    .await?;

// action that requires the native model
authorize_user(&native_user);
```
### Naming Convention

For clarity, follow the pattern: purpose + original struct name. Examples:

- `UpdateAddressUser` - for updating a user's address
- `UpdateCoverImageUser` - for updating a user's cover image
- `AuthPost` - for authentication/authorization operations on a post
## Callbacks

Callbacks are a convenient way to run additional logic on a model before or after certain operations. E.g.:

- we can use `before_insert` to set default values and/or validate the model before insert
- we can use `after_update` to update other data sources, e.g. Elasticsearch

### Implementation

Let's say we define a custom extension that will be used to update an Elasticsearch document on every post update. We can then implement a callback that utilizes this extension.

Possible callbacks:

- `before_insert`
- `before_update`
- `before_delete`
- `after_insert`
- `after_update`
- `after_delete`
### Triggering Callbacks

In order to trigger callbacks, we use the `<operation>_cb` methods (`insert_cb`, `update_cb`, `delete_cb`) from the corresponding traits. This gives a clear distinction between a plain `insert` and an insert with callbacks (`insert_cb`). Just as with the main operations, we can configure the callback operation query before execution:

```rust
use charybdis::operations::{DeleteWithCallbacks, InsertWithCallbacks, UpdateWithCallbacks};

post.insert_cb(&extension).execute(&session).await;
post.update_cb(&extension).execute(&session).await;
post.delete_cb(&extension).consistency(Consistency::All).execute(&session).await;
```
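The `<operation>_cb` flow can be pictured as: run the `before_` hook, then the operation itself, then the `after_` hook. A minimal self-contained sketch of that dispatch order (the trait and `insert_cb` function here are invented for illustration, not charybdis's actual callback traits):

```rust
// Toy sketch of the callback pattern: hooks with default no-op-ish bodies,
// and a wrapper that runs before_insert -> insert -> after_insert in order.
trait Callbacks {
    fn before_insert(&mut self, log: &mut Vec<String>) {
        log.push("before_insert".into());
    }

    fn after_insert(&mut self, log: &mut Vec<String>) {
        log.push("after_insert".into());
    }
}

struct Post;

impl Callbacks for Post {}

// Stand-in for `insert_cb`: wraps the plain operation with both hooks.
fn insert_cb<T: Callbacks>(model: &mut T, log: &mut Vec<String>) {
    model.before_insert(log);
    log.push("insert".into()); // the plain `insert` would run here
    model.after_insert(log);
}
```

A plain `insert` would skip both hooks, which is exactly the distinction the `_cb` suffix encodes.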
## Collections

For each collection field, we get the following:

- `PUSH_<field_name>_QUERY` static str
- `PUSH_<field_name>_IF_EXISTS_QUERY` static str
- `PULL_<field_name>_QUERY` static str
- `PULL_<field_name>_IF_EXISTS_QUERY` static str
- `push_<field_name>` method
- `push_<field_name>_if_exists` method
- `pull_<field_name>` method
- `pull_<field_name>_if_exists` method
### Model

```rust
#[charybdis_model(
    table_name = users,
    partition_keys = [id],
    clustering_keys = []
)]
pub struct User {
    pub id: Uuid,
    // collection fields referenced by the examples below
    pub tags: Set<Text>,
    pub post_ids: List<Uuid>,
    pub books_by_genre: Map<Text, Frozen<List<Text>>>,
}
```

### Generated Collection Queries

The generated query expects the collection value as the first bind value and the primary key fields as the following bind values. We can use these constants within batch operations:

```rust
let batch = User::batch();
let users: Vec<User> = vec![/* ... */];

for user in users {
    batch.append_statement(User::PUSH_TAGS_QUERY, (tags.clone(), user.id));
}

batch.execute(&session).await;
```
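For a `tags` field on a `users` table, the generated constants are plain CQL strings. The exact text below is an illustrative reconstruction following the stated bind-value order (collection value first, then primary key), not verbatim charybdis output:

```rust
// Illustrative sketch of the generated query constants for a `tags`
// collection on the `users` table (names follow the PUSH_/PULL_
// convention listed above; exact text may differ from charybdis's output).
const PUSH_TAGS_QUERY: &str = "UPDATE users SET tags = tags + ? WHERE id = ?";
const PUSH_TAGS_IF_EXISTS_QUERY: &str = "UPDATE users SET tags = tags + ? WHERE id = ? IF EXISTS";
const PULL_TAGS_QUERY: &str = "UPDATE users SET tags = tags - ? WHERE id = ?";
const PULL_TAGS_IF_EXISTS_QUERY: &str = "UPDATE users SET tags = tags - ? WHERE id = ? IF EXISTS";
```

Because these are `&'static str` constants, they can be bound in batches or prepared once and reused without any runtime string building.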
### Generated Collection Methods

`push_to_<field_name>` and `pull_from_<field_name>` methods are generated for each collection field:

```rust
let user: User = User::default();

user.push_tags(tags).execute(&session).await;
user.push_tags_if_exists(tags).execute(&session).await;
user.pull_tags(tags).execute(&session).await;
user.pull_tags_if_exists(tags).execute(&session).await;

user.push_post_ids(ids).execute(&session).await;
user.push_post_ids_if_exists(ids).execute(&session).await;
user.pull_post_ids(ids).execute(&session).await;
user.pull_post_ids_if_exists(ids).execute(&session).await;

user.push_books_by_genre(map).execute(&session).await;
user.push_books_by_genre_if_exists(map).execute(&session).await;
user.pull_books_by_genre(map).execute(&session).await;
user.pull_books_by_genre_if_exists(map).execute(&session).await;
```
## Ignored fields

We can ignore fields by using the `#[charybdis(ignore)]` attribute:

```rust
#[charybdis_model(
    table_name = users,
    partition_keys = [id],
    clustering_keys = []
)]
pub struct User {
    pub id: Uuid,
    #[charybdis(ignore)]
    pub organization: Option<Organization>,
}
```

The `organization` field will be ignored in all operations, and the default value will be used
when deserializing from other data sources. Ignored fields can be used to hold data that is not
persisted in the database.
## Custom Fields

Any Rust type can be used directly in a table or UDT definition. The user must choose a ScyllaDB
backing type (such as `TinyInt` or `Text`) and implement the `SerializeValue` and
`DeserializeValue` traits.

See the `custom_field.rs` integration test for examples using `int` and `text` encoding.
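A common pattern is backing a Rust enum with `TinyInt` (`i8`). The sketch below shows only the self-contained conversion half; in a real model, the `SerializeValue`/`DeserializeValue` impls would delegate to conversions like these (the `Status` enum and its helper methods are invented for illustration):

```rust
// Hypothetical custom field: a Rust enum stored as a ScyllaDB TinyInt.
// Serialization would call to_tinyint(); deserialization would call
// from_tinyint() and reject unknown values.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Draft,
    Published,
}

impl Status {
    fn to_tinyint(self) -> i8 {
        match self {
            Status::Draft => 0,
            Status::Published => 1,
        }
    }

    fn from_tinyint(v: i8) -> Option<Status> {
        match v {
            0 => Some(Status::Draft),
            1 => Some(Status::Published),
            _ => None, // unknown database value
        }
    }
}
```

Keeping the conversions total and explicit makes the trait impls thin wrappers and keeps the encoding decision in one place.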