# ClickType
ClickType is a ClickHouse client for Rust designed for bulk data ingestion and type-safe query construction. It focuses on explicit memory control and high performance by utilizing the RowBinary format.
## Key Features

- **Data Modeling:** Schema definition via the `#[derive(ClickTable)]` macro.
- **Batch Ingestion:** Async buffering system with active memory management and backpressure control.
- **Query Builder:** Fluent API for SQL generation, validating column names and types at compile time.
- **Observability:** Native integration with the `tracing` ecosystem for monitoring latencies and errors.
- **Complex Types:** Support for `Nullable`, `LowCardinality`, `Array`, and `Map`.
## Installation

Add the dependencies to your `Cargo.toml` file:

```toml
[dependencies]
clicktype = "0.1"
tokio = { version = "1", features = ["full"] }
tracing = "0.1"
```
## Table Definition

Map your ClickHouse tables to Rust structs. The macro handles serialization and DDL generation.
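A minimal sketch, assuming the crate is imported as `clicktype` and reusing the `events` columns (`id`, `name`, `value`, `timestamp`) that appear in the validation examples later in this README; the Rust-to-ClickHouse type mappings shown are assumptions, not documented behavior:

```rust
use clicktype::*;

// Hypothetical mapping for the `events` table; field types are illustrative.
#[derive(ClickTable)]
struct Event {
    id: u64,        // UInt64
    name: String,   // String
    value: f64,     // Float64
    timestamp: u32, // DateTime
}
```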
## Data Ingestion (Batcher)

The `Batcher` manages the grouping of rows in memory before sending them to ClickHouse. It allows configuration of row limits, buffer sizes, and timeouts.
### Memory and Error Management

- **Memory Release:** The internal buffer is automatically shrunk after a flush if it exceeds the configured threshold (`buffer_shrink_threshold`).
- **Supervision:** The `spawn()` method returns the worker's `JoinHandle` so unexpected stops or panics can be detected.
- **Backpressure Strategies:** Supports `insert` (blocks until capacity is available) and `try_insert` (fails immediately if the buffer is full).
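The two strategies mirror the semantics of a bounded channel. A standard-library sketch of the same trade-off (this is not ClickType's API, just the underlying idea):

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Returns true when a full, bounded channel rejects the next row immediately.
fn second_row_rejected() -> bool {
    // A capacity-1 channel stands in for the batcher's bounded buffer.
    let (tx, _rx) = sync_channel::<u32>(1);
    tx.send(1).unwrap(); // like `insert`: would block once the buffer is full
    // Like `try_insert`: fails fast with the rejected row instead of blocking.
    matches!(tx.try_send(2), Err(TrySendError::Full(2)))
}

fn main() {
    println!("rejected: {}", second_row_rejected()); // prints "rejected: true"
}
```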
### Usage Example

The exact constructor arguments and config field names below are illustrative:

```rust
use clicktype::{BatchConfig, Batcher};
use std::time::Duration;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = BatchConfig {
        max_rows: 10_000,                       // row limit per batch (field name assumed)
        flush_interval: Duration::from_secs(5), // timeout (field name assumed)
        ..Default::default()
    };
    let batcher = Batcher::new("http://localhost:8123", "events", config);
    let (handle, _worker) = batcher.spawn();
    handle.insert(/* a #[derive(ClickTable)] row */).await?;
    Ok(())
}
```
## Querying

The `QueryBuilder` generates structured SQL using the column constants provided by the macro. The fluent-method and constant names below are illustrative:

```rust
use clicktype::QueryBuilder;

// Sketch: build a SELECT from the macro-generated column constants.
let sql = QueryBuilder::new("events")
    .select(&[Event::ID, Event::NAME])
    .build();
```
## Observability
ClickType uses tracing to emit events. Logs include structured data regarding:
- Number of rows and bytes sent per batch.
- Latency of HTTP requests to ClickHouse.
- Retries performed during network failures.
To enable log output:

```rust
tracing_subscriber::fmt::init();
```
## Implementation Details

- **RowBinary Protocol:** RowBinary is used for all insertions, as it is among the most CPU- and bandwidth-efficient formats ClickHouse supports.
- **Incremental Serialization:** Data is serialized into the buffer at insertion time (`insert`), spreading the CPU cost over time and preventing latency spikes during the flush operation.
- **Buffer Protection:** If a batch persistently fails after all configured retries, the buffer is cleared to prevent a total deadlock of the ingestion pipeline.
- **Schema Validation:** Automatic schema validation on the first insert prevents silent data corruption from schema mismatches.
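For intuition about the wire format: RowBinary writes fixed-width values little-endian and prefixes strings with a LEB128 varint length. A dependency-free sketch encoding one `(UInt32, String)` row:

```rust
/// Append a LEB128-encoded unsigned varint (RowBinary's string-length prefix).
fn write_varint(buf: &mut Vec<u8>, mut n: u64) {
    loop {
        let byte = (n & 0x7f) as u8;
        n >>= 7;
        if n == 0 {
            buf.push(byte);
            break;
        }
        buf.push(byte | 0x80); // continuation bit: more bytes follow
    }
}

/// Encode one (UInt32, String) row the way RowBinary lays it out.
fn encode_row(id: u32, name: &str) -> Vec<u8> {
    let mut buf = Vec::new();
    buf.extend_from_slice(&id.to_le_bytes());  // fixed-width, little-endian
    write_varint(&mut buf, name.len() as u64); // varint length prefix
    buf.extend_from_slice(name.as_bytes());    // raw UTF-8 bytes
    buf
}

fn main() {
    let row = encode_row(1, "ok");
    // 4 bytes for the UInt32, 1 varint byte, 2 string bytes
    assert_eq!(row, vec![1, 0, 0, 0, 2, b'o', b'k']);
    println!("{row:?}");
}
```

No column names travel on the wire, which is exactly why the position-based schema validation described below matters.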
## Production Considerations

### Automatic Schema Protection
ClickType validates your schema automatically to prevent data corruption.
RowBinary is ClickHouse's fastest format, but it's position-based (no column names in the wire format). ClickType protects you with comprehensive validation:
**What ClickType validates automatically:**

- ✅ Column order: position mismatches detected immediately
- ✅ Column types: type mismatches caught before any data is sent
- ✅ Column count: missing or extra insertable columns detected
- ✅ Column names: struct fields are checked against table columns
Validation happens on the first insert:

```rust
let batcher = Batcher::new(/* url, table, config */);
let (handle, _worker) = batcher.spawn();

// Schema validation runs here - fails fast if anything is wrong:
handle.insert(row).await?;
// ✓ Order validated
// ✓ Types validated
// ✓ Count validated
```
Example validation error (column order mismatch):

```text
Schema validation failed for table 'events':
  Column order mismatch at position 0: struct has 'id', table has 'name'
  Column order mismatch at position 1: struct has 'name', table has 'value'
```
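Positional validation of this kind boils down to comparing the struct's column list against the table's, index by index. A dependency-free sketch that reproduces error lines of this shape (not ClickType's internal code):

```rust
/// Compare struct columns against table columns position by position,
/// producing one message per mismatch.
fn check_order(struct_cols: &[&str], table_cols: &[&str]) -> Vec<String> {
    struct_cols
        .iter()
        .zip(table_cols)
        .enumerate()
        .filter(|(_, (s, t))| s != t)
        .map(|(i, (s, t))| {
            format!("Column order mismatch at position {i}: struct has '{s}', table has '{t}'")
        })
        .collect()
}

fn main() {
    for err in check_order(&["id", "name"], &["name", "value"]) {
        println!("{err}");
    }
}
```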
**Inherent RowBinary limitations (cannot be validated):**

- ⚠️ MATERIALIZED/ALIAS expressions (not insertable; skipped)
- ⚠️ DEFAULT expressions (applied server-side; not visible in the schema)
- ⚠️ Codec settings (compression; internal to the server)
## Best Practices

### 1. Schema Changes - Use Migrations

When you need to change your schema:

```rust
// Step 1: Create the new table version (events_v2)

// Step 2: Deploy code that writes to BOTH tables
batcher_v1.insert(row.clone()).await?;
batcher_v2.insert(row_v2).await?;

// Step 3: Backfill data:
//   INSERT INTO events_v2 SELECT id, '', timestamp FROM events

// Step 4: Switch reads to v2

// Step 5: Stop writes to v1, then drop the old table
```
Alternative: use ClickHouse `ALTER TABLE` for compatible changes:

```sql
-- Adding a column with a DEFAULT (safe)
ALTER TABLE events ADD COLUMN new_field String DEFAULT ''
```
### 2. Monitor Buffer Memory

The field name follows the option described in this README; the value is illustrative:

```rust
let config = BatchConfig {
    buffer_shrink_threshold: 1024 * 1024, // shrink back toward 1 MB after a spike
    ..Default::default()
};
```

**Why this matters:**

- Traffic spike → a 64 MB buffer is allocated.
- Without a shrink threshold, the buffer stays at 64 MB forever (effectively a memory leak).
- With a shrink threshold, the buffer returns to 1 MB after the spike.
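The shrink step is the same idea as calling `Vec::shrink_to` on a plain byte buffer once its capacity exceeds the threshold; a standard-library illustration (not ClickType code):

```rust
/// Simulate a spike, a flush, and a shrink; returns the buffer capacity
/// right after the flush and again after shrinking.
fn spike_and_shrink(threshold: usize) -> (usize, usize) {
    let mut buf: Vec<u8> = vec![0u8; 64 * 1024 * 1024]; // traffic spike: 64 MB
    buf.clear(); // a flush drains the rows but keeps the capacity
    let before = buf.capacity();
    if buf.capacity() > threshold {
        buf.shrink_to(threshold); // release memory back down to the threshold
    }
    (before, buf.capacity())
}

fn main() {
    let (before, after) = spike_and_shrink(1024 * 1024); // 1 MB threshold
    assert!(before >= 64 * 1024 * 1024); // clearing alone releases nothing
    assert!(after < before);             // shrinking actually returns memory
    println!("before: {before} bytes, after: {after} bytes");
}
```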
### 3. Choose a Backpressure Strategy

```rust
// Data-integrity priority: wait for capacity.
handle.insert(row).await?; // blocks while the channel is full

// Availability priority: drop the row if the buffer is full.
if let Err(e) = handle.try_insert(row) {
    tracing::warn!("buffer full, dropping row: {e}");
}
```
### 4. Supervise the Worker Task

```rust
// `spawn()` returns the insert handle plus the worker's JoinHandle (shape assumed).
let (handle, worker) = batcher.spawn();

tokio::select! {
    res = worker => tracing::error!("batcher worker stopped: {res:?}"),
    // `run_application` stands in for your app's main loop (hypothetical).
    _ = run_application(handle) => {}
}
```
## Troubleshooting

### "Schema validation failed: type mismatch"

**Cause:** a struct field type doesn't match the ClickHouse column type.
**Fix:**

```sql
-- Check the actual ClickHouse schema
DESCRIBE TABLE your_table;
```
Compare the output with your Rust struct. Common mismatches:

- `i32` vs `i64` (Int32 vs Int64)
- `String` vs `LowCardinality<String>`
- `Option<T>` vs `T` (Nullable vs non-Nullable)
### "Insert failed after 3 retries"

**Causes:**
- Network issues
- ClickHouse server overload
- Quota/permissions
- Invalid data (e.g., duplicate primary key)
**Debugging:**

```rust
use tracing_subscriber;

// Enable detailed logs.
tracing_subscriber::fmt()
    .with_max_level(tracing::Level::DEBUG)
    .init();

// The logs will show:
// - the exact HTTP error returned by ClickHouse
// - retry attempts
// - batch size and payload
```
### Memory keeps growing

**Check:**

- Is `buffer_shrink_threshold` configured?
- Are flushes completing successfully?
- Is the channel capacity too large?

**Fix:**

```rust
// Field name from this README; the value is illustrative.
let config = BatchConfig {
    buffer_shrink_threshold: 1024 * 1024, // 1 MB
    ..Default::default()
};
```
### Worker stops processing

**Likely cause:** the worker panicked due to:
- Schema validation failure (fixed in v0.1+)
- Network error during flush
- Out of memory
**Detection:**

```rust
// `spawn()` returns the insert handle plus the worker's JoinHandle (shape assumed).
let (handle, worker) = batcher.spawn();

// Monitor the worker from a supervisor task.
tokio::spawn(async move {
    if let Err(e) = worker.await {
        tracing::error!("batcher worker stopped: {e}");
    }
});
```
## Testing

### Property-Based Tests

ClickType includes extensive fuzzing tests using `proptest`. These validate:
- Roundtrip serialization for all types
- Edge cases (NaN, infinity, empty strings, null bytes, max values)
- Large data (1MB strings, 100k element arrays)
### Load Tests

Production-scale stress tests (run manually):

```shell
# 1M row insertion test
# Burst scenario (memory management)
# Concurrent inserts
# Backpressure testing
```