transcriptomic-rs
Expression matrix assembly and normalization → Arrow RecordBatches.
Description
transcriptomic-rs assembles expression matrices from GEO SOFT data, applies log2, quantile, and z-score normalization, and emits the result as Arrow RecordBatches.
Features
- Matrix assembly from SOFT data
- Log2, quantile, and z-score normalization
- Arrow-native output
- Parallel processing with rayon
Quick start
[dependencies]
= "0.1"
= "0.1"
use MatrixBuilder;
use SoftReader;
let reader = open?;
let matrix = new.from_soft?;
let normalized = log2?;
Matrix assembly
Construct an expression matrix from GEO SOFT data:
// Simple: expression matrix only
let matrix = new.from_soft?;
// Complete: matrix + sample metadata + platform annotation
let = new.build_all?;
Behavior:
- Joins sample data tables on probe IDs
- Maps probes to genes via platform annotation
- Aggregates multiple probes per gene (mean by default)
- Preserves null values in output
Aggregation methods: Mean, Median, Max, Min
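The probe-to-gene aggregation step described above can be sketched in plain Rust. This is an illustration only, not the crate's implementation: the `ProbeRow` struct, the `Aggregation` enum layout, and the `aggregate` and `collapse_probes` functions are hypothetical names. It also assumes that null probe values are skipped during aggregation and that a gene is null for a sample only when every one of its probes is null; the README does not state how the crate treats this case.

```rust
use std::collections::BTreeMap;

/// The four aggregation methods named above (hypothetical enum layout).
#[derive(Clone, Copy, Debug)]
pub enum Aggregation {
    Mean,
    Median,
    Max,
    Min,
}

/// One probe already mapped to its gene, with one value per sample.
/// Hypothetical type used only for this sketch.
pub struct ProbeRow {
    pub gene: String,
    pub values: Vec<Option<f64>>,
}

/// Collapse the probe values of one gene in one sample.
/// Assumption: nulls are skipped; the result is null only if all inputs are null.
pub fn aggregate(values: &[Option<f64>], method: Aggregation) -> Option<f64> {
    let mut present: Vec<f64> = values.iter().flatten().copied().collect();
    if present.is_empty() {
        return None;
    }
    present.sort_by(|a, b| a.total_cmp(b));
    let n = present.len();
    Some(match method {
        Aggregation::Mean => present.iter().sum::<f64>() / n as f64,
        Aggregation::Median => {
            if n % 2 == 1 {
                present[n / 2]
            } else {
                (present[n / 2 - 1] + present[n / 2]) / 2.0
            }
        }
        Aggregation::Max => present[n - 1],
        Aggregation::Min => present[0],
    })
}

/// Group probe rows by gene and aggregate each sample column.
/// Assumes every row has the same number of samples.
pub fn collapse_probes(
    rows: &[ProbeRow],
    method: Aggregation,
) -> BTreeMap<String, Vec<Option<f64>>> {
    let mut grouped: BTreeMap<&str, Vec<&ProbeRow>> = BTreeMap::new();
    for row in rows {
        grouped.entry(row.gene.as_str()).or_default().push(row);
    }
    grouped
        .into_iter()
        .map(|(gene, members)| {
            let n_samples = members[0].values.len();
            let collapsed: Vec<Option<f64>> = (0..n_samples)
                .map(|s| {
                    let column: Vec<Option<f64>> =
                        members.iter().map(|r| r.values[s]).collect();
                    aggregate(&column, method)
                })
                .collect();
            (gene.to_string(), collapsed)
        })
        .collect()
}
```

With two probes for one gene holding 2.0 and 4.0 in a sample, `Aggregation::Mean` yields 3.0 for that gene and sample; if both probes are null, the gene stays null.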
Normalization
All normalization methods are explicit and composable; there are no hidden defaults.
// Log2 transformation: log2(x+1)
let log2 = log2?;
// Z-score per gene: (x - mean) / std
let zscore = z_score_per_gene?;
// Quantile normalization
let quantile = quantile?;
// Compose: log2 then z-score
let composed = z_score_per_gene?;
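As a rough illustration of the two formulas in the comments above, log2(x+1) and (x - mean) / std, here is a standalone sketch operating on one gene's row of nullable values. The function names `log2_plus_one` and `z_score` are hypothetical and are not the crate's API. Two further assumptions are made because the README does not specify them: the standard deviation is the sample standard deviation (n - 1 denominator), and a gene with fewer than two non-null values or zero variance yields nulls.

```rust
/// log2(x + 1) applied element-wise; nulls stay null.
pub fn log2_plus_one(row: &[Option<f64>]) -> Vec<Option<f64>> {
    row.iter().map(|v| v.map(|x| (x + 1.0).log2())).collect()
}

/// (x - mean) / std over one gene's non-null values; nulls stay null.
/// Assumptions: sample standard deviation (n - 1), and an all-null result
/// when the standard deviation is undefined or zero.
pub fn z_score(row: &[Option<f64>]) -> Vec<Option<f64>> {
    let present: Vec<f64> = row.iter().flatten().copied().collect();
    if present.len() < 2 {
        return vec![None; row.len()];
    }
    let n = present.len() as f64;
    let mean = present.iter().sum::<f64>() / n;
    let var = present.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    let std = var.sqrt();
    if std == 0.0 {
        return vec![None; row.len()];
    }
    row.iter().map(|v| v.map(|x| (x - mean) / std)).collect()
}
```

Because each function takes a slice and returns a new vector, composing them is a matter of nesting calls, for example `z_score(&log2_plus_one(&row))`, and the input row is left unmodified.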
Properties:
- Null values propagate unchanged through all transformations
- Methods return new matrices; original is unmodified
- All transformations preserve gene and sample ordering
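Quantile normalization is the least self-explanatory of the methods listed above, so a minimal sketch of the standard technique follows. It is not the crate's implementation: `quantile_normalize` is a hypothetical function, the matrix is assumed to be genes x samples with no null values (null handling is omitted here), and tied values are ranked by gene order instead of being averaged. Each sample column is sorted, the values at each rank are averaged across samples, and every entry is replaced by the average for its rank. The output keeps the same gene and sample ordering as the input.

```rust
/// Quantile-normalize a genes x samples matrix.
/// Assumptions: no nulls, rectangular input, ties broken by gene index.
pub fn quantile_normalize(matrix: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let n_genes = matrix.len();
    if n_genes == 0 {
        return Vec::new();
    }
    let n_samples = matrix[0].len();

    // For each sample, gene indices ordered by ascending value (stable sort).
    let mut order: Vec<Vec<usize>> = Vec::with_capacity(n_samples);
    for s in 0..n_samples {
        let mut idx: Vec<usize> = (0..n_genes).collect();
        idx.sort_by(|&a, &b| matrix[a][s].total_cmp(&matrix[b][s]));
        order.push(idx);
    }

    // Reference distribution: mean across samples of the value at each rank.
    let reference: Vec<f64> = (0..n_genes)
        .map(|rank| {
            let total: f64 = order
                .iter()
                .enumerate()
                .map(|(s, idx)| matrix[idx[rank]][s])
                .sum();
            total / n_samples as f64
        })
        .collect();

    // Replace each entry with the reference value for its rank in its sample.
    let mut out = vec![vec![0.0; n_samples]; n_genes];
    for (s, idx) in order.iter().enumerate() {
        for (rank, &gene) in idx.iter().enumerate() {
            out[gene][s] = reference[rank];
        }
    }
    out
}
```

After this transformation every sample column contains the same set of values, only arranged according to each sample's original ranking.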
License
Licensed under MIT OR Apache-2.0.