transcriptomic-rs 0.1.0

Expression matrix assembly and normalization → Arrow RecordBatches

Description

transcriptomic-rs provides tools for assembling expression matrices from GEO SOFT data and applying common normalization methods.

Features

  • Matrix assembly from SOFT data
  • Log2, quantile, and z-score normalization
  • Arrow-native output
  • Parallel processing with rayon

Quick start

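Add to Cargo.toml:

[dependencies]
transcriptomic-rs = "0.1"
geo-soft-rs = "0.1"

Then read a SOFT file, assemble the matrix, and normalize it:

use geo_soft_rs::SoftReader;
use transcriptomic_rs::{MatrixBuilder, Normalize};

// Box<dyn Error> assumes both crates' error types implement std::error::Error.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let reader = SoftReader::open("GSE65682_family.soft.gz")?;
    let matrix = MatrixBuilder::new().from_soft(reader)?;
    let normalized = Normalize::log2(&matrix)?;
    Ok(())
}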

Matrix assembly

Construct an expression matrix from GEO SOFT data:

// Simple: expression matrix only
let matrix = MatrixBuilder::new().from_soft(reader)?;

// Complete: matrix + sample metadata + platform annotation
let (matrix, metadata, annotation) = MatrixBuilder::new().build_all(reader)?;

Behavior:

  • Joins sample data tables on probe IDs
  • Maps probes to genes via platform annotation
  • Aggregates multiple probes per gene (mean by default)
  • Preserves null values in output

Aggregation methods: Mean, Median, Max, Min
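
The probe-to-gene aggregation described above can be illustrated with a small standalone sketch. It uses only the standard library and is not the crate's implementation: the `Aggregation` enum, `aggregate`, and `collapse_probes` are hypothetical names, and two behaviors are assumptions made for the example (nulls are skipped during aggregation, and probes without a gene mapping are dropped).

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the crate's aggregation options.
#[derive(Clone, Copy, Debug)]
pub enum Aggregation {
    Mean,
    Median,
    Max,
    Min,
}

/// Collapse the values of all probes mapped to one gene, for one sample.
/// Assumption: nulls (`None`) are skipped, and the result is null only
/// when every probe value is null.
pub fn aggregate(values: &[Option<f64>], method: Aggregation) -> Option<f64> {
    let mut present: Vec<f64> = values.iter().flatten().copied().collect();
    if present.is_empty() {
        return None;
    }
    present.sort_by(|a, b| a.total_cmp(b));
    let n = present.len();
    Some(match method {
        Aggregation::Mean => present.iter().sum::<f64>() / n as f64,
        Aggregation::Median if n % 2 == 1 => present[n / 2],
        Aggregation::Median => (present[n / 2 - 1] + present[n / 2]) / 2.0,
        Aggregation::Max => present[n - 1],
        Aggregation::Min => present[0],
    })
}

/// Group probe rows by gene and aggregate each sample column.
/// `probe_to_gene` plays the role of the platform annotation. Probes
/// without a mapping are dropped (an assumption), and every row is
/// expected to hold one value per sample.
pub fn collapse_probes(
    rows: &[(&str, Vec<Option<f64>>)],
    probe_to_gene: &BTreeMap<&str, &str>,
    method: Aggregation,
) -> BTreeMap<String, Vec<Option<f64>>> {
    let mut by_gene: BTreeMap<&str, Vec<&Vec<Option<f64>>>> = BTreeMap::new();
    for (probe, values) in rows {
        if let Some(gene) = probe_to_gene.get(probe) {
            by_gene.entry(*gene).or_default().push(values);
        }
    }
    by_gene
        .into_iter()
        .map(|(gene, probe_rows)| {
            let n_samples = probe_rows[0].len();
            let collapsed: Vec<Option<f64>> = (0..n_samples)
                .map(|s| {
                    let column: Vec<Option<f64>> =
                        probe_rows.iter().map(|r| r[s]).collect();
                    aggregate(&column, method)
                })
                .collect();
            (gene.to_string(), collapsed)
        })
        .collect()
}
```

Under these assumptions, two probes for the same gene with values `Some(2.0)` and `Some(4.0)` in a sample collapse to `Some(3.0)` with `Aggregation::Mean`, and a sample where both probes are null stays null for that gene.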

Normalization

All normalization methods are explicit and composable, with no hidden defaults.

// Log2 transformation: log2(x+1)
let log2 = Normalize::log2(&matrix)?;

// Z-score per gene: (x - mean) / std
let zscore = Normalize::z_score_per_gene(&matrix)?;

// Quantile normalization
let quantile = Normalize::quantile(&matrix)?;

// Compose: log2 then z-score
let composed = Normalize::z_score_per_gene(&Normalize::log2(&matrix)?)?;
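
To make the formulas in the comments above concrete, here is a minimal standalone sketch of log2(x+1) and per-gene z-scoring over nullable values. It is not the crate's code: the `Matrix` alias (rows are genes, columns are samples) and the function names are hypothetical. The use of the sample standard deviation (n - 1 denominator) and the handling of constant rows are assumptions, since the text above does not specify them.

```rust
/// Simplified stand-in for an expression matrix: one row per gene,
/// one nullable value per sample. Not the crate's Arrow-backed type.
pub type Matrix = Vec<Vec<Option<f64>>>;

/// log2(x + 1), applied element-wise; nulls stay null.
pub fn log2_plus_one(matrix: &[Vec<Option<f64>>]) -> Matrix {
    matrix
        .iter()
        .map(|row| {
            row.iter()
                .map(|v| v.map(|x| (x + 1.0).log2()))
                .collect::<Vec<_>>()
        })
        .collect()
}

/// Z-score per gene (row): (x - mean) / std over the non-null values.
/// Assumptions: sample standard deviation (n - 1 denominator); a row with
/// fewer than two values, or with exactly zero variance, becomes all null.
pub fn z_score_per_gene(matrix: &[Vec<Option<f64>>]) -> Matrix {
    matrix
        .iter()
        .map(|row| -> Vec<Option<f64>> {
            let present: Vec<f64> = row.iter().flatten().copied().collect();
            if present.len() < 2 {
                return vec![None; row.len()];
            }
            let n = present.len() as f64;
            let mean = present.iter().sum::<f64>() / n;
            let var = present.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
            let std = var.sqrt();
            if std == 0.0 {
                return vec![None; row.len()];
            }
            // Nulls pass through; positions and ordering are unchanged.
            row.iter().map(|v| v.map(|x| (x - mean) / std)).collect()
        })
        .collect()
}
```

Both functions borrow their input and return a new matrix, so they compose in the same way as the calls above: `z_score_per_gene(&log2_plus_one(&m))`.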

Properties:

  • Null values propagate unchanged through all transformations
  • Methods return new matrices; original is unmodified
  • All transformations preserve gene and sample ordering
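
Quantile normalization is named above without further detail, so the following is a sketch of the standard textbook procedure rather than a description of what the crate does. Each sample's values are ranked, the values at each rank are averaged across samples to form a reference distribution, and each value is replaced by the reference value for its rank. The function name is hypothetical, and the sketch makes simplifying assumptions: a dense matrix with no nulls, rows of equal length, and ties ranked by order of appearance rather than averaged.

```rust
/// Quantile-normalize a dense genes x samples matrix (rows are genes).
/// Simplifications: no nulls, all rows the same length, and ties ranked
/// by order of appearance rather than averaged.
pub fn quantile_normalize(matrix: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let n_genes = matrix.len();
    let n_samples = matrix.first().map_or(0, |row| row.len());
    if n_genes == 0 || n_samples == 0 {
        return matrix.to_vec();
    }

    // For each sample, the gene indices ordered by ascending value.
    // `sort_by` is stable, so tied values keep their original order.
    let order: Vec<Vec<usize>> = (0..n_samples)
        .map(|s| {
            let mut idx: Vec<usize> = (0..n_genes).collect();
            idx.sort_by(|&a, &b| matrix[a][s].total_cmp(&matrix[b][s]));
            idx
        })
        .collect();

    // Reference distribution: mean across samples of the value at each rank.
    let reference: Vec<f64> = (0..n_genes)
        .map(|rank| {
            let total: f64 = (0..n_samples).map(|s| matrix[order[s][rank]][s]).sum();
            total / n_samples as f64
        })
        .collect();

    // Give each value the reference value for its rank within its sample.
    // Gene and sample positions in the output match the input.
    let mut out = vec![vec![0.0; n_samples]; n_genes];
    for (s, sample_order) in order.iter().enumerate() {
        for (rank, &gene) in sample_order.iter().enumerate() {
            out[gene][s] = reference[rank];
        }
    }
    out
}
```

After this transformation every sample column contains the same set of values (the reference distribution), only assigned to different genes. Extending the sketch to nullable values would require a decision about how columns with different numbers of non-null entries share a reference distribution, which is left open here.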

License

Licensed under MIT OR Apache-2.0.