Crate infer_sex

A high-performance, zero-dependency Rust library for inferring genetic sex from summarized variant data.

The algorithm consumes an iterator of VariantInfo structs in a single pass, counting valid and heterozygous observations across autosomes and sex chromosomes. Metrics are normalized by the platform-level “attempted” locus counts that the caller supplies via PlatformDefinition, which keeps them comparable across genotyping platforms of different marker density and across samples of different call quality.
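
The single-pass counting and normalization described above can be illustrated with a small self-contained sketch. It is not the crate's implementation: `Chrom`, `Obs`, `Counts`, `tally`, and `relative_y_call_rate` are hypothetical names introduced here, PAR handling is omitted, and the ratio shown is only one plausible normalized metric.

```rust
// Hypothetical stand-ins for illustration; these are NOT the crate's types.
#[derive(Clone, Copy, PartialEq)]
enum Chrom { Autosome, X, Y }

struct Obs { chrom: Chrom, is_heterozygous: bool }

#[derive(Default)]
struct Counts {
    auto_valid: u64,
    x_valid: u64,
    x_het: u64,
    y_valid: u64,
}

/// One pass over the stream: each observation only bumps a few counters,
/// so memory use stays constant regardless of stream length.
fn tally<I: IntoIterator<Item = Obs>>(stream: I) -> Counts {
    let mut c = Counts::default();
    for o in stream {
        match o.chrom {
            Chrom::Autosome => c.auto_valid += 1,
            Chrom::X => {
                c.x_valid += 1;
                if o.is_heterozygous {
                    c.x_het += 1;
                }
            }
            // Assumes the caller has already excluded PAR loci.
            Chrom::Y => c.y_valid += 1,
        }
    }
    c
}

/// Y call rate divided by autosomal call rate. Dividing by the attempted
/// counts cancels out platform density; dividing by the autosomal rate
/// cancels out overall sample quality. Returns None if a denominator is zero.
fn relative_y_call_rate(
    c: &Counts,
    n_attempted_autosomes: u64,
    n_attempted_y_nonpar: u64,
) -> Option<f64> {
    if n_attempted_autosomes == 0 || n_attempted_y_nonpar == 0 || c.auto_valid == 0 {
        return None;
    }
    let auto_rate = c.auto_valid as f64 / n_attempted_autosomes as f64;
    let y_rate = c.y_valid as f64 / n_attempted_y_nonpar as f64;
    Some(y_rate / auto_rate)
}
```

Under these assumptions, a sample whose Y loci are called about as often as its autosomal loci yields a ratio near 1, while a sample with almost no Y calls yields a ratio near 0, whatever the array size.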

§Example

use infer_sex::{
    Chromosome, DecisionThresholds, GenomeBuild, InferenceConfig, InferenceResult,
    InferredSex, PlatformDefinition, SexInferenceAccumulator, VariantInfo,
};

let config = InferenceConfig {
    build: GenomeBuild::Build38,
    platform: PlatformDefinition {
        n_attempted_autosomes: 2_000,
        n_attempted_y_nonpar: 1_000,
    },
    thresholds: Some(DecisionThresholds::default()),
};

let mut acc = SexInferenceAccumulator::new(config);
let variants = vec![
    // Autosomal signal for normalization.
    VariantInfo { chrom: Chromosome::Autosome, pos: 1_000_000, is_heterozygous: true },
    VariantInfo { chrom: Chromosome::Autosome, pos: 2_000_000, is_heterozygous: false },
    // X non-PAR heterozygosity (diploid X implies female).
    VariantInfo { chrom: Chromosome::X, pos: 10_000_000, is_heterozygous: true },
    VariantInfo { chrom: Chromosome::X, pos: 20_000_000, is_heterozygous: true },
];

for v in &variants {
    acc.process_variant(v);
}

let result: InferenceResult = acc.finish().expect("valid platform counts");
assert_eq!(result.final_call, InferredSex::Female);
println!("Report: {:?}", result.report);

The library returns InferredSex::Male or InferredSex::Female when the evidence supports a call, and InferredSex::Indeterminate when no sex-chromosome evidence is observed. If you do not supply DecisionThresholds, a built-in default heuristic derives the call; the underlying metrics remain available in the report for custom downstream logic.
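
As a rough illustration of custom downstream logic, the sketch below derives a three-way call from a handful of counts. Everything in it is an assumption made for the example: `Metrics`, `Cutoffs`, `Call`, and `custom_call` are hypothetical, the fields do not mirror EvidenceReport, and the cutoff values are left to the caller rather than recommended here.

```rust
// Hypothetical types for illustration; they do not mirror the crate's report.
#[derive(Debug, PartialEq)]
enum Call { Male, Female, Indeterminate }

struct Metrics { x_valid: u64, x_het: u64, y_valid: u64 }

/// Caller-chosen cutoffs; no particular values are recommended here.
struct Cutoffs {
    min_x_het_ratio_female: f64,
    min_y_valid_male: u64,
}

fn custom_call(m: &Metrics, c: &Cutoffs) -> Call {
    // No sex-chromosome evidence at all: refuse to call.
    if m.x_valid == 0 && m.y_valid == 0 {
        return Call::Indeterminate;
    }
    let x_het_ratio = if m.x_valid > 0 {
        m.x_het as f64 / m.x_valid as f64
    } else {
        0.0
    };
    let looks_female = x_het_ratio >= c.min_x_het_ratio_female;
    let looks_male = m.y_valid >= c.min_y_valid_male;
    // Only call when exactly one line of evidence fires; conflicting or
    // absent signals stay indeterminate.
    match (looks_male, looks_female) {
        (true, false) => Call::Male,
        (false, true) => Call::Female,
        _ => Call::Indeterminate,
    }
}
```

Treating conflicting signals as indeterminate is a deliberately conservative choice in this sketch; a real pipeline might instead flag such samples for review.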

§Platform definitions (n_attempted_*)

The attempted locus counts must describe exactly the set of loci that will be streamed into SexInferenceAccumulator::process_variant. A common pattern is to pre-scan a BIM (or similar variant manifest) file:

use infer_sex::PlatformDefinition;

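// Minimal row type for this example; a real BIM row carries more columns.
struct BimRow { chrom: String, pos: u64 }

fn derive_platform_from_bim(rows: impl Iterator<Item = BimRow>) -> PlatformDefinition {
    // Placeholder: replace with a PAR check for your genome build.
    fn is_in_y_par(_pos: u64) -> bool { unimplemented!("project-specific PAR check") }

    let mut auto = 0u64;
    let mut y_nonpar = 0u64;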
    for row in rows {
        match row.chrom.as_str() {
            "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" | "10" | "11" | "12"
            | "13" | "14" | "15" | "16" | "17" | "18" | "19" | "20" | "21" | "22" => {
                auto += 1;
            }
            "Y" => {
                if !is_in_y_par(row.pos) {
                    y_nonpar += 1;
                }
            }
            _ => {}
        }
    }
    PlatformDefinition {
        n_attempted_autosomes: auto,
        n_attempted_y_nonpar: y_nonpar,
    }
}
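
The PAR check is left unimplemented in the example above. One possible way to fill it in is sketched here, using the Y pseudoautosomal boundaries commonly published for GRCh37 and GRCh38; verify them against your own reference before relying on them. The `Build` enum and the two-argument `is_in_y_par` signature are assumptions for this sketch and differ from the one-argument placeholder above.

```rust
/// Hypothetical build selector; the crate's own GenomeBuild may differ.
#[derive(Clone, Copy)]
enum Build { Grch37, Grch38 }

/// Inclusive, 1-based (start, end) ranges for PAR1 and PAR2 on chromosome Y.
/// Values are the commonly cited boundaries; check them for your reference.
fn y_par_ranges(build: Build) -> [(u64, u64); 2] {
    match build {
        Build::Grch37 => [(10_001, 2_649_520), (59_034_050, 59_363_566)],
        Build::Grch38 => [(10_001, 2_781_479), (56_887_903, 57_217_415)],
    }
}

/// True when `pos` falls inside either pseudoautosomal region of Y.
fn is_in_y_par(build: Build, pos: u64) -> bool {
    y_par_ranges(build)
        .iter()
        .any(|&(start, end)| (start..=end).contains(&pos))
}
```

Keeping the ranges in one table per build makes it harder to mix coordinates from different assemblies when counting attempted Y loci.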

The variant stream passed to SexInferenceAccumulator must be derived from the same locus set. If you down-sample autosomes for speed, n_attempted_autosomes must be computed from the down-sampled set, not from the full platform.
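
A simple way to keep the two in step is to route both the attempted-locus count and the variant stream through one deterministic filter. The sketch below assumes a hypothetical `Locus` record and a position-stride rule; neither is part of the crate, and any deterministic rule would serve equally well.

```rust
/// Hypothetical locus record for illustration.
#[derive(Clone, Copy)]
struct Locus { is_autosome: bool, pos: u64 }

/// Deterministic keep rule shared by counting and streaming.
/// Non-autosomal loci are always kept; autosomal loci are kept when their
/// position is a multiple of `stride`. A stride of 0 or 1 disables sampling.
fn keep(locus: &Locus, stride: u64) -> bool {
    !locus.is_autosome || stride <= 1 || locus.pos % stride == 0
}

/// Attempted autosomal count, computed from the filtered manifest.
fn attempted_autosomes(manifest: &[Locus], stride: u64) -> u64 {
    manifest
        .iter()
        .filter(|l| l.is_autosome && keep(l, stride))
        .count() as u64
}

/// The stream to feed into the accumulator, filtered by the same rule.
fn downsampled<'a>(
    manifest: &'a [Locus],
    stride: u64,
) -> impl Iterator<Item = &'a Locus> + 'a {
    manifest.iter().filter(move |l| keep(l, stride))
}
```

Because both functions call `keep`, the denominator and the streamed loci cannot drift apart when the stride changes.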

Structs§

AlgorithmConstants
DecisionThresholds
EvidenceReport
InferenceConfig
InferenceResult
PlatformDefinition
SexInferenceAccumulator
VariantInfo

Enums§

Chromosome
GenomeBuild
InferenceError
InferredSex