rsomics-plink-missing
Per-sample and per-variant genotype missingness from a PLINK1 binary
fileset — a Rust reimplementation of plink --missing.
It writes the same two reports PLINK does:
| file | one row per | columns |
|---|---|---|
| .imiss | sample | FID IID MISS_PHENO N_MISS N_GENO F_MISS |
| .lmiss | variant | CHR SNP N_MISS N_GENO F_MISS |
A genotype is missing when its packed 2-bit code is 01. F_MISS is
N_MISS / N_GENO; N_GENO is the variant count in .imiss and the sample
count in .lmiss. MISS_PHENO is Y when the .fam phenotype is unset
(-9 or 0), else N.
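The counting rule above can be sketched in Rust as follows. This is a minimal illustration, not the crate's actual code: the function names (`count_missing`, `f_miss`, `miss_pheno`) are hypothetical, and it assumes a variant-major .bed payload with the 3-byte header already stripped, where the first sample of each variant occupies the two low-order bits of the first byte.

```rust
/// Count missing genotypes in a variant-major PLINK1 .bed payload.
/// Hypothetical helper: `payload` is assumed to be the file content after
/// the 3-byte header. Returns (per-sample N_MISS, per-variant N_MISS).
fn count_missing(payload: &[u8], n_samples: usize, n_variants: usize) -> (Vec<u32>, Vec<u32>) {
    let mut per_sample = vec![0u32; n_samples];
    let mut per_variant = vec![0u32; n_variants];
    if n_samples == 0 {
        return (per_sample, per_variant);
    }
    // Four genotypes per byte; each variant row is padded to a whole byte.
    let bytes_per_variant = (n_samples + 3) / 4;
    assert_eq!(payload.len(), bytes_per_variant * n_variants, "truncated or oversized payload");

    for (v, row) in payload.chunks_exact(bytes_per_variant).enumerate() {
        for s in 0..n_samples {
            // Sample s lives in byte s/4 at bit offset 2*(s%4), low bits first.
            let code = (row[s / 4] >> (2 * (s % 4))) & 0b11;
            if code == 0b01 {
                per_sample[s] += 1;
                per_variant[v] += 1;
            }
        }
    }
    (per_sample, per_variant)
}

/// F_MISS = N_MISS / N_GENO.
fn f_miss(n_miss: u32, n_geno: usize) -> f64 {
    n_miss as f64 / n_geno as f64
}

/// MISS_PHENO from the raw .fam phenotype field.
fn miss_pheno(pheno: &str) -> char {
    if pheno == "-9" || pheno == "0" { 'Y' } else { 'N' }
}
```

For the .imiss report, `f_miss` would be called with the variant count as `n_geno`; for .lmiss, with the sample count. A per-sample loop is the simplest form to read; a real implementation would more likely process whole bytes or words at a time.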
Usage
# both reports to stdout (.imiss then .lmiss)
# write path/to/out.imiss and path/to/out.lmiss (matches plink --out)
# explicit component paths instead of a shared prefix
# parallelise the scan
Compatibility
Every field value is identical to PLINK 1.9 — N_MISS, N_GENO, the
%g-formatted F_MISS, MISS_PHENO, and the numeric chromosome codes (X→23,
Y→24, XY→25, MT→26). tests/compat.rs diffs both reports field-for-field
against live PLINK output when the plink binary (v1.9) is on PATH, and against
checked-in golden output otherwise.
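Two of the field conventions listed above, the chromosome codes and the %g-style F_MISS, can be sketched in Rust as follows. The names `chrom_code`, `fmt_g`, and `trim_zeros` are hypothetical and are not taken from the crate. `fmt_g` approximates C's `%g` at its default precision of 6 significant digits and assumes a finite input; passing unlisted chromosome labels through unchanged is an assumption of this sketch.

```rust
/// Map a .bim chromosome label to the numeric code listed above.
/// Labels outside that list are returned unchanged (an assumption here).
fn chrom_code(label: &str) -> String {
    match label {
        "X" => "23".to_string(),
        "Y" => "24".to_string(),
        "XY" => "25".to_string(),
        "MT" => "26".to_string(),
        other => other.to_string(),
    }
}

/// Drop trailing zeros (and a dangling '.') from a decimal string.
fn trim_zeros(s: &str) -> String {
    if s.contains('.') {
        s.trim_end_matches('0').trim_end_matches('.').to_string()
    } else {
        s.to_string()
    }
}

/// Approximate C's `%g` (6 significant digits) for a finite f64.
fn fmt_g(x: f64) -> String {
    // Round to 6 significant digits first so the exponent reflects rounding.
    let sci = format!("{:.5e}", x);
    let (mantissa, exp) = sci.split_once('e').expect("LowerExp output contains 'e'");
    let exp: i32 = exp.parse().expect("exponent is an integer");

    if exp < -4 || exp >= 6 {
        // Scientific form: signed exponent with at least two digits, as in C.
        let sign = if exp < 0 { '-' } else { '+' };
        format!("{}e{}{:02}", trim_zeros(mantissa), sign, exp.abs())
    } else {
        // Fixed form with exactly enough decimals for 6 significant digits.
        let decimals = (5 - exp) as usize;
        trim_zeros(&format!("{:.prec$}", x, prec = decimals))
    }
}
```

Under these assumptions, `fmt_g(0.4)` yields `0.4` and `fmt_g(1.0 / 3.0)` yields `0.333333`, which is the shape of value the F_MISS column carries.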
The inter-column whitespace is not always byte-identical. PLINK sizes its FID,
IID, and SNP columns from an internal id buffer whose width depends on the
.fam/.bim load order rather than on the true maximum id length, so for some
datasets its padding is one space narrower than the maximum id would imply. We
right-justify to the true maximum, which matches PLINK byte-for-byte on the
common case (the in-repo golden is byte-identical) but can differ by a space
under PLINK's buffer quirk. Whitespace-splitting parsers see identical data
either way.
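The padding rule described above, right-justifying to the true maximum id length, can be illustrated with a short Rust sketch. The function name `right_justify` is hypothetical, and the sketch deliberately ignores header widths and any minimum column width, so it shows the idea rather than the exact report layout.

```rust
/// Right-justify every id to the width of the longest id in the column.
/// Hypothetical helper; header and minimum-width handling are omitted.
fn right_justify(ids: &[&str]) -> Vec<String> {
    // True maximum id length, measured in characters.
    let width = ids.iter().map(|id| id.chars().count()).max().unwrap_or(0);
    ids.iter()
        .map(|id| format!("{:>width$}", id, width = width))
        .collect()
}
```

Because the width comes from the ids themselves, the output does not depend on load order, which is where it can diverge by a space from PLINK's buffer-derived width.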
PLINK's --missing --within <file> cluster breakdown (the N_CLST/per-cluster
.imiss/.lmiss columns) is not implemented; the unclustered whole-sample
report is the common case.
Origin
This crate is an independent Rust reimplementation of plink --missing based on:
- The published method: Chang et al. 2015 (PLINK 1.9, doi:10.1186/s13742-015-0047-8) and Purcell et al. 2007 (PLINK 1, doi:10.1086/519795).
- The public PLINK 1.9 --missing / basic-statistics documentation (https://www.cog-genomics.org/plink/1.9/basic_stats) and binary-fileset format spec (https://www.cog-genomics.org/plink/1.9/formats).
- Black-box behaviour testing against the plink1.9 binary.
No source code from the GPL upstream was used as reference during implementation. Test fixtures are independently generated.
License: MIT OR Apache-2.0. Upstream credit: PLINK 1.9 (Christopher Chang et al., GPLv3).