Crate bed_reader
bed-reader
Read and write the PLINK BED format, simply and efficiently.
Features:
- Fast multi-threaded engine.
- Supports many indexing methods. Select data by individuals (samples) and/or SNPs (variants).
- Used by Python packages PySnpTools, FaST-LMM, and PyStatGen.
- Supports PLINK 1.9.
Usage
Read all genotype data from a .bed file.
use ndarray as nd;
use bed_reader::{Bed, ReadOptions};
use bed_reader::assert_eq_nan;
let file_name = "bed_reader/tests/data/small.bed";
let mut bed = Bed::new(file_name)?;
let val = ReadOptions::builder().f64().read(&mut bed)?;
assert_eq_nan(
    &val,
    &nd::array![
        [1.0, 0.0, f64::NAN, 0.0],
        [2.0, 0.0, f64::NAN, 2.0],
        [0.0, 1.0, 2.0, 0.0]
    ],
);
Read every second individual (sample) and SNPs (variants) from 20 to 30.
use bed_reader::{Bed, ReadOptions};
use ndarray::s;
let file_name = "bed_reader/tests/data/some_missing.bed";
let mut bed = Bed::new(file_name)?;
let val = ReadOptions::builder()
    .iid_index(s![..;2])
    .sid_index(20..30)
    .f64()
    .read(&mut bed)?;
assert!(val.dim() == (50, 10));
List the first 5 individual (sample) ids, the first 5 SNP (variant) ids, and every unique chromosome. Then, read every value in chromosome 5.
use std::collections::HashSet;
use bed_reader::{Bed, ReadOptions};
use ndarray::s;
// Same 100 x 100 test file as in the previous example.
let file_name = "bed_reader/tests/data/some_missing.bed";
let mut bed = Bed::new(file_name)?;
println!("{:?}", bed.iid()?.slice(s![..5])); // Outputs ndarray: ["iid_0", "iid_1", "iid_2", "iid_3", "iid_4"]
println!("{:?}", bed.sid()?.slice(s![..5])); // Outputs ndarray: ["sid_0", "sid_1", "sid_2", "sid_3", "sid_4"]
println!("{:?}", bed.chromosome()?.iter().collect::<HashSet<_>>());
// Outputs: {"12", "10", "4", "8", "19", "21", "9", "15", "6", "16", "13", "7", "17", "18", "1", "22", "11", "2", "20", "3", "5", "14"}
// Build a boolean index that is true for every SNP on chromosome 5.
let val = ReadOptions::builder()
    .sid_index(bed.chromosome()?.map(|elem| elem == "5"))
    .f64()
    .read(&mut bed)?;
assert!(val.dim() == (100, 6));
Project Links
- Documentation
- Questions to fastlmm-dev@python.org
- Source code
- Bug Reports
- Discussion
- Project Website
Main Functions
| Function | Description |
|---|---|
| Bed::new or Bed::builder | Open a PLINK .bed file for reading genotype data and metadata. |
| ReadOptions::builder | Read genotype data. Supports indexing and options. |
| WriteOptions::builder | Write values to a file in PLINK .bed format. Supports metadata and options. |
Bed Metadata Methods
After using Bed::new or Bed::builder to open a PLINK .bed file for reading, use
these methods to see metadata.
| Method | Description |
|---|---|
| iid_count | Number of individuals (samples) |
| sid_count | Number of SNPs (variants) |
| dim | Number of individuals and SNPs |
| fid | Family id of each individual (sample) |
| iid | Individual id of each individual (sample) |
| father | Father id of each individual (sample) |
| mother | Mother id of each individual (sample) |
| sex | Sex of each individual (sample) |
| pheno | A phenotype for each individual (seldom used) |
| chromosome | Chromosome of each SNP (variant) |
| sid | SNP id of each SNP (variant) |
| cm_position | Centimorgan position of each SNP (variant) |
| bp_position | Base-pair position of each SNP (variant) |
| allele_1 | First allele of each SNP (variant) |
| allele_2 | Second allele of each SNP (variant) |
| metadata | All the metadata, returned as a Metadata struct |
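The metadata methods above correspond to the columns of the PLINK .fam file (one row per individual) and .bim file (one row per SNP). As a rough illustration of where each value comes from, the following std-only sketch parses one line of each file. The `FamRow` and `BimRow` structs and both parse functions are hypothetical names made up for this example; they are not part of bed_reader, and the field types are assumptions.

```rust
/// One row of a .fam file (hypothetical struct, not a bed_reader type).
#[derive(Debug, PartialEq)]
struct FamRow {
    fid: String,
    iid: String,
    father: String,
    mother: String,
    sex: i32,
    pheno: String,
}

/// One row of a .bim file (hypothetical struct, not a bed_reader type).
#[derive(Debug, PartialEq)]
struct BimRow {
    chromosome: String,
    sid: String,
    cm_position: f32,
    bp_position: i32,
    allele_1: String,
    allele_2: String,
}

/// Parse one whitespace-delimited .fam line with exactly six fields.
fn parse_fam_line(line: &str) -> Option<FamRow> {
    let f: Vec<&str> = line.split_whitespace().collect();
    if f.len() != 6 {
        return None;
    }
    Some(FamRow {
        fid: f[0].to_string(),
        iid: f[1].to_string(),
        father: f[2].to_string(),
        mother: f[3].to_string(),
        sex: f[4].parse().ok()?,
        pheno: f[5].to_string(),
    })
}

/// Parse one whitespace-delimited .bim line with exactly six fields.
fn parse_bim_line(line: &str) -> Option<BimRow> {
    let f: Vec<&str> = line.split_whitespace().collect();
    if f.len() != 6 {
        return None;
    }
    Some(BimRow {
        chromosome: f[0].to_string(),
        sid: f[1].to_string(),
        cm_position: f[2].parse().ok()?,
        bp_position: f[3].parse().ok()?,
        allele_1: f[4].to_string(),
        allele_2: f[5].to_string(),
    })
}
```

For instance, `parse_fam_line("fid1 iid1 0 0 1 -9")` yields a row whose `iid` is `"iid1"` and whose `sex` is `1`.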
ReadOptions
When using ReadOptions::builder to read genotype data, use these options to specify a desired numeric type, which individuals (samples) to read, which SNPs (variants) to read, etc.
| Option | Description |
|---|---|
| i8 | Read values as i8 |
| f32 | Read values as f32 |
| f64 | Read values as f64 |
| iid_index | Index of individuals (samples) to read (defaults to all) |
| sid_index | Index of SNPs (variants) to read (defaults to all) |
| f | Order of the output array, Fortran-style (default) |
| c | Order of the output array, C-style |
| is_f | Is order of the output array Fortran-style? (defaults to true) |
| missing_value | Value to use for missing values (defaults to -127 for i8 and NaN for f32 and f64) |
| count_a1 | Count the number of allele 1 (default) |
| count_a2 | Count the number of allele 2 |
| is_a1_counted | Is allele 1 counted? (defaults to true) |
| num_threads | Number of threads to use (defaults to all processors) |
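To make the count_a1, count_a2, and missing_value options more concrete, here is a minimal std-only sketch of how one byte of a PLINK 1 .bed file maps to four genotype values. It follows the published PLINK 1 binary format (two bits per sample, low-order bits first; 00 = homozygous first allele, 01 = missing, 10 = heterozygous, 11 = homozygous second allele). The function `decode_byte` is a hypothetical helper written for this example and is not the crate's implementation.

```rust
/// Decode one byte of a SNP-major PLINK 1 .bed file into four genotype values.
/// `count_a1` selects which allele is counted; `missing` is the value
/// reported for missing genotypes (for example -127 for i8).
/// Hypothetical helper for illustration, not a bed_reader function.
fn decode_byte(byte: u8, count_a1: bool, missing: i8) -> [i8; 4] {
    let mut out = [0i8; 4];
    for (slot, value) in out.iter_mut().enumerate() {
        // Samples are packed two bits each, lowest-order bits first.
        let code = (byte >> (2 * slot)) & 0b11;
        *value = match code {
            // Homozygous for allele 1: two copies of allele 1, none of allele 2.
            0b00 => if count_a1 { 2 } else { 0 },
            // Missing genotype.
            0b01 => missing,
            // Heterozygous: one copy of each allele.
            0b10 => 1,
            // 0b11, homozygous for allele 2.
            _ => if count_a1 { 0 } else { 2 },
        };
    }
    out
}
```

With the byte `0b1110_0100`, the four samples carry the codes 00, 01, 10, and 11 in that order, so counting allele 1 gives `[2, -127, 1, 0]` and counting allele 2 gives `[0, -127, 1, 2]` when `missing` is -127.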
Index Expressions
Select which individuals (samples) and SNPs (variants) to read by using these iid_index and/or sid_index expressions.
| Example | Type | Description |
|---|---|---|
| nothing | () | All |
| 2 | isize | Index position 2 |
| -1 | isize | Last index position |
| vec![0, 10, -2] | Vec<isize> | Index positions 0, 10, and 2nd from last |
| [0, 10, -2] | [isize] | Index positions 0, 10, and 2nd from last |
| ndarray::array![0, 10, -2] | ndarray::Array1<isize> | Index positions 0, 10, and 2nd from last |
| 10..20 | Range<usize> | Index positions 10 (inclusive) to 20 (exclusive). Note: Rust ranges don’t support negatives |
| ..=19 | RangeInclusive<usize> | Index positions 0 (inclusive) to 19 (inclusive). Note: Rust ranges don’t support negatives |
| any Rust ranges | Range*<usize> | Note: Rust ranges don’t support negatives |
| s![10..20;2] | ndarray::SliceInfo1 | Index positions 10 (inclusive) to 20 (exclusive) in steps of 2 |
| s![-20..-10;-2] | ndarray::SliceInfo1 | 10th from last (exclusive) to 20th from last (inclusive), in steps of -2 |
| vec![true, false, true] | Vec<bool> | Index positions 0 and 2. |
| [true, false, true] | [bool] | Index positions 0 and 2. |
| ndarray::array![true, false, true] | ndarray::Array1<bool> | Index positions 0 and 2. |
| s![true, false, true] | ndarray::SliceInfo1 | Index positions 0 and 2. |
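The table uses two conventions that are easy to misread: negative integers count from the end, and boolean indexes select the positions that hold `true`. The std-only sketch below shows one way these could be resolved against an axis of known length. `resolve_index` and `resolve_mask` are hypothetical helpers for illustration, not bed_reader functions, and the rule that a boolean index must have the same length as the axis is an assumption here.

```rust
/// Resolve one signed index against an axis of length `count`.
/// Negative values count from the end (-1 is the last position).
/// Returns `None` when the index falls outside the axis.
/// Hypothetical helper for illustration, not a bed_reader function.
fn resolve_index(index: isize, count: usize) -> Option<usize> {
    if index >= 0 {
        let pos = index as usize;
        (pos < count).then_some(pos)
    } else {
        // `unsigned_abs` avoids overflow for isize::MIN.
        let from_end = index.unsigned_abs();
        (from_end <= count).then(|| count - from_end)
    }
}

/// Turn a boolean mask into the positions whose entry is `true`.
/// Assumption: the mask must be exactly as long as the axis,
/// otherwise `None` is returned.
fn resolve_mask(mask: &[bool], count: usize) -> Option<Vec<usize>> {
    if mask.len() != count {
        return None;
    }
    Some(
        mask.iter()
            .enumerate()
            .filter_map(|(pos, &keep)| keep.then_some(pos))
            .collect(),
    )
}
```

On an axis of 100 items, `resolve_index(-2, 100)` gives `Some(98)`, which matches the "2nd from last" wording in the table, and `resolve_mask(&[true, false, true], 3)` gives `Some(vec![0, 2])`.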
Environment Variables
If ReadOptionsBuilder::num_threads or WriteOptionsBuilder::num_threads is not specified, the number of threads to use is determined by these environment variables (in order of priority):
- BED_READER_NUM_THREADS
- NUM_THREADS
If neither of these environment variables is set, all processors are used.
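The priority order described above can be sketched with the standard library alone. This is not the crate's code: `resolve_num_threads` and `num_threads_from_env` are hypothetical names, and skipping values that fail to parse as an integer is an assumption, since the text does not say how such values are handled.

```rust
use std::env;
use std::thread;

/// Pick a thread count from an explicit setting, then from the two
/// environment variables in priority order, then from the processor count.
/// `lookup` stands in for `std::env::var` so the logic is easy to test.
/// Hypothetical helper for illustration, not a bed_reader function.
fn resolve_num_threads<F>(explicit: Option<usize>, lookup: F) -> usize
where
    F: Fn(&str) -> Option<String>,
{
    if let Some(n) = explicit {
        return n;
    }
    for name in ["BED_READER_NUM_THREADS", "NUM_THREADS"] {
        // Assumption: values that do not parse as an integer are skipped.
        if let Some(n) = lookup(name).and_then(|v| v.trim().parse::<usize>().ok()) {
            return n;
        }
    }
    // Fall back to all available processors (at least one).
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}

/// Convenience wrapper that reads the real process environment.
fn num_threads_from_env(explicit: Option<usize>) -> usize {
    resolve_num_threads(explicit, |name| env::var(name).ok())
}
```

Passing the lookup as a closure keeps the priority logic independent of the process environment, so it can be exercised without setting real variables.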
Structs
Represents a PLINK .bed file that is open for reading genotype data and metadata.
Builder for Bed.
Represents the metadata from PLINK .fam and .bim files.
Builder for Metadata.
Represents options for reading genotype data from a PLINK .bed file.
Builder for ReadOptions.
Represents options for writing genotype data and metadata to a PLINK .bed file.
Builder for WriteOptions.
Enums
All errors specific to this library.
All possible errors returned by this library and the libraries it depends on.
A specification of which individuals (samples) or SNPs (variants) to read.
All Metadata fields.
Error type for WriteOptionsBuilder
Traits
A trait alias, used internally, for the values of a .bed file, namely i8, f32, f64.
A trait alias, used internally, to provide default missing values for i8, f32, f64.
Functions
True if and only if two 2-D arrays are equal, within a given tolerance and possibly treating NaNs as values.
Asserts two 2-D arrays are equal, treating NaNs as values.
Return a path to a temporary directory.
Type Definitions
Type alias for 1-D slices of NDArrays.