Crate bed_reader
§bed-reader
Read and write the PLINK BED format, simply and efficiently.
§Highlights
- Fast and multi-threaded
- Supports many indexing methods. Slice data by individuals (samples) and/or SNPs (variants).
- The Python-facing APIs for this library are used by PySnpTools, FaST-LMM, and PyStatGen.
- Supports PLINK 1.9.
- Read data locally or from the cloud, efficiently and directly.
§Install
Full version: Can read local and cloud files
cargo add bed-reader
Minimal version: Can read local files only
cargo add bed-reader --no-default-features
§Examples
Read all genotype data from a .bed file.
use ndarray as nd;
use bed_reader::{Bed, ReadOptions, assert_eq_nan, sample_bed_file};
let file_name = sample_bed_file("small.bed")?;
let mut bed = Bed::new(file_name)?;
let val = ReadOptions::builder().f64().read(&mut bed)?;
assert_eq_nan(
    &val,
    &nd::array![
        [1.0, 0.0, f64::NAN, 0.0],
        [2.0, 0.0, f64::NAN, 2.0],
        [0.0, 1.0, 2.0, 0.0]
    ],
);
Read every second individual (sample) and SNPs (variants) 20 (inclusive) to 30 (exclusive).
use ndarray::s;
use bed_reader::{Bed, ReadOptions, sample_bed_file};
let file_name = sample_bed_file("some_missing.bed")?;
let mut bed = Bed::new(file_name)?;
let val = ReadOptions::builder()
    .iid_index(s![..;2])
    .sid_index(20..30)
    .f64()
    .read(&mut bed)?;
assert!(val.dim() == (50, 10));
List the first 5 individual (sample) ids, the first 5 SNP (variant) ids, and every unique chromosome. Then, read every genomic value in chromosome 5.
use std::collections::HashSet;
use ndarray::s;
use bed_reader::{Bed, ReadOptions, sample_bed_file};
let file_name = sample_bed_file("some_missing.bed")?;
let mut bed = Bed::new(file_name)?;
println!("{:?}", bed.iid()?.slice(s![..5])); // Outputs ndarray: ["iid_0", "iid_1", "iid_2", "iid_3", "iid_4"]
println!("{:?}", bed.sid()?.slice(s![..5])); // Outputs ndarray: ["sid_0", "sid_1", "sid_2", "sid_3", "sid_4"]
println!("{:?}", bed.chromosome()?.iter().collect::<HashSet<_>>());
// Outputs (order may vary): {"12", "10", "4", "8", "19", "21", "9", "15", "6", "16", "13", "7", "17", "18", "1", "22", "11", "2", "20", "3", "5", "14"}
let val = ReadOptions::builder()
    .sid_index(bed.chromosome()?.map(|elem| elem == "5"))
    .f64()
    .read(&mut bed)?;
assert!(val.dim() == (100, 6));
From the cloud: open a file and read data for one SNP (variant) at index position 2. (See “Cloud URLs and CloudFile Examples” for details on specifying a file in the cloud.)
use ndarray as nd;
use bed_reader::{assert_eq_nan, BedCloud, ReadOptions};
let url = "https://raw.githubusercontent.com/fastlmm/bed-sample-files/main/small.bed";
let mut bed_cloud = BedCloud::new(url).await?;
let val = ReadOptions::builder().sid_index(2).f64().read_cloud(&mut bed_cloud).await?;
assert_eq_nan(&val, &nd::array![[f64::NAN], [f64::NAN], [2.0]]);
§Project Links
- Installation
- Documentation
- Questions via email
- Source code
- Discussion
- Bug Reports
- Project Website
- Change Log
§Main Functions
Function | Description |
---|---|
Bed::new or Bed::builder | Open a local PLINK .bed file for reading genotype data and metadata. |
BedCloud::new, BedCloud::new_with_options, BedCloud::builder, BedCloud::builder_with_options, BedCloud::from_cloud_file, BedCloud::builder_from_cloud_file | Open a cloud PLINK .bed file for reading genotype data and metadata. |
ReadOptions::builder | Read genotype data from a local or cloud file. Supports indexing and options. |
WriteOptions::builder | Write values to a local file in PLINK .bed format. Supports metadata and options. |
§Bed Metadata Methods
After using Bed::new or Bed::builder to open a PLINK .bed file for reading, use these methods to see metadata.
Method | Description |
---|---|
iid_count | Number of individuals (samples) |
sid_count | Number of SNPs (variants) |
dim | Number of individuals and SNPs |
fid | Family id of each individual (sample) |
iid | Individual id of each individual (sample) |
father | Father id of each individual (sample) |
mother | Mother id of each individual (sample) |
sex | Sex of each individual (sample) |
pheno | A phenotype for each individual (seldom used) |
chromosome | Chromosome of each SNP (variant) |
sid | SNP Id of each SNP (variant) |
cm_position | Centimorgan position of each SNP (variant) |
bp_position | Base-pair position of each SNP (variant) |
allele_1 | First allele of each SNP (variant) |
allele_2 | Second allele of each SNP (variant) |
metadata | All the metadata, returned as a Metadata struct |
§ReadOptions
When using ReadOptions::builder to read genotype data, use these options to specify a desired numeric type, which individuals (samples) to read, which SNPs (variants) to read, etc.
Option | Description |
---|---|
i8 | Read values as i8 |
f32 | Read values as f32 |
f64 | Read values as f64 |
iid_index | Index of individuals (samples) to read (defaults to all) |
sid_index | Index of SNPs (variants) to read (defaults to all) |
f | Order of the output array, Fortran-style (default) |
c | Order of the output array, C-style |
is_f | Is order of the output array Fortran-style? (defaults to true) |
missing_value | Value to use for missing values (defaults to -127 or NaN) |
count_a1 | Count the number of allele 1 (default) |
count_a2 | Count the number of allele 2 |
is_a1_counted | Is allele 1 counted? (defaults to true) |
num_threads | Number of threads to use (defaults to all processors) |
max_concurrent_requests | Maximum number of concurrent async requests (defaults to 10) – Used by BedCloud. |
max_chunk_bytes | Maximum chunk size of async requests (defaults to 8_000_000 bytes) – Used by BedCloud. |
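To make the count_a1, count_a2, and missing_value options concrete, here is a minimal sketch of how one byte of a PLINK 1.9 .bed file maps to four genotype values. It relies on the standard PLINK 2-bit encoding (00 = homozygous for allele 1, 01 = missing, 10 = heterozygous, 11 = homozygous for allele 2, low-order bits first). The function name decode_bed_byte is hypothetical and is not part of this crate; the sketch illustrates the semantics, not the library's actual implementation.

```rust
/// Hypothetical helper (not part of bed-reader): decode the four 2-bit
/// genotype codes packed into one PLINK .bed data byte.
/// `count_a1` selects which allele is counted; `missing` is the value
/// reported for the "missing" code.
fn decode_bed_byte(byte: u8, count_a1: bool, missing: i8) -> [i8; 4] {
    let mut out = [0i8; 4];
    for (i, slot) in out.iter_mut().enumerate() {
        // Genotypes are stored low-order bits first.
        let code = (byte >> (2 * i)) & 0b11;
        *slot = match code {
            0b00 => if count_a1 { 2 } else { 0 }, // homozygous for allele 1
            0b01 => missing,                      // missing genotype
            0b10 => 1,                            // heterozygous
            _ => if count_a1 { 0 } else { 2 },    // 0b11: homozygous for allele 2
        };
    }
    out
}

fn main() {
    // Bit pairs from low to high: 00, 01, 10, 11.
    let byte = 0b1110_0100u8;
    println!("{:?}", decode_bed_byte(byte, true, -127)); // [2, -127, 1, 0]
    println!("{:?}", decode_bed_byte(byte, false, -127)); // [0, -127, 1, 2]
}
```

Switching from count_a1 to count_a2 swaps the 0 and 2 values and leaves heterozygous and missing entries unchanged.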
§Index Expressions
Select which individuals (samples) and SNPs (variants) to read by using these iid_index and/or sid_index expressions.
Example | Type | Description |
---|---|---|
nothing | () | All |
2 | isize | Index position 2 |
-1 | isize | Last index position |
vec![0, 10, -2] | Vec<isize> | Index positions 0, 10, and 2nd from last |
[0, 10, -2] | [isize] and [isize;n] | Index positions 0, 10, and 2nd from last |
ndarray::array![0, 10, -2] | ndarray::Array1<isize> | Index positions 0, 10, and 2nd from last |
10..20 | Range<usize> | Index positions 10 (inclusive) to 20 (exclusive). Note: Rust ranges don’t support negatives |
..=19 | RangeInclusive<usize> | Index positions 0 (inclusive) to 19 (inclusive). Note: Rust ranges don’t support negatives |
any Rust ranges | Range*<usize> | Note: Rust ranges don’t support negatives |
s![10..20;2] | ndarray::SliceInfo1 | Index positions 10 (inclusive) to 20 (exclusive) in steps of 2 |
s![-20..-10;-2] | ndarray::SliceInfo1 | 10th from last (exclusive) to 20th from last (inclusive), in steps of -2 |
vec![true, false, true] | Vec<bool> | Index positions 0 and 2. |
[true, false, true] | [bool] and [bool;n] | Index positions 0 and 2. |
ndarray::array![true, false, true] | ndarray::Array1<bool> | Index positions 0 and 2. |
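The negative-index and boolean-mask rows in the table above can be read as follows. This is a standard-library-only sketch of the semantics, assuming negative positions count back from the end as the table describes. The helper names resolve_index and mask_to_positions are hypothetical and do not appear in the crate.

```rust
/// Hypothetical helper: resolve a possibly negative index against `count`.
/// Returns None when the index falls outside 0..count.
fn resolve_index(index: isize, count: usize) -> Option<usize> {
    let count_i = count as isize;
    // Negative indices count back from the end: -1 is the last position.
    let pos = if index < 0 { index + count_i } else { index };
    if (0..count_i).contains(&pos) {
        Some(pos as usize)
    } else {
        None
    }
}

/// Hypothetical helper: positions selected by a boolean mask.
fn mask_to_positions(mask: &[bool]) -> Vec<usize> {
    mask.iter()
        .enumerate()
        .filter_map(|(i, &keep)| keep.then_some(i))
        .collect()
}

fn main() {
    let count = 100;
    let picked: Vec<Option<usize>> = [0, 10, -2]
        .iter()
        .map(|&i| resolve_index(i, count))
        .collect();
    println!("{:?}", picked); // [Some(0), Some(10), Some(98)]
    println!("{:?}", mask_to_positions(&[true, false, true])); // [0, 2]
}
```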
§Environment Variables
If ReadOptionsBuilder::num_threads or WriteOptionsBuilder::num_threads is not specified, the number of threads to use is determined by these environment variables (in order of priority):
- BED_READER_NUM_THREADS
- NUM_THREADS
If neither of these environment variables is set, all processors are used.
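The priority order described above can be sketched with the standard library alone. The function resolve_num_threads and its lookup parameter are hypothetical names used for illustration. Treating an unparsable value as "not set" is an assumption of this sketch, not documented behavior of the crate.

```rust
use std::env;
use std::thread;

/// Hypothetical helper: pick a thread count. An explicit setting wins, then
/// the two environment variables in priority order, then all processors.
/// `lookup` abstracts environment access so the logic is easy to test.
fn resolve_num_threads(
    explicit: Option<usize>,
    lookup: impl Fn(&str) -> Option<String>,
) -> usize {
    if let Some(n) = explicit {
        return n;
    }
    for name in ["BED_READER_NUM_THREADS", "NUM_THREADS"] {
        // Assumption: a value that does not parse as usize is skipped.
        if let Some(n) = lookup(name).and_then(|v| v.trim().parse::<usize>().ok()) {
            return n;
        }
    }
    // Fall back to the number of available processors (1 if unknown).
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}

fn main() {
    let n = resolve_num_threads(None, |name| env::var(name).ok());
    println!("threads: {n}");
}
```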
BED_READER_DATA_DIR
Any requested sample file will be downloaded to this directory. If the environment variable is not set, a cache folder, appropriate to the OS, will be used.
Macros§
- Asserts that a result is an error and that the error is of a given variant.
Structs§
- Represents a PLINK .bed file that is open for reading genotype data and metadata.
- Builder for Bed.
- Represents a PLINK .bed file in the cloud that is open for reading genotype data and metadata.
- Builder for BedCloud.
- The main struct representing the location of a file in the cloud.
- Represents the metadata from PLINK .fam and .bim files.
- Builder for Metadata.
- Represents options for reading genotype data from a PLINK .bed file.
- Builder for ReadOptions.
- Represents options for writing genotype data and metadata to a PLINK .bed file.
- Builder for WriteOptions.
Enums§
- All errors specific to this library.
- All possible errors returned by this library and the libraries it depends on.
- The error type for CloudFile methods.
- A specification of which individuals (samples) or SNPs (variants) to read.
- All Metadata fields.
- Error type for WriteOptionsBuilder
Constants§
- An empty set of cloud options
Traits§
- A trait alias, used internally, for the values of a .bed file, namely i8, f32, f64.
- A trait alias, used internally, to provide default missing values for i8, f32, f64.
Functions§
- True if and only if two 2-D arrays are equal, within a given tolerance and possibly treating NaNs as values.
- Asserts two 2-D arrays are equal, treating NaNs as values.
- Returns the local path to a sample .bed file. If necessary, the file will be downloaded.
- Returns the cloud location of a sample .bed file as a URL string.
- Returns the local path to a sample file. If necessary, the file will be downloaded.
- Returns the local paths to a list of files. If necessary, the files will be downloaded.
- Returns the cloud location of a sample file as a URL string.
- Returns the cloud locations of a list of files as URL strings.
Type Aliases§
- Type alias for 1-D slices of NDArrays.