Module shuffle


Read and write pseudo-shuffled SAF format.

The aim of the format is not to produce a truly random shuffle of the sites in the input SAF file(s), but to move them around enough to break up blocks of linkage disequilibrium. This must be done in constant memory, i.e. while streaming through the input.

In overview, the strategy is to pre-allocate a file of the correct size, which must be known up front, and then “split” this file into B blocks. Consecutive writes then go into consecutive blocks, so that sites adjacent in the input end up spread across the file.
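
To make the block strategy concrete, the following is a minimal sketch of how the position of each written site could be computed. It is not taken from the winsfs implementation: the function name `shuffled_position` is hypothetical, positions are counted in sites rather than bytes, and it assumes that writes are assigned to blocks round-robin and that any remainder sites go to the first blocks.

```rust
/// Position (in sites, not bytes) at which the `i`-th written site would land,
/// assuming round-robin assignment of writes to `blocks` contiguous blocks.
///
/// Hypothetical helper for illustration only; not part of `winsfs_core`.
fn shuffled_position(i: usize, sites: usize, blocks: usize) -> usize {
    assert!(blocks > 0, "need at least one block");
    assert!(i < sites, "site index out of range");

    // Consecutive writes go to consecutive blocks.
    let block = i % blocks;
    // Number of earlier writes that already went into this block.
    let within = i / blocks;

    // Assumption: when `sites` is not divisible by `blocks`,
    // the first `sites % blocks` blocks each hold one extra site.
    let base = sites / blocks;
    let extra = sites % blocks;
    let block_start = block * base + block.min(extra);

    block_start + within
}
```

Under these assumptions, with 10 sites and 3 blocks, input sites 0, 1, 2 land at positions 0, 4, 7, so neighbours in the input are separated by roughly one block length in the output. Every position is filled exactly once, which is why the total number of sites must be known before writing begins.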

§Examples

§Write

use winsfs_core::io::shuffle::{Header, Writer};

// These must be known up front; for a single SAF, the number of sites can be read from the index.
let sites = 100_000; // Number of sites in the input.
let shape = vec![9]; // Number of values per site (per population, just one here).
let blocks = 20;     // Number of blocks to use for pseudo-shuffle.
let header = Header::new(sites, shape, blocks);

let mut writer = Writer::create("/path/to/saf.shuf", header)?;

// Get sites from somewhere and write. The number of values must match the shape.
let site = vec![0.; 9];
for _ in 0..sites {
    writer.write_site(site.as_slice())?;
}

// The writer expects exactly as many sites as given in the header:
// any more will return an error from `write_site`, any fewer will panic
// in the drop check; use `try_finish` to check for this error instead.
writer.try_finish()?;

Structs§

Header
The header for a pseudo-shuffled SAF file.
Reader
A pseudo-shuffled SAF file reader.
Writer
A pseudo-shuffled SAF file writer.

Constants§

MAGIC_NUMBER
The magic number written as the first 8 bytes of a pseudo-shuffled SAF file.