Skip to main content

ParSortIters

Struct ParSortIters 

Source
pub struct ParSortIters { /* private fields */ }
Expand description

Takes a sequence of iterators of (labelled)pairs as input, and turns them into SplitIters structure which is suitable for BvCompConfig::par_comp_lenders.

Note that batches will be memory-mapped. If you encounter OS-level errors using this class (e.g., ENOMEM: Out of memory under Linux), please review the limitations of your OS regarding memory-mapping (e.g., /proc/sys/vm/max_map_count under Linux).

§Examples

In this example we transpose a graph in parallel by splitting it, exchanging the source and destination of each arc, sorting the resulting pairs in parallel using ParSortIters, and then compressing the result using BvCompConfig::par_comp_lenders:

use std::num::NonZeroUsize;

use dsi_bitstream::traits::BE;
use rayon::prelude::*;
use webgraph::prelude::*;
use webgraph::graphs::bvgraph::{BvComp, CompFlags};
use webgraph::traits::{SequentialLabeling, SplitLabeling};
use webgraph::utils::par_sort_iters::ParSortIters;

// Build a small VecGraph
let g = VecGraph::from_arcs([
    (0, 4),
    (1, 0),
    (1, 3),
    (2, 1),
    (3, 2),
]);

let num_nodes = g.num_nodes();
let num_partitions = 2;

// Split the graph into lenders and convert each to pairs
let pairs: Vec<_> = g
    .split_iter(num_partitions)
    .into_iter()
    .map(|lender| lender.into_pairs().map(|(src, dst)| (dst, src)))
    .collect();

// Sort the pairs using ParSortIters
let pair_sorter = ParSortIters::new(num_nodes)?
    .num_partitions(NonZeroUsize::new(num_partitions).unwrap());

let sorted = pair_sorter.sort(pairs)?;

// Convert to (node, lender) pairs using From trait
let pairs: Vec<_> = sorted.into();

// Compress in parallel using par
let bvcomp_tmp_dir = tempfile::tempdir()?;
let bvcomp_out_dir = tempfile::tempdir()?;

// Use with par_comp_lenders
BvComp::with_basename(bvcomp_out_dir.path().join("graph")).
    par_comp_lenders::<BE, _>(pairs, num_nodes)?;

Implementations§

Source§

impl ParSortIters

Source

pub fn sort( &self, pairs: impl IntoIterator<Item: IntoIterator<Item = (usize, usize), IntoIter: Send + Sync> + Send + Sync, IntoIter: ExactSizeIterator + Send + Sync>, ) -> Result<SplitIters<impl IntoIterator<Item = (usize, usize), IntoIter: Send + Sync>>>

This is a convenience method for iterators that cannot fail. See try_sort.

Source

pub fn try_sort<E: Into<Error>>( &self, pairs: impl IntoIterator<Item: IntoIterator<Item = (usize, usize), IntoIter: Send + Sync> + Send + Sync, IntoIter: ExactSizeIterator + Send + Sync>, ) -> Result<SplitIters<impl IntoIterator<Item = (usize, usize), IntoIter: Send + Sync>>>

Sorts the output of the provided sequence of iterators, returning a SplitIters structure.

Source§

impl ParSortIters

Source

pub fn new(num_nodes: usize) -> Result<Self>

Creates a new ParSortIters instance.

The methods num_partitions (which sets the number of iterators in the resulting SplitIters), memory_usage, and expected_num_pairs can be used to customize the instance.

This method will return an error if the number of CPUs returned by num_cpus::get() is zero.

Source

pub fn expected_num_pairs(self, expected_num_pairs: usize) -> Self

Approximate number of pairs to be sorted.

Used only for progress reporting.

Source

pub fn num_partitions(self, num_partitions: NonZeroUsize) -> Self

How many partitions to split the nodes into.

This is the number of iterators in the resulting SplitIters.

Defaults to num_cpus::get().

Source

pub fn memory_usage(self, memory_usage: MemoryUsage) -> Self

How much memory to use for in-memory sorts.

Larger values yield faster merges (by reducing logarithmically the number of batches to merge) but consume linearly more memory. We suggest to set this parameter as large as possible, depending on the available memory. The default is the default of MemoryUsage.

Source

pub fn sort_labeled<C: BatchCodec, P: IntoIterator<Item: IntoIterator<Item = ((usize, usize), C::Label), IntoIter: Send> + Send, IntoIter: ExactSizeIterator>>( &self, batch_codec: C, pairs: P, ) -> Result<SplitIters<impl IntoIterator<Item = ((usize, usize), C::Label), IntoIter: Send + Sync> + use<C, P>>>

See try_sort_labeled.

This is a convenience method for iterators that cannot fail.

Source

pub fn try_sort_labeled<C: BatchCodec, E: Into<Error>, P: IntoIterator<Item: IntoIterator<Item = ((usize, usize), C::Label), IntoIter: Send> + Send, IntoIter: ExactSizeIterator>>( &self, batch_codec: C, pairs: P, ) -> Result<SplitIters<impl IntoIterator<Item = ((usize, usize), C::Label), IntoIter: Send + Sync> + use<C, E, P>>>

Sorts the output of the provided sequence of iterators of (labelled) pairs, returning a SplitIters structure.

This method accept as type parameter a BitSerializer and a BitDeserializer that are used to serialize and deserialize the labels.

The bit deserializer must be Clone because we need one for each BatchIterator, and there are possible scenarios in which the deserializer might be stateful.

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> CastableFrom<T> for T

Source§

fn cast_from(value: T) -> T

Call Self as W
Source§

impl<T, U> CastableInto<U> for T
where U: CastableFrom<T>,

Source§

fn cast(self) -> U

Call W::cast_from(self)
Source§

impl<T> DowncastableFrom<T> for T

Source§

fn downcast_from(value: T) -> T

Truncate the current UnsignedInt to a possibly smaller size
Source§

impl<T, U> DowncastableInto<U> for T
where U: DowncastableFrom<T>,

Source§

fn downcast(self) -> U

Call W::downcast_from(self)
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

impl<T> Pointable for T

Source§

const ALIGN: usize

The alignment of pointer.
Source§

type Init = T

The type for initializers.
Source§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer. Read more
Source§

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer. Read more
Source§

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer. Read more
Source§

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer. Read more
Source§

impl<T> Splat<T> for T

Source§

fn splat(value: T) -> T

Source§

impl<T> To<T> for T

Source§

fn to(self) -> T

Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<T> UpcastableFrom<T> for T

Source§

fn upcast_from(value: T) -> T

Extend the current UnsignedInt to a possibly bigger size.
Source§

impl<T, U> UpcastableInto<U> for T
where U: UpcastableFrom<T>,

Source§

fn upcast(self) -> U

Call W::upcast_from(self)
Source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

Source§

fn vzip(self) -> V