pub struct ParSortIters { /* private fields */ }Expand description
Takes a sequence of iterators of (labelled)pairs as input, and turns them
into SplitIters structure which is suitable for
BvCompConfig::par_comp_lenders.
Note that batches will be memory-mapped. If you encounter OS-level errors
using this class (e.g., ENOMEM: Out of memory under Linux), please review
the limitations of your OS regarding memory-mapping (e.g.,
/proc/sys/vm/max_map_count under Linux).
§Examples
In this example we transpose a graph in parallel by splitting it, exchanging
the source and destination of each arc, sorting the resulting pairs in
parallel using ParSortIters, and then compressing the result using
BvCompConfig::par_comp_lenders:
use std::num::NonZeroUsize;
use dsi_bitstream::traits::BE;
use rayon::prelude::*;
use webgraph::prelude::*;
use webgraph::graphs::bvgraph::{BvComp, CompFlags};
use webgraph::traits::{SequentialLabeling, SplitLabeling};
use webgraph::utils::par_sort_iters::ParSortIters;
// Build a small VecGraph
let g = VecGraph::from_arcs([
(0, 4),
(1, 0),
(1, 3),
(2, 1),
(3, 2),
]);
let num_nodes = g.num_nodes();
let num_partitions = 2;
// Split the graph into lenders and convert each to pairs
let pairs: Vec<_> = g
.split_iter(num_partitions)
.into_iter()
.map(|lender| lender.into_pairs().map(|(src, dst)| (dst, src)))
.collect();
// Sort the pairs using ParSortIters
let pair_sorter = ParSortIters::new(num_nodes)?
.num_partitions(NonZeroUsize::new(num_partitions).unwrap());
let sorted = pair_sorter.sort(pairs)?;
// Convert to (node, lender) pairs using From trait
let pairs: Vec<_> = sorted.into();
// Compress in parallel using par
let bvcomp_tmp_dir = tempfile::tempdir()?;
let bvcomp_out_dir = tempfile::tempdir()?;
// Use with par_comp_lenders
BvComp::with_basename(bvcomp_out_dir.path().join("graph")).
par_comp_lenders::<BE, _>(pairs, num_nodes)?;Implementations§
Source§impl ParSortIters
impl ParSortIters
Sourcepub fn sort(
&self,
pairs: impl IntoIterator<Item: IntoIterator<Item = (usize, usize), IntoIter: Send + Sync> + Send + Sync, IntoIter: ExactSizeIterator + Send + Sync>,
) -> Result<SplitIters<impl IntoIterator<Item = (usize, usize), IntoIter: Send + Sync>>>
pub fn sort( &self, pairs: impl IntoIterator<Item: IntoIterator<Item = (usize, usize), IntoIter: Send + Sync> + Send + Sync, IntoIter: ExactSizeIterator + Send + Sync>, ) -> Result<SplitIters<impl IntoIterator<Item = (usize, usize), IntoIter: Send + Sync>>>
This is a convenience method for iterators that cannot fail.
See try_sort.
Sourcepub fn try_sort<E: Into<Error>>(
&self,
pairs: impl IntoIterator<Item: IntoIterator<Item = (usize, usize), IntoIter: Send + Sync> + Send + Sync, IntoIter: ExactSizeIterator + Send + Sync>,
) -> Result<SplitIters<impl IntoIterator<Item = (usize, usize), IntoIter: Send + Sync>>>
pub fn try_sort<E: Into<Error>>( &self, pairs: impl IntoIterator<Item: IntoIterator<Item = (usize, usize), IntoIter: Send + Sync> + Send + Sync, IntoIter: ExactSizeIterator + Send + Sync>, ) -> Result<SplitIters<impl IntoIterator<Item = (usize, usize), IntoIter: Send + Sync>>>
Sorts the output of the provided sequence of iterators, returning a
SplitIters structure.
Source§impl ParSortIters
impl ParSortIters
Sourcepub fn new(num_nodes: usize) -> Result<Self>
pub fn new(num_nodes: usize) -> Result<Self>
Creates a new ParSortIters instance.
The methods num_partitions (which sets
the number of iterators in the resulting SplitIters),
memory_usage, and
expected_num_pairs can be used to
customize the instance.
This method will return an error if the number of CPUs
returned by num_cpus::get() is zero.
Sourcepub fn expected_num_pairs(self, expected_num_pairs: usize) -> Self
pub fn expected_num_pairs(self, expected_num_pairs: usize) -> Self
Approximate number of pairs to be sorted.
Used only for progress reporting.
Sourcepub fn num_partitions(self, num_partitions: NonZeroUsize) -> Self
pub fn num_partitions(self, num_partitions: NonZeroUsize) -> Self
How many partitions to split the nodes into.
This is the number of iterators in the resulting SplitIters.
Defaults to num_cpus::get().
Sourcepub fn memory_usage(self, memory_usage: MemoryUsage) -> Self
pub fn memory_usage(self, memory_usage: MemoryUsage) -> Self
How much memory to use for in-memory sorts.
Larger values yield faster merges (by reducing logarithmically the
number of batches to merge) but consume linearly more memory. We suggest
to set this parameter as large as possible, depending on the available
memory. The default is the default of MemoryUsage.
Sourcepub fn sort_labeled<C: BatchCodec, P: IntoIterator<Item: IntoIterator<Item = ((usize, usize), C::Label), IntoIter: Send> + Send, IntoIter: ExactSizeIterator>>(
&self,
batch_codec: C,
pairs: P,
) -> Result<SplitIters<impl IntoIterator<Item = ((usize, usize), C::Label), IntoIter: Send + Sync> + use<C, P>>>
pub fn sort_labeled<C: BatchCodec, P: IntoIterator<Item: IntoIterator<Item = ((usize, usize), C::Label), IntoIter: Send> + Send, IntoIter: ExactSizeIterator>>( &self, batch_codec: C, pairs: P, ) -> Result<SplitIters<impl IntoIterator<Item = ((usize, usize), C::Label), IntoIter: Send + Sync> + use<C, P>>>
See try_sort_labeled.
This is a convenience method for iterators that cannot fail.
Sourcepub fn try_sort_labeled<C: BatchCodec, E: Into<Error>, P: IntoIterator<Item: IntoIterator<Item = ((usize, usize), C::Label), IntoIter: Send> + Send, IntoIter: ExactSizeIterator>>(
&self,
batch_codec: C,
pairs: P,
) -> Result<SplitIters<impl IntoIterator<Item = ((usize, usize), C::Label), IntoIter: Send + Sync> + use<C, E, P>>>
pub fn try_sort_labeled<C: BatchCodec, E: Into<Error>, P: IntoIterator<Item: IntoIterator<Item = ((usize, usize), C::Label), IntoIter: Send> + Send, IntoIter: ExactSizeIterator>>( &self, batch_codec: C, pairs: P, ) -> Result<SplitIters<impl IntoIterator<Item = ((usize, usize), C::Label), IntoIter: Send + Sync> + use<C, E, P>>>
Sorts the output of the provided sequence of iterators of (labelled)
pairs, returning a SplitIters structure.
This method accept as type parameter a
BitSerializer and a
BitDeserializer that are
used to serialize and deserialize the labels.
The bit deserializer must be Clone because we need one for each
BatchIterator, and there are possible
scenarios in which the deserializer might be stateful.
Auto Trait Implementations§
impl Freeze for ParSortIters
impl RefUnwindSafe for ParSortIters
impl Send for ParSortIters
impl Sync for ParSortIters
impl Unpin for ParSortIters
impl UnsafeUnpin for ParSortIters
impl UnwindSafe for ParSortIters
Blanket Implementations§
Source§impl<T> BorrowMut<T> for Twhere
T: ?Sized,
impl<T> BorrowMut<T> for Twhere
T: ?Sized,
Source§fn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
Source§impl<T, U> CastableInto<U> for Twhere
U: CastableFrom<T>,
impl<T, U> CastableInto<U> for Twhere
U: CastableFrom<T>,
Source§impl<T> DowncastableFrom<T> for T
impl<T> DowncastableFrom<T> for T
Source§fn downcast_from(value: T) -> T
fn downcast_from(value: T) -> T
Source§impl<T, U> DowncastableInto<U> for Twhere
U: DowncastableFrom<T>,
impl<T, U> DowncastableInto<U> for Twhere
U: DowncastableFrom<T>,
Source§impl<T> IntoEither for T
impl<T> IntoEither for T
Source§fn into_either(self, into_left: bool) -> Either<Self, Self>
fn into_either(self, into_left: bool) -> Either<Self, Self>
self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise. Read moreSource§fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise. Read more