Module dataset

This module contains all the resources needed to load and examine datasets.

A Dataset is, in essence, a list of Events, each of which contains all the pertinent information about a single set of initial- and final-state particles, as well as an index and weight within the Dataset.

This crate currently supports loading Datasets from ROOT and Parquet files (see Dataset::from_root and Dataset::from_parquet). These methods require the following “branches” or “columns” to be present in the file:

| Branch Name | Data Type | Notes |
|---|---|---|
| Weight | Float32 | |
| E_Beam | Float32 | |
| Px_Beam | Float32 | |
| Py_Beam | Float32 | |
| Pz_Beam | Float32 | |
| E_FinalState | [Float32] | [recoil, daughter #1, daughter #2, …] |
| Px_FinalState | [Float32] | [recoil, daughter #1, daughter #2, …] |
| Py_FinalState | [Float32] | [recoil, daughter #1, daughter #2, …] |
| Pz_FinalState | [Float32] | [recoil, daughter #1, daughter #2, …] |
| EPS | [Float32] | [$P_\gamma \cos(\Phi)$, $P_\gamma \sin(\Phi)$, $0.0$] for linear polarization with magnitude $P_\gamma$ and angle $\Phi$ |
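
To make the layout above concrete, the following is a minimal sketch of how one row of such a file might be held in memory. The `RawRow` struct and its `n_final_state` method are hypothetical names used only for illustration and are not part of this crate; the sketch assumes that the four final-state lists of a row always share the same length and ordering.

```rust
/// Hypothetical mirror of one row of the input file (not a crate type).
/// Field names follow the branch names in the table above.
#[allow(dead_code)]
#[derive(Default, Clone)]
struct RawRow {
    weight: f32,
    e_beam: f32,
    px_beam: f32,
    py_beam: f32,
    pz_beam: f32,
    // Each list is ordered [recoil, daughter #1, daughter #2, ...].
    e_final_state: Vec<f32>,
    px_final_state: Vec<f32>,
    py_final_state: Vec<f32>,
    pz_final_state: Vec<f32>,
    // The EPS branch is optional, so it is modeled as an Option here.
    eps: Option<[f32; 3]>,
}

impl RawRow {
    /// Number of final-state particles, or `None` if the four
    /// component lists disagree in length.
    fn n_final_state(&self) -> Option<usize> {
        let n = self.e_final_state.len();
        let consistent = [
            &self.px_final_state,
            &self.py_final_state,
            &self.pz_final_state,
        ]
        .iter()
        .all(|v| v.len() == n);
        if consistent {
            Some(n)
        } else {
            None
        }
    }
}
```

A row with a recoil and two daughters would report three final-state particles, with index 0 referring to the recoil.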

The EPS branch is optional, and files without it can be loaded in one of three ways. First, if we don’t care about polarization and wish to set EPS = [0.0, 0.0, 0.0], we can do so using ReadMethod::EPS(0.0, 0.0, 0.0). Second, if a data file contains events with only one polarization, we can compute the EPS vector ourselves and use ReadMethod::EPS(x, y, z) to load the same vector for every event. Finally, to provide compatibility with the way polarization is sometimes included in AmpTools files, we can note that the beam often moves only along the $z$-axis, so the $x$ and $y$ components of its momentum are typically 0.0 anyway. We can therefore store the $x$, $y$, and $z$ components of EPS in the beam’s three-momentum and use ReadMethod::EPSInBeam to extract them. Each of these methods is passed as an input to either Dataset::from_parquet or Dataset::from_root.
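
For the single-polarization case, the EPS vector can be computed by hand from the formula in the table. The sketch below does this with a hypothetical helper, `eps_vector`, which is not part of this crate. It assumes the angle $\Phi$ is given in radians and uses `f32` to match the Float32 column type.

```rust
/// Hypothetical helper (not a crate function): builds the EPS vector
/// [P_gamma * cos(Phi), P_gamma * sin(Phi), 0.0] for linear polarization.
/// Assumes `phi` is in radians.
fn eps_vector(p_gamma: f32, phi: f32) -> [f32; 3] {
    [p_gamma * phi.cos(), p_gamma * phi.sin(), 0.0]
}
```

The three resulting components are the values one would supply as the x, y, and z of ReadMethod::EPS(x, y, z).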

There are also several methods used to split up Datasets based on their component values. The Dataset::get_selected_indices method returns a Vec<usize> of event indices corresponding to events for which some input query returns true.
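
The idea behind such a selection can be sketched in a few lines. The `Event` struct below is a reduced stand-in with only two fields, and `selected_indices` is a hypothetical free function; neither reproduces the crate's actual types or signatures.

```rust
/// Reduced stand-in for an event (not the crate's Event struct).
struct Event {
    index: usize,
    weight: f32,
}

/// Hypothetical sketch: collect the indices of all events for which
/// `query` returns true, preserving the original event order.
fn selected_indices<F>(events: &[Event], query: F) -> Vec<usize>
where
    F: Fn(&Event) -> bool,
{
    events
        .iter()
        .filter(|e| query(e))
        .map(|e| e.index)
        .collect()
}
```

For example, a query such as `|e| e.weight > 0.0` would keep only the indices of positively weighted events.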

Often, we want to use a query to divide data into many bins, so there is a method, Dataset::get_binned_indices, which bins data by a query that takes an Event and returns a Field value (rather than a bool).

This method also takes a range: (Field, Field) and a number of bins nbins: usize, and it returns a (Vec<Vec<usize>>, Vec<usize>, Vec<usize>). The elements of this tuple correspond to the event indices in each bin, the underflow bin, and the overflow bin, respectively, so no data should ever be “lost” by this operation. There is also a convenience method, Dataset::split_m, which splits the dataset by the mass of the summed four-momentum of a chosen set of daughter particles, specified by their indices.
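
The binning logic and the mass variable can be sketched as follows. Both functions are hypothetical illustrations rather than the crate's implementation. The sketch assumes equal-width bins, a half-open range in which the lower edge is included and the upper edge goes to overflow, four-momenta stored as [E, px, py, pz], and `f32` in place of Field. The invariant mass is $m = \sqrt{E^2 - p_x^2 - p_y^2 - p_z^2}$ evaluated on the summed four-momentum.

```rust
/// Hypothetical sketch: invariant mass of the sum of the four-momenta
/// selected by `indices`. Components are ordered [E, px, py, pz].
fn mass_of_sum(p4s: &[[f32; 4]], indices: &[usize]) -> f32 {
    let mut sum = [0.0f32; 4];
    for &i in indices {
        for (s, c) in sum.iter_mut().zip(p4s[i].iter()) {
            *s += *c;
        }
    }
    let m2 = sum[0] * sum[0] - sum[1] * sum[1] - sum[2] * sum[2] - sum[3] * sum[3];
    // Clamp small negative values caused by rounding before the square root.
    m2.max(0.0).sqrt()
}

/// Hypothetical sketch: sort event indices into `nbins` equal-width bins
/// over `range`, returning (bins, underflow, overflow).
/// Assumed convention: range.0 <= v < range.1 lands in a bin.
fn binned_indices(
    values: &[f32],
    range: (f32, f32),
    nbins: usize,
) -> (Vec<Vec<usize>>, Vec<usize>, Vec<usize>) {
    assert!(nbins > 0, "at least one bin is required");
    let mut bins = vec![Vec::new(); nbins];
    let mut underflow = Vec::new();
    let mut overflow = Vec::new();
    let width = (range.1 - range.0) / nbins as f32;
    for (i, &v) in values.iter().enumerate() {
        if v < range.0 {
            underflow.push(i);
        } else if v >= range.1 {
            overflow.push(i);
        } else {
            // Guard against rounding pushing a value into a bin past the last.
            let b = (((v - range.0) / width) as usize).min(nbins - 1);
            bins[b].push(i);
        }
    }
    (bins, underflow, overflow)
}
```

Every index ends up in exactly one of the three outputs, which is what keeps events from being “lost”. Computing `mass_of_sum` for each event and passing the results to `binned_indices` gives a mass-binned split in the spirit of Dataset::split_m.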

Structs

Dataset
An array of Events with some helpful methods for accessing and parsing the data they contain.
Event
The Event struct contains all the information concerning a single interaction between particles in the experiment. See the individual fields for additional information.

Enums

ReadMethod
An enum which lists various methods used to read data into Events.