Module rustitude_core::dataset
This module contains all the resources needed to load and examine datasets.
A Dataset is, in essence, a list of Events, each of which contains all the pertinent
information about a single set of initial- and final-state particles, as well as an index
and weight within the Dataset.
This crate currently supports loading Datasets from ROOT and Parquet files (see
Dataset::from_root and Dataset::from_parquet). These methods require the following
“branches” or “columns” to be present in the file:
| Branch Name | Data Type | Notes |
|---|---|---|
| Weight | Float32 | |
| E_Beam | Float32 | |
| Px_Beam | Float32 | |
| Py_Beam | Float32 | |
| Pz_Beam | Float32 | |
| E_FinalState | [Float32] | [recoil, daughter #1, daughter #2, …] |
| Px_FinalState | [Float32] | [recoil, daughter #1, daughter #2, …] |
| Py_FinalState | [Float32] | [recoil, daughter #1, daughter #2, …] |
| Pz_FinalState | [Float32] | [recoil, daughter #1, daughter #2, …] |
| EPS | [Float32] | [$P_\gamma \cos(\Phi)$, $P_\gamma \sin(\Phi)$, $0.0$] for linear polarization with magnitude $P_\gamma$ and angle $\Phi$ |
The EPS branch is optional, and files without it can be loaded in one of the
following ways. First, if we don’t care about polarization and wish to set EPS =
[0.0, 0.0, 0.0], we can do so using the method ReadMethod::EPS(0.0, 0.0, 0.0). Second, if
a data file contains events with only one polarization, we can compute the EPS vector
ourselves and use ReadMethod::EPS(x, y, z) to load the same vector for every event.
Finally, to provide compatibility with the way polarization is sometimes included in
AmpTools files, we can note that the beam is often moving only along the
$z$-axis, so the $x$ and $y$ components of its momentum are typically 0.0 anyway. We can
therefore store the $x$, $y$, and $z$ components of EPS in the beam’s three-momentum and use
ReadMethod::EPSInBeam to extract them. Each of these ReadMethod variants is passed as an
input to either Dataset::from_parquet or Dataset::from_root.
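As a minimal sketch of computing the EPS vector by hand for a single-polarization file, the formula from the table above can be written as a small function. The name eps_from_polarization is hypothetical and not part of this crate, and the angle is assumed to be given in radians.

```rust
/// Hypothetical helper (not part of rustitude_core): builds the three EPS
/// components for linear polarization with magnitude `p_gamma` and angle
/// `phi`. Assumption: `phi` is in radians.
fn eps_from_polarization(p_gamma: f32, phi: f32) -> [f32; 3] {
    [
        p_gamma * phi.cos(), // x component: P_gamma * cos(Phi)
        p_gamma * phi.sin(), // y component: P_gamma * sin(Phi)
        0.0,                 // z component is zero for linear polarization
    ]
}
```

The three returned components are the values one would then supply as x, y, and z in ReadMethod::EPS(x, y, z).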
There are also several methods used to split up Datasets based on their component
values. The Dataset::select method takes mutable access to a dataset along with a query
function which takes an Event and returns a bool. For each event, if the query
returns true, the event is removed from the original dataset and added to a new dataset
which is then returned by the select function. The Dataset::reject method does the
opposite: events for which the query returns false are removed and returned. For example,
let ds_original = Dataset::from_root("path.root", ReadMethod::Standard).unwrap();
// select and reject take mutable access, so the clones are declared `mut`.
let mut ds_a = ds_original.clone();
let mut ds_b = ds_original.clone();
// Query: is the invariant mass of the first two daughters above 1.0 GeV?
let mass_gt_1_gev = |e: &Event| -> bool {
    (e.daughter_p4s[0] + e.daughter_p4s[1]).m() > 1.0
};
let ds_a_selected = ds_a.select(mass_gt_1_gev);
let ds_b_rejected = ds_b.reject(mass_gt_1_gev);

After this, ds_a and ds_b_rejected will contain events where the four-momentum of the
first two daughter particles combined has a mass no greater than $1.0$ GeV. On the other hand,
ds_a_selected and ds_b will contain events where the opposite is true and the mass is
greater than $1.0$ GeV. The reason for this logic is two-fold. First, we might be
dealing with large datasets, so we don’t want to create copies of events if it can be
avoided. If copies are needed, they should be made explicitly with Dataset::clone.
Otherwise, we just extract the events from the dataset. The other reason is that the syntax
reads in a “correct” way. We expect let selected = data.select(condition); to put the
selected data into the selected dataset. We can then choose if we want to hold on to the
rejected data.
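The move-instead-of-copy behaviour described above can be sketched with a stand-in type. ToyDataset and its masses field are hypothetical and exist only for illustration; the crate's own Dataset stores full Events and its internals may differ.

```rust
/// Stand-in for illustration only: each "event" is reduced to a single mass.
#[derive(Clone, Debug, PartialEq)]
struct ToyDataset {
    masses: Vec<f64>,
}

impl ToyDataset {
    /// Moves every entry for which `query` returns true into a new dataset,
    /// leaving the remaining entries in `self`. No entry is copied.
    fn select(&mut self, query: impl Fn(&f64) -> bool) -> ToyDataset {
        // Take ownership of the stored entries, then partition them in one pass.
        let (selected, kept): (Vec<f64>, Vec<f64>) = std::mem::take(&mut self.masses)
            .into_iter()
            .partition(|m| query(m));
        self.masses = kept;
        ToyDataset { masses: selected }
    }

    /// Opposite of `select`: moves out the entries for which `query` is false.
    fn reject(&mut self, query: impl Fn(&f64) -> bool) -> ToyDataset {
        self.select(|m| !query(m))
    }
}
```

With this sketch, calling select on one clone and reject on another with the same query leaves the original untouched, and each clone together with its returned dataset still accounts for every entry.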
Since binning is a common operation, there is also a method Dataset::split which will bin data
by a query which takes an Event and returns a Field value (rather than a bool).
This method also takes a range: (Field, Field) and a number of bins nbins: usize, and it
returns a (Vec<Dataset>, Dataset, Dataset). The tuple’s elements correspond to the binned datasets,
the underflow bin, and the overflow bin respectively, so no data should ever be “lost” by this
operation. There is also a convenience method, Dataset::split_m, to split the dataset by
the mass of the summed four-momentum of any subset of the daughter particles, specified by
their indices.
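The binning with underflow and overflow described above might look roughly like the following. This is a sketch on plain f64 values, not the crate's implementation: the function name split_values is hypothetical, and equal-width bins with half-open edges [lo, hi) are assumptions.

```rust
/// Hypothetical sketch: distributes `values` into `nbins` equal-width bins
/// over `range`, returning (bins, underflow, overflow) so nothing is dropped.
/// Assumptions: `nbins > 0`, `range.0 < range.1`, bins are half-open [lo, hi).
fn split_values(
    values: Vec<f64>,
    range: (f64, f64),
    nbins: usize,
) -> (Vec<Vec<f64>>, Vec<f64>, Vec<f64>) {
    assert!(nbins > 0, "at least one bin is required");
    let (lo, hi) = range;
    let width = (hi - lo) / nbins as f64;
    let mut bins: Vec<Vec<f64>> = vec![Vec::new(); nbins];
    let mut underflow = Vec::new();
    let mut overflow = Vec::new();
    for v in values {
        if v < lo {
            underflow.push(v);
        } else if v >= hi {
            overflow.push(v);
        } else {
            // The clamp guards against floating-point rounding at the upper edge.
            let idx = (((v - lo) / width) as usize).min(nbins - 1);
            bins[idx].push(v);
        }
    }
    (bins, underflow, overflow)
}
```

Because every value lands in exactly one of the three outputs, the lengths of the bins, the underflow, and the overflow always sum to the length of the input.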
Structs
- Dataset: An array of Events with some helpful methods for accessing and parsing the data they contain.
- Event: The Event struct contains all the information concerning a single interaction between particles in the experiment. See the individual fields for additional information.
Enums
- ReadMethod: An enum which lists various methods used to read data into Events.