Module rustitude_core::dataset


This module contains all the resources needed to load and examine datasets.

A Dataset is, in essence, a list of Events, each of which contains all the pertinent information about a single set of initial- and final-state particles, as well as an index and weight within the Dataset.

This crate currently supports loading Datasets from ROOT and Parquet files (see Dataset::from_root and Dataset::from_parquet). These methods require the following “branches” or “columns” to be present in the file:

| Branch Name | Data Type | Notes |
|---|---|---|
| Weight | Float32 | |
| E_Beam | Float32 | |
| Px_Beam | Float32 | |
| Py_Beam | Float32 | |
| Pz_Beam | Float32 | |
| E_FinalState | [Float32] | [recoil, daughter #1, daughter #2, …] |
| Px_FinalState | [Float32] | [recoil, daughter #1, daughter #2, …] |
| Py_FinalState | [Float32] | [recoil, daughter #1, daughter #2, …] |
| Pz_FinalState | [Float32] | [recoil, daughter #1, daughter #2, …] |
| EPS | [Float32] | [$P_\gamma \cos(\Phi)$, $P_\gamma \sin(\Phi)$, $0.0$] for linear polarization with magnitude $P_\gamma$ and angle $\Phi$ |
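As a small illustration of the EPS entry, the vector for a linearly polarized beam can be computed as follows. This is a minimal sketch: the function name `linear_polarization_eps` is hypothetical and not part of the crate, and it assumes the angle $\Phi$ is given in radians, which is not stated here.

```rust
/// Build the EPS vector [P cos(Phi), P sin(Phi), 0.0] for linear polarization.
/// `p_gamma` is the polarization magnitude; `phi` is the polarization angle,
/// assumed here to be in radians (hypothetical helper, not a crate function).
fn linear_polarization_eps(p_gamma: f32, phi: f32) -> [f32; 3] {
    [p_gamma * phi.cos(), p_gamma * phi.sin(), 0.0]
}
```

A vector built this way has the three-component [Float32] layout listed for the EPS branch.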

The EPS branch is optional, and files without it can be loaded in one of three ways. First, if polarization is irrelevant and EPS = [0.0, 0.0, 0.0] is acceptable, use Dataset::from_root_unpolarized or Dataset::from_parquet_unpolarized. Second, if every event in a data file shares a single polarization, compute the EPS vector by hand and use Dataset::from_root_with_eps or Dataset::from_parquet_with_eps to assign that vector to every event. Finally, for compatibility with the way polarization is sometimes included in AmpTools files, the $x$ and $y$ components of EPS can be stored in the beam’s four-momentum: the beam usually moves only along the $z$-axis, so its own $x$ and $y$ components are typically 0.0 anyway. The methods Dataset::from_root_eps_in_beam and Dataset::from_parquet_eps_in_beam extract EPS from files written this way.
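The “EPS in beam” convention can be pictured with a short sketch. Everything here is illustrative: `BeamRecord` and `unpack_eps_from_beam` are hypothetical names, and zeroing the transverse beam components after extraction is an assumption based on the beam moving along $z$, not a description of what the crate does internally.

```rust
/// Hypothetical stand-in for the beam four-momentum as stored in a file.
#[derive(Debug, Clone, Copy, PartialEq)]
struct BeamRecord {
    e: f32,
    px: f32,
    py: f32,
    pz: f32,
}

/// Read EPS out of the beam's x and y slots and return it together with a
/// beam that moves purely along z (assumed convention, see lead-in).
fn unpack_eps_from_beam(stored: BeamRecord) -> ([f32; 3], BeamRecord) {
    // The x and y slots carry the polarization components, not momentum.
    let eps = [stored.px, stored.py, 0.0];
    // Energy and z-momentum are kept; transverse components are reset to zero.
    let beam = BeamRecord { px: 0.0, py: 0.0, ..stored };
    (eps, beam)
}
```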

There are also several methods for splitting Datasets based on their component values. The Dataset::select method takes mutable access to a dataset along with a query function which takes an Event and returns a bool. Every event for which the query returns true is removed from the original dataset and added to a new dataset, which select then returns. The Dataset::reject method does the opposite: events for which the query returns false are moved into the returned dataset. For example,

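let ds_original = Dataset::from_root("path.root").unwrap();
// select and reject take mutable access, so the working copies must be `mut`.
let mut ds_a = ds_original.clone();
let mut ds_b = ds_original.clone();
// True when the first two daughters have a combined mass above 1.0 GeV.
let mass_gt_1_gev = |e: &Event| -> bool {
    (e.daughter_p4s[0] + e.daughter_p4s[1]).m() > 1.0
};
let ds_a_selected = ds_a.select(mass_gt_1_gev);
let ds_b_rejected = ds_b.reject(mass_gt_1_gev);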

After this, ds_a and ds_b_rejected will contain the events where the combined four-momentum of the first two daughter particles has a mass of at most $1.0$ GeV. Conversely, ds_a_selected and ds_b will contain the events where the mass is greater than $1.0$ GeV. There are two reasons for this design. First, datasets may be large, so events should not be copied when it can be avoided. If copies are needed, they should be made explicitly with Dataset::clone; otherwise, events are simply moved out of the dataset. Second, the syntax reads in a “correct” way: we expect let selected = data.select(condition); to put the selected data into the selected dataset. We can then choose whether to hold on to the rejected data.
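The move semantics described above can be modeled with a toy example that does not depend on the crate. `ToyEvent` and `ToyDataset` are hypothetical stand-ins, and this is only a sketch of the described behavior, not the crate’s implementation.

```rust
/// Toy stand-in for an event: just a precomputed invariant mass in GeV.
#[derive(Debug, Clone, PartialEq)]
struct ToyEvent {
    mass: f64,
}

/// Toy stand-in for a dataset; not the crate's Dataset type.
#[derive(Debug, Clone, Default, PartialEq)]
struct ToyDataset {
    events: Vec<ToyEvent>,
}

impl ToyDataset {
    /// Move every event for which `query` is true into a new dataset,
    /// leaving the remaining events in `self`. No events are copied.
    fn select(&mut self, query: impl Fn(&ToyEvent) -> bool) -> ToyDataset {
        let (selected, kept): (Vec<_>, Vec<_>) = std::mem::take(&mut self.events)
            .into_iter()
            .partition(|e| query(e));
        self.events = kept;
        ToyDataset { events: selected }
    }

    /// Move every event for which `query` is false into a new dataset.
    fn reject(&mut self, query: impl Fn(&ToyEvent) -> bool) -> ToyDataset {
        self.select(|e| !query(e))
    }
}
```

In this model, the two datasets left after either call always hold exactly the events of the original between them, which is the property that makes an explicit clone necessary only when the original must be preserved.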

Since binning is a common operation, there is also a method Dataset::split which bins data by a query that takes an Event and returns a Field value (rather than a bool). This method also takes a range: (Field, Field) and a number of bins nbins: usize, and it returns a (Vec<Dataset>, Dataset, Dataset). The elements of this tuple are the binned datasets, the underflow bin, and the overflow bin respectively, so no data should ever be “lost” by this operation. There is also a convenience method, Dataset::split_m, which splits the dataset by the mass of the summed four-momentum of a chosen set of daughter particles, specified by their indices.
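The binning logic with underflow and overflow can be sketched independently of the crate. The function `split_by` below is hypothetical; it uses f64 as a stand-in for Field and assumes equal-width, half-open bins with the upper edge of the range counted as overflow, none of which is confirmed here for Dataset::split.

```rust
/// Sort items into `nbins` equal-width bins over [lo, hi), returning
/// (bins, underflow, overflow). Hypothetical sketch, not the crate's code.
fn split_by<T>(
    items: Vec<T>,
    variable: impl Fn(&T) -> f64,
    range: (f64, f64),
    nbins: usize,
) -> (Vec<Vec<T>>, Vec<T>, Vec<T>) {
    assert!(nbins > 0, "at least one bin is required");
    let (lo, hi) = range;
    let width = (hi - lo) / nbins as f64;
    let mut bins: Vec<Vec<T>> = (0..nbins).map(|_| Vec::new()).collect();
    let (mut underflow, mut overflow) = (Vec::new(), Vec::new());
    for item in items {
        let x = variable(&item);
        if x < lo {
            underflow.push(item);
        } else if x >= hi {
            overflow.push(item);
        } else {
            // The clamp guards against floating-point rounding near the upper edge.
            let idx = (((x - lo) / width) as usize).min(nbins - 1);
            bins[idx].push(item);
        }
    }
    (bins, underflow, overflow)
}
```

Because every item lands in exactly one of the three outputs, the total count is preserved, mirroring the guarantee that no data is lost.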

Structs

  • Dataset: An array of Events with some helpful methods for accessing and parsing the data they contain.
  • Event: The Event struct contains all the information concerning a single interaction between particles in the experiment. See the individual fields for additional information.