This module contains all the resources needed to load and examine datasets. A `Dataset` is, in essence, a list of `Event`s, each of which contains all the pertinent information about a single set of initial- and final-state particles, as well as an index and weight within the `Dataset`.

This crate currently supports loading `Dataset`s from ROOT and Parquet files (see `Dataset::from_root` and `Dataset::from_parquet`). These methods require the following “branches” or “columns” to be present in the file:
Branch Name | Data Type | Notes |
---|---|---|
Weight | Float32 | |
E_Beam | Float32 | |
Px_Beam | Float32 | |
Py_Beam | Float32 | |
Pz_Beam | Float32 | |
E_FinalState | [Float32] | [recoil, daughter #1, daughter #2, …] |
Px_FinalState | [Float32] | [recoil, daughter #1, daughter #2, …] |
Py_FinalState | [Float32] | [recoil, daughter #1, daughter #2, …] |
Pz_FinalState | [Float32] | [recoil, daughter #1, daughter #2, …] |
EPS | [Float32] | [$P_\gamma \cos(\Phi)$, $P_\gamma \sin(\Phi)$, $0.0$] for linear polarization with magnitude $P_\gamma$ and angle $\Phi$ |
The `EPS` branch is optional, and files without it can be loaded in one of the following ways. First, if we don’t care about polarization and wish to set `EPS = [0.0, 0.0, 0.0]`, we can do so using `ReadMethod::EPS(0.0, 0.0, 0.0)`. Second, if a data file contains events with only one polarization, we can compute the `EPS` vector ourselves and use `ReadMethod::EPS(x, y, z)` to load the same vector for every event. Finally, there is a method for compatibility with the way polarization is sometimes included in `AmpTools` files. The beam often moves only along the $z$-axis, so the $x$ and $y$ components of its momentum are typically `0.0` anyway. We can therefore store the $x$, $y$, and $z$ components of `EPS` in the beam’s three-momentum and use `ReadMethod::EPSInBeam` to extract them. All of these read methods are passed as an input to either `Dataset::from_parquet` or `Dataset::from_root`.
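As a rough illustration of computing the `EPS` vector by hand for a file with a single linear polarization, the sketch below follows the formula given in the table. The helper name `linear_eps` is hypothetical and not part of this crate, and it assumes the angle $\Phi$ is given in radians.

```rust
/// Builds the linear polarization vector
/// [P_gamma * cos(Phi), P_gamma * sin(Phi), 0.0].
///
/// NOTE: `linear_eps` is a hypothetical helper for illustration only.
/// `p_gamma` is the polarization magnitude and `phi` is the polarization
/// angle, assumed here to be in radians.
pub fn linear_eps(p_gamma: f32, phi: f32) -> [f32; 3] {
    [p_gamma * phi.cos(), p_gamma * phi.sin(), 0.0]
}
```

The three components returned by a helper like this are the values one would pass as `x`, `y`, and `z` to `ReadMethod::EPS(x, y, z)`.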
There are also several methods used to split up `Dataset`s based on their component values. The `Dataset::get_selected_indices` method returns a `Vec<usize>` of event indices corresponding to events for which some input query returns `true`.
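The selection idea can be sketched without the crate itself. The `ToyEvent` type and the `selected_indices` function below are hypothetical stand-ins, not the crate's `Event` or `Dataset::get_selected_indices`, and only mirror the behavior described above: keep the index of every event for which a boolean query holds.

```rust
/// Hypothetical, minimal stand-in for an event. The real `Event` struct
/// carries more information (four-momenta, polarization, and so on).
#[derive(Debug, Clone)]
pub struct ToyEvent {
    pub index: usize,
    pub weight: f32,
}

/// Returns the indices of all events for which `query` returns `true`.
/// This is an illustrative sketch, not the crate's implementation.
pub fn selected_indices<F>(events: &[ToyEvent], query: F) -> Vec<usize>
where
    F: Fn(&ToyEvent) -> bool,
{
    events
        .iter()
        .filter(|event| query(*event))
        .map(|event| event.index)
        .collect()
}
```

For example, calling `selected_indices(&events, |e| e.weight > 0.0)` would keep only the indices of events with positive weight.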
Often, we want to use a query to divide data into many bins, so there is a method, `Dataset::get_binned_indices`, which bins data by a query that takes an `Event` and returns a `Field` value (rather than a `bool`). This method also takes a `range: (Field, Field)` and a number of bins `nbins: usize`, and it returns a `(Vec<Vec<usize>>, Vec<usize>, Vec<usize>)`. The elements of this tuple correspond to the binned datasets, the underflow bin, and the overflow bin, respectively, so no data should ever be “lost” by this operation. There is also a convenience method, `Dataset::split_m`, which splits the dataset by the mass of the summed four-momentum of any of the daughter particles, specified by their index.
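To make the shape of the binned result concrete, here is a self-contained sketch of equal-width binning with underflow and overflow, plus the invariant mass of a summed four-momentum as one possible binning variable. Both functions are hypothetical illustrations and not the crate's code. The sketch assumes `f64` in place of `Field`, half-open bins (lower edge included, upper edge excluded), and four-momenta stored as `[E, px, py, pz]`; the crate may use different conventions.

```rust
/// Sorts the positions of `values` into `nbins` equal-width bins over `range`.
/// Returns (indices per bin, underflow indices, overflow indices).
/// Assumption: a value equal to the upper edge counts as overflow.
pub fn binned_indices(
    values: &[f64],
    range: (f64, f64),
    nbins: usize,
) -> (Vec<Vec<usize>>, Vec<usize>, Vec<usize>) {
    assert!(nbins > 0, "at least one bin is required");
    let (low, high) = range;
    let width = (high - low) / nbins as f64;
    let mut bins = vec![Vec::new(); nbins];
    let mut underflow = Vec::new();
    let mut overflow = Vec::new();
    for (i, &value) in values.iter().enumerate() {
        if value < low {
            underflow.push(i);
        } else if value >= high {
            overflow.push(i);
        } else {
            // The clamp guards against rounding just below the upper edge.
            let bin = (((value - low) / width) as usize).min(nbins - 1);
            bins[bin].push(i);
        }
    }
    (bins, underflow, overflow)
}

/// Invariant mass of the sum of several four-momenta `[E, px, py, pz]`.
/// A slightly negative mass squared (from rounding) is clamped to zero.
pub fn summed_mass(momenta: &[[f64; 4]]) -> f64 {
    let mut total = [0.0_f64; 4];
    for p in momenta {
        for (t, c) in total.iter_mut().zip(p.iter()) {
            *t += *c;
        }
    }
    let m2 = total[0] * total[0]
        - total[1] * total[1]
        - total[2] * total[2]
        - total[3] * total[3];
    m2.max(0.0).sqrt()
}
```

Every input position ends up in exactly one of the three returned collections, which is the sense in which no data is “lost”. A mass-based split would compute `summed_mass` for the chosen daughters of each event and pass the resulting values to `binned_indices`.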
Structs
- `Dataset`: An array of `Event`s with some helpful methods for accessing and parsing the data they contain.
- `Event`: The `Event` struct contains all the information concerning a single interaction between particles in the experiment. See the individual fields for additional information.
Enums
- `ReadMethod`: An enum which lists various methods used to read data into `Event`s.