einsum_codegen/lib.rs

#![cfg_attr(
    doc,
    feature(prelude_import, custom_inner_attributes, proc_macro_hygiene)
)]
#![cfg_attr(doc, katexit::katexit)]
//! Helper crate for the einsum algorithm
//!
//! Introduction to einsum
//! -----------------------
//! The Einstein summation rule, used in theoretical physics and related fields
//! including machine learning, is a rule for abbreviating tensor operations.
//! For example, one of the most basic tensor operations is the inner product of
//! two vectors in $n$-dimensional Euclidean space, $x, y \in \mathbb{R}^n$:
//! $$
//! (x, y) = \sum_{i \in I} x_i y_i
//! $$
//! where $I$ denotes the set of indices, i.e. $I = \\{0, 1, \ldots, n-1\\}$.
//! Another example is matrix multiplication.
//! The product of three square matrices $A, B, C \in M_n(\mathbb{R})$
//! can be written element-wise as:
//! $$
//! (ABC)_{il} = \sum_{j \in J} \sum_{k \in K} a_{ij} b_{jk} c_{kl}
//! $$
//!
//! Many such tensor operations appear in various fields,
//! and we usually define a separate function for each operation.
//! For the inner product of vectors, we may define a function like
//! ```ignore
//! fn inner(a: Array1D<R>, b: Array1D<R>) -> R;
//! ```
//! for matrix multiplication:
//! ```ignore
//! fn matmul(a: Array2D<R>, b: Array2D<R>) -> Array2D<R>;
//! ```
//! or taking three matrices:
//! ```ignore
//! fn matmul3(a: Array2D<R>, b: Array2D<R>, c: Array2D<R>) -> Array2D<R>;
//! ```
//! and so on.
//!
//! These definitions are very similar, and in fact
//! they can all be represented in a single manner.
//! Each of these computations multiplies the elements of the tensors
//! addressed by some indices, and sums them up along some of those indices.
//! So we have to determine
//!
//! - which indices are used for each tensor in the multiplication
//! - which indices are summed over
//!
//! This can be done by listing the indices of the input tensors
//! according to the Einstein summation rule, i.e. summing over every index that appears more than once.
//! For example, `inner` is represented by `i,i->`, `matmul` by `ij,jk->ik`,
//! `matmul3` by `ij,jk,kl->il`, and so on,
//! where `,` separates the indices of each input tensor
//! and each index is written as a single char like `i` or `j`.
//! `->` separates the indices of the input tensors from the indices of the output tensor.
//! If no output indices are given, as in `i,i->`, the output tensor has rank 0, i.e. it is a scalar value.
//! "einsum" is an algorithm or runtime that expands such a subscript string
//! into actual tensor operations.
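//!
//! As a concrete illustration (not part of this crate's API, with a hypothetical
//! function name), the subscript `ij,jk->ik` expands to the following explicit
//! loops over plain nested `Vec`s:
//!
//! ```ignore
//! /// `ij,jk->ik`: `j` appears in both inputs but not in the output,
//! /// so it is summed over; `i` and `k` index the output.
//! fn einsum_ij_jk_ik(a: &[Vec<f64>], b: &[Vec<f64>]) -> Vec<Vec<f64>> {
//!     let (ni, nj) = (a.len(), b.len());
//!     let nk = b[0].len();
//!     let mut out = vec![vec![0.0; nk]; ni];
//!     for i in 0..ni {
//!         for k in 0..nk {
//!             for j in 0..nj {
//!                 out[i][k] += a[i][j] * b[j][k];
//!             }
//!         }
//!     }
//!     out
//! }
//! ```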
//!
//! einsum algorithm
//! -----------------
//! We discuss an overview of the einsum algorithm to understand the structure of this crate.
//!
//! ### Factorize and memorize partial summation
//! Partial summation and memorization of its result reduce the number of floating point operations.
//! For simplicity, both an addition `+` and a multiplication `*` are counted as 1 operation,
//! and fused multiply-add (FMA) is not considered.
//! In the above `matmul3` example, there are $\\#K \times \\#J$ additions
//! and $2 \times \\#K \times \\#J$ multiplications for every pair of indices $(i, l)$,
//! where $\\#$ denotes the number of elements in an index set.
//! Assuming all index sets have the same size, denoted by $N$,
//! there are $O(N^4)$ floating point operations in total.
//!
//! When we sum up partially along `j`:
//! $$
//! \sum_{k \in K} c_{kl} \left( \sum_{j \in J} a_{ij} b_{jk} \right),
//! $$
//! and memorize its result as $d_{ik}$:
//! $$
//! \sum_{k \in K} c_{kl} d_{ik},
//! \text{where} \space d_{ik} = \sum_{j \in J} a_{ij} b_{jk},
//! $$
//! there are $O(N^3)$ operations for computing $d_{ik}$ and $O(N^3)$ for the final summation,
//! with $O(N^2)$ storage for the memorization.
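//!
//! The factored computation can be sketched with plain nested `Vec`s
//! (illustration only, not part of this crate's API; the function name is hypothetical):
//!
//! ```ignore
//! /// `matmul3` via the intermediate d_{ik} = sum_j a_{ij} b_{jk}.
//! /// Assuming all matrices are n x n, each loop nest below performs
//! /// O(n^3) operations, and `d` needs O(n^2) storage.
//! fn matmul3_factored(a: &[Vec<f64>], b: &[Vec<f64>], c: &[Vec<f64>]) -> Vec<Vec<f64>> {
//!     let n = a.len();
//!     // d_{ik} = sum_j a_{ij} b_{jk}
//!     let mut d = vec![vec![0.0; n]; n];
//!     for i in 0..n {
//!         for k in 0..n {
//!             for j in 0..n {
//!                 d[i][k] += a[i][j] * b[j][k];
//!             }
//!         }
//!     }
//!     // out_{il} = sum_k d_{ik} c_{kl}
//!     let mut out = vec![vec![0.0; n]; n];
//!     for i in 0..n {
//!         for l in 0..n {
//!             for k in 0..n {
//!                 out[i][l] += d[i][k] * c[k][l];
//!             }
//!         }
//!     }
//!     out
//! }
//! ```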
//!
//! When is this factorization possible? We know that the above `matmul3` example
//! can also be written as the associative matrix product $ABC = A(BC) = (AB)C$,
//! and partial summation along $j$ corresponds to storing $D = AB$.
//! This is not always possible. Let us consider the trace of a product of two matrices:
//! $$
//! \text{Tr} (AB) = \sum_{i \in I} \sum_{j \in J} a_{ij} b_{ji}
//! $$
//! This is written as `ij,ji->` in einsum subscript form.
//! We cannot factor either $a_{ij}$ or $b_{ji}$ out of the summation
//! since both of them contain both indices.
//! Whether this factorization is possible or not can be determined
//! from the einsum subscript alone, and we call a subscript "reducible"
//! if factorization is possible, and "irreducible" if it is not,
//! i.e. `ij,jk,kl->il` is reducible and `ij,ji->` is irreducible.
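//!
//! For the irreducible case, every term involves both indices, so the whole
//! double sum has to be evaluated directly (again a sketch on plain `Vec`s
//! with a hypothetical function name):
//!
//! ```ignore
//! /// `ij,ji->`: no partial sum over a single index can be factored out
//! /// into a smaller intermediate tensor.
//! fn trace_ab(a: &[Vec<f64>], b: &[Vec<f64>]) -> f64 {
//!     let mut acc = 0.0;
//!     for i in 0..a.len() {
//!         for j in 0..b.len() {
//!             acc += a[i][j] * b[j][i];
//!         }
//!     }
//!     acc
//! }
//! ```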
//!
//! ### Subscript representation
//!
//! To discuss subscript factorization, we have to track which tensor is
//! used as each input.
//! In the above `matmul3` example, `ij,jk,kl->il` is factorized into the sub-subscripts
//! `ij,jk->ik` and `ik,kl->il`, where `ik` in the second subscript refers to
//! the output of the first subscript. The names of the tensors
//! have been dropped from these sub-subscripts,
//! so we have to create a mechanism for managing them.
//!
//! We introduce a subscript representation of the `matmul3` case with tensor names:
//!
//! ```text
//! ij,jk,kl->il | a b c -> out
//! ```
//!
//! In this form, the factorization can be described as:
//!
//! ```text
//! ij,jk->ik | a b -> d
//! ik,kl->il | d c -> out
//! ```
//!
//! To clarify whether a tensor is given by the user or created during factorization,
//! we use `arg{N}` and `out{N}` identifiers:
//!
//! ```text
//! ij,jk->ik | arg0 arg1 -> out0
//! ik,kl->il | out0 arg2 -> out1
//! ```
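//!
//! One possible way to model these identifiers is a small enum distinguishing
//! user-supplied arguments from intermediate results; this is a minimal sketch
//! for illustration, and the actual types defined by this crate may differ:
//!
//! ```ignore
//! /// Identifier of a tensor appearing in a factorized subscript (sketch only).
//! #[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
//! enum TensorId {
//!     /// `arg{N}`: the N-th tensor given by the user
//!     Arg(usize),
//!     /// `out{N}`: the N-th intermediate tensor created during factorization
//!     Out(usize),
//! }
//!
//! // `ij,jk->ik | arg0 arg1 -> out0` would then carry
//! // inputs `[TensorId::Arg(0), TensorId::Arg(1)]` and output `TensorId::Out(0)`.
//! ```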
//!
//! ### Summation order
//!
//! This factorization is not unique.
//! For the `matmul3` case there are clearly two ways, one corresponding to $(AB)C$:
//!
//! ```text
//! ij,jk->ik | arg0 arg1 -> out0
//! ik,kl->il | out0 arg2 -> out1
//! ```
//!
//! and one to $A(BC)$:
//!
//! ```text
//! jk,kl->jl | arg1 arg2 -> out0
//! jl,ij->il | out0 arg0 -> out1
//! ```
//!
//! These are different procedures, i.e. the number of floating point operations
//! and the required intermediate memory differ,
//! but they return the same output tensor
//! (we ignore the non-associativity of floating point numbers in this document).
//! This becomes a complicated combinatorial optimization problem
//! when there are many contraction indices,
//! and the objective of this crate is to solve this problem (heuristically).
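//!
//! To see why the order matters, consider a rough cost model
//! (an illustration under the assumption that a pairwise contraction `xy,yz->xz`
//! costs about $2 \cdot \\#X \cdot \\#Y \cdot \\#Z$ operations; the function names are hypothetical):
//!
//! ```ignore
//! /// Approximate operation count of a pairwise contraction `xy,yz->xz`.
//! fn pair_cost(nx: usize, ny: usize, nz: usize) -> usize {
//!     2 * nx * ny * nz
//! }
//!
//! // matmul3 with #I = ni, #J = nj, #K = nk, #L = nl:
//! fn cost_ab_then_c(ni: usize, nj: usize, nk: usize, nl: usize) -> usize {
//!     pair_cost(ni, nj, nk) + pair_cost(ni, nk, nl) // (AB)C
//! }
//! fn cost_bc_then_a(ni: usize, nj: usize, nk: usize, nl: usize) -> usize {
//!     pair_cost(nj, nk, nl) + pair_cost(ni, nj, nl) // A(BC)
//! }
//!
//! // With (ni, nj, nk, nl) = (100, 2, 100, 2):
//! //   (AB)C costs 2*100*2*100 + 2*100*100*2 = 80_000 operations,
//! //   A(BC) costs 2*2*100*2   + 2*100*2*2   =  1_600 operations,
//! // so contracting `jk,kl->jl` first is far cheaper here.
//! ```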
//!

pub mod codegen;
pub mod parser;

mod namespace;
mod path;
mod subscripts;

pub use namespace::*;
pub use path::*;
pub use subscripts::*;