Macro axpy::axpy
```rust
macro_rules! axpy {
    [$y:ident $assign:tt $($rest:tt)+] => { ... };
    [! $y:ident $assign:tt ($($parsed:tt)*) $x:ident] => { ... };
    [! $y:ident $assign:tt ($($parsed:tt)*) + $x:ident] => { ... };
    [! $y:ident $assign:tt ($($parsed:tt)*) - $x:ident] => { ... };
    [! $y:ident $assign:tt ($($parsed:tt)*) $x:ident + $($rest:tt)+] => { ... };
    [! $y:ident $assign:tt ($($parsed:tt)*) $x:ident - $($rest:tt)+] => { ... };
    [! $y:ident $assign:tt ($($parsed:tt)*) + $x:ident + $($rest:tt)+] => { ... };
    [! $y:ident $assign:tt ($($parsed:tt)*) + $x:ident - $($rest:tt)+] => { ... };
    [! $y:ident $assign:tt ($($parsed:tt)*) - $x:ident + $($rest:tt)+] => { ... };
    [! $y:ident $assign:tt ($($parsed:tt)*) - $x:ident - $($rest:tt)+] => { ... };
    [! $y:ident $assign:tt ($($parsed:tt)*) $a:tt * $x:ident $($rest:tt)*] => { ... };
    [! $y:ident $assign:tt ($($parsed:tt)*) + $a:tt * $x:ident $($rest:tt)*] => { ... };
    [! $y:ident $assign:tt ($($parsed:tt)*) - $a:tt * $x:ident $($rest:tt)*] => { ... };
    [! $y:ident $assign:tt ($($parsed:tt)+)] => { ... };
    [@ $y:ident; $iter:expr; ] => { ... };
    [@ $y:ident; $iter:expr; $a:tt $op:tt $x:ident $($rest:tt)*] => { ... };
    [# $y:ident; $car:expr; $cdr:expr; ($($parsed:tt)+)] => { ... };
    [# $y:ident; $car:expr; $cdr:expr; ($($parsed:tt)*) $a:tt * $x:ident] => { ... };
    [# $y:ident; $car:expr; $cdr:expr; ($($parsed:tt)*) + ^ $x:ident] => { ... };
    [# $y:ident; $car:expr; $cdr:expr; ($($parsed:tt)*) - ^ $x:ident] => { ... };
    [# $y:ident; $car:expr; $cdr:expr; ($($parsed:tt)*) $a:tt * $x:ident $($rest:tt)+] => { ... };
    [# $y:ident; $car:expr; $cdr:expr; ($($parsed:tt)*) + ^ $x:ident $($rest:tt)+] => { ... };
    [# $y:ident; $car:expr; $cdr:expr; ($($parsed:tt)*) - ^ $x:ident $($rest:tt)+] => { ... };
}
```
Exposes linear combinations of slice-like objects of Copy values to LLVM's auto-vectorizer
Linear combinations of vectors don't on their own lend themselves to nice optimizations. For
example, consider `a*x + b*y + c*z`. Since the operator overloads are binary, this naively maps to
two for-loops: one for `temp = a*x + b*y` and another for `result = temp + c*z`. The classic
solution is to employ "expression templates": values representing lazy operations, to be
evaluated when an assignment statement is encountered or when otherwise useful. The C++ library
Eigen implements this approach excellently, but as anybody who has used it knows, there is a lot
of magic going on that can lead to incomprehensible error messages.
As a simple alternative, we provide a macro that converts a linear combination to a canonical
Rust representation that is amenable to LLVM's auto-vectorizer. That is, the macro converts
statements like `z = a*x + b*y + c*z` to

```rust
for (x, (y, z)) in x.iter().zip(y.iter().zip(z.iter_mut())) {
    *z = a * *x + b * *y + c * *z;
}
```
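The expansion can be checked by hand by wrapping the same loop in a free function. The function name and test data below are illustrative, not part of the crate:

```rust
// Hand-written equivalent of the loop the macro generates for
// `z = a*x + b*y + c*z`; name and signature are illustrative only.
fn lin_comb(a: f64, b: f64, c: f64, x: &[f64], y: &[f64], z: &mut [f64]) {
    for (x, (y, z)) in x.iter().zip(y.iter().zip(z.iter_mut())) {
        *z = a * *x + b * *y + c * *z;
    }
}

fn main() {
    let x = [1.0, 2.0, 3.0];
    let y = [4.0, 5.0, 6.0];
    let mut z = [2.0, 2.0, 2.0];
    lin_comb(2.0, 3.0, 0.5, &x, &y, &mut z);
    // Each element is 2*x + 3*y + 0.5*(old z).
    assert_eq!(z, [15.0, 20.0, 25.0]);
}
```

Note that `zip` stops at the shortest input, so the loop never indexes out of bounds; that is what lets the optimizer drop the bounds checks.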
If `x`, `y`, and `z` are slices, bounds checks are known to be elided, resulting in fairly
optimal code. The value of the macro is that the analogous loop is generated for any
combination-like expression, e.g. `w = 2.0 * x - z` becomes

```rust
for (x, (z, w)) in x.iter().zip(z.iter().zip(w.iter_mut())) {
    *w = 2.0 * *x - *z;
}
```
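This loop too can be exercised directly; the function name and data below are made up for illustration:

```rust
// The generated loop for `w = 2.0 * x - z`, wrapped in a function so it
// can be run against concrete data; the name is illustrative.
fn two_x_minus_z(x: &[f64], z: &[f64], w: &mut [f64]) {
    for (x, (z, w)) in x.iter().zip(z.iter().zip(w.iter_mut())) {
        *w = 2.0 * *x - *z;
    }
}

fn main() {
    let x = [1.0, 2.0, 3.0];
    let z = [0.5, 1.0, 1.5];
    let mut w = [0.0_f64; 3];
    two_x_minus_z(&x, &z, &mut w);
    assert_eq!(w, [1.5, 3.0, 4.5]);
}
```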
In addition to `=`, both `+=` and `-=` are supported. (Technically any assignment operator
works, e.g. `/=`, but that is an accident of the implementation rather than an intended feature.)
The assigned variable may appear anywhere in the constructed expression, as the macro is
designed to take appropriate care of the mutable borrow. Coefficients may be compatible scalar
literals or variables.
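Judging from the first macro arm, an invocation presumably looks like `axpy![y += a * x]`; that exact syntax is an assumption here. A hand-expanded sketch of the `+=` case, with a scalar variable as coefficient:

```rust
// Hand-expanded sketch of what an invocation like `axpy![y += a * x]`
// would presumably produce: the same zipped loop with `+=` in the body.
// The invocation syntax and function name are assumptions for illustration.
fn axpy_in_place(a: f64, x: &[f64], y: &mut [f64]) {
    for (x, y) in x.iter().zip(y.iter_mut()) {
        *y += a * *x;
    }
}

fn main() {
    let a = 3.0; // coefficient as a variable, not a literal
    let x = [1.0, 2.0];
    let mut y = [10.0, 20.0];
    axpy_in_place(a, &x, &mut y);
    assert_eq!(y, [13.0, 26.0]);
}
```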