Macro axpy::axpy

macro_rules! axpy {
    [$y:ident $assign:tt $($rest:tt)+] => { ... };
    [! $y:ident $assign:tt ($($parsed:tt)*)   $x:ident] => { ... };
    [! $y:ident $assign:tt ($($parsed:tt)*) + $x:ident] => { ... };
    [! $y:ident $assign:tt ($($parsed:tt)*) - $x:ident] => { ... };
    [! $y:ident $assign:tt ($($parsed:tt)*)   $x:ident + $($rest:tt)+] => { ... };
    [! $y:ident $assign:tt ($($parsed:tt)*)   $x:ident - $($rest:tt)+] => { ... };
    [! $y:ident $assign:tt ($($parsed:tt)*) + $x:ident + $($rest:tt)+] => { ... };
    [! $y:ident $assign:tt ($($parsed:tt)*) + $x:ident - $($rest:tt)+] => { ... };
    [! $y:ident $assign:tt ($($parsed:tt)*) - $x:ident + $($rest:tt)+] => { ... };
    [! $y:ident $assign:tt ($($parsed:tt)*) - $x:ident - $($rest:tt)+] => { ... };
    [! $y:ident $assign:tt ($($parsed:tt)*)   $a:tt * $x:ident $($rest:tt)*] => { ... };
    [! $y:ident $assign:tt ($($parsed:tt)*) + $a:tt * $x:ident $($rest:tt)*] => { ... };
    [! $y:ident $assign:tt ($($parsed:tt)*) - $a:tt * $x:ident $($rest:tt)*] => { ... };
    [! $y:ident $assign:tt ($($parsed:tt)+)] => { ... };
    [@ $y:ident; $iter:expr; ] => { ... };
    [@ $y:ident; $iter:expr; $a:tt $op:tt $x:ident $($rest:tt)*] => { ... };
    [# $y:ident; $car:expr; $cdr:expr; ($($parsed:tt)+)] => { ... };
    [# $y:ident; $car:expr; $cdr:expr; ($($parsed:tt)*) $a:tt * $x:ident] => { ... };
    [# $y:ident; $car:expr; $cdr:expr; ($($parsed:tt)*) + ^ $x:ident] => { ... };
    [# $y:ident; $car:expr; $cdr:expr; ($($parsed:tt)*) - ^ $x:ident] => { ... };
    [# $y:ident; $car:expr; $cdr:expr; ($($parsed:tt)*) $a:tt * $x:ident $($rest:tt)+] => { ... };
    [# $y:ident; $car:expr; $cdr:expr; ($($parsed:tt)*) + ^ $x:ident $($rest:tt)+] => { ... };
    [# $y:ident; $car:expr; $cdr:expr; ($($parsed:tt)*) - ^ $x:ident $($rest:tt)+] => { ... };
}

Exposes linear combinations of slice-like objects of Copy values to LLVM's auto-vectorizer

Linear combinations of vectors don't on their own lend themselves to nice optimizations. For example, consider a*x + b*y + c*z. Since the operator overloads are binary, this naively maps to two for-loops: one for temp = a*x + b*y and another for result = temp + c*z. The classic solution is to employ "expression templates", which are effectively values representing lazy operations, to be evaluated when an assignment statement is encountered or when otherwise useful. The C++ library Eigen is an excellent implementation of this approach, but as anybody who has used it knows, there is a lot of magic going on that can lead to incomprehensible error messages.
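
To make that cost concrete, here is a minimal sketch (hypothetical, not part of this crate) of the naive pairwise evaluation: each binary operation walks its inputs and materializes a temporary, so the combination is computed in several passes instead of one fused loop.

    // Hypothetical sketch of naive pairwise evaluation; the helper
    // functions stand in for binary operator overloads on vectors.
    fn scale(a: f64, x: &[f64]) -> Vec<f64> {
        x.iter().map(|&xi| a * xi).collect()
    }

    fn add(u: &[f64], v: &[f64]) -> Vec<f64> {
        u.iter().zip(v).map(|(&ui, &vi)| ui + vi).collect()
    }

    fn main() {
        let (a, b, c) = (1.0, 2.0, 3.0);
        let x = vec![1.0; 1024];
        let y = vec![2.0; 1024];
        let z = vec![3.0; 1024];

        // a*x + b*y + c*z, evaluated one binary operation at a time:
        // multiple loops and temporaries rather than one fused loop.
        let temp = add(&scale(a, &x), &scale(b, &y)); // temp = a*x + b*y
        let result = add(&temp, &scale(c, &z));       // result = temp + c*z
        assert_eq!(result[0], a * 1.0 + b * 2.0 + c * 3.0);
    }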

As a simple alternative, we provide a macro that converts a linear combination to a canonical Rust representation that is amenable to LLVM's auto-vectorizer. That is, the macro converts statements like z = a*x + b*y + c*z to for (x, (y, z)) in x.iter().zip(y.iter().zip(z.iter_mut())) { *z = a * *x + b * *y + c * *z; }
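
As a hedged usage sketch, assuming the macro is brought into scope from the axpy crate named above and invoked with the square-bracket syntax implied by the rules listed at the top of this page:

    // Hypothetical usage sketch; assumes the macro is in scope,
    // e.g. via `#[macro_use] extern crate axpy;` or `use axpy::axpy;`.
    fn main() {
        let (a, b, c) = (2.0, 0.5, -1.0);
        let x = vec![1.0_f64; 8];
        let y = vec![2.0_f64; 8];
        let mut z = vec![3.0_f64; 8];

        // Should expand to a single zipped loop of the shape shown above.
        axpy![z = a * x + b * y + c * z];

        assert_eq!(z[0], a * 1.0 + b * 2.0 + c * 3.0);
    }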

If x, y, and z are slices, the bounds checks are elided, resulting in fairly optimal code. The value of the macro is that it generates this loop for any linear-combination-like expression, e.g. w = 2.0 * x - z becomes for (x, (z, w)) in x.iter().zip(z.iter().zip(w.iter_mut())) { *w = 2.0 * *x - *z; }
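
For reference, here is that expansion written out by hand as a standalone function over slices; a sketch (the names are illustrative) of the loop shape the auto-vectorizer handles well:

    // Hand-written equivalent of the loop generated for `w = 2.0 * x - z`.
    // The zipped iterators drive termination, so no per-element bounds
    // checks are needed in the loop body.
    fn combine(x: &[f64], z: &[f64], w: &mut [f64]) {
        for (x, (z, w)) in x.iter().zip(z.iter().zip(w.iter_mut())) {
            *w = 2.0 * *x - *z;
        }
    }

    fn main() {
        let x = [1.0, 2.0, 3.0];
        let z = [0.5, 0.5, 0.5];
        let mut w = [0.0; 3];
        combine(&x, &z, &mut w);
        assert_eq!(w, [1.5, 3.5, 5.5]);
    }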

In addition to =, both += and -= are supported. (Technically any assignment operator works, e.g. /=, but that is an accident of implementation rather than an intended feature.) The assigned variable may appear anywhere in the constructed expression, as the macro is designed to take appropriate care of the mutable borrow. Coefficients may be compatible scalar literals or variables.
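
A final hedged sketch of the assignment-operator and aliasing cases described above, again assuming the invocation syntax implied by the rules listed at the top of this page:

    // Hypothetical usage sketch of `+=` and of the assigned variable
    // appearing on the right-hand side of the combination.
    fn main() {
        let a = 3.0;
        let x = vec![1.0_f64; 4];
        let mut y = vec![10.0_f64; 4];

        // Classic axpy update: y += a * x
        axpy![y += a * x];
        assert_eq!(y[0], 10.0 + 3.0 * 1.0);

        // The assigned vector may appear in the expression itself;
        // the macro takes care of the mutable borrow of `y`.
        axpy![y = 0.5 * y + x];
        assert_eq!(y[0], 0.5 * 13.0 + 1.0);
    }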