Module guff_matrix::simulator[][src]

Expand description

Non-SIMD, pure Rust simulation of matrix multiplication algorithm

There are two simulators included here. Both are based on the idea of treating the transform and input matrices as infinite streams that wrap around. These streams are read sequentially and multiplied together to get a product stream. The product stream is then apportioned into dot products.

Using this method, the generated dot products fill the output matrix by progressing along the diagonal, with wrap-around. So long as the number of columns in the input and output matrices does not have the number of rows in the transform matrix as a factor, all elements in the output matrix will be populated.

Example: 4x3 transform x 3x5 input/output matrices

| a b c |   | i0 i3 i6 i9 ic |   | 0 16 12  8  4 |
| d e f | x | i1 i4 i7 ia id | = | 5  1 17 13  9 |
| g h i |   | i2 i5 i8 ib ie |   |10  6  2 18 14 |
| j k l |                        |15 11  7  3 19 |

This shows the order that elements are written to the output matrix. As can be seen, the entire matrix is filled.

Example: 4x3 transform x 3x3 input/output matrices

| a b c |   | i0 i3 i6 |   | 0  4  8 |
| d e f | x | i1 i4 i7 | = | 9  1  5 |
| g h i |   | i2 i5 i8 |   | 6 10  2 |
| j k l |                  | 3  7 11 |

This also works since the number of output columns (3) does not have the number of rows in the transform matrix (4) as a factor.

However: 4x3 transform x 3x4 input/output matrices

| a b c |   | i0 i3 i6 i9 |   | 0 - - - |
| d e f | x | i1 i4 i7 ia | = | - 1 - - |
| g h i |   | i2 i5 i8 ib |   | - - 2 - |
| j k l |                     | - - - 3 |

Here, the next output starts back at the original output location, so the output matrix will not be filled. Extending the matrix to be a larger multiple of 4 will not help, either.

If the number of rows in the transform matrix is n, and the number of columns in the input and output matrices is c, then:

  • if n = 1, then c can be any positive value;
  • if n > 1, then c is conditional on gcd(n,c) != n != c

Note the additional condition which I didn’t mention above. Basically if n is a multiple of c or c is a multiple of n, the wrap-around will not work properly.

Bytewise Simulation

The first simulation uses TransformMatrix, InputMatrix and OutputMatrix structs to store the matrices. The first two implement Iterator to simulate reading them as infinite streams. The MultipyStream struct simulates the cross product of these streams, and it also implements Iterator.

The warm_multiply function simulates the matrix multiplication algorithm by doing byte-by-byte reads from the MultipyStream.

SIMD Simulation

The second simulation changes from byte-wise reading from matrices with simd-wise reading.

  • reimplement versions of the transform and input matrices
  • reuse original output matrix
  • new generic Simd trait
  • concrete SimSimd implementation using \[u8; 8\] as its vector type
  • matrices being read from implement Iterator<Item=SimSimd>
  • separate function simsimd_warm_multiply() using the above

The main reason for reimplementing the two matrix types is that the existing ones already implement Iterator, and it’s not possible to reimplement it using a different Iterator::Item type.

The implementation of Simd and matrix multiply in the crate’s main module and SIMD code in the architecture-dependent modules follow the same general design as above.

Structs

Input matrix for first simulation. Uses column-wise data storage.

Structure for holding iterators over TransformMatrix and InputMatrix

Output matrix for first simulation. Supports row-wise and column-wise storage.

Newtype for faking Simd architecture

Input matrix type for second simulation

Transform matrix type for second simulation

Transform matrix for first simulation. Uses row-wise data storage.

Traits

Simulated SIMD engine type

Functions

Interleave row-wise data for storage in column-wise matrix

Second (simd-wise) matrix multiply simulation

First (byte-wise) matrix multiply simulation