Module blas_array2::document_dev
§Developer Documentation
This document was written for version v0.2.0.
§Structure of Wrapper
Taking GEMM as an example:
- trait `GEMMFunc` (implemented for `BLASFunc`)

  This trait directly wraps the FFI function `?gemm_`, with generic support through `<F> where F: BLASFloat`.

- struct `GEMM_Driver` (with trait bound `BLASDriver`)

  This struct is a (not so safe) wrapper around the FFI wrapper `GEMMFunc`. It takes Rust types and returns the result through the function `run_blas`, which calls the FFI function.

  On calling `run_blas`, all fields in `GEMM_Driver` are consumed (dropped), and only the result matrix $\mathbf{C}$ is returned.

- struct `GEMM_` (with trait bound `BLASBuilder_`)

  This struct acts as the inner driver, and is wrapped by `GEMM_Builder` through the crate `derive_builder`.

  The trait function `driver` of `BLASBuilder_` (inner driver) is not the same as the struct function `driver` of `GEMM_Builder` (outer driver, automatically derived).

  This struct is not designed to be called directly. When calling the `driver` function of `GEMM_`:
  - All inputs (except the output) should be column-major;
  - All fields should be filled;
  - `layout` must be `Some(BLASColMajor)`;
  - Quick return on zero-dimension input;
  - Output `c` can be of arbitrary major, or simply `None`.

  The inner driver performs the following:
  - Obtaining the dimensions and leading dimensions needed by the FFI functions,
  - Checking the validity of parameters (all in column-major),
  - Allocating an output buffer if output `c` is not column-major (to be `ArrayOut::ToBeCloned`) or not defined (to be `ArrayOut::Owned`),
  - Returning the FFI `GEMM_Driver` for future computation.

- struct `GEMM_Builder` (with trait bound `BLASBuilder`)

  This struct is the builder, automatically derived by the crate `derive_builder`, and is designed to be accessible to API users.

  This struct accepts input with arbitrary strides, though this is not recommended.

  The trait function `run` performs the following work:
  - Check the layout (explained further in Layout Convention),
  - Perform trans / side / uplo flag flips, and matrix transposition to the proper layout;
  - Generate the `GEMM_` struct, and perform the FFI computation;
  - If the layout is row-major, reverse the axes of the output matrix from FFI (FFI only accepts column-major matrices).
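The row-major handling described above rests on a standard identity: a row-major $m \times n$ buffer is the same memory as a column-major $n \times m$ buffer holding the transpose, so $\mathbf{C} = \mathbf{A} \mathbf{B}$ in row-major can be obtained by asking a column-major-only routine for $\mathbf{C}^\mathrm{T} = \mathbf{B}^\mathrm{T} \mathbf{A}^\mathrm{T}$. The following is a minimal sketch of that idea only. The functions `gemm_col_major` and `gemm_row_major` are hypothetical names introduced for illustration; the naive triple loop stands in for the FFI `?gemm_` call, and none of this is the crate's actual implementation.

```rust
/// Naive column-major GEMM: C (m x n) = A (m x k) * B (k x n).
/// Hypothetical stand-in for a Fortran-style `?gemm_` routine.
fn gemm_col_major(
    m: usize,
    n: usize,
    k: usize,
    a: &[f64],
    lda: usize,
    b: &[f64],
    ldb: usize,
    c: &mut [f64],
    ldc: usize,
) {
    for j in 0..n {
        for i in 0..m {
            let mut acc = 0.0;
            for l in 0..k {
                // Column-major indexing: element (row, col) lives at row + col * ld.
                acc += a[i + l * lda] * b[l + j * ldb];
            }
            c[i + j * ldc] = acc;
        }
    }
}

/// Row-major GEMM built on the column-major routine (hypothetical helper).
/// `a` is m x k and `b` is k x n, both contiguous row-major.
/// Returns C (m x n) as a contiguous row-major buffer.
fn gemm_row_major(m: usize, n: usize, k: usize, a: &[f64], b: &[f64]) -> Vec<f64> {
    let mut c = vec![0.0; m * n];
    // Reinterpret each row-major buffer as the column-major transpose and
    // compute C^T (n x m) = B^T (n x k) * A^T (k x m) by swapping operands.
    gemm_col_major(n, m, k, b, n, a, k, &mut c, n);
    c
}
```

No data is copied or transposed in this sketch; only the operand order and the dimension arguments change, which is why the result lands directly in row-major order.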
§Layout Convention
§Row-major and Col-major
In this crate, row-major and col-major do not mean the same thing as C-contiguous and F-contiguous (the NumPy flags).
Contiguous means that all elements occupy one unbroken block of memory. C-contiguous and F-contiguous differ in how the matrix is laid out within that block.
        C-contiguous  F-contiguous  Non-contiguous  Sequential
        (by row)      (by col)      (by row)        (onedim)
        +++++         +++--         -----           +++++
        +++++         +++--         -+++-
        +++++         +++--         -+++-
        -----         +++--         -+++-
        -----         +++--         -----
shape   [3, 5]        [5, 3]        [3, 3]
strides [5, 1]        [1, 5]        [5, 1]
Being row-major or col-major does not require all data in memory to be contiguous. It only requires the data along one dimension to be contiguous.
        row-major  col-major  row/col-major  row/col-major  custom-layout
        (by row)   (by col)   (by row)       (by col)       (by row)
        ++++-      -----      -----          -----          +-+-+
        -----      +-+-+      -+++-          -+++-          -----
        ++++-      +-+-+      -+++-          -+++-          +-+-+
        -----      +-+-+      -+++-          -+++-          -----
        ++++-      +-+-+      -----          -----          +-+-+
shape   [ 3, 4]    [4, 3]     [3, 3]         [3, 3]         [ 3, 3]
strides [10, 1]    [1, 10]    [5, 1]         [1, 5]         [10, 2]
In this crate, a matrix is treated as row-major if the stride of its second dimension is one, and as col-major if the stride of its first dimension is one.
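The stride rule above can be sketched as two small predicates, with a simplified C-contiguity check added for contrast. All three function names are hypothetical and chosen for illustration; they are not the crate's API, and the contiguity check ignores degenerate cases such as dimensions of length zero or one.

```rust
/// Row-major in the sense used here: the second dimension has unit stride.
fn is_row_major(strides: [isize; 2]) -> bool {
    strides[1] == 1
}

/// Col-major in the sense used here: the first dimension has unit stride.
fn is_col_major(strides: [isize; 2]) -> bool {
    strides[0] == 1
}

/// Simplified C-contiguity check (assumption: no zero-length or
/// length-one dimensions): rows are unit-stride and packed back to back.
fn is_c_contiguous(shape: [usize; 2], strides: [isize; 2]) -> bool {
    strides[1] == 1 && strides[0] == shape[1] as isize
}
```

With the shapes and strides from the diagrams, shape `[3, 4]` with strides `[10, 1]` is row-major but not C-contiguous, while strides `[10, 2]` are neither row-major nor col-major (custom-layout).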
§BLAS3
For all BLAS3 functions, specifying row-major or col-major through the keyword `layout` will not change the result; however, efficiency may vary.
The layout is determined as follows:
- Use the layout provided by the user; otherwise (the user does not explicitly provide a layout)
- If the output matrix provided by the user is either row-major or col-major, use the layout of the output matrix; otherwise (the output matrix is custom-layout, or no output matrix is provided)
- If all inputs are col-major, use col-major; otherwise, use row-major.
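The decision order above can be sketched as a single function. The `Layout` enum and `resolve_layout` are hypothetical names for this illustration and do not correspond to the crate's types. One detail is an assumption not stated in the text: when the output has unit stride in both dimensions, this sketch tests col-major first.

```rust
/// Hypothetical layout flag for this illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Layout {
    RowMajor,
    ColMajor,
}

/// Resolve the layout from an optional user choice, the strides of an
/// optional output matrix, and the strides of all input matrices.
fn resolve_layout(
    user: Option<Layout>,
    output_strides: Option<[isize; 2]>,
    input_strides: &[[isize; 2]],
) -> Layout {
    // 1. An explicit user choice always wins.
    if let Some(layout) = user {
        return layout;
    }
    // 2. Otherwise follow the output matrix if it is row- or col-major.
    //    (Assumption: col-major is tested first when both strides are one.)
    if let Some(s) = output_strides {
        if s[0] == 1 {
            return Layout::ColMajor;
        }
        if s[1] == 1 {
            return Layout::RowMajor;
        }
    }
    // 3. Otherwise col-major only if every input is col-major.
    if input_strides.iter().all(|s| s[0] == 1) {
        Layout::ColMajor
    } else {
        Layout::RowMajor
    }
}
```

For example, with no user choice and a custom-layout output with strides `[10, 2]`, inputs with strides `[1, 5]` and `[1, 3]` resolve to col-major, whereas a mix of `[1, 5]` and `[5, 1]` resolves to row-major.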
§BLAS2
For packed and banded operations, row-major and col-major can give different results. For more information, refer to the BLAST document, C Interface to the Legacy BLAS.
In all other cases, the result does not depend on the layout, as is the case in BLAS3.
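To illustrate why the layout matters for packed storage, the sketch below packs the upper triangle of the same 3 x 3 matrix in both orders, following the usual packed-storage convention (column by column for col-major, row by row for row-major). The function names are hypothetical and exist only for this example.

```rust
/// Pack the upper triangle column by column (col-major convention).
fn pack_upper_col_major(a: &[[f64; 3]; 3]) -> Vec<f64> {
    let mut ap = Vec::with_capacity(6);
    for j in 0..3 {
        for i in 0..=j {
            ap.push(a[i][j]);
        }
    }
    ap
}

/// Pack the upper triangle row by row (row-major convention).
fn pack_upper_row_major(a: &[[f64; 3]; 3]) -> Vec<f64> {
    let mut ap = Vec::with_capacity(6);
    for i in 0..3 {
        for j in i..3 {
            ap.push(a[i][j]);
        }
    }
    ap
}
```

The two functions produce differently ordered buffers for the same matrix, so a packed buffer interpreted under the wrong layout describes a different matrix rather than merely a slower path.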