runmat-runtime 0.4.1

Core runtime for RunMat with builtins, BLAS/LAPACK integration, and execution APIs
Documentation
{
  "title": "gpuArray",
  "category": "acceleration/gpu",
  "keywords": [
    "gpuArray",
    "gpu",
    "device",
    "upload",
    "accelerate",
    "dtype",
    "like",
    "size"
  ],
  "summary": "Move MATLAB values onto the active GPU with optional size, dtype, and prototype controls.",
  "references": [
    "https://www.mathworks.com/help/parallel-computing/gpuarray.html"
  ],
  "gpu_support": {
    "elementwise": false,
    "reduction": false,
    "precisions": [
      "f32",
      "f64"
    ],
    "broadcasting": "none",
    "notes": "Uploads host-resident data through the provider `upload` hook, re-uploading gpuArray inputs when dtype conversion is requested. Supports MATLAB-style size vectors, class strings, and `'like'` prototypes."
  },
  "fusion": {
    "elementwise": false,
    "reduction": false,
    "max_inputs": 1,
    "constants": "inline"
  },
  "requires_feature": null,
  "tested": {
    "unit": "builtins::acceleration::gpu::gpuarray::tests",
    "integration": "builtins::acceleration::gpu::gpuarray::tests::gpu_array_transfers_numeric_tensor",
    "conversions": "builtins::acceleration::gpu::gpuarray::tests::gpu_array_casts_to_int32",
    "reshape": "builtins::acceleration::gpu::gpuarray::tests::gpu_array_applies_size_arguments",
    "wgpu": "builtins::acceleration::gpu::gpuarray::tests::gpu_array_wgpu_roundtrip"
  },
  "description": "`gpuArray(X)` moves MATLAB values onto the active GPU and returns a handle that the rest of the runtime can execute on. RunMat mirrors MATLAB semantics, including MATLAB-style size arguments, explicit dtype toggles (such as `'single'`, `'int32'`, `'logical'`), and the `'like'` prototype syntax that matches the class of an existing array.",
  "behaviors": [
    "Accepts numeric tensors, logical arrays, booleans, character vectors, and existing gpuArray handles. Other input types raise descriptive errors so callers can gather or convert first.",
    "Optional size arguments after the data (`gpuArray(data, m, n, ...)` or `gpuArray(data, [m n ...])`) reshape the uploaded value. The number of elements in `data` must equal the product of the requested dimensions.",
    "Class strings such as `'single'`, `'double'`, `'int32'`, `'uint8'`, and `'logical'` convert the data before upload, matching MATLAB casting semantics (round-to-nearest with saturation for integers, `NaN`→0 for integer classes, and errors when converting `NaN` to logical).",
    "`'like', prototype` infers the dtype (and logical state) from `prototype`. Explicit class strings override the inference when both are supplied.",
    "`\"gpuArray\"` strings are accepted as no-ops so call-sites that forward arguments from constructors such as `zeros(..., 'gpuArray')` remain compatible.",
    "Inputs that are already gpuArray handles pass through by default. When a class change is requested, RunMat gathers the data, performs the conversion, uploads a fresh buffer, and frees the old handle.",
    "When no acceleration provider is registered, the builtin raises `gpuArray: no acceleration provider registered`."
  ],
  "examples": [
    {
      "description": "Moving a matrix to the GPU for elementwise work",
      "input": "A = [1 2 3; 4 5 6];\nG = gpuArray(A);\nout = gather(sin(G))",
      "output": "out =\n  2×3\n\n    0.8415    0.9093    0.1411\n   -0.7568   -0.9589   -0.2794"
    },
    {
      "description": "Uploading a scalar with dtype conversion",
      "input": "pi_single = gpuArray(pi, 'single');\nisa(pi_single, 'gpuArray')\nclass(gather(pi_single))",
      "output": "ans =\n  logical\n     1\n\nans =\n  single"
    },
    {
      "description": "Converting host data to a logical gpuArray",
      "input": "mask = gpuArray([0 2 -5 0], 'logical');\ngather(mask)",
      "output": "ans =\n  1×4 logical array\n\n   0   1   1   0"
    },
    {
      "description": "Matching an existing prototype with `'like'`",
      "input": "template = gpuArray(true(2, 2));\nvalues = gpuArray([10 20 30 40], [2 2], 'like', template);\nisequal(gather(values), logical([10 30; 20 40]))",
      "output": "ans =\n  logical\n     1"
    },
    {
      "description": "Reshaping during upload",
      "input": "flat = 1:6;\nG = gpuArray(flat, 2, 3);\nsize(G)",
      "output": "ans =\n     2     3"
    },
    {
      "description": "Calling `gpuArray` on an existing gpuArray handle",
      "input": "G = gpuArray([1 2 3]);\nH = gpuArray(G, 'double');\nisequal(G, H)",
      "output": "ans =\n  logical\n     1"
    }
  ],
  "faqs": [],
  "links": [
    {
      "label": "gather",
      "url": "./gather"
    },
    {
      "label": "gpuDevice",
      "url": "./gpudevice"
    },
    {
      "label": "gpuInfo",
      "url": "./gpuinfo"
    },
    {
      "label": "arrayfun",
      "url": "./arrayfun"
    },
    {
      "label": "zeros",
      "url": "./zeros"
    },
    {
      "label": "sum",
      "url": "./sum"
    },
    {
      "label": "pagefun",
      "url": "./pagefun"
    }
  ],
  "source": {
    "label": "crates/runmat-runtime/src/builtins/acceleration/gpu/gpuarray.rs",
    "url": "crates/runmat-runtime/src/builtins/acceleration/gpu/gpuarray.rs"
  },
  "gpu_residency": "RunMat’s auto-offload planner transparently moves tensors to the GPU and keeps them there when it predicts a benefit. You typically call `gpuArray` to honour MATLAB scripts that opt in explicitly, to enforce residency before a long computation, or when you need MATLAB-style dtype conversion alongside the upload. Once a handle has been created, the builtin does not force a host copy; the one exception is passing an existing gpuArray back in with a class change, which gathers the data, converts it on the host, and uploads a replacement.",
  "gpu_behavior": [
    "`gpuArray` itself runs on the CPU. For host inputs it prepares a `HostTensorView` and forwards it to the provider’s `upload` hook. For gpuArray inputs that require dtype conversion, the builtin gathers the existing buffer, casts the data on the host, uploads a replacement, and frees the original handle. Providers that do not yet implement `upload` should report an informative error; the builtin surfaces that error with a `gpuArray:` prefix, mirroring MATLAB’s message style."
  ]
}
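
The casting rules listed under `behaviors` (round-to-nearest with saturation for integers, `NaN`→0 for integer classes, and an error when converting `NaN` to logical) can be illustrated with a minimal Rust sketch. This is not the code in `gpuarray.rs`: the helper names `cast_to_int32` and `cast_to_logical` and the error text are hypothetical, chosen only for illustration. The sketch relies on two standard-library facts: `f64::round` rounds half away from zero, and a float-to-integer `as` cast saturates at the integer bounds and maps `NaN` to 0.

```rust
/// Hypothetical helper (not part of RunMat): cast one host value to int32
/// following the MATLAB-style rules described in the behaviors list.
fn cast_to_int32(x: f64) -> i32 {
    // `round` rounds half away from zero (2.5 -> 3.0, -2.5 -> -3.0).
    // The `as` cast then saturates out-of-range values and turns NaN into 0.
    x.round() as i32
}

/// Hypothetical helper (not part of RunMat): cast one host value to logical.
/// NaN is rejected; any other nonzero value becomes true.
fn cast_to_logical(x: f64) -> Result<bool, String> {
    if x.is_nan() {
        // Illustrative message only; the real builtin's wording may differ.
        return Err("gpuArray: NaN cannot be converted to logical".to_string());
    }
    Ok(x != 0.0)
}
```

Under these assumptions, `cast_to_int32(3.0e10)` yields `i32::MAX`, `cast_to_int32(f64::NAN)` yields `0`, and `cast_to_logical(-5.0)` yields `Ok(true)`, which agrees with the logical-mask example above, where `[0 2 -5 0]` becomes `0 1 1 0`.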