runmat-runtime 0.4.1

Core runtime for RunMat with builtins, BLAS/LAPACK integration, and execution APIs
{
  "title": "gather",
  "category": "acceleration/gpu",
  "keywords": [
    "gather",
    "gpuArray",
    "download",
    "host copy",
    "accelerate",
    "residency"
  ],
  "summary": "Transfer gpuArray data back to host memory, recursively handling cells, structs, and objects.",
  "references": [
    "https://www.mathworks.com/help/parallel-computing/gather.html"
  ],
  "gpu_support": {
    "elementwise": false,
    "reduction": false,
    "precisions": [
      "f32",
      "f64"
    ],
    "broadcasting": "none",
    "notes": "Executes on the CPU; gpuArray inputs are downloaded through the provider's `download` hook and residency metadata is cleared so planners know the value now lives on the host."
  },
  "fusion": {
    "elementwise": false,
    "reduction": false,
    "max_inputs": 1,
    "constants": "inline"
  },
  "requires_feature": null,
  "tested": {
    "unit": "builtins::acceleration::gpu::gather::tests",
    "integration": "builtins::acceleration::gpu::gather::tests::gather_downloads_gpu_tensor",
    "wgpu": "builtins::acceleration::gpu::gather::tests::gather_wgpu_provider_roundtrip"
  },
  "description": "`gather(X)` copies data that resides on the GPU, or in other distributed storage, back into host memory. In RunMat, this means turning `gpuArray` handles into dense MATLAB values and returning values that already live on the CPU unchanged.",
  "behaviors": [
    "Accepts any MATLAB value. Inputs that hold no gpuArray data (numbers, logicals, strings, and cells or structs without GPU handles) pass through untouched, so `gather` is safe to call unconditionally at API boundaries.",
    "Downloads gpuArray tensors via the active acceleration provider, producing dense double-precision matrices. Logical gpuArray inputs return logical arrays with MATLAB-compatible 0/1 encoding.",
    "Recursively descends into cells, structs, and objects, gathering every nested gpuArray handle. This mirrors MATLAB's behavior when gathering composite data structures.",
    "Clears residency metadata so the auto-offload planner treats the gathered value as host-resident.",
    "Supports multiple inputs: in a single-output context it returns a `1×N` cell array preserving the original order; in a multi-output assignment the number of inputs and outputs must match, mirroring MATLAB's requirement.",
    "Raises `gather: no acceleration provider registered` when you attempt to download gpuArray data without an active provider, and propagates provider-specific download errors verbatim."
  ],
  "examples": [
    {
      "description": "Converting a gpuArray back to host memory",
      "input": "G = gpuArray([1 2 3; 4 5 6]);\nH = gather(G)",
      "output": "H =\n     1     2     3\n     4     5     6"
    },
    {
      "description": "Gathering data that is already on the CPU",
      "input": "x = [10 20 30];\ny = gather(x)",
      "output": "y =\n    10    20    30"
    },
    {
      "description": "Preserving logical values when gathering",
      "input": "mask = gpuArray(logical([1 0 1 0]));\nhostMask = gather(mask)",
      "output": "hostMask =\n  1×4 logical array\n   1   0   1   0"
    },
    {
      "description": "Gathering gpuArray values stored inside a cell array",
      "input": "C = {gpuArray([1 2]), 42};\nhostC = gather(C)",
      "output": "hostC =\n  1×2 cell array\n    {[1 2]}    {[42]}"
    },
    {
      "description": "Gathering struct fields that live on the GPU",
      "input": "S.data = gpuArray(magic(3));\nS.label = \"gpu result\";\nS_host = gather(S)",
      "output": "S_host =\n  struct with fields:\n     data: [3×3 double]\n    label: \"gpu result\""
    },
    {
      "description": "Gathering multiple gpuArrays into one cell result",
      "input": "A = gpuArray(eye(3));\nB = gpuArray(ones(3));\ncellOut = gather(A, B)",
      "output": "cellOut =\n  1×2 cell array\n    {[3×3 double]}    {[3×3 double]}"
    },
    {
      "description": "Gathering results at the end of a GPU pipeline",
      "input": "A = gpuArray(rand(1024, 1));\nB = sin(A) .* 5;\nresult = gather(B);\nsize(result)",
      "output": "ans =\n        1024           1"
    }
  ],
  "faqs": [
    {
      "question": "Does `gather` modify the original gpuArray?",
      "answer": "No. `gather` returns a host-side copy. The original gpuArray value remains valid and continues to reside on the GPU until it goes out of scope."
    },
    {
      "question": "What happens if the input does not live on the GPU?",
      "answer": "Nothing changes: the value is returned as-is. This makes `gather` safe to sprinkle into code paths that may or may not run on the GPU."
    },
    {
      "question": "How are logical gpuArray values represented after gathering?",
      "answer": "Logical handles are tagged during `gpuArray` creation. `gather` reads that metadata and produces a MATLAB logical array with the same shape, so checks such as `islogical(result)` or `isa(result, 'logical')` behave as expected."
    },
    {
      "question": "Does `gather` recurse into cells, structs, and objects?",
      "answer": "Yes. Every nested gpuArray handle inside a cell array, struct field, or object property is downloaded and replaced with host data."
    },
    {
      "question": "What happens when I pass multiple inputs but capture a single output?",
      "answer": "RunMat follows MATLAB: it gathers each input and returns a `1×N` cell array so you can unpack values later. In multi-output assignments you must request the same number of outputs as inputs."
    },
    {
      "question": "What if no acceleration provider is registered?",
      "answer": "RunMat raises `gather: no acceleration provider registered` when you attempt to gather a gpuArray without an active provider. Register a provider (for example, via `runmat-accelerate`) before calling `gather`."
    },
    {
      "question": "Does `gather` free GPU memory automatically?",
      "answer": "No. The gpuArray remains on the device. To release its memory, clear the variable (for example, `clear G`) once you no longer need it."
    }
  ],
  "links": [
    {
      "label": "gpuArray",
      "url": "./gpuarray"
    },
    {
      "label": "gpuDevice",
      "url": "./gpudevice"
    },
    {
      "label": "sum",
      "url": "./sum"
    },
    {
      "label": "mean",
      "url": "./mean"
    },
    {
      "label": "arrayfun",
      "url": "./arrayfun"
    },
    {
      "label": "gpuInfo",
      "url": "./gpuinfo"
    },
    {
      "label": "pagefun",
      "url": "./pagefun"
    }
  ],
  "source": {
    "label": "`crates/runmat-runtime/src/builtins/acceleration/gpu/gather.rs`",
    "url": "https://github.com/runmat-org/runmat/blob/main/crates/runmat-runtime/src/builtins/acceleration/gpu/gather.rs"
  },
  "gpu_residency": "RunMat's auto-offload planner keeps tensors on the GPU until a builtin marked as a sink (such as `gather`, plotting functions, or I/O) requests host access. You usually call `gather` at API boundaries, for example to log results or to hand them to CPU-only libraries. If a result is only consumed by other GPU-aware builtins, you can omit `gather` and keep the data on the device.",
  "gpu_behavior": [
    "`gather` itself runs on the CPU. When the input contains gpuArray handles, the builtin calls the provider's `download` hook to retrieve a `HostTensorOwned` copy, converts the result into MATLAB data, and clears residency via `runmat_accelerate_api::clear_residency`. If the provider does not implement `download`, the builtin surfaces the provider error so you know the backend must be extended. When the input is already on the host, no provider work is required."
  ]
}
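The following is a minimal Rust sketch of the recursive download and multi-input rules described in `behaviors`. It is an illustration under stated assumptions, not the code in `gather.rs`. `Value`, `Provider`, `MapProvider`, `gather_value`, and the output-count error message are all hypothetical names. The provider is reduced to a single `download` hook that returns data plus shape. Residency clearing is only marked with a comment.

```rust
use std::collections::{BTreeMap, HashMap};

// Hypothetical, simplified value model for illustration only; RunMat's real
// value and tensor types are richer than this.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Num(f64),
    Str(String),
    Tensor { data: Vec<f64>, shape: Vec<usize> },
    Logical { data: Vec<u8>, shape: Vec<usize> },
    Cell(Vec<Value>),
    Struct(BTreeMap<String, Value>),
    // Device handle; `logical` stands in for the tag set when `gpuArray` creates it.
    Gpu { id: u64, logical: bool },
}

// Hypothetical provider interface exposing only a `download` hook.
trait Provider {
    fn download(&self, id: u64) -> Result<(Vec<f64>, Vec<usize>), String>;
}

// In-memory stand-in for a GPU backend, used only to exercise the sketch.
struct MapProvider(HashMap<u64, (Vec<f64>, Vec<usize>)>);

impl Provider for MapProvider {
    fn download(&self, id: u64) -> Result<(Vec<f64>, Vec<usize>), String> {
        self.0.get(&id).cloned().ok_or_else(|| format!("unknown buffer {id}"))
    }
}

// Recursively replace every device handle with host data; other values pass through.
fn gather_value(v: &Value, provider: Option<&dyn Provider>) -> Result<Value, String> {
    match v {
        Value::Gpu { id, logical } => {
            let p = provider.ok_or("gather: no acceleration provider registered")?;
            // Provider errors propagate unchanged via `?`.
            let (data, shape) = p.download(*id)?;
            // The real builtin would also clear residency metadata at this point.
            if *logical {
                let bits = data.iter().map(|&x| u8::from(x != 0.0)).collect();
                Ok(Value::Logical { data: bits, shape })
            } else {
                Ok(Value::Tensor { data, shape })
            }
        }
        Value::Cell(items) => items
            .iter()
            .map(|x| gather_value(x, provider))
            .collect::<Result<Vec<_>, _>>()
            .map(Value::Cell),
        Value::Struct(fields) => fields
            .iter()
            .map(|(k, x)| -> Result<(String, Value), String> {
                Ok((k.clone(), gather_value(x, provider)?))
            })
            .collect::<Result<BTreeMap<_, _>, _>>()
            .map(Value::Struct),
        // Numbers, strings, and host tensors are returned unchanged.
        other => Ok(other.clone()),
    }
}

// Multi-input dispatch: pack results into one 1xN cell for a single output,
// otherwise require exactly one output per input.
fn gather(inputs: &[Value], nargout: usize, provider: Option<&dyn Provider>) -> Result<Vec<Value>, String> {
    let gathered = inputs
        .iter()
        .map(|v| gather_value(v, provider))
        .collect::<Result<Vec<_>, _>>()?;
    if nargout <= 1 && gathered.len() > 1 {
        return Ok(vec![Value::Cell(gathered)]);
    }
    if nargout > 1 && nargout != gathered.len() {
        // Hypothetical message; the real builtin's wording may differ.
        return Err("gather: number of outputs must match number of inputs".to_string());
    }
    Ok(gathered)
}
```

Calling `gather` with two inputs and `nargout = 1` yields a single cell that holds both gathered values. This corresponds to the `1×N` cell result RunMat documents for single-output calls.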