// edgefirst_image/lib.rs
// SPDX-FileCopyrightText: Copyright 2025 Au-Zone Technologies
// SPDX-License-Identifier: Apache-2.0

/*!

## EdgeFirst HAL - Image Converter

The `edgefirst_image` crate is part of the EdgeFirst Hardware Abstraction
Layer (HAL) and provides functionality for converting images between
different formats and sizes. The crate is designed to work with hardware
acceleration when available, but also provides a CPU-based fallback for
environments where hardware acceleration is not present or not suitable.

The main features of the `edgefirst_image` crate include:
- Support for various image formats, including YUYV, RGB, RGBA, and GREY.
- Support for source crop, destination crop, rotation, and flipping.
- Image conversion using hardware acceleration (G2D, OpenGL) when available.
- CPU-based image conversion as a fallback option.

The crate uses [`TensorDyn`] from `edgefirst_tensor` to represent images,
with [`PixelFormat`] metadata describing the pixel layout. The
[`ImageProcessor`] struct manages the conversion process, selecting
the appropriate conversion method based on the available hardware.

## Examples

```rust
# use edgefirst_image::{ImageProcessor, Rotation, Flip, Crop, ImageProcessorTrait, load_image};
# use edgefirst_tensor::{PixelFormat, DType, TensorDyn};
# fn main() -> Result<(), edgefirst_image::Error> {
let image = edgefirst_bench::testdata::read("zidane.jpg");
let src = load_image(&image, Some(PixelFormat::Rgba), None)?;
let mut converter = ImageProcessor::new()?;
let mut dst = converter.create_image(640, 480, PixelFormat::Rgb, DType::U8, None)?;
converter.convert(&src, &mut dst, Rotation::None, Flip::None, Crop::default())?;
# Ok(())
# }
```

## Environment Variables

The behavior of the `edgefirst_image::ImageProcessor` struct can be influenced by the
following environment variables:
- `EDGEFIRST_FORCE_BACKEND`: When set to `cpu`, `g2d`, or `opengl` (case-insensitive),
  only that single backend is initialized and no fallback chain is used. If the
  forced backend fails to initialize, an error is returned immediately. This is
  useful for benchmarking individual backends in isolation. When this variable is
  set, the `EDGEFIRST_DISABLE_*` variables are ignored.
- `EDGEFIRST_DISABLE_GL`: If set to `1`, disables the use of OpenGL for image
  conversion, forcing the use of CPU or other available hardware methods.
- `EDGEFIRST_DISABLE_G2D`: If set to `1`, disables the use of G2D for image
  conversion, forcing the use of CPU or other available hardware methods.
- `EDGEFIRST_DISABLE_CPU`: If set to `1`, disables the use of CPU for image
  conversion, forcing the use of hardware acceleration methods. If no hardware
  acceleration methods are available, an error is returned when attempting
  to create an `ImageProcessor`.

Additionally, the `TensorMemory` used by default allocations can be controlled with
the `EDGEFIRST_TENSOR_FORCE_MEM` environment variable. If set to `1`, default tensor
allocations use system memory. This disables the use of specialized memory regions
for tensors and hardware acceleration, but improves the performance of the CPU
converter.
*/
#![cfg_attr(coverage_nightly, feature(coverage_attribute))]

/// Pitch alignment requirement for DMA-BUF tensors that may be imported as
/// EGLImages by the GL backend. Mali Valhall (i.MX 95 / G310) rejects
/// `eglCreateImageKHR` with `EGL_BAD_ALLOC` for any DMA-BUF whose row pitch
/// is not a multiple of 64 bytes; Vivante GC7000UL (i.MX 8MP) accepts any
/// pitch so the constant is harmless on that path. 64 is the smallest
/// alignment that satisfies every embedded ARM GPU we ship to.
///
/// Applied automatically inside [`ImageProcessor::create_image`] when the
/// allocation lands on `TensorMemory::Dma`. External callers that allocate
/// their own DMA-BUF tensors (e.g. GStreamer plugins, video pipelines) can
/// use [`align_width_for_gpu_pitch`] to compute a width whose resulting row
/// stride satisfies this requirement.
pub const GPU_DMA_BUF_PITCH_ALIGNMENT_BYTES: usize = 64;

/// Round `width` (in pixels) up so the resulting row stride
/// `width * bpp` is a multiple of [`GPU_DMA_BUF_PITCH_ALIGNMENT_BYTES`]
/// AND a multiple of `bpp` (so the rounded width is an integer pixel count).
///
/// `bpp` must be the per-pixel byte count for the image's primary plane
/// (e.g. 4 for RGBA8/BGRA8, 3 for RGB888, 1 for Grey/NV12-luma).
///
/// External callers — GStreamer plugins, video pipelines, anyone wrapping a
/// foreign DMA-BUF — should call this when sizing the destination so that
/// `eglCreateImageKHR` doesn't reject the import on Mali. Pre-aligned widths
/// (640, 1280, 1920, 3008, 3840 …) round-trip unchanged; misaligned widths
/// are bumped up to the next valid value.
///
/// # Overflow behaviour
///
/// All arithmetic is checked. If the alignment computation or the rounded
/// width would overflow `usize`, the function logs a warning and returns the
/// original `width` unchanged rather than wrapping or producing a smaller
/// value. Callers can rely on the returned width being **at least** the
/// requested width.
///
/// `bpp == 0` and `width == 0` short-circuit to return the input unchanged.
///
/// # Examples
///
/// ```
/// use edgefirst_image::align_width_for_gpu_pitch;
///
/// // RGBA8 (bpp=4): width must round to a multiple of 16 pixels (64-byte stride).
/// assert_eq!(align_width_for_gpu_pitch(1920, 4), 1920); // already aligned
/// assert_eq!(align_width_for_gpu_pitch(3004, 4), 3008); // crowd.png case: +4 px
/// assert_eq!(align_width_for_gpu_pitch(1281, 4), 1296); // +15 px
///
/// // RGB888 (bpp=3): width must round to a multiple of 64 pixels (192-byte stride).
/// assert_eq!(align_width_for_gpu_pitch(640, 3), 640);
/// assert_eq!(align_width_for_gpu_pitch(641, 3), 704);
/// ```
pub fn align_width_for_gpu_pitch(width: usize, bpp: usize) -> usize {
    if bpp == 0 || width == 0 {
        return width;
    }

    // The minimum aligned stride must be a common multiple of both the
    // GPU's pitch alignment and the per-pixel byte count. Using the LCM
    // guarantees the rounded stride is an integer multiple of `bpp`, so
    // converting back to a pixel count is exact.
    //
    // Compute the alignment in pixels (`width_alignment`) so we never need
    // to multiply `width * bpp`, which is the only operation that could
    // realistically overflow for large caller-supplied widths.
    let Some(lcm_alignment) = checked_num_integer_lcm(GPU_DMA_BUF_PITCH_ALIGNMENT_BYTES, bpp)
    else {
        log::warn!(
            "align_width_for_gpu_pitch: lcm({GPU_DMA_BUF_PITCH_ALIGNMENT_BYTES}, {bpp}) \
             overflows usize, returning unaligned width {width}"
        );
        return width;
    };
    if lcm_alignment == 0 {
        return width;
    }

    debug_assert_eq!(lcm_alignment % bpp, 0);
    let width_alignment = lcm_alignment / bpp;
    if width_alignment == 0 {
        return width;
    }

    let remainder = width % width_alignment;
    if remainder == 0 {
        return width;
    }

    let pad = width_alignment - remainder;
    match width.checked_add(pad) {
        Some(aligned) => aligned,
        None => {
            log::warn!(
                "align_width_for_gpu_pitch: width {width} + pad {pad} overflows usize, \
                 returning unaligned (caller should use a smaller width or pre-aligned size)"
            );
            width
        }
    }
}
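
// A self-contained sketch of the arithmetic above. `gcd_sketch` and
// `align_sketch` are illustration-only names, not part of the crate API;
// they mirror `align_width_for_gpu_pitch` for the non-overflowing case and
// assume `bpp > 0`. For example, `align_sketch(3004, 4)` yields 3008
// (16-pixel alignment for RGBA8) and `align_sketch(641, 3)` yields 704
// (64-pixel alignment for RGB888).
#[allow(dead_code)]
mod pitch_alignment_sketch {
    fn gcd_sketch(a: usize, b: usize) -> usize {
        if b == 0 { a } else { gcd_sketch(b, a % b) }
    }

    /// Round `width` up so `width * bpp` is a multiple of lcm(64, bpp),
    /// i.e. round the width up to a multiple of `lcm / bpp` pixels.
    pub fn align_sketch(width: usize, bpp: usize) -> usize {
        let lcm = 64 / gcd_sketch(64, bpp) * bpp;
        let width_alignment = lcm / bpp;
        width.div_ceil(width_alignment) * width_alignment
    }
}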

/// Round `min_pitch_bytes` up to the next multiple of
/// [`GPU_DMA_BUF_PITCH_ALIGNMENT_BYTES`]. Returns `None` if the rounded
/// value would overflow `usize`. Returns `Some(0)` for input 0.
///
/// Used internally by [`ImageProcessor::create_image`] to compute the
/// padded row stride for DMA-backed image allocations. External callers
/// that need pixel-counted alignment (instead of raw byte pitch) should
/// use [`align_width_for_gpu_pitch`] instead.
#[cfg(target_os = "linux")]
pub(crate) fn align_pitch_bytes_to_gpu_alignment(min_pitch_bytes: usize) -> Option<usize> {
    let alignment = GPU_DMA_BUF_PITCH_ALIGNMENT_BYTES;
    if min_pitch_bytes == 0 {
        return Some(0);
    }
    let remainder = min_pitch_bytes % alignment;
    if remainder == 0 {
        return Some(min_pitch_bytes);
    }
    min_pitch_bytes.checked_add(alignment - remainder)
}

/// Overflow-safe least common multiple. Returns `None` when `(a / gcd) * b`
/// would wrap.
fn checked_num_integer_lcm(a: usize, b: usize) -> Option<usize> {
    if a == 0 || b == 0 {
        return Some(0);
    }
    let g = num_integer_gcd(a, b);
    // a / g is exact (g divides a by definition) and at most a, so this
    // division never panics. Only the subsequent multiply can overflow.
    (a / g).checked_mul(b)
}

fn num_integer_gcd(a: usize, b: usize) -> usize {
    if b == 0 {
        a
    } else {
        num_integer_gcd(b, a % b)
    }
}

/// Bytes-per-pixel for the primary plane of `format` at element size `elem`.
/// Returns `None` for formats that don't have a single packed BPP (semi-planar
/// chroma is handled separately, returning the luma-plane bpp).
///
/// External callers can use this together with [`align_width_for_gpu_pitch`]
/// to size their own DMA-BUFs without having to remember per-format BPPs:
///
/// ```
/// use edgefirst_image::{align_width_for_gpu_pitch, primary_plane_bpp};
/// use edgefirst_tensor::PixelFormat;
///
/// let bpp = primary_plane_bpp(PixelFormat::Rgba, 1).unwrap();
/// let aligned = align_width_for_gpu_pitch(3004, bpp);
/// assert_eq!(aligned, 3008);
/// ```
pub fn primary_plane_bpp(format: PixelFormat, elem: usize) -> Option<usize> {
    use edgefirst_tensor::PixelLayout;
    match format.layout() {
        PixelLayout::Packed => Some(format.channels() * elem),
        PixelLayout::Planar => Some(elem),
        // For NV12/NV16 the luma plane is single-channel so the pitch
        // matches `elem`; the chroma plane uses the same pitch in bytes
        // (UV is half-width but two interleaved channels = same pitch).
        PixelLayout::SemiPlanar => Some(elem),
        // `PixelLayout` is non-exhaustive — fall through unaligned for
        // any future variant we don't yet recognise.
        _ => None,
    }
}

/// Return the GPU-aligned pitch in bytes when a DMA-backed image of
/// `width × fmt` would need row-stride padding, or `None` when the
/// natural pitch already satisfies `GPU_DMA_BUF_PITCH_ALIGNMENT_BYTES`
/// or the caller has explicitly requested non-DMA memory.
///
/// Mali G310 (i.MX 95) rejects `eglCreateImage` from DMA-BUFs whose
/// `PLANE0_PITCH_EXT` is not a multiple of 64 bytes, surfacing as
/// `EGL_BAD_ALLOC`. Decoders like [`load_jpeg`]/[`load_png`] use this
/// helper to decide whether to route through the two-buffer padded
/// decode path.
#[cfg(target_os = "linux")]
pub(crate) fn padded_dma_pitch_for(
    fmt: PixelFormat,
    width: usize,
    memory: &Option<TensorMemory>,
) -> Option<usize> {
    // Only pad when the caller explicitly requested DMA, or when they
    // left memory selection to the allocator AND DMA is actually
    // available. `Tensor::image_with_stride(..., None)` always routes
    // through DMA allocation, so treating `None` as "DMA wanted"
    // unconditionally would convert a normally-working image load into
    // a hard failure on systems where DMA is unavailable (sandboxed
    // CI, missing `/dev/dma_heap`, permission-denied containers) —
    // whereas `Tensor::image(..., None)` would have fallen back to
    // SHM/Mem there.
    match memory {
        Some(TensorMemory::Dma) => {}
        None if edgefirst_tensor::is_dma_available() => {}
        _ => return None,
    }
    // Padding only applies to packed layouts — `Tensor::image_with_stride`
    // rejects semi-planar / planar formats, and those take their own
    // per-plane pitches on import anyway.
    if fmt.layout() != PixelLayout::Packed {
        return None;
    }
    let bpp = primary_plane_bpp(fmt, 1)?;
    let natural = width.checked_mul(bpp)?;
    let aligned = align_pitch_bytes_to_gpu_alignment(natural)?;
    if aligned > natural {
        Some(aligned)
    } else {
        None
    }
}

/// Row-copy a tightly-packed `src` tensor into a `dst` tensor that has a
/// larger row stride (typically a DMA-BUF allocated with GPU-aligned pitch).
///
/// Both tensors must share the same width, height and pixel format. The
/// bytes between the end of each source row and the next destination row
/// are left untouched — EGL import doesn't read past the row's valid
/// width, so the padding can remain whatever the allocator produced.
#[cfg(target_os = "linux")]
pub(crate) fn copy_packed_to_padded_dma(src: &Tensor<u8>, dst: &mut Tensor<u8>) -> Result<()> {
    let width = dst.width().ok_or(Error::NotAnImage)?;
    let height = dst.height().ok_or(Error::NotAnImage)?;
    let fmt = dst.format().ok_or(Error::NotAnImage)?;
    let src_width = src.width().ok_or(Error::NotAnImage)?;
    let src_height = src.height().ok_or(Error::NotAnImage)?;
    let src_fmt = src.format().ok_or(Error::NotAnImage)?;
    if src_width != width || src_height != height || src_fmt != fmt {
        return Err(Error::Internal(format!(
            "copy_packed_to_padded_dma: src and dst image metadata must match \
             (src: {src_width}x{src_height} {src_fmt:?}, dst: {width}x{height} {fmt:?})"
        )));
    }
    let bpp = primary_plane_bpp(fmt, 1).ok_or_else(|| {
        Error::NotSupported(format!(
            "copy_packed_to_padded_dma: unknown bpp for {fmt:?}"
        ))
    })?;
    let natural = width.checked_mul(bpp).ok_or_else(|| {
        Error::Internal(format!(
            "copy_packed_to_padded_dma: width {width} × bpp {bpp} overflows"
        ))
    })?;
    let dst_stride = dst.effective_row_stride().ok_or_else(|| {
        Error::Internal("copy_packed_to_padded_dma: dst has no effective row stride".into())
    })?;

    // `TensorMap` derefs to `[T]`, which gives us the slice without
    // needing to import the `TensorMapTrait` at this call site.
    let src_map = src.map()?;
    let src_bytes: &[u8] = &src_map;
    let mut dst_map = dst.map()?;
    let dst_bytes: &mut [u8] = &mut dst_map;

    if src_bytes.len() < natural.saturating_mul(height) {
        return Err(Error::Internal(format!(
            "copy_packed_to_padded_dma: src has {} bytes, need {} ({}x{} @ {} bpp)",
            src_bytes.len(),
            natural.saturating_mul(height),
            width,
            height,
            bpp,
        )));
    }
    if dst_bytes.len() < dst_stride.saturating_mul(height) {
        return Err(Error::Internal(format!(
            "copy_packed_to_padded_dma: dst has {} bytes, need {} ({} stride × {} rows)",
            dst_bytes.len(),
            dst_stride.saturating_mul(height),
            dst_stride,
            height,
        )));
    }

    for row in 0..height {
        let s = row * natural;
        let d = row * dst_stride;
        dst_bytes[d..d + natural].copy_from_slice(&src_bytes[s..s + natural]);
    }
    Ok(())
}

#[cfg(test)]
use edgefirst_decoder::ProtoLayout;
use edgefirst_decoder::{DetectBox, ProtoData, Segmentation};
use edgefirst_tensor::{
    DType, PixelFormat, PixelLayout, Tensor, TensorDyn, TensorMemory, TensorTrait as _,
};
use enum_dispatch::enum_dispatch;
use std::{fmt::Display, time::Instant};
use zune_jpeg::{
    zune_core::{bytestream::ZCursor, colorspace::ColorSpace, options::DecoderOptions},
    JpegDecoder,
};
use zune_png::PngDecoder;

pub use cpu::CPUProcessor;
pub use error::{Error, Result};
#[cfg(target_os = "linux")]
pub use g2d::G2DProcessor;
#[cfg(target_os = "linux")]
#[cfg(feature = "opengl")]
pub use opengl_headless::GLProcessorThreaded;
#[cfg(target_os = "linux")]
#[cfg(feature = "opengl")]
pub use opengl_headless::Int8InterpolationMode;
#[cfg(target_os = "linux")]
#[cfg(feature = "opengl")]
pub use opengl_headless::{probe_egl_displays, EglDisplayInfo, EglDisplayKind};

mod cpu;
mod error;
mod g2d;
#[path = "gl/mod.rs"]
mod opengl_headless;

// Use `edgefirst_tensor::PixelFormat` variants (Rgb, Rgba, Grey, etc.) and
// `TensorDyn` / `Tensor<u8>` with `.format()` metadata instead.
/// Flips the image data, then rotates it. Returns a new `TensorDyn`.
fn rotate_flip_to_dyn(
    src: &Tensor<u8>,
    src_fmt: PixelFormat,
    rotation: Rotation,
    flip: Flip,
    memory: Option<TensorMemory>,
) -> Result<TensorDyn, Error> {
    let src_w = src.width().unwrap();
    let src_h = src.height().unwrap();
    let channels = src_fmt.channels();

    let (dst_w, dst_h) = match rotation {
        Rotation::None | Rotation::Rotate180 => (src_w, src_h),
        Rotation::Clockwise90 | Rotation::CounterClockwise90 => (src_h, src_w),
    };

    // Rotate/flip into Mem staging then row-copy into padded DMA when the
    // caller wants DMA and the destination width would produce an
    // unaligned pitch (see [`padded_dma_pitch_for`]).
    #[cfg(target_os = "linux")]
    if let Some(aligned_pitch) = padded_dma_pitch_for(src_fmt, dst_w, &memory) {
        let tmp = Tensor::<u8>::image(dst_w, dst_h, src_fmt, Some(TensorMemory::Mem))?;
        let src_map = src.map()?;
        let mut tmp_map = tmp.map()?;
        CPUProcessor::flip_rotate_ndarray_pf(
            &src_map,
            &mut tmp_map,
            dst_w,
            dst_h,
            channels,
            rotation,
            flip,
        )?;
        drop(tmp_map);
        drop(src_map);
        let mut dma = Tensor::<u8>::image_with_stride(
            dst_w,
            dst_h,
            src_fmt,
            aligned_pitch,
            Some(TensorMemory::Dma),
        )?;
        copy_packed_to_padded_dma(&tmp, &mut dma)?;
        return Ok(TensorDyn::from(dma));
    }

    let dst = Tensor::<u8>::image(dst_w, dst_h, src_fmt, memory)?;
    let src_map = src.map()?;
    let mut dst_map = dst.map()?;

    CPUProcessor::flip_rotate_ndarray_pf(
        &src_map,
        &mut dst_map,
        dst_w,
        dst_h,
        channels,
        rotation,
        flip,
    )?;
    drop(dst_map);
    drop(src_map);

    Ok(TensorDyn::from(dst))
}
453
454#[derive(Debug, Clone, Copy, PartialEq, Eq)]
455pub enum Rotation {
456    None = 0,
457    Clockwise90 = 1,
458    Rotate180 = 2,
459    CounterClockwise90 = 3,
460}
461impl Rotation {
462    /// Creates a Rotation enum from an angle in degrees. The angle must be a
463    /// multiple of 90.
464    ///
465    /// # Panics
466    /// Panics if the angle is not a multiple of 90.
467    ///
468    /// # Examples
469    /// ```rust
470    /// # use edgefirst_image::Rotation;
471    /// let rotation = Rotation::from_degrees_clockwise(270);
472    /// assert_eq!(rotation, Rotation::CounterClockwise90);
473    /// ```
474    pub fn from_degrees_clockwise(angle: usize) -> Rotation {
475        match angle.rem_euclid(360) {
476            0 => Rotation::None,
477            90 => Rotation::Clockwise90,
478            180 => Rotation::Rotate180,
479            270 => Rotation::CounterClockwise90,
480            _ => panic!("rotation angle is not a multiple of 90"),
481        }
482    }
483}
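
// Self-contained sketch of the angle handling in `from_degrees_clockwise`:
// angles are reduced modulo 360 before mapping, so over-rotations wrap
// (450° behaves as 90° clockwise). `quadrant_sketch` is an illustration-only
// helper, not part of the crate API: 0 = None, 1 = Clockwise90,
// 2 = Rotate180, 3 = CounterClockwise90.
#[allow(dead_code)]
mod rotation_quadrant_sketch {
    pub fn quadrant_sketch(angle_degrees: usize) -> usize {
        // Note: unlike `from_degrees_clockwise`, this sketch does not panic
        // on angles that are not multiples of 90; it simply truncates.
        angle_degrees.rem_euclid(360) / 90
    }
}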

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Flip {
    None = 0,
    Vertical = 1,
    Horizontal = 2,
}

/// Controls how the color palette index is chosen for each detected object.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub enum ColorMode {
    /// Color is chosen by object class label (`det.label`). Default.
    ///
    /// Preserves backward compatibility and is correct for semantic
    /// segmentation where colors carry class meaning.
    #[default]
    Class,
    /// Color is chosen by instance order (loop index, zero-based).
    ///
    /// Each detected object gets a unique color regardless of class,
    /// useful for instance segmentation.
    Instance,
    /// Color is chosen by track ID (future use; currently behaves like
    /// [`Instance`](Self::Instance)).
    Track,
}

impl ColorMode {
    /// Return the palette index for a detection given its loop index and label.
    #[inline]
    pub fn index(self, idx: usize, label: usize) -> usize {
        match self {
            ColorMode::Class => label,
            ColorMode::Instance | ColorMode::Track => idx,
        }
    }
}

/// Controls the resolution and coordinate frame of masks produced by
/// [`ImageProcessor::materialize_masks`].
///
/// - [`Proto`](Self::Proto) returns per-detection tiles at proto-plane
///   resolution (e.g. 48×32 u8 for a typical COCO bbox on a 160×160 proto
///   plane). This is the historical behavior of `materialize_masks` and the
///   fastest path because no upsample runs inside HAL. Mask values are
///   continuous sigmoid output quantized to `uint8 [0, 255]`.
/// - [`Scaled`](Self::Scaled) returns per-detection tiles at caller-specified
///   pixel resolution by upsampling the full proto plane once and cropping by
///   bbox after sigmoid. The upsample uses bilinear interpolation with
///   edge-clamp sampling — semantically equivalent to Ultralytics'
///   `process_masks_retina` reference. When a `letterbox` is also passed to
///   [`materialize_masks`], the inverse letterbox transform is applied during
///   the upsample so mask pixels land in original-content coordinates
///   (drop-in for overlay on the original image). Mask values are binary
///   `uint8 {0, 255}` after thresholding sigmoid > 0.5 — interchangeable
///   with `Proto` output via the same `> 127` test.
///
/// [`materialize_masks`]: ImageProcessor::materialize_masks
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub enum MaskResolution {
    /// Per-detection tile at proto-plane resolution (default).
    #[default]
    Proto,
    /// Per-detection tile at `(width, height)` pixel resolution in the
    /// coordinate frame determined by the `letterbox` parameter of
    /// [`ImageProcessor::materialize_masks`].
    Scaled {
        /// Target pixel width of the output coordinate frame.
        width: u32,
        /// Target pixel height of the output coordinate frame.
        height: u32,
    },
}

/// Options for mask overlay rendering.
///
/// Controls how segmentation masks are composited onto the destination image:
/// - `background`: when set, the background image is drawn first and masks
///   are composited over it (result written to `dst`). When `None`, `dst` is
///   cleared to `0x00000000` (fully transparent) before masks are drawn.
///   **`dst` is always fully overwritten — its prior contents are never
///   preserved.** Callers who used to pre-load an image into `dst` before
///   calling `draw_decoded_masks` / `draw_proto_masks` must now supply that
///   image via `background` instead (behaviour changed in v0.16.4).
/// - `opacity`: scales the alpha of rendered mask colors. `1.0` (default)
///   preserves the class color's alpha unchanged; `0.5` makes masks
///   semi-transparent.
/// - `color_mode`: controls whether colors are assigned by class label,
///   instance index, or track ID. Defaults to [`ColorMode::Class`].
#[derive(Debug, Clone, Copy)]
pub struct MaskOverlay<'a> {
    /// Compositing source image. Must have the same dimensions and pixel
    /// format as `dst`. When `Some`, the output is `background + masks`.
    /// When `None`, `dst` is cleared to `0x00000000` before masks are drawn.
    pub background: Option<&'a TensorDyn>,
    pub opacity: f32,
    /// Normalized letterbox region `[xmin, ymin, xmax, ymax]` in model-input
    /// space that contains actual image content (the rest is padding).
    ///
    /// When set, bounding boxes and mask coordinates from the decoder (which
    /// are in model-input normalized space) are mapped back to the original
    /// image coordinate space before rendering.
    ///
    /// Use [`with_letterbox_crop`](Self::with_letterbox_crop) to compute this
    /// from the [`Crop`] that was used in the model input [`convert`](crate::ImageProcessorTrait::convert) call.
    pub letterbox: Option<[f32; 4]>,
    pub color_mode: ColorMode,
}

impl Default for MaskOverlay<'_> {
    fn default() -> Self {
        Self {
            background: None,
            opacity: 1.0,
            letterbox: None,
            color_mode: ColorMode::Class,
        }
    }
}

impl<'a> MaskOverlay<'a> {
    pub fn new() -> Self {
        Self::default()
    }

    /// Set the compositing source image.
    ///
    /// `bg` must have the same dimensions and pixel format as the `dst` passed
    /// to [`draw_decoded_masks`](crate::ImageProcessorTrait::draw_decoded_masks) /
    /// [`draw_proto_masks`](crate::ImageProcessorTrait::draw_proto_masks).
    /// The output will be `bg + masks`. Without a background, `dst` is cleared
    /// to `0x00000000`.
    pub fn with_background(mut self, bg: &'a TensorDyn) -> Self {
        self.background = Some(bg);
        self
    }

    pub fn with_opacity(mut self, opacity: f32) -> Self {
        self.opacity = opacity.clamp(0.0, 1.0);
        self
    }

    pub fn with_color_mode(mut self, mode: ColorMode) -> Self {
        self.color_mode = mode;
        self
    }

    /// Set the letterbox transform from the [`Crop`] used when preparing the
    /// model input, so that bounding boxes and masks are correctly mapped back
    /// to the original image coordinate space during rendering.
    ///
    /// Pass the same `crop` that was given to
    /// [`convert`](crate::ImageProcessorTrait::convert) along with the model
    /// input dimensions (`model_w` × `model_h`).
    ///
    /// Has no effect when `crop.dst_rect` is `None` (no letterbox applied).
    pub fn with_letterbox_crop(mut self, crop: &Crop, model_w: usize, model_h: usize) -> Self {
        if let Some(r) = crop.dst_rect {
            self.letterbox = Some([
                r.left as f32 / model_w as f32,
                r.top as f32 / model_h as f32,
                (r.left + r.width) as f32 / model_w as f32,
                (r.top + r.height) as f32 / model_h as f32,
            ]);
        }
        self
    }
}
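
// Numeric sketch of the normalization in `with_letterbox_crop` above
// (`region_sketch` is an illustration-only helper, not part of the crate
// API). A 480×480 content region at x-offset 80 inside a 640×480 model
// input normalizes to `[0.125, 0.0, 0.875, 1.0]`.
#[allow(dead_code)]
mod letterbox_region_sketch {
    pub fn region_sketch(
        left: usize,
        top: usize,
        width: usize,
        height: usize,
        model_w: usize,
        model_h: usize,
    ) -> [f32; 4] {
        [
            left as f32 / model_w as f32,
            top as f32 / model_h as f32,
            (left + width) as f32 / model_w as f32,
            (top + height) as f32 / model_h as f32,
        ]
    }
}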

/// Apply the inverse letterbox transform to a bounding box.
///
/// `lb` is `[lx0, ly0, lx1, ly1]` — the normalized region of the model
/// input that contains actual image content (output of
/// [`MaskOverlay::with_letterbox_crop`]).
///
/// Converts model-input-normalized coords to output-image-normalized coords,
/// clamped to `[0.0, 1.0]`. Also canonicalises the bbox (ensures xmin ≤ xmax).
#[inline]
fn unletter_bbox(bbox: DetectBox, lb: [f32; 4]) -> DetectBox {
    let b = bbox.bbox.to_canonical();
    let [lx0, ly0, lx1, ly1] = lb;
    let inv_w = if lx1 > lx0 { 1.0 / (lx1 - lx0) } else { 1.0 };
    let inv_h = if ly1 > ly0 { 1.0 / (ly1 - ly0) } else { 1.0 };
    DetectBox {
        bbox: edgefirst_decoder::BoundingBox {
            xmin: ((b.xmin - lx0) * inv_w).clamp(0.0, 1.0),
            ymin: ((b.ymin - ly0) * inv_h).clamp(0.0, 1.0),
            xmax: ((b.xmax - lx0) * inv_w).clamp(0.0, 1.0),
            ymax: ((b.ymax - ly0) * inv_h).clamp(0.0, 1.0),
        },
        ..bbox
    }
}
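
// One-dimensional sketch of the inverse mapping in `unletter_bbox`
// (`unmap_sketch` is an illustration-only helper, not part of the crate
// API): a model-input-normalized coordinate inside the letterbox region
// `[l0, l1]` maps back to content-normalized space, so the region edges
// land on 0.0 and 1.0.
#[allow(dead_code)]
mod unletterbox_sketch {
    pub fn unmap_sketch(x: f32, l0: f32, l1: f32) -> f32 {
        // Guard mirrors `unletter_bbox`: degenerate regions pass through.
        let inv = if l1 > l0 { 1.0 / (l1 - l0) } else { 1.0 };
        ((x - l0) * inv).clamp(0.0, 1.0)
    }
}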

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Crop {
    pub src_rect: Option<Rect>,
    pub dst_rect: Option<Rect>,
    pub dst_color: Option<[u8; 4]>,
}

impl Default for Crop {
    fn default() -> Self {
        Crop::new()
    }
}

impl Crop {
    /// Creates a new `Crop` with default values (no cropping).
    pub fn new() -> Self {
        Crop {
            src_rect: None,
            dst_rect: None,
            dst_color: None,
        }
    }

    /// Sets the source rectangle for cropping.
    pub fn with_src_rect(mut self, src_rect: Option<Rect>) -> Self {
        self.src_rect = src_rect;
        self
    }

    /// Sets the destination rectangle for cropping.
    pub fn with_dst_rect(mut self, dst_rect: Option<Rect>) -> Self {
        self.dst_rect = dst_rect;
        self
    }

    /// Sets the destination color for areas outside the cropped region.
    pub fn with_dst_color(mut self, dst_color: Option<[u8; 4]>) -> Self {
        self.dst_color = dst_color;
        self
    }

    /// Creates a new `Crop` with no cropping.
    pub fn no_crop() -> Self {
        Crop::new()
    }

    /// Validate crop rectangles against explicit dimensions.
    pub(crate) fn check_crop_dims(
        &self,
        src_w: usize,
        src_h: usize,
        dst_w: usize,
        dst_h: usize,
    ) -> Result<(), Error> {
        let src_ok = self
            .src_rect
            .is_none_or(|r| r.left + r.width <= src_w && r.top + r.height <= src_h);
        let dst_ok = self
            .dst_rect
            .is_none_or(|r| r.left + r.width <= dst_w && r.top + r.height <= dst_h);
        match (src_ok, dst_ok) {
            (true, true) => Ok(()),
            (true, false) => Err(Error::CropInvalid(format!(
                "Dest crop invalid: {:?}",
                self.dst_rect
            ))),
            (false, true) => Err(Error::CropInvalid(format!(
                "Src crop invalid: {:?}",
                self.src_rect
            ))),
            (false, false) => Err(Error::CropInvalid(format!(
                "Dest and Src crop invalid: {:?} {:?}",
                self.dst_rect, self.src_rect
            ))),
        }
    }

    /// Validate crop rectangles against `TensorDyn` source and destination.
    pub fn check_crop_dyn(
        &self,
        src: &edgefirst_tensor::TensorDyn,
        dst: &edgefirst_tensor::TensorDyn,
    ) -> Result<(), Error> {
        self.check_crop_dims(
            src.width().unwrap_or(0),
            src.height().unwrap_or(0),
            dst.width().unwrap_or(0),
            dst.height().unwrap_or(0),
        )
    }
}
768
769#[derive(Debug, Clone, Copy, PartialEq, Eq)]
770pub struct Rect {
771    pub left: usize,
772    pub top: usize,
773    pub width: usize,
774    pub height: usize,
775}
776
777impl Rect {
778    // Creates a new Rect with the specified left, top, width, and height.
779    pub fn new(left: usize, top: usize, width: usize, height: usize) -> Self {
780        Self {
781            left,
782            top,
783            width,
784            height,
785        }
786    }
787
    /// Returns `true` if the rectangle lies within the bounds of the given
    /// [`TensorDyn`] image.
789    pub fn check_rect_dyn(&self, image: &TensorDyn) -> bool {
790        let w = image.width().unwrap_or(0);
791        let h = image.height().unwrap_or(0);
792        self.left + self.width <= w && self.top + self.height <= h
793    }
794}
795
796#[enum_dispatch(ImageProcessor)]
797pub trait ImageProcessorTrait {
    /// Converts the source image to the destination image format and size.
    /// The image is cropped first, then flipped, then rotated.
    ///
    /// # Arguments
    ///
    /// * `src` - The source image to convert from.
    /// * `dst` - The destination image to convert into.
    /// * `rotation` - The rotation to apply to the destination image.
    /// * `flip` - The flip to apply to the destination image.
    /// * `crop` - The source and destination crop rectangles to apply; use
    ///   [`Crop::default()`] for no cropping.
809    ///
810    /// # Returns
811    ///
812    /// A `Result` indicating success or failure of the conversion.
813    fn convert(
814        &mut self,
815        src: &TensorDyn,
816        dst: &mut TensorDyn,
817        rotation: Rotation,
818        flip: Flip,
819        crop: Crop,
820    ) -> Result<()>;
821
822    /// Draw pre-decoded detection boxes and segmentation masks onto `dst`.
823    ///
824    /// Supports two segmentation modes based on the mask channel count:
825    /// - **Instance segmentation** (`C=1`): one `Segmentation` per detection,
826    ///   `segmentation` and `detect` are zipped.
827    /// - **Semantic segmentation** (`C>1`): a single `Segmentation` covering
828    ///   all classes; only the first element is used.
829    ///
830    /// # Format requirements
831    ///
832    /// - CPU backend: `dst` must be `RGBA` or `RGB`.
833    /// - OpenGL backend: `dst` must be `RGBA`, `BGRA`, or `RGB`.
834    /// - G2D backend: only produces the base frame (empty detections);
835    ///   returns `NotImplemented` when any detection or segmentation is
836    ///   supplied.
837    ///
838    /// # Output contract
839    ///
840    /// This function always fully writes `dst` — it never relies on the
841    /// caller having pre-cleared the destination. The four cases are:
842    ///
843    /// | detections | background | output                              |
844    /// |------------|------------|-------------------------------------|
845    /// | none       | none       | dst cleared to `0x00000000`         |
846    /// | none       | set        | dst ← background                    |
847    /// | set        | none       | masks drawn over cleared dst        |
848    /// | set        | set        | masks drawn over background         |
849    ///
850    /// Each backend implements this with its native primitives: G2D uses
851    /// `g2d_clear` / `g2d_blit`, OpenGL uses `glClear` / DMA-BUF GPU blit
852    /// plus the mask program, and CPU uses direct buffer fill / memcpy as
853    /// the terminal fallback. CPU-memcpy of DMA buffers is avoided on the
854    /// accelerated paths.
855    ///
856    /// An empty `segmentation` slice is valid — only bounding boxes are drawn.
857    ///
858    /// `overlay` controls compositing: `background` is the compositing source
859    /// (must match `dst` in size and format); `opacity` scales mask alpha.
860    ///
861    /// # Buffer aliasing
862    ///
863    /// `dst` and `overlay.background` must reference **distinct underlying
864    /// buffers**. An aliased pair returns [`Error::AliasedBuffers`] without
865    /// dispatching to any backend — the GL path would otherwise read and
866    /// write the same texture in a single draw, which is undefined behaviour
867    /// on most drivers. Aliasing is detected via
868    /// [`TensorDyn::aliases`](edgefirst_tensor::TensorDyn::aliases), which
869    /// catches both shared-allocation clones and separate imports over the
870    /// same dmabuf fd.
871    ///
872    /// # Migration from v0.16.3 and earlier
873    ///
874    /// Prior to v0.16.4 the call silently preserved `dst`'s contents on empty
875    /// detections. That invariant no longer holds — `dst` is always fully
876    /// written. Callers who pre-loaded an image into `dst` before calling this
877    /// function must now pass that image via `overlay.background` instead.
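    ///
    /// A sketch of the migration (construction details of [`MaskOverlay`] are
    /// assumed here; the `background` field name follows the description
    /// above):
    ///
    /// ```rust,ignore
    /// // Before v0.16.4: callers pre-loaded the frame into `dst` and relied
    /// // on empty detections leaving it untouched. From v0.16.4 the frame
    /// // must be passed as the compositing background instead, since `dst`
    /// // is always fully written:
    /// let overlay = MaskOverlay { background: Some(&frame), ..Default::default() };
    /// proc.draw_decoded_masks(&mut dst, &boxes, &segs, overlay)?;
    /// ```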
878    fn draw_decoded_masks(
879        &mut self,
880        dst: &mut TensorDyn,
881        detect: &[DetectBox],
882        segmentation: &[Segmentation],
883        overlay: MaskOverlay<'_>,
884    ) -> Result<()>;
885
886    /// Draw masks from proto data onto image (fused decode+draw).
887    ///
888    /// For YOLO segmentation models, this avoids materializing intermediate
889    /// `Array3<u8>` masks. The `ProtoData` contains mask coefficients and the
890    /// prototype tensor; the renderer computes `mask_coeff @ protos` directly
891    /// at the output resolution using bilinear sampling.
892    ///
    /// `detect` and `proto_data.mask_coefficients` should have the same
    /// length; the two are zipped, so excess entries on either side are
    /// silently ignored. An empty `detect` slice is valid and produces the
    /// base frame — cleared or background-blitted — via the selected
    /// backend's native primitive.
897    ///
898    /// # Format requirements and output contract
899    ///
900    /// Same as [`draw_decoded_masks`](Self::draw_decoded_masks), including
901    /// the "always fully writes dst" guarantee across all four
902    /// detection/background combinations.
903    ///
904    /// `overlay` controls compositing — see [`draw_decoded_masks`](Self::draw_decoded_masks).
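    ///
    /// The per-detection, per-pixel value is, in essence (scalar sketch;
    /// the renderer evaluates this in a shader with bilinear proto
    /// sampling, and YOLO-style decoders typically follow with a sigmoid):
    ///
    /// ```rust
    /// // mask(x, y) = sigmoid(sum_k coeff[k] * proto[k](x, y))
    /// let coeff = [0.5f32, -1.0, 0.25]; // one detection's mask coefficients
    /// let proto_px = [0.2f32, 0.1, 0.4]; // one pixel across 3 proto channels
    /// let logit: f32 = coeff.iter().zip(&proto_px).map(|(c, p)| c * p).sum();
    /// let mask = 1.0 / (1.0 + (-logit).exp());
    /// assert!(mask > 0.5); // logit = 0.1, mask just above 0.5
    /// ```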
905    fn draw_proto_masks(
906        &mut self,
907        dst: &mut TensorDyn,
908        detect: &[DetectBox],
909        proto_data: &ProtoData,
910        overlay: MaskOverlay<'_>,
911    ) -> Result<()>;
912
913    /// Sets the colors used for rendering segmentation masks. Up to 20 colors
914    /// can be set.
915    fn set_class_colors(&mut self, colors: &[[u8; 4]]) -> Result<()>;
916}
917
918/// Configuration for [`ImageProcessor`] construction.
919///
920/// Use with [`ImageProcessor::with_config`] to override the default EGL
921/// display auto-detection and backend selection. The default configuration
922/// preserves the existing auto-detection behaviour.
923#[derive(Debug, Clone, Default)]
924pub struct ImageProcessorConfig {
925    /// Force OpenGL to use this EGL display type instead of auto-detecting.
926    ///
927    /// When `None`, the processor probes displays in priority order: GBM,
928    /// PlatformDevice, Default. Use [`probe_egl_displays`] to discover
929    /// which displays are available on the current system.
930    ///
931    /// Ignored when `EDGEFIRST_DISABLE_GL=1` is set.
932    #[cfg(target_os = "linux")]
933    #[cfg(feature = "opengl")]
934    pub egl_display: Option<EglDisplayKind>,
935
936    /// Preferred compute backend.
937    ///
    /// When set to a specific backend (not [`ComputeBackend::Auto`]), the
    /// processor initializes that backend plus CPU as a fallback; if the
    /// requested backend fails to initialize, a warning is logged and
    /// processing falls back to CPU.
940    /// This takes precedence over `EDGEFIRST_FORCE_BACKEND` and the
941    /// `EDGEFIRST_DISABLE_*` environment variables.
942    ///
943    /// - [`ComputeBackend::OpenGl`]: init OpenGL + CPU, skip G2D
944    /// - [`ComputeBackend::G2d`]: init G2D + CPU, skip OpenGL
945    /// - [`ComputeBackend::Cpu`]: init CPU only
946    /// - [`ComputeBackend::Auto`]: existing env-var-driven selection
947    pub backend: ComputeBackend,
948}
949
950/// Compute backend selection for [`ImageProcessor`].
951///
952/// Use with [`ImageProcessorConfig::backend`] to select which backend the
953/// processor should prefer. When a specific backend is selected, the
954/// processor initializes that backend plus CPU as a fallback. When `Auto`
955/// is used, the existing environment-variable-driven selection applies.
956#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
957pub enum ComputeBackend {
958    /// Auto-detect based on available hardware and environment variables.
959    #[default]
960    Auto,
961    /// CPU-only processing (no hardware acceleration).
962    Cpu,
963    /// Prefer G2D hardware blitter (+ CPU fallback).
964    G2d,
965    /// Prefer OpenGL ES (+ CPU fallback).
966    OpenGl,
967}
968
/// Backend forced via the `EDGEFIRST_FORCE_BACKEND` environment variable.
971///
972/// When set, the [`ImageProcessor`] only initializes and dispatches to the
973/// selected backend — no fallback chain is used.
974#[derive(Debug, Clone, Copy, PartialEq, Eq)]
975pub(crate) enum ForcedBackend {
976    Cpu,
977    G2d,
978    OpenGl,
979}
980
981/// Image converter that uses available hardware acceleration or CPU as a
982/// fallback.
983#[derive(Debug)]
984pub struct ImageProcessor {
985    /// CPU-based image converter as a fallback. This is only None if the
986    /// EDGEFIRST_DISABLE_CPU environment variable is set.
987    pub cpu: Option<CPUProcessor>,
988
989    #[cfg(target_os = "linux")]
990    /// G2D-based image converter for Linux systems. This is only available if
991    /// the EDGEFIRST_DISABLE_G2D environment variable is not set and libg2d.so
992    /// is available.
993    pub g2d: Option<G2DProcessor>,
994    #[cfg(target_os = "linux")]
995    #[cfg(feature = "opengl")]
996    /// OpenGL-based image converter for Linux systems. This is only available
997    /// if the EDGEFIRST_DISABLE_GL environment variable is not set and OpenGL
998    /// ES is available.
999    pub opengl: Option<GLProcessorThreaded>,
1000
1001    /// When set, only the specified backend is used — no fallback chain.
1002    pub(crate) forced_backend: Option<ForcedBackend>,
1003}
1004
1005unsafe impl Send for ImageProcessor {}
1006unsafe impl Sync for ImageProcessor {}
1007
1008impl ImageProcessor {
1009    /// Creates a new `ImageProcessor` instance, initializing available
1010    /// hardware converters based on the system capabilities and environment
1011    /// variables.
1012    ///
1013    /// # Examples
1014    /// ```rust,no_run
1015    /// # use edgefirst_image::{ImageProcessor, Rotation, Flip, Crop, ImageProcessorTrait, load_image};
1016    /// # use edgefirst_tensor::{PixelFormat, DType, TensorDyn};
1017    /// # fn main() -> Result<(), edgefirst_image::Error> {
1018    /// let image = std::fs::read("zidane.jpg")?;
1019    /// let src = load_image(&image, Some(PixelFormat::Rgba), None)?;
1020    /// let mut converter = ImageProcessor::new()?;
1021    /// let mut dst = converter.create_image(640, 480, PixelFormat::Rgb, DType::U8, None)?;
1022    /// converter.convert(&src, &mut dst, Rotation::None, Flip::None, Crop::default())?;
1023    /// # Ok(())
1024    /// # }
1025    /// ```
1026    pub fn new() -> Result<Self> {
1027        Self::with_config(ImageProcessorConfig::default())
1028    }
1029
1030    /// Creates a new `ImageProcessor` with the given configuration.
1031    ///
1032    /// When [`ImageProcessorConfig::backend`] is set to a specific backend,
1033    /// environment variables are ignored and the processor initializes the
1034    /// requested backend plus CPU as a fallback.
1035    ///
1036    /// When `Auto`, the existing `EDGEFIRST_FORCE_BACKEND` and
1037    /// `EDGEFIRST_DISABLE_*` environment variables apply.
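    ///
    /// # Examples
    ///
    /// A sketch of pinning the processor to the CPU backend (`no_run`, as
    /// construction probes the host system):
    ///
    /// ```rust,no_run
    /// # use edgefirst_image::{ComputeBackend, ImageProcessor, ImageProcessorConfig};
    /// # fn main() -> Result<(), edgefirst_image::Error> {
    /// let config = ImageProcessorConfig {
    ///     backend: ComputeBackend::Cpu,
    ///     ..Default::default()
    /// };
    /// let processor = ImageProcessor::with_config(config)?;
    /// assert!(processor.cpu.is_some());
    /// # Ok(())
    /// # }
    /// ```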
1038    #[allow(unused_variables)]
1039    pub fn with_config(config: ImageProcessorConfig) -> Result<Self> {
1040        // ── Config-driven backend selection ──────────────────────────
1041        // When the caller explicitly requests a backend via the config,
1042        // skip all environment variable logic.
1043        match config.backend {
1044            ComputeBackend::Cpu => {
1045                log::info!("ComputeBackend::Cpu — CPU only");
1046                return Ok(Self {
1047                    cpu: Some(CPUProcessor::new()),
1048                    #[cfg(target_os = "linux")]
1049                    g2d: None,
1050                    #[cfg(target_os = "linux")]
1051                    #[cfg(feature = "opengl")]
1052                    opengl: None,
1053                    forced_backend: None,
1054                });
1055            }
1056            ComputeBackend::G2d => {
1057                log::info!("ComputeBackend::G2d — G2D + CPU fallback");
1058                #[cfg(target_os = "linux")]
1059                {
1060                    let g2d = match G2DProcessor::new() {
1061                        Ok(g) => Some(g),
1062                        Err(e) => {
1063                            log::warn!("G2D requested but failed to initialize: {e:?}");
1064                            None
1065                        }
1066                    };
1067                    return Ok(Self {
1068                        cpu: Some(CPUProcessor::new()),
1069                        g2d,
1070                        #[cfg(feature = "opengl")]
1071                        opengl: None,
1072                        forced_backend: None,
1073                    });
1074                }
1075                #[cfg(not(target_os = "linux"))]
1076                {
1077                    log::warn!("G2D requested but not available on this platform, using CPU");
1078                    return Ok(Self {
1079                        cpu: Some(CPUProcessor::new()),
1080                        forced_backend: None,
1081                    });
1082                }
1083            }
1084            ComputeBackend::OpenGl => {
1085                log::info!("ComputeBackend::OpenGl — OpenGL + CPU fallback");
1086                #[cfg(target_os = "linux")]
1087                {
1088                    #[cfg(feature = "opengl")]
1089                    let opengl = match GLProcessorThreaded::new(config.egl_display) {
1090                        Ok(gl) => Some(gl),
1091                        Err(e) => {
1092                            log::warn!("OpenGL requested but failed to initialize: {e:?}");
1093                            None
1094                        }
1095                    };
1096                    return Ok(Self {
1097                        cpu: Some(CPUProcessor::new()),
1098                        g2d: None,
1099                        #[cfg(feature = "opengl")]
1100                        opengl,
1101                        forced_backend: None,
1102                    });
1103                }
1104                #[cfg(not(target_os = "linux"))]
1105                {
1106                    log::warn!("OpenGL requested but not available on this platform, using CPU");
1107                    return Ok(Self {
1108                        cpu: Some(CPUProcessor::new()),
1109                        forced_backend: None,
1110                    });
1111                }
1112            }
1113            ComputeBackend::Auto => { /* fall through to env-var logic below */ }
1114        }
1115
1116        // ── EDGEFIRST_FORCE_BACKEND ──────────────────────────────────
1117        // When set, only the requested backend is initialised and no
1118        // fallback chain is used. Accepted values (case-insensitive):
1119        //   "cpu", "g2d", "opengl"
1120        if let Ok(val) = std::env::var("EDGEFIRST_FORCE_BACKEND") {
1121            let val_lower = val.to_lowercase();
1122            let forced = match val_lower.as_str() {
1123                "cpu" => ForcedBackend::Cpu,
1124                "g2d" => ForcedBackend::G2d,
1125                "opengl" => ForcedBackend::OpenGl,
1126                other => {
1127                    return Err(Error::ForcedBackendUnavailable(format!(
1128                        "unknown EDGEFIRST_FORCE_BACKEND value: {other:?} (expected cpu, g2d, or opengl)"
1129                    )));
1130                }
1131            };
1132
1133            log::info!("EDGEFIRST_FORCE_BACKEND={val} — only initializing {val_lower} backend");
1134
1135            return match forced {
1136                ForcedBackend::Cpu => Ok(Self {
1137                    cpu: Some(CPUProcessor::new()),
1138                    #[cfg(target_os = "linux")]
1139                    g2d: None,
1140                    #[cfg(target_os = "linux")]
1141                    #[cfg(feature = "opengl")]
1142                    opengl: None,
1143                    forced_backend: Some(ForcedBackend::Cpu),
1144                }),
1145                ForcedBackend::G2d => {
1146                    #[cfg(target_os = "linux")]
1147                    {
1148                        let g2d = G2DProcessor::new().map_err(|e| {
1149                            Error::ForcedBackendUnavailable(format!(
1150                                "g2d forced but failed to initialize: {e:?}"
1151                            ))
1152                        })?;
1153                        Ok(Self {
1154                            cpu: None,
1155                            g2d: Some(g2d),
1156                            #[cfg(feature = "opengl")]
1157                            opengl: None,
1158                            forced_backend: Some(ForcedBackend::G2d),
1159                        })
1160                    }
1161                    #[cfg(not(target_os = "linux"))]
1162                    {
1163                        Err(Error::ForcedBackendUnavailable(
1164                            "g2d backend is only available on Linux".into(),
1165                        ))
1166                    }
1167                }
1168                ForcedBackend::OpenGl => {
1169                    #[cfg(target_os = "linux")]
1170                    #[cfg(feature = "opengl")]
1171                    {
1172                        let opengl = GLProcessorThreaded::new(config.egl_display).map_err(|e| {
1173                            Error::ForcedBackendUnavailable(format!(
1174                                "opengl forced but failed to initialize: {e:?}"
1175                            ))
1176                        })?;
1177                        Ok(Self {
1178                            cpu: None,
1179                            g2d: None,
1180                            opengl: Some(opengl),
1181                            forced_backend: Some(ForcedBackend::OpenGl),
1182                        })
1183                    }
1184                    #[cfg(not(all(target_os = "linux", feature = "opengl")))]
1185                    {
1186                        Err(Error::ForcedBackendUnavailable(
1187                            "opengl backend requires Linux with the 'opengl' feature enabled"
1188                                .into(),
1189                        ))
1190                    }
1191                }
1192            };
1193        }
1194
1195        // ── Existing DISABLE logic (unchanged) ──────────────────────
1196        #[cfg(target_os = "linux")]
1197        let g2d = if std::env::var("EDGEFIRST_DISABLE_G2D")
1198            .map(|x| x != "0" && x.to_lowercase() != "false")
1199            .unwrap_or(false)
1200        {
1201            log::debug!("EDGEFIRST_DISABLE_G2D is set");
1202            None
1203        } else {
1204            match G2DProcessor::new() {
1205                Ok(g2d_converter) => Some(g2d_converter),
1206                Err(err) => {
1207                    log::warn!("Failed to initialize G2D converter: {err:?}");
1208                    None
1209                }
1210            }
1211        };
1212
1213        #[cfg(target_os = "linux")]
1214        #[cfg(feature = "opengl")]
1215        let opengl = if std::env::var("EDGEFIRST_DISABLE_GL")
1216            .map(|x| x != "0" && x.to_lowercase() != "false")
1217            .unwrap_or(false)
1218        {
1219            log::debug!("EDGEFIRST_DISABLE_GL is set");
1220            None
1221        } else {
1222            match GLProcessorThreaded::new(config.egl_display) {
1223                Ok(gl_converter) => Some(gl_converter),
1224                Err(err) => {
1225                    log::warn!("Failed to initialize GL converter: {err:?}");
1226                    None
1227                }
1228            }
1229        };
1230
1231        let cpu = if std::env::var("EDGEFIRST_DISABLE_CPU")
1232            .map(|x| x != "0" && x.to_lowercase() != "false")
1233            .unwrap_or(false)
1234        {
1235            log::debug!("EDGEFIRST_DISABLE_CPU is set");
1236            None
1237        } else {
1238            Some(CPUProcessor::new())
1239        };
1240        Ok(Self {
1241            cpu,
1242            #[cfg(target_os = "linux")]
1243            g2d,
1244            #[cfg(target_os = "linux")]
1245            #[cfg(feature = "opengl")]
1246            opengl,
1247            forced_backend: None,
1248        })
1249    }
1250
1251    /// Sets the interpolation mode for int8 proto textures on the OpenGL
1252    /// backend. No-op if OpenGL is not available.
1253    #[cfg(target_os = "linux")]
1254    #[cfg(feature = "opengl")]
1255    pub fn set_int8_interpolation_mode(&mut self, mode: Int8InterpolationMode) -> Result<()> {
1256        if let Some(ref mut gl) = self.opengl {
1257            gl.set_int8_interpolation_mode(mode)?;
1258        }
1259        Ok(())
1260    }
1261
1262    /// Create a [`TensorDyn`] image with the best available memory backend.
1263    ///
1264    /// Priority: DMA-buf → PBO (byte-sized types: u8, i8) → system memory.
1265    ///
1266    /// Use this method instead of [`TensorDyn::image()`] when the tensor will
1267    /// be used with [`ImageProcessor::convert()`]. It selects the optimal
1268    /// memory backing (including PBO for GPU zero-copy) which direct
1269    /// allocation cannot achieve.
1270    ///
1271    /// This method is on [`ImageProcessor`] rather than [`ImageProcessorTrait`]
1272    /// because optimal allocation requires knowledge of the active compute
1273    /// backends (e.g. the GL context handle for PBO allocation). Individual
1274    /// backend implementations ([`CPUProcessor`], etc.) do not have this
1275    /// cross-backend visibility.
1276    ///
1277    /// # Arguments
1278    ///
1279    /// * `width` - Image width in pixels
1280    /// * `height` - Image height in pixels
1281    /// * `format` - Pixel format
1282    /// * `dtype` - Element data type (e.g. `DType::U8`, `DType::I8`)
1283    /// * `memory` - Optional memory type override; when `None`, the best
1284    ///   available backend is selected automatically.
1285    ///
1286    /// # Returns
1287    ///
1288    /// A [`TensorDyn`] backed by the highest-performance memory type
1289    /// available on this system.
1290    ///
1291    /// # Pitch alignment for DMA-backed allocations
1292    ///
1293    /// DMA-BUF imports into the GL backend (Mali Valhall on i.MX 95
1294    /// specifically) require every row pitch to be a multiple of
1295    /// [`GPU_DMA_BUF_PITCH_ALIGNMENT_BYTES`] (currently 64). When this
1296    /// method lands on `TensorMemory::Dma`, the underlying allocation is
1297    /// silently padded so the row stride satisfies that requirement.
1298    ///
1299    /// **The user-requested `width` is preserved** — `tensor.width()`
1300    /// returns the same value you passed in. The padding is carried by
1301    /// [`TensorDyn::row_stride`] / `effective_row_stride()`, which the
1302    /// GL backend reads when importing the buffer as an EGLImage.
1303    /// Callers that compute byte offsets from the tensor must use the
1304    /// stride, not `width × bytes_per_pixel`; the CPU mapping spans the
1305    /// full `stride × height` bytes.
1306    ///
1307    /// Pre-aligned widths (640, 1280, 1920, 3008, 3840 …) allocate
1308    /// exactly `width × bpp × height` bytes with no padding. PBO and
1309    /// Mem fallbacks never pad — they don't go through EGLImage import.
1310    ///
1311    /// See also [`align_width_for_gpu_pitch`] for an advisory helper
1312    /// that external callers (GStreamer plugins, video pipelines) can
1313    /// use to size their own DMA-BUFs for GL compatibility.
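    ///
    /// The padding arithmetic, in essence (illustrative only; the crate's
    /// own helpers are `align_pitch_bytes_to_gpu_alignment` and
    /// [`align_width_for_gpu_pitch`]):
    ///
    /// ```rust
    /// // Round a natural row pitch up to the next 64-byte multiple.
    /// const ALIGN: usize = 64;
    /// let align = |pitch: usize| pitch.div_ceil(ALIGN) * ALIGN;
    /// assert_eq!(align(1000 * 4), 4032); // RGBA8 at width 1000: padded
    /// assert_eq!(align(1920 * 4), 7680); // width 1920: already aligned
    /// ```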
1314    ///
1315    /// # Errors
1316    ///
1317    /// Returns an error if all allocation strategies fail.
1318    pub fn create_image(
1319        &self,
1320        width: usize,
1321        height: usize,
1322        format: PixelFormat,
1323        dtype: DType,
1324        memory: Option<TensorMemory>,
1325    ) -> Result<TensorDyn> {
1326        // Compute the GPU-aligned row stride in bytes for this image.
1327        // `None` means either the format has no defined primary-plane bpp
1328        // (unknown future layout) or the stride calculation would overflow
1329        // — in both cases we fall back to the natural layout via the plain
1330        // `TensorDyn::image` constructor, and the slow-path warning inside
1331        // `draw_*_masks` will fire if the subsequent GL import fails.
1332        //
1333        // DMA allocation is Linux-only (see `TensorMemory::Dma` cfg gate),
1334        // so both the stride computation and the helper closure are gated
1335        // accordingly — the callers below are already Linux-only.
1336        #[cfg(target_os = "linux")]
1337        let dma_stride_bytes: Option<usize> = primary_plane_bpp(format, dtype.size())
1338            .and_then(|bpp| width.checked_mul(bpp))
1339            .and_then(align_pitch_bytes_to_gpu_alignment);
1340
1341        // Helper: allocate a DMA image, using the padded-stride constructor
1342        // when the computed stride exceeds the natural pitch, otherwise the
1343        // plain constructor (byte-identical result in the common case).
1344        #[cfg(target_os = "linux")]
1345        let try_dma = || -> Result<TensorDyn> {
1346            // Stride padding is only meaningful for packed pixel layouts
1347            // (RGBA8, BGRA8, RGB888, Grey) — the formats the GL backend
1348            // renders into. Semi-planar (NV12, NV16) and planar (PlanarRgb,
1349            // PlanarRgba) tensors go through `TensorDyn::image(...)` with
1350            // their natural layout; they're imported from camera capture
1351            // via `from_fd` far more often than allocated here, and
1352            // `Tensor::image_with_stride` explicitly rejects them.
1353            let packed = format.layout() == edgefirst_tensor::PixelLayout::Packed;
1354            match dma_stride_bytes {
1355                Some(stride)
1356                    if packed
1357                        && primary_plane_bpp(format, dtype.size())
1358                            .and_then(|bpp| width.checked_mul(bpp))
1359                            .is_some_and(|natural| stride > natural) =>
1360                {
1361                    log::debug!(
1362                        "create_image: padding row stride for {format:?} {width}x{height} \
1363                         from natural pitch to {stride} bytes for GPU alignment"
1364                    );
1365                    Ok(TensorDyn::image_with_stride(
1366                        width,
1367                        height,
1368                        format,
1369                        dtype,
1370                        stride,
1371                        Some(edgefirst_tensor::TensorMemory::Dma),
1372                    )?)
1373                }
1374                _ => Ok(TensorDyn::image(
1375                    width,
1376                    height,
1377                    format,
1378                    dtype,
1379                    Some(edgefirst_tensor::TensorMemory::Dma),
1380                )?),
1381            }
1382        };
1383
1384        // If an explicit memory type is requested, honour it directly.
1385        // On Linux, `TensorMemory::Dma` gets the padded-stride treatment;
1386        // other memory types take the user-requested width verbatim.
1387        match memory {
1388            #[cfg(target_os = "linux")]
1389            Some(TensorMemory::Dma) => {
1390                return try_dma();
1391            }
1392            Some(mem) => {
1393                return Ok(TensorDyn::image(width, height, format, dtype, Some(mem))?);
1394            }
1395            None => {}
1396        }
1397
1398        // Try DMA first on Linux — skip only when GL has explicitly selected PBO
1399        // as the preferred transfer path (PBO is better than DMA in that case).
1400        #[cfg(target_os = "linux")]
1401        {
1402            #[cfg(feature = "opengl")]
1403            let gl_uses_pbo = self
1404                .opengl
1405                .as_ref()
1406                .is_some_and(|gl| gl.transfer_backend() == opengl_headless::TransferBackend::Pbo);
1407            #[cfg(not(feature = "opengl"))]
1408            let gl_uses_pbo = false;
1409
1410            if !gl_uses_pbo {
1411                if let Ok(img) = try_dma() {
1412                    return Ok(img);
1413                }
1414            }
1415        }
1416
1417        // Try PBO (if GL available).
1418        // PBO buffers are u8-sized; the int8 shader emulates i8 output via
1419        // XOR 0x80 on the same underlying buffer, so both U8 and I8 work.
1420        #[cfg(target_os = "linux")]
1421        #[cfg(feature = "opengl")]
1422        if dtype.size() == 1 {
1423            if let Some(gl) = &self.opengl {
1424                match gl.create_pbo_image(width, height, format) {
1425                    Ok(t) => {
1426                        if dtype == DType::I8 {
1427                            // SAFETY: Tensor<u8> and Tensor<i8> are layout-
1428                            // identical (same element size, no T-dependent
1429                            // drop glue). The int8 shader applies XOR 0x80
1430                            // on the same PBO buffer. Same rationale as
1431                            // gl::processor::tensor_i8_as_u8_mut.
1432                            // Invariant: PBO tensors never have chroma
1433                            // (create_pbo_image → Tensor::wrap sets it None).
1434                            debug_assert!(
1435                                t.chroma().is_none(),
1436                                "PBO i8 transmute requires chroma == None"
1437                            );
1438                            let t_i8: Tensor<i8> = unsafe { std::mem::transmute(t) };
1439                            return Ok(TensorDyn::from(t_i8));
1440                        }
1441                        return Ok(TensorDyn::from(t));
1442                    }
1443                    Err(e) => log::debug!("PBO image creation failed, falling back to Mem: {e:?}"),
1444                }
1445            }
1446        }
1447
1448        // Fallback to Mem
1449        Ok(TensorDyn::image(
1450            width,
1451            height,
1452            format,
1453            dtype,
1454            Some(edgefirst_tensor::TensorMemory::Mem),
1455        )?)
1456    }
1457
1458    /// Import an external DMA-BUF image.
1459    ///
1460    /// Each [`PlaneDescriptor`] owns an already-duped fd; this method
1461    /// consumes the descriptors and takes ownership of those fds (whether
1462    /// the call succeeds or fails).
1463    ///
1464    /// The caller must ensure the DMA-BUF allocation is large enough for the
1465    /// specified width, height, format, and any stride/offset on the plane
1466    /// descriptors. No buffer-size validation is performed; an undersized
1467    /// buffer may cause GPU faults or EGL import failure.
1468    ///
1469    /// # Arguments
1470    ///
1471    /// * `image` - Plane descriptor for the primary (or only) plane
1472    /// * `chroma` - Optional plane descriptor for the UV chroma plane
1473    ///   (required for multiplane NV12)
1474    /// * `width` - Image width in pixels
1475    /// * `height` - Image height in pixels
1476    /// * `format` - Pixel format of the buffer
1477    /// * `dtype` - Element data type (e.g. `DType::U8`)
1478    ///
1479    /// # Returns
1480    ///
1481    /// A `TensorDyn` configured as an image.
1482    ///
1483    /// # Errors
1484    ///
    /// * [`Error::NotSupported`] if `chroma` is `Some` for a non-semi-planar
    ///   format, if the format is multiplane NV16 (not yet supported), or if
    ///   the fd is not DMA-backed
1488    /// * [`Error::InvalidShape`] if NV12 height is odd
1489    ///
1490    /// # Platform
1491    ///
1492    /// Linux only.
1493    ///
1494    /// # Examples
1495    ///
1496    /// ```rust,ignore
1497    /// use edgefirst_tensor::PlaneDescriptor;
1498    ///
1499    /// // Single-plane RGBA
1500    /// let pd = PlaneDescriptor::new(fd.as_fd())?;
1501    /// let src = proc.import_image(pd, None, 1920, 1080, PixelFormat::Rgba, DType::U8)?;
1502    ///
1503    /// // Multi-plane NV12 with stride
1504    /// let y_pd = PlaneDescriptor::new(y_fd.as_fd())?.with_stride(2048);
1505    /// let uv_pd = PlaneDescriptor::new(uv_fd.as_fd())?.with_stride(2048);
1506    /// let src = proc.import_image(y_pd, Some(uv_pd), 1920, 1080,
1507    ///                             PixelFormat::Nv12, DType::U8)?;
1508    /// ```
1509    #[cfg(target_os = "linux")]
1510    pub fn import_image(
1511        &self,
1512        image: edgefirst_tensor::PlaneDescriptor,
1513        chroma: Option<edgefirst_tensor::PlaneDescriptor>,
1514        width: usize,
1515        height: usize,
1516        format: PixelFormat,
1517        dtype: DType,
1518    ) -> Result<TensorDyn> {
1519        use edgefirst_tensor::{Tensor, TensorMemory};
1520
1521        // Capture stride/offset from descriptors before consuming them
1522        let image_stride = image.stride();
1523        let image_offset = image.offset();
1524        let chroma_stride = chroma.as_ref().and_then(|c| c.stride());
1525        let chroma_offset = chroma.as_ref().and_then(|c| c.offset());
1526
1527        if let Some(chroma_pd) = chroma {
1528            // ── Multiplane path ──────────────────────────────────────
1529            // Multiplane tensors are backed by Tensor<u8> (or transmuted to
1530            // Tensor<i8>). Reject other dtypes to avoid silently returning a
1531            // tensor with the wrong element type.
1532            if dtype != DType::U8 && dtype != DType::I8 {
1533                return Err(Error::NotSupported(format!(
1534                    "multiplane import only supports U8/I8, got {dtype:?}"
1535                )));
1536            }
1537            if format.layout() != PixelLayout::SemiPlanar {
1538                return Err(Error::NotSupported(format!(
1539                    "import_image with chroma requires a semi-planar format, got {format:?}"
1540                )));
1541            }
1542
1543            let chroma_h = match format {
1544                PixelFormat::Nv12 => {
1545                    if !height.is_multiple_of(2) {
1546                        return Err(Error::InvalidShape(format!(
1547                            "NV12 requires even height, got {height}"
1548                        )));
1549                    }
1550                    height / 2
1551                }
1552                // NV16 multiplane will be supported in a future release;
1553                // the GL backend currently only handles NV12 plane1 attributes.
1554                PixelFormat::Nv16 => {
1555                    return Err(Error::NotSupported(
1556                        "multiplane NV16 is not yet supported; use contiguous NV16 instead".into(),
1557                    ))
1558                }
1559                _ => {
1560                    return Err(Error::NotSupported(format!(
1561                        "unsupported semi-planar format: {format:?}"
1562                    )))
1563                }
1564            };
1565
1566            let luma = Tensor::<u8>::from_fd(image.into_fd(), &[height, width], Some("luma"))?;
1567            if luma.memory() != TensorMemory::Dma {
1568                return Err(Error::NotSupported(format!(
1569                    "luma fd must be DMA-backed, got {:?}",
1570                    luma.memory()
1571                )));
1572            }
1573
1574            let chroma_tensor =
1575                Tensor::<u8>::from_fd(chroma_pd.into_fd(), &[chroma_h, width], Some("chroma"))?;
1576            if chroma_tensor.memory() != TensorMemory::Dma {
1577                return Err(Error::NotSupported(format!(
1578                    "chroma fd must be DMA-backed, got {:?}",
1579                    chroma_tensor.memory()
1580                )));
1581            }
1582
1583            // from_planes creates the combined tensor with format set,
1584            // preserving luma's row_stride (currently None since luma was raw).
1585            let mut tensor = Tensor::<u8>::from_planes(luma, chroma_tensor, format)?;
1586
1587            // Apply stride/offset to the combined tensor (luma plane)
1588            if let Some(s) = image_stride {
1589                tensor.set_row_stride(s)?;
1590            }
1591            if let Some(o) = image_offset {
1592                tensor.set_plane_offset(o);
1593            }
1594
1595            // Apply stride/offset to the chroma sub-tensor.
1596            // The chroma tensor is a raw 2D [chroma_h, width] tensor without
1597            // format metadata, so we validate stride manually rather than
1598            // using set_row_stride (which requires format).
1599            if let Some(chroma_ref) = tensor.chroma_mut() {
1600                if let Some(s) = chroma_stride {
1601                    if s < width {
1602                        return Err(Error::InvalidShape(format!(
1603                            "chroma stride {s} < minimum {width} for {format:?}"
1604                        )));
1605                    }
1606                    chroma_ref.set_row_stride_unchecked(s);
1607                }
1608                if let Some(o) = chroma_offset {
1609                    chroma_ref.set_plane_offset(o);
1610                }
1611            }
1612
1613            if dtype == DType::I8 {
1614                // SAFETY: Tensor<u8> and Tensor<i8> have identical layout because
1615                // the struct contains only type-erased storage (OwnedFd, shape, name),
1616                // no inline T values. This assertion catches layout drift at compile time.
1617                const {
1618                    assert!(std::mem::size_of::<Tensor<u8>>() == std::mem::size_of::<Tensor<i8>>());
1619                    assert!(
1620                        std::mem::align_of::<Tensor<u8>>() == std::mem::align_of::<Tensor<i8>>()
1621                    );
1622                }
1623                let tensor_i8: Tensor<i8> = unsafe { std::mem::transmute(tensor) };
1624                return Ok(TensorDyn::from(tensor_i8));
1625            }
1626            Ok(TensorDyn::from(tensor))
1627        } else {
1628            // ── Single-plane path ────────────────────────────────────
1629            let shape = match format.layout() {
1630                PixelLayout::Packed => vec![height, width, format.channels()],
1631                PixelLayout::Planar => vec![format.channels(), height, width],
1632                PixelLayout::SemiPlanar => {
1633                    let total_h = match format {
1634                        PixelFormat::Nv12 => {
1635                            if !height.is_multiple_of(2) {
1636                                return Err(Error::InvalidShape(format!(
1637                                    "NV12 requires even height, got {height}"
1638                                )));
1639                            }
1640                            height * 3 / 2
1641                        }
1642                        PixelFormat::Nv16 => height * 2,
1643                        _ => {
1644                            return Err(Error::InvalidShape(format!(
1645                                "unknown semi-planar height multiplier for {format:?}"
1646                            )))
1647                        }
1648                    };
1649                    vec![total_h, width]
1650                }
1651                _ => {
1652                    return Err(Error::NotSupported(format!(
1653                        "unsupported pixel layout for import_image: {:?}",
1654                        format.layout()
1655                    )));
1656                }
1657            };
1658            let tensor = TensorDyn::from_fd(image.into_fd(), &shape, dtype, None)?;
1659            if tensor.memory() != TensorMemory::Dma {
1660                return Err(Error::NotSupported(format!(
1661                    "import_image requires DMA-backed fd, got {:?}",
1662                    tensor.memory()
1663                )));
1664            }
1665            let mut tensor = tensor.with_format(format)?;
1666            if let Some(s) = image_stride {
1667                tensor.set_row_stride(s)?;
1668            }
1669            if let Some(o) = image_offset {
1670                tensor.set_plane_offset(o);
1671            }
1672            Ok(tensor)
1673        }
1674    }
1675
1676    /// Decode model outputs and draw segmentation masks onto `dst`.
1677    ///
1678    /// This is the primary mask rendering API. The processor decodes via the
1679    /// provided [`Decoder`], selects the optimal rendering path (hybrid
1680    /// CPU+GL or fused GPU), and composites masks onto `dst`.
1681    ///
1682    /// Returns the detected bounding boxes.
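    ///
    /// # Examples
    ///
    /// A minimal sketch; the bindings (`proc`, `decoder`, `model_outputs`,
    /// `dst`, `overlay`) are assumed to be prepared elsewhere and are
    /// model-specific:
    ///
    /// ```rust,ignore
    /// let outputs: Vec<&TensorDyn> = model_outputs.iter().collect();
    /// let boxes = proc.draw_masks(&decoder, &outputs, &mut dst, overlay)?;
    /// log::info!("rendered masks for {} detections", boxes.len());
    /// ```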
1683    pub fn draw_masks(
1684        &mut self,
1685        decoder: &edgefirst_decoder::Decoder,
1686        outputs: &[&TensorDyn],
1687        dst: &mut TensorDyn,
1688        overlay: MaskOverlay<'_>,
1689    ) -> Result<Vec<DetectBox>> {
1690        let mut output_boxes = Vec::with_capacity(100);
1691
1692        // Try proto path first (fused rendering without materializing masks)
1693        let proto_result = decoder
1694            .decode_proto(outputs, &mut output_boxes)
1695            .map_err(|e| Error::Internal(format!("decode_proto: {e:#?}")))?;
1696
1697        if let Some(proto_data) = proto_result {
1698            self.draw_proto_masks(dst, &output_boxes, &proto_data, overlay)?;
1699        } else {
1700            // Detection-only or unsupported model: full decode + render
1701            let mut output_masks = Vec::with_capacity(100);
1702            decoder
1703                .decode(outputs, &mut output_boxes, &mut output_masks)
1704                .map_err(|e| Error::Internal(format!("decode: {e:#?}")))?;
1705            self.draw_decoded_masks(dst, &output_boxes, &output_masks, overlay)?;
1706        }
1707        Ok(output_boxes)
1708    }
1709
1710    /// Decode tracked model outputs and draw segmentation masks onto `dst`.
1711    ///
1712    /// Like [`draw_masks`](Self::draw_masks) but integrates a tracker for
1713    /// maintaining object identities across frames. The tracker runs after
1714    /// NMS but before mask extraction.
1715    ///
1716    /// Returns detected boxes and track info.
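    ///
    /// # Examples
    ///
    /// A minimal sketch; the tracker implementation, the `timestamp` units,
    /// and the surrounding bindings (`proc`, `decoder`, `outputs`, `dst`,
    /// `overlay`) are assumptions:
    ///
    /// ```rust,ignore
    /// let (boxes, tracks) = proc.draw_masks_tracked(
    ///     &decoder, &mut tracker, timestamp, &outputs, &mut dst, overlay)?;
    /// log::debug!("{} boxes, {} tracks", boxes.len(), tracks.len());
    /// ```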
1717    #[cfg(feature = "tracker")]
1718    pub fn draw_masks_tracked<TR: edgefirst_tracker::Tracker<DetectBox>>(
1719        &mut self,
1720        decoder: &edgefirst_decoder::Decoder,
1721        tracker: &mut TR,
1722        timestamp: u64,
1723        outputs: &[&TensorDyn],
1724        dst: &mut TensorDyn,
1725        overlay: MaskOverlay<'_>,
1726    ) -> Result<(Vec<DetectBox>, Vec<edgefirst_tracker::TrackInfo>)> {
1727        let mut output_boxes = Vec::with_capacity(100);
1728        let mut output_tracks = Vec::new();
1729
1730        let proto_result = decoder
1731            .decode_proto_tracked(
1732                tracker,
1733                timestamp,
1734                outputs,
1735                &mut output_boxes,
1736                &mut output_tracks,
1737            )
1738            .map_err(|e| Error::Internal(format!("decode_proto_tracked: {e:#?}")))?;
1739
1740        if let Some(proto_data) = proto_result {
1741            self.draw_proto_masks(dst, &output_boxes, &proto_data, overlay)?;
1742        } else {
            // Note: decode_proto_tracked returns None for detection-only/ModelPack
            // models WITHOUT calling the tracker, so the decode_tracked call
            // below is the first (and only) tracker call for those model types.
1746            let mut output_masks = Vec::with_capacity(100);
1747            decoder
1748                .decode_tracked(
1749                    tracker,
1750                    timestamp,
1751                    outputs,
1752                    &mut output_boxes,
1753                    &mut output_masks,
1754                    &mut output_tracks,
1755                )
1756                .map_err(|e| Error::Internal(format!("decode_tracked: {e:#?}")))?;
1757            self.draw_decoded_masks(dst, &output_boxes, &output_masks, overlay)?;
1758        }
1759        Ok((output_boxes, output_tracks))
1760    }
1761
1762    /// Materialize per-instance segmentation masks from raw prototype data.
1763    ///
1764    /// Computes `mask_coeff @ protos` with sigmoid activation for each detection,
1765    /// producing compact masks at prototype resolution (e.g., 160×160 crops).
1766    /// Mask values are continuous sigmoid confidence outputs quantized to u8
1767    /// (0 = background, 255 = full confidence), NOT binary thresholded.
1768    ///
1769    /// The returned [`Vec<Segmentation>`] can be:
1770    /// - Inspected or exported for analytics, IoU computation, etc.
1771    /// - Passed directly to [`ImageProcessorTrait::draw_decoded_masks`] for
1772    ///   GPU-interpolated rendering.
1773    ///
1774    /// # Performance Note
1775    ///
1776    /// Calling `materialize_masks` + `draw_decoded_masks` separately prevents
1777    /// the HAL from using its internal fused optimization path. For render-only
1778    /// use cases, prefer [`ImageProcessorTrait::draw_proto_masks`] which selects
1779    /// the fastest path automatically (currently 1.6×–27× faster on tested
1780    /// platforms). Use this method when you need access to the intermediate masks.
1781    ///
1782    /// # Errors
1783    ///
1784    /// Returns [`Error::NoConverter`] if the CPU backend is not available.
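    ///
    /// # Examples
    ///
    /// A minimal sketch; `proc`, `detect`, `proto_data`, and `letterbox` come
    /// from a prior decode and are assumed to be in scope:
    ///
    /// ```rust,ignore
    /// // Compact masks at prototype resolution (e.g. 160×160 crops).
    /// let segs = proc.materialize_masks(&detect, &proto_data, letterbox,
    ///                                   MaskResolution::Proto)?;
    ///
    /// // Or scaled to the output image size for direct inspection.
    /// let segs = proc.materialize_masks(&detect, &proto_data, letterbox,
    ///     MaskResolution::Scaled { width: 640, height: 480 })?;
    /// ```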
1785    pub fn materialize_masks(
1786        &mut self,
1787        detect: &[DetectBox],
1788        proto_data: &ProtoData,
1789        letterbox: Option<[f32; 4]>,
1790        resolution: MaskResolution,
1791    ) -> Result<Vec<Segmentation>> {
1792        let cpu = self.cpu.as_mut().ok_or(Error::NoConverter)?;
1793        match resolution {
1794            MaskResolution::Proto => cpu.materialize_segmentations(detect, proto_data, letterbox),
1795            MaskResolution::Scaled { width, height } => {
1796                cpu.materialize_scaled_segmentations(detect, proto_data, letterbox, width, height)
1797            }
1798        }
1799    }
1800}
1801
1802impl ImageProcessorTrait for ImageProcessor {
    /// Converts the source image to the destination image format and size.
    /// The image is cropped first, then flipped, then rotated.
    ///
    /// Prefers hardware accelerators when available, falling back to CPU if
    /// necessary.
1808    fn convert(
1809        &mut self,
1810        src: &TensorDyn,
1811        dst: &mut TensorDyn,
1812        rotation: Rotation,
1813        flip: Flip,
1814        crop: Crop,
1815    ) -> Result<()> {
1816        let start = Instant::now();
1817        let src_fmt = src.format();
1818        let dst_fmt = dst.format();
1819        let _span = tracing::trace_span!(
1820            "image_convert",
1821            ?src_fmt,
1822            ?dst_fmt,
1823            src_memory = ?src.memory(),
1824            dst_memory = ?dst.memory(),
1825            ?rotation,
1826            ?flip,
1827        )
1828        .entered();
1829        log::trace!(
1830            "convert: {src_fmt:?}({:?}/{:?}) → {dst_fmt:?}({:?}/{:?}), \
1831             rotation={rotation:?}, flip={flip:?}, backend={:?}",
1832            src.dtype(),
1833            src.memory(),
1834            dst.dtype(),
1835            dst.memory(),
1836            self.forced_backend,
1837        );
1838
1839        // ── Forced backend: no fallback chain ────────────────────────
1840        if let Some(forced) = self.forced_backend {
1841            return match forced {
1842                ForcedBackend::Cpu => {
1843                    if let Some(cpu) = self.cpu.as_mut() {
1844                        let r = cpu.convert(src, dst, rotation, flip, crop);
1845                        log::trace!(
1846                            "convert: forced=cpu result={} ({:?})",
1847                            if r.is_ok() { "ok" } else { "err" },
1848                            start.elapsed()
1849                        );
1850                        return r;
1851                    }
1852                    Err(Error::ForcedBackendUnavailable("cpu".into()))
1853                }
1854                ForcedBackend::G2d => {
1855                    #[cfg(target_os = "linux")]
1856                    if let Some(g2d) = self.g2d.as_mut() {
1857                        let r = g2d.convert(src, dst, rotation, flip, crop);
1858                        log::trace!(
1859                            "convert: forced=g2d result={} ({:?})",
1860                            if r.is_ok() { "ok" } else { "err" },
1861                            start.elapsed()
1862                        );
1863                        return r;
1864                    }
1865                    Err(Error::ForcedBackendUnavailable("g2d".into()))
1866                }
1867                ForcedBackend::OpenGl => {
1868                    #[cfg(target_os = "linux")]
1869                    #[cfg(feature = "opengl")]
1870                    if let Some(opengl) = self.opengl.as_mut() {
1871                        let r = opengl.convert(src, dst, rotation, flip, crop);
1872                        log::trace!(
1873                            "convert: forced=opengl result={} ({:?})",
1874                            if r.is_ok() { "ok" } else { "err" },
1875                            start.elapsed()
1876                        );
1877                        return r;
1878                    }
1879                    Err(Error::ForcedBackendUnavailable("opengl".into()))
1880                }
1881            };
1882        }
1883
1884        // ── Auto fallback chain: OpenGL → G2D → CPU ──────────────────
1885        #[cfg(target_os = "linux")]
1886        #[cfg(feature = "opengl")]
1887        if let Some(opengl) = self.opengl.as_mut() {
1888            match opengl.convert(src, dst, rotation, flip, crop) {
1889                Ok(_) => {
1890                    log::trace!(
1891                        "convert: auto selected=opengl for {src_fmt:?}→{dst_fmt:?} ({:?})",
1892                        start.elapsed()
1893                    );
1894                    return Ok(());
1895                }
1896                Err(e) => {
1897                    log::trace!("convert: auto opengl declined {src_fmt:?}→{dst_fmt:?}: {e}");
1898                }
1899            }
1900        }
1901
1902        #[cfg(target_os = "linux")]
1903        if let Some(g2d) = self.g2d.as_mut() {
1904            match g2d.convert(src, dst, rotation, flip, crop) {
1905                Ok(_) => {
1906                    log::trace!(
1907                        "convert: auto selected=g2d for {src_fmt:?}→{dst_fmt:?} ({:?})",
1908                        start.elapsed()
1909                    );
1910                    return Ok(());
1911                }
1912                Err(e) => {
1913                    log::trace!("convert: auto g2d declined {src_fmt:?}→{dst_fmt:?}: {e}");
1914                }
1915            }
1916        }
1917
1918        if let Some(cpu) = self.cpu.as_mut() {
1919            match cpu.convert(src, dst, rotation, flip, crop) {
1920                Ok(_) => {
1921                    log::trace!(
1922                        "convert: auto selected=cpu for {src_fmt:?}→{dst_fmt:?} ({:?})",
1923                        start.elapsed()
1924                    );
1925                    return Ok(());
1926                }
1927                Err(e) => {
1928                    log::trace!("convert: auto cpu failed {src_fmt:?}→{dst_fmt:?}: {e}");
1929                    return Err(e);
1930                }
1931            }
1932        }
1933        Err(Error::NoConverter)
1934    }
1935
1936    fn draw_decoded_masks(
1937        &mut self,
1938        dst: &mut TensorDyn,
1939        detect: &[DetectBox],
1940        segmentation: &[Segmentation],
1941        overlay: MaskOverlay<'_>,
1942    ) -> Result<()> {
1943        let _span = tracing::trace_span!(
1944            "draw_masks",
1945            n_detections = detect.len(),
1946            n_segmentations = segmentation.len(),
1947        )
1948        .entered();
1949        let start = Instant::now();
1950
1951        if let Some(bg) = overlay.background {
1952            if bg.aliases(dst) {
1953                return Err(Error::AliasedBuffers(
1954                    "background must not reference the same buffer as dst".to_string(),
1955                ));
1956            }
1957        }
1958
1959        // Un-letterbox detect boxes and segmentation bboxes for rendering when
1960        // a letterbox was applied to prepare the model input.
1961        let lb_boxes: Vec<DetectBox>;
1962        let lb_segs: Vec<Segmentation>;
1963        let (detect, segmentation) = if let Some(lb) = overlay.letterbox {
1964            lb_boxes = detect.iter().map(|&d| unletter_bbox(d, lb)).collect();
1965            // Keep segmentation bboxes in sync with the transformed detect boxes
1966            // when we have a 1:1 correspondence (instance segmentation).
1967            lb_segs = if segmentation.len() == lb_boxes.len() {
1968                segmentation
1969                    .iter()
1970                    .zip(lb_boxes.iter())
1971                    .map(|(s, d)| Segmentation {
1972                        xmin: d.bbox.xmin,
1973                        ymin: d.bbox.ymin,
1974                        xmax: d.bbox.xmax,
1975                        ymax: d.bbox.ymax,
1976                        segmentation: s.segmentation.clone(),
1977                    })
1978                    .collect()
1979            } else {
1980                segmentation.to_vec()
1981            };
1982            (lb_boxes.as_slice(), lb_segs.as_slice())
1983        } else {
1984            (detect, segmentation)
1985        };
1986        #[cfg(target_os = "linux")]
1987        let is_empty_frame = detect.is_empty() && segmentation.is_empty();
1988
1989        // ── Forced backend: no fallback chain ────────────────────────
1990        if let Some(forced) = self.forced_backend {
1991            return match forced {
1992                ForcedBackend::Cpu => {
1993                    if let Some(cpu) = self.cpu.as_mut() {
1994                        return cpu.draw_decoded_masks(dst, detect, segmentation, overlay);
1995                    }
1996                    Err(Error::ForcedBackendUnavailable("cpu".into()))
1997                }
1998                ForcedBackend::G2d => {
1999                    // G2D can only produce empty frames (clear / bg blit).
2000                    // For populated frames it has no rasterizer — fail loudly.
2001                    #[cfg(target_os = "linux")]
2002                    if let Some(g2d) = self.g2d.as_mut() {
2003                        return g2d.draw_decoded_masks(dst, detect, segmentation, overlay);
2004                    }
2005                    Err(Error::ForcedBackendUnavailable("g2d".into()))
2006                }
2007                ForcedBackend::OpenGl => {
                    // GL handles the background natively via a GPU blit, and
                    // clears the target when there is no background.
2010                    #[cfg(target_os = "linux")]
2011                    #[cfg(feature = "opengl")]
2012                    if let Some(opengl) = self.opengl.as_mut() {
2013                        return opengl.draw_decoded_masks(dst, detect, segmentation, overlay);
2014                    }
2015                    Err(Error::ForcedBackendUnavailable("opengl".into()))
2016                }
2017            };
2018        }
2019
2020        // ── Auto dispatch ──────────────────────────────────────────
2021        // Empty frames prefer G2D when available — a single g2d_clear or
2022        // g2d_blit is the cheapest HW path to produce the correct output
2023        // and avoids spinning up the GL pipeline every zero-detection
2024        // frame in a triple-buffered display loop.
2025        #[cfg(target_os = "linux")]
2026        if is_empty_frame {
2027            if let Some(g2d) = self.g2d.as_mut() {
2028                match g2d.draw_decoded_masks(dst, detect, segmentation, overlay) {
2029                    Ok(_) => {
2030                        log::trace!(
2031                            "draw_decoded_masks empty frame via g2d in {:?}",
2032                            start.elapsed()
2033                        );
2034                        return Ok(());
2035                    }
2036                    Err(e) => log::trace!("g2d empty-frame path unavailable: {e:?}"),
2037                }
2038            }
2039        }
2040
2041        // Populated frames (or G2D unavailable): GL first, CPU fallback.
2042        // Both backends now own their own base-layer handling (bg blit
2043        // or clear), so we hand the overlay through untouched.
2044        #[cfg(target_os = "linux")]
2045        #[cfg(feature = "opengl")]
2046        if let Some(opengl) = self.opengl.as_mut() {
2047            log::trace!(
2048                "draw_decoded_masks started with opengl in {:?}",
2049                start.elapsed()
2050            );
2051            match opengl.draw_decoded_masks(dst, detect, segmentation, overlay) {
2052                Ok(_) => {
2053                    log::trace!("draw_decoded_masks with opengl in {:?}", start.elapsed());
2054                    return Ok(());
2055                }
2056                Err(e) => {
                    log::trace!("draw_decoded_masks failed with opengl, falling back to cpu: {e:?}")
2058                }
2059            }
2060        }
2061
2062        log::trace!(
2063            "draw_decoded_masks started with cpu in {:?}",
2064            start.elapsed()
2065        );
2066        if let Some(cpu) = self.cpu.as_mut() {
2067            match cpu.draw_decoded_masks(dst, detect, segmentation, overlay) {
2068                Ok(_) => {
2069                    log::trace!("draw_decoded_masks with cpu in {:?}", start.elapsed());
2070                    return Ok(());
2071                }
2072                Err(e) => {
                    log::trace!("draw_decoded_masks failed with cpu: {e:?}");
2074                    return Err(e);
2075                }
2076            }
2077        }
2078        Err(Error::NoConverter)
2079    }
2080
2081    fn draw_proto_masks(
2082        &mut self,
2083        dst: &mut TensorDyn,
2084        detect: &[DetectBox],
2085        proto_data: &ProtoData,
2086        overlay: MaskOverlay<'_>,
2087    ) -> Result<()> {
2088        let start = Instant::now();
2089
2090        if let Some(bg) = overlay.background {
2091            if bg.aliases(dst) {
2092                return Err(Error::AliasedBuffers(
2093                    "background must not reference the same buffer as dst".to_string(),
2094                ));
2095            }
2096        }
2097
2098        // Un-letterbox detect boxes for rendering when a letterbox was applied
2099        // to prepare the model input.  The original `detect` coords are still
2100        // passed to `materialize_segmentations` (which needs model-space coords
2101        // to correctly crop the proto tensor) alongside `overlay.letterbox` so
2102        // it can emit `Segmentation` structs in output-image space.
2103        let lb_boxes: Vec<DetectBox>;
2104        let render_detect = if let Some(lb) = overlay.letterbox {
2105            lb_boxes = detect.iter().map(|&d| unletter_bbox(d, lb)).collect();
2106            lb_boxes.as_slice()
2107        } else {
2108            detect
2109        };
2110        #[cfg(target_os = "linux")]
2111        let is_empty_frame = detect.is_empty();
2112
2113        // ── Forced backend: no fallback chain ────────────────────────
2114        if let Some(forced) = self.forced_backend {
2115            return match forced {
2116                ForcedBackend::Cpu => {
2117                    if let Some(cpu) = self.cpu.as_mut() {
2118                        return cpu.draw_proto_masks(dst, render_detect, proto_data, overlay);
2119                    }
2120                    Err(Error::ForcedBackendUnavailable("cpu".into()))
2121                }
2122                ForcedBackend::G2d => {
2123                    #[cfg(target_os = "linux")]
2124                    if let Some(g2d) = self.g2d.as_mut() {
2125                        return g2d.draw_proto_masks(dst, render_detect, proto_data, overlay);
2126                    }
2127                    Err(Error::ForcedBackendUnavailable("g2d".into()))
2128                }
2129                ForcedBackend::OpenGl => {
2130                    #[cfg(target_os = "linux")]
2131                    #[cfg(feature = "opengl")]
2132                    if let Some(opengl) = self.opengl.as_mut() {
2133                        return opengl.draw_proto_masks(dst, render_detect, proto_data, overlay);
2134                    }
2135                    Err(Error::ForcedBackendUnavailable("opengl".into()))
2136                }
2137            };
2138        }
2139
2140        // ── Auto dispatch ──────────────────────────────────────────
        // Empty frames: prefer G2D — cheapest HW path (clear or bg blit).
        #[cfg(target_os = "linux")]
        if is_empty_frame {
            if let Some(g2d) = self.g2d.as_mut() {
                match g2d.draw_proto_masks(dst, render_detect, proto_data, overlay) {
                    Ok(_) => {
                        log::trace!(
                            "draw_proto_masks empty frame via g2d in {:?}",
                            start.elapsed()
                        );
                        return Ok(());
                    }
                    Err(e) => log::trace!("g2d empty-frame path unavailable: {e:?}"),
                }
            }
        }

        // Hybrid path: CPU materialize + GL overlay (benchmarked faster than
        // full-GPU draw_proto_masks on all tested platforms: 27× on imx8mp,
        // 4× on imx95, 2.5× on rpi5, 1.6× on x86).
        // GL owns its own bg-blit / glClear — we pass the overlay through.
        //
        // CPU materialize needs `&mut` for its MaskScratch buffers; GL also
        // needs `&mut`. The CPU borrow is scoped to its block so the
        // subsequent GL borrow is free to take over `self`.
        #[cfg(target_os = "linux")]
        #[cfg(feature = "opengl")]
        if let (Some(_), Some(_)) = (self.cpu.as_ref(), self.opengl.as_ref()) {
            let segmentation = match self.cpu.as_mut() {
                Some(cpu) => {
                    log::trace!(
                        "draw_proto_masks started with hybrid (cpu+opengl) in {:?}",
                        start.elapsed()
                    );
                    cpu.materialize_segmentations(detect, proto_data, overlay.letterbox)?
                }
                None => unreachable!("cpu presence checked above"),
            };
            if let Some(opengl) = self.opengl.as_mut() {
                match opengl.draw_decoded_masks(dst, render_detect, &segmentation, overlay) {
                    Ok(_) => {
                        log::trace!(
                            "draw_proto_masks with hybrid (cpu+opengl) in {:?}",
                            start.elapsed()
                        );
                        return Ok(());
                    }
                    Err(e) => {
                        log::trace!(
                            "draw_proto_masks hybrid path failed, falling back to cpu: {e:?}"
                        );
                    }
                }
            }
        }

        let Some(cpu) = self.cpu.as_mut() else {
            return Err(Error::Internal(
                "draw_proto_masks requires CPU backend for fallback path".into(),
            ));
        };
        log::trace!("draw_proto_masks started with cpu in {:?}", start.elapsed());
        cpu.draw_proto_masks(dst, render_detect, proto_data, overlay)
    }

    fn set_class_colors(&mut self, colors: &[[u8; 4]]) -> Result<()> {
        let start = Instant::now();

        // ── Forced backend: no fallback chain ────────────────────────
        if let Some(forced) = self.forced_backend {
            return match forced {
                ForcedBackend::Cpu => {
                    if let Some(cpu) = self.cpu.as_mut() {
                        return cpu.set_class_colors(colors);
                    }
                    Err(Error::ForcedBackendUnavailable("cpu".into()))
                }
                ForcedBackend::G2d => Err(Error::NotSupported(
                    "g2d does not support set_class_colors".into(),
                )),
                ForcedBackend::OpenGl => {
                    #[cfg(target_os = "linux")]
                    #[cfg(feature = "opengl")]
                    if let Some(opengl) = self.opengl.as_mut() {
                        return opengl.set_class_colors(colors);
                    }
                    Err(Error::ForcedBackendUnavailable("opengl".into()))
                }
            };
        }

        // skip G2D as it doesn't support rendering to image

        #[cfg(target_os = "linux")]
        #[cfg(feature = "opengl")]
        if let Some(opengl) = self.opengl.as_mut() {
            log::trace!("set_class_colors started with opengl in {:?}", start.elapsed());
            match opengl.set_class_colors(colors) {
                Ok(_) => {
                    log::trace!("colors set with opengl in {:?}", start.elapsed());
                    return Ok(());
                }
                Err(e) => {
                    log::trace!("failed to set colors with opengl: {e:?}")
                }
            }
        }
        log::trace!("set_class_colors started with cpu in {:?}", start.elapsed());
        if let Some(cpu) = self.cpu.as_mut() {
            match cpu.set_class_colors(colors) {
                Ok(_) => {
                    log::trace!("colors set with cpu in {:?}", start.elapsed());
                    return Ok(());
                }
                Err(e) => {
                    log::trace!("failed to set colors with cpu: {e:?}");
                    return Err(e);
                }
            }
        }
        Err(Error::NoConverter)
    }
}

// ---------------------------------------------------------------------------
// Image loading / saving helpers
// ---------------------------------------------------------------------------

/// Read EXIF orientation from raw EXIF bytes and return (Rotation, Flip).
fn read_exif_orientation(exif_bytes: &[u8]) -> (Rotation, Flip) {
    let exifreader = exif::Reader::new();
    let Ok(exif_) = exifreader.read_raw(exif_bytes.to_vec()) else {
        return (Rotation::None, Flip::None);
    };
    let Some(orientation) = exif_.get_field(exif::Tag::Orientation, exif::In::PRIMARY) else {
        return (Rotation::None, Flip::None);
    };
    match orientation.value.get_uint(0) {
        Some(1) => (Rotation::None, Flip::None),
        Some(2) => (Rotation::None, Flip::Horizontal),
        Some(3) => (Rotation::Rotate180, Flip::None),
        Some(4) => (Rotation::Rotate180, Flip::Horizontal),
        Some(5) => (Rotation::Clockwise90, Flip::Horizontal),
        Some(6) => (Rotation::Clockwise90, Flip::None),
        Some(7) => (Rotation::CounterClockwise90, Flip::Horizontal),
        Some(8) => (Rotation::CounterClockwise90, Flip::None),
        Some(v) => {
            log::warn!("invalid EXIF orientation value: {v}");
            (Rotation::None, Flip::None)
        }
        None => (Rotation::None, Flip::None),
    }
}

/// Map a [`PixelFormat`] to the zune-jpeg `ColorSpace` for decoding.
/// Returns `None` for formats that the JPEG decoder cannot output directly.
fn pixelfmt_to_colorspace(fmt: PixelFormat) -> Option<ColorSpace> {
    match fmt {
        PixelFormat::Rgb => Some(ColorSpace::RGB),
        PixelFormat::Rgba => Some(ColorSpace::RGBA),
        PixelFormat::Grey => Some(ColorSpace::Luma),
        _ => None,
    }
}

/// Map a zune-jpeg `ColorSpace` to a [`PixelFormat`].
fn colorspace_to_pixelfmt(cs: ColorSpace) -> Option<PixelFormat> {
    match cs {
        ColorSpace::RGB => Some(PixelFormat::Rgb),
        ColorSpace::RGBA => Some(PixelFormat::Rgba),
        ColorSpace::Luma => Some(PixelFormat::Grey),
        _ => None,
    }
}

/// Load a JPEG image from raw bytes and return a [`TensorDyn`].
// TODO: evaluate replacing zune-jpeg with libjpeg-turbo (via `turbojpeg`
// crate). `tjDecompress2` accepts an explicit `pitch` parameter, which
// would let us decode directly into a pitch-padded DMA-BUF and drop the
// Mem-staging + row-copy introduced below for Mali G310 pitch alignment.
// Dropping zune-jpeg also gets us a 2-4× faster SIMD decode on AArch64.
// Blockers: adds a C dep (mozjpeg-sys / libturbojpeg) to the build;
// cross-compilation story needs validating with zigbuild.
fn load_jpeg(
    image: &[u8],
    format: Option<PixelFormat>,
    memory: Option<TensorMemory>,
) -> Result<TensorDyn> {
    let colour = match format {
        Some(f) => pixelfmt_to_colorspace(f)
            .ok_or_else(|| Error::NotSupported(format!("Unsupported image format {f:?}")))?,
        None => ColorSpace::RGB,
    };
    let options = DecoderOptions::default().jpeg_set_out_colorspace(colour);
    let mut decoder = JpegDecoder::new_with_options(ZCursor::new(image), options);
    decoder.decode_headers()?;

    let image_info = decoder.info().ok_or(Error::Internal(
        "JPEG did not return decoded image info".to_string(),
    ))?;

    let converted_cs = decoder
        .output_colorspace()
        .ok_or(Error::Internal("No output colorspace".to_string()))?;

    let converted_fmt = colorspace_to_pixelfmt(converted_cs).ok_or(Error::NotSupported(
        "Unsupported JPEG decoder output".to_string(),
    ))?;

    let dest_fmt = format.unwrap_or(converted_fmt);

    let (rotation, flip) = decoder
        .exif()
        .map(|x| read_exif_orientation(x))
        .unwrap_or((Rotation::None, Flip::None));

    let w = image_info.width as usize;
    let h = image_info.height as usize;

    if (rotation, flip) == (Rotation::None, Flip::None) {
        // When caller wants DMA and the natural pitch would be rejected by
        // the GPU's DMA-BUF import (Mali G310 needs 64-byte pitch), decode
        // into a tightly-packed Mem staging buffer and row-copy into a
        // pitch-padded DMA tensor. zune-jpeg has no stride-aware decode,
        // so the Mem intermediate is unavoidable until we swap decoders
        // (see TODO above).
        #[cfg(target_os = "linux")]
        if let Some(aligned_pitch) = padded_dma_pitch_for(dest_fmt, w, &memory) {
            let staging = Tensor::<u8>::image(w, h, converted_fmt, Some(TensorMemory::Mem))?;
            decoder.decode_into(&mut staging.map()?)?;
            let packed = if converted_fmt != dest_fmt {
                let mut tmp = Tensor::<u8>::image(w, h, dest_fmt, Some(TensorMemory::Mem))?;
                CPUProcessor::convert_format_pf(&staging, &mut tmp, converted_fmt, dest_fmt)?;
                tmp
            } else {
                staging
            };
            let mut dma = Tensor::<u8>::image_with_stride(
                w,
                h,
                dest_fmt,
                aligned_pitch,
                Some(TensorMemory::Dma),
            )?;
            copy_packed_to_padded_dma(&packed, &mut dma)?;
            return Ok(TensorDyn::from(dma));
        }

        let mut img = Tensor::<u8>::image(w, h, dest_fmt, memory)?;

        if converted_fmt != dest_fmt {
            let tmp = Tensor::<u8>::image(w, h, converted_fmt, Some(TensorMemory::Mem))?;
            decoder.decode_into(&mut tmp.map()?)?;
            CPUProcessor::convert_format_pf(&tmp, &mut img, converted_fmt, dest_fmt)?;
            return Ok(TensorDyn::from(img));
        }
        decoder.decode_into(&mut img.map()?)?;
        return Ok(TensorDyn::from(img));
    }

    let mut tmp = Tensor::<u8>::image(w, h, dest_fmt, Some(TensorMemory::Mem))?;

    if converted_fmt != dest_fmt {
        let tmp2 = Tensor::<u8>::image(w, h, converted_fmt, Some(TensorMemory::Mem))?;
        decoder.decode_into(&mut tmp2.map()?)?;
        CPUProcessor::convert_format_pf(&tmp2, &mut tmp, converted_fmt, dest_fmt)?;
    } else {
        decoder.decode_into(&mut tmp.map()?)?;
    }

    rotate_flip_to_dyn(&tmp, dest_fmt, rotation, flip, memory)
}
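The Mem-staging branch above ends in a per-row copy from a tightly-packed buffer into a larger-stride DMA tensor. The sketch below shows that row copy over plain byte buffers; the helper name and signature are illustrative only, not the crate's actual `copy_packed_to_padded_dma` API.

```rust
/// Illustrative sketch of a packed → pitch-padded row copy, as used when a
/// GPU's DMA-BUF import requires an aligned pitch. `row_bytes` is
/// width × bytes-per-pixel; `pitch` is the aligned stride (pitch >= row_bytes).
fn copy_packed_to_padded(src: &[u8], row_bytes: usize, pitch: usize, height: usize) -> Vec<u8> {
    assert!(pitch >= row_bytes);
    assert_eq!(src.len(), row_bytes * height);
    let mut dst = vec![0u8; pitch * height];
    for (src_row, dst_row) in src.chunks_exact(row_bytes).zip(dst.chunks_exact_mut(pitch)) {
        // Copy only the visible pixels; the padding tail of each
        // destination row stays zero-filled.
        dst_row[..row_bytes].copy_from_slice(src_row);
    }
    dst
}
```

The copy is O(height) slice copies rather than one linear copy, which is the price of the padded pitch until a stride-aware decoder (see the TODO above) removes the staging step.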

/// Load a PNG image from raw bytes and return a [`TensorDyn`].
///
/// Supports the same destination formats as the CPU backend's format
/// converter (`Rgb`, `Rgba`, `Bgra`, `Grey`, etc.). Earlier revisions only
/// accepted `Rgb`/`Rgba`; greyscale PNGs decoded to `Grey` now work through
/// the same pitch-aware DMA path as JPEG. LumaA PNGs are normalised to
/// `Grey` inline (alpha stripped) before going through the shared CPU
/// converter.
fn load_png(
    image: &[u8],
    format: Option<PixelFormat>,
    memory: Option<TensorMemory>,
) -> Result<TensorDyn> {
    let dest_fmt = format.unwrap_or(PixelFormat::Rgb);

    // Decode with add_alpha=false — any alpha upgrade/strip happens via
    // the CPU converter downstream so we share one code path with
    // load_jpeg instead of duplicating promotion logic here.
    let options = DecoderOptions::default()
        .png_set_add_alpha_channel(false)
        .png_set_decode_animated(false);
    let mut decoder = PngDecoder::new_with_options(ZCursor::new(image), options);
    decoder.decode_headers()?;

    let (width, height, rotation, flip) = {
        let info = decoder
            .info()
            .ok_or_else(|| Error::Internal("PNG did not return decoded image info".to_string()))?;
        let (rot, flip) = info
            .exif
            .as_ref()
            .map(|x| read_exif_orientation(x))
            .unwrap_or((Rotation::None, Flip::None));
        (info.width, info.height, rot, flip)
    };

    // Map the decoder's native colorspace onto a PixelFormat that the CPU
    // converter understands. LumaA has no direct PixelFormat variant so we
    // decode as LumaA and then strip alpha inline to get Grey.
    let decoder_cs = decoder
        .colorspace()
        .ok_or_else(|| Error::Internal("PNG decoder did not return colorspace".to_string()))?;
    let (decoded_fmt, strip_luma_alpha) = match decoder_cs {
        ColorSpace::Luma => (PixelFormat::Grey, false),
        ColorSpace::LumaA => (PixelFormat::Grey, true),
        ColorSpace::RGB => (PixelFormat::Rgb, false),
        ColorSpace::RGBA => (PixelFormat::Rgba, false),
        other => {
            return Err(Error::NotSupported(format!(
                "PNG decoder produced unsupported colorspace {other:?}"
            )));
        }
    };

    // Reject destinations the CPU converter can't reach from the decoder's
    // output so callers get a precise error rather than a downstream map
    // failure. (`Grey → Grey` / `Rgb → Rgb` / etc. are identity pairs and
    // are always valid.)
    if decoded_fmt != dest_fmt
        && !crate::cpu::CPUProcessor::support_conversion_pf(decoded_fmt, dest_fmt)
    {
        return Err(Error::NotSupported(format!(
            "load_png: cannot convert decoder output {decoded_fmt:?} to {dest_fmt:?}"
        )));
    }

    // Decode into a Mem staging buffer in the decoder's native format. For
    // LumaA we allocate an extra byte-pair-per-pixel buffer since our Tensor
    // API only knows 1-channel (Grey); after decode we compact to Grey.
    let staging = if strip_luma_alpha {
        // LumaA is 2 bytes per pixel in the raw decode; allocate a flat
        // Tensor large enough to hold it, then compact to Grey in place.
        let raw = Tensor::<u8>::new(&[height, width, 2], Some(TensorMemory::Mem), None)?;
        decoder.decode_into(&mut raw.map()?)?;
        let grey = Tensor::<u8>::image(width, height, PixelFormat::Grey, Some(TensorMemory::Mem))?;
        {
            let raw_map = raw.map()?;
            let mut grey_map = grey.map()?;
            let raw_bytes: &[u8] = &raw_map;
            let grey_bytes: &mut [u8] = &mut grey_map;
            for (pair, out) in raw_bytes.chunks_exact(2).zip(grey_bytes.iter_mut()) {
                *out = pair[0];
            }
        }
        grey
    } else {
        let staging = Tensor::<u8>::image(width, height, decoded_fmt, Some(TensorMemory::Mem))?;
        decoder.decode_into(&mut staging.map()?)?;
        staging
    };

    // Optional CPU format conversion before the final memory placement.
    let packed = if decoded_fmt != dest_fmt {
        let mut tmp = Tensor::<u8>::image(width, height, dest_fmt, Some(TensorMemory::Mem))?;
        CPUProcessor::convert_format_pf(&staging, &mut tmp, decoded_fmt, dest_fmt)?;
        tmp
    } else {
        staging
    };

    if (rotation, flip) != (Rotation::None, Flip::None) {
        return rotate_flip_to_dyn(&packed, dest_fmt, rotation, flip, memory);
    }

    // Final placement. When the caller wants DMA and the natural pitch
    // would be rejected by the GPU's DMA-BUF import (see
    // `padded_dma_pitch_for`), allocate a pitch-padded DMA tensor and
    // row-copy. Otherwise allocate in the requested memory domain and
    // linear-copy — or, when the caller asked for Mem, just return the
    // staging tensor directly.
    #[cfg(target_os = "linux")]
    if let Some(aligned_pitch) = padded_dma_pitch_for(dest_fmt, width, &memory) {
        let mut dma = Tensor::<u8>::image_with_stride(
            width,
            height,
            dest_fmt,
            aligned_pitch,
            Some(TensorMemory::Dma),
        )?;
        copy_packed_to_padded_dma(&packed, &mut dma)?;
        return Ok(TensorDyn::from(dma));
    }

    if matches!(memory, Some(TensorMemory::Mem)) {
        return Ok(TensorDyn::from(packed));
    }
    // DMA (default on Linux) or Shm with naturally-aligned pitch.
    let out = Tensor::<u8>::image(width, height, dest_fmt, memory)?;
    {
        let src_map = packed.map()?;
        let mut dst_map = out.map()?;
        let src_bytes: &[u8] = &src_map;
        let dst_bytes: &mut [u8] = &mut dst_map;
        dst_bytes.copy_from_slice(src_bytes);
    }
    Ok(TensorDyn::from(out))
}
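The LumaA → Grey compaction inside `load_png` keeps only the luma byte of each interleaved (luma, alpha) pair. A standalone sketch over a plain byte slice — a hypothetical helper for illustration, not part of the crate's API:

```rust
/// Illustrative sketch: compact interleaved (luma, alpha) byte pairs down
/// to one grey byte per pixel, mirroring the LumaA handling in `load_png`.
fn strip_luma_alpha(luma_a: &[u8]) -> Vec<u8> {
    // chunks_exact(2) yields one (luma, alpha) pair per pixel; any trailing
    // odd byte (malformed input) is silently dropped.
    luma_a.chunks_exact(2).map(|pair| pair[0]).collect()
}
```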

/// Load an image from raw bytes (JPEG or PNG) and return a [`TensorDyn`].
///
/// The optional `format` specifies the desired output pixel format (e.g.,
/// [`PixelFormat::Rgb`], [`PixelFormat::Rgba`]); if `None`, the native
/// format of the file is used (typically RGB for JPEG).
///
/// # Examples
/// ```rust,no_run
/// use edgefirst_image::load_image;
/// use edgefirst_tensor::PixelFormat;
/// # fn main() -> Result<(), edgefirst_image::Error> {
/// let jpeg = std::fs::read("zidane.jpg")?;
/// let img = load_image(&jpeg, Some(PixelFormat::Rgb), None)?;
/// assert_eq!(img.width(), Some(1280));
/// assert_eq!(img.height(), Some(720));
/// # Ok(())
/// # }
/// ```
pub fn load_image(
    image: &[u8],
    format: Option<PixelFormat>,
    memory: Option<TensorMemory>,
) -> Result<TensorDyn> {
    if let Ok(i) = load_jpeg(image, format, memory) {
        return Ok(i);
    }
    if let Ok(i) = load_png(image, format, memory) {
        return Ok(i);
    }
    Err(Error::NotSupported(
        "Could not decode as jpeg or png".to_string(),
    ))
}

/// Save a [`TensorDyn`] image as a JPEG file.
///
/// Only packed RGB and RGBA formats are supported.
pub fn save_jpeg(tensor: &TensorDyn, path: impl AsRef<std::path::Path>, quality: u8) -> Result<()> {
    let t = tensor.as_u8().ok_or(Error::UnsupportedFormat(
        "save_jpeg requires u8 tensor".to_string(),
    ))?;
    let fmt = t.format().ok_or(Error::NotAnImage)?;
    if fmt.layout() != PixelLayout::Packed {
        return Err(Error::NotImplemented(
            "Saving planar images is not supported".to_string(),
        ));
    }

    let colour = match fmt {
        PixelFormat::Rgb => jpeg_encoder::ColorType::Rgb,
        PixelFormat::Rgba => jpeg_encoder::ColorType::Rgba,
        _ => {
            return Err(Error::NotImplemented(
                "Unsupported image format for saving".to_string(),
            ));
        }
    };

    let w = t.width().ok_or(Error::NotAnImage)?;
    let h = t.height().ok_or(Error::NotAnImage)?;
    let encoder = jpeg_encoder::Encoder::new_file(path, quality)?;
    let tensor_map = t.map()?;

    encoder.encode(&tensor_map, w as u16, h as u16, colour)?;

    Ok(())
}

pub(crate) struct FunctionTimer<T: Display> {
    name: T,
    start: std::time::Instant,
}

impl<T: Display> FunctionTimer<T> {
    pub fn new(name: T) -> Self {
        Self {
            name,
            start: std::time::Instant::now(),
        }
    }
}

impl<T: Display> Drop for FunctionTimer<T> {
    fn drop(&mut self) {
        log::trace!("{} elapsed: {:?}", self.name, self.start.elapsed())
    }
}

const DEFAULT_COLORS: [[f32; 4]; 20] = [
    [0., 1., 0., 0.7],
    [1., 0.5568628, 0., 0.7],
    [0.25882353, 0.15294118, 0.13333333, 0.7],
    [0.8, 0.7647059, 0.78039216, 0.7],
    [0.3137255, 0.3137255, 0.3137255, 0.7],
    [0.1411765, 0.3098039, 0.1215686, 0.7],
    [1., 0.95686275, 0.5137255, 0.7],
    [0.3529412, 0.32156863, 0., 0.7],
    [0.4235294, 0.6235294, 0.6509804, 0.7],
    [0.5098039, 0.5098039, 0.7294118, 0.7],
    [0.00784314, 0.18823529, 0.29411765, 0.7],
    [0.0, 0.2706, 1.0, 0.7],
    [0.0, 0.0, 0.0, 0.7],
    [0.0, 0.5, 0.0, 0.7],
    [1.0, 0.0, 0.0, 0.7],
    [0.0, 0.0, 1.0, 0.7],
    [1.0, 0.5, 0.5, 0.7],
    [0.1333, 0.5451, 0.1333, 0.7],
    [0.1176, 0.4118, 0.8235, 0.7],
    [1., 1., 1., 0.7],
];

const fn denorm<const M: usize, const N: usize>(a: [[f32; M]; N]) -> [[u8; M]; N] {
    let mut result = [[0; M]; N];
    let mut i = 0;
    while i < N {
        let mut j = 0;
        while j < M {
            result[i][j] = (a[i][j] * 255.0).round() as u8;
            j += 1;
        }
        i += 1;
    }
    result
}
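Per channel, `denorm` computes round(x × 255) on a normalised [0, 1] value. A plain non-const single-channel equivalent, written here only to illustrate the arithmetic:

```rust
/// Illustrative single-channel version of the `denorm` arithmetic: scale a
/// normalised [0.0, 1.0] channel to the u8 range with rounding.
fn denorm_channel(x: f32) -> u8 {
    (x * 255.0).round() as u8
}
```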

const DEFAULT_COLORS_U8: [[u8; 4]; 20] = denorm(DEFAULT_COLORS);

#[cfg(test)]
#[cfg_attr(coverage_nightly, coverage(off))]
mod alignment_tests {
    use super::*;

    #[test]
    fn align_width_rgba8_common_widths() {
        // RGBA8 (bpp=4, lcm(64,4)=64, so width must round to multiple of 16 px).
        assert_eq!(align_width_for_gpu_pitch(640, 4), 640); // 2560 byte pitch — already aligned
        assert_eq!(align_width_for_gpu_pitch(1280, 4), 1280); // 5120
        assert_eq!(align_width_for_gpu_pitch(1920, 4), 1920); // 7680
        assert_eq!(align_width_for_gpu_pitch(3840, 4), 3840); // 15360
                                                              // crowd.png case from the imx95 investigation:
        assert_eq!(align_width_for_gpu_pitch(3004, 4), 3008); // 12016 → 12032
        assert_eq!(align_width_for_gpu_pitch(3000, 4), 3008); // 12000 → 12032
        assert_eq!(align_width_for_gpu_pitch(17, 4), 32); // 68 → 128
        assert_eq!(align_width_for_gpu_pitch(1, 4), 16); // 4 → 64
    }

    #[test]
    fn align_width_rgb888_packed() {
        // RGB888 (bpp=3, lcm(64,3)=192, so width must round to multiple of 64 px).
        assert_eq!(align_width_for_gpu_pitch(64, 3), 64); // 192 byte pitch
        assert_eq!(align_width_for_gpu_pitch(640, 3), 640); // 1920
        assert_eq!(align_width_for_gpu_pitch(1, 3), 64); // 3 → 192
        assert_eq!(align_width_for_gpu_pitch(65, 3), 128); // 195 → 384
                                                           // Verify the rounded width × bpp is a clean multiple of the LCM.
        for w in [3004usize, 1281, 100, 17] {
            let padded = align_width_for_gpu_pitch(w, 3);
            assert!(padded >= w);
            assert_eq!((padded * 3) % 192, 0); // lcm(64, 3) = 192
        }
    }

    #[test]
    fn align_width_grey_u8() {
        // Grey (bpp=1, lcm(64,1)=64, so width must round to multiple of 64 px).
        assert_eq!(align_width_for_gpu_pitch(64, 1), 64);
        assert_eq!(align_width_for_gpu_pitch(640, 1), 640);
        assert_eq!(align_width_for_gpu_pitch(1, 1), 64);
        assert_eq!(align_width_for_gpu_pitch(65, 1), 128);
    }

    #[test]
    fn align_width_zero_inputs() {
        assert_eq!(align_width_for_gpu_pitch(0, 4), 0);
        assert_eq!(align_width_for_gpu_pitch(640, 0), 640);
    }

    #[test]
    fn align_width_never_returns_smaller_than_input() {
        // Spot-check the "returned width >= input width" contract across a
        // range of values that would previously have hit `width * bpp`
        // overflow paths.
        for &bpp in &[1usize, 2, 3, 4, 8] {
            for &w in &[
                1usize,
                17,
                64,
                65,
                100,
                1280,
                1281,
                1920,
                3004,
                3072,
                3840,
                usize::MAX / 8,
                usize::MAX / 4,
                usize::MAX / 2,
                usize::MAX - 1,
                usize::MAX,
            ] {
                let aligned = align_width_for_gpu_pitch(w, bpp);
                assert!(
                    aligned >= w,
                    "align_width_for_gpu_pitch({w}, {bpp}) = {aligned} < {w}"
                );
            }
        }
    }

    #[test]
    fn align_width_overflow_returns_unaligned_not_smaller() {
        // For width values close to usize::MAX, padding up would wrap. The
        // function must return the original width rather than wrapping or
        // panicking. A pre-aligned width round-trips unchanged even at the
        // extreme.
        let aligned_extreme = usize::MAX - 15; // 16-pixel boundary for RGBA8
        assert_eq!(
            align_width_for_gpu_pitch(aligned_extreme, 4),
            aligned_extreme
        );
        // A misaligned extreme value cannot be rounded up — the function
        // returns the original.
        let misaligned_extreme = usize::MAX - 1;
        let result = align_width_for_gpu_pitch(misaligned_extreme, 4);
        assert!(
            result >= misaligned_extreme,
            "extreme misaligned width must not be rounded down to {result}"
        );
    }

    #[test]
    fn checked_lcm_basic_and_overflow() {
        assert_eq!(checked_num_integer_lcm(64, 4), Some(64));
        assert_eq!(checked_num_integer_lcm(64, 3), Some(192));
        assert_eq!(checked_num_integer_lcm(64, 1), Some(64));
        assert_eq!(checked_num_integer_lcm(0, 4), Some(0));
        assert_eq!(checked_num_integer_lcm(64, 0), Some(0));
        // Coprime values whose product exceeds usize::MAX must return None.
        assert_eq!(
            checked_num_integer_lcm(usize::MAX, usize::MAX - 1),
            None,
            "coprime extreme values must detect overflow, not panic"
        );
    }

    #[test]
    fn primary_plane_bpp_known_formats() {
        // Packed formats use channels × elem_size.
        assert_eq!(primary_plane_bpp(PixelFormat::Rgba, 1), Some(4));
        assert_eq!(primary_plane_bpp(PixelFormat::Bgra, 1), Some(4));
        assert_eq!(primary_plane_bpp(PixelFormat::Rgb, 1), Some(3));
        assert_eq!(primary_plane_bpp(PixelFormat::Grey, 1), Some(1));
        // Semi-planar (NV12) reports the luma plane's bpp.
        assert_eq!(primary_plane_bpp(PixelFormat::Nv12, 1), Some(1));
    }
}
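The alignment contract these tests pin down can be re-derived independently: the pitch (width × bpp bytes) must be a multiple of 64, so the width rounds up to a granularity of lcm(64, bpp) / bpp pixels. A sketch under that assumption — illustrative only, since the real `align_width_for_gpu_pitch` also handles the overflow cases exercised above:

```rust
// Illustrative re-derivation of the width-alignment rule; ignores the
// overflow handling the real implementation needs near usize::MAX.
fn gcd(a: usize, b: usize) -> usize {
    if b == 0 { a } else { gcd(b, a % b) }
}

fn align_width(width: usize, bpp: usize) -> usize {
    if width == 0 || bpp == 0 {
        return width;
    }
    // Pixel granularity lcm(64, bpp) / bpp, which simplifies to
    // 64 / gcd(64, bpp): 16 px for bpp=4, 64 px for bpp=3 or bpp=1.
    let px = 64 / gcd(64, bpp);
    width.div_ceil(px) * px
}
```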

#[cfg(test)]
#[cfg_attr(coverage_nightly, coverage(off))]
mod image_tests {
    use super::*;
    use crate::{CPUProcessor, Rotation};
    #[cfg(target_os = "linux")]
    use edgefirst_tensor::is_dma_available;
    use edgefirst_tensor::{TensorMapTrait, TensorMemory, TensorTrait};
    use image::buffer::ConvertBuffer;

    /// Test helper: call `ImageProcessorTrait::convert()` on two `TensorDyn`s
    /// by going through the `TensorDyn` API.
    ///
    /// Returns the `(src_image, dst_image)` reconstructed from the TensorDyn
    /// round-trip so the caller can feed them to `compare_images` etc.
    fn convert_img(
        proc: &mut dyn ImageProcessorTrait,
        src: TensorDyn,
        dst: TensorDyn,
        rotation: Rotation,
        flip: Flip,
        crop: Crop,
    ) -> (Result<()>, TensorDyn, TensorDyn) {
        let src_fourcc = src.format().unwrap();
        let dst_fourcc = dst.format().unwrap();
        let src_dyn = src;
        let mut dst_dyn = dst;
        let result = proc.convert(&src_dyn, &mut dst_dyn, rotation, flip, crop);
        let src_back = {
            let mut __t = src_dyn.into_u8().unwrap();
            __t.set_format(src_fourcc).unwrap();
            TensorDyn::from(__t)
        };
        let dst_back = {
            let mut __t = dst_dyn.into_u8().unwrap();
            __t.set_format(dst_fourcc).unwrap();
            TensorDyn::from(__t)
        };
        (result, src_back, dst_back)
    }

    #[ctor::ctor]
    fn init() {
        env_logger::Builder::from_env(env_logger::Env::default().default_filter_or("info")).init();
    }

    macro_rules! function {
        () => {{
            fn f() {}
            fn type_name_of<T>(_: T) -> &'static str {
                std::any::type_name::<T>()
            }
            let name = type_name_of(f);

            // Find and cut the rest of the path
            match &name[..name.len() - 3].rfind(':') {
                Some(pos) => &name[pos + 1..name.len() - 3],
                None => &name[..name.len() - 3],
            }
        }};
    }

    #[test]
    fn test_invalid_crop() {
        let src = TensorDyn::image(100, 100, PixelFormat::Rgb, DType::U8, None).unwrap();
        let dst = TensorDyn::image(100, 100, PixelFormat::Rgb, DType::U8, None).unwrap();

        let crop = Crop::new()
            .with_src_rect(Some(Rect::new(50, 50, 60, 60)))
            .with_dst_rect(Some(Rect::new(0, 0, 150, 150)));

        let result = crop.check_crop_dyn(&src, &dst);
        assert!(matches!(
            result,
            Err(Error::CropInvalid(e)) if e.starts_with("Dest and Src crop invalid")
        ));

        let crop = crop.with_src_rect(Some(Rect::new(0, 0, 10, 10)));
        let result = crop.check_crop_dyn(&src, &dst);
        assert!(matches!(
            result,
            Err(Error::CropInvalid(e)) if e.starts_with("Dest crop invalid")
        ));

        let crop = crop
            .with_src_rect(Some(Rect::new(50, 50, 60, 60)))
            .with_dst_rect(Some(Rect::new(0, 0, 50, 50)));
        let result = crop.check_crop_dyn(&src, &dst);
        assert!(matches!(
            result,
            Err(Error::CropInvalid(e)) if e.starts_with("Src crop invalid")
        ));

        let crop = crop.with_src_rect(Some(Rect::new(50, 50, 50, 50)));

        let result = crop.check_crop_dyn(&src, &dst);
        assert!(result.is_ok());
    }

    #[test]
    fn test_invalid_tensor_format() -> Result<(), Error> {
        // 4D tensor cannot be set to a 3-channel pixel format
        let mut tensor = Tensor::<u8>::new(&[720, 1280, 4, 1], None, None)?;
        let result = tensor.set_format(PixelFormat::Rgb);
        assert!(result.is_err(), "4D tensor should reject set_format");

        // Tensor with wrong channel count for the format
        let mut tensor = Tensor::<u8>::new(&[720, 1280, 4], None, None)?;
        let result = tensor.set_format(PixelFormat::Rgb);
        assert!(result.is_err(), "4-channel tensor should reject RGB format");

        Ok(())
    }

    #[test]
    fn test_invalid_image_file() -> Result<(), Error> {
        let result = crate::load_image(&[123; 5000], None, None);
        assert!(matches!(
            result,
            Err(Error::NotSupported(e)) if e == "Could not decode as jpeg or png"));

        Ok(())
    }

    #[test]
    fn test_invalid_jpeg_format() -> Result<(), Error> {
        let result = crate::load_image(&[123; 5000], Some(PixelFormat::Yuyv), None);
        assert!(matches!(
            result,
            Err(Error::NotSupported(e)) if e == "Could not decode as jpeg or png"));

        Ok(())
    }

    #[test]
    fn test_load_resize_save() {
        let file = edgefirst_bench::testdata::read("zidane.jpg");
        let img = crate::load_image(&file, Some(PixelFormat::Rgba), None).unwrap();
        assert_eq!(img.width(), Some(1280));
        assert_eq!(img.height(), Some(720));

        let dst = TensorDyn::image(640, 360, PixelFormat::Rgba, DType::U8, None).unwrap();
        let mut converter = CPUProcessor::new();
        let (result, _img, dst) = convert_img(
            &mut converter,
            img,
            dst,
            Rotation::None,
            Flip::None,
            Crop::no_crop(),
        );
        result.unwrap();
        assert_eq!(dst.width(), Some(640));
        assert_eq!(dst.height(), Some(360));

        crate::save_jpeg(&dst, "zidane_resized.jpg", 80).unwrap();

        let file = std::fs::read("zidane_resized.jpg").unwrap();
        let img = crate::load_image(&file, None, None).unwrap();
        assert_eq!(img.width(), Some(640));
        assert_eq!(img.height(), Some(360));
        assert_eq!(img.format().unwrap(), PixelFormat::Rgb);
    }

    #[test]
    fn test_from_tensor_planar() -> Result<(), Error> {
        let mut tensor = Tensor::new(&[3, 720, 1280], None, None)?;
        tensor
            .map()?
            .copy_from_slice(&edgefirst_bench::testdata::read("camera720p.8bps"));
        let planar = {
            tensor
                .set_format(PixelFormat::PlanarRgb)
                .map_err(|e| crate::Error::Internal(e.to_string()))?;
            TensorDyn::from(tensor)
        };
        let rgba = load_bytes_to_tensor(
            1280,
            720,
            PixelFormat::Rgba,
            None,
            &edgefirst_bench::testdata::read("camera720p.rgba"),
        )?;
        compare_images_convert_to_rgb(&planar, &rgba, 0.98, function!());
2995
2996        Ok(())
2997    }
2998
2999    #[test]
3000    fn test_from_tensor_invalid_format() {
        // PixelFormat::from_fourcc returns None for unknown FourCC codes.
        // Since there's no "TEST" pixel format, this validates graceful handling.
3003        assert!(PixelFormat::from_fourcc(u32::from_le_bytes(*b"TEST")).is_none());
3004    }
3005
3006    #[test]
3007    #[should_panic(expected = "Failed to save planar RGB image")]
3008    fn test_save_planar() {
3009        let planar_img = load_bytes_to_tensor(
3010            1280,
3011            720,
3012            PixelFormat::PlanarRgb,
3013            None,
3014            &edgefirst_bench::testdata::read("camera720p.8bps"),
3015        )
3016        .unwrap();
3017
3018        let save_path = "/tmp/planar_rgb.jpg";
3019        crate::save_jpeg(&planar_img, save_path, 90).expect("Failed to save planar RGB image");
3020    }
3021
3022    #[test]
3023    #[should_panic(expected = "Failed to save YUYV image")]
3024    fn test_save_yuyv() {
        let yuyv_img = load_bytes_to_tensor(
            1280,
            720,
            PixelFormat::Yuyv,
            None,
            &edgefirst_bench::testdata::read("camera720p.yuyv"),
        )
        .unwrap();

        let save_path = "/tmp/yuyv.jpg";
        crate::save_jpeg(&yuyv_img, save_path, 90).expect("Failed to save YUYV image");
3036    }
3037
3038    #[test]
3039    fn test_rotation_angle() {
3040        assert_eq!(Rotation::from_degrees_clockwise(0), Rotation::None);
3041        assert_eq!(Rotation::from_degrees_clockwise(90), Rotation::Clockwise90);
3042        assert_eq!(Rotation::from_degrees_clockwise(180), Rotation::Rotate180);
3043        assert_eq!(
3044            Rotation::from_degrees_clockwise(270),
3045            Rotation::CounterClockwise90
3046        );
3047        assert_eq!(Rotation::from_degrees_clockwise(360), Rotation::None);
3048        assert_eq!(Rotation::from_degrees_clockwise(450), Rotation::Clockwise90);
3049        assert_eq!(Rotation::from_degrees_clockwise(540), Rotation::Rotate180);
3050        assert_eq!(
3051            Rotation::from_degrees_clockwise(630),
3052            Rotation::CounterClockwise90
3053        );
3054    }
3055
3056    #[test]
3057    #[should_panic(expected = "rotation angle is not a multiple of 90")]
3058    fn test_rotation_angle_panic() {
3059        Rotation::from_degrees_clockwise(361);
3060    }
3061
3062    #[test]
3063    fn test_disable_env_var() -> Result<(), Error> {
3064        // EDGEFIRST_FORCE_BACKEND takes precedence over EDGEFIRST_DISABLE_*,
3065        // so clear it for the duration of this test to avoid races with
3066        // test_force_backend_cpu running in parallel.
3067        let saved_force = std::env::var("EDGEFIRST_FORCE_BACKEND").ok();
3068        unsafe { std::env::remove_var("EDGEFIRST_FORCE_BACKEND") };
3069
3070        #[cfg(target_os = "linux")]
3071        {
3072            let original = std::env::var("EDGEFIRST_DISABLE_G2D").ok();
3073            unsafe { std::env::set_var("EDGEFIRST_DISABLE_G2D", "1") };
3074            let converter = ImageProcessor::new()?;
3075            match original {
3076                Some(s) => unsafe { std::env::set_var("EDGEFIRST_DISABLE_G2D", s) },
3077                None => unsafe { std::env::remove_var("EDGEFIRST_DISABLE_G2D") },
3078            }
3079            assert!(converter.g2d.is_none());
3080        }
3081
3082        #[cfg(target_os = "linux")]
3083        #[cfg(feature = "opengl")]
3084        {
3085            let original = std::env::var("EDGEFIRST_DISABLE_GL").ok();
3086            unsafe { std::env::set_var("EDGEFIRST_DISABLE_GL", "1") };
3087            let converter = ImageProcessor::new()?;
3088            match original {
3089                Some(s) => unsafe { std::env::set_var("EDGEFIRST_DISABLE_GL", s) },
3090                None => unsafe { std::env::remove_var("EDGEFIRST_DISABLE_GL") },
3091            }
3092            assert!(converter.opengl.is_none());
3093        }
3094
3095        let original = std::env::var("EDGEFIRST_DISABLE_CPU").ok();
3096        unsafe { std::env::set_var("EDGEFIRST_DISABLE_CPU", "1") };
3097        let converter = ImageProcessor::new()?;
3098        match original {
3099            Some(s) => unsafe { std::env::set_var("EDGEFIRST_DISABLE_CPU", s) },
3100            None => unsafe { std::env::remove_var("EDGEFIRST_DISABLE_CPU") },
3101        }
3102        assert!(converter.cpu.is_none());
3103
3104        let original_cpu = std::env::var("EDGEFIRST_DISABLE_CPU").ok();
3105        unsafe { std::env::set_var("EDGEFIRST_DISABLE_CPU", "1") };
3106        let original_gl = std::env::var("EDGEFIRST_DISABLE_GL").ok();
3107        unsafe { std::env::set_var("EDGEFIRST_DISABLE_GL", "1") };
3108        let original_g2d = std::env::var("EDGEFIRST_DISABLE_G2D").ok();
3109        unsafe { std::env::set_var("EDGEFIRST_DISABLE_G2D", "1") };
3110        let mut converter = ImageProcessor::new()?;
3111
3112        let src = TensorDyn::image(1280, 720, PixelFormat::Rgba, DType::U8, None)?;
3113        let dst = TensorDyn::image(640, 360, PixelFormat::Rgba, DType::U8, None)?;
3114        let (result, _src, _dst) = convert_img(
3115            &mut converter,
3116            src,
3117            dst,
3118            Rotation::None,
3119            Flip::None,
3120            Crop::no_crop(),
3121        );
3122        assert!(matches!(result, Err(Error::NoConverter)));
3123
3124        match original_cpu {
3125            Some(s) => unsafe { std::env::set_var("EDGEFIRST_DISABLE_CPU", s) },
3126            None => unsafe { std::env::remove_var("EDGEFIRST_DISABLE_CPU") },
3127        }
3128        match original_gl {
3129            Some(s) => unsafe { std::env::set_var("EDGEFIRST_DISABLE_GL", s) },
3130            None => unsafe { std::env::remove_var("EDGEFIRST_DISABLE_GL") },
3131        }
3132        match original_g2d {
3133            Some(s) => unsafe { std::env::set_var("EDGEFIRST_DISABLE_G2D", s) },
3134            None => unsafe { std::env::remove_var("EDGEFIRST_DISABLE_G2D") },
3135        }
3136        match saved_force {
3137            Some(s) => unsafe { std::env::set_var("EDGEFIRST_FORCE_BACKEND", s) },
3138            None => unsafe { std::env::remove_var("EDGEFIRST_FORCE_BACKEND") },
3139        }
3140
3141        Ok(())
3142    }
3143
3144    #[test]
3145    fn test_unsupported_conversion() {
3146        let src = TensorDyn::image(1280, 720, PixelFormat::Nv12, DType::U8, None).unwrap();
3147        let dst = TensorDyn::image(640, 360, PixelFormat::Nv12, DType::U8, None).unwrap();
3148        let mut converter = ImageProcessor::new().unwrap();
3149        let (result, _src, _dst) = convert_img(
3150            &mut converter,
3151            src,
3152            dst,
3153            Rotation::None,
3154            Flip::None,
3155            Crop::no_crop(),
3156        );
3157        log::debug!("result: {:?}", result);
3158        assert!(matches!(
3159            result,
3160            Err(Error::NotSupported(e)) if e.starts_with("Conversion from NV12 to NV12")
3161        ));
3162    }
3163
3164    #[test]
3165    fn test_load_grey() {
3166        let grey_img = crate::load_image(
3167            &edgefirst_bench::testdata::read("grey.jpg"),
3168            Some(PixelFormat::Rgba),
3169            None,
3170        )
3171        .unwrap();
3172
3173        let grey_but_rgb_img = crate::load_image(
3174            &edgefirst_bench::testdata::read("grey-rgb.jpg"),
3175            Some(PixelFormat::Rgba),
3176            None,
3177        )
3178        .unwrap();
3179
3180        compare_images(&grey_img, &grey_but_rgb_img, 0.99, function!());
3181    }
3182
3183    #[test]
3184    fn test_new_nv12() {
3185        let nv12 = TensorDyn::image(1280, 720, PixelFormat::Nv12, DType::U8, None).unwrap();
3186        assert_eq!(nv12.height(), Some(720));
3187        assert_eq!(nv12.width(), Some(1280));
3188        assert_eq!(nv12.format().unwrap(), PixelFormat::Nv12);
3189        // PixelFormat::Nv12.channels() returns 1 (luma plane channel count)
3190        assert_eq!(nv12.format().unwrap().channels(), 1);
3191        assert!(nv12.format().is_some_and(
3192            |f| f.layout() == PixelLayout::Planar || f.layout() == PixelLayout::SemiPlanar
3193        ))
3194    }
3195
3196    #[test]
3197    #[cfg(target_os = "linux")]
3198    fn test_new_image_converter() {
3199        let dst_width = 640;
3200        let dst_height = 360;
3201        let file = edgefirst_bench::testdata::read("zidane.jpg").to_vec();
3202        let src = crate::load_image(&file, Some(PixelFormat::Rgba), None).unwrap();
3203
3204        let mut converter = ImageProcessor::new().unwrap();
3205        let converter_dst = converter
3206            .create_image(dst_width, dst_height, PixelFormat::Rgba, DType::U8, None)
3207            .unwrap();
3208        let (result, src, converter_dst) = convert_img(
3209            &mut converter,
3210            src,
3211            converter_dst,
3212            Rotation::None,
3213            Flip::None,
3214            Crop::no_crop(),
3215        );
3216        result.unwrap();
3217
3218        let cpu_dst =
3219            TensorDyn::image(dst_width, dst_height, PixelFormat::Rgba, DType::U8, None).unwrap();
3220        let mut cpu_converter = CPUProcessor::new();
3221        let (result, _src, cpu_dst) = convert_img(
3222            &mut cpu_converter,
3223            src,
3224            cpu_dst,
3225            Rotation::None,
3226            Flip::None,
3227            Crop::no_crop(),
3228        );
3229        result.unwrap();
3230
3231        compare_images(&converter_dst, &cpu_dst, 0.98, function!());
3232    }
3233
3234    #[test]
3235    #[cfg(target_os = "linux")]
3236    fn test_create_image_dtype_i8() {
3237        let mut converter = ImageProcessor::new().unwrap();
3238
3239        // I8 image should allocate successfully via create_image
3240        let dst = converter
3241            .create_image(320, 240, PixelFormat::Rgb, DType::I8, None)
3242            .unwrap();
3243        assert_eq!(dst.dtype(), DType::I8);
3244        assert!(dst.width() == Some(320));
3245        assert!(dst.height() == Some(240));
3246        assert_eq!(dst.format(), Some(PixelFormat::Rgb));
3247
3248        // U8 for comparison
3249        let dst_u8 = converter
3250            .create_image(320, 240, PixelFormat::Rgb, DType::U8, None)
3251            .unwrap();
3252        assert_eq!(dst_u8.dtype(), DType::U8);
3253
3254        // Convert into I8 dst should succeed
3255        let file = edgefirst_bench::testdata::read("zidane.jpg").to_vec();
3256        let src = crate::load_image(&file, Some(PixelFormat::Rgba), None).unwrap();
3257        let mut dst_i8 = converter
3258            .create_image(320, 240, PixelFormat::Rgb, DType::I8, None)
3259            .unwrap();
3260        converter
3261            .convert(
3262                &src,
3263                &mut dst_i8,
3264                Rotation::None,
3265                Flip::None,
3266                Crop::no_crop(),
3267            )
3268            .unwrap();
3269    }
3270
3271    #[test]
3272    #[cfg(target_os = "linux")]
3273    fn test_create_image_nv12_dma_non_aligned_width() {
3274        // Regression for C2: create_image must not apply stride padding to
3275        // non-packed formats. NV12 is semi-planar (PixelLayout::SemiPlanar),
3276        // so the try_dma path should fall through to the plain
3277        // TensorDyn::image allocation for any width, regardless of the
3278        // 64-byte GPU pitch alignment.
3279        let converter = ImageProcessor::new().unwrap();
3280
3281        // 100 is intentionally not a multiple of 64 (the Mali pitch
3282        // alignment) to prove that non-packed layouts do not take the
3283        // stride-padded branch.
3284        let result = converter.create_image(
3285            100,
3286            64,
3287            PixelFormat::Nv12,
3288            DType::U8,
3289            Some(TensorMemory::Dma),
3290        );
3291
3292        match result {
3293            Ok(img) => {
3294                assert_eq!(img.width(), Some(100));
3295                assert_eq!(img.height(), Some(64));
3296                assert_eq!(img.format(), Some(PixelFormat::Nv12));
3297                // Non-packed formats must never carry a row_stride override.
3298                assert!(
3299                    img.row_stride().is_none(),
3300                    "NV12 must not be stride-padded by create_image",
3301                );
3302            }
3303            Err(e) => {
3304                // Accept skip on hosts without a dma-heap, but never the
3305                // "NotImplemented" we used to return for non-packed layouts.
3306                let msg = format!("{e}");
3307                assert!(
3308                    !msg.contains("image_with_stride"),
3309                    "NV12 should not hit the stride-padded path: {msg}",
3310                );
3311            }
3312        }
3313    }
3314
3315    #[test]
3316    #[ignore] // Hangs on desktop platforms where DMA-buf is unavailable and PBO
3317              // fallback triggers a GPU driver hang during SHM→texture upload (e.g.,
3318              // NVIDIA without /dev/dma_heap permissions). Works on embedded targets.
3319    fn test_crop_skip() {
3320        let file = edgefirst_bench::testdata::read("zidane.jpg").to_vec();
3321        let src = crate::load_image(&file, Some(PixelFormat::Rgba), None).unwrap();
3322
3323        let mut converter = ImageProcessor::new().unwrap();
3324        let converter_dst = converter
3325            .create_image(1280, 720, PixelFormat::Rgba, DType::U8, None)
3326            .unwrap();
3327        let crop = Crop::new()
3328            .with_src_rect(Some(Rect::new(0, 0, 640, 640)))
3329            .with_dst_rect(Some(Rect::new(0, 0, 640, 640)));
3330        let (result, src, converter_dst) = convert_img(
3331            &mut converter,
3332            src,
3333            converter_dst,
3334            Rotation::None,
3335            Flip::None,
3336            crop,
3337        );
3338        result.unwrap();
3339
3340        let cpu_dst = TensorDyn::image(1280, 720, PixelFormat::Rgba, DType::U8, None).unwrap();
3341        let mut cpu_converter = CPUProcessor::new();
3342        let (result, _src, cpu_dst) = convert_img(
3343            &mut cpu_converter,
3344            src,
3345            cpu_dst,
3346            Rotation::None,
3347            Flip::None,
3348            crop,
3349        );
3350        result.unwrap();
3351
3352        compare_images(&converter_dst, &cpu_dst, 0.99999, function!());
3353    }
3354
3355    #[test]
3356    fn test_invalid_pixel_format() {
3357        // PixelFormat::from_fourcc returns None for unknown formats,
3358        // so TensorDyn::image cannot be called with an invalid format.
3359        assert!(PixelFormat::from_fourcc(u32::from_le_bytes(*b"TEST")).is_none());
3360    }
3361
    // Cached probe for G2D library availability (Linux/i.MX8 only).
3363    #[cfg(target_os = "linux")]
3364    static G2D_AVAILABLE: std::sync::OnceLock<bool> = std::sync::OnceLock::new();
3365
3366    #[cfg(target_os = "linux")]
3367    fn is_g2d_available() -> bool {
3368        *G2D_AVAILABLE.get_or_init(|| G2DProcessor::new().is_ok())
3369    }
3370
    #[cfg(target_os = "linux")]
    #[cfg(feature = "opengl")]
    static GL_AVAILABLE: std::sync::OnceLock<bool> = std::sync::OnceLock::new();

    // Cached probe for OpenGL availability. The outer cfg gates already
    // restrict this to Linux builds with the opengl feature, so no inner
    // cfg branching is needed.
    #[cfg(target_os = "linux")]
    #[cfg(feature = "opengl")]
    fn is_opengl_available() -> bool {
        *GL_AVAILABLE.get_or_init(|| GLProcessorThreaded::new(None).is_ok())
    }
3389
3390    #[test]
3391    fn test_load_jpeg_with_exif() {
3392        let file = edgefirst_bench::testdata::read("zidane_rotated_exif.jpg").to_vec();
3393        let loaded = crate::load_image(&file, Some(PixelFormat::Rgba), None).unwrap();
3394
3395        assert_eq!(loaded.height(), Some(1280));
3396        assert_eq!(loaded.width(), Some(720));
3397
3398        let file = edgefirst_bench::testdata::read("zidane.jpg").to_vec();
3399        let cpu_src = crate::load_image(&file, Some(PixelFormat::Rgba), None).unwrap();
3400
3401        let (dst_width, dst_height) = (cpu_src.height().unwrap(), cpu_src.width().unwrap());
3402
3403        let cpu_dst =
3404            TensorDyn::image(dst_width, dst_height, PixelFormat::Rgba, DType::U8, None).unwrap();
3405        let mut cpu_converter = CPUProcessor::new();
3406
3407        let (result, _cpu_src, cpu_dst) = convert_img(
3408            &mut cpu_converter,
3409            cpu_src,
3410            cpu_dst,
3411            Rotation::Clockwise90,
3412            Flip::None,
3413            Crop::no_crop(),
3414        );
3415        result.unwrap();
3416
3417        compare_images(&loaded, &cpu_dst, 0.98, function!());
3418    }
3419
3420    #[test]
3421    fn test_load_png_with_exif() {
3422        let file = edgefirst_bench::testdata::read("zidane_rotated_exif_180.png").to_vec();
3423        let loaded = crate::load_png(&file, Some(PixelFormat::Rgba), None).unwrap();
3424
3425        assert_eq!(loaded.height(), Some(720));
3426        assert_eq!(loaded.width(), Some(1280));
3427
3428        let file = edgefirst_bench::testdata::read("zidane.jpg").to_vec();
3429        let cpu_src = crate::load_image(&file, Some(PixelFormat::Rgba), None).unwrap();
3430
3431        let cpu_dst = TensorDyn::image(1280, 720, PixelFormat::Rgba, DType::U8, None).unwrap();
3432        let mut cpu_converter = CPUProcessor::new();
3433
3434        let (result, _cpu_src, cpu_dst) = convert_img(
3435            &mut cpu_converter,
3436            cpu_src,
3437            cpu_dst,
3438            Rotation::Rotate180,
3439            Flip::None,
3440            Crop::no_crop(),
3441        );
3442        result.unwrap();
3443
3444        compare_images(&loaded, &cpu_dst, 0.98, function!());
3445    }
3446
3447    /// Synthesise an RGB JPEG with a deterministic pattern at `(width, height)`
3448    /// using the workspace's `jpeg-encoder` crate (the `image` crate is
3449    /// compiled without its JPEG feature). Used to exercise the decoder /
3450    /// pitch-padding paths for arbitrary dimensions without having to bundle
3451    /// a fixture file per test size.
3452    #[cfg(target_os = "linux")]
3453    fn make_rgb_jpeg(width: u32, height: u32) -> Vec<u8> {
3454        let mut bytes = Vec::with_capacity((width * height * 3) as usize);
3455        for y in 0..height {
3456            for x in 0..width {
3457                bytes.push(((x + y) & 0xFF) as u8);
3458                bytes.push(((x.wrapping_mul(3)) & 0xFF) as u8);
3459                bytes.push(((y.wrapping_mul(5)) & 0xFF) as u8);
3460            }
3461        }
3462        let mut out = Vec::new();
3463        let encoder = jpeg_encoder::Encoder::new(&mut out, 85);
3464        encoder
3465            .encode(
3466                &bytes,
3467                width as u16,
3468                height as u16,
3469                jpeg_encoder::ColorType::Rgb,
3470            )
3471            .expect("jpeg-encoder must succeed on trivial input");
3472        out
3473    }
3474
3475    /// End-to-end: a 375×333 RGBA JPEG (width NOT divisible by 4) loaded
3476    /// via the pitch-padded DMA path and letterboxed through the GL
3477    /// backend must produce correct output. Before the Rgba/Bgra
3478    /// width%4 relaxation in `DmaImportAttrs::from_tensor`, this case
3479    /// failed the pre-check and forced a CPU texture upload fallback;
3480    /// with the relaxation, EGL import succeeds at the driver level and
3481    /// the GL fast path runs. Output correctness is checked against a
3482    /// CPU reference (convert ran with `EDGEFIRST_FORCE_BACKEND=cpu`).
3483    #[test]
3484    #[cfg(target_os = "linux")]
3485    #[cfg(feature = "opengl")]
3486    fn test_convert_rgba_non_4_aligned_width_end_to_end() {
3487        use edgefirst_tensor::is_dma_available;
3488        if !is_dma_available() {
3489            eprintln!(
3490                "SKIPPED: test_convert_rgba_non_4_aligned_width_end_to_end — DMA not available"
3491            );
3492            return;
3493        }
3494        // 375 is the canonical failure width from dataset loaders —
3495        // 375 * 4 = 1500 bytes/row, pitch-padded to 1536. Width%4 = 3,
3496        // so the old pre-check rejected it; new code accepts it.
3497        let jpeg = make_rgb_jpeg(375, 333);
3498        let src_gl = crate::load_jpeg(&jpeg, Some(PixelFormat::Rgba), None).unwrap();
3499        assert_eq!(src_gl.width(), Some(375));
3500        // Row stride must still be pitch-padded (separate concern from width).
3501        let stride = src_gl.row_stride().unwrap();
3502        assert_eq!(stride, 1536, "expected padded pitch 1536, got {stride}");
3503
3504        // GL-backed convert into a pitch-aligned 640×640 Rgba dest.
3505        let mut gl_proc = ImageProcessor::new().unwrap();
3506        let gl_dst = gl_proc
3507            .create_image(640, 640, PixelFormat::Rgba, DType::U8, None)
3508            .unwrap();
3509        let (r_gl, _src_gl, gl_dst) = convert_img(
3510            &mut gl_proc,
3511            src_gl,
3512            gl_dst,
3513            Rotation::None,
3514            Flip::None,
3515            Crop::no_crop(),
3516        );
3517        r_gl.expect("GL-backed convert must succeed for 375x333 Rgba src");
3518
3519        // CPU reference via a fresh load so the two paths start from
3520        // byte-identical inputs. `with_config(backend=Cpu)` forces the
3521        // CPU-only processor regardless of which backends the host has
3522        // available.
3523        let src_cpu =
3524            crate::load_jpeg(&jpeg, Some(PixelFormat::Rgba), Some(TensorMemory::Mem)).unwrap();
3525        let mut cpu_proc = ImageProcessor::with_config(ImageProcessorConfig {
3526            backend: ComputeBackend::Cpu,
3527            ..Default::default()
3528        })
3529        .unwrap();
3530        let cpu_dst = TensorDyn::image(
3531            640,
3532            640,
3533            PixelFormat::Rgba,
3534            DType::U8,
3535            Some(TensorMemory::Mem),
3536        )
3537        .unwrap();
3538        let (r_cpu, _src_cpu, cpu_dst) = convert_img(
3539            &mut cpu_proc,
3540            src_cpu,
3541            cpu_dst,
3542            Rotation::None,
3543            Flip::None,
3544            Crop::no_crop(),
3545        );
3546        r_cpu.unwrap();
3547
3548        // Structural similarity: the GL path may have gone through EGL
3549        // import OR fallen back to CPU texture upload — either way, the
3550        // output must match the CPU reference closely.
3551        compare_images(&gl_dst, &cpu_dst, 0.95, function!());
3552    }
3553
3554    /// Regression lock: loading a JPEG at a non-64-aligned RGBA pitch (e.g.
3555    /// 500×333 → natural pitch 2000, needs to be padded to 2048) must go
3556    /// through `image_with_stride` and set `row_stride()` / `effective_row_stride()`
3557    /// to the padded value. The earlier pitch-padding commit fixed this in
3558    /// `load_jpeg`; a regression would surface as `row_stride == None` or
3559    /// `effective_row_stride == 2000`.
3560    #[test]
3561    #[cfg(target_os = "linux")]
3562    fn test_load_jpeg_rgba_non_aligned_pitch_padded_dma() {
3563        use edgefirst_tensor::is_dma_available;
3564        if !is_dma_available() {
3565            eprintln!(
3566                "SKIPPED: test_load_jpeg_rgba_non_aligned_pitch_padded_dma — DMA not available"
3567            );
3568            return;
3569        }
3570        // Widths that force a non-64-aligned natural RGBA pitch. All three
3571        // are divisible by 4 so the EGL width-alignment pre-check passes.
3572        // The pitch-padding fix is what makes these importable at all.
3573        for &w in &[500u32, 612, 428] {
3574            let jpeg = make_rgb_jpeg(w, 333);
3575            let loaded = crate::load_jpeg(&jpeg, Some(PixelFormat::Rgba), None).unwrap();
3576            let natural = (w as usize) * 4;
3577            let aligned = crate::align_pitch_bytes_to_gpu_alignment(natural).unwrap();
3578            assert!(
3579                aligned > natural,
3580                "test sanity: width {w} should be unaligned"
3581            );
3582            let stride = loaded
3583                .row_stride()
3584                .expect("padded DMA path must set an explicit row_stride — regression if None");
3585            assert_eq!(
3586                stride, aligned,
3587                "width {w}: expected padded stride {aligned}, got {stride} \
3588                 (regression: pitch-padding branch skipped?)"
3589            );
3590            let eff = loaded.effective_row_stride().unwrap();
3591            assert_eq!(
3592                eff, aligned,
3593                "effective_row_stride must match stored stride"
3594            );
3595            assert_eq!(loaded.width(), Some(w as usize));
3596            assert_eq!(loaded.height(), Some(333));
3597        }
3598    }
3599
3600    /// `padded_dma_pitch_for` must respect the caller's memory choice and
3601    /// must NOT route into the pitch-padded DMA path when the caller left
3602    /// the choice to the allocator (`None`) but DMA is unavailable on the
3603    /// host. The padded path requires `image_with_stride`, which always
3604    /// allocates DMA — taking it on a system without `/dev/dma_heap`
3605    /// would convert a normally-working image load into a hard failure
3606    /// (since `Tensor::image(..., None)` would have fallen back to
3607    /// SHM/Mem).
3608    #[test]
3609    #[cfg(target_os = "linux")]
3610    fn test_padded_dma_pitch_for_respects_memory_choice() {
3611        use edgefirst_tensor::{is_dma_available, TensorMemory};
3612
3613        // 500×4 = 2000 → padded to 2048 by GPU alignment. Use it for
3614        // every case so any "no padding" answer is unambiguous.
3615        let unaligned_w = 500;
3616
3617        // Caller asks for Mem / Shm: never pad, regardless of DMA.
3618        assert_eq!(
            crate::padded_dma_pitch_for(PixelFormat::Rgba, unaligned_w, &Some(TensorMemory::Mem)),
3620            None,
3621            "Mem must never trigger DMA padding"
3622        );
3623        assert_eq!(
            crate::padded_dma_pitch_for(PixelFormat::Rgba, unaligned_w, &Some(TensorMemory::Shm)),
3625            None,
3626            "Shm must never trigger DMA padding"
3627        );
3628
3629        // Caller explicitly asks for DMA: always pad if width needs it.
3630        // Even if the runtime can't actually allocate DMA, the caller
3631        // owns that decision and the resulting allocation error is
3632        // their problem, not ours.
3633        assert_eq!(
            crate::padded_dma_pitch_for(PixelFormat::Rgba, unaligned_w, &Some(TensorMemory::Dma)),
3635            Some(2048),
3636            "explicit Dma must pad regardless of runtime DMA availability"
3637        );
3638
3639        // Caller leaves it to the allocator: behaviour depends on
3640        // host-runtime DMA availability. This is the case the fix
3641        // guards against.
3642        let none_result = crate::padded_dma_pitch_for(PixelFormat::Rgba, unaligned_w, &None);
3643        if is_dma_available() {
3644            assert_eq!(
3645                none_result,
3646                Some(2048),
3647                "memory=None + DMA available → pad (will route through DMA)"
3648            );
3649        } else {
3650            assert_eq!(
3651                none_result, None,
3652                "memory=None + DMA unavailable → must NOT pad (would force \
3653                 image_with_stride into a DMA-only allocation that fails). \
3654                 Regression: padded_dma_pitch_for ignored is_dma_available()."
3655            );
3656        }
3657    }
3658
3659    // Synthesise a small greyscale PNG in memory at `(width, height)` with a
3660    // deterministic ramp pattern so multiple tests can cross-check output
3661    // without bundling an extra fixture file.
3662    fn make_grey_png(width: u32, height: u32) -> Vec<u8> {
3663        let mut bytes = Vec::with_capacity((width * height) as usize);
3664        for y in 0..height {
3665            for x in 0..width {
3666                bytes.push(((x + y) & 0xFF) as u8);
3667            }
3668        }
3669        let img = image::GrayImage::from_vec(width, height, bytes).unwrap();
3670        let mut buf = Vec::new();
3671        img.write_to(&mut std::io::Cursor::new(&mut buf), image::ImageFormat::Png)
3672            .unwrap();
3673        buf
3674    }
3675
3676    /// Greyscale PNG with a width that forces a pitch-misaligned natural
3677    /// row stride (612 bytes is not a multiple of the 64-byte GPU pitch
3678    /// alignment) must still load via the pitch-padded DMA path. Gated on
3679    /// DMA availability because `image_with_stride` is DMA-only.
3680    #[test]
3681    #[cfg(target_os = "linux")]
3682    fn test_load_png_grey_misaligned_width_dma() {
3683        use edgefirst_tensor::is_dma_available;
3684        if !is_dma_available() {
3685            eprintln!("SKIPPED: test_load_png_grey_misaligned_width_dma — DMA not available");
3686            return;
3687        }
3688        let png = make_grey_png(612, 388);
3689        let loaded = crate::load_png(&png, Some(PixelFormat::Grey), None).unwrap();
3690        assert_eq!(loaded.width(), Some(612));
3691        assert_eq!(loaded.height(), Some(388));
3692        assert_eq!(loaded.format(), Some(PixelFormat::Grey));
3693
3694        // Round-trip pixels — natural-pitch DMA-BUFs pad the stride so we
3695        // must indirect through row_stride() rather than assume width.
3696        let map = loaded.as_u8().unwrap().map().unwrap();
3697        let stride = loaded.row_stride().unwrap_or(612);
3698        assert!(stride >= 612);
3699        let bytes: &[u8] = &map;
3700        for y in 0..388usize {
3701            for x in 0..612usize {
3702                let expected = ((x + y) & 0xFF) as u8;
3703                let got = bytes[y * stride + x];
3704                assert_eq!(
3705                    got, expected,
3706                    "grey png mismatch at ({x},{y}): got {got} expected {expected}"
3707                );
3708            }
3709        }
3710    }
3711
3712    /// Greyscale PNG loaded with explicit Mem backing — runs on any
3713    /// platform (no DMA permission requirement) and covers the
3714    /// decoder-native Luma → Grey no-conversion path.
3715    #[test]
3716    fn test_load_png_grey_mem() {
3717        use edgefirst_tensor::TensorMemory;
3718        let png = make_grey_png(612, 100);
3719        let loaded =
3720            crate::load_png(&png, Some(PixelFormat::Grey), Some(TensorMemory::Mem)).unwrap();
3721        assert_eq!(loaded.width(), Some(612));
3722        assert_eq!(loaded.height(), Some(100));
3723        assert_eq!(loaded.format(), Some(PixelFormat::Grey));
3724        let map = loaded.as_u8().unwrap().map().unwrap();
3725        let bytes: &[u8] = &map;
3726        // Mem allocation uses the natural pitch — 612 bytes per row, exact.
3727        assert_eq!(bytes.len(), 612 * 100);
3728        for y in 0..100 {
3729            for x in 0..612 {
3730                assert_eq!(bytes[y * 612 + x], ((x + y) & 0xFF) as u8);
3731            }
3732        }
3733    }
3734
3735    /// Greyscale PNG decoded into RGB — exercises the decoder-colorspace
3736    /// mismatch path (Luma → Rgb via CPU converter). Uses Mem memory to
3737    /// stay portable to host-side test environments.
3738    #[test]
3739    fn test_load_png_grey_to_rgb_mem() {
3740        use edgefirst_tensor::TensorMemory;
3741        let png = make_grey_png(620, 240);
3742        let loaded =
3743            crate::load_png(&png, Some(PixelFormat::Rgb), Some(TensorMemory::Mem)).unwrap();
3744        assert_eq!(loaded.width(), Some(620));
3745        assert_eq!(loaded.height(), Some(240));
3746        assert_eq!(loaded.format(), Some(PixelFormat::Rgb));
3747
3748        // Greyscale promoted to RGB replicates luma into each channel.
3749        let map = loaded.as_u8().unwrap().map().unwrap();
3750        let bytes: &[u8] = &map;
3751        for (x, y) in [(0usize, 0usize), (100, 50), (619, 239)] {
3752            let expected = ((x + y) & 0xFF) as u8;
3753            let off = (y * 620 + x) * 3;
3754            assert_eq!(bytes[off], expected, "R@{x},{y}");
3755            assert_eq!(bytes[off + 1], expected, "G@{x},{y}");
3756            assert_eq!(bytes[off + 2], expected, "B@{x},{y}");
3757        }
3758    }
3759
3760    #[test]
3761    #[cfg(target_os = "linux")]
3762    fn test_g2d_resize() {
3763        if !is_g2d_available() {
3764            eprintln!("SKIPPED: test_g2d_resize - G2D library (libg2d.so.2) not available");
3765            return;
3766        }
3767        if !is_dma_available() {
3768            eprintln!(
3769                "SKIPPED: test_g2d_resize - DMA memory allocation not available (permission denied or no DMA-BUF support)"
3770            );
3771            return;
3772        }
3773
3774        let dst_width = 640;
3775        let dst_height = 360;
3776        let file = edgefirst_bench::testdata::read("zidane.jpg").to_vec();
3777        let src =
3778            crate::load_image(&file, Some(PixelFormat::Rgba), Some(TensorMemory::Dma)).unwrap();
3779
3780        let g2d_dst = TensorDyn::image(
3781            dst_width,
3782            dst_height,
3783            PixelFormat::Rgba,
3784            DType::U8,
3785            Some(TensorMemory::Dma),
3786        )
3787        .unwrap();
3788        let mut g2d_converter = G2DProcessor::new().unwrap();
3789        let (result, src, g2d_dst) = convert_img(
3790            &mut g2d_converter,
3791            src,
3792            g2d_dst,
3793            Rotation::None,
3794            Flip::None,
3795            Crop::no_crop(),
3796        );
3797        result.unwrap();
3798
3799        let cpu_dst =
3800            TensorDyn::image(dst_width, dst_height, PixelFormat::Rgba, DType::U8, None).unwrap();
3801        let mut cpu_converter = CPUProcessor::new();
3802        let (result, _src, cpu_dst) = convert_img(
3803            &mut cpu_converter,
3804            src,
3805            cpu_dst,
3806            Rotation::None,
3807            Flip::None,
3808            Crop::no_crop(),
3809        );
3810        result.unwrap();
3811
3812        compare_images(&g2d_dst, &cpu_dst, 0.98, function!());
3813    }
3814
3815    #[test]
3816    #[cfg(target_os = "linux")]
3817    #[cfg(feature = "opengl")]
3818    fn test_opengl_resize() {
3819        if !is_opengl_available() {
3820            eprintln!("SKIPPED: {} - OpenGL not available", function!());
3821            return;
3822        }
3823
3824        let dst_width = 640;
3825        let dst_height = 360;
3826        let file = edgefirst_bench::testdata::read("zidane.jpg").to_vec();
3827        let src = crate::load_image(&file, Some(PixelFormat::Rgba), None).unwrap();
3828
3829        let cpu_dst =
3830            TensorDyn::image(dst_width, dst_height, PixelFormat::Rgba, DType::U8, None).unwrap();
3831        let mut cpu_converter = CPUProcessor::new();
3832        let (result, src, cpu_dst) = convert_img(
3833            &mut cpu_converter,
3834            src,
3835            cpu_dst,
3836            Rotation::None,
3837            Flip::None,
3838            Crop::no_crop(),
3839        );
3840        result.unwrap();
3841
3842        let mut src = src;
3843        let mut gl_converter = GLProcessorThreaded::new(None).unwrap();
3844
3845        for _ in 0..5 {
3846            let gl_dst =
3847                TensorDyn::image(dst_width, dst_height, PixelFormat::Rgba, DType::U8, None)
3848                    .unwrap();
3849            let (result, src_back, gl_dst) = convert_img(
3850                &mut gl_converter,
3851                src,
3852                gl_dst,
3853                Rotation::None,
3854                Flip::None,
3855                Crop::no_crop(),
3856            );
3857            result.unwrap();
3858            src = src_back;
3859
3860            compare_images(&gl_dst, &cpu_dst, 0.98, function!());
3861        }
3862    }
3863
3864    #[test]
3865    #[cfg(target_os = "linux")]
3866    #[cfg(feature = "opengl")]
3867    fn test_opengl_10_threads() {
3868        if !is_opengl_available() {
3869            eprintln!("SKIPPED: {} - OpenGL not available", function!());
3870            return;
3871        }
3872
3873        let handles: Vec<_> = (0..10)
3874            .map(|i| {
3875                std::thread::Builder::new()
3876                    .name(format!("Thread {i}"))
3877                    .spawn(test_opengl_resize)
3878                    .unwrap()
3879            })
3880            .collect();
3881        handles.into_iter().for_each(|h| {
3882            if let Err(e) = h.join() {
3883                std::panic::resume_unwind(e)
3884            }
3885        });
3886    }
3887
3888    #[test]
3889    #[cfg(target_os = "linux")]
3890    #[cfg(feature = "opengl")]
3891    fn test_opengl_grey() {
3892        if !is_opengl_available() {
3893            eprintln!("SKIPPED: {} - OpenGL not available", function!());
3894            return;
3895        }
3896
3897        let img = crate::load_image(
3898            &edgefirst_bench::testdata::read("grey.jpg"),
3899            Some(PixelFormat::Grey),
3900            None,
3901        )
3902        .unwrap();
3903
3904        let gl_dst = TensorDyn::image(640, 640, PixelFormat::Grey, DType::U8, None).unwrap();
3905        let cpu_dst = TensorDyn::image(640, 640, PixelFormat::Grey, DType::U8, None).unwrap();
3906
3907        let mut converter = CPUProcessor::new();
3908
3909        let (result, img, cpu_dst) = convert_img(
3910            &mut converter,
3911            img,
3912            cpu_dst,
3913            Rotation::None,
3914            Flip::None,
3915            Crop::no_crop(),
3916        );
3917        result.unwrap();
3918
3919        let mut gl = GLProcessorThreaded::new(None).unwrap();
3920        let (result, _img, gl_dst) = convert_img(
3921            &mut gl,
3922            img,
3923            gl_dst,
3924            Rotation::None,
3925            Flip::None,
3926            Crop::no_crop(),
3927        );
3928        result.unwrap();
3929
3930        compare_images(&gl_dst, &cpu_dst, 0.98, function!());
3931    }
3932
3933    #[test]
3934    #[cfg(target_os = "linux")]
3935    fn test_g2d_src_crop() {
3936        if !is_g2d_available() {
3937            eprintln!("SKIPPED: test_g2d_src_crop - G2D library (libg2d.so.2) not available");
3938            return;
3939        }
3940        if !is_dma_available() {
3941            eprintln!(
3942                "SKIPPED: test_g2d_src_crop - DMA memory allocation not available (permission denied or no DMA-BUF support)"
3943            );
3944            return;
3945        }
3946
3947        let dst_width = 640;
3948        let dst_height = 640;
3949        let file = edgefirst_bench::testdata::read("zidane.jpg").to_vec();
3950        let src = crate::load_image(&file, Some(PixelFormat::Rgba), None).unwrap();
3951
3952        let cpu_dst =
3953            TensorDyn::image(dst_width, dst_height, PixelFormat::Rgba, DType::U8, None).unwrap();
3954        let mut cpu_converter = CPUProcessor::new();
3955        let crop = Crop {
3956            src_rect: Some(Rect {
3957                left: 0,
3958                top: 0,
3959                width: 640,
3960                height: 360,
3961            }),
3962            dst_rect: None,
3963            dst_color: None,
3964        };
3965        let (result, src, cpu_dst) = convert_img(
3966            &mut cpu_converter,
3967            src,
3968            cpu_dst,
3969            Rotation::None,
3970            Flip::None,
3971            crop,
3972        );
3973        result.unwrap();
3974
3975        let g2d_dst =
3976            TensorDyn::image(dst_width, dst_height, PixelFormat::Rgba, DType::U8, None).unwrap();
3977        let mut g2d_converter = G2DProcessor::new().unwrap();
3978        let (result, _src, g2d_dst) = convert_img(
3979            &mut g2d_converter,
3980            src,
3981            g2d_dst,
3982            Rotation::None,
3983            Flip::None,
3984            crop,
3985        );
3986        result.unwrap();
3987
3988        compare_images(&g2d_dst, &cpu_dst, 0.98, function!());
3989    }
3990
3991    #[test]
3992    #[cfg(target_os = "linux")]
3993    fn test_g2d_dst_crop() {
3994        if !is_g2d_available() {
3995            eprintln!("SKIPPED: test_g2d_dst_crop - G2D library (libg2d.so.2) not available");
3996            return;
3997        }
3998        if !is_dma_available() {
3999            eprintln!(
4000                "SKIPPED: test_g2d_dst_crop - DMA memory allocation not available (permission denied or no DMA-BUF support)"
4001            );
4002            return;
4003        }
4004
4005        let dst_width = 640;
4006        let dst_height = 640;
4007        let file = edgefirst_bench::testdata::read("zidane.jpg").to_vec();
4008        let src = crate::load_image(&file, Some(PixelFormat::Rgba), None).unwrap();
4009
4010        let cpu_dst =
4011            TensorDyn::image(dst_width, dst_height, PixelFormat::Rgba, DType::U8, None).unwrap();
4012        let mut cpu_converter = CPUProcessor::new();
4013        let crop = Crop {
4014            src_rect: None,
4015            dst_rect: Some(Rect::new(100, 100, 512, 288)),
4016            dst_color: None,
4017        };
4018        let (result, src, cpu_dst) = convert_img(
4019            &mut cpu_converter,
4020            src,
4021            cpu_dst,
4022            Rotation::None,
4023            Flip::None,
4024            crop,
4025        );
4026        result.unwrap();
4027
4028        let g2d_dst =
4029            TensorDyn::image(dst_width, dst_height, PixelFormat::Rgba, DType::U8, None).unwrap();
4030        let mut g2d_converter = G2DProcessor::new().unwrap();
4031        let (result, _src, g2d_dst) = convert_img(
4032            &mut g2d_converter,
4033            src,
4034            g2d_dst,
4035            Rotation::None,
4036            Flip::None,
4037            crop,
4038        );
4039        result.unwrap();
4040
4041        compare_images(&g2d_dst, &cpu_dst, 0.98, function!());
4042    }
4043
4044    #[test]
4045    #[cfg(target_os = "linux")]
4046    fn test_g2d_all_rgba() {
4047        if !is_g2d_available() {
4048            eprintln!("SKIPPED: test_g2d_all_rgba - G2D library (libg2d.so.2) not available");
4049            return;
4050        }
4051        if !is_dma_available() {
4052            eprintln!(
4053                "SKIPPED: test_g2d_all_rgba - DMA memory allocation not available (permission denied or no DMA-BUF support)"
4054            );
4055            return;
4056        }
4057
4058        let dst_width = 640;
4059        let dst_height = 640;
4060        let file = edgefirst_bench::testdata::read("zidane.jpg").to_vec();
4061        let src = crate::load_image(&file, Some(PixelFormat::Rgba), None).unwrap();
4062        let src_dyn = src;
4063
4064        let mut cpu_dst =
4065            TensorDyn::image(dst_width, dst_height, PixelFormat::Rgba, DType::U8, None).unwrap();
4066        let mut cpu_converter = CPUProcessor::new();
4067        let mut g2d_dst =
4068            TensorDyn::image(dst_width, dst_height, PixelFormat::Rgba, DType::U8, None).unwrap();
4069        let mut g2d_converter = G2DProcessor::new().unwrap();
4070
4071        let crop = Crop {
4072            src_rect: Some(Rect::new(50, 120, 1024, 576)),
4073            dst_rect: Some(Rect::new(100, 100, 512, 288)),
4074            dst_color: None,
4075        };
4076
4077        for rot in [
4078            Rotation::None,
4079            Rotation::Clockwise90,
4080            Rotation::Rotate180,
4081            Rotation::CounterClockwise90,
4082        ] {
4083            cpu_dst
4084                .as_u8()
4085                .unwrap()
4086                .map()
4087                .unwrap()
4088                .as_mut_slice()
4089                .fill(114);
4090            g2d_dst
4091                .as_u8()
4092                .unwrap()
4093                .map()
4094                .unwrap()
4095                .as_mut_slice()
4096                .fill(114);
4097            for flip in [Flip::None, Flip::Horizontal, Flip::Vertical] {
4098                let mut cpu_dst_dyn = cpu_dst;
4099                cpu_converter
                    .convert(&src_dyn, &mut cpu_dst_dyn, rot, flip, crop)
4101                    .unwrap();
4102                cpu_dst = {
4103                    let mut __t = cpu_dst_dyn.into_u8().unwrap();
4104                    __t.set_format(PixelFormat::Rgba).unwrap();
4105                    TensorDyn::from(__t)
4106                };
4107
4108                let mut g2d_dst_dyn = g2d_dst;
4109                g2d_converter
                    .convert(&src_dyn, &mut g2d_dst_dyn, rot, flip, crop)
4111                    .unwrap();
4112                g2d_dst = {
4113                    let mut __t = g2d_dst_dyn.into_u8().unwrap();
4114                    __t.set_format(PixelFormat::Rgba).unwrap();
4115                    TensorDyn::from(__t)
4116                };
4117
4118                compare_images(
4119                    &g2d_dst,
4120                    &cpu_dst,
4121                    0.98,
4122                    &format!("{} {:?} {:?}", function!(), rot, flip),
4123                );
4124            }
4125        }
4126    }
4127
4128    #[test]
4129    #[cfg(target_os = "linux")]
4130    #[cfg(feature = "opengl")]
4131    fn test_opengl_src_crop() {
4132        if !is_opengl_available() {
4133            eprintln!("SKIPPED: {} - OpenGL not available", function!());
4134            return;
4135        }
4136
4137        let dst_width = 640;
4138        let dst_height = 360;
4139        let file = edgefirst_bench::testdata::read("zidane.jpg").to_vec();
4140        let src = crate::load_image(&file, Some(PixelFormat::Rgba), None).unwrap();
4141        let crop = Crop {
4142            src_rect: Some(Rect {
4143                left: 320,
4144                top: 180,
4145                width: 1280 - 320,
4146                height: 720 - 180,
4147            }),
4148            dst_rect: None,
4149            dst_color: None,
4150        };
4151
4152        let cpu_dst =
4153            TensorDyn::image(dst_width, dst_height, PixelFormat::Rgba, DType::U8, None).unwrap();
4154        let mut cpu_converter = CPUProcessor::new();
4155        let (result, src, cpu_dst) = convert_img(
4156            &mut cpu_converter,
4157            src,
4158            cpu_dst,
4159            Rotation::None,
4160            Flip::None,
4161            crop,
4162        );
4163        result.unwrap();
4164
4165        let gl_dst =
4166            TensorDyn::image(dst_width, dst_height, PixelFormat::Rgba, DType::U8, None).unwrap();
4167        let mut gl_converter = GLProcessorThreaded::new(None).unwrap();
4168        let (result, _src, gl_dst) = convert_img(
4169            &mut gl_converter,
4170            src,
4171            gl_dst,
4172            Rotation::None,
4173            Flip::None,
4174            crop,
4175        );
4176        result.unwrap();
4177
4178        compare_images(&gl_dst, &cpu_dst, 0.98, function!());
4179    }
4180
4181    #[test]
4182    #[cfg(target_os = "linux")]
4183    #[cfg(feature = "opengl")]
4184    fn test_opengl_dst_crop() {
4185        if !is_opengl_available() {
4186            eprintln!("SKIPPED: {} - OpenGL not available", function!());
4187            return;
4188        }
4189
4190        let dst_width = 640;
4191        let dst_height = 640;
4192        let file = edgefirst_bench::testdata::read("zidane.jpg").to_vec();
4193        let src = crate::load_image(&file, Some(PixelFormat::Rgba), None).unwrap();
4194
4195        let cpu_dst =
4196            TensorDyn::image(dst_width, dst_height, PixelFormat::Rgba, DType::U8, None).unwrap();
4197        let mut cpu_converter = CPUProcessor::new();
4198        let crop = Crop {
4199            src_rect: None,
4200            dst_rect: Some(Rect::new(100, 100, 512, 288)),
4201            dst_color: None,
4202        };
4203        let (result, src, cpu_dst) = convert_img(
4204            &mut cpu_converter,
4205            src,
4206            cpu_dst,
4207            Rotation::None,
4208            Flip::None,
4209            crop,
4210        );
4211        result.unwrap();
4212
4213        let gl_dst =
4214            TensorDyn::image(dst_width, dst_height, PixelFormat::Rgba, DType::U8, None).unwrap();
4215        let mut gl_converter = GLProcessorThreaded::new(None).unwrap();
4216        let (result, _src, gl_dst) = convert_img(
4217            &mut gl_converter,
4218            src,
4219            gl_dst,
4220            Rotation::None,
4221            Flip::None,
4222            crop,
4223        );
4224        result.unwrap();
4225
4226        compare_images(&gl_dst, &cpu_dst, 0.98, function!());
4227    }
4228
4229    #[test]
4230    #[cfg(target_os = "linux")]
4231    #[cfg(feature = "opengl")]
4232    fn test_opengl_all_rgba() {
4233        if !is_opengl_available() {
4234            eprintln!("SKIPPED: {} - OpenGL not available", function!());
4235            return;
4236        }
4237
4238        let dst_width = 640;
4239        let dst_height = 640;
4240        let file = edgefirst_bench::testdata::read("zidane.jpg").to_vec();
4241
4242        let mut cpu_converter = CPUProcessor::new();
4243
4244        let mut gl_converter = GLProcessorThreaded::new(None).unwrap();
4245
4246        let mut mem = vec![None, Some(TensorMemory::Mem), Some(TensorMemory::Shm)];
4247        if is_dma_available() {
4248            mem.push(Some(TensorMemory::Dma));
4249        }
4250        let crop = Crop {
4251            src_rect: Some(Rect::new(50, 120, 1024, 576)),
4252            dst_rect: Some(Rect::new(100, 100, 512, 288)),
4253            dst_color: None,
4254        };
4255        for m in mem {
4256            let src = crate::load_image(&file, Some(PixelFormat::Rgba), m).unwrap();
4257            let src_dyn = src;
4258
4259            for rot in [
4260                Rotation::None,
4261                Rotation::Clockwise90,
4262                Rotation::Rotate180,
4263                Rotation::CounterClockwise90,
4264            ] {
4265                for flip in [Flip::None, Flip::Horizontal, Flip::Vertical] {
4266                    let cpu_dst =
4267                        TensorDyn::image(dst_width, dst_height, PixelFormat::Rgba, DType::U8, m)
4268                            .unwrap();
4269                    let gl_dst =
4270                        TensorDyn::image(dst_width, dst_height, PixelFormat::Rgba, DType::U8, m)
4271                            .unwrap();
4272                    cpu_dst
4273                        .as_u8()
4274                        .unwrap()
4275                        .map()
4276                        .unwrap()
4277                        .as_mut_slice()
4278                        .fill(114);
4279                    gl_dst
4280                        .as_u8()
4281                        .unwrap()
4282                        .map()
4283                        .unwrap()
4284                        .as_mut_slice()
4285                        .fill(114);
4286
4287                    let mut cpu_dst_dyn = cpu_dst;
4288                    cpu_converter
                        .convert(&src_dyn, &mut cpu_dst_dyn, rot, flip, crop)
4290                        .unwrap();
4291                    let cpu_dst = {
4292                        let mut __t = cpu_dst_dyn.into_u8().unwrap();
4293                        __t.set_format(PixelFormat::Rgba).unwrap();
4294                        TensorDyn::from(__t)
4295                    };
4296
4297                    let mut gl_dst_dyn = gl_dst;
4298                    gl_converter
                        .convert(&src_dyn, &mut gl_dst_dyn, rot, flip, crop)
                        .map_err(|e| {
                            log::error!("error mem {m:?} rot {rot:?} flip {flip:?} error: {e:?}");
4302                            e
4303                        })
4304                        .unwrap();
4305                    let gl_dst = {
4306                        let mut __t = gl_dst_dyn.into_u8().unwrap();
4307                        __t.set_format(PixelFormat::Rgba).unwrap();
4308                        TensorDyn::from(__t)
4309                    };
4310
4311                    compare_images(
4312                        &gl_dst,
4313                        &cpu_dst,
4314                        0.98,
4315                        &format!("{} {:?} {:?}", function!(), rot, flip),
4316                    );
4317                }
4318            }
4319        }
4320    }
4321
4322    #[test]
4323    #[cfg(target_os = "linux")]
4324    fn test_cpu_rotate() {
4325        for rot in [
4326            Rotation::Clockwise90,
4327            Rotation::Rotate180,
4328            Rotation::CounterClockwise90,
4329        ] {
4330            test_cpu_rotate_(rot);
4331        }
4332    }
4333
4334    #[cfg(target_os = "linux")]
4335    fn test_cpu_rotate_(rot: Rotation) {
        // Rotates the image four times and checks that the result matches the
        // original. It does not verify that each individual rotation went in
        // the correct direction.
4339        let file = edgefirst_bench::testdata::read("zidane.jpg").to_vec();
4340
4341        let unchanged_src = crate::load_image(&file, Some(PixelFormat::Rgba), None).unwrap();
4342        let src = crate::load_image(&file, Some(PixelFormat::Rgba), None).unwrap();
4343
4344        let (dst_width, dst_height) = match rot {
4345            Rotation::None | Rotation::Rotate180 => (src.width().unwrap(), src.height().unwrap()),
4346            Rotation::Clockwise90 | Rotation::CounterClockwise90 => {
4347                (src.height().unwrap(), src.width().unwrap())
4348            }
4349        };
4350
4351        let cpu_dst =
4352            TensorDyn::image(dst_width, dst_height, PixelFormat::Rgba, DType::U8, None).unwrap();
4353        let mut cpu_converter = CPUProcessor::new();
4354
4355        // After rotating 4 times, the image should be the same as the original
4356
4357        let (result, src, cpu_dst) = convert_img(
4358            &mut cpu_converter,
4359            src,
4360            cpu_dst,
4361            rot,
4362            Flip::None,
4363            Crop::no_crop(),
4364        );
4365        result.unwrap();
4366
4367        let (result, cpu_dst, src) = convert_img(
4368            &mut cpu_converter,
4369            cpu_dst,
4370            src,
4371            rot,
4372            Flip::None,
4373            Crop::no_crop(),
4374        );
4375        result.unwrap();
4376
4377        let (result, src, cpu_dst) = convert_img(
4378            &mut cpu_converter,
4379            src,
4380            cpu_dst,
4381            rot,
4382            Flip::None,
4383            Crop::no_crop(),
4384        );
4385        result.unwrap();
4386
4387        let (result, _cpu_dst, src) = convert_img(
4388            &mut cpu_converter,
4389            cpu_dst,
4390            src,
4391            rot,
4392            Flip::None,
4393            Crop::no_crop(),
4394        );
4395        result.unwrap();
4396
4397        compare_images(&src, &unchanged_src, 0.98, function!());
4398    }
4399
4400    #[test]
4401    #[cfg(target_os = "linux")]
4402    #[cfg(feature = "opengl")]
4403    fn test_opengl_rotate() {
4404        if !is_opengl_available() {
4405            eprintln!("SKIPPED: {} - OpenGL not available", function!());
4406            return;
4407        }
4408
4409        let size = (1280, 720);
4410        let mut mem = vec![None, Some(TensorMemory::Shm), Some(TensorMemory::Mem)];
4411
4412        if is_dma_available() {
4413            mem.push(Some(TensorMemory::Dma));
4414        }
4415        for m in mem {
4416            for rot in [
4417                Rotation::Clockwise90,
4418                Rotation::Rotate180,
4419                Rotation::CounterClockwise90,
4420            ] {
4421                test_opengl_rotate_(size, rot, m);
4422            }
4423        }
4424    }
4425
4426    #[cfg(target_os = "linux")]
4427    #[cfg(feature = "opengl")]
4428    fn test_opengl_rotate_(
4429        size: (usize, usize),
4430        rot: Rotation,
4431        tensor_memory: Option<TensorMemory>,
4432    ) {
4433        let (dst_width, dst_height) = match rot {
4434            Rotation::None | Rotation::Rotate180 => size,
4435            Rotation::Clockwise90 | Rotation::CounterClockwise90 => (size.1, size.0),
4436        };
4437
4438        let file = edgefirst_bench::testdata::read("zidane.jpg").to_vec();
4439        let src = crate::load_image(&file, Some(PixelFormat::Rgba), tensor_memory).unwrap();
4440
4441        let cpu_dst =
4442            TensorDyn::image(dst_width, dst_height, PixelFormat::Rgba, DType::U8, None).unwrap();
4443        let mut cpu_converter = CPUProcessor::new();
4444
4445        let (result, mut src, cpu_dst) = convert_img(
4446            &mut cpu_converter,
4447            src,
4448            cpu_dst,
4449            rot,
4450            Flip::None,
4451            Crop::no_crop(),
4452        );
4453        result.unwrap();
4454
4455        let mut gl_converter = GLProcessorThreaded::new(None).unwrap();
4456
4457        for _ in 0..5 {
4458            let gl_dst = TensorDyn::image(
4459                dst_width,
4460                dst_height,
4461                PixelFormat::Rgba,
4462                DType::U8,
4463                tensor_memory,
4464            )
4465            .unwrap();
4466            let (result, src_back, gl_dst) = convert_img(
4467                &mut gl_converter,
4468                src,
4469                gl_dst,
4470                rot,
4471                Flip::None,
4472                Crop::no_crop(),
4473            );
4474            result.unwrap();
4475            src = src_back;
4476            compare_images(&gl_dst, &cpu_dst, 0.98, function!());
4477        }
4478    }
4479
4480    #[test]
4481    #[cfg(target_os = "linux")]
4482    fn test_g2d_rotate() {
4483        if !is_g2d_available() {
4484            eprintln!("SKIPPED: test_g2d_rotate - G2D library (libg2d.so.2) not available");
4485            return;
4486        }
4487        if !is_dma_available() {
4488            eprintln!(
4489                "SKIPPED: test_g2d_rotate - DMA memory allocation not available (permission denied or no DMA-BUF support)"
4490            );
4491            return;
4492        }
4493
4494        let size = (1280, 720);
4495        for rot in [
4496            Rotation::Clockwise90,
4497            Rotation::Rotate180,
4498            Rotation::CounterClockwise90,
4499        ] {
4500            test_g2d_rotate_(size, rot);
4501        }
4502    }
4503
4504    #[cfg(target_os = "linux")]
4505    fn test_g2d_rotate_(size: (usize, usize), rot: Rotation) {
4506        let (dst_width, dst_height) = match rot {
4507            Rotation::None | Rotation::Rotate180 => size,
4508            Rotation::Clockwise90 | Rotation::CounterClockwise90 => (size.1, size.0),
4509        };
4510
4511        let file = edgefirst_bench::testdata::read("zidane.jpg").to_vec();
4512        let src =
4513            crate::load_image(&file, Some(PixelFormat::Rgba), Some(TensorMemory::Dma)).unwrap();
4514
4515        let cpu_dst =
4516            TensorDyn::image(dst_width, dst_height, PixelFormat::Rgba, DType::U8, None).unwrap();
4517        let mut cpu_converter = CPUProcessor::new();
4518
4519        let (result, src, cpu_dst) = convert_img(
4520            &mut cpu_converter,
4521            src,
4522            cpu_dst,
4523            rot,
4524            Flip::None,
4525            Crop::no_crop(),
4526        );
4527        result.unwrap();
4528
4529        let g2d_dst = TensorDyn::image(
4530            dst_width,
4531            dst_height,
4532            PixelFormat::Rgba,
4533            DType::U8,
4534            Some(TensorMemory::Dma),
4535        )
4536        .unwrap();
4537        let mut g2d_converter = G2DProcessor::new().unwrap();
4538
4539        let (result, _src, g2d_dst) = convert_img(
4540            &mut g2d_converter,
4541            src,
4542            g2d_dst,
4543            rot,
4544            Flip::None,
4545            Crop::no_crop(),
4546        );
4547        result.unwrap();
4548
4549        compare_images(&g2d_dst, &cpu_dst, 0.98, function!());
4550    }
4551
    #[test]
    fn test_rgba_to_yuyv_resize_cpu() {
        let src = load_bytes_to_tensor(
            1280,
            720,
            PixelFormat::Rgba,
            None,
            &edgefirst_bench::testdata::read("camera720p.rgba"),
        )
        .unwrap();

        let (dst_width, dst_height) = (640, 360);

        let dst =
            TensorDyn::image(dst_width, dst_height, PixelFormat::Yuyv, DType::U8, None).unwrap();

        let dst_through_yuyv =
            TensorDyn::image(dst_width, dst_height, PixelFormat::Rgba, DType::U8, None).unwrap();
        let dst_direct =
            TensorDyn::image(dst_width, dst_height, PixelFormat::Rgba, DType::U8, None).unwrap();

        let mut cpu_converter = CPUProcessor::new();

        let (result, src, dst) = convert_img(
            &mut cpu_converter,
            src,
            dst,
            Rotation::None,
            Flip::None,
            Crop::no_crop(),
        );
        result.unwrap();

        let (result, _dst, dst_through_yuyv) = convert_img(
            &mut cpu_converter,
            dst,
            dst_through_yuyv,
            Rotation::None,
            Flip::None,
            Crop::no_crop(),
        );
        result.unwrap();

        let (result, _src, dst_direct) = convert_img(
            &mut cpu_converter,
            src,
            dst_direct,
            Rotation::None,
            Flip::None,
            Crop::no_crop(),
        );
        result.unwrap();

        compare_images(&dst_through_yuyv, &dst_direct, 0.98, function!());
    }

    #[test]
    #[cfg(target_os = "linux")]
    #[cfg(feature = "opengl")]
    #[ignore = "opengl doesn't support rendering to PixelFormat::Yuyv texture"]
    fn test_rgba_to_yuyv_resize_opengl() {
        if !is_opengl_available() {
            eprintln!("SKIPPED: {} - OpenGL not available", function!());
            return;
        }

        if !is_dma_available() {
            eprintln!(
                "SKIPPED: {} - DMA memory allocation not available (permission denied or no DMA-BUF support)",
                function!()
            );
            return;
        }

        let src = load_bytes_to_tensor(
            1280,
            720,
            PixelFormat::Rgba,
            None,
            &edgefirst_bench::testdata::read("camera720p.rgba"),
        )
        .unwrap();

        let (dst_width, dst_height) = (640, 360);

        let dst = TensorDyn::image(
            dst_width,
            dst_height,
            PixelFormat::Yuyv,
            DType::U8,
            Some(TensorMemory::Dma),
        )
        .unwrap();

        let mut gl_converter = GLProcessorThreaded::new(None).unwrap();

        let (result, src, dst) = convert_img(
            &mut gl_converter,
            src,
            dst,
            Rotation::None,
            Flip::None,
            Crop::new()
                .with_dst_rect(Some(Rect::new(100, 100, 100, 100)))
                .with_dst_color(Some([255, 255, 255, 255])),
        );
        result.unwrap();

        std::fs::write(
            "rgba_to_yuyv_opengl.yuyv",
            dst.as_u8().unwrap().map().unwrap().as_slice(),
        )
        .unwrap();
        let cpu_dst = TensorDyn::image(
            dst_width,
            dst_height,
            PixelFormat::Yuyv,
            DType::U8,
            Some(TensorMemory::Dma),
        )
        .unwrap();
        let (result, _src, cpu_dst) = convert_img(
            &mut CPUProcessor::new(),
            src,
            cpu_dst,
            Rotation::None,
            Flip::None,
            Crop::no_crop(),
        );
        result.unwrap();

        compare_images_convert_to_rgb(&dst, &cpu_dst, 0.98, function!());
    }

    #[test]
    #[cfg(target_os = "linux")]
    fn test_rgba_to_yuyv_resize_g2d() {
        if !is_g2d_available() {
            eprintln!(
                "SKIPPED: test_rgba_to_yuyv_resize_g2d - G2D library (libg2d.so.2) not available"
            );
            return;
        }
        if !is_dma_available() {
            eprintln!(
                "SKIPPED: test_rgba_to_yuyv_resize_g2d - DMA memory allocation not available (permission denied or no DMA-BUF support)"
            );
            return;
        }

        let src = load_bytes_to_tensor(
            1280,
            720,
            PixelFormat::Rgba,
            Some(TensorMemory::Dma),
            &edgefirst_bench::testdata::read("camera720p.rgba"),
        )
        .unwrap();

        let (dst_width, dst_height) = (1280, 720);

        let cpu_dst = TensorDyn::image(
            dst_width,
            dst_height,
            PixelFormat::Yuyv,
            DType::U8,
            Some(TensorMemory::Dma),
        )
        .unwrap();

        let g2d_dst = TensorDyn::image(
            dst_width,
            dst_height,
            PixelFormat::Yuyv,
            DType::U8,
            Some(TensorMemory::Dma),
        )
        .unwrap();

        let mut g2d_converter = G2DProcessor::new().unwrap();
        let crop = Crop {
            src_rect: None,
            dst_rect: Some(Rect::new(100, 100, 2, 2)),
            dst_color: None,
        };

        g2d_dst
            .as_u8()
            .unwrap()
            .map()
            .unwrap()
            .as_mut_slice()
            .fill(128);
        let (result, src, g2d_dst) = convert_img(
            &mut g2d_converter,
            src,
            g2d_dst,
            Rotation::None,
            Flip::None,
            crop,
        );
        result.unwrap();

        cpu_dst
            .as_u8()
            .unwrap()
            .map()
            .unwrap()
            .as_mut_slice()
            .fill(128);
        let (result, _src, cpu_dst) = convert_img(
            &mut CPUProcessor::new(),
            src,
            cpu_dst,
            Rotation::None,
            Flip::None,
            crop,
        );
        result.unwrap();

        compare_images_convert_to_rgb(&cpu_dst, &g2d_dst, 0.98, function!());
    }

    #[test]
    fn test_yuyv_to_rgba_cpu() {
        let file = edgefirst_bench::testdata::read("camera720p.yuyv").to_vec();
        let src = TensorDyn::image(1280, 720, PixelFormat::Yuyv, DType::U8, None).unwrap();
        src.as_u8()
            .unwrap()
            .map()
            .unwrap()
            .as_mut_slice()
            .copy_from_slice(&file);

        let dst = TensorDyn::image(1280, 720, PixelFormat::Rgba, DType::U8, None).unwrap();
        let mut cpu_converter = CPUProcessor::new();

        let (result, _src, dst) = convert_img(
            &mut cpu_converter,
            src,
            dst,
            Rotation::None,
            Flip::None,
            Crop::no_crop(),
        );
        result.unwrap();

        let target_image = TensorDyn::image(1280, 720, PixelFormat::Rgba, DType::U8, None).unwrap();
        target_image
            .as_u8()
            .unwrap()
            .map()
            .unwrap()
            .as_mut_slice()
            .copy_from_slice(&edgefirst_bench::testdata::read("camera720p.rgba"));

        compare_images(&dst, &target_image, 0.98, function!());
    }

    #[test]
    fn test_yuyv_to_rgb_cpu() {
        let file = edgefirst_bench::testdata::read("camera720p.yuyv").to_vec();
        let src = TensorDyn::image(1280, 720, PixelFormat::Yuyv, DType::U8, None).unwrap();
        src.as_u8()
            .unwrap()
            .map()
            .unwrap()
            .as_mut_slice()
            .copy_from_slice(&file);

        let dst = TensorDyn::image(1280, 720, PixelFormat::Rgb, DType::U8, None).unwrap();
        let mut cpu_converter = CPUProcessor::new();

        let (result, _src, dst) = convert_img(
            &mut cpu_converter,
            src,
            dst,
            Rotation::None,
            Flip::None,
            Crop::no_crop(),
        );
        result.unwrap();

        let target_image = TensorDyn::image(1280, 720, PixelFormat::Rgb, DType::U8, None).unwrap();
        target_image
            .as_u8()
            .unwrap()
            .map()
            .unwrap()
            .as_mut_slice()
            .as_chunks_mut::<3>()
            .0
            .iter_mut()
            .zip(
                edgefirst_bench::testdata::read("camera720p.rgba")
                    .as_chunks::<4>()
                    .0,
            )
            .for_each(|(dst, src)| *dst = [src[0], src[1], src[2]]);

        compare_images(&dst, &target_image, 0.98, function!());
    }

    #[test]
    #[cfg(target_os = "linux")]
    fn test_yuyv_to_rgba_g2d() {
        if !is_g2d_available() {
            eprintln!("SKIPPED: test_yuyv_to_rgba_g2d - G2D library (libg2d.so.2) not available");
            return;
        }
        if !is_dma_available() {
            eprintln!(
                "SKIPPED: test_yuyv_to_rgba_g2d - DMA memory allocation not available (permission denied or no DMA-BUF support)"
            );
            return;
        }

        let src = load_bytes_to_tensor(
            1280,
            720,
            PixelFormat::Yuyv,
            None,
            &edgefirst_bench::testdata::read("camera720p.yuyv"),
        )
        .unwrap();

        let dst = TensorDyn::image(
            1280,
            720,
            PixelFormat::Rgba,
            DType::U8,
            Some(TensorMemory::Dma),
        )
        .unwrap();
        let mut g2d_converter = G2DProcessor::new().unwrap();

        let (result, _src, dst) = convert_img(
            &mut g2d_converter,
            src,
            dst,
            Rotation::None,
            Flip::None,
            Crop::no_crop(),
        );
        result.unwrap();

        let target_image = TensorDyn::image(1280, 720, PixelFormat::Rgba, DType::U8, None).unwrap();
        target_image
            .as_u8()
            .unwrap()
            .map()
            .unwrap()
            .as_mut_slice()
            .copy_from_slice(&edgefirst_bench::testdata::read("camera720p.rgba"));

        compare_images(&dst, &target_image, 0.98, function!());
    }

    #[test]
    #[cfg(target_os = "linux")]
    #[cfg(feature = "opengl")]
    fn test_yuyv_to_rgba_opengl() {
        if !is_opengl_available() {
            eprintln!("SKIPPED: {} - OpenGL not available", function!());
            return;
        }
        if !is_dma_available() {
            eprintln!(
                "SKIPPED: {} - DMA memory allocation not available (permission denied or no DMA-BUF support)",
                function!()
            );
            return;
        }

        let src = load_bytes_to_tensor(
            1280,
            720,
            PixelFormat::Yuyv,
            Some(TensorMemory::Dma),
            &edgefirst_bench::testdata::read("camera720p.yuyv"),
        )
        .unwrap();

        let dst = TensorDyn::image(
            1280,
            720,
            PixelFormat::Rgba,
            DType::U8,
            Some(TensorMemory::Dma),
        )
        .unwrap();
        let mut gl_converter = GLProcessorThreaded::new(None).unwrap();

        let (result, _src, dst) = convert_img(
            &mut gl_converter,
            src,
            dst,
            Rotation::None,
            Flip::None,
            Crop::no_crop(),
        );
        result.unwrap();

        let target_image = TensorDyn::image(1280, 720, PixelFormat::Rgba, DType::U8, None).unwrap();
        target_image
            .as_u8()
            .unwrap()
            .map()
            .unwrap()
            .as_mut_slice()
            .copy_from_slice(&edgefirst_bench::testdata::read("camera720p.rgba"));

        compare_images(&dst, &target_image, 0.98, function!());
    }

    #[test]
    #[cfg(target_os = "linux")]
    fn test_yuyv_to_rgb_g2d() {
        if !is_g2d_available() {
            eprintln!("SKIPPED: test_yuyv_to_rgb_g2d - G2D library (libg2d.so.2) not available");
            return;
        }
        if !is_dma_available() {
            eprintln!(
                "SKIPPED: test_yuyv_to_rgb_g2d - DMA memory allocation not available (permission denied or no DMA-BUF support)"
            );
            return;
        }

        let src = load_bytes_to_tensor(
            1280,
            720,
            PixelFormat::Yuyv,
            None,
            &edgefirst_bench::testdata::read("camera720p.yuyv"),
        )
        .unwrap();

        let g2d_dst = TensorDyn::image(
            1280,
            720,
            PixelFormat::Rgb,
            DType::U8,
            Some(TensorMemory::Dma),
        )
        .unwrap();
        let mut g2d_converter = G2DProcessor::new().unwrap();

        let (result, src, g2d_dst) = convert_img(
            &mut g2d_converter,
            src,
            g2d_dst,
            Rotation::None,
            Flip::None,
            Crop::no_crop(),
        );
        result.unwrap();

        let cpu_dst = TensorDyn::image(1280, 720, PixelFormat::Rgb, DType::U8, None).unwrap();
        let mut cpu_converter = CPUProcessor::new();

        let (result, _src, cpu_dst) = convert_img(
            &mut cpu_converter,
            src,
            cpu_dst,
            Rotation::None,
            Flip::None,
            Crop::no_crop(),
        );
        result.unwrap();

        compare_images(&g2d_dst, &cpu_dst, 0.98, function!());
    }

    #[test]
    #[cfg(target_os = "linux")]
    fn test_yuyv_to_yuyv_resize_g2d() {
        if !is_g2d_available() {
            eprintln!(
                "SKIPPED: test_yuyv_to_yuyv_resize_g2d - G2D library (libg2d.so.2) not available"
            );
            return;
        }
        if !is_dma_available() {
            eprintln!(
                "SKIPPED: test_yuyv_to_yuyv_resize_g2d - DMA memory allocation not available (permission denied or no DMA-BUF support)"
            );
            return;
        }

        let src = load_bytes_to_tensor(
            1280,
            720,
            PixelFormat::Yuyv,
            None,
            &edgefirst_bench::testdata::read("camera720p.yuyv"),
        )
        .unwrap();

        let g2d_dst = TensorDyn::image(
            600,
            400,
            PixelFormat::Yuyv,
            DType::U8,
            Some(TensorMemory::Dma),
        )
        .unwrap();
        let mut g2d_converter = G2DProcessor::new().unwrap();

        let (result, src, g2d_dst) = convert_img(
            &mut g2d_converter,
            src,
            g2d_dst,
            Rotation::None,
            Flip::None,
            Crop::no_crop(),
        );
        result.unwrap();

        let cpu_dst = TensorDyn::image(600, 400, PixelFormat::Yuyv, DType::U8, None).unwrap();
        let mut cpu_converter = CPUProcessor::new();

        let (result, _src, cpu_dst) = convert_img(
            &mut cpu_converter,
            src,
            cpu_dst,
            Rotation::None,
            Flip::None,
            Crop::no_crop(),
        );
        result.unwrap();

        // TODO: compare two PixelFormat::Yuyv images directly without first converting them to PixelFormat::Rgb
        compare_images_convert_to_rgb(&g2d_dst, &cpu_dst, 0.98, function!());
    }

    #[test]
    fn test_yuyv_to_rgba_resize_cpu() {
        let src = load_bytes_to_tensor(
            1280,
            720,
            PixelFormat::Yuyv,
            None,
            &edgefirst_bench::testdata::read("camera720p.yuyv"),
        )
        .unwrap();

        let (dst_width, dst_height) = (960, 540);

        let dst =
            TensorDyn::image(dst_width, dst_height, PixelFormat::Rgba, DType::U8, None).unwrap();
        let mut cpu_converter = CPUProcessor::new();

        let (result, _src, dst) = convert_img(
            &mut cpu_converter,
            src,
            dst,
            Rotation::None,
            Flip::None,
            Crop::no_crop(),
        );
        result.unwrap();

        let dst_target =
            TensorDyn::image(dst_width, dst_height, PixelFormat::Rgba, DType::U8, None).unwrap();
        let src_target = load_bytes_to_tensor(
            1280,
            720,
            PixelFormat::Rgba,
            None,
            &edgefirst_bench::testdata::read("camera720p.rgba"),
        )
        .unwrap();
        let (result, _src_target, dst_target) = convert_img(
            &mut cpu_converter,
            src_target,
            dst_target,
            Rotation::None,
            Flip::None,
            Crop::no_crop(),
        );
        result.unwrap();

        compare_images(&dst, &dst_target, 0.98, function!());
    }

    #[test]
    #[cfg(target_os = "linux")]
    fn test_yuyv_to_rgba_crop_flip_g2d() {
        if !is_g2d_available() {
            eprintln!(
                "SKIPPED: test_yuyv_to_rgba_crop_flip_g2d - G2D library (libg2d.so.2) not available"
            );
            return;
        }
        if !is_dma_available() {
            eprintln!(
                "SKIPPED: test_yuyv_to_rgba_crop_flip_g2d - DMA memory allocation not available (permission denied or no DMA-BUF support)"
            );
            return;
        }

        let src = load_bytes_to_tensor(
            1280,
            720,
            PixelFormat::Yuyv,
            Some(TensorMemory::Dma),
            &edgefirst_bench::testdata::read("camera720p.yuyv"),
        )
        .unwrap();

        let (dst_width, dst_height) = (640, 640);

        let dst_g2d = TensorDyn::image(
            dst_width,
            dst_height,
            PixelFormat::Rgba,
            DType::U8,
            Some(TensorMemory::Dma),
        )
        .unwrap();
        let mut g2d_converter = G2DProcessor::new().unwrap();
        let crop = Crop {
            src_rect: Some(Rect {
                left: 20,
                top: 15,
                width: 400,
                height: 300,
            }),
            dst_rect: None,
            dst_color: None,
        };

        let (result, src, dst_g2d) = convert_img(
            &mut g2d_converter,
            src,
            dst_g2d,
            Rotation::None,
            Flip::Horizontal,
            crop,
        );
        result.unwrap();

        let dst_cpu = TensorDyn::image(
            dst_width,
            dst_height,
            PixelFormat::Rgba,
            DType::U8,
            Some(TensorMemory::Dma),
        )
        .unwrap();
        let mut cpu_converter = CPUProcessor::new();

        let (result, _src, dst_cpu) = convert_img(
            &mut cpu_converter,
            src,
            dst_cpu,
            Rotation::None,
            Flip::Horizontal,
            crop,
        );
        result.unwrap();
        compare_images(&dst_g2d, &dst_cpu, 0.98, function!());
    }

    #[test]
    #[cfg(target_os = "linux")]
    #[cfg(feature = "opengl")]
    fn test_yuyv_to_rgba_crop_flip_opengl() {
        if !is_opengl_available() {
            eprintln!("SKIPPED: {} - OpenGL not available", function!());
            return;
        }

        if !is_dma_available() {
            eprintln!(
                "SKIPPED: {} - DMA memory allocation not available (permission denied or no DMA-BUF support)",
                function!()
            );
            return;
        }

        let src = load_bytes_to_tensor(
            1280,
            720,
            PixelFormat::Yuyv,
            Some(TensorMemory::Dma),
            &edgefirst_bench::testdata::read("camera720p.yuyv"),
        )
        .unwrap();

        let (dst_width, dst_height) = (640, 640);

        let dst_gl = TensorDyn::image(
            dst_width,
            dst_height,
            PixelFormat::Rgba,
            DType::U8,
            Some(TensorMemory::Dma),
        )
        .unwrap();
        let mut gl_converter = GLProcessorThreaded::new(None).unwrap();
        let crop = Crop {
            src_rect: Some(Rect {
                left: 20,
                top: 15,
                width: 400,
                height: 300,
            }),
            dst_rect: None,
            dst_color: None,
        };

        let (result, src, dst_gl) = convert_img(
            &mut gl_converter,
            src,
            dst_gl,
            Rotation::None,
            Flip::Horizontal,
            crop,
        );
        result.unwrap();

        let dst_cpu = TensorDyn::image(
            dst_width,
            dst_height,
            PixelFormat::Rgba,
            DType::U8,
            Some(TensorMemory::Dma),
        )
        .unwrap();
        let mut cpu_converter = CPUProcessor::new();

        let (result, _src, dst_cpu) = convert_img(
            &mut cpu_converter,
            src,
            dst_cpu,
            Rotation::None,
            Flip::Horizontal,
            crop,
        );
        result.unwrap();
        compare_images(&dst_gl, &dst_cpu, 0.98, function!());
    }

    #[test]
    fn test_vyuy_to_rgba_cpu() {
        let file = edgefirst_bench::testdata::read("camera720p.vyuy").to_vec();
        let src = TensorDyn::image(1280, 720, PixelFormat::Vyuy, DType::U8, None).unwrap();
        src.as_u8()
            .unwrap()
            .map()
            .unwrap()
            .as_mut_slice()
            .copy_from_slice(&file);

        let dst = TensorDyn::image(1280, 720, PixelFormat::Rgba, DType::U8, None).unwrap();
        let mut cpu_converter = CPUProcessor::new();

        let (result, _src, dst) = convert_img(
            &mut cpu_converter,
            src,
            dst,
            Rotation::None,
            Flip::None,
            Crop::no_crop(),
        );
        result.unwrap();

        let target_image = TensorDyn::image(1280, 720, PixelFormat::Rgba, DType::U8, None).unwrap();
        target_image
            .as_u8()
            .unwrap()
            .map()
            .unwrap()
            .as_mut_slice()
            .copy_from_slice(&edgefirst_bench::testdata::read("camera720p.rgba"));

        compare_images(&dst, &target_image, 0.98, function!());
    }

    #[test]
    fn test_vyuy_to_rgb_cpu() {
        let file = edgefirst_bench::testdata::read("camera720p.vyuy").to_vec();
        let src = TensorDyn::image(1280, 720, PixelFormat::Vyuy, DType::U8, None).unwrap();
        src.as_u8()
            .unwrap()
            .map()
            .unwrap()
            .as_mut_slice()
            .copy_from_slice(&file);

        let dst = TensorDyn::image(1280, 720, PixelFormat::Rgb, DType::U8, None).unwrap();
        let mut cpu_converter = CPUProcessor::new();

        let (result, _src, dst) = convert_img(
            &mut cpu_converter,
            src,
            dst,
            Rotation::None,
            Flip::None,
            Crop::no_crop(),
        );
        result.unwrap();

        let target_image = TensorDyn::image(1280, 720, PixelFormat::Rgb, DType::U8, None).unwrap();
        target_image
            .as_u8()
            .unwrap()
            .map()
            .unwrap()
            .as_mut_slice()
            .as_chunks_mut::<3>()
            .0
            .iter_mut()
            .zip(
                edgefirst_bench::testdata::read("camera720p.rgba")
                    .as_chunks::<4>()
                    .0,
            )
            .for_each(|(dst, src)| *dst = [src[0], src[1], src[2]]);

        compare_images(&dst, &target_image, 0.98, function!());
    }

    #[test]
    #[cfg(target_os = "linux")]
    #[ignore = "G2D does not support VYUY; re-enable when hardware support is added"]
    fn test_vyuy_to_rgba_g2d() {
        if !is_g2d_available() {
            eprintln!("SKIPPED: test_vyuy_to_rgba_g2d - G2D library (libg2d.so.2) not available");
            return;
        }
        if !is_dma_available() {
            eprintln!(
                "SKIPPED: test_vyuy_to_rgba_g2d - DMA memory allocation not available (permission denied or no DMA-BUF support)"
            );
            return;
        }

        let src = load_bytes_to_tensor(
            1280,
            720,
            PixelFormat::Vyuy,
            None,
            &edgefirst_bench::testdata::read("camera720p.vyuy"),
        )
        .unwrap();

        let dst = TensorDyn::image(
            1280,
            720,
            PixelFormat::Rgba,
            DType::U8,
            Some(TensorMemory::Dma),
        )
        .unwrap();
        let mut g2d_converter = G2DProcessor::new().unwrap();

        let (result, _src, dst) = convert_img(
            &mut g2d_converter,
            src,
            dst,
            Rotation::None,
            Flip::None,
            Crop::no_crop(),
        );
        match result {
            Err(Error::G2D(_)) => {
                eprintln!(
                    "SKIPPED: test_vyuy_to_rgba_g2d - G2D does not support PixelFormat::Vyuy format"
                );
                return;
            }
            r => r.unwrap(),
        }

        let target_image = TensorDyn::image(1280, 720, PixelFormat::Rgba, DType::U8, None).unwrap();
        target_image
            .as_u8()
            .unwrap()
            .map()
            .unwrap()
            .as_mut_slice()
            .copy_from_slice(&edgefirst_bench::testdata::read("camera720p.rgba"));

        compare_images(&dst, &target_image, 0.98, function!());
    }

    #[test]
    #[cfg(target_os = "linux")]
    #[ignore = "G2D does not support VYUY; re-enable when hardware support is added"]
    fn test_vyuy_to_rgb_g2d() {
        if !is_g2d_available() {
            eprintln!("SKIPPED: test_vyuy_to_rgb_g2d - G2D library (libg2d.so.2) not available");
            return;
        }
        if !is_dma_available() {
            eprintln!(
                "SKIPPED: test_vyuy_to_rgb_g2d - DMA memory allocation not available (permission denied or no DMA-BUF support)"
            );
            return;
        }

        let src = load_bytes_to_tensor(
            1280,
            720,
            PixelFormat::Vyuy,
            None,
            &edgefirst_bench::testdata::read("camera720p.vyuy"),
        )
        .unwrap();

        let g2d_dst = TensorDyn::image(
            1280,
            720,
            PixelFormat::Rgb,
            DType::U8,
            Some(TensorMemory::Dma),
        )
        .unwrap();
        let mut g2d_converter = G2DProcessor::new().unwrap();

        let (result, src, g2d_dst) = convert_img(
            &mut g2d_converter,
            src,
            g2d_dst,
            Rotation::None,
            Flip::None,
            Crop::no_crop(),
        );
        match result {
            Err(Error::G2D(_)) => {
                eprintln!(
                    "SKIPPED: test_vyuy_to_rgb_g2d - G2D does not support PixelFormat::Vyuy format"
                );
                return;
            }
            r => r.unwrap(),
        }

        let cpu_dst = TensorDyn::image(1280, 720, PixelFormat::Rgb, DType::U8, None).unwrap();
        let mut cpu_converter = CPUProcessor::new();

        let (result, _src, cpu_dst) = convert_img(
            &mut cpu_converter,
            src,
            cpu_dst,
            Rotation::None,
            Flip::None,
            Crop::no_crop(),
        );
        result.unwrap();

        compare_images(&g2d_dst, &cpu_dst, 0.98, function!());
    }

5508    #[test]
5509    #[cfg(target_os = "linux")]
5510    #[cfg(feature = "opengl")]
5511    fn test_vyuy_to_rgba_opengl() {
5512        if !is_opengl_available() {
5513            eprintln!("SKIPPED: {} - OpenGL not available", function!());
5514            return;
5515        }
5516        if !is_dma_available() {
5517            eprintln!(
5518                "SKIPPED: {} - DMA memory allocation not available (permission denied or no DMA-BUF support)",
5519                function!()
5520            );
5521            return;
5522        }
5523
5524        let src = load_bytes_to_tensor(
5525            1280,
5526            720,
5527            PixelFormat::Vyuy,
5528            Some(TensorMemory::Dma),
5529            &edgefirst_bench::testdata::read("camera720p.vyuy"),
5530        )
5531        .unwrap();
5532
5533        let dst = TensorDyn::image(
5534            1280,
5535            720,
5536            PixelFormat::Rgba,
5537            DType::U8,
5538            Some(TensorMemory::Dma),
5539        )
5540        .unwrap();
5541        let mut gl_converter = GLProcessorThreaded::new(None).unwrap();
5542
5543        let (result, _src, dst) = convert_img(
5544            &mut gl_converter,
5545            src,
5546            dst,
5547            Rotation::None,
5548            Flip::None,
5549            Crop::no_crop(),
5550        );
5551        match result {
5552            Err(Error::NotSupported(_)) => {
5553                eprintln!(
5554                    "SKIPPED: {} - OpenGL does not support PixelFormat::Vyuy DMA format",
5555                    function!()
5556                );
5557                return;
5558            }
5559            r => r.unwrap(),
5560        }
5561
5562        let target_image = TensorDyn::image(1280, 720, PixelFormat::Rgba, DType::U8, None).unwrap();
5563        target_image
5564            .as_u8()
5565            .unwrap()
5566            .map()
5567            .unwrap()
5568            .as_mut_slice()
5569            .copy_from_slice(&edgefirst_bench::testdata::read("camera720p.rgba"));
5570
5571        compare_images(&dst, &target_image, 0.98, function!());
5572    }

    #[test]
    fn test_nv12_to_rgba_cpu() {
        let file = edgefirst_bench::testdata::read("zidane.nv12").to_vec();
        let src = TensorDyn::image(1280, 720, PixelFormat::Nv12, DType::U8, None).unwrap();
        src.as_u8().unwrap().map().unwrap().as_mut_slice()[0..(1280 * 720 * 3 / 2)]
            .copy_from_slice(&file);

        let dst = TensorDyn::image(1280, 720, PixelFormat::Rgba, DType::U8, None).unwrap();
        let mut cpu_converter = CPUProcessor::new();

        let (result, _src, dst) = convert_img(
            &mut cpu_converter,
            src,
            dst,
            Rotation::None,
            Flip::None,
            Crop::no_crop(),
        );
        result.unwrap();

        let target_image = crate::load_image(
            &edgefirst_bench::testdata::read("zidane.jpg"),
            Some(PixelFormat::Rgba),
            None,
        )
        .unwrap();

        compare_images(&dst, &target_image, 0.98, function!());
    }

    #[test]
    fn test_nv12_to_rgb_cpu() {
        let file = edgefirst_bench::testdata::read("zidane.nv12").to_vec();
        let src = TensorDyn::image(1280, 720, PixelFormat::Nv12, DType::U8, None).unwrap();
        src.as_u8().unwrap().map().unwrap().as_mut_slice()[0..(1280 * 720 * 3 / 2)]
            .copy_from_slice(&file);

        let dst = TensorDyn::image(1280, 720, PixelFormat::Rgb, DType::U8, None).unwrap();
        let mut cpu_converter = CPUProcessor::new();

        let (result, _src, dst) = convert_img(
            &mut cpu_converter,
            src,
            dst,
            Rotation::None,
            Flip::None,
            Crop::no_crop(),
        );
        result.unwrap();

        let target_image = crate::load_image(
            &edgefirst_bench::testdata::read("zidane.jpg"),
            Some(PixelFormat::Rgb),
            None,
        )
        .unwrap();

        compare_images(&dst, &target_image, 0.98, function!());
    }

    #[test]
    fn test_nv12_to_grey_cpu() {
        let file = edgefirst_bench::testdata::read("zidane.nv12").to_vec();
        let src = TensorDyn::image(1280, 720, PixelFormat::Nv12, DType::U8, None).unwrap();
        src.as_u8().unwrap().map().unwrap().as_mut_slice()[0..(1280 * 720 * 3 / 2)]
            .copy_from_slice(&file);

        let dst = TensorDyn::image(1280, 720, PixelFormat::Grey, DType::U8, None).unwrap();
        let mut cpu_converter = CPUProcessor::new();

        let (result, _src, dst) = convert_img(
            &mut cpu_converter,
            src,
            dst,
            Rotation::None,
            Flip::None,
            Crop::no_crop(),
        );
        result.unwrap();

        let target_image = crate::load_image(
            &edgefirst_bench::testdata::read("zidane.jpg"),
            Some(PixelFormat::Grey),
            None,
        )
        .unwrap();

        compare_images(&dst, &target_image, 0.98, function!());
    }

    #[test]
    fn test_nv12_to_yuyv_cpu() {
        let file = edgefirst_bench::testdata::read("zidane.nv12").to_vec();
        let src = TensorDyn::image(1280, 720, PixelFormat::Nv12, DType::U8, None).unwrap();
        src.as_u8().unwrap().map().unwrap().as_mut_slice()[0..(1280 * 720 * 3 / 2)]
            .copy_from_slice(&file);

        let dst = TensorDyn::image(1280, 720, PixelFormat::Yuyv, DType::U8, None).unwrap();
        let mut cpu_converter = CPUProcessor::new();

        let (result, _src, dst) = convert_img(
            &mut cpu_converter,
            src,
            dst,
            Rotation::None,
            Flip::None,
            Crop::no_crop(),
        );
        result.unwrap();

        let target_image = crate::load_image(
            &edgefirst_bench::testdata::read("zidane.jpg"),
            Some(PixelFormat::Rgb),
            None,
        )
        .unwrap();

        compare_images_convert_to_rgb(&dst, &target_image, 0.98, function!());
    }

    #[test]
    fn test_cpu_resize_planar_rgb() {
        let src = TensorDyn::image(4, 4, PixelFormat::Rgba, DType::U8, None).unwrap();
        #[rustfmt::skip]
        let src_image = [
                    255, 0, 0, 255,     0, 255, 0, 255,     0, 0, 255, 255,     255, 255, 0, 255,
                    255, 0, 0, 0,       0, 0, 0, 255,       255,  0, 255, 0,    255, 0, 255, 255,
                    0, 0, 255, 0,       0, 255, 255, 255,   255, 255, 0, 0,     0, 0, 0, 255,
                    255, 0, 0, 0,       0, 0, 0, 255,       255,  0, 255, 0,    255, 0, 255, 255,
        ];
        src.as_u8()
            .unwrap()
            .map()
            .unwrap()
            .as_mut_slice()
            .copy_from_slice(&src_image);

        let cpu_dst = TensorDyn::image(5, 5, PixelFormat::PlanarRgb, DType::U8, None).unwrap();
        let mut cpu_converter = CPUProcessor::new();

        let (result, _src, cpu_dst) = convert_img(
            &mut cpu_converter,
            src,
            cpu_dst,
            Rotation::None,
            Flip::None,
            Crop::new()
                .with_dst_rect(Some(Rect {
                    left: 1,
                    top: 1,
                    width: 4,
                    height: 4,
                }))
                .with_dst_color(Some([114, 114, 114, 255])),
        );
        result.unwrap();

        #[rustfmt::skip]
        let expected_dst = [
            114, 114, 114, 114, 114,    114, 255, 0, 0, 255,    114, 255, 0, 255, 255,      114, 0, 0, 255, 0,        114, 255, 0, 255, 255,
            114, 114, 114, 114, 114,    114, 0, 255, 0, 255,    114, 0, 0, 0, 0,            114, 0, 255, 255, 0,      114, 0, 0, 0, 0,
            114, 114, 114, 114, 114,    114, 0, 0, 255, 0,      114, 0, 0, 255, 255,        114, 255, 255, 0, 0,      114, 0, 0, 255, 255,
        ];

        assert_eq!(
            cpu_dst.as_u8().unwrap().map().unwrap().as_slice(),
            &expected_dst
        );
    }

    #[test]
    fn test_cpu_resize_planar_rgba() {
        let src = TensorDyn::image(4, 4, PixelFormat::Rgba, DType::U8, None).unwrap();
        #[rustfmt::skip]
        let src_image = [
                    255, 0, 0, 255,     0, 255, 0, 255,     0, 0, 255, 255,     255, 255, 0, 255,
                    255, 0, 0, 0,       0, 0, 0, 255,       255,  0, 255, 0,    255, 0, 255, 255,
                    0, 0, 255, 0,       0, 255, 255, 255,   255, 255, 0, 0,     0, 0, 0, 255,
                    255, 0, 0, 0,       0, 0, 0, 255,       255,  0, 255, 0,    255, 0, 255, 255,
        ];
        src.as_u8()
            .unwrap()
            .map()
            .unwrap()
            .as_mut_slice()
            .copy_from_slice(&src_image);

        let cpu_dst = TensorDyn::image(5, 5, PixelFormat::PlanarRgba, DType::U8, None).unwrap();
        let mut cpu_converter = CPUProcessor::new();

        let (result, _src, cpu_dst) = convert_img(
            &mut cpu_converter,
            src,
            cpu_dst,
            Rotation::None,
            Flip::None,
            Crop::new()
                .with_dst_rect(Some(Rect {
                    left: 1,
                    top: 1,
                    width: 4,
                    height: 4,
                }))
                .with_dst_color(Some([114, 114, 114, 255])),
        );
        result.unwrap();

        #[rustfmt::skip]
        let expected_dst = [
            114, 114, 114, 114, 114,    114, 255, 0, 0, 255,        114, 255, 0, 255, 255,      114, 0, 0, 255, 0,        114, 255, 0, 255, 255,
            114, 114, 114, 114, 114,    114, 0, 255, 0, 255,        114, 0, 0, 0, 0,            114, 0, 255, 255, 0,      114, 0, 0, 0, 0,
            114, 114, 114, 114, 114,    114, 0, 0, 255, 0,          114, 0, 0, 255, 255,        114, 255, 255, 0, 0,      114, 0, 0, 255, 255,
            255, 255, 255, 255, 255,    255, 255, 255, 255, 255,    255, 0, 255, 0, 255,        255, 0, 255, 0, 255,      255, 0, 255, 0, 255,
        ];

        assert_eq!(
            cpu_dst.as_u8().unwrap().map().unwrap().as_slice(),
            &expected_dst
        );
    }

    #[test]
    #[cfg(target_os = "linux")]
    #[cfg(feature = "opengl")]
    fn test_opengl_resize_planar_rgb() {
        if !is_opengl_available() {
            eprintln!("SKIPPED: {} - OpenGL not available", function!());
            return;
        }

        if !is_dma_available() {
            eprintln!(
                "SKIPPED: {} - DMA memory allocation not available (permission denied or no DMA-BUF support)",
                function!()
            );
            return;
        }

        let dst_width = 640;
        let dst_height = 640;
        let file = edgefirst_bench::testdata::read("test_image.jpg").to_vec();
        let src = crate::load_image(&file, Some(PixelFormat::Rgba), None).unwrap();

        let cpu_dst = TensorDyn::image(
            dst_width,
            dst_height,
            PixelFormat::PlanarRgb,
            DType::U8,
            None,
        )
        .unwrap();
        let mut cpu_converter = CPUProcessor::new();
        let (result, src, cpu_dst) = convert_img(
            &mut cpu_converter,
            src,
            cpu_dst,
            Rotation::None,
            Flip::None,
            Crop::no_crop(),
        );
        result.unwrap();
        let crop_letterbox = Crop::new()
            .with_dst_rect(Some(Rect {
                left: 102,
                top: 102,
                width: 440,
                height: 440,
            }))
            .with_dst_color(Some([114, 114, 114, 114]));
        let (result, src, cpu_dst) = convert_img(
            &mut cpu_converter,
            src,
            cpu_dst,
            Rotation::None,
            Flip::None,
            crop_letterbox,
        );
        result.unwrap();

        let gl_dst = TensorDyn::image(
            dst_width,
            dst_height,
            PixelFormat::PlanarRgb,
            DType::U8,
            None,
        )
        .unwrap();
        let mut gl_converter = GLProcessorThreaded::new(None).unwrap();

        let (result, _src, gl_dst) = convert_img(
            &mut gl_converter,
            src,
            gl_dst,
            Rotation::None,
            Flip::None,
            crop_letterbox,
        );
        result.unwrap();
        compare_images(&gl_dst, &cpu_dst, 0.98, function!());
    }

    #[test]
    fn test_cpu_resize_nv16() {
        let file = edgefirst_bench::testdata::read("zidane.jpg").to_vec();
        let src = crate::load_image(&file, Some(PixelFormat::Rgba), None).unwrap();

        let cpu_nv16_dst = TensorDyn::image(640, 640, PixelFormat::Nv16, DType::U8, None).unwrap();
        let cpu_rgb_dst = TensorDyn::image(640, 640, PixelFormat::Rgb, DType::U8, None).unwrap();
        let mut cpu_converter = CPUProcessor::new();
        let crop = Crop::new()
            .with_dst_rect(Some(Rect {
                left: 20,
                top: 140,
                width: 600,
                height: 360,
            }))
            .with_dst_color(Some([255, 128, 0, 255]));

        let (result, src, cpu_nv16_dst) = convert_img(
            &mut cpu_converter,
            src,
            cpu_nv16_dst,
            Rotation::None,
            Flip::None,
            crop,
        );
        result.unwrap();

        let (result, _src, cpu_rgb_dst) = convert_img(
            &mut cpu_converter,
            src,
            cpu_rgb_dst,
            Rotation::None,
            Flip::None,
            crop,
        );
        result.unwrap();
        compare_images_convert_to_rgb(&cpu_nv16_dst, &cpu_rgb_dst, 0.99, function!());
    }

    fn load_bytes_to_tensor(
        width: usize,
        height: usize,
        format: PixelFormat,
        memory: Option<TensorMemory>,
        bytes: &[u8],
    ) -> Result<TensorDyn, Error> {
        let src = TensorDyn::image(width, height, format, DType::U8, memory)?;
        src.as_u8()
            .unwrap()
            .map()?
            .as_mut_slice()
            .copy_from_slice(bytes);
        Ok(src)
    }

    fn compare_images(img1: &TensorDyn, img2: &TensorDyn, threshold: f64, name: &str) {
        assert_eq!(img1.height(), img2.height(), "Heights differ");
        assert_eq!(img1.width(), img2.width(), "Widths differ");
        assert_eq!(
            img1.format().unwrap(),
            img2.format().unwrap(),
            "PixelFormats differ"
        );
        assert!(
            matches!(
                img1.format().unwrap(),
                PixelFormat::Rgb | PixelFormat::Rgba | PixelFormat::Grey | PixelFormat::PlanarRgb
            ),
            "format must be Rgb, Rgba, Grey, or PlanarRgb for comparison"
        );

        let image1 = match img1.format().unwrap() {
            PixelFormat::Rgb => image::RgbImage::from_vec(
                img1.width().unwrap() as u32,
                img1.height().unwrap() as u32,
                img1.as_u8().unwrap().map().unwrap().to_vec(),
            )
            .unwrap(),
            PixelFormat::Rgba => image::RgbaImage::from_vec(
                img1.width().unwrap() as u32,
                img1.height().unwrap() as u32,
                img1.as_u8().unwrap().map().unwrap().to_vec(),
            )
            .unwrap()
            .convert(),
            PixelFormat::Grey => image::GrayImage::from_vec(
                img1.width().unwrap() as u32,
                img1.height().unwrap() as u32,
                img1.as_u8().unwrap().map().unwrap().to_vec(),
            )
            .unwrap()
            .convert(),
            PixelFormat::PlanarRgb => image::GrayImage::from_vec(
                img1.width().unwrap() as u32,
                (img1.height().unwrap() * 3) as u32,
                img1.as_u8().unwrap().map().unwrap().to_vec(),
            )
            .unwrap()
            .convert(),
            _ => return,
        };

        let image2 = match img2.format().unwrap() {
            PixelFormat::Rgb => image::RgbImage::from_vec(
                img2.width().unwrap() as u32,
                img2.height().unwrap() as u32,
                img2.as_u8().unwrap().map().unwrap().to_vec(),
            )
            .unwrap(),
            PixelFormat::Rgba => image::RgbaImage::from_vec(
                img2.width().unwrap() as u32,
                img2.height().unwrap() as u32,
                img2.as_u8().unwrap().map().unwrap().to_vec(),
            )
            .unwrap()
            .convert(),
            PixelFormat::Grey => image::GrayImage::from_vec(
                img2.width().unwrap() as u32,
                img2.height().unwrap() as u32,
                img2.as_u8().unwrap().map().unwrap().to_vec(),
            )
            .unwrap()
            .convert(),
            PixelFormat::PlanarRgb => image::GrayImage::from_vec(
                img2.width().unwrap() as u32,
                (img2.height().unwrap() * 3) as u32,
                img2.as_u8().unwrap().map().unwrap().to_vec(),
            )
            .unwrap()
            .convert(),
            _ => return,
        };

        let similarity = image_compare::rgb_similarity_structure(
            &image_compare::Algorithm::RootMeanSquared,
            &image1,
            &image2,
        )
        .expect("Image Comparison failed");
        if similarity.score < threshold {
            // image1.save(format!("{name}_1.png"));
            // image2.save(format!("{name}_2.png"));
            similarity
                .image
                .to_color_map()
                .save(format!("{name}.png"))
                .unwrap();
            panic!(
                "{name}: converted image and target image have similarity score too low: {} < {}",
                similarity.score, threshold
            )
        }
    }

    fn compare_images_convert_to_rgb(
        img1: &TensorDyn,
        img2: &TensorDyn,
        threshold: f64,
        name: &str,
    ) {
        assert_eq!(img1.height(), img2.height(), "Heights differ");
        assert_eq!(img1.width(), img2.width(), "Widths differ");

        let mut img_rgb1 = TensorDyn::image(
            img1.width().unwrap(),
            img1.height().unwrap(),
            PixelFormat::Rgb,
            DType::U8,
            Some(TensorMemory::Mem),
        )
        .unwrap();
        let mut img_rgb2 = TensorDyn::image(
            img2.width().unwrap(),
            img2.height().unwrap(),
            PixelFormat::Rgb,
            DType::U8,
            Some(TensorMemory::Mem),
        )
        .unwrap();
        let mut cpu_converter = CPUProcessor::default();
        let r1 = cpu_converter.convert(
            img1,
            &mut img_rgb1,
            crate::Rotation::None,
            crate::Flip::None,
            crate::Crop::default(),
        );
        let r2 = cpu_converter.convert(
            img2,
            &mut img_rgb2,
            crate::Rotation::None,
            crate::Flip::None,
            crate::Crop::default(),
        );
        if r1.is_err() || r2.is_err() {
            // Fallback: compare raw bytes as a greyscale strip when the CPU
            // converter cannot handle one of the input formats.
            let w = img1.width().unwrap() as u32;
            let data1 = img1.as_u8().unwrap().map().unwrap().to_vec();
            let data2 = img2.as_u8().unwrap().map().unwrap().to_vec();
            let h1 = (data1.len() as u32) / w;
            let h2 = (data2.len() as u32) / w;
            let g1 = image::GrayImage::from_vec(w, h1, data1).unwrap();
            let g2 = image::GrayImage::from_vec(w, h2, data2).unwrap();
            let similarity = image_compare::gray_similarity_structure(
                &image_compare::Algorithm::RootMeanSquared,
                &g1,
                &g2,
            )
            .expect("Image Comparison failed");
            if similarity.score < threshold {
                panic!(
                    "{name}: converted image and target image have similarity score too low: {} < {}",
                    similarity.score, threshold
                )
            }
            return;
        }

        let image1 = image::RgbImage::from_vec(
            img_rgb1.width().unwrap() as u32,
            img_rgb1.height().unwrap() as u32,
            img_rgb1.as_u8().unwrap().map().unwrap().to_vec(),
        )
        .unwrap();

        let image2 = image::RgbImage::from_vec(
            img_rgb2.width().unwrap() as u32,
            img_rgb2.height().unwrap() as u32,
            img_rgb2.as_u8().unwrap().map().unwrap().to_vec(),
        )
        .unwrap();

        let similarity = image_compare::rgb_similarity_structure(
            &image_compare::Algorithm::RootMeanSquared,
            &image1,
            &image2,
        )
        .expect("Image Comparison failed");
        if similarity.score < threshold {
            // image1.save(format!("{name}_1.png"));
            // image2.save(format!("{name}_2.png"));
            similarity
                .image
                .to_color_map()
                .save(format!("{name}.png"))
                .unwrap();
            panic!(
                "{name}: converted image and target image have similarity score too low: {} < {}",
                similarity.score, threshold
            )
        }
    }

    // =========================================================================
    // PixelFormat::Nv12 Format Tests
    // =========================================================================

    #[test]
    fn test_nv12_image_creation() {
        let width = 640;
        let height = 480;
        let img = TensorDyn::image(width, height, PixelFormat::Nv12, DType::U8, None).unwrap();

        assert_eq!(img.width(), Some(width));
        assert_eq!(img.height(), Some(height));
        assert_eq!(img.format().unwrap(), PixelFormat::Nv12);
        // PixelFormat::Nv12 uses shape [H*3/2, W] to store Y plane + UV plane
        assert_eq!(img.as_u8().unwrap().shape(), &[height * 3 / 2, width]);
    }

    #[test]
    fn test_nv12_channels() {
        let img = TensorDyn::image(640, 480, PixelFormat::Nv12, DType::U8, None).unwrap();
        // PixelFormat::Nv12.channels() returns 1 (luma plane)
        assert_eq!(img.format().unwrap().channels(), 1);
    }

    // =========================================================================
    // Tensor Format Metadata Tests
    // =========================================================================

    #[test]
    fn test_tensor_set_format_planar() {
        let mut tensor = Tensor::<u8>::new(&[3, 480, 640], None, None).unwrap();
        tensor.set_format(PixelFormat::PlanarRgb).unwrap();
        assert_eq!(tensor.format(), Some(PixelFormat::PlanarRgb));
        assert_eq!(tensor.width(), Some(640));
        assert_eq!(tensor.height(), Some(480));
    }

    #[test]
    fn test_tensor_set_format_interleaved() {
        let mut tensor = Tensor::<u8>::new(&[480, 640, 4], None, None).unwrap();
        tensor.set_format(PixelFormat::Rgba).unwrap();
        assert_eq!(tensor.format(), Some(PixelFormat::Rgba));
        assert_eq!(tensor.width(), Some(640));
        assert_eq!(tensor.height(), Some(480));
    }

    #[test]
    fn test_tensordyn_image_rgb() {
        let img = TensorDyn::image(640, 480, PixelFormat::Rgb, DType::U8, None).unwrap();
        assert_eq!(img.width(), Some(640));
        assert_eq!(img.height(), Some(480));
        assert_eq!(img.format(), Some(PixelFormat::Rgb));
    }

    #[test]
    fn test_tensordyn_image_planar_rgb() {
        let img = TensorDyn::image(640, 480, PixelFormat::PlanarRgb, DType::U8, None).unwrap();
        assert_eq!(img.width(), Some(640));
        assert_eq!(img.height(), Some(480));
        assert_eq!(img.format(), Some(PixelFormat::PlanarRgb));
    }

    #[test]
    fn test_rgb_int8_format() {
        // Int8 variant: same PixelFormat::Rgb but with DType::I8
        let img = TensorDyn::image(
            1280,
            720,
            PixelFormat::Rgb,
            DType::I8,
            Some(TensorMemory::Mem),
        )
        .unwrap();
        assert_eq!(img.width(), Some(1280));
        assert_eq!(img.height(), Some(720));
        assert_eq!(img.format(), Some(PixelFormat::Rgb));
        assert_eq!(img.dtype(), DType::I8);
    }

    #[test]
    fn test_planar_rgb_int8_format() {
        let img = TensorDyn::image(
            1280,
            720,
            PixelFormat::PlanarRgb,
            DType::I8,
            Some(TensorMemory::Mem),
        )
        .unwrap();
        assert_eq!(img.width(), Some(1280));
        assert_eq!(img.height(), Some(720));
        assert_eq!(img.format(), Some(PixelFormat::PlanarRgb));
        assert_eq!(img.dtype(), DType::I8);
    }

    #[test]
    fn test_rgb_from_tensor() {
        let mut tensor = Tensor::<u8>::new(&[720, 1280, 3], None, None).unwrap();
        tensor.set_format(PixelFormat::Rgb).unwrap();
        let img = TensorDyn::from(tensor);
        assert_eq!(img.width(), Some(1280));
        assert_eq!(img.height(), Some(720));
        assert_eq!(img.format(), Some(PixelFormat::Rgb));
    }

    #[test]
    fn test_planar_rgb_from_tensor() {
        let mut tensor = Tensor::<u8>::new(&[3, 720, 1280], None, None).unwrap();
        tensor.set_format(PixelFormat::PlanarRgb).unwrap();
        let img = TensorDyn::from(tensor);
        assert_eq!(img.width(), Some(1280));
        assert_eq!(img.height(), Some(720));
        assert_eq!(img.format(), Some(PixelFormat::PlanarRgb));
    }

    #[test]
    fn test_dtype_determines_int8() {
        // DType::I8 indicates int8 data
        let u8_img = TensorDyn::image(64, 64, PixelFormat::Rgb, DType::U8, None).unwrap();
        let i8_img = TensorDyn::image(64, 64, PixelFormat::Rgb, DType::I8, None).unwrap();
        assert_eq!(u8_img.dtype(), DType::U8);
        assert_eq!(i8_img.dtype(), DType::I8);
    }

    #[test]
    fn test_pixel_layout_packed_vs_planar() {
        // Packed vs planar layout classification
        assert_eq!(PixelFormat::Rgb.layout(), PixelLayout::Packed);
        assert_eq!(PixelFormat::Rgba.layout(), PixelLayout::Packed);
        assert_eq!(PixelFormat::PlanarRgb.layout(), PixelLayout::Planar);
        assert_eq!(PixelFormat::Nv12.layout(), PixelLayout::SemiPlanar);
    }

    /// Integration test that exercises the PBO-to-PBO convert path.
    /// Uses ImageProcessor::create_image() to allocate PBO-backed tensors,
    /// then converts between them. Skipped when GL is unavailable or the
    /// backend is not PBO (e.g. DMA-buf systems).
    #[cfg(target_os = "linux")]
    #[cfg(feature = "opengl")]
    #[test]
    fn test_convert_pbo_to_pbo() {
        let mut converter = ImageProcessor::new().unwrap();

        // Skip if GL is not available or backend is not PBO
        let is_pbo = converter
            .opengl
            .as_ref()
            .is_some_and(|gl| gl.transfer_backend() == opengl_headless::TransferBackend::Pbo);
        if !is_pbo {
            eprintln!("Skipping test_convert_pbo_to_pbo: backend is not PBO");
            return;
        }

        let src_w = 640;
        let src_h = 480;
        let dst_w = 320;
        let dst_h = 240;

        // Create PBO-backed source image
        let pbo_src = converter
            .create_image(src_w, src_h, PixelFormat::Rgba, DType::U8, None)
            .unwrap();
        assert_eq!(
            pbo_src.as_u8().unwrap().memory(),
            TensorMemory::Pbo,
            "create_image should produce a PBO tensor"
        );

        // Fill source PBO with test pattern: load JPEG then convert Mem→PBO
        let file = edgefirst_bench::testdata::read("zidane.jpg").to_vec();
        let jpeg_src = crate::load_image(&file, Some(PixelFormat::Rgba), None).unwrap();

        // Resize JPEG into a Mem temp of the right size, then copy into PBO
        let mem_src = TensorDyn::image(
            src_w,
            src_h,
            PixelFormat::Rgba,
            DType::U8,
            Some(TensorMemory::Mem),
        )
        .unwrap();
        let (result, _jpeg_src, mem_src) = convert_img(
            &mut CPUProcessor::new(),
            jpeg_src,
            mem_src,
            Rotation::None,
            Flip::None,
            Crop::no_crop(),
        );
        result.unwrap();

        // Copy pixel data into the PBO source by mapping it
        {
            let src_data = mem_src.as_u8().unwrap().map().unwrap();
            let mut pbo_map = pbo_src.as_u8().unwrap().map().unwrap();
            pbo_map.copy_from_slice(&src_data);
        }

        // Create PBO-backed destination image
        let mut pbo_dst = converter
            .create_image(dst_w, dst_h, PixelFormat::Rgba, DType::U8, None)
            .unwrap();
        assert_eq!(pbo_dst.as_u8().unwrap().memory(), TensorMemory::Pbo);

        // Convert PBO→PBO (this exercises convert_pbo_to_pbo)
        let result = converter.convert(
            &pbo_src,
            &mut pbo_dst,
            Rotation::None,
            Flip::None,
            Crop::no_crop(),
        );
        result.unwrap();

        // Verify: compare with CPU-only conversion of the same input
        let cpu_dst = TensorDyn::image(
            dst_w,
            dst_h,
            PixelFormat::Rgba,
            DType::U8,
            Some(TensorMemory::Mem),
        )
        .unwrap();
        let (result, _mem_src, cpu_dst) = convert_img(
            &mut CPUProcessor::new(),
            mem_src,
            cpu_dst,
            Rotation::None,
            Flip::None,
            Crop::no_crop(),
        );
        result.unwrap();

        let pbo_dst_img = {
            let mut tensor = pbo_dst.into_u8().unwrap();
            tensor.set_format(PixelFormat::Rgba).unwrap();
            TensorDyn::from(tensor)
        };
        compare_images(&pbo_dst_img, &cpu_dst, 0.95, function!());
        log::info!("test_convert_pbo_to_pbo: PASS - PBO-to-PBO convert matches CPU reference");
    }

    #[test]
    fn test_image_bgra() {
        let img = TensorDyn::image(
            640,
            480,
            PixelFormat::Bgra,
            DType::U8,
            Some(edgefirst_tensor::TensorMemory::Mem),
        )
        .unwrap();
        assert_eq!(img.width(), Some(640));
        assert_eq!(img.height(), Some(480));
        assert_eq!(img.format().unwrap().channels(), 4);
        assert_eq!(img.format().unwrap(), PixelFormat::Bgra);
    }

    // ========================================================================
    // Tests for EDGEFIRST_FORCE_BACKEND env var
    // ========================================================================

    #[test]
    fn test_force_backend_cpu() {
        let original = std::env::var("EDGEFIRST_FORCE_BACKEND").ok();
        unsafe { std::env::set_var("EDGEFIRST_FORCE_BACKEND", "cpu") };
        let result = ImageProcessor::new();
        match original {
            Some(s) => unsafe { std::env::set_var("EDGEFIRST_FORCE_BACKEND", s) },
            None => unsafe { std::env::remove_var("EDGEFIRST_FORCE_BACKEND") },
        }
        let converter = result.unwrap();
        assert!(converter.cpu.is_some());
        assert_eq!(converter.forced_backend, Some(ForcedBackend::Cpu));
    }

    #[test]
    fn test_force_backend_invalid() {
        let original = std::env::var("EDGEFIRST_FORCE_BACKEND").ok();
        unsafe { std::env::set_var("EDGEFIRST_FORCE_BACKEND", "invalid") };
        let result = ImageProcessor::new();
        match original {
            Some(s) => unsafe { std::env::set_var("EDGEFIRST_FORCE_BACKEND", s) },
            None => unsafe { std::env::remove_var("EDGEFIRST_FORCE_BACKEND") },
        }
        assert!(
            matches!(&result, Err(Error::ForcedBackendUnavailable(s)) if s.contains("unknown")),
            "invalid backend value should return ForcedBackendUnavailable error: {result:?}"
        );
    }

    #[test]
    fn test_force_backend_unset() {
        let original = std::env::var("EDGEFIRST_FORCE_BACKEND").ok();
        unsafe { std::env::remove_var("EDGEFIRST_FORCE_BACKEND") };
        let result = ImageProcessor::new();
        match original {
            Some(s) => unsafe { std::env::set_var("EDGEFIRST_FORCE_BACKEND", s) },
            None => unsafe { std::env::remove_var("EDGEFIRST_FORCE_BACKEND") },
6428        }
6429        let converter = result.unwrap();
6430        assert!(converter.forced_backend.is_none());
6431    }
6432
6433    // ========================================================================
6434    // Tests for hybrid mask path error handling
6435    // ========================================================================
6436
6437    #[test]
6438    fn test_draw_proto_masks_no_cpu_returns_error() {
6439        // Disable CPU backend to trigger the error path
6440        let original_cpu = std::env::var("EDGEFIRST_DISABLE_CPU").ok();
6441        unsafe { std::env::set_var("EDGEFIRST_DISABLE_CPU", "1") };
6442        let original_gl = std::env::var("EDGEFIRST_DISABLE_GL").ok();
6443        unsafe { std::env::set_var("EDGEFIRST_DISABLE_GL", "1") };
6444        let original_g2d = std::env::var("EDGEFIRST_DISABLE_G2D").ok();
6445        unsafe { std::env::set_var("EDGEFIRST_DISABLE_G2D", "1") };
6446
6447        let result = ImageProcessor::new();
6448
6449        match original_cpu {
6450            Some(s) => unsafe { std::env::set_var("EDGEFIRST_DISABLE_CPU", s) },
6451            None => unsafe { std::env::remove_var("EDGEFIRST_DISABLE_CPU") },
6452        }
6453        match original_gl {
6454            Some(s) => unsafe { std::env::set_var("EDGEFIRST_DISABLE_GL", s) },
6455            None => unsafe { std::env::remove_var("EDGEFIRST_DISABLE_GL") },
6456        }
6457        match original_g2d {
6458            Some(s) => unsafe { std::env::set_var("EDGEFIRST_DISABLE_G2D", s) },
6459            None => unsafe { std::env::remove_var("EDGEFIRST_DISABLE_G2D") },
6460        }
6461
6462        let mut converter = result.unwrap();
6463        assert!(converter.cpu.is_none(), "CPU should be disabled");
6464
        let mut dst_dyn = TensorDyn::image(
            640,
            480,
            PixelFormat::Rgba,
            DType::U8,
            Some(TensorMemory::Mem),
        )
        .unwrap();
6474        let det = [DetectBox {
6475            bbox: edgefirst_decoder::BoundingBox {
6476                xmin: 0.1,
6477                ymin: 0.1,
6478                xmax: 0.5,
6479                ymax: 0.5,
6480            },
6481            score: 0.9,
6482            label: 0,
6483        }];
6484        let proto_data = {
6485            use edgefirst_tensor::{Tensor, TensorDyn};
6486            let coeff_t = Tensor::<f32>::from_slice(&[0.5_f32; 4], &[1, 4]).unwrap();
6487            let protos_t =
6488                Tensor::<f32>::from_slice(&vec![0.0_f32; 8 * 8 * 4], &[8, 8, 4]).unwrap();
6489            ProtoData {
6490                mask_coefficients: TensorDyn::F32(coeff_t),
6491                protos: TensorDyn::F32(protos_t),
6492                layout: ProtoLayout::Nhwc,
6493            }
6494        };
6495        let result =
6496            converter.draw_proto_masks(&mut dst_dyn, &det, &proto_data, Default::default());
6497        assert!(
6498            matches!(&result, Err(Error::Internal(s)) if s.contains("CPU backend")),
6499            "draw_proto_masks without CPU should return Internal error: {result:?}"
6500        );
6501    }
6502
6503    #[test]
6504    fn test_draw_proto_masks_cpu_fallback_works() {
6505        // Force CPU-only backend to ensure the CPU fallback path executes
6506        let original = std::env::var("EDGEFIRST_FORCE_BACKEND").ok();
6507        unsafe { std::env::set_var("EDGEFIRST_FORCE_BACKEND", "cpu") };
6508        let result = ImageProcessor::new();
6509        match original {
6510            Some(s) => unsafe { std::env::set_var("EDGEFIRST_FORCE_BACKEND", s) },
6511            None => unsafe { std::env::remove_var("EDGEFIRST_FORCE_BACKEND") },
6512        }
6513
6514        let mut converter = result.unwrap();
6515        assert!(converter.cpu.is_some());
6516
        let mut dst_dyn = TensorDyn::image(
            64,
            64,
            PixelFormat::Rgba,
            DType::U8,
            Some(TensorMemory::Mem),
        )
        .unwrap();
6526        let det = [DetectBox {
6527            bbox: edgefirst_decoder::BoundingBox {
6528                xmin: 0.1,
6529                ymin: 0.1,
6530                xmax: 0.5,
6531                ymax: 0.5,
6532            },
6533            score: 0.9,
6534            label: 0,
6535        }];
6536        let proto_data = {
6537            use edgefirst_tensor::{Tensor, TensorDyn};
6538            let coeff_t = Tensor::<f32>::from_slice(&[0.5_f32; 4], &[1, 4]).unwrap();
6539            let protos_t =
6540                Tensor::<f32>::from_slice(&vec![0.0_f32; 8 * 8 * 4], &[8, 8, 4]).unwrap();
6541            ProtoData {
6542                mask_coefficients: TensorDyn::F32(coeff_t),
6543                protos: TensorDyn::F32(protos_t),
6544                layout: ProtoLayout::Nhwc,
6545            }
6546        };
6547        let result =
6548            converter.draw_proto_masks(&mut dst_dyn, &det, &proto_data, Default::default());
6549        assert!(result.is_ok(), "CPU fallback path should work: {result:?}");
6550    }
6551
6552    // ============================================================
6553    // draw_decoded_masks / draw_proto_masks — 4-scenario pixel-
6554    // verified tests. Exercises each backend against the full
6555    // output-contract matrix:
6556    //
6557    //   | detections | background | expected dst             |
6558    //   |------------|------------|--------------------------|
6559    //   | empty      | none       | fully cleared (0x00)     |
6560    //   | empty      | set        | fully equal to bg        |
6561    //   | set        | none       | cleared outside box +    |
6562    //   |            |            | mask-coloured inside     |
6563    //   | set        | set        | bg outside box + mask    |
6564    //   |            |            | blended inside           |
6565    //
6566    // Every test pre-fills dst with a non-zero "dirty" pattern so
6567    // that any silent `return Ok(())` leaks the pattern into the
6568    // asserted output and fails loudly.
6569    // ============================================================
6570
    /// Run `body` with `EDGEFIRST_FORCE_BACKEND` temporarily set (or
    /// removed), restoring the prior value afterward. Tests that
    /// mutate the environment are serialized via a process-wide mutex.
6574    fn with_force_backend<R>(value: Option<&str>, body: impl FnOnce() -> R) -> R {
6575        use std::sync::{Mutex, MutexGuard, OnceLock};
6576        static LOCK: OnceLock<Mutex<()>> = OnceLock::new();
6577        let _guard: MutexGuard<()> = LOCK
6578            .get_or_init(|| Mutex::new(()))
6579            .lock()
6580            .unwrap_or_else(|e| e.into_inner());
6581        let original = std::env::var("EDGEFIRST_FORCE_BACKEND").ok();
6582        match value {
6583            Some(v) => unsafe { std::env::set_var("EDGEFIRST_FORCE_BACKEND", v) },
6584            None => unsafe { std::env::remove_var("EDGEFIRST_FORCE_BACKEND") },
6585        }
6586        let r = body();
6587        match original {
6588            Some(s) => unsafe { std::env::set_var("EDGEFIRST_FORCE_BACKEND", s) },
6589            None => unsafe { std::env::remove_var("EDGEFIRST_FORCE_BACKEND") },
6590        }
6591        r
6592    }
6593
6594    /// Allocate an RGBA image tensor and pre-fill every byte with a
6595    /// distinctive non-zero pattern. Any test that relies on the old
6596    /// "dst is already cleared" assumption will see this pattern leak
6597    /// through to the output and fail.
6598    fn make_dirty_dst(w: usize, h: usize, mem: Option<TensorMemory>) -> TensorDyn {
6599        let dst = TensorDyn::image(w, h, PixelFormat::Rgba, DType::U8, mem).unwrap();
6600        {
6601            use edgefirst_tensor::TensorMapTrait;
6602            let u8t = dst.as_u8().unwrap();
6603            let mut map = u8t.map().unwrap();
6604            for (i, b) in map.as_mut_slice().iter_mut().enumerate() {
6605                *b = 0xA0u8.wrapping_add((i as u8) & 0x3F);
6606            }
6607        }
6608        dst
6609    }
6610
6611    /// Allocate an RGBA background filled with a constant colour.
6612    fn make_bg(w: usize, h: usize, mem: Option<TensorMemory>, rgba: [u8; 4]) -> TensorDyn {
6613        let bg = TensorDyn::image(w, h, PixelFormat::Rgba, DType::U8, mem).unwrap();
6614        {
6615            use edgefirst_tensor::TensorMapTrait;
6616            let u8t = bg.as_u8().unwrap();
6617            let mut map = u8t.map().unwrap();
6618            for chunk in map.as_mut_slice().chunks_exact_mut(4) {
6619                chunk.copy_from_slice(&rgba);
6620            }
6621        }
6622        bg
6623    }
6624
6625    fn pixel_at(dst: &TensorDyn, x: usize, y: usize) -> [u8; 4] {
6626        use edgefirst_tensor::TensorMapTrait;
6627        let w = dst.width().unwrap();
6628        let off = (y * w + x) * 4;
6629        let u8t = dst.as_u8().unwrap();
6630        let map = u8t.map().unwrap();
6631        let s = map.as_slice();
6632        [s[off], s[off + 1], s[off + 2], s[off + 3]]
6633    }
6634
6635    fn assert_every_pixel_eq(dst: &TensorDyn, expected: [u8; 4], case: &str) {
6636        use edgefirst_tensor::TensorMapTrait;
6637        let u8t = dst.as_u8().unwrap();
6638        let map = u8t.map().unwrap();
6639        for (i, chunk) in map.as_slice().chunks_exact(4).enumerate() {
6640            assert_eq!(
6641                chunk, &expected,
6642                "{case}: pixel idx {i} = {chunk:?}, expected {expected:?}"
6643            );
6644        }
6645    }
6646
6647    /// Scenario 1: empty detections, empty segmentation, no background
6648    /// → dst must be fully cleared to 0x00000000.
6649    fn scenario_empty_no_bg(processor: &mut ImageProcessor, case: &str) {
6650        let mut dst = make_dirty_dst(64, 64, None);
6651        processor
6652            .draw_decoded_masks(&mut dst, &[], &[], MaskOverlay::default())
6653            .unwrap_or_else(|e| panic!("{case}/decoded_masks empty+no-bg failed: {e:?}"));
6654        assert_every_pixel_eq(&dst, [0, 0, 0, 0], &format!("{case}/decoded"));
6655
6656        let mut dst = make_dirty_dst(64, 64, None);
6657        let proto = {
6658            use edgefirst_tensor::{Tensor, TensorDyn};
6659            // Placeholder (no detections); shape [1, 4] to keep the tensor well-formed.
6660            let coeff_t = Tensor::<f32>::from_slice(&[0.0_f32; 4], &[1, 4]).unwrap();
6661            let protos_t =
6662                Tensor::<f32>::from_slice(&vec![0.0_f32; 8 * 8 * 4], &[8, 8, 4]).unwrap();
6663            ProtoData {
6664                mask_coefficients: TensorDyn::F32(coeff_t),
6665                protos: TensorDyn::F32(protos_t),
6666                layout: ProtoLayout::Nhwc,
6667            }
6668        };
6669        processor
6670            .draw_proto_masks(&mut dst, &[], &proto, MaskOverlay::default())
6671            .unwrap_or_else(|e| panic!("{case}/proto_masks empty+no-bg failed: {e:?}"));
6672        assert_every_pixel_eq(&dst, [0, 0, 0, 0], &format!("{case}/proto"));
6673    }
6674
6675    /// Scenario 2: empty detections, empty segmentation, background set
6676    /// → dst must be fully equal to bg.
6677    fn scenario_empty_with_bg(processor: &mut ImageProcessor, case: &str) {
6678        let bg_color = [42, 99, 200, 255];
6679        let bg = make_bg(64, 64, None, bg_color);
6680        let overlay = MaskOverlay::new().with_background(&bg);
6681
6682        let mut dst = make_dirty_dst(64, 64, None);
6683        processor
6684            .draw_decoded_masks(&mut dst, &[], &[], overlay)
6685            .unwrap_or_else(|e| panic!("{case}/decoded_masks empty+bg failed: {e:?}"));
6686        assert_every_pixel_eq(&dst, bg_color, &format!("{case}/decoded bg blit"));
6687
6688        let mut dst = make_dirty_dst(64, 64, None);
6689        let proto = {
6690            use edgefirst_tensor::{Tensor, TensorDyn};
6691            // Placeholder (no detections); shape [1, 4] to keep the tensor well-formed.
6692            let coeff_t = Tensor::<f32>::from_slice(&[0.0_f32; 4], &[1, 4]).unwrap();
6693            let protos_t =
6694                Tensor::<f32>::from_slice(&vec![0.0_f32; 8 * 8 * 4], &[8, 8, 4]).unwrap();
6695            ProtoData {
6696                mask_coefficients: TensorDyn::F32(coeff_t),
6697                protos: TensorDyn::F32(protos_t),
6698                layout: ProtoLayout::Nhwc,
6699            }
6700        };
6701        processor
6702            .draw_proto_masks(&mut dst, &[], &proto, overlay)
6703            .unwrap_or_else(|e| panic!("{case}/proto_masks empty+bg failed: {e:?}"));
6704        assert_every_pixel_eq(&dst, bg_color, &format!("{case}/proto bg blit"));
6705    }
6706
6707    /// Scenario 3: one detection with a fully-opaque segmentation fill,
6708    /// no background → outside the box dst must be 0x00, inside it must
6709    /// be a non-zero mask colour (the render_segmentation output).
6710    fn scenario_detect_no_bg(processor: &mut ImageProcessor, case: &str) {
6711        use edgefirst_decoder::Segmentation;
6712        use ndarray::Array3;
6713        processor
6714            .set_class_colors(&[[200, 80, 40, 255]])
6715            .expect("set_class_colors");
6716
6717        let detect = DetectBox {
6718            bbox: [0.25, 0.25, 0.75, 0.75].into(),
6719            score: 0.99,
6720            label: 0,
6721        };
6722        let seg_arr = Array3::from_shape_fn((4, 4, 1), |_| 255u8);
6723        let seg = Segmentation {
6724            segmentation: seg_arr,
6725            xmin: 0.25,
6726            ymin: 0.25,
6727            xmax: 0.75,
6728            ymax: 0.75,
6729        };
6730
6731        let mut dst = make_dirty_dst(64, 64, None);
6732        processor
6733            .draw_decoded_masks(&mut dst, &[detect], &[seg], MaskOverlay::default())
6734            .unwrap_or_else(|e| panic!("{case}/decoded_masks detect+no-bg failed: {e:?}"));
6735
6736        // Outside the bbox (corner): must be cleared black.
6737        let corner = pixel_at(&dst, 2, 2);
6738        assert_eq!(
6739            corner,
6740            [0, 0, 0, 0],
6741            "{case}/decoded: corner (2,2) leaked dirty pattern: {corner:?}"
6742        );
6743        // Inside the bbox (center): the mask colour must be visible.
6744        // Any non-zero pixel is acceptable — exact rendering varies
6745        // between backends (GL smoothstep, CPU nearest).
6746        let center = pixel_at(&dst, 32, 32);
6747        assert!(
6748            center != [0, 0, 0, 0],
6749            "{case}/decoded: center (32,32) was not coloured: {center:?}"
6750        );
6751    }
6752
6753    /// Scenario 4: detection + background. Outside the box must match
6754    /// bg; inside the box must NOT match bg (mask blended on top).
6755    fn scenario_detect_with_bg(processor: &mut ImageProcessor, case: &str) {
6756        use edgefirst_decoder::Segmentation;
6757        use ndarray::Array3;
6758        processor
6759            .set_class_colors(&[[200, 80, 40, 255]])
6760            .expect("set_class_colors");
6761        let bg_color = [10, 20, 30, 255];
6762        let bg = make_bg(64, 64, None, bg_color);
6763
6764        let detect = DetectBox {
6765            bbox: [0.25, 0.25, 0.75, 0.75].into(),
6766            score: 0.99,
6767            label: 0,
6768        };
6769        let seg_arr = Array3::from_shape_fn((4, 4, 1), |_| 255u8);
6770        let seg = Segmentation {
6771            segmentation: seg_arr,
6772            xmin: 0.25,
6773            ymin: 0.25,
6774            xmax: 0.75,
6775            ymax: 0.75,
6776        };
6777
6778        let overlay = MaskOverlay::new().with_background(&bg);
6779        let mut dst = make_dirty_dst(64, 64, None);
6780        processor
6781            .draw_decoded_masks(&mut dst, &[detect], &[seg], overlay)
6782            .unwrap_or_else(|e| panic!("{case}/decoded_masks detect+bg failed: {e:?}"));
6783
6784        // Outside the bbox (corner): bg colour.
6785        let corner = pixel_at(&dst, 2, 2);
6786        assert_eq!(
6787            corner, bg_color,
6788            "{case}/decoded: corner (2,2) should show bg {bg_color:?} got {corner:?}"
6789        );
6790        // Inside the bbox (center): mask blended on bg, must differ from
6791        // pure bg (alpha-blend with mask colour produces a distinct shade).
6792        let center = pixel_at(&dst, 32, 32);
6793        assert!(
6794            center != bg_color,
6795            "{case}/decoded: center (32,32) should differ from bg {bg_color:?}, got {center:?}"
6796        );
6797    }
6798
6799    /// Run all 4 scenarios against the processor. Skip gracefully if
6800    /// construction fails (backend unavailable on this host).
6801    fn run_all_scenarios(
6802        force_backend: Option<&'static str>,
6803        case: &'static str,
6804        require_dma_for_bg: bool,
6805    ) {
6806        if require_dma_for_bg && !edgefirst_tensor::is_dma_available() {
6807            eprintln!("SKIPPED: {case} — DMA not available on this host");
6808            return;
6809        }
6810        let processor_result = with_force_backend(force_backend, ImageProcessor::new);
6811        let mut processor = match processor_result {
6812            Ok(p) => p,
6813            Err(e) => {
6814                eprintln!("SKIPPED: {case} — backend init failed: {e:?}");
6815                return;
6816            }
6817        };
6818        scenario_empty_no_bg(&mut processor, case);
6819        scenario_empty_with_bg(&mut processor, case);
6820        scenario_detect_no_bg(&mut processor, case);
6821        scenario_detect_with_bg(&mut processor, case);
6822    }
6823
6824    #[test]
6825    fn test_draw_masks_4_scenarios_cpu() {
6826        run_all_scenarios(Some("cpu"), "cpu", false);
6827    }
6828
6829    #[test]
6830    fn test_draw_masks_4_scenarios_auto() {
6831        run_all_scenarios(None, "auto", false);
6832    }
6833
6834    #[cfg(target_os = "linux")]
6835    #[cfg(feature = "opengl")]
6836    #[test]
6837    fn test_draw_masks_4_scenarios_opengl() {
6838        run_all_scenarios(Some("opengl"), "opengl", false);
6839    }
6840
6841    /// G2D forced backend: exercises the zero-detection empty-frame
6842    /// paths via `g2d_clear` and `g2d_blit`. Scenarios 3 and 4 (with
6843    /// detections) expect `NotImplemented` since G2D has no rasterizer
6844    /// for boxes / masks.
6845    #[cfg(target_os = "linux")]
6846    #[test]
6847    fn test_draw_masks_zero_detection_g2d_forced() {
6848        if !edgefirst_tensor::is_dma_available() {
6849            eprintln!("SKIPPED: g2d forced — DMA not available on this host");
6850            return;
6851        }
6852        let processor_result = with_force_backend(Some("g2d"), ImageProcessor::new);
6853        let mut processor = match processor_result {
6854            Ok(p) => p,
6855            Err(e) => {
6856                eprintln!("SKIPPED: g2d forced — init failed: {e:?}");
6857                return;
6858            }
6859        };
6860
6861        // Case 1: empty + no bg. G2D requires DMA-backed dst.
6862        let mut dst = TensorDyn::image(
6863            64,
6864            64,
6865            PixelFormat::Rgba,
6866            DType::U8,
6867            Some(TensorMemory::Dma),
6868        )
6869        .unwrap();
6870        {
6871            use edgefirst_tensor::TensorMapTrait;
6872            let u8t = dst.as_u8_mut().unwrap();
6873            let mut map = u8t.map().unwrap();
6874            map.as_mut_slice().fill(0xBB);
6875        }
6876        processor
6877            .draw_decoded_masks(&mut dst, &[], &[], MaskOverlay::default())
6878            .expect("g2d empty+no-bg");
6879        assert_every_pixel_eq(&dst, [0, 0, 0, 0], "g2d/case1 cleared");
6880
6881        // Case 2: empty + bg. Both surfaces DMA-backed for g2d_blit.
6882        let bg_color = [7, 11, 13, 255];
6883        let bg = {
6884            let t = TensorDyn::image(
6885                64,
6886                64,
6887                PixelFormat::Rgba,
6888                DType::U8,
6889                Some(TensorMemory::Dma),
6890            )
6891            .unwrap();
6892            {
6893                use edgefirst_tensor::TensorMapTrait;
6894                let u8t = t.as_u8().unwrap();
6895                let mut map = u8t.map().unwrap();
6896                for chunk in map.as_mut_slice().chunks_exact_mut(4) {
6897                    chunk.copy_from_slice(&bg_color);
6898                }
6899            }
6900            t
6901        };
6902        let mut dst = TensorDyn::image(
6903            64,
6904            64,
6905            PixelFormat::Rgba,
6906            DType::U8,
6907            Some(TensorMemory::Dma),
6908        )
6909        .unwrap();
6910        {
6911            use edgefirst_tensor::TensorMapTrait;
6912            let u8t = dst.as_u8_mut().unwrap();
6913            let mut map = u8t.map().unwrap();
6914            map.as_mut_slice().fill(0x55);
6915        }
6916        processor
6917            .draw_decoded_masks(&mut dst, &[], &[], MaskOverlay::new().with_background(&bg))
6918            .expect("g2d empty+bg");
6919        assert_every_pixel_eq(&dst, bg_color, "g2d/case2 bg blit");
6920
        // Cases 3 and 4: a detection is present, so G2D must return NotImplemented.
6922        let detect = DetectBox {
6923            bbox: [0.25, 0.25, 0.75, 0.75].into(),
6924            score: 0.9,
6925            label: 0,
6926        };
6927        let mut dst = TensorDyn::image(
6928            64,
6929            64,
6930            PixelFormat::Rgba,
6931            DType::U8,
6932            Some(TensorMemory::Dma),
6933        )
6934        .unwrap();
6935        let err = processor
6936            .draw_decoded_masks(&mut dst, &[detect], &[], MaskOverlay::default())
6937            .expect_err("g2d must reject detect-present draw_decoded_masks");
6938        assert!(
6939            matches!(err, Error::NotImplemented(_)),
6940            "g2d case3 wrong error: {err:?}"
6941        );
6942    }
6943
6944    #[test]
6945    fn test_set_format_then_cpu_convert() {
6946        // Force CPU backend (save/restore to avoid leaking into other tests)
6947        let original = std::env::var("EDGEFIRST_FORCE_BACKEND").ok();
6948        unsafe { std::env::set_var("EDGEFIRST_FORCE_BACKEND", "cpu") };
6949        let mut processor = ImageProcessor::new().unwrap();
6950        match original {
6951            Some(s) => unsafe { std::env::set_var("EDGEFIRST_FORCE_BACKEND", s) },
6952            None => unsafe { std::env::remove_var("EDGEFIRST_FORCE_BACKEND") },
6953        }
6954
6955        // Load a source image
6956        let image = edgefirst_bench::testdata::read("zidane.jpg");
6957        let src = load_image(&image, Some(PixelFormat::Rgba), None).unwrap();
6958
6959        // Create a raw tensor, then attach format — simulating the from_fd workflow
6960        let mut dst =
6961            TensorDyn::new(&[640, 640, 3], DType::U8, Some(TensorMemory::Mem), None).unwrap();
6962        dst.set_format(PixelFormat::Rgb).unwrap();
6963
6964        // Convert should work with the set_format-annotated tensor
6965        processor
6966            .convert(&src, &mut dst, Rotation::None, Flip::None, Crop::default())
6967            .unwrap();
6968
6969        // Verify format survived conversion
6970        assert_eq!(dst.format(), Some(PixelFormat::Rgb));
6971        assert_eq!(dst.width(), Some(640));
6972        assert_eq!(dst.height(), Some(640));
6973    }
6974
6975    /// Verify that creating multiple ImageProcessors on the same thread and
6976    /// performing a resize on each does not deadlock or error.
6977    ///
6978    /// Uses automatic memory allocation (DMA → PBO → Mem fallback) so that
6979    /// hardware backends (OpenGL, G2D) are exercised on capable targets.
6980    #[test]
6981    fn test_multiple_image_processors_same_thread() {
6982        let mut processors: Vec<ImageProcessor> = (0..4)
6983            .map(|_| ImageProcessor::new().expect("ImageProcessor::new() failed"))
6984            .collect();
6985
6986        for proc in &mut processors {
6987            let src = proc
6988                .create_image(128, 128, PixelFormat::Rgb, DType::U8, None)
6989                .expect("create src failed");
6990            let mut dst = proc
6991                .create_image(64, 64, PixelFormat::Rgb, DType::U8, None)
6992                .expect("create dst failed");
6993            proc.convert(&src, &mut dst, Rotation::None, Flip::None, Crop::default())
6994                .expect("convert failed");
6995            assert_eq!(dst.width(), Some(64));
6996            assert_eq!(dst.height(), Some(64));
6997        }
6998    }
6999
7000    /// Verify that creating ImageProcessors on separate threads and performing
7001    /// a resize on each does not deadlock or error.
7002    ///
7003    /// Uses automatic memory allocation (DMA → PBO → Mem fallback) so that
7004    /// hardware backends (OpenGL, G2D) are exercised on capable targets.
7005    /// A 60-second timeout prevents CI from hanging on deadlock regressions.
7006    #[test]
7007    fn test_multiple_image_processors_separate_threads() {
7008        use std::sync::mpsc;
7009        use std::time::Duration;
7010
7011        const TIMEOUT: Duration = Duration::from_secs(60);
7012
7013        let (tx, rx) = mpsc::channel::<()>();
7014
7015        std::thread::spawn(move || {
7016            let handles: Vec<_> = (0..4)
7017                .map(|i| {
7018                    std::thread::spawn(move || {
7019                        let mut proc = ImageProcessor::new().unwrap_or_else(|e| {
7020                            panic!("ImageProcessor::new() failed on thread {i}: {e}")
7021                        });
7022                        let src = proc
7023                            .create_image(128, 128, PixelFormat::Rgb, DType::U8, None)
7024                            .unwrap_or_else(|e| panic!("create src failed on thread {i}: {e}"));
7025                        let mut dst = proc
7026                            .create_image(64, 64, PixelFormat::Rgb, DType::U8, None)
7027                            .unwrap_or_else(|e| panic!("create dst failed on thread {i}: {e}"));
7028                        proc.convert(&src, &mut dst, Rotation::None, Flip::None, Crop::default())
7029                            .unwrap_or_else(|e| panic!("convert failed on thread {i}: {e}"));
7030                        assert_eq!(dst.width(), Some(64));
7031                        assert_eq!(dst.height(), Some(64));
7032                    })
7033                })
7034                .collect();
7035
7036            for (i, h) in handles.into_iter().enumerate() {
7037                h.join()
7038                    .unwrap_or_else(|e| panic!("thread {i} panicked: {e:?}"));
7039            }
7040
7041            let _ = tx.send(());
7042        });
7043
7044        rx.recv_timeout(TIMEOUT).unwrap_or_else(|_| {
7045            panic!("test_multiple_image_processors_separate_threads timed out after {TIMEOUT:?}")
7046        });
7047    }
7048
7049    /// Verify that 4 fully-initialized ImageProcessors on separate threads can
7050    /// all operate concurrently without deadlocking each other.
7051    ///
7052    /// All processors are created first, then a barrier synchronizes them so
7053    /// they all start converting at the same instant — maximizing contention.
7054    /// A 60-second timeout prevents CI from hanging on deadlock regressions.
7055    #[test]
7056    fn test_image_processors_concurrent_operations() {
7057        use std::sync::{mpsc, Arc, Barrier};
7058        use std::time::Duration;
7059
7060        const N: usize = 4;
7061        const ROUNDS: usize = 10;
7062        const TIMEOUT: Duration = Duration::from_secs(60);
7063
7064        let (tx, rx) = mpsc::channel::<()>();
7065
7066        std::thread::spawn(move || {
7067            let barrier = Arc::new(Barrier::new(N));
7068
7069            let handles: Vec<_> = (0..N)
7070                .map(|i| {
7071                    let barrier = Arc::clone(&barrier);
7072                    std::thread::spawn(move || {
7073                        let mut proc = ImageProcessor::new().unwrap_or_else(|e| {
7074                            panic!("ImageProcessor::new() failed on thread {i}: {e}")
7075                        });
7076
7077                        // All threads wait here until every processor is initialized.
7078                        barrier.wait();
7079
7080                        // Now all 4 hammer the GPU concurrently.
7081                        for round in 0..ROUNDS {
7082                            let src = proc
7083                                .create_image(128, 128, PixelFormat::Rgb, DType::U8, None)
7084                                .unwrap_or_else(|e| {
7085                                    panic!("create src failed on thread {i} round {round}: {e}")
7086                                });
7087                            let mut dst = proc
7088                                .create_image(64, 64, PixelFormat::Rgb, DType::U8, None)
7089                                .unwrap_or_else(|e| {
7090                                    panic!("create dst failed on thread {i} round {round}: {e}")
7091                                });
7092                            proc.convert(
7093                                &src,
7094                                &mut dst,
7095                                Rotation::None,
7096                                Flip::None,
7097                                Crop::default(),
7098                            )
7099                            .unwrap_or_else(|e| {
7100                                panic!("convert failed on thread {i} round {round}: {e}")
7101                            });
7102                            assert_eq!(dst.width(), Some(64));
7103                            assert_eq!(dst.height(), Some(64));
7104                        }
7105                    })
7106                })
7107                .collect();
7108
7109            for (i, h) in handles.into_iter().enumerate() {
7110                h.join()
7111                    .unwrap_or_else(|e| panic!("thread {i} panicked: {e:?}"));
7112            }
7113
7114            let _ = tx.send(());
7115        });
7116
7117        rx.recv_timeout(TIMEOUT).unwrap_or_else(|_| {
7118            panic!("test_image_processors_concurrent_operations timed out after {TIMEOUT:?}")
7119        });
7120    }
7121}