# pineapple
pineapple is a command-line tool for processing and profiling morphological data in bio-imaging datasets.
## Installation
### Cargo
pineapple can be installed using the Rust package manager, cargo:
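Assuming the crate is published on crates.io under the name `pineapple` (an assumption; substitute the actual crate name if it differs):

```shell
# Hypothetical: assumes the crate on crates.io is named `pineapple`
cargo install pineapple
```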
Assuming cargo properly manages your PATH, you can verify the installation as follows.
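For example, via the conventional `--version` flag (assuming pineapple follows this convention; check `pineapple --help` otherwise):

```shell
# Assumes a standard `--version` flag
pineapple --version
```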
### Pre-compiled binaries
Pre-built binaries for x86-64 Linux, x86-64 macOS, aarch64 macOS, and x86-64 Windows are available for download in the releases. Note that the pre-built binaries are untested on Windows.
### Source
If you have Rust (tested on 1.86.0) and cargo installed, you can build directly from source as follows.
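A typical build might look like the following; the repository URL below is a placeholder, so substitute the actual location:

```shell
# Repository URL is a placeholder; substitute the real one
git clone https://github.com/<owner>/pineapple.git
cd pineapple
cargo build --release
# The binary is then available at target/release/pineapple
```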
To reduce compile time, LTO can be disabled in Cargo.toml; however, this may increase binary size (particularly on Linux) and slightly reduce performance.
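For example, assuming the default release profile is used, LTO can be toggled in Cargo.toml like so:

```toml
[profile.release]
# Disable link-time optimization to shorten compile times
lto = false
```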
### Conda
A conda release is planned.
## Usage
### pineapple process
Given a set of valid input images, pineapple can extract object-level data for a variety of segmentation formats including masks, polygons, and bounding boxes. Object-level data (e.g. cropped objects, polygons, etc.) can be extracted and saved as follows.
```shell
# Process image-mask pairs stored in different directories
# Process image-mask pairs stored in the same directory
# Process image-polygon pairs stored in different directories
# Process image-polygon pairs stored in the same directory
# Process image-bounding-box pairs stored in different directories
# Process image-bounding-box pairs stored in the same directory
```
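As an illustration of the general shape of these commands (the subcommand and flag names below are assumptions, not the verified interface; consult `pineapple process --help`):

```shell
# Hypothetical invocation; subcommand and flag names are illustrative only
pineapple process mask --images images/ --masks masks/ --output objects/
```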
For a more controlled run, a variety of flags can be set for all the process commands.
### pineapple profile
Given a set of valid input images, pineapple can compute object-level morphological descriptors across a variety of paired segmentation formats including masks, polygons, and bounding boxes. Descriptors can be computed and saved as follows.
```shell
# Profile image-mask pairs stored in different directories
# Profile image-mask pairs stored in the same directory
# Profile image-polygon pairs stored in different directories
# Profile image-polygon pairs stored in the same directory
# Profile image-bounding-box pairs stored in different directories
# Profile image-bounding-box pairs stored in the same directory
```
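As a sketch of what an invocation might look like (the subcommand and flag names are assumptions; consult `pineapple profile --help` for the actual interface):

```shell
# Hypothetical invocation; subcommand and flag names are illustrative only
pineapple profile polygon --images images/ --polygons polygons/ --output descriptors/
```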
For a more controlled run, a variety of flags can be set for all the profile commands.
### pineapple neural
Given a set of valid input images, pineapple can compute object-level self-supervised features (aka. 'deep profiles') across a variety of paired segmentation formats including masks, polygons, and bounding boxes. Features can be computed and saved as follows.
```shell
# Generate features from image-mask pairs stored in different directories
# Generate features from image-mask pairs stored in the same directory
# Generate features from image-polygon pairs stored in different directories
# Generate features from image-polygon pairs stored in the same directory
# Generate features from image-bounding-box pairs stored in different directories
# Generate features from image-bounding-box pairs stored in the same directory
```
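As a sketch of what an invocation might look like (the subcommand and flag names are assumptions; consult `pineapple neural --help` for the actual interface):

```shell
# Hypothetical invocation; subcommand and flag names are illustrative only
pineapple neural bbox --images images/ --bboxes bboxes/ --output features/
```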
For a more controlled run, a variety of flags can be set for all the neural commands.
### pineapple measure
If you want to compute quantitative features directly from images or polygons without associated segmentation data, then you can use pineapple measure. Various quantitative features can be computed and saved as follows.
```shell
# Measure intensity descriptors for a single image (to stdout)
# Measure intensity descriptors for images stored in a directory
# Measure moment descriptors for a single image (to stdout)
# Measure moment descriptors for images stored in a directory
# Measure texture descriptors for a single image (to stdout)
# Measure texture descriptors for images stored in a directory
# Measure Zernike descriptors for a single image (to stdout)
# Measure Zernike descriptors for images stored in a directory
# Measure form descriptors for a single set of polygons (to stdout)
# Measure form descriptors for polygons stored in a directory
```
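As an illustration of the general shape of these commands (the subcommand and flag names are assumptions; consult `pineapple measure --help` for the actual interface):

```shell
# Hypothetical invocations; subcommand and flag names are illustrative only
pineapple measure texture image.tif
pineapple measure form polygons/ --output form.csv
```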
Self-supervised features from a variety of pre-trained models can also easily be computed using pineapple measure.
```shell
# Measure self-supervised features for a single image (to stdout)
# Measure self-supervised features for images stored in a directory
# Measure self-supervised features for images stored in a directory, leveraging your Apple silicon GPU
```
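As a sketch (the `neural` subcommand, `--device` flag, and `metal` value are all assumptions; consult the actual help text):

```shell
# Hypothetical; the `--device metal` flag for Apple silicon GPUs is an assumption
pineapple measure neural images/ --device metal
```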
Of note, generating self-supervised embeddings from pre-extracted images will be much faster (on GPU or via multi-threading) than performing object-level computation on image-segment pairs. We therefore recommend pineapple neural [segment] when you are storage-constrained, and pineapple process [segment] followed by pineapple measure neural when you need faster object-level embeddings.
### pineapple utils
Additional utilities to convert between data formats (e.g. images to zarr arrays, segmentation masks to polygons, etc.) are available using pineapple utils. Various non-destructive conversions can be performed as follows.
```shell
# Convert images with the same number of channels to a zarr v3 group
# Convert a single segmentation mask to polygon format
# Convert a folder of segmentation masks to polygon format
# Convert a single segmentation mask to bounding boxes format
# Convert a folder of segmentation masks to bounding boxes format
```
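As a sketch of the general shape of these commands (images2zarrs is named in this document; the second subcommand and all flags are assumptions by analogy):

```shell
# Hypothetical invocations; flag names are illustrative only
pineapple utils images2zarrs images/ --output images.zarr
pineapple utils masks2polygons masks/ --output polygons/
```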
Note that images2zarrs encodes image name strings as fixed-width numpy-style byte arrays (maximum length 100). We currently do this because zarr string decoding is inconsistent across implementations. If you are loading the data in Python, the saved image names can be mapped to strings via UTF-8 decoding as follows.
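A minimal sketch of the decoding step in Python. The array access is simulated with plain bytes here, since the actual zarr group layout and key names are assumptions:

```python
# With real data you might load the names via zarr, e.g.:
#   names = zarr.open_group("images.zarr")["image_names"][:]  # key name is an assumption
# Here we simulate the stored fixed-width (length-100) byte strings directly.
raw_names = [
    b"img_001.tif".ljust(100, b"\x00"),
    b"img_002.tif".ljust(100, b"\x00"),
]

# Decode UTF-8 and strip the fixed-width null padding
decoded = [name.decode("utf-8").rstrip("\x00") for name in raw_names]
print(decoded)  # ['img_001.tif', 'img_002.tif']
```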
### pineapple download
To enable easier testing and model development/evaluation, we have curated and standardized a variety of previously annotated or generated bio-imaging datasets. We have also collected a variety of pre-trained neural network models for generating self-supervised embeddings. Below we provide an overview of the available datasets and pre-trained weights.
### Segmentation datasets
Each segmentation dataset was preprocessed and standardized to include images, segmentation masks, segmentation polygons, and object bounding boxes. Please check the original references and licenses to ensure the license supports your use case. You can download a segmentation dataset as follows.
```shell
# List all available segmentation datasets
# Download all available segmentation datasets
# Download a specific segmentation dataset
```
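As a sketch (the subcommand and flag names are assumptions; the dataset name is taken from the table below):

```shell
# Hypothetical invocations; subcommand and flag names are illustrative only
pineapple download segmentation --list
pineapple download segmentation dsb_2019
```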
Below we provide a table of the available segmentation datasets in the current pineapple release.
| Dataset | Author | Size (GB) | License |
|---|---|---|---|
| almeida_2023 | Almeida et al. 2023 | 0.927 | CC BY 4.0 |
| arvidsson_2022 | Arvidsson et al. 2022 | 0.028 | CC BY 4.0 |
| cellpose_2021 | Stringer et al. 2021 | 0.356 | Custom NC |
| conic_2022 | Graham et al. 2022 | 1.920 | CC BY-NC 4.0 |
| cryonuseg_2021 | Mahbod et al. 2021 | 0.031 | MIT |
| dsb_2019 | Caicedo et al. 2019 | 0.112 | CC0 1.0 Universal |
| hpa_2022 | HPA 2022 | 1.630 | CC BY 4.0 |
| livecell_2021 | Edlund et al. 2021 | 3.260 | CC BY-NC 4.0 |
| nuinseg_2024 | Mahbod et al. 2024 | 0.347 | MIT |
| pannuke_2020 | Gamper et al. 2020 | 1.250 | CC BY-NC-SA 4.0 |
| tissuenet_2022 | Greenwald et al. 2022 | 4.270 | Modified NC Apache |
| vicar_2021 | Vicar et al. 2021 | 0.113 | CC BY 4.0 |
### Benchmark datasets
Benchmark datasets provide single cell or single object images to evaluate the predictive performance of descriptors or self-supervised embeddings. Each dataset includes single object images, masks, polygons, and bounding boxes. Note that some of the single object segmentation masks were generated roughly and can be improved if desired. We also provide a synthetic image dataset for evaluating runtime performance of various processing/profiling methods. You can download a benchmark dataset as follows.
```shell
# List all available classification datasets
# Download all available classification datasets
# Download a specific classification dataset
```
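As a sketch (the subcommand name is an assumption; the dataset name is taken from the table below):

```shell
# Hypothetical invocation; subcommand name is illustrative only
pineapple download benchmark murphy_2001
```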
Below we provide a table of the available classification datasets in the current pineapple release.
| Dataset | Author | Size (GB) | License |
|---|---|---|---|
| amgad_2022 | Amgad et al. 2022 | 0.062 | CC0 1.0 |
| cnmc_2019 | C-NMC Challenge | 0.182 | CC BY 3.0 |
| fracatlas_2023 | Abedeen et al. 2023 | 0.247 | CC BY 4.0 |
| isic_2019 | ISIC | 1.140 | CC BY-NC 4.0 |
| kermany_2018 | Kermany et al. 2018 | 0.638 | CC BY 4.0 |
| kromp_2023 | Kromp et al. 2023 | 0.025 | CC BY 4.0 |
| matek_2021 | Matek et al. 2021 | 0.508 | CC BY 4.0 |
| murphy_2001 | Murphy et al. 2001 | 0.033 | MIT |
| opencell_2024 | OpenCell | 1.030 | MIT |
| phillip_2021 | Phillip et al. 2021 | 0.032 | MIT |
| recursion_2019 | Recursion | 0.037 | CC BY-NC-SA 4.0 |
| verma_2021 | Verma et al. 2021 | 0.021 | CC BY-NC-SA 4.0 |
| runtime | Ouellette et al. 2025 | 0.018 | MIT |
### Weights
The self-supervised embeddings generated via pineapple neural or pineapple measure neural are made possible by leveraging a variety of open source pre-trained neural networks. Models can be pre-downloaded or will be downloaded on first use. You can download pre-trained weights as follows.
```shell
# Optionally set a cache to save models (defaults to ~/.pineapple_cache)
# List all available pre-trained weights
# Download all available pre-trained weights
# Download specific pre-trained weights
```
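As a sketch (the environment variable and subcommand names are assumptions; the model name is taken from the table below):

```shell
# Hypothetical: environment variable and subcommand names are assumptions
export PINEAPPLE_CACHE="$HOME/.pineapple_cache"
pineapple download weights dino_vit_small
```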
Below we provide a table of the available weights in the current pineapple release.
| Model | Author | Size (GB) | License |
|---|---|---|---|
| dino_vit_small | Huggingface/Candle | 0.097 | Apache 2.0 |
| dino_vit_base | Huggingface/Candle | 0.330 | Apache 2.0 |
| dinobloom_vit_base | Marr Lab | 0.330 | Apache 2.0 |
| scdino_vit_small | Snijder Lab | 0.097 | Apache 2.0 |
| subcell_vit_base | Lundberg Lab | 0.330 | MIT |
If you would like another model added to pineapple, please open an issue with a link to the original model implementation and the associated open-source weights. Each new model requires a Rust implementation for compatibility with pineapple neural and pineapple measure neural.
## License
pineapple is licensed under the BSD 3-Clause license (see LICENSE).
You may not use this file except in compliance with the license. A copy of the license has been included in the root of the repository. Unless you explicitly state otherwise, any contribution intentionally submitted by you for inclusion in the work shall be licensed as above, without any additional terms or conditions.