Crate apple_vision

§vision

Safe Rust bindings for Apple’s Vision framework — on-device OCR, object detection, face landmarks, and other computer vision tasks on macOS.

Status: v0.14 ships the full Apple Vision request surface, including all five stateful tracking requests.

§Quick start — OCR

use apple_vision::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let recognizer = TextRecognizer::new()
        .with_recognition_level(RecognitionLevel::Accurate)
        .with_language_correction(true);

    let observations = recognizer.recognize_in_path("/tmp/screenshot.png")?;
    for obs in &observations {
        println!("[{:.2}] '{}'", obs.confidence, obs.text);
    }
    Ok(())
}
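
The loop above prints every observation. A common follow-up is to drop low-confidence results before using the text. The sketch below shows one way to do that. It uses a hypothetical stand-in `Observation` struct that models only the `confidence` and `text` fields seen in the quick start (with `confidence` assumed to be an `f32`), not the crate's actual type, and `confident_text` is an illustrative helper rather than part of apple_vision.

```rust
/// Hypothetical stand-in for the crate's observation type; only the two
/// fields used in the quick start (`confidence`, `text`) are modelled.
#[derive(Debug, Clone, PartialEq)]
pub struct Observation {
    pub confidence: f32,
    pub text: String,
}

/// Keep observations at or above `min_confidence` and join their text,
/// one observation per line, in the original order.
pub fn confident_text(observations: &[Observation], min_confidence: f32) -> String {
    observations
        .iter()
        .filter(|obs| obs.confidence >= min_confidence)
        .map(|obs| obs.text.as_str())
        .collect::<Vec<_>>()
        .join("\n")
}
```

With real results, the same filter-and-join chain could presumably be applied directly to the values returned by `recognize_in_path`, provided the fields have the shapes assumed here.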

§Composes with the rest of the doom-fish stack

screencapturekit-rs / capture ──► IOSurface / PNG ──► vision ──► text
                                                          │
                                                          ▼
                                                  foundation-models
                                                  ("summarise this")

§Feature flags

Request-type modules can be enabled independently through feature flags. As of v0.14, the default feature set enables the full Vision surface, including the new tracking module.
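
As a rough illustration of trimming the build to a subset of requests, a Cargo.toml entry might look like the fragment below. This is a sketch under two assumptions that the text here does not confirm: that the package name matches the crate name `apple_vision`, and that each feature is named after its module (for example `recognize_text` and `detect_barcodes`).

```toml
[dependencies]
# Assumption: package name and feature names mirror the crate and module names.
# Disabling default features opts out of the full Vision surface.
apple_vision = { version = "0.14", default-features = false, features = ["recognize_text", "detect_barcodes"] }
```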

§Roadmap

Single-image Vision requests (OCR, faces, landmarks, pose, contours, saliency, segmentation, Core ML, and the rest of the stateless request surface)
Pairwise image-registration requests (VNTranslationalImageRegistrationRequest, VNHomographicImageRegistrationRequest)
Stateful tracking requests (VNTrackObjectRequest, VNTrackRectangleRequest, VNTrackOpticalFlowRequest, VNTrackTranslationalImageRegistrationRequest, VNTrackHomographicImageRegistrationRequest)
Async API (VNRequest completion handlers exposed via async fn)

§License

Licensed under either of Apache-2.0 or MIT at your option.

§API Documentation

Safe Rust bindings for Apple’s Vision framework — OCR, object detection, face landmarks, and other on-device computer vision tasks.

v0.14 covers the full Apple Vision request surface, including the stateful tracking requests.

§Re-exports

pub use error::VisionError;
pub use recognize_text::BoundingBox; // feature: recognize_text
pub use recognize_text::RecognitionLevel; // feature: recognize_text
pub use recognize_text::RecognizedText; // feature: recognize_text
pub use recognize_text::TextRecognizer; // feature: recognize_text
pub use detect_faces::DetectedFace; // feature: detect_faces
pub use detect_faces::FaceDetector; // feature: detect_faces
pub use detect_barcodes::detect_barcodes_in_path; // feature: detect_barcodes
pub use detect_barcodes::DetectedBarcode; // feature: detect_barcodes
pub use saliency::attention_saliency_in_path; // feature: saliency
pub use saliency::SalientRegion; // feature: saliency
pub use face_landmarks::detect_face_landmarks_in_path; // feature: face_landmarks
pub use face_landmarks::FaceWithLandmarks; // feature: face_landmarks
pub use face_landmarks::LandmarkPoint; // feature: face_landmarks
pub use body_pose::detect_human_body_pose_in_path; // feature: body_pose
pub use body_pose::DetectedBodyPose; // feature: body_pose
pub use body_pose::JointPoint; // feature: body_pose
pub use hand_pose::detect_human_hand_pose_in_path; // feature: hand_pose
pub use hand_pose::DetectedHandPose; // feature: hand_pose
pub use contours::detect_contours_in_path; // feature: contours
pub use contours::Contour; // feature: contours
pub use contours::ContourOptions; // feature: contours
pub use animals::recognize_animals_in_path; // feature: animals
pub use animals::RecognizedAnimal; // feature: animals
pub use classify::classify_image_in_path; // feature: classify
pub use classify::Classification; // feature: classify
pub use rectangles::detect_document_segmentation_in_path; // feature: rectangles
pub use rectangles::detect_rectangles_in_path; // feature: rectangles
pub use rectangles::RectangleObservation; // feature: rectangles
pub use rectangles::RectangleOptions; // feature: rectangles
pub use horizon::detect_horizon_in_path; // feature: horizon
pub use feature_print::generate_image_feature_print_in_path; // feature: feature_print
pub use feature_print::FeaturePrint; // feature: feature_print
pub use humans::detect_human_rectangles_in_path; // feature: humans
pub use humans::DetectedHuman; // feature: humans
pub use aesthetics::calculate_aesthetics_scores_in_path; // feature: aesthetics
pub use aesthetics::detect_face_capture_quality_in_path; // feature: aesthetics
pub use aesthetics::AestheticsScores; // feature: aesthetics
pub use aesthetics::FaceCaptureQuality; // feature: aesthetics
pub use segmentation::generate_foreground_instance_mask_in_path; // feature: segmentation
pub use segmentation::generate_person_segmentation_in_path; // feature: segmentation
pub use segmentation::InstanceMask; // feature: segmentation
pub use segmentation::SegmentationMask; // feature: segmentation
pub use segmentation::SegmentationQuality; // feature: segmentation
pub use optical_flow::generate_optical_flow_in_paths; // feature: optical_flow
pub use optical_flow::OpticalFlowAccuracy; // feature: optical_flow
pub use coreml::coreml_classify_in_path; // feature: coreml
pub use animal_body_pose::detect_animal_body_pose;
pub use animal_body_pose::AnimalJoint;
pub use human_body_pose_3d::detect_human_body_pose_3d;
pub use human_body_pose_3d::HumanJoint3D;
pub use text_rectangles::detect_text_rectangles;
pub use text_rectangles::TextRect;
pub use objectness_saliency::objectness_saliency;
pub use objectness_saliency::ObjectnessRegion;
pub use person_instance_mask::person_instance_mask;
pub use person_instance_mask::PersonInstanceMask;
pub use trajectories::detect_trajectories;
pub use trajectories::Trajectory;
pub use registration::register_homographic;
pub use registration::register_translational;
pub use registration::HomographicAlignment;
pub use registration::TranslationalAlignment;
pub use tracking::HomographicImageTracker; // feature: tracking
pub use tracking::ObjectTracker; // feature: tracking
pub use tracking::OpticalFlowFrame; // feature: tracking
pub use tracking::OpticalFlowTracker; // feature: tracking
pub use tracking::RectangleTracker; // feature: tracking
pub use tracking::TranslationalImageTracker; // feature: tracking
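
Several of the re-exported types describe regions of an image (for example `BoundingBox` and `RectangleObservation`). Apple's Vision framework reports rectangles in normalized coordinates (0 to 1) with the origin at the lower-left corner. Whether this crate preserves that convention, and what fields its types expose, is not stated here, so the sketch below uses a hypothetical `NormalizedRect` stand-in and an illustrative `to_pixel_rect` helper to show the usual conversion to top-left pixel coordinates.

```rust
/// Hypothetical stand-in for a normalized Vision rectangle: all values in
/// 0.0..=1.0, origin at the lower-left corner of the image.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct NormalizedRect {
    pub x: f64,
    pub y: f64,
    pub width: f64,
    pub height: f64,
}

/// Pixel-space rectangle with a top-left origin (y grows downward).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct PixelRect {
    pub x: f64,
    pub y: f64,
    pub width: f64,
    pub height: f64,
}

/// Scale a normalized rectangle to pixels and flip the y axis.
pub fn to_pixel_rect(rect: NormalizedRect, image_width: f64, image_height: f64) -> PixelRect {
    PixelRect {
        x: rect.x * image_width,
        // In lower-left coordinates the rectangle's top edge is at y + height,
        // so its distance from the top of the image is 1 - y - height.
        y: (1.0 - rect.y - rect.height) * image_height,
        width: rect.width * image_width,
        height: rect.height * image_height,
    }
}
```

If the crate already returns top-left or pixel coordinates, this flip would be unnecessary; check the type's documentation before applying it.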

§Modules

aesthetics (feature: aesthetics): Aesthetics scoring (VNCalculateImageAestheticsScoresRequest) and face capture quality (VNDetectFaceCaptureQualityRequest).
animal_body_pose: VNDetectAnimalBodyPoseRequest, body-pose keypoints for cats, dogs and similar quadrupeds. Available on macOS 14+.
animals (feature: animals): Animal recognition (VNRecognizeAnimalsRequest).
body_pose (feature: body_pose): Human body pose detection (VNDetectHumanBodyPoseRequest).
classify (feature: classify): General-purpose image classification (VNClassifyImageRequest).
contours (feature: contours): Edge contour detection (VNDetectContoursRequest).
coreml (feature: coreml): CoreML inference via Vision (VNCoreMLRequest).
detect_barcodes (feature: detect_barcodes): Barcode detection via VNDetectBarcodesRequest (Vision v0.4).
detect_faces (feature: detect_faces): FaceDetector, a wrapper around VNDetectFaceRectanglesRequest.
error: Errors from the Vision bridge.
face_landmarks (feature: face_landmarks): detect_face_landmarks_in_path, a wrapper around VNDetectFaceLandmarksRequest.
feature_print (feature: feature_print): Image feature print (VNGenerateImageFeaturePrintRequest), a semantic image embedding for content-based similarity.
ffi: Raw FFI declarations matching the Swift bridge in swift-bridge/Sources/VisionBridge/Vision.swift.
hand_pose (feature: hand_pose): Human hand pose detection (VNDetectHumanHandPoseRequest).
horizon (feature: horizon): Horizon detection (VNDetectHorizonRequest).
human_body_pose_3d: VNDetectHumanBodyPose3DRequest, 3D human-body keypoints (macOS 14+).
humans (feature: humans): Human-rectangle detection (VNDetectHumanRectanglesRequest), lightweight person bounding boxes without joint skeletons.
objectness_saliency: VNGenerateObjectnessBasedSaliencyImageRequest, discrete object regions an attention model thinks are salient.
optical_flow (feature: optical_flow): Optical flow generation (VNGenerateOpticalFlowRequest).
person_instance_mask: VNGeneratePersonInstanceMaskRequest, per-person instance mask (macOS 14+).
prelude: Common imports.
recognize_text (feature: recognize_text): TextRecognizer, a wrapper around VNRecognizeTextRequest for image-file OCR.
rectangles (feature: rectangles): Rectangle and document-segmentation detection.
registration: VNTranslationalImageRegistrationRequest and VNHomographicImageRegistrationRequest, pixel-space alignment between two images.
saliency (feature: saliency): Attention-based saliency detection via VNGenerateAttentionBasedSaliencyImageRequest.
segmentation (feature: segmentation): Segmentation mask generation via VNGeneratePersonSegmentationRequest and VNGenerateForegroundInstanceMaskRequest.
text_rectangles: VNDetectTextRectanglesRequest, text-region detection (no OCR).
tracking (feature: tracking): Stateful Vision tracking requests backed by retained Swift sessions.
trajectories: VNDetectTrajectoriesRequest, parabolic-trajectory detection.
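
The feature_print module above is described as producing an embedding for content-based similarity. How `FeaturePrint` exposes its data or a distance method is not shown here, so the following is only a sketch of the underlying idea on plain `f32` slices: two images are considered similar when the distance between their embedding vectors is small. `euclidean_distance` is an illustrative helper, not a function of the crate.

```rust
/// Euclidean distance between two equal-length embedding vectors.
/// Returns `None` when the lengths differ, since such vectors are not comparable.
pub fn euclidean_distance(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    // Sum of squared per-component differences, then the square root.
    let sum_sq: f32 = a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum();
    Some(sum_sq.sqrt())
}
```

A smaller result means more similar content. Any threshold for "similar enough" depends on the embedding and would need to be tuned on real data.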