Crate apple_vision

§vision

Safe Rust bindings for Apple’s Vision framework — on-device OCR, object detection, face landmarks, and other computer vision tasks on macOS.

Status: experimental. v0.1 ships text recognition (OCR); object and face detection, classification, and barcode scanning land in v0.2.

§Quick start — OCR

use apple_vision::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let recognizer = TextRecognizer::new()
        .with_recognition_level(RecognitionLevel::Accurate)
        .with_language_correction(true);

    let observations = recognizer.recognize_in_path("/tmp/screenshot.png")?;
    for obs in &observations {
        println!("[{:.2}] '{}'", obs.confidence, obs.text);
    }
    Ok(())
}
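
The quick start prints every observation, including low-confidence ones. A minimal sketch of filtering by confidence before using the text follows. The `Observation` struct here is a hypothetical stand-in that models only the two fields the quick start reads (`text` and `confidence`), and the `f32` confidence type and the `confident_text` helper are assumptions, not part of the crate.

```rust
/// Hypothetical stand-in for the crate's observation type; only the two
/// fields used in the quick start (`text`, `confidence`) are modelled.
#[derive(Debug, Clone, PartialEq)]
struct Observation {
    text: String,
    confidence: f32, // assumed to be a 0.0..=1.0 score
}

/// Keep observations at or above `min_confidence` and join their text
/// with newlines, preserving the original order.
fn confident_text(observations: &[Observation], min_confidence: f32) -> String {
    observations
        .iter()
        .filter(|obs| obs.confidence >= min_confidence)
        .map(|obs| obs.text.as_str())
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let observations = vec![
        Observation { text: "Hello".into(), confidence: 0.98 },
        Observation { text: "w0rld".into(), confidence: 0.31 },
        Observation { text: "World".into(), confidence: 0.87 },
    ];
    // Drops the 0.31 observation and prints the remaining two lines.
    println!("{}", confident_text(&observations, 0.5));
}
```

The threshold is application-specific; 0.5 is only an illustrative value.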

§Composes with the rest of the doom-fish stack

screencapturekit-rs / capture ──► IOSurface / PNG ──► vision ──► text
                                                          │
                                                          ▼
                                                  foundation-models
                                                  ("summarise this")

§Feature flags

Feature	Status
`recognize_text` (default)	✅
`detect_faces` (default)	✅
`detect_rectangles`	🚧 v0.4
`classify_image`	🚧 v0.4
`detect_barcodes`	🚧 v0.4

§Roadmap

VNRecognizeTextRequest (OCR) via TextRecognizer
VNDetectFaceRectanglesRequest via FaceDetector (returns bounding box + roll/yaw/pitch)
CGImage / CVPixelBuffer ingest (file path AND zero-copy CVPixelBuffer paths)
VNDetectFaceLandmarksRequest (face landmark points)
VNDetectRectanglesRequest
VNClassifyImageRequest
VNDetectBarcodesRequest
Async API (VNRequest completion handlers exposed via async fn)
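
Several roadmap items return bounding boxes. Apple's Vision framework reports these in normalized coordinates (0.0 to 1.0) with the origin at the bottom-left of the image, so drawing them on a top-left-origin pixel grid needs a scale and a vertical flip. The sketch below shows that conversion under the assumption that the crate passes Vision's normalized values through unchanged; `NormalizedRect`, `PixelRect`, and `to_pixel_rect` are hypothetical names, not the crate's `BoundingBox` API.

```rust
/// Hypothetical normalized rectangle: all values in 0.0..=1.0, origin at
/// the bottom-left corner of the image (Vision's convention).
#[derive(Debug, Clone, Copy, PartialEq)]
struct NormalizedRect {
    x: f64,
    y: f64,
    width: f64,
    height: f64,
}

/// Rectangle in pixels with the origin at the top-left corner.
#[derive(Debug, Clone, Copy, PartialEq)]
struct PixelRect {
    x: f64,
    y: f64,
    width: f64,
    height: f64,
}

/// Scale a normalized, bottom-left-origin rectangle to pixel units and
/// flip it vertically so that `y` measures down from the top edge.
fn to_pixel_rect(r: NormalizedRect, image_width: f64, image_height: f64) -> PixelRect {
    PixelRect {
        x: r.x * image_width,
        // The box's top edge sits at y + height in bottom-left coordinates.
        y: (1.0 - r.y - r.height) * image_height,
        width: r.width * image_width,
        height: r.height * image_height,
    }
}

fn main() {
    let face = NormalizedRect { x: 0.25, y: 0.5, width: 0.5, height: 0.25 };
    let px = to_pixel_rect(face, 800.0, 400.0);
    println!("{px:?}");
}
```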

§License

Licensed under either of Apache-2.0 or MIT at your option.

§API Documentation

Safe Rust bindings for Apple’s Vision framework — OCR, object detection, face landmarks, and other on-device computer vision tasks.

v0.1 ships text recognition (OCR) only. Object/face detection lands in v0.2.

§Re-exports

pub use error::VisionError;
pub use recognize_text::BoundingBox; // feature: recognize_text
pub use recognize_text::RecognitionLevel; // feature: recognize_text
pub use recognize_text::RecognizedText; // feature: recognize_text
pub use recognize_text::TextRecognizer; // feature: recognize_text
pub use detect_faces::DetectedFace; // feature: detect_faces
pub use detect_faces::FaceDetector; // feature: detect_faces
pub use detect_barcodes::detect_barcodes_in_path; // feature: detect_barcodes
pub use detect_barcodes::DetectedBarcode; // feature: detect_barcodes
pub use saliency::attention_saliency_in_path; // feature: saliency
pub use saliency::SalientRegion; // feature: saliency
pub use face_landmarks::detect_face_landmarks_in_path; // feature: face_landmarks
pub use face_landmarks::FaceWithLandmarks; // feature: face_landmarks
pub use face_landmarks::LandmarkPoint; // feature: face_landmarks
pub use body_pose::detect_human_body_pose_in_path; // feature: body_pose
pub use body_pose::DetectedBodyPose; // feature: body_pose
pub use body_pose::JointPoint; // feature: body_pose
pub use hand_pose::detect_human_hand_pose_in_path; // feature: hand_pose
pub use hand_pose::DetectedHandPose; // feature: hand_pose
pub use contours::detect_contours_in_path; // feature: contours
pub use contours::Contour; // feature: contours
pub use contours::ContourOptions; // feature: contours
pub use animals::recognize_animals_in_path; // feature: animals
pub use animals::RecognizedAnimal; // feature: animals
pub use classify::classify_image_in_path; // feature: classify
pub use classify::Classification; // feature: classify
pub use rectangles::detect_document_segmentation_in_path; // feature: rectangles
pub use rectangles::detect_rectangles_in_path; // feature: rectangles
pub use rectangles::RectangleObservation; // feature: rectangles
pub use rectangles::RectangleOptions; // feature: rectangles
pub use horizon::detect_horizon_in_path; // feature: horizon
pub use feature_print::generate_image_feature_print_in_path; // feature: feature_print
pub use feature_print::FeaturePrint; // feature: feature_print
pub use humans::detect_human_rectangles_in_path; // feature: humans
pub use humans::DetectedHuman; // feature: humans
pub use aesthetics::calculate_aesthetics_scores_in_path; // feature: aesthetics
pub use aesthetics::detect_face_capture_quality_in_path; // feature: aesthetics
pub use aesthetics::AestheticsScores; // feature: aesthetics
pub use aesthetics::FaceCaptureQuality; // feature: aesthetics
pub use segmentation::generate_foreground_instance_mask_in_path; // feature: segmentation
pub use segmentation::generate_person_segmentation_in_path; // feature: segmentation
pub use segmentation::InstanceMask; // feature: segmentation
pub use segmentation::SegmentationMask; // feature: segmentation
pub use segmentation::SegmentationQuality; // feature: segmentation
pub use optical_flow::generate_optical_flow_in_paths; // feature: optical_flow
pub use optical_flow::OpticalFlowAccuracy; // feature: optical_flow
pub use coreml::coreml_classify_in_path; // feature: coreml

§Modules

aesthetics (feature: aesthetics): Aesthetics scoring (VNCalculateImageAestheticsScoresRequest) and face capture quality (VNDetectFaceCaptureQualityRequest).
animals (feature: animals): Animal recognition (VNRecognizeAnimalsRequest).
body_pose (feature: body_pose): Human body pose detection (VNDetectHumanBodyPoseRequest).
classify (feature: classify): General-purpose image classification (VNClassifyImageRequest).
contours (feature: contours): Edge contour detection (VNDetectContoursRequest).
coreml (feature: coreml): CoreML inference via Vision (VNCoreMLRequest).
detect_barcodes (feature: detect_barcodes): Barcode detection via VNDetectBarcodesRequest (Vision v0.4).
detect_faces (feature: detect_faces): FaceDetector, which wraps VNDetectFaceRectanglesRequest.
error: Errors from the Vision bridge.
face_landmarks (feature: face_landmarks): detect_face_landmarks_in_path, which wraps VNDetectFaceLandmarksRequest.
feature_print (feature: feature_print): Image feature print (VNGenerateImageFeaturePrintRequest), a semantic image embedding for content-based similarity.
ffi: Raw FFI declarations matching the Swift bridge in swift-bridge/Sources/VisionBridge/Vision.swift.
hand_pose (feature: hand_pose): Human hand pose detection (VNDetectHumanHandPoseRequest).
horizon (feature: horizon): Horizon detection (VNDetectHorizonRequest).
humans (feature: humans): Human-rectangle detection (VNDetectHumanRectanglesRequest): lightweight person bounding boxes without joint skeletons.
optical_flow (feature: optical_flow): Optical flow generation (VNGenerateOpticalFlowRequest).
prelude: Common imports.
recognize_text (feature: recognize_text): TextRecognizer, which wraps VNRecognizeTextRequest for image-file OCR.
rectangles (feature: rectangles): Rectangle and document-segmentation detection.
saliency (feature: saliency): Attention-based saliency detection via VNGenerateAttentionBasedSaliencyImageRequest.
segmentation (feature: segmentation): Segmentation mask generation via VNGeneratePersonSegmentationRequest and VNGenerateForegroundInstanceMaskRequest.
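
The feature_print module describes its output as an embedding for content-based similarity. As a rough illustration of what comparing two such embeddings can look like, the sketch below computes a plain Euclidean distance between two `f32` vectors. Representing an embedding as a `&[f32]` slice and the `euclidean_distance` helper are both assumptions for illustration; the crate's `FeaturePrint` type may expose its data and its own distance computation differently.

```rust
/// Euclidean distance between two equal-length embeddings.
/// Returns `None` when the lengths differ, since such vectors are not comparable.
fn euclidean_distance(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let sum_of_squares: f32 = a
        .iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum();
    Some(sum_of_squares.sqrt())
}

fn main() {
    // Two toy embeddings; real feature prints are much longer vectors.
    let first = [0.0_f32, 0.0];
    let second = [3.0_f32, 4.0];
    match euclidean_distance(&first, &second) {
        Some(d) => println!("distance = {d}"),
        None => println!("embeddings have different lengths"),
    }
}
```

A smaller distance means the two images are more alike under this metric; what counts as "similar enough" depends on the application.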
