§vision
Safe Rust bindings for Apple’s Vision framework — on-device OCR, object detection, face landmarks, and other computer vision tasks on macOS.
Status: v0.16.0 keeps the full Vision request surface, adds a Tier-1 async_api module for one-shot OCR / face / barcode / segmentation workflows, and ships a fully implemented COVERAGE.md + COVERAGE_AUDIT.md matrix plus a gold-standard multi-file Swift bridge.
§Quick start — OCR
use apple_vision::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let recognizer = TextRecognizer::new()
        .with_recognition_level(RecognitionLevel::Accurate)
        .with_language_correction(true);

    let observations = recognizer.recognize_in_path("screenshot.png")?;
    for obs in &observations {
        println!("[{:.2}] '{}'", obs.confidence, obs.text);
    }
    Ok(())
}

§Composes with the rest of the doom-fish stack
screencapturekit-rs / capture ──► IOSurface / PNG ──► vision ──► text
│
▼
foundation-models
("summarise this")

§Feature flags
All request-type modules can be enabled independently, and the default feature set still enables the full Vision surface. v0.16.0 also adds an optional async feature for executor-agnostic Future wrappers around the Tier-1 one-shot request surface.
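As a rough sketch of the independent feature selection described above, a manifest that opts out of the defaults and enables only OCR and barcode detection might look like the fragment below. The feature names recognize_text and detect_barcodes are taken from the feature badges in the module listing on this page; that the crate builds cleanly with exactly this combination is an assumption, not something this page confirms.

```toml
[dependencies]
# Assumption: disabling defaults leaves only the always-on modules
# (error, geometry, sdk, ...) plus the features listed here.
apple-vision = { version = "0.16.0", default-features = false, features = ["recognize_text", "detect_barcodes"] }
```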
§Async API
Enable async plus the request features you need:
apple-vision = { version = "0.16.0", features = ["async", "recognize_text"] }

use apple_vision::async_api::AsyncRecognizeText;
use apple_vision::RecognitionLevel;

let texts = AsyncRecognizeText::new(RecognitionLevel::Accurate, true)
    .recognize_in_path("screenshot.png")
    .await?;
println!("found {} text observations", texts.len());

Tier-1 currently covers background-queue wrappers for OCR, face detection, barcode detection, and person segmentation. Multi-fire delegate / stream-style Vision APIs remain future Tier-2 work.
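To illustrate what an executor-agnostic, one-shot Future wrapper around background work can look like, here is a minimal std-only sketch. It is not the crate's implementation: OneShot and block_on are hypothetical names, a plain thread stands in for the background queue, and the closure stands in for a Vision request.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

/// State shared between the worker thread and the future.
struct Shared<T> {
    result: Option<T>,
    waker: Option<Waker>,
}

/// Hypothetical one-shot future: resolves once the background job finishes.
pub struct OneShot<T> {
    shared: Arc<Mutex<Shared<T>>>,
}

impl<T: Send + 'static> OneShot<T> {
    /// Runs `job` on a background thread (a stand-in for a dispatch queue).
    pub fn spawn(job: impl FnOnce() -> T + Send + 'static) -> Self {
        let shared = Arc::new(Mutex::new(Shared { result: None, waker: None }));
        let worker = Arc::clone(&shared);
        thread::spawn(move || {
            let value = job();
            // Store the result, then release the lock before waking.
            let waker = {
                let mut guard = worker.lock().unwrap();
                guard.result = Some(value);
                guard.waker.take()
            };
            if let Some(waker) = waker {
                waker.wake();
            }
        });
        OneShot { shared }
    }
}

impl<T> Future for OneShot<T> {
    type Output = T;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        let mut guard = self.shared.lock().unwrap();
        match guard.result.take() {
            Some(value) => Poll::Ready(value),
            None => {
                // Remember the most recent waker so the worker can notify it.
                guard.waker = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}

/// Waker that unparks the thread blocked in `block_on`.
struct ThreadWaker(thread::Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Tiny executor, included only to show that no specific runtime is required.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            // Re-poll after every wake-up; spurious unparks are harmless.
            Poll::Pending => thread::park(),
        }
    }
}
```

Because the future only relies on the standard Waker contract, the same OneShot value could be awaited from any async runtime; for example, block_on(OneShot::spawn(|| 21 * 2)) returns 42.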
§Roadmap
- Single-image Vision requests (OCR, faces, landmarks, pose, contours, saliency, segmentation, Core ML, and the rest of the stateless request surface)
- Pairwise image-registration requests (VNTranslationalImageRegistrationRequest, VNHomographicImageRegistrationRequest)
- Stateful tracking requests (VNTrackObjectRequest, VNTrackRectangleRequest, VNTrackOpticalFlowRequest, VNTrackTranslationalImageRegistrationRequest, VNTrackHomographicImageRegistrationRequest)
- Header-audited request + observation coverage matrix (COVERAGE.md) with dedicated wrappers for every current request/observation type and a split Swift bridge (all bridge files stay under 500 lines)
- Explicit VNRequest / VNObservation / request-handler / VNVideoProcessor wrappers for OCR pipelines, plus base request/observation helpers reused across the rest of the crate
- Async API (Tier-1 Future wrappers for OCR, face detection, barcode detection, and person segmentation; Tier-2 stream/delegate surfaces still TBD)
§License
Licensed under either of Apache-2.0 or MIT at your option.
§API Documentation
Safe Rust bindings for Apple’s Vision framework — OCR, object detection, face landmarks, and other on-device computer vision tasks.
v0.16.0 adds a Tier-1 async_api module for one-shot OCR / face / barcode /
segmentation requests while keeping the full audited Vision request surface
and the split Swift bridge / coverage matrix introduced in v0.15.x.
Re-exports§
pub use error::VisionError;
pub use geometry::element_type_size;
pub use geometry::image_point_for_face_landmark_point;
pub use geometry::image_point_for_normalized_point;
pub use geometry::image_point_for_normalized_point_using_region_of_interest;
pub use geometry::image_rect_for_normalized_rect;
pub use geometry::image_rect_for_normalized_rect_using_region_of_interest;
pub use geometry::normalized_face_bounding_box_point_for_landmark_point;
pub use geometry::normalized_identity_rect;
pub use geometry::normalized_point_for_image_point;
pub use geometry::normalized_point_for_image_point_using_region_of_interest;
pub use geometry::normalized_rect_for_image_rect;
pub use geometry::normalized_rect_for_image_rect_using_region_of_interest;
pub use geometry::normalized_rect_is_identity_rect;
pub use geometry::Transform3D;
pub use geometry::VisionCircle;
pub use geometry::VisionGeometryUtils;
pub use geometry::VisionPoint;
pub use geometry::VisionPoint3D;
pub use geometry::VisionVector;
pub use recognized_points::HumanBodyRecognizedPoint3D;
pub use recognized_points::RecognizedPoint;
pub use recognized_points::RecognizedPoint3D;
pub use recognized_points::RecognizedPoints3DObservation;
pub use recognized_points::RecognizedPointsObservation;
pub use recognized_points::VisionDetectedPoint;
pub use recognized_points::VisionRecognizedPoint;
pub use recognized_points::VisionRecognizedPoint3D;
pub use request_base::ImageAlignmentObservation;
pub use request_base::ImageBasedRequest;
pub use request_base::ImageRegistrationRequest;
pub use request_base::NormalizedRect;
pub use request_base::PixelBufferObservation;
pub use request_base::RequestProgress;
pub use request_base::RequestProgressHandler;
pub use request_base::RequestProgressProviding;
pub use request_base::RequestRevisionProviding;
pub use request_base::StatefulRequest;
pub use request_base::TargetedImageRequest;
pub use request_base::TrackingLevel;
pub use request_base::TrackingRequest;
pub use sdk::vision_version_number;
pub use sdk::AnimalIdentifier;
pub use sdk::BarcodeCompositeType;
pub use sdk::BarcodeSymbology;
pub use sdk::ComputeStage;
pub use sdk::ElementType;
pub use sdk::ImageCropAndScaleOption;
pub use sdk::ImageOption;
pub use sdk::PointsClassification;
pub use sdk::RecognizedPoint3DGroupKey;
pub use sdk::RecognizedPointGroupKey;
pub use sdk::VisionErrorCode;
pub use sdk::VISION_ERROR_DOMAIN;
pub use processing::ImageRequestHandler; (feature: recognize_text)
pub use processing::Observation; (feature: recognize_text)
pub use processing::RecognizedTextObservation; (feature: recognize_text)
pub use processing::Request; (feature: recognize_text)
pub use processing::RequestKind; (feature: recognize_text)
pub use processing::SequenceRequestHandler; (feature: recognize_text)
pub use processing::TimeRange; (feature: recognize_text)
pub use processing::VideoCadence; (feature: recognize_text)
pub use processing::VideoProcessingOptions; (feature: recognize_text)
pub use processing::VideoProcessor; (feature: recognize_text)
pub use processing::VideoProcessorCadence; (feature: recognize_text)
pub use processing::VideoProcessorFrameRateCadence; (feature: recognize_text)
pub use processing::VideoProcessorRequestProcessingOptions; (feature: recognize_text)
pub use processing::VideoProcessorTimeIntervalCadence; (feature: recognize_text)
pub use recognize_text::BoundingBox; (feature: recognize_text)
pub use recognize_text::RecognitionLevel; (feature: recognize_text)
pub use recognize_text::RecognizedText; (feature: recognize_text)
pub use recognize_text::RecognizedTextCandidate; (feature: recognize_text)
pub use recognize_text::TextRecognizer; (feature: recognize_text)
pub use detect_faces::DetectedFace; (feature: detect_faces)
pub use detect_faces::FaceDetector; (feature: detect_faces)
pub use detect_barcodes::detect_barcodes_in_path; (feature: detect_barcodes)
pub use detect_barcodes::BarcodeCompositeType as DetectedBarcodeCompositeType; (feature: detect_barcodes)
pub use detect_barcodes::BarcodeSymbology as DetectedBarcodeSymbology; (feature: detect_barcodes)
pub use detect_barcodes::DetectedBarcode; (feature: detect_barcodes)
pub use saliency::attention_saliency_in_path; (feature: saliency)
pub use saliency::SalientRegion; (feature: saliency)
pub use face_landmarks::detect_face_landmarks_in_path; (feature: face_landmarks)
pub use face_landmarks::FaceLandmarkRegion; (feature: face_landmarks)
pub use face_landmarks::FaceLandmarkRegion2D; (feature: face_landmarks)
pub use face_landmarks::FaceLandmarks; (feature: face_landmarks)
pub use face_landmarks::FaceLandmarks2D; (feature: face_landmarks)
pub use face_landmarks::FaceLandmarksRequest; (feature: face_landmarks)
pub use face_landmarks::FaceObservationAccepting; (feature: face_landmarks)
pub use face_landmarks::FaceWithLandmarks; (feature: face_landmarks)
pub use face_landmarks::LandmarkPoint; (feature: face_landmarks)
pub use face_landmarks::RequestFaceLandmarksConstellation; (feature: face_landmarks)
pub use body_pose::detect_human_body_pose_in_path; (feature: body_pose)
pub use body_pose::detect_human_body_pose_observations_in_path; (feature: body_pose)
pub use body_pose::DetectedBodyPose; (feature: body_pose)
pub use body_pose::HumanBodyPoseJointGroupName; (feature: body_pose)
pub use body_pose::HumanBodyPoseJointName; (feature: body_pose)
pub use body_pose::HumanBodyPoseObservation; (feature: body_pose)
pub use body_pose::JointPoint; (feature: body_pose)
pub use hand_pose::detect_human_hand_pose_in_path; (feature: hand_pose)
pub use hand_pose::detect_human_hand_pose_observations_in_path; (feature: hand_pose)
pub use hand_pose::DetectedHandPose; (feature: hand_pose)
pub use hand_pose::HandChirality; (feature: hand_pose)
pub use hand_pose::HumanHandPoseJointGroupName; (feature: hand_pose)
pub use hand_pose::HumanHandPoseJointName; (feature: hand_pose)
pub use hand_pose::HumanHandPoseObservation; (feature: hand_pose)
pub use contours::detect_contours_in_path; (feature: contours)
pub use contours::detect_contours_observation_in_path; (feature: contours)
pub use contours::Contour; (feature: contours)
pub use contours::ContourOptions; (feature: contours)
pub use contours::ContoursObservation; (feature: contours)
pub use contours::VisionContour; (feature: contours)
pub use animals::recognize_animals_in_path; (feature: animals)
pub use animals::AnimalIdentifier as RecognizedAnimalIdentifier; (feature: animals)
pub use animals::RecognizedAnimal; (feature: animals)
pub use classify::classify_image_in_path; (feature: classify)
pub use classify::Classification; (feature: classify)
pub use rectangles::detect_document_segmentation_in_path; (feature: rectangles)
pub use rectangles::detect_rectangles_in_path; (feature: rectangles)
pub use rectangles::RectangleObservation; (feature: rectangles)
pub use rectangles::RectangleOptions; (feature: rectangles)
pub use horizon::detect_horizon_in_path; (feature: horizon)
pub use horizon::detect_horizon_observation_in_path; (feature: horizon)
pub use horizon::AffineTransform; (feature: horizon)
pub use horizon::HorizonObservation; (feature: horizon)
pub use feature_print::generate_image_feature_print_in_path; (feature: feature_print)
pub use feature_print::FeaturePrint; (feature: feature_print)
pub use humans::detect_human_rectangles_in_path; (feature: humans)
pub use humans::DetectedHuman; (feature: humans)
pub use aesthetics::calculate_aesthetics_scores_in_path; (feature: aesthetics)
pub use aesthetics::detect_face_capture_quality_in_path; (feature: aesthetics)
pub use aesthetics::AestheticsScores; (feature: aesthetics)
pub use aesthetics::FaceCaptureQuality; (feature: aesthetics)
pub use segmentation::generate_foreground_instance_mask_in_path; (feature: segmentation)
pub use segmentation::generate_foreground_instance_mask_observation_in_path; (feature: segmentation)
pub use segmentation::generate_person_segmentation_in_path; (feature: segmentation)
pub use segmentation::InstanceMask; (feature: segmentation)
pub use segmentation::InstanceMaskObservation; (feature: segmentation)
pub use segmentation::SegmentationMask; (feature: segmentation)
pub use segmentation::SegmentationQuality; (feature: segmentation)
pub use optical_flow::generate_optical_flow_in_paths; (feature: optical_flow)
pub use optical_flow::generate_optical_flow_observation_in_paths; (feature: optical_flow)
pub use optical_flow::OpticalFlowAccuracy; (feature: optical_flow)
pub use coreml::coreml_classify_in_path; (feature: coreml)
pub use coreml::coreml_feature_value_in_path; (feature: coreml)
pub use coreml::CoreMLFeatureValue; (feature: coreml)
pub use coreml::CoreMLFeatureValueObservation; (feature: coreml)
pub use coreml::CoreMLImageCropAndScaleOption; (feature: coreml)
pub use coreml::CoreMLModel; (feature: coreml)
pub use coreml::CoreMLRequest; (feature: coreml)
pub use animal_body_pose::detect_animal_body_pose;
pub use animal_body_pose::AnimalBodyPoseJointGroupName;
pub use animal_body_pose::AnimalBodyPoseJointName;
pub use animal_body_pose::AnimalJoint;
pub use human_body_pose_3d::detect_human_body_pose_3d;
pub use human_body_pose_3d::detect_human_body_pose_3d_observations;
pub use human_body_pose_3d::detect_human_body_recognized_points_3d;
pub use human_body_pose_3d::BodyHeightEstimation;
pub use human_body_pose_3d::HumanBodyPose3DJointGroupName;
pub use human_body_pose_3d::HumanBodyPose3DJointName;
pub use human_body_pose_3d::HumanBodyPose3DObservation;
pub use human_body_pose_3d::HumanBodyPose3DObservationHeightEstimation;
pub use human_body_pose_3d::HumanJoint3D;
pub use objectness_saliency::objectness_saliency;
pub use objectness_saliency::ObjectnessRegion;
pub use person_instance_mask::person_instance_mask;
pub use person_instance_mask::PersonInstanceMask;
pub use registration::register_homographic;
pub use registration::register_homographic_observation;
pub use registration::register_translational;
pub use registration::register_translational_observation;
pub use registration::HomographicAlignment;
pub use registration::TranslationalAlignment;
pub use text_rectangles::detect_text_observations;
pub use text_rectangles::detect_text_rectangles;
pub use text_rectangles::TextObservation;
pub use text_rectangles::TextRect;
pub use text_rectangles::TextRectanglesRequest;
pub use trajectories::detect_trajectories;
pub use trajectories::Trajectory;
pub use tracking::HomographicImageTracker; (feature: tracking)
pub use tracking::ObjectTracker; (feature: tracking)
pub use tracking::OpticalFlowFrame; (feature: tracking)
pub use tracking::OpticalFlowTracker; (feature: tracking)
pub use tracking::RectangleTracker; (feature: tracking)
pub use tracking::TrackOpticalFlowRequestComputationAccuracy; (feature: tracking)
pub use tracking::TranslationalImageTracker; (feature: tracking)
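Several of the re-exported geometry helpers convert between Vision's normalized coordinates (values in 0.0 to 1.0, origin at the lower-left corner) and pixel coordinates. The sketch below shows the arithmetic behind a conversion like image_rect_for_normalized_rect, following the behaviour of Apple's VNImageRectForNormalizedRect. The Rect type and both function names are hypothetical stand-ins; the crate's actual signatures are not shown on this page and may differ.

```rust
/// Hypothetical stand-in rectangle; not the crate's NormalizedRect type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Rect {
    pub x: f64,
    pub y: f64,
    pub width: f64,
    pub height: f64,
}

/// Scales a normalized rect (all fields in 0.0..=1.0) into pixel units.
/// The origin stays at the lower-left corner, as in Vision.
pub fn scale_normalized_rect(r: Rect, image_width: usize, image_height: usize) -> Rect {
    let (w, h) = (image_width as f64, image_height as f64);
    Rect {
        x: r.x * w,
        y: r.y * h,
        width: r.width * w,
        height: r.height * h,
    }
}

/// Converts a pixel-space rect from a lower-left origin to a top-left origin,
/// which is what most image and UI toolkits expect.
pub fn flip_to_top_left(r: Rect, image_height: usize) -> Rect {
    Rect {
        y: image_height as f64 - r.y - r.height,
        ..r
    }
}
```

For an 800 by 600 image, the normalized rect (x 0.25, y 0.5, width 0.5, height 0.25) scales to (200, 300, 400, 150), and flipping it to a top-left origin moves y to 150.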
Modules§
- aesthetics (feature: aesthetics)
  Aesthetics scoring (VNCalculateImageAestheticsScoresRequest) and face capture quality (VNDetectFaceCaptureQualityRequest).
- animal_body_pose
  VNDetectAnimalBodyPoseRequest: body-pose keypoints for cats, dogs and similar quadrupeds. Available on macOS 14+.
- animals (feature: animals)
  Animal recognition (VNRecognizeAnimalsRequest).
- async_api (feature: async)
  Async Vision API: Future-based wrappers for VNImageRequestHandler and friends.
- body_pose (feature: body_pose)
  Human body pose detection (VNDetectHumanBodyPoseRequest).
- classify (feature: classify)
  General-purpose image classification (VNClassifyImageRequest).
- contours (feature: contours)
  Edge contour detection (VNDetectContoursRequest).
- coreml (feature: coreml)
  CoreML inference via Vision (VNCoreMLModel, VNCoreMLRequest, and VNCoreMLFeatureValueObservation).
- detect_barcodes (feature: detect_barcodes)
  Barcode detection via VNDetectBarcodesRequest (Vision v0.4).
- detect_faces (feature: detect_faces)
  FaceDetector: wraps VNDetectFaceRectanglesRequest.
- error
  Errors from the Vision bridge.
- face_landmarks (feature: face_landmarks)
  detect_face_landmarks_in_path: wraps VNDetectFaceLandmarksRequest.
- feature_print (feature: feature_print)
  Image feature print (VNGenerateImageFeaturePrintRequest): semantic image embedding for content-based similarity.
- ffi
  Raw FFI declarations matching the Swift bridge in swift-bridge/Sources/VisionBridge/*.swift.
- geometry
  Geometry wrappers and utility helpers mirroring Vision’s VNGeometry* and VNUtils surfaces.
- hand_pose (feature: hand_pose)
  Human hand pose detection (VNDetectHumanHandPoseRequest).
- horizon (feature: horizon)
  Horizon detection (VNDetectHorizonRequest).
- human_body_pose_3d
  VNDetectHumanBodyPose3DRequest: 3D human-body keypoints (macOS 14+).
- humans (feature: humans)
  Human-rectangle detection (VNDetectHumanRectanglesRequest): lightweight person bounding boxes without joint skeletons.
- objectness_saliency
  VNGenerateObjectnessBasedSaliencyImageRequest: discrete object regions an attention model thinks are salient.
- optical_flow (feature: optical_flow)
  Optical flow generation (VNGenerateOpticalFlowRequest).
- person_instance_mask
  VNGeneratePersonInstanceMaskRequest: per-person instance mask (macOS 14+).
- prelude
  Common imports.
- processing (feature: recognize_text)
  Explicit request / handler / video-processing wrappers backed by Vision.
- recognize_text (feature: recognize_text)
  TextRecognizer: wraps VNRecognizeTextRequest for image-file OCR.
- recognized_points
  Generic recognized-point wrappers shared by pose observations.
- rectangles (feature: rectangles)
  Rectangle + document-segmentation detection.
- registration
  VNTranslationalImageRegistrationRequest + VNHomographicImageRegistrationRequest: pixel-space alignment between two images.
- request_base
  Shared request / observation building blocks for Vision base classes.
- saliency (feature: saliency)
  Attention-based saliency detection via VNGenerateAttentionBasedSaliencyImageRequest.
- sdk
  Vision SDK-wide enums, string constants, and version helpers.
- segmentation (feature: segmentation)
  Segmentation mask generation: VNGeneratePersonSegmentationRequest and VNGenerateForegroundInstanceMaskRequest.
- text_rectangles
  VNDetectTextRectanglesRequest: text-region detection (no OCR).
- tracking (feature: tracking)
  Stateful Vision tracking requests backed by retained Swift sessions.
- trajectories
  VNDetectTrajectoriesRequest: parabolic-trajectory detection.