§vision
Safe Rust bindings for Apple’s Vision framework — on-device OCR, object detection, face landmarks, and other computer vision tasks on macOS.
Status: v0.14 ships the full Apple Vision request surface, including all five stateful tracking requests.
§Quick start — OCR
use apple_vision::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let recognizer = TextRecognizer::new()
        .with_recognition_level(RecognitionLevel::Accurate)
        .with_language_correction(true);
    let observations = recognizer.recognize_in_path("/tmp/screenshot.png")?;
    for obs in &observations {
        println!("[{:.2}] '{}'", obs.confidence, obs.text);
    }
    Ok(())
}
§Composes with the rest of the doom-fish stack
screencapturekit-rs / capture ──► IOSurface / PNG ──► vision ──► text
                                                                  │
                                                                  ▼
                                                          foundation-models
                                                         ("summarise this")
§Feature flags
All request-type modules can be enabled independently, but the default feature set now enables the full Vision surface, including the new tracking module in v0.14.
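As a sketch of opting into a subset, a downstream manifest could switch off the defaults and list only the modules it needs. The package name `apple-vision` is an assumption inferred from the `apple_vision` import path in the quick start, so check the crate's own manifest. The feature names are taken from the feature badges in the module index, and whether one feature pulls in another is not stated on this page.

```toml
[dependencies]
# Package name is assumed, not confirmed by this page.
# Only OCR and the stateful trackers are enabled in this example.
apple-vision = { version = "0.14", default-features = false, features = ["recognize_text", "tracking"] }
```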
§Roadmap
- Single-image Vision requests (OCR, faces, landmarks, pose, contours, saliency, segmentation, Core ML, and the rest of the stateless request surface)
- Pairwise image-registration requests (VNTranslationalImageRegistrationRequest, VNHomographicImageRegistrationRequest)
- Stateful tracking requests (VNTrackObjectRequest, VNTrackRectangleRequest, VNTrackOpticalFlowRequest, VNTrackTranslationalImageRegistrationRequest, VNTrackHomographicImageRegistrationRequest)
- Async API (VNRequest completion handlers exposed via async fn)
§License
Licensed under either of Apache-2.0 or MIT at your option.
§API Documentation
Safe Rust bindings for Apple’s Vision framework — OCR, object detection, face landmarks, and other on-device computer vision tasks.
v0.14 covers the full Apple Vision request surface, including the stateful tracking requests.
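The quick start prints every observation as-is. Before handing OCR output to a downstream consumer, it can help to drop low-confidence lines first. The following is a minimal sketch under stated assumptions: `Observation` is a hypothetical local stand-in that mirrors only the `text` and `confidence` fields used in the quick start (it is not the crate's `RecognizedText` type), and both `confident_text` and the 0.5 threshold are illustrative choices.

```rust
// Hypothetical stand-in for an OCR observation. It mirrors only the `text`
// and `confidence` fields used in the quick start; it is NOT the crate's type.
#[derive(Debug, Clone)]
struct Observation {
    text: String,
    confidence: f32,
}

/// Keep observations at or above `min_confidence` and join their text
/// with newlines, preserving the original order.
fn confident_text(observations: &[Observation], min_confidence: f32) -> String {
    observations
        .iter()
        .filter(|obs| obs.confidence >= min_confidence)
        .map(|obs| obs.text.as_str())
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    // Sample data made up for illustration only.
    let observations = vec![
        Observation { text: "Hello".to_string(), confidence: 0.98 },
        Observation { text: "w0rld".to_string(), confidence: 0.31 },
        Observation { text: "world".to_string(), confidence: 0.87 },
    ];
    let text = confident_text(&observations, 0.5);
    println!("{text}");
}
```

With real results, the same filter could probably be applied directly to the vector returned by `recognize_in_path`, provided its items expose `confidence` as a float and `text` as a string, as the quick start suggests.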
§Re-exports
pub use error::VisionError;
pub use recognize_text::BoundingBox; (feature: recognize_text)
pub use recognize_text::RecognitionLevel; (feature: recognize_text)
pub use recognize_text::RecognizedText; (feature: recognize_text)
pub use recognize_text::TextRecognizer; (feature: recognize_text)
pub use detect_faces::DetectedFace; (feature: detect_faces)
pub use detect_faces::FaceDetector; (feature: detect_faces)
pub use detect_barcodes::detect_barcodes_in_path; (feature: detect_barcodes)
pub use detect_barcodes::DetectedBarcode; (feature: detect_barcodes)
pub use saliency::attention_saliency_in_path; (feature: saliency)
pub use saliency::SalientRegion; (feature: saliency)
pub use face_landmarks::detect_face_landmarks_in_path; (feature: face_landmarks)
pub use face_landmarks::FaceWithLandmarks; (feature: face_landmarks)
pub use face_landmarks::LandmarkPoint; (feature: face_landmarks)
pub use body_pose::detect_human_body_pose_in_path; (feature: body_pose)
pub use body_pose::DetectedBodyPose; (feature: body_pose)
pub use body_pose::JointPoint; (feature: body_pose)
pub use hand_pose::detect_human_hand_pose_in_path; (feature: hand_pose)
pub use hand_pose::DetectedHandPose; (feature: hand_pose)
pub use contours::detect_contours_in_path; (feature: contours)
pub use contours::Contour; (feature: contours)
pub use contours::ContourOptions; (feature: contours)
pub use animals::recognize_animals_in_path; (feature: animals)
pub use animals::RecognizedAnimal; (feature: animals)
pub use classify::classify_image_in_path; (feature: classify)
pub use classify::Classification; (feature: classify)
pub use rectangles::detect_document_segmentation_in_path; (feature: rectangles)
pub use rectangles::detect_rectangles_in_path; (feature: rectangles)
pub use rectangles::RectangleObservation; (feature: rectangles)
pub use rectangles::RectangleOptions; (feature: rectangles)
pub use horizon::detect_horizon_in_path; (feature: horizon)
pub use feature_print::generate_image_feature_print_in_path; (feature: feature_print)
pub use feature_print::FeaturePrint; (feature: feature_print)
pub use humans::detect_human_rectangles_in_path; (feature: humans)
pub use humans::DetectedHuman; (feature: humans)
pub use aesthetics::calculate_aesthetics_scores_in_path; (feature: aesthetics)
pub use aesthetics::detect_face_capture_quality_in_path; (feature: aesthetics)
pub use aesthetics::AestheticsScores; (feature: aesthetics)
pub use aesthetics::FaceCaptureQuality; (feature: aesthetics)
pub use segmentation::generate_foreground_instance_mask_in_path; (feature: segmentation)
pub use segmentation::generate_person_segmentation_in_path; (feature: segmentation)
pub use segmentation::InstanceMask; (feature: segmentation)
pub use segmentation::SegmentationMask; (feature: segmentation)
pub use segmentation::SegmentationQuality; (feature: segmentation)
pub use optical_flow::generate_optical_flow_in_paths; (feature: optical_flow)
pub use optical_flow::OpticalFlowAccuracy; (feature: optical_flow)
pub use coreml::coreml_classify_in_path; (feature: coreml)
pub use animal_body_pose::detect_animal_body_pose;
pub use animal_body_pose::AnimalJoint;
pub use human_body_pose_3d::detect_human_body_pose_3d;
pub use human_body_pose_3d::HumanJoint3D;
pub use text_rectangles::detect_text_rectangles;
pub use text_rectangles::TextRect;
pub use objectness_saliency::objectness_saliency;
pub use objectness_saliency::ObjectnessRegion;
pub use person_instance_mask::person_instance_mask;
pub use person_instance_mask::PersonInstanceMask;
pub use trajectories::detect_trajectories;
pub use trajectories::Trajectory;
pub use registration::register_homographic;
pub use registration::register_translational;
pub use registration::HomographicAlignment;
pub use registration::TranslationalAlignment;
pub use tracking::HomographicImageTracker; (feature: tracking)
pub use tracking::ObjectTracker; (feature: tracking)
pub use tracking::OpticalFlowFrame; (feature: tracking)
pub use tracking::OpticalFlowTracker; (feature: tracking)
pub use tracking::RectangleTracker; (feature: tracking)
pub use tracking::TranslationalImageTracker; (feature: tracking)
§Modules
- aesthetics (feature: aesthetics)
  Aesthetics scoring (VNCalculateImageAestheticsScoresRequest) and face capture quality (VNDetectFaceCaptureQualityRequest).
- animal_body_pose
  VNDetectAnimalBodyPoseRequest: body-pose keypoints for cats, dogs and similar quadrupeds. Available on macOS 14+.
- animals (feature: animals)
  Animal recognition (VNRecognizeAnimalsRequest).
- body_pose (feature: body_pose)
  Human body pose detection (VNDetectHumanBodyPoseRequest).
- classify (feature: classify)
  General-purpose image classification (VNClassifyImageRequest).
- contours (feature: contours)
  Edge contour detection (VNDetectContoursRequest).
- coreml (feature: coreml)
  CoreML inference via Vision (VNCoreMLRequest).
- detect_barcodes (feature: detect_barcodes)
  Barcode detection via VNDetectBarcodesRequest (Vision v0.4).
- detect_faces (feature: detect_faces)
  FaceDetector wraps VNDetectFaceRectanglesRequest.
- error
  Errors from the Vision bridge.
- face_landmarks (feature: face_landmarks)
  detect_face_landmarks_in_path wraps VNDetectFaceLandmarksRequest.
- feature_print (feature: feature_print)
  Image feature print (VNGenerateImageFeaturePrintRequest): semantic image embedding for content-based similarity.
- ffi
  Raw FFI declarations matching the Swift bridge in swift-bridge/Sources/VisionBridge/Vision.swift.
- hand_pose (feature: hand_pose)
  Human hand pose detection (VNDetectHumanHandPoseRequest).
- horizon (feature: horizon)
  Horizon detection (VNDetectHorizonRequest).
- human_body_pose_3d
  VNDetectHumanBodyPose3DRequest: 3D human-body keypoints (macOS 14+).
- humans (feature: humans)
  Human-rectangle detection (VNDetectHumanRectanglesRequest): lightweight person bounding boxes without joint skeletons.
- objectness_saliency
  VNGenerateObjectnessBasedSaliencyImageRequest: discrete object regions an attention model thinks are salient.
- optical_flow (feature: optical_flow)
  Optical flow generation (VNGenerateOpticalFlowRequest).
- person_instance_mask
  VNGeneratePersonInstanceMaskRequest: per-person instance mask (macOS 14+).
- prelude
  Common imports.
- recognize_text (feature: recognize_text)
  TextRecognizer wraps VNRecognizeTextRequest for image-file OCR.
- rectangles (feature: rectangles)
  Rectangle + document-segmentation detection.
- registration
  VNTranslationalImageRegistrationRequest + VNHomographicImageRegistrationRequest: pixel-space alignment between two images.
- saliency (feature: saliency)
  Attention-based saliency detection via VNGenerateAttentionBasedSaliencyImageRequest.
- segmentation (feature: segmentation)
  Segmentation mask generation: VNGeneratePersonSegmentationRequest and VNGenerateForegroundInstanceMaskRequest.
- text_rectangles
  VNDetectTextRectanglesRequest: text-region detection (no OCR).
- tracking (feature: tracking)
  Stateful Vision tracking requests backed by retained Swift sessions.
- trajectories
  VNDetectTrajectoriesRequest: parabolic-trajectory detection.