§vision
Safe Rust bindings for Apple’s Vision framework — on-device OCR, object detection, face landmarks, and other computer vision tasks on macOS.
Status: experimental. v0.1 ships text recognition (OCR); object/face detection, classification, and barcode scanning land in v0.2.
§Quick start — OCR
use apple_vision::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let recognizer = TextRecognizer::new()
        .with_recognition_level(RecognitionLevel::Accurate)
        .with_language_correction(true);
    let observations = recognizer.recognize_in_path("/tmp/screenshot.png")?;
    for obs in &observations {
        println!("[{:.2}] '{}'", obs.confidence, obs.text);
    }
    Ok(())
}

§Composes with the rest of the doom-fish stack
screencapturekit-rs / capture ──► IOSurface / PNG ──► vision ──► text
                                                                  │
                                                                  ▼
                                                          foundation-models
                                                          ("summarise this")

§Feature flags
| Feature | Status |
|---|---|
| recognize_text (default) | ✅ |
| detect_faces (default) | ✅ |
| detect_rectangles | 🚧 v0.4 |
| classify_image | 🚧 v0.4 |
| detect_barcodes | 🚧 v0.4 |
§Roadmap
- VNRecognizeTextRequest (OCR) via TextRecognizer
- VNDetectFaceRectanglesRequest via FaceDetector (returns bounding box + roll/yaw/pitch)
- CGImage / CVPixelBuffer ingest (file path AND zero-copy CVPixelBuffer paths)
- VNDetectFaceLandmarksRequest (face landmark points)
- VNDetectRectanglesRequest
- VNClassifyImageRequest
- VNDetectBarcodesRequest
- Async API (VNRequest completion handlers exposed via async fn)
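The last roadmap item, exposing completion handlers through `async fn`, can be illustrated with a small standard-library-only sketch. Everything named here (`perform_with_handler`, `perform_async`, `Completion`, `block_on`) is hypothetical and only shows the general callback-to-future pattern; it is not the crate's planned API, and a real bridge would call into Vision rather than spawn a thread.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

/// State shared between the completion handler and the future.
struct Shared<T> {
    value: Option<T>,
    waker: Option<Waker>,
}

/// Future that resolves once the completion handler has delivered a value.
struct Completion<T> {
    shared: Arc<Mutex<Shared<T>>>,
}

impl<T> Future for Completion<T> {
    type Output = T;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        let mut shared = self.shared.lock().unwrap();
        match shared.value.take() {
            Some(value) => Poll::Ready(value),
            None => {
                // Remember which task to wake when the handler fires.
                shared.waker = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}

/// Hypothetical stand-in for a callback-based request: runs `work` on another
/// thread and passes the result to `on_done`, as a completion handler would.
fn perform_with_handler<T: Send + 'static>(
    work: impl FnOnce() -> T + Send + 'static,
    on_done: impl FnOnce(T) + Send + 'static,
) {
    thread::spawn(move || on_done(work()));
}

/// Wraps the callback-based function above in an `async fn`.
async fn perform_async<T: Send + 'static>(work: impl FnOnce() -> T + Send + 'static) -> T {
    let shared = Arc::new(Mutex::new(Shared { value: None, waker: None }));
    let producer = Arc::clone(&shared);
    perform_with_handler(work, move |value| {
        // Store the result, release the lock, then wake the waiting task.
        let waker = {
            let mut state = producer.lock().unwrap();
            state.value = Some(value);
            state.waker.take()
        };
        if let Some(waker) = waker {
            waker.wake();
        }
    });
    Completion { shared }.await
}

/// Waker that unparks the thread blocked in `block_on`.
struct ThreadWaker(thread::Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal single-future executor, just enough to drive the sketch.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            // Spurious wake-ups only cause an extra poll.
            Poll::Pending => thread::park(),
        }
    }
}
```

Calling `block_on(perform_async(|| "hello".len()))` drives the future to completion on the current thread. In an application, an existing async runtime would take the place of `block_on`.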
§License
Licensed under either of Apache-2.0 or MIT at your option.
§API Documentation
Safe Rust bindings for Apple’s Vision framework — OCR, object detection, face landmarks, and other on-device computer vision tasks.
v0.1 ships text recognition (OCR) only. Object/face detection lands in v0.2.
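Because each OCR observation in the quick start exposes a `confidence` and a `text` field, a common follow-up step is to drop low-confidence results before using the text. The sketch below uses a local stand-in struct, `Observation`, instead of the crate's own type. The `f32` field type and the helper name `confident_text` are assumptions made for illustration.

```rust
/// Local stand-in for one OCR observation. Only the two fields used in the
/// quick start are modelled; this is not the crate's own type.
#[derive(Debug, Clone, PartialEq)]
struct Observation {
    confidence: f32,
    text: String,
}

/// Keeps observations at or above `min_confidence` and joins their text,
/// one observation per line, preserving the original order.
fn confident_text(observations: &[Observation], min_confidence: f32) -> String {
    observations
        .iter()
        .filter(|obs| obs.confidence >= min_confidence)
        .map(|obs| obs.text.as_str())
        .collect::<Vec<_>>()
        .join("\n")
}
```

The threshold is application-specific. A screenshot with crisp text may tolerate a high cut-off, while photographed documents may need a lower one.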
Re-exports§
pub use error::VisionError;
pub use recognize_text::BoundingBox; // feature: recognize_text
pub use recognize_text::RecognitionLevel; // feature: recognize_text
pub use recognize_text::RecognizedText; // feature: recognize_text
pub use recognize_text::TextRecognizer; // feature: recognize_text
pub use detect_faces::DetectedFace; // feature: detect_faces
pub use detect_faces::FaceDetector; // feature: detect_faces
pub use detect_barcodes::detect_barcodes_in_path; // feature: detect_barcodes
pub use detect_barcodes::DetectedBarcode; // feature: detect_barcodes
pub use saliency::attention_saliency_in_path; // feature: saliency
pub use saliency::SalientRegion; // feature: saliency
pub use face_landmarks::detect_face_landmarks_in_path; // feature: face_landmarks
pub use face_landmarks::FaceWithLandmarks; // feature: face_landmarks
pub use face_landmarks::LandmarkPoint; // feature: face_landmarks
pub use body_pose::detect_human_body_pose_in_path; // feature: body_pose
pub use body_pose::DetectedBodyPose; // feature: body_pose
pub use body_pose::JointPoint; // feature: body_pose
pub use hand_pose::detect_human_hand_pose_in_path; // feature: hand_pose
pub use hand_pose::DetectedHandPose; // feature: hand_pose
pub use contours::detect_contours_in_path; // feature: contours
pub use contours::Contour; // feature: contours
pub use contours::ContourOptions; // feature: contours
pub use animals::recognize_animals_in_path; // feature: animals
pub use animals::RecognizedAnimal; // feature: animals
pub use classify::classify_image_in_path; // feature: classify
pub use classify::Classification; // feature: classify
pub use rectangles::detect_document_segmentation_in_path; // feature: rectangles
pub use rectangles::detect_rectangles_in_path; // feature: rectangles
pub use rectangles::RectangleObservation; // feature: rectangles
pub use rectangles::RectangleOptions; // feature: rectangles
pub use horizon::detect_horizon_in_path; // feature: horizon
pub use feature_print::generate_image_feature_print_in_path; // feature: feature_print
pub use feature_print::FeaturePrint; // feature: feature_print
pub use humans::detect_human_rectangles_in_path; // feature: humans
pub use humans::DetectedHuman; // feature: humans
pub use aesthetics::calculate_aesthetics_scores_in_path; // feature: aesthetics
pub use aesthetics::detect_face_capture_quality_in_path; // feature: aesthetics
pub use aesthetics::AestheticsScores; // feature: aesthetics
pub use aesthetics::FaceCaptureQuality; // feature: aesthetics
pub use segmentation::generate_foreground_instance_mask_in_path; // feature: segmentation
pub use segmentation::generate_person_segmentation_in_path; // feature: segmentation
pub use segmentation::InstanceMask; // feature: segmentation
pub use segmentation::SegmentationMask; // feature: segmentation
pub use segmentation::SegmentationQuality; // feature: segmentation
pub use optical_flow::generate_optical_flow_in_paths; // feature: optical_flow
pub use optical_flow::OpticalFlowAccuracy; // feature: optical_flow
pub use coreml::coreml_classify_in_path; // feature: coreml
Modules§
- aesthetics (feature: aesthetics): Aesthetics scoring (VNCalculateImageAestheticsScoresRequest) and face capture quality (VNDetectFaceCaptureQualityRequest).
- animals (feature: animals): Animal recognition (VNRecognizeAnimalsRequest).
- body_pose (feature: body_pose): Human body pose detection (VNDetectHumanBodyPoseRequest).
- classify (feature: classify): General-purpose image classification (VNClassifyImageRequest).
- contours (feature: contours): Edge contour detection (VNDetectContoursRequest).
- coreml (feature: coreml): CoreML inference via Vision (VNCoreMLRequest).
- detect_barcodes (feature: detect_barcodes): Barcode detection via VNDetectBarcodesRequest (Vision v0.4).
- detect_faces (feature: detect_faces): FaceDetector, which wraps VNDetectFaceRectanglesRequest.
- error: Errors from the Vision bridge.
- face_landmarks (feature: face_landmarks): detect_face_landmarks_in_path, which wraps VNDetectFaceLandmarksRequest.
- feature_print (feature: feature_print): Image feature print (VNGenerateImageFeaturePrintRequest), a semantic image embedding for content-based similarity.
- ffi: Raw FFI declarations matching the Swift bridge in swift-bridge/Sources/VisionBridge/Vision.swift.
- hand_pose (feature: hand_pose): Human hand pose detection (VNDetectHumanHandPoseRequest).
- horizon (feature: horizon): Horizon detection (VNDetectHorizonRequest).
- humans (feature: humans): Human-rectangle detection (VNDetectHumanRectanglesRequest), giving lightweight person bounding boxes without joint skeletons.
- optical_flow (feature: optical_flow): Optical flow generation (VNGenerateOpticalFlowRequest).
- prelude: Common imports.
- recognize_text (feature: recognize_text): TextRecognizer, which wraps VNRecognizeTextRequest for image-file OCR.
- rectangles (feature: rectangles): Rectangle + document-segmentation detection.
- saliency (feature: saliency): Attention-based saliency detection via VNGenerateAttentionBasedSaliencyImageRequest.
- segmentation (feature: segmentation): Segmentation mask generation with VNGeneratePersonSegmentationRequest and VNGenerateForegroundInstanceMaskRequest.
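Several of the modules above report bounding boxes. Apple's Vision framework expresses these in normalized coordinates (0.0 to 1.0) with the origin at the lower-left corner of the image, whereas most drawing and image-cropping code expects pixels with a top-left origin. The sketch below shows the conversion. `NormalizedRect`, `PixelRect`, and `to_pixel_rect` are hypothetical names, and it is an assumption that this crate's `BoundingBox` passes Vision's normalized values through unchanged; check the type's documentation before relying on it.

```rust
/// Hypothetical normalized rectangle: all values in 0.0..=1.0, origin at the
/// lower-left corner of the image (Vision's convention). Not the crate's type.
#[derive(Debug, Clone, Copy, PartialEq)]
struct NormalizedRect {
    x: f64,
    y: f64,
    width: f64,
    height: f64,
}

/// Pixel rectangle with a top-left origin, as used by most image libraries.
#[derive(Debug, Clone, Copy, PartialEq)]
struct PixelRect {
    x: f64,
    y: f64,
    width: f64,
    height: f64,
}

/// Scales a normalized rectangle to pixel units and flips the vertical axis.
fn to_pixel_rect(rect: NormalizedRect, image_width: u32, image_height: u32) -> PixelRect {
    let (w, h) = (f64::from(image_width), f64::from(image_height));
    PixelRect {
        x: rect.x * w,
        // The top edge in a top-left system sits at 1 - (y + height)
        // in the normalized lower-left system.
        y: (1.0 - rect.y - rect.height) * h,
        width: rect.width * w,
        height: rect.height * h,
    }
}
```

For an 800×400 image, a normalized rectangle at (0.25, 0.5) with size 0.5×0.25 maps to a pixel rectangle at (200, 100) with size 400×100.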